
COMP 524: Programming Language Concepts

Spring, 2008
Jeff Terrell
jsterrel AT cs.unc.edu
(919) 962-1791 (office: Sitterson 138)

COMP 524 Exercise 1 (Python)

General Instructions

First, review the assignment submission policy in the syllabus. Note that there is no collaboration allowed.

This assignment is due at 11:59pm on Tuesday, January 29. Submit assignments to me via email. All of your functions should be included in a file called 'exercise1.py', and 'exercise1.py' should include everything that you turn in. When I am finished grading your assignment, I will email you your grade and any comments that I have. There are a total of 100 points and 10 bonus points.

Remember: start early! I guarantee my availability during office hours, but not at 11pm the night it is due.

Update (Jan 15, 12:50pm): when you turn in exercise1.py, there should be no statements at the top level that produce output or require input. Merely provide the required functions and anything the functions need--do not include any code related to your testing or debugging. When I import your functions into another script, I want the import to be silently successful. Thank you.

My solutions are available.

1. Matching

Summary
Write a program that can recognize identifiers, number literals, and string literals.
Background
Working with strings is an essential part of a programming language. After all, what is a program except one big string? Python has a large set of string methods. In addition, Python supports regular expressions (RE's), which are a powerful tool for strings. The basic string methods will suffice for any task; however, RE's provide a powerful, concise way to perform many operations. Check out the RE howto.
Details

Write a function named categorize_token that accepts a single string as an argument. If the string is an identifier, return the string 'ID'; if the string is a number literal, return the string 'NUMBER_LIT'; and if the string is a string literal, return the string 'STRING_LIT'. Otherwise, return 'NONE'.

Identifiers start with a letter (uppercase or lowercase) or an underscore (_), and afterwards have some combination of letters, digits, or underscores, but no other characters. There can be multiple underscores at the beginning, but the first non-underscore character must be a letter.

Number literals start with an optional sign (either '+' or '-'). There must be at least one digit in the number. There may be a decimal point ('.') in the number, but not more than one. Other than the optional sign and optional decimal point, only digits may appear in a number literal.

String literals must begin and end with a tick ('), and any ticks inside the string must be immediately preceded by a backslash (\). There can be 0 or more characters inside the enclosing ticks. (Note: to pass a string literal to the function, you must use two sets of quotes; for example, categorize_token("'a string literal'"). The first set of quotes tells Python about the string; the second set are actually included in the string.)
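The three definitions above can be sketched with regular expressions roughly as follows. This is one possible reading, not the instructor's solution: the pattern names ID_RE, NUMBER_RE, and STRING_RE are made up for illustration, the string pattern follows the definition exactly as written (it does not attempt the bonus bug fix), and it assumes newlines are excluded from string literals.

```python
import re

# Identifier: any number of leading underscores, then a letter,
# then any mix of letters, digits, and underscores.
ID_RE = re.compile(r"_*[A-Za-z][A-Za-z0-9_]*\Z")

# Number: optional sign, at least one digit, at most one decimal point.
NUMBER_RE = re.compile(r"[+-]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)\Z")

# String: enclosing ticks; an inner tick must follow a backslash.
# Assumption: newlines are not allowed inside the literal.
STRING_RE = re.compile(r"'(?:\\'|[^'\n])*'\Z")


def categorize_token(token):
    """Return 'ID', 'NUMBER_LIT', 'STRING_LIT', or 'NONE' for token."""
    if ID_RE.match(token):
        return 'ID'
    if NUMBER_RE.match(token):
        return 'NUMBER_LIT'
    if STRING_RE.match(token):
        return 'STRING_LIT'
    return 'NONE'
```

The \Z anchor is used instead of $ so that a trailing newline is not silently accepted.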

FAQ
Q: For the string literal definition, what do you mean by 'character'?
A: Any character, including alphanumeric, whitespace, special ($, #, etc), and so on. The only character that you can easily type on your keyboard that I don't care about is the newline character. (Note that there is a built-in character class with this definition.)

Q: Is _ a valid identifier? What about __ or ___?
A: No, there must be at least one letter in the identifier.

Q: I'm having problems getting all those backslashes into my regular expressions. What should I do?
A: Use raw strings. That way, you don't have to escape your backslashes. In other words, "\n" is a newline character, but r"\n" is two characters, a backslash and an n.

Q: If I use raw strings in my regular expressions, do I need to have a work-around so that I can accept regular strings also?
A: No, raw string literals are merely syntactic sugar for string literals containing backslashes. Both are represented as ordinary strings behind the scenes, and in fact I'm pretty sure there's no way to distinguish them from within a program. So, if your categorize_token() function uses raw strings, you don't need to add support for regular strings--they are already supported.
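A short check of the raw-string answers above. The variable names raw and escaped are only for illustration.

```python
import re

# Both literals denote the same two-character string: a backslash and an 'n'.
raw = r"\n"
escaped = "\\n"
assert raw == escaped
assert len(raw) == 2

# Without the r prefix, "\n" is a single newline character.
assert len("\n") == 1

# A pattern written either way behaves identically once it is a string.
assert re.match(r"[0-9]+\.[0-9]+", "3.14") is not None
assert re.match("[0-9]+\\.[0-9]+", "3.14") is not None
```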
Points
25 points. 5 bonus points if you use regular expressions exclusively. 5 bonus points if you can find and fix a bug in the definition of string literals. (My definition of a bug: there exists a string which cannot be represented with the current definition.)
Par (i.e. instructor's solution time)
20 minutes (using regular expressions)
23 minutes extra for the string literal bug fix

2. Counting Characters

Summary
Count each character appearing in a string.
Background
Dictionaries are a very useful built-in type. This problem gives you practice using them. Check out the tutorial section on dictionaries and the library reference on dicts for more details.
Details

Write a function named count_chars that accepts a single string as an argument. The function counts the frequency of each character in the string, and returns a dictionary of the results. The keys of the dictionary are the unique characters in the string, and the values are the corresponding counts.

For example:
>>> count_chars('banana')
{'a': 3, 'b': 1, 'n': 2}
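One way this could look, as a minimal sketch rather than the reference solution. It relies on dict.get with a default of 0 so that a character seen for the first time needs no special case.

```python
def count_chars(text):
    """Return a dict mapping each character in text to its frequency."""
    counts = {}
    for ch in text:
        # get() returns 0 for a character that has not been seen yet.
        counts[ch] = counts.get(ch, 0) + 1
    return counts
```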

FAQ
No questions yet.
Points
15 points.
Par
6 minutes

3. Mutable Immutables

Summary
"Modify" a tuple by creating a new one.
Background
Python tuples are like lists, except that they are (like strings) immutable (i.e. unchangeable). Immutable types are important because they can make a programming language's job much easier. (Why?) For example, only immutable objects can be used as keys in a dictionary. Trying to use mutable objects for dictionary keys is one of the most common mistakes when using Python. Note: to specify a single-element tuple, an extra comma is needed, e.g. (1,).
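The points in this background can be seen in a few lines. The dictionary point_names and the flag rejected are illustrative names only. Strictly speaking, Python requires keys to be hashable, which tuples of immutable items are and lists are not.

```python
# Tuples can serve as dictionary keys.
point_names = {(0, 0): 'origin', (1, 0): 'unit x'}
assert point_names[(0, 0)] == 'origin'

# A list cannot: Python raises TypeError because lists are unhashable.
rejected = False
try:
    point_names[[0, 1]] = 'unit y'
except TypeError:
    rejected = True
assert rejected

# The trailing comma, not the parentheses, makes a single-element tuple.
assert type((1,)) is tuple
assert type((1)) is int
```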
Details

Write a function named mod_tuple that accepts 3 arguments: a base tuple (call it base), a tuple of indexes (call it indexes), and a tuple of values (call it values). The lengths of indexes and values must be equal. The function returns a new tuple (not a list) in which each position of base listed in indexes has been replaced by the corresponding element of values. If the lengths of indexes and values are not equal, return base unmodified. Do not worry about a specified index being out of bounds; I guarantee that this will not happen.

For example:
>>> mod_tuple( (7, 8, 9, 10, 42), (1, 2, 3), ('X', 'Yo!') )
(7, 8, 9, 10, 42)
>>> mod_tuple( (7, 8, 9, 10, 42), (1, 3), ('X', 'Yo!') )
(7, 'X', 9, 'Yo!', 42)
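A possible approach, sketched under the assumption that copying into a temporary list is acceptable as long as a tuple is returned. It is not necessarily how the instructor's solution works.

```python
def mod_tuple(base, indexes, values):
    """Return a copy of base with base[indexes[k]] replaced by values[k]."""
    if len(indexes) != len(values):
        # Mismatched lengths: hand back the original tuple untouched.
        return base
    items = list(base)  # mutable working copy
    for index, value in zip(indexes, values):
        items[index] = value
    return tuple(items)
```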

FAQ
No questions yet.
Points
15 points
Par
10 minutes

4. Nested Lists

Summary
Add and concatenate items in a nested list.
Background
Lists can contain lists (which can contain lists, and so on). Typically, a nested list is a giveaway that you should use recursion. Note: for this problem, you will need to use the type function and the types module (i.e. import types or from types import *).
Details

Write a function named reduce_list that accepts a single list as an argument. Note that this list might contain other lists, which might in turn contain yet more lists, and so on. The non-list "leaf" items will be either strings or numbers. The function will return a tuple of two items. The first item is a single string, the concatenation of every string item in the list. The second item is a single number, the sum of every number item in the list.

For example:
>>> reduce_list( [1, 17, ['EyeA', -2, ['mmS', 7], -7], [26, 'tring!']])
('EyeAmmString!', 42)

Note: the string concatenation depends on the order in which one encounters each string leaf. To avoid ambiguity, you must process each sub-list as it occurs (i.e. immediately recur when you encounter a sub-list). This is a depth-first, left-to-right traversal of a tree. (Note that a nested list is really a general tree.)
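The traversal described above might be sketched as follows. Note the substitution: this version tests types with isinstance rather than the type function and types module, since names such as types.ListType are not available in Python 3. It also assumes every non-list, non-string leaf is a number, as the problem statement promises.

```python
def reduce_list(items):
    """Return (concatenated strings, sum of numbers) for a nested list."""
    text = ''
    total = 0
    for item in items:
        if isinstance(item, list):
            # Recur immediately so strings are joined in encounter order.
            sub_text, sub_total = reduce_list(item)
            text += sub_text
            total += sub_total
        elif isinstance(item, str):
            text += item
        else:
            # Assumed to be a number, per the problem statement.
            total += item
    return (text, total)
```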

FAQ
No questions yet.
Points
20 points
Par
10 minutes

5. Memoization with dicts

Summary
Create a recursive function to compute Fibonacci numbers, and explore the concept of memoization using Python's dictionary objects.
Background

Consider the following function to compute the n-th Fibonacci number:

def fibonacci(n):
    if n == 1 or n == 2:
        return 1
    else:
        return fibonacci(n-1) + fibonacci(n-2)

This function is very inefficient because it ends up computing the same thing multiple times. For example, if you call fibonacci(21), then fibonacci(3) gets computed 4,181 times, and fibonacci(50) computes fibonacci(20) 1,346,269 times! Clearly, this solution will not scale to large numbers. (This sort of problem is called a branched recursion problem.)
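The repeated work can be measured directly. The sketch below instruments a copy of the naive function with a counter; naive_fibonacci and call_counts are illustrative names, not part of the assignment.

```python
from collections import Counter

# Maps each argument n to the number of times the function was called with it.
call_counts = Counter()


def naive_fibonacci(n):
    call_counts[n] += 1
    if n == 1 or n == 2:
        return 1
    return naive_fibonacci(n - 1) + naive_fibonacci(n - 2)


result = naive_fibonacci(21)
print(result)          # 10946
print(call_counts[3])  # 4181
```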

This problem can be solved by a technique called memoization, a form of top-down dynamic programming. Instead of computing fibonacci(n) many times, fibonacci(n) is computed exactly once, and the result is saved for future computations. To achieve memoization in Python, simply store the result of fibonacci(n) in a global dictionary, and check the dictionary before recurring.

Details
Write a function named fibonacci that accepts one argument: an integer specifying which Fibonacci number to compute. Use recursion and memoization as discussed above to achieve an efficient recursive solution. (You may assume that you will always receive a positive integer as input.)
FAQ
Q: You mean that I can define a dictionary outside of my fibonacci() function? I thought you said you didn't want anything outside of a function in exercise1.py.
A: Yes, you can define the dictionary outside of the function. I don't want anything that will hinder me importing your exercise1.py--specifically, input and output routines. But defining a global variable is fine.
Points
25 points. Full points awarded only for solutions involving recursion and memoization.
Par
23 minutes
exercise1.php: Last Modified: 02/06/08@14:10:21