So much of our knowledge is stated in words! This includes not only history and philosophy books, but technical descriptions used by chemists, mathematicians, and other scientists. We must be able to tell a computer how to compute on words, whether for translating from French to German, simulating a chemical reaction, solving a mathematical differential equation, or playing a word game.
Please read Dawson, Chapter 4, for an introduction to computing on strings.
We learned earlier that a string is a sequence of characters
surrounded by quotes; for example, "aBc e" and
'ab\nCD!\nE9' are strings. We also learned that strings can
be concatenated with +, for example:
"aBc e" + 'ab\nCD!\nE9' computes to "aBc eab\nCD!\nE9"
We also learned that the methods, upper and lower,
build new strings entirely in upper or lower case. For example,
"aBc e".upper() computes to "ABC E"
and when we define a variable, s,
s = "aBc e"
then
s.lower() computes to "abc e"
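Here are some new tricks with strings. Strings can be compared with <, which checks whether one string comes before another in dictionary order; the strings are compared character by character, from left to right:

"a" < "b" computes to True
"bb" < "abc" computes to False
"aa" < "a" computes to False
"A" < "a" computes to True (the computer places upper-case letters first)
"234" < "AB" computes to True (the computer orders numerals before letters)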
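The method, isdigit, checks whether a string consists entirely of numerals, which is useful for checking input before we convert it with int:

data = raw_input("Please type an integer: ")
if data.isdigit() :
    num = int(data)
    . . .
else :
    print "error --- nonnumeric input"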
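When print displays several values separated by commas, it places a blank between each pair of values. If we write

hour = 11
minutes = 59
print hour, ":", minutes, "p.m."

we see printed 11 : 59 p.m. We can remove the extra blanks by converting the numbers to strings with str and joining the pieces with +:

hour = 11
minutes = 59
print str(hour) + ":" + str(minutes) + "p.m."

This prints 11:59p.m.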
The characters in a string are numbered from left to right, starting at 0. For s = "aBc e", the numbering looks like this:

 0 1 2 3 4
+-+-+-+-+-+
|a|B|c| |e|
+-+-+-+-+-+

We can examine the individual characters from a string, s, with the indexing operation, s[i]:

s = "aBc e"
print s[0]

prints a, and

print s[1] + s[4]

prints Be. We can build and save a new string from the characters we examine:
t = s[1] + s[4] # makes the new string, "Be", and assigns it to t
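We can also extract a slice, that is, a run of consecutive characters. The prefix s[:n] holds the characters at positions 0 up to (but not including) n:

s = "aBc e"
n = 2
print s[:n]

prints aB, that is, characters 0 and 1 from string s. The suffix s[m:] holds the characters from position m to the end:

m = 1
print s[m:]

prints Bc e. The slice s[n:m] holds the characters from position n up to (but not including) m:

n = 2
m = 3
print s[n:m]

prints c, because n has value 2, so the characters in the range of 2 up to (and not including) 3 are printed. This is only s[2].
Slicing always builds a new string, which we can assign to a variable:

s = "abcde"
s = s[1:] # this computes the suffix, "bcde", and assigns it to s

The assignment command erases the original string named by s and copies the new string, "bcde", into s's cell.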
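The slicing rules can be checked mechanically. The sketch below is an illustration, not one of the chapter's programs: the function name check_slices is hypothetical, and the code is written to run under both Python 2 and Python 3 (print is used only with a single parenthesized value, which both versions accept). For every pair of positions n <= m, it confirms that a prefix and its matching suffix rejoin into the original string and that s[n:m] holds exactly m - n characters.

```python
def check_slices(s):
    """Check two slicing facts for every pair of positions n <= m in s."""
    n = 0
    while n <= len(s):
        # the prefix before n and the suffix from n rejoin into s
        assert s[:n] + s[n:] == s
        m = n
        while m <= len(s):
            # s[n:m] holds exactly the characters at positions n .. m-1
            assert len(s[n:m]) == m - n
            m = m + 1
        n = n + 1
    return True


s = "aBc e"
print(s[:2])             # aB
print(s[1:])             # Bc e
print(s[2:3])            # c
print(check_slices(s))   # True
```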
And here is how to replace the middle letter in the example string
by an "X":
s = "abcde"
middle = len(s) // 2 # integer division: for "abcde", this is 2
s = s[:middle] + "X" + s[middle+1:]
print s # prints "abXde"
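The same replacement can be packaged as a small function. This is a sketch that assumes a hypothetical helper name, replace_at; like the example above, it builds a new string instead of changing the old one, and it uses //, which divides integers and discards the remainder in both Python 2 and Python 3.

```python
def replace_at(s, i, ch):
    """Return a copy of s in which the character at position i is ch.
    The original string is left unchanged."""
    return s[:i] + ch + s[i + 1:]


s = "abcde"
middle = len(s) // 2               # 5 // 2 is 2
print(replace_at(s, middle, "X"))  # abXde
print(s)                           # abcde: s itself is unchanged
```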
We can use while loops to compute on the characters within a string. Here are some simple examples that illustrate the techniques.
The first example shows how to examine and print the individual
characters in a string:
FIGURE 1=========================
# PrintChars1
# prints a string's characters from left to right, one at a time:
text = raw_input("Type a string: ")
index = 0 # remembers which character we are looking at in text
while index < len(text) :
# invariant: we have printed text[0], text[1], ..., up to text[index-1]
print text[index]
index = index + 1
raw_input("\n\npress Enter to finish")
ENDFIGURE=============================
The loop counter, index, counts all the characters in text,
from 0 to len(text) - 1.
(For an input string
like text = "abcdef", we have that len(text) is 6,
so variable index counts from 0 to 5.)
If you try it, you will see:
$ python PrintChars1.py
Type a string: abc
a
b
c
For practice, we can print the characters from right to left like this:
FIGURE 2======================
# PrintChars2
# prints a string's characters from right to left:
text = raw_input("Type a string: ")
index = len(text) - 1 # remembers which character we are looking at in text
while index >= 0 :
# invariant: text[len(text) - 1] downto text[index + 1] have been printed
print text[index]
index = index - 1
raw_input("\n\npress Enter to finish")
ENDFIGURE===========================
Here, we start counting with the highest-numbered character, at position
len(text) - 1, and count downwards to 0. (For an input string
like text = "abcdef", we have that len(text) is 6,
so variable index counts from 5 to 0.)
This next program reads
a string and prints it in reversed order. We might want its behavior
to go like this:
$ python Reverse.py
Type a string: abcde
edcba
We can do the reversal as a ``game,'' where we examine
characters, one by one, from the input string and copy them
to the output string, like this:
input string: output string:
"abcde" ""
"bcde" "a"
"cde" "ba"
"de" "cba"
"e" "dcba"
"" "edcba"
A lot of what we do in programming is play little games like this one,
moving around bits of data.
We can play this game two ways. The first way examines each
character of the input string, from left to right, and attaches it
to the front of the output string, so the characters end up in reverse order:
====================================
input = raw_input("Type a string: ")
output = ""
index = 0 # remembers which character we are looking at in input
while index < len(input) :
# invariant: output == input[index-1] ... input[1] input[0]
output = input[index] + output
index = index + 1
# at this point, output holds all the characters of input, reversed
print output
========================================
The second way to play the reversal game makes the input string appear to
``shrink'' into an empty string by repeatedly assigning shorter and
shorter suffixes to it:
==========================
input = raw_input("Type a string: ") # this is the original input
output = ""
while input != "" :
# invariant: the output string, reversed,
# appended to input equals the original input string
letter = input[0] # extract the front letter from what's left to move
output = letter + output
input = input[1:] # reassign input less its front letter
# at this point, input == "", meaning that output holds the reversed text:
print output
ENDFIGURE================================
Let's trace the second program, assuming the user types abcde, so that input holds "abcde" and output holds "". The loop's test, input != "", computes to True, and execution moves into the loop's body.
The first statement in the loop's body extracts the leading (zero-th) character, "a", from input and assigns it to variable letter. Since letter does not yet exist in the namespace, it is created.
After completing the assignment to letter, the instruction counter moves to the next instruction, which attaches the value in letter to the front of output. First, the expression, letter + output, is computed: "a" + "" computes to the string, "a".
Next, the string, "a", is saved into variable output, destroying the old value.
The command, input = input[1:], shortens string input by one letter. This happens in two steps: First, the expression, input[1:], computes to a new string that looks like input, starting from character 1 and extending to the end (here, "bcde").
Then, the string is saved in variable input.
Now the loop repeats, and again the character at position input[0] (it will be "b" this time) is moved into letter and attached to the front of output (which becomes "ba"), and so on. In this way, the loop ``plays'' the letter-shift ``game'' we saw at the beginning of this section.
The example showed us that variables that hold strings behave like variables that hold numbers, and that the string operation, +, behaves a lot like arithmetic addition. The example did hide a few technical details, however, about how strings are built within the computer, but there is no harm in hiding these details until we reach Chapter 5.
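Both versions of the reversal game can be written as functions that return their answer, which makes them easy to test. In this sketch the function names are hypothetical, and the variable input is renamed (to text and remaining) so that it does not hide Python's built-in function of the same name; the loop bodies follow the two programs above.

```python
def reverse_by_index(text):
    """First way: examine text[0], text[1], ... and attach each to the front."""
    output = ""
    index = 0
    while index < len(text):
        # invariant: output == text[index-1] ... text[1] text[0]
        output = text[index] + output
        index = index + 1
    return output


def reverse_by_shrinking(text):
    """Second way: repeatedly move the front letter of what is left."""
    remaining = text
    output = ""
    while remaining != "":
        # invariant: output, reversed, followed by remaining, equals text
        letter = remaining[0]
        output = letter + output
        remaining = remaining[1:]
    return output


print(reverse_by_index("abcde"))       # edcba
print(reverse_by_shrinking("abcde"))   # edcba
```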
A string is an example of a sequence of elements (here, a sequence of characters). Python has a special command that can neatly march through a sequence, extracting its elements one at a time from left to right. The command is called a for-loop.
Here is a quick example. We can rewrite Figure 1, which printed
the characters of a string from left to right, like this:
FIGURE 4=======================================
text = raw_input("Type a string: ")
for letter in text :
print letter
raw_input("\n\npress Enter to finish")
ENDFIGURE========================================
It's just that simple!
The for-loop automatically
extracts the characters one by one and executes the loop's body
once for each character.
Let's compare the original while-loop and the new for-loop,
both of which accomplish exactly the same work:
FIGURE 1:

index = 0
while index < len(text) :
    letter = text[index]
    print letter
    index = index + 1

FIGURE 4:

for letter in text :
    print letter
The for-loop is simpler to write and is preferred
when we must examine all the characters of a string.
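To see that the two loops really do the same work, here is a sketch (function names hypothetical) that records the characters each loop visits, instead of printing them, so that the two results can be compared directly. The while-loop version spells out what the for-loop does for us automatically: extract the next element into the loop variable, run the body, and advance.

```python
def visit_with_for(text):
    """Return text's characters, each followed by a blank, in the order
    a for-loop visits them."""
    visited = ""
    for letter in text:
        visited = visited + letter + " "
    return visited


def visit_with_while(text):
    """The same traversal, written with an explicit index."""
    visited = ""
    index = 0
    while index < len(text):
        letter = text[index]        # extract the next element into letter
        visited = visited + letter + " "
        index = index + 1           # advance to the next position
    return visited


print(visit_with_for("abc"))    # a b c
print(visit_with_while("abc"))  # a b c
```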
The syntax of the for-loop looks like this:
for VARIABLE in SEQUENCE :
COMMANDs
where VARIABLE is a variable name, and (for the moment)
SEQUENCE is an expression that computes to a string.
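The semantics of the for-loop goes like this: the elements of SEQUENCE are extracted one at a time, from left to right. Each element in turn is assigned to VARIABLE, and then COMMANDs are executed. When no elements remain, the loop stops, and execution proceeds to the first command after the loop.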
Let's rewrite the first version of the string-reversal program so that it
uses a for-loop:
FIGURE 5=====================================
# Reverse
# reverses the characters in a line of text
# assumed input: a line of text
# guaranteed output: the text with its characters reversed
input = raw_input("Type a line of text: ")
output = ""
for char in input :
# invariant: at the i-th iteration,
# output == input[i-1] input[i-2] ... input[1] input[0]
output = char + output
print output
raw_input("\n\npress Enter to finish")
ENDFIGURE=========================================
In the real world, searching for a missing object is an indefinite activity, because we do not know when we will find the object. Programs can also go ``searching'': for example, a program might search a computerized telephone directory for a person's telephone number. A small but good example of computerized searching is finding a specific character in a string. Searches use an important loop pattern, so we develop the character-search example in detail.
Here is the first version of the problem:
Our program asks the user to type a string, s,
and then a character, c, to
find in s. The program prints ``True'' or ``False'',
according to whether c is present in s.
Here is an example of the behavior:
$ python Search.py
Type a string: hurricane
Type a single char to search for: a
True
We can use a for-loop to examine each of the characters in the input string
to see if they match the character we search for:
======================================
# FindChar reports whether a character occurs in a string.
# assumed inputs: s - the string to be searched
# c - the character to be found
# guaranteed output:
# if c is in s, then True is printed, else False is printed
s = raw_input("Type a string: ")
c = raw_input("Type a single char to search for: ")
found = False # remembers if we found c within s
for letter in s :
# invariant: found is correctly maintained:
# found == True exactly when c was found in s
if letter == c :
found = True
print found
raw_input("\n\npress Enter to finish")
==================================================
The key to the program is
the boolean variable, found --- it remembers
if we encounter character c within s.
The above solution can be simplified:
The Python language has a built-in operator,
named in, that can find a character within a string.
For example,
s = "uvwxyz"
c = "z"
print (c in s)
prints True.
The following program uses the in-operator:
======================================
# FindChar reports whether a character occurs in a string.
# assumed inputs: s - the string to be searched
# c - the character to be found
# guaranteed output:
# if c is in s, then True is printed, else False is printed
s = raw_input("Type a string: ")
c = raw_input("Type a single char to search for: ")
print (c in s)
raw_input("\n\npress Enter to finish")
==================================================
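Here is a sketch, with a hypothetical function name, that packages the found-flag loop as a function so it can be checked against the in-operator. The two agree when c is a single character. For a longer string c they can differ, because in then checks whether c appears as a substring of s, while the loop compares c against one letter at a time.

```python
def contains(s, c):
    """Report whether c equals one of the letters of s, using a found flag."""
    found = False   # remembers if we found c within s
    for letter in s:
        # invariant: found == True exactly when c matched a letter examined so far
        if letter == c:
            found = True
    return found


# for a single character, the loop and the in-operator agree
print(contains("hurricane", "a"))   # True
print("a" in "hurricane")           # True
# for a two-character c, in checks for a substring, but the loop does not
print(contains("hurricane", "ca"))  # False
print("ca" in "hurricane")          # True
```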
The searching problem usually requires an answer more detailed than
just True or False.
Here is the second version of the problem:
Our program asks the user to type a string, s,
and the character, c, to
find in s. The program will print, one by one,
each of the characters in s, until it encounters c,
at which point it prints an exclamation mark (!) and halts.
(If c is not found in s, then of course all the letters
in s are printed, and no exclamation mark appears.)
Here is an example of the behavior:
$ python SearchChar.py
Type a string: hurricane
Type a single char to search for: a
h u r r i c !
The search examines s's characters from left to right until we find an occurrence of c; the algorithm goes like this:
FIGURE==========================================
# SearchChar locates the leftmost occurrence of a character in a string.
# assumed inputs: s - the string to be searched
#                 c - the character to be found
# guaranteed output: the characters in s are printed one by one until
#   c is encountered --- then ! is printed.
s = raw_input("Type a string: ")
c = raw_input("Type a single char to search for: ")
for letter in s :
    # invariant: have printed all letters examined so far in s
    if letter == c :
        print "!"
        break # exit this loop immediately
    else :
        print letter,
# here, the loop terminated because either
#  (i) c was found and "!" was printed, or
#  (ii) the entire string was printed and c not found
print
raw_input("\npress Enter to finish")
ENDFIGURE==========================================
The example illustrates the standard programming pattern for searching a sequence:
for ITEM in SEQUENCE :
    if ITEM SATISFIES THE SEARCH CRITERION :
        REMEMBER THE ITEM
        break
    else :
        ...

That is, we search the items in a sequence, one by one, until we find the item we want. We break the loop at that point.
Here is a second use of the searching pattern: We again search a string, s, for a character c. This time, if we find c, we remove it from s.
The searching pattern tells us to write this for-loop:
# assume that we have a string, s, and a character, c :
for letter in s : # search all the characters in s for c:
if letter == c :
break # we found c, so we exit the loop immediately
When the computer executes break, it immediately leaves
the loop and proceeds to the first command after the loop.
But the loop isn't complete --- we must remember the index position
where letter resides in s. We use
an extra variable, index, to remember this:
index = 0 # the position of the letter in s that we are examining
for letter in s :
if letter == c :
break # we found c, so we exit the loop immediately!
else : # we must look further, so increment the index:
index = index + 1
We can use index in this way because Python's for-loop examines
the characters of the string in order, from left to right, so index
always holds the position of letter within s.
Here is the completed search program:
FIGURE 5: searching for a character in a string========================
# FindAndRemove
# locates the leftmost occurrence of a character in a string and removes it.
# assumed inputs: s - the string to be searched
# c - the character to be found
# guaranteed output: s without the first occurrence of c (if c is not
# in s, then s is printed unchanged)
s = raw_input("Type a string: ")
c = raw_input("Type a single char to search for: ")
index = 0 # the position of the letter in s we are examining
for letter in s :
# invariant: c is not any of s[0], s[1], ..., s[index-1]
if letter == c :
break # we found c, so we exit the loop immediately!
else : # we must look further, so increment the index:
index = index + 1
# Here, we have exited the loop. This happened for one of two reasons:
# (1) c == letter, meaning that s[index] == c
# (2) all of s was examined, c was not found, so index == len(s)
if index < len(s) : # did we find c at position index ?
s = s[:index] + s[index + 1:] # remove it
print s
raw_input("\n\npress Enter to finish")
ENDFIGURE================================================
Study the comments that follow after the for-loop --- they explain
how we use the value in variable index to deduce whether
or not character c was found in string s.
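The search-and-remove steps can also be written as a function that returns the new string. This is a sketch under a hypothetical name, find_and_remove; the comments restate the program's invariant and the two reasons the loop can stop. As a cross-check, Python's built-in str.replace(c, "", 1) also removes only the leftmost occurrence of c.

```python
def find_and_remove(s, c):
    """Return s without its leftmost occurrence of c; s is returned
    unchanged when c does not occur in it."""
    index = 0   # the position of the letter in s we are examining
    for letter in s:
        # invariant: c is not any of s[0], s[1], ..., s[index-1]
        if letter == c:
            break            # found c at position index
        else:
            index = index + 1
    # either s[index] == c, or index == len(s) because c was not found
    if index < len(s):
        s = s[:index] + s[index + 1:]
    return s


print(find_and_remove("hurricane", "r"))   # huricane
print(find_and_remove("hurricane", "q"))   # hurricane
print("hurricane".replace("r", "", 1))     # huricane
```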
When a sequence of elements must be arranged or reordered, nested loops often give a solution. Here is a classic example: Given string, s, we must construct a new string that has all of s's characters but rearranged in alphabetical order. For example, if s is "butterfly", then its alphabetized form is "beflrttuy".
The algorithm for alphabetization will examine and copy the characters, one by one, from string s into a new, alphabetized string, which we call output:

input:         output:
"butterfly"    ""
"utterfly"     "b"
"tterfly"      "bu"
"terfly"       "btu"
"erfly"        "bttu"
"rfly"         "bettu"
"fly"          "berttu"
"ly"           "befrttu"
"y"            "beflrttu"
""             "beflrttuy"

This strategy is known as insertion sorting: each character is inserted into its correct position within the alphabetized string built so far.
Our algorithm might be refined like this:
read s
output = ""
for letter in s
insert letter into its correct position in output (*)
Now, what is the algorithm for (*), the body of the for-loop ?
It's another for-loop, where we search within output for
the correct position for inserting letter --- this uses the
searching pattern yet again:
# commands for refining (*):
# First, search for the position in output where we should insert letter:
index = 0
for alpha in output :
if letter < alpha :
break
else :
index = index + 1
Next, insert letter at position index in output.
Here is the completed program:
FIGURE 6==============================================
# Sort
# alphabetizes a string.
# assumed input: s - a string
# guaranteed output: output - the characters in s arranged alphabetically
s = raw_input("Type word to sort: ")
output = "" # holds a sequence of alphabetized letters from s
for letter in s :
# invariant: at the i-th iteration,
# output holds the first i-1 letters in s, alphabetized
# Find the position in output where we should insert letter:
index = 0
for alpha in output :
# invariant: the characters,
# output[0], output[1], ..., output[index-1] are all <= letter
if letter < alpha : # insert letter just before alpha ?
break # yes
else : # no --- look at the next alpha in output
index = index + 1
# Insert letter at position index in output:
output = output[:index] + letter + output[index:]
print letter, "inserted at", index, "; output =", output # print trace info
# Here, we have finished the loops, and output holds all the letters in s:
print output
raw_input("\n\npress Enter to finish")
ENDFIGURE=============================================
Once the inner for-loop locates the position, index,
where letter should be inserted,
the command
output = output[:index] + letter + output[index:]
cleverly slices output into two pieces at position
index and inserts letter between the two pieces.
For example, if
output holds the string, "bttu", and
letter is "e", then we see that
index == 1
output[:index] == "b"
output[index:] == "ttu"
and the assignment to output computes this string:
output[:index] + letter + output[index:] == "b" + "e" + "ttu" == "bettu"
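For testing, the Sort program's two loops can be wrapped in a function. The sketch below (hypothetical name insertion_sort) drops the input and the trace output and returns the alphabetized string. Because it relies on <, it orders characters the same way string comparison does, so upper-case letters come before lower-case ones.

```python
def insertion_sort(s):
    """Return the characters of s arranged in alphabetical order."""
    output = ""   # alphabetized letters from s collected so far
    for letter in s:
        # find the position in output where letter should be inserted
        index = 0
        for alpha in output:
            # invariant: output[0], ..., output[index-1] are all <= letter
            if letter < alpha:
                break
            else:
                index = index + 1
        # slice output at index and put letter between the two pieces
        output = output[:index] + letter + output[index:]
    return output


print(insertion_sort("butterfly"))   # beflrttuy
```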
Alphabetization is a case of sorting, where the elements of a collection are arranged according to some standard ordering. (For example, a dictionary is a collection of words, sorted alphabetically, and a library is a collection of books, sorted by the catalog numbers stamped on the books' spines.)
# RemoveDuplicateLetters
# constructs a string that contains the same letters as its argument,
# except that all duplicate letters are removed, e.g., for argument,
# "butterflies", the result is "buterflis"
# assumed input: s - the argument string
# guaranteed output: a string that looks like s but with no duplicates
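The RemoveDuplicateLetters specification can be met with the same kind of loop used throughout this section. Here is one possible solution, sketched under a hypothetical function name; it is not necessarily the intended answer. It keeps a letter only if that letter does not already appear in the output built so far, so the first occurrence of each letter survives.

```python
def remove_duplicate_letters(s):
    """Return s with repeated letters removed, keeping each letter's
    first occurrence in its original position."""
    output = ""
    for letter in s:
        # invariant: output holds, in order, one copy of each distinct
        # letter examined so far
        if letter not in output:
            output = output + letter
    return output


print(remove_duplicate_letters("butterflies"))   # buterflis
```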
A string is a sequence of characters. In this sense, the previous examples showed us not only how to program with strings, but how to program with sequences.
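What can we do with a sequence? We can examine its elements one at a time, with a for-loop:

s = "abcde"
for letter in s :
    print letter

and we can build a new sequence from the pieces of an old one, with slicing and +:

s = "abdef"
c = "c"
t = s[:2] + c + s[2:] # computes "abcdef" and assigns it to t's cell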
There are other sequences than strings. A tuple
is a sequence of any kind of values at all. In Python,
we write a tuple with parentheses and commas. Here is an example
of a tuple that lists the first 6 powers of 2:
(1, 2, 4, 8, 16, 32)
This is a sequence of six numbers, so that if we assign
powers_of_two = (1, 2, 4, 8, 16, 32)
then
print powers_of_two[3] # prints 8
For that matter,
print powers_of_two # prints (1, 2, 4, 8, 16, 32)
and
for num in powers_of_two :
print num
prints
1
2
4
8
16
32
Tuples work just like strings!
Again, strings and tuples are both sequences, and their elements
are internally numbered the same way.
Recall that a string like "abcde"
is a sequence numbered like this:
0 1 2 3 4
+-+-+-+-+-+
|a|b|c|d|e|
+-+-+-+-+-+
Similarly, the tuple (1, 2, 4, 8, 16, 32) is a sequence
numbered like this:
0 1 2 3 4 5
+--+--+--+--+--+--+
( 1, 2, 4, 8,16,32)
+--+--+--+--+--+--+
So, a string is a sequence of characters, but a tuple is a sequence
of whatever you like.
You can make a tuple of strings,
listing the last four U.S. presidents (at the time of writing):
("G.W. Bush", "W. Clinton", "G.H.W. Bush", "R. Reagan")
and here is a tuple listing a person's name, age, and annual salary:
("G.W. Bush", 60, 300000.00)
(It is ok to combine strings, numbers, and whatever into the same tuple.)
A tuple that holds only one element (e.g., the cyclists who
won the Tour de France bike race 7 times) is written with an extra comma,
like this:
("Lance Armstrong",)
Notice the extra comma --- it's ugly but it's required.
A tuple with no elements (this is acceptable!) looks like this:
()
We can also work with a tuple of strings. For example, the
words in a sentence might be saved within a tuple, like this:
sentence = ("this", "is", "a", "sentence")
We might even write a small program that reads the words one at a time
and collects them into the tuple. The program would be based
on the indefinite-iteration pattern used for input processing:
==============================
# CollectWords.py reads a sequence of words and collects them into a tuple.
# Inputs: a sequence of words, one per line, terminated by an empty line.
# Output: the words printed on one line as a complete sentence.
sentence = () # the tuple that holds the words
OK = True
while OK :
# invariant: all words read so far are saved in sentence
print sentence # print trace information
word = raw_input("Type a word (or just Enter, to quit): ")
if len(word) == 0 :
OK = False
else :
sentence = sentence + (word,) # add a single-tuple to end of sentence
# the loop is finished, so print the sentence on one line:
for w in sentence :
print w, # note the comma -- means ``no new line''
print # print a ``new line''
raw_input("\nPress Enter to finish.")
========================================
The novel part is the command that adds a new word to the end of the
tuple:
sentence = sentence + (word,)
We are required to place the word into a ``one-tuple'' of its
own, and then we concatenate the two tuples together. This is like
concatenating a one-character string to the end of another string.
As we saw above,
the indexing operation, _[_], works with tuples,
as do for-loops.
And so does +, which
appends tuples together:
presidents = ("G.W. Bush", "W. Clinton", "G.H.W. Bush")
presidents = presidents + ("R. Reagan",)
(Note: if you try
presidents = presidents + "R. Reagan", it will not work:
you can append only a tuple to another tuple.)
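The sketch below (hypothetical helper name append_word) shows both halves of the note: adding a one-tuple works, while adding a string raises a TypeError. It also shows why the extra comma in a one-tuple is required: without it, the parentheses merely group an expression, so ("R. Reagan") is just a string.

```python
def append_word(words, word):
    """Return a new tuple: the elements of words followed by word."""
    return words + (word,)   # wrap word in a one-tuple before concatenating


presidents = ("G.W. Bush", "W. Clinton", "G.H.W. Bush")
presidents = append_word(presidents, "R. Reagan")
print(len(presidents))           # 4

# Without the comma, the parentheses only group; this value is a string.
not_a_tuple = ("R. Reagan")
print(type(not_a_tuple) is str)  # True

try:
    presidents + "R. Reagan"     # a tuple plus a string is rejected
except TypeError:
    print("cannot append a string to a tuple")
```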
There is one other trick we can do with tuples that we did with strings:
we can check for membership.
For example, if letter is a string, we can ask if it is a vowel:
alphabet = "abcdefghijklmnopqrstuvwxyz"
vowels = ("a", "e", "i", "o", "u")
for letter in alphabet :
if letter in vowels :
print letter + " is a vowel"
The condition, if letter in vowels, checks if the string
named by letter is one of the strings contained in tuple,
vowels.
This of course prints
a is a vowel
e is a vowel
i is a vowel
o is a vowel
u is a vowel
Finally, it is possible to build a tuple whose elements are themselves tuples.
Here is an important example. Each pixel (colored dot) on your computer's display is built from three integers: a red-color content, expressed as an int in the range 0 to 255, a green-color content, expressed similarly, and a blue-color content. Pixels are sometimes called ``RGB-values.''
We define the RGB-values in Python as 3-tuples:
red = (255, 0, 0)
white = (255, 255, 255)
green = (0, 255, 0)
black = (0, 0, 0)
and so on.
A colored picture (such as your display) consists of a sequence of
such pixels, which are systematically repainted on your display
as needed.
We build the negative image of a pixel like this:
pixel = ...
negative_pixel = (255 - pixel[0], 255 - pixel[1], 255 - pixel[2])
This builds a new tuple and assigns it to
negative_pixel. The value of pixel is left unchanged.
We halve a pixel's red content like this:
pixel = (pixel[0] // 2, pixel[1], pixel[2]) # // keeps the result an int
This example also builds a new tuple and overwrites the previous
tuple that was saved in pixel's cell.
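Both pixel operations can be written as functions that return a new tuple and leave their argument alone. This is a sketch with hypothetical function names; it assumes each pixel is a 3-tuple of ints in the range 0 to 255, and it uses // so that the halved red content stays an int in both Python 2 and Python 3.

```python
def negative(pixel):
    """Return the negative of an RGB pixel; the argument is unchanged."""
    return (255 - pixel[0], 255 - pixel[1], 255 - pixel[2])


def halve_red(pixel):
    """Return a new pixel whose red content is half of pixel's, rounded down."""
    return (pixel[0] // 2, pixel[1], pixel[2])


red = (255, 0, 0)
print(negative(red))    # (0, 255, 255)
print(halve_red(red))   # (127, 0, 0)
print(red)              # (255, 0, 0): red itself is unchanged
```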
In the next chapter,
we will learn how to build a digital image as a ``list''
of pixels.
Another example of a compound data value is a date:
date = (1, "January", 2000)
Try printing the date like this:
print date # This prints (1, 'January', 2000)
and like this:
for item in date : # This prints 1 January 2000
print item, # Remember that the comma prevents a newline
print # Finally, print a newline
A third example of a compound value
is a computerized playing card:
card = ("diamonds", 4)
Here is a computerized ``deck'' of such cards:
card_deck = ( ("hearts", 2), ("hearts", 3), ..., ("hearts", 10),
("hearts", "jack"), ("hearts", "queen"), ("hearts", "king"), ("hearts", "ace"),
("diamonds", 2), ... , ("diamonds", "ace"),
("clubs", 2), ... , ("clubs", "ace"),
("spades", 2), ... , ("spades", "ace") )
The card deck is a tuple of cards --- a tuple of tuples.
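Rather than typing all 52 cards, we can build the deck with nested loops, appending one card at a time as a one-tuple. This sketch (hypothetical function name build_deck) follows the (suit, rank) ordering of the card_deck listing and assumes the usual four suits and thirteen ranks.

```python
def build_deck():
    """Return a 52-card deck: a tuple of (suit, rank) tuples."""
    suits = ("hearts", "diamonds", "clubs", "spades")
    ranks = (2, 3, 4, 5, 6, 7, 8, 9, 10, "jack", "queen", "king", "ace")
    deck = ()
    for suit in suits:
        for rank in ranks:
            # wrap the card in a one-tuple, then concatenate
            deck = deck + ((suit, rank),)
    return deck


deck = build_deck()
print(len(deck))   # 52
print(deck[0])     # ('hearts', 2)
print(deck[51])    # ('spades', 'ace')
```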
Strings and tuples are sequences, and computing on sequences generates patterns of knowledge that are themselves sequences of some form. We examine three forms of ``sequence knowledge'' here.
Recall the Python program that reads a string and builds its reversal.
The loop that reversed the string read the sequence of characters
one by one and prefixed them to an output string, like this:
==================================
input = raw_input("Type a string: ")
output = ""
index = 0 # remembers which character we are looking at in input
while index < len(input) :
# invariant: output == input[index-1] ... input[1] input[0]
output = input[index] + output
index = index + 1
# assert: output equals input reversed
print output
======================================
Note the form of the invariant --- it mentions a sequence, because
the loop's goal is to build a sequence of characters (in reverse).
The sequence in the invariant is written using indexing notation;
this is common and standard. Note also that index is
counting not only the number of characters that have been examined
but at the same time
it is also counting the number of loop repetitions --- each repetition
adds one more letter to the sequence in the invariant.
The second example we study shows how the loop invariant itself is
a sequence of small facts, where each fact is generated by one
repetition of the loop. Here is a program that examines a tuple
of nonnegative numbers, like
tup = (16, 3, 49, 22, 2, 34)
and locates the largest number in the tuple:
===========================================
tup = ...
# assert: tup is a nonempty tuple of nonnegative ints
max = -1
for num in tup :
# invariant: for all num examined so far, max >= num
# that is, at iteration i,
# max >= tup[0] and max >= tup[1] and ... and max >= tup[i-1]
# that is, at iteration i, max >= tup[j], for all j in 0..i-1
if num > max :
max = num
# assert: for all num in tup, max >= num
print max
===================================================
The
invariant is itself a sequence of small facts:
at iteration i, max >= tup[0] and max >= tup[1] and ... and max >= tup[i-1]
This sequence of facts
can be written more concisely in ``for all'' style, like this:
at iteration i, for all j in 0..i-1, max >= tup[j]
The for-all style of invariant appears often when a loop
must examine all the elements in a sequence.
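We can make the for-all invariant executable. The sketch below (hypothetical function name largest; the variable is called biggest so that it does not hide Python's built-in max) checks, after every iteration, that biggest >= tup[j] for each of the elements examined so far. Such checks are for studying the loop only: they make the function much slower on long tuples.

```python
def largest(tup):
    """Return the largest number in a nonempty tuple of nonnegative ints,
    checking the loop invariant after every iteration."""
    biggest = -1      # smaller than every nonnegative int
    examined = 0      # how many elements of tup have been examined
    for num in tup:
        if num > biggest:
            biggest = num
        examined = examined + 1
        # invariant: for all j in 0..examined-1, biggest >= tup[j]
        for earlier in tup[:examined]:
            assert biggest >= earlier
    # assert: for all num in tup, biggest >= num
    return biggest


print(largest((16, 3, 49, 22, 2, 34)))   # 49
```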
The third form of sequence invariant is an assertion that we attach directly to a sequence itself (and not to the loop that processes it). This is important when the information named by a variable is organized in a certain order, and the commands that update the variable must preserve the ordering.
Here is an example. Say that sorted is a sequence of
characters --- a string --- that is organized in alphabetical
order, like this:
sorted = "dgikmoqrrxz" # data invariant: sorted is alphabetized
We can attach an assertion --- a data invariant ---
that asserts sorted should always remain
alphabetized.
The data invariant is critical for writing this loop,
which reads a new character and attempts to insert the character
in the correct position within sorted:
==================================
sorted = ... # data invariant: sorted is alphabetized
letter = raw_input("Type a single letter: ")
# Find the position in sorted where we should insert letter:
index = 0
for alpha in sorted:
# loop invariant: for all j in 0..index-1, sorted[j] <= letter
# that is, sorted[0], sorted[1], ..., sorted[index-1] are all <= letter
if letter < alpha : # insert letter just before alpha ?
break # yes
else : # no --- look at the next alpha in sorted
index = index + 1
# Loop has finished.
# assert: for all j in 0..index-1, sorted[j] <= letter
# and either letter < sorted[index] or index == len(sorted)
# Insert letter at position index in sorted:
sorted = sorted[:index] + letter + sorted[index:]
# data invariant: sorted is alphabetized
======================================
Because we have the data invariant, we can calculate the correct loop
invariant, and because we have the loop invariant when the loop
finishes, we can calculate the correct data invariant at the program's
end!
This is a standard occurrence when we compute on sequences,
and we employ these ideas in the next chapter.
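Both invariants can be turned into checks. In this sketch, the function names are hypothetical: is_alphabetized tests the data invariant, and insert_sorted asserts it before and after inserting a letter. The parameter is called sorted_text so that it does not hide Python's built-in sorted function.

```python
def is_alphabetized(s):
    """Data invariant: every character is <= the character after it."""
    index = 0
    while index + 1 < len(s):
        if s[index] > s[index + 1]:
            return False
        index = index + 1
    return True


def insert_sorted(sorted_text, letter):
    """Return sorted_text with letter inserted in its correct position."""
    assert is_alphabetized(sorted_text)   # data invariant holds on entry
    index = 0
    for alpha in sorted_text:
        # loop invariant: sorted_text[0], ..., sorted_text[index-1] are all <= letter
        if letter < alpha:
            break
        else:
            index = index + 1
    result = sorted_text[:index] + letter + sorted_text[index:]
    assert is_alphabetized(result)        # data invariant holds on exit
    return result


print(insert_sorted("dgikmoqrrxz", "n"))  # dgikmnoqrrxz
```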
When we teach a computer to do a big job, we should break the job into smaller jobs and write a program that solves the smaller jobs one after the other: we design, write, and test a program for each smaller job, and at the end, we connect the programs together into one big program.
The two subsections that follow show how two smaller programs can fit together to make one big program.
Tuples work well for collecting a sequence of values when we do not know in advance how many to collect. Here is an example: a program must read a sentence, extract all the words from the sentence, and collect them into a tuple. Since we do not know in advance how many words will be in the sentence, a tuple that we lengthen as needed, by concatenation, is a good structure for collecting the words.
Here is the behavior we want of our program, Words.py:
$ python Words.py
Type a sentence: my dog has fleas.
('my', 'dog', 'has', 'fleas')
The program's job is a
kind of ``slide-the-letter game,'' where each letter in the input
sentence is examined and copied from
the left column into the middle column, and the (word in the) middle
column is moved to the right column when a blank or punctuation mark is
encountered:
input sentence:        next word:   tuple of words:
-------------------    ----------   ----------------
"my dog has fleas."    ""           ()
"y dog has fleas."     "m"          ()
" dog has fleas."      "my"         ()
The blank signals that the word is completed:
"dog has fleas."       ""           ("my",)
"og has fleas."        "d"          ("my",)
"g has fleas."         "do"         ("my",)
" has fleas."          "dog"        ("my",)
The blank signals that the word is completed:
"has fleas."           ""           ("my", "dog")
...
"s."                   "flea"       ("my", "dog", "has")
"."                    "fleas"      ("my", "dog", "has")
The punctuation signals that the word is completed:
""                     ""           ("my", "dog", "has", "fleas")
The algorithm must remember the input sentence, the next word,
and the tuple of words:
=============================
read sentence
next_word = ""
tuple = ()
for letter in sentence :
if letter is a separator (that is, " " or "." or "," or ....) :
add the next_word to the end of tuple (and forget about letter)
else :
add letter to the end of next_word
======================================
If we write the algorithm in Python, we have this:
====================================
sentence = raw_input("Type a sentence: ")
next_word = ""    # holds the characters of each word that we collect
tuple = ()        # holds the words we collect
separators = (".", ",", ";", ":", "-", "?", "!", " ")
for letter in sentence :
    if letter in separators :               # at the end of a word ?
        tuple = tuple + (next_word,)        # add next_word to tuple
        next_word = ""                      # prepare to collect another word
    else :
        next_word = next_word + letter      # add to the word we are building
    print "next_word =", next_word, "; tuple =", tuple    # print trace info
=======================================
We placed all the separator characters in a tuple, so we can
ask if a letter is a separator, like this:
separators = (".", ",", ";", ":", "-", "?", "!", " ")
. . .
if letter in separators :
. . .
We could also place the separators into a string and ask the same question:
separators = ".,;:-?! "
. . .
if letter in separators :
. . .
Take your pick.
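The two choices agree whenever we ask about a single character, which is all that `for letter in sentence` ever gives us. The small check below illustrates this; `is_separator` is a hypothetical helper name used only for the illustration. It also shows the one place the forms differ: for a string, `in` asks whether the left side is a *substring*, so a multi-character string such as `"! "` can be "found" in the string of separators but not in the tuple.

```python
# The two ways of holding the separator characters, as in the text:
separators_tuple = (".", ",", ";", ":", "-", "?", "!", " ")
separators_string = ".,;:-?! "

def is_separator(letter, separators):
    """Return True when the one-character string letter is a separator."""
    return letter in separators

# For single characters, both forms give the same answers:
for letter in "my dog; has fleas!":
    assert is_separator(letter, separators_tuple) == is_separator(letter, separators_string)

# For longer strings they differ: a string's `in` tests for a substring.
assert "! " in separators_string        # "! " occurs inside ".,;:-?! "
assert "! " not in separators_tuple     # but it is not an element of the tuple
```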
When we finish building the next_word, we make a
singleton tuple from
it and append it to variable tuple:
tuple = tuple + (next_word,)
When we test this program, we see that it operates fine except
when we have two separators in a row, like this:
"I eat; I sleep."
Try the example;
the semicolon followed by the space creates an ``empty word.''
Fortunately, the repair is easy and is shown in the program that follows.
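To see the problem concretely, here is the first version of the program repackaged as a function, so it can be tried on several sentences. The name `extract_words_first_try` is hypothetical, and the collection is called `words` rather than `tuple`, since `tuple` is also the name of Python's built-in tuple type. The `print(...)` call is written so it behaves the same in Python 2 and Python 3.

```python
def extract_words_first_try(sentence):
    """The first version of the word-extraction loop, as a function."""
    separators = (".", ",", ";", ":", "-", "?", "!", " ")
    next_word = ""      # characters of the word being collected
    words = ()          # the words collected so far
    for letter in sentence:
        if letter in separators:            # at the end of a word?
            words = words + (next_word,)    # add next_word, even if empty
            next_word = ""                  # prepare to collect another word
        else:
            next_word = next_word + letter  # add to the word we are building
    return words

# The semicolon followed by a blank produces an empty word:
print(extract_words_first_try("I eat; I sleep."))   # ('I', 'eat', '', 'I', 'sleep')
```

This version has a second weakness: `extract_words_first_try("my dog")` returns `('my',)`, because a sentence without closing punctuation leaves its last word sitting in `next_word` when the loop stops.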
FIGURE======================================
# Words
# extracts the words from a line of text
# assumed input: a sentence typed as a single line of text
# guaranteed output: a tuple of the words, listed in the order that
#   they appeared in the sentence
sentence = raw_input("Type a sentence: ")
next_word = ""    # holds the characters of each word that we collect
tuple = ()        # holds the tuple of words we collect
separators = (".", ",", ";", ":", "-", "?", "!", " ")
for letter in sentence :
    # invariant: all chars examined so far are grouped into words in tuple
    #   along with the word we are collecting in next_word
    if letter in separators :               # have we reached the end of a word ?
        if next_word != "" :                # have we collected a nonempty word ?
            tuple = tuple + (next_word,)    # add next_word to the tuple
            next_word = ""                  # prepare to collect another word
    else :
        next_word = next_word + letter      # add letter to the word we are building
    print "next_word =", next_word, "; tuple =", tuple    # print trace info
if next_word != "" :    # oops --- did sentence end without closing punctuation?
    tuple = tuple + (next_word,)
print tuple
raw_input("\n\npress Enter to finish")
ENDFIGURE=====================================
The program also contains a last repair: if the input sentence does not
end with a separator character, then the sentence's last word is still
resting in variable next_word when the loop finishes, and it must be copied
into the tuple. This is the reason for the if-command that follows the loop.
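The finished Words program can also be written as a function, which makes it easy to check both repairs on several sentences. This is a sketch: the name `extract_words` is hypothetical, and the collection is called `words` to avoid reusing the built-in name `tuple`.

```python
def extract_words(sentence):
    """Return a tuple of the words in sentence, in order (repaired version)."""
    separators = (".", ",", ";", ":", "-", "?", "!", " ")
    next_word = ""      # characters of the word being collected
    words = ()          # the words collected so far
    for letter in sentence:
        if letter in separators:                # end of a word?
            if next_word != "":                 # repair 1: skip empty words
                words = words + (next_word,)
                next_word = ""
        else:
            next_word = next_word + letter
    if next_word != "":                         # repair 2: keep an unterminated last word
        words = words + (next_word,)
    return words

print(extract_words("My dog has fleas, but I don't mind."))
# ('My', 'dog', 'has', 'fleas', 'but', 'I', "don't", 'mind')
```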
How can we teach a computer to read an English sentence and speak it as a
``Yoda sentence''? We can do it in two stages: we use
the previous program, Words.py,
as Stage 1, and then we do Stage 2 by
randomly selecting words from the tuple built in Stage 1
and concatenating the words into a Yoda sentence.
Our program will behave like this:
$ python Yoda.py
Type a sentence: My dog has fleas, but I don't mind.
Yoda's sentence:
dog but don't My mind fleas has I!
The algorithm for Stage 2 is simple: one by one, we
randomly choose a word from the tuple of words and append the word
to the yoda_sentence we are building.
We use the random.randrange function to choose each
word, and
we use a while-loop to process all the words:
=============================
tuple = ...           # Remember that tuple holds a tuple of words.
yoda_sentence = ""    # the sentence we will build
# Choose the words one by one from tuple:
while len(tuple) != 0 :
    choice = get a random index number in the range 0..len(tuple)-1
    yoda_sentence = yoda_sentence + " " + tuple[choice]
    tuple = rebuild tuple without the word at tuple[choice]
print yoda_sentence
================================
Notice that when we choose a word to add to the yoda_sentence,
we must ensure that the word is never chosen again --- we ``remove'' it
from the tuple by rebuilding the tuple less the
chosen word:
choice = random.randrange(len(tuple)) # choose a word to extract
yoda_sentence = yoda_sentence + " " + tuple[choice]
tuple = tuple[:choice] + tuple[choice+1:] # rebuild tuple without the choice
The last command uses two slices to ``remove'' the word.
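Here is the slicing trick on its own, as a small sketch with a hypothetical helper name, `remove_at`. It builds a new tuple and leaves the old one alone. It also works when `choice` is the last index, because a slice that starts past the end of a tuple is simply empty rather than an error.

```python
def remove_at(words, choice):
    """Return a new tuple equal to words without the element at index choice."""
    return words[:choice] + words[choice+1:]

words = ("this", "is", "a", "sentence")
print(remove_at(words, 1))   # ('this', 'a', 'sentence')
print(remove_at(words, 0))   # ('is', 'a', 'sentence')
print(remove_at(words, 3))   # ('this', 'is', 'a')
```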
The finished program looks like this; it contains a new Python trick,
input, which is explained below:
FIGURE============================================
# Rearrange
# extracts the words from a tuple one by one and makes a Yoda sentence.
import random    # this makes available the random-number generator
# Use Python's input function to read a _Python tuple_ of words:
tuple = input("Please type a tuple of quoted words: ")
print "\ntuple = ", tuple    # print trace info
yoda_sentence = ""    # the sentence we will build
while len(tuple) != 0 :
    # invariant: the words extracted from tuple are in yoda_sentence
    # print tuple, yoda_sentence    # print trace info
    choice = random.randrange(len(tuple))    # choose a word to extract
    yoda_sentence = yoda_sentence + " " + tuple[choice]
    tuple = tuple[:choice] + tuple[choice+1:]    # rebuild tuple without the choice
print "\nYoda's sentence: "
print yoda_sentence + "!"
raw_input("\n\npress Enter to finish")
ENDFIGURE================================================
The input function lets you type an actual Python expression;
input computes the expression's value, which we assign directly to a
variable! When we test this program, the test looks like this:
$ python Rearrange.py
Please type a tuple of quoted words: ("this", "is", "a", "sentence")
tuple = ('this', 'is', 'a', 'sentence')
Yoda's sentence:
sentence a this is!
For the program's input data,
we type an actual tuple, just as we would write it in Python,
and input computes that tuple, which is assigned to variable tuple.
This trick is great for testing a program, but it is not a good
idea to use input to let non-programmers communicate with
a program: input evaluates whatever is typed, so strange things can
happen when the input is typed incorrectly!
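In Python 2, `input(prompt)` behaves like `eval(raw_input(prompt))`, so whatever is typed is run as Python code. One safer option, swapped in here and not the chapter's own method, is `ast.literal_eval` from the standard library, which accepts only Python literals such as strings, numbers, and tuples. The sketch below assumes we want a tuple of strings and returns `None` for anything else; `parse_word_tuple` is a hypothetical helper name.

```python
import ast

def parse_word_tuple(text):
    """Turn typed text such as '("this", "is")' into a tuple of strings.
    Returns None when the text is not a tuple of quoted words."""
    try:
        value = ast.literal_eval(text)   # evaluates literals only, never calls code
    except (ValueError, SyntaxError, TypeError):
        return None
    if isinstance(value, tuple) and all(isinstance(w, str) for w in value):
        return value
    return None

print(parse_word_tuple('("this", "is", "a", "sentence")'))  # ('this', 'is', 'a', 'sentence')
print(parse_word_tuple("__import__('os')"))                 # None: not a literal
```

In the Rearrange program, one could then write `tuple = parse_word_tuple(raw_input("Please type a tuple of quoted words: "))` and check for `None` before continuing. Note that a one-word tuple still needs its comma: `'("yoda",)'` is accepted, but `'("yoda")'` is just a string and is rejected.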
Now that we have completed Stage 2, we combine it with Stage 1,
and we have constructed the Yoda-sentences program:
FIGURE===========================================
# Yoda
# rearranges a normal sentence into a ``Yoda sentence''
# assumed input: a sentence typed as a single line of text
# guaranteed output: the words of the sentence rearranged as a Yoda sentence
import random    # this makes available the random-number generator
separators = (".", ",", ";", ":", "-", "?", "!", " ")
sentence = raw_input("Type a sentence: ")
# Stage 1: extract the words in the input sentence:
tuple = ()
next_word = ""
for letter in sentence :
    # invariant: the chars examined so far have been grouped into words in
    #   tuple, and the next word we are building lives in next_word
    if letter in separators :               # have we reached the end of a word ?
        if next_word != "" :                # have we collected a nonempty word ?
            tuple = tuple + (next_word,)    # add next_word to tuple
            next_word = ""                  # prepare to collect another word
    else :
        next_word = next_word + letter      # add letter to the word we are building
if next_word != "" :    # oops --- did sentence end without closing punctuation?
    tuple = tuple + (next_word,)
# Stage 2: randomly extract the words one by one and make a Yoda sentence:
yoda_sentence = ""    # the sentence we will build
while len(tuple) != 0 :
    # invariant: the words extracted from tuple so far are in yoda_sentence
    # print "tuple =", tuple    # print trace info
    choice = random.randrange(len(tuple))
    # print choice, tuple[choice]    # print more trace info
    yoda_sentence = yoda_sentence + " " + tuple[choice]
    tuple = tuple[:choice] + tuple[choice+1:]    # rebuild tuple without the choice
print "\nYoda's sentence: "
print yoda_sentence + "!"
raw_input("\n\npress Enter to finish")
ENDFIGURE==========================================
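Because the output is random, Stage 2 is hard to test by eye. A sketch of one approach: package Stage 2 as a function (the name `yoda_sentence_from` is hypothetical) and pass in the random generator, so a test can fix its seed with `random.Random(seed)`. Since each chosen word is removed, the result always holds exactly the original words in a new order, and that is what the test checks. The standard library's `random.shuffle` does the same rearranging on a list; `yoda_by_shuffle` is shown as an alternative.

```python
import random

def yoda_sentence_from(words, rng):
    """Stage 2 of the Yoda program: pick words at random until none remain."""
    result = ""                                  # the sentence we will build
    while len(words) != 0:
        choice = rng.randrange(len(words))       # an index in 0..len(words)-1
        result = result + " " + words[choice]
        words = words[:choice] + words[choice+1:]    # rebuild without the choice
    return result + "!"

def yoda_by_shuffle(words, rng):
    """Alternative: shuffle a list copy (tuples cannot be changed in place)."""
    word_list = list(words)
    rng.shuffle(word_list)
    return " ".join(word_list) + "!"

rng = random.Random(2024)    # a seeded generator, so a run can be repeated
print(yoda_sentence_from(("My", "dog", "has", "fleas"), rng))
print(yoda_by_shuffle(("My", "dog", "has", "fleas"), rng))
```

Like the program, `yoda_sentence_from` starts its result with a blank, because every word, including the first, is added as `" " + word`; `" ".join` avoids that.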
card_deck = ( ("hearts", 2), ("hearts", 3), ..., ("hearts", 10),
              ("hearts", "jack"), ("hearts", "queen"), ("hearts", "king"), ("hearts", "ace"),
              ("diamonds", 2), ..., ("diamonds", "ace"),
              ("clubs", 2), ..., ("clubs", "ace"),
              ("spades", 2), ..., ("spades", "ace") )
(Hint: define these variables:
suits = ("hearts", "diamonds", "clubs", "spades")
high_cards = ("jack", "queen", "king", "ace")
card_deck = ()    # starts as an empty tuple
and write several loops to generate the individual cards and insert them into card_deck.)
humans_hand == ( ("spades", 4), ("spades", "ace"), ("diamonds", "ace") )
then the scorings of the hand are 6, 16, and 26 (that is, 4 + 1 + 1, 4 + 11 + 1, and 4 + 11 + 11, since each ace scores either 1 or 11). Use a tuple to collect all the possible scores!
for VARIABLE in SEQUENCE :
    COMMANDs
where VARIABLE is a variable name and SEQUENCE is an expression that computes to a sequence. (See below.) The semantics of the for-loop goes like this:
We also learned about sequences. At the moment, we know of two forms of them:
A SEQUENCE is either
() or (EXPRESSION,) or (EXPRESSION1, EXPRESSION2, ..., EXPRESSIONn) where n > 1
Here are the operations that can be applied to sequences:
len(SEQUENCE) computes to the integer length of SEQUENCE.
SEQUENCE[EXPRESSION], where EXPRESSION computes to a nonnegative integer, m, extracts the element numbered m in SEQUENCE (the numbering starts at 0).
There are also these general forms of indexing, which can extract multiple elements from a sequence:
SEQUENCE[:EXPRESSION], where EXPRESSION computes to a nonnegative integer, n, builds a new sequence of the elements numbered 0 up to (but not including) n.
SEQUENCE[EXPRESSION:], where EXPRESSION computes to a nonnegative integer, m, builds a new sequence of the elements numbered m through the end of SEQUENCE.
SEQUENCE[EXPRESSION1:EXPRESSION2], where EXPRESSION1 computes to a nonnegative integer, n, and EXPRESSION2 computes to a nonnegative integer, m, builds a new sequence of the elements numbered n up to (but not including) m.
SEQUENCE1 + SEQUENCE2 builds a new sequence whose elements are exactly the ones of SEQUENCE1 followed by the ones of SEQUENCE2.
EXPRESSION in SEQUENCE computes to True when the value named by EXPRESSION is an element within SEQUENCE. (Otherwise, it computes to False.)
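The operations above work the same way on tuples and on strings. Here they are checked with `assert` on one tuple and one string; the example values are chosen for illustration only.

```python
t = ("my", "dog", "has", "fleas")
s = "aBc e"

assert len(t) == 4 and len(s) == 5                     # len(SEQUENCE)
assert t[1] == "dog" and s[1] == "B"                   # SEQUENCE[m], counting from 0
assert t[:2] == ("my", "dog") and s[:2] == "aB"        # elements 0 up to, not including, 2
assert t[2:] == ("has", "fleas") and s[3:] == " e"     # elements from m to the end
assert t[1:3] == ("dog", "has") and s[1:3] == "Bc"     # elements n up to, not including, m
assert t + ("!",) == ("my", "dog", "has", "fleas", "!")    # SEQUENCE1 + SEQUENCE2
assert s + "!" == "aBc e!"
assert "dog" in t and "cat" not in t                   # EXPRESSION in SEQUENCE
assert "B" in s and "b" not in s
```

A handy consequence of these rules: for any n, `t[:n] + t[n:]` rebuilds `t` exactly.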
We also learned that
Finally, here are many more useful operations on strings that were not used in this chapter but will be helpful in the future:
name = "JanE32"
print name.lower() + " " + name
prints jane32 JanE32. If we wish to permanently alter the string, we use an assignment as well, e.g., name = name.lower().
Similarly, S1.upper() makes a new string whose letters are upper case, e.g., for name = "JanE32", name.upper() builds the string, "JANE32".
Similarly, S1.title() builds a new string that capitalizes the first letter of each ``word'' in S1 and makes the word's other letters lower case, e.g., for book = "The 4 corners of Earth", book.title() builds "The 4 Corners Of Earth".
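The case-changing methods, checked on the examples above. None of them changes the original string; an assignment is needed to keep the result. The last two lines show a surprise with title(): it treats any non-letter, including an apostrophe, as the end of a ``word.''

```python
name = "JanE32"
book = "The 4 corners of Earth"

assert name.lower() == "jane32"
assert name.upper() == "JANE32"
assert book.title() == "The 4 Corners Of Earth"
assert name == "JanE32"            # the methods built new strings; name is unchanged

name = name.lower()                # an assignment makes the change permanent
assert name == "jane32"

assert "mcDONALD".title() == "Mcdonald"   # the other letters are lowered
assert "don't".title() == "Don'T"         # the letter after an apostrophe is capitalized
```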
data = raw_input("Please type an integer: ")
if data.isdigit() :
    num = int(data)
    . . .
else :
    print "error --- nonnumeric input"
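The isdigit test is strict: it is True only when the string is nonempty and every character in it is a digit. So the check above accepts nonnegative whole numbers only; a minus sign or a stray blank makes it fail. A small sketch of the same check as a function (`read_int_text` is a hypothetical helper name):

```python
def read_int_text(data):
    """Return int(data) when data consists only of digits, else None."""
    if data.isdigit():
        return int(data)
    return None

assert read_int_text("42") == 42
assert read_int_text("007") == 7        # leading zeroes are fine
assert read_int_text("-4") is None      # the minus sign is not a digit
assert read_int_text(" 42") is None     # neither is a blank
assert read_int_text("4.5") is None     # nor a decimal point
assert read_int_text("") is None        # an empty string is not "all digits"
```

If blanks around the number should be allowed, calling `data.strip()` first removes them.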
We can convert between single letters and integers:
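A minimal sketch, assuming the conversion meant here uses Python's built-in functions ord (from a one-character string to its integer code) and chr (from an integer code back to a character). The codes also explain the string comparisons seen earlier: "A" < "a" is True because ord("A") is 65 and ord("a") is 97, and numerals come before letters because ord("2") is 50. The helper `shift_letter` is hypothetical and just shows arithmetic on codes.

```python
assert ord("a") == 97 and ord("A") == 65 and ord("2") == 50
assert chr(98) == "b"
assert chr(ord("a") + 1) == "b"      # the letter after "a"

def shift_letter(letter, k):
    """Shift a lower-case letter k places forward, wrapping from z back to a."""
    return chr((ord(letter) - ord("a") + k) % 26 + ord("a"))

assert shift_letter("a", 3) == "d"
assert shift_letter("z", 1) == "a"   # wraps around the alphabet
```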
To print an integer with a fixed number of places, use this trick:
num = . . .
import string
print string.zfill(num, PLACES)
where PLACES is a nonnegative int. For example,
cents = 4
print string.zfill(cents, 3)
prints
004 --- leading zeroes are added.
(When the number is larger than the number of places, the number
prints correctly, anyway.)
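The string module's zfill function is not available in Python 3, but the string method zfill does the same job and exists in both Python 2 and Python 3: convert the number with str, then pad. A short sketch:

```python
cents = 4
print(str(cents).zfill(3))           # 004

assert str(cents).zfill(3) == "004"
assert str(1234).zfill(3) == "1234"  # more digits than places: left unchanged
assert str(-4).zfill(3) == "-04"     # a minus sign stays in front of the zeroes

dollars, pennies = 12, 5
print(str(dollars) + "." + str(pennies).zfill(2))    # 12.05
```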
line = "abcdecd"
pattern = "cd"
print line.find(pattern)
prints 2, since "cd" first begins at index 2 in line. We can use find to rewrite the FindChar program in Figure 5 like this:
# FindChar
# locates the leftmost occurrence of a character in a string.
# assumed inputs: s - the string to be searched
#                 c - the character to be found
# guaranteed output: if c is in s, then c's position is printed
#                    if c is not in s, then -1 is printed
s = raw_input("Type a string: ")
c = raw_input("Type a single char to search for: ")
c = c[0]    # in case the user typed some extra blanks, extract the first char
print s.find(c)
All the hard work is done by find!
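find also accepts an optional starting position, line.find(pattern, start), which begins the search at index start. Calling it repeatedly locates every occurrence, not just the leftmost one. A sketch with a hypothetical helper name, `find_all`, that collects the positions in a tuple:

```python
def find_all(line, pattern):
    """Return a tuple of every index where pattern begins in line."""
    positions = ()
    start = line.find(pattern)               # leftmost occurrence, or -1
    while start != -1:
        positions = positions + (start,)
        start = line.find(pattern, start + 1)    # search again, one past the last hit
    return positions

assert "abcdecd".find("cd") == 2
assert "abcdecd".find("x") == -1
assert find_all("abcdecd", "cd") == (2, 5)
assert find_all("aaa", "aa") == (0, 1)       # overlapping occurrences are found too
```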
line = "abcabdabe"
new_line = line.replace("ab", "!")
assigns the string, "!c!d!e" to new_line.
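Like the other string methods, replace builds a new string and leaves line unchanged. It also takes an optional third argument that limits how many occurrences are replaced, counting from the left:

```python
line = "abcabdabe"
assert line.replace("ab", "!") == "!c!d!e"
assert line.replace("ab", "!", 1) == "!cabdabe"   # only the first occurrence
assert line == "abcabdabe"                        # line itself is unchanged
assert line.replace("xy", "!") == line            # no match: the same text comes back
```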