Copyright © 2006 David Schmidt

Chapter 4:
Strings, for-loops, and tuples


4.1 More about Strings
4.2 Design: definite-iteration for strings
4.3 Definite Iteration with a for-loop
4.4 Searching strings
    4.4.1 Design: The searching pattern
    4.4.2 Alphabetizing (sorting) a string
4.5 Tuples
    4.5.1 Tuples as compound data values
4.6 Foundations: sequence invariants and data invariants
4.7 Design: building one program in stages of smaller programs
    4.7.1 Case study, Part I: Collecting words from a sentence
    4.7.2 Case study, Part II: Yoda sentences
    4.7.3 Case study, Part III: connecting the two programs
4.8 Summary


So much of our knowledge is stated in words! This includes not only history and philosophy books, but technical descriptions used by chemists, mathematicians, and other scientists. We must be able to tell a computer how to compute on words, whether for translating from French to German, simulating a chemical reaction, solving a mathematical differential equation, or playing a word game.

Please read Dawson, Chapter 4, for an introduction to computing on strings.


4.1 More about Strings

In computing, words and sentences are called strings.

We learned earlier that a string is a sequence of characters surrounded by quotes; for example, "aBc e" and 'ab\nCD!\nE9' are strings. We also learned that strings can be concatenated together with +, for example:

"aBc e" + 'ab\nCD!\nE9'   computes to   "aBc eab\nCD!\nE9"
We also learned that the methods upper and lower make new strings entirely in upper case and entirely in lower case, respectively. For example,
"aBc e".upper()  computes to  "ABC E"
and when we define a variable, s,
s = "aBc e"
then
s.lower()   computes to   "abc e"

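Here are some new tricks with strings, all of which we use in this chapter. With s = "aBc e" as before,
len(s)     computes to   5        (the number of characters in s)
s[1]       computes to   "B"      (the character at position 1; positions are numbered from 0)
s[1:]      computes to   "Bc e"   (the slice from position 1 to the end)
s[:2]      computes to   "aB"     (the slice up to, but not including, position 2)
"c" in s   computes to   True     (because "c" occurs within s)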

With these new operations, we can write Python programs that compute on text. There are many more operations on strings that we might use, and they are listed in the summary section at the end of this chapter.
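The sketch below gathers the length, indexing, slicing, membership, and concatenation operations that the rest of this chapter relies on, so that their results can be checked in one place. It uses fixed values instead of raw_input, and the variable names are only for illustration; it runs the same under Python 2 and Python 3.

```python
# A quick tour of the string operations used in this chapter.
s = "aBc e"

length = len(s)          # number of characters: 5
first = s[0]             # character at position 0: "a"
last = s[len(s) - 1]     # character at the highest position: "e"
tail = s[1:]             # from position 1 to the end: "Bc e"
front = s[:2]            # positions 0 and 1: "aB"
middle = s[1:3]          # positions 1 and 2: "Bc"
has_c = "c" in s         # membership test: True
joined = s + "!"         # concatenation builds a new string: "aBc e!"
shouted = s.upper()      # new string in upper case: "ABC E"

print(tail)
print(middle)
```

Note that every operation builds a new value; s itself still holds "aBc e" afterwards.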


4.2 Design: definite-iteration for strings

We can use while loops to compute on the characters within a string. Here are some simple examples that illustrate the techniques.

The first example shows how to examine and print the individual characters in a string:

FIGURE 1=========================
# PrintChars1
#  prints a string's characters from left to right, one at a time:

text = raw_input("Type a string: ")

index = 0  # remembers which character we are looking at in  text
while  index < len(text) :
    # invariant: we have printed  text[0], text[1], ..., up to text[index-1]
    print  text[index]
    index = index + 1

raw_input("\n\npress Enter to finish")

ENDFIGURE=============================
The loop counter, index, counts all the characters in text, from 0 to len(text) - 1. (For an input string like text = "abcdef", we have that len(text) is 6, so variable index counts from 0 to 5.) If you try it, you will see:
$ python PrintChars1.py
Type a string: abc
a
b
c

For practice, we can print the characters from right to left like this:

FIGURE 2======================
# PrintChars2
#   prints a string's characters from right to left:

text = raw_input("Type a string: ")

index = len(text) - 1  # remembers which character we are looking at in  text
while  index >= 0 :
    # invariant: text[len(text) - 1] downto text[index + 1]  have been printed
    print text[index]
    index = index - 1

raw_input("\n\npress Enter to finish")

ENDFIGURE===========================
Here, we start counting with the highest-numbered character, at position len(text) - 1, and count downwards to 0. (For an input string like text = "abcdef", we have that len(text) is 6, so variable index counts from 5 to 0.)

This next program reads a string and prints it in reversed order. We might want its behavior to go like this:

$ python Reverse.py
Type a string: abcde
edcba
We can do the reversal as a ``game,'' where we examine characters, one by one, from the input string and copy them to the output string, like this:
input string:   output string:
 "abcde"         ""
 "bcde"          "a"
 "cde"           "ba"
 "de"            "cba"
 "e"             "dcba"
 ""              "edcba"
A lot of what we do in programming is play little games like this one, moving around bits of data.

We can play this game two ways. The first way merely examines each character in the input string and copies it to the output string, but in reverse order:

====================================

input = raw_input("Type a string: ")
output = ""

index = 0  # remembers which character we are looking at in  input
while  index < len(input) :
    # invariant:  output  ==  input[index-1] ... input[1] input[0]
    output = input[index] + output
    index = index + 1

# at this point,  output  holds all the characters of input, reversed
print output

========================================

The second way to play the reversal game makes the input string appear to ``shrink'' into an empty string by repeatedly assigning shorter and shorter suffixes to it:

==========================

input = raw_input("Type a string: ") # this is the  original input
output = ""

while  input != "" :
    # invariant: the  output  string, reversed,
    #  appended to  input  equals  the original input string
    letter = input[0]  # extract the front letter from what's left to move
    output = letter + output
    input = input[1:]  # reassign  input  less its front letter

# at this point,  input == "",  meaning that  output  holds the reversed text:
print output

ENDFIGURE================================
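To compare the two ways of playing the reversal game, here is a sketch that packages each loop as a function and passes the string as an argument instead of reading it with raw_input. The names reverse_by_index and reverse_by_shrinking are illustrative, not from the text; both functions should give the same answer.

```python
def reverse_by_index(text):
    """First way: examine text[0], text[1], ... and put each in front of output."""
    output = ""
    index = 0
    while index < len(text):
        # invariant: output == text[index-1] ... text[1] text[0]
        output = text[index] + output
        index = index + 1
    return output


def reverse_by_shrinking(text):
    """Second way: repeatedly move the front letter of text to the front of output."""
    output = ""
    while text != "":
        # invariant: output reversed, followed by text, equals the original string
        letter = text[0]
        output = letter + output
        text = text[1:]   # rebinds the local name only; the caller's string is unchanged
    return output


print(reverse_by_index("abcde"))      # edcba
print(reverse_by_shrinking("abcde"))  # edcba
```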

Semantics of strings

The reversal example is important, and for that reason, we will show the step-by-step execution of the second version of its solution. Say that the program has started and the user has supplied the string, "abcde", as the value of variable input. Variable output is set to the empty string, and the computer configuration upon entry to the while-loop is this:

    input == "abcde"     output == ""

Of course, the loop's test, input != "", computes to True, and execution moves into the loop's body.

The first statement in the loop's body extracts the leading (zero-th) character from input and assigns it to variable letter. Since letter does not yet exist in the namespace, it is created:

    input == "abcde"     output == ""     letter == "a"

After completing the assignment to letter, the instruction counter moves to the next instruction, which attaches the value in letter to the front of output. First, the expression, letter + output, is computed: "a" + "" computes to the string, "a". Next, the string, "a", is saved into variable output, destroying the old value:

    input == "abcde"     output == "a"     letter == "a"

The last command in the loop's body, input = input[1:], shortens string input by one letter. This happens in two steps: First, the expression, input[1:], computes to a new string that looks like input, starting from character 1 and extending to the end, namely "bcde". Then, this string is saved in variable input:

    input == "bcde"     output == "a"     letter == "a"

Now the loop repeats, and again the character at position input[0] (it will be "b" this time) is moved into letter and attached to the front of output (which becomes "ba"), and so on. In this way, the loop ``plays'' the letter-shifting ``game'' we saw at the beginning of this section.

The example showed us that variables that hold strings behave like variables that hold numbers, and that string operations like + behave a lot like arithmetic operations. The example did hide a few technical details, however, about how strings are built within the computer, but there is no harm in hiding these details until we reach Chapter 5.


4.3 Definite Iteration with a for-loop

A string is an example of a sequence of elements (here, a sequence of characters). Python has a special command that can neatly march through a sequence, extracting its elements one at a time from left to right. The command is called a for-loop.

Here is a quick example. We can rewrite Figure 1, which printed the characters of a string from left to right, like this:

FIGURE 4=======================================

text = raw_input("Type a string: ")

for letter in text :
  print letter

raw_input("\n\npress Enter to finish")

ENDFIGURE========================================
It's just that simple! The for-loop automatically extracts the characters one by one and executes the loop's body once for each character.

Let's compare the original while-loop and the new for-loop, both of which accomplish exactly the same work:

FIGURE 1:                           FIGURE 4:

index = 0                           
while  index < len(text) :          for letter in text :
    letter = text[index]                print letter
    print letter 
    index = index + 1
The for-loop is simpler to write and is preferred when we must examine all the characters of a string.

The syntax of the for-loop looks like this:

for VARIABLE in SEQUENCE :
    COMMANDs
where VARIABLE is a variable name, and (for the moment) SEQUENCE is an expression that computes to a string. The semantics of the for-loop goes like this:
  1. The SEQUENCE is computed into a string, call it S.
  2. A cell for VARIABLE is constructed.
  3. For each position i, counting from 0 up to len(S) - 1, the character S[i] is assigned to VARIABLE's cell, and the COMMANDs are executed.
The semantics description implies that the for-loop automatically builds (and hides from us!) a loop-counter variable that counts from 0 up to len(S) - 1.
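One way to see the hidden loop counter is to write a for-loop next to the while-loop it behaves like. In the sketch below (visit_with_for and visit_with_while are illustrative names), each loop records the characters it visits in a tuple, a kind of sequence covered in Section 4.5; the two results should match.

```python
def visit_with_for(text):
    visited = ()
    for letter in text:
        visited = visited + (letter,)   # record this character
    return visited


def visit_with_while(text):
    # roughly what the for-loop does for us, with its counter made visible
    S = text                    # step 1: compute the sequence once
    visited = ()
    hidden_index = 0
    while hidden_index < len(S):
        letter = S[hidden_index]          # assign the next character to the loop variable
        visited = visited + (letter,)     # the loop body
        hidden_index = hidden_index + 1
    return visited


print(visit_with_for("abc"))    # ('a', 'b', 'c')
print(visit_with_while("abc"))  # ('a', 'b', 'c')
```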

Let's rewrite the first version of the string-reversal program so that it uses a for-loop:

FIGURE 5=====================================

# Reverse
#   reverses the characters in a line of text
# assumed input: a line of text
# guaranteed output: the text with its characters reversed

input = raw_input("Type a line of text: ")
output = ""

for char in input :
    # invariant: at the i-th iteration,  
    #   output ==  input[i-1] input[i-2] ... input[1] input[0]
    output = char + output

print output
raw_input("\n\npress Enter to finish")

ENDFIGURE=========================================


4.4 Searching strings

In the real world, searching for a missing object is an indefinite activity, because we do not know when we will find the object. Programs can also go ``searching''---for example, a program might search a computerized telephone directory for a person's telephone number. A small but good example of computerized searching is finding a specific character in a string. Searches use an important loop pattern, so we develop the character-search example in detail.

Here is the first version of the problem: Our program asks the user to type a string, s, and then a character, c, to find in s. The program will print, ``True'' or ``False'', whether or not c is present in s. Here is an example of the behavior:

$ python Search.py
Type a string: hurricane
Type a single char to search for: a
True
We can use a for-loop to examine each of the characters in the input string to see if they match the character we search for:
======================================

# FindChar  reports whether a character occurs in a string.
# assumed inputs: s - the string to be searched
#                 c - the character to be found
# guaranteed output:
# if  c  is in  s,  then  True is printed, else False is printed

s = raw_input("Type a string: ")
c = raw_input("Type a single char to search for: ")

found = False  # remembers if we found  c  within  s
for letter in s :
    # invariant:  found == True  exactly when  c  is one of the
    #      letters of  s  examined so far
    if letter == c :
        found = True
print found
raw_input("\n\npress Enter to finish")

==================================================
The key to the program is the boolean variable, found --- it remembers if we encounter character c within s.

The above solution can be simplified: The Python language has a built-in operator, named in, that can find a character within a string. For example,

s = "uvwxyz"
c = "z"
print (c in s)
prints True.

The following program uses the in-operator:

======================================

# FindChar  reports whether a character occurs in a string.
# assumed inputs: s - the string to be searched
#                 c - the character to be found
# guaranteed output:
# if  c  is in  s,  then  True is printed, else False is printed

s = raw_input("Type a string: ")
c = raw_input("Type a single char to search for: ")
print (c in s)
raw_input("\n\npress Enter to finish")

==================================================
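As a quick check that the found-flag loop and the in-operator agree, here is a sketch that wraps the loop in a function (contains_char is an illustrative name) and compares it with in on a few inputs, including the empty string.

```python
def contains_char(s, c):
    found = False
    for letter in s:
        # invariant: found == True exactly when c is one of the letters examined so far
        if letter == c:
            found = True
    return found


for s, c in (("hurricane", "a"), ("hurricane", "z"), ("", "a")):
    # the loop and the built-in operator should give the same answer each time
    print(contains_char(s, c) == (c in s))   # True
```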


4.4.1 Design: The searching pattern

The searching problem usually requires an answer more detailed than just True or False. Here is the second version of the problem: Our program asks the user to type a string, s, and the character, c, to find in s. The program will print, one by one, each of the characters in s, until it encounters c, at which point it prints an exclamation mark (!) and halts. (If c is not found in s, then of course all the letters in s are printed, and no exclamation mark appears.) Here is an example of the behavior:

$ python SearchChar.py
Type a string: hurricane
Type a single char to search for: a
h u r r i c !

The search examines s's characters from left to right until we find an occurrence of c; the algorithm goes like this:

  1. Read string s and character c.
  2. For each character, letter, within s, check if letter equals c:
       if yes, print ! and stop the search;
       if no, print letter and continue with the next character.
The program based on this algorithm uses a for-loop to examine all the characters in string s, but the loop must quit exactly when we find an occurrence of c. We can stop the loop prematurely with a break command. Here is the Python program:
FIGURE==========================================

# SearchChar  locates the leftmost occurrence of a character in a string.
# assumed inputs: s - the string to be searched
#                 c - the character to be found
# guaranteed output:  the characters in  s  are printed one by one until
#   c  is encountered --- then  !  is printed.

s = raw_input("Type a string: ")
c = raw_input("Type a single char to search for: ")

for letter in s :
    # invariant: have printed all letters examined so far in  s
    if letter == c :
        print "!"
        break  # exit this loop immediately
    else :
        print letter,

# here, the loop terminated because either
# (i) c  was found and "!" was printed, or
# (ii) the entire string was printed and  c  not found

print
raw_input("\npress Enter to finish")

ENDFIGURE==========================================
The example illustrates the standard programming pattern for searching a sequence:
for ITEM in SEQUENCE :
    if ITEM SATISFIES THE SEARCH CRITERION :
        REMEMBER THE ITEM
        break
    else :
        ...
That is, we search the items in a sequence, one by one, until we find the item we want. We break the loop at that point.
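The pattern works for any search criterion, not just equality with one character. Here is a sketch that uses it to find the position of the first vowel in a word (first_vowel_position is a hypothetical name, and it treats only a, e, i, o, u as vowels). Instead of saving the item itself, it remembers where the item is: when no vowel exists, the loop runs to completion and index ends up equal to len(word).

```python
def first_vowel_position(word):
    vowels = "aeiou"
    index = 0
    for letter in word:
        # invariant: none of word[0], ..., word[index-1] is a vowel
        if letter in vowels:     # the search criterion
            break                # index now marks the vowel's position
        else:
            index = index + 1
    # here, either word[index] is a vowel, or index == len(word)
    return index


print(first_vowel_position("strength"))  # 3
print(first_vowel_position("rhythm"))    # 6, that is, len("rhythm")
```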

Here is a second use of the searching pattern: We again search a string, s, for a character c. This time, if we find c, we remove it from s.

The searching pattern tells us to write this for-loop:

# assume that we have a string,  s,  and a character,  c :

for letter in s :  # search all the characters in  s  for  c:
    if letter == c :
        break      # we found  c,  so we exit the loop immediately
When the computer executes break, it immediately leaves the loop and proceeds to the first command after the loop.

But the loop isn't complete --- we must remember the index position where letter resides in s. We use an extra variable, index, to remember this:

index = 0  # the position of the  letter  in  s  that we are examining
for letter in s :
    if  letter == c :
        break # we found  c,  so we exit the loop immediately!
    else :  # we must look further, so increment the index:
        index = index + 1
We can use index in this way, because Python's for-loop searches the string from left to right.

Here is the completed search program:

FIGURE 5: searching for a character in a string========================

# FindAndRemove
#   locates the leftmost occurrence of a character in a string and removes it.
# assumed inputs: s - the string to be searched
#                 c - the character to be found
# guaranteed output: s  without the first occurrence of  c  (if  c  is not
#    in  s,  then  s  is printed unchanged)

s = raw_input("Type a string: ")
c = raw_input("Type a single char to search for: ")

index = 0  # the position of the  letter  in  s  we are examining

for letter in s :
    # invariant:  c  is not any of  s[0], s[1], ..., s[index-1]
    if  letter == c :  
        break  # we found  c,  so we exit the loop immediately!
    else :  # we must look further, so increment the index:
        index = index + 1

# Here, we have exited the loop.  This happened for one of two reasons:
#  (1)  c == letter,  meaning that  s[index] == c
#  (2)  all of  s  was examined, c  was not found, so  index == len(s)
if  index < len(s) :  # did we find  c  at position  index ?
    s = s[:index] + s[index + 1:]    # remove it
print s

raw_input("\n\npress Enter to finish")

ENDFIGURE================================================
Study the comments that follow after the for-loop --- they explain how we use the value in variable index to deduce whether or not character c was found in string s.
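The same program can be written as a function, which makes it easy to test on several strings without retyping input. This is a sketch; find_and_remove is an illustrative name, and the body follows FindAndRemove line for line.

```python
def find_and_remove(s, c):
    index = 0
    for letter in s:
        # invariant: c is not any of s[0], s[1], ..., s[index-1]
        if letter == c:
            break                # found c at position index
        else:
            index = index + 1
    # either s[index] == c, or index == len(s) because c was not found
    if index < len(s):
        s = s[:index] + s[index + 1:]   # remove that one occurrence
    return s


print(find_and_remove("hurricane", "r"))  # huricane
print(find_and_remove("hurricane", "z"))  # hurricane (unchanged)
```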

Exercise

Modify FindChar so that it locates the rightmost occurrence of character c in string s. (You must use a while-loop!) Remember to revise the loop's invariant.


4.4.2 Alphabetizing (sorting) a string

When a sequence of elements must be arranged or reordered, nested loops often give a solution. Here is a classic example: Given string, s, we must construct a new string that has all of s's characters but rearranged in alphabetical order. For example, if s is "butterfly", then its alphabetized form is "beflrttuy".

The algorithm for alphabetization will examine and copy the characters, one by one, from string s into a new, alphabetized string, which we call output:

  1. Set output = ""
  2. For each letter in s, copy letter into its proper position in output.
Here is how the algorithm examines each of the characters in the input and copies them to output to alphabetize "butterfly":
   input :        output :
"butterfly"      ""
"utterfly"       "b"
"tterfly"        "bu"
"terfly"         "btu"
"erfly"          "bttu"
"rfly"           "bettu"
"fly"            "berttu"
"ly"             "befrttu"
"y"              "beflrttu"
""               "beflrttuy"
This strategy is known as insertion sorting.
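The game in the table can be reproduced by a short sketch. The function below (insertion_sort_steps is an illustrative name) performs the same insertions as the program developed next, but instead of printing it returns every value that output takes, collected in a tuple (tuples are covered in Section 4.5), so the steps can be compared with the table row by row.

```python
def insertion_sort_steps(s):
    output = ""
    steps = (output,)                 # record the starting value of output
    for letter in s:
        # search for the position in output where letter belongs
        index = 0
        for alpha in output:
            if letter < alpha:
                break
            else:
                index = index + 1
        # insert letter at position index in output
        output = output[:index] + letter + output[index:]
        steps = steps + (output,)     # record output after this insertion
    return steps


for step in insertion_sort_steps("butterfly"):
    print(step)
```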

Our algorithm might be refined like this:

read s
output = ""
for letter in s
    insert  letter  into its correct position in  output (*)
Now, what is the algorithm for (*), the body of the for-loop ? It's another for-loop, where we search within output for the correct position for inserting letter --- this uses the searching pattern yet again:
# commands for refining (*):
# First, search for the position in  output  where we should insert  letter:
index = 0
for alpha in output :
    if  letter < alpha :
        break
    else :
        index = index + 1
# Next, insert  letter  at position  index  in  output.
Here is the completed program:
FIGURE 6==============================================

# Sort
#  alphabetizes a string.
# assumed input: s - a string
# guaranteed output: output - the characters in  s  arranged alphabetically

s = raw_input("Type word to sort: ")
output = ""   # holds a sequence of alphabetized letters from  s

for letter in s :
    # invariant: at the i-th iteration,
    #   output  holds the first i-1 letters in  s,  alphabetized

    # Find the position in  output  where we should insert  letter:
    index = 0
    for alpha in output :
        # invariant:  the characters,
        #   output[0], output[1], ..., output[index-1] are all <= letter
        if  letter < alpha :  # insert  letter  just before  alpha ?
            break  # yes
        else :     # no --- look at the next  alpha  in  output
            index = index + 1
    # Insert  letter  at position  index  in  output:
    output = output[:index] + letter + output[index:]
    print letter, "inserted at", index, "; output =", output  # print trace info

# Here, we have finished the loops, and  output  holds all the letters in  s:
print output

raw_input("\n\npress Enter to finish")

ENDFIGURE=============================================

Once the inner for-loop locates the position, index, where letter should be inserted, the command

output = output[:index] + letter + output[index:]
cleverly slices output into two pieces at position index and inserts letter between the two pieces. For example, if output holds the string, "bttu", and letter is "e", then we see that
index == 1
output[:index] == "b"
output[index:] == "ttu"
and the assignment to output computes this string:
output[:index] + letter + output[index:]  ==  "b" + "e" + "ttu"  ==  "bettu"

Alphabetization is a case of sorting, where a collection of elements is arranged in some standard ordering. (For example, a dictionary is a collection of words, sorted alphabetically, and a library is a collection of books, sorted by the catalog numbers stamped on the books' spines.)

Exercise

Write this program:
# RemoveDuplicateLetters 
#   constructs a string that contains the
#   same letters as its argument except that all duplicate letters
#   are removed, e.g.,  for argument, "butterflies", the result is
#   "buterflis"  
# assumed input: s - the argument string
# guaranteed output: a string that looks like  s  but with no duplicates


4.5 Tuples

A string is a sequence of characters. In this sense, the previous examples showed us not only how to program with strings, but how to program with sequences.

What can we do with a sequence?

  1. We can examine each of the items in a sequence, using indexing or a for-loop. For example,
    s = "abcde"
    for letter in s :
        print letter
    
  2. We can examine a slice of a sequence, print it, combine it with another sequence, and save it for later use. For example,
    s = "abdef"
    c = "c"
    t = s[:2] + c + s[2:]  # computes "abcdef" and assigns it to t's cell
    

There are other sequences than strings. A tuple is a sequence of any kind of values at all. In Python, we write a tuple with parentheses and commas. Here is an example of a tuple that lists the first 6 powers of 2:

(1, 2, 4, 8, 16, 32)
This is a sequence of six numbers, so that if we assign
powers_of_two = (1, 2, 4, 8, 16, 32)
then
print powers_of_two[3]   # prints  8
For that matter,
print powers_of_two   # prints  (1, 2, 4, 8, 16, 32)
and
for num in powers_of_two :
    print num
prints
1
2
4
8
16
32
Tuples work just like strings!
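To see how closely tuples follow strings, here is a sketch that applies the string operations of this chapter (len, indexing, slicing, in, and a for-loop) to the tuple of powers of two. The variable total is introduced only for illustration.

```python
powers_of_two = (1, 2, 4, 8, 16, 32)

print(len(powers_of_two))      # 6
print(powers_of_two[3])        # 8
print(powers_of_two[:3])       # (1, 2, 4); a slice of a tuple is again a tuple
print(16 in powers_of_two)     # True

total = 0
for num in powers_of_two:      # the for-loop visits the elements left to right
    total = total + num
print(total)                   # 63
```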

Again, strings and tuples are both sequences, and their elements are internally numbered the same way. Recall that a string like "abcde" is a sequence numbered like this:

     0 1 2 3 4
    +-+-+-+-+-+
    |a|b|c|d|e|
    +-+-+-+-+-+
Similarly, the tuple (1, 2, 4, 8, 16, 32) is a sequence numbered like this:
      0  1  2  3  4  5
    +--+--+--+--+--+--+
    ( 1, 2, 4, 8,16,32)
    +--+--+--+--+--+--+
So, a string is a sequence of characters, but a tuple is a sequence of whatever you like.

You can make a tuple of strings, listing the four most recent U.S. presidents (as of 2006):

("G.W. Bush", "W. Clinton", "G.H.W. Bush", "R. Reagan")
and here is a tuple listing a person's name, age, and annual salary:
("G.W. Bush", 60, 300000.00)
(It is ok to combine strings, numbers, and whatever into the same tuple.)

A tuple that holds only one element (e.g., the cyclists who won the Tour de France bike race 7 times) is written with an extra comma, like this:

("Lance Armstrong",)
Notice the extra comma --- it's ugly but it's required. A tuple with no elements (this is acceptable!) looks like this:
()

We can also work with a tuple of strings. For example, the words in a sentence might be saved within a tuple, like this:

sentence = ("this", "is", "a", "sentence")
We might even write a small program that reads the words one at a time and collects them into the tuple. The program would be based on the indefinite-iteration pattern used for input processing:
==============================

# CollectWords.py  reads a sequence of words and collects them into a tuple.
# Inputs: a sequence of words, one per line, terminated by an empty line.
# Output: the words printed on one line as a complete sentence.

sentence = ()  # the tuple that holds the words
OK = True
while OK :
    # invariant: all words read so far are saved in  sentence
    print sentence  # print trace information
    word = raw_input("Type a word (or just Enter, to quit): ")
    if  len(word) == 0 :
        OK = False
    else :
        sentence = sentence + (word,)  # add a single-tuple to end of sentence

# the loop is finished, so print the sentence on one line:
for w in sentence :   
    print w,   # note the comma -- means ``no new line''
print          # print a ``new line''

raw_input("\nPress Enter to finish.")

========================================
The novel part is the command that adds a new word to the end of the tuple:
sentence = sentence + (word,) 
We are required to place the word into a ``one-tuple'' of its own, and then we concatenate the two tuples together. This is like concatenating a one-character string to the end of another string.

As we saw above, the indexing operation, _[_], works with tuples, as do for-loops. And so does +, which appends tuples together:

presidents = ("G.W. Bush", "W. Clinton", "G.H.W. Bush")
presidents = presidents + ("R. Reagan",)
(Note: if you try presidents = presidents + "R. Reagan", it will not work, because you can append only a tuple to another tuple.)
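Here is a sketch that attempts both additions and catches the failure; Python reports adding a string to a tuple as a TypeError. The flag appended_string is an illustrative name used only to record what happened.

```python
presidents = ("G.W. Bush", "W. Clinton", "G.H.W. Bush")

appended_string = True
try:
    presidents = presidents + "R. Reagan"    # tuple + string
except TypeError:
    appended_string = False                  # Python refuses, so the assignment never happens

presidents = presidents + ("R. Reagan",)     # tuple + one-tuple works
print(presidents)
```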

There is one other trick we can do with tuples that we did with strings: we can check for membership. For example, if letter is a string, we can ask if it is a vowel:

alphabet = "abcdefghijklmnopqrstuvwxyz"
vowels = ("a", "e", "i", "o", "u")

for letter in alphabet :
    if letter in vowels :
        print letter + " is a vowel"
The condition, if letter in vowels, checks if the string named by letter is one of the strings contained in tuple, vowels. This of course prints
a is a vowel
e is a vowel
i is a vowel
o is a vowel
u is a vowel

Finally, it is possible to build a tuple whose elements are themselves tuples.


4.5.1 Tuples as compound data values

To this point, we have used numbers and strings as the data values upon which we calculate. But a string is itself a sequence of single letters, and there are many situations where we use data values that are, in fact, collections. A tuple is often used to represent such ``collected'' data values.

Here is an important example. Each pixel (colored dot) on your computer's display is built from three integers: a red-color content, expressed as an int in the range 0 to 255, a green-color content, expressed similarly, and a blue-color content. Pixels are sometimes called ``RGB-values.''

We define the RGB-values in Python as 3-tuples:

red = (255, 0, 0)
white = (255, 255, 255)
green = (0, 255, 0)
black = (0, 0, 0)
and so on. A colored picture (such as your display) consists of a sequence of such pixels, which are systematically repainted on your display as needed.

We build the negative image of a pixel like this:

pixel = ...
negative_pixel = (255 - pixel[0],  255 - pixel[1],  255 - pixel[2])
This builds a new tuple and assigns it to negative_pixel. The value of pixel is left unchanged.

We halve a pixel's red content like this:

pixel = (pixel[0] / 2,  pixel[1],  pixel[2])
This example also builds a new tuple and overwrites the previous tuple that was saved in pixel's cell. In the next chapter, we will learn how to build a digital image as a ``list'' of pixels.
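Here are the two pixel operations written as functions, as a sketch (negative and halve_red are hypothetical names). It uses //, integer division, so that the red content stays an int under both Python 2 and Python 3; in Python 2, the text's pixel[0] / 2 gives the same result, because / on two ints discards the remainder there.

```python
def negative(pixel):
    # build a new 3-tuple; the argument tuple is not changed
    return (255 - pixel[0], 255 - pixel[1], 255 - pixel[2])


def halve_red(pixel):
    # // is integer division, so the red content stays a whole number
    return (pixel[0] // 2, pixel[1], pixel[2])


red = (255, 0, 0)
print(negative(red))      # (0, 255, 255)
print(halve_red(red))     # (127, 0, 0)
```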

Another example of a compound data value is a date:

date = (1, "January", 2000)
Try printing the date like this:
print date  # This prints  (1, 'January', 2000)
and like this:
for item in date :    # This prints   1 January 2000
   print item,        #   Remember that the comma prevents a newline
print                 #   Finally, print a newline
A third example of a compound value is a computerized playing card:
card = ("diamonds", 4)
Here is a computerized ``deck'' of such cards:
card_deck = ( ("hearts", 2), ("hearts", 3), ..., ("hearts", 10),
  ("hearts", "jack"), ("hearts", "queen"), ("hearts", "king"), ("hearts", "ace"),
  ("diamonds", 2), ... , ("diamonds", "ace"),
  ("clubs", 2), ... , ("clubs", "ace"),
  ("spades", 2), ... , ("spades", "ace") )
The card deck is a tuple of cards --- a tuple of tuples.
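Typing all 52 cards is tedious, so here is a sketch that builds the same tuple of tuples with two nested for-loops. The suit order and rank order follow the listing above (hearts, diamonds, clubs, spades; 2 through 10, then jack, queen, king, ace); the names suits and ranks are illustrative.

```python
suits = ("hearts", "diamonds", "clubs", "spades")
ranks = (2, 3, 4, 5, 6, 7, 8, 9, 10, "jack", "queen", "king", "ace")

card_deck = ()
for suit in suits:
    for rank in ranks:
        card_deck = card_deck + ((suit, rank),)   # add one card, wrapped as a one-tuple

print(len(card_deck))                    # 52
print(card_deck[0])                      # ('hearts', 2)
print(card_deck[len(card_deck) - 1])     # ('spades', 'ace')
```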

Exercise

Return to the previous chapter and study the checkbook assistant in Figure 3.
  1. Rewrite the program so that it saves the transactions in a ledger that is a tuple of strings. (Hint: use the trick just seen in presidents = presidents + ("R. Reagan",).)
  2. What happens when you view the ledger (type v)? Use a for-loop to print the ledger so that the transactions it holds print on separate lines.


4.6 Foundations: sequence invariants and data invariants

Strings and tuples are sequences, and computing on sequences generates patterns of knowledge that are themselves sequences of some form. We examine three forms of ``sequence knowledge'' here.

Recall the Python program that reads a string and builds its reversal. The loop that reversed the string read the sequence of characters one by one and prefixed them to an output string, like this:

==================================

input = raw_input("Type a string: ")
output = ""

index = 0  # remembers which character we are looking at in  input
while  index < len(input) :
    # invariant:  output  ==  input[index-1] ... input[1] input[0]
    output = input[index] + output
    index = index + 1

# assert:  output  equals  input reversed
print output

====================================== 
Note the form of the invariant --- it mentions a sequence, because the loop's goal is to build a sequence of characters (in reverse). The sequence in the invariant is written using indexing notation; this is common and standard. Note also that index is counting not only the number of characters that have been examined but at the same time it is also counting the number of loop repetitions --- each repetition adds one more letter to the sequence in the invariant.
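An invariant written as a comment can also be checked while the program runs. The sketch below (reverse_checked is an illustrative name) rebuilds the sequence input[index-1] ... input[1] input[0] with a second, independent loop and uses Python's assert statement to compare it with output at the start of each repetition; assert stops the program with an AssertionError if the fact ever fails.

```python
def reverse_checked(text):
    output = ""
    index = 0
    while index < len(text):
        # build  text[index-1] ... text[1] text[0]  independently, to test the invariant
        expected = ""
        j = index - 1
        while j >= 0:
            expected = expected + text[j]
            j = j - 1
        assert output == expected      # the invariant holds before this repetition
        output = text[index] + output
        index = index + 1
    return output


print(reverse_checked("abcde"))   # edcba
```

The checking loop makes the program slower, so a check like this is a debugging aid rather than something to leave in a finished program.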

The second example we study shows how the loop invariant itself is a sequence of small facts, where each fact is generated by one repetition of the loop. Here is a program that examines a tuple of nonnegative numbers, like

tup = (16, 3, 49, 22, 2, 34)
and locates the maximum (largest) number in the tuple:
===========================================

tup = ...
# assert:  tup  is a nonempty tuple of nonnegative ints
max = -1
for num in tup :
    # invariant: for all  num  examined so far,  max >= num
    #   that is, at iteration i, 
    #       max >= tup[0]  and  max >= tup[1] and ... and  max >= tup[i-1]
    #   that is, at iteration i,  max >= tup[j], for all  j in 0..i-1
    if num > max :
        max = num
# assert: for all  num in tup,  max >= num
print max

===================================================
The invariant is itself a sequence of small facts:
at iteration i,  max >= tup[0]  and  max >= tup[1] and ... and  max >= tup[i-1]
This sequence of facts can be written more concisely in ``for all'' style, like this:
at iteration i, for all j in 0..i-1,   max >= tup[j]
The for-all style of invariant appears often when a loop must examine all the elements in a sequence.
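A for-all invariant can be checked at run time by looping over every index it mentions. In the sketch below (largest is an illustrative name), the variable i counts the iterations, and an inner while-loop asserts biggest >= tup[j] for all j in 0..i-1. The sketch uses biggest instead of max so that Python's built-in max function is not hidden, and, like the text, it assumes tup is a nonempty tuple of nonnegative ints.

```python
def largest(tup):
    # assumes tup is a nonempty tuple of nonnegative ints
    biggest = -1
    i = 0                          # number of iterations completed so far
    for num in tup:
        # invariant: for all j in 0..i-1, biggest >= tup[j]
        j = 0
        while j < i:
            assert biggest >= tup[j]
            j = j + 1
        if num > biggest:
            biggest = num
        i = i + 1
    return biggest


print(largest((16, 3, 49, 22, 2, 34)))   # 49
```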

The third form of sequence invariant is an assertion that we attach directly to a sequence itself (and not to the loop that processes it). This is important when the information named by a variable is organized in a certain order, and the commands that update the variable must preserve the ordering.

Here is an example. Say that sorted is a sequence of characters --- a string --- that is organized in alphabetical order, like this:

sorted = "dgikmoqrrxz"   # data invariant: sorted  is alphabetized
We can attach an assertion --- a data invariant --- that asserts sorted should always remain alphabetized.

The data invariant is critical for writing this program, which reads a new character and inserts it in the correct position within sorted:

==================================

sorted =  ...  # data invariant:  sorted  is alphabetized
letter = raw_input("Type a single letter: ")

# Find the position in  sorted  where we should insert  letter:
index = 0
for alpha in sorted:
    # loop invariant:  for all j in 0..index-1,  sorted[j] <= letter
    #   that is,  sorted[0], sorted[1], ..., sorted[index-1] are all <= letter

    if  letter < alpha :  # insert  letter  just before  alpha ?
        break  # yes
    else :     # no: look at the next  alpha  in  sorted
        index = index + 1

# Loop has finished.
# assert: for all j in 0..index-1,  sorted[j] <= letter   
# and     either  letter < sorted[index]  or  index = len(sorted)

# Insert  letter  at position  index  in  sorted:
sorted = sorted[:index] + letter + sorted [index:]  
# data invariant:  sorted is alphabetized

======================================
Because we have the data invariant, we can calculate the correct loop invariant, and because we have the loop invariant when the loop finishes, we can calculate the correct data invariant at the program's end!

This is a standard occurrence when we compute on sequences ---

A sequence has a data invariant, and the loop that computes upon the sequence has an invariant that is a for-all sequence of facts.
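Here is one way to see both invariants at work in a single test. The sketch below packages the insertion loop as a hypothetical function, insert_letter, and uses a hypothetical helper, is_alphabetized, to check the data invariant before and after the insertion. The parameter is named alphabet rather than sorted, so that Python's built-in sorted function is not hidden.

```python
def is_alphabetized(s):
    """Data invariant check: every character is <= the character after it."""
    for j in range(len(s) - 1):
        if s[j] > s[j + 1]:
            return False
    return True

def insert_letter(alphabet, letter):
    """Return a copy of the alphabetized string 'alphabet' with 'letter' inserted in order.

    Hypothetical function form of the insertion loop in the text.
    """
    assert is_alphabetized(alphabet)        # data invariant holds on entry
    index = 0
    for alpha in alphabet:
        # loop invariant: for all j in 0..index-1,  alphabet[j] <= letter
        if letter < alpha:                  # insert letter just before alpha
            break
        index = index + 1
    result = alphabet[:index] + letter + alphabet[index:]
    assert is_alphabetized(result)          # data invariant still holds
    return result
```

For instance, insert_letter("dgikmoqrrxz", "p") returns "dgikmopqrrxz", and inserting into the empty string "" simply gives the one-letter string.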
We employ these ideas in the next chapter.


4.7 Design: building one program in stages of smaller programs

When we do a big job, we often think of the job as several smaller jobs that must be completed one after the other. Thinking this way helps us organize our time, energy, and enthusiasm so that we can work our way in stages from the beginning to the ending of the big job.

When we teach a computer to do a big job, we should think the same way and write a program that solves smaller jobs one after the other --- we design, write, and test a program for each smaller job, and at the end, we connect together the programs into one big program.

The three subsections that follow show how two smaller programs are designed and tested separately and then connected into one big program.


4.7.1 Case study, Part I: Collecting words from a sentence

Tuples work well for collecting a sequence of values when we do not know in advance how many to collect. Here is an example: a program must read a sentence, extract all the words from the sentence, and collect them into a tuple. Since we do not know in advance how many words the sentence holds, a tuple, which we can extend as needed by building a longer tuple with +, is a good structure for collecting the words.

Here is the behavior we want of our program, Words.py:

$ python Words.py
Type a sentence: my dog  has fleas.

('my', 'dog', 'has', 'fleas')

The program's job is a kind of ``slide-the-letter game,'' where each letter in the input sentence is examined and copied from the left column into the middle column, and the (word in the) middle column is moved to the right column when a blank or punctuation mark is encountered:

input sentence:      next word:   tuple of words:
------------------   -----------  ----------------
"my dog  has fleas."    ""           ()
"y dog  has fleas."     "m"          ()
" dog  has fleas."      "my"         ()
                               The blank signals that the word is completed:
"dog  has fleas."       ""           ("my",)
"og  has fleas."        "d"          ("my",)
"g  has fleas."         "do"         ("my",)
"  has fleas."          "dog"        ("my",)
                               The blank signals that the word is completed:
" has fleas."           ""           ("my", "dog")
                                     (the extra blank is ignored:)
"has fleas."            ""           ("my", "dog")
    ...

"s."                    "flea"       ("my", "dog", "has")
"."                     "fleas"      ("my", "dog", "has")
                               The punctuation signals that the word is completed:
""                      ""           ("my", "dog", "has", "fleas")
The algorithm must remember the input sentence, the next word, and the tuple of words:
=============================

read sentence
next_word = ""
tuple = ()

for letter in sentence :
    if  letter is a separator (that is, " " or "." or "," or ....) : 
        add the  next_word  to  the end of  tuple,  reset  next_word  to ""  (and forget about  letter)
    else :
        add  letter  to the end of  next_word

====================================== 
If we write the algorithm in Python, we have this:
====================================

sentence = raw_input("Type a sentence: ")
next_word = ""  # holds the characters of each word that we collect
tuple = ()   # holds the words we collect

separators = (".", ",", ";", ":", "-", "?", "!", " ")

for letter in sentence :
    if  letter in separators :        # at the end of a word ?
        tuple = tuple + (next_word,)  # add  next_word  to  tuple
        next_word = ""                # prepare to collect another word
    else :
        next_word = next_word + letter  # add to the word we are building
    print "next_word =", next_word, "; tuple =", tuple  # print trace info

=======================================
We placed all the separator characters in a tuple, so we can ask if a letter is a separator, like this:
separators = (".", ",", ";", ":", "-", "?", "!", " ")
 . . .
if  letter in separators :
    . . .
We could also place the separators into a string and ask the same question:
separators = ".,;:-?! "
 . . .
if letter in separators :
    . . .
Take your pick.
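The two choices agree here only because letter is always a single character. For longer (or empty) strings they differ: in applied to a string asks whether the left side appears as a substring, while in applied to a tuple asks whether the left side equals one of the elements. The sketch below, with hypothetical variable names, checks that the two tests agree on every character of a sample sentence and then shows where they part ways.

```python
separators_tuple = (".", ",", ";", ":", "-", "?", "!", " ")
separators_string = ".,;:-?! "

sentence = "My dog has fleas, but I don't mind."

# For every single character of the sentence, the two tests give the same answer.
agree = True
for letter in sentence:
    if (letter in separators_tuple) != (letter in separators_string):
        agree = False

# For strings that are not one character long, the tests can disagree,
# because  in  on a string asks "is this a substring?":
pair_in_string = ".," in separators_string    # True: ".," appears inside the string
pair_in_tuple = ".," in separators_tuple      # False: no element equals ".,"
empty_in_string = "" in separators_string     # True: "" is a substring of every string
empty_in_tuple = "" in separators_tuple       # False: no element equals ""
```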

When we finish building the next_word, we make a singleton tuple from it and append that to variable tuple:

tuple = tuple + (next_word,) 
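The comma inside (next_word,) matters. Without it, the parentheses only group an expression, so (next_word) is still a string, and a tuple cannot be concatenated with a string. The sketch below shows the difference; it names the growing tuple words (a hypothetical name) instead of tuple.

```python
next_word = "dog"
words = ("my",)              # the text calls this variable  tuple

grouped = (next_word)        # parentheses only group: this is still the string "dog"
singleton = (next_word,)     # the trailing comma makes a one-element tuple

words = words + singleton    # now ("my", "dog")

# Concatenating a tuple and a string is an error:
try:
    words + (next_word)
    mixing_works = True
except TypeError:
    mixing_works = False     # Python refuses to add a str to a tuple
```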

When we test this program, we see that it operates fine except when we have two separators in a row, like this: "I eat; I sleep." Try the example; the semicolon followed by the space creates an ``empty word.'' Fortunately, the repair is easy and is shown in the program that follows.
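To see both problems without typing at the keyboard, we can repackage the first version of the loop as a hypothetical function, words_first_version, that receives the sentence as an argument and returns the tuple (the trace printing is left out, and words replaces the variable tuple). Running it on the example sentence exposes the empty word, and running it on a sentence with no closing punctuation exposes a second problem, a lost final word.

```python
def words_first_version(sentence):
    """The first (flawed) version of the word collector, as a function.

    Hypothetical packaging of the loop in the text; 'words' replaces
    the text's variable 'tuple'.
    """
    separators = (".", ",", ";", ":", "-", "?", "!", " ")
    next_word = ""
    words = ()
    for letter in sentence:
        if letter in separators:
            words = words + (next_word,)   # adds next_word even when it is ""
            next_word = ""
        else:
            next_word = next_word + letter
    return words

result = words_first_version("I eat; I sleep.")
# result == ('I', 'eat', '', 'I', 'sleep'): the "; " pair made an empty word

unfinished = words_first_version("my dog")
# unfinished == ('my',): the last word never meets a separator, so it is lost
```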

FIGURE======================================

# Words
#   extracts the words from a line of text
# assumed input: a sentence typed as a single line of text
# guaranteed output: a tuple of the words, listed in the order that
#   they appeared in the sentence

sentence = raw_input("Type a sentence: ")
next_word = ""  # holds the characters of each word that we collect
tuple = ()   # holds the tuple of words we collect

separators = (".", ",", ";", ":", "-", "?", "!", " ")

for letter in sentence :
    # invariant: all chars examined so far are grouped into words in  tuple
    #   along with the word we are collecting in  next_word
    if letter in separators :   # have we reached the end of a word ?
        if  next_word != "" :   # have we collected a nonempty word ?
            tuple = tuple + (next_word,)  # add  next_word  to the tuple
            next_word = ""      # prepare to collect another word
    else :
        next_word = next_word + letter  # add letter to the word we are building
    print "next_word =", next_word, "; tuple =", tuple  # print trace info 

if next_word != "" :  # oops --- did sentence end without closing punctuation?
    tuple = tuple + (next_word,)  

print tuple

raw_input("\n\npress Enter to finish")

ENDFIGURE=====================================
The program also contains a last repair: if the input sentence does not end with a separator character, then the word resting in variable next_word must still be copied into the tuple. This is the reason for the if-command that follows the loop.
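For testing, the repaired loop can also be written as a function that receives the sentence and returns the tuple of words. The sketch below is a hypothetical function, extract_words; it drops the trace printing and names the tuple words instead of tuple, but otherwise follows the Words program, including both repairs.

```python
def extract_words(sentence):
    """Return a tuple of the words in sentence, in the order they appear.

    Hypothetical function form of the repaired Words program.
    """
    separators = (".", ",", ";", ":", "-", "?", "!", " ")
    next_word = ""
    words = ()
    for letter in sentence:
        # invariant: the chars examined so far are grouped into words in
        #   'words', along with the word being collected in next_word
        if letter in separators:
            if next_word != "":            # skip the empty word made by adjacent separators
                words = words + (next_word,)
                next_word = ""
        else:
            next_word = next_word + letter
    if next_word != "":                    # the sentence ended without a separator
        words = words + (next_word,)
    return words
```

With this version, extract_words("my dog  has fleas.") gives ('my', 'dog', 'has', 'fleas'), and extract_words("I eat; I sleep") gives ('I', 'eat', 'I', 'sleep') even though the sentence has no closing period.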


4.7.2 Case study, Part II: Yoda sentences

In the Star Wars movies, the Jedi Master, Yoda, speaks sentences whose adjectives, adverbs, and verbs land in surprising places. For example, when you say, ``Make a right turn,'' Yoda would say, ``a turn right make.''

How can we teach a computer to read an English sentence and speak it as a ``Yoda sentence''? We can do it in two stages: we use the previous program, Words.py, as Stage 1, and then we do Stage 2 by randomly selecting words from the tuple built in Stage 1 and concatenating the words into a Yoda sentence. Our program will behave like this:

$ python Yoda.py
Type a sentence: My dog has fleas, but I don't mind.

Yoda's sentence:
dog but don't My mind fleas has I!

The algorithm for Stage 2 is simple: one by one, we randomly choose a word from the tuple of words and append the word to the yoda_sentence we are building. We use the random.randrange function to choose each word, and we use a while-loop to process all the words:

=============================

tuple =  ...  # Remember that  tuple  holds a tuple of words.
yoda_sentence = ""  # the sentence we will build

# Choose the words one by one from  tuple:
while len(tuple) != 0 :
    choice = get a random index number in the range 0..len(tuple)-1
    yoda_sentence = yoda_sentence + " " + tuple[choice]
    tuple = rebuild  tuple  without the word at  tuple[choice]

print yoda_sentence

================================
Notice that when we choose a word to add to the yoda_sentence, we must ensure that the word is never chosen again --- we ``remove'' it from the tuple by rebuilding the tuple less the chosen word:
choice = random.randrange(len(tuple))   # choose a word to extract
yoda_sentence = yoda_sentence + " " + tuple[choice]
tuple = tuple[:choice] + tuple[choice+1:]  # rebuild tuple without the choice
The last command uses two slices to ``remove'' the word.
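Because a tuple cannot be changed in place, ``removing'' an element really means building a new tuple from the slice before the element and the slice after it. The sketch below names this step as a hypothetical helper, remove_at; the original tuple is left unchanged.

```python
def remove_at(words, choice):
    """Return a new tuple equal to 'words' without the element at index 'choice'.

    Hypothetical helper: glues the slice before choice to the slice after it.
    """
    return words[:choice] + words[choice + 1:]

words = ("this", "is", "a", "sentence")
without_first = remove_at(words, 0)    # ("is", "a", "sentence")
without_a = remove_at(words, 2)        # ("this", "is", "sentence")
without_last = remove_at(words, 3)     # ("this", "is", "a"), since words[4:] is ()
# words itself is still ("this", "is", "a", "sentence")
```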

The finished Stage 2 program, Rearrange.py, looks like this; it contains a new Python trick, input, which is explained below:

FIGURE============================================

# Rearrange
#   extracts the words from a tuple one by one and makes a Yoda sentence.

import random   # this makes available the random-number generator

# Use the Python function,  input,  to read a _Python tuple_ of words: 
tuple = input("Please type a tuple of quoted words: ")
print "\ntuple = ", tuple  # print trace info

yoda_sentence = ""  # the sentence we will build
while  len(tuple) != 0 :
    # invariant:  the words extracted from tuple are in  yoda_sentence
    # print tuple, yoda_sentence  # print trace info
    choice = random.randrange(len(tuple))   # choose a word to extract
    yoda_sentence = yoda_sentence + " " + tuple[choice]
    tuple = tuple[:choice] + tuple[choice+1:]  # rebuild tuple without the choice   

print "\nYoda's sentence: "
print yoda_sentence + "!"

raw_input("\n\npress Enter to finish")

ENDFIGURE================================================
The input function reads what you type, evaluates it as an actual Python expression, and returns its value, which is assigned directly to a variable! When we test this program, the test would look like this:
$ python  Rearrange.py
Please type a tuple of quoted words: ("this", "is", "a", "sentence")

tuple = ('this', 'is', 'a', 'sentence')

Yoda's sentence: 
sentence a this is!
For the program's input data, we type an actual tuple, just as we would write it in Python, and input evaluates it so that the resulting tuple is assigned to variable tuple. This trick is great for testing a program, but it is not a good idea to use input to let non-programmers communicate with a program: input evaluates whatever is typed as Python code, so a typing mistake stops the program with an error, and some inputs can make the program do things its programmer never intended!
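Another way to test Stage 2, without typing anything at all, is to package its loop as a function that receives the tuple as an argument. The sketch below assumes a hypothetical function, make_yoda_sentence, whose second argument is a random.Random object, so a test can fix the seed and get the same ``random'' order on every run. Since the exact order depends on the generator, the test checks a property that every run must satisfy: the Yoda sentence uses exactly the words of the tuple, each once.

```python
import random

def make_yoda_sentence(words, rng):
    """Return the words of the tuple 'words' in random order, each after a blank, plus "!".

    Hypothetical function form of Stage 2; rng is a random.Random object.
    """
    yoda_sentence = ""
    while len(words) != 0:
        # invariant: the words removed from 'words' so far are in yoda_sentence
        choice = rng.randrange(len(words))          # an index in 0..len(words)-1
        yoda_sentence = yoda_sentence + " " + words[choice]
        words = words[:choice] + words[choice + 1:] # rebuild without the choice
    return yoda_sentence + "!"

rng = random.Random(2006)    # fixed seed, so the test repeats exactly
sentence = make_yoda_sentence(("this", "is", "a", "sentence"), rng)
```

Whatever order the generator picks, sentence[:-1].split() holds the same four words as the input tuple.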


4.7.3 Case study, Part III: connecting the two programs

Now that we have completed Stage 2, we combine it with Stage 1, and we have constructed the Yoda-sentences program:

FIGURE===========================================

# Yoda
#   rearranges a normal sentence into a ``Yoda sentence''
# assumed input: a sentence typed as a single line of text
# guaranteed output: the words of the sentence rearranged as a Yoda sentence

separators = (".", ",", ";", ":", "-", "?", "!", " ")

sentence = raw_input("Type a sentence: ")

# Stage 1: extract the words in the input sentence:
tuple = ()  
next_word = ""
for letter in sentence :
    # invariant: the chars examined so far have been grouped into words in
    #   tuple,  and the next word we are building lives in  next_word
    if  letter in separators :    # have we reached the end of a word ?
        if  next_word != "" :   # have we collected a nonempty word ?
            tuple = tuple + (next_word,)   # add  next_word  to tuple
            next_word = ""      # prepare to collect another word
    else :
        next_word = next_word + letter  # add letter to the word we are building

if next_word != "" :  # oops --- did sentence end without closing punctuation?
    tuple = tuple + (next_word,)


# Stage 2: randomly extract the words one by one and make a Yoda sentence:
import random   # this makes available the random-number generator

yoda_sentence = ""  # the sentence we will build

while  len(tuple) != 0 :
    # invariant:  the words extracted from tuple so far are in  yoda_sentence
    # print "tuple =", tuple  # print trace info
    choice = random.randrange(len(tuple))
    # print choice, tuple[choice]  # print more trace info
    yoda_sentence = yoda_sentence + " " + tuple[choice]
    tuple = tuple[:choice] + tuple[choice+1:]  # rebuild tuple without the choice   

print "\nYoda's sentence: "
print yoda_sentence + "!"

raw_input("\n\npress Enter to finish")

ENDFIGURE==========================================

Exercise

Model a playing card like this: ("hearts", "ace") is a tuple that stands for the ace of hearts.
  1. Write a program that builds a deck of 52 cards as this tuple of 52 tuples:
    card_deck = ( ("hearts", 2), ("hearts", 3), ..., ("hearts", 10),
      ("hearts", "jack"), ("hearts", "queen"), ("hearts", "king"), ("hearts", "ace"),
      ("diamonds", 2), ... , ("diamonds", "ace"),
      ("clubs", 2), ... , ("clubs", "ace"),
      ("spades", 2), ... , ("spades", "ace") )
    
    (Hint: define these variables:
    suits = ("hearts", "diamonds", "clubs", "spades")
    high_cards = ("jack", "queen", "king", "ace")
    card_deck = ()  # starts as an empty tuple
    
    and write several loops to generate the individual cards and insert them into card_deck.)
  2. Add to the program a loop that lets the human ask for cards one by one. The cards are randomly selected and removed from card_deck and copied into the human's ``hand.'' (Hint: use another variable: humans_hand = ().)
  3. This is harder: Pretend the human is playing ``21'' (blackjack, where numbered cards count as their number, face cards count for ten points, and an ace can count for either 1 or 11 points). Make the program print the score of the hand. Since an ace can count as 1 or 11, if there is an ace in the hand, then the program must print all the possible scores of the hand. For example, if
    humans_hand == ( ("spades", 4), ("spades", "ace"), ("diamonds", "ace") )
    
    then the scorings of the hand are 6, 16, and 26. Use a tuple to collect all the possible scores!


4.8 Summary

There is a new COMMAND, the for-loop; its syntax is
for VARIABLE in SEQUENCE :
    COMMANDs
where VARIABLE is a variable name and SEQUENCE is an expression that computes to a sequence. (See below.) The semantics of the for-loop goes like this:
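  1. The SEQUENCE is computed into a sequence, call it S.
  2. For each element of S, in order (the element at index 0, then index 1, and so on), VARIABLE is assigned the element and then the COMMANDs are executed.
  3. When the COMMANDs have been executed for the last element of S, the loop finishes. (If S is empty, the COMMANDs are never executed; a break command inside the COMMANDs finishes the loop early.)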
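One way to check the for-loop's semantics is to compare it with a while-loop that walks through the sequence by index. The sketch below uses two hypothetical functions that visit the elements of a sequence and collect them into a tuple; for strings and tuples alike, both produce the same tuple, in the same order.

```python
def visit_with_for(S):
    """Collect the elements of sequence S, using a for-loop."""
    visited = ()
    for element in S:                    # element = S[0], then S[1], ...
        visited = visited + (element,)
    return visited

def visit_with_while(S):
    """Collect the elements of sequence S, using a while-loop and an index."""
    visited = ()
    index = 0
    while index != len(S):               # stops after S[len(S)-1]
        element = S[index]
        visited = visited + (element,)
        index = index + 1
    return visited
```

For example, both functions turn the string "abc" into ("a", "b", "c"), and both turn the empty string into the empty tuple, () , without executing their loop bodies.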

We also learned about sequences. At the moment, we know of two forms of them:

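A SEQUENCE is either
  1. a string, a sequence of characters enclosed in quotes, such as "aBc e", or
  2. a tuple, a sequence of values enclosed in parentheses and separated by commas, such as ("my", "dog") or the singleton ("my",).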

The elements of a sequence are numbered (indexed) by 0, 1, 2, ....

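Here are the operations that can be applied to sequences:
  1. indexing, S[i], which computes the element at index i;
  2. slicing, S[i:j], which computes a new sequence of the elements at indexes i..j-1 (S[:j] starts at index 0, and S[i:] continues to the end);
  3. len(S), which computes the number of elements in S;
  4. X in S, which asks whether X is an element of S (for a string, whether X appears inside S);
  5. S1 + S2, which concatenates two sequences of the same kind into a new sequence;
  6. for VARIABLE in S : ..., which examines the elements of S one by one.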
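The sketch below applies the sequence operations used in this chapter to a string and to a tuple side by side; the variable names are hypothetical, and each comment gives the value the expression computes.

```python
word = "fleas"                       # a string: a sequence of characters
words = ("my", "dog", "has")         # a tuple: a sequence of values

first_letter = word[0]               # indexing: "f"
first_word = words[0]                # indexing: "my"
middle = word[1:4]                   # slicing: "lea"
last_two = words[1:]                 # slicing: ("dog", "has")
word_length = len(word)              # 5
word_count = len(words)              # 3
has_e = "e" in word                  # membership: True
has_cat = "cat" in words             # membership: False
longer = word + "!"                  # concatenation of strings: "fleas!"
more_words = words + (word,)         # concatenation of tuples: ("my", "dog", "has", "fleas")
```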

We also learned that

Finally, here are many more useful operations on strings that were not used in this chapter but will be helpful in the future: