Copyright © 2005 David Schmidt

Chapter 7:
Sequential Files


7.1 How to read and write a sequential file in Python


Computer programs can read input from and write output to disk files. Simply stated, a file is a sequence of symbols stored on a diskette or on a computer's internal disk. There are two formats of files:

A file is indeed one long sequence; for example, these three text lines,

Hello to   you!
How are you?
  49
would be stored as a character file as this long sequence:
Hello to   you!\nHow are you?\n  49\n
Every symbol, including the exact quantity of blanks, are faithfully reproduced in the sequence. Newline characters ('\n') mark the ends of lines. If you create a character file with a Windows-based PC, the Windows-based operating system marks the end of each line with the two-character sequence, \r\n, e.g.,
Hello to   you!\r\nHow are you?\r\n  49\r\n
(The \r is ``carriage return'' and of course, \n is ``newline.'') The reasons for this are purely historical.

If you could see with your own eyes the characters saved on the magnetic disk, you would not see letters like H and e. Instead, you would see that each character is translated to a standard sized, numeric coding. One standard coding is ASCII (pronounced ``as-key''), which uses one byte to represent each character. Another coding is Unicode, which uses two bytes to represent a character. (Unlike ASCII, Unicode can represent accented and umlauted letters.) We will not concern ourselves with which coding your computer uses to encode characters in a character file; these details are hidden from us.

If you like, start this script, which prints all the ASCII characters in the range of 0 to 255:

for i in range(256) :
    print i, "=", chr(i)
    if (i % 10 == 0) :
        raw_input("press Enter to proceed")
Note: chr is a Python function that maps an int in the range 0..255 to a one-character string, and ord maps a one-character string into its underlying int.

A binary file is a sequence of ones and zeros. Binary files hold information that is not keyboard-based. For example, if one wishes to save information about the address numbers inside the computer's primary storage, then a binary file is a good choice. Also, purely numerical data is often stored in a binary file: Inside the computer, an integer like 49 is used in its binary representation, 11001, and not in its character representation (the characters '4' and '9'). It is easier for the computer to do arithmetic on binary representations, and programs that work extensively with files of numbers work faster when the input and output data are saved in binary files---this saves the trouble of converting from character codings to binary codings and back again.

Python even allows a program to copy data structures in computer heap storage to a binary file; this proves especially useful when one wants to archive the contents of an executing program for later use.

When we obtain input information from a file, we say that we read from the file; when we deposit output information into a file, we write to it. When we begin using a file, we say that we open it, and when we are finished using the file, we close it. The notions come from the old-fashioned concept of a filing cabinet whose drawer must be opened to get the papers (``files'') inside and closed when no longer used. Indeed, a computer must do a similar, internal ``opening'' of a disk file to see its contents, and upon conclusion a ``closing'' of the disk file to prevent damage to its contents.

Files can be read/written in two ways:

Sequential files are simple to manage and work well for standard applications. Random access files are used primarily for database applications, where specific bits of information must be found and updated. In this chapter, we deal only with sequential files.

When a Python program uses a sequential file, it must state whether the file is used for

It is possible for a program to open a sequential file, read from it, close it, and then reopen it and write to it. If one desires a file that one can open and both read and write at will, then it is best to use a random-access file.

From this point onwards, we use the term sequential file to mean a sequential file of (codings of) characters. We will not deal with binary files.


7.1 How to read and write a sequential file in Python

This is really easy, and the example program below shows the key ideas. You can also read Dawson, Chapter 7, for additional, nonessential details.
FIGURE================================================

# This program makes a copy of one sequential file into another.

# Ask the user for the names of the files to read from and to write to:

inputfilename = raw_input("Type input file to copy: ")
outputfilename = raw_input("Type output file: ")  # this can be a name of a
                                             # file that does not yet exist

# A file must be opened --- this requires a new variable to hold the
#   address in heap storage of the file's data-structure representation:

input = open(inputfilename, "r")  # "r" means we will read from inputfilename
output = open(outputfilename, "w") # "w" means we will write to outputfilename

## You can read the lines one by one with a while loop:
#
# line = input.readline()   # read the next line of  input --- this includes
#                           #   the ending  \r  and  \n  characters!
#                           # The line is of course a string.
#
# while line != "" :        # When the input is all used up,  line == ""
#    print line
#    output.write(line)     # Write the string to the output file
#    line = input.readline()

# But here is the simplest way to read all the lines one by one:

for line in input :
    print line    
    output.write(line)

# when finished, you must close the files:
input.close()  
output.close()