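Computer programs can read input from and write output to disk files. Simply stated, a file is a sequence of symbols stored on a diskette or on a computer's internal disk. There are two formats of files: character files, which hold sequences of characters, and binary files, which hold sequences of ones and zeros.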
A file is indeed one long sequence; for example, these three text lines,
Hello to you!
How are you?
49
would be stored as a character file as this long sequence:
Hello to you!\nHow are you?\n 49\n
Every symbol, including the exact quantity of blanks, is
faithfully reproduced in the sequence.
Newline characters ('\n')
mark the ends of lines. If you create a character file
with a Windows-based PC, the Windows-based operating
system marks the end of each
line with the two-character sequence, \r\n, e.g.,
Hello to you!\r\nHow are you?\r\n 49\r\n
(The \r is ``carriage return'' and of course, \n is
``newline.'')
The reasons for this are purely historical.
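The two line-ending conventions can be checked by looking at the raw bytes of a file. The following is a minimal sketch, not part of the original notes: the file name demo.txt and the use of a temporary directory are arbitrary choices for illustration, and the code is written so that it should run under both Python 2 and Python 3.

```python
import os
import tempfile

# The three example lines as one long character sequence (newline endings).
text = "Hello to you!\nHow are you?\n 49\n"

# Write the bytes exactly as given; "wb" prevents any newline translation.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")  # hypothetical file name
f = open(path, "wb")
f.write(text.encode("ascii"))
f.close()

# Read the raw bytes back and display them, newlines included.
f = open(path, "rb")
raw = f.read()
f.close()
print(repr(raw))

# The same text with Windows-style endings has one extra byte (\r) per line.
windows_raw = raw.replace(b"\n", b"\r\n")
print(repr(windows_raw))
print("%d bytes versus %d bytes" % (len(raw), len(windows_raw)))
```

Opening the file in binary mode ("wb" and "rb") is a deliberate choice here: in text mode, Python on Windows translates between \n and \r\n automatically, which would hide the very difference being examined.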
If you could see with your own eyes the characters saved on the magnetic disk, you would not see letters like H and e. Instead, you would see that each character is translated to a standard numeric coding. One standard coding is ASCII (pronounced ``as-key''), which defines 128 characters and is normally stored using one byte per character. Another coding is Unicode, whose common encodings use from one to four bytes per character (UTF-16, for example, uses two bytes for most characters). (Unlike ASCII, Unicode can represent accented and umlauted letters.) We will not concern ourselves with which coding your computer uses to encode characters in a character file; these details are hidden from us.
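To see how the storage size of a character depends on the coding, one can ask Python to encode a few characters and count the resulting bytes. This is only an illustrative sketch: the helper name encoded_length is invented here, and the three sample characters are arbitrary choices.

```python
def encoded_length(text, encoding):
    """Return how many bytes `text` occupies when stored in `encoding`."""
    return len(text.encode(encoding))

# Sample characters: the letter A, e with an acute accent, and the euro sign.
samples = [u"A", u"\u00e9", u"\u20ac"]

for ch in samples:
    print("U+%04X: %d byte(s) in UTF-8, %d byte(s) in UTF-16" % (
        ord(ch),
        encoded_length(ch, "utf-8"),
        encoded_length(ch, "utf-16-le"),
    ))
```

Only the first sample can be stored as ASCII at all; attempting to encode the other two with "ascii" raises a UnicodeEncodeError.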
If you like, run this script, which prints the characters coded by
the numbers 0 to 255 (ASCII proper covers only 0 to 127):
for i in range(256) :
    print i, "=", chr(i)
    if (i % 10 == 0) :
        raw_input("press Enter to proceed")
Note: chr is a Python function that maps an int in the range
0..255 to a one-character string, and ord maps a one-character
string into its underlying int.
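The relationship between chr and ord can be tried on a short word. The sketch below is an added illustration, with the word "Hello" chosen arbitrarily; it should behave the same way under Python 2 and Python 3.

```python
word = "Hello"

# Translate each character to its numeric coding.
codes = [ord(c) for c in word]
print(codes)

# Translate the codings back to characters and rejoin them.
rebuilt = "".join(chr(n) for n in codes)
print(rebuilt)

# The line-ending characters have codings of their own.
print("carriage return = %d, newline = %d" % (ord("\r"), ord("\n")))
```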
A binary file is a sequence of ones and zeros. Binary files hold information that is not keyboard-based. For example, if one wishes to save information about the address numbers inside the computer's primary storage, then a binary file is a good choice. Also, purely numerical data is often stored in a binary file: Inside the computer, an integer like 49 is used in its binary representation, 110001, and not in its character representation (the characters '4' and '9'). It is easier for the computer to do arithmetic on binary representations, and programs that work extensively with files of numbers work faster when the input and output data are saved in binary files, because this saves the trouble of converting from character codings to binary codings and back again.
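The difference between the character representation and the binary representation of a number can be inspected with the standard struct module. This is a sketch under an assumption: the 4-byte, little-endian layout selected by the format "<i" is just one possible binary layout, picked here for illustration.

```python
import struct

n = 49
print(bin(n))  # the binary digits of 49

# Character representation: the codings of the characters '4' and '9'.
as_text = str(n).encode("ascii")

# Binary representation: a fixed-size 4-byte little-endian integer
# (an assumed layout, chosen only for this example).
as_binary = struct.pack("<i", n)
print("%d byte(s) as text, %d byte(s) as binary" % (len(as_text), len(as_binary)))

# No character-to-number conversion is needed to get the integer back.
restored = struct.unpack("<i", as_binary)[0]

# A larger number needs more characters but the same number of binary bytes.
big = 1234567
big_text = str(big).encode("ascii")
big_binary = struct.pack("<i", big)
print("%d byte(s) as text, %d byte(s) as binary" % (len(big_text), len(big_binary)))
```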
Python even allows a program to copy data structures in computer heap storage to a binary file; this proves especially useful when one wants to archive the contents of an executing program for later use.
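The notes do not name the mechanism, but one standard-library way to do this is the pickle module, which is assumed in the sketch below. The dictionary contents and the file name archive.bin are made up for the example.

```python
import os
import pickle
import tempfile

# A small data structure living in heap storage (made-up example data).
record = {"name": "example", "scores": [49, 50, 51]}

path = os.path.join(tempfile.mkdtemp(), "archive.bin")  # hypothetical file name

# Copy the structure into a binary file; note the "wb" mode.
f = open(path, "wb")
pickle.dump(record, f)
f.close()

# Later, perhaps in another run of the program, rebuild the structure.
f = open(path, "rb")
restored = pickle.load(f)
f.close()

print(restored)
```

The restored value is an equal but separate copy of the original structure. Files produced this way should only be loaded when they come from a trusted source, because loading a pickle file can execute code.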
When we obtain input information from a file, we say that we read from the file; when we deposit output information into a file, we write to it. When we begin using a file, we say that we open it, and when we are finished using the file, we close it. The notions come from the old-fashioned concept of a filing cabinet whose drawer must be opened to get the papers (``files'') inside and closed when no longer used. Indeed, a computer must do a similar, internal ``opening'' of a disk file to see its contents, and upon conclusion a ``closing'' of the disk file to prevent damage to its contents.
Files can be read/written in two ways: sequentially, from front to back, or by random access, at any position in the file.
When a Python program uses a sequential file, it must state whether the file is used for input (the program reads from it) or output (the program writes to it).
From this point onwards, we use the term sequential file to mean a sequential file of (codings of) characters. We will not deal with binary files.
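To make the idea of sequential use concrete, the sketch below reads a small file from front to back and then, for contrast, jumps directly to a position inside it with seek. It is an added illustration, not part of the original notes: the file name lines.txt and its three lines are invented, and the file is written in binary mode only so that the byte positions are the same on every operating system.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "lines.txt")  # hypothetical file name

# Create a three-line file; "wb" keeps the bytes exactly as written.
out = open(path, "wb")
out.write(b"first\nsecond\nthird\n")
out.close()

# Sequential use: read the lines one by one, front to back.
inp = open(path, "r")
lines = []
line = inp.readline()
while line != "":          # readline returns "" when the input is used up
    lines.append(line)
    line = inp.readline()
inp.close()
print(lines)

# For contrast: skip the 6 bytes of "first\n" and read from there.
inp = open(path, "rb")
inp.seek(6)
jumped = inp.readline()
inp.close()
print(repr(jumped))
```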
FIGURE================================================
# This program makes a copy of one sequential file into another.

# Ask the user for the names of the files to read from and to write to:
inputfilename = raw_input("Type input file to copy: ")
outputfilename = raw_input("Type output file: ")  # this can be a name of a
                                                  # file that does not yet exist

# A file must be opened; this requires a new variable to hold the
# address in heap storage of the file's data-structure representation:
input = open(inputfilename, "r")    # "r" means we will read from inputfilename
output = open(outputfilename, "w")  # "w" means we will write to outputfilename

## You can read the lines one by one with a while loop:
#
# line = input.readline()   # read the next line of input; this includes
#                           # the ending \r and \n characters!
#                           # The line is of course a string.
#
# while line != "" :        # When the input is all used up, line == ""
#     print line
#     output.write(line)    # Write the string to the output file
#     line = input.readline()

# But here is the simplest way to read all the lines one by one:
for line in input :
    print line
    output.write(line)

# when finished, you must close the files:
input.close()
output.close()
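The explicit close() calls in the figure can also be left to Python by using the with statement, which closes a file automatically when its block ends, even if an error occurs along the way. The following variation is a sketch rather than the figure's own method: the function name copy_file and the demonstration file names are invented for illustration, and the code is written to run under both Python 2 and Python 3.

```python
import os
import tempfile


def copy_file(input_name, output_name):
    """Copy a sequential character file line by line; return the line count."""
    count = 0
    with open(input_name, "r") as source:
        with open(output_name, "w") as target:
            for line in source:
                target.write(line)
                count += 1
    # Both files have been closed by the time control reaches this point.
    return count


# Demonstration with a made-up three-line file in a temporary directory.
folder = tempfile.mkdtemp()
original_name = os.path.join(folder, "original.txt")  # hypothetical names
duplicate_name = os.path.join(folder, "duplicate.txt")

with open(original_name, "w") as f:
    f.write("Hello to you!\nHow are you?\n 49\n")

copied = copy_file(original_name, duplicate_name)

with open(original_name, "r") as f:
    original_text = f.read()
with open(duplicate_name, "r") as f:
    duplicate_text = f.read()

print("copied %d lines" % copied)
```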