It is unlikely that you will ever design a general-use language like Fortran, C++, ML, or Prolog, but if you become a professional software engineer or software architect, it is highly likely that you will specialize in some problem area, like telecommunications, aviation, website management, banking, or gaming. You will become expert at building systems in your problem area, and you may well design a notation, a language, that helps you and others write solutions to problems in this area. In this case, you are a designer of a domain-specific language that is used to build domain-specific software architectures.
This chapter introduces these concepts, applying the concepts already learned.
Specific problem areas, e.g., flight-control or telecommunications or banking, use specific hardware architectures, and they also use specific software architectures. When a new model of airplane is designed, the hardware architecture (the airplane hardware, including its computers) is based on a hardware design that has succeeded in the past. (It is too great a risk to start from scratch; it is also better to build on and refine what is known to work.) The software architecture for the plane will also be based on some standard layout that is known to work well.
Software architects use a collection of concepts, techniques, and patterns to build a new system in an established problem area; this collection is called a domain-specific software architecture:
Strictly speaking, the reference requirements form part of the terminology of the application domain, but they are often specially identified because they are often treated specially in the implementation methodology.
A language for discussing problems, behaviors, and solutions within a problem domain is a domain-specific language (DSL). The language's vocabulary includes concepts and notation from the domain --- the nouns, pronouns, adjectives, verbs, and adverbs of the language. The language lets participants (people and machines) discuss and implement solutions in the domain. Because its vocabulary is limited to the specific domain, a DSL is often useless for discussing and solving problems outside the domain.
A DSL uses concepts familiar to people who work in the domain. Here are two examples:
The entities can have features/attributes (``adjectives''): e.g., a word can be a label or data (in a cell). A grid has dimensions (rows and columns). A number can be data or a total value.
There are certainly operations on the entities --- inserting data into a cell, totalling the values in a row or column, printing a table.
A sequence or "script" of operations in some pattern or order, perhaps triggered by an event, is called an action. (Actions are "sentences" or commands.) Example: "When a number is inserted into Row 9, update the total for Row 9 using Equation 9 and redisplay the updated grid."
The secretary thinks in the language of the spreadsheet domain when building a spreadsheet, whether or not a computer program is helping to assemble and display the spreadsheet. But if the computer is doing the spreadsheet layout, the DSL for spreadsheets becomes a programming language, e.g., Excel.
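The spreadsheet vocabulary can be mimicked in a few lines of Python. This is only a sketch under stated assumptions: the class name Grid, its methods, and the choice to keep row totals in a list are hypothetical and are not taken from Excel or any real tool. It shows one action triggered by an event: inserting a number into a row updates that row's total.

```python
class Grid:
    """A toy spreadsheet grid (hypothetical; for illustration only)."""

    def __init__(self, rows, columns):
        # the entity "grid" with its feature "dimensions"
        self.rows = rows
        self.columns = columns
        self.cells = [[0 for _ in range(columns)] for _ in range(rows)]
        self.row_totals = [0] * rows

    def total_row(self, row):
        """Operation: total the values in one row."""
        return sum(self.cells[row])

    def insert(self, row, column, number):
        """Event: a number is inserted into a cell.
        Action: store it, then update the total for that row."""
        self.cells[row][column] = number
        self.row_totals[row] = self.total_row(row)


grid = Grid(rows=10, columns=3)
grid.insert(9, 0, 4)
grid.insert(9, 2, 6)
print(grid.row_totals[9])
```

The nouns (grid, cell, total), the verb (insert), and the event-action pairing all come from the domain; only the Python packaging is invented here.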
A sample behavior or "test case" of a spreadsheet is a scenario. For example,
domains (''nouns''): sites (building, floor, hallway, room), devices (alarm, movement detector, camera, badge), people (employee, guard, police, intruder). These are the ``nouns'' of the DSL.
Elements have features/attributes (``adjectives'') and operations (''verbs'').
Actions (``sentences'') are initiated by events.
Here is a scenario, stated in the DSL:
Compare the lingo of sensor alarms to the lingo you write in Java --- in the latter, the ``nouns'' are numbers, arrays, objects, and variables that name numbers, arrays, objects, etc. The ``adjectives'' are data types and other declaration modifiers. The ``operations'' are arithmetic, data-structure indexing, method call, etc. ``Actions'' are commands, or groups of commands. ``Events'' can be GUI events or a call to a method to start execution. Java is a ``DSL'' for computation on numbers and arrays and objects.
A DSL lets stakeholders (the participants in a systems project) communicate their ideas (needs, suggestions, solutions, implementations, orders). The DSL is a modelling language that lets us discuss models, structures, and behaviors specialized to a problem domain like telecommunications, banking, transportation, gaming, algebra, typesetting, etc.
If the computer is a ``participant,'' that is, we can use the DSL to tell the computer what to do --- we can program the computer --- then the DSL is a domain-specific programming language (DSPL).
But not all computational mechanisms are reactive. For example, the equational language of algebra is a DSL, and the computation underlying its equation sets is the application of simplification laws.
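To make "computation by simplification laws" concrete, here is a minimal sketch. The representation is an assumption chosen for brevity: an expression is a number, a variable name, or a tuple (operator, left, right). Only three laws are encoded (e + 0 = e, e * 1 = e, e * 0 = 0, with their mirror images); a real algebra system has many more.

```python
def simplify(expr):
    """Simplify an expression tree bottom-up using three algebraic laws."""
    if not isinstance(expr, tuple):
        return expr                      # a number or a variable name
    op, left, right = expr
    left, right = simplify(left), simplify(right)
    if op == "+":
        if left == 0:
            return right                 # 0 + e = e
        if right == 0:
            return left                  # e + 0 = e
    if op == "*":
        if left == 0 or right == 0:
            return 0                     # e * 0 = 0
        if left == 1:
            return right                 # 1 * e = e
        if right == 1:
            return left                  # e * 1 = e
    return (op, left, right)             # no law applies


# (x * 1) + (y * 0)
result = simplify(("+", ("*", "x", 1), ("*", "y", 0)))
print(result)
```

No event triggers anything here; the "program" is the expression, and running it means rewriting it until no law applies.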
Yet another variation is a domain related to constraint solving, such as crossword puzzles or Sudoku or database queries, where the domain language is a set of clues or constraints that must be solved.
In these cases, the appropriate DSL might be less ``event-action oriented'', but in any case, it remains the language that the stakeholders use to discuss their problems and the solutions.
One might argue that a general-purpose computer language is ``domain un-specific'' because it favors no one application domain very much over another. A user of a general-purpose language must become an expert modeller of real-life application domains in the domains of the general-purpose language --- This is core computer science: how to model real-life domains and domain language within general-purpose computing machines and general-purpose computer language.
On the other side of the coin, we can argue that a language like C is domain-specific to the domain of von Neumann machines, and a language like Java is domain-specific to heap-based object machines. Such machines are used to mimic/model other computational domains, and this is why we use C or Java to mimic/model other DSLs.
When the complexities in domain modelling become too great, the general-purpose programming language must be abandoned for a domain-specific one.
But there is another origin of DSPLs that comes totally from within the programming world: It is inconvenient to drag out a general-purpose language to code a solution to something small and simple. For example, do you code a Java program each time you do some calculator arithmetic? No --- you use a calculator language instead. It is always better to use a smaller, simpler language --- a DSPL --- that matches the problem you face.
For this reason, programmers sometimes call DSPLs ``little languages'' (e.g., ``here is a little language for drawing figures''; ``here is a little language for linking files''.) Here is a short list of ''little language'' DSPLs that have/had wide use:
In terms of domain-specific software architecture, someone might ask you,
For example, ''It would be nice to have a little language to help us lay out the wiring and sensors for a building's alarm system.''
Or, ''It would be nice to have a little language to help us write the protocols for how the movement detectors send/receive messages to/from the other devices and people in the network.''
This kind of wishful thinking can lead to a domain-specific programming language, in particular, a top-down domain-specific programming language.
Excel is a good example --- it has a nice mix of graphical and textual notations that falls within the grasp of a user who has rudimentary math and problem-solving skills. The user can lay out a spreadsheet that computes totals of rows and columns. (If you have never used Excel or a spreadsheet tool, you can read a tutorial here.)
Another good example is Yacc --- a user writes the BNF rules of a language, and this gives the information the Yacc compiler uses to build a parser matching the BNF rules. (Here is a tutorial.) The programmer can attach semantic-processing components to the generated parser by pairing the components with the BNF rules.
Another good example is SQL --- without knowing the internal layout of a data base, a user can write a query in terms of an implicit logic of sets and set operations, and the SQL interpreter executes the query as if it were a data-structures lookup algorithm. (There is a demo and tutorial here.)
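A small sketch of this point, using Python's standard sqlite3 module as the SQL interpreter. The table name, column names, and rows are made up for illustration. The query only describes the set of rows wanted; how the rows are located is left to the SQL engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")      # throwaway in-memory database
conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("Ann", "sales", 50), ("Bob", "sales", 40), ("Cy", "ops", 45)],
)

# The query states WHAT is wanted (a set of rows), not HOW to find it.
query = "SELECT name FROM employee WHERE dept = ? AND salary > ? ORDER BY name"
names = [row[0] for row in conn.execute(query, ("sales", 45))]
print(names)
conn.close()
```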
HTML lets a user format a web document in terms of paragraphs, lists, and fonts and hides the details of spacing, line breaks, and painting text and pictures. (Use the View/PageSource menu option on your web browser to see this chapter's HTML coding. Here is a tutorial.) CSS is another little language, used with HTML, to set default layouts, fonts, and colors for an HTML file.
There is one critical standard for the success of a top-down DSPL:
Upon first hearing, it sounds like top-down DSPLs are wonderful --- a language for just my problem that lets me say exactly what I want! --- but in reality, a top-down DSPL is a ``mixed bag'' of assets and drawbacks:
The starting point is this: become an expert in the problem domain: learn the vocabulary --- nouns, verbs, and adjectives. Develop many scenarios (case studies) within the domain. Extract from the scenarios patterns or schemes of structure, behavior, computation. Build computer systems in the domain; learn about their
The language must be friendly towards these persons' views of the domain. If the top-down DSPL is for non-expert programmers (e.g., users of Excel or HTML), then you must de-emphasize procedural programming notions like assignments, control, and data structures. You must use classic definition-style concepts, like equations, arithmetic-style functions, and Prolog-style predicates --- these appear often in other areas of science and technology.
Most non-experts have difficulty with control structures of any form --- sequencing is about the most they can handle. Repetition is often a challenge.
Data structures must be kept simple, resembling real-life, physical structures (a sheet of graph paper, a chest of drawers, a filing cabinet, a dictionary) or resembling the structures that are fundamental to the problem domain (hallways, buildings, wiring bundles...).
Keep this directive in mind, always:
If the DSPL's users are forced to code in notation and concepts that lie outside their problem domain, the users will get lost. (That's why non-programmers don't use Java as a DSPL for spreadsheet building!) In a serious development effort, you will design an IDE-like tool as well.
The DSL talks about ``gates'', which have features like input connections (ports/wires) and output connections. Gates have a function (feature): AND, OR, NOT, etc. Gates are assembled into subassemblies (e.g., one can build an XOR-subassembly), and subassemblies must be connected and grouped on a board. Gates and subassemblies might be annotated with power and space requirements.
The DSL is not event-or-action driven, like an alarm system or GUI, so scenarios are more like puzzles:
IN1 IN2 | OUT1 OUT2
----------------------
 1   1  |  1    0
 1   0  |  1    1
 0   1  |  1    1
 0   0  |  0    0
We might design a language whose scripts handle scenarios like the ones
above, e.g., the first scenario is programmed like this:
ASSEMBLY A1 : inputs IN1, IN2; outputs OUT1.
    w1 = AND(IN1, IN2).
    w2 = OR(w1, IN2).
    OUT1 = NOT(w2).
ENDASSEMBLY
The connections are coded as equations, where the left-hand side of
each equation is the name of a wire.
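One way an interpreter could execute such an assembly is to treat each equation as a triple (wire, gate, argument wires) and evaluate the equations in order. The sketch below is an assumption about how that might look, not the implementation of any real hardware language; it also assumes that w1 is the AND of the two declared inputs, IN1 and IN2.

```python
GATES = {
    "AND": lambda a, b: a & b,
    "OR":  lambda a, b: a | b,
    "NOT": lambda a: 1 - a,
}

# Assembly A1 as a list of equations: (wire, gate, argument wires).
A1 = [
    ("w1",   "AND", ["IN1", "IN2"]),
    ("w2",   "OR",  ["w1", "IN2"]),
    ("OUT1", "NOT", ["w2"]),
]


def run_assembly(equations, inputs):
    """Evaluate the equations in order; each one defines one new wire."""
    wires = dict(inputs)
    for wire, gate, args in equations:
        wires[wire] = GATES[gate](*[wires[a] for a in args])
    return wires


# Print the truth table computed by the assembly.
for in1 in (1, 0):
    for in2 in (1, 0):
        out = run_assembly(A1, {"IN1": in1, "IN2": in2})["OUT1"]
        print(in1, in2, "|", out)
```

Because the left-hand side of every equation names a wire, the dictionary of wires doubles as the interpreter's entire state.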
The second scenario might be programmed like this:
ASSEMBLY A2 : inputs IN1, IN2; outputs OUT1, OUT2.
    solve for 1 1   1 0;
              1 0   1 1;
              0 1   1 1;
              0 0   0 0.
    using {NAND}
ENDASSEMBLY
The new operation, solve for _ using {_}, accepts the tabular input and generates
a solution that is named A2 that has the required functionality and behavior.
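The user never sees how solve for is implemented. As an illustration only, here is one naive strategy an implementation might use: represent every signal by its column in the truth table and keep combining known columns with NAND until the target columns appear. The function name, the data layout, and the brute-force search are all assumptions of this sketch; a real synthesis tool would also try to minimize the gate count.

```python
def solve_with_nand(inputs, targets, max_rounds=4):
    """Search for NAND-only expressions whose truth-table columns
    equal the target columns.
    inputs:  dict  name -> tuple of bits, one bit per table row
    targets: dict  name -> tuple of bits, one bit per table row
    returns: dict  target name -> expression string (None if not found)
    """
    found = {column: name for name, column in inputs.items()}
    for _ in range(max_rounds):
        if all(t in found for t in targets.values()):
            break
        known = list(found.items())      # snapshot; found grows below
        for col_a, expr_a in known:
            for col_b, expr_b in known:
                column = tuple(1 - (a & b) for a, b in zip(col_a, col_b))
                if column not in found:
                    found[column] = "NAND(%s, %s)" % (expr_a, expr_b)
    return {name: found.get(column) for name, column in targets.items()}


# Rows of the table, top to bottom: (1,1), (1,0), (0,1), (0,0)
inputs = {"IN1": (1, 1, 0, 0), "IN2": (1, 0, 1, 0)}
targets = {"OUT1": (1, 1, 1, 0), "OUT2": (0, 1, 1, 0)}
solution = solve_with_nand(inputs, targets)
for name in sorted(solution):
    print(name, "=", solution[name])
```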
The third might go like this:
ASSEMBLY A3 : inputs IN1; outputs OUT1;
                 suchthat w2.voltage < (2 mv).
    w1 = AND(IN1, w2).
    w2 = NOT(w1).
    OUT1 = w2.
ENDASSEMBLY
Here, the constraint on a wire's voltage is listed as part of the assembly's
output specification ("suchthat ..."). The user does not code how to solve the constraint ---
the implementation knows the details.
At this point, think about what operations you might add to this little language that connects assemblies like A1 to A2 or A3 --- you will want a little linking language, and it should probably look like equations that "equate" inputs to outputs. Try it.
Of course, there are many hardware description languages (HDLs), such as VHDL and Verilog, that can do all the above and then some.
Next, consider how a parser and interpreter would be defined to read programs in this DSPL and execute them (in this case, generate circuit-diagram layouts for a board).
In summary, little languages (top-down DSPLs) solve problems in a narrow problem domain in a language that is understood by the workers in the domain.
Users of top-down DSPLs are even more ``IDE-dependent'' than programmers. For example, an Excel user will interact with Excel's GUI to insert data into cells of a spreadsheet and write equations that are embedded into the spreadsheet's ``logic'' so that the row-and-column totals are correctly computed and displayed. Exactly where is the ``program''?
In a note at
http://martinfowler.com/articles/languageWorkbench.html Martin Fowler
coins the term, ``projecting editor'' for the GUI portion of a DSPL's
IDE:
Here, the editor keeps an abstract representation
of the program that is filled in bit by bit, not necessarily sequentially,
not always as script. The program's abstract representation might
be a parse-tree-plus-symbol-table or some other data structure that
stores the program's semantic intent. The tool must have a back end
that can interpret the abstract program or can generate a script
that can be interpreted. (The ``storage representation'' in the diagram
is some
file format that archives the abstract representation
at the end of the IDE session.)
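A toy sketch of the idea, not Fowler's design: the abstract representation is assumed here to be a dictionary from cell names to either a number or a list of cells to be summed. The "user" fills it in out of order, as GUI gestures would, and a back end then projects it as a textual script. All names are hypothetical.

```python
# Abstract representation of a tiny spreadsheet "program":
# cell name -> a number, or a list of cell names whose values are summed.
program = {}

# The user fills it in through GUI gestures, in any order.
program["B3"] = ["B1", "B2"]     # the formula is entered first
program["B2"] = 7
program["B1"] = 5


def generate_script(abstract_program):
    """Back end: project the abstract representation as a textual script."""
    lines = []
    for cell in sorted(abstract_program):
        value = abstract_program[cell]
        if isinstance(value, list):
            lines.append("%s = %s" % (cell, " + ".join(value)))
        else:
            lines.append("%s = %d" % (cell, value))
    return "\n".join(lines)


script = generate_script(program)
print(script)
```

The order in which the user entered the cells leaves no trace in the generated script; only the abstract representation matters.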
If you are an Emacs or a vi or an Eclipse or a Visual Studio user, you are using a DSPL for document generation, presented in the format of a GUI (even a command-window GUI). The most extreme view is that any user interface for any application is a DSPL.
If you are developing a top-down DSPL for non-programmer users, then you are almost certainly forced to develop an IDE to go with it.
There is another variant of DSPL, one that is used by an experienced programmer who wants to ``extend'' a general-purpose language with concepts specific to the problem domain. In this situation, the DSPL is added to the general-purpose, host language, so that programs are a mix of host-language code and the DSPL.
This is called a bottom-up (or ``internal'') DSPL.
Consider these libraries for GUI-building:
These libraries are called frameworks, because each has its own collection of components that implement nouns (''window,'' ''frame,'' ''button,'' ''layout,'' ...) and verbs (''setTitle,'' ''getText,'' ''paint,'' ...) of the GUI domain. They usually come with sample programs that suggest patterns for assembling and calling the components. But they are implemented in their host languages, and a programmer must write (lots of) code in the host language to assemble a working GUI from the GUI framework.
A GUI framework is an almost-DSPL for GUIs, because it is a library that implements GUI concepts, but there is no ``programming language'' for GUI building, only the components and suggested assembly patterns, where the assembly is written in host-language code.
Again, an application is a mixture of components from the GUI-framework and calls to the components of the GUI-framework and coding written from scratch in the host language. It is a mixture.
A GUI framework is often ``married'' to its host language by means of a visual editor; Visual Basic and Visual C++ are standard examples. The visual editor tries to fill the gap between framework and DSPL by giving the user some guidance in GUI building.
Experienced programmers naturally become bottom-up DSPL designers, because over time they assemble a library of components and custom data-structure patterns and custom control-structure patterns and custom assembly/linking patterns (all written as templates or macros) that they use over and over again to solve problems in the same domain.
Eventually, the programs they write consist almost totally of their components and their pre-written template/macro patterns. The host programming language acts merely as minimal ``glue code'' for connecting the components and patterns.
At this point, the host language plus the library of components and patterns is a bottom-up DSPL, because the library has become ``more important'' to the problem solving than the host language itself. What has happened is this:
The custom-written library for the problem area is written in the host language, and it is oriented towards encoding ``domain-concepts-as-code'' (nouns as data-structure patterns, verbs as operations/control-structure patterns, sentences and paragraphs as assembly patterns) so that the scenarios discussed in the domain's DSL can be readily converted into programs in the bottom-up DSPL. Experienced programmers have good instincts for coding domain concepts as code and saving them as libraries. It is almost a matter of survival --- there is never enough time to build a new solution completely from scratch!
Many of the design ideas from object-oriented design can be applied in the development of a bottom-up DSPL, where classes and methods are used as nouns and verbs and adjectives. Languages like Scheme (via lambda abstraction and hygienic macros) and Smalltalk and Ruby (via blocks and macros) let a programmer easily define custom control structures and templates directly in source-code syntax. Languages like Scala use templates and even allow access to the language's parser modules for language extension by patterns.
But any general-purpose language can serve as a host language. Usually the host language is whatever language in which the starting libraries and frameworks are written.
A bottom-up DSPL has its strengths and weaknesses also:
Experienced programmers are the natural users of a bottom-up DSPL because they design it themselves, over time, as a library of implemented components and patterns. Eventually the host programming language acts merely as ``glue'' for connecting the components selected from the library: The programmer has extended the host language ``upwards'' towards the problems to be solved.
Bottom-up design might go like this: Use a framework to write lots of systems in a domain. Add more and more components to the framework. Classes are a good tool for coding data-structure patterns.
Notice which control and assembly patterns you are copying-and-pasting into the systems you build. Code these patterns into macros or parameterized procedures/control structures, that is, find some way to extend your host language with the patterns.
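For instance, a nested loop over a grid that keeps being copied can be packaged as a parameterized control structure. The sketch below assumes a hypothetical helper named for_each_cell; the caller supplies the loop body as a function, which is the usual way to extend Python with a custom control pattern, since the language has no macros.

```python
def for_each_cell(rows, columns, make_cell):
    """Custom control structure for the copied 'nested loop over a grid'
    pattern: calls make_cell(i, j) for every position and returns the
    results as a nested list."""
    grid = []
    for i in range(rows):
        grid.append([make_cell(i, j) for j in range(columns)])
    return grid


# Two uses of the same pattern with different bodies:
labels = for_each_cell(2, 3, lambda i, j: "r%dc%d" % (i, j))
zeros = for_each_cell(2, 2, lambda i, j: 0)
print(labels)
print(zeros)
```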
Repeat the previous steps until the components and patterns you implemented have names and computation power that match the entities/features/operations/events/actions in the DSL that you think in and talk in.
Once you start your library of data structures and control structures, force yourself to use the library (and improve it!) as much as possible, instead of writing from scratch something similar. You should program by selecting code from your library and ``gluing'' it together with minimal code from the underlying host language.
Your ultimate goal is to make the library ``stand-alone,'' where you write programs just with your library and with almost zero new glue code from the host language. This means you will use the underlying host language only as an ``interface language'' to contact external components that you have not written, and you use the host language only as a ``trap door'' to ``escape'' from the problem-domain area to execute code from some other library or application.
Implicit in the previous paragraphs are the notions of framework and product line from mainstream Software Engineering: A framework is a data-structures library that often comes with sample program skeletons, written in the host language, that one studies to learn to use the framework. The programmer can modify an appropriate skeleton and fill in gaps with a mix of framework code and custom-written code. GUI libraries, server libraries, and protocol libraries are almost always organized as frameworks. You can find some Python-coded frameworks for mail servers and networking at http://effbot.org/librarybook/.
A product line is a family of programs based on one, fixed program skeleton, differing only in minor customizations. Consider a product line of cars all based on the same engine-chassis assembly. Now, consider the control software for the varieties of engine that can be installed in the car. The software is almost the same for all the engine variations, and each engine controller is generated from a standard parameterized program that is instantiated by data for the intended engine. Another example is Notepad/Wordpad/Word, which are based on the same word-processor structure but have different degrees of customizations for font choices, formatting, and file formats.
A product line of software is built from a library when the one and the same program skeleton is used for all software products, and the gaps in the skeleton are filled by other library components. This is a kind of bottom-up DSPL, where the selections of data structures, control structures, and components structures are tightly restricted.
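The engine-controller example can be sketched as one fixed skeleton that is instantiated by data. Everything in the sketch is hypothetical (the parameter names, the returned dictionary, the rpm-limiting rule); it only illustrates that the products differ in their data, not in their structure.

```python
def make_engine_controller(cylinders, max_rpm):
    """Instantiate the one fixed controller skeleton with engine data."""
    def controller(requested_rpm):
        # skeleton logic shared by every product in the line
        rpm = min(requested_rpm, max_rpm)
        return {"cylinders_firing": cylinders, "rpm": rpm}
    return controller


# Two products generated from the same skeleton:
economy = make_engine_controller(cylinders=4, max_rpm=5500)
sport = make_engine_controller(cylinders=8, max_rpm=7000)
print(economy(6000))
print(sport(6000))
```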
In this situation, the DSL and its scenarios are extracted from the many existing, similar, redundant applications you have already constructed --- what are you copying and pasting each time? what are you imitating each time? what techniques have you already informally named and saved in "scratch files"? These are the best candidates for a stage of bottom-up DSL development. (Remember, a bottom-up DSL is constructed in many stages, over a period of time.)
There isn't time or space here to present dozens of grid-GUIs,
so here's just one, a game board, where coding patterns in the
Tkinter-coded GUI are bounded by #****:
===================================================
#Game board for "Pente":
from Tkinter import *
### the CONTROLLER module --- this can be placed in a separate file, if desired.
import PenteBoard # the model subassembly
import ComputerPlayer
humans_mark = PenteBoard.player
computers_mark = PenteBoard.opponent
buttons = [] # a nested list that remembers addresses of all button objects
game_on = True # remembers if the game is still going on
#1 ***********************************
def makeHandler(myrow, mycolumn) :
    """makeHandler constructs a handler function for a new button.
       parameters:
         myrow - an int, the row coordinate where the new button lives
         mycolumn - an int, the column coordinate where the new button lives
       returns: the handler function customized for a new button
    """
    def handleButtonPress() :
        """handleButtonPress is the constructed handler function.
           It makes the move for the human who pressed
           this button (which is at position myrow,mycolumn).
           If the move is legal, then the computer player is called next
           to make its move. The updated board is then painted.
        """
        global game_on
        if game_on :
            ok_move = PenteBoard.makeMove(myrow, mycolumn, humans_mark)
            if ok_move :
                if PenteBoard.checkWinner(humans_mark) :
                    game_on = False
                    repaintGUI()
                else :
                    computers_move = ComputerPlayer.makeMove()
                    repaintGUI()
                    #print "computer moved to", computers_move
                    buttons[computers_move[0]][computers_move[1]].configure(bg = "yellow")
                    if PenteBoard.checkWinner(computers_mark) :
                        game_on = False
    return handleButtonPress
#END 1 *****************************************
### the VIEW module starts here: ###############
def repaintGUI() :
    """repaintGUI repaints the foreground text of all the buttons on the GUI,
       it also updates the displayed count of captures, and if there is a
       winner, it prints a message as to who won.
    """
    global buttons, label1, label2, label3
    #2 ****************************
    for i in range(size) :
        for j in range(size) :
            buttons[i][j].configure(text = PenteBoard.contents(i,j))
            buttons[i][j].configure(bg = "white")
    #END 2 *****************************
    label1.configure(text = "Captures for " + humans_mark + " = " \
                     + str(PenteBoard.getCapturesFor(humans_mark)))
    label2.configure(text = "Captures for " + computers_mark + " = " \
                     + str(PenteBoard.getCapturesFor(computers_mark)))
    winner = ""
    if PenteBoard.checkWinner(humans_mark) :
        winner = humans_mark
    elif PenteBoard.checkWinner(computers_mark) :
        winner = computers_mark
    if winner != "" :
        label3.configure(text = "GAME OVER: " + winner + " WINS!")
#3 *********************************
window = Tk()
window.title("Pente")
size = PenteBoard.size
window.geometry(str(50 * size) + "x" + str(50 * (size)))
frame = Frame(window)
frame.grid()
#END 3 *********************************
label1 = Label(frame,
text = "Captures for " + humans_mark + " = " \
+ str(PenteBoard.getCapturesFor(humans_mark)),
font=("Arial", 12, "bold") )
label1.grid(row = 0, column = 0, columnspan = 5)
label2 = Label(frame,
text = "Captures for " + computers_mark + " = " \
+ str(PenteBoard.getCapturesFor(computers_mark)),
font=("Arial", 12, "bold") )
label2.grid(row = 1, column = 0, columnspan = 5)
#4 **********************************
for i in range(size) :
    button_row = []
    for j in range(size) :
        button = Button(frame,
                        font = ("Arial", 14, "bold"), fg = "blue", bg = "white",
                        width = 2, height = 1)
        button.configure(text = PenteBoard.contents(i,j))
        button.configure(command = makeHandler(i, j))
        button.grid(row = i+2, column = j)
        button_row = button_row + [button]
    buttons.append(button_row)
#END 4 *********************************
label3 = Label(frame, text = " ", font=("Arial", 12, "bold") )
label3.grid(row = size+2, column = 0, columnspan = 6)
window.mainloop() # activate GUI
===================================================
Here are the proposed macro patterns and their expansions into Tkinter/Python code:
===================================================
WINDOWNAME = Tk()
WINDOWNAME.title(TITLE)
size = SIZE
WINDOWNAME.geometry(str(50 * SIZE) + "x" + str(50 * SIZE))
frame = Frame(WINDOWNAME)
frame.grid()
===================================================
Now, why don't we just define a function/method named initGridGUI and save it in a separate module/class M, and call it in the usual way, e.g., M.initGridGUI(...,...)?
This isn't a good idea, because the code named by initGridGUI isn't part of a separate subassembly or component that makes sense by itself --- the code is literally "torn out" of the application and given an abbreviation for a name. The code makes no sense by itself in a separate component --- the macro definition is a "short hand" or abbreviation for pretending we have added to the Tkinter/Python language a new "primitive" command for GUI setup.
Notice also that the macro expansion initializes global variables like size, that are not declared in the pattern. In this way, macros can be used to "bend" variable visibility rules.
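Python has no built-in macro facility, so the expansion step itself needs some tool. A minimal sketch of textual expansion, assuming the standard string.Template class as the substitution engine: the template text mirrors the initGridGUI expansion, and the names INIT_GRID_GUI and expand_init_grid_gui are hypothetical. A real preprocessor would also have to find the macro calls in the source file and splice the expansions in.

```python
from string import Template

# The macro body, with $-placeholders for the macro's parameters.
INIT_GRID_GUI = Template(
    "$windowname = Tk()\n"
    "$windowname.title($title)\n"
    "size = $size\n"
    "$windowname.geometry(str(50 * size) + \"x\" + str(50 * size))\n"
    "frame = Frame($windowname)\n"
    "frame.grid()\n"
)


def expand_init_grid_gui(windowname, title, size):
    """Return the host-language text that replaces one macro call."""
    return INIT_GRID_GUI.substitute(windowname=windowname, title=title, size=size)


expanded = expand_init_grid_gui("window", "'Pente Board'", "PenteBoard.size")
print(expanded)
```

The expansion is plain text, which is why it can introduce names like size and frame into the surrounding program without declaring them as parameters.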
===================================================
for i in range(I) :
    button_row = []
    for j in range(J) :   #NOTE: size was initialized by the earlier macro
        button = Button(frame, OPTIONS)
        button.configure(text = MODEL.contents(i,j))
        button.configure(command = makeHandler(i, j))
        button.grid(row = i+2, column = j)
        button_row = button_row + [button]
    buttons.append(button_row)
===================================================
The macro assumes that the MODEL component has a contents method that understands that the GUI button grid has semantics in the model. The host language's interface checker will verify this is so when it checks the expanded program, where all macros are expanded to host-language code.
Notice also that the code calls makeHandler, which is a helper method that defines the event-handler for each constructed button. We will define the makeHandler macro separately, but it seems we should have a macro-assembly pattern for using the @configure ... buttons macro correctly with the macro that follows:
===================================================
def makeHandler(myrow, mycolumn) :
    """makeHandler constructs a handler function for a new button.
       parameters:
         myrow - an int, the row coordinate where the new button lives
         mycolumn - an int, the column coordinate where the new button lives
       returns: the handler function customized for a new button
    """
    def handleButtonPress() :
        """handleButtonPress is the constructed handler function.
           It makes the move intended by the button press.
        """
        global game_on   # ASSUMES THERE IS A GLOBAL VAR, game_on
        if game_on :
            HANDLERCODE
            repaintGUI()   # ASSUMES METHOD, repaintGUI
    return handleButtonPress
===================================================
Argument HANDLERCODE is a command sequence or the name of a function. For example, we can call it like this:
@handler
    ok_move = PenteBoard.makeMove(@ROW, @COLUMN, humans_mark)
    if ok_move :
        if PenteBoard.checkWinner(humans_mark) :
            game_on = False
        else :
            computers_move = ComputerPlayer.makeMove()
            buttons[computers_move[0]][computers_move[1]].configure(bg = "yellow")
            if PenteBoard.checkWinner(computers_mark) :
                game_on = False
@end
and the command sequence that is the argument to @handler is pasted into its correct position in the expanded macro. The example uses two little macros, @ROW and @COLUMN, which expand to the (hidden) parameter names, myrow and mycolumn.
We defined the @handler macro separately from the earlier one because we want the expanded code to look somewhat natural, like the example program we started from, and also because it is not always possible to define a function in an arbitrary position in the program.
Since a call to @handler makes no sense without an
accompanying call to @configure .. buttons, we define an
assembly-pattern macro, which inserts the
needed subassemblies in proper order:
@GridGUI FRAMECONSTRUCTION
Handler HANDLERDEFINITION
Buttons CONFIGUREDBUTTONS
Widgets OTHERWIDGETS
Paint REPAINTMETHOD
@End
Here is the example rebuilt with the macros:
===================================================
from Tkinter import *
import PenteBoard # the model subassembly
import ComputerPlayer
humans_mark = PenteBoard.player
computers_mark = PenteBoard.opponent
buttons = [] # a nested list that remembers addresses of all button objects
game_on = True # remembers if the game is still going on
@GridGUI
    @initGridGUI(window, 'Pente Board', PenteBoard.size)
Handler
    @handler
        ok_move = PenteBoard.makeMove(@ROW, @COLUMN, humans_mark)
        if ok_move : ...   # SEE ABOVE EXAMPLE FOR REST OF CODE
    @end
Buttons
    size = PenteBoard.size
    @configure size by size buttons for PenteBoard
    where
        font = ("Arial", 14, "bold"),
        fg = "blue", bg = "white",
        width = 2, height = 1
    @end
Widgets
    ... CODE THAT BUILDS LABELS AND REST OF GUI; SEE ORIGINAL EXAMPLE.
Paint
    def repaintGUI():
        ...  # CODING OF repaintGUI METHOD FROM ORIGINAL EXAMPLE.
@End  # gridGUI
window.mainloop() # activate GUI
===================================================
The macros are meant to look and behave like part of the
Tkinter/Python host language. There is certainly more work to do to improve
them (@handler's argument, which is command code, exposes some apparently
unbound variables; we can do better!), but the basic methodology is in place and works immediately.
A danger in any top-down DSPL is that it is isolated from other systems and implementations. Your top-down DSPL should let you call library components and execute code written in the implementation language. To do this, add a ``trap door'' to the DSPL so that the execution of the DSPL program can be paused and the implementation-language code can be executed instead. Many scripting languages provide such a trap door, in the guise of an eval operation, which takes as its argument a string that holds executable code --- eval runs the code. Here are three useful forms of trap door in Python:
Here is a program that builds a string and runs it:
x = 2; y = 3; z = 5
invar = raw_input("Type name of variable (x, y, or z) to zero out: ")
if invar in ("x", "y", "z") :
    code = invar + " = 0"
else :
    code = "pass"
exec(code)
The exec command can also read and execute the contents of an opened text file:
handleToCodefile = open("MyPythonProgram.py", "r") # open a readable file
exec(handleToCodefile) # execute its contents
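Because exec runs its string inside the caller's own namespace, one cautious variation is to hand exec an explicit dictionary that serves as the namespace. The following is a minimal sketch, not part of the chapter's examples: the names run_trapdoor and state are illustrative, and the sketch assumes the DSPL keeps its variables in a dictionary. It is written so that it also runs under Python 3.

```python
def run_trapdoor(code, state):
    """executes the string, code, using the dictionary, state, as its namespace"""
    exec(code, state)   # variables are looked up and assigned inside state only
    return state

# hypothetical DSPL state, mirroring the variables x, y, z used above:
state = {"x": 2, "y": 3, "z": 5}
run_trapdoor("y = 0", state)
print(state["y"])   # prints 0
```

The dictionary limits which variables the embedded code sees by default, but it is not a security barrier: the code can still import modules and call any Python operation.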
import os
cwd = os.getcwd()   # get current working directory
if os.path.basename(cwd) == "MyPictures":   # is the lowest-level dir "MyPictures" ?
    # then, move up one level to parent directory:
    os.chdir(os.pardir)
print "Current path is ", os.getcwd()
os.system("ls -a")   # ANY OS command can be supplied as a string arg

# run an external program from within Python code:
import subprocess
# general format: subprocess.call(["program-name", "param1", "param2", ...])
subprocess.call(["C:/Python26/Python.exe", "MyPythonPgm.py"])
===================================================
CL : CommandList      A : Atom
C : Command           S : String
CL ::=  C  |  C . CL
C ::=  A1 eats A2  |  do S
A ::=  bird  |  bug
S is a quoted string
===================================================
An example program that the GUI might generate is
bird eats bug. bird eats bug. bug eats bird
The game has limited functionality (haha), but notice the do command, which is a trap door that lets a programmer insert Python code that directly manipulates the language's interpreter, say, like this:
bird eats bug. do "census['cat'] = 1\ncensus['bird'] = 0\nprint 'uh oh!'"
The string holds Python code:
census['cat'] = 1
census['bird'] = 0
print 'uh oh!'
Here is the interpreter for the bird-cage language:
===================================================
"""Interpreter for mini top-down DSL for bird-cage domain of birds and bugs.
Includes trap-door operation, do S, for embedding Python source code.
Source language syntax to be parsed:
CL : CommandList A : Atom
C : Command S : String
CL ::= C | C . CL
C ::= A1 eats A2 | do S
A ::= bird | bug
S is a quoted string
Operator-tree structures resulting from the parser:
CLIST ::= [ C* ]
CTREE ::= ["eat", A1, A2 ] | ["do", S]
A ::= "bird" | "bug"
S ::= a quoted string
"""
# Global variable: remembers count of entities in bird cage:
census = {"bird": 9, "bug": 99}

def interpretCLIST(p) :
    """interprets CLIST p"""
    for command in p :
        interpretCTREE(command)

def interpretCTREE(c) :
    """interprets CTREE c"""
    operator = c[0]
    if operator == "eat" :
        eater = c[1]
        lunch = c[2]
        if census[eater] > 0 and census[lunch] > 0 :
            census[lunch] = census[lunch] - 1
    elif operator == "do" :   # trap-door ``eval'' operation ---
        exec(c[1])   # executes c[1] as python code.  Can affect census,
                     # add new global variables to interpreter's namespace,
                     # print trace information, etc.
    else :
        crash("invalid command")

def crash(message) :
    print message + "! crash! core dump: ", census
    raise Exception

def main(program) :
    """interprets the operator tree, program"""
    interpretCLIST(program)
    print "final census =", census
===================================================
Here are some sample uses of the interpreter:
python -i top.py
>>> main([["eat", "bird", "bug"]])
final census = {'bird': 9, 'bug': 98}
>>> main([["eat", "bird", "bug"], ["do", "census['cat'] = 1\ncensus['bird'] = 0\nprint 'uh oh!'"]])
uh oh!
final census = {'bird': 0, 'bug': 97, 'cat': 1}
The do command lets a programmer escape from the limited functionality
of the DSPL and use the operations of the implementation language.
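The interpreter takes operator trees as input, but the parser that produces them is not shown. The following is a minimal sketch of one possible parser for the grammar above, not the chapter's own. The names ATOMS, TOKEN, tokenize, parseC, and parseCL are hypothetical, and the sketch assumes that \n is the only escape sequence that must be decoded inside a quoted string. It is written so that it also runs under Python 3.

```python
import re

ATOMS = ("bird", "bug")
# A token is a double-quoted string, a period, or a word:
TOKEN = re.compile(r'\s*("(?:[^"\\]|\\.)*"|\.|[A-Za-z]+)')

def tokenize(text):
    """splits the source text into a list of tokens"""
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError("unrecognized text at position %d" % pos)
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parseC(tokens, i):
    """parses one Command starting at tokens[i]; returns (CTREE, next index)"""
    if tokens[i:i+1] == ["do"]:
        if i + 1 < len(tokens) and tokens[i+1].startswith('"'):
            # drop the surrounding quotes; assumption: only \n is decoded
            code = tokens[i+1][1:-1].replace("\\n", "\n")
            return ["do", code], i + 2
        raise ValueError("do must be followed by a quoted string")
    phrase = tokens[i:i+3]
    if len(phrase) == 3 and phrase[1] == "eats" \
       and phrase[0] in ATOMS and phrase[2] in ATOMS:
        return ["eat", phrase[0], phrase[2]], i + 3
    raise ValueError("invalid command at token %d" % i)

def parseCL(text):
    """parses a CommandList; returns a CLIST, a list of CTREEs"""
    tokens = tokenize(text)
    tree, i = parseC(tokens, 0)
    clist = [tree]
    while i < len(tokens):
        if tokens[i] != ".":
            raise ValueError("expected '.' at token %d" % i)
        tree, i = parseC(tokens, i + 1)
        clist.append(tree)
    return clist

program = 'bird eats bug. do "census[\'cat\'] = 1\\ncensus[\'bird\'] = 0"'
print(parseCL(program))
```

The list returned by parseCL has the CLIST shape that main expects, so the two can be composed as main(parseCL(program)).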
A good host language will give you a technique to add custom patterns. Here is a simple example:
Say that your problem domain has lots of solutions that use the
phrase, ``repeat ACTION until CONDITION holds'' so that
this pattern should be added to the DSPL
library. Some languages let you define higher-order functions
(functions that take code/closures as parameters) in mix-fix
keyword notation, like this:
def repeat(action)until(condition)holds :
    """executes the command, action, until expression, condition, is true"""
    action()   # do the action step
    if condition():   # finished ?
        return
    else:   # do it again:
        repeat(action)until(condition)holds
This defines a function named repeat..until..holds.
The function is used in a program like this:
...
repeat([x = x - 1])until([x == 0])holds
...
...
The brackets, [..], are quoting the code .., that is,
constructing a closure holding the code.
Functional languages, like Scheme and Haskell, support this approach,
as do Ruby and Smalltalk to a lesser degree.
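Python has no mix-fix notation, but the same pattern can be approximated with an ordinary higher-order function and a keyword argument. This is a minimal sketch; the names repeat, until, counter, and decrement are illustrative, and a loop replaces the recursion used above so that long repetitions do not exhaust Python's call stack.

```python
def repeat(action, until):
    """executes action() repeatedly, stopping as soon as until() returns True"""
    while True:
        action()       # do the action step
        if until():    # finished ?
            return

# the mutable state is kept in a dictionary so that the closures can update it:
counter = {"x": 3}

def decrement():
    counter["x"] = counter["x"] - 1

repeat(decrement, until=lambda: counter["x"] == 0)
print(counter["x"])   # prints 0
```

Here the def and the lambda play the role of the brackets, [..]: each one packages code as a closure that repeat calls later.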
For older programming languages, the traditional way to add custom control structures is with a macro-processor (``preprocessor''). A macro-processor is a program that reads as input a program in the host language that has the custom structures mixed into the code. The macro-processor locates the occurrences of the custom structures and replaces them with the instructions in the host language that perform the intended operations.
C's preprocessor is a standard but not-too-exciting example. A segment of C code like this,
#define PI 3.14159
#define Double(x) (x + x)
// now, PI and Double act like they are built-in C functions:
y = Double(PI * 5) ;
defines two macros, PI and Double, which look like functions and
can be called like functions.
When the above code
is input to C's preprocessor, this text is the output:
y = (3.14159 * 5 + 3.14159 * 5) ;
The macro definitions are removed, and the calls are replaced by
C-text, giving a program in pure C.
When a macro is called, its arguments are text and not computed values! At a macro call, the text argument is bound to the parameter and the text is inserted for occurrences of the parameter in the macro body. The text computed by the macro's body is copied back in place of the macro call. In the example, y = Double(PI * 5) is rewritten to y = (PI * 5 + PI * 5), which is rewritten to y = (3.14159 * 5 + PI * 5), which is rewritten to y = (3.14159 * 5 + 3.14159 * 5). The example shows why the macro-processor must be a separate program, run first, before the parser, interpreter or translator.
There is a preprocessor, called GPP, that can be used stand-alone to process any program that contains C-like macros. Like C's preprocessor, GPP requires that a macro call look like a function call, of the form, MACRONAME(ARG1, ... ARGn). The m4 macroprocessor lets its user write macro definitions whose calls look somewhat like the mix-fix notation seen in the previous repeat..until..holds example.
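The rewriting of y = Double(PI * 5) can be mimicked with plain text substitution. This is only a sketch of the idea and not how C's preprocessor is implemented: the function names expand_Double and expand_PI are hypothetical, the argument pattern assumes the argument holds no nested parentheses, and both occurrences of PI are replaced in one step rather than one at a time.

```python
import re

def expand_Double(text):
    """rewrites each call, Double(ARG), to (ARG + ARG); ARG is copied as text"""
    return re.sub(r"\bDouble\(([^()]*)\)", r"(\1 + \1)", text)

def expand_PI(text):
    """rewrites each occurrence of the word, PI, to the text, 3.14159"""
    return re.sub(r"\bPI\b", "3.14159", text)

line = "y = Double(PI * 5) ;"
step1 = expand_Double(line)
step2 = expand_PI(step1)
print(step1)   # y = (PI * 5 + PI * 5) ;
print(step2)   # y = (3.14159 * 5 + 3.14159 * 5) ;
```

Nothing is computed during the expansion: the argument text, PI * 5, is copied twice, and the multiplication happens only when the resulting C program runs.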
Here are some references for existing macro-processors:
Ruby supports a ``block'' construction (the [..] syntax) that makes it possible to code simple customized control structures directly in Ruby. There are some Ruby-implementation approaches at http://weblog.jamisbuck.org/2006/4/20/writing-domain-specific-languages
Some host languages (e.g., Scheme and C) come with their own macro-processors. Others (e.g., Smalltalk and Ruby) have flexible procedure-call syntax for defining new patterns. Others (e.g., Perl, PHP, Python, Ruby) supply regular-expression libraries that have powerful pattern-matching operations that you can use to write your own macro-processor.
(If you have never worked with regular expressions, then think of them as "string patterns". The example below will give you enough background to cope.)
Here is an example of using regular-expression string matching in Python. We use Python's
regular-expression module, re, to define a pattern, match
the pattern in a string, and replace it.
The comments in the code explain how this operates:
===================================================
import re # re is the module of regular-expression operations
# Here is a pattern that matches strings of form,
# @DOUBLE alpha END
# where alpha is some substring that holds no occurrences of @ :
# "(\\s*)@DOUBLE\\b([^@]*?)\\bEND\\b"
# where
# \\s means a whitespace character
# \\b means a word boundary
# E* means match E zero or more times as much as possible for success
# E*? means match E zero or more times as little as possible for success
# [^c] means match any character that is NOT character c
# The parens mark _groups_ that are used below.
# p is a string-matching object compiled from the pattern string:
p = re.compile("(\\s*)@DOUBLE\\b([^@]*?)\\bEND\\b")
# try this multi-line example:
source = """
x = 0
x = @DOUBLE x END
print x
"""
print "source text ="
print source
# search for compiled pattern p in source:
m = p.search(source)
print
# if the match succeeds, m is an object; else m = None
print "match result =", m
# m holds a list of substrings that matched parenthesized groups in the pattern:
print "matched groups =", m.groups()
# m also holds the start and end indexes of the matched string:
print "span of matched text =", m.span()
# the start and end indexes can be referenced individually, too:
print "matched text =", source[m.start() : m.end()]
print
# let's replace the matched string by something else:
matches = m.groups()
source = source[:m.start()] \
+ matches[0] + "(2 * " + matches[1] + ")" \
+ source[m.end():]
print "updated text ="
print source
# We have completed a simple macro-expansion of "@DOUBLE alpha END"
# into "(2 * alpha )", preserving any leading spacing
===================================================
Here is the output from the above script:
source text =
x = 0
x = @DOUBLE x END
print x
match result = <_sre.SRE_Match object at 0x7ff3d6e0>
matched groups = (' ', ' x ')
span of matched text = (10, 25)
matched text = @DOUBLE x END
updated text =
x = 0
x = (2 * x )
print x
The example shows that patterns can be complex. There is a tutorial
on writing patterns at
http://docs.python.org/howto/regex.html
and there is a mostly complete listing of pattern options at
http://docs.python.org/library/re.html.
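The script above replaces only the first match. A compiled pattern's sub method accepts a replacement function and applies it to every non-overlapping match in one pass. The sketch below uses the same pattern; the function name expand is illustrative, and the expansion is written as ((alpha) * 2) so that an argument like y + 1 keeps its meaning. It is written so that it also runs under Python 3.

```python
import re

p = re.compile(r"(\s*)@DOUBLE\b([^@]*?)\bEND\b")

def expand(m):
    """builds the replacement text for one match object, m"""
    leading = m.group(1)           # the spacing in front of @DOUBLE
    alpha = m.group(2).strip()     # the argument text, without outer blanks
    return leading + "((" + alpha + ") * 2)"

source = "x = @DOUBLE x END\ny = @DOUBLE y + 1 END\n"
result = p.sub(expand, source)
print(result)
# x = ((x) * 2)
# y = ((y + 1) * 2)
```

One pass of sub does not expand a macro call nested inside another call's argument, because the outer call's argument then contains @; nested calls need the repeated search-and-replace loop.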
We now use the ideas in the example to write
a macro-processor in Python that searches for macro-call
patterns and replaces them with expansions. Here are the two
macro calls the processor will perform:
@REPEAT Code FOR Expr TIMES   ===>   newvar = Expr
                                     while newvar > 0 :
                                         Code
                                         newvar = newvar - 1

@DOUBLE Expr END              ===>   ((Expr) * 2)
Each macro call on the left is coded as a pattern string,
and each translation is done by a Python-coded handler function.
The macro-processor's main data structure is a list of
(compiled-pattern, handler-function) pairs.
Here is the macro-processor:
===================================================
"""Simplistic macroprocessor based on regular expressions.
main data structure:
macrotable : list of (COMPILED_PATTERN, HANDLER) pairs
Example:
macrotable = [ (re.compile("(\\s*)@REPEAT\\b(\\s*)([^@]*?)\\bFOR\\b([^@]*?)\\bTIMES\\b"),
                translateREPEAT),
               (re.compile("@DOUBLE\\b([^@]*?)\\bEND\\b"), translateDOUBLE) ]
holds these two macro definitions:
indent1 @REPEAT indent2 alpha FOR beta TIMES
=> translateREPEAT(indent1,indent2,alpha,beta)
@DOUBLE alpha END => translateDOUBLE(alpha,)
Compiled patterns, as written above, match macro-call symbol, @,
followed by keywords (which are required to be separate words by \\b )
such that included text arguments do not include any call symbols, @.
Note that E*? denotes the minimal match of E* such that
the overall pattern match succeeds. Thus, the macro-processor computes
inside-out processing of macro calls so that nested calls are never confused.
The pattern for @REPEAT also records the amount of indentations via (\\s*).
Macro-processor algorithm:
read source
repeat until no more macro matches:
search source for each compiled pattern in macrotable
if successful match,
then call accompanying handler function,
which assembles appropriate translation
insert translation in place of matched pattern in source
write source
"""
### This portion should be a separate module that holds the translation
### functions. It is embedded here for simplicity.
#GENSYM function:
var_count = 0 # count of new names generated for expanded macros
def genNewVar() :
    """genNewVar is a gensym function, generating unique new names
    returns: a string of form, "_varN", where N is a unique nonneg int
    """
    global var_count
    newvar = "_var" + str(var_count)
    var_count = var_count + 1
    return newvar

def translateREPEAT(args) :
    """translateREPEAT expands this macro call:
       indent1 @REPEAT
       indent2     Code
               FOR Expr TIMES   =into=>  indent1 newvar = Expr
                                         indent1 while newvar > 0 :
                                         indent2     Code
                                         indent2     newvar = newvar - 1
       where indent1 = args[0] and indent2 = args[1]
             Code = args[2] and Expr = args[3]
       (indent1 and indent2 are leading white-space)
       returns: ans, a string holding the macro-expanded call
    """
    indent1 = args[0]
    indent2 = args[1]
    bodycode = args[2]
    exprcode = args[3]
    # the call to REPEAT is replaced by this python code, as documented above:
    newvar = genNewVar()
    ans = indent1 + newvar + " = " + exprcode \
        + indent1 + "while " + newvar + " > 0:" \
        + indent2 + bodycode \
        + indent2 + newvar + " = " + newvar + " - 1"
    return ans

def translateDOUBLE(arg) :
    """translates @DOUBLE(arg,) =into=> '((arg) * 2)' """
    ans = "((" + arg[0] + ") * 2)"
    return ans
### END OF HANDLER-FUNCTION MODULE
### MACRO PROCESSOR CONTROL ALGORITHM:
import re # import regular-expression module
# initialize macrotable:
macrotable = [ (re.compile("(\\s*)@REPEAT\\b(\\s*)([^@]*?)\\bFOR\\b([^@]*?)\\bTIMES\\b"),
translateREPEAT),
(re.compile("@DOUBLE\\b([^@]*?)\\bEND\\b"), translateDOUBLE)
]
# read source:
import sys
if len(sys.argv) < 2 :
    inputfilename = raw_input("Type input file to copy: ")
else :
    inputfilename = sys.argv[1]
input = open(inputfilename, "r")
source = input.read()
input.close()
# replace all macro calls:
still_matching = True
while still_matching :
    still_matching = False
    for (pattern, handler) in macrotable :
        match = pattern.search(source)
        if match :   # != None
            replacement = handler(match.groups())
            source = source[:match.start()] + replacement + source[match.end():]
            still_matching = True
# write source:
index = inputfilename.rfind(".py")
if index < 0 :   # no ".py" in the name, so keep the whole name
    index = len(inputfilename)
outputfilename = inputfilename[:index] + "out" + ".py"
output = open(outputfilename, "w")
output.write(source)
output.close()
print
print "Contents of " + outputfilename + ":"
print source
===================================================
Say we have this file, test.py, whose contents are:
x = 0
@REPEAT
    x = @DOUBLE x + 1 END
    @REPEAT
        pass
    FOR 2 TIMES
FOR 3 TIMES
print x
When we use the macroprocessor to rewrite the file (python macrop.py test.py), we get this report:
Contents of testout.py:
x = 0
_var1 = 3
while _var1 > 0:
    x = (( x + 1 ) * 2)
    _var0 = 2
    while _var0 > 0:
        pass
        _var0 = _var0 - 1
    _var1 = _var1 - 1
print x
All macro calls are expanded.
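The expansion loop stops only when no pattern matches, so a handler whose output contains a fresh macro call could keep it running forever. One way to guard against this is to package the loop as a function with a step limit. This is a minimal sketch, not part of the macro-processor above: the names expand_all, max_steps, and translate_double are hypothetical, and only the @DOUBLE macro is included to keep the example short. It is written so that it also runs under Python 3.

```python
import re

def translate_double(args):
    """expands @DOUBLE alpha END into ((alpha) * 2)"""
    return "((" + args[0].strip() + ") * 2)"

macrotable = [(re.compile(r"@DOUBLE\b([^@]*?)\bEND\b"), translate_double)]

def expand_all(source, table, max_steps=1000):
    """repeatedly expands macro calls in source until none remain.
       max_steps is an assumed safety limit on the number of expansions."""
    for step in range(max_steps):
        for (pattern, handler) in table:
            match = pattern.search(source)
            if match:
                source = source[:match.start()] + handler(match.groups()) \
                         + source[match.end():]
                break            # restart the search on the updated text
        else:
            return source        # no pattern matched: expansion is complete
    raise RuntimeError("macro expansion did not finish within the step limit")

print(expand_all("x = @DOUBLE @DOUBLE x END + 1 END", macrotable))
# x = ((((x) * 2) + 1) * 2)
```

The inner call is expanded first because the pattern's argument group excludes @, which is the inside-out order described in the macro-processor's documentation string.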
To close the gap, you might design a little control language that is implemented by a parser and interpreter. The control language executes the new constructions you design to fill the "gap" and uses a trap door to pass to the host-language interpreter/compiler the code that is written in the host language (and execute it).
The result is a partly top-down, partly bottom-up DSPL.
Although it is almost never done, it is an excellent project to implement a DSPL one way and then use the acquired knowledge to implement it the ``inverse way.'' That is, if you designed a DSPL top-down, try to extract from its interpreter the parts that become components for a bottom-up implementation. Dually, if you first built a bottom-up DSPL, then next use the components as the ``logic'' within a top-down, interpreter implementation. The second version of the language might be the one that you prefer!
Any domain defines individuals, attributes, and operations. These are exactly the constituents of a first-order logic, and the behavior of the constituents is defined by the axioms of the logic. If the domain is puzzle- or constraint-oriented, that is, its scenarios go like this: "find a solution to these rules/restrictions", then Prolog is perfect for coding the behaviors as Horn clauses and the scenarios as queries. Horn clauses also code "macros" or "patterns" that make the behaviors and queries easier to express.
Again, if your domain is oriented towards solving puzzles or constraints, first try coding your little language in Prolog before you do the hard work of building a stand-alone parser, interpreter, and constraint solver. Prolog's theorem prover (constraint solver) is simple, smart, and hides implementation details --- Prolog is already a "half-way DSPL".
When and how to develop domain-specific languages by M. Mernik, J. Heering, A.M. Sloane, CWI, Amsterdam, 2005.
Domain-Specific Languages: An Annotated Bibliography by A. van Deursen, P. Klint, and J. Visser, 2000.
There is also a text that might make some sense to you at this point: Domain-Specific Languages by Martin Fowler.
Here is a useful on-line text that documents how to use Python for advanced systems and language development: The Python Standard Library by Fredrik Lundh.