Copyright © 2010 David Schmidt

Chapter 9:
Domain-Specific Languages


9.1 A story
9.2 Domain-specific software architecture
9.3 Domain-specific language (DSL) and domain-specific programming language (DSPL)
    9.3.1 From DSL to DSPL
    9.3.2 DSPLs as ``little languages''
9.4 Top-down (``external'') DSPL
    9.4.1 Developing a top-down DSPL
    9.4.2 Example top-down language: Gate-layout language
    9.4.3 Script versus projecting editor
    9.4.4 Summary
9.5 Bottom-up (``internal'') DSPL
    9.5.1 A bottom-up DSPL evolves over time
    9.5.2 Developing a bottom-up DSPL
    9.5.3 Example: Grid-GUI patterns
    9.5.4 Example: Observer design pattern
    9.5.5 Summary
9.6 Hybrid DSPL
    9.6.1 Implementation techniques
9.7 Further reading


It is unlikely that you will ever design a general-use language like Fortran, C++, ML, or Prolog, but if you become a professional software engineer or software architect, it is highly likely that you will specialize in some problem area, like telecommunications, aviation, website management, banking, or gaming. You will become an expert at building systems in your problem area, and you may well design a notation, a language, that helps you and others write solutions to problems in this area. In this case, you are a designer of a domain-specific language that is used to build domain-specific software architectures.

This chapter introduces these concepts, applying the concepts already learned.


9.1 A story

Here is a story that illustrates what this chapter is about. Say you bought a robot, one that has a camera for "seeing" objects and an arm for grasping objects. The robot can rotate and move forwards. You want to talk to the robot to tell it what to do, perhaps to tell it how to pick up your mess and clean house. Perhaps the robot's controller understands C, but it is painful to talk to anyone or anything in C. You want to talk in "robot language."

Here is how you might want to talk, technical-style, to the robot, to tell it how to pick up an item on the floor:

repeatedly rotate 30 degrees until you see an object on the floor.
repeatedly move forwards 6 inches until the object is within reach.
lower your arm then grasp the object then lift your arm.
return to Home Location.

These instructions aren't C code and they aren't dot-and-paren-laden method calls to library packages for robot controllers. They are a custom language for the robot domain, a robot-domain-specific language.

With the techniques in this chapter, you can design and implement the robot language. There are two main approaches:

  1. Top down: You design the syntax and semantics of robot language. The language has control structures like repeatedly and then, and primitive operators like move, grasp, and lift. You implement a parser and interpreter for the language. The interpreter uses the robot as its target machine. Perhaps you use PLY to generate the parser and Python to code the interpreter. Python's os package is used when you must execute segments of C code that operate the hardware (or there is a middleware platform, a virtual machine, that fits on the robot and the interpreter talks to it). The language implementation is a separate, stand-alone system just for talking to robots and not for doing anything else.

  2. Bottom up: You download the library packages for the robot controller. They are written in Java, and there are lots of methods, dots, and parens. You might use Java or Scala to augment the packages. In particular, since Scala can import Java packages, you can use Scala's parser-pattern language and its macro-processor package to add macros for repeatedly, then, move, etc., that expand into Scala code. You use Scala's parser and compiler to read, translate, and execute the robot language. The language implementation is Scala enriched with "robot language", and you can mix Scala code with robot code in a program that does both robot and non-robot things.
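
The bottom-up idea can be sketched in a few lines. The sketch below uses Python rather than Scala, and a simulated robot rather than a real controller library; the names SimRobot, repeatedly, and then are hypothetical, invented only for this illustration. It shows how control structures and primitive operators of "robot language" might be layered on top of a host language as ordinary functions:

```python
# A simulated robot stands in for the real controller library.
# All names here (SimRobot, repeatedly, then) are hypothetical.

class SimRobot:
    """Simulated robot: one object sits at a fixed heading and distance."""

    def __init__(self, object_heading=90, object_distance=18, reach=6):
        self.heading = 0                    # degrees, 0..359
        self.distance = object_distance     # inches from robot to object
        self.object_heading = object_heading
        self.reach = reach
        self.log = []                       # record of primitive operations

    # Primitive operators (the "verbs" of robot language).
    def rotate(self, degrees):
        self.heading = (self.heading + degrees) % 360
        self.log.append(f"rotate {degrees}")

    def move(self, inches):
        self.distance = max(0, self.distance - inches)
        self.log.append(f"move {inches}")

    def arm(self, action):
        self.log.append(f"arm {action}")

    # Sensor predicates.
    def sees_object(self):
        return self.heading == self.object_heading

    def within_reach(self):
        return self.distance <= self.reach


def repeatedly(action, until, limit=100):
    """Control structure: perform `action` until `until()` holds."""
    steps = 0
    while not until():
        if steps == limit:
            raise RuntimeError("gave up: condition never became true")
        action()
        steps += 1


def then(*actions):
    """Control structure: perform the actions in sequence."""
    for action in actions:
        action()


# The first three robot instructions, written in the embedded vocabulary.
robot = SimRobot()
repeatedly(lambda: robot.rotate(30), until=robot.sees_object)
repeatedly(lambda: robot.move(6), until=robot.within_reach)
then(lambda: robot.arm("lower"),
     lambda: robot.arm("grasp"),
     lambda: robot.arm("lift"))
print(robot.log)
```

The last few lines are still host-language code (note the lambdas and parentheses), which is typical of a bottom-up language in its early stages: the domain words are present, but the host language shows through.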

The starting point for designing such a new language, a domain-specific language, is to understand completely and thoroughly the problem domain.


9.2 Domain-specific software architecture

Every large system is built from software and hardware components; the pattern of layout and connection of the components is called its architecture. A software architecture is the layout of software components. The software architecture is deployed (installed) on the hardware architecture.

Specific problem areas, e.g., flight-control or telecommunications or banking, use specific hardware architectures, and they also use specific software architectures. When a new model of airplane is designed, the hardware architecture (the airplane hardware, including its computers) is based on a hardware design that has succeeded in the past. (It is too great a risk to start from scratch; it is also better to build on and refine what is known to work.) The software architecture for the plane will also be based on some standard layout that is known to work well.

Software architects use a collection of concepts, techniques, and patterns to build a new system in an established problem area; this collection is called a domain-specific software architecture:

  1. domain-specific language: the language of the problem area: (i) concepts and terminology (words, phrases, and actions that the clients, designers, and builders use to discuss the problem and design the solution); (ii) customer requirements (what the system must do) and reference requirements (what the system can do --- see the next numbered item below); (iii) scenarios (examples of behaviors); (iv) configuration models (blueprinting techniques for the system and its operation --- entity-relationship ("class" or "dependency") diagrams, data flow ("sequence") diagrams, deployment diagrams, etc.)

  2. reference requirements: the ``features'' or ``customizations'' or ``attributes'' or ``ordering options'' that clients (customers/users) select to configure the desired system. (Think about all the choices you make when you order a brand new car from an auto dealer --- colors, engine options, accessories --- these are the reference requirements for the car you want.)

    The reference requirements are part of the terminology of the application domain, but they are often specially identified because they are often treated specially in the implementation methodology.

  3. reference architecture: the software and hardware architectures ("platforms") used as starting points for building the implementation, typically mapped out by blueprints. (These can be based on earlier versions/releases.)

  4. supporting environment/infrastructure: the available hardware platforms and software languages, libraries, frameworks, and development tools.

  5. a methodology for designing, implementing, and evaluating the system using the above items.

We can't study all these topics, so we focus on the first one: The language of a domain is called a domain-specific language (DSL).


9.3 Domain-specific language (DSL) and domain-specific programming language (DSPL)

English is a general-purpose language. Legal English (used by lawyers) is a special-purpose language, dedicated to writing contracts and laws --- it is specific to the domain of contracts and laws. Algebra is a domain-specific language for stating numerical relationships.

A language for discussing problems, behaviors, and solutions within a problem domain is a domain-specific language (DSL). The language's vocabulary includes concepts and notation from the domain --- the nouns, pronouns, adjectives, verbs, and adverbs of the language. The language lets participants (people and machines) discuss and implement solutions in the domain. Because its vocabulary is limited to the specific domain, a DSL is often useless for discussing and solving problems outside the domain.

A DSL uses concepts familiar to people who work in the domain. Here are two examples:

Spreadsheets:

To an accountant who must prepare spreadsheets each day, the "spreadsheet domain" is a little world of its own. Its domain consists of grids, cells, numbers, words. These are the "entities" or "nouns" of spreadsheet-language.

The entities can have features/attributes (``adjectives''): e.g., a word can be a label or data (in a cell). A grid has dimensions (rows and columns). A number can be data or a total value.

There are certainly operations on the entities --- inserting data into a cell, totalling the values in a row or column, printing a table.

A sequence or "script" of operations, perhaps triggered by an event, is called an action. Example action: "(event part:) When a number is inserted into Row 9, then (operation part:) update the total for Row 9 using Equation 9 and redisplay the updated grid."

The accountant thinks in the language of the spreadsheet domain when building a spreadsheet, whether or not a computer program is helping to assemble and display the spreadsheet. But if the computer is doing the spreadsheet layout, the DSL for spreadsheets becomes a programming language, e.g., Excel. (See below.)

A sample "use case" is a scenario. For example, the accountant might explain this scenario to the software architect:

"I have constructed a spreadsheet that models an order form, and I insert a new item into my order: in the first empty row of my order form, I type the product number, the quantity, and the cost per item. My typing (it's an "event") triggers these operations by the spreadsheet: (i) it computes, in the rightmost column of the row, the total cost of the items; (ii) it adds the items' total cost to the other costs in the rightmost column and updates the cell in the last row, last column, with the new total cost of my order."

Scenarios like this help the software architect implement the spreadsheet application for the accountant to use.
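
As a rough illustration of how such a scenario turns into code, here is a minimal Python sketch of the order-form scenario. The class OrderForm and its method insert_item are hypothetical names; the point is only that the event (typing a new item) triggers the two operations (i) and (ii) named by the accountant:

```python
class OrderForm:
    """Order-form grid: each row holds
    (product number, quantity, cost per item, row total)."""

    def __init__(self):
        self.rows = []           # the data rows typed so far
        self.grand_total = 0.0   # the cell in the last row, last column

    def insert_item(self, product, quantity, unit_cost):
        """Event: the user types a new item into the first empty row."""
        # Operation (i): compute the total cost in the rightmost column.
        row_total = quantity * unit_cost
        self.rows.append((product, quantity, unit_cost, row_total))
        # Operation (ii): re-add the rightmost column and update the
        # order's total.
        self.grand_total = sum(row[3] for row in self.rows)
        return row_total


form = OrderForm()
form.insert_item("A-100", 2, 4.50)
form.insert_item("B-200", 1, 10.00)
print(form.grand_total)
```

Notice that the accountant's words (row, item, total) survive as names in the code, while everything else is the host language's machinery.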

Alarm systems:

Say you install alarm systems in office buildings. You think and talk and work in the "alarm-system domain" with the building's owners and employees. A DSL for sensor-alarm networks would discuss

domains (``nouns''): sites (building, floor, hallway, room), devices (alarm, movement detector, camera, badge), people (employee, guard, police, intruder). There are also features/attributes (``adjectives'') and operations (``verbs'').

Here is a scenario:

``when a movement detector detects an intruder in a room (this is an event), it sends an alarm to the guard's remote monitor, it switches on the lights in the room, and it activates the room's camera.''

The scenarios help you design and install an alarm system with the desired devices and behaviors. If a computer is involved in the domain, then some of the DSL about alarm systems will be a computer language that you use to tell the computer what it must do.
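
To suggest what the "computer language" part of the alarm DSL might look like, here is a small Python sketch of the scenario above. The names AlarmNetwork, when, and signal are hypothetical, and the devices are simulated by log entries; the sketch only shows the event-triggers-operations shape of the scenario:

```python
from collections import defaultdict


class AlarmNetwork:
    """Maps each kind of event to the operations it triggers."""

    def __init__(self):
        self.handlers = defaultdict(list)   # event name -> operations
        self.log = []                       # stands in for real devices

    def when(self, event, *operations):
        """Declare: when `event` happens, perform these operations."""
        self.handlers[event].extend(operations)

    def signal(self, event, room):
        """A device reports that `event` happened in `room`."""
        for operation in self.handlers[event]:
            operation(self, room)


# Operations (the "verbs" of the scenario).
def alert_guard(network, room):
    network.log.append(f"alarm sent to guard's monitor: {room}")

def lights_on(network, room):
    network.log.append(f"lights switched on: {room}")

def start_camera(network, room):
    network.log.append(f"camera activated: {room}")


network = AlarmNetwork()
network.when("movement detected", alert_guard, lights_on, start_camera)
network.signal("movement detected", "room 12")
print(network.log)
```

The single line that calls when reads almost like the scenario itself, which is the property a DSPL for this domain would aim for.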

Compare the lingo of alarms to the lingo you write in Java --- in the latter, the ``nouns'' are numbers, arrays, objects, and variables that name numbers, arrays, objects, etc. The ``adjectives'' are data types and other declaration modifiers. The ``operations'' are arithmetic, data-structure indexing, method call, etc. ``Actions'' are commands, or groups of commands. ``Events'' can be GUI events or a call to a method to start execution. Java is a ``DSL'' for computation on numbers and arrays and objects.

A DSL lets stakeholders (the participants in a systems project) communicate their ideas (needs, suggestions, solutions, implementations, orders). The DSL is a modelling language that lets us discuss models, structures, and behaviors specialized to a problem domain like telecommunications, banking, transportation, gaming, algebra, typesetting, etc.

If the computer is a ``participant,'' that is, if we can use some subset of the DSL to tell the computer what to do, then we can program the computer in a domain-specific programming language (DSPL).


9.3.1 From DSL to DSPL

The previous point is critical. Consider the spreadsheet scenario again: If we build a spreadsheet application from scratch, say in Java, then we use scenarios to make "use-case realizations" --- an explanation of the Java components that are required to make the scenario come to life on a computer. We study a lot of scenarios; we write lots of use-case realizations; we design a lot of object diagrams, class diagrams, sequence diagrams, etc.; we code a lot of Java components; and we have a Java-coded system --- it's a lot of work, maybe weeks or months of work.

Now, say we learn Excel (or a similar spreadsheet application). Excel understands the DSL of spreadsheets, and we "talk" (program) directly in Excel, implementing the scenarios and the spreadsheet application in a matter of hours or days. This is because Excel is a DSPL for spreadsheets --- we say what we want in the language of Excel and we have the application we want in minimal time and effort. This is the big payoff of having a DSPL.

IMPORTANT: Excel is actually a pair: a GUI that translates user inputs into the spreadsheet's DSPL, and a DSPL parser-interpreter. DSPLs designed for non-software engineers are usually accompanied by graphical front ends like Excel's.

Indeed, many sophisticated IDEs are actually GUI-DSPL pairs. And, you might argue that the input language (of mouse and keyboard events) of any application defines a crude DSPL. But the best applications have an input language that matches the DSL of the problem domain for which the application is intended!

Other problem domains and their DSLs

Domain-specific languages are often used for describing reactive (event-driven) systems --- alarm systems, telecommunications systems, vending machines, multi-player games, single-user applications that use a GUI, and communications protocols --- hence the classification of the DSL into nouns, features, events, and operations.

But not all computational mechanisms are reactive. For example, the equational language of algebra is a DSL, and the computation underlying its equation sets consists of simplification laws. Yet another variation is the DSL of grammar equations, which we use to define programming languages. Another example is the box-and-arrow graphical language of UML class diagrams. There are many others --- see the section on "little languages."

In these cases, the DSL might be less ``event/action oriented'', but in any case, it will be the language that the stakeholders use to discuss their problems and solutions.

General-purpose languages

Why are languages like C, Java, and Prolog called ``general purpose'' languages? After all, each such language is specific to data domains like numbers, strings, tables, structs, objects, relations, and so on.

One might argue that a general-purpose computer language is ``domain un-specific'' because it favors no one application domain very much over another. (A cynic would say that a general-purpose language is a ``no-domain language,'' since there is no real-life application domain that matches it!)

A user of a general-purpose language must become an expert modeller of real-life application domains in the domains of the general-purpose language --- this is core computer science: how to model real-life domains and domain-specific language within general-purpose computing machines and general-purpose computer language.

On the other side of the coin, we can argue that a language like C is domain-specific to the domain of von Neumann machines, and a language like Java is domain-specific to heap-based object machines. Such machines are used to mimic/model other computational domains, and this is why we use C or Java to mimic/model other DSLs.

When the complexities in domain modelling become too great, the general-purpose programming language must be abandoned for a domain-specific one.


9.3.2 DSPLs as ``little languages''

The previous section suggests you define a Domain-Specific Programming Language (DSPL) by extracting from a DSL its computational part. This treats the computer as one stakeholder in the community of participants.

But there is another origin of DSPLs that comes totally from within the programming world: It is inconvenient to drag out a general-purpose language to code a solution to something small and simple. For example, do you code a Java program each time you do some calculator arithmetic? No --- you use a calculator language instead. It is always better to use a smaller, simpler language --- a DSPL --- that matches the problem you face.

For this reason, programmers sometimes call DSPLs little languages (e.g., ``here is a little language for drawing figures''; ``here is a little language for linking files''). Here is a short list of ``little language'' DSPLs that have/had wide use:

  1. Make --- for linking files
  2. Matlab (and Mathematica) --- for doing and graphing linear algebra
  3. SQL --- for doing queries and updates to databases
  4. VHDL and Verilog --- for laying out hardware circuits
  5. Yacc, Bison, Antlr, PLY --- for programming parsers
  6. HTML, CSS --- for generating web-browser documents

Admittedly, the current versions of many of these ``little languages'' aren't so little anymore, but almost all of them came about because someone thought,

``It would be nice to have a little language to help me do ...this little job....''

So, that person designed a little language to do the little job.

In terms of domain-specific software architecture, someone might ask you,

``It would be nice to have a little language to help ...somebody... do ...some little job in this domain.... Can you put together something for us?''

For example, ``It would be nice to have a little language to help us lay out the wiring and sensors for a building's alarm system.''

Or, ``It would be nice to have a little language to help us write the protocols for how the movement detectors send/receive messages to/from the other devices and people in the network.''

This kind of wishful thinking can lead to a domain-specific programming language, in particular, a top-down domain-specific programming language.

(IMPORTANT: "little languages" are used by software engineers, and there is less-or-no need for a "GUI for dummies" that acts as a translator into the little language.)


9.4 Top-down (``external'') DSPL

Each of the languages in the list in the previous section does one thing well in one application domain; by no means should any of these languages be used for general-purpose computing. All the examples in the list are called top-down (or ``external'') DSPLs because they are designed as stand-alone languages that implement domain concepts and nothing more.

Let's review some of the languages mentioned earlier.

There is one critical standard for the success of a top-down DSPL:

Programmers must ``see'' their domain and the actions within it in the DSPL.

That is, the top-down DSPL lets the programmer think and talk directly to the computer in the problem domain, in the DSL; there are no distractions. For example, a typical Excel user sees (figuratively and literally) a spreadsheet and does not write C++ code to compute on the spreadsheet!

Upon first hearing, it sounds like top-down DSPLs are wonderful --- a language for just my problem that lets me say exactly what I want! --- but in reality, a top-down DSPL is a ``mixed bag'' of assets and drawbacks:

+ non-programmers often can discuss and use the DSPL
+ the DSPL encourages standard patterns of design, implementation, and optimization
+ there is fast development of programs that fall squarely in the problem domain
- staff must be trained to use the DSPL, which is a brand-new language to them
- the interaction of DSPL-generated software with other software components can be difficult
- there is a high cost in designing, developing, and maintaining a DSPL

For these reasons, top-down DSPLs are not always the best tool for solving domain-specific problems.


9.4.1 Developing a top-down DSPL

The starting point is this: become an expert in the problem domain. (This might take years!)

Learn the domain's vocabulary --- its nouns, verbs, and adjectives. As best as you can, define the domain's DSL. Develop many scenarios (case studies) and build lots of systems. Extract from the scenarios and code the patterns of data, features, control, and assembly linkages. Associate these with the DSL (each programming construction should be some concept from the DSL!) and extract the appropriate DSPL to implement.

There is one last question:

Who are the intended users of the DSPL?

This is critical: the concepts expressed within the DSL must be natural and comprehensible to the language's users.

The language must be friendly towards these persons' views of the domain. If the top-down DSPL is for non-expert programmers (as with Excel or HTML), then you must de-emphasize and even dispense with procedural/imperative programming notions like assignments, loops, and data structures. You must use classic definition-style concepts, like equations, arithmetic-style operations, and Prolog-style predicates --- these are common in other areas of science and technology.

Most non-experts have difficulty with control structures of any form --- sequencing is about the most they can handle. Repetition is often an impossible challenge to these people.

Data structures must be kept simple, resembling real-life, physical structures (a sheet of graph paper, a chest of drawers, a filing cabinet, a dictionary) or resembling the structures that are fundamental to the problem domain (hallways, buildings, wiring bundles...).

Keep this directive in mind, always:

The programmers must ``see'' their domain and the actions within it in the DSPL!

If the DSPL's users are forced to code in notation and concepts that lie outside their problem domain, the users will get lost. (That's why non-programmers don't use Java as a DSPL for spreadsheet building!) In a serious development effort, you will design an IDE-like tool as well.

Implementation

A top-down DSPL is usually built as a stand-alone parser-interpreter, which reads programs in the little language, parses and executes them. The language in which the interpreter is built is hidden from the DSPL user --- users believe they are "talking" directly to the computer, which "understands" the DSPL "directly".


9.4.2 Example top-down language: Gate-layout language

Here is a simplified version of a classic example. People who design chips from gates once drew huge wiring diagrams, which were photo-reduced to tiny templates from which circuit boards were fabricated. This was tedious, error-prone, and expensive. A DSPL would be a better solution.

The DSL talks about ``gates'', which have features like input connections (ports/wires) and output connections. Gates have a function (feature): AND, OR, NOT, etc. Gates are assembled into subassemblies (e.g., one can build an XOR-subassembly), and subassemblies must be connected and grouped on a board. Gates and subassemblies might be annotated with power and space requirements.

The DSL is not event-or-action driven, like an alarm system or GUI, so scenarios are more like puzzles:

There is no need for a general-purpose language, with assignments and loops, for expressing these scenarios --- an equation-like layout language with a few key operations and a few assembly constructions will suffice. (Or, a predicate-based, definition language might be the way to go. Or even an annotated boxes-and-arrows language!)

We might design a language whose scripts handle scenarios like the ones above, e.g., the first scenario is programmed like this:

ASSEMBLY A1 : inputs IN1, IN2;  outputs OUT1.
   w1 = AND(IN1, IN2).
   w2 = OR(w1, IN2).
   OUT1 = NOT(w2).
ENDASSEMBLY

The connections are coded as equations, where the left-hand side of each equation is the name of a wire. The second scenario might be programmed like this:

ASSEMBLY A2 : inputs IN1, IN2;  outputs OUT1, OUT2.
  solve for           1     1              1     0;
                      1     0              1     1;
                      0     1              1     1;
                      0     0              0     0
  using {NAND}.
ENDASSEMBLY

The new operation, solve for _ using {_}, accepts the tabular input and generates a solution that is named A2 that has the required functionality and behavior.

The third might go like this:

ASSEMBLY A3 : inputs IN1;  outputs OUT1; 
              suchthat w2.voltage < (2 mv).
   w1 = AND(IN1, w2).
   w2 = NOT(w1).
   OUT1 = w2.
ENDASSEMBLY

Here, the constraint on a wire's voltage is listed as part of the assembly's output specification. The user does not code how to solve the constraint --- the implementation knows the details.

At this point, think about what operations you might add to this little language that connect assemblies like A1 to A2 or A3 --- you will want a little linking language, and it should probably look like equations that "equate" inputs to outputs. Try it.

Of course, there are many hardware languages that can do the above and then some.

Next, consider how a parser and interpreter would be defined to read programs in this DSPL and execute them (in this case, generate circuit-diagram layouts for a board). Think about how a GUI might help a programmer "draw out" the DSPL programs, rather than typing them as script.
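
As a starting point for that exercise, here is a minimal Python sketch of a parser and interpreter for the equation-style assemblies only. It rests on several assumptions that are not part of the language design above: the equations are listed in dependency order with no feedback loops, only AND, OR, and NOT gates appear, and "executing" an assembly means simulating its logic rather than generating a board layout. The program text inside the sketch is its own small example, and the function names parse and run are hypothetical.

```python
import re

# Gate functions known to this sketch (an assumed, minimal set).
GATES = {
    "AND": lambda a, b: a and b,
    "OR":  lambda a, b: a or b,
    "NOT": lambda a: not a,
}

HEADER = re.compile(
    r"ASSEMBLY\s+(\w+)\s*:\s*inputs\s+(.*?);\s*outputs\s+(.*?)\.$")
EQUATION = re.compile(r"(\w+)\s*=\s*(\w+)\((.*?)\)\.$")


def parse(text):
    """Parser: turn the script into a dictionary describing the assembly."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if lines[-1] != "ENDASSEMBLY":
        raise SyntaxError("missing ENDASSEMBLY")
    header = HEADER.match(lines[0])
    if header is None:
        raise SyntaxError("bad header: " + lines[0])
    name, inputs, outputs = header.groups()
    equations = []
    for line in lines[1:-1]:
        eq = EQUATION.match(line)
        if eq is None:
            raise SyntaxError("bad equation: " + line)
        wire, gate, args = eq.groups()
        equations.append((wire, gate, [a.strip() for a in args.split(",")]))
    return {
        "name": name,
        "inputs": [w.strip() for w in inputs.split(",")],
        "outputs": [w.strip() for w in outputs.split(",")],
        "equations": equations,
    }


def run(assembly, **input_values):
    """Interpreter: compute each wire in order, then report the outputs."""
    wires = {name: bool(input_values[name]) for name in assembly["inputs"]}
    for wire, gate, args in assembly["equations"]:
        wires[wire] = GATES[gate](*(wires[a] for a in args))
    return {name: wires[name] for name in assembly["outputs"]}


PROGRAM = """
ASSEMBLY A1 : inputs IN1, IN2;  outputs OUT1.
   w1 = AND(IN1, IN2).
   w2 = OR(w1, IN2).
   OUT1 = NOT(w2).
ENDASSEMBLY
"""

a1 = parse(PROGRAM)
print(run(a1, IN1=1, IN2=0))
```

A serious implementation would replace the regular expressions with a generated parser and would handle feedback, constraints, and layout; the sketch only shows the two-stage shape (parse to a data structure, then interpret the data structure).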


9.4.3 Script versus projecting editor

Computer programmers treat language as text that must be typed with a keyboard. This view is outdated.

Indeed, what is a program? Is it

All of the above are means of inputting the program into a computer. Actually, a program is a data structure holding semantic information.

Software engineers use development tools, such as text editors, IDEs, and debuggers, for inputting their programs, interacting with the IDE, choosing menu options and completing templates until the IDE announces that a program is completed. What goes on within the IDE? Does it build a sequence of text lines? (No --- see below.)

Users of top-down DSPLs are even more ``IDE-dependent'' than software engineers. For example, an Excel user will use Excel's GUI to insert data into cells of a spreadsheet and write equations that are embedded into the spreadsheet's ``logic'' so that row-and-column totals are correctly computed. Exactly what is the ``program''? Indeed, the ``program'' is completely intertwined with the Excel development environment and its internal data structures and event handlers.

In a note at http://martinfowler.com/articles/languageWorkbench.html Martin Fowler coins the term ``projecting editor'' for a DSPL that is programmed within an IDE:


The projecting editor keeps an abstract representation of the user's "program" that is filled in bit by bit, not necessarily sequentially, not necessarily as script. (The "program" is data structures, event handlers, and GUIs held within the projecting editor!) The classic abstract representation of a program is a parse tree whose nodes are annotated with semantic info, plus a global symbol table, along with event handlers and support libraries.

IMPORTANT: The projecting editor must have a back-end that can interpret the abstract representation or can generate a script ("target code") that can be interpreted. (The ``storage representation'' in the diagram is some file format that archives ("pickles") the abstract representation at the end of the IDE session.)
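
The following Python sketch illustrates these ideas on a toy scale, for a spreadsheet-like domain. Everything in it is an assumption made for illustration: the class name ProjectingEditor, the tiny abstract representation (a table of cells, each either a number or a sum of other cells), and the use of JSON as the storage representation. The methods set_number and set_sum stand for what GUI event handlers would call.

```python
import json


class ProjectingEditor:
    """Holds an abstract representation that events fill in bit by bit."""

    def __init__(self):
        # Abstract representation:
        #   cell name -> ("num", value) or ("sum", [cell names])
        self.cells = {}

    # Called by (imagined) GUI event handlers, in any order.
    def set_number(self, cell, value):
        self.cells[cell] = ("num", value)

    def set_sum(self, cell, operands):
        self.cells[cell] = ("sum", list(operands))

    # Back end, option 1: interpret the abstract representation.
    def value(self, cell):
        kind, payload = self.cells[cell]
        if kind == "num":
            return payload
        return sum(self.value(name) for name in payload)

    # Back end, option 2: generate a script ("target code").
    def to_script(self):
        lines = []
        for cell in sorted(self.cells):
            kind, payload = self.cells[cell]
            rhs = str(payload) if kind == "num" else " + ".join(payload)
            lines.append(f"{cell} = {rhs}")
        return "\n".join(lines)

    # Storage representation: archive and restore the representation.
    def save(self):
        return json.dumps(self.cells, sort_keys=True)

    @classmethod
    def load(cls, text):
        editor = cls()
        editor.cells = {name: (kind, payload)
                        for name, (kind, payload) in json.loads(text).items()}
        return editor


editor = ProjectingEditor()
editor.set_sum("C1", ["A1", "B1"])   # the formula is entered before its data
editor.set_number("A1", 2)
editor.set_number("B1", 3)
print(editor.value("C1"))
print(editor.to_script())
```

The user never types the script that to_script produces; it is just one projection of the internal data structure, which is the "program" in the sense described above.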

If you are an Emacs or Eclipse or a Visual Studio user, you are using a projecting editor/DSPL for document generation.

If you are developing a top-down DSPL for non-programmer users, then you are almost certainly forced to develop a projecting editor to go with it.


9.4.4 Summary


9.5 Bottom-up (``internal'') DSPL

Our motivation for studying DSPLs is simple: we want to treat the computer as our colleague and talk to it in the DSL. We want the computer to do what we ask it in the DSL. To succeed, we must "train"/"teach" the computer as much of the DSL as we can. If we do this "top-down style", we write a stand-alone program. This is a big project!

There is an alternative: Maybe we can "train" the computer day-by-day, teaching it the DSL in stages. We saw this approach in the previous chapter on logic programming: Starting from the library database written in Prolog, we added definitions that teach the computer what an overdue item is, what a fine is, and so on. The librarian starts Prolog, loads the definitions, and talks to the computer by using our definitions. Over time, we include enough library-DSL definitions so that the librarian talks to the computer exclusively in terms of "library-DSL".

The general principle goes like this: an experienced programmer wants to ``extend'' a general-purpose, host language with concepts specific to a problem domain. The programmer codes the DSPL concepts in the host language as procedures, classes, macros, etc., and writes programs that are a mix of host-language code and the codings of the DSPL. Over time, more and more DSPL constructions are added; the host language is "covered over" by the DSPL; and programs are written (almost) entirely in the DSPL. We have extended the host language with the DSPL.

This is called a bottom-up (or ``internal'') DSPL.
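
Here is a small Python sketch of the staged "teaching" described above, using the library example. It is not the Prolog program from the previous chapter; the loan records, the fine rate, and the function names are all invented for illustration. Each stage adds definitions that are phrased in terms of the earlier ones:

```python
from datetime import date

# Stage 0: the starting database, plain host-language data (made up).
LOANS = [
    {"patron": "ann", "item": "k12", "due": date(2010, 3, 1)},
    {"patron": "bob", "item": "m07", "due": date(2010, 3, 20)},
]
FINE_PER_DAY = 0.25   # assumed rate, for illustration only


# Stage 1: teach the computer what "overdue" means.
def days_overdue(loan, today):
    return max(0, (today - loan["due"]).days)

def overdue_items(today):
    return [loan["item"] for loan in LOANS if days_overdue(loan, today) > 0]


# Stage 2: teach the computer what a "fine" is, in terms of stage 1.
def fine(patron, today):
    return sum(FINE_PER_DAY * days_overdue(loan, today)
               for loan in LOANS if loan["patron"] == patron)


# The librarian now "talks" in library terms.
today = date(2010, 3, 11)
print(overdue_items(today))
print(fine("ann", today))
```

Once enough such definitions exist, a session consists almost entirely of calls like overdue_items and fine, and the host language recedes into the background.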

Background: GUI-building frameworks

DSPLs often arise as extensions of frameworks.

A framework is a library of software components that help someone implement solutions in a problem domain. (Think of javax.swing or a library from .NET.) Frameworks are ``not-quite bottom-up DSPLs.'' (We will explain the remark later.)

Consider these libraries for GUI-building:

Each library is married to a host language, because a GUI by itself is useless --- the GUI must be connected to components that do something.

These libraries are called frameworks, because each has its own collection of components that implement nouns (``window,'' ``frame,'' ``button,'' ``layout,'' ...) and verbs (``setTitle,'' ``getText,'' ``paint,'' ...) of the GUI domain. They usually come with sample programs that suggest patterns for assembling and calling the components. But they are implemented in their host languages, and a programmer must write (lots of) code in the host language to assemble a working GUI from the GUI framework. (Often, an IDE helps generate the host-language code --- for example, use Visual Studio to drag-and-drop a GUI and then read the hundreds of lines of C# code that it generates.)

A GUI framework is an "almost-DSPL" for GUIs, because it is a library that implements GUI concepts, but there is no ``programming language'' for GUI building, only the components and some example assemblies, where the assemblies are written in host-language code.

An application is a mixture of components from the GUI-framework, calls to the GUI-framework, and coding written from scratch in the host language.

A GUI framework is often ``married'' to its host language by a visual editor, e.g., Visual Basic, Visual C++, Visual Studio, and Eclipse. The visual editor tries to fill the gap between the framework and host language. This usually isn't enough, because there is no language for GUI building, only bits and pieces.


9.5.1 A bottom-up DSPL evolves over time

If you build GUIs for a living, you will not be satisfied with just a GUI framework and an IDE --- you will develop and use design patterns, macros, templates, custom components, and shortcut-code that reduce your time and effort. You are developing your own little language, your own bottom-up DSPL, for GUI building.

Experienced programmers naturally become bottom-up DSPL designers, because over time they assemble a library of constructions that express domain concepts that they use over and over to solve problems.

Eventually, the programs these people write consist almost totally of code from their own library. The host programming language --- when it is used at all --- acts merely as minimal ``glue code'' for assembling components and templates and patterns.

At this point, the host language plus the library of custom constructions is a bottom-up DSPL, because the library has become ``more important'' to the problem solving than the host language itself. What has happened is this:

The programmer has extended the host language ``upwards'' towards the problems to be solved.
The host language has been extended to include the DSL. This makes the host(glue)-language-plus-its-library a bottom-up DSPL.

The custom-written library is coded in the host language, and it is oriented towards encoding ``domain-concepts-as-code'' (nouns as data-structure patterns, verbs as operations/control-structure patterns, features as attributes, sentences and paragraphs as assembly/design patterns) so that scenarios discussed in the DSL are readily converted to code. Experienced programmers have good instincts for coding domain concepts as code and saving them as libraries. It is almost a matter of survival --- there is never enough time to build a new solution completely from scratch!

Many ideas from object-oriented design and design patterns apply to bottom-up DSPL development: classes, methods, templates, and design patterns implement domain concepts. You code them, save them, reuse them --- you have a language.

Languages like Scheme (via lambda abstraction and hygienic macros) and Smalltalk and Ruby (via blocks) let a programmer easily define design patterns as custom templates directly in source-code syntax. Scala's flexible syntax (infix method calls, by-name parameters) even lets library-defined patterns read as if they were new built-in constructions. These are ways of extending the host language upwards towards the application domain.

But any general-purpose language can serve as a host language. Usually the host language is whatever language in which the starting libraries and frameworks are written.

A bottom-up DSPL has its strengths and weaknesses also:

+ It integrates well with applications written in its host language
+ Its development and maintenance are managed naturally over the time period when it is used, since it is an ``organic library'' that grows and adapts to its applications.
- Its range of users is largely limited to experienced programmers, perhaps just to the people who develop and maintain the library.
- Its ease of use is connected to the ease of use of the underlying host language.

A scenario of how this might happen

Say your alarm-installation company is successful, and you are installing a lot of cameras, sensors, alarms, and servers in a lot of buildings. You've done enough installations that you have written and saved software drivers for the cameras, sensors, etc., and you have pieces of networking code for connecting devices to servers.

You also have several completely coded software implementations of networked alarm systems. But each time you design and install a new system for a new customer, you must study earlier implementations and copy-and-paste code and write new code.

At this point, you have a framework for alarm systems --- lots of useful pieces, several completed assemblies, but no "language" for talking about the "big picture", namely, no computer language for mapping a design into a complete system of multiple hardware and software devices.

You must "computerize" more of the work that you are currently doing and redoing by hand. To do this, you look at the systems you've already built and study how your designs were hand-coded into the implementations. You look for standard patterns (design patterns!) and you look for variations (parameters) in the patterns. You extract code templates that correspond to the patterns and you name them.

You also study the specifications you developed when you met with your clients and you convert as many of the specifications' nouns and verbs of the Domain Language (DSL) into computer code for those nouns and verbs --- that is, you are training the computer to understand the same DSL that you and the client understand.

You implement the design patterns, the templates, the Domain Language, with macroprocessors, with custom components that use eval/exec, and with simple parser-translator tools. (These techniques are described in the remainder of this chapter.) You extend/lift-up the host language to include as much of the Domain Language as possible.
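As a rough illustration of ``domain-concepts-as-code'' for the alarm scenario, here is a minimal Python 3 sketch. Every name in it (Alarm, Sensor, protect) is hypothetical and invented for this illustration; a real library would wrap the saved device drivers and networking code. Nouns become classes, verbs become methods, and a recurring assembly pattern becomes a named function.

```python
class Alarm:
    """Noun: an alarm that can be sounded (hypothetical stand-in for a driver)."""
    def __init__(self, name):
        self.name = name
        self.sounding = False

    def sound(self):                 # verb
        self.sounding = True


class Sensor:
    """Noun: a sensor that notifies whatever has been connected to it."""
    def __init__(self, name):
        self.name = name
        self.actions = []

    def connect(self, action):       # verb: link the sensor to a reaction
        self.actions.append(action)

    def trigger(self):               # verb: the sensor detects something
        for action in self.actions:
            action()


def protect(room, sensors, alarm):
    """Assembly pattern: every sensor in the room sounds the same alarm."""
    for sensor in sensors:
        sensor.connect(alarm.sound)
    return {"room": room, "sensors": sensors, "alarm": alarm}


# A new installation is now a few lines of "glue":
siren = Alarm("siren")
door = Sensor("front door")
lobby = protect("lobby", [door, Sensor("window")], siren)
door.trigger()
print(siren.sounding)
```

The last five lines are the point: once the nouns, verbs, and assembly pattern are saved in a library, a new customer's system is described mostly in library vocabulary.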

Later in the chapter, there are two small examples that attempt to illustrate this approach.


9.5.2 Developing a bottom-up DSPL

Some of the following was already stated but bears repeating:

Experienced programmers are the natural users of a bottom-up DSPL because they design it themselves, over time, as a library of implemented components and patterns, meant to represent how the computer understands the nouns and verbs and features of the DSL. Eventually the host programming language acts merely as ``glue'' for linking/assembling the library's templates and patterns and components. The programmer has extended the host language ``upwards'' towards the problems to be solved.

Bottom-up design might go like this:

  1. Use a framework to write lots of systems in your application domain. Add more and more components (classes, modules) to the framework.
  2. Notice which control and assembly patterns you are copying-and-pasting into the systems you build. Code these patterns into macros or templates (parameterized procedures/control structures) --- find some way to extend your host language with the patterns.
  3. Repeat the previous steps until the components and patterns you implemented match the DSL that you think in and talk in.
  4. Most importantly, force yourself to use your library (and improve it!) as much as possible, instead of writing new code from scratch. You should program by selecting code from your library and ``gluing'' (linking) it together with minimal code from the underlying host language. (Research has shown this is the most difficult step for software engineers to do!)

Your goal is to make the bottom-up-DSPL library ``stand-alone,'' so that you write programs just with your library and with almost zero new code from the host language. This means you use the host language only as a minimal glue language for your library, as an ``interface language'' to contact external components that you have not written, and in rare cases to ``escape'' from the DSPL to execute host-language code.

Implicit in the previous paragraphs are the notions of framework and product line from mainstream Software Engineering.

When you implement templates or design patterns as new constructions, you want a nice syntax for calling them. Ruby's blocks and flexible method-call syntax let you define constructions that read like extensions of Ruby's syntax. Scala offers ML-like pattern matching and a flexible syntax that can make library-defined constructions look built in. Other languages are less helpful, and you might be forced to write a parser-translator that translates your nice syntax of patterns into host-language code.

Here are some possible combinations to use:
Host language    Pattern language that links to host
Java             Scala (provides ML-like front end) or Groovy (provides Python-like front end)
C#               F# (provides ML-like front end)
Python           Python's re module (provides macroprocessor)
C                GPP or m4 (macroprocessors)

Unlike "little languages" (top-down DSPLs), bottom-up DSPLs are "big", because they start with a host language and a framework and get bigger and bigger with extensions until the DSPL library is completed. So, it is difficult to give simple examples of bottom-up DSPLs. But we will try.


9.5.3 Example: Grid-GUI patterns

Here is a beginner example that shows how one might search for patterns or templates to extract and name.

Say that you use Tkinter to program lots of Python GUIs that are grids. The grids always turn out to be matrices of buttons that look and behave the same. (Spreadsheets and game boards work like this!) This means you are copying-and-pasting many patterns of definition, assembly, and control from existing applications to new ones. It would be much better to code the patterns as classes, functions, templates, and macros that are inserted into your Tkinter programs. The resulting programs would be easier to code and read and would work reliably (because your patterns are implemented correctly once and for all).

There isn't time or space here to present lots of grid-GUIs, but here's one, a game board, where some common grid-GUI coding patterns are marked by #n ****.

===================================================

#Game board for "Pente" game:     

from Tkinter import *
import PenteBoard   # the model subassembly --- holds the gameboard data

### the CONTROLLER module --- this should be placed in a separate file.

#1 ***********************************
def makeHandler(myrow, mycolumn) :
    """makeHandler constructs a handler function for a new button.
       parameters: 
          myrow - an int, the row coordinate where the new button lives
          mycolumn  - an int, the column coordinate where the new button lives
       returns: the handler function customized for a new button
    """
    def handleButtonPress() :
        """handleButtonPress is the constructed handler function.
           It makes the move for the human who pressed
           this button (which is at position myrow,mycolumn).
           The updated board is then painted.
        """
        if PenteBoard.game_on() :
            PenteBoard.makeMove(myrow, mycolumn)
        repaintGUI()

    return handleButtonPress
# END ***********************************

### the VIEW module starts here:

def repaintGUI() :
    """repaintGUI  repaints the foreground text of all the buttons on the GUI,
       it also updates the displayed count of captures, and if there is a
       winner, it prints a message as to who won.
    """
    global buttons, label1

    #2 ****************************
    for i in range(size) :
        for j in range(size) :
             buttons[i][j].configure(text = PenteBoard.contents(i,j))
             buttons[i][j].configure(bg = "white")
    #END *****************************

    label1.configure(text = "Your captures " + " = " \
                      + str(PenteBoard.getCaptures()))

#3 *********************************
window = Tk()
window.title("Pente")
size = PenteBoard.size
window.geometry(str(50 * size) + "x" + str(50 * (size)))
frame = Frame(window)
frame.grid()
#END  *********************************

label1 = Label(frame,
              text = "Captures " + " = " \
                      + str(PenteBoard.getCaptures()),
              font=("Arial", 12, "bold") )
label1.grid(row = 0, column = 0, columnspan = 5)

#4 **********************************
buttons = []    # a nested list that remembers addresses of all button objects

for i in range(size) :
    button_row = []
    for j in range(size) :
        button = Button(frame,
                        font = ("Arial", 14, "bold"), fg = "blue", bg = "white",
                        width = 2, height = 1)
        button.configure(text = PenteBoard.contents(i,j))
        button.configure(command = makeHandler(i, j))
        button.grid(row = i+2, column = j)
        button_row = button_row + [button]
    buttons.append(button_row)
#END*********************************

window.mainloop()   # activate GUI

===================================================
There's a lot of ugly code here, and an IDE will not help you avoid the tedious coding of the nested loops for initializing and resetting the button grids. The event handlers must also be manually coded. The numbered patterns seen above are simple Domain concepts:
  1. Pattern 1 is code for defining a family of similar-behaving event-handling functions for a grid of buttons.
  2. Pattern 2 is code for repainting a grid of buttons when there has been an update to the GUI's model (in this case, when the model, PenteBoard, has been altered due to a move).
  3. Pattern 3 is "boilerplate" for allocating the main window.
  4. Pattern 4 configures the appearance of the buttons.
All the "patterns" are copy-and-paste coding from earlier applications. They should be converted into Domain Language concepts, lifting the GUI framework upwards towards the DSL of "Grids".

Say we use a macroprocessor to define nice syntax names for the four patterns. The above code is simplified to the following, where the macros are prefixed by @-signs:

===================================================

from Tkinter import *
import PenteBoard   # the model subassembly

def repaintGUI() :
    """repaintGUI  repaints the foreground text of all the buttons on the GUI,
       it also updates the displayed count of captures.
    """
    global label1
    @repaintGrid from (PenteBoard)
    label1.configure(text = "Your captures " + " = " \
                      + str(PenteBoard.getCaptures()))

window, frame = @initializeFrame("Pente", PenteBoard)

label1 = Label(frame,
              text = "Captures " + " = " \
                      + str(PenteBoard.getCaptures()),
              font=("Arial", 12, "bold") )
label1.grid(row = 0, column = 0, columnspan = 5)

@configureGrid from (PenteBoard, frame)
  handler (lambda(myrow, mycolumn) =>
              if PenteBoard.game_on() :
                  PenteBoard.makeMove(myrow, mycolumn);
              repaintGUI() )  # THE HANDLER IS DEFINED AS CLOSURE CODE
  and (font = ("Arial", 14, "bold"),
       fg = "blue", bg = "white",
       width = 2, height = 1)

window.mainloop()

===================================================
We have a more readable mix of GUI-domain concepts and host-language code. Here are the macro patterns that were used above:
  1. @repaintGrid from (MODEL) consults the MODEL object for the values of all the cells that are modelled and uses this information to repaint the grid GUI. The call expands into this template:
    global buttons
    size = MODEL.getSize()
    for i in range(size) :
        for j in range(size) :
            buttons[i][j].configure(text = MODEL.contents(i,j))
            buttons[i][j].configure(bg = "white")
    

  2. @initializeFrame(TITLE, MODEL) generates "boilerplate" code for configuring a window and frame large enough to display the MODEL object:
    window = Tk()
    window.title(TITLE)
    size = MODEL.getSize()
    window.geometry(str(50 * size) + "x" + str(50 * (size)))
    frame = Frame(window)
    frame.grid()
    

  3. @configureGrid from (MODEL, FRAME) handler (HANDLER) and (ATTRIBUTES) allocates a grid of buttons for the MODEL, binds each to a unique handler function generated from HANDLER and annotates each button with the optional ATTRIBUTES:
    buttons = []
    size = MODEL.getSize()
    for i in range(size) :
        button_row = []
        for j in range(size) :
            button = Button(FRAME, ATTRIBUTES)
            button.configure(text = MODEL.contents(i,j))
            button.configure(command = HANDLER(i, j))
            button.grid(row = i+2, column = j)
            button_row = button_row + [button]
        buttons.append(button_row)
    

Over time, more and more components and patterns will be named, saved, and reused. The GUI programs will call more and more of the saved DSPL concepts and contain less and less new code. A bottom-up DSPL evolves. We will learn in a future section how to use a macroprocessor to declare and call the above macro patterns.
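The grid patterns can also be approximated without a macroprocessor, as ordinary library functions. The following is a minimal Python 3 sketch under that assumption. The names configure_grid and repaint_grid, and the button_class parameter, are hypothetical; the model is assumed to provide getSize() and contents(i, j), as in the templates above.

```python
def configure_grid(model, frame, make_handler, button_class=None, **attributes):
    """Allocate a size-by-size grid of buttons for the model.

    make_handler(i, j) must return the zero-argument handler for cell (i, j).
    button_class defaults to tkinter's Button; passing another class with the
    same interface is an assumption made here so the pattern can be tried
    without opening a window.
    """
    if button_class is None:
        import tkinter
        button_class = tkinter.Button
    size = model.getSize()
    buttons = []
    for i in range(size):
        button_row = []
        for j in range(size):
            button = button_class(frame, text=model.contents(i, j),
                                  command=make_handler(i, j), **attributes)
            button.grid(row=i + 2, column=j)   # rows 0-1 are left for labels
            button_row.append(button)
        buttons.append(button_row)
    return buttons


def repaint_grid(model, buttons):
    """Repaint every button from the current contents of the model."""
    for i, button_row in enumerate(buttons):
        for j, button in enumerate(button_row):
            button.configure(text=model.contents(i, j), bg="white")
```

A game program would call configure_grid once, keep the returned nested list, and call repaint_grid from its handlers. Unlike the macro version, the button list is returned explicitly rather than stored in a global variable.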


9.5.4 Example: Observer design pattern

Here is an example where a critical Domain concept, the "observer pattern", is made understandable to the computer by means of a macro definition that expands into a mix of classes, interfaces, and commands.

Programmers who use object languages use design patterns that help them assemble systems faster. Say that you develop a lot of blackboard-style systems where the central database (``model'' or ``entity object'') must be monitored and/or displayed by actors or GUI widgets (``observers'') and contacted each time the model is updated. The Observer Design Pattern is a well-known design pattern for this situation. If you use it a lot, then it is a concept in your DSL, and you should define it as a "template" and add it to your DSPL library.

Here is one version of the Observer Design Pattern, used in Java programming:

  1. The ConcreteObserver objects want to be told whenever the ConcreteSubject object is updated. But the ConcreteSubject never contacts ConcreteObservers directly --- this is too messy. So, the ConcreteObservers' ask to be registered (saved) in an Observable (sub)object (or wrapper object), which knows the identity of the ConcreteSubject. Each ConcreteObserver has a handle method, that is, it must implement the Observer abstract interface.
  2. All requests to setState(...) of the ConcreteSubject are in fact sent to the Observable object, which (i) forwards the request to the ConcreteSubject and (ii) then contacts all ConcreteObservers, by event broadcast, which indirectly calls the handle method.
  3. When a ConcreteObserver's handle() is called, the handler calls the ConcreteSubject.getState() to obtain the information it needs to repaint its GUI.
In this way, the ConcreteSubject is isolated from the overhead of observing it. This is the standard subassembly in GUI-based tools/toys; it is the key part of the ``Model-View-Controller'' software architecture.

Rather than recode the assembly each time, we define this syntax, a named macro pattern, @observed, that returns a handle to the Observable object that anchors the design pattern:

Observable omodel = @observed (SUBJECT) by (OBSERVER LIST);
The macro expands to code that declares the Observer-related event, allocates the Observable wrapper object, registers the OBSERVER LIST, and returns the handle that the rest of the system uses for contacting the SUBJECT:
===================================================

// declare a new event type, subjectUpdated, and bind it to its event handler:
public Observer event/delegate subjectUpdated;

// code for constructing wrapper object:
class Observable {
  ConcreteSubject model;
  Observer[] registered;

  public Observable(m, olist) {
    model = m; registered = olist;
    foreach (obs in registered) { obs.setSubject(model);
                                  subjectUpdated.register(obs.handle); }
  }

  public setState(...) { model.setState(...); signal subjectUpdated; }

  public getState() { return model.getState(); }
}

// return handle to observable object:
return new Observable(SUBJECT, OBSERVER LIST);

===================================================
The above code is simplified a bit (it isn't strict Java or C#), but the idea is clear. Now, the controllers that contact the model do so by calling omodel.setState(...), which triggers the update to the SUBJECT and an event broadcast that activates the handle method of each observer.

The macro code above is more than one method or class --- it is components, declarations, and executed code; it is a subassembly pattern, a template, an extension of what Java/C# provides; it is a coded, reusable DSPL concept.
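For comparison, here is a minimal Python 3 sketch of the same subassembly, written as a library function rather than a macro. It is an illustration only: the function name observed and the demo classes Counter and Display are hypothetical, and a direct loop over the registered observers stands in for the event broadcast.

```python
class Observable:
    """Wrapper that forwards updates to the subject, then notifies observers."""
    def __init__(self, subject, observers):
        self._subject = subject
        self._observers = list(observers)
        for obs in self._observers:
            obs.set_subject(subject)       # each observer learns whom to query

    def set_state(self, *args):
        self._subject.set_state(*args)     # (i) forward the request
        for obs in self._observers:        # (ii) "broadcast" the update
            obs.handle()

    def get_state(self):
        return self._subject.get_state()


def observed(subject, by):
    """Plays the role of  @observed (SUBJECT) by (OBSERVER LIST)."""
    return Observable(subject, by)


# Hypothetical subject and observer, for demonstration only:
class Counter:
    def __init__(self):
        self.value = 0
    def set_state(self, value):
        self.value = value
    def get_state(self):
        return self.value


class Display:
    def __init__(self):
        self.seen = []
    def set_subject(self, subject):
        self.subject = subject
    def handle(self):
        self.seen.append(self.subject.get_state())


view = Display()
omodel = observed(Counter(), by=[view])
omodel.set_state(7)
print(view.seen)
```

As in the macro expansion, controllers talk only to omodel, and the subject itself contains no observer bookkeeping.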


9.5.5 Summary


9.6 Hybrid DSPL

Most DSPLs use a mix of top-down and bottom-up concepts.

Mostly top-down

We might say that a DSPL is ``mostly top-down'' if it is designed to express DSL scenarios-in-code and has its own parser (or IDE editor) and interpreter/translator.

A mostly-top-down DSPL can appear like this: You use some framework or component library to build systems, and you develop insight about what the "dream language" (the DSL language!) truly is for writing the algorithms you regularly implement with host-language code and library calls.

So, you design the dream language: You write grammar rules for the syntax and you write semantic equations that map syntax into host language code and library calls. You build the translator. In this way, you have built a bridge from the DSL, at the top of your thinking, mostly downwards to the frameworks below.

A danger of any top-down DSPL is that it is isolated from other systems and implementations. Your mostly-top-down DSPL should let you call library components and execute code written in the implementation language. To do this, add a ``trap door'' to the DSPL so that the execution of the DSPL program can be paused and the implementation-language code can be executed instead. Many scripting languages provide such a trap door, in the guise of an eval operation, which takes as its argument a string that holds executable code --- eval runs the code. Here are three useful forms of trap door in Python:

  1. An eval-like function that executes a string as Python script: For example, exec("y = 2 + x; print y") executes the program y = 2 + x; print y, using the variables that are visible at the position where exec appears. If one does not want the executed string to affect existing variables, one can invent a namespace exclusively for the string's use, like this: exec("y = 2 + x; print y") in {'x':0}.

    Here is a program that builds a string and runs it:

    x = 2;  y = 3;  z = 5
    invar = raw_input("Type name of variable (x, y, or z) to zero out: ")
    if invar in ("x", "y", "z") :
        code = invar + " = 0"
    else :
        code = "pass"
    exec(code)
    
    The exec command can also read and execute the contents of an opened text file:
    handleToCodefile = open("MyPythonProgram.py", "r")  # open a readable file
    exec(handleToCodefile)  # execute its contents 
    

  2. Python's os package contains procedures for querying the operating system and performing OS commands, e.g.,
    import os
    
    cwd = os.getcwd()     # get current working directory
    if os.path.basename(cwd) == "MyPictures": # is the lowest-level dir "MyPictures" ?
         # then, move up one level to parent directory:
         os.chdir(os.pardir)
    print "Current path is ", os.getcwd()
    
    os.system("ls -a")   # ANY OS command can be supplied as a string arg
    

  3. We can pause execution and execute any external program we wish:
    # run an external program from within Python code:
    import subprocess
    # general format:   subprocess.call(["program-name", "param1", "param2", ...])
    subprocess.call(["C:/Python26/Python.exe", "MyPythonPgm.py"])
    
A mostly-top-down DSPL includes some form of trap door so that bottom-up defined components written in the implementation language can be executed. Some implementation tricks are shown later in this chapter.
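The examples above use Python 2 syntax. In Python 3, exec is an ordinary function and the private namespace is passed as a second argument. Here is a minimal sketch of the first form of trap door under that assumption; the helper name run_in_namespace is hypothetical.

```python
def run_in_namespace(code, variables):
    """Execute a string of Python code using a private copy of `variables`.

    Returns the namespace after execution, so the caller can inspect what
    the code defined. This isolates names only; it is not a security
    boundary, since the code can still import modules and call anything.
    """
    namespace = dict(variables)
    exec(code, namespace)
    namespace.pop("__builtins__", None)   # exec adds this entry itself
    return namespace


x = 5
result = run_in_namespace("y = 2 + x", {"x": 0})
print(result)   # the code saw x = 0, not the outer x
print(x)        # the outer x is untouched
```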

Mostly bottom-up

A DSPL is ``mostly bottom-up'' if it is developed as layers of host-language-coded components and macro-coded patterns that help model the problem domain. Perhaps the layers of components do not express directly and immediately the DSL --- there is still a "gap" between the code solutions and the solutions described in scenarios.

To close the gap, we add customized control- or linking-patterns that express the missing domain concepts, so that the concepts look like they are built into the host language. (In particular, we want to avoid writing ugly dot-notation, like packageName.objectName.methodName(arg1, arg2, ...), each time we use a custom-coded domain concept/pattern.)

A good host language will give you a technique to add custom patterns. Here is a simple example:

Say that your problem domain has lots of solutions that use the phrase, ``repeat ACTION until CONDITION holds'' so that this pattern should be added to the DSPL library. Some languages let you define higher-order functions (functions that take code/closures as parameters) in mix-fix keyword notation, like this:

def repeat(action)until(condition)holds :
    """executes the command,  action,  until expression,  condition,  is true"""
    action()         # do the action step
    if condition():  # finished ?
        return
    else:            # do it again:
        repeat(action)until(condition)holds 
This defines a function named repeat..until..holds. The function is used in a program like this:
...
repeat([x = x - 1])until([x == 0])holds
...
...
The brackets, [..], are quoting the code .., that is, constructing a closure holding the code. Functional languages, like Scheme and Haskell, support this approach, as do Ruby and Smalltalk to a lesser degree.
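Python has no mix-fix function definitions, but the same pattern can be approximated with closures and method chaining. The following Python 3 sketch is one possible encoding, not the notation shown above: the class name repeat and its until method are hypothetical, and a loop replaces the recursion because Python does not eliminate tail calls.

```python
class repeat:
    """repeat(action).until(condition): run action, then test condition."""
    def __init__(self, action):
        self.action = action

    def until(self, condition):
        while True:
            self.action()        # do the action step
            if condition():      # finished?
                return


# The code to be "quoted" is wrapped in functions (closures):
state = {"x": 3}

def decrement():
    state["x"] = state["x"] - 1

repeat(decrement).until(lambda: state["x"] == 0)
print(state["x"])
```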

For older programming languages, the traditional way to add custom control structures is with a macro processor (``preprocessor''). A macro processor is a program that reads as input a program in the host language that has the custom structures mixed into the code. The macro processor locates the occurrences of the custom structures and replaces them with the instructions in the host language that perform the intended operations.

C's preprocessor is a standard but not-too-exciting example. A segment of C code like this,

#define PI 3.14159
#define Double(x)  (x + x)
// now,  PI  and  Double  act like they are built-in C functions:
y = Double(PI * 5) ;
defines two macros, PI and Double. PI acts like a named constant, and Double looks like a function and is called like one. When the above code is input to C's preprocessor, this text is the output:
y = (3.14159 * 5 + 3.14159 * 5) ;
The macro definitions are removed, and the calls are replaced by C-text, giving a program in pure C.

When a macro is called, its arguments are text and not computed values! At a macro call, the text argument is bound to the parameter and the text is inserted for occurrences of the parameter in the macro body. The text computed by the macro's body is copied back in place of the macro call. In the example, y = Double(PI * 5) is rewritten to y = (PI * 5 + PI * 5), which is rewritten to y = (3.14159 * 5 + PI * 5), which is rewritten to y = (3.14159 * 5 + 3.14159 * 5). The example shows why the macro processor must be a separate program, run first, before the parser, interpreter or translator.

There is a preprocessor, called GPP, that can be used stand-alone to process any program that contains C-like macros. Like C's preprocessor, GPP requires that a macro call look like a function call, of the form, MACRONAME(ARG1, ... ARGn). The m4 macroprocessor lets its user write macro definitions whose calls look somewhat like the mix-fix notation seen in the previous repeat..until..holds example.
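The rewriting of Double(PI * 5) described above can be simulated in a few lines of Python 3. This is a rough sketch of textual substitution only, not a model of how C's preprocessor is actually implemented; the helper names expand_call and expand_constant, and the Square macro at the end, are hypothetical.

```python
import re

def expand_constant(text, name, body):
    """Replace each whole-word occurrence of a constant-like macro by its body."""
    return re.sub(r"\b" + re.escape(name) + r"\b", lambda m: body, text)

def expand_call(text, name, param, body):
    """Expand calls NAME(ARG) of a one-parameter macro by text substitution."""
    call = re.compile(r"\b" + re.escape(name) + r"\(([^()]*)\)")
    def rewrite(match):
        argument = match.group(1)      # the argument is TEXT, not a value
        return re.sub(r"\b" + re.escape(param) + r"\b",
                      lambda m: argument, body)
    return call.sub(rewrite, text)

step1 = expand_call("y = Double(PI * 5) ;", "Double", "x", "(x + x)")
step2 = expand_constant(step1, "PI", "3.14159")
print(step1)
print(step2)

# Because arguments are text, a carelessly written macro body misbehaves:
surprise = expand_call("z = Square(a + 1) ;", "Square", "x", "x * x")
print(surprise)
```

The last call shows why macro bodies are usually written with parentheses around each parameter: the text a + 1 is pasted in as is, so the multiplication binds to the wrong operands.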

In a future section, we will see how to use a language's regular-expression library to code a simple but useful macro processor.

Here are some references for existing macro processors:

Ruby supports a ``block'' construction (written { .. } or do .. end, and playing the role of the [..] quoting shown earlier) that makes it possible to code simple customized control structures directly in Ruby. There are some Ruby-implementation approaches at http://weblog.jamisbuck.org/2006/4/20/writing-domain-specific-languages


9.6.1 Implementation techniques

Perhaps this is obvious, but the first question to ask is: What language is understood by the hardware platform that you use in the problem domain? If the hardware language lets you code a parser and interpreter, then you can readily implement a top-down DSPL on the hardware. (Note: a hardware platform might ``understand'' several languages, if there already exist quality interpreters or compilers for the languages on the hardware.)

If the hardware language is not expressive enough, or it is limited in space and speed, you must prototype the top-down DSPL interpreter in a different language and then convert the interpreter into a compiler that translates into the hardware language. Do this as a last resort, since compiler development and maintenance are expensive tasks.

In the case of a bottom-up DSPL, you should select a host language that either (i) is directly understood by the hardware or (ii) has an efficient compiler from the host language to the hardware language. In all cases, the chosen host language must support components and libraries, so that you can extend the host language bottom up.

Although it is almost never done, it is an excellent project to implement a DSPL one way and then use the acquired knowledge to implement it the ``inverse way.'' That is, if you designed a DSPL top-down, try to extract from its interpreter the parts that become components for a bottom-up implementation. Dually, if you first built a bottom-up DSPL, next use its components as the ``logic'' within a top-down interpreter implementation. The second version of the language might be the one that you prefer!

Top-down DSPLs and trap doors

If you have designed a top-down DSPL, you should add a ``trap door'' so that code in the implementation language can be embedded in the programs you write. The simplest way to do this is to use an implementation language that has an eval/exec operation.

Here is a small example. Perhaps you have designed a game-app for a cell phone, where a child can tell birds to eat bugs. The game has a GUI front-end, but the mouse moves and clicks on the GUI generate code in this syntax:

===================================================

CL : CommandList       A : Atom
C : Command            S : String

CL ::=  C |  C . CL
C  ::=  A1 eats A2 | do S
A  ::=  bird  |  bug
S  is a quoted string

===================================================
An example program that the GUI might generate is
bird eats bug.
bird eats bug.
bug eats bird
The game has limited functionality (haha), but notice the do command, which is a trap door that lets a programmer insert Python code that directly manipulates the language's interpreter, say, like this:
bird eats bug.
do "census['cat'] = 1\ncensus['bird'] = 0\nprint 'uh oh!'"
The string holds Python code:
census['cat'] = 1
census['bird'] = 0
print 'uh oh!'
Here is the interpreter for the bird-cage language:

===================================================

"""Interpreter for mini top-down DSL for bird-cage domain of birds and bugs.
   Includes trap-door operation,  do S,   for embedding Python source code.  
   Source language syntax to be parsed:
     CL : CommandList       A : Atom
      C : Command           S : String 
           CL ::=  C |  C . CL
           C  ::=  A1 eats A2 | do S
           A  ::=  bird  |  bug
           S  is a quoted string

   Operator-tree structures resulting from the parser:
      CLIST ::=  [ C* ]
      CTREE ::=  ["eat", A1, A2 ]  |  ["do", S]
      A     ::=  "bird"  |  "bug"  
      S     ::=  a quoted string
"""
# Global variable: remembers count of entities in bird cage:
census = {"bird": 9,  "bug": 99}

def interpretCLIST(p) :
    """interprets CLIST  p"""
    for command in p :
        interpretCTREE(command)

def interpretCTREE(c) :
    """interprets CTREE  c"""
    operator = c[0]
    if operator == "eat" :
        eater = c[1] 
        lunch = c[2]
        if census[eater] > 0 and census[lunch] > 0 :
            census[lunch] = census[lunch] - 1
    elif operator == "do" :  # trap-door ``eval'' operation ---
        exec(c[1])    # executes  c[1]  as python code.  Can affect  census,
                      # add new global variables to interpreter's namespace,
                      # print trace information, etc.
    else :  
        crash("invalid command")

def crash(message) :
    print message + "! crash! core dump: ", census
    raise Exception  

def main(program) :
    """interprets the operator tree,  program"""
    interpretCLIST(program)
    print "final census =", census

===================================================
Here are some sample uses of the interpreter:
python -i top.py
>>> main([["eat", "bird", "bug"]])
final census = {'bird': 9, 'bug': 98}

>>> main([["eat", "bird", "bug"], ["do", "census['cat'] = 1\ncensus['bird'] = 0\nprint 'uh oh!'"]])
uh oh!
final census = {'bird': 0, 'bug': 97, 'cat': 1}
The do command lets a programmer escape from the limited functionality of the DSPL and use the operations of the implementation language.
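The interpreter above consumes operator trees, so a front end is needed to turn source text into those trees. Here is a minimal Python 3 sketch of such a parser for the bird-cage grammar. It is an assumption-laden illustration: the names tokenize, parse_command, and parse_program are hypothetical, and the quoted string S is decoded with ast.literal_eval, which assumes S is written like a Python string literal.

```python
import ast
import re

# One token: a double-quoted string, a word, or the period separator.
TOKEN = re.compile(r'\s*(?:("(?:[^"\\]|\\.)*")|([A-Za-z]+)|(\.))')
ATOMS = ("bird", "bug")

def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError("unexpected text at position %d" % pos)
        if m.group(1) is not None:      # decode escapes such as \n
            tokens.append(("STRING", ast.literal_eval(m.group(1))))
        elif m.group(2) is not None:
            tokens.append(("WORD", m.group(2)))
        else:
            tokens.append(("DOT", "."))
        pos = m.end()
    return tokens

def parse_command(tokens, i):
    """C ::= A1 eats A2 | do S.  Returns (CTREE, index of next token)."""
    def expect(kind, index):
        if index >= len(tokens) or tokens[index][0] != kind:
            raise SyntaxError("expected %s at token %d" % (kind, index))
        return tokens[index][1]
    first = expect("WORD", i)
    if first == "do":
        return ["do", expect("STRING", i + 1)], i + 2
    verb, lunch = expect("WORD", i + 1), expect("WORD", i + 2)
    if verb != "eats" or first not in ATOMS or lunch not in ATOMS:
        raise SyntaxError("malformed command at token %d" % i)
    return ["eat", first, lunch], i + 3

def parse_program(text):
    """CL ::= C | C . CL.  Returns a CLIST (a list of CTREEs)."""
    tokens, commands, i = tokenize(text), [], 0
    while True:
        command, i = parse_command(tokens, i)
        commands.append(command)
        if i == len(tokens):
            return commands
        if tokens[i] != ("DOT", "."):
            raise SyntaxError("expected '.' between commands")
        i += 1

tree = parse_program("bird eats bug.\nbird eats bug.\nbug eats bird")
print(tree)
```

The resulting list has the CLIST shape described in the interpreter's header comment, so it could be passed to an interpreter of that design.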

Bottom-up DSPLs and macro expansion

If you have developed a bottom-up DSPL, you should also define control-structure patterns and linking patterns for your DSPL library. It is always best to use host-language facilities to do this.

Some host languages (e.g., Scheme and C) come with their own macro processors. Others (e.g., Smalltalk and Ruby) have flexible procedure-call syntax for defining new patterns. Others (e.g., Perl, PHP, Python, Ruby) supply regular-expression libraries that have powerful pattern-matching operations that you can use to write your own macro processor.

Here is an example of using regular-expression string matching in Python. We use Python's regular-expression module, re, to define a pattern, match the pattern in a string, and replace it. The comments in the code explain how this operates:

===================================================

import re    # re  is the module of regular-expression operations

# Here is a pattern that matches strings of form,  
#    @DOUBLE alpha END
# where  alpha  is some substring that holds no occurrences of  @ :
#     "(\\s*)@DOUBLE\\b([^@]*?)\\bEND\\b"
# where 
#     \\s  means a whitespace character
#     \\b  means a word boundary
#     E*   means match  E  zero or more times, as many as the overall match allows (greedy)
#     E*?  means match  E  zero or more times, as few as the overall match allows (non-greedy)
#     [^c] means match any character that is NOT character  c
# The parens mark _groups_ that are used below.

# p  is a string-matching object compiled from the pattern string:
p = re.compile("(\\s*)@DOUBLE\\b([^@]*?)\\bEND\\b")

# try this multi-line example:
source = """
x = 0
x =  @DOUBLE x END
print x
"""
print "source text ="
print source

# search for compiled pattern  p  in  source:
m = p.search(source)
print

# if the match succeeds,  m  is an object; else  m = None
print "match result =", m
# m  holds a tuple of the substrings that matched the parenthesized groups in the pattern:
print "matched groups =", m.groups()

# m  also holds the start and end indexes of the matched string:
print "span of matched text =", m.span()
# the start and end indexes can be referenced individually, too:
print "matched text =", source[m.start() : m.end()]
print

# let's replace the matched string by something else:
matches = m.groups()
source = source[:m.start()]  \
       + matches[0] + "(2 * " + matches[1] + ")" \
       + source[m.end():]
print "updated text ="
print source

# We have completed a simple macro-expansion of  "@DOUBLE alpha END"
# into  "(2 * alpha )", preserving any leading spacing

===================================================
Here is the output from the above script:
source text =

x = 0
x =  @DOUBLE x END
print x


match result = <_sre.SRE_Match object at 0x7ff3d6e0>
matched groups = ('  ', ' x ')
span of matched text = (10, 25)
matched text =   @DOUBLE x END

updated text =

x = 0
x =  (2 *  x )
print x
The example shows that patterns can be complex. There is a tutorial on writing patterns at http://docs.python.org/howto/regex.html and there is a mostly complete listing of pattern options at http://docs.python.org/library/re.html.
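The search-then-splice steps in the example can also be written with the sub method of a compiled pattern, which accepts a function that computes each replacement from the match object. The following is a minimal sketch, assuming Python 3; the handler name expand_double is hypothetical, and the replacement is parenthesized as ((alpha) * 2) so that an argument like x + 1 is doubled as a whole.

```python
import re

# same pattern as in the example, written as a raw string
p = re.compile(r"(\s*)@DOUBLE\b([^@]*?)\bEND\b")

def expand_double(m):
    """Hypothetical handler: builds the replacement text from the groups of match  m."""
    return m.group(1) + "((" + m.group(2) + ") * 2)"

source = "x = @DOUBLE x END\ny = @DOUBLE x + 1 END\n"

# sub  replaces every non-overlapping match, scanning left to right:
expanded = p.sub(expand_double, source)
print(expanded)
```

Note that sub makes a single pass and does not rescan the text it inserts, so nested macro calls still require repeating the search until no pattern matches.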

We now use the ideas in the example to write a macroprocessor in Python that searches for macro-call patterns and replaces them with their expansions. Here are the two macro calls the processor will expand:

@REPEAT Code FOR Expr TIMES  ===>  newvar = Expr
                                   while newvar > 0 :
                                       Code
                                       newvar = newvar - 1

@DOUBLE Expr END  ===>  ((Expr) * 2) 
Each macro call on the left is coded as a pattern string, and each translation is done by a Python-coded handler function. The macro processor's main data structure is a list of (compiled-pattern, handler-function) pairs.

Here is the macro processor:

===================================================

"""Simplistic macroprocessor based on regular expressions.

   main data structure:
      macrotable : list of (COMPILED_PATTERN, HANDLER) pairs

   Example:
   macrotable = [ (re.compile("(\\s*)@REPEAT\\b(\\s*)([^@]*?)\\bFOR\\b([^@]*?)\\bTIMES\\b"),
                   translateREPEAT),
                  (re.compile("@DOUBLE\\b([^@]*?)\\bEND\\b"), translateDOUBLE) ]

      holds these two macro definitions:
         indent1 @REPEAT indent2 alpha FOR beta TIMES  
                        =>  translateREPEAT(indent1,indent2,alpha,beta)
         @DOUBLE alpha END  =>  translateDOUBLE(alpha,)

   Compiled patterns, as written above, match macro-call symbol, @,
   followed by keywords (which are required to be separate words by  \\b )
   such that included text arguments do not include any call symbols, @.  

   Note that  E*?  denotes the minimal match of  E*  such that
   the overall pattern match succeeds.  Because an argument may not contain  @,
   only an innermost macro call can match, so the macro processor expands
   nested calls inside-out and never confuses them.

   The pattern for @REPEAT  also records the amount of indentations via  (\\s*).

   Macro-processor algorithm:
   read  source
   repeat until no more macro matches:
       search  source  for each compiled pattern in  macrotable
       if successful match,
          then call accompanying  handler  function,
            which assembles appropriate translation
            insert translation in place of matched pattern in  source
   write source
"""
### This portion should be a separate module that holds the translation
### functions.  It is embedded here for simplicity.

#GENSYM function:
var_count = 0   # count of new names generated for expanded macros
def genNewVar() :
    """genNewVar is a gensym function, generating unique new names
       returns: a string of form, "_varN", where N is a unique nonneg int
    """
    global var_count
    newvar = "_var" + str(var_count)
    var_count = var_count + 1
    return newvar


def translateREPEAT(args) :
    """translateREPEAT  expands this macro call:
       indent1 @REPEAT 
               indent2 Code
               FOR Expr TIMES  =into=>  indent1 newvar = Expr
                                        indent1 while newvar > 0 :
                                                indent2 Code
                                                indent2 newvar = newvar - 1

       where indent1 = args[0]  and  indent2 = args[1]
             Code = args[2]     and  Expr = args[3]
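       (indent1  and  indent2  are the white-space before  @REPEAT  and before
        Code;  each is expected to begin with a newline, which is what places
        the generated statements on separate lines)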
       returns: ans, a string holding the macro-expanded call
    """
    indent1 = args[0]
    indent2 = args[1]
    bodycode = args[2]
    exprcode = args[3]
    # the call to REPEAT is replaced by this python code, as documented above:
    newvar = genNewVar()
    ans = indent1 + newvar + " = " + exprcode  \
          + indent1 +  "while " + newvar + " > 0:"  \
          + indent2 + bodycode  \
          + indent2 + newvar + " = " + newvar + " - 1"
    return ans


def translateDOUBLE(arg) :
    """translates   @DOUBLE Expr END  =into=>  '((Expr) * 2)',  where  Expr = arg[0] """
    ans = "((" + arg[0] + ") * 2)"
    return ans

### END OF HANDLER-FUNCTION MODULE

### MACRO PROCESSOR CONTROL ALGORITHM:

import re   # import regular-expression module
# initialize macrotable:
macrotable = [ (re.compile("(\\s*)@REPEAT\\b(\\s*)([^@]*?)\\bFOR\\b([^@]*?)\\bTIMES\\b"),
                translateREPEAT), 
               (re.compile("@DOUBLE\\b([^@]*?)\\bEND\\b"), translateDOUBLE)
             ]

# read source:
import sys
if len(sys.argv) < 2 : 
    inputfilename = raw_input("Type input file to expand: ")
else :
    inputfilename = sys.argv[1]
input = open(inputfilename, "r")
source = input.read()
input.close()  

# replace all macro calls:
still_matching = True
while still_matching :
    still_matching = False
    for (pattern, handler) in macrotable :
        match = pattern.search(source)
        if match :  # != None
            replacement = handler(match.groups())
            source = source[:match.start()] + replacement + source[match.end():]
            still_matching = True

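# write source (strip a trailing ".py", if any, before appending "out.py"):
if inputfilename.endswith(".py") :
    basename = inputfilename[:-3]
else :
    basename = inputfilename   # find(".py") would return -1 here and chop a character
outputfilename = basename + "out" + ".py"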
output = open(outputfilename, "w")
output.write(source)
output.close()

print
print "Contents of " + outputfilename + ":"
print source

===================================================
Say we have this file, test.py, whose contents are:
x = 0
@REPEAT
    x = @DOUBLE x + 1 END
    @REPEAT
        pass
    FOR 2 TIMES
FOR 3 TIMES
print x
When we use the macroprocessor to rewrite the file (python macrop.py test.py), we get this report:
Contents of testout.py:
x = 0
_var1 =  3 
while _var1 > 0:
    x = (( x + 1 ) * 2)
    _var0 =  2 
    while _var0 > 0:
        pass
    
        _var0 = _var0 - 1

    _var1 = _var1 - 1
print x
All macro calls are expanded.
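To see the inside-out order on a smaller input, the control loop can be packaged as a function that also records which call it expands at each step. This is a sketch under stated assumptions: it is written for Python 3, it keeps only the @DOUBLE entry of the macro table, and the names expand, translate_double, and trace are introduced here for illustration and are not part of the macroprocessor above.

```python
import re

def translate_double(args):
    """Same expansion as translateDOUBLE:  @DOUBLE Expr END  =>  ((Expr) * 2)"""
    return "((" + args[0] + ") * 2)"

macrotable = [(re.compile(r"@DOUBLE\b([^@]*?)\bEND\b"), translate_double)]

def expand(source, table):
    """Repeats search-and-replace until no pattern in  table  matches.
       Returns (expanded source, list of the call texts in expansion order)."""
    trace = []
    still_matching = True
    while still_matching:
        still_matching = False
        for (pattern, handler) in table:
            match = pattern.search(source)
            if match:
                trace.append(match.group(0))   # the exact text being replaced
                source = (source[:match.start()] + handler(match.groups())
                          + source[match.end():])
                still_matching = True
    return source, trace

# a nested call: the inner @DOUBLE must be expanded before the outer one
result, trace = expand("y = @DOUBLE @DOUBLE x END END", macrotable)
print(result)
for call in trace:
    print("expanded:", call)
```

The outer call cannot match on the first pass because its argument still contains an @, so the inner call is rewritten first; on the second pass the outer call's argument is plain text and it matches.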


9.7 Further reading

It is tough finding good tutorial material about DSL development. (The people who do these things are in industry, and they have little time to write scholarly papers and books. The few academics who have tried to research the topic have found themselves swallowed up by the application domains, and they either give up or they never come back.) Have fun!