Challenges of model-checking program source code


Modern computing applications increasingly require concurrent/distributed software systems that are extremely reliable. Unfortunately, current software validation techniques, such as inspections and testing, are failing to provide high levels of assurance of correctness for these systems due to system size and complexity as well as the fundamental difficulties of reasoning about state/event sequences in concurrent behavior.

Model-checking techniques (now widely used for hardware verification) hold promise for establishing crucial behavioral properties of complex software because they can automatically check whether an abstract finite-state transition system model of the software conforms to a given state/event sequence property. If the model fails to satisfy the property, the model-checker produces a counter-example: a path through the model's transitions that violates the property. This path can be used to locate and correct the corresponding software defect.
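
To make the idea of a counter-example concrete, the following is a minimal sketch of an explicit-state safety check, not the algorithm used by Bandera or by any particular model-checker. It assumes the model is already given as a finite map from states to successor states; the class name TinyChecker, the method names, and the toy two-thread model are hypothetical and chosen only for illustration. A breadth-first search visits every reachable state and, if it finds one that violates the property, returns the shortest path leading to it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical illustration only; real model-checkers use far more
// sophisticated state representations and search strategies.
class TinyChecker {

    /**
     * Searches a finite transition system breadth-first for a state that
     * violates the safety property isSafe. Returns the shortest path from
     * the initial state to a violating state, or an empty list if every
     * reachable state is safe.
     */
    static List<String> findCounterexample(Map<String, List<String>> transitions,
                                           String initial,
                                           Predicate<String> isSafe) {
        // Maps each visited state to the state it was first reached from.
        Map<String, String> parent = new HashMap<>();
        Deque<String> frontier = new ArrayDeque<>();
        parent.put(initial, null);
        frontier.add(initial);

        while (!frontier.isEmpty()) {
            String state = frontier.remove();
            if (!isSafe.test(state)) {
                // Rebuild the path by following predecessor links backwards.
                List<String> path = new ArrayList<>();
                for (String s = state; s != null; s = parent.get(s)) {
                    path.add(s);
                }
                Collections.reverse(path);
                return path;
            }
            for (String next : transitions.getOrDefault(state, List.of())) {
                if (!parent.containsKey(next)) {
                    parent.put(next, state);
                    frontier.add(next);
                }
            }
        }
        return List.of();
    }

    /** Toy model (assumed): two threads with no lock guarding a critical section. */
    static Map<String, List<String>> toyModel() {
        Map<String, List<String>> t = new HashMap<>();
        t.put("idle/idle", List.of("crit/idle", "idle/crit"));
        t.put("crit/idle", List.of("idle/idle", "crit/crit"));
        t.put("idle/crit", List.of("idle/idle", "crit/crit"));
        t.put("crit/crit", List.of("idle/crit", "crit/idle"));
        return t;
    }

    public static void main(String[] args) {
        // Property: the two threads are never in the critical section together.
        List<String> trace = findCounterexample(
                toyModel(), "idle/idle", s -> !s.equals("crit/crit"));
        System.out.println(trace);
    }
}
```

In this toy model the property fails, and the returned path is the kind of trace a developer would then have to relate back to the program source.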

Although model checking holds great promise, we believe that four problems currently prevent the technology from being applied successfully to software.

The state explosion problem: the exponential increase in the size of a finite-state model as the number of system components grows. A variety of methods exist for curbing the state explosion when analyzing certain types of systems, and these methods have proven sufficient to make analysis of many hardware designs tractable. Unfortunately, software systems tend to have more complex state than hardware components and thus must be more aggressively abstracted to produce tractable models.
The model construction problem: bridging the semantic gap between the artifacts produced by software developers and those accepted by current verification tools. Most development is done with general-purpose programming languages (e.g., C, C++, Java, Ada), but most verification tools accept specification languages designed for the simplicity of their semantics (e.g., process algebras, state machines). In order to use a verification tool on a real program, a developer must extract an abstract mathematical model of the program's salient behavior and specify this model in the input language of the verification tool. This process is both error-prone and time-consuming.
The requirement specification problem: the difficulty of expressing software requirements in the temporal specification languages of existing model-checking tools. Although model-checker property specification languages are built on theoretically elegant temporal logics, practitioners and even researchers find it difficult to use them to accurately express complex event-sequencing properties. Once written, the specifications are often hard to read and debug.
Moreover, model-checker specification languages are designed to state properties of mathematical models rather than software source code. Most software specifications include references to program features such as control points (e.g., method entry/exit), local and instance variables, array accesses, and nested object dereferences. However, current tools provide little or no support for the intricate mappings that are often required to bridge the gap between source code features and their corresponding model realizations. This means that the user is often forced to state the specifications in terms of the model's representation of program features (as coded, e.g., in Spin's Promela input language) instead of in terms of the source code itself. Thus, the user must understand these typically highly optimized representations to render the specifications accurately. This is somewhat analogous to asking a programmer to state assertions in terms of the compiler's intermediate representation. Moreover, the representations may change depending on which optimizations were used when generating the model. Even greater challenges arise when modeling the dynamism found in typical object-oriented software: components corresponding to dynamically created objects/threads are added to the state-space during execution. These components are anonymous in the sense that they are often not bound directly to variables appearing in the source program. The lack of fixed source-level component names makes it difficult to write specifications describing dynamic component properties: such properties have to be expressed in terms of the model's representation of the heap.
The output interpretation problem: the difficulty of relating the model-checker's output back to the source code. When a property fails on a large model (and software systems typically produce very large models), the counter-example traces produced by the checker can be hundreds or even thousands of steps long. Manually matching these counter-examples with source code is extremely tedious for two reasons. First, the traces are long, and walking through one may take hours. Second, the error trace is expressed in terms of the low-level, possibly highly optimized model representations. Thus, one has the reverse of the ``representation gap'' issue mentioned in the requirement specification problem: the analyst must understand the model's representation of complex program features in order to accurately project the model error trace back to the source level. Typically, one ``step'' in the source program may correspond to as many as ten steps in the low-level model representation.
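
As a rough illustration of the output interpretation problem, the sketch below projects a model-level trace back to source-level steps. It rests on a strong assumption: that each model step carries the source line it was generated from. The names TraceProjection, ModelStep, and projectToSource are hypothetical and are not part of Bandera or any model-checker. Under that assumption, consecutive model steps that stem from the same source line collapse into a single source-level step.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: assumes the model generator tagged every
// model step with the source line that produced it.
class TraceProjection {

    /** One step of a model-level counter-example trace. */
    static final class ModelStep {
        final int sourceLine;      // line in the original program (assumed to be recorded)
        final String modelAction;  // low-level action in the model's own notation

        ModelStep(int sourceLine, String modelAction) {
            this.sourceLine = sourceLine;
            this.modelAction = modelAction;
        }
    }

    /**
     * Collapses each run of consecutive model steps that share a source
     * line into one entry, yielding the sequence of source lines visited.
     */
    static List<Integer> projectToSource(List<ModelStep> trace) {
        List<Integer> lines = new ArrayList<>();
        for (ModelStep step : trace) {
            boolean sameAsPrevious = !lines.isEmpty()
                    && lines.get(lines.size() - 1) == step.sourceLine;
            if (!sameAsPrevious) {
                lines.add(step.sourceLine);
            }
        }
        return lines;
    }
}
```

Without such a recorded mapping, the analyst has to perform this projection by hand, which is exactly the tedious step described above.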

Bandera provides multiple forms of tool support to address the problems above. Nevertheless, we believe that each of these problems is significant, and it is unclear at present exactly which technologies and forms of tool support are best suited to solving them. Thus, the ultimate aim of the Bandera project is not to provide ``silver bullet'' solutions, but instead to provide several different forms of tool support along with an open infrastructure that makes it easier to experiment with new techniques.


Roby Joehanes
Wed Mar 7 18:30:51 CST 2001