Slicing.

In [19], we examine the notion of control dependence, which underlies many program analysis techniques such as slicing. We argue that existing definitions are difficult to apply seamlessly to modern program structures, which make substantial use of exception processing and increasingly support reactive systems designed to run indefinitely; we remedy this by developing definitions that also apply to control flow graphs without end nodes (or with more than one end node) and that conservatively extend the classic definitions. For one of the new definitions, we show the correctness of the induced slicing algorithm, wrt. a correctness criterion based on weak bisimulation. Algorithms for computing the new control dependences form the basis of a publicly available program slicer that has been implemented for full Java.
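
To make the difficulty concrete: a reactive event loop designed never to terminate yields a control flow graph with no end node, so the classic, postdominator-based definition of control dependence cannot even be applied to it. The following minimal sketch (an illustration only, not code from [19]) makes this visible:

    # A reactive loop "while True: x = input(); if x > 0: y = x",
    # represented as a successor map over control flow graph nodes.
    # The graph has no end node, so the classic definition of control
    # dependence, phrased in terms of postdomination of a unique end
    # node, is simply not applicable.
    succ = {
        "read":   ["test"],            # x = input()
        "test":   ["assign", "read"],  # if x > 0 ... else loop back
        "assign": ["read"],            # y = x
    }

    end_nodes = [n for n in succ if not succ[n]]
    print(end_nodes)   # [] -- there is no end node to postdominate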

In [6], we extend [19] so as to handle control flow graphs without end nodes even when they are irreducible. This requires a new notion of control-based dependence, called “order dependence”. A detailed correctness proof is given for the slicing induced by the modified definitions.
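
For illustration (not an example from [6]), the smallest irreducible control flow graphs are loops with two entry points; such a loop has no unique header node:

    # An irreducible control flow graph: the loop {b, c} can be entered
    # both at b and at c, so it has no unique header (and, here, also
    # no end node).
    succ = {"a": ["b", "c"], "b": ["c"], "c": ["b"]}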

[5] provides a feature missing in [6]: the foundation of an approach to slicing that allows for the elimination of loops that do not affect the values of relevant variables, and is thereby more likely to generate slices of manageable size. The corresponding correctness criterion is based on “weak simulation”, implying that the observational behavior of the original program is a prefix of that of the sliced program. A crisp correctness proof shows that for slicing to satisfy this correctness property, even wrt. control flow graphs that are irreducible or have no end nodes, it is sufficient that the given slice set be closed under (data dependence and) “weak order dependence”, one of the new dependences proposed in [6].
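
The prefix property can be illustrated by a minimal sketch (assuming a standard while-language reading; the example is not taken from [5]):

    # The loop over i never touches y, so a slice taken w.r.t. the
    # final print of y may remove it.  If the loop diverges, the
    # original program's observable behavior (nothing printed) is
    # still a prefix of the sliced program's behavior (42 printed) --
    # exactly the guarantee provided by weak simulation.
    def original(n):
        y = 42
        i = 0
        while i != n:   # irrelevant loop; diverges if n < 0
            i += 1
        print(y)

    def sliced(n):
        y = 42
        print(y)

    original(5)    # prints 42
    sliced(-1)     # prints 42, although original(-1) would print nothing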

[3] shows that for a control flow graph in which all nodes are reachable from each other, the above-mentioned notion of weak order dependence can be expressed in terms of traditional control dependence on a modified graph where one node has been converted into an end node.
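
A small runnable sketch of this reduction (a paraphrase under simplifying assumptions, not the formal development of [3]): delete the outgoing edges of the chosen node so that it becomes an end node, then run an ordinary postdominator-based control dependence computation on the modified graph:

    def postdominators(succ, end):
        # Iterative computation of postdominator sets; assumes `end`
        # is the unique end node and is reachable from every node.
        nodes = set(succ)
        pdom = {n: set(nodes) for n in nodes}
        pdom[end] = {end}
        changed = True
        while changed:
            changed = False
            for n in nodes - {end}:
                new = {n} | set.intersection(*(pdom[s] for s in succ[n]))
                if new != pdom[n]:
                    pdom[n], changed = new, True
        return pdom

    def control_dependence(succ, end):
        # Classic (Ferrante-Ottenstein-Warren style) definition: n is
        # control dependent on m if n postdominates some successor of
        # m but does not strictly postdominate m itself.
        pdom = postdominators(succ, end)
        deps = set()
        for m in succ:
            for s in succ[m]:
                for n in pdom[s]:
                    if n == m or n not in pdom[m]:
                        deps.add((n, m))
        return deps

    # A graph where all nodes reach each other; convert "c" into an
    # end node and compute traditional control dependence.
    succ = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
    succ["c"] = []
    print(control_dependence(succ, "c"))   # {('a', 'a'), ('b', 'a')}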

[31] provides a foundation for slicing in a non-deterministic setting, addressing two key correctness properties. “Weak correctness”, which allows irrelevant loops to be sliced away, requires not only that each observable action of the original program can be simulated by the sliced program but also, so as not to increase non-determinism, that each observable action of the sliced program can be simulated by the original program, unless the original program gets stuck or loops; “strong correctness” excludes the latter possibility. To ensure weak/strong correctness, we do not try to invent new suitable control dependence relations but instead simply demand that the slice set (in addition to being closed under data dependence) satisfies the properties of “weak commitment”/“strong commitment” proposed by Danicic et al. We carry out the development in the setting of “extended finite state machines” (EFSMs), and prove that for each of the properties “weak commitment” and “strong commitment” there exists a least set with that property, and that this least slice set can be computed by a low polynomial algorithm. We conduct extensive experiments with widely studied benchmark and industrial EFSMs so as to measure the relative sizes of the slices produced by our algorithms, and to compare with the results of slicing algorithms that use existing definitions of control dependence.
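
A naive sketch of the closure idea (a Danicic-style weak commitment condition over a plain successor graph; the definitions in [31] are stated for EFSMs, and its low polynomial algorithm is more refined than this fixpoint iteration):

    def first_observables(succ, n, V):
        # Elements of V first reached from n along paths avoiding V.
        if n in V:
            return {n}
        seen, work, obs = {n}, [n], set()
        while work:
            m = work.pop()
            for s in succ[m]:
                if s in V:
                    obs.add(s)
                elif s not in seen:
                    seen.add(s)
                    work.append(s)
        return obs

    def close_weakly(succ, V):
        # Grow V until no node outside V can reach two distinct
        # "first" elements of V -- a weak-commitment-style closure.
        V = set(V)
        changed = True
        while changed:
            changed = False
            for n in set(succ) - V:
                if len(first_observables(succ, n, V)) > 1:
                    V.add(n)
                    changed = True
        return V

    succ = {"p": ["a", "b"], "a": ["x"], "b": ["y"], "x": [], "y": []}
    print(close_weakly(succ, {"x", "y"}))   # adds the branch node "p"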

[12] provides a foundation for slicing in a probabilistic setting, where variables may be assigned random values from a given distribution, and undesirable combinations of values may be removed by “observe” statements; in the presence of these features, the standard notions of data and control dependence no longer suffice for semantically correct slicing, as demonstrated by Hur et al. in recent work. We present a theory for slicing probabilistic programs, represented as control flow graphs (pCFGs) whose nodes transform probability distributions. We separate the specification of slicing from its implementation: first we develop syntactic conditions that a slice must satisfy; next we prove that any such slice is semantically correct; finally we give an algorithm to compute the least slice. The theory is a non-trivial extension of the recent framework by Danicic et al., which unified previous works on slicing and provided solid semantic foundations for the slicing of a large class of (deterministic) programs. Our correctness result states that the original program and the sliced program have the same final probability distribution, modulo a constant factor, so as to allow the removal of “observe” statements that do not introduce any bias in the final distribution. This will be the case if the variables tested by “observe” statements are probabilistically independent of the variables relevant for the final value. To ensure this, a key feature of our syntactic conditions is that they involve two disjoint slices, such that the variables of one are probabilistically independent of the variables of the other.
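
The role of probabilistic independence can be seen in a small sampling sketch (an illustration only; [12] works with pCFGs and distribution transformers, not with a sampler):

    import random

    # "observe(y == 1)" is modeled by rejection: runs violating the
    # observation are discarded and restarted.
    def model(keep_observe=True):
        while True:
            x = random.randint(0, 1)   # relevant for the final value
            y = random.randint(0, 1)   # independent of x
            if not keep_observe or y == 1:
                return x

    # With or without the observe, x is uniform on {0, 1}: slicing the
    # observe away preserves the normalized final distribution.  Had
    # the program observed x == 1 instead, removal would change the
    # distribution of x from "always 1" to uniform -- a biased, and
    # hence disallowed, slice.
    samples = [model() for _ in range(10000)]
    print(sum(samples) / len(samples))   # approximately 0.5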

[2] is an extended version of [12]. A key additional contribution is that we apply our results to the slicing of structured imperative probabilistic programs; this involves establishing the adequacy of the semantics of pCFGs with respect to the “classical” semantics (based on expectation functions) of structured programs, and allows us to show that slicing based on our syntactic conditions (for the corresponding pCFG) preserves the normalized semantics of a structured program. Another additional contribution is that we allow certain loops to be sliced away if they are known (through some analysis, or an oracle) to terminate with probability 1.
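
For the latter point, a minimal sketch (an illustration, not an example from [2]):

    import random

    # The loop performs a geometric number of iterations: it terminates
    # with probability 1, though not within any fixed bound.  As it
    # never touches y, it may be sliced away -- provided almost-sure
    # termination has been established by some analysis or an oracle.
    def original():
        y = 42
        while random.random() < 0.5:   # almost surely terminates
            pass                       # irrelevant to y
        return y

    def sliced():
        return 42

    print(original(), sliced())   # both yield 42 with probability 1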