2010 Technical Reports

Department of Computing and Information Sciences
Kansas State University

Report 2010-1. Abstract parsing of string updates and user input by Kyung-Goo Doh, Hyunha Kim, and David A. Schmidt
Abstract: We extend our formulation of demand-driven, static-analysis-based, abstract parsing of the strings generated by PHP scripts to include strings that are generated from string-replacement operators and user input. Our approach combines LR(k)-parsing technology and data-flow analysis to analyze, in advance of execution, the documents generated dynamically by a script. String-replacement operations are computed statically by composing the finite-state automaton defined by a string replacement with the finite-state control of the LR(k)-parser, and user input is predicted and processed by characterizing the input by an LR(k)-grammar and analyzing the strings generated by the grammar. Our work is implemented in Objective Caml.

Report 2010-2. Modular, parsing-based, flow analysis of dictionary data structures in scripting languages by David A. Schmidt
Abstract: We design and implement a modular, constant-propagation-like forwards flow analysis for a Python subset containing strings and dictionaries (hash tables). The analysis infers types of dictionaries and the functions and modules that use them. Unlike records and class-based objects, dictionaries are wholly dynamic, and we employ a domain of dictionary types that delineate which fields a dictionary must have. We have deliberately omitted unification-based inference and row variables to obtain the benefits of a forwards analysis that matches a programmer's intuitions. Nonetheless, to accommodate a modular analysis, the values of parameters and free (global) variables are represented by tokens to which are attached constraints. At link- and function-call-time, the constraints are matched against the actual values of arguments and global variables. Finally, programmers are encouraged to use a BNF-like syntax to define the forms of data types employed in their scripts. The analysis uses the programmer-written BNF rules to "abstractly parse" program phrases and associate them with derivations possible from the programmer-defined grammars. A prototype of the system is under construction.
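
Illustration (not taken from the report): the idea of a dictionary type that records which fields a dictionary must have, together with a constraint that is checked at function-call time, can be sketched roughly as follows. This is a minimal sketch under our own assumptions. The names DictType, assign_field, join, lookup_is_safe, and call_satisfies are hypothetical, and a dictionary type is simplified here to a bare set of field names. The report's actual domain and algorithm may differ.

```python
from typing import FrozenSet

# Hypothetical abstract value: the set of fields a dictionary is
# guaranteed to have at a given program point.
DictType = FrozenSet[str]

EMPTY: DictType = frozenset()


def assign_field(t: DictType, key: str) -> DictType:
    """Forward transfer for `d[key] = ...`: the field is now guaranteed."""
    return t | {key}


def join(t1: DictType, t2: DictType) -> DictType:
    """Merge two control-flow branches: only fields guaranteed on BOTH
    branches are still guaranteed afterwards."""
    return t1 & t2


def lookup_is_safe(t: DictType, key: str) -> bool:
    """`d[key]` is known to be safe only if the field must be present."""
    return key in t


def call_satisfies(required: DictType, actual: DictType) -> bool:
    """Check a (hypothetical) constraint attached to a parameter against
    the type of the actual argument at call time."""
    return required <= actual


# Analyzed fragment (shown as comments):
#   d = {}
#   if cond: d["name"] = ...; d["age"] = ...
#   else:    d["name"] = ...
then_branch = assign_field(assign_field(EMPTY, "name"), "age")
else_branch = assign_field(EMPTY, "name")
after_if = join(then_branch, else_branch)
```

In this sketch, lookup_is_safe(after_if, "name") holds while lookup_is_safe(after_if, "age") does not, because "age" is assigned on only one branch. A function whose parameter carries the constraint frozenset({"name"}) would accept after_if as an argument, whereas one requiring frozenset({"name", "age"}) would be rejected by call_satisfies.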