BigOp: The BigSys Object Parser

Version 1.0
Ward Cunningham, Cunningham & Cunningham, Inc.
for BigCorporation, Anytown, USA

This document describes how to build, run, and maintain BigOp, the BigSys Object Parser. {Note: BigOp, BigSys and BigCorporation are pseudonyms for a client that would rather remain anonymous. The author is grateful for their permission to distribute an actual development document in this sanitized form.}

Theory of Operation

BigOp is a Java program that reads BigSys schema files, builds an in-memory version of their contents, and then writes that as specially formatted Java source to be used in BigSys-based products.

BigOp has been developed using the Java compiler-compiler, JavaCC. JavaCC preprocesses the BigOp source code, expanding scanning and parsing rules into ordinary Java statements. A scanning (or lexical analysis) rule looks like:

which says that numeric tokens are formed out of strings of one or more digits. All input characters will match some token. (If any did not, the scanner would stop.) In parsing jargon, tokens are also called terminal strings, or just terminals.
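To make the idea of a scanning rule concrete, here is a small hand-written Java sketch that recognizes a numeric token as one or more digits. It is not JavaCC output and not BigOp code; the names DigitScanner and scanNumber are hypothetical and were chosen only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: a hand-written stand-in for one scanning rule.
class DigitScanner {
    // A numeric token is one or more digits.
    private static final Pattern NUMBER = Pattern.compile("[0-9]+");

    // Returns the numeric token that starts exactly at pos, or null if none starts there.
    static String scanNumber(String input, int pos) {
        Matcher m = NUMBER.matcher(input);
        m.region(pos, input.length());
        return m.lookingAt() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Scan the sample assignment from the text, starting after the equal sign.
        System.out.println(scanNumber("X=12", 2));
    }
}
```

A real scanner would try every token rule at the current position and stop if none matched; this sketch tries only the one rule.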

A parsing rule (sometimes called a production) combines these and other tokens into a unit appropriately called a non-terminal. JavaCC adds a method to the parser for every parsing rule. The parser calls this method whenever it expects to find a particular non-terminal (and, therefore, its corresponding tokens) in the input. A parsing rule refers to other parsing rules by citing the name of the other rule's method, something like:

for an identifier. A complete parsing rule for recognizing a simple assignment could be written as:

The parser calls SimpleAssignment when it expects an assignment, say X=12, in the input. SimpleAssignment then calls Id because it expects the assignment to begin with an Id. (Id will eventually call the scanner to match a legal id, in this case X.) When Id returns, the SimpleAssignment rule (method) instructs the parser to consume two more terminals, the equal sign and the digits 12.
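The method-per-rule idea can be sketched by hand in plain Java. The class below is not what JavaCC generates for BigOp; it is a simplified recursive-descent illustration in which each rule is a method and SimpleAssignment calls Id before consuming the equal sign and the digits. The class name AssignmentSketch, the helper methods, and the letters-only definition of an identifier are all assumptions made for this example.

```java
// Hypothetical hand-written parser: one method per parsing rule.
class AssignmentSketch {
    private final String input;
    private int pos = 0;

    AssignmentSketch(String input) {
        this.input = input;
    }

    // Rule Id: consume an identifier (letters only in this sketch) and return its text.
    String id() {
        int start = pos;
        while (pos < input.length() && Character.isLetter(input.charAt(pos))) pos++;
        if (pos == start) throw new IllegalStateException("identifier expected at " + start);
        return input.substring(start, pos);
    }

    // Terminal: one or more digits.
    String number() {
        int start = pos;
        while (pos < input.length() && Character.isDigit(input.charAt(pos))) pos++;
        if (pos == start) throw new IllegalStateException("number expected at " + start);
        return input.substring(start, pos);
    }

    // Terminal: a single literal character such as the equal sign.
    void expect(char c) {
        if (pos >= input.length() || input.charAt(pos) != c) {
            throw new IllegalStateException("'" + c + "' expected at " + pos);
        }
        pos++;
    }

    // Rule SimpleAssignment: an Id, then "=", then a number.
    // Returns the two recognized pieces so a caller can inspect them.
    String[] simpleAssignment() {
        String name = id();
        expect('=');
        String value = number();
        return new String[] { name, value };
    }

    public static void main(String[] args) {
        String[] parts = new AssignmentSketch("X=12").simpleAssignment();
        System.out.println(parts[0] + " is assigned " + parts[1]);
    }
}
```

Input that does not fit the rule, such as an assignment with no identifier, makes the sketch throw at the position where the expected terminal is missing.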

A parser rule's left-hand side, everything before the colon, names the non-terminal. The right-hand side describes how to parse when that non-terminal is expected in the input. JavaCC permits some on-the-fly decision making about what is expected. These choices are expressed with additional syntax in the right-hand side.

These forms cannot be used without discretion. It's easy to write rules that are beyond JavaCC's ability. JavaCC is most effective when it can fully resolve every choice by looking a few tokens ahead in the input. Writing rules that parse a given language is a bit of an art. Rule tuning involves moving parsing responsibilities around among the rules, into or out of the scanner, or even into custom Java code invoked as the parse proceeds.
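Resolving a choice by looking ahead can be illustrated with a few lines of plain Java. The sketch below assumes a hypothetical rule with two alternatives, an identifier or a number, and decides between them by peeking at the next character without consuming it. The class and method names are invented for this example, and the code says nothing about how JavaCC itself implements lookahead.

```java
// Hypothetical illustration of resolving a choice with one symbol of lookahead.
class LookaheadSketch {
    // Reports which alternative of an assumed rule (Id or Number) applies at pos.
    // The input is only inspected, never consumed.
    static String chooseAlternative(String input, int pos) {
        if (pos >= input.length()) return "none";
        char next = input.charAt(pos);
        if (Character.isLetter(next)) return "Id";
        if (Character.isDigit(next)) return "Number";
        return "none";
    }
}
```

When two alternatives can begin with the same symbols, a peek like this is no longer enough to decide, which is the situation rule tuning tries to avoid.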

We can add fragments of Java code to the parsing methods that JavaCC produces. For example, the parsing methods can take arguments and return results. We can also add arbitrary code within braces:

The SimpleAssignment rule above began its right-hand side with empty braces. This is where Java declarations would go if we added code that needed them. (I don't know if the empty block in every rule is a JavaCC requirement or just a convention.)

We must add code fragments to the parsing rules to capture information the parser recognizes in the input. The fragments are sometimes called action routines, or simply actions, because they take action on our behalf just when the parser is passing the input of interest. You need to know two things to write actions: where the parser keeps the information, and where you want to put it. Typical actions incrementally build application-specific object structures that are parameterized with the terminals found in the input.

BigOp uses action routines to construct ParseNodes such as EnumDefn and the FieldEntries within them. These objects will be kept until the parse is complete. Then they are given the task of generating Java source code files for classes that will carry the BigSys information forward into all corporate Java applications.

References: JavaCC documentation is currently rather scattered. Readme files in the distribution explain the rule syntax and enough about the parser to write action routines. At the time of writing, it is still best to find a sample that already does something like what you need and copy it. The "Dragon Book" is a classic text used in university compiler classes. It provides the background needed to tune parsing rules effectively. {Note: JavaCC was between versions when this was written. Current readers will find excellent documentation at SunTest.}

Build and Generation Processes

BigOp is a pure Java application. It can be built from the following files. {Note: The following build and documentation files are still confidential and therefore not available in the sanitized version of this document.}

BigOp documentation consists of the following.

The following diagram illustrates the build and generation processes detailed in the Build Script.

Token's Tour

Let's follow the path of a token traveling through BigOp. Here is some sample input, a BigSys structure definition. We will follow one token (the red one) through the parser, actions, and object structures and then finally to the generated output.

The following non-terminal parsing rule StructDefn matches the input and picks up the structure name as an Id. We declare a temporary string named id and use it to hold onto the token until the action routine can place it in an appropriate structure along with a Vector of fields that define the structure.

As an aside, here is where the StructDefn rule got the terminal string from the scanner. Again we declare a temporary, this time a Token. The terminal, <ID>, returns a Token. We extract the String image from the Token and return that to the caller.

Recall that the parsing rule StructDefn (a method) constructs an object (an instance of a class of the same name, i.e. StructDefn) to save information that it has recognized. StructDefn is a kind of ParseNode. Here is its constructor.

The constructor finishes by installing the constructed ParseNode into a static Hashtable of definitions. Once the parse has completed, all definitions in the table will be asked to write Java code that will represent the same information. Here is the code emitter in StructDefn.

        // Write the Java source for the class generated from this structure.
        public void emit(PrintWriter s) {
                String i = prefix + id;
                s.println ("class " + i + " extends Struct {");
                // Each field writes its own accessors.
                for (Enumeration e = entries.elements(); e.hasMoreElements(); ) {
                        FieldEntry n = (FieldEntry) e.nextElement();
                        n.emitAccessors(s, i);
                }
                // Open getFields(); each field's emitProperties writes part of its body.
                s.println ("public static Vector getFields () {Vector retVal = new Vector();");
                for (Enumeration e = entries.elements(); e.hasMoreElements(); ) {
                        FieldEntry n = (FieldEntry) e.nextElement();
                        n.emitProperties(s, i);
                }
                s.println ("\treturn retVal;}");
                // Shorthand constructor used by the code that emitProperties writes.
                s.println ("protected " + i + " (int d, String k, String f) {super(new Integer(d),k);}}");
        }

The emit method is passed a PrintWriter that has been opened on a file whose name is constructed from the recorded id of the structure. The last s.println writes a shorthand constructor used by code emitted from n.emitProperties.
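The two steps described above, a constructor that installs each node in a static table and a later pass that asks every definition to emit its code, can be sketched as follows. This is a simplified illustration, not BigOp's source: NodeSketch and StructSketch are hypothetical stand-ins for ParseNode and StructDefn, the emitted text is reduced to a single line, and everything is written to one PrintWriter, whereas BigOp opens a file per structure.

```java
import java.io.PrintWriter;
import java.util.Enumeration;
import java.util.Hashtable;

// Hypothetical stand-in for ParseNode, reduced to the parts discussed in the text.
abstract class NodeSketch {
    // Static table of every definition constructed during the parse, keyed by id.
    static final Hashtable<String, NodeSketch> definitions =
            new Hashtable<String, NodeSketch>();

    final String id;

    NodeSketch(String id) {
        this.id = id;
        // The constructor finishes by installing the new node in the table.
        definitions.put(id, this);
    }

    abstract void emit(PrintWriter s);

    // After the parse, ask every recorded definition to write its code.
    static void emitAll(PrintWriter s) {
        for (Enumeration<NodeSketch> e = definitions.elements(); e.hasMoreElements(); ) {
            e.nextElement().emit(s);
        }
        s.flush();
    }
}

// Hypothetical stand-in for StructDefn with a one-line emitter.
class StructSketch extends NodeSketch {
    StructSketch(String id) {
        super(id);
    }

    void emit(PrintWriter s) {
        s.println("class " + id + " extends Struct {}");
    }

    public static void main(String[] args) {
        new StructSketch("Example");   // "Example" is an invented structure name
        NodeSketch.emitAll(new PrintWriter(System.out));
    }
}
```

Keeping the table static means the parsing rules only have to construct nodes; they never need a reference to the collection that will later drive code generation.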

Here is the file that the method writes with the aid of a couple of other emitters defined for FieldEntries held by the StructDefn. Note that each instance of StructDefn generates Java code for a whole class.

{Note: The above stubs and metadata follow a form already in use on the BigSys project at the time of development.}

This concludes our token's tour.
March 10, 1997; revised July 29, 1997