Structural Regular Expressions

An extremely influential paper by RobPike, then at BellLabs.

ABSTRACT The use of RegularExpressions for text search is widely known and well understood. It is then surprising that the standard techniques and tools prove to be of limited use for searching StructuredText formatted with SGML [StandardGeneralizedMarkupLanguage] or similar MarkupLanguages. Our experience with structured text search has caused us to reexamine the current practice. The generally accepted rule of "leftmost longest match" is an unfortunate choice and is at the root of the difficulties. We instead propose a rule which is semantically cleaner. This rule is generally applicable to a variety of text search applications, including SourceCode analysis, and has interesting properties in its own right. We have written a publicly available search tool implementing the theory in the article, which has proved valuable in a variety of circumstances.

Rob Pike, "Structural Regular Expressions", EUUG Spring 1987 Conference Proceedings, Helsinki, May 1987

http://doc.cat-v.org/bell_labs/structural_regexps/
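
The "leftmost longest match" rule criticized in the abstract can be seen on a small piece of marked-up text. The sketch below is only an illustration, not the paper's tool or its proposed rule: the sample string and variable names are made up, and it uses Python's re module, which is a backtracking matcher rather than a POSIX leftmost-longest one. For this particular pattern the greedy quantifier happens to give the same result as leftmost longest, and the non-greedy form is shown only as one familiar alternative.

```python
import re

# Hypothetical sample of SGML-like structured text with two elements.
text = "<b>one</b> and <b>two</b>"

# Greedy quantifier: for this pattern the result coincides with the
# leftmost, longest match, so it runs from the first <b> to the last </b>.
longest = re.search(r"<b>.*</b>", text).group(0)

# Non-greedy quantifier: each match stops at the first closing tag,
# so the two elements are found separately.
shortest = re.findall(r"<b>.*?</b>", text)

print(longest)   # <b>one</b> and <b>two</b>
print(shortest)  # ['<b>one</b>', '<b>two</b>']
```

The longest match swallows both elements and the text between them, which is rarely what a search over structured text is meant to find.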


(Moved discussion to ProcessingMarkupLanguages)


CategoryPaper CategoryRegularExpressions

(last edited March 15, 2009)