The Simplest Possible Compiler

See XpForOptimizingCompilers.

The simplest optimizing compilers I've seen are in PeterNorvig's ParadigmsOfArtificialIntelligenceProgramming and ChristianQueinnec's LispInSmallPieces. They share several traits:

All of these compilers were written for instructional purposes, and all were built up from even simpler interpreters. Norvig does an especially spiffy job: he fits a compiler, an assembler, a peephole optimizer, a branch simplifier and a VM all into two chapters.

Furthermore, I suspect that any of these compilers could be built using XP and a bit of domain knowledge. Even better, you could improve them further without radical re-design--techniques are known for doing native code generation straight from an AbstractSyntaxTree.

But what about a modern production compiler?

Modern production compilers generally don't rely on ASTs. Typically, they use something like (say) N-address code, possibly in StaticSingleAssignmentForm. They use lots of funky algorithms. For an idea of how hairy it can get, compare the algorithms in StevenMuchnick?'s AdvancedCompilerDesignAndImplementation, or even TheDragonBook (older), to the books above.

I don't think you could build one of these beasts without some prior design work. I don't think you could start out with one of the simple compilers above, and transform it into one of these without breaking a lot of test suites for several programmer-weeks. I could be wrong.

Or does XP classify this kind of broad-sweeping, hard-won, architectural knowledge as a SystemMetaphor, as was suggested by RalphJohnson? -- EricKidd?

AssemblyLanguage fits here in 2 different ways:

To get really fast applications, the compiler will, of course, have to compile all the way to MachineCode. (To make it easier to test if it's working right, many compilers have a compiler option to save a ".cod" (MachineCode) file, which shows the raw hex MachineCode bytes, the AssemblyLanguage, and the high-level source.).

If I get to pick the source language, the simplest possible compiler for me to write is an assembler for AssemblyLanguage. If you get to pick the source language and don't have to optimize, the simplest possible compiler is the null compiler that performs an identity transformation. Let the source language be the target runtime format. Done.

The simplest optimizing compiler I've written was 200 lines of C code, compiling ForthLanguage to powerpc MachineCode. Optimizations were limited, but included TailCallOptimization, ConstantFolding?, inlining, and a generic PeepholeOptimization? mechanism that used something akin to sed's s/foo/bar/ to rewrite short segments of code. Oh, and it was hard RealTime in the absense of ForthMacros (although, in Forth, macros are rarely absent); the amount of work for a character in the source code was bounded. -- AdamBerger

ForthSimplicity has the complete source code for a (non-optimizing) compiler.

ColorForth also has a very small compiler, with limited optimizations. ForthLanguage lends itself to simple compilers. It is more of a macro assembler for a virtual machine than a high-level language.

I remember that the compiler (to 68k) for the FalseLanguage had only 1K size (written in AssemblyLanguage). -- GunnarZarncke

The C compiler I wrote for the BbcMicro while at Acornsoft ran in 20K - code and data. It barely deserved the title "optimizing", though. Paul Fellows also wrote a Pascal compiler for the same machine that also ran in 20K. -- PaulHudson
See also StepsTowardTheReinventionOfProgramming


View edit of June 10, 2008 or FindPage with title or text search