Sufficiently Smart Compiler

This is a classic argument often pulled out in a LanguagePissingMatch. The gist is that the HighLevelLanguage H may be slower than the LowLevelLanguage L, but given a SufficientlySmartCompiler this would not be the case. Moreover, this hypothetical compiler could use the high-level information available in language H to perform optimizing transformations which are simply not possible in L, thereby making an optimally-compiled H even faster than an optimally-compiled L.

In reality, there are a few such compilers, such as the Stalin compiler for SchemeLanguage (which was never intended to be usable on huge, production-level programs), but that's not the point. It's mostly an empty argument that's trotted out whenever a language is put down for not being AsFastAsCee.

Note that this kind of argument is only invalid as long as the SufficientlySmartCompiler is hypothetical. The very moment compilers appear that do exploit the high level information this argument is valid. Java is the best example - in the beginning it was slow - much slower than C - and SufficientlySmartCompiler was only used as an excuse. But then as Java became more popular, support (for whatever reason) for better compilers (or rather just in time optimizers) started to appear which did exploit information that was not available to C compilers: Real time profiles of code use (optimize code); real time use of reference use (optimize garbage collection). These make Java indeed AsFastAsCee (or faster) in a large area of applications (I can give lots of studies/examples).

Note that CeeLanguage has optimized compilers. Perhaps a SmartCeeCompiler? simply generates faster code than a SmartHighLevelLanguageCompiler? as a result of the language design. It's like a normal person on caffeine versus an Olympic athlete on caffeine. The issue isn't how much caffeine they drank.

At what point does C stop being C and become a new language? In order to get past the limitations of the old C, the new C (C99) no longer permits pointing different sized types at the same memory. This allows out-of-order processing of pointers to different types. (A further extension allows you specify that no other pointer ever points to a value). The new language permits a SmarterCeeCompiler?. Is that cheating?

In many cases, the (alleged) optimizations that would make language X AsFastAsCee are either a) unknown, b) still a subject of active research (and not yet ready for production use), c) computationally difficult (NpComplete or worse--many optimization strategies are in this category), or d) just plain undecidable.

A related issue is the SufficientlySmartVirtualMachine.

Many optimization strategies for C are NpComplete as well, and so is register allocation by graph coloring. NP completeness is not a criterion for dismissing a compiler technique. -- ScottWolchok?

I agree with that assertion. I'd add that even 'just plain undecidable' isn't sufficient criterion for dismissing out of hand a compiler technique. Most undecidable problems that are undecidable in the general case are quite decidable in specific cases. Similarly, many NpComplete problems are only non-polynomial in the general case or when seeking a global optimum; human-written code tends to have a lot of structure that can be leveraged by pattern-matching algorithms to reduce search-space for optimization points. And one can usually grow things much faster by settling for less-than-optimum... which allows for heuristics and such (even specialized AI) to replace brute-force methods. Undecidable and NpComplete are only a problem if you force them to 'completion'. Just abort or deprioritize optimization-efforts that are taking too long, or make them more incremental. It isn't as though most optimizations are required.

If a compiler DOES use some undecidable and NpComplete techniques, though, I'd only suggest a parameter in the optimizer options saying exactly how long it may spend optimizing code (e.g. 10 seconds), and possibly make the optimization-process interruptible (complete it now!). Further, I'd design the compiler to create extra files that track successful and unsuccessful optimization-attempts (at least the expensive attempts) on a per-project basis in order to speed up iterative edit-compile-run cycles and have a more additive effect over time (so it uses its time mostly to make new optimizations rather than reproduce the old ones time and again).

The greatest of reasons to skip optimization techniques has less to do with their time-complexity than with the fact that they're truly, awfully, horridly painful both to implement and verify as legal. That really needs to change... we need something like a DomainSpecificLanguage dedicated to describing optimization techniques that makes them easy to implement (e.g. pattern-matching, functional or ConstraintLogicProgramming spec of attempted transform(s), etc.), along with a system that systematically searches for, attempts, and verifies these as legal and more-optimal at point of use (possibly in combination with other transforms, though that would need to be limited by heuristic to avoid CombinatorialExplosion). It would certainly be nice to be able to pass a file describing optimization suggestions, patterns, and heuristics to the optimizer - it would certainly improve research in the field. (First we compile the optimizer, then we optimize the code... hmmm... MetaProgramming at its finest.)

Moved discussion about C, assembly, risc processors and LISP machines to DoMicroprocessorsLoveCee. -- ClaesWallin

The exception that really does prove the rule is that there some domains where there is profit in building a SufficientlySmartCompiler, to make specific cases run faster than a C compiler.

Take VerilogHdl for example. This is a language used for hardware design. Running fast simulations requires code that produces the effect of massive parallelism. Some simulation vendor provide solutions that translate the verilog code to (very efficient) C, and then rely on the C compiler's optimisations to make it really fast. But the top vendors don't compile to C: they compile directly to object code, making full use of their application-specific knowledge and the full instruction set of the cpu (and cache structure, etc.) They even have engineers who look at customer's code, and modify the compiler to recognise specific hardware design styles (and components). A high-end direct-to-object-code compiled simulation knocks the socks of anything that a general-purpose C compiler can produce - because it is smart: it doesn't attempt to be a good compiler for general-purpose code.

Certainly, but this is really rather off topic, because translating language X into C and then compiling the result is not at all the usual topic of conversation when people raise issues about a SufficientlySmartCompiler or AsFastAsCee etc.

To knock the strawman down: if a human expert translated a program in language X (specifically including VerilogHdl) into C by hand with speed as the goal, then the result would in general be vastly faster than the above-discussed SufficientlySmartCompiler of VerilogHdl->C. To put it another way, it is in general true that a good native compiler for language X will tend to be better than translating X to C and then using a C compiler, but that is not surprising and not usually at issue.

Superior performance isn't the only attribute frequently assigned to the (mythical) SufficientlySmartCompiler. Many such compilers are also capable of other incredible feats of reasoning about programs, such as detecting errors at compile-time that lesser compilers need expensive runtime checks for (or ignore completely).

In some case, the hypothetical compilers are SufficientlySmart? that they can do so, despite the fact that the computation in question has been proven to be undecideable on a TuringMachine. (i.e. "with a sufficiently smart compiler, you can prove whether or not a program written in X will halt!" If X isn't TuringComplete, I suppose).

Does the MercuryLanguage compiler count as sufficiently smart? Compile time garbage collection, automatic parallelization of code, beats other logic programming languages in performance, etc.

It might be a smart compiler, but there's no way it touches on the 'magic' of the 'SufficientlySmartCompiler'. Keep in mind that the term is often used with much hand-waving.

It was once believed that no C compiler could ever produce code that ran faster than hand written assembly. However, modern C compilers can relentlessly optimize every part of the code, and anecdotally, can out perform human assembly programmers.

This is still true - a compiler can never win this battle. All a human programmer has to do is take the output of the compiler and make a single optimisation, and he/she wins. This is the advantage that the human has - they can use any of a wide variety of tools at their disposal (including the compiler), whilst the compiler can only do what it was programmed. The best the compiler can hope for is a tie.

But, this is ad nausea. Provide the human edited improvement to a compiler. What has been implied is From Scratch, a Compiler will output better optimized object code than a Human, and this should be qualified with "for a given time constraint." Both of these are not only valid constraints but one could say necessary. Not many projects have the Railroad Tycoon's luxury of time (and compulsion) or need to be written the bottom up in assembly.


Link-time optimizations are rather difficult to perform given a set of '.o' object files. Thus, given that a great many global static optimizations cannot be automated prior to link-time, it shouldn't be difficult (with a language designed to support optimizations) to surpass a C compiler in global optimizations. This is especially true given C's slow and inflexible procedure-call protocol; assembler benefits a great deal from the ability to specify register-passed parameters, letting the programmer control exactly which registers might be 'trashed' and thus might be desired for saving, etc... and to largely avoid unnecessary memory access (because RAM access is moderately expensive, even with L1 and L2 cache). Further, due to state-based programming ('side-effect' programming), C is awful at most partial evaluation optimizations (save inlining and basic integer or floating-point ops). Partial evaluation and redundancy elimination are the two biggest local optimizations one can perform. With C missing one big set of local optimizations and the vast majority of global optimizations, it should be quite possible to design a high-level language that can produce code that is FasterThanCee, possibly by a wide margin in some problem domains.

The trick is to make a language that can be compiled to something FasterThanCee that also provides the extra features seen as so desirable in the high level languages. And, once that is done, the hours upon hours of arduous effort placed into the C compiler would need to be matched.

Regarding link time optimizations, whole program optimization and even profile-guided optimizations are already possible for C compilers. I'll grant some of the other points -- I've sometimes thought it would beneficial for a compiler to be able to change the default register usage (how many are used for arguments, how many for local values, how many are saved & restored) depending on the specific program, but in such a case you have to know you have the *whole* program, and you're not part of a dll, for example.

Some link time optimizations are possible (e.g. limited inlining and redundancy elimination for C string constants, functions provided as 'static' in header files, etc.), but there are many that would require disassembly of the code. Of course, many of the best link-time optimizations are still based in partial evaluation, which the C compiler isn't so good at even locally.

For the register usage, it is possible for a compiler to adjust those internally so long as it provides (at the ABI, and for function-pointer passing) the required c-declaration wrappers. I wouldn't be surprised if some compilers already do this. I'd track per-function communications and register usage information (input, output, trashed) and, where I don't just inline the function, make it the job of the optimizer to color all those registers to minimize the total register saves and restores program-wide. Then, for the DLL exports and function-pointers, one can create an interface to the functions that additionally meets the C calling conventions... e.g. saving exactly the necessary registers. Link-time optimizations then become possible if you hold onto that register info by linking directly to the inner version. Unfortunately, such link-time optimizations ideally require program-wide register coloring... it's exactly that sort of thing that is made difficult by the CeeCeePlusPlus(C/C++) compilation mechanism.

Link time optimizations is mostly a red hering IMO. Compilers like msvc and icc do tricks like saving their intermediate representation in the object files and doing their normal compilation steps at link time. Anyway in practice the gains here are small. Most time is still spend in inner loops. Most software is designed that all stuff needed for the inner loop is in a single file and inlining even surpasses all gains possible by adjusting calling convention as we can avoid the calls completely!

I tend to see a great deal of UniformlySlowCode when measured in terms of CPU time spent within any given function. However, that may be a consequence of more event-driven programming with intelligent caching. Link-time optimizations can achieve uniform speedup in these cases. It is true that MSVC and ICC do achieve some link-time optimizations, which helps across '.o' files.

A SufficientlySmartCompiler may also be able to turn a high percentage of "imperative" programming statements into declarative ones, such as into a functional sub-language. This may blur the distinction between "imperative" and "declarative" language. As far as determining if a given statement is one or the other, it may be difficult to formulate a general way to tell. One may not be able to rule out that the viewer/optimizer is not just clever enough to find a declarative solution to it. There may always be undiscovered axiom that allows us to turn some or all of it into a declarative statement. -t

Link-time optimizations in C are usually done by delaying code generation until link time. With Microsoft's toolset, this is known as Link Time Code Generation, or LTCG. No disassembly required.

View edit of October 9, 2013 or FindPage with title or text search