Optimize Later

YouArentGonnaNeedIt applied to Optimization. -- FalkBruegmann

In other words, you are unlikely to know up front whether an optimisation will be of any real benefit. Just write the code the simplest way. If, eventually after profiling you discover a bottleneck optimise that. [ProfileBeforeOptimizing]
Here's a two-page article by MartinFowler: http://martinfowler.com/ieeeSoftware/yetOptimization.pdf.
Code And Then Optimize

Code written in assembler or C is almost impossible to maintain. Code written in scripting languages is dog-slow. But you can combine the two, and you can profile the dog-slow scripts to find out where the bottlenecks are.


Don't code for performance. Don't use a "fast" language. Code for maintainability and use a language that improves that maintainability. Then profile your code, find out where the bottlenecks are, and replace only those bits with performance-coded fast-language stuff. The result is that your code will effectively run just as fast as if you'd optimized all of it, but it'll be vastly more maintainable. [AlternateHardAndSoftLayers]

["Code written in assembler or C is almost impossible to maintain." I'm sorry, but this is unmitigated bullshit [EditHint: please rephrase with cooler head]. I've been dropped into countless lines of C, C++, and assembly for a couple dozen different processors over the last 35 years, and properly-coded software can be maintained regardless of the level of source being used. Let's please don't get carried away with the hyperbole, shall we? -- MartySchrader]

How much extra effort does it take to develop this "properly coded software"? Does it require SelfDiscipline and BestPractices to stray from a PathOfLeastResistance that would not be required in the scripting language? It is empirically proven that developing and maintaining code in traditional 'hard layer' languages can easily be an order of magnitude less productive than doing the same in a scripting language, especially for certain classes of tasks: configuration, animation, UI layout, character AI, and other domains where we need a lot of fuzzy, specialized code with empirical tweaking. Assuming time or budget constraints, an order of magnitude can indeed qualify as "almost impossible". So how much is hyperbole, and how much of ItDepends on context?

Make it work,
make it correct,
make it fast,
make it cheap.
-- (attributed to) AlanKay

And it is way harder than it might seem because:

Make it work (and you will have bugs),
make it correct ( or just GoodEnough ?),
make it fast (and for that, you will modify the code, and add bugs, and that will send you back to Make it work)
make it cheap. (which is very hard to achieve after all the iterations you required to reach this point)

What about making it "safe" (secure, NotEasyToCrack??)? does it go inside "correct"?
OptimizeLater is one of the ways to produce SelfDocumentingCode.

Don't optimize until you need to.


There are many reasons to leave optimizations until later. The prime candidate is that the necessity to optimize is a very powerful force when writing the code. If it is applied too early, it can impact the codebase so powerfully that the code becomes unreadable from early on. If code is unreadable, it is unmaintainable, and it is unlikely it will function.

Optimizations almost always require extensive documentation! They are rarely the simplest approach. (cf. YouArentGonnaNeedIt, DoTheSimplestThingThatCouldPossiblyWork, LazyProgrammer).

Also, without accurate information as to what need to be optimized, you are likely to throw resources in the wrong places. TheFireIsWhereItIsHot?, not everywhere.

Fast code is easy to write, as long as it it doesn't have any other good qualities. The temptation in early optimization is to ignore other things that matter more.


"I like tight, efficient code."'

So do I, I just hate reading it. Moreover, it is difficult to understand, even by the original programmer, and thus highly errorprone.

"Optimized GoodTightCode is elegant!"

No, elegant code is elegant. Optimized code is usually ugly. Optimal algorithms are usually elegant for the sheer genius of them, but their implementations are horrendous.

But code that is elegant and fast is better than code that is elegant and slow.


When it comes down to it, you might need to optimize in certain locations. If you had bothered to NarrowTheInterface, this would be easy because you'd only have to change a small locality.

Never write code with the intent to optimize it!

Never say never. The practicalities of the ComputerGamesIndustry mean that I am constantly writing code with the intent to optimize it (later). Many specific pieces of code in a game (e.g. lighting calculations) are executed thousands of times - and this is known in advance. OF COURSE I'm going to write code with the intent to optimize it (later). The statement above is terse enough that I may be misinterpreting it completely, of course! Can someone clarify? -- EddieEdwards

I'm with Eddie on this one. In fact, I'll take it a step further. I program DSPs for a living, and the processing unit I use most has 1K of instruction space and a similar dearth of data space. Note: This is a good reason to optimize (now). I'm constantly fighting a resource battle; at any given time, all of the resources are used and I need to cram in yet another feature. What this means is that I have to further optimize the existing code to make space and then jam in the code for the new feature. This means optimizing (now). If I don't optimize (now), I can't even test the code, much less release it -Timdog

See also DoTheThingThatMightWorkWell, in other words, don't necessarily fully optimize upfront, but certain upfront optimizations are very healthy. Isn't this page (and all related) a jump from one extreme to another? -- CostinCozianu

I think the above can be summarized by saying "OptimizeLater" - unless you have worked in this ProblemSpace before, using these technologies, in which case you use HeuristicRules of particular optimizations that you know apply to the environment. -- RobertMcAuliffe

Optimize Later is not an excuse for initial incompetence. Programmers should have a clear understanding of what affects application performance. It is a required skill. They should keep this in the backs of their minds while coding, but without it being a driving force. For example, you should know if you have a long list of items that need to be looked up by a key then you might need to use an indexed Set or Map of some sort, and not an array or a Vector. This is not an optimization, it is just doing it right in the first place.

It is an optimsation. If you have correctly factored your code then it doesn't matter what representation you use. Then you can do the simple thing first and optimise it only when you need to. What programmers need to keep in the back of their mind, and the front of their mind, is that this code may, almost certainly will, need to change, so it should be simple, clear, elegant, well-factored, and easy to change. Then it doesn't matter if the first choices are wrong, you can put it right later. Yes, programmers should have a clear understanding of what affects performance, but that just means even more that they should correctly modularise and factor their code. Ability to change when appropriate is what really matters.

This is true, and illustrates the point. If code is well factored, the cost difference between using one solution that is slow and one solution that is faster (but not necessarily optimal) can be minimal, if made up front. Often, it's just a question of selecting a suitable class, method or algorithm, rather than selecting an unsuitable one, in circumstances where the choice is clear and simple. But changing it later, even if the code is well written is always expensive. You have to find the problem, you have to change it, and you have to retest it. Furthermore, if a programmer makes the wrong choice once, they are very likely to make it again.

You have confused me - you seem to be arguing both sides at once. My point is that if you have properly factored code then changing you class, method or algorithm will not be expensive. Work done now is speculative. Optimisation done now may not have been necessary. Many people have optimised idle loops. Don't optimise until you have profiled and know that it's needed at this spot. If your code is well written it won't be more expensive to change it later.
Yes, I know the caveats and exceptions, yes I know this is not universally true, but you will save more time by using this guiding principle than it will cost you.

Where this issue might be important is, for example, in a code review. If a reviewer sees that a trivial change can be made that will clearly improve the performance of the system without affecting the clarity or functionality of the code, should they insist the change be made? If the change is made, the cost may be a few minutes of time. In addition, the developer whose code is being reviewed may learn something. In the future, the developer may now deliver more efficient code at zero cost. If the code is not changed, there is a chance that it may have to be fixed later, at a much higher cost. Furthermore, the developer may introduce further inefficiencies in the system, which cumulatively could have a major effect.

Over-optimization is bad, but that does not mean that considerations of efficiency have no value. They have value, but they are subservient to those of structure. Where the structure is the same, the more efficient solution is the right one.


Text searching is a very well studied field. There are many good algorithms for doing fast text searches beyond the simple loop and test (aka scan) algorithm. However, in most cases, a simple loop and test (or even better, strcmp() or string::operator ==) are likely to get you where you want to go with the minimum of hassle. If you're writing a web search engine, well maybe you need a better search algorithm. But keep that contained in a nice module so it can be changed, upgraded and tested easily.

A long time ago, people suggested to me that instead of

  y = x * 320;

I should write

  // y = x * 320;
  y = (x << 8) + (x << 6);

(or even worse, without the comment)

The theory is that bitshifting is faster than multiplication. However, I figure, if I'm going to multiply by 320, I should multiply by 320. Moreover, most modern compilers will actually optimize the multiplication to be the shifts anyway and are, in fact, faster at it than you are!

Not to mention the fact that you shouldn't be using hardcoded values like that anyway. NamedConstants are much better.

Actually, as it turns out, most modern (embedded systems) compilers won't change the multiplication to bit shifts for you. Yes, I recognize the irony in contradicting my own anecdote albeit years later. Perhaps there's a lesson in that. -- SunirShah

Also, that bit shifting could be slower on most systems. If a CPU can execute a multiplication instruction in one clock cycle then your bit shifting instruction may take 3 cycles (shift, shift, and add). And if you are developing this on a microcontroller with limited instruction space, your going to end up using more instructions to accomplish the same task. -- BlakeMason

Most microcontrollers that are so silicon-starved as to not support a large instruction space will also lack multiply instructions. The most you can hope for is a multiply-step instruction. Shifting is still preferred there too. --Samuel A. Falvo II

Actually, on a Pentium, the multiply operation is a non-parallelizable 3-cycle integer multiply (and two loads, one maybe immediate, and maybe a store) while the shift-and-add operation is 3 possibly parallelizable 1-cycle integer add/shift operations (and three loads and maybe a store). The worst-case run time of the multiply operation is equal to the best-case run time of the add/shift operations, and both of them will either stall or occupy one of the two instruction pipelines. On machines with slow memory (i.e. most modern consumer machines), extra instruction decodes and memory accesses cost much more than even a floating-point multiply.

Most modern CPU's (even those made by Intel) are better than Pentiums - the integer multiply is as fast or much faster than equivalent shifts and adds, except for trivial cases which are equivalent to simple bit shifts (no addition), and of course multiplication by constant 1 and 0.

Microcontrollers without useful integer multiply instructions have no choice but to use the shift/add algorithm, call a function in a math library, or even use application-specific code.

I think this thread of discussion illustrates perfectly why you should start with the FirstRuleOfOptimization, and only then, maybe, move on to OptimizeLater: while effecting multiplication via bitshifting may be faster on some platforms, it is certainly not faster on all platforms. Writing your code at the highest semantic level appropriate for the application allows the compiler the greatest amount of leeway (usually) in optimizations, especially when it comes to cross-platform code that may run faster with multiplication or may run faster with bitshifts, and which may be optimized for speed or may be optimized for space. The general rule I follow when considering optimizing my code is that the people who write my compiler are Smarter Than I Am, and know my hardware Better Than I Do. While I’m here, I’ll note that while shifting may outperform multiplication on some systems, there comes a point (in terms of the number of high bits in your multiplicand, and thus necessary shifts and sums) on just about any system beyond which shifting and adding simply will not be faster.

"Optimizations always bust things, because all optimizations are, in the long haul, a form of cheating, and cheaters eventually get caught." (from LarryWall)

Horror Stories from the Trench:

Once, I met a programmer who attempted to optimize code while typing it in for the first time. He never got his code working. In the end, we had to chuck both the code and him and start again. -- SunirShah

I have just built an elegant and extremely slow GUI program. I'll let you know next week whether I optimize it into performing beautifully, or whether my colleagues chuck me and my code and start again. ;-)

Also the bit above about shifting vs multiplication reminds me of an classic war story from someone else's trench: many commonly used compilers were "optimizing" divisions into arithmetic shifts, e.g x div 2 -> x asr 1. This is of course incorrect because when you arithmetic shift right a 2's complement negative number, 1's are placed on the left hand side, so -1 asr n is still -1! This is described in the paper ArithmeticShiftingConsideredHarmful.

-- LukeGorrie

Actually, "still -1" is the right answer! The division gives -0.5, which rounds to -1 for an integer result.

Most definitions of integer division specify that the result is truncated, not rounded (or floor()ed): -1/2==1/2==0.

It depends on the programming language. Some languages, like C and Java, have division that rounds towards 0. Other languages, like Python and Ruby, have division that always rounds down. In those languages, division by 2 and right shift by 1 would be equivalent.
I was working with a Unix/Oracle business system that was running too slow. A highly paid "expert performance consultant" took exception to the code at the start and end of each function that said "if global_debug_flag is set, then call trace routine." Consultant demanded that we delete all the "if debug" statements, due to performance issues. (Not make them a conditional compile; DELETE them!) (Numbers: Maybe 200 flag tests a transaction, all being false. Each transaction also does about two dozen database calls.) I insisted he was full of it, and refused to do it.

Soon after that, we made the program stop inserting a record into a certain empty table at the start of each transaction, and deleting the record at the end of each transaction, leaving the table empty. That made the program run 3 times faster.

(In Oracle, the overhead to add the first extent to a table, or drop the last one, is rather high. ;-) -- JeffGrigg
Early dBASE-II advertisements described how their product was unusually fast and efficient because it was "written entirely in assembly language."

However, as anyone who actually used it could tell you, it was dog slow: (...not to mention being chock full of bugs! The 16-bit version, in converted assembly, was DOA -- practically unusable.)

I actually used it, and it was lightning fast, compared to alternatives available to the average users at the time (early 1980's). You could run scripts and the scripts could be compiled and used via Clipper! You could be done in the time Windows takes to boot up! And if you take the time it takes Access to load you could be into the second thing for the day, instead of still waiting for the program to get ready!
PrematureOptimization often makes things slower anyway. This is because optimizations work against modularity; they use one's global knowledge of the program to cut across modularization and produce a solution that is faster overall, but less easy to understand, since the new solution combines details of how several parts operate, in order to produce a special-case algorithm that couples them together. But many systems are too slow not because the algorithm at their core is slow, but because there is insufficient modularization and poor structuring, and so unintentional duplication of work.
I'm surprised not to find on this page or elsewhere my golden rule for any optimisation, which is to define it such that it can be switched on or off with a simple flag. You then turn the flag on or off in the code, with start-up switches or even via a debug menu. -- DavidWright

It seems this rule would lead to a combinatoric explosion of flags, plus you now must test and maintain both the original and optimised versions of each optimisation.

I do see one way of making this work in a maintainable manner: define optimizations as 'code transforming' stages, annotate code (e.g. with identity functions) to 'suggest' points of profitable transformation, then use switches to enable/disable these passes. This would be similar to how the LLVM optimizer works. However, it is more difficult to develop generic optimizations that would work this way.

One problem is that this very good advice often leads developers to make the step from OptimizeLater to skip optimization altogether resulting in slow applications and servers. Often as developers, things that we can do later often get skipped completely in a crunch. So by all means optimize later and definitely optimize what matters, but always measure so you don't introduce gates. Optimize what matters is really important, for example there is usually no gain from optimizing some loop conditions if the loop is performing database queries.
Jonathan Blow, the creator of Braid recently talked about becoming an independent game programmer, http://the-witness.net/news/2011/06/how-to-program-independent-games/ and made two exceptionally good points:
  1. ) That we seldom know what is really slow in code (underlining the focus on profiling over optimization)
  2. ) If you're optimizing holistically, you should also be optimizing for "programmer time spent": heavily and unnecessarily optimized solutions and their effect on code maintainability and extensibility make the overall process far less optimal.

That is a good talk. Thanks for sharing.

See: PrematureOptimization, MakeItFastBreaksMakeItRight, DoTheSimplestThingThatCouldPossiblyWork, LazyOptimization (because KentBeck and KenAuer are smarter than me), PhasesOfOptimizingLater, OptimizationUnitTest, RulesOfOptimization

Contrast: UniformlySlowCode

CategoryOptimization, CategoryLazyPattern

View edit of December 28, 2012 or FindPage with title or text search