Any defect where a fixed-size numerical type is selected to represent some parameter, and the bound of the parameter grow to exceed the range of the type, is a FixedQuantityOverflowBug
. (Perhaps a catchier, idiomatic name would be better. This page used to be called B
illGatesBug in honor of the BillGatesSixFortyKbytesQuote
, but as he apparently never said the phrase often attributed to him, it is hereby renamed. Bill gets enough abuse, though he deserves much of it...)
Without fail, the type selected is initially adequate for the purpose, but as time passes becomes inadequate. In many cases, the pending inadequacy could have been foreseen; though in many others systems were being used long after the initial designers expected them to be replaced.
Notorious examples of this class of bug:
- The failure of the ArianeFive launcher's maiden flight (not the only cause, but the most important direct cause).
- 640k memory limitation in MS-DOS and PC architecture, and all the design warts the PC suffered through (and still does, to some extent) due to this heritage.
- YtwoK problem.
- Each major version of the LinuxOs kernel, in addition to new features, seems to remove yet another such constrait (many left over from its Unix heritage), such as the maximum number of inodes, processes, major/minor device numbers, or other system resources. Modern Linux is now quite scalable; its main issues for high-end use are CPU scalability (which isn't a FixedQuantityOverflowBug) and high-availability concerns.
- The PacMan video game (the original coin-op game, not the versions for consoles, home computers, or any of the numerous sequels) used a 6502 (or something similar) as its CPU. An unsigned byte is used to store the current level, the first level is level 1. Numerous expert players have managed to get past level 255, leading to... level zero, naturally. Except there is no description in the program for level 0, and the machine crashes in spectacular fashion. (The first couple of PacMan addicts who managed this feat actually fled the video arcade, fearing they had damaged the machine and were about to get sued...however a reboot fixes the problem). See PerfectGameOfPacMan.
- Pascal-style strings, which use a single byte to indicate the string length.
Strategies to avoid these bugs:
For quantities that could (potentially - always
and extrapolate it for at least a generation or two; you never know how long your system will last) grow without bound:
- Use unbounded types (such as BigInts) to store the quantity. Many HighLevelLanguages provide only unbounded types; but this is an issue in C/C++/Java/C#/etc.
- If the first step is impractical (there are many applications where the size of an entity must be fixed), choose a bound that exceeds some well-grounded theoretical limit on the Universe. For example, a 256-bit pointer is capable of uniquely addressing every particle in the Universe, give or take a few bits - at least according to current theory. A 64-bit seconds counter will overflow long after the predicted end of the Universe (and certainly long after you retire).
- If both steps 1 and 2 are impractical - you need a fixed size and you need to be economical in space (or you have to use hardware primitive types for performance reasons), then encapsulation is your friend. If you keep the FixedQuantityOverflowBug well-encapsulated and make it an implementation detail of a system, then it will be easier to fix. If the FixedQuantityOverflowBug propagates from your system into the wild (i.e. you use "short" in your implementation so you put short in your API), it gets a lot harder - everything that depends on that API needs fixing as well.
- In addition, be very cautious with persistent data.
For quantities which you can establish an upper bound (either absolute, such as the number of days in an Earth calendar year, or reasonable, such as the number of legs on an animal); feel free to use a fixed-size type if that's more convenient. However, except for memory-limited situations, or datatypes that you will have millions of, you should probably avoid types that are smaller than the standard type (which, on most of today's systems, is a 32-bit integer).
I believe that in most cases a FixedQuantityOverflowBug can be avoided by choosing a sufficiently good ApproximationOfInfinity, a value limited only by hardware resources and empirical performance. Sadly, the "word-sized integer vs arbitrary-precision integer" involves a trade-off between an Infinity which bottoms out far below the hardware-resource limit but is hardly ever reached in the common case, and an Infinity which eats away at the empirical-performance limit even if it isn't necessary. I hear some Smalltalk VMs reserve a bit in the word to differentiate between small integers and pointers (being dynamically typed, these aren't necessarily to actual big integers), but that's not feasible without hardware or OS support.
In many martial arts, such as Aikido and Baghua, the key teaching is to use your opponent's own destructive energy to your own advantage.
This is the common approach in large integer numerical algorithms, where the fixed word size (e.g. 32 bits) of the machine is used as an advantage rather than a bug, by operating in a finite cyclic group that just happens (what a coincidence) to be of order 2^32. Modular arithmetic, a part of NumberTheory
, allows for a vast number of powerful, interesting, and fast algorithms, and is often the preferred approach over e.g. BigNum
s - and indeed, BigNum
s are always based on modular arithmetic, at least in a trivial sense, and often in a deep sense (when the ChineseRemainderTheorem?
is used to reconstruct final answers). -- DougMerritt
- A related pair of signal-processing tricks (not very useful for spectral analysis, but useful in some cases for convolution and filtering) are the various number-theoretic transforms (the discrete Mersenne and Fermat transforms). I've long considered these to be CoolHacks?. :) -- ScottJohnson
- Actually, that general class of things can be used to implement FFTs for sequences of any length, rather than just power of two lengths, thus obviating the need for padding (which can be rather large sometimes), and also in some cases eliminating an otherwise unavoidable source of error (e.g. spectral smearing when the signal of interest has a period unrelated to a power of two). Work by Winograd, Good, Rader, et al. -- DougMerritt
- And without all the nasty round-off errors that the classical FourierTransform has when dealing with limited-precision math.
- Depends; there are two conceptual paths, one which uses only integers - the classical number theoretic transforms, the other path uses reals/complex numbers but of order prime or product of primes != power of two. The latter path is the one that is most similar in its range of end uses to regular 2^N FFT, e.g. the transform space is indeed "frequency", whereas many number theoretic transforms are largely useful only for accellerating convolution. -- Doug
Discussion moved here from CounterInManyProgrammingLanguages:
Both the C++ and Java variants have a potential overflow problem. Doing a general purpose counter in either of these languages is sufficiently more painful. I don't know about all the other languages on this page (some don't exhibit this problem).
Which is why the templated C++ version was provided - if you need greater capacity than that provided by an integer; you can have it by instantiating a counter<X> for some arbitrary-precision integral type X. None is provided by the language, of course, but many are readily available.
Right. As I said, significantly more painful (you have to provide a facility for it). What is the Java solution?
Throw an Exception?
That isn't a solution (for that problem), it's an error recovery strategy.
Uh, why isn't throwing an exception a solution? What would you like to happen if you hit the maximum value the chosen data-type can handle? Wrap? Seems to my the only proper way to handle it is to throw an exception and let the caller (or catcher) decide how to deal with the problem. Doing anything but throw an exception seems to be a distinctly non-general solution.
Well, what a whole bunch of languages do is just give you the right answer. ;) Grab any Smalltalk implementation and evaluate "200 factorial". It works just fine...
Not a good answer, try again. All languages have numerical boundaries at some point, even if that point is in the trillions. Are you saying that even if you want a counter for a 32 bit integer, that you never want to know if it is about to wrap? Try to be more specific with your answer.
[His answer was quite specific. It helps to have a feel for how big factorial is. 200 factorial is approximately 10 to the 375 power. A trillion is only 10 to the 12 power. He's saying that Smalltalk will give over 1200 bits of precision for 200 factorial; it is limited only by the amount of RAM available. Clearly the universe will die before the counter wraps; in fact, that will happen a very, very, very large number of times before the counter wraps. The same is true for the Scheme/Lisp solutions mentioned several places on this page. So the complaint is that throwing an exception is not a whole lot better than just wrapping; neither gives the correct answer, yet other language implementations do.]
Sorry, I think you are oversimplifying the situation. Smalltalk has SmallInteger, LargeInteger?, and various Fraction and Float types that can be treated as numbers. A SmallInteger typically only has a range of -2^30 to 2^30-1. What if you need a general counter for a float or a small-integer? What would you like to happen when you need a small-integer counter and are about to overflow? A general purpose counter usually has to be used for something and often that something may require something other than a LargeInteger?
You could certainly implement an Int32 class in Smalltalk if you wanted, and give it the same semantics as Java's ints. (And similarly, Java has those BigInteger
things, which I imagine are similar to Smalltalk's BigIntegers?
.) There's no reason why you can't have both abstractions available in a language; the question is just which one you want to be the default.]
Yes, exactly. Java has the same issue as Smalltalk - a java Integer is like a Smalltalk SmallInteger and a java BigInteger is like a Smalltalk LargeInteger. Often you will want a small-integer counter
[When this page was still young and small, someone pointed out that TypeMigration
is the most desirable solution, and that's what Scheme and some Lisps offer: the counter starts as a small int, and when it overflows, it is automatically converted to a Bignum, so that you get the best of all possible worlds.]
[It is not
typically desirable to just let a counter overflow, except in C when performance is more important than correctness. :-) It is not clear to me that Smalltalk and Java advocate such an approach, so it's not really clear where this conversation has been going - maybe it's just a misunderstanding of the Scheme/Lisp shining example?]
Smalltalk is doing the same thing as Scheme and Lisp (and Ruby and Python too, I believe) - using the efficient SmallInteger
s when it can, and returning L
argeIntegers when it can't, and you never ever need to worry about it or be aware of it. Java and C are doing something completely different. The problem isn't that Java is incapable of doing the same thing the others are; it's that it's a pain in the ass, because when you type "42" you get one of these kinda-like-an-integer-but-it-wraps things, instead of an object that behaves like an integer.
That's cool until you need the number to be fixed precision, say, to store in a database field. If I need 32 or 64 bit integers, I'd rather just start with that and be told on the off chance that it overflows. I'd prefer that to it just morphing into some byte array with no bounds. Most things I program have bounds, so it is rarely a problem in real life programming. Scientific programming of course is another thing altogether.
I think maybe you've never had really big persistent counters. Making arbitrarily large counters isn't that hard in a decent language, why should we now be limited because the database isn't up to the task? I had to solve this problem by writing the fast-to-parse rep (a platform-independent serializer I wrote for the occasion) to file and databasing the filename. We really needed huge (HUGE, we were counting molecules impacting a sensor), persistent counters.
Likewise, one can correspondingly use tools that are similarly open to unlimited precision numbers - serialize your BigNums? into strings, use a typeless database like SqLite, etc.
See also ZeroOneInfinityRule