Zero And One Based Indexes

Confusion between zero- and one-based indexes is a special case of OffByOne. It often occurs at the boundaries between zero- and one-based worlds. Sometimes the boundaries get blurred, and one world intrudes deeply into another. When objects (or data structures) hold a mix of zero- and one-based indexes, beware.

For example, VisualBasic is one-based by default, but provides access to zero-based Windows GUI controls. Mapping between a selection in a list control and data in an Array requires an adjustment. Get the adjustment wrong, and you're off by 2. Oops.

It doesn't have to; you can create 0-based arrays if you need 'em.

-- DaveSmith

Want the worst of both worlds? Make it a global (or dynamic) option. AplLanguage made this mistake, and so has PerlLanguage more recently.

Many of these result from FencePostErrors... -- TomStambaugh

It's also possible for two people to have a zero- vs one-based mismatch. Is "the first house from the corner" the corner house, or the one next to it? Is "next Friday" this coming Friday, or this coming Friday "this Friday" and the one thereafter "next Friday"? Does the toilet paper go on... Aaargh. -- DaveSmith

Or even, "the first drawer". (Counting from the top or the bottom?) -- KatyMulvey

A bit of culture-shock that hit me on my first trip across the Atlantic was dealing with buildings whose "first" floor was up one level from the street.

At that time a fledgling CeeLanguage-programmer, I integrated my learnings by viewing Europe as a "C" kind of place where people counted floors from street level starting at "0", and where America was simply a PascalLanguage kind of place where the counting started at "1".

Where in Europe do you come from? In many countries the 1st floor is the one above the ground floor. He presumably comes from North America, where the first floor is the one on the street or ground level.

But isn't "First" always an ordinal, whether its index is 0, 1, 2, 42? <grinning>

My 2c here - the floor above ground level is called the first floor in Europe because the ground level would have been exactly that in ye olde days - the ground (and therefore not a "floor" which is a surface that is not on the ground - it is raised above the ground, or on another level). So the first floor is an ordinal, as the ground "floor" was not considered a floor. Jees - hope that makes sense.

If they didn't have a floor, what kept them from falling into the basement? (-1st floor, wine cellar, potatos, stack underflows)

And speaking of elevators...shouldn't I be able to clear a floor stop request in an elevator by pressing (again) its lit indicator? Have I been spending too much time at a GUI?

-- TomStambaugh

What if the light behind the lift button's broken? I currently have to deal with such a situation. I agree that it should be possible to cancel, but I think cancelling should be by 'double-clicking'.

It's crazier in Quebec. I've seen several buildings where the first floor is marked 'Rez de Chauss? (ground floor) but then the rest of the floors are numbered 2,3,... The Grand Hôtel de l'Europe (!) in Bad Gastein, Austria, is built on the side of a hill. The main entrance is somewhere near the middle of the building, so it's considered the ground floor. As is normal in Austria, floor 1 is above the ground floor...but there are also 3 levels below the ground floor. I was extremely tickled to see that the elevator buttons were labeled 1-4 as expected, then 0 for ground, and -1 to -3 for the lower levels! Logical, but I've never seen it anywhere else. --MarnenLaibowKoser, 17 Dec 2014

One friend of mine half-seriously advanced the following thesis: We should count from zero. But "first" is, etymologically, a diminution of "foremost", and (as TomStambaugh says) should mean "0th" when we count from 0. And "second" is from the Latin "secundus", meaning "following", so it should mean "1th" when we count from 0. Obviously the ordinals from "third" onwards get their names from the numbers. So... we need a new word for "2th". He proposed "twifth". Thus: first, second, twifth, third, fourth, ... -- GarethMcCaughan

Ah, but long-established usage takes precedence over etymology. We count birthdays from zero, but first and second are still 1st & 2nd, not 0st & 1nd. Actually, if you want to count something other than birthdays from zero (or fingers - the thumb is the zeroth finger), you can: first, first but one, first but two... Come to think of it, we have both 0- and 1- based conventions in counting from the end, eg second last is last but one.

It seems to be worse in Barcelona, where the ground floor is Planta Baja (Bottom floor), then above it is Planta Primero (Main Floor), then Planta 1, 2, 3 ... So Barcelona's (maybe all of Spain's?) floors are numbered two off from the UnitedStates.

My vote still goes for JavaLanguage dates. The month is 0-based, the day is 1-based and the year is 1900 based. So new Date(01, 01, 1979) gets you February 1st, 3879. Very clever. -- a.l.

Also, a Java array is 0-based but then some of their containers are 1-based, e.g. the SQL ResultSet. -- JohnHarby

Also, Java days of the week start with Sunday as 0. This puts Monday at day 2. Half expect Monday to be day 0, the other half expect it to be day 1, but only the folks at Sun expect 2. (see -- dimiter

Your statement seems odd - if Sunday is 0 then surely Monday is 1? Or do you mean Sunday is 1? I personally wouldn't expect Monday to be 0.

According to that page, the Date class uses 1=Sunday,2=Monday... the only mention of 0=Sunday is that if you erroneously give a 0=Sunday-based weekday value to the Timezone class, it will be silently ignored on some JVMs but throw an exception on others.

One useful verbose feature of some versions of PascalLanguage was the option of either zero or one based variables..

	array[0..128] of int
	array[1..129] of int
But that's also because Pascal forces you to type everything out very verbosely to make sure you are exactly understanding what you are doing. I forget if that was the exact standard or not.

A clarification: PascalLanguage, just as its unfortunate contemporary, Algol68 (AlgolLanguage), and later AdaLanguage, allows any integer range as an index range. In addition, Pascal and Ada have enumerated types, subranges of which can also be used as index ranges. Example:


var work:array[workday] of integer;
It is odd how this, which I guess was common knowledge ten years ago, can appear today as intriguing, verbose, or even at all interesting. Nonetheless, I believe it is something which has been lost in the general surrender to C-style languages.

-- LasseHp (My first contribution to the wiki, bear with me.)

But even this you example ord(sunday)=0 and ord(monday)=1, so enumerated types are implicit zero-based. -- CS

Things get much more precise in the hardware world of VerilogHdl. It is possible to define

  signal foo : std_logic_vector(4 downto 2) -- 4 is first, 2 is last

  signal foo : std_logic_vector(107 to 109) -- 107 is first, 109 is last
When you assign one to the other, you get a (virtual) reversal of the order!

I remember that in old "Advanced Basic" on the IBM PC there was indeed a command that let you dynamically, globally, switch between 0- and 1-based indexing, 1-based indexing being the default IIRC. -- StephanHouben


"All You OPTION BASE Are Belong..." - NeverMind?
Actually, you can have a 1-based CeeLanguage array:

	int zeroBasedArray[10], *oneBasedArray=zeroBasedArray - 1;
I've never done that, but I've done something similar to base an array at -1.

This isn't standard C. Pointer arithmetic (and comparison) in C is undefined outside the range of a contiguous block of storage (allocated either by an array declaration or a call to malloc())

If you want this, you need something like this:

 int array[N+1];
 int *zero_based = &array[1];
 int *one_based = &array[0];
The above is a perfect example of Brian Kernighan's point:
Programming idioms matter, because if they're violated, it's often an error.

The use of "array[N+1]" with "*zero_based = &array[1]" is idiomatic in C, whereas the prior example with a -1 index was not idiomatic even 25 years ago when it didn't violate standards.

Of course, it's even more idiomatic in CeeLanguage to just get used to zero-based indexing, but there are times when 1-based models the problem domain, e.g. with a port of FortranLanguage code that has lots of assumptions of 1-based arrays or implementing HeapSort.

AntiPattern: Don't get too clever in a language without first learning all of its idioms.

-- DougMerritt

Smart variable names can help with the confusion that is sometimes caused here.

My favourite technique when dealing with zero-based indexing was to include "count" in the name when you are counting things, and include index in the name when you are indexing into a zero based array. This is tidier that using say "lineIndex + 1" for count, or "lineCount - 1" for index. -- RichardCollins

The IBM research and development lab in Rochester, Minn., is a very large complex of connected "buildings." Most people would consider it one building, because the buildings are closely connected in roughly a checkerboard pattern. For office location purposes, each office is identified by building, floor, and then a letter/number grid location. However, some of the buildings have a below ground level, and some do not. Floors are numbered starting at 1 for the lowest most floor in that building

The net is, as you walk from one building to the next (and it isn't always obvious that you've done so) you may have just gone from the first floor to the second floor without using any stairs, ramps or elevators.

So, why not just drop the arrays and define a container with a Begin, End, and Next? There is no reason for the outside world to be concerned with whether you use a 0-based array, a 1-based array, a down counting array, a linked list, or something else.

Key Point Or just use a standard set of sequence mapping/filtering operations

This last comment is highly insightful. Why do we still have to write for loops to go through a list? Why can't we have containers that can produce a subset based on a filter, order themselves based on a definition, return the first value, and apply an operation to all members. I believe that with this set of capabilities, I would never have to write another for loop or deal with an index again.

Depending on your language, you can do some or all of that. The standard library containers in C++ can do a surprising amount of the above, albeit with some pretty heinous syntactical pushups. PythonLanguage gets pretty close, though, with its iterable, DuckTyping-compatible containers, and list comprehensions:

 >>> food = ['spam', 'egg', 'sausage', 'bacon']

>>> filter(lambda x: x[0] == 's', food) # Old-style ['spam', 'sausage']

>>> [x for x in food if x[0] == 's'] # New-style ['spam', 'sausage']

>>> sorted(food) # Defaults to value sort on member ['bacon', 'egg', 'sausage', 'spam']

>>> sorted(food, reverse=True) # Or reversed ['spam', 'sausage', 'egg', 'bacon']

>>> sorted(food, key=len) # sorts on len(member) ['egg', 'spam', 'bacon', 'sausage']

>>> [x.upper() for x in food] # apply to each member ['SPAM', 'EGG', 'SAUSAGE', 'BACON']

>>> silly = [1, "two", food, filter] # Homogenous list! food is our list, filter is a function

>>> sorted(silly) # Not useful, but it has a defined meaning [1, <built-in function filter>, ['spam', 'egg', 'sausage', 'bacon'], 'two']

>>> [str(x) for x in silly] # More useful ['1', 'two', "['spam', 'egg', 'sausage', 'bacon']", '<built-in function filter>']

>>> [x.__doc__ for x in silly if callable(x)] # Much more useful ['filter(function or None, sequence) -> list, tuple, or string\n\nReturn those i tems of sequence for which function(item) is true. If\nfunction is None, return the items that are true. If sequence is a tuple\nor string, return the same ty pe, else return a list.']

Dijkstra tells us WhyNumberingShouldStartAtZero:

Screw Dijkstra. Sometimes 1 is a better domain fit. Unless machine efficiency is primary factor, I prefer if languages give one the option of picking the range: int x[1..99][0..7] // declare a 2D array

{Numbering in general should start at zero, but it is indeed occasionally useful to be able to specify index ranges when declaring arrays. I miss that from the languages that don't have it, and I don't know why more languages don't have it. Its performance impact can sometimes be optimised out, and when it can't be optimised out it often winds up being explicit. Perhaps the reason for not providing it is that arrays would often be declared as [1..n] when [0..n] would be more efficient, and it can make code harder to read if you're not aware of the original array declaration.}

To users, one normally displays counts or items starting at 1. If internally the software uses 0, then the code is translating back and forth between the "external" (user) convention and the "internal" convention. If both inside and outside are in-sync, then the code doesn't need translation steps/layers, and is thus simpler and has less risk of cross-confusing conventions. Make the machine slave away to make human life and coding easier instead of the other way around. -t

{The ratio of internal calculations vs presenting a value must be on the order of 1000:1, at least, in the code I write. Thus, the only "translating back and forth" is when the value is displayed, which might involve adding 1 to it, once. In rare cases, it might involve getting user input, and subtracting 1 from it, once. Is your code different?}

I display numberings and/or counts fairly often. Users like to see counts, and numbering can simplify trouble-shooting conversations over the phone by having a common point of reference.

{I display numberings and/or counts wherever they may be useful to the user, and often where they may be useful to the developer in troubleshooting. I still generate the view once (i.e., calculate n + 1 in one place) and display it wherever, and the ratio of internal calculations to presenting the view of the value is 1000:1. Performing +1 or -1 at the boundaries of the system seems like little effort.}

If you have an editing feature, you'll have to translate it back to machine-side for editing. Often the same ID is used in multiple screens and/or views. I'd rather KISS and not have any translation layer so that there is no conversion layer to remember to hook up or to mentally compensate for when debugging. If it costs 0.00001 seconds in response time to get that KISS, fine. I don't work for Google or Twitter and nobody is going to miss that 0.00001. I code for humans, not machines.

{Conversions are needed from user i/o for various types, especially dates, times, etc. Integer indexes are no different.}

True, but the fewer you need or use, the better.

{Perhaps, but if you're used to thinking of array indices as offsets from the start (i.e., zero based), then it's easier to work with what you're used to than work awkwardly with 1-based indexes.}

Since I've used languages and/or am in shops with a variety of indexing conventions, I'll select the numbering that best maps to the domain, when possible, to reduce the need for translation in order to keep the software simpler unless it's clear that machine performance trumps software maintenance costs.

{Fundamental algorithms are domain-independent. Indexes used in A*, for example, don't map to any domain.}

I'm not sure what that has to do with the topic. A coded A* algorithm can use either index convention.

View edit of December 20, 2014 or FindPage with title or text search