Identifiers Are Comments

Consider this example from the MethodCommenting page.


  ++i;	/* record another match of this expression */

As the example above demonstrates, a typical comment can be encoded in an identifier. Most programmers attempt to encode some information in an identifier that is useful in understanding what role the identifier serves in the code. However, this information could be completely misleading or even complete gibberish. If the programmer isn't vigilant when modifying code, identifiers with initially well chosen names could end up playing very different roles in the code while retaining their old names. This is exactly the same problem as when a comment isn't updated to reflect changes in the code it is documenting.

This is a refactoring issue. When you change what the code does, you should also change identifier names (and method names: IntentionRevealingSelector) as appropriate to make sure the code still communicates its intent correctly. At least you have the advantage that your identifiers and method names are precisely captured in a form amenable to tool manipulation (source code). It's much easier to update the references to an identifier than it is to find and fix all the comments which describe in English a certain concept.

A great quote someone left on the CommentCostsAndBenefits page:

English is more descriptive than any programming language. It doesn't express programs.

One of the tenets of ExtremeProgramming is that large methods should be broken up into small methods. This kind of code will have a lot more semantic annotation in the form of all these extra method names that code with longer methods won't have. If XP also stresses long method names, then even more annotation will be present. If identifiers really are comments, then practitioners of XP are in fact systematically documenting their code even though they officially eschew comments. Is this correct?

Hmmn - I agree that meaningful identifiers can obviate the need for various comments. But at what point does it become impractical to use a really long variable name? Something like the example above is pretty darn unwieldy to type if its a commonly used variable. Yes - we all know that programs are supposedly "write once; read many." But at some point, doesn't the length of the variable name and the inconvenience of typing it and the increased likelihood of mistyping it begin to outweigh its usefulness?

number_of_expression matches is very clear, but also very long and unwieldy. num_expr_matches is a tad shorter. How much less readable is it? What about num_matches? Where is the midpoint? (somewhere between 16 & 32 characters perhaps???) Or is it just a matter of having nice tools (like Emacs for example) that have auto-completion for your identifier.

I'm not trying to be facetious, this is a genuine question I have -- BradAppleton
To me, it is not just the name which communicates, but the name and its immediate context. At some point, the information in a long name becomes redundant given its context. It is like the difference between a bunch of rather short names which work together to convey a meaning and a bunch of long names which hit readers over the head like baseball bats. At some point, the readers of the latter will say, "Gee, the authors must think that we are stupid." -- MichaelFeathers

Names should be as meaningful as possible, but not necessarily long. I'd never abbreviate, if I weren't so lazy. "maxPay" is probably ok, but "maximumPayment" probably better. Readers who think that code writers are thinking about them when they write code are, in fact, stupid. We write this way because we are stupid. -- RonJeffries

I don't think it's that we are stupid, more that we are forgetful. Just as we were entrenched and immersed in the muck and mire when we coded a particular section, we are often more distant from it when we revisit (and entrenched in something else ;-) So if we are thinking only of ourselves, we use names and comments that will trigger the requisite mental association to "refresh our mental state" and push the retrieved context onto our mental stack.

I think this is different (or at least *can* be different) from trying to teach or explain to first timers with our names and comments. The former presupposes a non-trivial degree of "clue" about the overall design/organization and tries to "recreate" a mental flow-state. The latter makes no such presupposition (or at least significantly less). Which is better? Who can say, it probably varies from shop-to-shop. -- BradAppleton

Ya know, every now and then someone around here starts a conversation to which I can actually contribute something. Makes me feel all glow-y.

CodeComplete, SteveMcConnell's killer book on coding, offers the following interesting data point:

Gorla, Benander, and Benander found that the effort required to debug a COBOL program was minimized when variables had names that averaged 10 to 16 characters (1990).

Also, simply from my own experience, I would add that a very important convention for naming identifiers is the use of specialized sub-words with absolutely rigid meanings:

I also use:

I do not use HungarianNotation in any way, though I do add a trailing underscore to member variables. -- MichaelHill
number_of_expression_matches - it is almost never good to call something "number" or "value". All integers are numbers; all variables are values. Use a more descriptive word, like "count".

I agree with Brad's comment about context. As an instance variable (of a regular expression class) I might call it "matchCount". In a function, I might call it just "count", because the function name would make it obvious what it was counting. One consequence is that variables (and methods) usually get renamed as they leave their context. -- DaveHarris

I think IndexOrCount deserves more discussion. -- JayBazuzi

This appears to a case of being overly rigid. I find the use of "number" in the above example to be perfectly understandable and correct. Other synonyms, such as "count" or "quantity" or even "total" used as a suffix (expression_match_total) provide equivalent meaning for the variable name. Focus on providing a clear name, and avoid worrying about limiting the vocabulary to be used, though brevity may be important. -- WayneMack

This is a subject near to my heart. See As far as having long names, there are lots of ways to work with that. One is to use a good editor. If you use even something as primitive as vi (I use gvim a lot) you have an abbreviation command. You tell it "abb nexpr number_of_expressions", and when you type "nexpr" it expands the word. It is smart enough to wait to make sure the character after 'r' is really a punctuation or whitespace.

Surely the modern IDEs aren't so stupid that they can't do this? If you have an editor that insists on acting like a word processor (mouse-heavy, leave the home row constantly) you can type nexpr, then highlight the function and do a replace so it changes the short name to the long. Or, I suppose, there's the cut-n-paste.

So having more modern (and yet more primitive) tools shouldn't prevent us from having good identifier names. If worse comes to worse, you can always learn VI (gvim is great) or EMACS and you can easily do abbreviations and the like.

-- TimOttinger

(PatternsOfSoftware discusses tools that do this: RichardGabriel seems to like the idea, but in this case might like the editor to also be able to show you the abbreviated form. -- MartinPool)

Even better: writing Java with EclipseIde, you can type the first few letters of an identifier and then press Ctrl-Space for code assist, and it pops up a menu with any matching identifiers (or if there is only one, it just inserts it for you).

It may seem counter intuitive, but if you think about it a while you realize that code with many small methods has a greater amount of context with in each method. Each method has greater focus. With increased context comes shorter names. -- DonWells

Given IdentifiersAreComments and what we know about comments, then: (1) use good identifiers named by intention, and (2) see if you can refactor rather than commenting. If you must use globals, give them good names, but better yet is to factor them out entirely. --MartinPool

Sometimes I go the other direction, like
  double Q = this->maximumPrimaryEnergyUsageForTheMonth();
  double P = this->currentPrice();
  return (P*Q)/pow(1-P*P, P/Q);
  Select, A.balance
  From dbo.t_currentcustomer C, dbo.t_acctreceivables A
  Where = A.customer_id
I'm leaning toward the adage: if "local variable names matter, your method is too long".

-- JohnClonts

What about, instead:
  double energy = this->maximumPrimaryEnergyUsageForTheMonth();
  double price = this->currentPrice();
  return (price*energy)/pow(1-price*price, price/energy);
Hmm, interesting ... I thought I'd prefer the longer names ... but in this case I'm not sure I do. Hmmm. --

There is at least one undiscovered method in the last line. --

The name energy is poorly chosen. Consider using quantity or usage instead. --'re advocating maximumPrimaryUsageUsageForTheMonth? :-)

No, I'm suggesting:
  double quantity = this->maximumPrimaryEnergyUsageForTheMonth();
  double price = this->currentPrice();
  return (price*quantity)/pow(1-price*price, price/quantity);
  double usage = this->maximumPrimaryEnergyUsageForTheMonth();
  double price = this->currentPrice();
  return (price*usage)/pow(1-price*price, price/usage);

Over-long names are more of a problem for reading than for writing, so in my experience editor abbreviation-expansion tricks don't really help. I type pretty fast. -- DaveHarris
I'd like to see a language where we can have WhitespaceInIdentifiers?. I think this has potential for great improvements in readability and even writing speed.

For example, variable identifiers could be composed of one or more capitalized words; class identifiers of one or more all-uppercase words; and method identifiers of one or more all-lowercase words. If a generalized form of the Smalltalk message syntax were adopted - one where parameters may be arbitrarily placed in the method identifier, e.g., "kill Evil Bear brutally with Stick" - I think it could be pretty cool.

-- DanielBrockman

In lisp, you can have those.

Long names are a OnceAndOnlyOnce sin. If a variable is commonly-used, then long names just bloat up the code size and make it harder to read. For example, single lines are more likely to need wrapping if there are lots of long names, and wrapped lines decrease readability of code greatly (in my opinion/experience). For one, they muck up the visual indentation. Generally the more often an identifier is used, the shorter it should be. Give the full name as a comment near where it is first declared. If you forget what it means, then simply find the declaration. Good use of abbreviations makes code cleaner. True, it is an art-form that is hard to get right, but when done right, things sing along. Example:

	double maxMonthUsage;	// maximum primary energy usage for current month
If there are other kinds of monthly usages, such as "secondary" in addition of primary, then here are some options:

	double maxPriMonthUsage;	// maximum primary energy usage for current month

double maxPEUCM; // maximum primary energy usage for current month
Often such are just "slots" to the programmer anyhow. They don't really need to readily know what the difference is between a primary and secondary usage. They just want some kind of unique name to serve as slots to stick stuff. Note that I left "Max" in the name because calculating the max often is something the programmer has to deal with. From experience one knows that there is fairly likely to be a "curPEUCM" ("current") that is used for the summation and/or max-finding algorithm. Of course it is experience that teaches one what to abbreviate and what not to. (Even the experienced get it wrong sometimes. But the benefits of the times it is done right outweigh the losses from bad guessing.)
Suppose I start with a concept like "maximum primary energy usage for current month". Even in the example above, one author apparently proposes that this concept be referenced by multiple variable names:

That author then claims that a long name violates OnceAndOnlyOnce. Come again? I see the three different names for the same thing. From one author. Now consider code with, say, a dozen authors. Sounds like something on the order of 30-40 variations to me. Now combine this proliferation of names with the anemic ability of our current tools to actually find things in code that is even moderately complex, and I get the heebee-jeebees. Finally comes the claim: "But the benefits of the times it is done right outweigh the losses from bad guessing."

Not in my experience. If we can find programmers who can even spell correctly, then there is only ONE thing to search for: "maximum_primary_energy_usage_current_month". I elided the "for". If you only need the "maximum month usage" (and don't care about "primary", "energy" and "current", then call it "maximum_month_usage". If you are in an environment like PythonLanguage, JavaScript, or SmalltalkLanguage, then call them "maximumPrimaryEnergyUsageCurrentMonth" and "maximumMonthUsage".

The point is that when I search code, please don't make me guess at all the possible abbreviations some programmer someplace might come up with. If you do, please don't then argue that this is a good thing. In my opinion, it is not.

cf. "HttpReferer" (sic) property

Also check out AreLongAndDescriptiveRelated


View edit of September 13, 2010 or FindPage with title or text search