Why Are References Hard

The "references" in the title are the kind of "safe pointers" found in languages like Java and Smalltalk, that can only point to valid objects or to Nil/NULL. No undefined behaviour. This discussion was spawned off NoPointers, which is about the trade-offs of allowing undefined behaviour (as is possible from pointer arithmetic in C/C++). Comments on this page probably apply to both kinds of pointer.

This reference/pointer terminology isn't universally agreed, and often people on this page use either word interchangeably.

Incidentally, C++ has a language feature called "references", but that's not specifically what we're talking about.

See also ThreeStarProgrammers and RavioliCode.
"...The name of the song is called 'Haddocks' Eyes'!"

"Oh, that's the name of the song, is it?" Alice said, trying to feel interested.

"No, you don't understand," the Knight said, looking a little vexed. "That's what the name is called. The name really is, 'The Aged Aged Man.'"

"Then I ought to have said "That's what the song is called'?" Alice corrected herself.

"No, you oughtn't: that's quite another thing! The song is called 'Ways and Means': but that's only what it is called you know!"

"Well, what is the song then?" said Alice, who was by this time completely bewildered.

"I was coming to that," the Knight said. "The song really is "A-sitting on a Gate": and the tune's my own invention."

-- Lewis Carroll, "Through the Looking Glass"


What makes pointers difficult for some people is confusion between the pointer and the thing it points at. It's a higher-level version of the reason why some people don't "get" algebra.

So, why is the reference/referent fallacy a hang up for people on Wiki? Surely people here understand the difference between the two. Not "getting" algebra is not acceptable for a mathematician, as not "getting" pointers is not acceptable for programmers. Too fundamental.

Pointers (and indirection in general) are a fundamental computing concept, but you can still get a lot done without them, like the label making program I wrote for my Commodore64 in BASIC. Even if an object in a language like VBScript is actually a reference, it still seems concrete like a number of a string.

Now, I'd want anyone I hire to know pointers (for a C/C++ programmer) or grok indirection techniques in general for any Real Work I needed done. But I can understand why there is a thinking-with-pointers gap in our field. -- AdamVandenberg

The point with pointers is not that they are necessary, but they are not evil or hard or somehow bad to have. And, from JavaDoesntPassByValue, understanding that you are working with pointers and not objects is kind of important. Not that it's in your conscious mind, but awareness is essential.

Anyway, understanding and accepting that there is a deficiency in our discipline's education is a cop out. More evidence towards SoftwareIsReallyPointless, I suppose. :(

Hmmm. I wouldn't say that I'm accepting the deficiency, since it directly affects whether or not I hire someone. Rather than said there is a problem with teaching CS, I'd point out that there are lots of self-taught people doing programming (Database apps in VB, web development, etc.) that get useful work done. When they try to "go to the next level" and do something that is much better done via pointers or clear understanding of references vs objects or whatever, that's where there are problems.

So, pointerless languages are to help the self-taught people? Do well-trained people fear pointers as well?

There are few pointerless languages in the sense we're using here. Java isn't pointerless.

I suspect it is the same reason that many programmers don't understand what a TuringMachine is and what it means, as well as the related "CodeIsData" concept that many people seem to miss.


References aren't pointers. With a reference, you don't know where in memory your object is stored. This deliberate abstraction also makes it easier to:
 1 build distributed systems which reference remote objects, 
 2 manage memory (i.e. shift blocks of memory around to free up contiguous space)
 3 thwart security hacks that use pointers to write into certain specific memory locations
 4 serialize objects in databases or in other representations that aren't just memory locations

... Sun really should have chosen a better name than "NullPointerException" (!).

--

With pointers, you don't know where in memory your object is stored either. At least in modern contexts. The virtual memory system just makes the pointer an index into its RubeGoldbergMachine. The difference between references and pointers is that references are guaranteed to point to a live object, except when they don't. Hence the NullPointerException. -- SunirShah
Pointers (or references) are the data equivalent of the goto statement.

They are especially difficult in conjunction with mutable state because of aliasing issues. They allow assignments to have side-effects on data at arbitrary distances. As always, re-reading The StructureAndInterpretationOfComputerPrograms will give you some insight into the issues.

I think they are pretty "hard" anyway. As I recall, I first understood them only by drawing diagrams. I suspect some people are not good enough at learning to know to draw diagrams. -- DaveHarris
Please list here the 5 basic computer programming concepts which you consider to be "harder" than pointers/references. I'll give you a few to start with: I count 6 concepts... are you sure one of them shouldn't be "off by one errors"?
Restricting this to theory, clearly from this page [or NoPointers], (Universal) TuringMachines, TuringCompleteness, and VonNeumannArchitectures aren't understood. I would also suggest code is data is code isn't understood (well, this is the TuringMachine bit), or isomorphisms in general. Most programmers don't understand the ScientificMethod either. Or information hiding and abstraction (especially equally descriptive yet disjoint ontological systems). Or even basic data structures. Or OccamsRazor. Or the local optimization fallacy (whatever its called) - i.e. simple hill-climbing doesn't find the highest peak. I know that most programmers don't know classical logic, even in mathematical contexts.

I would suggest that it's because of this that pointers are bad. But I would also conclude that programming is too difficult for programmers. SoftwareIsReallyPointless, after all (especially if it is pointerless). -- SunirShah, DevilsAdvocate
But pointers are necessary for "the first rule of programming": any problem can be solved by an additional level of indirection. -- anon.
I think they are also when they are used to build recursive data structures, because recursion is quite hard. In some languages (with lazy evaluation), you can build infinite data structures, which again can be a stumbling block.
This page confuses "pointers" with "addresses". A pointer is an indirect reference to some typed data. It is therefore a language-level concept, and is exactly what Java calls a "reference". Most languages, apart from C and C++, do not allow a pointer to be used in any way like an address. Most of the difficulty with pointers in C and C++ come from the way that the language allows a pointer to be implicitly used as an address or an array iterator and vice versa.

An address is an untyped reference into the address of the process. It is therefore a machine-level concept. Sensible languages such as those in the Modula family, for example, separate the concepts of "pointer" and "address". That does not mean that these languages stop one from achieving what one can write in C. Rather, the programmer is forced by the language and runtime libraries to make conversions between pointers and addresses explicit, thereby making programs easier to understand and maintain.

This page is about pointers, not addresses. It is about the problems of Java and Smalltalk rather than C and C++. I tried to make that clear in the opening remarks. Is there a way I could make it clearer?

Suggestion: by deleting or abstracting elsewhere the section that starts with some guy saying "References aren't pointers.", and contributions up to the next horizontal rule. The rest of it is pretty unambiguous, I think.

Suggestion #2: make a new page called ReferencesInJavaAndSmalltalkAreHard? and do some refactoring!

I think you mean ReferencesInJavaAndSmalltalkAndLispAndSchemeAndPascalAndAlmostAnythingElseThatsNotCeeOrCeePlusPlus?

If anyone creates a page with that title, I'll scream. Having to add the rider "but not the C++ stuff" every single time we use the word reference is like, uh, having to add the rider "but not the XP stuff" whenever we use the term unit test.

Suggestion #3: Edit comments like: "References aren't pointers. With a reference, you don't know where in memory your object is stored.". Pointers and References are the same thing! It is only in languages which confuse/mix the concepts of pointer/reference and address that you can determine where in (virtual) memory an object is stored from the value of a pointer to that object.

Suggestion #4: Put a statement right at the start of the page stating that what Java and Smalltalk call "references" are the same as what other languages (esp. those in the Algol/Pascal family) call "pointers".

EditText of this page (last edited April 10, 2012) or FindPage with title or text search