Definition Of Homoiconic Discussion

[Moved from DefinitionOfHomoiconic]

The above statement, and the referenced page, are both controversial. I believe further that it is false that there is an issue due to language evolution, since Lisp is considered universally to be (still) one of the best examples of a homoiconic language, yet its homoiconic core has not changed since around 1960, a little before the term "homoiconic" was even coined.

What is clear is that the Lisp approach to homoiconicity is stronger in multiple ways than the languages that approach homoiconicity via raw string evaluation, whether this means that the definition of "homoiconic" should be changed or not.

I (ATS) am not sure about that. Can you manipulate Java code in the same way as when you're writing it? I dunno, but when writing it, I don't manipulate byte arrays. OTOH, see the bit of code I proposed on MyNaiveAttemptAtUnderstandingHomoiconicity. If that would work, I'd say it was in (though not everybody will agree with that either).

When I write LISP code, I typically do so by manipulating text in a text editor, not by issuing CAR and CDR commands. So there is some difference between my editing and runtime manipulation of LISP code. -- jtg

TentativeSummary: We have several proposed definitions: Java is clearly not homoiconic. Java byte code arrays represent the machine language of the Java virtual machine, which is a different language, not Java. Even a cursory glance at the definitions of Java byte codes makes this clear. In contrast, Lisp can manipulate nested lists of symbols and literal data that represent Lisp source code directly. In other words, what Java allows you to manipulate is a representation of the output of the Java compiler, whereas Lisp allows you to work with data that represents the input to the compiler or interpreter. (It's also the output of the reader. The reader, being the front-end of the interpretation/compilation process, is hidden as an implementation detail in most languages, but is distinctly identified and accessible in Lisp languages.) TCL's strings also have a very direct relationship to TCL source code, albeit at a lower level of abstraction. -- DanMuller
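To make the "output of the reader" point concrete, here is a minimal sketch in Python rather than Lisp. The names read and rename_symbol are hypothetical, and Python lists merely stand in for Lisp list structure; this is not how any real Lisp reader is implemented. It only shows that once the text has been read, the program is holding nested lists of symbols and literals, and can transform them as ordinary data.

```python
def read(text):
    """Parse one parenthesized expression into nested Python lists."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def parse(pos):
        token = tokens[pos]
        if token == "(":
            items = []
            pos += 1
            while tokens[pos] != ")":
                item, pos = parse(pos)
                items.append(item)
            return items, pos + 1  # skip the closing paren
        try:
            return int(token), pos + 1  # literal integer
        except ValueError:
            return token, pos + 1  # symbol, kept as a plain string

    form, _ = parse(0)
    return form


def rename_symbol(form, old, new):
    """Return a copy of the form with every symbol `old` replaced by `new`."""
    if isinstance(form, list):
        return [rename_symbol(item, old, new) for item in form]
    return new if form == old else form


source = "(setq i (* (+ i 1) 2))"
form = read(source)  # nested lists mirroring the nesting of the source
renamed = rename_symbol(form, "i", "counter")
```

The transformation never touches characters or byte offsets. It walks the same nesting that the person who wrote the source text saw.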

[Java lets you manipulate the input to the interpreter. (Byte code is typically compiled to machine code at run time these days. But that's a hidden transparent implementation detail.) -- JeffGrigg]

Alas, Java also lets you "manipulate nested lists of symbols and literal data that represent (Java) source code directly." You can do that with Strings out of the box or add a library like ANTLR and do that with ASTs of statements, expressions, etc. Also, a more detailed study of the definitions and intent of Java byte codes reveals that they are little more than an AST of Java source code. The primary design goal was to provide a way to feed Java source to an optimizing compiler at run-time. -- EH

Example disassembly from

  iconst_0      // 03
  istore_0      // 3b
  iinc 0, 1     // 84 00 01
  iload_0       // 1a
  iconst_2      // 05
  imul          // 68
  istore_0      // 3b
  goto -7       // a7 ff f9

This doesn't look much like Java source code. Any nesting structure is necessarily lost when moving to a linear byte code array; it's only very indirectly implied by branching instructions. Individual loads and stores have no direct analogue in the original source code. There's really no comparison, IMO. The ability to manipulate strings of source code is not in and of itself an indication of homoiconicity, as has already been discussed ad nauseam. -- DanM
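A rough sketch of what "moving to a linear byte code array" does to nesting, again in Python. The function flatten and the instruction names push and load are made up for illustration; they are not JVM opcodes, and real compilers do considerably more than this.

```python
def flatten(form):
    """Compile a nested arithmetic form into a flat list of stack instructions."""
    if isinstance(form, int):
        return [("push", form)]  # literal operand
    if isinstance(form, str):
        return [("load", form)]  # variable reference
    operator, left, right = form
    # Operands first, operator last: the nesting becomes mere ordering.
    return flatten(left) + flatten(right) + [(operator,)]


nested = ["*", ["+", "i", 1], 2]
linear = flatten(nested)
```

In nested, the fact that the addition is an operand of the multiplication is explicit in the data. In linear, that relationship is only implied by the order of the instructions, and a tool has to simulate the stack to reconstruct it.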

FYI: There are decompilers for Java (e.g. JAD) that can recover Java code from Java byte code. Not that I think that makes Java homoiconic. To forestall some further arguments: arguing that the formatting or even the exact ordering of branches is not the same doesn't count. The formatting is lost with Lisp too (e.g. whitespace between tokens), and ordering could be considered part of the presentation or the result of an implied normalization step.

I am not arguing that formatting is significant, but nesting structure is, and I would argue that ordering may be. Decompilers are not news, but they don't generally produce output that closely resembles the original source code in terms of variable names and nesting structure. (Although they should certainly produce something that produces equivalent side-effects and results.) They are not a good replacement for homoiconicity, because it's much harder to write code to manipulate code-as-data when that data doesn't have the form that you expect (i.e., the form in which it was originally written). Even reordering code in computationally insignificant ways would complicate such tasks. Consider Lisp macros that receive fragments of code written by a user according to the input requirements of the macro; if the data that the macro operates on isn't an accurate reflection of the original code/data, life can become immediately much more difficult.
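As an illustration of why a macro wants its input in the form the user wrote, here is a small Python sketch of a macro-style rewrite. The function expand_unless and the forms unless, if, not and progn are used purely as an example; this is not real Lisp macro machinery, just a function over nested lists.

```python
def expand_unless(form):
    """Rewrite (unless cond body...) into (if (not cond) (progn body...))."""
    if not isinstance(form, list) or not form:
        return form  # atoms and empty lists pass through unchanged
    expanded = [expand_unless(item) for item in form]
    if expanded[0] == "unless":
        _, condition, *body = expanded
        return ["if", ["not", condition], ["progn", *body]]
    return expanded


user_code = ["unless", ["ready"], ["print", "waiting"], ["sleep", 1]]
expanded = expand_unless(user_code)
```

The rewrite is a few lines only because the input still has the shape "head symbol, condition, body forms". If the same code arrived reordered, or flattened into loads, stores and branches, the function would first have to rediscover that shape before it could do anything useful.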

Earlier on this page, someone makes the argument that TCL's strings are as unstructured as byte codes. Although on the face of it this is true, it's also true that the structure of the source code that they represent is encoded in them absolutely unambiguously, which is not true of Java byte codes. -- DanM

Actually, byte codes are a lot more structured than TCL strings. They don't preserve all the structure, but they preserve the structure that matters for practical applications, which accounts for the success of projects like AspectWerkz.

That's wonderful, but it's not evidence of homoiconicity. The structure that's important to this definition is the structure of the source code, obviously. Homo iconic -- same representation. TCL uses the same representation at the level of characters. Lisp does so at the level of list structure and atoms, which are the explicit elements of the language, one step up in abstraction from characters. There is no such defined one-to-one correspondence between elements of Java source code and the elements of Java byte code arrays. (Also, are you implying that Lisp macros which operate on their input code structures are not practical applications? Two generations of Lisp programmers would tend to disagree.) -- DanM

Most of the AOP libraries only do point cuts to the granularity of method calls. Some, like AspectWerkz, can also PointCut field get and set references. That, and catching calls to methods, are ways to "reach into" and modify the code inside a method. But is this enough to argue that Java byte code in individual method definitions has "sufficient structure"? (AOP tools probably could point cut at a more granular level, but it hasn't been a widely desired feature. At least not yet.) -- jtg

How about this: A language is homoiconic if treating code as data (or vice versa) is Good Style, and is heteroiconic if treating code as data (or vice versa) is either not possible or is a Gross Hack.

So LISP, with its macros, is clearly homoiconic, but anything where you have to invoke a compiler via system() is heteroiconic. Languages with string-based eval might fall into either category, depending on whether the use of eval is a fundamental language feature or a bolted-on kludge.

But one man's fundamental feature is another man's bolted-on kludge. I can think of one language where all of its fundamental features are bolted-on kludges.


Last edited November 8, 2005