Syntactically Significant Whitespace Considered Harmful

Is syntactically significant whitespace harmful? (A.K.A. "The Great Tab Fight")

Should this page be whittled down to a stub stating the opinion, and the content refactored over to the PythonWhiteSpaceDiscussion? [[No, because significant whitespace is also used in Haskell, Occam, YAML, and merd. And Cache COS]]

In many programming languages (C, C++, Lisp, Java, Forth, etc., as well as HTML, ...), the presence or absence of whitespace is syntactically significant because whitespace is the token separator (otherwise "foo bar" would mean the same thing as "foobar"). (Lisp uses spaces to separate items in lists, while other languages often use commas.) However, the various flavors of whitespace (tab, space, return, newline, CR/LF) and various amounts of space are all treated identically as "some whitespace". One space, three spaces, two linefeeds, a tab: no significant difference between them.
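
A quick way to see the "some whitespace" rule is Python's `str.split()` with no arguments, which splits on any run of whitespace. This is only a stand-in for a real lexer, and the sample strings are made up for illustration:

```python
# Different kinds and amounts of whitespace between the same two tokens.
samples = [
    "foo bar",
    "foo   bar",
    "foo\tbar",
    "foo\n\nbar",
    "foo\r\nbar",
]

# str.split() with no argument treats any run of whitespace as one separator.
token_lists = [sample.split() for sample in samples]

# With no whitespace at all, the text is a single, different token.
joined = "foobar".split()

print(token_lists)
print(joined)
```

Every sample yields the same two tokens, while "foobar" stays one token: presence matters, kind and amount do not.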


One must wonder why whitespace is always used as a delimiter. Using whitespace as a delimiter prevents users from simply including spaces in their names, resulting in the CamelCase and EmbeddedUnderscore holy war. Consider that, in C, spaces are only used for type/scope information. If the language had opted for a more Pascal-like approach of using colons, then the whole HolyWar could have been avoided. While

 global, const, string: null pointer exception message = "A null pointer exception has occurred"
might look foreign to our eyes, it would have avoided some nasty style problems, especially since space-delimiting seems to have propagated throughout Unix, making compatibility with win32 (which has no qualms about spaces in filenames, SQL tables, etc.) a hassle.

Since UNIX predates win32, why should it not have fallen to win32 to be compatible with UNIX?

Because, by that logic, win32 is already compatible with UNIX; UNIX is just not compatible with all the options presented by win32.


The reasoning is that white space is for formatting, and should not be used for logic. [[This is conjecture, not reasoning.]]

However, some programming languages are ... shall we say ... different. They treat different kinds or amounts of whitespace differently.

But FORTRAN wants certain things in specific columns, right? Python just requires you to add at least another space for every deeper nested block, and to be consistent. The same is true for a number of languages, and programmers are typically required to write code like that by project standards etc., even if the language doesn't require it.

The Fortran language has developed substantially over the years, its evolution defined through a set of standards (F77, F90, F95 and most recently F2003). The Fortran 90 standard introduced a new "free format" for source code which does not give syntactic meaning to whitespace. The fixed format was made an obsolescent feature of the Fortran language with the introduction of the Fortran 95 standard. The comments on FORTRAN above are thus valid for Fortran 77 and earlier versions.

People who have actually tried programming in Python rarely seem to dislike this (even if it happens). It's easy to get used to; see the Eric S. Raymond quote in http://www.thinkware.se/cgi-bin/thinki.cgi/PythonQuotes (quoting the interview at http://www.linuxjournal.com/article.php?sid=3882 ). "Oddly enough, Python's use of whitespace stopped feeling unnatural after about twenty minutes. I just indented code, pretty much as I would have done in a C program anyway, and it worked."



As a Pythonista I am happy with *leading* whitespace, but significant *TRAILING* whitespace? That’s something so stupid nobody would ever suggest it, right? Right? Enter MarkDown.

http://daringfireball.net/projects/markdown/syntax#p: “When you do want to insert a <br /> break tag using Markdown, you end a line with two or more spaces, then type return.”

Oh boy! If AaronSwartz (RIP) really agreed with this, then my otherwise very high regard for him falls down like a rock.


I find syntactically significant whitespace annoying. The presence or absence of whitespace may be significant (e.g. to separate two identifiers that would otherwise lex as one long identifier), but the kind and amount should not matter. I realize that when I say should not I am expressing a personal preference, not a moral imperative or a law of nature, but that's how I feel. Also, I feel that syntactically significant whitespace is related to the HotComments AntiPattern (comments should be like whitespace, and neither should be significant). I guess I'm saying that I feel syntactically significant whitespace is bad for the same reason that HotComments is an AntiPattern. If you feel that HotComments are bad but SSWS is OK, please explain why. Thanks. -- CameronSmith

HotComments are even worse than SSWS because they A) change how the language is supposed to work, and B) slowify the interpretation/compilation process.
One reason is that whitespace has no defined width, making whitespace significant programs unreadable to humans under some circumstances. For example, the text of a Python program cannot be CutAndPasted out of the browser window into a text file and then run or compiled. However, one can always pop open a source window and cut & paste the text out of that, preserving formatting. And most websites with code samples wrap them in a <pre> tag anyway, which can be cut/paste out of at will.

But will this result in code indented with spaces or tabs, and how many of each?


Syntactically significant indentation eliminates the following problem:
 if (condition);
     code;    // oops, always executed

But that is what that code is supposed to do, so it is not really a problem but a typo.

Any decent programming editor will solve that problem for you just as well. However, syntactical whitespace creates other problems; for example cut and paste is broken across indentation levels, and you may be stuck with an obnoxious whitespace convention, etc.

Any decent programming editor will solve that for you just as well. ;)

No it won't; in fact it can't. There is no way for a computer to tell how Python code is supposed to be indented once it's been mangled in some way. This is the reason curly braces exist.

Yes, it can, and if you tell it to, it will. Please try this in Emacs: copy a region of code (M-w), paste it somewhere (C-y) and type M-x py-indent-region. Done. It would be trivial to define a key that automatically called both yank and py-indent-region. Maybe there's one defined by default, but I'm too lazy to find that out now. -- MatthiasBenkard

A last-minute program had to be sent by telex. The telex operator had been trained to minimize whitespace... oops!

What is telex?

It's a telegraph with a keyboard, typically with some kind of lock, I think. They were used at least as late as the nineties, since they were considered more secure than fax etc. As far as I know, telex messages are often considered legally binding, while a fax isn't, since you can't really be sure of who sent it. I'd never trust a telex operator to transmit any piece of code though, whitespace-sensitive language or not.

If the telex argument is the strongest argument against syntactically significant whitespace, I can live with that...

Isn't syntactically significant whitespace a violation of SeparationOfConcerns, in that it mixes the business logic of language syntax with the presentation logic of how the user, i.e. programmer, sees it? Indentation can make code more readable, but it's not the only way to do so. You could unindent the block, change color, change font, whatever. The argument could be made that not all hardware supports these methods, but then I would refer to the telex argument above, and I would point out that a piece of paper doesn't have discrete whitespace on it (except maybe graph paper) it just has space.

Unlike colour and font, indentation is pretty well technology neutral - basically every modern computer system and editor supports it. Your other suggestion, unindentation, is bad as it runs counter to what indentation means to any programmer (regardless of whether said indentation is syntactically significant). A piece of paper is irrelevant to the argument since Python's syntactic whitespace is really as discrete as leading whitespace on a piece of paper - it's purely a relative comparison - "Is this line at the same level as the previous one, or is it indented, or is it dedented?". Indentation is what I'd use anyway (unless I'm writing ObfuscatedPython) and - as I've not had any trouble with indentation errors in the last N years - syntactically significant whitespace is IMO a GoodThing. It's semantically significant whitespace (where "f -1" can be different from "f-1") that's evil.

Yes, that's precisely the point. Syntactically significant whitespace does not violate SeparationOfConcerns any more than does any other part of syntax. Why should syntax be a part of the language at all? Someone might like to view control structures graphically, for example. If you follow that path, though, you end up with Lisp, which is probably the most syntax-free language in existence.

But speculation about having a good editor that makes programming easier by showing your program in such an unorthodox form is just that: speculation. Until someone writes such a thing (probably for Lisp then, huh?), I'll stick to the language that's easiest to read and write as-is: Python :) -- MatthiasBenkard

Telex is a comms system that uses a 5-bit Baudot code, with shift characters to enlarge the character space. Since you know the originator station of a message, and can verify that by answer-back, it's regarded as secure and things like money transfer instructions & SWIFT messages (which are also about money transfer) can be sent over it. Various alphabets exist for various applications - stock ones with 3/8 symbols are an example. Alphabet 5 was for teletypes with 7 bits of data and eventually became the AsciiCode. Telex is rather slow - sometimes as low as 10 characters per second (in order to help message reception - note that there is no inbuilt error detection), so brevity of messages can become important on crowded networks. Telex still gets used a lot, and there are a lot of very freaky/scary hybrid telex/IP solutions for companies to move data between networks and things. In particular, it's still used for inter-governmental communications, so the telex network is extremely well maintained. The famed "hotline" between Moscow & Washington was, for most of its history, a telex line going through neutral intermediaries.

-- KatieLucas (Who hasn't got any work to do right now..)

In Java, refactoring or other changes occasionally leave me hunting for a missing/superfluous bracket. Semantic "bracketing" errors are rare, but possible. On the other hand, it has never happened to me that I indented code in a way that didn't match what I meant.

If SyntacticallySignificantWhitespaceConsideredHarmful then the PlbLanguage is deadly.

One thing not to like about Python's whitespace usage is that there is syntactic fluff that should not be required. For instance, when writing an if statement or a function definition, you must end the line with a colon. Since the whitespace is significant, it seems that this would not be necessary. I read somewhere that people new to programming found this to be more natural, but it's a constant source of annoyance when I program in Python. I can never remember to add those colons since they seem to serve no purpose.

The colons are an extra visual clue to scanning the blocks. As such, they improve readability, one of Python's main strengths.

Then should opening and closing braces be called an "extra visual clue to scanning the block" in C++ and Java, because they improve readability? Braces ({) would then be the main strength of C++ and Java.

As an aside to any visual clues, the colons make parsing significantly easier; in an if statement, the colon obviously terminates the condition.

Can you still write ugly code with Python like this (sorry, I have only learned a bit of Python)? Does this generate an error?

 if (condition):
    statement1 """not 'if' """
       statement2
    statement3
        statement4
Do all of them get executed and still look ugly? Or do I get a compile error?

You get a compile error on the third line (statement2). Python is pretty tolerant of weird indentation, but the indentation must be consistent. The third line is indented further than the second line, which is only appropriate if you're beginning a new block.
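
The answer above can be checked without running any statements by handing a snippet to the built-in `compile()`. This is a minimal sketch; the four-line source string is made up for illustration and only mirrors the shape of the question (a line indented further than the previous one without opening a block):

```python
# Hypothetical snippet: line 3 is indented further than line 2,
# even though line 2 does not open a new block.
source = (
    "if True:\n"
    "    x = 1\n"
    "        y = 2\n"
    "    z = 3\n"
)

try:
    compile(source, "<example>", "exec")
    error = None
except IndentationError as exc:
    error = exc

if error is not None:
    # The exception carries the offending line number and a short message.
    print(type(error).__name__, "on line", error.lineno, "-", error.msg)
```

The exception type names indentation as the cause and `lineno` points at the over-indented line.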

That's part of my main complaint about Python - the interpreter doesn't hint that indentation might be the cause. I've spent too much time staring at a line and digging through my references before realizing it's an extra space that's stopping things. -- PeteHardie

Try using Python with tab-checking then. Just enable the "-tt" switch, like so: "python -tt mymodule.py" -- MichaelA
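
As far as I know, the `-tt` switch dates from Python 2. Python 3 rejects inconsistent mixing of tabs and spaces without any switch, raising `TabError`. A minimal sketch using the built-in `compile()`, with a made-up three-line snippet:

```python
# Hypothetical snippet: the two body lines look aligned with 8-column tabs,
# but one uses a tab and the other uses eight spaces.
source = (
    "if True:\n"
    "\tx = 1\n"
    "        y = 2\n"
)

try:
    compile(source, "<example>", "exec")
    error = None
except TabError as exc:
    error = exc

if error is not None:
    print(type(error).__name__, "-", error.msg)
```

`TabError` is a subclass of `IndentationError`, so code that already catches the latter will see it too.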


Does this topic also include syntactic whitespace requirements, in the alternate form of mandatory brace-placement, like that imposed by TclLanguage? As a result of Tcl's command-line interpreted heritage, code blocks must be started on the same line as the opening if or while statement, so you are locked into a K&R-type bracing style.

Not entirely correct for Tcl, but the work-around is not really pleasing either:

 if {$foo == $bar}   {
    set foo $blah
 }   else  {
    set bar blah
 }
In this example, you can escape the newline character with a backslash, if you really want to insist on breaking Tcl's inherent K&R style. Most Tcl programmers wouldn't recommend that approach, however.

While I was a bit dubious about Python's trade-off of indent levels for braces, the coding process and resulting code actually end up looking and feeling pretty natural. But also as a sometime project lead, it has been physically painful to have otherwise reasonable developers dig in their heels on WhereDoTheBracesGo? Just doing away with the question altogether would put a Python development team 6 weeks ahead of a C++ team. -- PaulMcGuire

I think I would agree with the above. Perhaps we have been given too much flexibility to handle. A long, long time ago, I started programming in FORTRAN using punch cards. There were no debates about which column to use; there was one choice or the computer rejected the card. Now the compiler gives us more options and is less likely to reject a line of code for formatting reasons. Instead, we now have Al wanting to dictate how Bill should write his code. I think for my next project I will hire a WWF wrestler and his job will be to smack any programmer upside the head if he complains about brace placement, TABs, or any other style issue as used by another programmer.

Just to be accurate: Tcl's limitation, if you want to deem it that, is due to the fact that blocks are simply arguments to a command, and the braces quote it. In theory you could give those blocks without the braces, if you escape properly. In Tcl EverythingIsAString and EverythingIsACommand —BySetok

Most people using significant-whitespace languages seem to like it. Few don't. Actually, I have some problems with them. You can't paste code on most internet forums because it will be screwed up. If your language allows you to use tabs as syntax elements (Python does, sadly), your code will be impossible to understand for someone not using your same tabwidth. It shouldn't be; Python only requires that one tab is the same width as one tab and two tabs is wider than one tab. You can use 16 spaces for the first tab and one space for every tab after it and the interpreter won't mind. [Huh? Python interprets tab as "indent to the next multiple of eight". When editing Python, use eight-space tabs and you won't have "someone not using your same tabwidth".] You can't (I think) have an expression-oriented language with postfix operators; at least it will not be easy to parse. You can't simply change an algorithm by commenting out a piece of code, because it could break the whole program. Sometimes you just don't want to indent a five-line hack.

The indentation feature is visually pleasing but making it required leads to some annoyances that are easily handled with a simple disambiguating end-block keyword whilst not making it any less readable or visually appealing. My personal pet peeves:

example (Python):

 if foo:
    statement 1
    if bar:
        statement 2
versus:

 if foo:
    statement 1
 if bar:
    statement 2
The lack of block-closing delimiters in Python means you have to keep in your head that the second if statement is NOT UNDERNEATH the first if you were cutting and pasting, or if it got misaligned while editing and you were fixing it.
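
To make the difference concrete, here is a small sketch of the two layouts as functions. The names `nested` and `sequential` are invented for this illustration, and the "statements" simply record that they ran:

```python
def nested(foo, bar):
    """Second 'if' sits inside the first: it only runs when foo is true."""
    ran = []
    if foo:
        ran.append("statement 1")
        if bar:
            ran.append("statement 2")
    return ran


def sequential(foo, bar):
    """Second 'if' is dedented: it runs regardless of foo."""
    ran = []
    if foo:
        ran.append("statement 1")
    if bar:
        ran.append("statement 2")
    return ran


print(nested(False, True))
print(sequential(False, True))
```

With `foo` false and `bar` true the two versions disagree, and the only textual difference between them is four columns of leading whitespace on two lines.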

[I use a text editor with a macro that can comment/uncomment all the lines in the block. A block comment would be nice but I almost prefer the line-only form at this point. The extra visual feedback that this whole chunk of code is a noop is worth it to me. One problem with Python is that block comment/uncomment can end up messing up your indentation, which you then need to fix and refix. This again is easy with an intelligent editor, but is annoying. -- ChrisMellon] The visual clue is less needed if you use an editor with SyntaxColoring. There you go again, depending on your editor. It is not unpleasant to edit non-WSS code in TextEdit/Notepad/Leafpad or whatever. I'd almost rather rewrite the thing if I had to do that with Python.

I've become a Python fan, I do think it's pretty, and I am accustomed to it; but in the process of writing scripts, syntactically significant indentation, especially without block-closing delimiters, can be annoying and even hinder the coding process.
 if condition:
   consequent
 #
 else:
   alternative
 #
 another_statement
 if condition:
   consequent
   another
would be identical to:

  if condition { consequent; another }
and equivalent to:

  if condition {
    consequent;
    another
  }

After pasting in code, or otherwise heavily mangling code in a braces-language, if your editor supports reformatting of the entire document or, perhaps more intelligently, of the current scope, this can make the code formatted and indented correctly based on understanding of the braces. This is quick and easy and often highly effective.
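
A minimal sketch of why this works for a braces-language: the nesting depth can be recomputed from the braces alone, so the original leading whitespace can be thrown away. The function name `reindent_braces` is invented here, and the sketch assumes braces never occur inside strings or comments, which a real editor would have to handle:

```python
def reindent_braces(source, indent="    "):
    """Re-indent brace-delimited code using only the braces to find depth.

    Simplifying assumption: no braces inside string literals or comments.
    """
    depth = 0
    result = []
    for raw_line in source.splitlines():
        line = raw_line.strip()  # discard whatever indentation survived
        if not line:
            result.append("")
            continue
        # A line that starts with a closing brace is shown one level out.
        shown_depth = depth - 1 if line.startswith("}") else depth
        result.append(indent * max(shown_depth, 0) + line)
        # Update the depth for the following lines.
        depth = max(depth + line.count("{") - line.count("}"), 0)
    return "\n".join(result)


mangled = "if (a) {\nfoo();\n      if (b) {\n bar();\n}\n   }\nbaz();"
print(reindent_braces(mangled))
```

The step that makes this possible is `strip()`: for Python source the stripped whitespace is exactly the information that says where blocks end, so the same trick has nothing left to work from.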

With a syntactically-significant-whitespace-language, is it possible for an editor to do this to the same extent? Unless I'm much mistaken, it's not (you can't calculate in the general case the correct level of indentation for the current line), which means that you're in a mess if your cutting, refactoring, moving and general work on the program has not preserved indentation or line breaks.

Do Python programmers find this is a problem?

Copying code from another source file would not be 'heavily mangling'. You only need to take care of the outer indentation. Copying code from e-mails or HTML can be a problem, so links to source files should be used.

My initial reaction to Python's whitespace was mixed. I really liked the fact that it forces you to show some of the structure in the source text. But I was a COBOL programmer, and can remember the disasters when a box of (2000) cards was dropped. To rebuild the Py source would be very difficult.

I had to use punched-cards once for an IBM-360 assembler class because they didn't have a communications line hooked up to the off-site mainframe at the time. We would put sequence numbers, similar to BASIC line numbers, in the far-right of the comments column to deal just with such an event. Every now and then, the operators would indeed drop the cards and they came back out of order. And don't get me started about slightly misaligned holes. Talk about HeisenBugs. Cards sucked for many reasons.

Those sequence numbers aided non-interactive editing as well.


Syntactically significant whitespace in programming languages was first proposed for ISWIM by PeterLandin, in 1966. See <http://www.cs.utah.edu/~wilson/compilers/old/papers/p157-landin.pdf>, which has an interesting discussion on this subject near the end.
Then there's WhitespaceLanguage, in which only the whitespace has syntactic significance...
And even C has the worst kind of syntactically significant whitespace - syntactically significant whitespace at the end of a line! Backslash space newline... Why is the compiler complaining?

The compiler just shows a warning, not a compile error, and then it is just one character per source file instead of always having to depend on whitespace.

Designing language semantics around a copy/paste style of programming seems counter-productive. The benefits of readability and eliminating excess scrolling would seem to outweigh the short-lived frustrations of pasting in code (which should rarely be done in the first place).

The real problem is not that space is syntactically significant (it is so even in other languages, except perhaps very old ones like Fortran 77); it is that indentation (which you create with whitespace) identifies blocks of code. This is very annoying and dangerous. I had a hard time trying to post to or grab from forums and the like, since if something goes wrong with the spaces, the code can break. Then what happens if you mix tabs and spaces (can you?), and an editor changes a tab to 8 spaces, for example? Anyway, block recognition should not be just a visual matter; visible begin-end delimiters are welcome. -- MauroPanigada

I think the creator of Merd got it (mostly) right -- our brains impose structure on code according to the whitespace in it, so the compiler should use that whitespace in the same ways that our brains do. A significant problem in programming languages is operator precedence -- quirks and differences between languages have left many of us not trusting the rules for operator precedence, or that the reader/writer understand them. Why not say that all operators have the same precedence but whitespace is significant? For example, in

   func1
       func2 a+1 * b-1
       func2 a-1 / b+1

we immediately interpret this, because of the whitespace, to mean

 ( func1
     ( func2 (a+1) * (b-1) )
     ( func2 (a-1) / (b+1) ) )

Our brains intuit this structure naturally; why have rules that create a different interpretation? For generated code that you don't expect people to read, just throw parentheses in everywhere, but when writing for people, whitespace rules.
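
As a rough illustration of the idea (not Merd's actual grammar, which is not described here), the sketch below treats anything written without internal spaces as a unit that binds first. The function name `group_by_spacing` and the sample values of `a` and `b` are invented for the example:

```python
import re


def group_by_spacing(expression):
    """Parenthesise tightly written sub-expressions: 'a+1 * b-1' -> '(a+1) * (b-1)'."""
    grouped = []
    for chunk in expression.split():
        # An operator squeezed between two operands was written without
        # spaces, so under the whitespace rule it binds first.
        if re.search(r"\w[-+*/]\w", chunk):
            grouped.append("(" + chunk + ")")
        else:
            grouped.append(chunk)
    return " ".join(grouped)


values = {"a": 2, "b": 5}
spaced = "a+1 * b-1"

# Conventional precedence reads this as a + (1 * b) - 1.
conventional_value = eval(spaced, dict(values))
# The whitespace reading is (a + 1) * (b - 1).
whitespace_value = eval(group_by_spacing(spaced), dict(values))

print(group_by_spacing(spaced), conventional_value, whitespace_value)
```

The two readings give different results for the same text, which is the mismatch between what the eye sees and what the precedence rules say.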

Also, I personally abhor the use of commas as separators; who among us has never rearranged a list like
    foo,
    bar,
    zot,
    blit
and either forgotten to remove the comma on the new last line or to add a comma to the old last line? Whitespace all the way, please, with a dash of parentheses.
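
Python, for what it is worth, softens one half of this and sharpens the other: a trailing comma after the last item is allowed, but a forgotten comma between two string literals silently joins them. A small sketch using the names from the list above:

```python
# A trailing comma is legal, so lines can be reordered without touching commas.
with_trailing_comma = [
    "foo",
    "bar",
    "zot",
    "blit",
]

# "blit" was moved up from the end and its comma was forgotten:
# adjacent string literals are concatenated, with no error reported.
reordered_missing_comma = [
    "foo",
    "blit"
    "bar",
    "zot",
]

print(with_trailing_comma)
print(reordered_missing_comma)
```

The second list has three items, not four, and nothing complains.
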
"Silence remains, inescapably, a form of speech." -- Susan Sontag

Is not whitespace a written form of silence?

Only if thought and speech are equivalent. I'm not sure they are. Otherwise all written language is a form of silence and whitespace is written gesture.
"Generalizing Overloading for C++2000," by Bjarne Stroustrup: http://www.research.att.com/~bs/whitespace98.pdf


If there is much indentation in the source and the displayed line length in the editor is limited, the code viewer has to wrap and the block display ends up containing mostly blank lines. If the code processor answers this problem by enabling horizontal scrolling instead, the result is equally dramatic, in that I cannot see the whole of the neighbouring lines, only how they begin or end, so I really do not know what is going on.

Both options produce unreadable results. I even remember resorting to a preprocessor for Python that would generate the required indentation for each line from an explicit indentation number at the beginning of each line.
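
Such a preprocessor can be sketched in a few lines. The input format assumed here (a nesting level, one space, then the code) and the name `expand_indentation` are guesses made for illustration, not the format of the tool mentioned above:

```python
def expand_indentation(numbered_source, indent="    "):
    """Turn lines like '2 return -1' into lines indented two levels deep.

    Assumed format for this sketch: '<level> <code>' on every non-blank line.
    """
    lines = []
    for raw_line in numbered_source.splitlines():
        if not raw_line.strip():
            lines.append("")
            continue
        level, _, code = raw_line.partition(" ")
        lines.append(indent * int(level) + code)
    return "\n".join(lines)


numbered = "0 def sign(x):\n1 if x < 0:\n2 return -1\n1 return 1"
python_source = expand_indentation(numbered)
print(python_source)

# The generated text is ordinary Python and can be executed as such.
namespace = {}
exec(python_source, namespace)
print(namespace["sign"](-3), namespace["sign"](4))
```

Every line of the numbered form starts in the first column, so deep nesting no longer pushes code off the right edge of a narrow display.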


See: PythonWhiteSpaceDiscussion, YamlAintMarkupLanguage, SemiColon, TabsVersusSpaces, HolyWar

CategorySyntax

Last edited October 4, 2013