Restricted Symbols In Languages

Have you guys noticed that most (all?) programming languages are restricted to use only the 94 printable ASCII characters? That may be because currently that's all that most keyboards are able to do (with the exception of a few extra symbols if your keyboard has a non-US layout

But... if as developers we started using keyboards as OptimusMaximusKeyboard then we would be freed from that limitation, we could perhaps find a visually better notation, for example:

Not all programming languages are restricted to 94 ASCII characters - AplLanguage and ZeeNotation? for example.

While use of a more flexible keyboard might be a partial solution, I've a feeling that a more practical solution would be to support more keyboard macros in common editors (e.g. so after typing 'up', ALT flips to an up-arrow or other related characters (based on other surrounding characters) and back in a rapid user-controlled cycle (Alt,Alt,Alt to the right character or back to the original with Ctrl+z). There are other macro schema that also work. This overloads things a bit, but for the same reasons that the number pad was heavily overloaded on phone text messages - if done right, with a good and adjustable macro language, it shouldn't be too much a hassle.

And perhaps something like a RoundMenu? ( could appear showing you the symbols that are available as you write the combination of keys that will produce the specialized symbol

I don't think that the reason is the limitation of the keyboard. After all any special character in any special text, be it math or economics or design or otherwise must be entered somehow - usually indeed by some menu or your edit tool of preference. And that is the problem: Source code is often expected to be viewable and editable by the most primitive tools available. This said I'm wondering why SmalltalkLanguage doesn't have special characters. (Actually, it did. Early Smalltalk used ↑ (up-arrow) for "return" and ← (left-arrow) for assignment.) Its closed environment should have made this easily possible. Hm. So maybe the reason is rather that the those special symbols aren't really needed or even distracting from the uniform presentation. -- GunnarZarncke

Of course, technically we don't require more than two symbols, perhaps representing '1' and '0'; uniform presentation at the symbolic level is not a goal for most programming languages. While it may be that source-code is expected to be viewable and editable by primitive tools, I think that's a lesser reason - the larger issue is that source-code needs to be easy to edit, and reaching up to grab something out of a menu is a real pain. Programmers won't put up with it. Any language designer who follows EatYourOwnDogFood wouldn't put up with it - at least not on a regular basis. If it were a really rare symbol that didn't come up but once in several thousand characters, perhaps it would be tolerable... but it gets to the point that a keyword is easier to enter and just as clear when it comes to these rare symbols without being overly verbose (because verbosity = frequency * width; low frequency can balance a greater width).

Language designers, instead, end up parceling out a few critical symbols - those readily available above the number-keys and the right pinky side of the QWERTY keyboard - to things that are high frequency and of semantic significance, and that ideally are familiar to programmers from their prior education. They also start using combined symbols, like '->', '=>', '>>=', '==', ':-', etc. - which can graphically approximate several other symbols. Even a closed environment doesn't change issues of convenience or the basic formula for verbosity. Symbols strictly aren't necessary for anything, but they do reduce verbosity (and consequently improve clarity) - i.e. Smalltalk could have been written without any symbols at all outside of 'a-z' and spaces, but it would have been far more verbose.
It would seem possible to have an editor that would render certain keywords of a language as symbols. For example, in LispLanguage, "lambda" could be rendered with a single lambda character. You could still use primitive tools to work with the code (it could all be standard ASCII chars), but in the "special" editor, you can see a symbol. Somewhat like syntax-based coloring and fonts, but you replace the whole keyword with a symbol.

Perhaps, though how one then handles quoted lists containing 'lambda' that might or might not later be evaluated is an issue. For non-HomoiconicLanguages, this is probably quite doable. I suppose it depends on the other language properties for which the designer is aiming. HomoiconicLanguages would need special handling because suddenly the same code can look different in different places based on whether it is data or code in that particular location. And if you're aiming at SymmetryOfLanguage, then ExtensibleProgrammingLanguages, to have symmetry, would require an equally extensible editor - i.e. the mechanism by which 'lambda' is collapsed to the greek character would need to be just as applicable to any other syntax rule for the language.

Personally, I'd be against it, if only because it seems like an attempt to elevate ASCII upon some pedestal. I'd prefer the editor simply offer greater access to Unicode (e.g. through keyboard macros), and that it simply be accepted that the "simple tools" will be required to both render and accept Unicode. Unicode is now state-of-the-art. There isn't anything 'special' about ASCII that requires we continue using it as a primary basis for communication between programmer, program editor, and program compiler/interpreter.

Yes. I agree that most symbols one might come up with are easily available in UniCode. It seems as the legacy of ASCII is still showing and slowing us. ASCII really got hold on programming languages early on and so will not let go shortly. There are really enough editors and viewers out now that can cope with UniCode. The programmer editor par excellence Emacs has support for Unicode probably from the beginnings of unicode so one has to wonder why new programming languages didn't use the ability to define characters a long time ago. I mean e.g. in Forth it'd be too easy. And in Prolog it's also possible. On the other hand one could indeed just render the code differently for presentation issues like generated docs. I once toyed with the idea of using TeX notation for formulas that would have the added bonus of rendering compactly. But I agree that reflection/homoiconicity can make this really difficult. -- .gz

I also gave serious consideration to TeX and similar annotated code in the language I'm designing, but I decided against it; it seems to me that it would be a similar mistake to mixing content structure with style information in HTML: display is not a primary purpose for code. However, a smart display system that can understand the syntax and automatically format code for documentation purposes would be very worthwhile, and depending on stuff inside comments seems to be quite hackish. To support this, I did eventually include a generic facility for including post-processor annotations that are not intended to directly affect the semantics of the content - e.g. for giving a programmer's guesstimate profile to the optimizer or offering search strategy suggestions to the constraint logic analysis ... or even offering formatting suggestions for documentation.

You misunderstood. I didn't mean TeX in comments or the like but rather executable TeX (nickname: eXo (echo)): Example:


This would be equivalent to (expanded to)

  a={tmp=0; for(i=1; i<n; i++) { tmp+=i**2 }; tmp }

-- .gz

I believe I understood, but I must have failed dreadfully to make myself clear - I also meant executable TeX, i.e. code designed such that it simultaneously possesses both specific display and execution properties. However, as I noted above, looking good on display is not the primary purpose of code, and mixing content with display seems to be a mistake - it has, at least, proven to be a mistake in other fields like HTML. Something like syntax and structure-aware CSS with some rare support annotations in code (e.g. classification and section identifiers to allow for easier specification of region display characteristics) has proven itself a better approach in other fields. And so a smart (by which I mean syntax-aware) display system would be very worthwhile. And, in this endeavor, putting the annotations inside comments seems quite hackish, and so display annotations to the smart display system need support directly within the language, much as HTML handles 'class=' and the occasional default 'style=' annotations, which is what I eventually chose to provide (via a more generic annotations system).

I do like your example. There isn't any reason a display-rule couldn't be attached to the '\sum_' syntax rule by the display system. One might even be able to do it even if '\sum_' happens to be a macro rule or function added to the language via a library. That happens to be the advantage of syntax-aware systems: the language itself need not be designed to support the display; it only needs to support annotations and extensions appropriate to aiding display - even particular function-applications or macros could have attached display rules: default rules, rules provided within the programming language via annotations, and rules provided from the outside via stylesheets.

But there are always issues with the reverse direction: WysiWyg is very difficult, and even more so if you have a modifiable display system that can make two different things look very similar.


I am inclined to disagree. More is not always better. The more you have, the more one can be mistaken for another because the number and likelihood of resemblances increase. The existing set is close enough to the ideal in my opinion, at least for English coders. -top

See also IntentionalProgramming



View edit of July 9, 2010 or FindPage with title or text search