Spoken Programming

For the last few years I've had this idea to write a programming language that could be spoken to a computer.

Why would this be a good idea?

  1. I believe that speech is more naturally expressive than text
  2. I believe people would learn a spoken language much faster than a written language
  3. I believe the social effect of being able to discuss a programming problem in the language you would be writing the program in would be tremendously beneficial.
  4. Keyboards irritate me.

Why would this be a bad idea?

  1. Speech has a max input output speed that is far lower than text
  2. Humans can realistically keep track of only one spoken conversation at a time. This would rule out multiple "windows."
  3. You can't read back over a speech stream.
  4. Talking for long periods is stressful on the throat
  5. It would annoy the heck out of your cube neighbors

Assuming this is something worth pursuing, I'd go for LojbanLanguage as the language most suited to being a spoken programming language.

I'd love to hear what other people think about this idea.

-- ShaeErisson

Any word-and-stack based language such as FORTH [ForthLanguage] would be suitable for a project like this. I recall reading a passage many years ago in a book by RobertHeinlein (I can't think of the name right now..) where the characters programmed their space/time vehicle using a spoken programming language. The language they used reminded me of FORTH, but maybe just because I was very "into" it at the time.

Languages which need extensive lookahead grammars or are picky about syntax would probably not be suitable. -- FrankCarver
The book is named TheMoonIsaHarshMistress and the spoken programming language there is Loglan (LoglanLanguage), the predecessor of the LojbanLanguage. -- ShaeErisson


Can you sketch out an example showing why this would be good in practice?

I assume Forth was brought up because it could execute each word as it's spoken. Why is this important for spoken programming? If anything, it seems to make for trouble, like the computer executing a remark meant for someone else, or something like "rm -r /* oops, dammit, I meant ~/*.".

-- DariusBacon

It's not so much that FORTH executes each word as spoken (during definitions, it doesn't, for example), but that it allows you to define/redefine any word and has minimal syntax for usage of defined words. You would never need to say

  print open-bracket open-quote hello close-quote close-bracket semicolon

for example, but could get by with

  quote hello unquote print

or, as the "hello" is only one word, something like

  quoted hello print

Definitions are simple too. Something like :

  define greet quoted hello print end
  ...
  greet

is a lot simpler than:

  void greet open-bracket close-bracket 
  open-brace 
	print open-bracket open-quote hello close-quote close-bracket semicolon
  close-brace
  ...
  greet open-bracket close-bracket semicolon

-- FrankCarver

Also, the AnsForth standard defines the pronunciation for all standard words, including punctuation. I haven't heard of another standard that goes into that kind of detail. Examples: '@' is 'fetch', '!' is 'store', '.' is 'dot', 'DUP' is 'dupe', 'ROT' is 'rote'.


Have you ever tried to program over the telephone? WardAndKent have, but only after PairProgramming in person for some time. Even then, the ability wanes with time. If you have ever had telephone support talk you through fixing a configuration problem then you know how bad it can get. But how good could it get? I assume, with practice, it could get pretty good.

While reading the Lojban pages I couldn't help thinking, these folks need a wiki. It might have different TextFormattingRules, especially for linking, but they would probably end up more robust and less intrusive than ours. I wonder if the wiki PromptingStatement form would lend itself to 'logical' discourse. -- WardCunningham

The Complete Lojban Language by John Cowan ISBN 0-9660283-0-9
Speech has a max input output speed that is far lower than text - I'd have said it was the other way around. It's taking me a lot longer to type this than it would to say it.

Speech is linear. I'm not convinced that programming is. When I program, my cursor hops all over the place. I doubt speech will work unless we also get one of those Xerox Parc video walls, with eye-movement detectors, so you can say, "Put that there" and the machine will know where you mean. -- DaveHarris

Maybe current languages aren't designed for linear programming, because it's not too hard to hop your cursor all over.


Nostalgia trip... For many years, off and on, I spent hours at a time programming with a friend of mine over voice phone lines. We'd be working on completely separate projects but we'd cross-check crazy ideas with each other. It would have been funny to eavesdrop because for twenty minutes we'd say nothing, then someone would start sputtering code while typing and the conversation would [http://www.google.com resume]. Since we learnt C together, it was also natural to exchange questions about syntax as we went. Much solid code got banged out during those sessions. -- SunirShah


The problem I see with LojbanLanguage (and EsperantoLanguage, &c) is that the HumanMind? is not logical. It is probably not a single mind at all. Natural Languages are the way they are because that's the way our Mind works.

[how do you know that? maybe it is the opposite! and language is whatever we want it to be, if a tool then a tool, and if a tool why a shitty ambiguous tool? if you learn about lojban you will learn that it is only eliminating grammatical redundancy which serves no useful purpose in human communication, unless repeated miscommunication somehow creates intelligence and subtler communication, which is indeed possible. frankly we dont really have evidence that this language group we are in now is NATURAL, especially since we have no primitive languages on earth today -- anyway, lojban actually fascilitates expression of subtleties, redundancy, control over emphasis mechanisms, etc... furthermore, according to whom do we NEED ambiguity? and who is WE. what is goal or destination will filling the need lead to? sounds like an ideology to me, which is no reason NOT to do something...]

We need ambiguity, redundancy, polisemy, polyphony, &c... Do you mean the problem with LojbanLanguage/LoglanLanguage as a spoken language? or as a programming language? If as a programming language, why then is C so internationally used?

I am sure it's possible to create a new language and set a group of people to talk and think on it, but I cannot forget language is culture. Just think of books like Babel 17 (SamuelDelaney?) or Dispossessed (UrsulaLeGuin). In both, artificial languages played a key role, and in both we could see how language modifies behavior the same way behavior modifies the language.

Considering the huge variety of cultures, I don't think a single language can cope with it, and certainly not an artificial one... It's something funny what's happening with Spanish or English nowadays... natural evolution towards a single language (with lots of dialects, that's for sure) is probably possible if a Global Culture is attained (or at least a big enough number of common cultural issues... maybe 30%, 40%, maybe 70%, I cannot estimate...). Internet and TV and other cultural things (music, fashion, ...) can be the gluing factor... but that may be not possible... who knows? -- DavidDeLis


The only problem I see to be solved is that of creating context easily. Operating on "that thing on the other" is possible if both the programmer and interpreter understand the context under which the operation is to be performed. Any spoken language is merely a thought to sound interface. Perhaps the real question being asked is " can a vocal programming interface be built to be as easily understood and remembered as a symbol based interface? " Language is merely a tool used by people to get communication done. Language is defined, redefined, appended to as the need arises. The only languages that don't do that are dead...

Speech context is largely imbued via visual cues.

Perhaps I'm three buzzword generations back from the rest of the world, but I feel that most of the world never understood the power of "multimedia". The most powerful programming interface that could be conceived would be that one which we as being would be in complete contact with - visual, vocal and tactile.

A speech interface would have to have a "1 concept, 1 word" relationship, or you would create another Microsoft... ;)

I think you could create context via speech by merely adding suffixes or prefixes to the word, I.E. interprogrammable or however the Germans make those REAALLLY long street signs... WRL


Appears it's already being done: http://ii2.ai.iit.nrc.ca/VoiceCode/ --ShaeErisson


I considered this problem some time ago. What I came up with was using the perl-latin module and a voice recognition program. However, I hate doing real work in perl, so it wasn't a great solution.
One of the major differences between computer code and human language is that human language tends to be a lot less precise in usage. If you're speaking to another human, you can cut corners in context, intonation, grammar, etc., because you assume they have enough common sense to fill in the semantic gaps. Computers are very stupid; having no common sense, they need to be given explicit instructions, all the time. If people needed explicit instructions all the time, we'd all be a lot less social.

This is the problem with trying to make computer code like human language, and it's not simply a matter of designing a good language for it. (Though we could always use better programming languages.) If you say something like "print out each one of its elements", will the computer understand what you mean by "it"? Common sense would appear to be one of the Holy Grails of ArtificialIntelligence, and decades of research by diligent, intelligent academics has resulted in little progress so far. See ArtificialCommonSense.

There is, incidentally, a computer language that tries to be colloquial in usage: Perl. (more under PerlLanguage) Of course, depending on who you talk to, PerlLanguage is either loved or hated for that feature.

The lack of precision in colloquial speech (and even in normal writing) is the reason for the "legalese" of contracts and courtroom speech. Lawyers do not speak the way that they do for no reason: they are using a dialect which is much more precise than other dialects. Words and phrases in the legal dialect have very precise and well-defined meanings. For example, when a lawyer says "arbitrary and capricious", he/she is not being redundant, but is making a very specific point.

Given that the instructions which make up a program must be precise in order for the compiler to develop object code which does what we want, any spoken computer language would need the precision of legal-speak. A free-form spoken computer language is a pipe dream.

(On a further point, if you have ever read a transcript of yourself speaking, you were undoubtedly surprised at how poorly spoken you seemed. I am not intending to insult programmers or anyone else here, having had the same experience. My point is that spoken language includes not only words, but facial expressions and other body language, and that integration of these is necessary to make normal human speech coherent.)

How do you pronounce CeeLanguage code? One reason I enjoy VisualBasic so much more than C is that it's possible to read it aloud, and think about the code in words. I understand how to pronounce VB:

 dim C as new Connection
 C.open "myDatabase" ' pronounced "see dot open"
 dim R as new Recordset
 while not R.eof

etc. On the other hand, I am completely at a loss to do an oral interpretation of:

 if (c=(arg *)R!=0) { ...

But then I never had a class in C.

The C code above would be pronounced as you'd expect, had you any experience in C at all: Given C equal to R, as an arg pointer, if R is not equal to zero, then . . .
Perhaps this might work for the spoken C:

 c becomes arg pointer R
 if c not null then
 ...

Note: use of 'becomes' for '=' and 'equals' for '==' keeps operators clear.


I think this is somewhat connected to VirtualPairProgramming -- ShaeErisson


After thinking about it for the briefest span of time, here are some things worth considering:

  1. Overcoming natural language barriers. Shae tackles this one with the use of Lojban, which strives to be an international ur-language, like Esperanto, etc. These respective merits and demerits of this strategy with concern to adoption are the same as the spoken language itself, the use of it as a programming language would only attract people, I would think, as long as such usage doesn't change the language.

  2. Collision issues between implied semantics, literal semantics, implied syntax, and literal syntax. How would I express something like (pseudocode):

 print "quoted"

or

 print "quotes quotes"

This potentially gets even more difficult in recursive or self-referential languages like LispLanguage, etc. with lambda defined procedures (which have no names).
-- DanMoniz
You may also want to check out the equivalent LojbanLanguage wiki page: http://www.lojban.org/wiki/index.php/samtrosku -- ShaeErisson
Technologically speaking (sic), I would think that getting a computer to recognize spoken code would be much easier than getting it to recognize spoken language. Whereas spoken language is difficult because any word can follow practically any other word, it is already feasible for computers to accurately pick a phrase that a user speaks from among a list of options (e.g., the SpokenItems folder in MacOS). Since most computer languages have a very formally structured syntax, and many require that names be defined before they are used, the speech interpreter would be able to restrict the list of possible continuations for any given context, and thus greatly increase its accuracy.

Using speech to program in an existing computer language like this might not be 'natural', but it would certainly have its uses, e.g. allowing people with vision impairments and RepetitiveStrainInjury to program.

EditText of this page (last edited December 10, 2008) or FindPage with title or text search