In many systems, the response to an exception or error is to ReportAndDie
. One style is to display a dialog that says something like "An error has occurred, program terminating" and a single "Ok" button that finishes the deed. Another style is to dump a logfile or something similar and then exit. Sometimes a stacktrace or its equivalent is displayed.
All these share common "choreography" - the user is told that something terrible happened, and the program or operation is then terminated.
This page is crying out for advocation of an alternate approach: In ReincarnationProgramming?
is replaced with ReportAndDieAndLiveAgain?
I use a product called TreePad
, written by a cool Dutch guy, HenkHagedoorn?
, using Delphi. When it does something horrible, like an access violation, it throws the exception, issues the complaint, waits for the user to click OK, and recovers
from the error without data loss.
It's the only Windows program I've ever used that accomplishes this.
I would suggest that whatever he's doing, or whatever Borland is doing in Delphi that makes this possible should be adopted as a standard.
The only technical features that come to mind that allow ReincarnationProgramming? are checkpointing and rollback and such, which are rarely pervasive anywhere except a database server. Are there others?
People often claim this with regard to Delphi. As a user of many Delphi applications, and a sometime Delphi programmer, it's an illusion - the application isn't actually recovering from an error, it's simply declining to crash, leaving internal data in an unstable state. This may accidentally be recovered from or it may not. Sometimes it's just obnoxious, like when a Delphi app decides to iterate over a non-existent array, popping up dozens of access violation boxes but refusing to die. People don't like applications that crash, but a crash is the *only* reasonable answer to an exception you aren't sure you can recover from. If you are sure you can recover from it, there's no reason to alert the user. The delphi framework installs an exception handler that pops up that message box, then continues as if nothing had happened. There is no magic involved whatsoever.
Smalltalk applications written against VisualWorks
(I don't know about Squeak) can gracefully recover. Sometimes they don't notify the user, but instead log the incident in some way.
Recover how? Cosmic rays destroy the contents of RAM. A redundancy check reveals a fatal error: important data is smashed. And Smalltalk just...recovers. I think more specifics are in order.
Ok, ok, ok. What I mean is that in the Smalltalk environment, the context of the expression that raised the exception is kept around, and is pointed at by the exception. When an exception is caught, the handler can (some of this varies by vendor):
Yeah; back in 1990 I implemented an error package for C++ that did exactly this list of things (I had to implement exceptions, too; they weren't yet available in C++). I left the company before seeing what sorts of idioms formed around these capabilities, and I haven't seen such elsewhere. Do you know of a URL that has info about how these things can be used to best effect?
- Do something and continue after the expression/block that raised the exception
- Do something and reraise the exception (to give other handlers a shot)
- Do something and restart the expression that raised the exception
- Do something and restart the block containing the expression that raised the exception
- Do something and cause the expression to return with a value, just as if the exception had never occurred
- Other options that I don't remember at the moment
So when your user is deep in a stack of nested modal dialogs, is trying to save and exit, and accidently clicks the (empty) A:\ button instead of the C:\\ button that she wanted, you can handle the no-volume-mounted exception and return back to the nest, no harm done. As opposed to making the user start at the top all over again.
Or, to quote an old apocryphal myth, when your missile guidance system pulls a segmentation fault in the middle of a routine in a math package calculating the inbound target's velocity, you can tell your code to:
- Log the failure
- Check a timer to see if you have time to be smart
- Load an alternative class
- Silently try again
- Repeat, working further and further up the stack until it looks right
- Resume where you left off
Isn't the CosmicRaysDestroyedRam?
exception fatal? All kidding aside, you surely COULD try reloading RAM and restarting the operation. That's what it's for.
- Yep. I've done that even in C with signal handlers forcing process restart. Handy, especially for a 24/7 task. It's weird, though, fellow programmers actively argued against this approach (at a whole succession of companies, not just one environment). I don't understand why it's so widespread to look at error recovery as evil. "Thou shalt not catch the SEGV signal for any reason!" Huh?
(See also http://catless.ncl.ac.uk/Risks/23.12.html#subj2
lmao. I didn't know Microsoft made aircraft instruments. I wonder if the displays spanked the user for improperly shutting them down on their way back up.
word processor is rather less stable than I would like, but its pattern is "save the current document to a .CRASHED file and die". Thus, although it crashes too often, I've never lost work. (Aside: I'm not using the very latest version; it may be getting more stable.)
The ability to "retry" (restart) in a smart way after an error was discovered is an important factor in software quality - knowing that bugfree software is hard to find. The critical part for a successful recovery seems to be the ability of the program to access a non-corrupted state to continue the execution from there. Is there a universal strategy for this? (A WordProcessor
seems to work fine with autosaves, but is this always applicable? What are other approaches?)
See also LimpVersusDie
Related to SegmentationFault