RefactorMe: Please feel free to edit this as it is my first attempt at a pattern submission. I am sure that this involves several other AntiPatterns that should be referenced.
: Your module testing works, the test team's testing works but the users keep getting errors.
: I am trying to document a mistake that I recently made, in the hope that it might help others. I programmed a receipt printer interface. The basic operation was:
- The operator selects "print" option in a GUI program.
- The printer is readied and a light comes on.
- The operator inserts a receipt into the printer and presses a button on the printer
- The receipt is printed, then the program continues.
This is problem probably involves several other AntiPattern
s, because the program was tested but not deployed for several months. At the time of deployment I was very busy on another project which had an imminent deadline. After deployment many users reported failures and errors.
: Since the product had been thoroughly tested, and I was busy, I said it must be operator error.
: The situation appeared to improve, to the point where it was decided to deploy it to more users. After the deployment we were inundated with errors. I did not have time to look at the problem in detail so the release was put on hold.
: When the other work was completed, I examined the situation in detail. There were several factors that should have given a clue to the problem:
- Some users had no problems, whereas others had many failures.
- The situation appeared to improve with time.
- When a tester worked with someone having problems everything worked fine.
After examination of log files I discovered that whereas the testers followed the exact procedure above (as I did), users deviated. Some pressed the button on the printer several times. Users would also press the button before inserting the document, then insert it, then press again. This was not OperatorError?
, but an alternative sequence of events, which the previous software coped with. The apparent "improvement with time" was users getting used to the new systems limitations, not some miraculous self-repair!
Speculation: The multiple presses could be because the user's printer buttons may have faults, either bouncing causing multiple presses or not always working so the users got into the routine of pressing more than once.
The pressing a button without a document present was probably an operator mistake, but the old system did not report an error, it just retried so the user could insert the document and press the button again.
The original program was purely sequential. It would wait for a signal meaning the button was pressed, send a signal to check for a document, wait for the response, send a print, wait for a response etc. If another "button pressed" signal was received instead of a "document OK" signal, the print failed etc. Knowing the problem the actual software solution was obvious (ignoring button press messages when already printing etc.)
There were two faulty beliefs here:
Our Testing Was Adequate
Clearly it was not. Knowing how the system should
be used meant that we did not rigorously test other possible alternatives. It is difficult for an expert user to anticipate all the ways a novice might use a system.
Because We Had Tested, It Was An Operator Problem
There were two factors here. First, our belief in the testing. Second, because we were busy with another project it was very tempting to throw back a quick answer.
This suggests a GUI design pattern / thread safety pattern:
In many similar situations, the resulting context includes:
Because the users were blamed when they complained, they are less likely to point out problems in the future.