Its An Operator Problem

RefactorMe: Please feel free to edit this as it is my first attempt at a pattern submission. I am sure that this involves several other AntiPatterns that should be referenced.

AntiPattern Name: ItsAnOperatorProblem

Problem: Your module testing works, the test team's testing works but the users keep getting errors.

Context: I am trying to document a mistake that I recently made, in the hope that it might help others. I programmed a receipt printer interface. The basic operation was:

This is problem probably involves several other AntiPatterns, because the program was tested but not deployed for several months. At the time of deployment I was very busy on another project which had an imminent deadline. After deployment many users reported failures and errors.

Supposed Solution: Since the product had been thoroughly tested, and I was busy, I said it must be operator error.

Resulting context: The situation appeared to improve, to the point where it was decided to deploy it to more users. After the deployment we were inundated with errors. I did not have time to look at the problem in detail so the release was put on hold.

Real solution: When the other work was completed, I examined the situation in detail. There were several factors that should have given a clue to the problem:

After examination of log files I discovered that whereas the testers followed the exact procedure above (as I did), users deviated. Some pressed the button on the printer several times. Users would also press the button before inserting the document, then insert it, then press again. This was not OperatorError?, but an alternative sequence of events, which the previous software coped with. The apparent "improvement with time" was users getting used to the new systems limitations, not some miraculous self-repair!

Speculation: The multiple presses could be because the user's printer buttons may have faults, either bouncing causing multiple presses or not always working so the users got into the routine of pressing more than once.

The pressing a button without a document present was probably an operator mistake, but the old system did not report an error, it just retried so the user could insert the document and press the button again.

The original program was purely sequential. It would wait for a signal meaning the button was pressed, send a signal to check for a document, wait for the response, send a print, wait for a response etc. If another "button pressed" signal was received instead of a "document OK" signal, the print failed etc. Knowing the problem the actual software solution was obvious (ignoring button press messages when already printing etc.)

Faulty belief: There were two faulty beliefs here:

Our Testing Was Adequate Clearly it was not. Knowing how the system should be used meant that we did not rigorously test other possible alternatives. It is difficult for an expert user to anticipate all the ways a novice might use a system.

Because We Had Tested, It Was An Operator Problem There were two factors here. First, our belief in the testing. Second, because we were busy with another project it was very tempting to throw back a quick answer.
Discussion:

This suggests a GUI design pattern / thread safety pattern:

DisableJobRequestsWhileRunningJob
In many similar situations, the resulting context includes:

Because the users were blamed when they complained, they are less likely to point out problems in the future.
CategoryAntiPattern CategoryDevelopmentAntiPattern

EditText of this page (last edited June 27, 2012) or FindPage with title or text search