Proof Of Correctness

Formal proof of correctness is not only tedious, time-consuming, and outlandishly expensive, it's also not necessarily effective! People commit errors when attempting a formal proof. There is no fool-proof way of determining if a proof is correct or not. -- JeffGrigg

There certainly is. A Proof Calculus is a method of stating a proof and then checking it's correctness within acceptable time bounds, which is a complete and correct process. This is what this page is about. What you can not check is, if you proved the right thing. In practice, this would be important only in exceptionable cases, where the specifications themselves are too complex to be immediately clear.

UnitTests are not only tedious, time-consuming, and outlandishly expensive, they're also not necessarily effective! People commit errors when programming UnitTests. There is no fool-proof way of determining if a UnitTest is correct or not.

See ProofsCantProveTheAbsenceOfBugs for a longer argument to this effect.

But you would in general write the proof and the program in a formal, machine-checkable, language. Then you automatically extract the program from the proof. This is how the CoqProofAssistant does its thing.

Assuming the proof assistant has been proven to be correct of course...

It's the same problem as UnitTests. You don't know that there are no bugs in your tests, but the agreement of your tests with your code gives you more confidence. You don't know that there are no errors in your proofs, but the agreement of your proofs and your code gives you more confidence, and proofs and tests can cover different aspects.

See also ProofCarryingCode.
People seem to have utterly bizarre ideas about what proofs are and how they look, and seem to take an all-or-nothing attitude that smacks of a StrawMan argument. Proofs don't have to be tedious, time-consuming or outlandishly expensive if their inherent limitations are understood and balanced against their benefits. You don't necessarily want to prove that the program meets the requirements, just as you can't prove the compiler, hardware and requirements themselves are correct. You can, however, easily prove that loops terminate, all cases are allowed for, types never mismatch, parameters are never NULL, and so forth.

The argument that ProofsCantProveTheAbsenceOfBugs is not the point. Proofs can and should be a tool to avoid using more difficult and error-prone methods instead. As a working mathematician, programmer and systems designer I am constantly appalled at how people dismiss such a powerful tool and insist on drilling this negative attitude into their juniors.

See also ProofsCanBeSimple

You may be able to prove that a given program conforms to a given set of requirements. But can you prove that the requirements correctly express what the user really wants?

This problem (the difference between verification and validation) is the driving force behind the take-up of IterativeDevelopment methods, such as ExtremeProgramming.

The effectiveness of a proof depends a lot on how subtle the things are that can go wrong, and what damage they cause when they go wrong. If a proof exercise eliminates even one nasty defect, it is effective as an exercise (see, for example, ExtremeProgrammingChallengeFourteen). It is well known that there are many ways to have defects in code that comes from "proven" designs, so the "proof" of the design is not a guarantee of defect freedom in the running software. The proof exercise is likely to find and remove defects that otherwise would be incredibly hard to isolate. (LeslieLamport? has written extensively on proofs of concurrent code. See

When I was in school learning about DiscreteMathematics? they told me that we could talk about a Partial Proof of Correctness with Respect to some proposition. Out in the world, I have seen that the problem is not so much to generate correct code matching the proposition, rather finding the correct proposition.

Business Analysts where I work tend not to be programmers or engineers, and so tend not to have the talent or desire to express or even conceptualize the business problem with enough detail and rigor for the formality required of a machine.

See CodeReview for something less formal. See CleanroomSoftwareEngineering - a way to use CodeReview that's less costly than a formal proof of correctness and more effective.

If you measure cost effectiveness as cost per bug found, I think so too.

I would measure it, not as the cost per bug found, but the cost of finding bugs per the cost of those bugs occurring in a deployed system.

To prove code you have to understand it at a deep level. The understanding brings benefits beyond that of the proof itself. On the other hand, it's often expensive. Compare with running the code through a debugger, which also brings understanding. -- DaveHarris

It is more useful to formally prove a concurrent algorithm than an implementation. The details of a typical implementation often act to hide the properties that you want to prove (freedom from DeadLock or LiveLock, lack of RaceConditions, fairness etc.). A CodeReview or UnitTests do not always help because design errors in concurrent algorithms often appear to be HeisenBugs. For example, a formal model of TcpIp discovered bugs in the protocol, even though it has been in use for years!

AnswerMe please: URL/citation?

I've recently had to modify some very tricky classes that form the heart of a multithreaded system. I used ExtractMethod to clean things up and I renamed the existing private methods to make the intentions more apparent. I can only believe that this will make things easier for the next person. The process really helped me understand the code. Sometimes, just asking "what is the best name for this thing?" helps me understand it faster than I would otherwise. -- MichaelFeathers

Proofs of correctness don't always have to be predicate calculus and lots of symbols (as the CleanRoom folk have said for a long time), but rigorous thinking about key points can be used to good effect. We used the latter at the Norwegian Central Bank, when reviewing the design of a new online bank-to-bank transaction collecting system. The online part was new. There's a funneling agency called BBS that feeds packets to NB. We walked through the system, starting from the LU6.2 link between computers, asking What is the guarantee that this provides? We looked for answers like "Delivers all of a packet or none of it" broadening out to "messages will arrive in sequence or not at all" all the up to and finally, "transactions will be completed in in sequence or the entire computer system is down." At which point we found a design flaw, and fixed it.

Did a similar thing with a multi-table mutual exclusion problem in the database. In this case we started with symbols, also found problems, and fixed and simplified until it was only one IF statement comparing a cell in two tables, needing only that the tables be read in a specific order...and no longer needed symbols to represent.

The uptake - people can think clearly sometimes using simple language instead of symbols, and that is often sufficient to the task. --AlistairCockburn

That reminds me of a cool experience we had with the formal language VDM. We were trying to communicate clearly within a team of programmers the desired rules for a complex set of graphical queries to be provided to some highly qualified but non-computer literate "knowledge workers". Like so many Human Computer Interaction issues this was more ProofOfThePuddingIsInTheEating? than ProofOfCorrectness territory but the data was highly complex and incomplete enough for the query possibilities to be easily misunderstood. We started with a limited prototype that had been approved at the highest level and an English language description but the combination was found confusing and the programming were floundering. So NeilDyer went off to do a VDM version. The neat thing was that the English and the VDM versions each turned out to have a single bug, in different places, and in discovering this fact the whole team came to an unambiguous understanding of the "real rules". The rules of course were in any case varied after delivery to the real users, but at least they started with something reasonably sensible and self-consistent to criticize!

Please note that I would definitely try to find a way to "prove the rules" incrementally using XP and UnitTesting today, refactoring and building from the prototype - and evolving from data just for Mali to all of Africa to the World over a number of years in this case! But this was 1992. And the formal language, used in a limited way, provided real value in this case.

-- RichardDrake
Can and should proofs be extended to prove that code not only is correct but is "safe" i.e. does not have the potential for buffer overflows and other problems that can be exploited by attackers and viruses? Example:

  ASN.1 'Double Free' Vulnerability - CAN-2004-0123 
  A remote code execution vulnerability exists in the Microsoft ASN.1 Library. 
  The vulnerability is caused by a possible "double-free" condition in the Microsoft 
  ASN.1  Library that could lead to memory corruption on an affected system. 
  An attacker who successfully exploited this vulnerability could take 
  complete control of an affected system. However, under the most likely 
  attack  scenario this issue is a denial of service vulnerability

Could things like this be prevented by formal methods? Or are these kinds of problems just an artifact of using C and C++ that will go away when languages that handle memory management automatically such as Java are used more prevalently?

Yes, if you take the care to put that sort of stuff into the specification. As with UnitTests, a failed proof of correctness can only discover a discrepency between your expectations and an implementation. They cannot encode intentions, at least not without a standard library of proofs.
Could things like this be prevented by formal methods? Or are these kinds of problems just an artifact of using C and C++ that will go away when languages that handle memory management automatically such as Java are used more prevalently?

This type of thing will go away when lanuages that aren't completely defective are used for most programming languages. It would be pretty foolish to buy an enterprise-class peice of software written in c/c++ or languages with similar flaws (explicitly-managed memory, unchecked array accesses). Using a poor language like C or C++ for something that does not need the features siad language is incompetent. It baffles me how many open source projects that have no need for access to stuff at the low level are written in C. If its not managed code, its crap. Exceptions - perhaps disk access routines of a DatabaseServer?? A voice-mail system or an enterprise-anything - forget it.
A proof may not be perfect, but an item that has a proof is much more likely to be correct and to be correct to a greater degree than an item without a proof. Likewise, code with a ProgrammerTest set is much more likely to be correct than code without the test set.

Reminds me of an old engineering joke. A mathematician and an engineer die in an accident. In the afterlife, each is chained to a wall and on the opposite wall is a beautiful woman. Their host informs them that every 5 minutes the walls will close half the distance. The mathematician says "This must be hell, I will never reach the women." The engineer, however, says "This must be heaven, in about an hour I will be close enough for all practical purposes." Moral: absolutes are not always necessary.

If your code uses components i.e. other objects/methods that have been proven correct, does that simplify things? Just have to prove the logic of your wrappers around them? What if the components are binary and you have no access to their code and proofs, how do you verify their correctness?

If the component fails under use, there is a problem. Given a choice between a component with a proof of correctness and one without, the chances are the component with the proof is less likely to fail. The counter argument would have to be either: the proof of correctness is immaterial, i.e., it provides no indication of the correctness of the component; or the proof of correctness degrades the correctness of the component.

Why would the one with the proof be more likely to fail? Wouldn't that be the one designed (and presumably tested) more rigorously? [Sorry, typo. I omitted the word "less" in "... less likely to fail." Now corrected.]

You demand that your components come with formal specification and with proofs that they meet their spec. See ProofCarryingCode.

"Dear MicroSoft, please send us the formal specification and proofs for MSXML.DLL, CRYPTO.DLL ..."

Note that formal proof-checking systems actually make it possible to write statements that are half-way between tests and proofs, like this example from Agda:

 prop1 : {x y z : ℕ} -> reverse (x ∷ y ∷ z ∷ []) ≡ (z ∷ y ∷ x ∷ [])
 prop1 = refl

It's fairly close to a test, because it's giving a very specific example. But it's also quantified universally over the variables x y and z, so it "tests" a small infinite set of inputs all in one go.

Some examples:

For ObjectOriented:
See also ProcessCalculus and FormalModelChecking, SymbolicLogic, ProofsCanBeSimple, LoopInvariantAnalysis.

View edit of December 12, 2014 or FindPage with title or text search