Proof Annotations For Bubble Sort

I did the following proof annotations for bubblesort today morning in the underground (yes I admit, I worked around with it a bit when I keyed it in):

The following bubble sort method is given in a hypothetical java-like language.

  // [ predicate ] is a proof annotation about the variables at the spot where this "comment" is located. 
  // The predicate must hold (evaluate to true) every time the program can reach that point.
  // These predicates state simple but relevant facts about the state of the algorithm like what part is already sorted.
  // The predicate language must be closed i.e. not depend on unlimited execution (not turing complete), but only predicate logic.
  // Example: "forall 0 < num i < n : a[i]>0" means that all elements of a with an index of 1..n-1 are positive.
  // Strictly this forall is syntactic sugar for  "forall i elem num : (0 < i && i < n) && ( a(i) > 0 )"

void bubblesort(number[] a) { unsigned n = a.length;

for (unsigned i = 0; i + 1 < n; i++) { // 1 (partially sorted at the end, trivially true for i=0) // [ forall n-i <= num l && l < n : // forall l <= k && k < n : // a[k] <= a[l] // ]

for (unsigned j = 0; j + 1 < n-i; j++) { // 2 (higher than all earlier elements; trivially true for j=0) // [ forall 0 <= num k && k < j : // a[k] <= a[j] // ]

if (a[j] > a[j+1]) { //[ a[j] > a[j+1] ] // 3a (trivial from if condition) a[j] <=> a[j+1]; //[ a[j] < a[j+1] ] // 3b (trivial, swap) //[ a[j] <= a[j+1] ] // 3c (trivial, relaxation of 3b) } else { //[ a[j] <= a[j+1] ] // 3d (trivial) }

//[ a[j] <= a[j+1] ] // 3 (two elements ordered) //[ forall 0 <= num k && k < j+1: a[k] <= a[j+1] ] // 4 (higher than all earlier elements+1) } //[ forall 0 <= num k && k < n-i: a[k] <= a[n-i] ] // 5 (last one is higher than all earlier elements)

//[ forall n-i-1 <= num l && l < n : forall l <= num k && k < n : a[k] <= a[l] ] // 6 (partially sorted at the end+1) }

//[ forall 0 < num l && l < n : forall l < num k && k < n : a[k] <= a[l] ] // 7 (all sorted) // might be shortened to [ sorted a ] if "sorted" were a predefined predicate }

I started with the result of the "swap" (annotation 3); later I added 3a to 3d, because that would guarantee that 3 can be proven by simply joining over both if-branches.

I remembered, that the inner loop will move the highest element to the end, so I wrote it down and to prove it I added annotation 2 to the beginning of the loop. Any simple reasoning system will now be able to prove 4 from 2 and 3 as well as 2 from 4 (next iteration).

Then I wrote down 6, that the last elements are ordered. To prove this, I added 1 (trivially true at start) and 5 (just the annotation for the last nested loop iteration).

From this follows 7 the same way as 5.

We do not even need mathematical induction in the prover to prove 5 or 7, but just plug in the instance of the annotation for the last element of the nested loop.

I believe, that these annotations can be proven fully automatically, because they are I suppose, that most of these annotations could be dropped and inferred automatically (especially 3a-3d, but possibly also 1, 2, 4 and 6). But every dropped annotation will drive the search space for the prover up (with all the given annotations it will not have to do any searching, but only to test).

I think the difficulty is not the proof itself, but selecting the few relevant annotations (invariants), that together lead to the conclusion (here 7). The programmer, who has an understanding of the algorithm will be able to select these easily and the prover will easily prove them from the expressions (or not) and combine them together to the surrounding annotations (or not). An automatic prover cannot guess these invariants, because there are just too many of them. To repeat: The difficulty lies in the selection of the annotations not in their complexity.

Of course, I can't test the above program, because these proof annotations are not yet (:-) understood by any system, but they might be added easily, say to Java as code attributes, that may (!) be evaluated by the JavaVirtualMachine to, e.g., improve the code (remove unneeded bounds checks or assertions) or by an automatic verification system.

The latter could, e.g., run in the IDE and tell us beforehand, that e.g. the last annotation ([sorted a]) cannot be proved from the other (I probably made fence post errors and could then correct these until the proof completes successfully). That way there will be no errors in the proof provided the sorted predicate is defined correctly, because the primitive assertions must follow from the code itself.

-- GunnarZarncke


Shouldn't you also prove that n is the correct number of elements to sort?

What if n is the maximum unsigned value? Then your calculations i+1 and j+1 can overflow!

Why do the annotations have int types for indices and the code has unsigned? Are the ints in the annotations supposed to be of infinite range?

    //[ forall 0 < unsigned l && l < n : forall l < unsigned k && k < n : a[k] <= a[l] ]  // 8 (all sorted in range)
Critique: I cannot read the annotations, even for an example as simple as this.
Possible implementation strategy

This probably would not generate a universal checker, but it seems like it would lead to something damn useful:

Add an "assume" annotation, to patch over what you can't get the checker to do automatically (yet). Then, start with a sample piece of code (probably much simpler than a sort), and an annotation that assumes the final result, and one to check the final result:
    if (a >= 10) a = 0;
    //[ assume [a < 10]]
    //[a < 10]
At each step, find an assumption that you are unsure about, then set about proving it, and remove the assumption. Proving the assumption may require only changing the annotations, or it may require extending the checker.

I take it, the checker should reject any program with assumptions it cannot prove (yet) and list exactly these remaining assumptions.

Actually, I was thinking that the checker should accept an assumption as a fact, unless a previously existing fact contradicts it - i.e. it is an axiom. Also, the programming language definition (and maybe the implementation) would be treated as a set of axioms. The source code could be treated also as a set of axioms, and the non-assumption annotations as hypotheses.

So, maybe there are 3 states for any given annotation: And an attribute "depends_on_assumptions" that is true if It would, of course, be able to list the assumptions that any hypothesis depends on.


Maybe I should add, that I have done little actual formal verification of programs myself. I know, that it is done, but not, if the annotations sketched here are a viable way.
This sounds similar to the Z Notation (ZedLanguage), a FormalSpecification language.

From what I have seen, Z is not about annotating a program (like shown in the comments above), but about the separate specification of a program. Further it has quite a different syntax and use. Not what I expected.
It looks like EscJava does at least 90% of this.
See also HoareTriple

See ProofsCanBeSimple, AutomatedTheoremProving, ProofOfCorrectness

View edit of May 9, 2008 or FindPage with title or text search