Thursday, 24 May 2007

The type says everything

I've mentioned a number of times that given a sufficiently rich type theory we can use the type to provide a specification for the function, such that the function is correct by construction. I'd like to give a simple example of how this works in the Coq proof assistant.

The example below illustrates how the type theory of Coq allows you to fully specify the behaviour of a function. Kragen Sitaker made the comment that there weren't too many reasonable interpretations of the type given to the assoc function:

A -> list (A * B) -> option B 

that is, a function taking an element of type A and a list of pairs of A and B, and returning either a B or a failure. That got me wondering how hard it would be to tighten the noose so that there was only one way to interpret the type, while still allowing many implementations that satisfy it, all of which yield the same result.

The following code illustrates that this is indeed possible to do in Coq.

Require Import List. 
Parameter A : Set.
Parameter B : Set. 
Definition assoc_list := list (A * B)%type. 
Parameter eq_dec : forall (x:A) (y:A),{x=y}+{x<>y}.

Theorem assoc : 
  forall (x:A) (l: assoc_list), 
    {y | In (x,y) l} + {forall (y:B), ~In (x,y) l}.
Proof.
  (* Give the program as a term; each _ is a hole left for the tactics below. *)
  refine
    (fix assoc (x:A) (l:assoc_list) {struct l} 
      : {y | In (x,y) l} + {forall (y:B), ~In (x,y) l}
      := match l 
           return {y | In (x,y) l} + {forall (y:B), ~In (x,y) l}
         with
           | nil => inright _ (fun y => (fun p : In (x,y) nil => _))
           | ((x',y)::t) => 
             match eq_dec x x' with
               | left lft => inleft _ (exist _ y _)
               | right rgt => match assoc x t with 
                                | inleft (exist y p) => inleft _ (exist _ y _)
                                | inright inrgt => inright _ _
                              end
             end
         end) ; clear assoc ; eauto ; subst. constructor. reflexivity.
  (* Discharge the remaining proof obligations. *)
  firstorder. intros. intro. inversion H ; clear H. firstorder. congruence.
  apply (inrgt y0). assumption.
Defined.

This isn't the most beautiful code, but what it lacks in beauty of implementation it makes up for in ease of reading. There are only a couple of lines that you need to understand to be assured that this is in fact a correct implementation, namely the statement of the theorem:

Theorem assoc : 
  forall (x:A) (l: assoc_list), 
    {y | In (x,y) l} + {forall (y:B), ~In (x,y) l}.

This function has a type that takes an "x" of type "A" and a list of pairs, and supplies a value "y" of type "B" together with a proof that the pair (x,y) occurs in the list l. If the function can't find a pair from which to supply a y, it must instead supply a proof that no such pair exists in the list.

First, a bit of notation so you can read this theorem properly:

{ y | P y }

Can be read: "there exists a y such that the property P holds of y".

The notation:

A + {B}

Means that we must supply an A or a proof of B.

The "In" predicate is an inductive predicate which specifies membership in a list and is provided by the Coq library.

With a little practice reading types, we can see that the two lines are correct. We can then run Coq on this code to check that the proof is correct and we can be assured of the correctness of the implementation WITHOUT EVER READING THE CODE! This means we have reduced the problem of correctness to two simple lines of completely declarative statements.

This function is more complicated than a normal assoc function because it carries all of these proof terms along with it. However, thanks to the clever people involved in writing Coq, we have actually mixed two different type universes together: one (called Set) for values and one (called Prop) for proofs. Coq uses this fact to do something very clever. The {y|P y} existential type and the A+{B} type are designed such that the "P y" proof and the {B} proof disappear when we extract (compile) our code. As an illustration of this amazing fact, look at the following OCaml code, which has been automatically extracted from our definition:

let rec assoc x = function
  | Nil -> Inright
  | Cons (p, t) ->
      let Pair (x', y) = p in
      (match eq_dec x x' with
         | Left -> Inleft y
         | Right -> assoc x t)

Voilà! Not only are the proof terms gone, but this is pretty much the code you probably would have written yourself. Notice that "Inright" is a constructor carrying no information (other than failure), so it plays the role of the "Nothing" of a Maybe type, while "Inleft" plays the role of "Just".
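To try the extracted function on its own, it needs the datatypes that extraction emits alongside it and a concrete eq_dec. The following is a minimal sketch under stated assumptions: the type names coq_list, prod, sumbool and sumor are hypothetical stand-ins for the extracted prelude, and the key type A is taken to be int so that eq_dec can be plain integer equality. Only the body of assoc is taken from the listing above.

```ocaml
(* Hypothetical stand-ins for the datatypes emitted by extraction. *)
type 'a coq_list = Nil | Cons of 'a * 'a coq_list
type ('a, 'b) prod = Pair of 'a * 'b
type sumbool = Left | Right
type 'a sumor = Inleft of 'a | Inright

(* Assumption: keys are ints, so decidable equality is integer equality. *)
let eq_dec (x : int) (y : int) : sumbool = if x = y then Left else Right

(* The extracted function, unchanged. *)
let rec assoc x = function
  | Nil -> Inright
  | Cons (p, t) ->
      let Pair (x', y) = p in
      (match eq_dec x x' with
         | Left -> Inleft y
         | Right -> assoc x t)

(* A small table to look things up in. *)
let table = Cons (Pair (1, "one"), Cons (Pair (2, "two"), Nil))

let () =
  assert (assoc 2 table = Inleft "two");
  assert (assoc 3 table = Inright)
```

The proofs that justified each branch in Coq leave no trace here; what remains is an ordinary lookup whose result type is an option in all but name.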

We have extracted a provably terminating, well specified, totally correct function that runs in OCaml! And to top it off we can also compile to Haskell!!

assoc :: A -> Assoc_list -> Sumor B
assoc x l =
  case l of
    Nil -> Inright
    Cons p t ->
      (case p of
         Pair x' y ->
           (case eq_dec x x' of
              Left -> Inleft y
              Right -> assoc x t))

This isn't without some cost. There is a huge learning curve with Coq and I'm by no means an expert. The proofs can be arduous, and although this one only took me a couple of minutes, it took far longer than it would have to implement the function directly in OCaml. However, as I gain experience with Coq, I increasingly believe the approach will scale better than traditional approaches to software design.

In addition to these caveats, there is more than one function that satisfies this type. Namely, there is an assoc which returns the last matching pair in the list instead of the first. When a key occurs more than once, the two can return different values, and both still satisfy the specification. We could get rid of this problem by specifying which of the elements should be returned, by returning all of them, or by restricting the formation of association lists such that they are mappings. The latter is very elegant, but somewhat more involved than what we have done.
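The ambiguity is easy to see in plain OCaml. This is only a sketch, with ordinary lists and option standing in for the extracted types; the names assoc_first, assoc_last and dup are made up for the illustration. Both functions only ever return a value that is paired with the key somewhere in the list, and both return None only when no such pair exists, so both meet the specification, yet they disagree on a list with a duplicated key.

```ocaml
(* First match, scanning from the head of the list. *)
let rec assoc_first x = function
  | [] -> None
  | (x', y) :: t -> if x = x' then Some y else assoc_first x t

(* Last match: prefer any hit found further down the list. *)
let rec assoc_last x = function
  | [] -> None
  | (x', y) :: t ->
      (match assoc_last x t with
       | Some y' -> Some y'
       | None -> if x = x' then Some y else None)

(* The key 1 occurs twice, with different values. *)
let dup = [ (1, "a"); (2, "b"); (1, "c") ]

let () =
  assert (assoc_first 1 dup = Some "a");
  assert (assoc_last 1 dup = Some "c");
  assert (assoc_first 3 dup = None && assoc_last 3 dup = None)
```

If the list is a mapping, with each key occurring at most once, the two functions agree everywhere, which is why restricting the input is one way out.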

Functional programming reduces the difficulty of writing correct software by reducing the ways in which bugs can occur. However, most functional programming languages are unable to express complex constraints like "this is a sorted list" or "implements an assoc function" in their types. When other functions rely on behaviour that the types do not check, all kinds of funny things can happen.
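As a small, hypothetical illustration of an unchecked constraint (not the bug described below), here is a membership test that silently assumes its input is sorted in ascending order. The name mem_sorted is invented for this sketch. Its OCaml type, int -> int list -> bool, says nothing about sortedness, so nothing stops a caller from passing an unsorted list and getting a wrong answer.

```ocaml
(* Membership test that assumes the list is sorted ascending:
   once an element larger than x is seen, it gives up early. *)
let rec mem_sorted (x : int) = function
  | [] -> false
  | h :: t ->
      if h = x then true
      else if h > x then false
      else mem_sorted x t

let () =
  (* Sorted input: the answer is right. *)
  assert (mem_sorted 3 [1; 2; 3; 4]);
  (* Unsorted input: 3 is in the list, but the early exit misses it. *)
  assert (not (mem_sorted 3 [4; 3; 2; 1]))
```

In Coq the precondition could be made part of the type, for instance by demanding a proof of sortedness as an extra argument, so the second call would not type-check.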

I recently had a nasty non-termination bug in SML because a function *required* that the input list be sorted, and otherwise it wouldn't terminate. My sort routine made use of a bogus less-than-or-equal predicate over a datatype, which I had neglected to write properly. This caused hours of very difficult debugging.

I'm acting on the thesis that I can avoid these problems and I am rewriting a large program in Coq as an experiment. I'll let you know how it goes.