doc-src/TutorialI/CodeGen/CodeGen.thy
changeset 8744 22fa8b16c3ae
child 8771 026f37a86ea7
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/doc-src/TutorialI/CodeGen/CodeGen.thy	Wed Apr 19 11:56:06 2000 +0200
@@ -0,0 +1,132 @@
+(*<*)
+theory CodeGen = Main:
+(*>*)
+
+text{*\noindent
+The task is to develop a compiler from a generic type of expressions (built
+up from variables, constants and binary operations) to a stack machine.  This
+generic type of expressions is a generalization of the boolean expressions in
+\S\ref{sec:boolex}.  This time we do not commit ourselves to a particular
+type of variables or values but make them type parameters.  Nor is there
+a fixed set of binary operations: instead, each expression carries the
+appropriate function itself.
+*}
+
+types 'v binop = "'v \\<Rightarrow> 'v \\<Rightarrow> 'v";
+datatype ('a,'v)expr = Cex 'v
+                     | Vex 'a
+                     | Bex "'v binop"  "('a,'v)expr"  "('a,'v)expr";
+
+text{*\noindent
+The three constructors represent constants, variables and the combination of
+two subexpressions with a binary operation.
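+For example, taking natural numbers as values, the arithmetic term $x+3$
+can be represented as \isa{Bex add (Vex x) (Cex 3)}, where \isa{add}
+denotes addition on the natural numbers.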
+
+The value of an expression w.r.t.\ an environment that maps variables to
+values is easily defined:
+*}
+
+consts value :: "('a \\<Rightarrow> 'v) \\<Rightarrow> ('a,'v)expr \\<Rightarrow> 'v";
+primrec
+"value env (Cex v) = v"
+"value env (Vex a) = env a"
+"value env (Bex f e1 e2) = f (value env e1) (value env e2)";
+
+text{*
+The stack machine has three instructions: load a constant value onto the
+stack, load the contents of a certain address onto the stack, and apply a
+binary operation to the two topmost elements of the stack, replacing them by
+the result. As for \isa{expr}, addresses and values are type parameters:
+*}
+
+datatype ('a,'v) instr = Const 'v
+                       | Load 'a
+                       | Apply "'v binop";
+
+text{*
+The execution of the stack machine is modelled by a function \isa{exec}
+that takes a store (modelled as a function from addresses to values, just
+like the environment for evaluating expressions), a stack (modelled as a
+list) of values, and a list of instructions, and returns the stack at the end
+of the execution---the store remains unchanged:
+*}
+
+consts exec :: "('a\\<Rightarrow>'v) \\<Rightarrow> 'v list \\<Rightarrow> ('a,'v)instr list \\<Rightarrow> 'v list";
+primrec
+"exec s vs [] = vs"
+"exec s vs (i#is) = (case i of
+    Const v  \\<Rightarrow> exec s (v#vs) is
+  | Load a   \\<Rightarrow> exec s ((s a)#vs) is
+  | Apply f  \\<Rightarrow> exec s ( (f (hd vs) (hd(tl vs)))#(tl(tl vs)) ) is)";
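+
+text{*\noindent
+For example, executing \isa{[Const c, Load a, Apply f]} on an arbitrary
+store \isa{s} and stack \isa{vs} first pushes \isa{c}, then pushes
+\isa{s a}, and finally replaces these two values by \isa{f (s a) c},
+leaving \isa{f (s a) c} on top of \isa{vs}.
+*}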
+
+text{*\noindent
+Recall that \isa{hd} and \isa{tl}
+return the first element and the remainder of a list.
+Because all functions are total, \isa{hd} is defined even for the empty
+list, although we do not know what the result is. Thus our model of the
+machine always terminates properly, although the above definition does not
+tell us much about the result in situations where \isa{Apply} was executed
+with fewer than two elements on the stack.
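+For example, since \isa{tl []} is simply \isa{[]}, the equations yield
+\isa{exec s [] [Apply f] = [f (hd []) (hd [])]}, a stack containing a
+single unknown value.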
+
+The compiler is a function from expressions to lists of instructions. Its
+definition is obvious:
+*}
+
+consts comp :: "('a,'v)expr \\<Rightarrow> ('a,'v)instr list";
+primrec
+"comp (Cex v)       = [Const v]"
+"comp (Vex a)       = [Load a]"
+"comp (Bex f e1 e2) = (comp e2) @ (comp e1) @ [Apply f]";
+
+text{*
+Now we have to prove the correctness of the compiler, i.e.\ that the
+execution of a compiled expression results in the value of the expression:
+*}
+theorem "exec s [] (comp e) = [value s e]";
+(*<*)oops;(*>*)
+text{*\noindent
+A direct proof by induction fails because in the \isa{Bex} case the
+subexpressions are executed on a stack that is no longer empty. Therefore
+this theorem needs to be generalized to
+*}
+
+theorem "\\<forall>vs. exec s vs (comp e) = (value s e) # vs";
+
+txt{*\noindent
+which is proved by induction on \isa{e} followed by simplification, once
+we have the following lemma about executing the concatenation of two
+instruction sequences:
+*}
+(*<*)oops;(*>*)
+lemma exec_app[simp]:
+  "\\<forall>vs. exec s vs (xs@ys) = exec s (exec s vs xs) ys"; 
+
+txt{*\noindent
+This requires induction on \isa{xs} and ordinary simplification for the
+base case. In the induction step, simplification leaves us with a formula
+that contains two \isa{case}-expressions over instructions. Thus we add
+automatic case splitting as well, which finishes the proof:
+*}
+apply(induct_tac xs, simp, simp split: instr.split).;
+
+text{*\noindent
+Note that because \isaindex{auto} performs simplification, it accepts the
+same modifiers (such as \isa{split}) as \isa{simp}. Thus the proof can be
+rewritten as
+*}
+(*<*)
+lemmas [simp del] = exec_app;
+lemma [simp]: "\\<forall>vs. exec s vs (xs@ys) = exec s (exec s vs xs) ys"; 
+(*>*)
+apply(induct_tac xs, auto split: instr.split).;
+
+text{*\noindent
+Although this is more compact, it is less clear for the reader of the proof.
+
+We could now go back and prove \isa{exec s [] (comp e) = [value s e]}
+merely by simplification with the generalized version we just proved.
+However, this is unnecessary because the generalized version fully subsumes
+its instance: instantiating \isa{vs} with the empty list yields exactly the
+original statement.
+*}
+(*<*)
+theorem "\\<forall>vs. exec s vs (comp e) = (value s e) # vs";
+apply(induct_tac e, auto).;
+end
+(*>*)