8744
|
1 |
(*<*)
|
|
2 |
theory CodeGen = Main:
|
|
3 |
(*>*)
|
|
4 |
|
10885
|
5 |
section{*Case Study: Compiling Expressions*}
|
9844
|
6 |
|
|
7 |
text{*\label{sec:ExprCompiler}
|
11458
|
8 |
\index{compiling expressions example|(}%
|
8744
|
9 |
The task is to develop a compiler from a generic type of expressions (built
|
10795
|
10 |
from variables, constants and binary operations) to a stack machine. This
|
8744
|
11 |
generic type of expressions is a generalization of the boolean expressions in
|
|
12 |
\S\ref{sec:boolex}. This time we do not commit ourselves to a particular
|
|
13 |
type of variables or values but make them type parameters. Neither is there
|
|
14 |
a fixed set of binary operations: instead the expression contains the
|
|
15 |
appropriate function itself.
|
|
16 |
*}
|
|
17 |
|
10171
|
18 |
types 'v binop = "'v \<Rightarrow> 'v \<Rightarrow> 'v";
|
8744
|
19 |
datatype ('a,'v)expr = Cex 'v
|
|
20 |
| Vex 'a
|
|
21 |
| Bex "'v binop" "('a,'v)expr" "('a,'v)expr";
|
|
22 |
|
|
23 |
text{*\noindent
|
8771
|
24 |
The three constructors represent constants, variables and the application of
|
|
25 |
a binary operation to two subexpressions.
|
8744
|
26 |
|
10795
|
27 |
The value of an expression with respect to an environment that maps variables to
|
8744
|
28 |
values is easily defined:
|
|
29 |
*}
|
|
30 |
|
10171
|
31 |
consts value :: "('a,'v)expr \<Rightarrow> ('a \<Rightarrow> 'v) \<Rightarrow> 'v";
|
8744
|
32 |
primrec
|
8771
|
33 |
"value (Cex v) env = v"
|
|
34 |
"value (Vex a) env = env a"
|
|
35 |
"value (Bex f e1 e2) env = f (value e1 env) (value e2 env)";
|
8744
|
36 |
|
|
37 |
text{*
|
|
38 |
The stack machine has three instructions: load a constant value onto the
|
10795
|
39 |
stack, load the contents of an address onto the stack, and apply a
|
8744
|
40 |
binary operation to the two topmost elements of the stack, replacing them by
|
9792
|
41 |
the result. As for @{text"expr"}, addresses and values are type parameters:
|
8744
|
42 |
*}
|
|
43 |
|
|
44 |
datatype ('a,'v) instr = Const 'v
|
|
45 |
| Load 'a
|
|
46 |
| Apply "'v binop";
|
|
47 |
|
|
48 |
text{*
|
8771
|
49 |
The execution of the stack machine is modelled by a function
|
9792
|
50 |
@{text"exec"} that takes a list of instructions, a store (modelled as a
|
8771
|
51 |
function from addresses to values, just like the environment for
|
|
52 |
evaluating expressions), and a stack (modelled as a list) of values,
|
10971
|
53 |
and returns the stack at the end of the execution --- the store remains
|
8771
|
54 |
unchanged:
|
8744
|
55 |
*}
|
|
56 |
|
10171
|
57 |
consts exec :: "('a,'v)instr list \<Rightarrow> ('a\<Rightarrow>'v) \<Rightarrow> 'v list \<Rightarrow> 'v list";
|
8744
|
58 |
primrec
|
8771
|
59 |
"exec [] s vs = vs"
|
|
60 |
"exec (i#is) s vs = (case i of
|
10171
|
61 |
Const v \<Rightarrow> exec is s (v#vs)
|
|
62 |
| Load a \<Rightarrow> exec is s ((s a)#vs)
|
|
63 |
| Apply f \<Rightarrow> exec is s ((f (hd vs) (hd(tl vs)))#(tl(tl vs))))";
|
8744
|
64 |
|
|
65 |
text{*\noindent
|
9792
|
66 |
Recall that @{term"hd"} and @{term"tl"}
|
8744
|
67 |
return the first element and the remainder of a list.
|
11458
|
68 |
Because all functions are total, \cdx{hd} is defined even for the empty
|
8744
|
69 |
list, although we do not know what the result is. Thus our model of the
|
10795
|
70 |
machine always terminates properly, although the definition above does not
|
9792
|
71 |
tell us much about the result in situations where @{term"Apply"} was executed
|
8744
|
72 |
with fewer than two elements on the stack.
|
|
73 |
|
|
74 |
The compiler is a function from expressions to a list of instructions. Its
|
10795
|
75 |
definition is obvious:
|
8744
|
76 |
*}
|
|
77 |
|
10171
|
78 |
consts comp :: "('a,'v)expr \<Rightarrow> ('a,'v)instr list";
|
8744
|
79 |
primrec
|
|
80 |
"comp (Cex v) = [Const v]"
|
|
81 |
"comp (Vex a) = [Load a]"
|
|
82 |
"comp (Bex f e1 e2) = (comp e2) @ (comp e1) @ [Apply f]";
|
|
83 |
|
|
84 |
text{*
|
|
85 |
Now we have to prove the correctness of the compiler, i.e.\ that the
|
|
86 |
execution of a compiled expression results in the value of the expression:
|
|
87 |
*}
|
8771
|
88 |
theorem "exec (comp e) s [] = [value e s]";
|
8744
|
89 |
(*<*)oops;(*>*)
|
|
90 |
text{*\noindent
|
11458
|
91 |
This theorem needs to be generalized:
|
8744
|
92 |
*}
|
|
93 |
|
10171
|
94 |
theorem "\<forall>vs. exec (comp e) s vs = (value e s) # vs";
|
8744
|
95 |
|
|
96 |
txt{*\noindent
|
11458
|
97 |
It will be proved by induction on @{term"e"} followed by simplification.
|
|
98 |
First, we must prove a lemma about executing the concatenation of two
|
8744
|
99 |
instruction sequences:
|
|
100 |
*}
|
|
101 |
(*<*)oops;(*>*)
|
|
102 |
lemma exec_app[simp]:
|
10171
|
103 |
"\<forall>vs. exec (xs@ys) s vs = exec ys s (exec xs s vs)";
|
8744
|
104 |
|
|
105 |
txt{*\noindent
|
9792
|
106 |
This requires induction on @{term"xs"} and ordinary simplification for the
|
8744
|
107 |
base cases. In the induction step, simplification leaves us with a formula
|
9792
|
108 |
that contains two @{text"case"}-expressions over instructions. Thus we add
|
11458
|
109 |
automatic case splitting, which finishes the proof:
|
8744
|
110 |
*}
|
10171
|
111 |
apply(induct_tac xs, simp, simp split: instr.split);
|
|
112 |
(*<*)done(*>*)
|
8744
|
113 |
text{*\noindent
|
11428
|
114 |
Note that because both \methdx{simp_all} and \methdx{auto} perform simplification, they can
|
|
115 |
be modified in the same way as @{text simp}. Thus the proof can be
|
8744
|
116 |
rewritten as
|
|
117 |
*}
|
|
118 |
(*<*)
|
9933
|
119 |
declare exec_app[simp del];
|
10171
|
120 |
lemma [simp]: "\<forall>vs. exec (xs@ys) s vs = exec ys s (exec xs s vs)";
|
8744
|
121 |
(*>*)
|
10971
|
122 |
apply(induct_tac xs, simp_all split: instr.split);
|
10171
|
123 |
(*<*)done(*>*)
|
8744
|
124 |
text{*\noindent
|
|
125 |
Although this is more compact, it is less clear for the reader of the proof.
|
|
126 |
|
8771
|
127 |
We could now go back and prove \isa{exec (comp e) s [] = [value e s]}
|
8744
|
128 |
merely by simplification with the generalized version we just proved.
|
|
129 |
However, this is unnecessary because the generalized version fully subsumes
|
11458
|
130 |
its instance.%
|
|
131 |
\index{compiling expressions example|)}
|
8744
|
132 |
*}
|
|
133 |
(*<*)
|
10171
|
134 |
theorem "\<forall>vs. exec (comp e) s vs = (value e s) # vs";
|
9458
|
135 |
by(induct_tac e, auto);
|
8744
|
136 |
end
|
|
137 |
(*>*)
|