| author | wenzelm | 
| Thu, 04 Jan 2024 15:16:10 +0100 | |
| changeset 79419 | 93f26d333fb5 | 
| parent 69597 | ff784d5a5bfb | 
| permissions | -rw-r--r-- | 
| 8744 | 1 | (*<*) | 
| 16417 | 2 | theory CodeGen imports Main begin | 
| 8744 | 3 | (*>*) | 
| 4 | ||
| 67406 | 5 | section\<open>Case Study: Compiling Expressions\<close> | 
| 9844 | 6 | |
| 67406 | 7 | text\<open>\label{sec:ExprCompiler}
 | 
| 11458 | 8 | \index{compiling expressions example|(}%
 | 
| 8744 | 9 | The task is to develop a compiler from a generic type of expressions (built | 
| 10795 | 10 | from variables, constants and binary operations) to a stack machine. This | 
| 8744 | 11 | generic type of expressions is a generalization of the boolean expressions in | 
| 12 | \S\ref{sec:boolex}.  This time we do not commit ourselves to a particular
 | |
| 13 | type of variables or values but make them type parameters. Neither is there | |
| 14 | a fixed set of binary operations: instead the expression contains the | |
| 15 | appropriate function itself. | |
| 67406 | 16 | \<close> | 
| 8744 | 17 | |
| 58860 | 18 | type_synonym 'v binop = "'v \<Rightarrow> 'v \<Rightarrow> 'v" | 
| 58305 
57752a91eec4
renamed 'datatype' to 'old_datatype'; 'datatype' is now alias for 'datatype_new'
 blanchet parents: 
48985diff
changeset | 19 | datatype (dead 'a, 'v) expr = Cex 'v | 
| 
57752a91eec4
renamed 'datatype' to 'old_datatype'; 'datatype' is now alias for 'datatype_new'
 blanchet parents: 
48985diff
changeset | 20 | | Vex 'a | 
| 58860 | 21 |                       | Bex "'v binop"  "('a,'v)expr"  "('a,'v)expr"
 | 
| 8744 | 22 | |
| 67406 | 23 | text\<open>\noindent | 
| 8771 | 24 | The three constructors represent constants, variables and the application of | 
| 25 | a binary operation to two subexpressions. | |
| 8744 | 26 | |
| 10795 | 27 | The value of an expression with respect to an environment that maps variables to | 
| 8744 | 28 | values is easily defined: | 
| 67406 | 29 | \<close> | 
| 8744 | 30 | |
| 27015 | 31 | primrec "value" :: "('a,'v)expr \<Rightarrow> ('a \<Rightarrow> 'v) \<Rightarrow> 'v" where
 | 
| 32 | "value (Cex v) env = v" | | |
| 33 | "value (Vex a) env = env a" | | |
| 34 | "value (Bex f e1 e2) env = f (value e1 env) (value e2 env)" | |
| 8744 | 35 | |
| 67406 | 36 | text\<open> | 
| 8744 | 37 | The stack machine has three instructions: load a constant value onto the | 
| 10795 | 38 | stack, load the contents of an address onto the stack, and apply a | 
| 8744 | 39 | binary operation to the two topmost elements of the stack, replacing them by | 
| 69505 | 40 | the result. As for \<open>expr\<close>, addresses and values are type parameters: | 
| 67406 | 41 | \<close> | 
| 8744 | 42 | |
| 58305 
57752a91eec4
renamed 'datatype' to 'old_datatype'; 'datatype' is now alias for 'datatype_new'
 blanchet parents: 
48985diff
changeset | 43 | datatype (dead 'a, 'v) instr = Const 'v | 
| 8744 | 44 | | Load 'a | 
| 58860 | 45 | | Apply "'v binop" | 
| 8744 | 46 | |
| 67406 | 47 | text\<open> | 
| 8771 | 48 | The execution of the stack machine is modelled by a function | 
| 69505 | 49 | \<open>exec\<close> that takes a list of instructions, a store (modelled as a | 
| 8771 | 50 | function from addresses to values, just like the environment for | 
| 51 | evaluating expressions), and a stack (modelled as a list) of values, | |
| 10971 | 52 | and returns the stack at the end of the execution --- the store remains | 
| 8771 | 53 | unchanged: | 
| 67406 | 54 | \<close> | 
| 8744 | 55 | |
| 27015 | 56 | primrec exec :: "('a,'v)instr list \<Rightarrow> ('a\<Rightarrow>'v) \<Rightarrow> 'v list \<Rightarrow> 'v list"
 | 
| 57 | where | |
| 58 | "exec [] s vs = vs" | | |
| 8771 | 59 | "exec (i#is) s vs = (case i of | 
| 10171 | 60 | Const v \<Rightarrow> exec is s (v#vs) | 
| 61 | | Load a \<Rightarrow> exec is s ((s a)#vs) | |
| 27015 | 62 | | Apply f \<Rightarrow> exec is s ((f (hd vs) (hd(tl vs)))#(tl(tl vs))))" | 
| 8744 | 63 | |
| 67406 | 64 | text\<open>\noindent | 
| 69597 | 65 | Recall that \<^term>\<open>hd\<close> and \<^term>\<open>tl\<close> | 
| 8744 | 66 | return the first element and the remainder of a list. | 
| 11458 | 67 | Because all functions are total, \cdx{hd} is defined even for the empty
 | 
| 8744 | 68 | list, although we do not know what the result is. Thus our model of the | 
| 10795 | 69 | machine always terminates properly, although the definition above does not | 
| 69597 | 70 | tell us much about the result in situations where \<^term>\<open>Apply\<close> was executed | 
| 8744 | 71 | with fewer than two elements on the stack. | 
| 72 | ||
| 73 | The compiler is a function from expressions to a list of instructions. Its | |
| 10795 | 74 | definition is obvious: | 
| 67406 | 75 | \<close> | 
| 8744 | 76 | |
| 27015 | 77 | primrec compile :: "('a,'v)expr \<Rightarrow> ('a,'v)instr list" where
 | 
| 78 | "compile (Cex v) = [Const v]" | | |
| 79 | "compile (Vex a) = [Load a]" | | |
| 17212 | 80 | "compile (Bex f e1 e2) = (compile e2) @ (compile e1) @ [Apply f]" | 
| 8744 | 81 | |
| 67406 | 82 | text\<open> | 
| 8744 | 83 | Now we have to prove the correctness of the compiler, i.e.\ that the | 
| 84 | execution of a compiled expression results in the value of the expression: | |
| 67406 | 85 | \<close> | 
| 58860 | 86 | theorem "exec (compile e) s [] = [value e s]" | 
| 87 | (*<*)oops(*>*) | |
| 67406 | 88 | text\<open>\noindent | 
| 11458 | 89 | This theorem needs to be generalized: | 
| 67406 | 90 | \<close> | 
| 8744 | 91 | |
| 58860 | 92 | theorem "\<forall>vs. exec (compile e) s vs = (value e s) # vs" | 
| 8744 | 93 | |
| 67406 | 94 | txt\<open>\noindent | 
| 69597 | 95 | It will be proved by induction on \<^term>\<open>e\<close> followed by simplification. | 
| 11458 | 96 | First, we must prove a lemma about executing the concatenation of two | 
| 8744 | 97 | instruction sequences: | 
| 67406 | 98 | \<close> | 
| 58860 | 99 | (*<*)oops(*>*) | 
| 8744 | 100 | lemma exec_app[simp]: | 
| 58860 | 101 | "\<forall>vs. exec (xs@ys) s vs = exec ys s (exec xs s vs)" | 
| 8744 | 102 | |
| 67406 | 103 | txt\<open>\noindent | 
| 69597 | 104 | This requires induction on \<^term>\<open>xs\<close> and ordinary simplification for the | 
| 8744 | 105 | base cases. In the induction step, simplification leaves us with a formula | 
| 69505 | 106 | that contains two \<open>case\<close>-expressions over instructions. Thus we add | 
| 11458 | 107 | automatic case splitting, which finishes the proof: | 
| 67406 | 108 | \<close> | 
| 58860 | 109 | apply(induct_tac xs, simp, simp split: instr.split) | 
| 10171 | 110 | (*<*)done(*>*) | 
| 67406 | 111 | text\<open>\noindent | 
| 11428 | 112 | Note that because both \methdx{simp_all} and \methdx{auto} perform simplification, they can
 | 
| 69505 | 113 | be modified in the same way as \<open>simp\<close>. Thus the proof can be | 
| 8744 | 114 | rewritten as | 
| 67406 | 115 | \<close> | 
| 8744 | 116 | (*<*) | 
| 58860 | 117 | declare exec_app[simp del] | 
| 118 | lemma [simp]: "\<forall>vs. exec (xs@ys) s vs = exec ys s (exec xs s vs)" | |
| 8744 | 119 | (*>*) | 
| 58860 | 120 | apply(induct_tac xs, simp_all split: instr.split) | 
| 10171 | 121 | (*<*)done(*>*) | 
| 67406 | 122 | text\<open>\noindent | 
| 8744 | 123 | Although this is more compact, it is less clear for the reader of the proof. | 
| 124 | ||
| 69597 | 125 | We could now go back and prove \<^prop>\<open>exec (compile e) s [] = [value e s]\<close> | 
| 8744 | 126 | merely by simplification with the generalized version we just proved. | 
| 127 | However, this is unnecessary because the generalized version fully subsumes | |
| 11458 | 128 | its instance.% | 
| 129 | \index{compiling expressions example|)}
 | |
| 67406 | 130 | \<close> | 
| 8744 | 131 | (*<*) | 
| 58860 | 132 | theorem "\<forall>vs. exec (compile e) s vs = (value e s) # vs" | 
| 133 | by(induct_tac e, auto) | |
| 8744 | 134 | end | 
| 135 | (*>*) |