author | haftmann |
Tue, 21 Sep 2010 14:36:13 +0200 | |
changeset 39599 | d9c247f7afa3 |
parent 38857 | 97775f3e8722 |
child 39664 | 0afaf89ab591 |
permissions | -rw-r--r-- |
theory Foundations
imports Introduction
begin

section {* Code generation foundations \label{sec:foundations} *}

subsection {* Code generator architecture \label{sec:architecture} *}

text {*
  The code generator is actually a framework consisting of different
  components which can be customised individually.

  Conceptually all components operate on Isabelle's logic framework
  @{theory Pure}.  Practically, the object logic @{theory HOL}
  provides the necessary facilities to make use of the code generator,
  mainly since it is an extension of @{theory Pure}.

  The constellation of the different components is visualized in the
  following picture.

  \begin{figure}[h]
    \includegraphics{architecture}
    \caption{Code generator architecture}
    \label{fig:arch}
  \end{figure}

  \noindent Central to code generation is the notion of \emph{code
  equations}.  A code equation as a first approximation is a theorem
  of the form @{text "f t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n \<equiv> t"} (an equation headed by a
  constant @{text f} with arguments @{text "t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n"} and right
  hand side @{text t}).

  \begin{itemize}

    \item Starting point of code generation is a collection of (raw)
      code equations in a theory.  It is not relevant where they stem
      from, but typically they were either produced by specification
      tools or proved explicitly by the user.

    \item These raw code equations can be subjected to theorem
      transformations.  This \qn{preprocessor} (see
      \secref{sec:preproc}) can apply the full expressiveness of
      ML-based theorem transformations to code generation.  The result
      of preprocessing is a structured collection of code equations.

    \item These code equations are \qn{translated} to a program in an
      abstract intermediate language.  Think of it as a kind of
      \qt{Mini-Haskell} with four \qn{statements}: @{text data} (for
      datatypes), @{text fun} (stemming from code equations), also
      @{text class} and @{text inst} (for type classes).

    \item Finally, the abstract program is \qn{serialised} into
      concrete source code of a target language.  This step only
      produces concrete syntax but does not change the program in
      essence; all conceptual transformations occur in the translation
      step.

  \end{itemize}

  \noindent Of these steps, only the last two are carried out
  outside the logic; by keeping this layer as thin as possible, the
  amount of code to trust is kept to a minimum.
*}


subsection {* The preprocessor \label{sec:preproc} *}

text {*
  Before selected function theorems are turned into abstract code, a
  chain of definitional transformation steps is carried out:
  \emph{preprocessing}.  The preprocessor consists of two
  components: a \emph{simpset} and \emph{function transformers}.

  The \emph{simpset} can apply the full generality of the Isabelle
  simplifier.  Due to the interpretation of theorems as code
  equations, rewrites are applied to the right hand side and the
  arguments of the left hand side of an equation, but never to the
  constant heading the left hand side.  Important special cases are
  \emph{unfold theorems}, which may be declared and removed using the
  @{attribute code_unfold} or \emph{@{attribute code_unfold} del}
  attribute, respectively.

  Some common applications:
*}

text_raw {*
  \begin{itemize}
*}

text {*
     \item replacing non-executable constructs by executable ones:
*}

lemma %quote [code_unfold]:
  "x \<in> set xs \<longleftrightarrow> List.member xs x" by (fact in_set_member)

text {*
     \item replacing executable but inconvenient constructs:
*}

lemma %quote [code_unfold]:
  "xs = [] \<longleftrightarrow> List.null xs" by (fact eq_Nil_null)

text {*
     \item eliminating disturbing expressions:
*}

lemma %quote [code_unfold]:
  "1 = Suc 0" by (fact One_nat_def)

text_raw {*
  \end{itemize}
*}
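
text {*
  \noindent As a minimal sketch of declaring and removing an unfold
  theorem, assume an illustrative lemma named @{text rev_rev_unfold}
  (a hypothetical name which is not part of the default setup).  The
  first declaration adds the lemma to the simpset of the
  preprocessor, the second removes it again, so that the examples
  which follow are not affected:
*}

lemma %quote rev_rev_unfold:
  "rev (rev xs) = xs" by simp

declare %quote rev_rev_unfold [code_unfold]

declare %quote rev_rev_unfold [code_unfold del]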
|
114 |
||
115 |
text {* |
|
38437 | 116 |
\noindent \emph{Function transformers} provide a very general |
117 |
interface, transforming a list of function theorems to another list |
|
118 |
of function theorems, provided that neither the heading constant nor |
|
119 |
its type change. The @{term "0\<Colon>nat"} / @{const Suc} pattern |
|
120 |
elimination implemented in theory @{theory Efficient_Nat} (see |
|
121 |
\secref{eff_nat}) uses this interface. |
|
28419 | 122 |
|
38437 | 123 |
\noindent The current setup of the preprocessor may be inspected |
38505 | 124 |
using the @{command_def print_codeproc} command. @{command_def |
125 |
code_thms} (see \secref{sec:equations}) provides a convenient |
|
126 |
mechanism to inspect the impact of a preprocessor setup on code |
|
127 |
equations. |
|
28419 | 128 |
|
129 |
\begin{warn} |
|
32000 | 130 |
Attribute @{attribute code_unfold} also applies to the |
131 |
preprocessor of the ancient @{text "SML code generator"}; in case |
|
132 |
this is not what you intend, use @{attribute code_inline} instead. |
|
28419 | 133 |
\end{warn} |
134 |
*} |
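
text {*
  \noindent For instance, the following invocation prints the
  current preprocessor setup.  Its output depends on the theories
  loaded and is therefore not reproduced here:
*}

print_codeproc %quote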
|
135 |
||
38437 | 136 |
|
137 |
subsection {* Understanding code equations \label{sec:equations} *} |
|
28419 | 138 |
|
139 |
text {* |
|
38437 | 140 |
As told in \secref{sec:principle}, the notion of code equations is |
141 |
vital to code generation. Indeed most problems which occur in |
|
142 |
practice can be resolved by an inspection of the underlying code |
|
143 |
equations. |
|
28419 | 144 |
|
38437 | 145 |
It is possible to exchange the default code equations for constants |
146 |
by explicitly proving alternative ones: |
|
28419 | 147 |
*} |
148 |
||
38437 | 149 |
lemma %quote [code]: |
150 |
"dequeue (AQueue xs []) = |
|
151 |
(if xs = [] then (None, AQueue [] []) |
|
152 |
else dequeue (AQueue [] (rev xs)))" |
|
153 |
"dequeue (AQueue xs (y # ys)) = |
|
154 |
(Some y, AQueue xs ys)" |
|
155 |
by (cases xs, simp_all) (cases "rev xs", simp_all) |
|
28213 | 156 |
|
28419 | 157 |
text {* |
38437 | 158 |
\noindent The annotation @{text "[code]"} is an @{text attribute} |
159 |
which states that the given theorems should be considered as code |
|
160 |
equations for a @{text fun} statement -- the corresponding constant |
|
161 |
is determined syntactically. The resulting code: |
|
29798 | 162 |
*} |
29794 | 163 |
|
38437 | 164 |
text %quote {*@{code_stmts dequeue (consts) dequeue (Haskell)}*} |
29794 | 165 |
|
29798 | 166 |
text {* |
38437 | 167 |
\noindent You may note that the equality test @{term "xs = []"} has |
168 |
been replaced by the predicate @{term "List.null xs"}. This is due |
|
169 |
to the default setup of the \qn{preprocessor}. |
|
170 |
||
171 |
This possibility to select arbitrary code equations is the key |
|
172 |
technique for program and datatype refinement (see |
|
173 |
\secref{sec:refinement}. |
|
174 |
||
175 |
Due to the preprocessor, there is the distinction of raw code |
|
176 |
equations (before preprocessing) and code equations (after |
|
177 |
preprocessing). |
|
178 |
||
38505 | 179 |
The first can be listed (among other data) using the @{command_def |
180 |
print_codesetup} command. |
|
38437 | 181 |
|
182 |
The code equations after preprocessing are already are blueprint of |
|
183 |
the generated program and can be inspected using the @{command |
|
184 |
code_thms} command: |
|
29798 | 185 |
*} |
29794 | 186 |
|
38437 | 187 |
code_thms %quote dequeue |
28419 | 188 |
|
189 |
text {* |
|
38437 | 190 |
\noindent This prints a table with the code equations for @{const |
191 |
dequeue}, including \emph{all} code equations those equations depend |
|
192 |
on recursively. These dependencies themselves can be visualized using |
|
38505 | 193 |
the @{command_def code_deps} command. |
28419 | 194 |
*} |
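
text {*
  \noindent The two remaining inspection commands can be tried on
  the running example in an interactive session, roughly as follows.
  The first lists the raw code equations (among other data), the
  second displays the dependencies of @{const dequeue}.  Their output
  depends on the session and is not reproduced here:
*}

print_codesetup %quote

code_deps %quote dequeue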


subsection {* Equality *}

text {*
  Implementation of equality deserves some attention.  Here is an example
  function involving polymorphic equality:
*}

primrec %quote collect_duplicates :: "'a list \<Rightarrow> 'a list \<Rightarrow> 'a list \<Rightarrow> 'a list" where
  "collect_duplicates xs ys [] = xs"
| "collect_duplicates xs ys (z#zs) = (if z \<in> set xs
    then if z \<in> set ys
      then collect_duplicates xs ys zs
      else collect_duplicates xs (z#ys) zs
    else collect_duplicates (z#xs) (z#ys) zs)"

text {*
  \noindent During preprocessing, the membership test is rewritten,
  resulting in @{const List.member}, which itself performs an explicit
  equality check, as can be seen in the corresponding @{text SML} code:
*}

text %quote {*@{code_stmts collect_duplicates (SML)}*}

text {*
  \noindent Obviously, polymorphic equality is implemented the Haskell
  way using a type class.  How is this achieved?  HOL introduces an
  explicit class @{class equal} with a corresponding operation @{const
  HOL.equal} such that @{thm equal [no_vars]}.  The preprocessing
  framework does the rest by propagating the @{class equal} constraints
  through all dependent code equations.  For datatypes, instances of
  @{class equal} are implicitly derived when possible.  For other types,
  you may instantiate @{text equal} manually like any other type class.
*}


subsection {* Explicit partiality \label{sec:partiality} *}

text {*
  Partiality usually enters the game by partial patterns, as
  in the following example, again for amortised queues:
*}

definition %quote strict_dequeue :: "'a queue \<Rightarrow> 'a \<times> 'a queue" where
  "strict_dequeue q = (case dequeue q
    of (Some x, q') \<Rightarrow> (x, q'))"

lemma %quote strict_dequeue_AQueue [code]:
  "strict_dequeue (AQueue xs (y # ys)) = (y, AQueue xs ys)"
  "strict_dequeue (AQueue xs []) =
    (case rev xs of y # ys \<Rightarrow> (y, AQueue [] ys))"
  by (simp_all add: strict_dequeue_def) (cases xs, simp_all split: list.split)

text {*
  \noindent In the corresponding code, there is no equation
  for the pattern @{term "AQueue [] []"}:
*}

text %quote {*@{code_stmts strict_dequeue (consts) strict_dequeue (Haskell)}*}

text {*
  \noindent In some cases it is desirable to have this
  pseudo-\qt{partiality} more explicitly, e.g.~as follows:
*}

axiomatization %quote empty_queue :: 'a

definition %quote strict_dequeue' :: "'a queue \<Rightarrow> 'a \<times> 'a queue" where
  "strict_dequeue' q = (case dequeue q of (Some x, q') \<Rightarrow> (x, q') | _ \<Rightarrow> empty_queue)"

lemma %quote strict_dequeue'_AQueue [code]:
  "strict_dequeue' (AQueue xs []) = (if xs = [] then empty_queue
     else strict_dequeue' (AQueue [] (rev xs)))"
  "strict_dequeue' (AQueue xs (y # ys)) =
     (y, AQueue xs ys)"
  by (simp_all add: strict_dequeue'_def split: list.splits)

text {*
  Observe that on the right hand side of the definition of @{const
  "strict_dequeue'"}, the unspecified constant @{const empty_queue} occurs.

  Normally, if constants without any code equations occur in a
  program, the code generator complains (since in most cases this is
  indeed an error).  But such constants can also be thought
  of as function definitions which always fail,
  since there is never a successful pattern match on the left hand
  side.  To place a constant in that category
  explicitly, use @{command_def "code_abort"}:
*}

code_abort %quote empty_queue

text {*
  \noindent Then the code generator will just insert an error or
  exception at the appropriate position:
*}

text %quote {*@{code_stmts strict_dequeue' (consts) empty_queue strict_dequeue' (Haskell)}*}

text {*
  \noindent This feature however is rarely needed in practice.  Note
  also that the HOL default setup already declares @{const undefined}
  as @{command "code_abort"}, which is most likely to be used in such
  situations.
*}
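
text {*
  \noindent Relying on that default setup, the variant above can
  presumably also be written without any axiomatization.  The
  following is a sketch only; the name @{text "strict_dequeue''"} is
  chosen here for illustration and does not occur elsewhere in this
  tutorial.  The definition itself is expected to serve as code
  equation, with @{const undefined} marking the failing case:
*}

definition %quote strict_dequeue'' :: "'a queue \<Rightarrow> 'a \<times> 'a queue" where
  "strict_dequeue'' q = (case dequeue q of (Some x, q') \<Rightarrow> (x, q') | _ \<Rightarrow> undefined)"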
|
301 |
||
302 |
||
303 |
subsection {* If something goes utterly wrong \label{sec:utterly_wrong} *} |
|
304 |
||
305 |
text {* |
|
306 |
Under certain circumstances, the code generator fails to produce |
|
38440 | 307 |
code entirely. To debug these, the following hints may prove |
308 |
helpful: |
|
38437 | 309 |
|
310 |
\begin{description} |
|
311 |
||
38440 | 312 |
\ditem{\emph{Check with a different target language}.} Sometimes |
313 |
the situation gets more clear if you switch to another target |
|
314 |
language; the code generated there might give some hints what |
|
315 |
prevents the code generator to produce code for the desired |
|
316 |
language. |
|
38437 | 317 |
|
38440 | 318 |
\ditem{\emph{Inspect code equations}.} Code equations are the central |
319 |
carrier of code generation. Most problems occuring while generation |
|
320 |
code can be traced to single equations which are printed as part of |
|
321 |
the error message. A closer inspection of those may offer the key |
|
322 |
for solving issues (cf.~\secref{sec:equations}). |
|
38437 | 323 |
|
38440 | 324 |
\ditem{\emph{Inspect preprocessor setup}.} The preprocessor might |
325 |
transform code equations unexpectedly; to understand an |
|
326 |
inspection of its setup is necessary (cf.~\secref{sec:preproc}). |
|
38437 | 327 |
|
38440 | 328 |
\ditem{\emph{Generate exceptions}.} If the code generator |
329 |
complains about missing code equations, in can be helpful to |
|
330 |
implement the offending constants as exceptions |
|
331 |
(cf.~\secref{sec:partiality}); this allows at least for a formal |
|
332 |
generation of code, whose inspection may then give clues what is |
|
333 |
wrong. |
|
38437 | 334 |
|
38440 | 335 |
\ditem{\emph{Remove offending code equations}.} If code |
336 |
generation is prevented by just a single equation, this can be |
|
337 |
removed (cf.~\secref{sec:equations}) to allow formal code |
|
338 |
generation, whose result in turn can be used to trace the |
|
339 |
problem. The most prominent case here are mismatches in type |
|
340 |
class signatures (\qt{wellsortedness error}). |
|
38437 | 341 |
|
342 |
\end{description} |
|
28462 | 343 |
*} |
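
text {*
  \noindent A minimal sketch of the last hint, using a hypothetical
  function @{text twice} which is not part of this tutorial: the
  attribute \emph{@{attribute code} del} removes the given theorems
  from the code equations of the function, while the theorems
  themselves remain available for proofs:
*}

fun %quote twice :: "nat \<Rightarrow> nat" where
  "twice n = n + n"

declare %quote twice.simps [code del]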

end