| author | wenzelm |
| Sat, 10 Sep 2011 14:28:07 +0200 | |
| changeset 44863 | 49ea566cb3b4 |
| parent 43410 | 957617fe0765 |
| child 45211 | 3dd426ae6bea |
| permissions | -rw-r--r-- |
| 38405 | 1 |
theory Foundations |
| 28419 | 2 |
imports Introduction |
| 28213 | 3 |
begin |
4 |
||
| 38437 | 5 |
section {* Code generation foundations \label{sec:foundations} *}
|
| 28419 | 6 |
|
| 38437 | 7 |
subsection {* Code generator architecture \label{sec:architecture} *}
|
| 28419 | 8 |
|
9 |
text {*
|
|
| 38437 | 10 |
The code generator is actually a framework consisting of different |
11 |
components which can be customised individually. |
|
12 |
||
13 |
Conceptually all components operate on Isabelle's logic framework |
|
14 |
@{theory Pure}. Practically, the object logic @{theory HOL}
|
|
15 |
provides the necessary facilities to make use of the code generator, |
|
16 |
mainly since it is an extension of @{theory Pure}.
|
|
17 |
||
18 |
The constellation of the different components is visualized in the |
|
19 |
following picture. |
|
20 |
||
21 |
\begin{figure}[h]
|
|
22 |
\includegraphics{architecture}
|
|
23 |
\caption{Code generator architecture}
|
|
24 |
\label{fig:arch}
|
|
25 |
\end{figure}
|
|
26 |
||
27 |
\noindent Central to code generation is the notion of \emph{code
|
|
28 |
equations}. A code equation as a first approximation is a theorem |
|
29 |
of the form @{text "f t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n \<equiv> t"} (an equation headed by a
|
|
30 |
constant @{text f} with arguments @{text "t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n"} and right
|
|
31 |
hand side @{text t}).
|
|
32 |
||
33 |
\begin{itemize}
|
|
34 |
||
35 |
\item Starting point of code generation is a collection of (raw) |
|
36 |
code equations in a theory. It is not relevant where they stem |
|
37 |
from, but typically they were either produced by specification |
|
38 |
tools or proved explicitly by the user. |
|
39 |
||
40 |
\item These raw code equations can be subjected to theorem |
|
41 |
transformations. This \qn{preprocessor} (see
|
|
42 |
\secref{sec:preproc}) can apply the full expressiveness of
|
|
43 |
ML-based theorem transformations to code generation. The result |
|
44 |
of preprocessing is a structured collection of code equations. |
|
45 |
||
46 |
\item These code equations are \qn{translated} to a program in an
|
|
47 |
abstract intermediate language. Think of it as a kind of |
|
48 |
\qt{Mini-Haskell} with four \qn{statements}: @{text data} (for
|
|
49 |
datatypes), @{text fun} (stemming from code equations), also
|
|
50 |
@{text class} and @{text inst} (for type classes).
|
|
51 |
||
52 |
\item Finally, the abstract program is \qn{serialised} into
|
|
53 |
concrete source code of a target language. This step only |
|
54 |
produces concrete syntax but does not change the program in |
|
55 |
essence; all conceptual transformations occur in the translation |
|
56 |
step. |
|
57 |
||
58 |
\end{itemize}
|
|
59 |
||
60 |
\noindent From these steps, only the last two are carried out |
|
61 |
outside the logic; by keeping this layer as thin as possible, the |
|
62 |
amount of code to trust is kept to a minimum. |
|
| 28419 | 63 |
*} |
64 |
||
65 |
||
66 |
subsection {* The preprocessor \label{sec:preproc} *}
|
|
67 |
||
68 |
text {*
|
|
| 38437 | 69 |
Before selected function theorems are turned into abstract code, a |
70 |
chain of definitional transformation steps is carried out: |
|
71 |
\emph{preprocessing}. The preprocessor consists of two
|
|
72 |
components: a \emph{simpset} and \emph{function transformers}.
|
|
| 28419 | 73 |
|
| 38437 | 74 |
The \emph{simpset} can apply the full generality of the Isabelle
|
75 |
simplifier. Due to the interpretation of theorems as code |
|
| 32000 | 76 |
equations, rewrites are applied to the right hand side and the |
77 |
arguments of the left hand side of an equation, but never to the |
|
78 |
constant heading the left hand side. An important special case are |
|
| 38437 | 79 |
\emph{unfold theorems}, which may be declared and removed using the
|
80 |
@{attribute code_unfold} or \emph{@{attribute code_unfold} del}
|
|
| 34155 | 81 |
attribute, respectively. |
| 28213 | 82 |
|
| 28419 | 83 |
Some common applications: |
84 |
*} |
|
85 |
||
86 |
text_raw {*
|
|
87 |
\begin{itemize}
|
|
88 |
*} |
|
89 |
||
90 |
text {*
|
|
91 |
\item replacing non-executable constructs by executable ones: |
|
92 |
*} |
|
93 |
||
| 37211 | 94 |
lemma %quote [code_unfold]: |
|
37612
48fed6598be9
adapted to reorganization of auxiliary list operations; split off predicate compiler into separate theory
haftmann
parents:
37427
diff
changeset
|
95 |
"x \<in> set xs \<longleftrightarrow> List.member xs x" by (fact in_set_member) |
| 28419 | 96 |
|
97 |
text {*
|
|
98 |
\item replacing executable but inconvenient constructs: |
|
99 |
*} |
|
100 |
||
| 37211 | 101 |
lemma %quote [code_unfold]: |
|
37612
48fed6598be9
adapted to reorganization of auxiliary list operations; split off predicate compiler into separate theory
haftmann
parents:
37427
diff
changeset
|
102 |
"xs = [] \<longleftrightarrow> List.null xs" by (fact eq_Nil_null) |
| 28419 | 103 |
|
| 38437 | 104 |
text {*
|
105 |
\item eliminating disturbing expressions: |
|
106 |
*} |
|
107 |
||
108 |
lemma %quote [code_unfold]: |
|
109 |
"1 = Suc 0" by (fact One_nat_def) |
|
110 |
||
| 28419 | 111 |
text_raw {*
|
112 |
\end{itemize}
|
|
113 |
*} |
|
114 |
||
115 |
text {*
|
|
| 38437 | 116 |
\noindent \emph{Function transformers} provide a very general
|
117 |
interface, transforming a list of function theorems to another list |
|
118 |
of function theorems, provided that neither the heading constant nor |
|
119 |
its type change. The @{term "0\<Colon>nat"} / @{const Suc} pattern
|
|
120 |
elimination implemented in theory @{theory Efficient_Nat} (see
|
|
121 |
\secref{eff_nat}) uses this interface.
|
|
| 28419 | 122 |
|
| 38437 | 123 |
\noindent The current setup of the preprocessor may be inspected |
| 38505 | 124 |
using the @{command_def print_codeproc} command. @{command_def
|
125 |
code_thms} (see \secref{sec:equations}) provides a convenient
|
|
126 |
mechanism to inspect the impact of a preprocessor setup on code |
|
127 |
equations. |
|
| 28419 | 128 |
|
129 |
\begin{warn}
|
|
| 32000 | 130 |
Attribute @{attribute code_unfold} also applies to the
|
131 |
preprocessor of the ancient @{text "SML code generator"}; in case
|
|
132 |
this is not what you intend, use @{attribute code_inline} instead.
|
|
| 28419 | 133 |
\end{warn}
|
134 |
*} |
|
135 |
||
| 38437 | 136 |
|
137 |
subsection {* Understanding code equations \label{sec:equations} *}
|
|
| 28419 | 138 |
|
139 |
text {*
|
|
| 38437 | 140 |
As told in \secref{sec:principle}, the notion of code equations is
|
141 |
vital to code generation. Indeed most problems which occur in |
|
142 |
practice can be resolved by an inspection of the underlying code |
|
143 |
equations. |
|
| 28419 | 144 |
|
| 38437 | 145 |
It is possible to exchange the default code equations for constants |
146 |
by explicitly proving alternative ones: |
|
| 28419 | 147 |
*} |
148 |
||
| 38437 | 149 |
lemma %quote [code]: |
150 |
"dequeue (AQueue xs []) = |
|
151 |
(if xs = [] then (None, AQueue [] []) |
|
152 |
else dequeue (AQueue [] (rev xs)))" |
|
153 |
"dequeue (AQueue xs (y # ys)) = |
|
154 |
(Some y, AQueue xs ys)" |
|
155 |
by (cases xs, simp_all) (cases "rev xs", simp_all) |
|
| 28213 | 156 |
|
| 28419 | 157 |
text {*
|
| 38437 | 158 |
\noindent The annotation @{text "[code]"} is an @{text attribute}
|
159 |
which states that the given theorems should be considered as code |
|
160 |
equations for a @{text fun} statement -- the corresponding constant
|
|
161 |
is determined syntactically. The resulting code: |
|
| 29798 | 162 |
*} |
| 29794 | 163 |
|
| 39745 | 164 |
text %quotetypewriter {*
|
| 39683 | 165 |
@{code_stmts dequeue (consts) dequeue (Haskell)}
|
|
39664
0afaf89ab591
more canonical type setting of type writer code examples
haftmann
parents:
38857
diff
changeset
|
166 |
*} |
| 29794 | 167 |
|
| 29798 | 168 |
text {*
|
| 38437 | 169 |
\noindent You may note that the equality test @{term "xs = []"} has
|
170 |
been replaced by the predicate @{term "List.null xs"}. This is due
|
|
171 |
to the default setup of the \qn{preprocessor}.
|
|
172 |
||
173 |
This possibility to select arbitrary code equations is the key |
|
174 |
technique for program and datatype refinement (see |
|
| 39677 | 175 |
\secref{sec:refinement}).
|
| 38437 | 176 |
|
177 |
Due to the preprocessor, there is the distinction of raw code |
|
178 |
equations (before preprocessing) and code equations (after |
|
179 |
preprocessing). |
|
180 |
||
| 38505 | 181 |
The first can be listed (among other data) using the @{command_def
|
182 |
print_codesetup} command. |
|
| 38437 | 183 |
|
184 |
The code equations after preprocessing are already are blueprint of |
|
185 |
the generated program and can be inspected using the @{command
|
|
186 |
code_thms} command: |
|
| 29798 | 187 |
*} |
| 29794 | 188 |
|
| 38437 | 189 |
code_thms %quote dequeue |
| 28419 | 190 |
|
191 |
text {*
|
|
| 38437 | 192 |
\noindent This prints a table with the code equations for @{const
|
193 |
dequeue}, including \emph{all} code equations those equations depend
|
|
194 |
on recursively. These dependencies themselves can be visualized using |
|
| 38505 | 195 |
the @{command_def code_deps} command.
|
| 28419 | 196 |
*} |
197 |
||
| 28213 | 198 |
|
|
30938
c6c9359e474c
wellsortedness is no issue for a user manual any more
haftmann
parents:
30227
diff
changeset
|
199 |
subsection {* Equality *}
|
| 28213 | 200 |
|
| 28419 | 201 |
text {*
|
| 38437 | 202 |
Implementation of equality deserves some attention. Here an example |
203 |
function involving polymorphic equality: |
|
| 28419 | 204 |
*} |
205 |
||
| 28564 | 206 |
primrec %quote collect_duplicates :: "'a list \<Rightarrow> 'a list \<Rightarrow> 'a list \<Rightarrow> 'a list" where |
| 28447 | 207 |
"collect_duplicates xs ys [] = xs" |
| 38437 | 208 |
| "collect_duplicates xs ys (z#zs) = (if z \<in> set xs |
209 |
then if z \<in> set ys |
|
210 |
then collect_duplicates xs ys zs |
|
211 |
else collect_duplicates xs (z#ys) zs |
|
212 |
else collect_duplicates (z#xs) (z#ys) zs)" |
|
| 28419 | 213 |
|
214 |
text {*
|
|
|
37612
48fed6598be9
adapted to reorganization of auxiliary list operations; split off predicate compiler into separate theory
haftmann
parents:
37427
diff
changeset
|
215 |
\noindent During preprocessing, the membership test is rewritten, |
| 38437 | 216 |
resulting in @{const List.member}, which itself performs an explicit
|
217 |
equality check, as can be seen in the corresponding @{text SML} code:
|
|
| 28419 | 218 |
*} |
219 |
||
| 39745 | 220 |
text %quotetypewriter {*
|
| 39683 | 221 |
@{code_stmts collect_duplicates (SML)}
|
|
39664
0afaf89ab591
more canonical type setting of type writer code examples
haftmann
parents:
38857
diff
changeset
|
222 |
*} |
| 28419 | 223 |
|
224 |
text {*
|
|
225 |
\noindent Obviously, polymorphic equality is implemented the Haskell |
|
| 38437 | 226 |
way using a type class. How is this achieved? HOL introduces an |
|
38857
97775f3e8722
renamed class/constant eq to equal; tuned some instantiations
haftmann
parents:
38505
diff
changeset
|
227 |
explicit class @{class equal} with a corresponding operation @{const
|
|
97775f3e8722
renamed class/constant eq to equal; tuned some instantiations
haftmann
parents:
38505
diff
changeset
|
228 |
HOL.equal} such that @{thm equal [no_vars]}. The preprocessing
|
|
97775f3e8722
renamed class/constant eq to equal; tuned some instantiations
haftmann
parents:
38505
diff
changeset
|
229 |
framework does the rest by propagating the @{class equal} constraints
|
| 38437 | 230 |
through all dependent code equations. For datatypes, instances of |
|
38857
97775f3e8722
renamed class/constant eq to equal; tuned some instantiations
haftmann
parents:
38505
diff
changeset
|
231 |
@{class equal} are implicitly derived when possible. For other types,
|
|
97775f3e8722
renamed class/constant eq to equal; tuned some instantiations
haftmann
parents:
38505
diff
changeset
|
232 |
you may instantiate @{text equal} manually like any other type class.
|
| 28419 | 233 |
*} |
234 |
||
235 |
||
| 38440 | 236 |
subsection {* Explicit partiality \label{sec:partiality} *}
|
| 28462 | 237 |
|
238 |
text {*
|
|
239 |
Partiality usually enters the game by partial patterns, as |
|
240 |
in the following example, again for amortised queues: |
|
241 |
*} |
|
242 |
||
| 29798 | 243 |
definition %quote strict_dequeue :: "'a queue \<Rightarrow> 'a \<times> 'a queue" where |
244 |
"strict_dequeue q = (case dequeue q |
|
245 |
of (Some x, q') \<Rightarrow> (x, q'))" |
|
246 |
||
247 |
lemma %quote strict_dequeue_AQueue [code]: |
|
248 |
"strict_dequeue (AQueue xs (y # ys)) = (y, AQueue xs ys)" |
|
249 |
"strict_dequeue (AQueue xs []) = |
|
250 |
(case rev xs of y # ys \<Rightarrow> (y, AQueue [] ys))" |
|
| 38437 | 251 |
by (simp_all add: strict_dequeue_def) (cases xs, simp_all split: list.split) |
| 28462 | 252 |
|
253 |
text {*
|
|
254 |
\noindent In the corresponding code, there is no equation |
|
| 29798 | 255 |
for the pattern @{term "AQueue [] []"}:
|
| 28462 | 256 |
*} |
257 |
||
| 39745 | 258 |
text %quotetypewriter {*
|
| 39683 | 259 |
@{code_stmts strict_dequeue (consts) strict_dequeue (Haskell)}
|
|
39664
0afaf89ab591
more canonical type setting of type writer code examples
haftmann
parents:
38857
diff
changeset
|
260 |
*} |
| 28462 | 261 |
|
262 |
text {*
|
|
263 |
\noindent In some cases it is desirable to have this |
|
264 |
pseudo-\qt{partiality} more explicitly, e.g.~as follows:
|
|
265 |
*} |
|
266 |
||
| 28564 | 267 |
axiomatization %quote empty_queue :: 'a |
| 28462 | 268 |
|
| 29798 | 269 |
definition %quote strict_dequeue' :: "'a queue \<Rightarrow> 'a \<times> 'a queue" where |
270 |
"strict_dequeue' q = (case dequeue q of (Some x, q') \<Rightarrow> (x, q') | _ \<Rightarrow> empty_queue)" |
|
| 28213 | 271 |
|
| 29798 | 272 |
lemma %quote strict_dequeue'_AQueue [code]: |
273 |
"strict_dequeue' (AQueue xs []) = (if xs = [] then empty_queue |
|
274 |
else strict_dequeue' (AQueue [] (rev xs)))" |
|
275 |
"strict_dequeue' (AQueue xs (y # ys)) = |
|
276 |
(y, AQueue xs ys)" |
|
| 38437 | 277 |
by (simp_all add: strict_dequeue'_def split: list.splits) |
| 28462 | 278 |
|
279 |
text {*
|
|
| 29798 | 280 |
Observe that on the right hand side of the definition of @{const
|
| 34155 | 281 |
"strict_dequeue'"}, the unspecified constant @{const empty_queue} occurs.
|
| 28462 | 282 |
|
| 29798 | 283 |
Normally, if constants without any code equations occur in a |
284 |
program, the code generator complains (since in most cases this is |
|
| 34155 | 285 |
indeed an error). But such constants can also be thought |
286 |
of as function definitions which always fail, |
|
| 29798 | 287 |
since there is never a successful pattern match on the left hand |
288 |
side. In order to categorise a constant into that category |
|
| 38505 | 289 |
explicitly, use @{command_def "code_abort"}:
|
| 28462 | 290 |
*} |
291 |
||
| 28564 | 292 |
code_abort %quote empty_queue |
| 28462 | 293 |
|
294 |
text {*
|
|
295 |
\noindent Then the code generator will just insert an error or |
|
296 |
exception at the appropriate position: |
|
297 |
*} |
|
298 |
||
| 39745 | 299 |
text %quotetypewriter {*
|
| 39683 | 300 |
@{code_stmts strict_dequeue' (consts) empty_queue strict_dequeue' (Haskell)}
|
|
39664
0afaf89ab591
more canonical type setting of type writer code examples
haftmann
parents:
38857
diff
changeset
|
301 |
*} |
| 28462 | 302 |
|
303 |
text {*
|
|
| 38437 | 304 |
\noindent This feature however is rarely needed in practice. Note |
305 |
also that the HOL default setup already declares @{const undefined}
|
|
306 |
as @{command "code_abort"}, which is most likely to be used in such
|
|
307 |
situations. |
|
308 |
*} |
|
309 |
||
310 |
||
311 |
subsection {* If something goes utterly wrong \label{sec:utterly_wrong} *}
|
|
312 |
||
313 |
text {*
|
|
314 |
Under certain circumstances, the code generator fails to produce |
|
| 38440 | 315 |
code entirely. To debug these, the following hints may prove |
316 |
helpful: |
|
| 38437 | 317 |
|
318 |
\begin{description}
|
|
319 |
||
| 38440 | 320 |
\ditem{\emph{Check with a different target language}.} Sometimes
|
321 |
the situation gets more clear if you switch to another target |
|
322 |
language; the code generated there might give some hints what |
|
323 |
prevents the code generator to produce code for the desired |
|
324 |
language. |
|
| 38437 | 325 |
|
| 38440 | 326 |
\ditem{\emph{Inspect code equations}.} Code equations are the central
|
| 43410 | 327 |
carrier of code generation. Most problems occurring while generating |
| 38440 | 328 |
code can be traced to single equations which are printed as part of |
329 |
the error message. A closer inspection of those may offer the key |
|
330 |
for solving issues (cf.~\secref{sec:equations}).
|
|
| 38437 | 331 |
|
| 38440 | 332 |
\ditem{\emph{Inspect preprocessor setup}.} The preprocessor might
|
333 |
transform code equations unexpectedly; to understand an |
|
334 |
inspection of its setup is necessary (cf.~\secref{sec:preproc}).
|
|
| 38437 | 335 |
|
| 38440 | 336 |
\ditem{\emph{Generate exceptions}.} If the code generator
|
337 |
complains about missing code equations, in can be helpful to |
|
338 |
implement the offending constants as exceptions |
|
339 |
(cf.~\secref{sec:partiality}); this allows at least for a formal
|
|
340 |
generation of code, whose inspection may then give clues what is |
|
341 |
wrong. |
|
| 38437 | 342 |
|
| 38440 | 343 |
\ditem{\emph{Remove offending code equations}.} If code
|
344 |
generation is prevented by just a single equation, this can be |
|
345 |
removed (cf.~\secref{sec:equations}) to allow formal code
|
|
346 |
generation, whose result in turn can be used to trace the |
|
347 |
problem. The most prominent case here are mismatches in type |
|
348 |
class signatures (\qt{wellsortedness error}).
|
|
| 38437 | 349 |
|
350 |
\end{description}
|
|
| 28462 | 351 |
*} |
| 28213 | 352 |
|
353 |
end |