author | haftmann |
Wed, 19 Oct 2011 23:07:48 +0200 | |
changeset 45211 | 3dd426ae6bea |
parent 43410 | 957617fe0765 |
child 46521 | addcdf0dd283 |
permissions | -rw-r--r-- |
38405 | 1 |
theory Foundations |
28419 | 2 |
imports Introduction |
28213 | 3 |
begin |
4 |
||
38437 | 5 |
section {* Code generation foundations \label{sec:foundations} *} |
28419 | 6 |
|
38437 | 7 |
subsection {* Code generator architecture \label{sec:architecture} *} |
28419 | 8 |
|
9 |
text {* |
|
38437 | 10 |
The code generator is actually a framework consisting of different |
11 |
components which can be customised individually. |
|
12 |
||
13 |
Conceptually all components operate on Isabelle's logic framework |
|
14 |
@{theory Pure}. Practically, the object logic @{theory HOL} |
|
15 |
provides the necessary facilities to make use of the code generator, |
|
16 |
mainly since it is an extension of @{theory Pure}. |
|
17 |
||
18 |
The constellation of the different components is visualized in the |
|
19 |
following picture. |
|
20 |
||
21 |
\begin{figure}[h] |
|
22 |
\includegraphics{architecture} |
|
23 |
\caption{Code generator architecture} |
|
24 |
\label{fig:arch} |
|
25 |
\end{figure} |
|
26 |
||
27 |
\noindent Central to code generation is the notion of \emph{code |
|
28 |
equations}. A code equation as a first approximation is a theorem |
|
29 |
of the form @{text "f t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n \<equiv> t"} (an equation headed by a |
|
30 |
constant @{text f} with arguments @{text "t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n"} and right |
|
31 |
hand side @{text t}). |
|
32 |
||
33 |
\begin{itemize} |
|
34 |
||
35 |
\item Starting point of code generation is a collection of (raw) |
|
36 |
code equations in a theory. It is not relevant where they stem |
|
37 |
from, but typically they were either produced by specification |
|
38 |
tools or proved explicitly by the user. |
|
39 |
||
40 |
\item These raw code equations can be subjected to theorem |
|
41 |
transformations. This \qn{preprocessor} (see |
|
42 |
\secref{sec:preproc}) can apply the full expressiveness of |
|
43 |
ML-based theorem transformations to code generation. The result |
|
44 |
of preprocessing is a structured collection of code equations. |
|
45 |
||
46 |
\item These code equations are \qn{translated} to a program in an |
|
47 |
abstract intermediate language. Think of it as a kind of |
|
48 |
\qt{Mini-Haskell} with four \qn{statements}: @{text data} (for |
|
49 |
datatypes), @{text fun} (stemming from code equations), also |
|
50 |
@{text class} and @{text inst} (for type classes). |
|
51 |
||
52 |
\item Finally, the abstract program is \qn{serialised} into |
|
53 |
concrete source code of a target language. This step only |
|
54 |
produces concrete syntax but does not change the program in |
|
55 |
essence; all conceptual transformations occur in the translation |
|
56 |
step. |
|
57 |
||
58 |
\end{itemize} |
|
59 |
||
60 |
\noindent From these steps, only the last two are carried out |
|
61 |
outside the logic; by keeping this layer as thin as possible, the |
|
62 |
amount of code to trust is kept to a minimum. |
|
28419 | 63 |
*} |
64 |
||
65 |
||
66 |
subsection {* The preprocessor \label{sec:preproc} *} |
|
67 |
||
68 |
text {* |
|
38437 | 69 |
Before selected function theorems are turned into abstract code, a |
70 |
chain of definitional transformation steps is carried out: |
|
71 |
\emph{preprocessing}. The preprocessor consists of two |
|
72 |
components: a \emph{simpset} and \emph{function transformers}. |
|
28419 | 73 |
|
38437 | 74 |
The \emph{simpset} can apply the full generality of the Isabelle |
75 |
simplifier. Due to the interpretation of theorems as code |
|
32000 | 76 |
equations, rewrites are applied to the right hand side and the |
77 |
arguments of the left hand side of an equation, but never to the |
|
78 |
constant heading the left hand side. An important special case are |
|
38437 | 79 |
\emph{unfold theorems}, which may be declared and removed using the |
80 |
@{attribute code_unfold} or \emph{@{attribute code_unfold} del} |
|
34155 | 81 |
attribute, respectively. |
28213 | 82 |
|
28419 | 83 |
Some common applications: |
84 |
*} |
|
85 |
||
86 |
text_raw {* |
|
87 |
\begin{itemize} |
|
88 |
*} |
|
89 |
||
90 |
text {* |
|
91 |
\item replacing non-executable constructs by executable ones: |
|
92 |
*} |
|
93 |
||
37211 | 94 |
lemma %quote [code_unfold]: |
37612
48fed6598be9
adapted to reorganization of auxiliary list operations; split off predicate compiler into separate theory
haftmann
parents:
37427
diff
changeset
|
95 |
"x \<in> set xs \<longleftrightarrow> List.member xs x" by (fact in_set_member) |
28419 | 96 |
|
97 |
text {* |
|
98 |
\item replacing executable but inconvenient constructs: |
|
99 |
*} |
|
100 |
||
37211 | 101 |
lemma %quote [code_unfold]: |
37612
48fed6598be9
adapted to reorganization of auxiliary list operations; split off predicate compiler into separate theory
haftmann
parents:
37427
diff
changeset
|
102 |
"xs = [] \<longleftrightarrow> List.null xs" by (fact eq_Nil_null) |
28419 | 103 |
|
38437 | 104 |
text {* |
105 |
\item eliminating disturbing expressions: |
|
106 |
*} |
|
107 |
||
108 |
lemma %quote [code_unfold]: |
|
109 |
"1 = Suc 0" by (fact One_nat_def) |
|
110 |
||
28419 | 111 |
text_raw {* |
112 |
\end{itemize} |
|
113 |
*} |
|
114 |
||
115 |
text {* |
|
38437 | 116 |
\noindent \emph{Function transformers} provide a very general |
117 |
interface, transforming a list of function theorems to another list |
|
118 |
of function theorems, provided that neither the heading constant nor |
|
119 |
its type change. The @{term "0\<Colon>nat"} / @{const Suc} pattern |
|
120 |
elimination implemented in theory @{theory Efficient_Nat} (see |
|
121 |
\secref{eff_nat}) uses this interface. |
|
28419 | 122 |
|
38437 | 123 |
\noindent The current setup of the preprocessor may be inspected |
38505 | 124 |
using the @{command_def print_codeproc} command. @{command_def |
125 |
code_thms} (see \secref{sec:equations}) provides a convenient |
|
126 |
mechanism to inspect the impact of a preprocessor setup on code |
|
127 |
equations. |
|
28419 | 128 |
*} |
129 |
||
38437 | 130 |
|
131 |
subsection {* Understanding code equations \label{sec:equations} *} |
|
28419 | 132 |
|
133 |
text {* |
|
38437 | 134 |
As told in \secref{sec:principle}, the notion of code equations is |
135 |
vital to code generation. Indeed most problems which occur in |
|
136 |
practice can be resolved by an inspection of the underlying code |
|
137 |
equations. |
|
28419 | 138 |
|
38437 | 139 |
It is possible to exchange the default code equations for constants |
140 |
by explicitly proving alternative ones: |
|
28419 | 141 |
*} |
142 |
||
38437 | 143 |
lemma %quote [code]: |
144 |
"dequeue (AQueue xs []) = |
|
145 |
(if xs = [] then (None, AQueue [] []) |
|
146 |
else dequeue (AQueue [] (rev xs)))" |
|
147 |
"dequeue (AQueue xs (y # ys)) = |
|
148 |
(Some y, AQueue xs ys)" |
|
149 |
by (cases xs, simp_all) (cases "rev xs", simp_all) |
|
28213 | 150 |
|
28419 | 151 |
text {* |
38437 | 152 |
\noindent The annotation @{text "[code]"} is an @{text attribute} |
153 |
which states that the given theorems should be considered as code |
|
154 |
equations for a @{text fun} statement -- the corresponding constant |
|
155 |
is determined syntactically. The resulting code: |
|
29798 | 156 |
*} |
29794 | 157 |
|
39745 | 158 |
text %quotetypewriter {* |
39683 | 159 |
@{code_stmts dequeue (consts) dequeue (Haskell)} |
39664
0afaf89ab591
more canonical type setting of type writer code examples
haftmann
parents:
38857
diff
changeset
|
160 |
*} |
29794 | 161 |
|
29798 | 162 |
text {* |
38437 | 163 |
\noindent You may note that the equality test @{term "xs = []"} has |
164 |
been replaced by the predicate @{term "List.null xs"}. This is due |
|
165 |
to the default setup of the \qn{preprocessor}. |
|
166 |
||
167 |
This possibility to select arbitrary code equations is the key |
|
168 |
technique for program and datatype refinement (see |
|
39677 | 169 |
\secref{sec:refinement}). |
38437 | 170 |
|
171 |
Due to the preprocessor, there is the distinction of raw code |
|
172 |
equations (before preprocessing) and code equations (after |
|
173 |
preprocessing). |
|
174 |
||
38505 | 175 |
The first can be listed (among other data) using the @{command_def |
176 |
print_codesetup} command. |
|
38437 | 177 |
|
178 |
The code equations after preprocessing are already are blueprint of |
|
179 |
the generated program and can be inspected using the @{command |
|
180 |
code_thms} command: |
|
29798 | 181 |
*} |
29794 | 182 |
|
38437 | 183 |
code_thms %quote dequeue |
28419 | 184 |
|
185 |
text {* |
|
38437 | 186 |
\noindent This prints a table with the code equations for @{const |
187 |
dequeue}, including \emph{all} code equations those equations depend |
|
188 |
on recursively. These dependencies themselves can be visualized using |
|
38505 | 189 |
the @{command_def code_deps} command. |
28419 | 190 |
*} |
191 |
||
28213 | 192 |
|
30938
c6c9359e474c
wellsortedness is no issue for a user manual any more
haftmann
parents:
30227
diff
changeset
|
193 |
subsection {* Equality *} |
28213 | 194 |
|
28419 | 195 |
text {* |
38437 | 196 |
Implementation of equality deserves some attention. Here an example |
197 |
function involving polymorphic equality: |
|
28419 | 198 |
*} |
199 |
||
28564 | 200 |
primrec %quote collect_duplicates :: "'a list \<Rightarrow> 'a list \<Rightarrow> 'a list \<Rightarrow> 'a list" where |
28447 | 201 |
"collect_duplicates xs ys [] = xs" |
38437 | 202 |
| "collect_duplicates xs ys (z#zs) = (if z \<in> set xs |
203 |
then if z \<in> set ys |
|
204 |
then collect_duplicates xs ys zs |
|
205 |
else collect_duplicates xs (z#ys) zs |
|
206 |
else collect_duplicates (z#xs) (z#ys) zs)" |
|
28419 | 207 |
|
208 |
text {* |
|
37612
48fed6598be9
adapted to reorganization of auxiliary list operations; split off predicate compiler into separate theory
haftmann
parents:
37427
diff
changeset
|
209 |
\noindent During preprocessing, the membership test is rewritten, |
38437 | 210 |
resulting in @{const List.member}, which itself performs an explicit |
211 |
equality check, as can be seen in the corresponding @{text SML} code: |
|
28419 | 212 |
*} |
213 |
||
39745 | 214 |
text %quotetypewriter {* |
39683 | 215 |
@{code_stmts collect_duplicates (SML)} |
39664
0afaf89ab591
more canonical type setting of type writer code examples
haftmann
parents:
38857
diff
changeset
|
216 |
*} |
28419 | 217 |
|
218 |
text {* |
|
219 |
\noindent Obviously, polymorphic equality is implemented the Haskell |
|
38437 | 220 |
way using a type class. How is this achieved? HOL introduces an |
38857
97775f3e8722
renamed class/constant eq to equal; tuned some instantiations
haftmann
parents:
38505
diff
changeset
|
221 |
explicit class @{class equal} with a corresponding operation @{const |
97775f3e8722
renamed class/constant eq to equal; tuned some instantiations
haftmann
parents:
38505
diff
changeset
|
222 |
HOL.equal} such that @{thm equal [no_vars]}. The preprocessing |
97775f3e8722
renamed class/constant eq to equal; tuned some instantiations
haftmann
parents:
38505
diff
changeset
|
223 |
framework does the rest by propagating the @{class equal} constraints |
38437 | 224 |
through all dependent code equations. For datatypes, instances of |
38857
97775f3e8722
renamed class/constant eq to equal; tuned some instantiations
haftmann
parents:
38505
diff
changeset
|
225 |
@{class equal} are implicitly derived when possible. For other types, |
97775f3e8722
renamed class/constant eq to equal; tuned some instantiations
haftmann
parents:
38505
diff
changeset
|
226 |
you may instantiate @{text equal} manually like any other type class. |
28419 | 227 |
*} |
228 |
||
229 |
||
38440 | 230 |
subsection {* Explicit partiality \label{sec:partiality} *} |
28462 | 231 |
|
232 |
text {* |
|
233 |
Partiality usually enters the game by partial patterns, as |
|
234 |
in the following example, again for amortised queues: |
|
235 |
*} |
|
236 |
||
29798 | 237 |
definition %quote strict_dequeue :: "'a queue \<Rightarrow> 'a \<times> 'a queue" where |
238 |
"strict_dequeue q = (case dequeue q |
|
239 |
of (Some x, q') \<Rightarrow> (x, q'))" |
|
240 |
||
241 |
lemma %quote strict_dequeue_AQueue [code]: |
|
242 |
"strict_dequeue (AQueue xs (y # ys)) = (y, AQueue xs ys)" |
|
243 |
"strict_dequeue (AQueue xs []) = |
|
244 |
(case rev xs of y # ys \<Rightarrow> (y, AQueue [] ys))" |
|
38437 | 245 |
by (simp_all add: strict_dequeue_def) (cases xs, simp_all split: list.split) |
28462 | 246 |
|
247 |
text {* |
|
248 |
\noindent In the corresponding code, there is no equation |
|
29798 | 249 |
for the pattern @{term "AQueue [] []"}: |
28462 | 250 |
*} |
251 |
||
39745 | 252 |
text %quotetypewriter {* |
39683 | 253 |
@{code_stmts strict_dequeue (consts) strict_dequeue (Haskell)} |
39664
0afaf89ab591
more canonical type setting of type writer code examples
haftmann
parents:
38857
diff
changeset
|
254 |
*} |
28462 | 255 |
|
256 |
text {* |
|
257 |
\noindent In some cases it is desirable to have this |
|
258 |
pseudo-\qt{partiality} more explicitly, e.g.~as follows: |
|
259 |
*} |
|
260 |
||
28564 | 261 |
axiomatization %quote empty_queue :: 'a |
28462 | 262 |
|
29798 | 263 |
definition %quote strict_dequeue' :: "'a queue \<Rightarrow> 'a \<times> 'a queue" where |
264 |
"strict_dequeue' q = (case dequeue q of (Some x, q') \<Rightarrow> (x, q') | _ \<Rightarrow> empty_queue)" |
|
28213 | 265 |
|
29798 | 266 |
lemma %quote strict_dequeue'_AQueue [code]: |
267 |
"strict_dequeue' (AQueue xs []) = (if xs = [] then empty_queue |
|
268 |
else strict_dequeue' (AQueue [] (rev xs)))" |
|
269 |
"strict_dequeue' (AQueue xs (y # ys)) = |
|
270 |
(y, AQueue xs ys)" |
|
38437 | 271 |
by (simp_all add: strict_dequeue'_def split: list.splits) |
28462 | 272 |
|
273 |
text {* |
|
29798 | 274 |
Observe that on the right hand side of the definition of @{const |
34155 | 275 |
"strict_dequeue'"}, the unspecified constant @{const empty_queue} occurs. |
28462 | 276 |
|
29798 | 277 |
Normally, if constants without any code equations occur in a |
278 |
program, the code generator complains (since in most cases this is |
|
34155 | 279 |
indeed an error). But such constants can also be thought |
280 |
of as function definitions which always fail, |
|
29798 | 281 |
since there is never a successful pattern match on the left hand |
282 |
side. In order to categorise a constant into that category |
|
38505 | 283 |
explicitly, use @{command_def "code_abort"}: |
28462 | 284 |
*} |
285 |
||
28564 | 286 |
code_abort %quote empty_queue |
28462 | 287 |
|
288 |
text {* |
|
289 |
\noindent Then the code generator will just insert an error or |
|
290 |
exception at the appropriate position: |
|
291 |
*} |
|
292 |
||
39745 | 293 |
text %quotetypewriter {* |
39683 | 294 |
@{code_stmts strict_dequeue' (consts) empty_queue strict_dequeue' (Haskell)} |
39664
0afaf89ab591
more canonical type setting of type writer code examples
haftmann
parents:
38857
diff
changeset
|
295 |
*} |
28462 | 296 |
|
297 |
text {* |
|
38437 | 298 |
\noindent This feature however is rarely needed in practice. Note |
299 |
also that the HOL default setup already declares @{const undefined} |
|
300 |
as @{command "code_abort"}, which is most likely to be used in such |
|
301 |
situations. |
|
302 |
*} |
|
303 |
||
304 |
||
305 |
subsection {* If something goes utterly wrong \label{sec:utterly_wrong} *} |
|
306 |
||
307 |
text {* |
|
308 |
Under certain circumstances, the code generator fails to produce |
|
38440 | 309 |
code entirely. To debug these, the following hints may prove |
310 |
helpful: |
|
38437 | 311 |
|
312 |
\begin{description} |
|
313 |
||
38440 | 314 |
\ditem{\emph{Check with a different target language}.} Sometimes |
315 |
the situation gets more clear if you switch to another target |
|
316 |
language; the code generated there might give some hints what |
|
317 |
prevents the code generator to produce code for the desired |
|
318 |
language. |
|
38437 | 319 |
|
38440 | 320 |
\ditem{\emph{Inspect code equations}.} Code equations are the central |
43410 | 321 |
carrier of code generation. Most problems occurring while generating |
38440 | 322 |
code can be traced to single equations which are printed as part of |
323 |
the error message. A closer inspection of those may offer the key |
|
324 |
for solving issues (cf.~\secref{sec:equations}). |
|
38437 | 325 |
|
38440 | 326 |
\ditem{\emph{Inspect preprocessor setup}.} The preprocessor might |
327 |
transform code equations unexpectedly; to understand an |
|
328 |
inspection of its setup is necessary (cf.~\secref{sec:preproc}). |
|
38437 | 329 |
|
38440 | 330 |
\ditem{\emph{Generate exceptions}.} If the code generator |
331 |
complains about missing code equations, in can be helpful to |
|
332 |
implement the offending constants as exceptions |
|
333 |
(cf.~\secref{sec:partiality}); this allows at least for a formal |
|
334 |
generation of code, whose inspection may then give clues what is |
|
335 |
wrong. |
|
38437 | 336 |
|
38440 | 337 |
\ditem{\emph{Remove offending code equations}.} If code |
338 |
generation is prevented by just a single equation, this can be |
|
339 |
removed (cf.~\secref{sec:equations}) to allow formal code |
|
340 |
generation, whose result in turn can be used to trace the |
|
341 |
problem. The most prominent case here are mismatches in type |
|
342 |
class signatures (\qt{wellsortedness error}). |
|
38437 | 343 |
|
344 |
\end{description} |
|
28462 | 345 |
*} |
28213 | 346 |
|
347 |
end |