| author | blanchet | 
| Mon, 21 Feb 2011 14:02:07 +0100 | |
| changeset 41800 | 7f333b59d5fb | 
| parent 39745 | 3aa2bc9c5478 | 
| child 43407 | 666962d17142 | 
| permissions | -rw-r--r-- | 
| 38405 | 1 | theory Foundations | 
| 28419 | 2 | imports Introduction | 
| 28213 | 3 | begin | 
| 4 | ||
| 38437 | 5 | section {* Code generation foundations \label{sec:foundations} *}
 | 
| 28419 | 6 | |
| 38437 | 7 | subsection {* Code generator architecture \label{sec:architecture} *}
 | 
| 28419 | 8 | |
| 9 | text {*
 | |
| 38437 | 10 | The code generator is actually a framework consisting of different | 
| 11 | components which can be customised individually. | |
| 12 | ||
| 13 | Conceptually all components operate on Isabelle's logic framework | |
| 14 |   @{theory Pure}.  Practically, the object logic @{theory HOL}
 | |
| 15 | provides the necessary facilities to make use of the code generator, | |
| 16 |   mainly since it is an extension of @{theory Pure}.
 | |
| 17 | ||
| 18 | The constellation of the different components is visualized in the | |
| 19 | following picture. | |
| 20 | ||
| 21 |   \begin{figure}[h]
 | |
| 22 |     \includegraphics{architecture}
 | |
| 23 |     \caption{Code generator architecture}
 | |
| 24 |     \label{fig:arch}
 | |
| 25 |   \end{figure}
 | |
| 26 | ||
| 27 |   \noindent Central to code generation is the notion of \emph{code
 | |
| 28 | equations}. A code equation as a first approximation is a theorem | |
| 29 |   of the form @{text "f t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n \<equiv> t"} (an equation headed by a
 | |
| 30 |   constant @{text f} with arguments @{text "t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n"} and right
 | |
| 31 |   hand side @{text t}).
 | |
| 32 | ||
| 33 |   \begin{itemize}
 | |
| 34 | ||
| 35 | \item Starting point of code generation is a collection of (raw) | |
| 36 | code equations in a theory. It is not relevant where they stem | |
| 37 | from, but typically they were either produced by specification | |
| 38 | tools or proved explicitly by the user. | |
| 39 | ||
| 40 | \item These raw code equations can be subjected to theorem | |
| 41 |       transformations.  This \qn{preprocessor} (see
 | |
| 42 |       \secref{sec:preproc}) can apply the full expressiveness of
 | |
| 43 | ML-based theorem transformations to code generation. The result | |
| 44 | of preprocessing is a structured collection of code equations. | |
| 45 | ||
| 46 |     \item These code equations are \qn{translated} to a program in an
 | |
| 47 | abstract intermediate language. Think of it as a kind of | |
| 48 |       \qt{Mini-Haskell} with four \qn{statements}: @{text data} (for
 | |
| 49 |       datatypes), @{text fun} (stemming from code equations), also
 | |
| 50 |       @{text class} and @{text inst} (for type classes).
 | |
| 51 | ||
| 52 |     \item Finally, the abstract program is \qn{serialised} into
 | |
| 53 | concrete source code of a target language. This step only | |
| 54 | produces concrete syntax but does not change the program in | |
| 55 | essence; all conceptual transformations occur in the translation | |
| 56 | step. | |
| 57 | ||
| 58 |   \end{itemize}
 | |
| 59 | ||
| 60 | \noindent From these steps, only the last two are carried out | |
| 61 | outside the logic; by keeping this layer as thin as possible, the | |
| 62 | amount of code to trust is kept to a minimum. | |
| 28419 | 63 | *} | 
| 64 | ||
| 65 | ||
| 66 | subsection {* The preprocessor \label{sec:preproc} *}
 | |
| 67 | ||
| 68 | text {*
 | |
| 38437 | 69 | Before selected function theorems are turned into abstract code, a | 
| 70 | chain of definitional transformation steps is carried out: | |
| 71 |   \emph{preprocessing}.  The preprocessor consists of two
 | |
| 72 |   components: a \emph{simpset} and \emph{function transformers}.
 | |
| 28419 | 73 | |
| 38437 | 74 |   The \emph{simpset} can apply the full generality of the Isabelle
 | 
| 75 | simplifier. Due to the interpretation of theorems as code | |
| 32000 | 76 | equations, rewrites are applied to the right hand side and the | 
| 77 | arguments of the left hand side of an equation, but never to the | |
| 78 | constant heading the left hand side. An important special case are | |
| 38437 | 79 |   \emph{unfold theorems}, which may be declared and removed using the
 | 
| 80 |   @{attribute code_unfold} or \emph{@{attribute code_unfold} del}
 | |
| 34155 | 81 | attribute, respectively. | 
| 28213 | 82 | |
| 28419 | 83 | Some common applications: | 
| 84 | *} | |
| 85 | ||
| 86 | text_raw {*
 | |
| 87 |   \begin{itemize}
 | |
| 88 | *} | |
| 89 | ||
| 90 | text {*
 | |
| 91 | \item replacing non-executable constructs by executable ones: | |
| 92 | *} | |
| 93 | ||
| 37211 | 94 | lemma %quote [code_unfold]: | 
| 37612 
48fed6598be9
adapted to reorganization of auxiliary list operations; split off predicate compiler into separate theory
 haftmann parents: 
37427diff
changeset | 95 | "x \<in> set xs \<longleftrightarrow> List.member xs x" by (fact in_set_member) | 
| 28419 | 96 | |
| 97 | text {*
 | |
| 98 | \item replacing executable but inconvenient constructs: | |
| 99 | *} | |
| 100 | ||
| 37211 | 101 | lemma %quote [code_unfold]: | 
| 37612 
48fed6598be9
adapted to reorganization of auxiliary list operations; split off predicate compiler into separate theory
 haftmann parents: 
37427diff
changeset | 102 | "xs = [] \<longleftrightarrow> List.null xs" by (fact eq_Nil_null) | 
| 28419 | 103 | |
| 38437 | 104 | text {*
 | 
| 105 | \item eliminating disturbing expressions: | |
| 106 | *} | |
| 107 | ||
| 108 | lemma %quote [code_unfold]: | |
| 109 | "1 = Suc 0" by (fact One_nat_def) | |
| 110 | ||
| 28419 | 111 | text_raw {*
 | 
| 112 |   \end{itemize}
 | |
| 113 | *} | |
| 114 | ||
| 115 | text {*
 | |
| 38437 | 116 |   \noindent \emph{Function transformers} provide a very general
 | 
| 117 | interface, transforming a list of function theorems to another list | |
| 118 | of function theorems, provided that neither the heading constant nor | |
| 119 |   its type change.  The @{term "0\<Colon>nat"} / @{const Suc} pattern
 | |
| 120 |   elimination implemented in theory @{theory Efficient_Nat} (see
 | |
| 121 |   \secref{eff_nat}) uses this interface.
 | |
| 28419 | 122 | |
| 38437 | 123 | \noindent The current setup of the preprocessor may be inspected | 
| 38505 | 124 |   using the @{command_def print_codeproc} command.  @{command_def
 | 
| 125 |   code_thms} (see \secref{sec:equations}) provides a convenient
 | |
| 126 | mechanism to inspect the impact of a preprocessor setup on code | |
| 127 | equations. | |
| 28419 | 128 | |
| 129 |   \begin{warn}
 | |
| 32000 | 130 |     Attribute @{attribute code_unfold} also applies to the
 | 
| 131 |     preprocessor of the ancient @{text "SML code generator"}; in case
 | |
| 132 |     this is not what you intend, use @{attribute code_inline} instead.
 | |
| 28419 | 133 |   \end{warn}
 | 
| 134 | *} | |
| 135 | ||
| 38437 | 136 | |
| 137 | subsection {* Understanding code equations \label{sec:equations} *}
 | |
| 28419 | 138 | |
| 139 | text {*
 | |
| 38437 | 140 |   As told in \secref{sec:principle}, the notion of code equations is
 | 
| 141 | vital to code generation. Indeed most problems which occur in | |
| 142 | practice can be resolved by an inspection of the underlying code | |
| 143 | equations. | |
| 28419 | 144 | |
| 38437 | 145 | It is possible to exchange the default code equations for constants | 
| 146 | by explicitly proving alternative ones: | |
| 28419 | 147 | *} | 
| 148 | ||
| 38437 | 149 | lemma %quote [code]: | 
| 150 | "dequeue (AQueue xs []) = | |
| 151 | (if xs = [] then (None, AQueue [] []) | |
| 152 | else dequeue (AQueue [] (rev xs)))" | |
| 153 | "dequeue (AQueue xs (y # ys)) = | |
| 154 | (Some y, AQueue xs ys)" | |
| 155 | by (cases xs, simp_all) (cases "rev xs", simp_all) | |
| 28213 | 156 | |
| 28419 | 157 | text {*
 | 
| 38437 | 158 |   \noindent The annotation @{text "[code]"} is an @{text attribute}
 | 
| 159 | which states that the given theorems should be considered as code | |
| 160 |   equations for a @{text fun} statement -- the corresponding constant
 | |
| 161 | is determined syntactically. The resulting code: | |
| 29798 | 162 | *} | 
| 29794 | 163 | |
| 39745 | 164 | text %quotetypewriter {*
 | 
| 39683 | 165 |   @{code_stmts dequeue (consts) dequeue (Haskell)}
 | 
| 39664 
0afaf89ab591
more canonical type setting of type writer code examples
 haftmann parents: 
38857diff
changeset | 166 | *} | 
| 29794 | 167 | |
| 29798 | 168 | text {*
 | 
| 38437 | 169 |   \noindent You may note that the equality test @{term "xs = []"} has
 | 
| 170 |   been replaced by the predicate @{term "List.null xs"}.  This is due
 | |
| 171 |   to the default setup of the \qn{preprocessor}.
 | |
| 172 | ||
| 173 | This possibility to select arbitrary code equations is the key | |
| 174 | technique for program and datatype refinement (see | |
| 39677 | 175 |   \secref{sec:refinement}).
 | 
| 38437 | 176 | |
| 177 | Due to the preprocessor, there is the distinction of raw code | |
| 178 | equations (before preprocessing) and code equations (after | |
| 179 | preprocessing). | |
| 180 | ||
| 38505 | 181 |   The first can be listed (among other data) using the @{command_def
 | 
| 182 | print_codesetup} command. | |
| 38437 | 183 | |
| 184 | The code equations after preprocessing are already are blueprint of | |
| 185 |   the generated program and can be inspected using the @{command
 | |
| 186 | code_thms} command: | |
| 29798 | 187 | *} | 
| 29794 | 188 | |
| 38437 | 189 | code_thms %quote dequeue | 
| 28419 | 190 | |
| 191 | text {*
 | |
| 38437 | 192 |   \noindent This prints a table with the code equations for @{const
 | 
| 193 |   dequeue}, including \emph{all} code equations those equations depend
 | |
| 194 | on recursively. These dependencies themselves can be visualized using | |
| 38505 | 195 |   the @{command_def code_deps} command.
 | 
| 28419 | 196 | *} | 
| 197 | ||
| 28213 | 198 | |
| 30938 
c6c9359e474c
wellsortedness is no issue for a user manual any more
 haftmann parents: 
30227diff
changeset | 199 | subsection {* Equality *}
 | 
| 28213 | 200 | |
| 28419 | 201 | text {*
 | 
| 38437 | 202 | Implementation of equality deserves some attention. Here an example | 
| 203 | function involving polymorphic equality: | |
| 28419 | 204 | *} | 
| 205 | ||
| 28564 | 206 | primrec %quote collect_duplicates :: "'a list \<Rightarrow> 'a list \<Rightarrow> 'a list \<Rightarrow> 'a list" where | 
| 28447 | 207 | "collect_duplicates xs ys [] = xs" | 
| 38437 | 208 | | "collect_duplicates xs ys (z#zs) = (if z \<in> set xs | 
| 209 | then if z \<in> set ys | |
| 210 | then collect_duplicates xs ys zs | |
| 211 | else collect_duplicates xs (z#ys) zs | |
| 212 | else collect_duplicates (z#xs) (z#ys) zs)" | |
| 28419 | 213 | |
| 214 | text {*
 | |
| 37612 
48fed6598be9
adapted to reorganization of auxiliary list operations; split off predicate compiler into separate theory
 haftmann parents: 
37427diff
changeset | 215 | \noindent During preprocessing, the membership test is rewritten, | 
| 38437 | 216 |   resulting in @{const List.member}, which itself performs an explicit
 | 
| 217 |   equality check, as can be seen in the corresponding @{text SML} code:
 | |
| 28419 | 218 | *} | 
| 219 | ||
| 39745 | 220 | text %quotetypewriter {*
 | 
| 39683 | 221 |   @{code_stmts collect_duplicates (SML)}
 | 
| 39664 
0afaf89ab591
more canonical type setting of type writer code examples
 haftmann parents: 
38857diff
changeset | 222 | *} | 
| 28419 | 223 | |
| 224 | text {*
 | |
| 225 | \noindent Obviously, polymorphic equality is implemented the Haskell | |
| 38437 | 226 | way using a type class. How is this achieved? HOL introduces an | 
| 38857 
97775f3e8722
renamed class/constant eq to equal; tuned some instantiations
 haftmann parents: 
38505diff
changeset | 227 |   explicit class @{class equal} with a corresponding operation @{const
 | 
| 
97775f3e8722
renamed class/constant eq to equal; tuned some instantiations
 haftmann parents: 
38505diff
changeset | 228 |   HOL.equal} such that @{thm equal [no_vars]}.  The preprocessing
 | 
| 
97775f3e8722
renamed class/constant eq to equal; tuned some instantiations
 haftmann parents: 
38505diff
changeset | 229 |   framework does the rest by propagating the @{class equal} constraints
 | 
| 38437 | 230 | through all dependent code equations. For datatypes, instances of | 
| 38857 
97775f3e8722
renamed class/constant eq to equal; tuned some instantiations
 haftmann parents: 
38505diff
changeset | 231 |   @{class equal} are implicitly derived when possible.  For other types,
 | 
| 
97775f3e8722
renamed class/constant eq to equal; tuned some instantiations
 haftmann parents: 
38505diff
changeset | 232 |   you may instantiate @{text equal} manually like any other type class.
 | 
| 28419 | 233 | *} | 
| 234 | ||
| 235 | ||
| 38440 | 236 | subsection {* Explicit partiality \label{sec:partiality} *}
 | 
| 28462 | 237 | |
| 238 | text {*
 | |
| 239 | Partiality usually enters the game by partial patterns, as | |
| 240 | in the following example, again for amortised queues: | |
| 241 | *} | |
| 242 | ||
| 29798 | 243 | definition %quote strict_dequeue :: "'a queue \<Rightarrow> 'a \<times> 'a queue" where | 
| 244 | "strict_dequeue q = (case dequeue q | |
| 245 | of (Some x, q') \<Rightarrow> (x, q'))" | |
| 246 | ||
| 247 | lemma %quote strict_dequeue_AQueue [code]: | |
| 248 | "strict_dequeue (AQueue xs (y # ys)) = (y, AQueue xs ys)" | |
| 249 | "strict_dequeue (AQueue xs []) = | |
| 250 | (case rev xs of y # ys \<Rightarrow> (y, AQueue [] ys))" | |
| 38437 | 251 | by (simp_all add: strict_dequeue_def) (cases xs, simp_all split: list.split) | 
| 28462 | 252 | |
| 253 | text {*
 | |
| 254 | \noindent In the corresponding code, there is no equation | |
| 29798 | 255 |   for the pattern @{term "AQueue [] []"}:
 | 
| 28462 | 256 | *} | 
| 257 | ||
| 39745 | 258 | text %quotetypewriter {*
 | 
| 39683 | 259 |   @{code_stmts strict_dequeue (consts) strict_dequeue (Haskell)}
 | 
| 39664 
0afaf89ab591
more canonical type setting of type writer code examples
 haftmann parents: 
38857diff
changeset | 260 | *} | 
| 28462 | 261 | |
| 262 | text {*
 | |
| 263 | \noindent In some cases it is desirable to have this | |
| 264 |   pseudo-\qt{partiality} more explicitly, e.g.~as follows:
 | |
| 265 | *} | |
| 266 | ||
| 28564 | 267 | axiomatization %quote empty_queue :: 'a | 
| 28462 | 268 | |
| 29798 | 269 | definition %quote strict_dequeue' :: "'a queue \<Rightarrow> 'a \<times> 'a queue" where | 
| 270 | "strict_dequeue' q = (case dequeue q of (Some x, q') \<Rightarrow> (x, q') | _ \<Rightarrow> empty_queue)" | |
| 28213 | 271 | |
| 29798 | 272 | lemma %quote strict_dequeue'_AQueue [code]: | 
| 273 | "strict_dequeue' (AQueue xs []) = (if xs = [] then empty_queue | |
| 274 | else strict_dequeue' (AQueue [] (rev xs)))" | |
| 275 | "strict_dequeue' (AQueue xs (y # ys)) = | |
| 276 | (y, AQueue xs ys)" | |
| 38437 | 277 | by (simp_all add: strict_dequeue'_def split: list.splits) | 
| 28462 | 278 | |
| 279 | text {*
 | |
| 29798 | 280 |   Observe that on the right hand side of the definition of @{const
 | 
| 34155 | 281 |   "strict_dequeue'"}, the unspecified constant @{const empty_queue} occurs.
 | 
| 28462 | 282 | |
| 29798 | 283 | Normally, if constants without any code equations occur in a | 
| 284 | program, the code generator complains (since in most cases this is | |
| 34155 | 285 | indeed an error). But such constants can also be thought | 
| 286 | of as function definitions which always fail, | |
| 29798 | 287 | since there is never a successful pattern match on the left hand | 
| 288 | side. In order to categorise a constant into that category | |
| 38505 | 289 |   explicitly, use @{command_def "code_abort"}:
 | 
| 28462 | 290 | *} | 
| 291 | ||
| 28564 | 292 | code_abort %quote empty_queue | 
| 28462 | 293 | |
| 294 | text {*
 | |
| 295 | \noindent Then the code generator will just insert an error or | |
| 296 | exception at the appropriate position: | |
| 297 | *} | |
| 298 | ||
| 39745 | 299 | text %quotetypewriter {*
 | 
| 39683 | 300 |   @{code_stmts strict_dequeue' (consts) empty_queue strict_dequeue' (Haskell)}
 | 
| 39664 
0afaf89ab591
more canonical type setting of type writer code examples
 haftmann parents: 
38857diff
changeset | 301 | *} | 
| 28462 | 302 | |
| 303 | text {*
 | |
| 38437 | 304 | \noindent This feature however is rarely needed in practice. Note | 
| 305 |   also that the HOL default setup already declares @{const undefined}
 | |
| 306 |   as @{command "code_abort"}, which is most likely to be used in such
 | |
| 307 | situations. | |
| 308 | *} | |
| 309 | ||
| 310 | ||
| 311 | subsection {* If something goes utterly wrong \label{sec:utterly_wrong} *}
 | |
| 312 | ||
| 313 | text {*
 | |
| 314 | Under certain circumstances, the code generator fails to produce | |
| 38440 | 315 | code entirely. To debug these, the following hints may prove | 
| 316 | helpful: | |
| 38437 | 317 | |
| 318 |   \begin{description}
 | |
| 319 | ||
| 38440 | 320 |     \ditem{\emph{Check with a different target language}.}  Sometimes
 | 
| 321 | the situation gets more clear if you switch to another target | |
| 322 | language; the code generated there might give some hints what | |
| 323 | prevents the code generator to produce code for the desired | |
| 324 | language. | |
| 38437 | 325 | |
| 38440 | 326 |     \ditem{\emph{Inspect code equations}.}  Code equations are the central
 | 
| 327 | carrier of code generation. Most problems occuring while generation | |
| 328 | code can be traced to single equations which are printed as part of | |
| 329 | the error message. A closer inspection of those may offer the key | |
| 330 |       for solving issues (cf.~\secref{sec:equations}).
 | |
| 38437 | 331 | |
| 38440 | 332 |     \ditem{\emph{Inspect preprocessor setup}.}  The preprocessor might
 | 
| 333 | transform code equations unexpectedly; to understand an | |
| 334 |       inspection of its setup is necessary (cf.~\secref{sec:preproc}).
 | |
| 38437 | 335 | |
| 38440 | 336 |     \ditem{\emph{Generate exceptions}.}  If the code generator
 | 
| 337 | complains about missing code equations, in can be helpful to | |
| 338 | implement the offending constants as exceptions | |
| 339 |       (cf.~\secref{sec:partiality}); this allows at least for a formal
 | |
| 340 | generation of code, whose inspection may then give clues what is | |
| 341 | wrong. | |
| 38437 | 342 | |
| 38440 | 343 |     \ditem{\emph{Remove offending code equations}.}  If code
 | 
| 344 | generation is prevented by just a single equation, this can be | |
| 345 |       removed (cf.~\secref{sec:equations}) to allow formal code
 | |
| 346 | generation, whose result in turn can be used to trace the | |
| 347 | problem. The most prominent case here are mismatches in type | |
| 348 |       class signatures (\qt{wellsortedness error}).
 | |
| 38437 | 349 | |
| 350 |   \end{description}
 | |
| 28462 | 351 | *} | 
| 28213 | 352 | |
| 353 | end |