diff -r f51d4a302962 -r 5386df44a037 src/Doc/Codegen/Foundations.thy --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/src/Doc/Codegen/Foundations.thy Tue Aug 28 18:57:32 2012 +0200 @@ -0,0 +1,347 @@ +theory Foundations +imports Introduction +begin + +section {* Code generation foundations \label{sec:foundations} *} + +subsection {* Code generator architecture \label{sec:architecture} *} + +text {* + The code generator is actually a framework consisting of different + components which can be customised individually. + + Conceptually all components operate on Isabelle's logic framework + @{theory Pure}. Practically, the object logic @{theory HOL} + provides the necessary facilities to make use of the code generator, + mainly since it is an extension of @{theory Pure}. + + The constellation of the different components is visualized in the + following picture. + + \begin{figure}[h] + \includegraphics{architecture} + \caption{Code generator architecture} + \label{fig:arch} + \end{figure} + + \noindent Central to code generation is the notion of \emph{code + equations}. A code equation as a first approximation is a theorem + of the form @{text "f t\<^isub>1 t\<^isub>2 \ t\<^isub>n \ t"} (an equation headed by a + constant @{text f} with arguments @{text "t\<^isub>1 t\<^isub>2 \ t\<^isub>n"} and right + hand side @{text t}). + + \begin{itemize} + + \item Starting point of code generation is a collection of (raw) + code equations in a theory. It is not relevant where they stem + from, but typically they were either produced by specification + tools or proved explicitly by the user. + + \item These raw code equations can be subjected to theorem + transformations. This \qn{preprocessor} (see + \secref{sec:preproc}) can apply the full expressiveness of + ML-based theorem transformations to code generation. The result + of preprocessing is a structured collection of code equations. + + \item These code equations are \qn{translated} to a program in an + abstract intermediate language. Think of it as a kind of + \qt{Mini-Haskell} with four \qn{statements}: @{text data} (for + datatypes), @{text fun} (stemming from code equations), also + @{text class} and @{text inst} (for type classes). + + \item Finally, the abstract program is \qn{serialised} into + concrete source code of a target language. This step only + produces concrete syntax but does not change the program in + essence; all conceptual transformations occur in the translation + step. + + \end{itemize} + + \noindent From these steps, only the last two are carried out + outside the logic; by keeping this layer as thin as possible, the + amount of code to trust is kept to a minimum. +*} + + +subsection {* The preprocessor \label{sec:preproc} *} + +text {* + Before selected function theorems are turned into abstract code, a + chain of definitional transformation steps is carried out: + \emph{preprocessing}. The preprocessor consists of two + components: a \emph{simpset} and \emph{function transformers}. + + The \emph{simpset} can apply the full generality of the Isabelle + simplifier. Due to the interpretation of theorems as code + equations, rewrites are applied to the right hand side and the + arguments of the left hand side of an equation, but never to the + constant heading the left hand side. An important special case are + \emph{unfold theorems}, which may be declared and removed using the + @{attribute code_unfold} or \emph{@{attribute code_unfold} del} + attribute, respectively. + + Some common applications: +*} + +text_raw {* + \begin{itemize} +*} + +text {* + \item replacing non-executable constructs by executable ones: +*} + +lemma %quote [code_unfold]: + "x \ set xs \ List.member xs x" by (fact in_set_member) + +text {* + \item replacing executable but inconvenient constructs: +*} + +lemma %quote [code_unfold]: + "xs = [] \ List.null xs" by (fact eq_Nil_null) + +text {* + \item eliminating disturbing expressions: +*} + +lemma %quote [code_unfold]: + "1 = Suc 0" by (fact One_nat_def) + +text_raw {* + \end{itemize} +*} + +text {* + \noindent \emph{Function transformers} provide a very general + interface, transforming a list of function theorems to another list + of function theorems, provided that neither the heading constant nor + its type change. The @{term "0\nat"} / @{const Suc} pattern + elimination implemented in theory @{text Efficient_Nat} (see + \secref{eff_nat}) uses this interface. + + \noindent The current setup of the preprocessor may be inspected + using the @{command_def print_codeproc} command. @{command_def + code_thms} (see \secref{sec:equations}) provides a convenient + mechanism to inspect the impact of a preprocessor setup on code + equations. +*} + + +subsection {* Understanding code equations \label{sec:equations} *} + +text {* + As told in \secref{sec:principle}, the notion of code equations is + vital to code generation. Indeed most problems which occur in + practice can be resolved by an inspection of the underlying code + equations. + + It is possible to exchange the default code equations for constants + by explicitly proving alternative ones: +*} + +lemma %quote [code]: + "dequeue (AQueue xs []) = + (if xs = [] then (None, AQueue [] []) + else dequeue (AQueue [] (rev xs)))" + "dequeue (AQueue xs (y # ys)) = + (Some y, AQueue xs ys)" + by (cases xs, simp_all) (cases "rev xs", simp_all) + +text {* + \noindent The annotation @{text "[code]"} is an @{text attribute} + which states that the given theorems should be considered as code + equations for a @{text fun} statement -- the corresponding constant + is determined syntactically. The resulting code: +*} + +text %quotetypewriter {* + @{code_stmts dequeue (consts) dequeue (Haskell)} +*} + +text {* + \noindent You may note that the equality test @{term "xs = []"} has + been replaced by the predicate @{term "List.null xs"}. This is due + to the default setup of the \qn{preprocessor}. + + This possibility to select arbitrary code equations is the key + technique for program and datatype refinement (see + \secref{sec:refinement}). + + Due to the preprocessor, there is the distinction of raw code + equations (before preprocessing) and code equations (after + preprocessing). + + The first can be listed (among other data) using the @{command_def + print_codesetup} command. + + The code equations after preprocessing are already are blueprint of + the generated program and can be inspected using the @{command + code_thms} command: +*} + +code_thms %quote dequeue + +text {* + \noindent This prints a table with the code equations for @{const + dequeue}, including \emph{all} code equations those equations depend + on recursively. These dependencies themselves can be visualized using + the @{command_def code_deps} command. +*} + + +subsection {* Equality *} + +text {* + Implementation of equality deserves some attention. Here an example + function involving polymorphic equality: +*} + +primrec %quote collect_duplicates :: "'a list \ 'a list \ 'a list \ 'a list" where + "collect_duplicates xs ys [] = xs" +| "collect_duplicates xs ys (z#zs) = (if z \ set xs + then if z \ set ys + then collect_duplicates xs ys zs + else collect_duplicates xs (z#ys) zs + else collect_duplicates (z#xs) (z#ys) zs)" + +text {* + \noindent During preprocessing, the membership test is rewritten, + resulting in @{const List.member}, which itself performs an explicit + equality check, as can be seen in the corresponding @{text SML} code: +*} + +text %quotetypewriter {* + @{code_stmts collect_duplicates (SML)} +*} + +text {* + \noindent Obviously, polymorphic equality is implemented the Haskell + way using a type class. How is this achieved? HOL introduces an + explicit class @{class equal} with a corresponding operation @{const + HOL.equal} such that @{thm equal [no_vars]}. The preprocessing + framework does the rest by propagating the @{class equal} constraints + through all dependent code equations. For datatypes, instances of + @{class equal} are implicitly derived when possible. For other types, + you may instantiate @{text equal} manually like any other type class. +*} + + +subsection {* Explicit partiality \label{sec:partiality} *} + +text {* + Partiality usually enters the game by partial patterns, as + in the following example, again for amortised queues: +*} + +definition %quote strict_dequeue :: "'a queue \ 'a \ 'a queue" where + "strict_dequeue q = (case dequeue q + of (Some x, q') \ (x, q'))" + +lemma %quote strict_dequeue_AQueue [code]: + "strict_dequeue (AQueue xs (y # ys)) = (y, AQueue xs ys)" + "strict_dequeue (AQueue xs []) = + (case rev xs of y # ys \ (y, AQueue [] ys))" + by (simp_all add: strict_dequeue_def) (cases xs, simp_all split: list.split) + +text {* + \noindent In the corresponding code, there is no equation + for the pattern @{term "AQueue [] []"}: +*} + +text %quotetypewriter {* + @{code_stmts strict_dequeue (consts) strict_dequeue (Haskell)} +*} + +text {* + \noindent In some cases it is desirable to have this + pseudo-\qt{partiality} more explicitly, e.g.~as follows: +*} + +axiomatization %quote empty_queue :: 'a + +definition %quote strict_dequeue' :: "'a queue \ 'a \ 'a queue" where + "strict_dequeue' q = (case dequeue q of (Some x, q') \ (x, q') | _ \ empty_queue)" + +lemma %quote strict_dequeue'_AQueue [code]: + "strict_dequeue' (AQueue xs []) = (if xs = [] then empty_queue + else strict_dequeue' (AQueue [] (rev xs)))" + "strict_dequeue' (AQueue xs (y # ys)) = + (y, AQueue xs ys)" + by (simp_all add: strict_dequeue'_def split: list.splits) + +text {* + Observe that on the right hand side of the definition of @{const + "strict_dequeue'"}, the unspecified constant @{const empty_queue} occurs. + + Normally, if constants without any code equations occur in a + program, the code generator complains (since in most cases this is + indeed an error). But such constants can also be thought + of as function definitions which always fail, + since there is never a successful pattern match on the left hand + side. In order to categorise a constant into that category + explicitly, use @{command_def "code_abort"}: +*} + +code_abort %quote empty_queue + +text {* + \noindent Then the code generator will just insert an error or + exception at the appropriate position: +*} + +text %quotetypewriter {* + @{code_stmts strict_dequeue' (consts) empty_queue strict_dequeue' (Haskell)} +*} + +text {* + \noindent This feature however is rarely needed in practice. Note + also that the HOL default setup already declares @{const undefined} + as @{command "code_abort"}, which is most likely to be used in such + situations. +*} + + +subsection {* If something goes utterly wrong \label{sec:utterly_wrong} *} + +text {* + Under certain circumstances, the code generator fails to produce + code entirely. To debug these, the following hints may prove + helpful: + + \begin{description} + + \ditem{\emph{Check with a different target language}.} Sometimes + the situation gets more clear if you switch to another target + language; the code generated there might give some hints what + prevents the code generator to produce code for the desired + language. + + \ditem{\emph{Inspect code equations}.} Code equations are the central + carrier of code generation. Most problems occurring while generating + code can be traced to single equations which are printed as part of + the error message. A closer inspection of those may offer the key + for solving issues (cf.~\secref{sec:equations}). + + \ditem{\emph{Inspect preprocessor setup}.} The preprocessor might + transform code equations unexpectedly; to understand an + inspection of its setup is necessary (cf.~\secref{sec:preproc}). + + \ditem{\emph{Generate exceptions}.} If the code generator + complains about missing code equations, in can be helpful to + implement the offending constants as exceptions + (cf.~\secref{sec:partiality}); this allows at least for a formal + generation of code, whose inspection may then give clues what is + wrong. + + \ditem{\emph{Remove offending code equations}.} If code + generation is prevented by just a single equation, this can be + removed (cf.~\secref{sec:equations}) to allow formal code + generation, whose result in turn can be used to trace the + problem. The most prominent case here are mismatches in type + class signatures (\qt{wellsortedness error}). + + \end{description} +*} + +end