--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/doc-src/Codegen/Foundations.thy Mon Aug 27 23:00:38 2012 +0200
@@ -0,0 +1,347 @@
+theory Foundations
+imports Introduction
+begin
+
+section {* Code generation foundations \label{sec:foundations} *}
+
+subsection {* Code generator architecture \label{sec:architecture} *}
+
+text {*
+ The code generator is actually a framework consisting of different
+ components which can be customised individually.
+
+ Conceptually all components operate on Isabelle's logic framework
+ @{theory Pure}. Practically, the object logic @{theory HOL}
+ provides the necessary facilities to make use of the code generator,
+ mainly since it is an extension of @{theory Pure}.
+
+ The constellation of the different components is visualized in the
+ following picture.
+
+ \begin{figure}[h]
+ \includegraphics{architecture}
+ \caption{Code generator architecture}
+ \label{fig:arch}
+ \end{figure}
+
+ \noindent Central to code generation is the notion of \emph{code
+ equations}. A code equation as a first approximation is a theorem
+ of the form @{text "f t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n \<equiv> t"} (an equation headed by a
+ constant @{text f} with arguments @{text "t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n"} and right
+ hand side @{text t}).
+
+ \begin{itemize}
+
+ \item Starting point of code generation is a collection of (raw)
+ code equations in a theory. It is not relevant where they stem
+ from, but typically they were either produced by specification
+ tools or proved explicitly by the user.
+
+ \item These raw code equations can be subjected to theorem
+ transformations. This \qn{preprocessor} (see
+ \secref{sec:preproc}) can apply the full expressiveness of
+ ML-based theorem transformations to code generation. The result
+ of preprocessing is a structured collection of code equations.
+
+ \item These code equations are \qn{translated} to a program in an
+ abstract intermediate language. Think of it as a kind of
+ \qt{Mini-Haskell} with four \qn{statements}: @{text data} (for
+ datatypes), @{text fun} (stemming from code equations), also
+ @{text class} and @{text inst} (for type classes).
+
+ \item Finally, the abstract program is \qn{serialised} into
+ concrete source code of a target language. This step only
+ produces concrete syntax but does not change the program in
+ essence; all conceptual transformations occur in the translation
+ step.
+
+ \end{itemize}
+
+ \noindent From these steps, only the last two are carried out
+ outside the logic; by keeping this layer as thin as possible, the
+ amount of code to trust is kept to a minimum.
+*}
+
+
+subsection {* The preprocessor \label{sec:preproc} *}
+
+text {*
+ Before selected function theorems are turned into abstract code, a
+ chain of definitional transformation steps is carried out:
+ \emph{preprocessing}. The preprocessor consists of two
+ components: a \emph{simpset} and \emph{function transformers}.
+
+ The \emph{simpset} can apply the full generality of the Isabelle
+ simplifier. Due to the interpretation of theorems as code
+ equations, rewrites are applied to the right hand side and the
+ arguments of the left hand side of an equation, but never to the
+ constant heading the left hand side. An important special case are
+ \emph{unfold theorems}, which may be declared and removed using the
+ @{attribute code_unfold} or \emph{@{attribute code_unfold} del}
+ attribute, respectively.
+
+ Some common applications:
+*}
+
+text_raw {*
+ \begin{itemize}
+*}
+
+text {*
+ \item replacing non-executable constructs by executable ones:
+*}
+
+lemma %quote [code_unfold]:
+ "x \<in> set xs \<longleftrightarrow> List.member xs x" by (fact in_set_member)
+
+text {*
+ \item replacing executable but inconvenient constructs:
+*}
+
+lemma %quote [code_unfold]:
+ "xs = [] \<longleftrightarrow> List.null xs" by (fact eq_Nil_null)
+
+text {*
+ \item eliminating disturbing expressions:
+*}
+
+lemma %quote [code_unfold]:
+ "1 = Suc 0" by (fact One_nat_def)
+
+text_raw {*
+ \end{itemize}
+*}
+
+text {*
+ \noindent \emph{Function transformers} provide a very general
+ interface, transforming a list of function theorems to another list
+ of function theorems, provided that neither the heading constant nor
+ its type change. The @{term "0\<Colon>nat"} / @{const Suc} pattern
+ elimination implemented in theory @{text Efficient_Nat} (see
+ \secref{eff_nat}) uses this interface.
+
+ \noindent The current setup of the preprocessor may be inspected
+ using the @{command_def print_codeproc} command. @{command_def
+ code_thms} (see \secref{sec:equations}) provides a convenient
+ mechanism to inspect the impact of a preprocessor setup on code
+ equations.
+*}
+
+
+subsection {* Understanding code equations \label{sec:equations} *}
+
+text {*
+ As told in \secref{sec:principle}, the notion of code equations is
+ vital to code generation. Indeed most problems which occur in
+ practice can be resolved by an inspection of the underlying code
+ equations.
+
+ It is possible to exchange the default code equations for constants
+ by explicitly proving alternative ones:
+*}
+
+lemma %quote [code]:
+ "dequeue (AQueue xs []) =
+ (if xs = [] then (None, AQueue [] [])
+ else dequeue (AQueue [] (rev xs)))"
+ "dequeue (AQueue xs (y # ys)) =
+ (Some y, AQueue xs ys)"
+ by (cases xs, simp_all) (cases "rev xs", simp_all)
+
+text {*
+ \noindent The annotation @{text "[code]"} is an @{text attribute}
+ which states that the given theorems should be considered as code
+ equations for a @{text fun} statement -- the corresponding constant
+ is determined syntactically. The resulting code:
+*}
+
+text %quotetypewriter {*
+ @{code_stmts dequeue (consts) dequeue (Haskell)}
+*}
+
+text {*
+ \noindent You may note that the equality test @{term "xs = []"} has
+ been replaced by the predicate @{term "List.null xs"}. This is due
+ to the default setup of the \qn{preprocessor}.
+
+ This possibility to select arbitrary code equations is the key
+ technique for program and datatype refinement (see
+ \secref{sec:refinement}).
+
+ Due to the preprocessor, there is the distinction of raw code
+ equations (before preprocessing) and code equations (after
+ preprocessing).
+
+ The first can be listed (among other data) using the @{command_def
+ print_codesetup} command.
+
+ The code equations after preprocessing are already are blueprint of
+ the generated program and can be inspected using the @{command
+ code_thms} command:
+*}
+
+code_thms %quote dequeue
+
+text {*
+ \noindent This prints a table with the code equations for @{const
+ dequeue}, including \emph{all} code equations those equations depend
+ on recursively. These dependencies themselves can be visualized using
+ the @{command_def code_deps} command.
+*}
+
+
+subsection {* Equality *}
+
+text {*
+ Implementation of equality deserves some attention. Here an example
+ function involving polymorphic equality:
+*}
+
+primrec %quote collect_duplicates :: "'a list \<Rightarrow> 'a list \<Rightarrow> 'a list \<Rightarrow> 'a list" where
+ "collect_duplicates xs ys [] = xs"
+| "collect_duplicates xs ys (z#zs) = (if z \<in> set xs
+ then if z \<in> set ys
+ then collect_duplicates xs ys zs
+ else collect_duplicates xs (z#ys) zs
+ else collect_duplicates (z#xs) (z#ys) zs)"
+
+text {*
+ \noindent During preprocessing, the membership test is rewritten,
+ resulting in @{const List.member}, which itself performs an explicit
+ equality check, as can be seen in the corresponding @{text SML} code:
+*}
+
+text %quotetypewriter {*
+ @{code_stmts collect_duplicates (SML)}
+*}
+
+text {*
+ \noindent Obviously, polymorphic equality is implemented the Haskell
+ way using a type class. How is this achieved? HOL introduces an
+ explicit class @{class equal} with a corresponding operation @{const
+ HOL.equal} such that @{thm equal [no_vars]}. The preprocessing
+ framework does the rest by propagating the @{class equal} constraints
+ through all dependent code equations. For datatypes, instances of
+ @{class equal} are implicitly derived when possible. For other types,
+ you may instantiate @{text equal} manually like any other type class.
+*}
+
+
+subsection {* Explicit partiality \label{sec:partiality} *}
+
+text {*
+ Partiality usually enters the game by partial patterns, as
+ in the following example, again for amortised queues:
+*}
+
+definition %quote strict_dequeue :: "'a queue \<Rightarrow> 'a \<times> 'a queue" where
+ "strict_dequeue q = (case dequeue q
+ of (Some x, q') \<Rightarrow> (x, q'))"
+
+lemma %quote strict_dequeue_AQueue [code]:
+ "strict_dequeue (AQueue xs (y # ys)) = (y, AQueue xs ys)"
+ "strict_dequeue (AQueue xs []) =
+ (case rev xs of y # ys \<Rightarrow> (y, AQueue [] ys))"
+ by (simp_all add: strict_dequeue_def) (cases xs, simp_all split: list.split)
+
+text {*
+ \noindent In the corresponding code, there is no equation
+ for the pattern @{term "AQueue [] []"}:
+*}
+
+text %quotetypewriter {*
+ @{code_stmts strict_dequeue (consts) strict_dequeue (Haskell)}
+*}
+
+text {*
+ \noindent In some cases it is desirable to have this
+ pseudo-\qt{partiality} more explicitly, e.g.~as follows:
+*}
+
+axiomatization %quote empty_queue :: 'a
+
+definition %quote strict_dequeue' :: "'a queue \<Rightarrow> 'a \<times> 'a queue" where
+ "strict_dequeue' q = (case dequeue q of (Some x, q') \<Rightarrow> (x, q') | _ \<Rightarrow> empty_queue)"
+
+lemma %quote strict_dequeue'_AQueue [code]:
+ "strict_dequeue' (AQueue xs []) = (if xs = [] then empty_queue
+ else strict_dequeue' (AQueue [] (rev xs)))"
+ "strict_dequeue' (AQueue xs (y # ys)) =
+ (y, AQueue xs ys)"
+ by (simp_all add: strict_dequeue'_def split: list.splits)
+
+text {*
+ Observe that on the right hand side of the definition of @{const
+ "strict_dequeue'"}, the unspecified constant @{const empty_queue} occurs.
+
+ Normally, if constants without any code equations occur in a
+ program, the code generator complains (since in most cases this is
+ indeed an error). But such constants can also be thought
+ of as function definitions which always fail,
+ since there is never a successful pattern match on the left hand
+ side. In order to categorise a constant into that category
+ explicitly, use @{command_def "code_abort"}:
+*}
+
+code_abort %quote empty_queue
+
+text {*
+ \noindent Then the code generator will just insert an error or
+ exception at the appropriate position:
+*}
+
+text %quotetypewriter {*
+ @{code_stmts strict_dequeue' (consts) empty_queue strict_dequeue' (Haskell)}
+*}
+
+text {*
+ \noindent This feature however is rarely needed in practice. Note
+ also that the HOL default setup already declares @{const undefined}
+ as @{command "code_abort"}, which is most likely to be used in such
+ situations.
+*}
+
+
+subsection {* If something goes utterly wrong \label{sec:utterly_wrong} *}
+
+text {*
+ Under certain circumstances, the code generator fails to produce
+ code entirely. To debug these, the following hints may prove
+ helpful:
+
+ \begin{description}
+
+ \ditem{\emph{Check with a different target language}.} Sometimes
+ the situation gets more clear if you switch to another target
+ language; the code generated there might give some hints what
+ prevents the code generator to produce code for the desired
+ language.
+
+ \ditem{\emph{Inspect code equations}.} Code equations are the central
+ carrier of code generation. Most problems occurring while generating
+ code can be traced to single equations which are printed as part of
+ the error message. A closer inspection of those may offer the key
+ for solving issues (cf.~\secref{sec:equations}).
+
+ \ditem{\emph{Inspect preprocessor setup}.} The preprocessor might
+ transform code equations unexpectedly; to understand an
+ inspection of its setup is necessary (cf.~\secref{sec:preproc}).
+
+ \ditem{\emph{Generate exceptions}.} If the code generator
+ complains about missing code equations, in can be helpful to
+ implement the offending constants as exceptions
+ (cf.~\secref{sec:partiality}); this allows at least for a formal
+ generation of code, whose inspection may then give clues what is
+ wrong.
+
+ \ditem{\emph{Remove offending code equations}.} If code
+ generation is prevented by just a single equation, this can be
+ removed (cf.~\secref{sec:equations}) to allow formal code
+ generation, whose result in turn can be used to trace the
+ problem. The most prominent case here are mismatches in type
+ class signatures (\qt{wellsortedness error}).
+
+ \end{description}
+*}
+
+end