src/Doc/Codegen/Foundations.thy
author haftmann
Fri Feb 15 08:31:31 2013 +0100 (2013-02-15)
changeset 51143 0a2371e7ced3
parent 48985 5386df44a037
child 51171 e8b2d90da499
permissions -rw-r--r--
two target language numeral types: integer and natural, as replacement for code_numeral;
former theory HOL/Library/Code_Numeral_Types replaces HOL/Code_Numeral;
refined stack of theories implementing int and/or nat by target language numerals;
reduced number of target language numeral types to exactly one
     1 theory Foundations
     2 imports Introduction
     3 begin
     4 
     5 section {* Code generation foundations \label{sec:foundations} *}
     6 
     7 subsection {* Code generator architecture \label{sec:architecture} *}
     8 
     9 text {*
    10   The code generator is actually a framework consisting of different
    11   components which can be customised individually.
    12 
    13   Conceptually all components operate on Isabelle's logic framework
    14   @{theory Pure}.  Practically, the object logic @{theory HOL}
    15   provides the necessary facilities to make use of the code generator,
    16   mainly since it is an extension of @{theory Pure}.
    17 
    18   The constellation of the different components is visualized in the
    19   following picture.
    20 
    21   \begin{figure}[h]
    22     \includegraphics{architecture}
    23     \caption{Code generator architecture}
    24     \label{fig:arch}
    25   \end{figure}
    26 
    27   \noindent Central to code generation is the notion of \emph{code
    28   equations}.  A code equation as a first approximation is a theorem
    29   of the form @{text "f t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n \<equiv> t"} (an equation headed by a
    30   constant @{text f} with arguments @{text "t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n"} and right
    31   hand side @{text t}).
    32 
    33   \begin{itemize}
    34 
    35     \item Starting point of code generation is a collection of (raw)
    36       code equations in a theory. It is not relevant where they stem
    37       from, but typically they were either produced by specification
    38       tools or proved explicitly by the user.
    39       
    40     \item These raw code equations can be subjected to theorem
    41       transformations.  This \qn{preprocessor} (see
    42       \secref{sec:preproc}) can apply the full expressiveness of
    43       ML-based theorem transformations to code generation.  The result
    44       of preprocessing is a structured collection of code equations.
    45 
    46     \item These code equations are \qn{translated} to a program in an
    47       abstract intermediate language.  Think of it as a kind of
    48       \qt{Mini-Haskell} with four \qn{statements}: @{text data} (for
    49       datatypes), @{text fun} (stemming from code equations), also
    50       @{text class} and @{text inst} (for type classes).
    51 
    52     \item Finally, the abstract program is \qn{serialised} into
    53       concrete source code of a target language.  This step only
    54       produces concrete syntax but does not change the program in
    55       essence; all conceptual transformations occur in the translation
    56       step.
    57 
    58   \end{itemize}
    59 
    60   \noindent From these steps, only the last two are carried out
    61   outside the logic; by keeping this layer as thin as possible, the
    62   amount of code to trust is kept to a minimum.
    63 *}
    64 
    65 
    66 subsection {* The preprocessor \label{sec:preproc} *}
    67 
    68 text {*
    69   Before selected function theorems are turned into abstract code, a
    70   chain of definitional transformation steps is carried out:
    71   \emph{preprocessing}.  The preprocessor consists of two
    72   components: a \emph{simpset} and \emph{function transformers}.
    73 
    74   The \emph{simpset} can apply the full generality of the Isabelle
    75   simplifier.  Due to the interpretation of theorems as code
    76   equations, rewrites are applied to the right hand side and the
    77   arguments of the left hand side of an equation, but never to the
    78   constant heading the left hand side.  An important special case are
    79   \emph{unfold theorems}, which may be declared and removed using the
    80   @{attribute code_unfold} or \emph{@{attribute code_unfold} del}
    81   attribute, respectively.
    82 
    83   Some common applications:
    84 *}
    85 
    86 text_raw {*
    87   \begin{itemize}
    88 *}
    89 
    90 text {*
    91      \item replacing non-executable constructs by executable ones:
    92 *}     
    93 
    94 lemma %quote [code_unfold]:
    95   "x \<in> set xs \<longleftrightarrow> List.member xs x" by (fact in_set_member)
    96 
    97 text {*
    98      \item replacing executable but inconvenient constructs:
    99 *}
   100 
   101 lemma %quote [code_unfold]:
   102   "xs = [] \<longleftrightarrow> List.null xs" by (fact eq_Nil_null)
   103 
   104 text {*
   105      \item eliminating disturbing expressions:
   106 *}
   107 
   108 lemma %quote [code_unfold]:
   109   "1 = Suc 0" by (fact One_nat_def)
   110 
   111 text_raw {*
   112   \end{itemize}
   113 *}
   114 
   115 text {*
   116   \noindent \emph{Function transformers} provide a very general
   117   interface, transforming a list of function theorems to another list
   118   of function theorems, provided that neither the heading constant nor
   119   its type change.  The @{term "0\<Colon>nat"} / @{const Suc} pattern
   120   elimination implemented in theory @{text Code_Binary_Nat} (see
   121   \secref{eff_nat}) uses this interface.
   122 
   123   \noindent The current setup of the preprocessor may be inspected
   124   using the @{command_def print_codeproc} command.  @{command_def
   125   code_thms} (see \secref{sec:equations}) provides a convenient
   126   mechanism to inspect the impact of a preprocessor setup on code
   127   equations.
   128 *}
   129 
   130 
   131 subsection {* Understanding code equations \label{sec:equations} *}
   132 
   133 text {*
   134   As told in \secref{sec:principle}, the notion of code equations is
   135   vital to code generation.  Indeed most problems which occur in
   136   practice can be resolved by an inspection of the underlying code
   137   equations.
   138 
   139   It is possible to exchange the default code equations for constants
   140   by explicitly proving alternative ones:
   141 *}
   142 
   143 lemma %quote [code]:
   144   "dequeue (AQueue xs []) =
   145      (if xs = [] then (None, AQueue [] [])
   146        else dequeue (AQueue [] (rev xs)))"
   147   "dequeue (AQueue xs (y # ys)) =
   148      (Some y, AQueue xs ys)"
   149   by (cases xs, simp_all) (cases "rev xs", simp_all)
   150 
   151 text {*
   152   \noindent The annotation @{text "[code]"} is an @{text attribute}
   153   which states that the given theorems should be considered as code
   154   equations for a @{text fun} statement -- the corresponding constant
   155   is determined syntactically.  The resulting code:
   156 *}
   157 
   158 text %quotetypewriter {*
   159   @{code_stmts dequeue (consts) dequeue (Haskell)}
   160 *}
   161 
   162 text {*
   163   \noindent You may note that the equality test @{term "xs = []"} has
   164   been replaced by the predicate @{term "List.null xs"}.  This is due
   165   to the default setup of the \qn{preprocessor}.
   166 
   167   This possibility to select arbitrary code equations is the key
   168   technique for program and datatype refinement (see
   169   \secref{sec:refinement}).
   170 
   171   Due to the preprocessor, there is the distinction of raw code
   172   equations (before preprocessing) and code equations (after
   173   preprocessing).
   174 
   175   The first can be listed (among other data) using the @{command_def
   176   print_codesetup} command.
   177 
   178   The code equations after preprocessing are already are blueprint of
   179   the generated program and can be inspected using the @{command
   180   code_thms} command:
   181 *}
   182 
   183 code_thms %quote dequeue
   184 
   185 text {*
   186   \noindent This prints a table with the code equations for @{const
   187   dequeue}, including \emph{all} code equations those equations depend
   188   on recursively.  These dependencies themselves can be visualized using
   189   the @{command_def code_deps} command.
   190 *}
   191 
   192 
   193 subsection {* Equality *}
   194 
   195 text {*
   196   Implementation of equality deserves some attention.  Here an example
   197   function involving polymorphic equality:
   198 *}
   199 
   200 primrec %quote collect_duplicates :: "'a list \<Rightarrow> 'a list \<Rightarrow> 'a list \<Rightarrow> 'a list" where
   201   "collect_duplicates xs ys [] = xs"
   202 | "collect_duplicates xs ys (z#zs) = (if z \<in> set xs
   203     then if z \<in> set ys
   204       then collect_duplicates xs ys zs
   205       else collect_duplicates xs (z#ys) zs
   206     else collect_duplicates (z#xs) (z#ys) zs)"
   207 
   208 text {*
   209   \noindent During preprocessing, the membership test is rewritten,
   210   resulting in @{const List.member}, which itself performs an explicit
   211   equality check, as can be seen in the corresponding @{text SML} code:
   212 *}
   213 
   214 text %quotetypewriter {*
   215   @{code_stmts collect_duplicates (SML)}
   216 *}
   217 
   218 text {*
   219   \noindent Obviously, polymorphic equality is implemented the Haskell
   220   way using a type class.  How is this achieved?  HOL introduces an
   221   explicit class @{class equal} with a corresponding operation @{const
   222   HOL.equal} such that @{thm equal [no_vars]}.  The preprocessing
   223   framework does the rest by propagating the @{class equal} constraints
   224   through all dependent code equations.  For datatypes, instances of
   225   @{class equal} are implicitly derived when possible.  For other types,
   226   you may instantiate @{text equal} manually like any other type class.
   227 *}
   228 
   229 
   230 subsection {* Explicit partiality \label{sec:partiality} *}
   231 
   232 text {*
   233   Partiality usually enters the game by partial patterns, as
   234   in the following example, again for amortised queues:
   235 *}
   236 
   237 definition %quote strict_dequeue :: "'a queue \<Rightarrow> 'a \<times> 'a queue" where
   238   "strict_dequeue q = (case dequeue q
   239     of (Some x, q') \<Rightarrow> (x, q'))"
   240 
   241 lemma %quote strict_dequeue_AQueue [code]:
   242   "strict_dequeue (AQueue xs (y # ys)) = (y, AQueue xs ys)"
   243   "strict_dequeue (AQueue xs []) =
   244     (case rev xs of y # ys \<Rightarrow> (y, AQueue [] ys))"
   245   by (simp_all add: strict_dequeue_def) (cases xs, simp_all split: list.split)
   246 
   247 text {*
   248   \noindent In the corresponding code, there is no equation
   249   for the pattern @{term "AQueue [] []"}:
   250 *}
   251 
   252 text %quotetypewriter {*
   253   @{code_stmts strict_dequeue (consts) strict_dequeue (Haskell)}
   254 *}
   255 
   256 text {*
   257   \noindent In some cases it is desirable to have this
   258   pseudo-\qt{partiality} more explicitly, e.g.~as follows:
   259 *}
   260 
   261 axiomatization %quote empty_queue :: 'a
   262 
   263 definition %quote strict_dequeue' :: "'a queue \<Rightarrow> 'a \<times> 'a queue" where
   264   "strict_dequeue' q = (case dequeue q of (Some x, q') \<Rightarrow> (x, q') | _ \<Rightarrow> empty_queue)"
   265 
   266 lemma %quote strict_dequeue'_AQueue [code]:
   267   "strict_dequeue' (AQueue xs []) = (if xs = [] then empty_queue
   268      else strict_dequeue' (AQueue [] (rev xs)))"
   269   "strict_dequeue' (AQueue xs (y # ys)) =
   270      (y, AQueue xs ys)"
   271   by (simp_all add: strict_dequeue'_def split: list.splits)
   272 
   273 text {*
   274   Observe that on the right hand side of the definition of @{const
   275   "strict_dequeue'"}, the unspecified constant @{const empty_queue} occurs.
   276 
   277   Normally, if constants without any code equations occur in a
   278   program, the code generator complains (since in most cases this is
   279   indeed an error).  But such constants can also be thought
   280   of as function definitions which always fail,
   281   since there is never a successful pattern match on the left hand
   282   side.  In order to categorise a constant into that category
   283   explicitly, use @{command_def "code_abort"}:
   284 *}
   285 
   286 code_abort %quote empty_queue
   287 
   288 text {*
   289   \noindent Then the code generator will just insert an error or
   290   exception at the appropriate position:
   291 *}
   292 
   293 text %quotetypewriter {*
   294   @{code_stmts strict_dequeue' (consts) empty_queue strict_dequeue' (Haskell)}
   295 *}
   296 
   297 text {*
   298   \noindent This feature however is rarely needed in practice.  Note
   299   also that the HOL default setup already declares @{const undefined}
   300   as @{command "code_abort"}, which is most likely to be used in such
   301   situations.
   302 *}
   303 
   304 
   305 subsection {* If something goes utterly wrong \label{sec:utterly_wrong} *}
   306 
   307 text {*
   308   Under certain circumstances, the code generator fails to produce
   309   code entirely.  To debug these, the following hints may prove
   310   helpful:
   311 
   312   \begin{description}
   313 
   314     \ditem{\emph{Check with a different target language}.}  Sometimes
   315       the situation gets more clear if you switch to another target
   316       language; the code generated there might give some hints what
   317       prevents the code generator to produce code for the desired
   318       language.
   319 
   320     \ditem{\emph{Inspect code equations}.}  Code equations are the central
   321       carrier of code generation.  Most problems occurring while generating
   322       code can be traced to single equations which are printed as part of
   323       the error message.  A closer inspection of those may offer the key
   324       for solving issues (cf.~\secref{sec:equations}).
   325 
   326     \ditem{\emph{Inspect preprocessor setup}.}  The preprocessor might
   327       transform code equations unexpectedly; to understand an
   328       inspection of its setup is necessary (cf.~\secref{sec:preproc}).
   329 
   330     \ditem{\emph{Generate exceptions}.}  If the code generator
   331       complains about missing code equations, in can be helpful to
   332       implement the offending constants as exceptions
   333       (cf.~\secref{sec:partiality}); this allows at least for a formal
   334       generation of code, whose inspection may then give clues what is
   335       wrong.
   336 
   337     \ditem{\emph{Remove offending code equations}.}  If code
   338       generation is prevented by just a single equation, this can be
   339       removed (cf.~\secref{sec:equations}) to allow formal code
   340       generation, whose result in turn can be used to trace the
   341       problem.  The most prominent case here are mismatches in type
   342       class signatures (\qt{wellsortedness error}).
   343 
   344   \end{description}
   345 *}
   346 
   347 end