src/Doc/Codegen/Foundations.thy
 author haftmann Fri Feb 15 08:31:31 2013 +0100 (2013-02-15) changeset 51143 0a2371e7ced3 parent 48985 5386df44a037 child 51171 e8b2d90da499 permissions -rw-r--r--
two target language numeral types: integer and natural, as replacement for code_numeral;
former theory HOL/Library/Code_Numeral_Types replaces HOL/Code_Numeral;
refined stack of theories implementing int and/or nat by target language numerals;
reduced number of target language numeral types to exactly one
     1 theory Foundations

     2 imports Introduction

     3 begin

     4

     5 section {* Code generation foundations \label{sec:foundations} *}

     6

     7 subsection {* Code generator architecture \label{sec:architecture} *}

     8

     9 text {*

    10   The code generator is actually a framework consisting of different

    11   components which can be customised individually.

    12

    13   Conceptually all components operate on Isabelle's logic framework

    14   @{theory Pure}.  Practically, the object logic @{theory HOL}

    15   provides the necessary facilities to make use of the code generator,

    16   mainly since it is an extension of @{theory Pure}.

    17

    18   The constellation of the different components is visualized in the

    19   following picture.

    20

    21   \begin{figure}[h]

    22     \includegraphics{architecture}

    23     \caption{Code generator architecture}

    24     \label{fig:arch}

    25   \end{figure}

    26

    27   \noindent Central to code generation is the notion of \emph{code

    28   equations}.  A code equation as a first approximation is a theorem

    29   of the form @{text "f t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n \<equiv> t"} (an equation headed by a

    30   constant @{text f} with arguments @{text "t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n"} and right

    31   hand side @{text t}).

    32

    33   \begin{itemize}

    34

    35     \item Starting point of code generation is a collection of (raw)

    36       code equations in a theory. It is not relevant where they stem

    37       from, but typically they were either produced by specification

    38       tools or proved explicitly by the user.

    39

    40     \item These raw code equations can be subjected to theorem

    41       transformations.  This \qn{preprocessor} (see

    42       \secref{sec:preproc}) can apply the full expressiveness of

    43       ML-based theorem transformations to code generation.  The result

    44       of preprocessing is a structured collection of code equations.

    45

    46     \item These code equations are \qn{translated} to a program in an

    47       abstract intermediate language.  Think of it as a kind of

    48       \qt{Mini-Haskell} with four \qn{statements}: @{text data} (for

    49       datatypes), @{text fun} (stemming from code equations), also

    50       @{text class} and @{text inst} (for type classes).

    51

    52     \item Finally, the abstract program is \qn{serialised} into

    53       concrete source code of a target language.  This step only

    54       produces concrete syntax but does not change the program in

    55       essence; all conceptual transformations occur in the translation

    56       step.

    57

    58   \end{itemize}

    59

    60   \noindent From these steps, only the last two are carried out

    61   outside the logic; by keeping this layer as thin as possible, the

    62   amount of code to trust is kept to a minimum.

    63 *}

    64

    65

    66 subsection {* The preprocessor \label{sec:preproc} *}

    67

    68 text {*

    69   Before selected function theorems are turned into abstract code, a

    70   chain of definitional transformation steps is carried out:

    71   \emph{preprocessing}.  The preprocessor consists of two

    72   components: a \emph{simpset} and \emph{function transformers}.

    73

    74   The \emph{simpset} can apply the full generality of the Isabelle

    75   simplifier.  Due to the interpretation of theorems as code

    76   equations, rewrites are applied to the right hand side and the

    77   arguments of the left hand side of an equation, but never to the

    78   constant heading the left hand side.  An important special case are

    79   \emph{unfold theorems}, which may be declared and removed using the

    80   @{attribute code_unfold} or \emph{@{attribute code_unfold} del}

    81   attribute, respectively.

    82

    83   Some common applications:

    84 *}

    85

    86 text_raw {*

    87   \begin{itemize}

    88 *}

    89

    90 text {*

    91      \item replacing non-executable constructs by executable ones:

    92 *}

    93

    94 lemma %quote [code_unfold]:

    95   "x \<in> set xs \<longleftrightarrow> List.member xs x" by (fact in_set_member)

    96

    97 text {*

    98      \item replacing executable but inconvenient constructs:

    99 *}

   100

   101 lemma %quote [code_unfold]:

   102   "xs = [] \<longleftrightarrow> List.null xs" by (fact eq_Nil_null)

   103

   104 text {*

   105      \item eliminating disturbing expressions:

   106 *}

   107

   108 lemma %quote [code_unfold]:

   109   "1 = Suc 0" by (fact One_nat_def)

   110

   111 text_raw {*

   112   \end{itemize}

   113 *}

   114

   115 text {*

   116   \noindent \emph{Function transformers} provide a very general

   117   interface, transforming a list of function theorems to another list

   118   of function theorems, provided that neither the heading constant nor

   119   its type change.  The @{term "0\<Colon>nat"} / @{const Suc} pattern

   120   elimination implemented in theory @{text Code_Binary_Nat} (see

   121   \secref{eff_nat}) uses this interface.

   122

   123   \noindent The current setup of the preprocessor may be inspected

   124   using the @{command_def print_codeproc} command.  @{command_def

   125   code_thms} (see \secref{sec:equations}) provides a convenient

   126   mechanism to inspect the impact of a preprocessor setup on code

   127   equations.

   128 *}

   129

   130

   131 subsection {* Understanding code equations \label{sec:equations} *}

   132

   133 text {*

   134   As told in \secref{sec:principle}, the notion of code equations is

   135   vital to code generation.  Indeed most problems which occur in

   136   practice can be resolved by an inspection of the underlying code

   137   equations.

   138

   139   It is possible to exchange the default code equations for constants

   140   by explicitly proving alternative ones:

   141 *}

   142

   143 lemma %quote [code]:

   144   "dequeue (AQueue xs []) =

   145      (if xs = [] then (None, AQueue [] [])

   146        else dequeue (AQueue [] (rev xs)))"

   147   "dequeue (AQueue xs (y # ys)) =

   148      (Some y, AQueue xs ys)"

   149   by (cases xs, simp_all) (cases "rev xs", simp_all)

   150

   151 text {*

   152   \noindent The annotation @{text "[code]"} is an @{text attribute}

   153   which states that the given theorems should be considered as code

   154   equations for a @{text fun} statement -- the corresponding constant

   155   is determined syntactically.  The resulting code:

   156 *}

   157

   158 text %quotetypewriter {*

   159   @{code_stmts dequeue (consts) dequeue (Haskell)}

   160 *}

   161

   162 text {*

   163   \noindent You may note that the equality test @{term "xs = []"} has

   164   been replaced by the predicate @{term "List.null xs"}.  This is due

   165   to the default setup of the \qn{preprocessor}.

   166

   167   This possibility to select arbitrary code equations is the key

   168   technique for program and datatype refinement (see

   169   \secref{sec:refinement}).

   170

   171   Due to the preprocessor, there is the distinction of raw code

   172   equations (before preprocessing) and code equations (after

   173   preprocessing).

   174

   175   The first can be listed (among other data) using the @{command_def

   176   print_codesetup} command.

   177

   178   The code equations after preprocessing are already are blueprint of

   179   the generated program and can be inspected using the @{command

   180   code_thms} command:

   181 *}

   182

   183 code_thms %quote dequeue

   184

   185 text {*

   186   \noindent This prints a table with the code equations for @{const

   187   dequeue}, including \emph{all} code equations those equations depend

   188   on recursively.  These dependencies themselves can be visualized using

   189   the @{command_def code_deps} command.

   190 *}

   191

   192

   193 subsection {* Equality *}

   194

   195 text {*

   196   Implementation of equality deserves some attention.  Here an example

   197   function involving polymorphic equality:

   198 *}

   199

   200 primrec %quote collect_duplicates :: "'a list \<Rightarrow> 'a list \<Rightarrow> 'a list \<Rightarrow> 'a list" where

   201   "collect_duplicates xs ys [] = xs"

   202 | "collect_duplicates xs ys (z#zs) = (if z \<in> set xs

   203     then if z \<in> set ys

   204       then collect_duplicates xs ys zs

   205       else collect_duplicates xs (z#ys) zs

   206     else collect_duplicates (z#xs) (z#ys) zs)"

   207

   208 text {*

   209   \noindent During preprocessing, the membership test is rewritten,

   210   resulting in @{const List.member}, which itself performs an explicit

   211   equality check, as can be seen in the corresponding @{text SML} code:

   212 *}

   213

   214 text %quotetypewriter {*

   215   @{code_stmts collect_duplicates (SML)}

   216 *}

   217

   218 text {*

   219   \noindent Obviously, polymorphic equality is implemented the Haskell

   220   way using a type class.  How is this achieved?  HOL introduces an

   221   explicit class @{class equal} with a corresponding operation @{const

   222   HOL.equal} such that @{thm equal [no_vars]}.  The preprocessing

   223   framework does the rest by propagating the @{class equal} constraints

   224   through all dependent code equations.  For datatypes, instances of

   225   @{class equal} are implicitly derived when possible.  For other types,

   226   you may instantiate @{text equal} manually like any other type class.

   227 *}

   228

   229

   230 subsection {* Explicit partiality \label{sec:partiality} *}

   231

   232 text {*

   233   Partiality usually enters the game by partial patterns, as

   234   in the following example, again for amortised queues:

   235 *}

   236

   237 definition %quote strict_dequeue :: "'a queue \<Rightarrow> 'a \<times> 'a queue" where

   238   "strict_dequeue q = (case dequeue q

   239     of (Some x, q') \<Rightarrow> (x, q'))"

   240

   241 lemma %quote strict_dequeue_AQueue [code]:

   242   "strict_dequeue (AQueue xs (y # ys)) = (y, AQueue xs ys)"

   243   "strict_dequeue (AQueue xs []) =

   244     (case rev xs of y # ys \<Rightarrow> (y, AQueue [] ys))"

   245   by (simp_all add: strict_dequeue_def) (cases xs, simp_all split: list.split)

   246

   247 text {*

   248   \noindent In the corresponding code, there is no equation

   249   for the pattern @{term "AQueue [] []"}:

   250 *}

   251

   252 text %quotetypewriter {*

   253   @{code_stmts strict_dequeue (consts) strict_dequeue (Haskell)}

   254 *}

   255

   256 text {*

   257   \noindent In some cases it is desirable to have this

   258   pseudo-\qt{partiality} more explicitly, e.g.~as follows:

   259 *}

   260

   261 axiomatization %quote empty_queue :: 'a

   262

   263 definition %quote strict_dequeue' :: "'a queue \<Rightarrow> 'a \<times> 'a queue" where

   264   "strict_dequeue' q = (case dequeue q of (Some x, q') \<Rightarrow> (x, q') | _ \<Rightarrow> empty_queue)"

   265

   266 lemma %quote strict_dequeue'_AQueue [code]:

   267   "strict_dequeue' (AQueue xs []) = (if xs = [] then empty_queue

   268      else strict_dequeue' (AQueue [] (rev xs)))"

   269   "strict_dequeue' (AQueue xs (y # ys)) =

   270      (y, AQueue xs ys)"

   271   by (simp_all add: strict_dequeue'_def split: list.splits)

   272

   273 text {*

   274   Observe that on the right hand side of the definition of @{const

   275   "strict_dequeue'"}, the unspecified constant @{const empty_queue} occurs.

   276

   277   Normally, if constants without any code equations occur in a

   278   program, the code generator complains (since in most cases this is

   279   indeed an error).  But such constants can also be thought

   280   of as function definitions which always fail,

   281   since there is never a successful pattern match on the left hand

   282   side.  In order to categorise a constant into that category

   283   explicitly, use @{command_def "code_abort"}:

   284 *}

   285

   286 code_abort %quote empty_queue

   287

   288 text {*

   289   \noindent Then the code generator will just insert an error or

   290   exception at the appropriate position:

   291 *}

   292

   293 text %quotetypewriter {*

   294   @{code_stmts strict_dequeue' (consts) empty_queue strict_dequeue' (Haskell)}

   295 *}

   296

   297 text {*

   298   \noindent This feature however is rarely needed in practice.  Note

   299   also that the HOL default setup already declares @{const undefined}

   300   as @{command "code_abort"}, which is most likely to be used in such

   301   situations.

   302 *}

   303

   304

   305 subsection {* If something goes utterly wrong \label{sec:utterly_wrong} *}

   306

   307 text {*

   308   Under certain circumstances, the code generator fails to produce

   309   code entirely.  To debug these, the following hints may prove

   310   helpful:

   311

   312   \begin{description}

   313

   314     \ditem{\emph{Check with a different target language}.}  Sometimes

   315       the situation gets more clear if you switch to another target

   316       language; the code generated there might give some hints what

   317       prevents the code generator to produce code for the desired

   318       language.

   319

   320     \ditem{\emph{Inspect code equations}.}  Code equations are the central

   321       carrier of code generation.  Most problems occurring while generating

   322       code can be traced to single equations which are printed as part of

   323       the error message.  A closer inspection of those may offer the key

   324       for solving issues (cf.~\secref{sec:equations}).

   325

   326     \ditem{\emph{Inspect preprocessor setup}.}  The preprocessor might

   327       transform code equations unexpectedly; to understand an

   328       inspection of its setup is necessary (cf.~\secref{sec:preproc}).

   329

   330     \ditem{\emph{Generate exceptions}.}  If the code generator

   331       complains about missing code equations, in can be helpful to

   332       implement the offending constants as exceptions

   333       (cf.~\secref{sec:partiality}); this allows at least for a formal

   334       generation of code, whose inspection may then give clues what is

   335       wrong.

   336

   337     \ditem{\emph{Remove offending code equations}.}  If code

   338       generation is prevented by just a single equation, this can be

   339       removed (cf.~\secref{sec:equations}) to allow formal code

   340       generation, whose result in turn can be used to trace the

   341       problem.  The most prominent case here are mismatches in type

   342       class signatures (\qt{wellsortedness error}).

   343

   344   \end{description}

   345 *}

   346

   347 end