src/Doc/Codegen/Foundations.thy
 author haftmann Mon Feb 06 20:56:34 2017 +0100 (2017-02-06) changeset 64990 c6a7de505796 parent 61781 e1e6bb36b27a child 68254 3a7f257dcac7 permissions -rw-r--r--
more explicit errors in pathological cases
     1 theory Foundations

     2 imports Introduction

     3 begin

     4

     5 section \<open>Code generation foundations \label{sec:foundations}\<close>

     6

     7 subsection \<open>Code generator architecture \label{sec:architecture}\<close>

     8

     9 text \<open>

    10   The code generator is actually a framework consisting of different

    11   components which can be customised individually.

    12

    13   Conceptually all components operate on Isabelle's logic framework

    14   @{theory Pure}.  Practically, the object logic @{theory HOL}

    15   provides the necessary facilities to make use of the code generator,

    16   mainly since it is an extension of @{theory Pure}.

    17

    18   The constellation of the different components is visualized in the

    19   following picture.

    20

    21   \begin{figure}[h]

    22     \def\sys#1{\emph{#1}}

    23     \begin{tikzpicture}[x = 4cm, y = 1cm]

    24       \tikzstyle positive=[color = black, fill = white];

    25       \tikzstyle negative=[color = white, fill = black];

    26       \tikzstyle entity=[rounded corners, draw, thick];

    27       \tikzstyle process=[ellipse, draw, thick];

    28       \tikzstyle arrow=[-stealth, semithick];

    29       \node (spec) at (0, 3) [entity, positive] {specification tools};

    30       \node (user) at (1, 3) [entity, positive] {user proofs};

    31       \node (spec_user_join) at (0.5, 3) [shape=coordinate] {};

    32       \node (raw) at (0.5, 4) [entity, positive] {raw code equations};

    33       \node (pre) at (1.5, 4) [process, positive] {preprocessing};

    34       \node (eqn) at (2.5, 4) [entity, positive] {code equations};

    35       \node (iml) at (0.5, 0) [entity, positive] {intermediate program};

    36       \node (seri) at (1.5, 0) [process, positive] {serialisation};

    37       \node (SML) at (2.5, 3) [entity, positive] {\sys{SML}};

    38       \node (OCaml) at (2.5, 2) [entity, positive] {\sys{OCaml}};

    39       \node (Haskell) at (2.5, 1) [entity, positive] {\sys{Haskell}};

    40       \node (Scala) at (2.5, 0) [entity, positive] {\sys{Scala}};

    41       \draw [semithick] (spec) -- (spec_user_join);

    42       \draw [semithick] (user) -- (spec_user_join);

    43       \draw [-diamond, semithick] (spec_user_join) -- (raw);

    44       \draw [arrow] (raw) -- (pre);

    45       \draw [arrow] (pre) -- (eqn);

    46       \draw [arrow] (eqn) -- node (transl) [process, positive] {translation} (iml);

    47       \draw [arrow] (iml) -- (seri);

    48       \draw [arrow] (seri) -- (SML);

    49       \draw [arrow] (seri) -- (OCaml);

    50       \draw [arrow] (seri) -- (Haskell);

    51       \draw [arrow] (seri) -- (Scala);

    52     \end{tikzpicture}

    53     \caption{Code generator architecture}

    54     \label{fig:arch}

    55   \end{figure}

    56

    57   \noindent Central to code generation is the notion of \emph{code

    58   equations}.  A code equation as a first approximation is a theorem

    59   of the form @{text "f t\<^sub>1 t\<^sub>2 \<dots> t\<^sub>n \<equiv> t"} (an equation headed by a

    60   constant @{text f} with arguments @{text "t\<^sub>1 t\<^sub>2 \<dots> t\<^sub>n"} and right

    61   hand side @{text t}).

    62

    63   \begin{itemize}

    64

    65     \item Starting point of code generation is a collection of (raw)

    66       code equations in a theory. It is not relevant where they stem

    67       from, but typically they were either produced by specification

    68       tools or proved explicitly by the user.

    69

    70     \item These raw code equations can be subjected to theorem

    71       transformations.  This \qn{preprocessor} (see

    72       \secref{sec:preproc}) can apply the full expressiveness of

    73       ML-based theorem transformations to code generation.  The result

    74       of preprocessing is a structured collection of code equations.

    75

    76     \item These code equations are \qn{translated} to a program in an

    77       abstract intermediate language.  Think of it as a kind of

    78       \qt{Mini-Haskell} with four \qn{statements}: @{text data} (for

    79       datatypes), @{text fun} (stemming from code equations), also

    80       @{text class} and @{text inst} (for type classes).

    81

    82     \item Finally, the abstract program is \qn{serialised} into

    83       concrete source code of a target language.  This step only

    84       produces concrete syntax but does not change the program in

    85       essence; all conceptual transformations occur in the translation

    86       step.

    87

    88   \end{itemize}

    89

    90   \noindent From these steps, only the last two are carried out

    91   outside the logic; by keeping this layer as thin as possible, the

    92   amount of code to trust is kept to a minimum.

    93 \<close>

    94

    95

    96 subsection \<open>The pre- and postprocessor \label{sec:preproc}\<close>

    97

    98 text \<open>

    99   Before selected function theorems are turned into abstract code, a

   100   chain of definitional transformation steps is carried out:

   101   \emph{preprocessing}.  The preprocessor consists of two

   102   components: a \emph{simpset} and \emph{function transformers}.

   103

   104   The preprocessor simpset has a disparate brother, the

   105   \emph{postprocessor simpset}.

   106   In the theory-to-code scenario depicted in the picture

   107   above, it plays no role.  But if generated code is used

   108   to evaluate expressions (cf.~\secref{sec:evaluation}), the

   109   postprocessor simpset is applied to the resulting expression before this

   110   is turned back.

   111

   112   The pre- and postprocessor \emph{simpsets} can apply the

   113   full generality of the Isabelle

   114   simplifier.  Due to the interpretation of theorems as code

   115   equations, rewrites are applied to the right hand side and the

   116   arguments of the left hand side of an equation, but never to the

   117   constant heading the left hand side.

   118

   119   Pre- and postprocessor can be setup to transfer between

   120   expressions suitable for logical reasoning and expressions

   121   suitable for execution.  As example, take list membership; logically

   122   it is expressed as @{term "x \<in> set xs"}.  But for execution

   123   the intermediate set is not desirable.  Hence the following

   124   specification:

   125 \<close>

   126

   127 definition %quote member :: "'a list \<Rightarrow> 'a \<Rightarrow> bool"

   128 where

   129   [code_abbrev]: "member xs x \<longleftrightarrow> x \<in> set xs"

   130

   131 text \<open>

   132   \noindent The \emph{@{attribute code_abbrev} attribute} declares

   133   its theorem a rewrite rule for the postprocessor and the symmetric

   134   of its theorem as rewrite rule for the preprocessor.  Together,

   135   this has the effect that expressions @{thm (rhs) member_def [no_vars]}

   136   are replaced by @{thm (lhs) member_def [no_vars]} in generated code, but

   137   are turned back into @{thm (rhs) member_def [no_vars]} if generated

   138   code is used for evaluation.

   139

   140   Rewrite rules for pre- or postprocessor may be

   141   declared independently using \emph{@{attribute code_unfold}}

   142   or \emph{@{attribute code_post}} respectively.

   143

   144   \emph{Function transformers} provide a very general

   145   interface, transforming a list of function theorems to another list

   146   of function theorems, provided that neither the heading constant nor

   147   its type change.  The @{term "0::nat"} / @{const Suc} pattern

   148   used in theory @{text Code_Abstract_Nat} (see \secref{abstract_nat})

   149   uses this interface.

   150

   151   \noindent The current setup of the pre- and postprocessor may be inspected

   152   using the @{command_def print_codeproc} command.  @{command_def

   153   code_thms} (see \secref{sec:equations}) provides a convenient

   154   mechanism to inspect the impact of a preprocessor setup on code

   155   equations.

   156 \<close>

   157

   158

   159 subsection \<open>Understanding code equations \label{sec:equations}\<close>

   160

   161 text \<open>

   162   As told in \secref{sec:principle}, the notion of code equations is

   163   vital to code generation.  Indeed most problems which occur in

   164   practice can be resolved by an inspection of the underlying code

   165   equations.

   166

   167   It is possible to exchange the default code equations for constants

   168   by explicitly proving alternative ones:

   169 \<close>

   170

   171 lemma %quote [code]:

   172   "dequeue (AQueue xs []) =

   173      (if xs = [] then (None, AQueue [] [])

   174        else dequeue (AQueue [] (rev xs)))"

   175   "dequeue (AQueue xs (y # ys)) =

   176      (Some y, AQueue xs ys)"

   177   by (cases xs, simp_all) (cases "rev xs", simp_all)

   178

   179 text \<open>

   180   \noindent The annotation @{text "[code]"} is an @{text attribute}

   181   which states that the given theorems should be considered as code

   182   equations for a @{text fun} statement -- the corresponding constant

   183   is determined syntactically.  The resulting code:

   184 \<close>

   185

   186 text %quotetypewriter \<open>

   187   @{code_stmts dequeue (consts) dequeue (Haskell)}

   188 \<close>

   189

   190 text \<open>

   191   \noindent You may note that the equality test @{term "xs = []"} has

   192   been replaced by the predicate @{term "List.null xs"}.  This is due

   193   to the default setup of the \qn{preprocessor}.

   194

   195   This possibility to select arbitrary code equations is the key

   196   technique for program and datatype refinement (see

   197   \secref{sec:refinement}).

   198

   199   Due to the preprocessor, there is the distinction of raw code

   200   equations (before preprocessing) and code equations (after

   201   preprocessing).

   202

   203   The first can be listed (among other data) using the @{command_def

   204   print_codesetup} command.

   205

   206   The code equations after preprocessing are already are blueprint of

   207   the generated program and can be inspected using the @{command

   208   code_thms} command:

   209 \<close>

   210

   211 code_thms %quote dequeue

   212

   213 text \<open>

   214   \noindent This prints a table with the code equations for @{const

   215   dequeue}, including \emph{all} code equations those equations depend

   216   on recursively.  These dependencies themselves can be visualized using

   217   the @{command_def code_deps} command.

   218 \<close>

   219

   220

   221 subsection \<open>Equality\<close>

   222

   223 text \<open>

   224   Implementation of equality deserves some attention.  Here an example

   225   function involving polymorphic equality:

   226 \<close>

   227

   228 primrec %quote collect_duplicates :: "'a list \<Rightarrow> 'a list \<Rightarrow> 'a list \<Rightarrow> 'a list" where

   229   "collect_duplicates xs ys [] = xs"

   230 | "collect_duplicates xs ys (z#zs) = (if z \<in> set xs

   231     then if z \<in> set ys

   232       then collect_duplicates xs ys zs

   233       else collect_duplicates xs (z#ys) zs

   234     else collect_duplicates (z#xs) (z#ys) zs)"

   235

   236 text \<open>

   237   \noindent During preprocessing, the membership test is rewritten,

   238   resulting in @{const List.member}, which itself performs an explicit

   239   equality check, as can be seen in the corresponding @{text SML} code:

   240 \<close>

   241

   242 text %quotetypewriter \<open>

   243   @{code_stmts collect_duplicates (SML)}

   244 \<close>

   245

   246 text \<open>

   247   \noindent Obviously, polymorphic equality is implemented the Haskell

   248   way using a type class.  How is this achieved?  HOL introduces an

   249   explicit class @{class equal} with a corresponding operation @{const

   250   HOL.equal} such that @{thm equal [no_vars]}.  The preprocessing

   251   framework does the rest by propagating the @{class equal} constraints

   252   through all dependent code equations.  For datatypes, instances of

   253   @{class equal} are implicitly derived when possible.  For other types,

   254   you may instantiate @{text equal} manually like any other type class.

   255 \<close>

   256

   257

   258 subsection \<open>Explicit partiality \label{sec:partiality}\<close>

   259

   260 text \<open>

   261   Partiality usually enters the game by partial patterns, as

   262   in the following example, again for amortised queues:

   263 \<close>

   264

   265 definition %quote strict_dequeue :: "'a queue \<Rightarrow> 'a \<times> 'a queue" where

   266   "strict_dequeue q = (case dequeue q

   267     of (Some x, q') \<Rightarrow> (x, q'))"

   268

   269 lemma %quote strict_dequeue_AQueue [code]:

   270   "strict_dequeue (AQueue xs (y # ys)) = (y, AQueue xs ys)"

   271   "strict_dequeue (AQueue xs []) =

   272     (case rev xs of y # ys \<Rightarrow> (y, AQueue [] ys))"

   273   by (simp_all add: strict_dequeue_def) (cases xs, simp_all split: list.split)

   274

   275 text \<open>

   276   \noindent In the corresponding code, there is no equation

   277   for the pattern @{term "AQueue [] []"}:

   278 \<close>

   279

   280 text %quotetypewriter \<open>

   281   @{code_stmts strict_dequeue (consts) strict_dequeue (Haskell)}

   282 \<close>

   283

   284 text \<open>

   285   \noindent In some cases it is desirable to have this

   286   pseudo-\qt{partiality} more explicitly, e.g.~as follows:

   287 \<close>

   288

   289 axiomatization %quote empty_queue :: 'a

   290

   291 definition %quote strict_dequeue' :: "'a queue \<Rightarrow> 'a \<times> 'a queue" where

   292   "strict_dequeue' q = (case dequeue q of (Some x, q') \<Rightarrow> (x, q') | _ \<Rightarrow> empty_queue)"

   293

   294 lemma %quote strict_dequeue'_AQueue [code]:

   295   "strict_dequeue' (AQueue xs []) = (if xs = [] then empty_queue

   296      else strict_dequeue' (AQueue [] (rev xs)))"

   297   "strict_dequeue' (AQueue xs (y # ys)) =

   298      (y, AQueue xs ys)"

   299   by (simp_all add: strict_dequeue'_def split: list.splits)

   300

   301 text \<open>

   302   Observe that on the right hand side of the definition of @{const

   303   "strict_dequeue'"}, the unspecified constant @{const empty_queue} occurs.

   304

   305   Normally, if constants without any code equations occur in a

   306   program, the code generator complains (since in most cases this is

   307   indeed an error).  But such constants can also be thought

   308   of as function definitions which always fail,

   309   since there is never a successful pattern match on the left hand

   310   side.  In order to categorise a constant into that category

   311   explicitly, use the @{attribute code} attribute with

   312   @{text abort}:

   313 \<close>

   314

   315 declare %quote [[code abort: empty_queue]]

   316

   317 text \<open>

   318   \noindent Then the code generator will just insert an error or

   319   exception at the appropriate position:

   320 \<close>

   321

   322 text %quotetypewriter \<open>

   323   @{code_stmts strict_dequeue' (consts) empty_queue strict_dequeue' (Haskell)}

   324 \<close>

   325

   326 text \<open>

   327   \noindent This feature however is rarely needed in practice.  Note

   328   also that the HOL default setup already declares @{const undefined},

   329   which is most likely to be used in such situations, as

   330   @{text "code abort"}.

   331 \<close>

   332

   333

   334 subsection \<open>If something goes utterly wrong \label{sec:utterly_wrong}\<close>

   335

   336 text \<open>

   337   Under certain circumstances, the code generator fails to produce

   338   code entirely.  To debug these, the following hints may prove

   339   helpful:

   340

   341   \begin{description}

   342

   343     \ditem{\emph{Check with a different target language}.}  Sometimes

   344       the situation gets more clear if you switch to another target

   345       language; the code generated there might give some hints what

   346       prevents the code generator to produce code for the desired

   347       language.

   348

   349     \ditem{\emph{Inspect code equations}.}  Code equations are the central

   350       carrier of code generation.  Most problems occurring while generating

   351       code can be traced to single equations which are printed as part of

   352       the error message.  A closer inspection of those may offer the key

   353       for solving issues (cf.~\secref{sec:equations}).

   354

   355     \ditem{\emph{Inspect preprocessor setup}.}  The preprocessor might

   356       transform code equations unexpectedly; to understand an

   357       inspection of its setup is necessary (cf.~\secref{sec:preproc}).

   358

   359     \ditem{\emph{Generate exceptions}.}  If the code generator

   360       complains about missing code equations, in can be helpful to

   361       implement the offending constants as exceptions

   362       (cf.~\secref{sec:partiality}); this allows at least for a formal

   363       generation of code, whose inspection may then give clues what is

   364       wrong.

   365

   366     \ditem{\emph{Remove offending code equations}.}  If code

   367       generation is prevented by just a single equation, this can be

   368       removed (cf.~\secref{sec:equations}) to allow formal code

   369       generation, whose result in turn can be used to trace the

   370       problem.  The most prominent case here are mismatches in type

   371       class signatures (\qt{wellsortedness error}).

   372

   373   \end{description}

   374 \<close>

   375

   376 end