doc-src/Codegen/Thy/Introduction.thy
changeset 38437 ffb1c5bf0425
parent 38405 7935b334893e
child 38505 2f8699695cf6
equal deleted inserted replaced
38410:8e4058f4848c 38437:ffb1c5bf0425
     2 imports Setup
     2 imports Setup
     3 begin
     3 begin
     4 
     4 
     5 section {* Introduction *}
     5 section {* Introduction *}
     6 
     6 
     7 subsection {* Code generation fundamental: shallow embedding *}
     7 text {*
     8 
     8   This tutorial introduces the code generator facilities of @{text
     9 subsection {* A quick start with the @{text "Isabelle/HOL"} toolbox *}
     9   "Isabelle/HOL"}.  It allows to turn (a certain class of) HOL
    10 
    10   specifications into corresponding executable code in the programming
    11 subsection {* Type classes *}
    11   languages @{text SML} \cite{SML}, @{text OCaml} \cite{OCaml} and
    12 
    12   @{text Haskell} \cite{haskell-revised-report}.
    13 subsection {* How to continue from here *}
    13 
    14 
    14   To profit from this tutorial, some familiarity and experience with
    15 subsection {* If something goes utterly wrong *}
    15   @{theory HOL} \cite{isa-tutorial} and its basic theories is assumed.
    16 
    16 *}
    17 text {*
    17 
    18   This tutorial introduces a generic code generator for the
    18 
    19   @{text Isabelle} system.
    19 subsection {* Code generation principle: shallow embedding \label{sec:principle} *}
    20   The
    20 
    21   \qn{target language} for which code is
    21 text {*
    22   generated is not fixed, but may be one of several
    22   The key concept for understanding Isabelle's code generation is
    23   functional programming languages (currently, the implementation
    23   \emph{shallow embedding}: logical entities like constants, types and
    24   supports @{text SML} \cite{SML}, @{text OCaml} \cite{OCaml} and @{text Haskell}
    24   classes are identified with corresponding entities in the target
    25   \cite{haskell-revised-report}).
    25   language.  In particular, the carrier of a generated program's
    26 
    26   semantics are \emph{equational theorems} from the logic.  If we view
    27   Conceptually the code generator framework is part
    27   a generated program as an implementation of a higher-order rewrite
    28   of Isabelle's @{theory Pure} meta logic framework; the logic
    28   system, then every rewrite step performed by the program can be
    29   @{theory HOL} \cite{isa-tutorial}, which is an extension of @{theory Pure},
    29   simulated in the logic, which guarantees partial correctness
    30   already comes with a reasonable framework setup and thus provides
    30   \cite{Haftmann-Nipkow:2010:code}.
    31   a good basis for creating code-generation-driven
    31 *}
    32   applications.  So, we assume some familiarity and experience
    32 
    33   with the ingredients of the @{theory HOL} distribution theories.
    33 
    34 
    34 subsection {* A quick start with the Isabelle/HOL toolbox \label{sec:queue_example} *}
    35   The code generator aims to be usable with no further ado
    35 
    36   in most cases, while allowing for detailed customisation.
    36 text {*
    37   This can be seen in the structure of this tutorial: after a short
    37   In a HOL theory, the @{command datatype} and @{command
    38   conceptual introduction with an example (\secref{sec:intro}),
    38   definition}/@{command primrec}/@{command fun} declarations form the
    39   we discuss the generic customisation facilities (\secref{sec:program}).
    39   core of a functional programming language.  By default equational
    40   A further section (\secref{sec:adaptation}) is dedicated to the matter of
    40   theorems stemming from those are used for generated code, therefore
    41   \qn{adaptation} to specific target language environments.  After some
    41   \qt{naive} code generation can proceed without further ado.
    42   further issues (\secref{sec:further}) we conclude with an overview
    42 
    43   of some ML programming interfaces (\secref{sec:ml}).
       
    44 
       
    45   \begin{warn}
       
    46     Ultimately, the code generator which this tutorial deals with
       
    47     is supposed to replace the existing code generator
       
    48     by Stefan Berghofer \cite{Berghofer-Nipkow:2002}.
       
    49     So, for the moment, there are two distinct code generators
       
    50     in Isabelle.  In case of ambiguity, we will refer to the framework
       
    51     described here as @{text "generic code generator"}, to the
       
    52     other as @{text "SML code generator"}.
       
    53     Also note that while the framework itself is
       
    54     object-logic independent, only @{theory HOL} provides a reasonable
       
    55     framework setup.    
       
    56   \end{warn}
       
    57 
       
    58 *}
       
    59 
       
    60 subsection {* Code generation via shallow embedding \label{sec:intro} *}
       
    61 
       
    62 text {*
       
    63   The key concept for understanding @{text Isabelle}'s code generation is
       
    64   \emph{shallow embedding}, i.e.~logical entities like constants, types and
       
    65   classes are identified with corresponding concepts in the target language.
       
    66 
       
    67   Inside @{theory HOL}, the @{command datatype} and
       
    68   @{command definition}/@{command primrec}/@{command fun} declarations form
       
    69   the core of a functional programming language.  The default code generator setup
       
    70    transforms those into functional programs immediately.
       
    71   This means that \qt{naive} code generation can proceed without further ado.
       
    72   For example, here a simple \qt{implementation} of amortised queues:
    43   For example, here a simple \qt{implementation} of amortised queues:
    73 *}
    44 *}
    74 
    45 
    75 datatype %quote 'a queue = AQueue "'a list" "'a list"
    46 datatype %quote 'a queue = AQueue "'a list" "'a list"
    76 
    47 
    82 
    53 
    83 fun %quote dequeue :: "'a queue \<Rightarrow> 'a option \<times> 'a queue" where
    54 fun %quote dequeue :: "'a queue \<Rightarrow> 'a option \<times> 'a queue" where
    84     "dequeue (AQueue [] []) = (None, AQueue [] [])"
    55     "dequeue (AQueue [] []) = (None, AQueue [] [])"
    85   | "dequeue (AQueue xs (y # ys)) = (Some y, AQueue xs ys)"
    56   | "dequeue (AQueue xs (y # ys)) = (Some y, AQueue xs ys)"
    86   | "dequeue (AQueue xs []) =
    57   | "dequeue (AQueue xs []) =
    87       (case rev xs of y # ys \<Rightarrow> (Some y, AQueue [] ys))"
    58       (case rev xs of y # ys \<Rightarrow> (Some y, AQueue [] ys))" (*<*)
       
    59 
       
    60 lemma %invisible dequeue_nonempty_Nil [simp]:
       
    61   "xs \<noteq> [] \<Longrightarrow> dequeue (AQueue xs []) = (case rev xs of y # ys \<Rightarrow> (Some y, AQueue [] ys))"
       
    62   by (cases xs) (simp_all split: list.splits) (*>*)
    88 
    63 
    89 text {* \noindent Then we can generate code e.g.~for @{text SML} as follows: *}
    64 text {* \noindent Then we can generate code e.g.~for @{text SML} as follows: *}
    90 
    65 
    91 export_code %quote empty dequeue enqueue in SML
    66 export_code %quote empty dequeue enqueue in SML
    92   module_name Example file "examples/example.ML"
    67   module_name Example file "examples/example.ML"
    94 text {* \noindent resulting in the following code: *}
    69 text {* \noindent resulting in the following code: *}
    95 
    70 
    96 text %quote {*@{code_stmts empty enqueue dequeue (SML)}*}
    71 text %quote {*@{code_stmts empty enqueue dequeue (SML)}*}
    97 
    72 
    98 text {*
    73 text {*
    99   \noindent The @{command export_code} command takes a space-separated list of
    74   \noindent The @{command export_code} command takes a space-separated
   100   constants for which code shall be generated;  anything else needed for those
    75   list of constants for which code shall be generated; anything else
   101   is added implicitly.  Then follows a target language identifier
    76   needed for those is added implicitly.  Then follows a target
   102   (@{text SML}, @{text OCaml} or @{text Haskell}) and a freely chosen module name.
    77   language identifier and a freely chosen module name.  A file name
   103   A file name denotes the destination to store the generated code.  Note that
    78   denotes the destination to store the generated code.  Note that the
   104   the semantics of the destination depends on the target language:  for
    79   semantics of the destination depends on the target language: for
   105   @{text SML} and @{text OCaml} it denotes a \emph{file}, for @{text Haskell}
    80   @{text SML} and @{text OCaml} it denotes a \emph{file}, for @{text
   106   it denotes a \emph{directory} where a file named as the module name
    81   Haskell} it denotes a \emph{directory} where a file named as the
   107   (with extension @{text ".hs"}) is written:
    82   module name (with extension @{text ".hs"}) is written:
   108 *}
    83 *}
   109 
    84 
   110 export_code %quote empty dequeue enqueue in Haskell
    85 export_code %quote empty dequeue enqueue in Haskell
   111   module_name Example file "examples/"
    86   module_name Example file "examples/"
   112 
    87 
   113 text {*
    88 text {*
   114   \noindent This is the corresponding code in @{text Haskell}:
    89   \noindent This is the corresponding code:
   115 *}
    90 *}
   116 
    91 
   117 text %quote {*@{code_stmts empty enqueue dequeue (Haskell)}*}
    92 text %quote {*@{code_stmts empty enqueue dequeue (Haskell)}*}
   118 
    93 
   119 text {*
    94 text {*
   120   \noindent This demonstrates the basic usage of the @{command export_code} command;
    95   \noindent For more details about @{command export_code} see
   121   for more details see \secref{sec:further}.
    96   \secref{sec:further}.
   122 *}
    97 *}
   123 
    98 
   124 subsection {* If something utterly fails *}
    99 
   125 
   100 subsection {* Type classes *}
   126 text {*
   101 
   127   Under certain circumstances, the code generator fails to produce
   102 text {*
   128   code entirely.  
   103   Code can also be generated from type classes in a Haskell-like
   129 
   104   manner.  For illustration here an example from abstract algebra:
   130   \begin{description}
   105 *}
   131 
   106 
   132     \ditem{generate only one module}
   107 class %quote semigroup =
   133 
   108   fixes mult :: "'a \<Rightarrow> 'a \<Rightarrow> 'a" (infixl "\<otimes>" 70)
   134     \ditem{check with a different target language}
   109   assumes assoc: "(x \<otimes> y) \<otimes> z = x \<otimes> (y \<otimes> z)"
   135 
   110 
   136     \ditem{inspect code equations}
   111 class %quote monoid = semigroup +
   137 
   112   fixes neutral :: 'a ("\<one>")
   138     \ditem{inspect preprocessor setup}
   113   assumes neutl: "\<one> \<otimes> x = x"
   139 
   114     and neutr: "x \<otimes> \<one> = x"
   140     \ditem{generate exceptions}
   115 
   141 
   116 instantiation %quote nat :: monoid
   142     \ditem{remove offending code equations}
   117 begin
   143 
   118 
   144   \end{description}
   119 primrec %quote mult_nat where
   145 *}
   120     "0 \<otimes> n = (0\<Colon>nat)"
   146 
   121   | "Suc m \<otimes> n = n + m \<otimes> n"
   147 subsection {* Code generator architecture \label{sec:concept} *}
   122 
   148 
   123 definition %quote neutral_nat where
   149 text {*
   124   "\<one> = Suc 0"
   150   What you have seen so far should be already enough in a lot of cases.  If you
   125 
   151   are content with this, you can quit reading here.  Anyway, in order to customise
   126 lemma %quote add_mult_distrib:
   152   and adapt the code generator, it is necessary to gain some understanding
   127   fixes n m q :: nat
   153   how it works.
   128   shows "(n + m) \<otimes> q = n \<otimes> q + m \<otimes> q"
   154 
   129   by (induct n) simp_all
   155   \begin{figure}[h]
   130 
   156     \includegraphics{architecture}
   131 instance %quote proof
   157     \caption{Code generator architecture}
   132   fix m n q :: nat
   158     \label{fig:arch}
   133   show "m \<otimes> n \<otimes> q = m \<otimes> (n \<otimes> q)"
   159   \end{figure}
   134     by (induct m) (simp_all add: add_mult_distrib)
   160 
   135   show "\<one> \<otimes> n = n"
   161   The code generator employs a notion of executability
   136     by (simp add: neutral_nat_def)
   162   for three foundational executable ingredients known
   137   show "m \<otimes> \<one> = m"
   163   from functional programming:
   138     by (induct m) (simp_all add: neutral_nat_def)
   164   \emph{code equations}, \emph{datatypes}, and
   139 qed
   165   \emph{type classes}.  A code equation as a first approximation
   140 
   166   is a theorem of the form @{text "f t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n \<equiv> t"}
   141 end %quote
   167   (an equation headed by a constant @{text f} with arguments
   142 
   168   @{text "t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n"} and right hand side @{text t}).
   143 text {*
   169   Code generation aims to turn code equations
   144   \noindent We define the natural operation of the natural numbers
   170   into a functional program.  This is achieved by three major
   145   on monoids:
   171   components which operate sequentially, i.e. the result of one is
   146 *}
   172   the input
   147 
   173   of the next in the chain,  see figure \ref{fig:arch}:
   148 primrec %quote (in monoid) pow :: "nat \<Rightarrow> 'a \<Rightarrow> 'a" where
       
   149     "pow 0 a = \<one>"
       
   150   | "pow (Suc n) a = a \<otimes> pow n a"
       
   151 
       
   152 text {*
       
   153   \noindent This we use to define the discrete exponentiation
       
   154   function:
       
   155 *}
       
   156 
       
   157 definition %quote bexp :: "nat \<Rightarrow> nat" where
       
   158   "bexp n = pow n (Suc (Suc 0))"
       
   159 
       
   160 text {*
       
   161   \noindent The corresponding code in Haskell uses that language's
       
   162   native classes:
       
   163 *}
       
   164 
       
   165 text %quote {*@{code_stmts bexp (Haskell)}*}
       
   166 
       
   167 text {*
       
   168   \noindent This is a convenient place to show how explicit dictionary
       
   169   construction manifests in generated code -- the same example in
       
   170   @{text SML}:
       
   171 *}
       
   172 
       
   173 text %quote {*@{code_stmts bexp (SML)}*}
       
   174 
       
   175 text {*
       
   176   \noindent Note the parameters with trailing underscore (@{verbatim
       
   177   "A_"}), which are the dictionary parameters.
       
   178 *}
       
   179 
       
   180 
       
   181 subsection {* How to continue from here *}
       
   182 
       
   183 text {*
       
   184   What you have seen so far should be already enough in a lot of
       
   185   cases.  If you are content with this, you can quit reading here.
       
   186 
       
   187   Anyway, to understand situations where problems occur or to increase
       
   188   the scope of code generation beyond default, it is necessary to gain
       
   189   some understanding how the code generator actually works:
   174 
   190 
   175   \begin{itemize}
   191   \begin{itemize}
   176 
   192 
   177     \item The starting point is a collection of raw code equations in a
   193     \item The foundations of the code generator are described in
   178       theory. It is not relevant where they
   194       \secref{sec:foundations}.
   179       stem from, but typically they were either produced by specification
   195 
   180       tools or proved explicitly by the user.
   196     \item In particular \secref{sec:utterly_wrong} gives hints how to
   181       
   197       debug situations where code generation does not succeed as
   182     \item These raw code equations can be subjected to theorem transformations.  This
   198       expected.
   183       \qn{preprocessor} can apply the full
   199 
   184       expressiveness of ML-based theorem transformations to code
   200     \item The scope and quality of generated code can be increased
   185       generation.  The result of preprocessing is a
   201       dramatically by applying refinement techniques, which are
   186       structured collection of code equations.
   202       introduced in \secref{sec:refinement}.
   187 
   203 
   188     \item These code equations are \qn{translated} to a program in an
   204     \item Inductive predicates can be turned executable using an
   189       abstract intermediate language.  Think of it as a kind
   205       extension of the code generator \secref{sec:inductive}.
   190       of \qt{Mini-Haskell} with four \qn{statements}: @{text data}
   206 
   191       (for datatypes), @{text fun} (stemming from code equations),
   207     \item You may want to skim over the more technical sections
   192       also @{text class} and @{text inst} (for type classes).
   208       \secref{sec:adaptation} and \secref{sec:further}.
   193 
   209 
   194     \item Finally, the abstract program is \qn{serialised} into concrete
   210     \item For exhaustive syntax diagrams etc. you should visit the
   195       source code of a target language.
   211       Isabelle/Isar Reference Manual \cite{isabelle-isar-ref}.
   196       This step only produces concrete syntax but does not change the
       
   197       program in essence; all conceptual transformations occur in the
       
   198       translation step.
       
   199 
   212 
   200   \end{itemize}
   213   \end{itemize}
   201 
   214 
   202   \noindent From these steps, only the two last are carried out outside the logic;  by
   215   \bigskip
   203   keeping this layer as thin as possible, the amount of code to trust is
   216 
   204   kept to a minimum.
   217   \begin{center}\fbox{\fbox{\begin{minipage}{8cm}
       
   218 
       
   219     \begin{center}\textit{Happy proving, happy hacking!}\end{center}
       
   220 
       
   221   \end{minipage}}}\end{center}
       
   222 
       
   223   \begin{warn}
       
   224     There is also a more ancient code generator in Isabelle by Stefan
       
   225     Berghofer \cite{Berghofer-Nipkow:2002}.  Although its
       
   226     functionality is covered by the code generator presented here, it
       
   227     will sometimes show up as an artifact.  In case of ambiguity, we
       
   228     will refer to the framework described here as @{text "generic code
       
   229     generator"}, to the other as @{text "SML code generator"}.
       
   230   \end{warn}
   205 *}
   231 *}
   206 
   232 
   207 end
   233 end