doc-src/IsarAdvanced/Codegen/Thy/Introduction.thy
changeset 28419 f65e8b318581
parent 28213 b52f9205a02d
child 28420 293b166c45c5
equal deleted inserted replaced
28418:4ffb62675ade 28419:f65e8b318581
     6 
     6 
     7 section {* Introduction and Overview *}
     7 section {* Introduction and Overview *}
     8 
     8 
     9 text {*
     9 text {*
    10   This tutorial introduces a generic code generator for the
    10   This tutorial introduces a generic code generator for the
    11   Isabelle system \cite{isa-tutorial}.
    11   @{text Isabelle} system.
       
    12   Generic in the sense that the
       
    13   \qn{target language} for which code shall ultimately be
       
    14   generated is not fixed but may be an arbitrary state-of-the-art
       
    15   functional programming language (currently, the implementation
       
    16   supports @{text SML} \cite{SML}, @{text OCaml} \cite{OCaml} and @{text Haskell}
       
    17   \cite{haskell-revised-report}).
       
    18 
       
    19   Conceptually the code generator framework is part
       
    20   of Isabelle's @{text Pure} meta logic framework; the logic
       
    21   @{text HOL} which is an extension of @{text Pure}
       
    22   already comes with a reasonable framework setup and thus provides
       
    23   a good working horse for raising code-generation-driven
       
    24   applications.  So, we assume some familiarity and experience
       
    25   with the ingredients of the @{text HOL} distribution theories.
       
    26   (see also \cite{isa-tutorial}).
       
    27 
       
    28   The code generator aims to be usable with no further ado
       
    29   in most cases while allowing for detailed customisation.
       
    30   This manifests in the structure of this tutorial: after a short
       
    31   conceptual introduction with an example \secref{sec:intro},
       
    32   we discuss the generic customisation facilities \secref{sec:program}.
       
    33   A further section \secref{sec:adaption} is dedicated to the matter of
       
    34   \qn{adaption} to specific target language environments.  After some
       
    35   further issues \secref{sec:further} we conclude with an overview
       
    36   of some ML programming interfaces \secref{sec:ml}.
       
    37 
       
    38   \begin{warn}
       
    39     Ultimately, the code generator which this tutorial deals with
       
    40     is supposed to replace the already established code generator
       
    41     by Stefan Berghofer \cite{Berghofer-Nipkow:2002}.
       
    42     So, for the moment, there are two distinct code generators
       
    43     in Isabelle.
       
    44     Also note that while the framework itself is
       
    45     object-logic independent, only @{text HOL} provides a reasonable
       
    46     framework setup.    
       
    47   \end{warn}
       
    48 
    12 *}
    49 *}
    13 
    50 
    14 subsection {* Code generation via shallow embedding *}
    51 subsection {* Code generation via shallow embedding \label{sec:intro} *}
    15 
    52 
    16 text {* example *}
    53 text {*
       
    54   The key concept for understanding @{text Isabelle}'s code generation is
       
    55   \emph{shallow embedding}, i.e.~logical entities like constants, types and
       
    56   classes are identified with corresponding concepts in the target language.
       
    57 
       
    58   Inside @{text HOL}, the @{command datatype} and
       
    59   @{command definition}/@{command primrec}/@{command fun} declarations form
       
    60   the core of a functional programming language.  The default code generator setup
       
    61   allows to turn those into functional programs immediately.
       
    62 
       
    63   This means that \qt{naive} code generation can proceed without further ado.
       
    64   For example, here a simple \qt{implementation} of amortised queues:
       
    65 *}
       
    66 
       
    67 datatype %quoteme 'a queue = Queue "'a list" "'a list"
       
    68 
       
    69 definition %quoteme empty :: "'a queue" where
       
    70   "empty = Queue [] []"
       
    71 
       
    72 primrec %quoteme enqueue :: "'a \<Rightarrow> 'a queue \<Rightarrow> 'a queue" where
       
    73   "enqueue x (Queue xs ys) = Queue (x # xs) ys"
       
    74 
       
    75 fun %quoteme dequeue :: "'a queue \<Rightarrow> 'a option \<times> 'a queue" where
       
    76     "dequeue (Queue [] []) = (None, Queue [] [])"
       
    77   | "dequeue (Queue xs (y # ys)) = (Some y, Queue xs ys)"
       
    78   | "dequeue (Queue xs []) = (case rev xs of y # ys \<Rightarrow> (Some y, Queue [] ys))"
       
    79 
       
    80 text {* \noindent Then we can generate code e.g.~for @{text SML} as follows: *}
       
    81 
       
    82 export_code %quoteme empty dequeue enqueue in SML module_name Example file "examples/example.ML"
       
    83 
       
    84 text {* \noindent resulting in the following code: *}
       
    85 
       
    86 text %quoteme {*@{code_stmts empty enqueue dequeue (SML)}*}
       
    87 
       
    88 text {*
       
    89   \noindent The @{command export_code} command takes a space-separated list of
       
    90   constants for which code shall be generated;  anything else needed for those
       
    91   is added implicitly.  Then follows by a target language identifier
       
    92   (@{text SML}, @{text OCaml} or @{text Haskell}) and a freely chosen module name.
       
    93   A file name denotes the destination to store the generated code.  Note that
       
    94   the semantics of the destination depends on the target language:  for
       
    95   @{text SML} and @{text OCaml} it denotes a \emph{file}, for @{text Haskell}
       
    96   it denotes a \emph{directory} where a file named as the module name
       
    97   (with extension @{text ".hs"}) is written:
       
    98 *}
       
    99 
       
   100 export_code %quoteme empty dequeue enqueue in SML module_name Example file "examples/Example.ML"
       
   101 
       
   102 text {*
       
   103   \noindent This is how the corresponding code in @{text Haskell} looks like:
       
   104 *}
       
   105 
       
   106 text %quoteme {*@{code_stmts empty enqueue dequeue (Haskell)}*}
       
   107 
       
   108 text {*
       
   109   \noindent This demonstrates the basic usage of the @{command export_code} command;
       
   110   for more details see \secref{sec:serializer}.
       
   111 *}
    17 
   112 
    18 subsection {* Code generator architecture *}
   113 subsection {* Code generator architecture *}
    19 
   114 
    20 text {* foundation, forward references for yet unexplained things  *}
   115 text {*
       
   116   What you have seen so far should be already enough in a lot of cases.  If you
       
   117   are content with this, you can quit reading here.  Anyway, in order to customise
       
   118   and adapt the code generator, it is inevitable to gain some understanding
       
   119   how it works.
       
   120 
       
   121   \begin{figure}[h]
       
   122     \centering
       
   123     \includegraphics[width=0.7\textwidth]{codegen_process}
       
   124     \caption{Code generator architecture}
       
   125     \label{fig:arch}
       
   126   \end{figure}
       
   127 
       
   128   The code generator itself consists of three major components
       
   129   which operate sequentially, i.e. the result of one is the input
       
   130   of the next in the chain,  see diagram \ref{fig:arch}:
       
   131 
       
   132   The code generator employs a notion of executability
       
   133   for three foundational executable ingredients known
       
   134   from functional programming:
       
   135   \emph{defining equations}, \emph{datatypes}, and
       
   136   \emph{type classes}.  A defining equation as a first approximation
       
   137   is a theorem of the form @{text "f t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n \<equiv> t"}
       
   138   (an equation headed by a constant @{text f} with arguments
       
   139   @{text "t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n"} and right hand side @{text t}).
       
   140   Code generation aims to turn defining equations
       
   141   into a functional program by running through the following
       
   142   process:
       
   143 
       
   144   \begin{itemize}
       
   145 
       
   146     \item Out of the vast collection of theorems proven in a
       
   147       \qn{theory}, a reasonable subset modelling
       
   148       defining equations is \qn{selected}.
       
   149 
       
   150     \item On those selected theorems, certain
       
   151       transformations are carried out
       
   152       (\qn{preprocessing}).  Their purpose is to turn theorems
       
   153       representing non- or badly executable
       
   154       specifications into equivalent but executable counterparts.
       
   155       The result is a structured collection of \qn{code theorems}.
       
   156 
       
   157     \item Before the selected defining equations are continued with,
       
   158       they can be \qn{preprocessed}, i.e. subjected to theorem
       
   159       transformations.  This \qn{preprocessor} is an interface which
       
   160       allows to apply
       
   161       the full expressiveness of ML-based theorem transformations
       
   162       to code generation;  motivating examples are shown below, see
       
   163       \secref{sec:preproc}.
       
   164       The result of the preprocessing step is a structured collection
       
   165       of defining equations.
       
   166 
       
   167     \item These defining equations are \qn{translated} to a program
       
   168       in an abstract intermediate language.
       
   169 
       
   170     \item Finally, the abstract program is \qn{serialised} into concrete
       
   171       source code of a target language.
       
   172 
       
   173   \end{itemize}
       
   174 
       
   175   \noindent From these steps, only the two last are carried out outside the logic;  by
       
   176   keeping this layer as thin as possible, the amount of code to trust is
       
   177   kept to a minimum.
       
   178 *}
    21 
   179 
    22 end
   180 end