| author | huffman | 
| Mon, 01 Mar 2010 09:55:32 -0800 | |
| changeset 35486 | c91854705b1d | 
| parent 34155 | 14aaccb399b3 | 
| child 38402 | 58fc3a3af71f | 
| permissions | -rw-r--r-- | 
| 28213 | 1 | theory Introduction | 
| 2 | imports Setup | |
| 3 | begin | |
| 4 | ||
| 5 | section {* Introduction and Overview *}
 | |
| 6 | ||
| 7 | text {*
 | |
| 8 | This tutorial introduces a generic code generator for the | |
| 28419 | 9 |   @{text Isabelle} system.
 | 
| 34155 | 10 | The | 
| 11 |   \qn{target language} for which code is
 | |
| 12 | generated is not fixed, but may be one of several | |
| 13 | functional programming languages (currently, the implementation | |
| 28419 | 14 |   supports @{text SML} \cite{SML}, @{text OCaml} \cite{OCaml} and @{text Haskell}
 | 
| 15 |   \cite{haskell-revised-report}).
 | |
| 16 | ||
| 17 | Conceptually the code generator framework is part | |
| 28428 | 18 |   of Isabelle's @{theory Pure} meta logic framework; the logic
 | 
| 34155 | 19 |   @{theory HOL} \cite{isa-tutorial}, which is an extension of @{theory Pure},
 | 
| 28419 | 20 | already comes with a reasonable framework setup and thus provides | 
| 34155 | 21 | a good basis for creating code-generation-driven | 
| 28419 | 22 | applications. So, we assume some familiarity and experience | 
| 28428 | 23 |   with the ingredients of the @{theory HOL} distribution theories.
 | 
| 28419 | 24 | |
| 25 | The code generator aims to be usable with no further ado | |
| 34155 | 26 | in most cases, while allowing for detailed customisation. | 
| 27 | This can be seen in the structure of this tutorial: after a short | |
| 28447 | 28 |   conceptual introduction with an example (\secref{sec:intro}),
 | 
| 29 |   we discuss the generic customisation facilities (\secref{sec:program}).
 | |
| 31050 | 30 |   A further section (\secref{sec:adaptation}) is dedicated to the matter of
 | 
| 31 |   \qn{adaptation} to specific target language environments.  After some
 | |
| 28447 | 32 |   further issues (\secref{sec:further}) we conclude with an overview
 | 
| 33 |   of some ML programming interfaces (\secref{sec:ml}).
 | |
| 28419 | 34 | |
| 35 |   \begin{warn}
 | |
| 36 | Ultimately, the code generator which this tutorial deals with | |
| 28447 | 37 | is supposed to replace the existing code generator | 
| 28419 | 38 |     by Stefan Berghofer \cite{Berghofer-Nipkow:2002}.
 | 
| 39 | So, for the moment, there are two distinct code generators | |
| 28447 | 40 | in Isabelle. In case of ambiguity, we will refer to the framework | 
| 41 |     described here as @{text "generic code generator"}, to the
 | |
| 42 |     other as @{text "SML code generator"}.
 | |
| 28419 | 43 | Also note that while the framework itself is | 
| 28428 | 44 |     object-logic independent, only @{theory HOL} provides a reasonable
 | 
| 28419 | 45 | framework setup. | 
| 46 |   \end{warn}
 | |
| 47 | ||
| 28213 | 48 | *} | 
| 49 | ||
| 28419 | 50 | subsection {* Code generation via shallow embedding \label{sec:intro} *}
 | 
| 51 | ||
| 52 | text {*
 | |
| 53 |   The key concept for understanding @{text Isabelle}'s code generation is
 | |
| 54 |   \emph{shallow embedding}, i.e.~logical entities like constants, types and
 | |
| 55 | classes are identified with corresponding concepts in the target language. | |
| 56 | ||
| 28428 | 57 |   Inside @{theory HOL}, the @{command datatype} and
 | 
| 28419 | 58 |   @{command definition}/@{command primrec}/@{command fun} declarations form
 | 
| 59 | the core of a functional programming language. The default code generator setup | |
| 34155 | 60 | transforms those into functional programs immediately. | 
| 28419 | 61 |   This means that \qt{naive} code generation can proceed without further ado.
 | 
| 62 |   For example, here a simple \qt{implementation} of amortised queues:
 | |
| 63 | *} | |
| 64 | ||
| 29798 | 65 | datatype %quote 'a queue = AQueue "'a list" "'a list" | 
| 28419 | 66 | |
| 28564 | 67 | definition %quote empty :: "'a queue" where | 
| 29798 | 68 | "empty = AQueue [] []" | 
| 28419 | 69 | |
| 28564 | 70 | primrec %quote enqueue :: "'a \<Rightarrow> 'a queue \<Rightarrow> 'a queue" where | 
| 29798 | 71 | "enqueue x (AQueue xs ys) = AQueue (x # xs) ys" | 
| 28419 | 72 | |
| 28564 | 73 | fun %quote dequeue :: "'a queue \<Rightarrow> 'a option \<times> 'a queue" where | 
| 29798 | 74 | "dequeue (AQueue [] []) = (None, AQueue [] [])" | 
| 75 | | "dequeue (AQueue xs (y # ys)) = (Some y, AQueue xs ys)" | |
| 76 | | "dequeue (AQueue xs []) = | |
| 77 | (case rev xs of y # ys \<Rightarrow> (Some y, AQueue [] ys))" | |
| 28419 | 78 | |
| 79 | text {* \noindent Then we can generate code e.g.~for @{text SML} as follows: *}
 | |
| 28213 | 80 | |
| 28564 | 81 | export_code %quote empty dequeue enqueue in SML | 
| 28447 | 82 | module_name Example file "examples/example.ML" | 
| 28419 | 83 | |
| 84 | text {* \noindent resulting in the following code: *}
 | |
| 85 | ||
| 28564 | 86 | text %quote {*@{code_stmts empty enqueue dequeue (SML)}*}
 | 
| 28419 | 87 | |
| 88 | text {*
 | |
| 89 |   \noindent The @{command export_code} command takes a space-separated list of
 | |
| 90 | constants for which code shall be generated; anything else needed for those | |
| 28447 | 91 | is added implicitly. Then follows a target language identifier | 
| 28419 | 92 |   (@{text SML}, @{text OCaml} or @{text Haskell}) and a freely chosen module name.
 | 
| 93 | A file name denotes the destination to store the generated code. Note that | |
| 94 | the semantics of the destination depends on the target language: for | |
| 95 |   @{text SML} and @{text OCaml} it denotes a \emph{file}, for @{text Haskell}
 | |
| 96 |   it denotes a \emph{directory} where a file named as the module name
 | |
| 97 |   (with extension @{text ".hs"}) is written:
 | |
| 98 | *} | |
| 99 | ||
| 28564 | 100 | export_code %quote empty dequeue enqueue in Haskell | 
| 28447 | 101 | module_name Example file "examples/" | 
| 28419 | 102 | |
| 103 | text {*
 | |
| 34155 | 104 |   \noindent This is the corresponding code in @{text Haskell}:
 | 
| 28419 | 105 | *} | 
| 106 | ||
| 28564 | 107 | text %quote {*@{code_stmts empty enqueue dequeue (Haskell)}*}
 | 
| 28419 | 108 | |
| 109 | text {*
 | |
| 110 |   \noindent This demonstrates the basic usage of the @{command export_code} command;
 | |
| 28447 | 111 |   for more details see \secref{sec:further}.
 | 
| 28419 | 112 | *} | 
| 28213 | 113 | |
| 28447 | 114 | subsection {* Code generator architecture \label{sec:concept} *}
 | 
| 28213 | 115 | |
| 28419 | 116 | text {*
 | 
| 117 | What you have seen so far should be already enough in a lot of cases. If you | |
| 118 | are content with this, you can quit reading here. Anyway, in order to customise | |
| 34155 | 119 | and adapt the code generator, it is necessary to gain some understanding | 
| 28419 | 120 | how it works. | 
| 121 | ||
| 122 |   \begin{figure}[h]
 | |
| 30882 
d15725e84091
moved generated eps/pdf to main directory, for proper display in dvi;
 wenzelm parents: 
30880diff
changeset | 123 |     \includegraphics{architecture}
 | 
| 28419 | 124 |     \caption{Code generator architecture}
 | 
| 125 |     \label{fig:arch}
 | |
| 126 |   \end{figure}
 | |
| 127 | ||
| 128 | The code generator employs a notion of executability | |
| 129 | for three foundational executable ingredients known | |
| 130 | from functional programming: | |
| 29560 | 131 |   \emph{code equations}, \emph{datatypes}, and
 | 
| 132 |   \emph{type classes}.  A code equation as a first approximation
 | |
| 28419 | 133 |   is a theorem of the form @{text "f t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n \<equiv> t"}
 | 
| 134 |   (an equation headed by a constant @{text f} with arguments
 | |
| 135 |   @{text "t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n"} and right hand side @{text t}).
 | |
| 29560 | 136 | Code generation aims to turn code equations | 
| 28447 | 137 | into a functional program. This is achieved by three major | 
| 138 | components which operate sequentially, i.e. the result of one is | |
| 139 | the input | |
| 30880 | 140 |   of the next in the chain,  see figure \ref{fig:arch}:
 | 
| 28419 | 141 | |
| 142 |   \begin{itemize}
 | |
| 143 | ||
| 34155 | 144 | \item The starting point is a collection of raw code equations in a | 
| 145 | theory. It is not relevant where they | |
| 146 | stem from, but typically they were either produced by specification | |
| 147 | tools or proved explicitly by the user. | |
| 30836 | 148 | |
| 34155 | 149 | \item These raw code equations can be subjected to theorem transformations. This | 
| 150 |       \qn{preprocessor} can apply the full
 | |
| 30836 | 151 | expressiveness of ML-based theorem transformations to code | 
| 34155 | 152 | generation. The result of preprocessing is a | 
| 30836 | 153 | structured collection of code equations. | 
| 28419 | 154 | |
| 30836 | 155 |     \item These code equations are \qn{translated} to a program in an
 | 
| 156 | abstract intermediate language. Think of it as a kind | |
| 28447 | 157 |       of \qt{Mini-Haskell} with four \qn{statements}: @{text data}
 | 
| 29560 | 158 |       (for datatypes), @{text fun} (stemming from code equations),
 | 
| 28447 | 159 |       also @{text class} and @{text inst} (for type classes).
 | 
| 28419 | 160 | |
| 161 |     \item Finally, the abstract program is \qn{serialised} into concrete
 | |
| 162 | source code of a target language. | |
| 30836 | 163 | This step only produces concrete syntax but does not change the | 
| 164 | program in essence; all conceptual transformations occur in the | |
| 165 | translation step. | |
| 28419 | 166 | |
| 167 |   \end{itemize}
 | |
| 168 | ||
| 169 | \noindent From these steps, only the two last are carried out outside the logic; by | |
| 170 | keeping this layer as thin as possible, the amount of code to trust is | |
| 171 | kept to a minimum. | |
| 172 | *} | |
| 28213 | 173 | |
| 174 | end |