author | huffman |
Sun, 07 Mar 2010 16:12:01 -0800 | |
changeset 35641 | a17bc4cec23a |
parent 34155 | 14aaccb399b3 |
child 38402 | 58fc3a3af71f |
permissions | -rw-r--r-- |
28213 | 1 |
theory Introduction |
2 |
imports Setup |
|
3 |
begin |
|
4 |
||
5 |
section {* Introduction and Overview *} |
|
6 |
||
7 |
text {* |
|
8 |
This tutorial introduces a generic code generator for the |
|
28419 | 9 |
@{text Isabelle} system. |
34155 | 10 |
The |
11 |
\qn{target language} for which code is |
|
12 |
generated is not fixed, but may be one of several |
|
13 |
functional programming languages (currently, the implementation |
|
28419 | 14 |
supports @{text SML} \cite{SML}, @{text OCaml} \cite{OCaml} and @{text Haskell} |
15 |
\cite{haskell-revised-report}). |
|
16 |
||
17 |
Conceptually the code generator framework is part |
|
28428 | 18 |
of Isabelle's @{theory Pure} meta logic framework; the logic |
34155 | 19 |
@{theory HOL} \cite{isa-tutorial}, which is an extension of @{theory Pure}, |
28419 | 20 |
already comes with a reasonable framework setup and thus provides |
34155 | 21 |
a good basis for creating code-generation-driven |
28419 | 22 |
applications. So, we assume some familiarity and experience |
28428 | 23 |
with the ingredients of the @{theory HOL} distribution theories. |
28419 | 24 |
|
25 |
The code generator aims to be usable with no further ado |
|
34155 | 26 |
in most cases, while allowing for detailed customisation. |
27 |
This can be seen in the structure of this tutorial: after a short |
|
28447 | 28 |
conceptual introduction with an example (\secref{sec:intro}), |
29 |
we discuss the generic customisation facilities (\secref{sec:program}). |
|
31050 | 30 |
A further section (\secref{sec:adaptation}) is dedicated to the matter of |
31 |
\qn{adaptation} to specific target language environments. After some |
|
28447 | 32 |
further issues (\secref{sec:further}) we conclude with an overview |
33 |
of some ML programming interfaces (\secref{sec:ml}). |
|
28419 | 34 |
|
35 |
\begin{warn} |
|
36 |
Ultimately, the code generator which this tutorial deals with |
|
28447 | 37 |
is supposed to replace the existing code generator |
28419 | 38 |
by Stefan Berghofer \cite{Berghofer-Nipkow:2002}. |
39 |
So, for the moment, there are two distinct code generators |
|
28447 | 40 |
in Isabelle. In case of ambiguity, we will refer to the framework |
41 |
described here as @{text "generic code generator"}, to the |
|
42 |
other as @{text "SML code generator"}. |
|
28419 | 43 |
Also note that while the framework itself is |
28428 | 44 |
object-logic independent, only @{theory HOL} provides a reasonable |
28419 | 45 |
framework setup. |
46 |
\end{warn} |
|
47 |
||
28213 | 48 |
*} |
49 |
||
28419 | 50 |
subsection {* Code generation via shallow embedding \label{sec:intro} *} |
51 |
||
52 |
text {* |
|
53 |
The key concept for understanding @{text Isabelle}'s code generation is |
|
54 |
\emph{shallow embedding}, i.e.~logical entities like constants, types and |
|
55 |
classes are identified with corresponding concepts in the target language. |
|
56 |
||
28428 | 57 |
Inside @{theory HOL}, the @{command datatype} and |
28419 | 58 |
@{command definition}/@{command primrec}/@{command fun} declarations form |
59 |
the core of a functional programming language. The default code generator setup |
|
34155 | 60 |
transforms those into functional programs immediately. |
28419 | 61 |
This means that \qt{naive} code generation can proceed without further ado. |
62 |
For example, here a simple \qt{implementation} of amortised queues: |
|
63 |
*} |
|
64 |
||
29798 | 65 |
datatype %quote 'a queue = AQueue "'a list" "'a list" |
28419 | 66 |
|
28564 | 67 |
definition %quote empty :: "'a queue" where |
29798 | 68 |
"empty = AQueue [] []" |
28419 | 69 |
|
28564 | 70 |
primrec %quote enqueue :: "'a \<Rightarrow> 'a queue \<Rightarrow> 'a queue" where |
29798 | 71 |
"enqueue x (AQueue xs ys) = AQueue (x # xs) ys" |
28419 | 72 |
|
28564 | 73 |
fun %quote dequeue :: "'a queue \<Rightarrow> 'a option \<times> 'a queue" where |
29798 | 74 |
"dequeue (AQueue [] []) = (None, AQueue [] [])" |
75 |
| "dequeue (AQueue xs (y # ys)) = (Some y, AQueue xs ys)" |
|
76 |
| "dequeue (AQueue xs []) = |
|
77 |
(case rev xs of y # ys \<Rightarrow> (Some y, AQueue [] ys))" |
|
28419 | 78 |
|
79 |
text {* \noindent Then we can generate code e.g.~for @{text SML} as follows: *} |
|
28213 | 80 |
|
28564 | 81 |
export_code %quote empty dequeue enqueue in SML |
28447 | 82 |
module_name Example file "examples/example.ML" |
28419 | 83 |
|
84 |
text {* \noindent resulting in the following code: *} |
|
85 |
||
28564 | 86 |
text %quote {*@{code_stmts empty enqueue dequeue (SML)}*} |
28419 | 87 |
|
88 |
text {* |
|
89 |
\noindent The @{command export_code} command takes a space-separated list of |
|
90 |
constants for which code shall be generated; anything else needed for those |
|
28447 | 91 |
is added implicitly. Then follows a target language identifier |
28419 | 92 |
(@{text SML}, @{text OCaml} or @{text Haskell}) and a freely chosen module name. |
93 |
A file name denotes the destination to store the generated code. Note that |
|
94 |
the semantics of the destination depends on the target language: for |
|
95 |
@{text SML} and @{text OCaml} it denotes a \emph{file}, for @{text Haskell} |
|
96 |
it denotes a \emph{directory} where a file named as the module name |
|
97 |
(with extension @{text ".hs"}) is written: |
|
98 |
*} |
|
99 |
||
28564 | 100 |
export_code %quote empty dequeue enqueue in Haskell |
28447 | 101 |
module_name Example file "examples/" |
28419 | 102 |
|
103 |
text {* |
|
34155 | 104 |
\noindent This is the corresponding code in @{text Haskell}: |
28419 | 105 |
*} |
106 |
||
28564 | 107 |
text %quote {*@{code_stmts empty enqueue dequeue (Haskell)}*} |
28419 | 108 |
|
109 |
text {* |
|
110 |
\noindent This demonstrates the basic usage of the @{command export_code} command; |
|
28447 | 111 |
for more details see \secref{sec:further}. |
28419 | 112 |
*} |
28213 | 113 |
|
28447 | 114 |
subsection {* Code generator architecture \label{sec:concept} *} |
28213 | 115 |
|
28419 | 116 |
text {* |
117 |
What you have seen so far should be already enough in a lot of cases. If you |
|
118 |
are content with this, you can quit reading here. Anyway, in order to customise |
|
34155 | 119 |
and adapt the code generator, it is necessary to gain some understanding |
28419 | 120 |
how it works. |
121 |
||
122 |
\begin{figure}[h] |
|
30882
d15725e84091
moved generated eps/pdf to main directory, for proper display in dvi;
wenzelm
parents:
30880
diff
changeset
|
123 |
\includegraphics{architecture} |
28419 | 124 |
\caption{Code generator architecture} |
125 |
\label{fig:arch} |
|
126 |
\end{figure} |
|
127 |
||
128 |
The code generator employs a notion of executability |
|
129 |
for three foundational executable ingredients known |
|
130 |
from functional programming: |
|
29560 | 131 |
\emph{code equations}, \emph{datatypes}, and |
132 |
\emph{type classes}. A code equation as a first approximation |
|
28419 | 133 |
is a theorem of the form @{text "f t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n \<equiv> t"} |
134 |
(an equation headed by a constant @{text f} with arguments |
|
135 |
@{text "t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n"} and right hand side @{text t}). |
|
29560 | 136 |
Code generation aims to turn code equations |
28447 | 137 |
into a functional program. This is achieved by three major |
138 |
components which operate sequentially, i.e. the result of one is |
|
139 |
the input |
|
30880 | 140 |
of the next in the chain, see figure \ref{fig:arch}: |
28419 | 141 |
|
142 |
\begin{itemize} |
|
143 |
||
34155 | 144 |
\item The starting point is a collection of raw code equations in a |
145 |
theory. It is not relevant where they |
|
146 |
stem from, but typically they were either produced by specification |
|
147 |
tools or proved explicitly by the user. |
|
30836 | 148 |
|
34155 | 149 |
\item These raw code equations can be subjected to theorem transformations. This |
150 |
\qn{preprocessor} can apply the full |
|
30836 | 151 |
expressiveness of ML-based theorem transformations to code |
34155 | 152 |
generation. The result of preprocessing is a |
30836 | 153 |
structured collection of code equations. |
28419 | 154 |
|
30836 | 155 |
\item These code equations are \qn{translated} to a program in an |
156 |
abstract intermediate language. Think of it as a kind |
|
28447 | 157 |
of \qt{Mini-Haskell} with four \qn{statements}: @{text data} |
29560 | 158 |
(for datatypes), @{text fun} (stemming from code equations), |
28447 | 159 |
also @{text class} and @{text inst} (for type classes). |
28419 | 160 |
|
161 |
\item Finally, the abstract program is \qn{serialised} into concrete |
|
162 |
source code of a target language. |
|
30836 | 163 |
This step only produces concrete syntax but does not change the |
164 |
program in essence; all conceptual transformations occur in the |
|
165 |
translation step. |
|
28419 | 166 |
|
167 |
\end{itemize} |
|
168 |
||
169 |
\noindent From these steps, only the two last are carried out outside the logic; by |
|
170 |
keeping this layer as thin as possible, the amount of code to trust is |
|
171 |
kept to a minimum. |
|
172 |
*} |
|
28213 | 173 |
|
174 |
end |