author | huffman |
Fri, 06 Nov 2009 09:50:37 -0800 | |
changeset 33506 | afb577487b15 |
parent 31050 | 555b56b66fcf |
child 34155 | 14aaccb399b3 |
permissions | -rw-r--r-- |
theory Introduction
imports Setup
begin

section {* Introduction and Overview *}

text {*
  This tutorial introduces a generic code generator for the
  @{text Isabelle} system.  It is generic in the sense that the
  \qn{target language} for which code is ultimately
  generated is not fixed but may be an arbitrary state-of-the-art
  functional programming language (currently, the implementation
  supports @{text SML} \cite{SML}, @{text OCaml} \cite{OCaml} and @{text Haskell}
  \cite{haskell-revised-report}).

  Conceptually, the code generator framework is part
  of Isabelle's @{theory Pure} meta logic framework; the logic
  @{theory HOL}, which is an extension of @{theory Pure},
  already comes with a reasonable framework setup and thus provides
  a good workhorse for code-generation-driven
  applications.  So, we assume some familiarity and experience
  with the ingredients of the @{theory HOL} distribution theories
  (see also \cite{isa-tutorial}).

  The code generator aims to be usable with no further ado
  in most cases, while allowing for detailed customisation.
  This is reflected in the structure of this tutorial: after a short
  conceptual introduction with an example (\secref{sec:intro}),
  we discuss the generic customisation facilities (\secref{sec:program}).
  A further section (\secref{sec:adaptation}) is dedicated to the matter of
  \qn{adaptation} to specific target language environments.  After some
  further issues (\secref{sec:further}), we conclude with an overview
  of some ML programming interfaces (\secref{sec:ml}).

  \begin{warn}
    Ultimately, the code generator this tutorial deals with
    is supposed to replace the existing code generator
    by Stefan Berghofer \cite{Berghofer-Nipkow:2002}.
    So, for the moment, there are two distinct code generators
    in Isabelle.  In case of ambiguity, we will refer to the framework
    described here as the @{text "generic code generator"} and to the
    other as the @{text "SML code generator"}.
    Also note that while the framework itself is
    object-logic independent, only @{theory HOL} provides a reasonable
    framework setup.
  \end{warn}
*}

subsection {* Code generation via shallow embedding \label{sec:intro} *}

text {*
  The key concept for understanding @{text Isabelle}'s code generation is
  \emph{shallow embedding}, i.e.~logical entities like constants, types and
  classes are identified with corresponding entities in the target language.

  Inside @{theory HOL}, the @{command datatype} and
  @{command definition}/@{command primrec}/@{command fun} declarations form
  the core of a functional programming language.  The default code generator setup
  allows turning these into functional programs immediately.
  This means that \qt{naive} code generation can proceed without further ado.
  For example, here is a simple \qt{implementation} of amortised queues:
*}

datatype %quote 'a queue = AQueue "'a list" "'a list"

definition %quote empty :: "'a queue" where
  "empty = AQueue [] []"

primrec %quote enqueue :: "'a \<Rightarrow> 'a queue \<Rightarrow> 'a queue" where
  "enqueue x (AQueue xs ys) = AQueue (x # xs) ys"

fun %quote dequeue :: "'a queue \<Rightarrow> 'a option \<times> 'a queue" where
    "dequeue (AQueue [] []) = (None, AQueue [] [])"
  | "dequeue (AQueue xs (y # ys)) = (Some y, AQueue xs ys)"
  | "dequeue (AQueue xs []) =
      (case rev xs of y # ys \<Rightarrow> (Some y, AQueue [] ys))"

text {* \noindent Then we can generate code, e.g.~for @{text SML}, as follows: *}

export_code %quote empty dequeue enqueue in SML
  module_name Example file "examples/example.ML"

text {* \noindent resulting in the following code: *}

text %quote {*@{code_stmts empty enqueue dequeue (SML)}*}

text {*
  \noindent The @{command export_code} command takes a space-separated list of
  constants for which code shall be generated; anything else needed for those
  is added implicitly.  Then follows a target language identifier
  (@{text SML}, @{text OCaml} or @{text Haskell}) and a freely chosen module name.
  A file name denotes the destination to store the generated code.  Note that
  the semantics of the destination depends on the target language: for
  @{text SML} and @{text OCaml} it denotes a \emph{file}; for @{text Haskell}
  it denotes a \emph{directory} where a file named after the module
  (with extension @{text ".hs"}) is written:
*}

export_code %quote empty dequeue enqueue in Haskell
  module_name Example file "examples/"

text {*
  \noindent This is what the corresponding code in @{text Haskell} looks like:
*}

text %quote {*@{code_stmts empty enqueue dequeue (Haskell)}*}

text {*
  \noindent This demonstrates the basic usage of the @{command export_code} command;
  for more details see \secref{sec:further}.
*}

subsection {* Code generator architecture \label{sec:concept} *}

text {*
  What you have seen so far should already be enough in many cases.  If you
  are content with this, you can stop reading here.  However, in order to customise
  and adapt the code generator, it is necessary to gain some understanding
  of how it works.

  \begin{figure}[h]
    \includegraphics{architecture}
    \caption{Code generator architecture}
    \label{fig:arch}
  \end{figure}

  The code generator employs a notion of executability
  for three foundational executable ingredients known
  from functional programming:
  \emph{code equations}, \emph{datatypes}, and
  \emph{type classes}.  As a first approximation, a code equation
  is a theorem of the form @{text "f t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n \<equiv> t"}
  (an equation headed by a constant @{text f} with arguments
  @{text "t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n"} and right hand side @{text t}).
  Code generation aims to turn code equations
  into a functional program.  This is achieved by three major
  components which operate sequentially, i.e.~the result of one is
  the input of the next in the chain (see figure \ref{fig:arch}):

  \begin{itemize}

    \item The starting point is a collection of raw code equations in a
      theory; due to proof irrelevance it does not matter where they
      stem from, but typically they are either derived by specification
      tools or proved explicitly by the user.

    \item Before being processed further, these raw code equations
      can be subjected to theorem transformations.  This
      \qn{preprocessor} is an interface which allows applying the full
      expressiveness of ML-based theorem transformations to code
      generation.  The result of the preprocessing step is a
      structured collection of code equations.

    \item These code equations are \qn{translated} to a program in an
      abstract intermediate language.  Think of it as a kind
      of \qt{Mini-Haskell} with four \qn{statements}: @{text data}
      (for datatypes), @{text fun} (stemming from code equations),
      and @{text class} and @{text inst} (for type classes).

    \item Finally, the abstract program is \qn{serialised} into concrete
      source code of a target language.
      This step only produces concrete syntax but does not change the
      program in essence; all conceptual transformations occur in the
      translation step.

  \end{itemize}

  \noindent Of these steps, only the last two are carried out outside the logic; by
  keeping this layer as thin as possible, the amount of code to trust is
  kept to a minimum.
*}
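
text {*
  \noindent To observe the result of the preprocessing step for a
  particular constant, one may use the diagnostic @{command code_thms}
  command.  It prints the code equations, after preprocessing, of the
  program needed for the given constants.  The following is a minimal
  sketch for the @{text dequeue} function of the queue example above.  It
  assumes the standard @{theory HOL} code generator setup used throughout
  this tutorial; the command only prints output and does not change the
  theory.
*}

code_thms %quote dequeue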

end