28213
|
1 |
theory Introduction
|
|
2 |
imports Setup
|
|
3 |
begin
|
|
4 |
|
|
5 |
chapter {* Code generation from @{text "Isabelle/HOL"} theories *}
|
|
6 |
|
|
7 |
section {* Introduction and Overview *}
|
|
8 |
|
|
9 |
text {*
|
|
10 |
This tutorial introduces a generic code generator for the
|
28419
|
11 |
@{text Isabelle} system.
|
|
12 |
Generic in the sense that the
|
|
13 |
\qn{target language} for which code shall ultimately be
|
|
14 |
generated is not fixed but may be an arbitrary state-of-the-art
|
|
15 |
functional programming language (currently, the implementation
|
|
16 |
supports @{text SML} \cite{SML}, @{text OCaml} \cite{OCaml} and @{text Haskell}
|
|
17 |
\cite{haskell-revised-report}).
|
|
18 |
|
|
19 |
Conceptually the code generator framework is part
|
28428
|
20 |
of Isabelle's @{theory Pure} meta logic framework; the logic
|
|
21 |
@{theory HOL} which is an extension of @{theory Pure}
|
28419
|
22 |
already comes with a reasonable framework setup and thus provides
|
|
23 |
a good working horse for raising code-generation-driven
|
|
24 |
applications. So, we assume some familiarity and experience
|
28428
|
25 |
with the ingredients of the @{theory HOL} distribution theories.
|
28419
|
26 |
(see also \cite{isa-tutorial}).
|
|
27 |
|
|
28 |
The code generator aims to be usable with no further ado
|
|
29 |
in most cases while allowing for detailed customisation.
|
|
30 |
This manifests in the structure of this tutorial: after a short
|
28447
|
31 |
conceptual introduction with an example (\secref{sec:intro}),
|
|
32 |
we discuss the generic customisation facilities (\secref{sec:program}).
|
|
33 |
A further section (\secref{sec:adaption}) is dedicated to the matter of
|
28419
|
34 |
\qn{adaption} to specific target language environments. After some
|
28447
|
35 |
further issues (\secref{sec:further}) we conclude with an overview
|
|
36 |
of some ML programming interfaces (\secref{sec:ml}).
|
28419
|
37 |
|
|
38 |
\begin{warn}
|
|
39 |
Ultimately, the code generator which this tutorial deals with
|
28447
|
40 |
is supposed to replace the existing code generator
|
28419
|
41 |
by Stefan Berghofer \cite{Berghofer-Nipkow:2002}.
|
|
42 |
So, for the moment, there are two distinct code generators
|
28447
|
43 |
in Isabelle. In case of ambiguity, we will refer to the framework
|
|
44 |
described here as @{text "generic code generator"}, to the
|
|
45 |
other as @{text "SML code generator"}.
|
28419
|
46 |
Also note that while the framework itself is
|
28428
|
47 |
object-logic independent, only @{theory HOL} provides a reasonable
|
28419
|
48 |
framework setup.
|
|
49 |
\end{warn}
|
|
50 |
|
28213
|
51 |
*}
|
|
52 |
|
28419
|
53 |
subsection {* Code generation via shallow embedding \label{sec:intro} *}
|
|
54 |
|
|
55 |
text {*
|
|
56 |
The key concept for understanding @{text Isabelle}'s code generation is
|
|
57 |
\emph{shallow embedding}, i.e.~logical entities like constants, types and
|
|
58 |
classes are identified with corresponding concepts in the target language.
|
|
59 |
|
28428
|
60 |
Inside @{theory HOL}, the @{command datatype} and
|
28419
|
61 |
@{command definition}/@{command primrec}/@{command fun} declarations form
|
|
62 |
the core of a functional programming language. The default code generator setup
|
|
63 |
allows to turn those into functional programs immediately.
|
|
64 |
This means that \qt{naive} code generation can proceed without further ado.
|
|
65 |
For example, here a simple \qt{implementation} of amortised queues:
|
|
66 |
*}
|
|
67 |
|
28564
|
68 |
datatype %quote 'a queue = Queue "'a list" "'a list"
|
28419
|
69 |
|
28564
|
70 |
definition %quote empty :: "'a queue" where
|
28419
|
71 |
"empty = Queue [] []"
|
|
72 |
|
28564
|
73 |
primrec %quote enqueue :: "'a \<Rightarrow> 'a queue \<Rightarrow> 'a queue" where
|
28419
|
74 |
"enqueue x (Queue xs ys) = Queue (x # xs) ys"
|
|
75 |
|
28564
|
76 |
fun %quote dequeue :: "'a queue \<Rightarrow> 'a option \<times> 'a queue" where
|
28419
|
77 |
"dequeue (Queue [] []) = (None, Queue [] [])"
|
|
78 |
| "dequeue (Queue xs (y # ys)) = (Some y, Queue xs ys)"
|
28447
|
79 |
| "dequeue (Queue xs []) =
|
|
80 |
(case rev xs of y # ys \<Rightarrow> (Some y, Queue [] ys))"
|
28419
|
81 |
|
|
82 |
text {* \noindent Then we can generate code e.g.~for @{text SML} as follows: *}
|
28213
|
83 |
|
28564
|
84 |
export_code %quote empty dequeue enqueue in SML
|
28447
|
85 |
module_name Example file "examples/example.ML"
|
28419
|
86 |
|
|
87 |
text {* \noindent resulting in the following code: *}
|
|
88 |
|
28564
|
89 |
text %quote {*@{code_stmts empty enqueue dequeue (SML)}*}
|
28419
|
90 |
|
|
91 |
text {*
|
|
92 |
\noindent The @{command export_code} command takes a space-separated list of
|
|
93 |
constants for which code shall be generated; anything else needed for those
|
28447
|
94 |
is added implicitly. Then follows a target language identifier
|
28419
|
95 |
(@{text SML}, @{text OCaml} or @{text Haskell}) and a freely chosen module name.
|
|
96 |
A file name denotes the destination to store the generated code. Note that
|
|
97 |
the semantics of the destination depends on the target language: for
|
|
98 |
@{text SML} and @{text OCaml} it denotes a \emph{file}, for @{text Haskell}
|
|
99 |
it denotes a \emph{directory} where a file named as the module name
|
|
100 |
(with extension @{text ".hs"}) is written:
|
|
101 |
*}
|
|
102 |
|
28564
|
103 |
export_code %quote empty dequeue enqueue in Haskell
|
28447
|
104 |
module_name Example file "examples/"
|
28419
|
105 |
|
|
106 |
text {*
|
|
107 |
\noindent This is how the corresponding code in @{text Haskell} looks like:
|
|
108 |
*}
|
|
109 |
|
28564
|
110 |
text %quote {*@{code_stmts empty enqueue dequeue (Haskell)}*}
|
28419
|
111 |
|
|
112 |
text {*
|
|
113 |
\noindent This demonstrates the basic usage of the @{command export_code} command;
|
28447
|
114 |
for more details see \secref{sec:further}.
|
28419
|
115 |
*}
|
28213
|
116 |
|
28447
|
117 |
subsection {* Code generator architecture \label{sec:concept} *}
|
28213
|
118 |
|
28419
|
119 |
text {*
|
|
120 |
What you have seen so far should be already enough in a lot of cases. If you
|
|
121 |
are content with this, you can quit reading here. Anyway, in order to customise
|
|
122 |
and adapt the code generator, it is inevitable to gain some understanding
|
|
123 |
how it works.
|
|
124 |
|
|
125 |
\begin{figure}[h]
|
28635
|
126 |
\begin{tikzpicture}[x = 4.2cm, y = 1cm]
|
|
127 |
\tikzstyle entity=[rounded corners, draw, thick, color = black, fill = white];
|
|
128 |
\tikzstyle process=[ellipse, draw, thick, color = green, fill = white];
|
|
129 |
\tikzstyle process_arrow=[->, semithick, color = green];
|
|
130 |
\node (HOL) at (0, 4) [style=entity] {@{text "Isabelle/HOL"} theory};
|
29560
|
131 |
\node (eqn) at (2, 2) [style=entity] {code equations};
|
28635
|
132 |
\node (iml) at (2, 0) [style=entity] {intermediate language};
|
|
133 |
\node (seri) at (1, 0) [style=process] {serialisation};
|
|
134 |
\node (SML) at (0, 3) [style=entity] {@{text SML}};
|
|
135 |
\node (OCaml) at (0, 2) [style=entity] {@{text OCaml}};
|
|
136 |
\node (further) at (0, 1) [style=entity] {@{text "\<dots>"}};
|
|
137 |
\node (Haskell) at (0, 0) [style=entity] {@{text Haskell}};
|
|
138 |
\draw [style=process_arrow] (HOL) .. controls (2, 4) ..
|
|
139 |
node [style=process, near start] {selection}
|
|
140 |
node [style=process, near end] {preprocessing}
|
|
141 |
(eqn);
|
|
142 |
\draw [style=process_arrow] (eqn) -- node (transl) [style=process] {translation} (iml);
|
|
143 |
\draw [style=process_arrow] (iml) -- (seri);
|
|
144 |
\draw [style=process_arrow] (seri) -- (SML);
|
|
145 |
\draw [style=process_arrow] (seri) -- (OCaml);
|
|
146 |
\draw [style=process_arrow, dashed] (seri) -- (further);
|
|
147 |
\draw [style=process_arrow] (seri) -- (Haskell);
|
|
148 |
\end{tikzpicture}
|
28419
|
149 |
\caption{Code generator architecture}
|
|
150 |
\label{fig:arch}
|
|
151 |
\end{figure}
|
|
152 |
|
|
153 |
The code generator employs a notion of executability
|
|
154 |
for three foundational executable ingredients known
|
|
155 |
from functional programming:
|
29560
|
156 |
\emph{code equations}, \emph{datatypes}, and
|
|
157 |
\emph{type classes}. A code equation as a first approximation
|
28419
|
158 |
is a theorem of the form @{text "f t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n \<equiv> t"}
|
|
159 |
(an equation headed by a constant @{text f} with arguments
|
|
160 |
@{text "t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n"} and right hand side @{text t}).
|
29560
|
161 |
Code generation aims to turn code equations
|
28447
|
162 |
into a functional program. This is achieved by three major
|
|
163 |
components which operate sequentially, i.e. the result of one is
|
|
164 |
the input
|
|
165 |
of the next in the chain, see diagram \ref{fig:arch}:
|
28419
|
166 |
|
|
167 |
\begin{itemize}
|
|
168 |
|
|
169 |
\item Out of the vast collection of theorems proven in a
|
|
170 |
\qn{theory}, a reasonable subset modelling
|
29560
|
171 |
code equations is \qn{selected}.
|
28419
|
172 |
|
|
173 |
\item On those selected theorems, certain
|
|
174 |
transformations are carried out
|
|
175 |
(\qn{preprocessing}). Their purpose is to turn theorems
|
|
176 |
representing non- or badly executable
|
|
177 |
specifications into equivalent but executable counterparts.
|
|
178 |
The result is a structured collection of \qn{code theorems}.
|
|
179 |
|
29560
|
180 |
\item Before the selected code equations are continued with,
|
28419
|
181 |
they can be \qn{preprocessed}, i.e. subjected to theorem
|
|
182 |
transformations. This \qn{preprocessor} is an interface which
|
|
183 |
allows to apply
|
|
184 |
the full expressiveness of ML-based theorem transformations
|
|
185 |
to code generation; motivating examples are shown below, see
|
|
186 |
\secref{sec:preproc}.
|
|
187 |
The result of the preprocessing step is a structured collection
|
29560
|
188 |
of code equations.
|
28419
|
189 |
|
29560
|
190 |
\item These code equations are \qn{translated} to a program
|
28447
|
191 |
in an abstract intermediate language. Think of it as a kind
|
|
192 |
of \qt{Mini-Haskell} with four \qn{statements}: @{text data}
|
29560
|
193 |
(for datatypes), @{text fun} (stemming from code equations),
|
28447
|
194 |
also @{text class} and @{text inst} (for type classes).
|
28419
|
195 |
|
|
196 |
\item Finally, the abstract program is \qn{serialised} into concrete
|
|
197 |
source code of a target language.
|
|
198 |
|
|
199 |
\end{itemize}
|
|
200 |
|
|
201 |
\noindent From these steps, only the two last are carried out outside the logic; by
|
|
202 |
keeping this layer as thin as possible, the amount of code to trust is
|
|
203 |
kept to a minimum.
|
|
204 |
*}
|
28213
|
205 |
|
|
206 |
end
|