28213
|
1 |
theory Introduction
|
|
2 |
imports Setup
|
|
3 |
begin
|
|
4 |
|
|
5 |
chapter {* Code generation from @{text "Isabelle/HOL"} theories *}
|
|
6 |
|
|
7 |
section {* Introduction and Overview *}
|
|
8 |
|
|
9 |
text {*
|
|
10 |
This tutorial introduces a generic code generator for the
|
28419
|
11 |
@{text Isabelle} system.
|
|
12 |
Generic in the sense that the
|
|
13 |
\qn{target language} for which code shall ultimately be
|
|
14 |
generated is not fixed but may be an arbitrary state-of-the-art
|
|
15 |
functional programming language (currently, the implementation
|
|
16 |
supports @{text SML} \cite{SML}, @{text OCaml} \cite{OCaml} and @{text Haskell}
|
|
17 |
\cite{haskell-revised-report}).
|
|
18 |
|
|
19 |
Conceptually the code generator framework is part
|
28428
|
20 |
of Isabelle's @{theory Pure} meta logic framework; the logic
|
|
21 |
@{theory HOL} which is an extension of @{theory Pure}
|
28419
|
22 |
already comes with a reasonable framework setup and thus provides
|
|
23 |
a good working horse for raising code-generation-driven
|
|
24 |
applications. So, we assume some familiarity and experience
|
28428
|
25 |
with the ingredients of the @{theory HOL} distribution theories.
|
28419
|
26 |
(see also \cite{isa-tutorial}).
|
|
27 |
|
|
28 |
The code generator aims to be usable with no further ado
|
|
29 |
in most cases while allowing for detailed customisation.
|
|
30 |
This manifests in the structure of this tutorial: after a short
|
|
31 |
conceptual introduction with an example \secref{sec:intro},
|
|
32 |
we discuss the generic customisation facilities \secref{sec:program}.
|
|
33 |
A further section \secref{sec:adaption} is dedicated to the matter of
|
|
34 |
\qn{adaption} to specific target language environments. After some
|
|
35 |
further issues \secref{sec:further} we conclude with an overview
|
|
36 |
of some ML programming interfaces \secref{sec:ml}.
|
|
37 |
|
|
38 |
\begin{warn}
|
|
39 |
Ultimately, the code generator which this tutorial deals with
|
|
40 |
is supposed to replace the already established code generator
|
|
41 |
by Stefan Berghofer \cite{Berghofer-Nipkow:2002}.
|
|
42 |
So, for the moment, there are two distinct code generators
|
|
43 |
in Isabelle.
|
|
44 |
Also note that while the framework itself is
|
28428
|
45 |
object-logic independent, only @{theory HOL} provides a reasonable
|
28419
|
46 |
framework setup.
|
|
47 |
\end{warn}
|
|
48 |
|
28213
|
49 |
*}
|
|
50 |
|
28419
|
51 |
subsection {* Code generation via shallow embedding \label{sec:intro} *}
|
|
52 |
|
|
53 |
text {*
|
|
54 |
The key concept for understanding @{text Isabelle}'s code generation is
|
|
55 |
\emph{shallow embedding}, i.e.~logical entities like constants, types and
|
|
56 |
classes are identified with corresponding concepts in the target language.
|
|
57 |
|
28428
|
58 |
Inside @{theory HOL}, the @{command datatype} and
|
28419
|
59 |
@{command definition}/@{command primrec}/@{command fun} declarations form
|
|
60 |
the core of a functional programming language. The default code generator setup
|
|
61 |
allows to turn those into functional programs immediately.
|
|
62 |
|
|
63 |
This means that \qt{naive} code generation can proceed without further ado.
|
|
64 |
For example, here a simple \qt{implementation} of amortised queues:
|
|
65 |
*}
|
|
66 |
|
|
67 |
datatype %quoteme 'a queue = Queue "'a list" "'a list"
|
|
68 |
|
|
69 |
definition %quoteme empty :: "'a queue" where
|
|
70 |
"empty = Queue [] []"
|
|
71 |
|
|
72 |
primrec %quoteme enqueue :: "'a \<Rightarrow> 'a queue \<Rightarrow> 'a queue" where
|
|
73 |
"enqueue x (Queue xs ys) = Queue (x # xs) ys"
|
|
74 |
|
|
75 |
fun %quoteme dequeue :: "'a queue \<Rightarrow> 'a option \<times> 'a queue" where
|
|
76 |
"dequeue (Queue [] []) = (None, Queue [] [])"
|
|
77 |
| "dequeue (Queue xs (y # ys)) = (Some y, Queue xs ys)"
|
|
78 |
| "dequeue (Queue xs []) = (case rev xs of y # ys \<Rightarrow> (Some y, Queue [] ys))"
|
|
79 |
|
|
80 |
text {* \noindent Then we can generate code e.g.~for @{text SML} as follows: *}
|
28213
|
81 |
|
28419
|
82 |
export_code %quoteme empty dequeue enqueue in SML module_name Example file "examples/example.ML"
|
|
83 |
|
|
84 |
text {* \noindent resulting in the following code: *}
|
|
85 |
|
|
86 |
text %quoteme {*@{code_stmts empty enqueue dequeue (SML)}*}
|
|
87 |
|
|
88 |
text {*
|
|
89 |
\noindent The @{command export_code} command takes a space-separated list of
|
|
90 |
constants for which code shall be generated; anything else needed for those
|
|
91 |
is added implicitly. Then follows by a target language identifier
|
|
92 |
(@{text SML}, @{text OCaml} or @{text Haskell}) and a freely chosen module name.
|
|
93 |
A file name denotes the destination to store the generated code. Note that
|
|
94 |
the semantics of the destination depends on the target language: for
|
|
95 |
@{text SML} and @{text OCaml} it denotes a \emph{file}, for @{text Haskell}
|
|
96 |
it denotes a \emph{directory} where a file named as the module name
|
|
97 |
(with extension @{text ".hs"}) is written:
|
|
98 |
*}
|
|
99 |
|
28420
|
100 |
export_code %quoteme empty dequeue enqueue in Haskell module_name Example file "examples/"
|
28419
|
101 |
|
|
102 |
text {*
|
|
103 |
\noindent This is how the corresponding code in @{text Haskell} looks like:
|
|
104 |
*}
|
|
105 |
|
|
106 |
text %quoteme {*@{code_stmts empty enqueue dequeue (Haskell)}*}
|
|
107 |
|
|
108 |
text {*
|
|
109 |
\noindent This demonstrates the basic usage of the @{command export_code} command;
|
|
110 |
for more details see \secref{sec:serializer}.
|
|
111 |
*}
|
28213
|
112 |
|
|
113 |
subsection {* Code generator architecture *}
|
|
114 |
|
28419
|
115 |
text {*
|
|
116 |
What you have seen so far should be already enough in a lot of cases. If you
|
|
117 |
are content with this, you can quit reading here. Anyway, in order to customise
|
|
118 |
and adapt the code generator, it is inevitable to gain some understanding
|
|
119 |
how it works.
|
|
120 |
|
|
121 |
\begin{figure}[h]
|
|
122 |
\centering
|
|
123 |
\includegraphics[width=0.7\textwidth]{codegen_process}
|
|
124 |
\caption{Code generator architecture}
|
|
125 |
\label{fig:arch}
|
|
126 |
\end{figure}
|
|
127 |
|
|
128 |
The code generator itself consists of three major components
|
|
129 |
which operate sequentially, i.e. the result of one is the input
|
|
130 |
of the next in the chain, see diagram \ref{fig:arch}:
|
|
131 |
|
|
132 |
The code generator employs a notion of executability
|
|
133 |
for three foundational executable ingredients known
|
|
134 |
from functional programming:
|
|
135 |
\emph{defining equations}, \emph{datatypes}, and
|
|
136 |
\emph{type classes}. A defining equation as a first approximation
|
|
137 |
is a theorem of the form @{text "f t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n \<equiv> t"}
|
|
138 |
(an equation headed by a constant @{text f} with arguments
|
|
139 |
@{text "t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n"} and right hand side @{text t}).
|
|
140 |
Code generation aims to turn defining equations
|
|
141 |
into a functional program by running through the following
|
|
142 |
process:
|
|
143 |
|
|
144 |
\begin{itemize}
|
|
145 |
|
|
146 |
\item Out of the vast collection of theorems proven in a
|
|
147 |
\qn{theory}, a reasonable subset modelling
|
|
148 |
defining equations is \qn{selected}.
|
|
149 |
|
|
150 |
\item On those selected theorems, certain
|
|
151 |
transformations are carried out
|
|
152 |
(\qn{preprocessing}). Their purpose is to turn theorems
|
|
153 |
representing non- or badly executable
|
|
154 |
specifications into equivalent but executable counterparts.
|
|
155 |
The result is a structured collection of \qn{code theorems}.
|
|
156 |
|
|
157 |
\item Before the selected defining equations are continued with,
|
|
158 |
they can be \qn{preprocessed}, i.e. subjected to theorem
|
|
159 |
transformations. This \qn{preprocessor} is an interface which
|
|
160 |
allows to apply
|
|
161 |
the full expressiveness of ML-based theorem transformations
|
|
162 |
to code generation; motivating examples are shown below, see
|
|
163 |
\secref{sec:preproc}.
|
|
164 |
The result of the preprocessing step is a structured collection
|
|
165 |
of defining equations.
|
|
166 |
|
|
167 |
\item These defining equations are \qn{translated} to a program
|
|
168 |
in an abstract intermediate language.
|
|
169 |
|
|
170 |
\item Finally, the abstract program is \qn{serialised} into concrete
|
|
171 |
source code of a target language.
|
|
172 |
|
|
173 |
\end{itemize}
|
|
174 |
|
|
175 |
\noindent From these steps, only the two last are carried out outside the logic; by
|
|
176 |
keeping this layer as thin as possible, the amount of code to trust is
|
|
177 |
kept to a minimum.
|
|
178 |
*}
|
28213
|
179 |
|
|
180 |
end
|