(* $Id$ *)

theory Codegen
imports Main
begin

chapter {* Code generation from Isabelle theories *}

section {* Introduction *}

subsection {* Motivation *}

text {*
  Executing formal specifications as programs is a well-established
  topic in the theorem proving community.  With the increasing
  application of theorem proving systems in the area of
  software development and verification, it becomes relevant
  for running test cases and rapid prototyping.  In logical
  calculi like constructive type theory,
  a notion of executability is implicit due to the nature
  of the calculus.  In contrast, specifications in Isabelle/HOL
  can be highly non-executable.  In order to bridge
  the gap between logic and executable specifications,
  an explicit non-trivial transformation has to be applied:
  code generation.

  This tutorial introduces a generic code generator for the
  Isabelle system \cite{isa-tutorial}.
  It is generic in the sense that the
  \qn{target language} for which code shall ultimately be
  generated is not fixed but may be an arbitrary state-of-the-art
  functional programming language (currently, the implementation
  supports SML \cite{web:sml} and Haskell \cite{web:haskell}).
  We aim to provide a
  versatile environment
  suitable for software development and verification,
  structuring the process
  of code generation into a small set of orthogonal principles
  while covering a wide range of application areas
  with maximum flexibility.
*}


subsection {* Overview *}

text {*
  The code generator aims to be usable without further ado
  in most cases while allowing for detailed customization.
  This is reflected in the structure of this tutorial: this introduction
  continues with a short overview of concepts.  Section
  \secref{sec:basics} explains how to use the framework naively,
  presuming a reasonable default setup.  Then, section
  \secref{sec:advanced} deals with advanced topics,
  introducing further aspects of the code generator framework
  in a motivation-driven manner.  Last, section \secref{sec:ml}
  introduces the framework's internal programming interfaces.

  \begin{warn}
    Ultimately, the code generator which this tutorial deals with
    is supposed to replace the already established code generator
    by Stefan Berghofer \cite{Berghofer-Nipkow:2002}.
    So, for the moment, there are two distinct code generators
    in Isabelle.
    Also note that while the framework itself is largely
    object-logic independent, only HOL provides a reasonable
    framework setup.
  \end{warn}
*}


subsection {* Code generation process *}

text {*
  The code generator employs a notion of executability
  for three foundational executable ingredients known
  from functional programming:
  \emph{function equations}, \emph{datatypes}, and
  \emph{type classes}.  A function equation as a first approximation
  is a theorem of the form @{text "f t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n \<equiv> t"}
  (an equation headed by a constant @{text f} with arguments
  @{text "t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n"} and right hand side @{text t}).
  Code generation aims to turn function equations
  into a functional program by running through
  the following process (see figure \ref{fig:process}):

  \begin{itemize}

    \item Out of the vast collection of theorems proven in a
      \qn{theory}, a reasonable subset modeling
      function equations is \qn{selected}.

    \item On those selected theorems, certain
      transformations are carried out
      (\qn{preprocessing}).  Their purpose is to turn theorems
      representing non- or badly executable
      specifications into equivalent but executable counterparts.
      The result is a structured collection of \qn{code theorems}.

    \item These \qn{code theorems} are then extracted
      into a Haskell-like intermediate
      language.

    \item Finally, out of the intermediate language the final
      code in the desired \qn{target language} is \qn{serialized}.

  \end{itemize}

  \begin{figure}[h]
  \centering
  \includegraphics[width=0.3\textwidth]{codegen_process}
  \caption{code generator -- processing overview}
  \label{fig:process}
  \end{figure}

  Of these steps, only the last two are carried out
  outside the logic; by keeping this layer as
  thin as possible, the amount of code to trust is
  kept to a minimum.
*}



section {* Basics \label{sec:basics} *}

subsection {* Invoking the code generator *}

text {*
  Thanks to a reasonable setup of the HOL theories, in
  most cases code generation proceeds without further ado:
*}

consts
  fac :: "nat \<Rightarrow> nat"

primrec
  "fac 0 = 1"
  "fac (Suc n) = Suc n * fac n"

text {*
  This executable specification is now turned into SML code:
*}

code_gen fac (SML "examples/fac.ML")

text {*
  The \isasymCODEGEN command takes a space-separated list of
  constants together with \qn{serialization directives}
  in parentheses.  These start with a \qn{target language}
  identifier, followed by arguments whose semantics
  depend on the target.  In the SML case, the argument is the name
  of the file to write the generated code to.

  Internally, the function equations for all selected
  constants are taken, including any transitively required
  constants, datatypes and classes, resulting in the following
  code:

  \lstsml{Thy/examples/fac.ML}
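
  For readers without the generated file at hand, the following
  hand-written Haskell sketch approximates the shape of such code.
  It is not generator output: it assumes that natural numbers are
  rendered as an ordinary datatype, and the names \texttt{Nat},
  \texttt{plus}, \texttt{times} and \texttt{natToInteger} are chosen
  for illustration only.  It shows how each function equation of
  \texttt{fac} becomes one clause and how the transitively required
  arithmetic is emitted alongside.

```haskell
-- Hand-written approximation, NOT actual code generator output.
-- Assumption: naturals are rendered as an ordinary datatype.
data Nat = Zero | Suc Nat
  deriving (Eq, Show)

-- Addition and multiplication are required transitively by fac.
plus :: Nat -> Nat -> Nat
plus Zero    n = n
plus (Suc m) n = Suc (plus m n)

times :: Nat -> Nat -> Nat
times Zero    _ = Zero
times (Suc m) n = plus n (times m n)

-- One clause per function equation of the primrec definition.
fac :: Nat -> Nat
fac Zero    = Suc Zero
fac (Suc n) = times (Suc n) (fac n)

-- Hypothetical helper for inspecting results; not part of the theory.
natToInteger :: Nat -> Integer
natToInteger Zero    = 0
natToInteger (Suc n) = 1 + natToInteger n
```

  Evaluating \texttt{natToInteger (fac (Suc (Suc (Suc Zero))))} in an
  interpreter is a quick way to check the sketch against the
  specification.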

  The code generator will complain when a required
  ingredient does not provide an executable counterpart.
  This is the case if an involved type is not a datatype:
*}

(*<*)
setup {* Sign.add_path "foo" *}
(*>*)

typedecl 'a foo

definition
  bar :: "'a foo \<Rightarrow> 'a \<Rightarrow> 'a"
  "bar x y = y"

(*<*)
hide type foo
hide const bar

setup {* Sign.parent_path *}

datatype 'a foo = Foo

definition
  bar :: "'a foo \<Rightarrow> 'a \<Rightarrow> 'a"
  "bar x y = y"
(*>*)

code_gen bar (SML "examples/fail_type.ML")

text {*
  \noindent will result in an error.  Likewise, generating code
  for constants not yielding
  a function equation will fail, e.g.~the Hilbert choice
  operation @{text "SOME"}:
*}

(*<*)
setup {* Sign.add_path "foo" *}
(*>*)

definition
  pick_some :: "'a list \<Rightarrow> 'a"
  "pick_some xs = (SOME x. x \<in> set xs)"

(*<*)
hide const pick_some

setup {* Sign.parent_path *}

definition
  pick_some :: "'a list \<Rightarrow> 'a"
  "pick_some = hd"
(*>*)

code_gen pick_some (SML "examples/fail_const.ML")

subsection {* Theorem selection *}

text {*
  The list of all function equations in a theory may be inspected
  using the \isasymPRINTCODETHMS command:
*}

print_codethms

text {*
  \noindent which displays a table of constants with their corresponding
  function equations (the additional information displayed
  need not concern us for the moment).  If this table does
  not provide at least one function
  equation for a constant, the table of primitive definitions is searched
  for one.

  The typical HOL tools are already set up in a way that
  function definitions introduced by \isasymFUN, \isasymFUNCTION,
  \isasymPRIMREC, \isasymRECDEF are implicitly propagated
  to this function equation table.  Specific theorems may be
  selected using an attribute: \emph{code func}.  As an example,
  consider a weight selector function:
*}

consts
  pick :: "(nat \<times> 'a) list \<Rightarrow> nat \<Rightarrow> 'a"

primrec
  "pick (x#xs) n = (let (k, v) = x in
    if n < k then v else pick xs (n - k))"

text {*
  We want to eliminate the explicit destructuring
  of @{term x} into @{term "(k, v)"}:
*}

lemma [code func]:
  "pick ((k, v)#xs) n = (if n < k then v else pick xs (n - k))"
  by simp

code_gen pick (SML "examples/pick1.ML")

text {*
  This theorem is now added to the function equation table:

  \lstsml{Thy/examples/pick1.ML}
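
  The difference between the two equations can be illustrated by a
  hand-written Haskell sketch.  It is not generator output: it
  assumes that naturals are approximated by \texttt{Integer}, the name
  \texttt{pickLet} is hypothetical, and the clauses for the empty list
  are added only to make the partiality explicit (the theory gives no
  equation for that case).

```haskell
-- Hand-written illustration, NOT actual code generator output.
-- Assumption: naturals are approximated by Integer.

-- Shape of the original equation: the pair is taken apart by a let.
pickLet :: [(Integer, a)] -> Integer -> a
pickLet (x : xs) n =
  let (k, v) = x
  in if n < k then v else pickLet xs (n - k)
pickLet [] _ = error "pick: no equation for the empty list"

-- Shape of the code func lemma: the pair is matched in the pattern.
pick :: [(Integer, a)] -> Integer -> a
pick ((k, v) : xs) n = if n < k then v else pick xs (n - k)
pick [] _ = error "pick: no equation for the empty list"
```

  Both functions compute the same results; the second form simply
  moves the destructuring into the argument pattern.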

  It might be convenient to remove the now superfluous original
  equation, using the \emph{code nofunc} attribute:
*}

lemmas [code nofunc] = pick.simps

code_gen pick (SML "examples/pick2.ML")

text {*
  \lstsml{Thy/examples/pick2.ML}

  Syntactic redundancies are implicitly dropped.  For example,
  using a modified version of the @{const fac} function
  as function equation, the original function equations become
  redundant (since they are syntactically subsumed) and
  are dropped, resulting in a warning:
*}

lemma [code func]:
  "fac n = (case n of 0 \<Rightarrow> 1 | Suc m \<Rightarrow> n * fac m)"
  by (cases n) simp_all

code_gen fac (SML "examples/fac_case.ML")

text {*
  \lstsml{Thy/examples/fac_case.ML}

  \begin{warn}
    Some statements in this section have to be treated with some
    caution.  First, since the HOL function package is still
    under development, its setup with respect to code generation
    may differ from what is presumed here.
    Further, the attributes \emph{code} and \emph{code del}
    associated with the existing code generator also apply to
    the new one: \emph{code} implies \emph{code func},
    and \emph{code del} implies \emph{code nofunc}.
  \end{warn}
*}

subsection {* Type classes *}

text {*
  Type classes enter the game via the Isar class package.
  For a short introduction on how to use it, see \fixme[ref];
  here we just illustrate its relation to code generation.

  In a target language, type classes may be represented
  natively (as in the case of Haskell).  For languages
  like SML, they are implemented using \emph{dictionaries}.
  The following example specifies a class \qt{null},
  assigning to each of its inhabitants a \qt{null} value:
*}

class null =
  fixes null :: 'a

consts
  head :: "'a\<Colon>null list \<Rightarrow> 'a"

primrec
  "head [] = null"
  "head (x#xs) = x"

text {*
  We provide some instances for our @{text null}:
*}

instance option :: (type) null
  "null \<equiv> None" ..

instance list :: (type) null
  "null \<equiv> []" ..

text {*
  Constructing a dummy example:
*}

definition
  "dummy = head [Some (Suc 0), None]"

text {*
  Type classes offer a suitable occasion to introduce
  the Haskell serializer.  Its usage is almost the same
  as for SML, but, in accordance with conventions
  enforced by some common Haskell compilers, each module ends
  up in a file of its own, which the file given by the user
  then imports.  The module hierarchy is reflected in
  the file system.
*}

code_gen dummy (Haskell "examples/codegen.hs")
  (* NOTE: you may use Haskell only once in this document *)

text {*
  \lsthaskell{Thy/examples/Codegen.hs}

  (we have left out all other modules).
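
  To make the native representation concrete, here is a hand-written
  Haskell sketch of how the example maps onto a Haskell type class.
  It is not generator output: the names \texttt{Null},
  \texttt{nullValue} and \texttt{headOrNull} are chosen for
  illustration (and to avoid clashes with the Haskell Prelude), and
  naturals are approximated by \texttt{Integer}.

```haskell
-- Hand-written illustration, NOT actual code generator output.
-- The class "null" becomes a native Haskell type class.
class Null a where
  nullValue :: a

-- The two instances declared in the theory.
instance Null (Maybe a) where
  nullValue = Nothing

instance Null [a] where
  nullValue = []

-- The class constraint takes the place of the sort constraint on 'a.
headOrNull :: Null a => [a] -> a
headOrNull []      = nullValue
headOrNull (x : _) = x

-- Assumption: naturals are approximated by Integer.
dummy :: Maybe Integer
dummy = headOrNull [Just 1, Nothing]
```

  The instance to use is resolved by the Haskell type checker, so no
  extra argument appears in \texttt{headOrNull}.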

  The whole code in SML with explicit dictionary passing:
*}

code_gen dummy (SML "examples/class.ML")

text {*
  \lstsml{Thy/examples/class.ML}
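
  The idea of dictionary passing can also be sketched independently
  of the generated file.  The following hand-written Haskell code is
  an illustration under assumptions, not generator output: the names
  \texttt{NullDict}, \texttt{nullOf}, \texttt{nullOption},
  \texttt{nullList} and \texttt{headWith} are hypothetical, and
  naturals are approximated by \texttt{Integer}.  A class becomes a
  record of its operations, an instance becomes a value of that
  record type, and a class constraint becomes an explicit argument.

```haskell
-- Hand-written illustration, NOT actual code generator output.
-- A dictionary for the class "null": one field per class operation.
newtype NullDict a = NullDict { nullOf :: a }

-- Each instance becomes a dictionary value.
nullOption :: NullDict (Maybe a)
nullOption = NullDict Nothing

nullList :: NullDict [a]
nullList = NullDict []

-- The class constraint becomes an explicit dictionary argument.
headWith :: NullDict a -> [a] -> a
headWith d []      = nullOf d
headWith _ (x : _) = x

-- The caller picks the dictionary that matches the element type.
-- Assumption: naturals are approximated by Integer.
dummy :: Maybe Integer
dummy = headWith nullOption [Just 1, Nothing]
```

  Here the choice of dictionary is visible at every call site, which
  is what a language without type classes needs.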
*}

subsection {* Incremental code generation *}

(* print_codethms (\<dots>) and code_gen, 2 *)


section {* Recipes and advanced topics \label{sec:advanced} *}

(* no reference, IsarRef, but see paper *)

subsection {* Library theories *}

subsection {* Preprocessing *}

(* preprocessing, print_codethms () *)

subsection {* Customizing serialization *}

(* existing libraries, understanding the type game, reflexive equations, code inline code_constsubst, code_abstype*)

section {* ML interfaces \label{sec:ml} *}

(* under development *)

subsection {* codegen\_data.ML *}

subsection {* Implementing code generator applications *}

(* hooks *)

end