(* $Id$ *)

(*<*)
theory Codegen
imports Main
uses "../../../IsarImplementation/Thy/setup.ML"
begin

ML {*
  CodegenSerializer.sml_code_width := 74;
*}

(*>*)

chapter {* Code generation from Isabelle theories *}

section {* Introduction *}

subsection {* Motivation *}

text {*
  Executing formal specifications as programs is a well-established
  topic in the theorem proving community. With increasing
  application of theorem proving systems in the area of
  software development and verification, its relevance becomes
  apparent in running test cases and rapid prototyping. In logical
  calculi like constructive type theory,
  a notion of executability is implicit due to the nature
  of the calculus. In contrast, specifications in Isabelle/HOL
  can be highly non-executable. In order to bridge
  the gap between logic and executable specifications,
  an explicit non-trivial transformation has to be applied:
  code generation.

  This tutorial introduces a generic code generator for the
  Isabelle system \cite{isa-tutorial}.
  It is generic in the sense that the
  \qn{target language} for which code shall ultimately be
  generated is not fixed but may be an arbitrary state-of-the-art
  functional programming language (currently, the implementation
  supports SML \cite{web:sml} and Haskell \cite{web:haskell}).
  We aim to provide a
  versatile environment
  suitable for software development and verification,
  structuring the process
  of code generation into a small set of orthogonal principles
  while achieving broad coverage of application areas
  with maximum flexibility.

  Readers are assumed to have some familiarity and experience
  with the ingredients
  of the HOL \emph{Main} theory.
*}


subsection {* Overview *}

text {*
  The code generator aims to be usable with no further ado
  in most cases while allowing for detailed customization.
  This manifests in the structure of this tutorial: this introduction
  continues with a short introduction to concepts. Section
  \secref{sec:basics} explains how to use the framework naively,
  presuming a reasonable default setup. Then, section
  \secref{sec:advanced} deals with advanced topics,
  introducing further aspects of the code generator framework
  in a motivation-driven manner. Finally, section \secref{sec:ml}
  introduces the framework's internal programming interfaces.

  \begin{warn}
    Ultimately, the code generator which this tutorial deals with
    is supposed to replace the already established code generator
    by Stefan Berghofer \cite{Berghofer-Nipkow:2002}.
    So, for the moment, there are two distinct code generators
    in Isabelle.
    Also note that while the framework itself is largely
    object-logic independent, only HOL provides a reasonable
    framework setup.
  \end{warn}
*}


subsection {* Code generation process *}

text {*
  \begin{figure}[h]
  \centering
  \includegraphics[width=0.7\textwidth]{codegen_process}
  \caption{code generator -- processing overview}
  \label{fig:process}
  \end{figure}

  The code generator employs a notion of executability
  for three foundational executable ingredients known
  from functional programming:
  \emph{function equations}, \emph{datatypes}, and
  \emph{type classes}. A function equation as a first approximation
  is a theorem of the form @{text "f t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n \<equiv> t"}
  (an equation headed by a constant @{text f} with arguments
  @{text "t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n"} and right hand side @{text t}).
  Code generation aims to turn function equations
  into a functional program by running through
  a process (see figure \ref{fig:process}):

  \begin{itemize}

    \item Out of the vast collection of theorems proven in a
      \qn{theory}, a reasonable subset modeling
      function equations is \qn{selected}.

    \item On those selected theorems, certain
      transformations are carried out
      (\qn{preprocessing}). Their purpose is to turn theorems
      representing non-executable or badly executable
      specifications into equivalent but executable counterparts.
      The result is a structured collection of \qn{code theorems}.

    \item These \qn{code theorems} are then extracted
      into a Haskell-like intermediate
      language.

    \item Finally, out of the intermediate language the final
      code in the desired \qn{target language} is \qn{serialized}.

  \end{itemize}

  Of these steps, only the last two are carried out
  outside the logic; by keeping this layer as
  thin as possible, the amount of code to trust is
  kept to a minimum.
*}



section {* Basics \label{sec:basics} *}

subsection {* Invoking the code generator *}

text {*
  Thanks to a reasonable setup of the HOL theories, in
  most cases code generation proceeds without further ado:
*}

consts
  fac :: "nat \<Rightarrow> nat"

primrec
  "fac 0 = 1"
  "fac (Suc n) = Suc n * fac n"

text {*
  This executable specification is now turned into SML code:
*}

code_gen fac (SML "examples/fac.ML")

text {*
  The @{text "\<CODEGEN>"} command takes a space-separated list of
  constants together with \qn{serialization directives}
  in parentheses. These start with a \qn{target language}
  identifier, followed by arguments, their semantics
  depending on the target. In the SML case, a filename
  is given to which the generated code is written.

  Internally, the function equations for all selected
  constants are taken, including any transitively required
  constants, datatypes and classes, resulting in the following
  code:

  \lstsml{Thy/examples/fac.ML}
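
  For orientation, the two function equations for @{const fac} can also
  be read directly as a functional program. The following is a
  hand-written Haskell sketch, not output of the code generator;
  the names \texttt{Nat}, \texttt{plus}, \texttt{times} and \texttt{toInt}
  are assumptions made for this illustration only.

```haskell
module Main where

-- Peano naturals, mirroring the HOL datatype nat with constructors 0 and Suc.
data Nat = Zero | Suc Nat

plus :: Nat -> Nat -> Nat
plus Zero n = n
plus (Suc m) n = Suc (plus m n)

times :: Nat -> Nat -> Nat
times Zero _ = Zero
times (Suc m) n = plus n (times m n)

-- One clause per function equation:
--   fac 0 = 1
--   fac (Suc n) = Suc n * fac n
fac :: Nat -> Nat
fac Zero = Suc Zero
fac (Suc n) = times (Suc n) (fac n)

-- Helper for displaying results (not part of the specification).
toInt :: Nat -> Integer
toInt Zero = 0
toInt (Suc n) = 1 + toInt n

main :: IO ()
main = print (toInt (fac (Suc (Suc (Suc Zero)))))  -- prints 6
```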

  The code generator will complain when a required
  ingredient does not provide an executable counterpart.
  This is the case if an involved type is not a datatype:
*}

(*<*)
setup {* Sign.add_path "foo" *}
(*>*)

typedecl 'a foo

definition
  bar :: "'a foo \<Rightarrow> 'a \<Rightarrow> 'a"
  "bar x y = y"

(*<*)
hide type foo
hide const bar

setup {* Sign.parent_path *}

datatype 'a foo = Foo

definition
  bar :: "'a foo \<Rightarrow> 'a \<Rightarrow> 'a"
  "bar x y = y"
(*>*)

code_gen bar (SML "examples/fail_type.ML")

text {*
  \noindent will result in an error. Likewise, generating code
  for constants not yielding
  a function equation will fail, e.g.~the Hilbert choice
  operation @{text "SOME"}:
*}

(*<*)
setup {* Sign.add_path "foo" *}
(*>*)

definition
  pick_some :: "'a list \<Rightarrow> 'a"
  "pick_some xs = (SOME x. x \<in> set xs)"

(*<*)
hide const pick_some

setup {* Sign.parent_path *}

definition
  pick_some :: "'a list \<Rightarrow> 'a"
  "pick_some = hd"
(*>*)

code_gen pick_some (SML "examples/fail_const.ML")

subsection {* Theorem selection *}

text {*
  The list of all function equations in a theory may be inspected
  using the @{text "\<PRINTCODETHMS>"} command:
*}

print_codethms

text {*
  \noindent which displays a table of constants with corresponding
  function equations (the additional information displayed
  need not concern us for the moment). If this table does
  not provide at least one function
  equation, the table of primitive definitions is searched
  for one.

  The typical HOL tools are already set up in a way that
  function definitions introduced by @{text "\<FUN>"},
  @{text "\<FUNCTION>"}, @{text "\<PRIMREC>"} or
  @{text "\<RECDEF>"} are implicitly propagated
  to this function equation table. Specific theorems may be
  selected using an attribute: \emph{code func}. As an example,
  consider a weight selector function:
*}

consts
  pick :: "(nat \<times> 'a) list \<Rightarrow> nat \<Rightarrow> 'a"

primrec
  "pick (x#xs) n = (let (k, v) = x in
    if n < k then v else pick xs (n - k))"

text {*
  We want to eliminate the explicit destructuring
  of @{term x} to @{term "(k, v)"}:
*}

lemma [code func]:
  "pick ((k, v)#xs) n = (if n < k then v else pick xs (n - k))"
  by simp

code_gen pick (SML "examples/pick1.ML")

text {*
  This theorem is now added to the function equation table:

  \lstsml{Thy/examples/pick1.ML}

  It might be convenient to remove the pointless original
  equation, using the \emph{code nofunc} attribute:
*}

lemmas [code nofunc] = pick.simps

code_gen pick (SML "examples/pick2.ML")

text {*
  \lstsml{Thy/examples/pick2.ML}
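
  The difference between the two formulations of @{const pick} can be
  sketched by hand in Haskell. This is an illustration only, not
  generator output: \texttt{Integer} stands in for @{typ nat} by
  assumption, the names \texttt{pickLet} and \texttt{pickPat} are
  hypothetical, and the clauses for the empty list are added merely
  to make the unspecified case explicit.

```haskell
module Main where

-- Original primrec equation: the pair is taken apart by an explicit let.
pickLet :: [(Integer, a)] -> Integer -> a
pickLet (x : xs) n =
  let (k, v) = x
  in if n < k then v else pickLet xs (n - k)
pickLet [] _ = error "pick: empty list is left unspecified"

-- Equation selected by the code lemma: the pair is matched in the pattern.
pickPat :: [(Integer, a)] -> Integer -> a
pickPat ((k, v) : xs) n = if n < k then v else pickPat xs (n - k)
pickPat [] _ = error "pick: empty list is left unspecified"

main :: IO ()
main = do
  let weighted = [(2, "a"), (3, "b"), (5, "c")]
  -- Both versions agree on every index below the total weight 10.
  print (map (pickLet weighted) [0 .. 9] == map (pickPat weighted) [0 .. 9])  -- prints True
  putStrLn (pickPat weighted 4)  -- prints b
```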

  Syntactic redundancies are implicitly dropped. For example,
  when a modified version of the @{const fac} function is declared
  as function equation, the original function equations
  become redundant (since they are syntactically subsumed)
  and are dropped, resulting in a warning:
*}

lemma [code func]:
  "fac n = (case n of 0 \<Rightarrow> 1 | Suc m \<Rightarrow> n * fac m)"
  by (cases n) simp_all

code_gen fac (SML "examples/fac_case.ML")

text {*
  \lstsml{Thy/examples/fac_case.ML}

  \begin{warn}
    Some statements in this section have to be treated with some
    caution. First, since the HOL function package is still
    under development, its setup with respect to code generation
    may differ from what is presumed here.
    Further, the attributes \emph{code} and \emph{code del}
    associated with the existing code generator also apply to
    the new one: \emph{code} implies \emph{code func},
    and \emph{code del} implies \emph{code nofunc}.
  \end{warn}
*}

subsection {* Type classes *}

text {*
  Type classes enter the game via the Isar class package.
  For a short introduction on how to use it, see \cite{isabelle-classes};
  here we just illustrate its impact on code generation.

  In a target language, type classes may be represented
  natively (as in the case of Haskell). For languages
  like SML, they are implemented using \emph{dictionaries}.
  The following example specifies a class \qt{null},
  assigning to each of its inhabitants a \qt{null} value:
*}

class null =
  fixes null :: 'a

consts
  head :: "'a\<Colon>null list \<Rightarrow> 'a"

primrec
  "head [] = null"
  "head (x#xs) = x"

text {*
  We provide some instances for our @{text null}:
*}

instance option :: (type) null
  "null \<equiv> None" ..

instance list :: (type) null
  "null \<equiv> []" ..

text {*
  Constructing a dummy example:
*}

definition
  "dummy = head [Some (Suc 0), None]"

text {*
  Type classes offer a suitable occasion to introduce
  the Haskell serializer. Its usage is almost the same
  as for SML, but, in accordance with conventions
  some Haskell systems enforce, each module ends
  up in a file of its own. The module hierarchy is reflected in
  the file system, with root given by the user.
*}

code_gen dummy (Haskell "examples/")
  (* NOTE: you may use Haskell only once in this document, otherwise
  you have to work in distinct subdirectories *)

text {*
  \lsthaskell{Thy/examples/Codegen.hs}

  (we have left out all other modules).

  The whole code in SML with explicit dictionary passing:
*}

code_gen dummy (SML "examples/class.ML")

text {*
  \lstsml{Thy/examples/class.ML}
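
  To make the contrast between native type classes and dictionaries
  concrete, here is a hand-written Haskell sketch showing both styles
  side by side. It is an illustration under assumptions, not generator
  output: the names \texttt{nullValue}, \texttt{NullDict},
  \texttt{nullMaybe}, \texttt{headNative} and \texttt{headDict} are
  hypothetical, and \texttt{Integer} stands in for @{typ nat}.

```haskell
module Main where

-- Native representation: the class null becomes a Haskell type class.
class Null a where
  nullValue :: a

instance Null (Maybe a) where
  nullValue = Nothing

instance Null [a] where
  nullValue = []

headNative :: Null a => [a] -> a
headNative [] = nullValue
headNative (x : _) = x

-- Dictionary representation, as needed in a language without type classes:
-- the class becomes a record of its operations, passed as an extra argument.
newtype NullDict a = NullDict { nullOf :: a }

nullMaybe :: NullDict (Maybe a)
nullMaybe = NullDict Nothing

headDict :: NullDict a -> [a] -> a
headDict dict [] = nullOf dict
headDict _ (x : _) = x

main :: IO ()
main = do
  -- corresponds to: dummy = head [Some (Suc 0), None]
  print (headNative [Just (1 :: Integer), Nothing])   -- prints Just 1
  print (headDict nullMaybe ([] :: [Maybe Integer]))  -- prints Nothing
```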
*}

subsection {* Incremental code generation *}

text {*
  Code generation is \emph{incremental}: theorems
  and abstract intermediate code are cached and extended on demand.
  The cache may be partially or fully dropped if the underlying
  executable content of the theory changes.
  The implementation of caching is supposed to hide
  the details from the user transparently. Nevertheless, caching
  reaches the surface through a slightly more general form
  of the @{text "\<CODEGEN>"} command: either the list of constants or the
  list of serialization expressions may be dropped. If no
  serialization expressions are given, only abstract code
  is generated and cached; if no constants are given, the
  current cache is serialized.

  For explorative purposes, an extended version of the
  @{text "\<PRINTCODETHMS>"} command may prove useful:
*}

print_codethms ()

text {*
  \noindent prints all cached function equations (i.e.~\emph{after}
  any applied transformation). Inside the parentheses a
  list of constants may be given; their function
  equations are added to the cache if not already present.
*}


section {* Recipes and advanced topics \label{sec:advanced} *}

text {*
  In this tutorial, we do not attempt to give an exhaustive
  description of the code generator framework; instead,
  we shed light on advanced topics by introducing
  them together with practically motivated examples. Concerning
  further reading, see

  \begin{itemize}

  \item the Isabelle/Isar Reference Manual \cite{isabelle-isar-ref}
    for exhaustive syntax diagrams.
  \item or \fixme[ref] which deals with foundational issues
    of the code generator framework.

  \end{itemize}
*}

subsection {* Library theories *}

text {*
  The HOL \emph{Main} theory already provides a code generator setup
  which should be suitable for most applications. Common extensions
  and modifications are available through certain theories of the HOL
  library; besides being useful in applications, they may serve
  as a tutorial for customizing the code generator setup.

  \begin{description}

    \item[@{theory "ExecutableSet"}] allows generating code
       for finite sets using lists.
    \item[@{theory "ExecutableRat"}] \label{exec_rat} implements rational
       numbers as triples @{text "(sign, numerator, denominator)"}.
    \item[@{theory "EfficientNat"}] \label{eff_nat} implements natural numbers by integers,
       which in general will result in higher efficiency; pattern
       matching with @{const "0\<Colon>nat"} / @{const "Suc"}
       is eliminated.
    \item[@{theory "MLString"}] provides an additional datatype @{text "mlstring"};
       in the HOL default setup, strings in HOL are mapped to lists
       of chars in SML; values of type @{text "mlstring"} are
       mapped to strings in SML.

  \end{description}
*}

subsection {* Preprocessing *}

text {*
  Before selected function theorems are turned into abstract
  code, a chain of definitional transformation steps is carried
  out: \emph{preprocessing}. There are three ways
  to customize preprocessing: \emph{inline theorems},
  \emph{inline procedures} and \emph{generic preprocessors}.

  \emph{Inline theorems} are rewriting rules applied to each
  function equation. Due to the interpretation of theorems
  as function equations, rewrites are applied to the right
  hand side and the arguments of the left hand side of an
  equation, but never to the constant heading the left hand side.
  Inline theorems may be declared and undeclared using the
  \emph{code inline} or \emph{code noinline} attribute respectively.

  Some common applications:
*}

text_raw {*
  \begin{itemize}
     \item replacing non-executable constructs by executable ones: \\
*}

lemma [code inline]:
  "x \<in> set xs \<longleftrightarrow> x mem xs" by (induct xs) simp_all

text_raw {*
     \item eliminating superfluous constants: \\
*}

lemma [code inline]:
  "1 = Suc 0" by simp

text_raw {*
     \item replacing executable but inconvenient constructs: \\
*}

lemma [code inline]:
  "xs = [] \<longleftrightarrow> List.null xs" by (induct xs) simp_all

text_raw {*
  \end{itemize}
*}

text {*
  The current set of inline theorems may be inspected using
  the @{text "\<PRINTCODETHMS>"} command.
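
  The third inline theorem above may look like a mere change of
  notation. One plausible reading of \qt{inconvenient} is that a
  comparison with the empty list by equality demands an equality
  operation on the element type, whereas a test by pattern matching
  does not. The following hand-written Haskell sketch illustrates
  this point; the names \texttt{isEmptyByEq} and \texttt{isEmptyByNull}
  are hypothetical and the code is not generator output.

```haskell
module Main where

-- Comparing with [] by equality needs an Eq instance for the elements.
isEmptyByEq :: Eq a => [a] -> Bool
isEmptyByEq xs = xs == []

-- Testing by pattern matching (here via the Prelude function null)
-- works for every element type.
isEmptyByNull :: [a] -> Bool
isEmptyByNull xs = null xs

main :: IO ()
main = do
  print (isEmptyByEq ([] :: [Integer]))  -- prints True
  -- Functions have no Eq instance, so only the second test applies here.
  print (isEmptyByNull [not])            -- prints False
```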

  \emph{Inline procedures} are a generalized version of inline
  theorems written in ML -- rewrite rules are generated dependent
  on the function theorems for a certain function. One
  application is the implicit expansion of @{typ nat} numerals
  to @{const "0\<Colon>nat"} / @{const Suc} representation. See further
  \secref{sec:ml}.

  \emph{Generic preprocessors} provide the most general interface,
  transforming a list of function theorems to another
  list of function theorems, provided that neither the heading
  constant nor its type change. The @{const "0\<Colon>nat"} / @{const Suc}
  pattern elimination implemented in
  theory @{theory "EfficientNat"} (\secref{eff_nat}) uses this
  interface.

  \begin{warn}
    The order in which single preprocessing steps are carried
    out currently is not specified; in particular, preprocessing
    is \emph{not} a fixed-point process. Keep this in mind when
    setting up the preprocessor.

    Further, the attribute \emph{code unfold}
    associated with the existing code generator also applies to
    the new one: \emph{code unfold} implies \emph{code inline}.
  \end{warn}
*}

subsection {* Customizing serialization *}

text {*
  Consider the following function and its corresponding
  SML code:
*}

fun
  in_interval :: "nat \<times> nat \<Rightarrow> nat \<Rightarrow> bool" where
  "in_interval (k, l) n \<longleftrightarrow> k \<le> n \<and> n \<le> l"
(*<*)
declare in_interval.simps [code func]
(*>*)

(*<*)
code_type %tt bool
  (SML)
code_const %tt True and False and "op \<and>" and Not
  (SML and and and)
(*>*)

code_gen in_interval (SML "examples/bool_literal.ML")

text {*
  \lstsml{Thy/examples/bool_literal.ML}

  Though this is correct code, it is a little bit unsatisfactory:
  boolean values and operators are materialized as distinguished
  entities which have nothing to do with the SML-builtin notion
  of \qt{bool}. This results in less readable code;
  additionally, eager evaluation may cause programs to
  loop or break which would terminate perfectly well if
  the existing SML \qt{bool} were used. To map
  the HOL \qt{bool} on SML \qt{bool}, we may use
  \qn{custom serializations}:
*}

code_type %tt bool
  (SML "bool")
code_const %tt True and False and "op \<and>"
  (SML "true" and "false" and "_ andalso _")

text {*
  The @{text "\<CODETYPE>"} command takes a type constructor
  as argument together with a list of custom serializations.
  Each custom serialization starts with a target language
  identifier followed by an expression, which during
  code serialization is inserted whenever the type constructor
  would occur. For constants, @{text "\<CODECONST>"} implements
  the corresponding mechanism. Each ``@{verbatim "_"}'' in
  a serialization expression is treated as a placeholder
  for the type constructor's (the constant's) arguments.
*}

code_reserved SML
  bool true false

text {*
  To ensure that the existing names \qt{bool}, \qt{true} and \qt{false}
  are not used for generated code, we use @{text "\<CODERESERVED>"}.

  After this setup, code looks considerably more readable:
*}

code_gen in_interval (SML "examples/bool_mlbool.ML")

text {*
  \lstsml{Thy/examples/bool_mlbool.ML}

  This still is not perfect: the parentheses
  around the \qt{andalso} expression are superfluous.
  Though the serializer
  by no means attempts to imitate the rich Isabelle syntax
  framework, it provides some common idioms, notably
  associative infixes with precedences which may be used here:
*}

code_const %tt "op \<and>"
  (SML infixl 1 "andalso")

code_gen in_interval (SML "examples/bool_infix.ML")

text {*
  \lstsml{Thy/examples/bool_infix.ML}

  Next, we try to map HOL pairs to SML pairs, using the
  infix ``@{verbatim "*"}'' type constructor and parentheses:
*}

(*<*)
code_type *
  (SML)
code_const Pair
  (SML)
(*>*)

code_type %tt *
  (SML infix 2 "*")

code_const %tt Pair
  (SML "!((_),/ (_))")

text {*
  The initial bang ``@{verbatim "!"}'' tells the serializer to never put
  parentheses around the whole expression (they are already present),
  while the parentheses around argument placeholders
  tell it not to put parentheses around the arguments.
  The slash ``@{verbatim "/"}'' (followed by arbitrary white space)
  inserts a space which may be used as a break if necessary
  during pretty printing.
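
  With @{typ bool} and pairs mapped to native constructs of the target
  language, the generated code should come close to what one would
  write by hand. As a point of comparison, here is a hand-written
  Haskell version of the interval test. It is an illustration only,
  not generator output; \texttt{Integer} stands in for @{typ nat} by
  assumption and the name \texttt{inInterval} is hypothetical.

```haskell
module Main where

-- Native pairs, native booleans and the built-in infix conjunction.
inInterval :: (Integer, Integer) -> Integer -> Bool
inInterval (k, l) n = k <= n && n <= l

main :: IO ()
main = print (map (inInterval (2, 4)) [1 .. 5])
-- prints [False,True,True,True,False]
```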

  So far, we have only provided more idiomatic serializations for
  constructs which would be executable on their own. Target-specific
  serializations may also be used to \emph{implement} constructs
  which have no implicit notion of executability. For example,
  take the HOL integers:
*}

definition
  double_inc :: "int \<Rightarrow> int"
  "double_inc k = 2 * k + 1"

code_gen double_inc (SML "examples/integers.ML")

text {*
  will fail: @{typ int} in HOL is implemented using a quotient
  type, which does not provide any notion of executability\footnote{Eventually,
  we also want to provide executability
  for quotients.}. However, we could use the SML builtin
  integers:
*}

code_type %tt int
  (SML "IntInf.int")

code_const %tt "op + \<Colon> int \<Rightarrow> int \<Rightarrow> int"
    and "op * \<Colon> int \<Rightarrow> int \<Rightarrow> int"
  (SML "IntInf.+ (_, _)" and "IntInf.* (_, _)")

code_gen double_inc (SML "examples/integers.ML")

text {*
  resulting in:

  \lstsml{Thy/examples/integers.ML}
*}

text {*
  These examples give a glimpse of the powerful mechanisms
  custom serializations provide; however, their usage
  requires careful thinking in order not to introduce
  inconsistencies -- or, in other words:
  custom serializations are completely axiomatic.

  A further noteworthy detail is that any special
  character in a custom serialization may be quoted
  using ``@{verbatim "'"}''; thus, in
  ``@{verbatim "fn '_ => _"}'' the first
  ``@{verbatim "_"}'' is a proper underscore while the
  second ``@{verbatim "_"}'' is a placeholder.

  The HOL theories provide further
  examples for custom serializations and form
  a recommended tutorial on how to use them properly.
*}

subsection {* Concerning operational equality *}

text {*
  Surely you have already noticed how equality is treated
  by the code generator:
*}

fun
  collect_duplicates :: "'a list \<Rightarrow> 'a list \<Rightarrow> 'a list \<Rightarrow> 'a list" where
    "collect_duplicates xs ys [] = xs"
    "collect_duplicates xs ys (z#zs) = (if z \<in> set xs
      then if z \<in> set ys
        then collect_duplicates xs ys zs
        else collect_duplicates xs (z#ys) zs
      else collect_duplicates (z#xs) (z#ys) zs)"
(*<*)
lemmas [code func] = collect_duplicates.simps
(*>*)

text {*
  During preprocessing, the membership test is rewritten,
  resulting in @{const List.memberl}, which itself
  performs an explicit equality check.
*}

code_gen collect_duplicates (SML "examples/collect_duplicates.ML")

text {*
  \lstsml{Thy/examples/collect_duplicates.ML}
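
  The role of equality becomes visible in the type of a hand-written
  Haskell counterpart: the membership tests force a class constraint
  on the element type. This sketch is an illustration only, not
  generator output; the name \texttt{collectDuplicates} is hypothetical
  and the Prelude function \texttt{elem} stands in for the membership
  test.

```haskell
module Main where

-- The Eq constraint arises solely from the two membership tests.
collectDuplicates :: Eq a => [a] -> [a] -> [a] -> [a]
collectDuplicates xs _ [] = xs
collectDuplicates xs ys (z : zs)
  | z `elem` xs =
      if z `elem` ys
        then collectDuplicates xs ys zs
        else collectDuplicates xs (z : ys) zs
  | otherwise = collectDuplicates (z : xs) (z : ys) zs

main :: IO ()
main = print (collectDuplicates [] [] [1, 2, 1, 3, 2 :: Integer])
-- prints [3,2,1]
```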
*}

text {*
  Obviously, polymorphic equality is implemented the Haskell
  way using a type class. How is this achieved? By an
  almost trivial definition in the HOL setup:
*}

(*<*)
setup {* Sign.add_path "foo" *}
(*>*)

class eq =
  fixes eq :: "'a \<Rightarrow> 'a \<Rightarrow> bool"

defs
  eq (*[symmetric, code inline, code func]*): "eq \<equiv> (op =)"

text {*
  This merely introduces a class @{text eq} with corresponding
  operation @{const eq}, which by definition is isomorphic
  to @{const "op ="}; the preprocessing framework does the rest.
*}

(*<*)
lemmas [code noinline] = eq

hide (open) "class" eq
hide (open) const eq

lemmas [symmetric, code func] = eq_def

setup {* Sign.parent_path *}
(*>*)

text {*
  For datatypes, instances of @{text eq} are implicitly derived
  when possible.

  Though this class is designed to rarely get in the way, there
  are some cases when it suddenly comes to the surface:
*}

subsubsection {* code lemmas and custom serializations for equality *}

text {*
  Examine the following:
*}

code_const %tt "op = \<Colon> int \<Rightarrow> int \<Rightarrow> bool"
  (SML "!(_ : IntInf.int = _)")

text {*
  What is wrong here? Since @{const "op ="} is nothing other than
  a plain constant, this custom serialization will refer
  to polymorphic equality @{const "op = \<Colon> 'a \<Rightarrow> 'a \<Rightarrow> bool"}.
  Instead, we want the specific equality on @{typ int},
  by using the overloaded constant @{const "Code_Generator.eq"}:
*}

code_const %tt "Code_Generator.eq \<Colon> int \<Rightarrow> int \<Rightarrow> bool"
  (SML "!(_ : IntInf.int = _)")

subsubsection {* typedecls interpreted by custom serializations *}

text {*
  A common idiom is to use unspecified types for formalizations
  and interpret them for a specific target language:
*}

typedecl key

fun
  lookup :: "(key \<times> 'a) list \<Rightarrow> key \<Rightarrow> 'a option" where
    "lookup [] l = None"
    "lookup ((k, v) # xs) l = (if k = l then Some v else lookup xs l)"
(*<*)
lemmas [code func] = lookup.simps
(*>*)

code_type %tt key
  (SML "string")

text {*
  This, though, is not sufficient: @{typ key} is not an instance
  of @{text eq} since @{typ key} is not a datatype; the instance
  has to be declared manually, including a serialization
  for the particular instance of @{const "Code_Generator.eq"}:
*}

instance key :: eq ..

code_const %tt "Code_Generator.eq \<Colon> key \<Rightarrow> key \<Rightarrow> bool"
  (SML "!(_ : string = _)")

text {*
  Then everything goes fine:
*}

code_gen lookup (SML "examples/lookup.ML")

text {*
  \lstsml{Thy/examples/lookup.ML}
|
|
837 |
*}
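
text {*
  For readers without the generated file at hand, the following is a
  hand-written approximation of the SML code the setup above aims at.
  It is a sketch under assumptions, not the actual generator output:
  the name @{text "eq_key"} is hypothetical, and the sketch simply uses
  SML's own @{text option} type for the HOL option type.

```sml
(* Hand-written approximation; the real generator output may differ. *)
(* The HOL type key is represented by SML strings. *)
type key = string

(* Equality on keys, following the template "!(_ : string = _)". *)
fun eq_key (k : key) (l : key) : bool = (k : string) = l

(* Association-list lookup, comparing keys with the string equality. *)
fun lookup [] (l : key) = NONE
  | lookup ((k, v) :: xs) l =
      if eq_key k l then SOME v else lookup xs l

val found = lookup [("a", 1), ("b", 2)] "b"
val missing = lookup [("a", 1), ("b", 2)] "c"
```

  The values stored in the list stay fully polymorphic; only the keys
  are tied to @{text string} and its equality.
*}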

subsubsection {* lexicographic orderings and coregularity *}

text {*
  Another subtlety
  enters the stage when definitions of overloaded constants
  depend on operational equality. For example, let
  us define a lexicographic ordering on tuples:
*}

(*<*)
setup {* Sign.add_path "foobar" *}

class eq = fixes eq :: "'a \<Rightarrow> 'a \<Rightarrow> bool"
class ord =
  fixes less_eq :: "'a \<Rightarrow> 'a \<Rightarrow> bool" ("(_/ \<^loc>\<le> _)" [50, 51] 50)
  fixes less :: "'a \<Rightarrow> 'a \<Rightarrow> bool" ("(_/ \<^loc>< _)" [50, 51] 50)
(*>*)

instance * :: (ord, ord) ord
  "p1 < p2 \<equiv> let (x1 \<Colon> 'a\<Colon>ord, y1 \<Colon> 'b\<Colon>ord) = p1; (x2, y2) = p2 in
    x1 < x2 \<or> (x1 = x2 \<and> y1 < y2)"
  "p1 \<le> p2 \<equiv> p1 < p2 \<or> (p1 \<Colon> 'a\<Colon>ord \<times> 'b\<Colon>ord) = p2" ..

(*<*)
hide "class" eq ord
hide const eq less_eq less
setup {* Sign.parent_path *}
(*>*)

text {*
  Then code generation will fail. Why? The definition
  of @{const "op \<le>"} depends on equality on both arguments,
  which are polymorphic and impose an additional @{text eq}
  class constraint, thus violating the type discipline
  for class operations.

  The solution is to add @{text eq} to both sort arguments:
*}

instance * :: ("{eq, ord}", "{eq, ord}") ord
  "p1 < p2 \<equiv> let (x1 \<Colon> 'a\<Colon>{eq, ord}, y1 \<Colon> 'b\<Colon>{eq, ord}) = p1; (x2, y2) = p2 in
    x1 < x2 \<or> (x1 = x2 \<and> y1 < y2)"
  "p1 \<le> p2 \<equiv> p1 < p2 \<or> (p1 \<Colon> 'a\<Colon>{eq, ord} \<times> 'b\<Colon>{eq, ord}) = p2" ..

text {*
  Then code generation succeeds:
*}

code_gen "op \<le> \<Colon> 'a\<Colon>{eq, ord} \<times> 'b\<Colon>{eq, ord} \<Rightarrow> 'a \<times> 'b \<Rightarrow> bool"
  (SML "examples/lexicographic.ML")

text {*
  \lstsml{Thy/examples/lexicographic.ML}
*}
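
text {*
  Why both sort arguments need @{text eq} can be seen from a
  hand-written dictionary-passing sketch in SML. This is only an
  illustration of the idea, not the generated code: the record types
  and the names @{text "less_prod"}, @{text "less_eq_prod"},
  @{text "eq_int"} and @{text "ord_int"} are all hypothetical.

```sml
(* Hand-written sketch of dictionary passing; all names are hypothetical. *)
type 'a eq = {eq : 'a -> 'a -> bool}
type 'a ord = {less_eq : 'a -> 'a -> bool, less : 'a -> 'a -> bool}

(* Strict lexicographic order: comparing first components needs equality. *)
fun less_prod (eqA : 'a eq, ordA : 'a ord) (ordB : 'b ord) (x1, y1) (x2, y2) =
  #less ordA x1 x2 orelse (#eq eqA x1 x2 andalso #less ordB y1 y2)

(* Non-strict order: additionally needs equality on both components. *)
fun less_eq_prod (eqA : 'a eq, ordA : 'a ord) (eqB : 'b eq, ordB : 'b ord)
    (x1, y1) (x2, y2) =
  less_prod (eqA, ordA) ordB (x1, y1) (x2, y2)
  orelse (#eq eqA x1 x2 andalso #eq eqB y1 y2)

(* Example dictionaries for int. *)
val eq_int : int eq = {eq = fn x => fn y => x = y}
val ord_int : int ord =
  {less_eq = fn x => fn y => x <= y, less = fn x => fn y => x < y}

val r1 = less_eq_prod (eq_int, ord_int) (eq_int, ord_int) (1, 5) (2, 0)
val r2 = less_eq_prod (eq_int, ord_int) (eq_int, ord_int) (2, 1) (2, 0)
val r3 = less_eq_prod (eq_int, ord_int) (eq_int, ord_int) (2, 0) (2, 0)
```

  In this sketch the ordering on pairs cannot be built from the two
  @{text ord} dictionaries alone; the @{text eq} dictionaries have to be
  supplied as well, which corresponds to the additional sort constraint.
*}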

subsubsection {* Haskell serialization *}

text {*
  For convenience, the default
  HOL setup for Haskell maps the @{text eq} class to
  its counterpart in Haskell, giving custom serializations
  for the class (@{text "\<CODECLASS>"}) and its operation:
*}

(*<*)
setup {* Sign.add_path "bar" *}
class eq = fixes eq :: "'a \<Rightarrow> 'a \<Rightarrow> bool"
(*>*)

code_class %tt eq
  (Haskell "Eq" where eq \<equiv> "(==)")

code_const %tt eq
  (Haskell infixl 4 "==")

(*<*)
hide "class" eq
hide const eq
setup {* Sign.parent_path *}
(*>*)

text {*
  A problem now occurs whenever a type which
  is an instance of @{text eq} in HOL is mapped
  to a Haskell built-in type which is also an instance
  of Haskell @{text Eq}:
*}

typedecl bar

instance bar :: eq ..

code_type %tt bar
  (Haskell "Integer")

text {*
  The code generator would produce
  an additional instance, which of course is rejected.
  To suppress this additional instance, use
  @{text "\<CODEINSTANCE>"}:
*}

code_instance %tt bar :: eq
  (Haskell -)

subsection {* Types matter *}

text {*
  Imagine the following quick-and-dirty setup for implementing
  some kind of sets as lists in SML:
*}

code_type %tt set
  (SML "_ list")

code_const %tt "{}" and insert
  (SML "![]" and infixl 7 "::")

definition
  dummy_set :: "(nat \<Rightarrow> nat) set"
  "dummy_set = {Suc}"

text {*
  Then code generation for @{const dummy_set} will fail.
  Why? A look at the function equations gives a hint:
*}

print_codethms (insert)

text {*
  This reveals the function equation @{thm insert_def}
  for @{const insert}, which is operationally meaningless
  but forces an equality constraint on the set members
  (which is not satisfiable if the set members are functions).
  Even when using a set of natural numbers (which are an instance
  of \emph{eq}), we run into a problem:
*}

definition
  foobar_set :: "nat set"
  "foobar_set = {0, 1, 2}"

text {*
  In this case the serializer would complain that @{const insert}
  expects dictionaries (namely an \emph{eq} dictionary) but
  has also been given a customary serialization.

  The solution to this dilemma:
*}

lemma [code func]:
  "insert = insert" ..

code_gen dummy_set foobar_set (SML "examples/dirty_set.ML")

text {*
  \lstsml{Thy/examples/dirty_set.ML}

  Reflexive function equations by convention are dropped.
  But their presence prevents primitive definitions from being
  used as function equations:
*}

print_codethms (insert)

text {*
  will show \emph{no} function equations for insert.

  Note that the sort constraints of reflexive equations
  are considered; so
*}

lemma [code func]:
  "(insert \<Colon> 'a\<Colon>eq \<Rightarrow> 'a set \<Rightarrow> 'a set) = insert" ..

text {*
  would mean nothing other than introducing the evil
  sort constraint by hand.
*}
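
text {*
  The intention behind the quick-and-dirty setup can be sketched by
  hand in SML. This is an assumption about the shape of the result,
  not the content of the generated file; in particular the datatype
  @{text nat} below is a stand-in for whatever representation of
  natural numbers the generator emits.

```sml
(* Hand-written sketch; the real generated file may look different. *)
datatype nat = Zero | Suc of nat

(* "{}" becomes [] and insert becomes the list constructor ::, so no
   equality test on the elements is ever performed. *)
val dummy_set : (nat -> nat) list = Suc :: []
val foobar_set : nat list = Zero :: Suc Zero :: Suc (Suc Zero) :: []

val dummy_size = length dummy_set
val foobar_size = length foobar_set
```

  Since @{text "::"} never compares elements, a list of functions is
  unproblematic here. The price is visible as well: duplicates are not
  removed, which is part of what makes this setup quick and dirty.
*}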

subsection {* Cyclic module dependencies *}

text {*
  Sometimes the awkward situation occurs that dependencies
  between definitions introduce cyclic dependencies
  between modules, which in the Haskell world leaves
  you at the mercy of the Haskell implementation you are using,
  while for SML code generation is not possible.

  A solution is to declare module names explicitly.
  Let us assume the three cyclically dependent
  modules are named \emph{A}, \emph{B} and \emph{C}.
  Then, by stating
*}

code_modulename SML
  A ABC
  B ABC
  C ABC

text {*
  we explicitly map all those modules onto \emph{ABC},
  resulting in an ad-hoc merge of these three modules
  at serialization time.
*}
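
text {*
  Why the merge helps for SML can be illustrated with a small
  hand-written example. SML structures have to be defined before they
  are used, so three structures referring to each other cyclically
  cannot be written down; inside one structure, mutual recursion is
  expressed with @{text "fun ... and ..."}. The functions @{text a},
  @{text b} and @{text c} below are hypothetical and merely stand for
  definitions imagined to originate from the modules \emph{A},
  \emph{B} and \emph{C}.

```sml
(* Hypothetical content: three mutually dependent functions, imagined to
   originate from the modules A, B and C, merged into one structure. *)
structure ABC =
struct
  fun a n = if n <= 0 then "A" else b (n - 1)
  and b n = if n <= 0 then "B" else c (n - 1)
  and c n = if n <= 0 then "C" else a (n - 1)
end

(* a 4 calls b 3, c 2, a 1 and finally b 0. *)
val result = ABC.a 4
```
*}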

subsection {* Axiomatic extensions *}

text {*
  \begin{warn}
    The extensions introduced in this section, though working
    in practice, are not the cream of the crop, as you
    will notice while reading. They will
    eventually be replaced by more mature approaches.
  \end{warn}

  Sometimes equalities are taken for granted which are
  not derivable inside the HOL logic but are silently assumed
  to hold for executable code. For example, we may want
  to identify the famous HOL constant @{const arbitrary}
  of type @{typ "'a option"} with @{const None}.
  By brute force:
*}

axiomatization where
  "arbitrary = None"

text {*
  However, this has to be considered harmful since this axiom,
  though probably justifiable for generated code, could
  introduce serious inconsistencies into the logic.

  So, there is a distinguished construct for stating axiomatic
  equalities of constants which apply only for code generation.
  Before introducing this, here is a convenient place to describe
  briefly how to deal with some restrictions the type discipline
  imposes.

  By itself, the constant @{const arbitrary} is a non-overloaded
  polymorphic constant. So, there is no way to distinguish
  different versions of @{const arbitrary} for different types
  inside the code generator framework. However, inlining
  theorems together with auxiliary constants provide a solution:
*}

definition
  arbitrary_option :: "'a option"
  [symmetric, code inline]: "arbitrary_option = arbitrary"

text {*
  By that, we replace any @{const arbitrary} with option type
  by @{const arbitrary_option} in function equations.

  For technical reasons, we further have to provide a
  synonym for @{const None} which in code generator view
  is a function rather than a datatype constructor:
*}

definition
  "None' = None"

text {*
  Then, finally, we are able to use @{text "\<CODEAXIOMS>"}:
*}

code_axioms
  arbitrary_option \<equiv> None'

text {*
  A dummy example:
*}

fun
  dummy_option :: "'a list \<Rightarrow> 'a option" where
  "dummy_option (x#xs) = Some x"
  "dummy_option [] = arbitrary"
(*<*)
declare dummy_option.simps [code func]
(*>*)

code_gen dummy_option (SML "examples/arbitrary.ML")
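
text {*
  The combined effect of the inlining theorem, the synonym
  @{text "None'"} and the code axiom can be approximated by hand in
  SML as follows. This is a sketch of the intended behaviour only, not
  the generator output; the lower-case name @{text "none'"} and the use
  of SML's built-in @{text option} type are assumptions made for this
  illustration.

```sml
(* Hand-written approximation, not actual generator output. *)
(* The synonym None' is an ordinary value equal to the constructor NONE. *)
val none' : 'a option = NONE

(* The code axiom identifies arbitrary_option with None'. *)
val arbitrary_option : 'a option = none'

(* After inlining, the second equation refers to arbitrary_option. *)
fun dummy_option (x :: xs) = SOME x
  | dummy_option [] = arbitrary_option

val head = dummy_option [1, 2, 3]
val nothing = dummy_option ([] : int list)
```
*}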

text {*
  \lstsml{Thy/examples/arbitrary.ML}

  Another axiomatic extension is code generation
  for abstracted types. For this, the
  @{theory "ExecutableRat"} (see \secref{exec_rat})
  forms a good example.
*}


section {* ML interfaces \label{sec:ml} *}

text {*
  Since the code generator framework not only aims to provide
  a nice Isar interface but also to form a base for
  code-generation-based applications, here is a short
  description of the most important ML interfaces.
*}

subsection {* Constants with type discipline: codegen\_consts.ML *}

text {*
  This Pure module manages identification of (possibly overloaded)
  constants by unique identifiers.
*}

text %mlref {*
  \begin{mldecls}
  @{index_ML_type CodegenConsts.const: "string * typ list"} \\
  @{index_ML CodegenConsts.norm_of_typ: "theory -> string * typ -> CodegenConsts.const"} \\
  @{index_ML CodegenConsts.typ_of_inst: "theory -> CodegenConsts.const -> string * typ"} \\
  \end{mldecls}

  \begin{description}

  \item @{ML_type CodegenConsts.const} is the identifier type:
     the product of a \emph{string} with a list of \emph{typs}.
     The \emph{string} is the constant name as represented inside Isabelle;
     the \emph{typs} are a type instantiation in the sense of System F,
     with canonical names for type variables.

  \item @{ML CodegenConsts.norm_of_typ}~@{text thy}~@{text "(constname, typ)"}
     maps a constant expression @{text "(constname, typ)"} to its canonical identifier.

  \item @{ML CodegenConsts.typ_of_inst}~@{text thy}~@{text const}
     maps a canonical identifier @{text const} to a constant
     expression with appropriate type.

  \end{description}
*}
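
text {*
  The idea of identifying a constant by its name together with a type
  instantiation can be illustrated with a self-contained toy model in
  plain SML. Everything below is an assumption made for illustration:
  the datatype @{text typ} is a cut-down stand-in for Isabelle's type
  of types, the constant name and the instantiations are made up, and
  none of this code is part of the Isabelle sources.

```sml
(* Cut-down stand-in for Isabelle's typ; the real type has more cases. *)
datatype typ = TVar of string | Type of string * typ list

(* Same shape as CodegenConsts.const: name plus type instantiation. *)
type const = string * typ list

(* Hypothetical identifiers for equality at two different types. *)
val eq_on_int : const = ("eq", [Type ("int", [])])
val eq_on_key : const = ("eq", [Type ("key", [])])

(* Structural equality suffices in this toy model. *)
fun eq_const (c1 : const, c2 : const) : bool = (c1 = c2)

val same = eq_const (eq_on_int, eq_on_int)
val distinct = not (eq_const (eq_on_int, eq_on_key))
```

  In this model the two instances of the same overloaded constant
  share their name and differ only in the instantiation, which is what
  allows them to be given separate serializations.
*}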

subsection {* Executable theory content: codegen\_data.ML *}

text {*
  This Pure module implements the core notions of
  executable content of a theory.
*}

subsubsection {* Suspended theorems *}

text %mlref {*
  \begin{mldecls}
  @{index_ML CodegenData.lazy: "(unit -> thm list) -> thm list Susp.T"}
  \end{mldecls}

  \begin{description}

  \item @{ML CodegenData.lazy}~@{text f} turns an abstract
     theorem computation @{text f} into a suspension of theorems.

  \end{description}
*}
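
text {*
  The general notion of a suspension, a computation that is delayed
  and evaluated at most once, can be sketched in plain SML. The code
  below is a generic illustration and makes no claim about how
  Isabelle's @{text Susp} module is implemented; the names
  @{text delay} and @{text force} and the strings standing in for
  theorems are assumptions of this sketch.

```sml
(* Generic memoizing suspension; a stand-in, not Isabelle's Susp module. *)
datatype 'a state = Delayed of unit -> 'a | Forced of 'a
type 'a susp = 'a state ref

fun delay (f : unit -> 'a) : 'a susp = ref (Delayed f)

(* Run the computation on first use and remember its result. *)
fun force (s : 'a susp) : 'a =
  case !s of
    Forced v => v
  | Delayed f =>
      let val v = f ()
      in s := Forced v; v end

(* Count how often the suspended computation actually runs. *)
val runs = ref 0
val thms = delay (fn () => (runs := !runs + 1; ["thm1", "thm2"]))

val before_force = !runs
val first = force thms
val second = force thms
val after_force = !runs
```

  Forcing the suspension twice runs the underlying computation only
  once, so an expensive theorem computation is not paid for unless and
  until its result is needed.
*}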

subsubsection {* Executable content *}

text %mlref {*
  \begin{mldecls}
  @{index_ML CodegenData.add_func: "thm -> theory -> theory"} \\
  @{index_ML CodegenData.del_func: "thm -> theory -> theory"} \\
  @{index_ML CodegenData.add_funcl: "CodegenConsts.const * thm list Susp.T -> theory -> theory"} \\
  @{index_ML CodegenData.add_inline: "thm -> theory -> theory"} \\
  @{index_ML CodegenData.del_inline: "thm -> theory -> theory"} \\
  @{index_ML CodegenData.add_inline_proc: "(theory -> cterm list -> thm list)
    -> theory -> theory"} \\
  @{index_ML CodegenData.add_preproc: "(theory -> thm list -> thm list)
    -> theory -> theory"} \\
  @{index_ML CodegenData.add_datatype: "string * (((string * sort) list * (string * typ list) list)
    * thm list Susp.T) -> theory -> theory"} \\
  @{index_ML CodegenData.del_datatype: "string -> theory -> theory"} \\
  @{index_ML CodegenData.get_datatype: "theory -> string
    -> ((string * sort) list * (string * typ list) list) option"} \\
  @{index_ML CodegenData.get_datatype_of_constr: "theory -> CodegenConsts.const -> string option"}
  \end{mldecls}

  \begin{description}

  \item @{ML CodegenData.add_func}~@{text "thm"}~@{text "thy"} adds function
     theorem @{text "thm"} to executable content.

  \item @{ML CodegenData.del_func}~@{text "thm"}~@{text "thy"} removes function
     theorem @{text "thm"} from executable content, if present.

  \item @{ML CodegenData.add_funcl}~@{text "(const, lthms)"}~@{text "thy"} adds
     suspended function equations @{text lthms} for constant
     @{text const} to executable content.

  \item @{ML CodegenData.add_inline}~@{text "thm"}~@{text "thy"} adds
     inlining theorem @{text thm} to executable content.

  \item @{ML CodegenData.del_inline}~@{text "thm"}~@{text "thy"} removes
     inlining theorem @{text thm} from executable content, if present.

  \item @{ML CodegenData.add_inline_proc}~@{text "f"}~@{text "thy"} adds
     inline procedure @{text f} to executable content;
     @{text f} is a computation of rewrite rules dependent on
     the current theory context and the list of all arguments
     and right hand sides of the function equations belonging
     to a certain function definition.

  \item @{ML CodegenData.add_preproc}~@{text "f"}~@{text "thy"} adds
     generic preprocessor @{text f} to executable content;
     @{text f} is a transformation of the function equations belonging
     to a certain function definition, depending on the
     current theory context.

  \item @{ML CodegenData.add_datatype}~@{text "(name, (spec, cert))"}~@{text "thy"} adds
     a datatype to executable content, with type constructor
     @{text name} and specification @{text spec}; @{text spec} is
     a pair consisting of a list of type variables with sort
     constraints and a list of constructors with name
     and types of arguments. The addition as datatype
     has to be justified by giving a certificate of suspended
     theorems as witnesses for injectiveness and distinctness.

  \item @{ML CodegenData.del_datatype}~@{text "name"}~@{text "thy"}
     removes a datatype from executable content, if present.

  \item @{ML CodegenData.get_datatype_of_constr}~@{text "thy"}~@{text "const"}
     returns the type constructor corresponding to
     constructor @{text const}; returns @{text NONE}
     if @{text const} is no constructor.

  \end{description}
*}

subsection {* Function equation systems: codegen\_funcgr.ML *}

text {*
  Out of the executable content of a theory, a normalized
  function equation system may be constructed containing
  function definitions for constants. The system is cached
  until its underlying executable content changes.
*}

text %mlref {*
  \begin{mldecls}
  @{index_ML_type CodegenFuncgr.T} \\
  @{index_ML CodegenFuncgr.make: "theory -> CodegenConsts.const list -> CodegenFuncgr.T"} \\
  @{index_ML CodegenFuncgr.funcs: "CodegenFuncgr.T -> CodegenConsts.const -> thm list"} \\
  @{index_ML CodegenFuncgr.typ: "CodegenFuncgr.T -> CodegenConsts.const -> typ"} \\
  @{index_ML CodegenFuncgr.deps: "CodegenFuncgr.T
    -> CodegenConsts.const list -> CodegenConsts.const list list"} \\
  @{index_ML CodegenFuncgr.all: "CodegenFuncgr.T -> CodegenConsts.const list"}
  \end{mldecls}

  \begin{description}

  \item @{ML_type CodegenFuncgr.T} represents
    a normalized function equation system.

  \item @{ML CodegenFuncgr.make}~@{text thy}~@{text cs}
    returns a normalized function equation system,
    with the assertion that it contains any function
    definition for constants @{text cs} (if existing).

  \item @{ML CodegenFuncgr.funcs}~@{text funcgr}~@{text c}
    retrieves function definition for constant @{text c}.

  \item @{ML CodegenFuncgr.typ}~@{text funcgr}~@{text c}
    retrieves function type for constant @{text c}.

  \item @{ML CodegenFuncgr.deps}~@{text funcgr}~@{text cs}
    returns the transitive closure of dependencies for
    constants @{text cs} as a partitioning where each partition
    corresponds to a strongly connected component of
    dependencies and any partition does \emph{not}
    depend on partitions further left.

  \item @{ML CodegenFuncgr.all}~@{text funcgr}
    returns all currently represented constants.

  \end{description}
*}
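
text {*
  The ordering property stated for @{ML CodegenFuncgr.deps} can be made
  concrete with a small stand-alone check in plain SML. The dependency
  graph, the constant names and the function
  @{text "respects_order"} are hypothetical; the sketch works on
  strings rather than on @{ML_type CodegenConsts.const} and does not
  call any Isabelle interface.

```sml
(* Toy dependency graph as an association list; names are hypothetical. *)
val graph = [("f", ["g"]), ("g", ["f", "h"]), ("h", [])]

fun member (x : string) xs = List.exists (fn y => y = x) xs

(* Direct dependencies of constant c in the toy graph. *)
fun direct c =
  case List.find (fn (c', _) => c' = c) graph of
    SOME (_, ds) => ds
  | NONE => []

(* True if no partition depends on a partition further left. *)
fun respects_order parts =
  let
    fun check _ [] = true
      | check left (p :: ps) =
          List.all (fn c => List.all (fn d => not (member d left)) (direct c)) p
          andalso check (p @ left) ps
  in check [] parts end

val good = respects_order [["f", "g"], ["h"]]
val bad = respects_order [["h"], ["f", "g"]]
```

  Here @{text f} and @{text g} depend on each other and therefore form
  one strongly connected component, while @{text h} forms its own.
  Placing the component of @{text h} to the right satisfies the
  property; placing it to the left violates it, since @{text g}
  depends on @{text h}.
*}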

subsection {* Further auxiliary *}

text %mlref {*
  \begin{mldecls}
  @{index_ML CodegenConsts.const_ord: "CodegenConsts.const * CodegenConsts.const -> order"} \\
  @{index_ML CodegenConsts.eq_const: "CodegenConsts.const * CodegenConsts.const -> bool"} \\
  @{index_ML CodegenConsts.consts_of: "theory -> term -> CodegenConsts.const list"} \\
  @{index_ML CodegenConsts.read_const: "theory -> string -> CodegenConsts.const"} \\
  @{index_ML_structure CodegenConsts.Consttab} \\
  @{index_ML_structure CodegenFuncgr.Constgraph} \\
  @{index_ML CodegenData.typ_func: "theory -> thm -> typ"} \\
  @{index_ML CodegenData.rewrite_func: "thm list -> thm -> thm"} \\
  \end{mldecls}

  \begin{description}

  \item @{ML CodegenConsts.const_ord},~@{ML CodegenConsts.eq_const}
     provide order and equality on constant identifiers.

  \item @{ML_struct CodegenConsts.Consttab},~@{ML_struct CodegenFuncgr.Constgraph}
     provide advanced data structures with constant identifiers as keys.

  \item @{ML CodegenConsts.consts_of}~@{text thy}~@{text t}
     returns all constant identifiers mentioned in a term @{text t}.

  \item @{ML CodegenConsts.read_const}~@{text thy}~@{text s}
     reads a constant as a concrete term expression @{text s}.

  \item @{ML CodegenData.typ_func}~@{text thy}~@{text thm}
     extracts the type of a constant in a function equation @{text thm}.

  \item @{ML CodegenData.rewrite_func}~@{text rews}~@{text thm}
     rewrites a function equation @{text thm} with a set of rewrite
     rules @{text rews}; only arguments and right hand side are rewritten,
     not the head of the function equation.

  \end{description}

*}

subsection {* Implementing code generator applications *}

text {*
  Implementing code generator applications on top
  of the framework set out so far usually not only
  involves using those primitive interfaces
  but also storing code-dependent data and various
  other things.

  \begin{warn}
    Some interfaces discussed here have not reached
    a final state yet.
    Changes are likely to occur in the future.
  \end{warn}

  \fixme
*}

subsubsection {* Data depending on the theory's executable content *}

text {*
  \medskip
  \begin{tabular}{l}
  @{text "val name: string"} \\
  @{text "type T"} \\
  @{text "val empty: T"} \\
  @{text "val merge: Pretty.pp \<rightarrow> T * T \<rightarrow> T"} \\
  @{text "val purge: theory option \<rightarrow> CodegenConsts.const list option \<rightarrow> T \<rightarrow> T"}
  \end{tabular}

  \medskip

  \begin{tabular}{l}
  @{text "init: theory \<rightarrow> theory"} \\
  @{text "get: theory \<rightarrow> T"} \\
  @{text "change: theory \<rightarrow> (T \<rightarrow> T) \<rightarrow> T"} \\
  @{text "change_yield: theory \<rightarrow> (T \<rightarrow> 'a * T) \<rightarrow> 'a * T"}
  \end{tabular}
*}

text %mlref {*
  \begin{mldecls}
  @{index_ML_functor CodeDataFun}
  \end{mldecls}

  \begin{description}

  \item @{ML_functor CodeDataFun}@{text "(spec)"} declares code
  dependent data according to the specification provided as
  argument structure. The resulting structure provides data init and
  access operations as described above.

  \end{description}
*}

subsubsection {* Datatype hooks *}

text %mlref {*
  \begin{mldecls}
  @{index_ML_type DatatypeHooks.hook: "string list -> theory -> theory"} \\
  @{index_ML DatatypeHooks.add: "DatatypeHooks.hook -> theory -> theory"}
  \end{mldecls}
*}

text %mlref {*
  \begin{mldecls}
  @{index_ML_type TypecopyPackage.info: "{
    vs: (string * sort) list,
    constr: string,
    typ: typ,
    inject: thm,
    proj: string * typ,
    proj_def: thm
  }"} \\
  @{index_ML TypecopyPackage.add_typecopy: "
    bstring * string list -> typ -> (bstring * bstring) option
    -> theory -> (string * TypecopyPackage.info) * theory"} \\
  @{index_ML TypecopyPackage.get_typecopies: "theory -> string list"} \\
  @{index_ML TypecopyPackage.get_typecopy_info: "theory
    -> string -> TypecopyPackage.info option"} \\
  @{index_ML_type TypecopyPackage.hook} \\
  @{index_ML TypecopyPackage.add_hook: "TypecopyPackage.hook -> theory -> theory"} \\
  @{index_ML TypecopyPackage.get_spec: "theory -> string
    -> (string * sort) list * (string * typ list) list"}
  \end{mldecls}
*}

text %mlref {*
  \begin{mldecls}
  @{index_ML_type DatatypeCodegen.hook: "(string * (bool * ((string * sort) list * (string * typ list) list))) list
    -> theory -> theory"} \\
  @{index_ML DatatypeCodegen.add_codetypes_hook_bootstrap: "
      DatatypeCodegen.hook -> theory -> theory"}
  \end{mldecls}
*}

text {*
  \fixme
%  \emph{Happy proving, happy hacking!}
*}

end