2 imports Setup |
2 imports Setup |
3 begin |
3 begin |
4 |
4 |
5 section {* Introduction *} |
5 section {* Introduction *} |
6 |
6 |
7 subsection {* Code generation fundamental: shallow embedding *} |
7 text {* |
8 |
8 This tutorial introduces the code generator facilities of @{text |
9 subsection {* A quick start with the @{text "Isabelle/HOL"} toolbox *} |
9 "Isabelle/HOL"}. It allows to turn (a certain class of) HOL |
10 |
10 specifications into corresponding executable code in the programming |
11 subsection {* Type classes *} |
11 languages @{text SML} \cite{SML}, @{text OCaml} \cite{OCaml} and |
12 |
12 @{text Haskell} \cite{haskell-revised-report}. |
13 subsection {* How to continue from here *} |
13 |
14 |
14 To profit from this tutorial, some familiarity and experience with |
15 subsection {* If something goes utterly wrong *} |
15 @{theory HOL} \cite{isa-tutorial} and its basic theories is assumed. |
16 |
16 *} |
17 text {* |
17 |
18 This tutorial introduces a generic code generator for the |
18 |
19 @{text Isabelle} system. |
19 subsection {* Code generation principle: shallow embedding \label{sec:principle} *} |
20 The |
20 |
21 \qn{target language} for which code is |
21 text {* |
22 generated is not fixed, but may be one of several |
22 The key concept for understanding Isabelle's code generation is |
23 functional programming languages (currently, the implementation |
23 \emph{shallow embedding}: logical entities like constants, types and |
24 supports @{text SML} \cite{SML}, @{text OCaml} \cite{OCaml} and @{text Haskell} |
24 classes are identified with corresponding entities in the target |
25 \cite{haskell-revised-report}). |
25 language. In particular, the carrier of a generated program's |
26 |
26 semantics are \emph{equational theorems} from the logic. If we view |
27 Conceptually the code generator framework is part |
27 a generated program as an implementation of a higher-order rewrite |
28 of Isabelle's @{theory Pure} meta logic framework; the logic |
28 system, then every rewrite step performed by the program can be |
29 @{theory HOL} \cite{isa-tutorial}, which is an extension of @{theory Pure}, |
29 simulated in the logic, which guarantees partial correctness |
30 already comes with a reasonable framework setup and thus provides |
30 \cite{Haftmann-Nipkow:2010:code}. |
31 a good basis for creating code-generation-driven |
31 *} |
32 applications. So, we assume some familiarity and experience |
32 |
33 with the ingredients of the @{theory HOL} distribution theories. |
33 |
34 |
34 subsection {* A quick start with the Isabelle/HOL toolbox \label{sec:queue_example} *} |
35 The code generator aims to be usable with no further ado |
35 |
36 in most cases, while allowing for detailed customisation. |
36 text {* |
37 This can be seen in the structure of this tutorial: after a short |
37 In a HOL theory, the @{command datatype} and @{command |
38 conceptual introduction with an example (\secref{sec:intro}), |
38 definition}/@{command primrec}/@{command fun} declarations form the |
39 we discuss the generic customisation facilities (\secref{sec:program}). |
39 core of a functional programming language. By default equational |
40 A further section (\secref{sec:adaptation}) is dedicated to the matter of |
40 theorems stemming from those are used for generated code, therefore |
41 \qn{adaptation} to specific target language environments. After some |
41 \qt{naive} code generation can proceed without further ado. |
42 further issues (\secref{sec:further}) we conclude with an overview |
42 |
43 of some ML programming interfaces (\secref{sec:ml}). |
|
44 |
|
45 \begin{warn} |
|
46 Ultimately, the code generator which this tutorial deals with |
|
47 is supposed to replace the existing code generator |
|
48 by Stefan Berghofer \cite{Berghofer-Nipkow:2002}. |
|
49 So, for the moment, there are two distinct code generators |
|
50 in Isabelle. In case of ambiguity, we will refer to the framework |
|
51 described here as @{text "generic code generator"}, to the |
|
52 other as @{text "SML code generator"}. |
|
53 Also note that while the framework itself is |
|
54 object-logic independent, only @{theory HOL} provides a reasonable |
|
55 framework setup. |
|
56 \end{warn} |
|
57 |
|
58 *} |
|
59 |
|
60 subsection {* Code generation via shallow embedding \label{sec:intro} *} |
|
61 |
|
62 text {* |
|
63 The key concept for understanding @{text Isabelle}'s code generation is |
|
64 \emph{shallow embedding}, i.e.~logical entities like constants, types and |
|
65 classes are identified with corresponding concepts in the target language. |
|
66 |
|
67 Inside @{theory HOL}, the @{command datatype} and |
|
68 @{command definition}/@{command primrec}/@{command fun} declarations form |
|
69 the core of a functional programming language. The default code generator setup |
|
70 transforms those into functional programs immediately. |
|
71 This means that \qt{naive} code generation can proceed without further ado. |
|
72 For example, here a simple \qt{implementation} of amortised queues: |
43 For example, here a simple \qt{implementation} of amortised queues: |
73 *} |
44 *} |
74 |
45 |
75 datatype %quote 'a queue = AQueue "'a list" "'a list" |
46 datatype %quote 'a queue = AQueue "'a list" "'a list" |
76 |
47 |
94 text {* \noindent resulting in the following code: *} |
69 text {* \noindent resulting in the following code: *} |
95 |
70 |
96 text %quote {*@{code_stmts empty enqueue dequeue (SML)}*} |
71 text %quote {*@{code_stmts empty enqueue dequeue (SML)}*} |
97 |
72 |
98 text {* |
73 text {* |
99 \noindent The @{command export_code} command takes a space-separated list of |
74 \noindent The @{command export_code} command takes a space-separated |
100 constants for which code shall be generated; anything else needed for those |
75 list of constants for which code shall be generated; anything else |
101 is added implicitly. Then follows a target language identifier |
76 needed for those is added implicitly. Then follows a target |
102 (@{text SML}, @{text OCaml} or @{text Haskell}) and a freely chosen module name. |
77 language identifier and a freely chosen module name. A file name |
103 A file name denotes the destination to store the generated code. Note that |
78 denotes the destination to store the generated code. Note that the |
104 the semantics of the destination depends on the target language: for |
79 semantics of the destination depends on the target language: for |
105 @{text SML} and @{text OCaml} it denotes a \emph{file}, for @{text Haskell} |
80 @{text SML} and @{text OCaml} it denotes a \emph{file}, for @{text |
106 it denotes a \emph{directory} where a file named as the module name |
81 Haskell} it denotes a \emph{directory} where a file named as the |
107 (with extension @{text ".hs"}) is written: |
82 module name (with extension @{text ".hs"}) is written: |
108 *} |
83 *} |
109 |
84 |
110 export_code %quote empty dequeue enqueue in Haskell |
85 export_code %quote empty dequeue enqueue in Haskell |
111 module_name Example file "examples/" |
86 module_name Example file "examples/" |
112 |
87 |
113 text {* |
88 text {* |
114 \noindent This is the corresponding code in @{text Haskell}: |
89 \noindent This is the corresponding code: |
115 *} |
90 *} |
116 |
91 |
117 text %quote {*@{code_stmts empty enqueue dequeue (Haskell)}*} |
92 text %quote {*@{code_stmts empty enqueue dequeue (Haskell)}*} |
118 |
93 |
119 text {* |
94 text {* |
120 \noindent This demonstrates the basic usage of the @{command export_code} command; |
95 \noindent For more details about @{command export_code} see |
121 for more details see \secref{sec:further}. |
96 \secref{sec:further}. |
122 *} |
97 *} |
123 |
98 |
124 subsection {* If something utterly fails *} |
99 |
125 |
100 subsection {* Type classes *} |
126 text {* |
101 |
127 Under certain circumstances, the code generator fails to produce |
102 text {* |
128 code entirely. |
103 Code can also be generated from type classes in a Haskell-like |
129 |
104 manner. For illustration here an example from abstract algebra: |
130 \begin{description} |
105 *} |
131 |
106 |
132 \ditem{generate only one module} |
107 class %quote semigroup = |
133 |
108 fixes mult :: "'a \<Rightarrow> 'a \<Rightarrow> 'a" (infixl "\<otimes>" 70) |
134 \ditem{check with a different target language} |
109 assumes assoc: "(x \<otimes> y) \<otimes> z = x \<otimes> (y \<otimes> z)" |
135 |
110 |
136 \ditem{inspect code equations} |
111 class %quote monoid = semigroup + |
137 |
112 fixes neutral :: 'a ("\<one>") |
138 \ditem{inspect preprocessor setup} |
113 assumes neutl: "\<one> \<otimes> x = x" |
139 |
114 and neutr: "x \<otimes> \<one> = x" |
140 \ditem{generate exceptions} |
115 |
141 |
116 instantiation %quote nat :: monoid |
142 \ditem{remove offending code equations} |
117 begin |
143 |
118 |
144 \end{description} |
119 primrec %quote mult_nat where |
145 *} |
120 "0 \<otimes> n = (0\<Colon>nat)" |
146 |
121 | "Suc m \<otimes> n = n + m \<otimes> n" |
147 subsection {* Code generator architecture \label{sec:concept} *} |
122 |
148 |
123 definition %quote neutral_nat where |
149 text {* |
124 "\<one> = Suc 0" |
150 What you have seen so far should be already enough in a lot of cases. If you |
125 |
151 are content with this, you can quit reading here. Anyway, in order to customise |
126 lemma %quote add_mult_distrib: |
152 and adapt the code generator, it is necessary to gain some understanding |
127 fixes n m q :: nat |
153 how it works. |
128 shows "(n + m) \<otimes> q = n \<otimes> q + m \<otimes> q" |
154 |
129 by (induct n) simp_all |
155 \begin{figure}[h] |
130 |
156 \includegraphics{architecture} |
131 instance %quote proof |
157 \caption{Code generator architecture} |
132 fix m n q :: nat |
158 \label{fig:arch} |
133 show "m \<otimes> n \<otimes> q = m \<otimes> (n \<otimes> q)" |
159 \end{figure} |
134 by (induct m) (simp_all add: add_mult_distrib) |
160 |
135 show "\<one> \<otimes> n = n" |
161 The code generator employs a notion of executability |
136 by (simp add: neutral_nat_def) |
162 for three foundational executable ingredients known |
137 show "m \<otimes> \<one> = m" |
163 from functional programming: |
138 by (induct m) (simp_all add: neutral_nat_def) |
164 \emph{code equations}, \emph{datatypes}, and |
139 qed |
165 \emph{type classes}. A code equation as a first approximation |
140 |
166 is a theorem of the form @{text "f t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n \<equiv> t"} |
141 end %quote |
167 (an equation headed by a constant @{text f} with arguments |
142 |
168 @{text "t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n"} and right hand side @{text t}). |
143 text {* |
169 Code generation aims to turn code equations |
144 \noindent We define the natural operation of the natural numbers |
170 into a functional program. This is achieved by three major |
145 on monoids: |
171 components which operate sequentially, i.e. the result of one is |
146 *} |
172 the input |
147 |
173 of the next in the chain, see figure \ref{fig:arch}: |
148 primrec %quote (in monoid) pow :: "nat \<Rightarrow> 'a \<Rightarrow> 'a" where |
|
149 "pow 0 a = \<one>" |
|
150 | "pow (Suc n) a = a \<otimes> pow n a" |
|
151 |
|
152 text {* |
|
153 \noindent This we use to define the discrete exponentiation |
|
154 function: |
|
155 *} |
|
156 |
|
157 definition %quote bexp :: "nat \<Rightarrow> nat" where |
|
158 "bexp n = pow n (Suc (Suc 0))" |
|
159 |
|
160 text {* |
|
161 \noindent The corresponding code in Haskell uses that language's |
|
162 native classes: |
|
163 *} |
|
164 |
|
165 text %quote {*@{code_stmts bexp (Haskell)}*} |
|
166 |
|
167 text {* |
|
168 \noindent This is a convenient place to show how explicit dictionary |
|
169 construction manifests in generated code -- the same example in |
|
170 @{text SML}: |
|
171 *} |
|
172 |
|
173 text %quote {*@{code_stmts bexp (SML)}*} |
|
174 |
|
175 text {* |
|
176 \noindent Note the parameters with trailing underscore (@{verbatim |
|
177 "A_"}), which are the dictionary parameters. |
|
178 *} |
|
179 |
|
180 |
|
181 subsection {* How to continue from here *} |
|
182 |
|
183 text {* |
|
184 What you have seen so far should be already enough in a lot of |
|
185 cases. If you are content with this, you can quit reading here. |
|
186 |
|
187 Anyway, to understand situations where problems occur or to increase |
|
188 the scope of code generation beyond default, it is necessary to gain |
|
189 some understanding how the code generator actually works: |
174 |
190 |
175 \begin{itemize} |
191 \begin{itemize} |
176 |
192 |
177 \item The starting point is a collection of raw code equations in a |
193 \item The foundations of the code generator are described in |
178 theory. It is not relevant where they |
194 \secref{sec:foundations}. |
179 stem from, but typically they were either produced by specification |
195 |
180 tools or proved explicitly by the user. |
196 \item In particular \secref{sec:utterly_wrong} gives hints how to |
181 |
197 debug situations where code generation does not succeed as |
182 \item These raw code equations can be subjected to theorem transformations. This |
198 expected. |
183 \qn{preprocessor} can apply the full |
199 |
184 expressiveness of ML-based theorem transformations to code |
200 \item The scope and quality of generated code can be increased |
185 generation. The result of preprocessing is a |
201 dramatically by applying refinement techniques, which are |
186 structured collection of code equations. |
202 introduced in \secref{sec:refinement}. |
187 |
203 |
188 \item These code equations are \qn{translated} to a program in an |
204 \item Inductive predicates can be turned executable using an |
189 abstract intermediate language. Think of it as a kind |
205 extension of the code generator \secref{sec:inductive}. |
190 of \qt{Mini-Haskell} with four \qn{statements}: @{text data} |
206 |
191 (for datatypes), @{text fun} (stemming from code equations), |
207 \item You may want to skim over the more technical sections |
192 also @{text class} and @{text inst} (for type classes). |
208 \secref{sec:adaptation} and \secref{sec:further}. |
193 |
209 |
194 \item Finally, the abstract program is \qn{serialised} into concrete |
210 \item For exhaustive syntax diagrams etc. you should visit the |
195 source code of a target language. |
211 Isabelle/Isar Reference Manual \cite{isabelle-isar-ref}. |
196 This step only produces concrete syntax but does not change the |
|
197 program in essence; all conceptual transformations occur in the |
|
198 translation step. |
|
199 |
212 |
200 \end{itemize} |
213 \end{itemize} |
201 |
214 |
202 \noindent From these steps, only the two last are carried out outside the logic; by |
215 \bigskip |
203 keeping this layer as thin as possible, the amount of code to trust is |
216 |
204 kept to a minimum. |
217 \begin{center}\fbox{\fbox{\begin{minipage}{8cm} |
|
218 |
|
219 \begin{center}\textit{Happy proving, happy hacking!}\end{center} |
|
220 |
|
221 \end{minipage}}}\end{center} |
|
222 |
|
223 \begin{warn} |
|
224 There is also a more ancient code generator in Isabelle by Stefan |
|
225 Berghofer \cite{Berghofer-Nipkow:2002}. Although its |
|
226 functionality is covered by the code generator presented here, it |
|
227 will sometimes show up as an artifact. In case of ambiguity, we |
|
228 will refer to the framework described here as @{text "generic code |
|
229 generator"}, to the other as @{text "SML code generator"}. |
|
230 \end{warn} |
205 *} |
231 *} |
206 |
232 |
207 end |
233 end |