6 |
6 |
7 section {* Introduction and Overview *} |
7 section {* Introduction and Overview *} |
8 |
8 |
9 text {* |
9 text {* |
10 This tutorial introduces a generic code generator for the |
10 This tutorial introduces a generic code generator for the |
11 Isabelle system \cite{isa-tutorial}. |
11 @{text Isabelle} system. |
|
12 Generic in the sense that the |
|
13 \qn{target language} for which code shall ultimately be |
|
14 generated is not fixed but may be an arbitrary state-of-the-art |
|
15 functional programming language (currently, the implementation |
|
16 supports @{text SML} \cite{SML}, @{text OCaml} \cite{OCaml} and @{text Haskell} |
|
17 \cite{haskell-revised-report}). |
|
18 |
|
19 Conceptually the code generator framework is part |
|
20 of Isabelle's @{text Pure} meta logic framework; the logic |
|
21 @{text HOL} which is an extension of @{text Pure} |
|
22 already comes with a reasonable framework setup and thus provides |
|
23 a good working horse for raising code-generation-driven |
|
24 applications. So, we assume some familiarity and experience |
|
25 with the ingredients of the @{text HOL} distribution theories. |
|
26 (see also \cite{isa-tutorial}). |
|
27 |
|
28 The code generator aims to be usable with no further ado |
|
29 in most cases while allowing for detailed customisation. |
|
30 This manifests in the structure of this tutorial: after a short |
|
31 conceptual introduction with an example \secref{sec:intro}, |
|
32 we discuss the generic customisation facilities \secref{sec:program}. |
|
33 A further section \secref{sec:adaption} is dedicated to the matter of |
|
34 \qn{adaption} to specific target language environments. After some |
|
35 further issues \secref{sec:further} we conclude with an overview |
|
36 of some ML programming interfaces \secref{sec:ml}. |
|
37 |
|
38 \begin{warn} |
|
39 Ultimately, the code generator which this tutorial deals with |
|
40 is supposed to replace the already established code generator |
|
41 by Stefan Berghofer \cite{Berghofer-Nipkow:2002}. |
|
42 So, for the moment, there are two distinct code generators |
|
43 in Isabelle. |
|
44 Also note that while the framework itself is |
|
45 object-logic independent, only @{text HOL} provides a reasonable |
|
46 framework setup. |
|
47 \end{warn} |
|
48 |
12 *} |
49 *} |
13 |
50 |
14 subsection {* Code generation via shallow embedding *} |
51 subsection {* Code generation via shallow embedding \label{sec:intro} *} |
15 |
52 |
16 text {* example *} |
53 text {* |
|
54 The key concept for understanding @{text Isabelle}'s code generation is |
|
55 \emph{shallow embedding}, i.e.~logical entities like constants, types and |
|
56 classes are identified with corresponding concepts in the target language. |
|
57 |
|
58 Inside @{text HOL}, the @{command datatype} and |
|
59 @{command definition}/@{command primrec}/@{command fun} declarations form |
|
60 the core of a functional programming language. The default code generator setup |
|
61 allows to turn those into functional programs immediately. |
|
62 |
|
63 This means that \qt{naive} code generation can proceed without further ado. |
|
64 For example, here a simple \qt{implementation} of amortised queues: |
|
65 *} |
|
66 |
|
67 datatype %quoteme 'a queue = Queue "'a list" "'a list" |
|
68 |
|
69 definition %quoteme empty :: "'a queue" where |
|
70 "empty = Queue [] []" |
|
71 |
|
72 primrec %quoteme enqueue :: "'a \<Rightarrow> 'a queue \<Rightarrow> 'a queue" where |
|
73 "enqueue x (Queue xs ys) = Queue (x # xs) ys" |
|
74 |
|
75 fun %quoteme dequeue :: "'a queue \<Rightarrow> 'a option \<times> 'a queue" where |
|
76 "dequeue (Queue [] []) = (None, Queue [] [])" |
|
77 | "dequeue (Queue xs (y # ys)) = (Some y, Queue xs ys)" |
|
78 | "dequeue (Queue xs []) = (case rev xs of y # ys \<Rightarrow> (Some y, Queue [] ys))" |
|
79 |
|
80 text {* \noindent Then we can generate code e.g.~for @{text SML} as follows: *} |
|
81 |
|
82 export_code %quoteme empty dequeue enqueue in SML module_name Example file "examples/example.ML" |
|
83 |
|
84 text {* \noindent resulting in the following code: *} |
|
85 |
|
86 text %quoteme {*@{code_stmts empty enqueue dequeue (SML)}*} |
|
87 |
|
88 text {* |
|
89 \noindent The @{command export_code} command takes a space-separated list of |
|
90 constants for which code shall be generated; anything else needed for those |
|
91 is added implicitly. Then follows by a target language identifier |
|
92 (@{text SML}, @{text OCaml} or @{text Haskell}) and a freely chosen module name. |
|
93 A file name denotes the destination to store the generated code. Note that |
|
94 the semantics of the destination depends on the target language: for |
|
95 @{text SML} and @{text OCaml} it denotes a \emph{file}, for @{text Haskell} |
|
96 it denotes a \emph{directory} where a file named as the module name |
|
97 (with extension @{text ".hs"}) is written: |
|
98 *} |
|
99 |
|
100 export_code %quoteme empty dequeue enqueue in SML module_name Example file "examples/Example.ML" |
|
101 |
|
102 text {* |
|
103 \noindent This is how the corresponding code in @{text Haskell} looks like: |
|
104 *} |
|
105 |
|
106 text %quoteme {*@{code_stmts empty enqueue dequeue (Haskell)}*} |
|
107 |
|
108 text {* |
|
109 \noindent This demonstrates the basic usage of the @{command export_code} command; |
|
110 for more details see \secref{sec:serializer}. |
|
111 *} |
17 |
112 |
18 subsection {* Code generator architecture *} |
113 subsection {* Code generator architecture *} |
19 |
114 |
20 text {* foundation, forward references for yet unexplained things *} |
115 text {* |
|
116 What you have seen so far should be already enough in a lot of cases. If you |
|
117 are content with this, you can quit reading here. Anyway, in order to customise |
|
118 and adapt the code generator, it is inevitable to gain some understanding |
|
119 how it works. |
|
120 |
|
121 \begin{figure}[h] |
|
122 \centering |
|
123 \includegraphics[width=0.7\textwidth]{codegen_process} |
|
124 \caption{Code generator architecture} |
|
125 \label{fig:arch} |
|
126 \end{figure} |
|
127 |
|
128 The code generator itself consists of three major components |
|
129 which operate sequentially, i.e. the result of one is the input |
|
130 of the next in the chain, see diagram \ref{fig:arch}: |
|
131 |
|
132 The code generator employs a notion of executability |
|
133 for three foundational executable ingredients known |
|
134 from functional programming: |
|
135 \emph{defining equations}, \emph{datatypes}, and |
|
136 \emph{type classes}. A defining equation as a first approximation |
|
137 is a theorem of the form @{text "f t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n \<equiv> t"} |
|
138 (an equation headed by a constant @{text f} with arguments |
|
139 @{text "t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n"} and right hand side @{text t}). |
|
140 Code generation aims to turn defining equations |
|
141 into a functional program by running through the following |
|
142 process: |
|
143 |
|
144 \begin{itemize} |
|
145 |
|
146 \item Out of the vast collection of theorems proven in a |
|
147 \qn{theory}, a reasonable subset modelling |
|
148 defining equations is \qn{selected}. |
|
149 |
|
150 \item On those selected theorems, certain |
|
151 transformations are carried out |
|
152 (\qn{preprocessing}). Their purpose is to turn theorems |
|
153 representing non- or badly executable |
|
154 specifications into equivalent but executable counterparts. |
|
155 The result is a structured collection of \qn{code theorems}. |
|
156 |
|
157 \item Before the selected defining equations are continued with, |
|
158 they can be \qn{preprocessed}, i.e. subjected to theorem |
|
159 transformations. This \qn{preprocessor} is an interface which |
|
160 allows to apply |
|
161 the full expressiveness of ML-based theorem transformations |
|
162 to code generation; motivating examples are shown below, see |
|
163 \secref{sec:preproc}. |
|
164 The result of the preprocessing step is a structured collection |
|
165 of defining equations. |
|
166 |
|
167 \item These defining equations are \qn{translated} to a program |
|
168 in an abstract intermediate language. |
|
169 |
|
170 \item Finally, the abstract program is \qn{serialised} into concrete |
|
171 source code of a target language. |
|
172 |
|
173 \end{itemize} |
|
174 |
|
175 \noindent From these steps, only the two last are carried out outside the logic; by |
|
176 keeping this layer as thin as possible, the amount of code to trust is |
|
177 kept to a minimum. |
|
178 *} |
21 |
179 |
22 end |
180 end |