author | wenzelm |
Sun, 28 Jun 2009 22:35:23 +0200 | |
changeset 31843 | f02c6a43f1b3 |
parent 31206 | a9fa62683582 |
child 34155 | 14aaccb399b3 |
permissions | -rw-r--r-- |
31050 | 1 |
theory Adaptation |
28213 | 2 |
imports Setup |
3 |
begin |
|
4 |
||
28679 | 5 |
setup %invisible {* Code_Target.extend_target ("\<SML>", ("SML", K I)) *} |
28561 | 6 |
|
31050 | 7 |
section {* Adaptation to target languages \label{sec:adaptation} *} |
28419 | 8 |
|
28561 | 9 |
subsection {* Adapting code generation *} |
10 |
||
11 |
text {* |
|
12 |
The aspects of code generation introduced so far have two aspects |
|
13 |
in common: |
|
14 |
||
15 |
\begin{itemize} |
|
16 |
\item They act uniformly, without reference to a specific |
|
17 |
target language. |
|
18 |
\item They are \emph{safe} in the sense that as long as you trust |
|
19 |
the code generator meta theory and implementation, you cannot |
|
20 |
produce programs that yield results which are not derivable |
|
21 |
in the logic. |
|
22 |
\end{itemize} |
|
23 |
||
24 |
\noindent In this section we will introduce means to \emph{adapt} the serialiser |
|
25 |
to a specific target language, i.e.~to print program fragments |
|
28593 | 26 |
in a way which accommodates \qt{already existing} ingredients of |
28561 | 27 |
a target language environment, for three reasons: |
28 |
||
29 |
\begin{itemize} |
|
28593 | 30 |
\item improving readability and aesthetics of generated code |
28561 | 31 |
\item gaining efficiency |
32 |
\item interface with language parts which have no direct counterpart |
|
33 |
in @{text "HOL"} (say, imperative data structures) |
|
34 |
\end{itemize} |
|
35 |
||
36 |
\noindent Generally, you should avoid using those features yourself |
|
37 |
\emph{at any cost}: |
|
38 |
||
39 |
\begin{itemize} |
|
40 |
\item The safe configuration methods act uniformly on every target language, |
|
31050 | 41 |
whereas for adaptation you have to treat each target language separate. |
28561 | 42 |
\item Application is extremely tedious since there is no abstraction |
28593 | 43 |
which would allow for a static check, making it easy to produce garbage. |
44 |
\item More or less subtle errors can be introduced unconsciously. |
|
28561 | 45 |
\end{itemize} |
46 |
||
31050 | 47 |
\noindent However, even if you ought refrain from setting up adaptation |
28561 | 48 |
yourself, already the @{text "HOL"} comes with some reasonable default |
31050 | 49 |
adaptations (say, using target language list syntax). There also some |
50 |
common adaptation cases which you can setup by importing particular |
|
28561 | 51 |
library theories. In order to understand these, we provide some clues here; |
52 |
these however are not supposed to replace a careful study of the sources. |
|
53 |
*} |
|
54 |
||
31050 | 55 |
subsection {* The adaptation principle *} |
28561 | 56 |
|
57 |
text {* |
|
31050 | 58 |
Figure \ref{fig:adaptation} illustrates what \qt{adaptation} is conceptually |
28601 | 59 |
supposed to be: |
60 |
||
61 |
\begin{figure}[here] |
|
31050 | 62 |
\includegraphics{adaptation} |
63 |
\caption{The adaptation principle} |
|
64 |
\label{fig:adaptation} |
|
28601 | 65 |
\end{figure} |
66 |
||
67 |
\noindent In the tame view, code generation acts as broker between |
|
68 |
@{text logic}, @{text "intermediate language"} and |
|
69 |
@{text "target language"} by means of @{text translation} and |
|
70 |
@{text serialisation}; for the latter, the serialiser has to observe |
|
71 |
the structure of the @{text language} itself plus some @{text reserved} |
|
72 |
keywords which have to be avoided for generated code. |
|
31050 | 73 |
However, if you consider @{text adaptation} mechanisms, the code generated |
28601 | 74 |
by the serializer is just the tip of the iceberg: |
75 |
||
76 |
\begin{itemize} |
|
28635 | 77 |
\item @{text serialisation} can be \emph{parametrised} such that |
78 |
logical entities are mapped to target-specific ones |
|
79 |
(e.g. target-specific list syntax, |
|
31050 | 80 |
see also \secref{sec:adaptation_mechanisms}) |
28635 | 81 |
\item Such parametrisations can involve references to a |
82 |
target-specific standard @{text library} (e.g. using |
|
83 |
the @{text Haskell} @{verbatim Maybe} type instead |
|
84 |
of the @{text HOL} @{type "option"} type); |
|
85 |
if such are used, the corresponding identifiers |
|
86 |
(in our example, @{verbatim Maybe}, @{verbatim Nothing} |
|
87 |
and @{verbatim Just}) also have to be considered @{text reserved}. |
|
88 |
\item Even more, the user can enrich the library of the |
|
89 |
target-language by providing code snippets |
|
90 |
(\qt{@{text "includes"}}) which are prepended to |
|
91 |
any generated code (see \secref{sec:include}); this typically |
|
92 |
also involves further @{text reserved} identifiers. |
|
28601 | 93 |
\end{itemize} |
28635 | 94 |
|
31050 | 95 |
\noindent As figure \ref{fig:adaptation} illustrates, all these adaptation mechanisms |
28635 | 96 |
have to act consistently; it is at the discretion of the user |
97 |
to take care for this. |
|
28561 | 98 |
*} |
99 |
||
31050 | 100 |
subsection {* Common adaptation patterns *} |
28419 | 101 |
|
102 |
text {* |
|
28428 | 103 |
The @{theory HOL} @{theory Main} theory already provides a code |
28419 | 104 |
generator setup |
28593 | 105 |
which should be suitable for most applications. Common extensions |
28419 | 106 |
and modifications are available by certain theories of the @{text HOL} |
107 |
library; beside being useful in applications, they may serve |
|
108 |
as a tutorial for customising the code generator setup (see below |
|
31050 | 109 |
\secref{sec:adaptation_mechanisms}). |
28419 | 110 |
|
111 |
\begin{description} |
|
112 |
||
113 |
\item[@{theory "Code_Integer"}] represents @{text HOL} integers by big |
|
114 |
integer literals in target languages. |
|
115 |
\item[@{theory "Code_Char"}] represents @{text HOL} characters by |
|
116 |
character literals in target languages. |
|
117 |
\item[@{theory "Code_Char_chr"}] like @{text "Code_Char"}, |
|
118 |
but also offers treatment of character codes; includes |
|
28561 | 119 |
@{theory "Code_Char"}. |
28419 | 120 |
\item[@{theory "Efficient_Nat"}] \label{eff_nat} implements natural numbers by integers, |
121 |
which in general will result in higher efficiency; pattern |
|
122 |
matching with @{term "0\<Colon>nat"} / @{const "Suc"} |
|
28561 | 123 |
is eliminated; includes @{theory "Code_Integer"} |
31206 | 124 |
and @{theory "Code_Numeral"}. |
125 |
\item[@{theory "Code_Numeral"}] provides an additional datatype |
|
28419 | 126 |
@{typ index} which is mapped to target-language built-in integers. |
127 |
Useful for code setups which involve e.g. indexing of |
|
128 |
target-language arrays. |
|
31050 | 129 |
\item[@{theory "String"}] provides an additional datatype |
31205
98370b26c2ce
String.literal replaces message_string, code_numeral replaces (code_)index
haftmann
parents:
31050
diff
changeset
|
130 |
@{typ String.literal} which is isomorphic to strings; |
98370b26c2ce
String.literal replaces message_string, code_numeral replaces (code_)index
haftmann
parents:
31050
diff
changeset
|
131 |
@{typ String.literal}s are mapped to target-language strings. |
28419 | 132 |
Useful for code setups which involve e.g. printing (error) messages. |
133 |
||
134 |
\end{description} |
|
135 |
||
136 |
\begin{warn} |
|
137 |
When importing any of these theories, they should form the last |
|
138 |
items in an import list. Since these theories adapt the |
|
139 |
code generator setup in a non-conservative fashion, |
|
140 |
strange effects may occur otherwise. |
|
141 |
\end{warn} |
|
142 |
*} |
|
143 |
||
144 |
||
31050 | 145 |
subsection {* Parametrising serialisation \label{sec:adaptation_mechanisms} *} |
28419 | 146 |
|
147 |
text {* |
|
28561 | 148 |
Consider the following function and its corresponding |
28419 | 149 |
SML code: |
150 |
*} |
|
151 |
||
28564 | 152 |
primrec %quote in_interval :: "nat \<times> nat \<Rightarrow> nat \<Rightarrow> bool" where |
28419 | 153 |
"in_interval (k, l) n \<longleftrightarrow> k \<le> n \<and> n \<le> l" |
28447 | 154 |
(*<*) |
28419 | 155 |
code_type %invisible bool |
156 |
(SML) |
|
157 |
code_const %invisible True and False and "op \<and>" and Not |
|
158 |
(SML and and and) |
|
28447 | 159 |
(*>*) |
28564 | 160 |
text %quote {*@{code_stmts in_interval (SML)}*} |
28419 | 161 |
|
162 |
text {* |
|
163 |
\noindent Though this is correct code, it is a little bit unsatisfactory: |
|
164 |
boolean values and operators are materialised as distinguished |
|
165 |
entities with have nothing to do with the SML-built-in notion |
|
166 |
of \qt{bool}. This results in less readable code; |
|
167 |
additionally, eager evaluation may cause programs to |
|
168 |
loop or break which would perfectly terminate when |
|
169 |
the existing SML @{verbatim "bool"} would be used. To map |
|
170 |
the HOL @{typ bool} on SML @{verbatim "bool"}, we may use |
|
171 |
\qn{custom serialisations}: |
|
172 |
*} |
|
173 |
||
28564 | 174 |
code_type %quotett bool |
28419 | 175 |
(SML "bool") |
28564 | 176 |
code_const %quotett True and False and "op \<and>" |
28419 | 177 |
(SML "true" and "false" and "_ andalso _") |
28213 | 178 |
|
28419 | 179 |
text {* |
28447 | 180 |
\noindent The @{command code_type} command takes a type constructor |
28419 | 181 |
as arguments together with a list of custom serialisations. |
182 |
Each custom serialisation starts with a target language |
|
183 |
identifier followed by an expression, which during |
|
184 |
code serialisation is inserted whenever the type constructor |
|
185 |
would occur. For constants, @{command code_const} implements |
|
186 |
the corresponding mechanism. Each ``@{verbatim "_"}'' in |
|
187 |
a serialisation expression is treated as a placeholder |
|
188 |
for the type constructor's (the constant's) arguments. |
|
189 |
*} |
|
190 |
||
28564 | 191 |
text %quote {*@{code_stmts in_interval (SML)}*} |
28419 | 192 |
|
193 |
text {* |
|
194 |
\noindent This still is not perfect: the parentheses |
|
195 |
around the \qt{andalso} expression are superfluous. |
|
28593 | 196 |
Though the serialiser |
28419 | 197 |
by no means attempts to imitate the rich Isabelle syntax |
198 |
framework, it provides some common idioms, notably |
|
199 |
associative infixes with precedences which may be used here: |
|
200 |
*} |
|
201 |
||
28564 | 202 |
code_const %quotett "op \<and>" |
28419 | 203 |
(SML infixl 1 "andalso") |
204 |
||
28564 | 205 |
text %quote {*@{code_stmts in_interval (SML)}*} |
28419 | 206 |
|
207 |
text {* |
|
28561 | 208 |
\noindent The attentive reader may ask how we assert that no generated |
209 |
code will accidentally overwrite. For this reason the serialiser has |
|
210 |
an internal table of identifiers which have to be avoided to be used |
|
211 |
for new declarations. Initially, this table typically contains the |
|
212 |
keywords of the target language. It can be extended manually, thus avoiding |
|
213 |
accidental overwrites, using the @{command "code_reserved"} command: |
|
214 |
*} |
|
215 |
||
28601 | 216 |
code_reserved %quote "\<SML>" bool true false andalso |
28561 | 217 |
|
218 |
text {* |
|
28447 | 219 |
\noindent Next, we try to map HOL pairs to SML pairs, using the |
28419 | 220 |
infix ``@{verbatim "*"}'' type constructor and parentheses: |
221 |
*} |
|
28447 | 222 |
(*<*) |
28419 | 223 |
code_type %invisible * |
224 |
(SML) |
|
225 |
code_const %invisible Pair |
|
226 |
(SML) |
|
28447 | 227 |
(*>*) |
28564 | 228 |
code_type %quotett * |
28419 | 229 |
(SML infix 2 "*") |
28564 | 230 |
code_const %quotett Pair |
28419 | 231 |
(SML "!((_),/ (_))") |
232 |
||
233 |
text {* |
|
28593 | 234 |
\noindent The initial bang ``@{verbatim "!"}'' tells the serialiser |
28561 | 235 |
never to put |
28419 | 236 |
parentheses around the whole expression (they are already present), |
237 |
while the parentheses around argument place holders |
|
238 |
tell not to put parentheses around the arguments. |
|
239 |
The slash ``@{verbatim "/"}'' (followed by arbitrary white space) |
|
240 |
inserts a space which may be used as a break if necessary |
|
241 |
during pretty printing. |
|
242 |
||
243 |
These examples give a glimpse what mechanisms |
|
244 |
custom serialisations provide; however their usage |
|
245 |
requires careful thinking in order not to introduce |
|
246 |
inconsistencies -- or, in other words: |
|
247 |
custom serialisations are completely axiomatic. |
|
248 |
||
249 |
A further noteworthy details is that any special |
|
250 |
character in a custom serialisation may be quoted |
|
251 |
using ``@{verbatim "'"}''; thus, in |
|
252 |
``@{verbatim "fn '_ => _"}'' the first |
|
253 |
``@{verbatim "_"}'' is a proper underscore while the |
|
254 |
second ``@{verbatim "_"}'' is a placeholder. |
|
255 |
*} |
|
256 |
||
257 |
||
258 |
subsection {* @{text Haskell} serialisation *} |
|
259 |
||
260 |
text {* |
|
261 |
For convenience, the default |
|
262 |
@{text HOL} setup for @{text Haskell} maps the @{class eq} class to |
|
263 |
its counterpart in @{text Haskell}, giving custom serialisations |
|
264 |
for the class @{class eq} (by command @{command code_class}) and its operation |
|
265 |
@{const HOL.eq} |
|
266 |
*} |
|
267 |
||
28564 | 268 |
code_class %quotett eq |
28714 | 269 |
(Haskell "Eq") |
28419 | 270 |
|
28564 | 271 |
code_const %quotett "op =" |
28419 | 272 |
(Haskell infixl 4 "==") |
273 |
||
274 |
text {* |
|
28447 | 275 |
\noindent A problem now occurs whenever a type which |
28419 | 276 |
is an instance of @{class eq} in @{text HOL} is mapped |
277 |
on a @{text Haskell}-built-in type which is also an instance |
|
278 |
of @{text Haskell} @{text Eq}: |
|
279 |
*} |
|
280 |
||
28564 | 281 |
typedecl %quote bar |
28419 | 282 |
|
28564 | 283 |
instantiation %quote bar :: eq |
28419 | 284 |
begin |
285 |
||
28564 | 286 |
definition %quote "eq_class.eq (x\<Colon>bar) y \<longleftrightarrow> x = y" |
28419 | 287 |
|
28564 | 288 |
instance %quote by default (simp add: eq_bar_def) |
28213 | 289 |
|
30880 | 290 |
end %quote (*<*) |
291 |
||
292 |
(*>*) code_type %quotett bar |
|
28419 | 293 |
(Haskell "Integer") |
294 |
||
295 |
text {* |
|
28447 | 296 |
\noindent The code generator would produce |
28593 | 297 |
an additional instance, which of course is rejected by the @{text Haskell} |
28419 | 298 |
compiler. |
299 |
To suppress this additional instance, use |
|
300 |
@{text "code_instance"}: |
|
301 |
*} |
|
302 |
||
28564 | 303 |
code_instance %quotett bar :: eq |
28419 | 304 |
(Haskell -) |
305 |
||
28561 | 306 |
|
28635 | 307 |
subsection {* Enhancing the target language context \label{sec:include} *} |
28561 | 308 |
|
309 |
text {* |
|
28593 | 310 |
In rare cases it is necessary to \emph{enrich} the context of a |
28561 | 311 |
target language; this is accomplished using the @{command "code_include"} |
312 |
command: |
|
313 |
*} |
|
314 |
||
28564 | 315 |
code_include %quotett Haskell "Errno" |
28561 | 316 |
{*errno i = error ("Error number: " ++ show i)*} |
317 |
||
28564 | 318 |
code_reserved %quotett Haskell Errno |
28561 | 319 |
|
320 |
text {* |
|
321 |
\noindent Such named @{text include}s are then prepended to every generated code. |
|
322 |
Inspect such code in order to find out how @{command "code_include"} behaves |
|
323 |
with respect to a particular target language. |
|
324 |
*} |
|
325 |
||
28419 | 326 |
end |