author | haftmann |
Mon, 27 Sep 2010 16:27:31 +0200 | |
changeset 39745 | 3aa2bc9c5478 |
parent 39711 | 793451df8c4e |
child 40351 | 090dac52cfd7 |
permissions | -rw-r--r-- |
31050 | 1 |
theory Adaptation |
28213 | 2 |
imports Setup |
3 |
begin |
|
4 |
||
28679 | 5 |
setup %invisible {* Code_Target.extend_target ("\<SML>", ("SML", K I)) *} |
28561 | 6 |
|
31050 | 7 |
section {* Adaptation to target languages \label{sec:adaptation} *} |
28419 | 8 |
|
28561 | 9 |
subsection {* Adapting code generation *} |
10 |
||
11 |
text {* |
|
12 |
The aspects of code generation introduced so far have two aspects |
|
13 |
in common: |
|
14 |
||
15 |
\begin{itemize} |
|
38450 | 16 |
|
17 |
\item They act uniformly, without reference to a specific target |
|
18 |
language. |
|
19 |
||
28561 | 20 |
\item They are \emph{safe} in the sense that as long as you trust |
21 |
the code generator meta theory and implementation, you cannot |
|
38450 | 22 |
produce programs that yield results which are not derivable in |
23 |
the logic. |
|
24 |
||
28561 | 25 |
\end{itemize} |
26 |
||
38450 | 27 |
\noindent In this section we will introduce means to \emph{adapt} |
28 |
the serialiser to a specific target language, i.e.~to print program |
|
29 |
fragments in a way which accommodates \qt{already existing} |
|
30 |
ingredients of a target language environment, for three reasons: |
|
28561 | 31 |
|
32 |
\begin{itemize} |
|
28593 | 33 |
\item improving readability and aesthetics of generated code |
28561 | 34 |
\item gaining efficiency |
35 |
\item interface with language parts which have no direct counterpart |
|
36 |
in @{text "HOL"} (say, imperative data structures) |
|
37 |
\end{itemize} |
|
38 |
||
39 |
\noindent Generally, you should avoid using those features yourself |
|
40 |
\emph{at any cost}: |
|
41 |
||
42 |
\begin{itemize} |
|
38450 | 43 |
|
44 |
\item The safe configuration methods act uniformly on every target |
|
45 |
language, whereas for adaptation you have to treat each target |
|
46 |
language separately. |
|
47 |
||
48 |
\item Application is extremely tedious since there is no |
|
49 |
abstraction which would allow for a static check, making it easy |
|
50 |
to produce garbage. |
|
51 |
||
34155 | 52 |
\item Subtle errors can be introduced unconsciously. |
38450 | 53 |
|
28561 | 54 |
\end{itemize} |
55 |
||
38450 | 56 |
\noindent However, even if you ought refrain from setting up |
57 |
adaptation yourself, already the @{text "HOL"} comes with some |
|
58 |
reasonable default adaptations (say, using target language list |
|
59 |
syntax). There also some common adaptation cases which you can |
|
60 |
setup by importing particular library theories. In order to |
|
61 |
understand these, we provide some clues here; these however are not |
|
62 |
supposed to replace a careful study of the sources. |
|
28561 | 63 |
*} |
64 |
||
38450 | 65 |
|
31050 | 66 |
subsection {* The adaptation principle *} |
28561 | 67 |
|
68 |
text {* |
|
38450 | 69 |
Figure \ref{fig:adaptation} illustrates what \qt{adaptation} is |
70 |
conceptually supposed to be: |
|
28601 | 71 |
|
72 |
\begin{figure}[here] |
|
31050 | 73 |
\includegraphics{adaptation} |
74 |
\caption{The adaptation principle} |
|
75 |
\label{fig:adaptation} |
|
28601 | 76 |
\end{figure} |
77 |
||
78 |
\noindent In the tame view, code generation acts as broker between |
|
38450 | 79 |
@{text logic}, @{text "intermediate language"} and @{text "target |
80 |
language"} by means of @{text translation} and @{text |
|
81 |
serialisation}; for the latter, the serialiser has to observe the |
|
82 |
structure of the @{text language} itself plus some @{text reserved} |
|
83 |
keywords which have to be avoided for generated code. However, if |
|
84 |
you consider @{text adaptation} mechanisms, the code generated by |
|
85 |
the serializer is just the tip of the iceberg: |
|
28601 | 86 |
|
87 |
\begin{itemize} |
|
38450 | 88 |
|
28635 | 89 |
\item @{text serialisation} can be \emph{parametrised} such that |
90 |
logical entities are mapped to target-specific ones |
|
38450 | 91 |
(e.g. target-specific list syntax, see also |
92 |
\secref{sec:adaptation_mechanisms}) |
|
93 |
||
28635 | 94 |
\item Such parametrisations can involve references to a |
38450 | 95 |
target-specific standard @{text library} (e.g. using the @{text |
96 |
Haskell} @{verbatim Maybe} type instead of the @{text HOL} |
|
97 |
@{type "option"} type); if such are used, the corresponding |
|
98 |
identifiers (in our example, @{verbatim Maybe}, @{verbatim |
|
99 |
Nothing} and @{verbatim Just}) also have to be considered @{text |
|
100 |
reserved}. |
|
101 |
||
28635 | 102 |
\item Even more, the user can enrich the library of the |
38450 | 103 |
target-language by providing code snippets (\qt{@{text |
104 |
"includes"}}) which are prepended to any generated code (see |
|
105 |
\secref{sec:include}); this typically also involves further |
|
106 |
@{text reserved} identifiers. |
|
107 |
||
28601 | 108 |
\end{itemize} |
28635 | 109 |
|
38450 | 110 |
\noindent As figure \ref{fig:adaptation} illustrates, all these |
111 |
adaptation mechanisms have to act consistently; it is at the |
|
112 |
discretion of the user to take care for this. |
|
28561 | 113 |
*} |
114 |
||
31050 | 115 |
subsection {* Common adaptation patterns *} |
28419 | 116 |
|
117 |
text {* |
|
28428 | 118 |
The @{theory HOL} @{theory Main} theory already provides a code |
38450 | 119 |
generator setup which should be suitable for most applications. |
120 |
Common extensions and modifications are available by certain |
|
121 |
theories of the @{text HOL} library; beside being useful in |
|
122 |
applications, they may serve as a tutorial for customising the code |
|
123 |
generator setup (see below \secref{sec:adaptation_mechanisms}). |
|
28419 | 124 |
|
125 |
\begin{description} |
|
126 |
||
38450 | 127 |
\item[@{theory "Code_Integer"}] represents @{text HOL} integers by |
128 |
big integer literals in target languages. |
|
129 |
||
130 |
\item[@{theory "Code_Char"}] represents @{text HOL} characters by |
|
28419 | 131 |
character literals in target languages. |
38450 | 132 |
|
133 |
\item[@{theory "Code_Char_chr"}] like @{text "Code_Char"}, but |
|
134 |
also offers treatment of character codes; includes @{theory |
|
135 |
"Code_Char"}. |
|
136 |
||
137 |
\item[@{theory "Efficient_Nat"}] \label{eff_nat} implements |
|
138 |
natural numbers by integers, which in general will result in |
|
139 |
higher efficiency; pattern matching with @{term "0\<Colon>nat"} / |
|
140 |
@{const "Suc"} is eliminated; includes @{theory "Code_Integer"} |
|
31206 | 141 |
and @{theory "Code_Numeral"}. |
38450 | 142 |
|
31206 | 143 |
\item[@{theory "Code_Numeral"}] provides an additional datatype |
38450 | 144 |
@{typ index} which is mapped to target-language built-in |
145 |
integers. Useful for code setups which involve e.g.~indexing |
|
146 |
of target-language arrays. |
|
147 |
||
148 |
\item[@{theory "String"}] provides an additional datatype @{typ |
|
149 |
String.literal} which is isomorphic to strings; @{typ |
|
150 |
String.literal}s are mapped to target-language strings. Useful |
|
151 |
for code setups which involve e.g.~printing (error) messages. |
|
28419 | 152 |
|
153 |
\end{description} |
|
154 |
||
155 |
\begin{warn} |
|
156 |
When importing any of these theories, they should form the last |
|
38450 | 157 |
items in an import list. Since these theories adapt the code |
158 |
generator setup in a non-conservative fashion, strange effects may |
|
159 |
occur otherwise. |
|
28419 | 160 |
\end{warn} |
161 |
*} |
|
162 |
||
163 |
||
31050 | 164 |
subsection {* Parametrising serialisation \label{sec:adaptation_mechanisms} *} |
28419 | 165 |
|
166 |
text {* |
|
38450 | 167 |
Consider the following function and its corresponding SML code: |
28419 | 168 |
*} |
169 |
||
28564 | 170 |
primrec %quote in_interval :: "nat \<times> nat \<Rightarrow> nat \<Rightarrow> bool" where |
28419 | 171 |
"in_interval (k, l) n \<longleftrightarrow> k \<le> n \<and> n \<le> l" |
28447 | 172 |
(*<*) |
28419 | 173 |
code_type %invisible bool |
174 |
(SML) |
|
175 |
code_const %invisible True and False and "op \<and>" and Not |
|
176 |
(SML and and and) |
|
28447 | 177 |
(*>*) |
39745 | 178 |
text %quotetypewriter {* |
39683 | 179 |
@{code_stmts in_interval (SML)} |
39664
0afaf89ab591
more canonical type setting of type writer code examples
haftmann
parents:
39643
diff
changeset
|
180 |
*} |
28419 | 181 |
|
182 |
text {* |
|
38450 | 183 |
\noindent Though this is correct code, it is a little bit |
184 |
unsatisfactory: boolean values and operators are materialised as |
|
185 |
distinguished entities with have nothing to do with the SML-built-in |
|
186 |
notion of \qt{bool}. This results in less readable code; |
|
187 |
additionally, eager evaluation may cause programs to loop or break |
|
188 |
which would perfectly terminate when the existing SML @{verbatim |
|
189 |
"bool"} would be used. To map the HOL @{typ bool} on SML @{verbatim |
|
190 |
"bool"}, we may use \qn{custom serialisations}: |
|
28419 | 191 |
*} |
192 |
||
39745 | 193 |
code_type %quotett bool |
28419 | 194 |
(SML "bool") |
39745 | 195 |
code_const %quotett True and False and "op \<and>" |
28419 | 196 |
(SML "true" and "false" and "_ andalso _") |
28213 | 197 |
|
28419 | 198 |
text {* |
38505 | 199 |
\noindent The @{command_def code_type} command takes a type constructor |
38450 | 200 |
as arguments together with a list of custom serialisations. Each |
201 |
custom serialisation starts with a target language identifier |
|
202 |
followed by an expression, which during code serialisation is |
|
203 |
inserted whenever the type constructor would occur. For constants, |
|
38505 | 204 |
@{command_def code_const} implements the corresponding mechanism. Each |
38450 | 205 |
``@{verbatim "_"}'' in a serialisation expression is treated as a |
206 |
placeholder for the type constructor's (the constant's) arguments. |
|
28419 | 207 |
*} |
208 |
||
39745 | 209 |
text %quotetypewriter {* |
39683 | 210 |
@{code_stmts in_interval (SML)} |
39664
0afaf89ab591
more canonical type setting of type writer code examples
haftmann
parents:
39643
diff
changeset
|
211 |
*} |
28419 | 212 |
|
213 |
text {* |
|
38450 | 214 |
\noindent This still is not perfect: the parentheses around the |
215 |
\qt{andalso} expression are superfluous. Though the serialiser by |
|
216 |
no means attempts to imitate the rich Isabelle syntax framework, it |
|
217 |
provides some common idioms, notably associative infixes with |
|
218 |
precedences which may be used here: |
|
28419 | 219 |
*} |
220 |
||
39745 | 221 |
code_const %quotett "op \<and>" |
28419 | 222 |
(SML infixl 1 "andalso") |
223 |
||
39745 | 224 |
text %quotetypewriter {* |
39683 | 225 |
@{code_stmts in_interval (SML)} |
39664
0afaf89ab591
more canonical type setting of type writer code examples
haftmann
parents:
39643
diff
changeset
|
226 |
*} |
28419 | 227 |
|
228 |
text {* |
|
38450 | 229 |
\noindent The attentive reader may ask how we assert that no |
230 |
generated code will accidentally overwrite. For this reason the |
|
231 |
serialiser has an internal table of identifiers which have to be |
|
232 |
avoided to be used for new declarations. Initially, this table |
|
233 |
typically contains the keywords of the target language. It can be |
|
234 |
extended manually, thus avoiding accidental overwrites, using the |
|
38505 | 235 |
@{command_def "code_reserved"} command: |
28561 | 236 |
*} |
237 |
||
28601 | 238 |
code_reserved %quote "\<SML>" bool true false andalso |
28561 | 239 |
|
240 |
text {* |
|
28447 | 241 |
\noindent Next, we try to map HOL pairs to SML pairs, using the |
28419 | 242 |
infix ``@{verbatim "*"}'' type constructor and parentheses: |
243 |
*} |
|
28447 | 244 |
(*<*) |
37836 | 245 |
code_type %invisible prod |
28419 | 246 |
(SML) |
247 |
code_const %invisible Pair |
|
248 |
(SML) |
|
28447 | 249 |
(*>*) |
39745 | 250 |
code_type %quotett prod |
28419 | 251 |
(SML infix 2 "*") |
39745 | 252 |
code_const %quotett Pair |
28419 | 253 |
(SML "!((_),/ (_))") |
254 |
||
255 |
text {* |
|
28593 | 256 |
\noindent The initial bang ``@{verbatim "!"}'' tells the serialiser |
38450 | 257 |
never to put parentheses around the whole expression (they are |
258 |
already present), while the parentheses around argument place |
|
259 |
holders tell not to put parentheses around the arguments. The slash |
|
260 |
``@{verbatim "/"}'' (followed by arbitrary white space) inserts a |
|
261 |
space which may be used as a break if necessary during pretty |
|
262 |
printing. |
|
28419 | 263 |
|
38450 | 264 |
These examples give a glimpse what mechanisms custom serialisations |
265 |
provide; however their usage requires careful thinking in order not |
|
266 |
to introduce inconsistencies -- or, in other words: custom |
|
267 |
serialisations are completely axiomatic. |
|
28419 | 268 |
|
39643 | 269 |
A further noteworthy detail is that any special character in a |
38450 | 270 |
custom serialisation may be quoted using ``@{verbatim "'"}''; thus, |
271 |
in ``@{verbatim "fn '_ => _"}'' the first ``@{verbatim "_"}'' is a |
|
272 |
proper underscore while the second ``@{verbatim "_"}'' is a |
|
273 |
placeholder. |
|
28419 | 274 |
*} |
275 |
||
276 |
||
277 |
subsection {* @{text Haskell} serialisation *} |
|
278 |
||
279 |
text {* |
|
38450 | 280 |
For convenience, the default @{text HOL} setup for @{text Haskell} |
39063 | 281 |
maps the @{class equal} class to its counterpart in @{text Haskell}, |
282 |
giving custom serialisations for the class @{class equal} (by command |
|
39643 | 283 |
@{command_def code_class}) and its operation @{const [source] HOL.equal} |
28419 | 284 |
*} |
285 |
||
39745 | 286 |
code_class %quotett equal |
28714 | 287 |
(Haskell "Eq") |
28419 | 288 |
|
39745 | 289 |
code_const %quotett "HOL.equal" |
28419 | 290 |
(Haskell infixl 4 "==") |
291 |
||
292 |
text {* |
|
38450 | 293 |
\noindent A problem now occurs whenever a type which is an instance |
39063 | 294 |
of @{class equal} in @{text HOL} is mapped on a @{text |
38450 | 295 |
Haskell}-built-in type which is also an instance of @{text Haskell} |
296 |
@{text Eq}: |
|
28419 | 297 |
*} |
298 |
||
28564 | 299 |
typedecl %quote bar |
28419 | 300 |
|
39063 | 301 |
instantiation %quote bar :: equal |
28419 | 302 |
begin |
303 |
||
39063 | 304 |
definition %quote "HOL.equal (x\<Colon>bar) y \<longleftrightarrow> x = y" |
28419 | 305 |
|
39063 | 306 |
instance %quote by default (simp add: equal_bar_def) |
28213 | 307 |
|
30880 | 308 |
end %quote (*<*) |
309 |
||
39745 | 310 |
(*>*) code_type %quotett bar |
28419 | 311 |
(Haskell "Integer") |
312 |
||
313 |
text {* |
|
38450 | 314 |
\noindent The code generator would produce an additional instance, |
315 |
which of course is rejected by the @{text Haskell} compiler. To |
|
38506 | 316 |
suppress this additional instance, use @{command_def "code_instance"}: |
28419 | 317 |
*} |
318 |
||
39745 | 319 |
code_instance %quotett bar :: equal |
28419 | 320 |
(Haskell -) |
321 |
||
28561 | 322 |
|
28635 | 323 |
subsection {* Enhancing the target language context \label{sec:include} *} |
28561 | 324 |
|
325 |
text {* |
|
28593 | 326 |
In rare cases it is necessary to \emph{enrich} the context of a |
38505 | 327 |
target language; this is accomplished using the @{command_def |
38450 | 328 |
"code_include"} command: |
28561 | 329 |
*} |
330 |
||
39745 | 331 |
code_include %quotett Haskell "Errno" |
28561 | 332 |
{*errno i = error ("Error number: " ++ show i)*} |
333 |
||
39745 | 334 |
code_reserved %quotett Haskell Errno |
28561 | 335 |
|
336 |
text {* |
|
38450 | 337 |
\noindent Such named @{text include}s are then prepended to every |
338 |
generated code. Inspect such code in order to find out how |
|
339 |
@{command "code_include"} behaves with respect to a particular |
|
340 |
target language. |
|
28561 | 341 |
*} |
342 |
||
28419 | 343 |
end |