| author | blanchet | 
| Fri, 27 May 2011 10:30:07 +0200 | |
| changeset 43005 | c96f06bffd90 | 
| parent 40351 | 090dac52cfd7 | 
| child 46519 | 17dde5feea4b | 
| permissions | -rw-r--r-- | 
| 31050 | 1 | theory Adaptation | 
| 28213 | 2 | imports Setup | 
| 3 | begin | |
| 4 | ||
| 40351 | 5 | setup %invisible {* Code_Target.extend_target ("\<SML>", ("SML", K I))
 | 
| 6 |   #> Code_Target.extend_target ("\<SMLdummy>", ("Haskell", K I)) *}
 | |
| 28561 | 7 | |
| 31050 | 8 | section {* Adaptation to target languages \label{sec:adaptation} *}
 | 
| 28419 | 9 | |
| 28561 | 10 | subsection {* Adapting code generation *}
 | 
| 11 | ||
| 12 | text {*
 | |
| 13 | The aspects of code generation introduced so far have two aspects | |
| 14 | in common: | |
| 15 | ||
| 16 |   \begin{itemize}
 | |
| 38450 | 17 | |
| 18 | \item They act uniformly, without reference to a specific target | |
| 19 | language. | |
| 20 | ||
| 28561 | 21 |     \item They are \emph{safe} in the sense that as long as you trust
 | 
| 22 | the code generator meta theory and implementation, you cannot | |
| 38450 | 23 | produce programs that yield results which are not derivable in | 
| 24 | the logic. | |
| 25 | ||
| 28561 | 26 |   \end{itemize}
 | 
| 27 | ||
| 38450 | 28 |   \noindent In this section we will introduce means to \emph{adapt}
 | 
| 29 | the serialiser to a specific target language, i.e.~to print program | |
| 30 |   fragments in a way which accommodates \qt{already existing}
 | |
| 31 | ingredients of a target language environment, for three reasons: | |
| 28561 | 32 | |
| 33 |   \begin{itemize}
 | |
| 28593 | 34 | \item improving readability and aesthetics of generated code | 
| 28561 | 35 | \item gaining efficiency | 
| 36 | \item interface with language parts which have no direct counterpart | |
| 37 |       in @{text "HOL"} (say, imperative data structures)
 | |
| 38 |   \end{itemize}
 | |
| 39 | ||
| 40 | \noindent Generally, you should avoid using those features yourself | |
| 41 |   \emph{at any cost}:
 | |
| 42 | ||
| 43 |   \begin{itemize}
 | |
| 38450 | 44 | |
| 45 | \item The safe configuration methods act uniformly on every target | |
| 46 | language, whereas for adaptation you have to treat each target | |
| 47 | language separately. | |
| 48 | ||
| 49 | \item Application is extremely tedious since there is no | |
| 50 | abstraction which would allow for a static check, making it easy | |
| 51 | to produce garbage. | |
| 52 | ||
| 34155 | 53 | \item Subtle errors can be introduced unconsciously. | 
| 38450 | 54 | |
| 28561 | 55 |   \end{itemize}
 | 
| 56 | ||
| 38450 | 57 | \noindent However, even if you ought refrain from setting up | 
| 58 |   adaptation yourself, already the @{text "HOL"} comes with some
 | |
| 59 | reasonable default adaptations (say, using target language list | |
| 60 | syntax). There also some common adaptation cases which you can | |
| 61 | setup by importing particular library theories. In order to | |
| 62 | understand these, we provide some clues here; these however are not | |
| 63 | supposed to replace a careful study of the sources. | |
| 28561 | 64 | *} | 
| 65 | ||
| 38450 | 66 | |
| 31050 | 67 | subsection {* The adaptation principle *}
 | 
| 28561 | 68 | |
| 69 | text {*
 | |
| 38450 | 70 |   Figure \ref{fig:adaptation} illustrates what \qt{adaptation} is
 | 
| 71 | conceptually supposed to be: | |
| 28601 | 72 | |
| 73 |   \begin{figure}[here]
 | |
| 31050 | 74 |     \includegraphics{adaptation}
 | 
| 75 |     \caption{The adaptation principle}
 | |
| 76 |     \label{fig:adaptation}
 | |
| 28601 | 77 |   \end{figure}
 | 
| 78 | ||
| 79 | \noindent In the tame view, code generation acts as broker between | |
| 38450 | 80 |   @{text logic}, @{text "intermediate language"} and @{text "target
 | 
| 81 |   language"} by means of @{text translation} and @{text
 | |
| 82 | serialisation}; for the latter, the serialiser has to observe the | |
| 83 |   structure of the @{text language} itself plus some @{text reserved}
 | |
| 84 | keywords which have to be avoided for generated code. However, if | |
| 85 |   you consider @{text adaptation} mechanisms, the code generated by
 | |
| 86 | the serializer is just the tip of the iceberg: | |
| 28601 | 87 | |
| 88 |   \begin{itemize}
 | |
| 38450 | 89 | |
| 28635 | 90 |     \item @{text serialisation} can be \emph{parametrised} such that
 | 
| 91 | logical entities are mapped to target-specific ones | |
| 38450 | 92 | (e.g. target-specific list syntax, see also | 
| 93 |       \secref{sec:adaptation_mechanisms})
 | |
| 94 | ||
| 28635 | 95 | \item Such parametrisations can involve references to a | 
| 38450 | 96 |       target-specific standard @{text library} (e.g. using the @{text
 | 
| 97 |       Haskell} @{verbatim Maybe} type instead of the @{text HOL}
 | |
| 98 |       @{type "option"} type); if such are used, the corresponding
 | |
| 99 |       identifiers (in our example, @{verbatim Maybe}, @{verbatim
 | |
| 100 |       Nothing} and @{verbatim Just}) also have to be considered @{text
 | |
| 101 | reserved}. | |
| 102 | ||
| 28635 | 103 | \item Even more, the user can enrich the library of the | 
| 38450 | 104 |       target-language by providing code snippets (\qt{@{text
 | 
| 105 | "includes"}}) which are prepended to any generated code (see | |
| 106 |       \secref{sec:include}); this typically also involves further
 | |
| 107 |       @{text reserved} identifiers.
 | |
| 108 | ||
| 28601 | 109 |   \end{itemize}
 | 
| 28635 | 110 | |
| 38450 | 111 |   \noindent As figure \ref{fig:adaptation} illustrates, all these
 | 
| 112 | adaptation mechanisms have to act consistently; it is at the | |
| 113 | discretion of the user to take care for this. | |
| 28561 | 114 | *} | 
| 115 | ||
| 31050 | 116 | subsection {* Common adaptation patterns *}
 | 
| 28419 | 117 | |
| 118 | text {*
 | |
| 28428 | 119 |   The @{theory HOL} @{theory Main} theory already provides a code
 | 
| 38450 | 120 | generator setup which should be suitable for most applications. | 
| 121 | Common extensions and modifications are available by certain | |
| 122 |   theories of the @{text HOL} library; beside being useful in
 | |
| 123 | applications, they may serve as a tutorial for customising the code | |
| 124 |   generator setup (see below \secref{sec:adaptation_mechanisms}).
 | |
| 28419 | 125 | |
| 126 |   \begin{description}
 | |
| 127 | ||
| 38450 | 128 |     \item[@{theory "Code_Integer"}] represents @{text HOL} integers by
 | 
| 129 | big integer literals in target languages. | |
| 130 | ||
| 131 |     \item[@{theory "Code_Char"}] represents @{text HOL} characters by
 | |
| 28419 | 132 | character literals in target languages. | 
| 38450 | 133 | |
| 134 |     \item[@{theory "Code_Char_chr"}] like @{text "Code_Char"}, but
 | |
| 135 |        also offers treatment of character codes; includes @{theory
 | |
| 136 | "Code_Char"}. | |
| 137 | ||
| 138 |     \item[@{theory "Efficient_Nat"}] \label{eff_nat} implements
 | |
| 139 | natural numbers by integers, which in general will result in | |
| 140 |        higher efficiency; pattern matching with @{term "0\<Colon>nat"} /
 | |
| 141 |        @{const "Suc"} is eliminated; includes @{theory "Code_Integer"}
 | |
| 31206 | 142 |        and @{theory "Code_Numeral"}.
 | 
| 38450 | 143 | |
| 31206 | 144 |     \item[@{theory "Code_Numeral"}] provides an additional datatype
 | 
| 38450 | 145 |        @{typ index} which is mapped to target-language built-in
 | 
| 146 | integers. Useful for code setups which involve e.g.~indexing | |
| 147 | of target-language arrays. | |
| 148 | ||
| 149 |     \item[@{theory "String"}] provides an additional datatype @{typ
 | |
| 150 |        String.literal} which is isomorphic to strings; @{typ
 | |
| 151 | String.literal}s are mapped to target-language strings. Useful | |
| 152 | for code setups which involve e.g.~printing (error) messages. | |
| 28419 | 153 | |
| 154 |   \end{description}
 | |
| 155 | ||
| 156 |   \begin{warn}
 | |
| 157 | When importing any of these theories, they should form the last | |
| 38450 | 158 | items in an import list. Since these theories adapt the code | 
| 159 | generator setup in a non-conservative fashion, strange effects may | |
| 160 | occur otherwise. | |
| 28419 | 161 |   \end{warn}
 | 
| 162 | *} | |
| 163 | ||
| 164 | ||
| 31050 | 165 | subsection {* Parametrising serialisation \label{sec:adaptation_mechanisms} *}
 | 
| 28419 | 166 | |
| 167 | text {*
 | |
| 38450 | 168 | Consider the following function and its corresponding SML code: | 
| 28419 | 169 | *} | 
| 170 | ||
| 28564 | 171 | primrec %quote in_interval :: "nat \<times> nat \<Rightarrow> nat \<Rightarrow> bool" where | 
| 28419 | 172 | "in_interval (k, l) n \<longleftrightarrow> k \<le> n \<and> n \<le> l" | 
| 28447 | 173 | (*<*) | 
| 28419 | 174 | code_type %invisible bool | 
| 175 | (SML) | |
| 176 | code_const %invisible True and False and "op \<and>" and Not | |
| 177 | (SML and and and) | |
| 28447 | 178 | (*>*) | 
| 39745 | 179 | text %quotetypewriter {*
 | 
| 39683 | 180 |   @{code_stmts in_interval (SML)}
 | 
| 39664 
0afaf89ab591
more canonical type setting of type writer code examples
 haftmann parents: 
39643diff
changeset | 181 | *} | 
| 28419 | 182 | |
| 183 | text {*
 | |
| 38450 | 184 | \noindent Though this is correct code, it is a little bit | 
| 185 | unsatisfactory: boolean values and operators are materialised as | |
| 186 | distinguished entities with have nothing to do with the SML-built-in | |
| 187 |   notion of \qt{bool}.  This results in less readable code;
 | |
| 188 | additionally, eager evaluation may cause programs to loop or break | |
| 189 |   which would perfectly terminate when the existing SML @{verbatim
 | |
| 190 |   "bool"} would be used.  To map the HOL @{typ bool} on SML @{verbatim
 | |
| 191 |   "bool"}, we may use \qn{custom serialisations}:
 | |
| 28419 | 192 | *} | 
| 193 | ||
| 39745 | 194 | code_type %quotett bool | 
| 28419 | 195 | (SML "bool") | 
| 39745 | 196 | code_const %quotett True and False and "op \<and>" | 
| 28419 | 197 | (SML "true" and "false" and "_ andalso _") | 
| 28213 | 198 | |
| 28419 | 199 | text {*
 | 
| 38505 | 200 |   \noindent The @{command_def code_type} command takes a type constructor
 | 
| 38450 | 201 | as arguments together with a list of custom serialisations. Each | 
| 202 | custom serialisation starts with a target language identifier | |
| 203 | followed by an expression, which during code serialisation is | |
| 204 | inserted whenever the type constructor would occur. For constants, | |
| 38505 | 205 |   @{command_def code_const} implements the corresponding mechanism.  Each
 | 
| 38450 | 206 |   ``@{verbatim "_"}'' in a serialisation expression is treated as a
 | 
| 207 | placeholder for the type constructor's (the constant's) arguments. | |
| 28419 | 208 | *} | 
| 209 | ||
| 39745 | 210 | text %quotetypewriter {*
 | 
| 39683 | 211 |   @{code_stmts in_interval (SML)}
 | 
| 39664 
0afaf89ab591
more canonical type setting of type writer code examples
 haftmann parents: 
39643diff
changeset | 212 | *} | 
| 28419 | 213 | |
| 214 | text {*
 | |
| 38450 | 215 | \noindent This still is not perfect: the parentheses around the | 
| 216 |   \qt{andalso} expression are superfluous.  Though the serialiser by
 | |
| 217 | no means attempts to imitate the rich Isabelle syntax framework, it | |
| 218 | provides some common idioms, notably associative infixes with | |
| 219 | precedences which may be used here: | |
| 28419 | 220 | *} | 
| 221 | ||
| 39745 | 222 | code_const %quotett "op \<and>" | 
| 28419 | 223 | (SML infixl 1 "andalso") | 
| 224 | ||
| 39745 | 225 | text %quotetypewriter {*
 | 
| 39683 | 226 |   @{code_stmts in_interval (SML)}
 | 
| 39664 
0afaf89ab591
more canonical type setting of type writer code examples
 haftmann parents: 
39643diff
changeset | 227 | *} | 
| 28419 | 228 | |
| 229 | text {*
 | |
| 38450 | 230 | \noindent The attentive reader may ask how we assert that no | 
| 231 | generated code will accidentally overwrite. For this reason the | |
| 232 | serialiser has an internal table of identifiers which have to be | |
| 233 | avoided to be used for new declarations. Initially, this table | |
| 234 | typically contains the keywords of the target language. It can be | |
| 235 | extended manually, thus avoiding accidental overwrites, using the | |
| 38505 | 236 |   @{command_def "code_reserved"} command:
 | 
| 28561 | 237 | *} | 
| 238 | ||
| 40351 | 239 | code_reserved %quote "\<SMLdummy>" bool true false andalso | 
| 28561 | 240 | |
| 241 | text {*
 | |
| 28447 | 242 | \noindent Next, we try to map HOL pairs to SML pairs, using the | 
| 28419 | 243 |   infix ``@{verbatim "*"}'' type constructor and parentheses:
 | 
| 244 | *} | |
| 28447 | 245 | (*<*) | 
| 37836 | 246 | code_type %invisible prod | 
| 28419 | 247 | (SML) | 
| 248 | code_const %invisible Pair | |
| 249 | (SML) | |
| 28447 | 250 | (*>*) | 
| 39745 | 251 | code_type %quotett prod | 
| 28419 | 252 | (SML infix 2 "*") | 
| 39745 | 253 | code_const %quotett Pair | 
| 28419 | 254 | (SML "!((_),/ (_))") | 
| 255 | ||
| 256 | text {*
 | |
| 28593 | 257 |   \noindent The initial bang ``@{verbatim "!"}'' tells the serialiser
 | 
| 38450 | 258 | never to put parentheses around the whole expression (they are | 
| 259 | already present), while the parentheses around argument place | |
| 260 | holders tell not to put parentheses around the arguments. The slash | |
| 261 |   ``@{verbatim "/"}'' (followed by arbitrary white space) inserts a
 | |
| 262 | space which may be used as a break if necessary during pretty | |
| 263 | printing. | |
| 28419 | 264 | |
| 38450 | 265 | These examples give a glimpse what mechanisms custom serialisations | 
| 266 | provide; however their usage requires careful thinking in order not | |
| 267 | to introduce inconsistencies -- or, in other words: custom | |
| 268 | serialisations are completely axiomatic. | |
| 28419 | 269 | |
| 39643 | 270 | A further noteworthy detail is that any special character in a | 
| 38450 | 271 |   custom serialisation may be quoted using ``@{verbatim "'"}''; thus,
 | 
| 272 |   in ``@{verbatim "fn '_ => _"}'' the first ``@{verbatim "_"}'' is a
 | |
| 273 |   proper underscore while the second ``@{verbatim "_"}'' is a
 | |
| 274 | placeholder. | |
| 28419 | 275 | *} | 
| 276 | ||
| 277 | ||
| 278 | subsection {* @{text Haskell} serialisation *}
 | |
| 279 | ||
| 280 | text {*
 | |
| 38450 | 281 |   For convenience, the default @{text HOL} setup for @{text Haskell}
 | 
| 39063 | 282 |   maps the @{class equal} class to its counterpart in @{text Haskell},
 | 
| 283 |   giving custom serialisations for the class @{class equal} (by command
 | |
| 39643 | 284 |   @{command_def code_class}) and its operation @{const [source] HOL.equal}
 | 
| 28419 | 285 | *} | 
| 286 | ||
| 39745 | 287 | code_class %quotett equal | 
| 28714 | 288 | (Haskell "Eq") | 
| 28419 | 289 | |
| 39745 | 290 | code_const %quotett "HOL.equal" | 
| 28419 | 291 | (Haskell infixl 4 "==") | 
| 292 | ||
| 293 | text {*
 | |
| 38450 | 294 | \noindent A problem now occurs whenever a type which is an instance | 
| 39063 | 295 |   of @{class equal} in @{text HOL} is mapped on a @{text
 | 
| 38450 | 296 |   Haskell}-built-in type which is also an instance of @{text Haskell}
 | 
| 297 |   @{text Eq}:
 | |
| 28419 | 298 | *} | 
| 299 | ||
| 28564 | 300 | typedecl %quote bar | 
| 28419 | 301 | |
| 39063 | 302 | instantiation %quote bar :: equal | 
| 28419 | 303 | begin | 
| 304 | ||
| 39063 | 305 | definition %quote "HOL.equal (x\<Colon>bar) y \<longleftrightarrow> x = y" | 
| 28419 | 306 | |
| 39063 | 307 | instance %quote by default (simp add: equal_bar_def) | 
| 28213 | 308 | |
| 30880 | 309 | end %quote (*<*) | 
| 310 | ||
| 39745 | 311 | (*>*) code_type %quotett bar | 
| 28419 | 312 | (Haskell "Integer") | 
| 313 | ||
| 314 | text {*
 | |
| 38450 | 315 | \noindent The code generator would produce an additional instance, | 
| 316 |   which of course is rejected by the @{text Haskell} compiler.  To
 | |
| 38506 | 317 |   suppress this additional instance, use @{command_def "code_instance"}:
 | 
| 28419 | 318 | *} | 
| 319 | ||
| 39745 | 320 | code_instance %quotett bar :: equal | 
| 28419 | 321 | (Haskell -) | 
| 322 | ||
| 28561 | 323 | |
| 28635 | 324 | subsection {* Enhancing the target language context \label{sec:include} *}
 | 
| 28561 | 325 | |
| 326 | text {*
 | |
| 28593 | 327 |   In rare cases it is necessary to \emph{enrich} the context of a
 | 
| 38505 | 328 |   target language; this is accomplished using the @{command_def
 | 
| 38450 | 329 | "code_include"} command: | 
| 28561 | 330 | *} | 
| 331 | ||
| 39745 | 332 | code_include %quotett Haskell "Errno" | 
| 28561 | 333 | {*errno i = error ("Error number: " ++ show i)*}
 | 
| 334 | ||
| 39745 | 335 | code_reserved %quotett Haskell Errno | 
| 28561 | 336 | |
| 337 | text {*
 | |
| 38450 | 338 |   \noindent Such named @{text include}s are then prepended to every
 | 
| 339 | generated code. Inspect such code in order to find out how | |
| 340 |   @{command "code_include"} behaves with respect to a particular
 | |
| 341 | target language. | |
| 28561 | 342 | *} | 
| 343 | ||
| 28419 | 344 | end |