| 31050 |      1 | theory Adaptation
 | 
| 28213 |      2 | imports Setup
 | 
|  |      3 | begin
 | 
|  |      4 | 
 | 
| 28679 |      5 | setup %invisible {* Code_Target.extend_target ("\<SML>", ("SML", K I)) *}
 | 
| 28561 |      6 | 
 | 
| 31050 |      7 | section {* Adaptation to target languages \label{sec:adaptation} *}
 | 
| 28419 |      8 | 
 | 
| 28561 |      9 | subsection {* Adapting code generation *}
 | 
|  |     10 | 
 | 
|  |     11 | text {*
 | 
|  |     12 |   The aspects of code generation introduced so far have two aspects
 | 
|  |     13 |   in common:
 | 
|  |     14 | 
 | 
|  |     15 |   \begin{itemize}
 | 
| 38450 |     16 | 
 | 
|  |     17 |     \item They act uniformly, without reference to a specific target
 | 
|  |     18 |        language.
 | 
|  |     19 | 
 | 
| 28561 |     20 |     \item They are \emph{safe} in the sense that as long as you trust
 | 
|  |     21 |        the code generator meta theory and implementation, you cannot
 | 
| 38450 |     22 |        produce programs that yield results which are not derivable in
 | 
|  |     23 |        the logic.
 | 
|  |     24 | 
 | 
| 28561 |     25 |   \end{itemize}
 | 
|  |     26 | 
 | 
| 38450 |     27 |   \noindent In this section we will introduce means to \emph{adapt}
 | 
|  |     28 |   the serialiser to a specific target language, i.e.~to print program
 | 
|  |     29 |   fragments in a way which accommodates \qt{already existing}
 | 
|  |     30 |   ingredients of a target language environment, for three reasons:
 | 
| 28561 |     31 | 
 | 
|  |     32 |   \begin{itemize}
 | 
| 28593 |     33 |     \item improving readability and aesthetics of generated code
 | 
| 28561 |     34 |     \item gaining efficiency
 | 
|  |     35 |     \item interface with language parts which have no direct counterpart
 | 
|  |     36 |       in @{text "HOL"} (say, imperative data structures)
 | 
|  |     37 |   \end{itemize}
 | 
|  |     38 | 
 | 
|  |     39 |   \noindent Generally, you should avoid using those features yourself
 | 
|  |     40 |   \emph{at any cost}:
 | 
|  |     41 | 
 | 
|  |     42 |   \begin{itemize}
 | 
| 38450 |     43 | 
 | 
|  |     44 |     \item The safe configuration methods act uniformly on every target
 | 
|  |     45 |       language, whereas for adaptation you have to treat each target
 | 
|  |     46 |       language separately.
 | 
|  |     47 | 
 | 
|  |     48 |     \item Application is extremely tedious since there is no
 | 
|  |     49 |       abstraction which would allow for a static check, making it easy
 | 
|  |     50 |       to produce garbage.
 | 
|  |     51 | 
 | 
| 34155 |     52 |     \item Subtle errors can be introduced unconsciously.
 | 
| 38450 |     53 | 
 | 
| 28561 |     54 |   \end{itemize}
 | 
|  |     55 | 
 | 
| 38450 |     56 |   \noindent However, even if you ought refrain from setting up
 | 
|  |     57 |   adaptation yourself, already the @{text "HOL"} comes with some
 | 
|  |     58 |   reasonable default adaptations (say, using target language list
 | 
|  |     59 |   syntax).  There also some common adaptation cases which you can
 | 
|  |     60 |   setup by importing particular library theories.  In order to
 | 
|  |     61 |   understand these, we provide some clues here; these however are not
 | 
|  |     62 |   supposed to replace a careful study of the sources.
 | 
| 28561 |     63 | *}
 | 
|  |     64 | 
 | 
| 38450 |     65 | 
 | 
| 31050 |     66 | subsection {* The adaptation principle *}
 | 
| 28561 |     67 | 
 | 
|  |     68 | text {*
 | 
| 38450 |     69 |   Figure \ref{fig:adaptation} illustrates what \qt{adaptation} is
 | 
|  |     70 |   conceptually supposed to be:
 | 
| 28601 |     71 | 
 | 
|  |     72 |   \begin{figure}[here]
 | 
| 31050 |     73 |     \includegraphics{adaptation}
 | 
|  |     74 |     \caption{The adaptation principle}
 | 
|  |     75 |     \label{fig:adaptation}
 | 
| 28601 |     76 |   \end{figure}
 | 
|  |     77 | 
 | 
|  |     78 |   \noindent In the tame view, code generation acts as broker between
 | 
| 38450 |     79 |   @{text logic}, @{text "intermediate language"} and @{text "target
 | 
|  |     80 |   language"} by means of @{text translation} and @{text
 | 
|  |     81 |   serialisation}; for the latter, the serialiser has to observe the
 | 
|  |     82 |   structure of the @{text language} itself plus some @{text reserved}
 | 
|  |     83 |   keywords which have to be avoided for generated code.  However, if
 | 
|  |     84 |   you consider @{text adaptation} mechanisms, the code generated by
 | 
|  |     85 |   the serializer is just the tip of the iceberg:
 | 
| 28601 |     86 | 
 | 
|  |     87 |   \begin{itemize}
 | 
| 38450 |     88 | 
 | 
| 28635 |     89 |     \item @{text serialisation} can be \emph{parametrised} such that
 | 
|  |     90 |       logical entities are mapped to target-specific ones
 | 
| 38450 |     91 |       (e.g. target-specific list syntax, see also
 | 
|  |     92 |       \secref{sec:adaptation_mechanisms})
 | 
|  |     93 | 
 | 
| 28635 |     94 |     \item Such parametrisations can involve references to a
 | 
| 38450 |     95 |       target-specific standard @{text library} (e.g. using the @{text
 | 
|  |     96 |       Haskell} @{verbatim Maybe} type instead of the @{text HOL}
 | 
|  |     97 |       @{type "option"} type); if such are used, the corresponding
 | 
|  |     98 |       identifiers (in our example, @{verbatim Maybe}, @{verbatim
 | 
|  |     99 |       Nothing} and @{verbatim Just}) also have to be considered @{text
 | 
|  |    100 |       reserved}.
 | 
|  |    101 | 
 | 
| 28635 |    102 |     \item Even more, the user can enrich the library of the
 | 
| 38450 |    103 |       target-language by providing code snippets (\qt{@{text
 | 
|  |    104 |       "includes"}}) which are prepended to any generated code (see
 | 
|  |    105 |       \secref{sec:include}); this typically also involves further
 | 
|  |    106 |       @{text reserved} identifiers.
 | 
|  |    107 | 
 | 
| 28601 |    108 |   \end{itemize}
 | 
| 28635 |    109 | 
 | 
| 38450 |    110 |   \noindent As figure \ref{fig:adaptation} illustrates, all these
 | 
|  |    111 |   adaptation mechanisms have to act consistently; it is at the
 | 
|  |    112 |   discretion of the user to take care for this.
 | 
| 28561 |    113 | *}
 | 
|  |    114 | 
 | 
| 31050 |    115 | subsection {* Common adaptation patterns *}
 | 
| 28419 |    116 | 
 | 
|  |    117 | text {*
 | 
| 28428 |    118 |   The @{theory HOL} @{theory Main} theory already provides a code
 | 
| 38450 |    119 |   generator setup which should be suitable for most applications.
 | 
|  |    120 |   Common extensions and modifications are available by certain
 | 
|  |    121 |   theories of the @{text HOL} library; beside being useful in
 | 
|  |    122 |   applications, they may serve as a tutorial for customising the code
 | 
|  |    123 |   generator setup (see below \secref{sec:adaptation_mechanisms}).
 | 
| 28419 |    124 | 
 | 
|  |    125 |   \begin{description}
 | 
|  |    126 | 
 | 
| 38450 |    127 |     \item[@{theory "Code_Integer"}] represents @{text HOL} integers by
 | 
|  |    128 |        big integer literals in target languages.
 | 
|  |    129 | 
 | 
|  |    130 |     \item[@{theory "Code_Char"}] represents @{text HOL} characters by
 | 
| 28419 |    131 |        character literals in target languages.
 | 
| 38450 |    132 | 
 | 
|  |    133 |     \item[@{theory "Code_Char_chr"}] like @{text "Code_Char"}, but
 | 
|  |    134 |        also offers treatment of character codes; includes @{theory
 | 
|  |    135 |        "Code_Char"}.
 | 
|  |    136 | 
 | 
|  |    137 |     \item[@{theory "Efficient_Nat"}] \label{eff_nat} implements
 | 
|  |    138 |        natural numbers by integers, which in general will result in
 | 
|  |    139 |        higher efficiency; pattern matching with @{term "0\<Colon>nat"} /
 | 
|  |    140 |        @{const "Suc"} is eliminated; includes @{theory "Code_Integer"}
 | 
| 31206 |    141 |        and @{theory "Code_Numeral"}.
 | 
| 38450 |    142 | 
 | 
| 31206 |    143 |     \item[@{theory "Code_Numeral"}] provides an additional datatype
 | 
| 38450 |    144 |        @{typ index} which is mapped to target-language built-in
 | 
|  |    145 |        integers.  Useful for code setups which involve e.g.~indexing
 | 
|  |    146 |        of target-language arrays.
 | 
|  |    147 | 
 | 
|  |    148 |     \item[@{theory "String"}] provides an additional datatype @{typ
 | 
|  |    149 |        String.literal} which is isomorphic to strings; @{typ
 | 
|  |    150 |        String.literal}s are mapped to target-language strings.  Useful
 | 
|  |    151 |        for code setups which involve e.g.~printing (error) messages.
 | 
| 28419 |    152 | 
 | 
|  |    153 |   \end{description}
 | 
|  |    154 | 
 | 
|  |    155 |   \begin{warn}
 | 
|  |    156 |     When importing any of these theories, they should form the last
 | 
| 38450 |    157 |     items in an import list.  Since these theories adapt the code
 | 
|  |    158 |     generator setup in a non-conservative fashion, strange effects may
 | 
|  |    159 |     occur otherwise.
 | 
| 28419 |    160 |   \end{warn}
 | 
|  |    161 | *}
 | 
|  |    162 | 
 | 
|  |    163 | 
 | 
| 31050 |    164 | subsection {* Parametrising serialisation \label{sec:adaptation_mechanisms} *}
 | 
| 28419 |    165 | 
 | 
|  |    166 | text {*
 | 
| 38450 |    167 |   Consider the following function and its corresponding SML code:
 | 
| 28419 |    168 | *}
 | 
|  |    169 | 
 | 
| 28564 |    170 | primrec %quote in_interval :: "nat \<times> nat \<Rightarrow> nat \<Rightarrow> bool" where
 | 
| 28419 |    171 |   "in_interval (k, l) n \<longleftrightarrow> k \<le> n \<and> n \<le> l"
 | 
| 28447 |    172 | (*<*)
 | 
| 28419 |    173 | code_type %invisible bool
 | 
|  |    174 |   (SML)
 | 
|  |    175 | code_const %invisible True and False and "op \<and>" and Not
 | 
|  |    176 |   (SML and and and)
 | 
| 28447 |    177 | (*>*)
 | 
| 28564 |    178 | text %quote {*@{code_stmts in_interval (SML)}*}
 | 
| 28419 |    179 | 
 | 
|  |    180 | text {*
 | 
| 38450 |    181 |   \noindent Though this is correct code, it is a little bit
 | 
|  |    182 |   unsatisfactory: boolean values and operators are materialised as
 | 
|  |    183 |   distinguished entities with have nothing to do with the SML-built-in
 | 
|  |    184 |   notion of \qt{bool}.  This results in less readable code;
 | 
|  |    185 |   additionally, eager evaluation may cause programs to loop or break
 | 
|  |    186 |   which would perfectly terminate when the existing SML @{verbatim
 | 
|  |    187 |   "bool"} would be used.  To map the HOL @{typ bool} on SML @{verbatim
 | 
|  |    188 |   "bool"}, we may use \qn{custom serialisations}:
 | 
| 28419 |    189 | *}
 | 
|  |    190 | 
 | 
| 28564 |    191 | code_type %quotett bool
 | 
| 28419 |    192 |   (SML "bool")
 | 
| 28564 |    193 | code_const %quotett True and False and "op \<and>"
 | 
| 28419 |    194 |   (SML "true" and "false" and "_ andalso _")
 | 
| 28213 |    195 | 
 | 
| 28419 |    196 | text {*
 | 
| 38505 |    197 |   \noindent The @{command_def code_type} command takes a type constructor
 | 
| 38450 |    198 |   as arguments together with a list of custom serialisations.  Each
 | 
|  |    199 |   custom serialisation starts with a target language identifier
 | 
|  |    200 |   followed by an expression, which during code serialisation is
 | 
|  |    201 |   inserted whenever the type constructor would occur.  For constants,
 | 
| 38505 |    202 |   @{command_def code_const} implements the corresponding mechanism.  Each
 | 
| 38450 |    203 |   ``@{verbatim "_"}'' in a serialisation expression is treated as a
 | 
|  |    204 |   placeholder for the type constructor's (the constant's) arguments.
 | 
| 28419 |    205 | *}
 | 
|  |    206 | 
 | 
| 28564 |    207 | text %quote {*@{code_stmts in_interval (SML)}*}
 | 
| 28419 |    208 | 
 | 
|  |    209 | text {*
 | 
| 38450 |    210 |   \noindent This still is not perfect: the parentheses around the
 | 
|  |    211 |   \qt{andalso} expression are superfluous.  Though the serialiser by
 | 
|  |    212 |   no means attempts to imitate the rich Isabelle syntax framework, it
 | 
|  |    213 |   provides some common idioms, notably associative infixes with
 | 
|  |    214 |   precedences which may be used here:
 | 
| 28419 |    215 | *}
 | 
|  |    216 | 
 | 
| 28564 |    217 | code_const %quotett "op \<and>"
 | 
| 28419 |    218 |   (SML infixl 1 "andalso")
 | 
|  |    219 | 
 | 
| 28564 |    220 | text %quote {*@{code_stmts in_interval (SML)}*}
 | 
| 28419 |    221 | 
 | 
|  |    222 | text {*
 | 
| 38450 |    223 |   \noindent The attentive reader may ask how we assert that no
 | 
|  |    224 |   generated code will accidentally overwrite.  For this reason the
 | 
|  |    225 |   serialiser has an internal table of identifiers which have to be
 | 
|  |    226 |   avoided to be used for new declarations.  Initially, this table
 | 
|  |    227 |   typically contains the keywords of the target language.  It can be
 | 
|  |    228 |   extended manually, thus avoiding accidental overwrites, using the
 | 
| 38505 |    229 |   @{command_def "code_reserved"} command:
 | 
| 28561 |    230 | *}
 | 
|  |    231 | 
 | 
| 28601 |    232 | code_reserved %quote "\<SML>" bool true false andalso
 | 
| 28561 |    233 | 
 | 
|  |    234 | text {*
 | 
| 28447 |    235 |   \noindent Next, we try to map HOL pairs to SML pairs, using the
 | 
| 28419 |    236 |   infix ``@{verbatim "*"}'' type constructor and parentheses:
 | 
|  |    237 | *}
 | 
| 28447 |    238 | (*<*)
 | 
| 37836 |    239 | code_type %invisible prod
 | 
| 28419 |    240 |   (SML)
 | 
|  |    241 | code_const %invisible Pair
 | 
|  |    242 |   (SML)
 | 
| 28447 |    243 | (*>*)
 | 
| 37836 |    244 | code_type %quotett prod
 | 
| 28419 |    245 |   (SML infix 2 "*")
 | 
| 28564 |    246 | code_const %quotett Pair
 | 
| 28419 |    247 |   (SML "!((_),/ (_))")
 | 
|  |    248 | 
 | 
|  |    249 | text {*
 | 
| 28593 |    250 |   \noindent The initial bang ``@{verbatim "!"}'' tells the serialiser
 | 
| 38450 |    251 |   never to put parentheses around the whole expression (they are
 | 
|  |    252 |   already present), while the parentheses around argument place
 | 
|  |    253 |   holders tell not to put parentheses around the arguments.  The slash
 | 
|  |    254 |   ``@{verbatim "/"}'' (followed by arbitrary white space) inserts a
 | 
|  |    255 |   space which may be used as a break if necessary during pretty
 | 
|  |    256 |   printing.
 | 
| 28419 |    257 | 
 | 
| 38450 |    258 |   These examples give a glimpse what mechanisms custom serialisations
 | 
|  |    259 |   provide; however their usage requires careful thinking in order not
 | 
|  |    260 |   to introduce inconsistencies -- or, in other words: custom
 | 
|  |    261 |   serialisations are completely axiomatic.
 | 
| 28419 |    262 | 
 | 
| 38450 |    263 |   A further noteworthy details is that any special character in a
 | 
|  |    264 |   custom serialisation may be quoted using ``@{verbatim "'"}''; thus,
 | 
|  |    265 |   in ``@{verbatim "fn '_ => _"}'' the first ``@{verbatim "_"}'' is a
 | 
|  |    266 |   proper underscore while the second ``@{verbatim "_"}'' is a
 | 
|  |    267 |   placeholder.
 | 
| 28419 |    268 | *}
 | 
|  |    269 | 
 | 
|  |    270 | 
 | 
|  |    271 | subsection {* @{text Haskell} serialisation *}
 | 
|  |    272 | 
 | 
|  |    273 | text {*
 | 
| 38450 |    274 |   For convenience, the default @{text HOL} setup for @{text Haskell}
 | 
|  |    275 |   maps the @{class eq} class to its counterpart in @{text Haskell},
 | 
|  |    276 |   giving custom serialisations for the class @{class eq} (by command
 | 
| 38505 |    277 |   @{command_def code_class}) and its operation @{const HOL.eq}
 | 
| 28419 |    278 | *}
 | 
|  |    279 | 
 | 
| 28564 |    280 | code_class %quotett eq
 | 
| 28714 |    281 |   (Haskell "Eq")
 | 
| 28419 |    282 | 
 | 
| 28564 |    283 | code_const %quotett "op ="
 | 
| 28419 |    284 |   (Haskell infixl 4 "==")
 | 
|  |    285 | 
 | 
|  |    286 | text {*
 | 
| 38450 |    287 |   \noindent A problem now occurs whenever a type which is an instance
 | 
|  |    288 |   of @{class eq} in @{text HOL} is mapped on a @{text
 | 
|  |    289 |   Haskell}-built-in type which is also an instance of @{text Haskell}
 | 
|  |    290 |   @{text Eq}:
 | 
| 28419 |    291 | *}
 | 
|  |    292 | 
 | 
| 28564 |    293 | typedecl %quote bar
 | 
| 28419 |    294 | 
 | 
| 28564 |    295 | instantiation %quote bar :: eq
 | 
| 28419 |    296 | begin
 | 
|  |    297 | 
 | 
| 28564 |    298 | definition %quote "eq_class.eq (x\<Colon>bar) y \<longleftrightarrow> x = y"
 | 
| 28419 |    299 | 
 | 
| 28564 |    300 | instance %quote by default (simp add: eq_bar_def)
 | 
| 28213 |    301 | 
 | 
| 30880 |    302 | end %quote (*<*)
 | 
|  |    303 | 
 | 
|  |    304 | (*>*) code_type %quotett bar
 | 
| 28419 |    305 |   (Haskell "Integer")
 | 
|  |    306 | 
 | 
|  |    307 | text {*
 | 
| 38450 |    308 |   \noindent The code generator would produce an additional instance,
 | 
|  |    309 |   which of course is rejected by the @{text Haskell} compiler.  To
 | 
| 38506 |    310 |   suppress this additional instance, use @{command_def "code_instance"}:
 | 
| 28419 |    311 | *}
 | 
|  |    312 | 
 | 
| 28564 |    313 | code_instance %quotett bar :: eq
 | 
| 28419 |    314 |   (Haskell -)
 | 
|  |    315 | 
 | 
| 28561 |    316 | 
 | 
| 28635 |    317 | subsection {* Enhancing the target language context \label{sec:include} *}
 | 
| 28561 |    318 | 
 | 
|  |    319 | text {*
 | 
| 28593 |    320 |   In rare cases it is necessary to \emph{enrich} the context of a
 | 
| 38505 |    321 |   target language; this is accomplished using the @{command_def
 | 
| 38450 |    322 |   "code_include"} command:
 | 
| 28561 |    323 | *}
 | 
|  |    324 | 
 | 
| 28564 |    325 | code_include %quotett Haskell "Errno"
 | 
| 28561 |    326 | {*errno i = error ("Error number: " ++ show i)*}
 | 
|  |    327 | 
 | 
| 28564 |    328 | code_reserved %quotett Haskell Errno
 | 
| 28561 |    329 | 
 | 
|  |    330 | text {*
 | 
| 38450 |    331 |   \noindent Such named @{text include}s are then prepended to every
 | 
|  |    332 |   generated code.  Inspect such code in order to find out how
 | 
|  |    333 |   @{command "code_include"} behaves with respect to a particular
 | 
|  |    334 |   target language.
 | 
| 28561 |    335 | *}
 | 
|  |    336 | 
 | 
| 28419 |    337 | end
 |