doc-src/Codegen/Thy/Adaptation.thy
author haftmann
Thu Sep 23 15:46:17 2010 +0200 (2010-09-23)
changeset 39664 0afaf89ab591
parent 39643 29cc021398fc
child 39683 f75a01ee6c41
permissions -rw-r--r--
more canonical type setting of type writer code examples
     1 theory Adaptation
     2 imports Setup
     3 begin
     4 
     5 setup %invisible {* Code_Target.extend_target ("\<SML>", ("SML", K I)) *}
     6 
     7 section {* Adaptation to target languages \label{sec:adaptation} *}
     8 
     9 subsection {* Adapting code generation *}
    10 
    11 text {*
    12   The aspects of code generation introduced so far have two aspects
    13   in common:
    14 
    15   \begin{itemize}
    16 
    17     \item They act uniformly, without reference to a specific target
    18        language.
    19 
    20     \item They are \emph{safe} in the sense that as long as you trust
    21        the code generator meta theory and implementation, you cannot
    22        produce programs that yield results which are not derivable in
    23        the logic.
    24 
    25   \end{itemize}
    26 
    27   \noindent In this section we will introduce means to \emph{adapt}
    28   the serialiser to a specific target language, i.e.~to print program
    29   fragments in a way which accommodates \qt{already existing}
    30   ingredients of a target language environment, for three reasons:
    31 
    32   \begin{itemize}
    33     \item improving readability and aesthetics of generated code
    34     \item gaining efficiency
    35     \item interface with language parts which have no direct counterpart
    36       in @{text "HOL"} (say, imperative data structures)
    37   \end{itemize}
    38 
    39   \noindent Generally, you should avoid using those features yourself
    40   \emph{at any cost}:
    41 
    42   \begin{itemize}
    43 
    44     \item The safe configuration methods act uniformly on every target
    45       language, whereas for adaptation you have to treat each target
    46       language separately.
    47 
    48     \item Application is extremely tedious since there is no
    49       abstraction which would allow for a static check, making it easy
    50       to produce garbage.
    51 
    52     \item Subtle errors can be introduced unconsciously.
    53 
    54   \end{itemize}
    55 
    56   \noindent However, even if you ought refrain from setting up
    57   adaptation yourself, already the @{text "HOL"} comes with some
    58   reasonable default adaptations (say, using target language list
    59   syntax).  There also some common adaptation cases which you can
    60   setup by importing particular library theories.  In order to
    61   understand these, we provide some clues here; these however are not
    62   supposed to replace a careful study of the sources.
    63 *}
    64 
    65 
    66 subsection {* The adaptation principle *}
    67 
    68 text {*
    69   Figure \ref{fig:adaptation} illustrates what \qt{adaptation} is
    70   conceptually supposed to be:
    71 
    72   \begin{figure}[here]
    73     \includegraphics{adaptation}
    74     \caption{The adaptation principle}
    75     \label{fig:adaptation}
    76   \end{figure}
    77 
    78   \noindent In the tame view, code generation acts as broker between
    79   @{text logic}, @{text "intermediate language"} and @{text "target
    80   language"} by means of @{text translation} and @{text
    81   serialisation}; for the latter, the serialiser has to observe the
    82   structure of the @{text language} itself plus some @{text reserved}
    83   keywords which have to be avoided for generated code.  However, if
    84   you consider @{text adaptation} mechanisms, the code generated by
    85   the serializer is just the tip of the iceberg:
    86 
    87   \begin{itemize}
    88 
    89     \item @{text serialisation} can be \emph{parametrised} such that
    90       logical entities are mapped to target-specific ones
    91       (e.g. target-specific list syntax, see also
    92       \secref{sec:adaptation_mechanisms})
    93 
    94     \item Such parametrisations can involve references to a
    95       target-specific standard @{text library} (e.g. using the @{text
    96       Haskell} @{verbatim Maybe} type instead of the @{text HOL}
    97       @{type "option"} type); if such are used, the corresponding
    98       identifiers (in our example, @{verbatim Maybe}, @{verbatim
    99       Nothing} and @{verbatim Just}) also have to be considered @{text
   100       reserved}.
   101 
   102     \item Even more, the user can enrich the library of the
   103       target-language by providing code snippets (\qt{@{text
   104       "includes"}}) which are prepended to any generated code (see
   105       \secref{sec:include}); this typically also involves further
   106       @{text reserved} identifiers.
   107 
   108   \end{itemize}
   109 
   110   \noindent As figure \ref{fig:adaptation} illustrates, all these
   111   adaptation mechanisms have to act consistently; it is at the
   112   discretion of the user to take care for this.
   113 *}
   114 
   115 subsection {* Common adaptation patterns *}
   116 
   117 text {*
   118   The @{theory HOL} @{theory Main} theory already provides a code
   119   generator setup which should be suitable for most applications.
   120   Common extensions and modifications are available by certain
   121   theories of the @{text HOL} library; beside being useful in
   122   applications, they may serve as a tutorial for customising the code
   123   generator setup (see below \secref{sec:adaptation_mechanisms}).
   124 
   125   \begin{description}
   126 
   127     \item[@{theory "Code_Integer"}] represents @{text HOL} integers by
   128        big integer literals in target languages.
   129 
   130     \item[@{theory "Code_Char"}] represents @{text HOL} characters by
   131        character literals in target languages.
   132 
   133     \item[@{theory "Code_Char_chr"}] like @{text "Code_Char"}, but
   134        also offers treatment of character codes; includes @{theory
   135        "Code_Char"}.
   136 
   137     \item[@{theory "Efficient_Nat"}] \label{eff_nat} implements
   138        natural numbers by integers, which in general will result in
   139        higher efficiency; pattern matching with @{term "0\<Colon>nat"} /
   140        @{const "Suc"} is eliminated; includes @{theory "Code_Integer"}
   141        and @{theory "Code_Numeral"}.
   142 
   143     \item[@{theory "Code_Numeral"}] provides an additional datatype
   144        @{typ index} which is mapped to target-language built-in
   145        integers.  Useful for code setups which involve e.g.~indexing
   146        of target-language arrays.
   147 
   148     \item[@{theory "String"}] provides an additional datatype @{typ
   149        String.literal} which is isomorphic to strings; @{typ
   150        String.literal}s are mapped to target-language strings.  Useful
   151        for code setups which involve e.g.~printing (error) messages.
   152 
   153   \end{description}
   154 
   155   \begin{warn}
   156     When importing any of these theories, they should form the last
   157     items in an import list.  Since these theories adapt the code
   158     generator setup in a non-conservative fashion, strange effects may
   159     occur otherwise.
   160   \end{warn}
   161 *}
   162 
   163 
   164 subsection {* Parametrising serialisation \label{sec:adaptation_mechanisms} *}
   165 
   166 text {*
   167   Consider the following function and its corresponding SML code:
   168 *}
   169 
   170 primrec %quote in_interval :: "nat \<times> nat \<Rightarrow> nat \<Rightarrow> bool" where
   171   "in_interval (k, l) n \<longleftrightarrow> k \<le> n \<and> n \<le> l"
   172 (*<*)
   173 code_type %invisible bool
   174   (SML)
   175 code_const %invisible True and False and "op \<and>" and Not
   176   (SML and and and)
   177 (*>*)
   178 text %quote {*
   179   \begin{typewriter}
   180     @{code_stmts in_interval (SML)}
   181   \end{typewriter}
   182 *}
   183 
   184 text {*
   185   \noindent Though this is correct code, it is a little bit
   186   unsatisfactory: boolean values and operators are materialised as
   187   distinguished entities with have nothing to do with the SML-built-in
   188   notion of \qt{bool}.  This results in less readable code;
   189   additionally, eager evaluation may cause programs to loop or break
   190   which would perfectly terminate when the existing SML @{verbatim
   191   "bool"} would be used.  To map the HOL @{typ bool} on SML @{verbatim
   192   "bool"}, we may use \qn{custom serialisations}:
   193 *}
   194 
   195 code_type %quotett bool
   196   (SML "bool")
   197 code_const %quotett True and False and "op \<and>"
   198   (SML "true" and "false" and "_ andalso _")
   199 
   200 text {*
   201   \noindent The @{command_def code_type} command takes a type constructor
   202   as arguments together with a list of custom serialisations.  Each
   203   custom serialisation starts with a target language identifier
   204   followed by an expression, which during code serialisation is
   205   inserted whenever the type constructor would occur.  For constants,
   206   @{command_def code_const} implements the corresponding mechanism.  Each
   207   ``@{verbatim "_"}'' in a serialisation expression is treated as a
   208   placeholder for the type constructor's (the constant's) arguments.
   209 *}
   210 
   211 text %quote {*
   212   \begin{typewriter}
   213     @{code_stmts in_interval (SML)}
   214   \end{typewriter}
   215 *}
   216 
   217 text {*
   218   \noindent This still is not perfect: the parentheses around the
   219   \qt{andalso} expression are superfluous.  Though the serialiser by
   220   no means attempts to imitate the rich Isabelle syntax framework, it
   221   provides some common idioms, notably associative infixes with
   222   precedences which may be used here:
   223 *}
   224 
   225 code_const %quotett "op \<and>"
   226   (SML infixl 1 "andalso")
   227 
   228 text %quote {*
   229   \begin{typewriter}
   230     @{code_stmts in_interval (SML)}
   231   \end{typewriter}
   232 *}
   233 
   234 text {*
   235   \noindent The attentive reader may ask how we assert that no
   236   generated code will accidentally overwrite.  For this reason the
   237   serialiser has an internal table of identifiers which have to be
   238   avoided to be used for new declarations.  Initially, this table
   239   typically contains the keywords of the target language.  It can be
   240   extended manually, thus avoiding accidental overwrites, using the
   241   @{command_def "code_reserved"} command:
   242 *}
   243 
   244 code_reserved %quote "\<SML>" bool true false andalso
   245 
   246 text {*
   247   \noindent Next, we try to map HOL pairs to SML pairs, using the
   248   infix ``@{verbatim "*"}'' type constructor and parentheses:
   249 *}
   250 (*<*)
   251 code_type %invisible prod
   252   (SML)
   253 code_const %invisible Pair
   254   (SML)
   255 (*>*)
   256 code_type %quotett prod
   257   (SML infix 2 "*")
   258 code_const %quotett Pair
   259   (SML "!((_),/ (_))")
   260 
   261 text {*
   262   \noindent The initial bang ``@{verbatim "!"}'' tells the serialiser
   263   never to put parentheses around the whole expression (they are
   264   already present), while the parentheses around argument place
   265   holders tell not to put parentheses around the arguments.  The slash
   266   ``@{verbatim "/"}'' (followed by arbitrary white space) inserts a
   267   space which may be used as a break if necessary during pretty
   268   printing.
   269 
   270   These examples give a glimpse what mechanisms custom serialisations
   271   provide; however their usage requires careful thinking in order not
   272   to introduce inconsistencies -- or, in other words: custom
   273   serialisations are completely axiomatic.
   274 
   275   A further noteworthy detail is that any special character in a
   276   custom serialisation may be quoted using ``@{verbatim "'"}''; thus,
   277   in ``@{verbatim "fn '_ => _"}'' the first ``@{verbatim "_"}'' is a
   278   proper underscore while the second ``@{verbatim "_"}'' is a
   279   placeholder.
   280 *}
   281 
   282 
   283 subsection {* @{text Haskell} serialisation *}
   284 
   285 text {*
   286   For convenience, the default @{text HOL} setup for @{text Haskell}
   287   maps the @{class equal} class to its counterpart in @{text Haskell},
   288   giving custom serialisations for the class @{class equal} (by command
   289   @{command_def code_class}) and its operation @{const [source] HOL.equal}
   290 *}
   291 
   292 code_class %quotett equal
   293   (Haskell "Eq")
   294 
   295 code_const %quotett "HOL.equal"
   296   (Haskell infixl 4 "==")
   297 
   298 text {*
   299   \noindent A problem now occurs whenever a type which is an instance
   300   of @{class equal} in @{text HOL} is mapped on a @{text
   301   Haskell}-built-in type which is also an instance of @{text Haskell}
   302   @{text Eq}:
   303 *}
   304 
   305 typedecl %quote bar
   306 
   307 instantiation %quote bar :: equal
   308 begin
   309 
   310 definition %quote "HOL.equal (x\<Colon>bar) y \<longleftrightarrow> x = y"
   311 
   312 instance %quote by default (simp add: equal_bar_def)
   313 
   314 end %quote (*<*)
   315 
   316 (*>*) code_type %quotett bar
   317   (Haskell "Integer")
   318 
   319 text {*
   320   \noindent The code generator would produce an additional instance,
   321   which of course is rejected by the @{text Haskell} compiler.  To
   322   suppress this additional instance, use @{command_def "code_instance"}:
   323 *}
   324 
   325 code_instance %quotett bar :: equal
   326   (Haskell -)
   327 
   328 
   329 subsection {* Enhancing the target language context \label{sec:include} *}
   330 
   331 text {*
   332   In rare cases it is necessary to \emph{enrich} the context of a
   333   target language; this is accomplished using the @{command_def
   334   "code_include"} command:
   335 *}
   336 
   337 code_include %quotett Haskell "Errno"
   338 {*errno i = error ("Error number: " ++ show i)*}
   339 
   340 code_reserved %quotett Haskell Errno
   341 
   342 text {*
   343   \noindent Such named @{text include}s are then prepended to every
   344   generated code.  Inspect such code in order to find out how
   345   @{command "code_include"} behaves with respect to a particular
   346   target language.
   347 *}
   348 
   349 end