src/Doc/Codegen/Adaptation.thy
author haftmann
Fri Feb 15 11:47:33 2013 +0100 (2013-02-15)
changeset 51160 599ff65b85e2
parent 51143 0a2371e7ced3
child 51162 310b94ed1815
permissions -rw-r--r--
systematic conversions between nat and nibble/char;
more uniform approaches to execute operations on nibble/char
     1 theory Adaptation
     2 imports Setup
     3 begin
     4 
     5 setup %invisible {* Code_Target.extend_target ("\<SML>", ("SML", K I))
     6   #> Code_Target.extend_target ("\<SMLdummy>", ("Haskell", K I)) *}
     7 
     8 section {* Adaptation to target languages \label{sec:adaptation} *}
     9 
    10 subsection {* Adapting code generation *}
    11 
    12 text {*
    13   The aspects of code generation introduced so far have two aspects
    14   in common:
    15 
    16   \begin{itemize}
    17 
    18     \item They act uniformly, without reference to a specific target
    19        language.
    20 
    21     \item They are \emph{safe} in the sense that as long as you trust
    22        the code generator meta theory and implementation, you cannot
    23        produce programs that yield results which are not derivable in
    24        the logic.
    25 
    26   \end{itemize}
    27 
    28   \noindent In this section we will introduce means to \emph{adapt}
    29   the serialiser to a specific target language, i.e.~to print program
    30   fragments in a way which accommodates \qt{already existing}
    31   ingredients of a target language environment, for three reasons:
    32 
    33   \begin{itemize}
    34     \item improving readability and aesthetics of generated code
    35     \item gaining efficiency
    36     \item interface with language parts which have no direct counterpart
    37       in @{text "HOL"} (say, imperative data structures)
    38   \end{itemize}
    39 
    40   \noindent Generally, you should avoid using those features yourself
    41   \emph{at any cost}:
    42 
    43   \begin{itemize}
    44 
    45     \item The safe configuration methods act uniformly on every target
    46       language, whereas for adaptation you have to treat each target
    47       language separately.
    48 
    49     \item Application is extremely tedious since there is no
    50       abstraction which would allow for a static check, making it easy
    51       to produce garbage.
    52 
    53     \item Subtle errors can be introduced unconsciously.
    54 
    55   \end{itemize}
    56 
    57   \noindent However, even if you ought refrain from setting up
    58   adaptation yourself, already the @{text "HOL"} comes with some
    59   reasonable default adaptations (say, using target language list
    60   syntax).  There also some common adaptation cases which you can
    61   setup by importing particular library theories.  In order to
    62   understand these, we provide some clues here; these however are not
    63   supposed to replace a careful study of the sources.
    64 *}
    65 
    66 
    67 subsection {* The adaptation principle *}
    68 
    69 text {*
    70   Figure \ref{fig:adaptation} illustrates what \qt{adaptation} is
    71   conceptually supposed to be:
    72 
    73   \begin{figure}[here]
    74     \includegraphics{adapt}
    75     \caption{The adaptation principle}
    76     \label{fig:adaptation}
    77   \end{figure}
    78 
    79   \noindent In the tame view, code generation acts as broker between
    80   @{text logic}, @{text "intermediate language"} and @{text "target
    81   language"} by means of @{text translation} and @{text
    82   serialisation}; for the latter, the serialiser has to observe the
    83   structure of the @{text language} itself plus some @{text reserved}
    84   keywords which have to be avoided for generated code.  However, if
    85   you consider @{text adaptation} mechanisms, the code generated by
    86   the serializer is just the tip of the iceberg:
    87 
    88   \begin{itemize}
    89 
    90     \item @{text serialisation} can be \emph{parametrised} such that
    91       logical entities are mapped to target-specific ones
    92       (e.g. target-specific list syntax, see also
    93       \secref{sec:adaptation_mechanisms})
    94 
    95     \item Such parametrisations can involve references to a
    96       target-specific standard @{text library} (e.g. using the @{text
    97       Haskell} @{verbatim Maybe} type instead of the @{text HOL}
    98       @{type "option"} type); if such are used, the corresponding
    99       identifiers (in our example, @{verbatim Maybe}, @{verbatim
   100       Nothing} and @{verbatim Just}) also have to be considered @{text
   101       reserved}.
   102 
   103     \item Even more, the user can enrich the library of the
   104       target-language by providing code snippets (\qt{@{text
   105       "includes"}}) which are prepended to any generated code (see
   106       \secref{sec:include}); this typically also involves further
   107       @{text reserved} identifiers.
   108 
   109   \end{itemize}
   110 
   111   \noindent As figure \ref{fig:adaptation} illustrates, all these
   112   adaptation mechanisms have to act consistently; it is at the
   113   discretion of the user to take care for this.
   114 *}
   115 
   116 subsection {* Common adaptation patterns *}
   117 
   118 text {*
   119   The @{theory HOL} @{theory Main} theory already provides a code
   120   generator setup which should be suitable for most applications.
   121   Common extensions and modifications are available by certain
   122   theories of the @{text HOL} library; beside being useful in
   123   applications, they may serve as a tutorial for customising the code
   124   generator setup (see below \secref{sec:adaptation_mechanisms}).
   125 
   126   \begin{description}
   127 
   128     \item[@{theory "Code_Numeral"}] provides additional numeric
   129        types @{typ integer} and @{typ natural} isomorphic to types
   130        @{typ int} and @{typ nat} respectively.  Type @{typ integer}
   131        is mapped to target-language built-in integers; @{typ natural}
   132        is implemented as abstract type over @{typ integer}.
   133        Useful for code setups which involve e.g.~indexing
   134        of target-language arrays.  Part of @{text "HOL-Main"}.
   135 
   136     \item[@{text "Code_Target_Int"}] implements type @{typ int}
   137        by @{typ integer} and thus by target-language built-in integers.
   138 
   139     \item[@{text "Code_Binary_Nat"}] \label{eff_nat} implements type
   140        @{typ nat} using a binary rather than a linear representation,
   141        which yields a considerable speedup for computations.
   142        Pattern matching with @{term "0\<Colon>nat"} / @{const "Suc"} is eliminated
   143        by a preprocessor.
   144 
   145     \item[@{text "Code_Target_Nat"}] implements type @{typ int}
   146        by @{typ integer} and thus by target-language built-in integers;
   147        contains @{text "Code_Binary_Nat"} as a prerequisite.
   148 
   149     \item[@{text "Code_Target_Numeral"}] is a convenience node
   150        containing both @{text "Code_Target_Nat"} and
   151        @{text "Code_Target_Int"}.
   152 
   153     \item[@{text "Efficient_Nat"}] \label{eff_nat} implements
   154        natural numbers by integers, which in general will result in
   155        higher efficiency; pattern matching with @{term "0\<Colon>nat"} /
   156        @{const "Suc"} is eliminated; includes @{text "Code_Integer"}
   157        and @{text "Code_Numeral"}.
   158 
   159     \item[@{text "Code_Char"}] represents @{text HOL} characters by
   160        character literals in target languages.
   161 
   162     \item[@{theory "String"}] provides an additional datatype @{typ
   163        String.literal} which is isomorphic to strings; @{typ
   164        String.literal}s are mapped to target-language strings.  Useful
   165        for code setups which involve e.g.~printing (error) messages.
   166        Part of @{text "HOL-Main"}.
   167 
   168   \end{description}
   169 
   170   \begin{warn}
   171     When importing any of those theories which are not part of
   172     @{text "HOL-Main"}, they should form the last
   173     items in an import list.  Since these theories adapt the code
   174     generator setup in a non-conservative fashion, strange effects may
   175     occur otherwise.
   176   \end{warn}
   177 *}
   178 
   179 
   180 subsection {* Parametrising serialisation \label{sec:adaptation_mechanisms} *}
   181 
   182 text {*
   183   Consider the following function and its corresponding SML code:
   184 *}
   185 
   186 primrec %quote in_interval :: "nat \<times> nat \<Rightarrow> nat \<Rightarrow> bool" where
   187   "in_interval (k, l) n \<longleftrightarrow> k \<le> n \<and> n \<le> l"
   188 (*<*)
   189 code_type %invisible bool
   190   (SML)
   191 code_const %invisible True and False and "op \<and>" and Not
   192   (SML and and and)
   193 (*>*)
   194 text %quotetypewriter {*
   195   @{code_stmts in_interval (SML)}
   196 *}
   197 
   198 text {*
   199   \noindent Though this is correct code, it is a little bit
   200   unsatisfactory: boolean values and operators are materialised as
   201   distinguished entities with have nothing to do with the SML-built-in
   202   notion of \qt{bool}.  This results in less readable code;
   203   additionally, eager evaluation may cause programs to loop or break
   204   which would perfectly terminate when the existing SML @{verbatim
   205   "bool"} would be used.  To map the HOL @{typ bool} on SML @{verbatim
   206   "bool"}, we may use \qn{custom serialisations}:
   207 *}
   208 
   209 code_type %quotett bool
   210   (SML "bool")
   211 code_const %quotett True and False and "op \<and>"
   212   (SML "true" and "false" and "_ andalso _")
   213 
   214 text {*
   215   \noindent The @{command_def code_type} command takes a type constructor
   216   as arguments together with a list of custom serialisations.  Each
   217   custom serialisation starts with a target language identifier
   218   followed by an expression, which during code serialisation is
   219   inserted whenever the type constructor would occur.  For constants,
   220   @{command_def code_const} implements the corresponding mechanism.  Each
   221   ``@{verbatim "_"}'' in a serialisation expression is treated as a
   222   placeholder for the type constructor's (the constant's) arguments.
   223 *}
   224 
   225 text %quotetypewriter {*
   226   @{code_stmts in_interval (SML)}
   227 *}
   228 
   229 text {*
   230   \noindent This still is not perfect: the parentheses around the
   231   \qt{andalso} expression are superfluous.  Though the serialiser by
   232   no means attempts to imitate the rich Isabelle syntax framework, it
   233   provides some common idioms, notably associative infixes with
   234   precedences which may be used here:
   235 *}
   236 
   237 code_const %quotett "op \<and>"
   238   (SML infixl 1 "andalso")
   239 
   240 text %quotetypewriter {*
   241   @{code_stmts in_interval (SML)}
   242 *}
   243 
   244 text {*
   245   \noindent The attentive reader may ask how we assert that no
   246   generated code will accidentally overwrite.  For this reason the
   247   serialiser has an internal table of identifiers which have to be
   248   avoided to be used for new declarations.  Initially, this table
   249   typically contains the keywords of the target language.  It can be
   250   extended manually, thus avoiding accidental overwrites, using the
   251   @{command_def "code_reserved"} command:
   252 *}
   253 
   254 code_reserved %quote "\<SMLdummy>" bool true false andalso
   255 
   256 text {*
   257   \noindent Next, we try to map HOL pairs to SML pairs, using the
   258   infix ``@{verbatim "*"}'' type constructor and parentheses:
   259 *}
   260 (*<*)
   261 code_type %invisible prod
   262   (SML)
   263 code_const %invisible Pair
   264   (SML)
   265 (*>*)
   266 code_type %quotett prod
   267   (SML infix 2 "*")
   268 code_const %quotett Pair
   269   (SML "!((_),/ (_))")
   270 
   271 text {*
   272   \noindent The initial bang ``@{verbatim "!"}'' tells the serialiser
   273   never to put parentheses around the whole expression (they are
   274   already present), while the parentheses around argument place
   275   holders tell not to put parentheses around the arguments.  The slash
   276   ``@{verbatim "/"}'' (followed by arbitrary white space) inserts a
   277   space which may be used as a break if necessary during pretty
   278   printing.
   279 
   280   These examples give a glimpse what mechanisms custom serialisations
   281   provide; however their usage requires careful thinking in order not
   282   to introduce inconsistencies -- or, in other words: custom
   283   serialisations are completely axiomatic.
   284 
   285   A further noteworthy detail is that any special character in a
   286   custom serialisation may be quoted using ``@{verbatim "'"}''; thus,
   287   in ``@{verbatim "fn '_ => _"}'' the first ``@{verbatim "_"}'' is a
   288   proper underscore while the second ``@{verbatim "_"}'' is a
   289   placeholder.
   290 *}
   291 
   292 
   293 subsection {* @{text Haskell} serialisation *}
   294 
   295 text {*
   296   For convenience, the default @{text HOL} setup for @{text Haskell}
   297   maps the @{class equal} class to its counterpart in @{text Haskell},
   298   giving custom serialisations for the class @{class equal} (by command
   299   @{command_def code_class}) and its operation @{const [source] HOL.equal}
   300 *}
   301 
   302 code_class %quotett equal
   303   (Haskell "Eq")
   304 
   305 code_const %quotett "HOL.equal"
   306   (Haskell infixl 4 "==")
   307 
   308 text {*
   309   \noindent A problem now occurs whenever a type which is an instance
   310   of @{class equal} in @{text HOL} is mapped on a @{text
   311   Haskell}-built-in type which is also an instance of @{text Haskell}
   312   @{text Eq}:
   313 *}
   314 
   315 typedecl %quote bar
   316 
   317 instantiation %quote bar :: equal
   318 begin
   319 
   320 definition %quote "HOL.equal (x\<Colon>bar) y \<longleftrightarrow> x = y"
   321 
   322 instance %quote by default (simp add: equal_bar_def)
   323 
   324 end %quote (*<*)
   325 
   326 (*>*) code_type %quotett bar
   327   (Haskell "Integer")
   328 
   329 text {*
   330   \noindent The code generator would produce an additional instance,
   331   which of course is rejected by the @{text Haskell} compiler.  To
   332   suppress this additional instance, use @{command_def "code_instance"}:
   333 *}
   334 
   335 code_instance %quotett bar :: equal
   336   (Haskell -)
   337 
   338 
   339 subsection {* Enhancing the target language context \label{sec:include} *}
   340 
   341 text {*
   342   In rare cases it is necessary to \emph{enrich} the context of a
   343   target language; this is accomplished using the @{command_def
   344   "code_include"} command:
   345 *}
   346 
   347 code_include %quotett Haskell "Errno"
   348 {*errno i = error ("Error number: " ++ show i)*}
   349 
   350 code_reserved %quotett Haskell Errno
   351 
   352 text {*
   353   \noindent Such named @{text include}s are then prepended to every
   354   generated code.  Inspect such code in order to find out how
   355   @{command "code_include"} behaves with respect to a particular
   356   target language.
   357 *}
   358 
   359 end
   360