src/Doc/Codegen/Adaptation.thy
author haftmann
Sun Feb 17 20:45:49 2013 +0100 (2013-02-17)
changeset 51172 16eb76ca1e4a
parent 51171 e8b2d90da499
child 52378 08dbf9ff2140
permissions -rw-r--r--
note on parallel computation
     1 theory Adaptation
     2 imports Setup
     3 begin
     4 
     5 setup %invisible {* Code_Target.extend_target ("\<SML>", ("SML", K I))
     6   #> Code_Target.extend_target ("\<SMLdummy>", ("Haskell", K I)) *}
     7 
     8 section {* Adaptation to target languages \label{sec:adaptation} *}
     9 
    10 subsection {* Adapting code generation *}
    11 
    12 text {*
    13   The aspects of code generation introduced so far have two aspects
    14   in common:
    15 
    16   \begin{itemize}
    17 
    18     \item They act uniformly, without reference to a specific target
    19        language.
    20 
    21     \item They are \emph{safe} in the sense that as long as you trust
    22        the code generator meta theory and implementation, you cannot
    23        produce programs that yield results which are not derivable in
    24        the logic.
    25 
    26   \end{itemize}
    27 
    28   \noindent In this section we will introduce means to \emph{adapt}
    29   the serialiser to a specific target language, i.e.~to print program
    30   fragments in a way which accommodates \qt{already existing}
    31   ingredients of a target language environment, for three reasons:
    32 
    33   \begin{itemize}
    34     \item improving readability and aesthetics of generated code
    35     \item gaining efficiency
    36     \item interface with language parts which have no direct counterpart
    37       in @{text "HOL"} (say, imperative data structures)
    38   \end{itemize}
    39 
    40   \noindent Generally, you should avoid using those features yourself
    41   \emph{at any cost}:
    42 
    43   \begin{itemize}
    44 
    45     \item The safe configuration methods act uniformly on every target
    46       language, whereas for adaptation you have to treat each target
    47       language separately.
    48 
    49     \item Application is extremely tedious since there is no
    50       abstraction which would allow for a static check, making it easy
    51       to produce garbage.
    52 
    53     \item Subtle errors can be introduced unconsciously.
    54 
    55   \end{itemize}
    56 
    57   \noindent However, even if you ought refrain from setting up
    58   adaptation yourself, already @{text "HOL"} comes with some
    59   reasonable default adaptations (say, using target language list
    60   syntax).  There also some common adaptation cases which you can
    61   setup by importing particular library theories.  In order to
    62   understand these, we provide some clues here; these however are not
    63   supposed to replace a careful study of the sources.
    64 *}
    65 
    66 
    67 subsection {* The adaptation principle *}
    68 
    69 text {*
    70   Figure \ref{fig:adaptation} illustrates what \qt{adaptation} is
    71   conceptually supposed to be:
    72 
    73   \begin{figure}[here]
    74     \includegraphics{adapt}
    75     \caption{The adaptation principle}
    76     \label{fig:adaptation}
    77   \end{figure}
    78 
    79   \noindent In the tame view, code generation acts as broker between
    80   @{text logic}, @{text "intermediate language"} and @{text "target
    81   language"} by means of @{text translation} and @{text
    82   serialisation}; for the latter, the serialiser has to observe the
    83   structure of the @{text language} itself plus some @{text reserved}
    84   keywords which have to be avoided for generated code.  However, if
    85   you consider @{text adaptation} mechanisms, the code generated by
    86   the serializer is just the tip of the iceberg:
    87 
    88   \begin{itemize}
    89 
    90     \item @{text serialisation} can be \emph{parametrised} such that
    91       logical entities are mapped to target-specific ones
    92       (e.g. target-specific list syntax, see also
    93       \secref{sec:adaptation_mechanisms})
    94 
    95     \item Such parametrisations can involve references to a
    96       target-specific standard @{text library} (e.g. using the @{text
    97       Haskell} @{verbatim Maybe} type instead of the @{text HOL}
    98       @{type "option"} type); if such are used, the corresponding
    99       identifiers (in our example, @{verbatim Maybe}, @{verbatim
   100       Nothing} and @{verbatim Just}) also have to be considered @{text
   101       reserved}.
   102 
   103     \item Even more, the user can enrich the library of the
   104       target-language by providing code snippets (\qt{@{text
   105       "includes"}}) which are prepended to any generated code (see
   106       \secref{sec:include}); this typically also involves further
   107       @{text reserved} identifiers.
   108 
   109   \end{itemize}
   110 
   111   \noindent As figure \ref{fig:adaptation} illustrates, all these
   112   adaptation mechanisms have to act consistently; it is at the
   113   discretion of the user to take care for this.
   114 *}
   115 
   116 subsection {* Common adaptation patterns *}
   117 
   118 text {*
   119   The @{theory HOL} @{theory Main} theory already provides a code
   120   generator setup which should be suitable for most applications.
   121   Common extensions and modifications are available by certain
   122   theories in @{text "HOL/Library"}; beside being useful in
   123   applications, they may serve as a tutorial for customising the code
   124   generator setup (see below \secref{sec:adaptation_mechanisms}).
   125 
   126   \begin{description}
   127 
   128     \item[@{theory "Code_Numeral"}] provides additional numeric
   129        types @{typ integer} and @{typ natural} isomorphic to types
   130        @{typ int} and @{typ nat} respectively.  Type @{typ integer}
   131        is mapped to target-language built-in integers; @{typ natural}
   132        is implemented as abstract type over @{typ integer}.
   133        Useful for code setups which involve e.g.~indexing
   134        of target-language arrays.  Part of @{text "HOL-Main"}.
   135 
   136     \item[@{text "Code_Target_Int"}] implements type @{typ int}
   137        by @{typ integer} and thus by target-language built-in integers.
   138 
   139     \item[@{text "Code_Binary_Nat"}] implements type
   140        @{typ nat} using a binary rather than a linear representation,
   141        which yields a considerable speedup for computations.
   142        Pattern matching with @{term "0\<Colon>nat"} / @{const "Suc"} is eliminated
   143        by a preprocessor.\label{abstract_nat}
   144 
   145     \item[@{text "Code_Target_Nat"}] implements type @{typ nat}
   146        by @{typ integer} and thus by target-language built-in integers.
   147        Pattern matching with @{term "0\<Colon>nat"} / @{const "Suc"} is eliminated
   148        by a preprocessor.
   149 
   150     \item[@{text "Code_Target_Numeral"}] is a convenience theory
   151        containing both @{text "Code_Target_Nat"} and
   152        @{text "Code_Target_Int"}.
   153 
   154     \item[@{text "Code_Char"}] represents @{text HOL} characters by
   155        character literals in target languages.
   156 
   157     \item[@{theory "String"}] provides an additional datatype @{typ
   158        String.literal} which is isomorphic to strings; @{typ
   159        String.literal}s are mapped to target-language strings.  Useful
   160        for code setups which involve e.g.~printing (error) messages.
   161        Part of @{text "HOL-Main"}.
   162 
   163     \item[@{theory "IArray"}] provides a type @{typ "'a iarray"}
   164        isomorphic to lists but implemented by (effectively immutable)
   165        arrays \emph{in SML only}.
   166 
   167   \end{description}
   168 *}
   169 
   170 
   171 subsection {* Parametrising serialisation \label{sec:adaptation_mechanisms} *}
   172 
   173 text {*
   174   Consider the following function and its corresponding SML code:
   175 *}
   176 
   177 primrec %quote in_interval :: "nat \<times> nat \<Rightarrow> nat \<Rightarrow> bool" where
   178   "in_interval (k, l) n \<longleftrightarrow> k \<le> n \<and> n \<le> l"
   179 (*<*)
   180 code_type %invisible bool
   181   (SML)
   182 code_const %invisible True and False and "op \<and>" and Not
   183   (SML and and and)
   184 (*>*)
   185 text %quotetypewriter {*
   186   @{code_stmts in_interval (SML)}
   187 *}
   188 
   189 text {*
   190   \noindent Though this is correct code, it is a little bit
   191   unsatisfactory: boolean values and operators are materialised as
   192   distinguished entities with have nothing to do with the SML-built-in
   193   notion of \qt{bool}.  This results in less readable code;
   194   additionally, eager evaluation may cause programs to loop or break
   195   which would perfectly terminate when the existing SML @{verbatim
   196   "bool"} would be used.  To map the HOL @{typ bool} on SML @{verbatim
   197   "bool"}, we may use \qn{custom serialisations}:
   198 *}
   199 
   200 code_type %quotett bool
   201   (SML "bool")
   202 code_const %quotett True and False and "op \<and>"
   203   (SML "true" and "false" and "_ andalso _")
   204 
   205 text {*
   206   \noindent The @{command_def code_type} command takes a type constructor
   207   as arguments together with a list of custom serialisations.  Each
   208   custom serialisation starts with a target language identifier
   209   followed by an expression, which during code serialisation is
   210   inserted whenever the type constructor would occur.  For constants,
   211   @{command_def code_const} implements the corresponding mechanism.  Each
   212   ``@{verbatim "_"}'' in a serialisation expression is treated as a
   213   placeholder for the type constructor's (the constant's) arguments.
   214 *}
   215 
   216 text %quotetypewriter {*
   217   @{code_stmts in_interval (SML)}
   218 *}
   219 
   220 text {*
   221   \noindent This still is not perfect: the parentheses around the
   222   \qt{andalso} expression are superfluous.  Though the serialiser by
   223   no means attempts to imitate the rich Isabelle syntax framework, it
   224   provides some common idioms, notably associative infixes with
   225   precedences which may be used here:
   226 *}
   227 
   228 code_const %quotett "op \<and>"
   229   (SML infixl 1 "andalso")
   230 
   231 text %quotetypewriter {*
   232   @{code_stmts in_interval (SML)}
   233 *}
   234 
   235 text {*
   236   \noindent The attentive reader may ask how we assert that no
   237   generated code will accidentally overwrite.  For this reason the
   238   serialiser has an internal table of identifiers which have to be
   239   avoided to be used for new declarations.  Initially, this table
   240   typically contains the keywords of the target language.  It can be
   241   extended manually, thus avoiding accidental overwrites, using the
   242   @{command_def "code_reserved"} command:
   243 *}
   244 
   245 code_reserved %quote "\<SMLdummy>" bool true false andalso
   246 
   247 text {*
   248   \noindent Next, we try to map HOL pairs to SML pairs, using the
   249   infix ``@{verbatim "*"}'' type constructor and parentheses:
   250 *}
   251 (*<*)
   252 code_type %invisible prod
   253   (SML)
   254 code_const %invisible Pair
   255   (SML)
   256 (*>*)
   257 code_type %quotett prod
   258   (SML infix 2 "*")
   259 code_const %quotett Pair
   260   (SML "!((_),/ (_))")
   261 
   262 text {*
   263   \noindent The initial bang ``@{verbatim "!"}'' tells the serialiser
   264   never to put parentheses around the whole expression (they are
   265   already present), while the parentheses around argument place
   266   holders tell not to put parentheses around the arguments.  The slash
   267   ``@{verbatim "/"}'' (followed by arbitrary white space) inserts a
   268   space which may be used as a break if necessary during pretty
   269   printing.
   270 
   271   These examples give a glimpse what mechanisms custom serialisations
   272   provide; however their usage requires careful thinking in order not
   273   to introduce inconsistencies -- or, in other words: custom
   274   serialisations are completely axiomatic.
   275 
   276   A further noteworthy detail is that any special character in a
   277   custom serialisation may be quoted using ``@{verbatim "'"}''; thus,
   278   in ``@{verbatim "fn '_ => _"}'' the first ``@{verbatim "_"}'' is a
   279   proper underscore while the second ``@{verbatim "_"}'' is a
   280   placeholder.
   281 *}
   282 
   283 
   284 subsection {* @{text Haskell} serialisation *}
   285 
   286 text {*
   287   For convenience, the default @{text HOL} setup for @{text Haskell}
   288   maps the @{class equal} class to its counterpart in @{text Haskell},
   289   giving custom serialisations for the class @{class equal} (by command
   290   @{command_def code_class}) and its operation @{const [source] HOL.equal}
   291 *}
   292 
   293 code_class %quotett equal
   294   (Haskell "Eq")
   295 
   296 code_const %quotett "HOL.equal"
   297   (Haskell infixl 4 "==")
   298 
   299 text {*
   300   \noindent A problem now occurs whenever a type which is an instance
   301   of @{class equal} in @{text HOL} is mapped on a @{text
   302   Haskell}-built-in type which is also an instance of @{text Haskell}
   303   @{text Eq}:
   304 *}
   305 
   306 typedecl %quote bar
   307 
   308 instantiation %quote bar :: equal
   309 begin
   310 
   311 definition %quote "HOL.equal (x\<Colon>bar) y \<longleftrightarrow> x = y"
   312 
   313 instance %quote by default (simp add: equal_bar_def)
   314 
   315 end %quote (*<*)
   316 
   317 (*>*) code_type %quotett bar
   318   (Haskell "Integer")
   319 
   320 text {*
   321   \noindent The code generator would produce an additional instance,
   322   which of course is rejected by the @{text Haskell} compiler.  To
   323   suppress this additional instance, use @{command_def "code_instance"}:
   324 *}
   325 
   326 code_instance %quotett bar :: equal
   327   (Haskell -)
   328 
   329 
   330 subsection {* Enhancing the target language context \label{sec:include} *}
   331 
   332 text {*
   333   In rare cases it is necessary to \emph{enrich} the context of a
   334   target language; this is accomplished using the @{command_def
   335   "code_include"} command:
   336 *}
   337 
   338 code_include %quotett Haskell "Errno"
   339 {*errno i = error ("Error number: " ++ show i)*}
   340 
   341 code_reserved %quotett Haskell Errno
   342 
   343 text {*
   344   \noindent Such named @{text include}s are then prepended to every
   345   generated code.  Inspect such code in order to find out how
   346   @{command "code_include"} behaves with respect to a particular
   347   target language.
   348 *}
   349 
   350 end
   351