doc-src/Codegen/Adaptation.thy
changeset 48985 5386df44a037
parent 48984 f51d4a302962
child 48986 037d32448e29
equal deleted inserted replaced
48984:f51d4a302962 48985:5386df44a037
     1 theory Adaptation
       
     2 imports Setup
       
     3 begin
       
     4 
       
     5 setup %invisible {* Code_Target.extend_target ("\<SML>", ("SML", K I))
       
     6   #> Code_Target.extend_target ("\<SMLdummy>", ("Haskell", K I)) *}
       
     7 
       
     8 section {* Adaptation to target languages \label{sec:adaptation} *}
       
     9 
       
    10 subsection {* Adapting code generation *}
       
    11 
       
    12 text {*
       
    13   The aspects of code generation introduced so far have two aspects
       
    14   in common:
       
    15 
       
    16   \begin{itemize}
       
    17 
       
    18     \item They act uniformly, without reference to a specific target
       
    19        language.
       
    20 
       
    21     \item They are \emph{safe} in the sense that as long as you trust
       
    22        the code generator meta theory and implementation, you cannot
       
    23        produce programs that yield results which are not derivable in
       
    24        the logic.
       
    25 
       
    26   \end{itemize}
       
    27 
       
    28   \noindent In this section we will introduce means to \emph{adapt}
       
    29   the serialiser to a specific target language, i.e.~to print program
       
    30   fragments in a way which accommodates \qt{already existing}
       
    31   ingredients of a target language environment, for three reasons:
       
    32 
       
    33   \begin{itemize}
       
    34     \item improving readability and aesthetics of generated code
       
    35     \item gaining efficiency
       
    36     \item interface with language parts which have no direct counterpart
       
    37       in @{text "HOL"} (say, imperative data structures)
       
    38   \end{itemize}
       
    39 
       
    40   \noindent Generally, you should avoid using those features yourself
       
    41   \emph{at any cost}:
       
    42 
       
    43   \begin{itemize}
       
    44 
       
    45     \item The safe configuration methods act uniformly on every target
       
    46       language, whereas for adaptation you have to treat each target
       
    47       language separately.
       
    48 
       
    49     \item Application is extremely tedious since there is no
       
    50       abstraction which would allow for a static check, making it easy
       
    51       to produce garbage.
       
    52 
       
    53     \item Subtle errors can be introduced unconsciously.
       
    54 
       
    55   \end{itemize}
       
    56 
       
    57   \noindent However, even if you ought refrain from setting up
       
    58   adaptation yourself, already the @{text "HOL"} comes with some
       
    59   reasonable default adaptations (say, using target language list
       
    60   syntax).  There also some common adaptation cases which you can
       
    61   setup by importing particular library theories.  In order to
       
    62   understand these, we provide some clues here; these however are not
       
    63   supposed to replace a careful study of the sources.
       
    64 *}
       
    65 
       
    66 
       
    67 subsection {* The adaptation principle *}
       
    68 
       
    69 text {*
       
    70   Figure \ref{fig:adaptation} illustrates what \qt{adaptation} is
       
    71   conceptually supposed to be:
       
    72 
       
    73   \begin{figure}[here]
       
    74     \includegraphics{adapt}
       
    75     \caption{The adaptation principle}
       
    76     \label{fig:adaptation}
       
    77   \end{figure}
       
    78 
       
    79   \noindent In the tame view, code generation acts as broker between
       
    80   @{text logic}, @{text "intermediate language"} and @{text "target
       
    81   language"} by means of @{text translation} and @{text
       
    82   serialisation}; for the latter, the serialiser has to observe the
       
    83   structure of the @{text language} itself plus some @{text reserved}
       
    84   keywords which have to be avoided for generated code.  However, if
       
    85   you consider @{text adaptation} mechanisms, the code generated by
       
    86   the serializer is just the tip of the iceberg:
       
    87 
       
    88   \begin{itemize}
       
    89 
       
    90     \item @{text serialisation} can be \emph{parametrised} such that
       
    91       logical entities are mapped to target-specific ones
       
    92       (e.g. target-specific list syntax, see also
       
    93       \secref{sec:adaptation_mechanisms})
       
    94 
       
    95     \item Such parametrisations can involve references to a
       
    96       target-specific standard @{text library} (e.g. using the @{text
       
    97       Haskell} @{verbatim Maybe} type instead of the @{text HOL}
       
    98       @{type "option"} type); if such are used, the corresponding
       
    99       identifiers (in our example, @{verbatim Maybe}, @{verbatim
       
   100       Nothing} and @{verbatim Just}) also have to be considered @{text
       
   101       reserved}.
       
   102 
       
   103     \item Even more, the user can enrich the library of the
       
   104       target-language by providing code snippets (\qt{@{text
       
   105       "includes"}}) which are prepended to any generated code (see
       
   106       \secref{sec:include}); this typically also involves further
       
   107       @{text reserved} identifiers.
       
   108 
       
   109   \end{itemize}
       
   110 
       
   111   \noindent As figure \ref{fig:adaptation} illustrates, all these
       
   112   adaptation mechanisms have to act consistently; it is at the
       
   113   discretion of the user to take care for this.
       
   114 *}
       
   115 
       
   116 subsection {* Common adaptation patterns *}
       
   117 
       
   118 text {*
       
   119   The @{theory HOL} @{theory Main} theory already provides a code
       
   120   generator setup which should be suitable for most applications.
       
   121   Common extensions and modifications are available by certain
       
   122   theories of the @{text HOL} library; beside being useful in
       
   123   applications, they may serve as a tutorial for customising the code
       
   124   generator setup (see below \secref{sec:adaptation_mechanisms}).
       
   125 
       
   126   \begin{description}
       
   127 
       
   128     \item[@{text "Code_Integer"}] represents @{text HOL} integers by
       
   129        big integer literals in target languages.
       
   130 
       
   131     \item[@{text "Code_Char"}] represents @{text HOL} characters by
       
   132        character literals in target languages.
       
   133 
       
   134     \item[@{text "Code_Char_chr"}] like @{text "Code_Char"}, but
       
   135        also offers treatment of character codes; includes @{text
       
   136        "Code_Char"}.
       
   137 
       
   138     \item[@{text "Efficient_Nat"}] \label{eff_nat} implements
       
   139        natural numbers by integers, which in general will result in
       
   140        higher efficiency; pattern matching with @{term "0\<Colon>nat"} /
       
   141        @{const "Suc"} is eliminated; includes @{text "Code_Integer"}
       
   142        and @{text "Code_Numeral"}.
       
   143 
       
   144     \item[@{theory "Code_Numeral"}] provides an additional datatype
       
   145        @{typ index} which is mapped to target-language built-in
       
   146        integers.  Useful for code setups which involve e.g.~indexing
       
   147        of target-language arrays.  Part of @{text "HOL-Main"}.
       
   148 
       
   149     \item[@{theory "String"}] provides an additional datatype @{typ
       
   150        String.literal} which is isomorphic to strings; @{typ
       
   151        String.literal}s are mapped to target-language strings.  Useful
       
   152        for code setups which involve e.g.~printing (error) messages.
       
   153        Part of @{text "HOL-Main"}.
       
   154 
       
   155   \end{description}
       
   156 
       
   157   \begin{warn}
       
   158     When importing any of those theories which are not part of
       
   159     @{text "HOL-Main"}, they should form the last
       
   160     items in an import list.  Since these theories adapt the code
       
   161     generator setup in a non-conservative fashion, strange effects may
       
   162     occur otherwise.
       
   163   \end{warn}
       
   164 *}
       
   165 
       
   166 
       
   167 subsection {* Parametrising serialisation \label{sec:adaptation_mechanisms} *}
       
   168 
       
   169 text {*
       
   170   Consider the following function and its corresponding SML code:
       
   171 *}
       
   172 
       
   173 primrec %quote in_interval :: "nat \<times> nat \<Rightarrow> nat \<Rightarrow> bool" where
       
   174   "in_interval (k, l) n \<longleftrightarrow> k \<le> n \<and> n \<le> l"
       
   175 (*<*)
       
   176 code_type %invisible bool
       
   177   (SML)
       
   178 code_const %invisible True and False and "op \<and>" and Not
       
   179   (SML and and and)
       
   180 (*>*)
       
   181 text %quotetypewriter {*
       
   182   @{code_stmts in_interval (SML)}
       
   183 *}
       
   184 
       
   185 text {*
       
   186   \noindent Though this is correct code, it is a little bit
       
   187   unsatisfactory: boolean values and operators are materialised as
       
   188   distinguished entities with have nothing to do with the SML-built-in
       
   189   notion of \qt{bool}.  This results in less readable code;
       
   190   additionally, eager evaluation may cause programs to loop or break
       
   191   which would perfectly terminate when the existing SML @{verbatim
       
   192   "bool"} would be used.  To map the HOL @{typ bool} on SML @{verbatim
       
   193   "bool"}, we may use \qn{custom serialisations}:
       
   194 *}
       
   195 
       
   196 code_type %quotett bool
       
   197   (SML "bool")
       
   198 code_const %quotett True and False and "op \<and>"
       
   199   (SML "true" and "false" and "_ andalso _")
       
   200 
       
   201 text {*
       
   202   \noindent The @{command_def code_type} command takes a type constructor
       
   203   as arguments together with a list of custom serialisations.  Each
       
   204   custom serialisation starts with a target language identifier
       
   205   followed by an expression, which during code serialisation is
       
   206   inserted whenever the type constructor would occur.  For constants,
       
   207   @{command_def code_const} implements the corresponding mechanism.  Each
       
   208   ``@{verbatim "_"}'' in a serialisation expression is treated as a
       
   209   placeholder for the type constructor's (the constant's) arguments.
       
   210 *}
       
   211 
       
   212 text %quotetypewriter {*
       
   213   @{code_stmts in_interval (SML)}
       
   214 *}
       
   215 
       
   216 text {*
       
   217   \noindent This still is not perfect: the parentheses around the
       
   218   \qt{andalso} expression are superfluous.  Though the serialiser by
       
   219   no means attempts to imitate the rich Isabelle syntax framework, it
       
   220   provides some common idioms, notably associative infixes with
       
   221   precedences which may be used here:
       
   222 *}
       
   223 
       
   224 code_const %quotett "op \<and>"
       
   225   (SML infixl 1 "andalso")
       
   226 
       
   227 text %quotetypewriter {*
       
   228   @{code_stmts in_interval (SML)}
       
   229 *}
       
   230 
       
   231 text {*
       
   232   \noindent The attentive reader may ask how we assert that no
       
   233   generated code will accidentally overwrite.  For this reason the
       
   234   serialiser has an internal table of identifiers which have to be
       
   235   avoided to be used for new declarations.  Initially, this table
       
   236   typically contains the keywords of the target language.  It can be
       
   237   extended manually, thus avoiding accidental overwrites, using the
       
   238   @{command_def "code_reserved"} command:
       
   239 *}
       
   240 
       
   241 code_reserved %quote "\<SMLdummy>" bool true false andalso
       
   242 
       
   243 text {*
       
   244   \noindent Next, we try to map HOL pairs to SML pairs, using the
       
   245   infix ``@{verbatim "*"}'' type constructor and parentheses:
       
   246 *}
       
   247 (*<*)
       
   248 code_type %invisible prod
       
   249   (SML)
       
   250 code_const %invisible Pair
       
   251   (SML)
       
   252 (*>*)
       
   253 code_type %quotett prod
       
   254   (SML infix 2 "*")
       
   255 code_const %quotett Pair
       
   256   (SML "!((_),/ (_))")
       
   257 
       
   258 text {*
       
   259   \noindent The initial bang ``@{verbatim "!"}'' tells the serialiser
       
   260   never to put parentheses around the whole expression (they are
       
   261   already present), while the parentheses around argument place
       
   262   holders tell not to put parentheses around the arguments.  The slash
       
   263   ``@{verbatim "/"}'' (followed by arbitrary white space) inserts a
       
   264   space which may be used as a break if necessary during pretty
       
   265   printing.
       
   266 
       
   267   These examples give a glimpse what mechanisms custom serialisations
       
   268   provide; however their usage requires careful thinking in order not
       
   269   to introduce inconsistencies -- or, in other words: custom
       
   270   serialisations are completely axiomatic.
       
   271 
       
   272   A further noteworthy detail is that any special character in a
       
   273   custom serialisation may be quoted using ``@{verbatim "'"}''; thus,
       
   274   in ``@{verbatim "fn '_ => _"}'' the first ``@{verbatim "_"}'' is a
       
   275   proper underscore while the second ``@{verbatim "_"}'' is a
       
   276   placeholder.
       
   277 *}
       
   278 
       
   279 
       
   280 subsection {* @{text Haskell} serialisation *}
       
   281 
       
   282 text {*
       
   283   For convenience, the default @{text HOL} setup for @{text Haskell}
       
   284   maps the @{class equal} class to its counterpart in @{text Haskell},
       
   285   giving custom serialisations for the class @{class equal} (by command
       
   286   @{command_def code_class}) and its operation @{const [source] HOL.equal}
       
   287 *}
       
   288 
       
   289 code_class %quotett equal
       
   290   (Haskell "Eq")
       
   291 
       
   292 code_const %quotett "HOL.equal"
       
   293   (Haskell infixl 4 "==")
       
   294 
       
   295 text {*
       
   296   \noindent A problem now occurs whenever a type which is an instance
       
   297   of @{class equal} in @{text HOL} is mapped on a @{text
       
   298   Haskell}-built-in type which is also an instance of @{text Haskell}
       
   299   @{text Eq}:
       
   300 *}
       
   301 
       
   302 typedecl %quote bar
       
   303 
       
   304 instantiation %quote bar :: equal
       
   305 begin
       
   306 
       
   307 definition %quote "HOL.equal (x\<Colon>bar) y \<longleftrightarrow> x = y"
       
   308 
       
   309 instance %quote by default (simp add: equal_bar_def)
       
   310 
       
   311 end %quote (*<*)
       
   312 
       
   313 (*>*) code_type %quotett bar
       
   314   (Haskell "Integer")
       
   315 
       
   316 text {*
       
   317   \noindent The code generator would produce an additional instance,
       
   318   which of course is rejected by the @{text Haskell} compiler.  To
       
   319   suppress this additional instance, use @{command_def "code_instance"}:
       
   320 *}
       
   321 
       
   322 code_instance %quotett bar :: equal
       
   323   (Haskell -)
       
   324 
       
   325 
       
   326 subsection {* Enhancing the target language context \label{sec:include} *}
       
   327 
       
   328 text {*
       
   329   In rare cases it is necessary to \emph{enrich} the context of a
       
   330   target language; this is accomplished using the @{command_def
       
   331   "code_include"} command:
       
   332 *}
       
   333 
       
   334 code_include %quotett Haskell "Errno"
       
   335 {*errno i = error ("Error number: " ++ show i)*}
       
   336 
       
   337 code_reserved %quotett Haskell Errno
       
   338 
       
   339 text {*
       
   340   \noindent Such named @{text include}s are then prepended to every
       
   341   generated code.  Inspect such code in order to find out how
       
   342   @{command "code_include"} behaves with respect to a particular
       
   343   target language.
       
   344 *}
       
   345 
       
   346 end
       
   347