doc-src/IsarAdvanced/Codegen/Thy/Codegen.thy
changeset 28419 f65e8b318581
parent 28143 e5c6c4aac52c
equal deleted inserted replaced
28418:4ffb62675ade 28419:f65e8b318581
     1 
       
     2 (* $Id$ *)
       
     3 
       
     4 (*<*)
       
     5 theory Codegen
       
     6 imports Main
       
     7 uses "../../../antiquote_setup.ML"
       
     8 begin
       
     9 
       
    10 ML {* Code_Target.code_width := 74 *}
       
    11 (*>*)
       
    12 
       
    13 chapter {* Code generation from Isabelle theories *}
       
    14 
       
    15 section {* Introduction *}
       
    16 
       
    17 subsection {* Motivation *}
       
    18 
       
    19 text {*
       
    20   Executing formal specifications as programs is a well-established
       
    21   topic in the theorem proving community.  With increasing
       
    22   application of theorem proving systems in the area of
       
    23   software development and verification, its relevance manifests
       
    24   for running test cases and rapid prototyping.  In logical
       
    25   calculi like constructive type theory,
       
    26   a notion of executability is implicit due to the nature
       
    27   of the calculus.  In contrast, specifications in Isabelle
       
    28   can be highly non-executable.  In order to bridge
       
    29   the gap between logic and executable specifications,
       
    30   an explicit non-trivial transformation has to be applied:
       
    31   code generation.
       
    32 
       
    33   This tutorial introduces a generic code generator for the
       
    34   Isabelle system \cite{isa-tutorial}.
       
    35   Generic in the sense that the
       
    36   \qn{target language} for which code shall ultimately be
       
    37   generated is not fixed but may be an arbitrary state-of-the-art
       
    38   functional programming language (currently, the implementation
       
    39   supports SML \cite{SML}, OCaml \cite{OCaml} and Haskell
       
    40   \cite{haskell-revised-report}).
       
    41   We aim to provide a
       
    42   versatile environment
       
    43   suitable for software development and verification,
       
    44   structuring the process
       
    45   of code generation into a small set of orthogonal principles
       
    46   while achieving a big coverage of application areas
       
    47   with maximum flexibility.
       
    48 
       
    49   Conceptually the code generator framework is part
       
    50   of Isabelle's @{text Pure} meta logic; the object logic
       
    51   @{text HOL} which is an extension of @{text Pure}
       
    52   already comes with a reasonable framework setup and thus provides
       
    53   a good working horse for raising code-generation-driven
       
    54   applications.  So, we assume some familiarity and experience
       
    55   with the ingredients of the @{text HOL} \emph{Main} theory
       
    56   (see also \cite{isa-tutorial}).
       
    57 *}
       
    58 
       
    59 
       
    60 subsection {* Overview *}
       
    61 
       
    62 text {*
       
    63   The code generator aims to be usable with no further ado
       
    64   in most cases while allowing for detailed customization.
       
    65   This manifests in the structure of this tutorial:
       
    66   we start with a generic example \secref{sec:example}
       
    67   and introduce code generation concepts \secref{sec:concept}.
       
    68   Section
       
    69   \secref{sec:basics} explains how to use the framework naively,
       
    70   presuming a reasonable default setup.  Then, section
       
    71   \secref{sec:advanced} deals with advanced topics,
       
    72   introducing further aspects of the code generator framework
       
    73   in a motivation-driven manner.  Last, section \secref{sec:ml}
       
    74   introduces the framework's internal programming interfaces.
       
    75 
       
    76   \begin{warn}
       
    77     Ultimately, the code generator which this tutorial deals with
       
    78     is supposed to replace the already established code generator
       
    79     by Stefan Berghofer \cite{Berghofer-Nipkow:2002}.
       
    80     So, for the moment, there are two distinct code generators
       
    81     in Isabelle.
       
    82     Also note that while the framework itself is
       
    83     object-logic independent, only @{text HOL} provides a reasonable
       
    84     framework setup.    
       
    85   \end{warn}
       
    86 *}
       
    87 
       
    88 
       
    89 section {* An example: a simple theory of search trees \label{sec:example} *}
       
    90 
       
    91 text {*
       
    92   When writing executable specifications using @{text HOL},
       
    93   it is convenient to use
       
    94   three existing packages: the datatype package for defining
       
    95   datatypes, the function package for (recursive) functions,
       
    96   and the class package for overloaded definitions.
       
    97 
       
    98   We develope a small theory of search trees; trees are represented
       
    99   as a datatype with key type @{typ "'a"} and value type @{typ "'b"}:
       
   100 *}
       
   101 
       
   102 datatype ('a, 'b) searchtree = Leaf "'a\<Colon>linorder" 'b
       
   103   | Branch "('a, 'b) searchtree" "'a" "('a, 'b) searchtree"
       
   104 
       
   105 text {*
       
   106   \noindent Note that we have constrained the type of keys
       
   107   to the class of total orders, @{text linorder}.
       
   108 
       
   109   We define @{text find} and @{text update} functions:
       
   110 *}
       
   111 
       
   112 primrec
       
   113   find :: "('a\<Colon>linorder, 'b) searchtree \<Rightarrow> 'a \<Rightarrow> 'b option" where
       
   114   "find (Leaf key val) it = (if it = key then Some val else None)"
       
   115   | "find (Branch t1 key t2) it = (if it \<le> key then find t1 it else find t2 it)"
       
   116 
       
   117 fun
       
   118   update :: "'a\<Colon>linorder \<times> 'b \<Rightarrow> ('a, 'b) searchtree \<Rightarrow> ('a, 'b) searchtree" where
       
   119   "update (it, entry) (Leaf key val) = (
       
   120     if it = key then Leaf key entry
       
   121       else if it \<le> key
       
   122       then Branch (Leaf it entry) it (Leaf key val)
       
   123       else Branch (Leaf key val) it (Leaf it entry)
       
   124    )"
       
   125   | "update (it, entry) (Branch t1 key t2) = (
       
   126     if it \<le> key
       
   127       then (Branch (update (it, entry) t1) key t2)
       
   128       else (Branch t1 key (update (it, entry) t2))
       
   129    )"
       
   130 
       
   131 text {*
       
   132   \noindent For testing purpose, we define a small example
       
   133   using natural numbers @{typ nat} (which are a @{text linorder})
       
   134   as keys and list of nats as values:
       
   135 *}
       
   136 
       
   137 definition
       
   138   example :: "(nat, nat list) searchtree"
       
   139 where
       
   140   "example = update (Suc (Suc (Suc (Suc 0))), [Suc (Suc 0), Suc (Suc 0)]) (update (Suc (Suc (Suc 0)), [Suc (Suc (Suc 0))])
       
   141     (update (Suc (Suc 0), [Suc (Suc 0)]) (Leaf (Suc 0) [])))"
       
   142 
       
   143 text {*
       
   144   \noindent Then we generate code
       
   145 *}
       
   146 
       
   147 export_code example (*<*)in SML (*>*)in SML file "examples/tree.ML"
       
   148 
       
   149 text {*
       
   150   \noindent which looks like:
       
   151   \lstsml{Thy/examples/tree.ML}
       
   152 *}
       
   153 
       
   154 
       
   155 section {* Code generation concepts and process \label{sec:concept} *}
       
   156 
       
   157 text {*
       
   158   \begin{figure}[h]
       
   159   \centering
       
   160   \includegraphics[width=0.7\textwidth]{codegen_process}
       
   161   \caption{code generator -- processing overview}
       
   162   \label{fig:process}
       
   163   \end{figure}
       
   164 
       
   165   The code generator employs a notion of executability
       
   166   for three foundational executable ingredients known
       
   167   from functional programming:
       
   168   \emph{defining equations}, \emph{datatypes}, and
       
   169   \emph{type classes}. A defining equation as a first approximation
       
   170   is a theorem of the form @{text "f t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n \<equiv> t"}
       
   171   (an equation headed by a constant @{text f} with arguments
       
   172   @{text "t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n"} and right hand side @{text t}).
       
   173   Code generation aims to turn defining equations
       
   174   into a functional program by running through
       
   175   a process (see figure \ref{fig:process}):
       
   176 
       
   177   \begin{itemize}
       
   178 
       
   179     \item Out of the vast collection of theorems proven in a
       
   180       \qn{theory}, a reasonable subset modeling
       
   181       defining equations is \qn{selected}.
       
   182 
       
   183     \item On those selected theorems, certain
       
   184       transformations are carried out
       
   185       (\qn{preprocessing}).  Their purpose is to turn theorems
       
   186       representing non- or badly executable
       
   187       specifications into equivalent but executable counterparts.
       
   188       The result is a structured collection of \qn{code theorems}.
       
   189 
       
   190     \item These \qn{code theorems} then are \qn{translated}
       
   191       into an Haskell-like intermediate
       
   192       language.
       
   193 
       
   194     \item Finally, out of the intermediate language the final
       
   195       code in the desired \qn{target language} is \qn{serialized}.
       
   196 
       
   197   \end{itemize}
       
   198 
       
   199   From these steps, only the two last are carried out
       
   200   outside the logic; by keeping this layer as
       
   201   thin as possible, the amount of code to trust is
       
   202   kept to a minimum.
       
   203 *}
       
   204 
       
   205 
       
   206 
       
   207 section {* Basics \label{sec:basics} *}
       
   208 
       
   209 subsection {* Invoking the code generator *}
       
   210 
       
   211 text {*
       
   212   Thanks to a reasonable setup of the @{text HOL} theories, in
       
   213   most cases code generation proceeds without further ado:
       
   214 *}
       
   215 
       
   216 primrec
       
   217   fac :: "nat \<Rightarrow> nat" where
       
   218     "fac 0 = 1"
       
   219   | "fac (Suc n) = Suc n * fac n"
       
   220 
       
   221 text {*
       
   222   \noindent This executable specification is now turned to SML code:
       
   223 *}
       
   224 
       
   225 export_code fac (*<*)in SML(*>*)in SML file "examples/fac.ML"
       
   226 
       
   227 text {*
       
   228   \noindent  The @{text "\<EXPORTCODE>"} command takes a space-separated list of
       
   229   constants together with \qn{serialization directives}
       
   230   These start with a \qn{target language}
       
   231   identifier, followed by a file specification
       
   232   where to write the generated code to.
       
   233 
       
   234   Internally, the defining equations for all selected
       
   235   constants are taken, including any transitively required
       
   236   constants, datatypes and classes, resulting in the following
       
   237   code:
       
   238 
       
   239   \lstsml{Thy/examples/fac.ML}
       
   240 
       
   241   The code generator will complain when a required
       
   242   ingredient does not provide a executable counterpart,
       
   243   e.g.~generating code
       
   244   for constants not yielding
       
   245   a defining equation (e.g.~the Hilbert choice
       
   246   operation @{text "SOME"}):
       
   247 *}
       
   248 (*<*)
       
   249 setup {* Sign.add_path "foo" *}
       
   250 (*>*)
       
   251 definition
       
   252   pick_some :: "'a list \<Rightarrow> 'a" where
       
   253   "pick_some xs = (SOME x. x \<in> set xs)"
       
   254 (*<*)
       
   255 hide const pick_some
       
   256 
       
   257 setup {* Sign.parent_path *}
       
   258 
       
   259 definition
       
   260   pick_some :: "'a list \<Rightarrow> 'a" where
       
   261   "pick_some = hd"
       
   262 (*>*)
       
   263 
       
   264 export_code pick_some in SML file "examples/fail_const.ML"
       
   265 
       
   266 text {* \noindent will fail. *}
       
   267 
       
   268 subsection {* Theorem selection *}
       
   269 
       
   270 text {*
       
   271   The list of all defining equations in a theory may be inspected
       
   272   using the @{text "\<PRINTCODESETUP>"} command:
       
   273 *}
       
   274 
       
   275 print_codesetup
       
   276 
       
   277 text {*
       
   278   \noindent which displays a table of constant with corresponding
       
   279   defining equations (the additional stuff displayed
       
   280   shall not bother us for the moment).
       
   281 
       
   282   The typical @{text HOL} tools are already set up in a way that
       
   283   function definitions introduced by @{text "\<DEFINITION>"},
       
   284   @{text "\<PRIMREC>"}, @{text "\<FUN>"},
       
   285   @{text "\<FUNCTION>"}, @{text "\<CONSTDEFS>"},
       
   286   @{text "\<RECDEF>"} are implicitly propagated
       
   287   to this defining equation table. Specific theorems may be
       
   288   selected using an attribute: \emph{code func}. As example,
       
   289   a weight selector function:
       
   290 *}
       
   291 
       
   292 primrec
       
   293   pick :: "(nat \<times> 'a) list \<Rightarrow> nat \<Rightarrow> 'a" where
       
   294   "pick (x#xs) n = (let (k, v) = x in
       
   295     if n < k then v else pick xs (n - k))"
       
   296 
       
   297 text {*
       
   298   \noindent We want to eliminate the explicit destruction
       
   299   of @{term x} to @{term "(k, v)"}:
       
   300 *}
       
   301 
       
   302 lemma [code func]:
       
   303   "pick ((k, v)#xs) n = (if n < k then v else pick xs (n - k))"
       
   304   by simp
       
   305 
       
   306 export_code pick (*<*)in SML(*>*) in SML file "examples/pick1.ML"
       
   307 
       
   308 text {*
       
   309   \noindent This theorem now is used for generating code:
       
   310 
       
   311   \lstsml{Thy/examples/pick1.ML}
       
   312 
       
   313   \noindent The policy is that \emph{default equations} stemming from
       
   314   @{text "\<DEFINITION>"},
       
   315   @{text "\<PRIMREC>"}, @{text "\<FUN>"},
       
   316   @{text "\<FUNCTION>"}, @{text "\<CONSTDEFS>"},
       
   317   @{text "\<RECDEF>"} statements are discarded as soon as an
       
   318   equation is explicitly selected by means of \emph{code func}.
       
   319   Further applications of \emph{code func} add theorems incrementally,
       
   320   but syntactic redundancies are implicitly dropped.  For example,
       
   321   using a modified version of the @{const fac} function
       
   322   as defining equation, the then redundant (since
       
   323   syntactically subsumed) original defining equations
       
   324   are dropped.
       
   325 
       
   326   \begin{warn}
       
   327     The attributes \emph{code} and \emph{code del}
       
   328     associated with the existing code generator also apply to
       
   329     the new one: \emph{code} implies \emph{code func},
       
   330     and \emph{code del} implies \emph{code func del}.
       
   331   \end{warn}
       
   332 *}
       
   333 
       
   334 subsection {* Type classes *}
       
   335 
       
   336 text {*
       
   337   Type classes enter the game via the Isar class package.
       
   338   For a short introduction how to use it, see \cite{isabelle-classes};
       
   339   here we just illustrate its impact on code generation.
       
   340 
       
   341   In a target language, type classes may be represented
       
   342   natively (as in the case of Haskell).  For languages
       
   343   like SML, they are implemented using \emph{dictionaries}.
       
   344   Our following example specifies a class \qt{null},
       
   345   assigning to each of its inhabitants a \qt{null} value:
       
   346 *}
       
   347 
       
   348 class null = type +
       
   349   fixes null :: 'a
       
   350 
       
   351 primrec
       
   352   head :: "'a\<Colon>null list \<Rightarrow> 'a" where
       
   353   "head [] = null"
       
   354   | "head (x#xs) = x"
       
   355 
       
   356 text {*
       
   357  \noindent  We provide some instances for our @{text null}:
       
   358 *}
       
   359 
       
   360 instantiation option and list :: (type) null
       
   361 begin
       
   362 
       
   363 definition
       
   364   "null = None"
       
   365 
       
   366 definition
       
   367   "null = []"
       
   368 
       
   369 instance ..
       
   370 
     1 
   371 end
     2 end
   372 
       
   373 text {*
       
   374   \noindent Constructing a dummy example:
       
   375 *}
       
   376 
       
   377 definition
       
   378   "dummy = head [Some (Suc 0), None]"
       
   379 
       
   380 text {*
       
   381   Type classes offer a suitable occasion to introduce
       
   382   the Haskell serializer.  Its usage is almost the same
       
   383   as SML, but, in accordance with conventions
       
   384   some Haskell systems enforce, each module ends
       
   385   up in a single file. The module hierarchy is reflected in
       
   386   the file system, with root directory given as file specification.
       
   387 *}
       
   388 
       
   389 export_code dummy in Haskell file "examples/"
       
   390   (* NOTE: you may use Haskell only once in this document, otherwise
       
   391   you have to work in distinct subdirectories *)
       
   392 
       
   393 text {*
       
   394   \lsthaskell{Thy/examples/Codegen.hs}
       
   395   \noindent (we have left out all other modules).
       
   396 
       
   397   \medskip
       
   398 
       
   399   The whole code in SML with explicit dictionary passing:
       
   400 *}
       
   401 
       
   402 export_code dummy (*<*)in SML(*>*)in SML file "examples/class.ML"
       
   403 
       
   404 text {*
       
   405   \lstsml{Thy/examples/class.ML}
       
   406 
       
   407   \medskip
       
   408 
       
   409   \noindent or in OCaml:
       
   410 *}
       
   411 
       
   412 export_code dummy in OCaml file "examples/class.ocaml"
       
   413 
       
   414 text {*
       
   415   \lstsml{Thy/examples/class.ocaml}
       
   416 
       
   417   \medskip The explicit association of constants
       
   418   to classes can be inspected using the @{text "\<PRINTCLASSES>"}
       
   419   command.
       
   420 *}
       
   421 
       
   422 
       
   423 section {* Recipes and advanced topics \label{sec:advanced} *}
       
   424 
       
   425 text {*
       
   426   In this tutorial, we do not attempt to give an exhaustive
       
   427   description of the code generator framework; instead,
       
   428   we cast a light on advanced topics by introducing
       
   429   them together with practically motivated examples.  Concerning
       
   430   further reading, see
       
   431 
       
   432   \begin{itemize}
       
   433 
       
   434   \item the Isabelle/Isar Reference Manual \cite{isabelle-isar-ref}
       
   435     for exhaustive syntax diagrams.
       
   436   \item or \cite{Haftmann-Nipkow:2007:codegen} which deals with foundational issues
       
   437     of the code generator framework.
       
   438 
       
   439   \end{itemize}
       
   440 *}
       
   441 
       
   442 subsection {* Library theories \label{sec:library} *}
       
   443 
       
   444 text {*
       
   445   The @{text HOL} @{text Main} theory already provides a code
       
   446   generator setup
       
   447   which should be suitable for most applications. Common extensions
       
   448   and modifications are available by certain theories of the @{text HOL}
       
   449   library; beside being useful in applications, they may serve
       
   450   as a tutorial for customizing the code generator setup.
       
   451 
       
   452   \begin{description}
       
   453 
       
   454     \item[@{text "Code_Integer"}] represents @{text HOL} integers by big
       
   455        integer literals in target languages.
       
   456     \item[@{text "Code_Char"}] represents @{text HOL} characters by 
       
   457        character literals in target languages.
       
   458     \item[@{text "Code_Char_chr"}] like @{text "Code_Char"},
       
   459        but also offers treatment of character codes; includes
       
   460        @{text "Code_Integer"}.
       
   461     \item[@{text "Efficient_Nat"}] \label{eff_nat} implements natural numbers by integers,
       
   462        which in general will result in higher efficency; pattern
       
   463        matching with @{term "0\<Colon>nat"} / @{const "Suc"}
       
   464        is eliminated;  includes @{text "Code_Integer"}.
       
   465     \item[@{text "Code_Index"}] provides an additional datatype
       
   466        @{text index} which is mapped to target-language built-in integers.
       
   467        Useful for code setups which involve e.g. indexing of
       
   468        target-language arrays.
       
   469     \item[@{text "Code_Message"}] provides an additional datatype
       
   470        @{text message_string} which is isomorphic to strings;
       
   471        @{text message_string}s are mapped to target-language strings.
       
   472        Useful for code setups which involve e.g. printing (error) messages.
       
   473 
       
   474   \end{description}
       
   475 
       
   476   \begin{warn}
       
   477     When importing any of these theories, they should form the last
       
   478     items in an import list.  Since these theories adapt the
       
   479     code generator setup in a non-conservative fashion,
       
   480     strange effects may occur otherwise.
       
   481   \end{warn}
       
   482 *}
       
   483 
       
   484 subsection {* Preprocessing *}
       
   485 
       
   486 text {*
       
   487   Before selected function theorems are turned into abstract
       
   488   code, a chain of definitional transformation steps is carried
       
   489   out: \emph{preprocessing}.  In essence, the preprocessor
       
   490   consists of two components: a \emph{simpset} and \emph{function transformers}.
       
   491 
       
   492   The \emph{simpset} allows to employ the full generality of the Isabelle
       
   493   simplifier.  Due to the interpretation of theorems
       
   494   as defining equations, rewrites are applied to the right
       
   495   hand side and the arguments of the left hand side of an
       
   496   equation, but never to the constant heading the left hand side.
       
   497   An important special case are \emph{inline theorems} which may be
       
   498   declared an undeclared using the
       
   499   \emph{code inline} or \emph{code inline del} attribute respectively.
       
   500   Some common applications:
       
   501 *}
       
   502 
       
   503 text_raw {*
       
   504   \begin{itemize}
       
   505 *}
       
   506 
       
   507 text {*
       
   508      \item replacing non-executable constructs by executable ones:
       
   509 *}     
       
   510 
       
   511   lemma [code inline]:
       
   512     "x \<in> set xs \<longleftrightarrow> x mem xs" by (induct xs) simp_all
       
   513 
       
   514 text {*
       
   515      \item eliminating superfluous constants:
       
   516 *}
       
   517 
       
   518   lemma [code inline]:
       
   519     "1 = Suc 0" by simp
       
   520 
       
   521 text {*
       
   522      \item replacing executable but inconvenient constructs:
       
   523 *}
       
   524 
       
   525   lemma [code inline]:
       
   526     "xs = [] \<longleftrightarrow> List.null xs" by (induct xs) simp_all
       
   527 
       
   528 text_raw {*
       
   529   \end{itemize}
       
   530 *}
       
   531 
       
   532 text {*
       
   533 
       
   534   \emph{Function transformers} provide a very general interface,
       
   535   transforming a list of function theorems to another
       
   536   list of function theorems, provided that neither the heading
       
   537   constant nor its type change.  The @{term "0\<Colon>nat"} / @{const Suc}
       
   538   pattern elimination implemented in
       
   539   theory @{text "Efficient_Nat"} (see \secref{eff_nat}) uses this
       
   540   interface.
       
   541 
       
   542   \noindent The current setup of the preprocessor may be inspected using
       
   543   the @{text "\<PRINTCODESETUP>"} command.
       
   544 
       
   545   \begin{warn}
       
   546     The attribute \emph{code unfold}
       
   547     associated with the existing code generator also applies to
       
   548     the new one: \emph{code unfold} implies \emph{code inline}.
       
   549   \end{warn}
       
   550 *}
       
   551 
       
   552 
       
   553 subsection {* Concerning operational equality *}
       
   554 
       
   555 text {*
       
   556   Surely you have already noticed how equality is treated
       
   557   by the code generator:
       
   558 *}
       
   559 
       
   560 primrec
       
   561   collect_duplicates :: "'a list \<Rightarrow> 'a list \<Rightarrow> 'a list \<Rightarrow> 'a list" where
       
   562     "collect_duplicates xs ys [] = xs"
       
   563   | "collect_duplicates xs ys (z#zs) = (if z \<in> set xs
       
   564       then if z \<in> set ys
       
   565         then collect_duplicates xs ys zs
       
   566         else collect_duplicates xs (z#ys) zs
       
   567       else collect_duplicates (z#xs) (z#ys) zs)"
       
   568 
       
   569 text {*
       
   570   The membership test during preprocessing is rewritten,
       
   571   resulting in @{const List.member}, which itself
       
   572   performs an explicit equality check.
       
   573 *}
       
   574 
       
   575 export_code collect_duplicates (*<*)in SML(*>*)in SML file "examples/collect_duplicates.ML"
       
   576 
       
   577 text {*
       
   578   \lstsml{Thy/examples/collect_duplicates.ML}
       
   579 *}
       
   580 
       
   581 text {*
       
   582   Obviously, polymorphic equality is implemented the Haskell
       
   583   way using a type class.  How is this achieved?  HOL introduces
       
   584   an explicit class @{text eq} with a corresponding operation
       
   585   @{const eq_class.eq} such that @{thm eq [no_vars]}.
       
   586   The preprocessing framework does the rest.
       
   587   For datatypes, instances of @{text eq} are implicitly derived
       
   588   when possible.  For other types, you may instantiate @{text eq}
       
   589   manually like any other type class.
       
   590 
       
   591   Though this @{text eq} class is designed to get rarely in
       
   592   the way, a subtlety
       
   593   enters the stage when definitions of overloaded constants
       
   594   are dependent on operational equality.  For example, let
       
   595   us define a lexicographic ordering on tuples:
       
   596 *}
       
   597 
       
   598 instantiation * :: (ord, ord) ord
       
   599 begin
       
   600 
       
   601 definition
       
   602   [code func del]: "p1 < p2 \<longleftrightarrow> (let (x1, y1) = p1; (x2, y2) = p2 in
       
   603     x1 < x2 \<or> (x1 = x2 \<and> y1 < y2))"
       
   604 
       
   605 definition
       
   606   [code func del]: "p1 \<le> p2 \<longleftrightarrow> (let (x1, y1) = p1; (x2, y2) = p2 in
       
   607     x1 < x2 \<or> (x1 = x2 \<and> y1 \<le> y2))"
       
   608 
       
   609 instance ..
       
   610 
       
   611 end
       
   612 
       
   613 lemma ord_prod [code func(*<*), code func del(*>*)]:
       
   614   "(x1 \<Colon> 'a\<Colon>ord, y1 \<Colon> 'b\<Colon>ord) < (x2, y2) \<longleftrightarrow> x1 < x2 \<or> (x1 = x2 \<and> y1 < y2)"
       
   615   "(x1 \<Colon> 'a\<Colon>ord, y1 \<Colon> 'b\<Colon>ord) \<le> (x2, y2) \<longleftrightarrow> x1 < x2 \<or> (x1 = x2 \<and> y1 \<le> y2)"
       
   616   unfolding less_prod_def less_eq_prod_def by simp_all
       
   617 
       
   618 text {*
       
   619   Then code generation will fail.  Why?  The definition
       
   620   of @{term "op \<le>"} depends on equality on both arguments,
       
   621   which are polymorphic and impose an additional @{text eq}
       
   622   class constraint, thus violating the type discipline
       
   623   for class operations.
       
   624 
       
   625   The solution is to add @{text eq} explicitly to the first sort arguments in the
       
   626   code theorems:
       
   627 *}
       
   628 
       
   629 lemma ord_prod_code [code func]:
       
   630   "(x1 \<Colon> 'a\<Colon>{ord, eq}, y1 \<Colon> 'b\<Colon>ord) < (x2, y2) \<longleftrightarrow>
       
   631     x1 < x2 \<or> (x1 = x2 \<and> y1 < y2)"
       
   632   "(x1 \<Colon> 'a\<Colon>{ord, eq}, y1 \<Colon> 'b\<Colon>ord) \<le> (x2, y2) \<longleftrightarrow>
       
   633     x1 < x2 \<or> (x1 = x2 \<and> y1 \<le> y2)"
       
   634   unfolding ord_prod by rule+
       
   635 
       
   636 text {*
       
   637   \noindent Then code generation succeeds:
       
   638 *}
       
   639 
       
   640 export_code "op \<le> \<Colon> 'a\<Colon>{eq, ord} \<times> 'b\<Colon>ord \<Rightarrow> 'a \<times> 'b \<Rightarrow> bool"
       
   641   (*<*)in SML(*>*)in SML file "examples/lexicographic.ML"
       
   642 
       
   643 text {*
       
   644   \lstsml{Thy/examples/lexicographic.ML}
       
   645 *}
       
   646 
       
   647 text {*
       
   648   In general, code theorems for overloaded constants may have more
       
   649   restrictive sort constraints than the underlying instance relation
       
   650   between class and type constructor as long as the whole system of
       
   651   constraints is coregular; code theorems violating coregularity
       
   652   are rejected immediately.  Consequently, it might be necessary
       
   653   to delete disturbing theorems in the code theorem table,
       
   654   as we have done here with the original definitions @{text less_prod_def}
       
   655   and @{text less_eq_prod_def}.
       
   656 
       
   657   In some cases, the automatically derived defining equations
       
   658   for equality on a particular type may not be appropriate.
       
   659   As example, watch the following datatype representing
       
   660   monomorphic parametric types (where type constructors
       
   661   are referred to by natural numbers):
       
   662 *}
       
   663 
       
   664 datatype monotype = Mono nat "monotype list"
       
   665 (*<*)
       
   666 lemma monotype_eq:
       
   667   "Mono tyco1 typargs1 = Mono tyco2 typargs2 \<equiv> 
       
   668      tyco1 = tyco2 \<and> typargs1 = typargs2" by simp
       
   669 (*>*)
       
   670 
       
   671 text {*
       
   672   Then code generation for SML would fail with a message
       
   673   that the generated code conains illegal mutual dependencies:
       
   674   the theorem @{thm monotype_eq [no_vars]} already requires the
       
   675   instance @{text "monotype \<Colon> eq"}, which itself requires
       
   676   @{thm monotype_eq [no_vars]};  Haskell has no problem with mutually
       
   677   recursive @{text instance} and @{text function} definitions,
       
   678   but the SML serializer does not support this.
       
   679 
       
   680   In such cases, you have to provide you own equality equations
       
   681   involving auxiliary constants.  In our case,
       
   682   @{const [show_types] list_all2} can do the job:
       
   683 *}
       
   684 
       
   685 lemma monotype_eq_list_all2 [code func]:
       
   686   "Mono tyco1 typargs1 = Mono tyco2 typargs2 \<longleftrightarrow>
       
   687      tyco1 = tyco2 \<and> list_all2 (op =) typargs1 typargs2"
       
   688   by (simp add: list_all2_eq [symmetric])
       
   689 
       
   690 text {*
       
   691   does not depend on instance @{text "monotype \<Colon> eq"}:
       
   692 *}
       
   693 
       
   694 export_code "op = :: monotype \<Rightarrow> monotype \<Rightarrow> bool"
       
   695   (*<*)in SML(*>*)in SML file "examples/monotype.ML"
       
   696 
       
   697 text {*
       
   698   \lstsml{Thy/examples/monotype.ML}
       
   699 *}
       
   700 
       
   701 subsection {* Programs as sets of theorems *}
       
   702 
       
   703 text {*
       
   704   As told in \secref{sec:concept}, code generation is based
       
   705   on a structured collection of code theorems.
       
   706   For explorative purpose, this collection
       
   707   may be inspected using the @{text "\<CODETHMS>"} command:
       
   708 *}
       
   709 
       
   710 code_thms "op mod :: nat \<Rightarrow> nat \<Rightarrow> nat"
       
   711 
       
   712 text {*
       
   713   \noindent prints a table with \emph{all} defining equations
       
   714   for @{term "op mod :: nat \<Rightarrow> nat \<Rightarrow> nat"}, including
       
   715   \emph{all} defining equations those equations depend
       
   716   on recursivly.  @{text "\<CODETHMS>"} provides a convenient
       
   717   mechanism to inspect the impact of a preprocessor setup
       
   718   on defining equations.
       
   719   
       
   720   Similarly, the @{text "\<CODEDEPS>"} command shows a graph
       
   721   visualizing dependencies between defining equations.
       
   722 *}
       
   723 
       
   724 
       
   725 subsection {* Constructor sets for datatypes *}
       
   726 
       
   727 text {*
       
   728   Conceptually, any datatype is spanned by a set of
       
   729   \emph{constructors} of type @{text "\<tau> = \<dots> \<Rightarrow> \<kappa> \<alpha>\<^isub>1 \<dots> \<alpha>\<^isub>n"}
       
   730   where @{text "{\<alpha>\<^isub>1, \<dots>, \<alpha>\<^isub>n}"} is excactly the set of \emph{all}
       
   731   type variables in @{text "\<tau>"}.  The HOL datatype package
       
   732   by default registers any new datatype in the table
       
   733   of datatypes, which may be inspected using
       
   734   the @{text "\<PRINTCODESETUP>"} command.
       
   735 
       
   736   In some cases, it may be convenient to alter or
       
   737   extend this table;  as an example, we will develope an alternative
       
   738   representation of natural numbers as binary digits, whose
       
   739   size does increase logarithmically with its value, not linear
       
   740   \footnote{Indeed, the @{text "Efficient_Nat"} theory (see \ref{eff_nat})
       
   741     does something similar}.  First, the digit representation:
       
   742 *}
       
   743 
       
   744 definition Dig0 :: "nat \<Rightarrow> nat" where
       
   745   "Dig0 n = 2 * n"
       
   746 
       
   747 definition Dig1 :: "nat \<Rightarrow> nat" where
       
   748   "Dig1 n = Suc (2 * n)"
       
   749 
       
   750 text {*
       
   751   \noindent We will use these two ">digits"< to represent natural numbers
       
   752   in binary digits, e.g.:
       
   753 *}
       
   754 
       
   755 lemma 42: "42 = Dig0 (Dig1 (Dig0 (Dig1 (Dig0 1))))"
       
   756   by (simp add: Dig0_def Dig1_def)
       
   757 
       
   758 text {*
       
   759   \noindent Of course we also have to provide proper code equations for
       
   760   the operations, e.g. @{term "op + \<Colon> nat \<Rightarrow> nat \<Rightarrow> nat"}:
       
   761 *}
       
   762 
       
   763 lemma plus_Dig [code func]:
       
   764   "0 + n = n"
       
   765   "m + 0 = m"
       
   766   "1 + Dig0 n = Dig1 n"
       
   767   "Dig0 m + 1 = Dig1 m"
       
   768   "1 + Dig1 n = Dig0 (n + 1)"
       
   769   "Dig1 m + 1 = Dig0 (m + 1)"
       
   770   "Dig0 m + Dig0 n = Dig0 (m + n)"
       
   771   "Dig0 m + Dig1 n = Dig1 (m + n)"
       
   772   "Dig1 m + Dig0 n = Dig1 (m + n)"
       
   773   "Dig1 m + Dig1 n = Dig0 (m + n + 1)"
       
   774   by (simp_all add: Dig0_def Dig1_def)
       
   775 
       
   776 text {*
       
   777   \noindent We then instruct the code generator to view @{term "0\<Colon>nat"},
       
   778   @{term "1\<Colon>nat"}, @{term Dig0} and @{term Dig1} as
       
   779   datatype constructors:
       
   780 *}
       
   781 
       
   782 code_datatype "0\<Colon>nat" "1\<Colon>nat" Dig0 Dig1
       
   783 
       
   784 text {*
       
   785   \noindent For the former constructor @{term Suc}, we provide a code
       
   786   equation and remove some parts of the default code generator setup
       
   787   which are an obstacle here:
       
   788 *}
       
   789 
       
   790 lemma Suc_Dig [code func]:
       
   791   "Suc n = n + 1"
       
   792   by simp
       
   793 
       
   794 declare One_nat_def [code inline del]
       
   795 declare add_Suc_shift [code func del] 
       
   796 
       
   797 text {*
       
   798   \noindent This yields the following code:
       
   799 *}
       
   800 
       
   801 export_code "op + \<Colon> nat \<Rightarrow> nat \<Rightarrow> nat" (*<*)in SML(*>*) in SML file "examples/nat_binary.ML"
       
   802 
       
   803 text {* \lstsml{Thy/examples/nat_binary.ML} *}
       
   804 
       
   805 text {*
       
   806   \medskip
       
   807 
       
   808   From this example, it can be easily glimpsed that using own constructor sets
       
   809   is a little delicate since it changes the set of valid patterns for values
       
   810   of that type.  Without going into much detail, here some practical hints:
       
   811 
       
   812   \begin{itemize}
       
   813     \item When changing the constuctor set for datatypes, take care to
       
   814       provide an alternative for the @{text case} combinator (e.g. by replacing
       
   815       it using the preprocessor).
       
   816     \item Values in the target language need not to be normalized -- different
       
   817       values in the target language may represent the same value in the
       
   818       logic (e.g. @{text "Dig1 0 = 1"}).
       
   819     \item Usually, a good methodology to deal with the subleties of pattern
       
   820       matching is to see the type as an abstract type: provide a set
       
   821       of operations which operate on the concrete representation of the type,
       
   822       and derive further operations by combinations of these primitive ones,
       
   823       without relying on a particular representation.
       
   824   \end{itemize}
       
   825 *}
       
   826 
       
   827 code_datatype %invisible "0::nat" Suc
       
   828 declare %invisible plus_Dig [code func del]
       
   829 declare %invisible One_nat_def [code inline]
       
   830 declare %invisible add_Suc_shift [code func] 
       
   831 lemma %invisible [code func]: "0 + n = (n \<Colon> nat)" by simp
       
   832 
       
   833 
       
   834 subsection {* Customizing serialization  *}
       
   835 
       
   836 subsubsection {* Basics *}
       
   837 
       
   838 text {*
       
   839   Consider the following function and its corresponding
       
   840   SML code:
       
   841 *}
       
   842 
       
   843 primrec
       
   844   in_interval :: "nat \<times> nat \<Rightarrow> nat \<Rightarrow> bool" where
       
   845   "in_interval (k, l) n \<longleftrightarrow> k \<le> n \<and> n \<le> l"
       
   846 (*<*)
       
   847 code_type %tt bool
       
   848   (SML)
       
   849 code_const %tt True and False and "op \<and>" and Not
       
   850   (SML and and and)
       
   851 (*>*)
       
   852 export_code in_interval (*<*)in SML(*>*)in SML file "examples/bool_literal.ML"
       
   853 
       
   854 text {*
       
   855   \lstsml{Thy/examples/bool_literal.ML}
       
   856 
       
   857   \noindent Though this is correct code, it is a little bit unsatisfactory:
       
   858   boolean values and operators are materialized as distinguished
       
   859   entities with have nothing to do with the SML-builtin notion
       
   860   of \qt{bool}.  This results in less readable code;
       
   861   additionally, eager evaluation may cause programs to
       
   862   loop or break which would perfectly terminate when
       
   863   the existing SML \qt{bool} would be used.  To map
       
   864   the HOL \qt{bool} on SML \qt{bool}, we may use
       
   865   \qn{custom serializations}:
       
   866 *}
       
   867 
       
   868 code_type %tt bool
       
   869   (SML "bool")
       
   870 code_const %tt True and False and "op \<and>"
       
   871   (SML "true" and "false" and "_ andalso _")
       
   872 
       
   873 text {*
       
   874   The @{text "\<CODETYPE>"} commad takes a type constructor
       
   875   as arguments together with a list of custom serializations.
       
   876   Each custom serialization starts with a target language
       
   877   identifier followed by an expression, which during
       
   878   code serialization is inserted whenever the type constructor
       
   879   would occur.  For constants, @{text "\<CODECONST>"} implements
       
   880   the corresponding mechanism.  Each ``@{verbatim "_"}'' in
       
   881   a serialization expression is treated as a placeholder
       
   882   for the type constructor's (the constant's) arguments.
       
   883 *}
       
   884 
       
   885 export_code in_interval (*<*)in SML(*>*) in SML file "examples/bool_mlbool.ML"
       
   886 
       
   887 text {*
       
   888   \lstsml{Thy/examples/bool_mlbool.ML}
       
   889 
       
   890   \noindent This still is not perfect: the parentheses
       
   891   around the \qt{andalso} expression are superfluous.
       
   892   Though the serializer
       
   893   by no means attempts to imitate the rich Isabelle syntax
       
   894   framework, it provides some common idioms, notably
       
   895   associative infixes with precedences which may be used here:
       
   896 *}
       
   897 
       
   898 code_const %tt "op \<and>"
       
   899   (SML infixl 1 "andalso")
       
   900 
       
   901 export_code in_interval (*<*)in SML(*>*) in SML file "examples/bool_infix.ML"
       
   902 
       
   903 text {*
       
   904   \lstsml{Thy/examples/bool_infix.ML}
       
   905 
       
   906   \medskip
       
   907 
       
   908   Next, we try to map HOL pairs to SML pairs, using the
       
   909   infix ``@{verbatim "*"}'' type constructor and parentheses:
       
   910 *}
       
   911 (*<*)
       
   912 code_type *
       
   913   (SML)
       
   914 code_const Pair
       
   915   (SML)
       
   916 (*>*)
       
   917 code_type %tt *
       
   918   (SML infix 2 "*")
       
   919 code_const %tt Pair
       
   920   (SML "!((_),/ (_))")
       
   921 
       
   922 text {*
       
   923   The initial bang ``@{verbatim "!"}'' tells the serializer to never put
       
   924   parentheses around the whole expression (they are already present),
       
   925   while the parentheses around argument place holders
       
   926   tell not to put parentheses around the arguments.
       
   927   The slash ``@{verbatim "/"}'' (followed by arbitrary white space)
       
   928   inserts a space which may be used as a break if necessary
       
   929   during pretty printing.
       
   930 
       
   931   These examples give a glimpse what mechanisms
       
   932   custom serializations provide; however their usage
       
   933   requires careful thinking in order not to introduce
       
   934   inconsistencies -- or, in other words:
       
   935   custom serializations are completely axiomatic.
       
   936 
       
   937   A further noteworthy details is that any special
       
   938   character in a custom serialization may be quoted
       
   939   using ``@{verbatim "'"}''; thus, in
       
   940   ``@{verbatim "fn '_ => _"}'' the first
       
   941   ``@{verbatim "_"}'' is a proper underscore while the
       
   942   second ``@{verbatim "_"}'' is a placeholder.
       
   943 
       
   944   The HOL theories provide further
       
   945   examples for custom serializations.
       
   946 *}
       
   947 
       
   948 
       
   949 subsubsection {* Haskell serialization *}
       
   950 
       
   951 text {*
       
   952   For convenience, the default
       
   953   HOL setup for Haskell maps the @{text eq} class to
       
   954   its counterpart in Haskell, giving custom serializations
       
   955   for the class (@{text "\<CODECLASS>"}) and its operation:
       
   956 *}
       
   957 
       
   958 code_class %tt eq
       
   959   (Haskell "Eq" where "op =" \<equiv> "(==)")
       
   960 
       
   961 code_const %tt "op ="
       
   962   (Haskell infixl 4 "==")
       
   963 
       
   964 text {*
       
   965   A problem now occurs whenever a type which
       
   966   is an instance of @{text eq} in HOL is mapped
       
   967   on a Haskell-builtin type which is also an instance
       
   968   of Haskell @{text Eq}:
       
   969 *}
       
   970 
       
   971 typedecl bar
       
   972 
       
   973 instantiation bar :: eq
       
   974 begin
       
   975 
       
   976 definition "eq_class.eq (x\<Colon>bar) y \<longleftrightarrow> x = y"
       
   977 
       
   978 instance by default (simp add: eq_bar_def)
       
   979 
       
   980 end
       
   981 
       
   982 code_type %tt bar
       
   983   (Haskell "Integer")
       
   984 
       
   985 text {*
       
   986   The code generator would produce
       
   987   an additional instance, which of course is rejected.
       
   988   To suppress this additional instance, use
       
   989   @{text "\<CODEINSTANCE>"}:
       
   990 *}
       
   991 
       
   992 code_instance %tt bar :: eq
       
   993   (Haskell -)
       
   994 
       
   995 
       
   996 subsubsection {* Pretty printing *}
       
   997 
       
   998 text {*
       
   999   The serializer provides ML interfaces to set up
       
  1000   pretty serializations for expressions like lists, numerals
       
  1001   and characters;  these are
       
  1002   monolithic stubs and should only be used with the
       
  1003   theories introduced in \secref{sec:library}.
       
  1004 *}
       
  1005 
       
  1006 
       
  1007 subsection {* Cyclic module dependencies *}
       
  1008 
       
  1009 text {*
       
  1010   Sometimes the awkward situation occurs that dependencies
       
  1011   between definitions introduce cyclic dependencies
       
  1012   between modules, which in the Haskell world leaves
       
  1013   you to the mercy of the Haskell implementation you are using,
       
  1014   while for SML code generation is not possible.
       
  1015 
       
  1016   A solution is to declare module names explicitly.
       
  1017   Let use assume the three cyclically dependent
       
  1018   modules are named \emph{A}, \emph{B} and \emph{C}.
       
  1019   Then, by stating
       
  1020 *}
       
  1021 
       
  1022 code_modulename SML
       
  1023   A ABC
       
  1024   B ABC
       
  1025   C ABC
       
  1026 
       
  1027 text {*
       
  1028   we explicitly map all those modules on \emph{ABC},
       
  1029   resulting in an ad-hoc merge of this three modules
       
  1030   at serialization time.
       
  1031 *}
       
  1032 
       
  1033 subsection {* Incremental code generation *}
       
  1034 
       
  1035 text {*
       
  1036   Code generation is \emph{incremental}: theorems
       
  1037   and abstract intermediate code are cached and extended on demand.
       
  1038   The cache may be partially or fully dropped if the underlying
       
  1039   executable content of the theory changes.
       
  1040   Implementation of caching is supposed to transparently
       
  1041   hid away the details from the user.  Anyway, caching
       
  1042   reaches the surface by using a slightly more general form
       
  1043   of the @{text "\<CODETHMS>"}, @{text "\<CODEDEPS>"}
       
  1044   and @{text "\<EXPORTCODE>"} commands: the list of constants
       
  1045   may be omitted.  Then, all constants with code theorems
       
  1046   in the current cache are referred to.
       
  1047 *}
       
  1048 
       
  1049 (*subsection {* Code generation and proof extraction *}
       
  1050 
       
  1051 text {*
       
  1052   \fixme
       
  1053 *}*)
       
  1054 
       
  1055 
       
  1056 section {* ML interfaces \label{sec:ml} *}
       
  1057 
       
  1058 text {*
       
  1059   Since the code generator framework not only aims to provide
       
  1060   a nice Isar interface but also to form a base for
       
  1061   code-generation-based applications, here a short
       
  1062   description of the most important ML interfaces.
       
  1063 *}
       
  1064 
       
  1065 subsection {* Executable theory content: @{text Code} *}
       
  1066 
       
  1067 text {*
       
  1068   This Pure module implements the core notions of
       
  1069   executable content of a theory.
       
  1070 *}
       
  1071 
       
  1072 subsubsection {* Managing executable content *}
       
  1073 
       
  1074 text %mlref {*
       
  1075   \begin{mldecls}
       
  1076   @{index_ML Code.add_func: "thm -> theory -> theory"} \\
       
  1077   @{index_ML Code.del_func: "thm -> theory -> theory"} \\
       
  1078   @{index_ML Code.add_funcl: "string * thm list Susp.T -> theory -> theory"} \\
       
  1079   @{index_ML Code.map_pre: "(MetaSimplifier.simpset -> MetaSimplifier.simpset) -> theory -> theory"} \\
       
  1080   @{index_ML Code.map_post: "(MetaSimplifier.simpset -> MetaSimplifier.simpset) -> theory -> theory"} \\
       
  1081   @{index_ML Code.add_functrans: "string * (theory -> thm list -> thm list option)
       
  1082     -> theory -> theory"} \\
       
  1083   @{index_ML Code.del_functrans: "string -> theory -> theory"} \\
       
  1084   @{index_ML Code.add_datatype: "(string * typ) list -> theory -> theory"} \\
       
  1085   @{index_ML Code.get_datatype: "theory -> string
       
  1086     -> (string * sort) list * (string * typ list) list"} \\
       
  1087   @{index_ML Code.get_datatype_of_constr: "theory -> string -> string option"}
       
  1088   \end{mldecls}
       
  1089 
       
  1090   \begin{description}
       
  1091 
       
  1092   \item @{ML Code.add_func}~@{text "thm"}~@{text "thy"} adds function
       
  1093      theorem @{text "thm"} to executable content.
       
  1094 
       
  1095   \item @{ML Code.del_func}~@{text "thm"}~@{text "thy"} removes function
       
  1096      theorem @{text "thm"} from executable content, if present.
       
  1097 
       
  1098   \item @{ML Code.add_funcl}~@{text "(const, lthms)"}~@{text "thy"} adds
       
  1099      suspended defining equations @{text lthms} for constant
       
  1100      @{text const} to executable content.
       
  1101 
       
  1102   \item @{ML Code.map_pre}~@{text "f"}~@{text "thy"} changes
       
  1103      the preprocessor simpset.
       
  1104 
       
  1105   \item @{ML Code.add_functrans}~@{text "(name, f)"}~@{text "thy"} adds
       
  1106      function transformer @{text f} (named @{text name}) to executable content;
       
  1107      @{text f} is a transformer of the defining equations belonging
       
  1108      to a certain function definition, depending on the
       
  1109      current theory context.  Returning @{text NONE} indicates that no
       
  1110      transformation took place;  otherwise, the whole process will be iterated
       
  1111      with the new defining equations.
       
  1112 
       
  1113   \item @{ML Code.del_functrans}~@{text "name"}~@{text "thy"} removes
       
  1114      function transformer named @{text name} from executable content.
       
  1115 
       
  1116   \item @{ML Code.add_datatype}~@{text cs}~@{text thy} adds
       
  1117      a datatype to executable content, with generation
       
  1118      set @{text cs}.
       
  1119 
       
  1120   \item @{ML Code.get_datatype_of_constr}~@{text "thy"}~@{text "const"}
       
  1121      returns type constructor corresponding to
       
  1122      constructor @{text const}; returns @{text NONE}
       
  1123      if @{text const} is no constructor.
       
  1124 
       
  1125   \end{description}
       
  1126 *}
       
  1127 
       
  1128 subsection {* Auxiliary *}
       
  1129 
       
  1130 text %mlref {*
       
  1131   \begin{mldecls}
       
  1132   @{index_ML Code_Unit.read_const: "theory -> string -> string"} \\
       
  1133   @{index_ML Code_Unit.head_func: "thm -> string * ((string * sort) list * typ)"} \\
       
  1134   @{index_ML Code_Unit.rewrite_func: "MetaSimplifier.simpset -> thm -> thm"} \\
       
  1135   \end{mldecls}
       
  1136 
       
  1137   \begin{description}
       
  1138 
       
  1139   \item @{ML Code_Unit.read_const}~@{text thy}~@{text s}
       
  1140      reads a constant as a concrete term expression @{text s}.
       
  1141 
       
  1142   \item @{ML Code_Unit.head_func}~@{text thm}
       
  1143      extracts the constant and its type from a defining equation @{text thm}.
       
  1144 
       
  1145   \item @{ML Code_Unit.rewrite_func}~@{text ss}~@{text thm}
       
  1146      rewrites a defining equation @{text thm} with a simpset @{text ss};
       
  1147      only arguments and right hand side are rewritten,
       
  1148      not the head of the defining equation.
       
  1149 
       
  1150   \end{description}
       
  1151 
       
  1152 *}
       
  1153 
       
  1154 subsection {* Implementing code generator applications *}
       
  1155 
       
  1156 text {*
       
  1157   Implementing code generator applications on top
       
  1158   of the framework set out so far usually not only
       
  1159   involves using those primitive interfaces
       
  1160   but also storing code-dependent data and various
       
  1161   other things.
       
  1162 
       
  1163   \begin{warn}
       
  1164     Some interfaces discussed here have not reached
       
  1165     a final state yet.
       
  1166     Changes likely to occur in future.
       
  1167   \end{warn}
       
  1168 *}
       
  1169 
       
  1170 subsubsection {* Data depending on the theory's executable content *}
       
  1171 
       
  1172 text {*
       
  1173   Due to incrementality of code generation, changes in the
       
  1174   theory's executable content have to be propagated in a
       
  1175   certain fashion.  Additionally, such changes may occur
       
  1176   not only during theory extension but also during theory
       
  1177   merge, which is a little bit nasty from an implementation
       
  1178   point of view.  The framework provides a solution
       
  1179   to this technical challenge by providing a functorial
       
  1180   data slot @{ML_functor CodeDataFun}; on instantiation
       
  1181   of this functor, the following types and operations
       
  1182   are required:
       
  1183 
       
  1184   \medskip
       
  1185   \begin{tabular}{l}
       
  1186   @{text "type T"} \\
       
  1187   @{text "val empty: T"} \\
       
  1188   @{text "val merge: Pretty.pp \<rightarrow> T * T \<rightarrow> T"} \\
       
  1189   @{text "val purge: theory option \<rightarrow> CodeUnit.const list option \<rightarrow> T \<rightarrow> T"}
       
  1190   \end{tabular}
       
  1191 
       
  1192   \begin{description}
       
  1193 
       
  1194   \item @{text T} the type of data to store.
       
  1195 
       
  1196   \item @{text empty} initial (empty) data.
       
  1197 
       
  1198   \item @{text merge} merging two data slots.
       
  1199 
       
  1200   \item @{text purge}~@{text thy}~@{text consts} propagates changes in executable content;
       
  1201     if possible, the current theory context is handed over
       
  1202     as argument @{text thy} (if there is no current theory context (e.g.~during
       
  1203     theory merge, @{ML NONE}); @{text consts} indicates the kind
       
  1204     of change: @{ML NONE} stands for a fundamental change
       
  1205     which invalidates any existing code, @{text "SOME consts"}
       
  1206     hints that executable content for constants @{text consts}
       
  1207     has changed.
       
  1208 
       
  1209   \end{description}
       
  1210 
       
  1211   An instance of @{ML_functor CodeDataFun} provides the following
       
  1212   interface:
       
  1213 
       
  1214   \medskip
       
  1215   \begin{tabular}{l}
       
  1216   @{text "get: theory \<rightarrow> T"} \\
       
  1217   @{text "change: theory \<rightarrow> (T \<rightarrow> T) \<rightarrow> T"} \\
       
  1218   @{text "change_yield: theory \<rightarrow> (T \<rightarrow> 'a * T) \<rightarrow> 'a * T"}
       
  1219   \end{tabular}
       
  1220 
       
  1221   \begin{description}
       
  1222 
       
  1223   \item @{text get} retrieval of the current data.
       
  1224 
       
  1225   \item @{text change} update of current data (cached!)
       
  1226     by giving a continuation.
       
  1227 
       
  1228   \item @{text change_yield} update with side result.
       
  1229 
       
  1230   \end{description}
       
  1231 *}
       
  1232 
       
  1233 (*subsubsection {* Datatype hooks *}
       
  1234 
       
  1235 text {*
       
  1236   Isabelle/HOL's datatype package provides a mechanism to
       
  1237   extend theories depending on datatype declarations:
       
  1238   \emph{datatype hooks}.  For example, when declaring a new
       
  1239   datatype, a hook proves defining equations for equality on
       
  1240   that datatype (if possible).
       
  1241 *}
       
  1242 
       
  1243 text %mlref {*
       
  1244   \begin{mldecls}
       
  1245   @{index_ML_type DatatypeHooks.hook: "string list -> theory -> theory"} \\
       
  1246   @{index_ML DatatypeHooks.add: "DatatypeHooks.hook -> theory -> theory"}
       
  1247   \end{mldecls}
       
  1248 
       
  1249   \begin{description}
       
  1250 
       
  1251   \item @{ML_type DatatypeHooks.hook} specifies the interface
       
  1252      of \emph{datatype hooks}: a theory update
       
  1253      depending on the list of newly introduced
       
  1254      datatype names.
       
  1255 
       
  1256   \item @{ML DatatypeHooks.add} adds a hook to the
       
  1257      chain of all hooks.
       
  1258 
       
  1259   \end{description}
       
  1260 *}
       
  1261 
       
  1262 subsubsection {* Trivial typedefs -- type copies *}
       
  1263 
       
  1264 text {*
       
  1265   Sometimes packages will introduce new types
       
  1266   as \emph{marked type copies} similar to Haskell's
       
  1267   @{text newtype} declaration (e.g. the HOL record package)
       
  1268   \emph{without} tinkering with the overhead of datatypes.
       
  1269   Technically, these type copies are trivial forms of typedefs.
       
  1270   Since these type copies in code generation view are nothing
       
  1271   else than datatypes, they have been given a own package
       
  1272   in order to faciliate code generation:
       
  1273 *}
       
  1274 
       
  1275 text %mlref {*
       
  1276   \begin{mldecls}
       
  1277   @{index_ML_type TypecopyPackage.info} \\
       
  1278   @{index_ML TypecopyPackage.add_typecopy: "
       
  1279     bstring * string list -> typ -> (bstring * bstring) option
       
  1280     -> theory -> (string * TypecopyPackage.info) * theory"} \\
       
  1281   @{index_ML TypecopyPackage.get_typecopy_info: "theory
       
  1282     -> string -> TypecopyPackage.info option"} \\
       
  1283   @{index_ML TypecopyPackage.get_spec: "theory -> string
       
  1284     -> (string * sort) list * (string * typ list) list"} \\
       
  1285   @{index_ML_type TypecopyPackage.hook: "string * TypecopyPackage.info -> theory -> theory"} \\
       
  1286   @{index_ML TypecopyPackage.add_hook:
       
  1287      "TypecopyPackage.hook -> theory -> theory"} \\
       
  1288   \end{mldecls}
       
  1289 
       
  1290   \begin{description}
       
  1291 
       
  1292   \item @{ML_type TypecopyPackage.info} a record containing
       
  1293      the specification and further data of a type copy.
       
  1294 
       
  1295   \item @{ML TypecopyPackage.add_typecopy} defines a new
       
  1296      type copy.
       
  1297 
       
  1298   \item @{ML TypecopyPackage.get_typecopy_info} retrieves
       
  1299      data of an existing type copy.
       
  1300 
       
  1301   \item @{ML TypecopyPackage.get_spec} retrieves datatype-like
       
  1302      specification of a type copy.
       
  1303 
       
  1304   \item @{ML_type TypecopyPackage.hook},~@{ML TypecopyPackage.add_hook}
       
  1305      provide a hook mechanism corresponding to the hook mechanism
       
  1306      on datatypes.
       
  1307 
       
  1308   \end{description}
       
  1309 *}
       
  1310 
       
  1311 subsubsection {* Unifying type copies and datatypes *}
       
  1312 
       
  1313 text {*
       
  1314   Since datatypes and type copies are mapped to the same concept (datatypes)
       
  1315   by code generation, the view on both is unified \qt{code types}:
       
  1316 *}
       
  1317 
       
  1318 text %mlref {*
       
  1319   \begin{mldecls}
       
  1320   @{index_ML_type DatatypeCodegen.hook: "(string * (bool * ((string * sort) list
       
  1321     * (string * typ list) list))) list
       
  1322     -> theory -> theory"} \\
       
  1323   @{index_ML DatatypeCodegen.add_codetypes_hook_bootstrap: "
       
  1324       DatatypeCodegen.hook -> theory -> theory"}
       
  1325   \end{mldecls}
       
  1326 *}
       
  1327 
       
  1328 text {*
       
  1329   \begin{description}
       
  1330 
       
  1331   \item @{ML_type DatatypeCodegen.hook} specifies the code type hook
       
  1332      interface: a theory transformation depending on a list of
       
  1333      mutual recursive code types; each entry in the list
       
  1334      has the structure @{text "(name, (is_data, (vars, cons)))"}
       
  1335      where @{text name} is the name of the code type, @{text is_data}
       
  1336      is true iff @{text name} is a datatype rather then a type copy,
       
  1337      and @{text "(vars, cons)"} is the specification of the code type.
       
  1338 
       
  1339   \item @{ML DatatypeCodegen.add_codetypes_hook_bootstrap} adds a code
       
  1340      type hook;  the hook is immediately processed for all already
       
  1341      existing datatypes, in blocks of mutual recursive datatypes,
       
  1342      where all datatypes a block depends on are processed before
       
  1343      the block.
       
  1344 
       
  1345   \end{description}
       
  1346 *}*)
       
  1347 
       
  1348 text {*
       
  1349   \emph{Happy proving, happy hacking!}
       
  1350 *}
       
  1351 
       
  1352 end