diff -r 8e4058f4848c -r ffb1c5bf0425 doc-src/Codegen/Thy/Introduction.thy --- a/doc-src/Codegen/Thy/Introduction.thy Fri Aug 13 16:40:47 2010 +0200 +++ b/doc-src/Codegen/Thy/Introduction.thy Mon Aug 16 10:32:14 2010 +0200 @@ -4,71 +4,42 @@ section {* Introduction *} -subsection {* Code generation fundamental: shallow embedding *} - -subsection {* A quick start with the @{text "Isabelle/HOL"} toolbox *} +text {* + This tutorial introduces the code generator facilities of @{text + "Isabelle/HOL"}. It allows to turn (a certain class of) HOL + specifications into corresponding executable code in the programming + languages @{text SML} \cite{SML}, @{text OCaml} \cite{OCaml} and + @{text Haskell} \cite{haskell-revised-report}. -subsection {* Type classes *} + To profit from this tutorial, some familiarity and experience with + @{theory HOL} \cite{isa-tutorial} and its basic theories is assumed. +*} -subsection {* How to continue from here *} -subsection {* If something goes utterly wrong *} +subsection {* Code generation principle: shallow embedding \label{sec:principle} *} text {* - This tutorial introduces a generic code generator for the - @{text Isabelle} system. - The - \qn{target language} for which code is - generated is not fixed, but may be one of several - functional programming languages (currently, the implementation - supports @{text SML} \cite{SML}, @{text OCaml} \cite{OCaml} and @{text Haskell} - \cite{haskell-revised-report}). - - Conceptually the code generator framework is part - of Isabelle's @{theory Pure} meta logic framework; the logic - @{theory HOL} \cite{isa-tutorial}, which is an extension of @{theory Pure}, - already comes with a reasonable framework setup and thus provides - a good basis for creating code-generation-driven - applications. So, we assume some familiarity and experience - with the ingredients of the @{theory HOL} distribution theories. - - The code generator aims to be usable with no further ado - in most cases, while allowing for detailed customisation. - This can be seen in the structure of this tutorial: after a short - conceptual introduction with an example (\secref{sec:intro}), - we discuss the generic customisation facilities (\secref{sec:program}). - A further section (\secref{sec:adaptation}) is dedicated to the matter of - \qn{adaptation} to specific target language environments. After some - further issues (\secref{sec:further}) we conclude with an overview - of some ML programming interfaces (\secref{sec:ml}). - - \begin{warn} - Ultimately, the code generator which this tutorial deals with - is supposed to replace the existing code generator - by Stefan Berghofer \cite{Berghofer-Nipkow:2002}. - So, for the moment, there are two distinct code generators - in Isabelle. In case of ambiguity, we will refer to the framework - described here as @{text "generic code generator"}, to the - other as @{text "SML code generator"}. - Also note that while the framework itself is - object-logic independent, only @{theory HOL} provides a reasonable - framework setup. - \end{warn} - + The key concept for understanding Isabelle's code generation is + \emph{shallow embedding}: logical entities like constants, types and + classes are identified with corresponding entities in the target + language. In particular, the carrier of a generated program's + semantics are \emph{equational theorems} from the logic. If we view + a generated program as an implementation of a higher-order rewrite + system, then every rewrite step performed by the program can be + simulated in the logic, which guarantees partial correctness + \cite{Haftmann-Nipkow:2010:code}. *} -subsection {* Code generation via shallow embedding \label{sec:intro} *} + +subsection {* A quick start with the Isabelle/HOL toolbox \label{sec:queue_example} *} text {* - The key concept for understanding @{text Isabelle}'s code generation is - \emph{shallow embedding}, i.e.~logical entities like constants, types and - classes are identified with corresponding concepts in the target language. + In a HOL theory, the @{command datatype} and @{command + definition}/@{command primrec}/@{command fun} declarations form the + core of a functional programming language. By default equational + theorems stemming from those are used for generated code, therefore + \qt{naive} code generation can proceed without further ado. - Inside @{theory HOL}, the @{command datatype} and - @{command definition}/@{command primrec}/@{command fun} declarations form - the core of a functional programming language. The default code generator setup - transforms those into functional programs immediately. - This means that \qt{naive} code generation can proceed without further ado. For example, here a simple \qt{implementation} of amortised queues: *} @@ -84,7 +55,11 @@ "dequeue (AQueue [] []) = (None, AQueue [] [])" | "dequeue (AQueue xs (y # ys)) = (Some y, AQueue xs ys)" | "dequeue (AQueue xs []) = - (case rev xs of y # ys \ (Some y, AQueue [] ys))" + (case rev xs of y # ys \ (Some y, AQueue [] ys))" (*<*) + +lemma %invisible dequeue_nonempty_Nil [simp]: + "xs \ [] \ dequeue (AQueue xs []) = (case rev xs of y # ys \ (Some y, AQueue [] ys))" + by (cases xs) (simp_all split: list.splits) (*>*) text {* \noindent Then we can generate code e.g.~for @{text SML} as follows: *} @@ -96,112 +71,163 @@ text %quote {*@{code_stmts empty enqueue dequeue (SML)}*} text {* - \noindent The @{command export_code} command takes a space-separated list of - constants for which code shall be generated; anything else needed for those - is added implicitly. Then follows a target language identifier - (@{text SML}, @{text OCaml} or @{text Haskell}) and a freely chosen module name. - A file name denotes the destination to store the generated code. Note that - the semantics of the destination depends on the target language: for - @{text SML} and @{text OCaml} it denotes a \emph{file}, for @{text Haskell} - it denotes a \emph{directory} where a file named as the module name - (with extension @{text ".hs"}) is written: + \noindent The @{command export_code} command takes a space-separated + list of constants for which code shall be generated; anything else + needed for those is added implicitly. Then follows a target + language identifier and a freely chosen module name. A file name + denotes the destination to store the generated code. Note that the + semantics of the destination depends on the target language: for + @{text SML} and @{text OCaml} it denotes a \emph{file}, for @{text + Haskell} it denotes a \emph{directory} where a file named as the + module name (with extension @{text ".hs"}) is written: *} export_code %quote empty dequeue enqueue in Haskell module_name Example file "examples/" text {* - \noindent This is the corresponding code in @{text Haskell}: + \noindent This is the corresponding code: *} text %quote {*@{code_stmts empty enqueue dequeue (Haskell)}*} text {* - \noindent This demonstrates the basic usage of the @{command export_code} command; - for more details see \secref{sec:further}. + \noindent For more details about @{command export_code} see + \secref{sec:further}. *} -subsection {* If something utterly fails *} + +subsection {* Type classes *} text {* - Under certain circumstances, the code generator fails to produce - code entirely. - - \begin{description} - - \ditem{generate only one module} - - \ditem{check with a different target language} - - \ditem{inspect code equations} - - \ditem{inspect preprocessor setup} - - \ditem{generate exceptions} - - \ditem{remove offending code equations} - - \end{description} + Code can also be generated from type classes in a Haskell-like + manner. For illustration here an example from abstract algebra: *} -subsection {* Code generator architecture \label{sec:concept} *} +class %quote semigroup = + fixes mult :: "'a \ 'a \ 'a" (infixl "\" 70) + assumes assoc: "(x \ y) \ z = x \ (y \ z)" + +class %quote monoid = semigroup + + fixes neutral :: 'a ("\") + assumes neutl: "\ \ x = x" + and neutr: "x \ \ = x" + +instantiation %quote nat :: monoid +begin + +primrec %quote mult_nat where + "0 \ n = (0\nat)" + | "Suc m \ n = n + m \ n" + +definition %quote neutral_nat where + "\ = Suc 0" + +lemma %quote add_mult_distrib: + fixes n m q :: nat + shows "(n + m) \ q = n \ q + m \ q" + by (induct n) simp_all + +instance %quote proof + fix m n q :: nat + show "m \ n \ q = m \ (n \ q)" + by (induct m) (simp_all add: add_mult_distrib) + show "\ \ n = n" + by (simp add: neutral_nat_def) + show "m \ \ = m" + by (induct m) (simp_all add: neutral_nat_def) +qed + +end %quote text {* - What you have seen so far should be already enough in a lot of cases. If you - are content with this, you can quit reading here. Anyway, in order to customise - and adapt the code generator, it is necessary to gain some understanding - how it works. + \noindent We define the natural operation of the natural numbers + on monoids: +*} + +primrec %quote (in monoid) pow :: "nat \ 'a \ 'a" where + "pow 0 a = \" + | "pow (Suc n) a = a \ pow n a" - \begin{figure}[h] - \includegraphics{architecture} - \caption{Code generator architecture} - \label{fig:arch} - \end{figure} +text {* + \noindent This we use to define the discrete exponentiation + function: +*} + +definition %quote bexp :: "nat \ nat" where + "bexp n = pow n (Suc (Suc 0))" + +text {* + \noindent The corresponding code in Haskell uses that language's + native classes: +*} + +text %quote {*@{code_stmts bexp (Haskell)}*} - The code generator employs a notion of executability - for three foundational executable ingredients known - from functional programming: - \emph{code equations}, \emph{datatypes}, and - \emph{type classes}. A code equation as a first approximation - is a theorem of the form @{text "f t\<^isub>1 t\<^isub>2 \ t\<^isub>n \ t"} - (an equation headed by a constant @{text f} with arguments - @{text "t\<^isub>1 t\<^isub>2 \ t\<^isub>n"} and right hand side @{text t}). - Code generation aims to turn code equations - into a functional program. This is achieved by three major - components which operate sequentially, i.e. the result of one is - the input - of the next in the chain, see figure \ref{fig:arch}: +text {* + \noindent This is a convenient place to show how explicit dictionary + construction manifests in generated code -- the same example in + @{text SML}: +*} + +text %quote {*@{code_stmts bexp (SML)}*} + +text {* + \noindent Note the parameters with trailing underscore (@{verbatim + "A_"}), which are the dictionary parameters. +*} + + +subsection {* How to continue from here *} + +text {* + What you have seen so far should be already enough in a lot of + cases. If you are content with this, you can quit reading here. + + Anyway, to understand situations where problems occur or to increase + the scope of code generation beyond default, it is necessary to gain + some understanding how the code generator actually works: \begin{itemize} - \item The starting point is a collection of raw code equations in a - theory. It is not relevant where they - stem from, but typically they were either produced by specification - tools or proved explicitly by the user. - - \item These raw code equations can be subjected to theorem transformations. This - \qn{preprocessor} can apply the full - expressiveness of ML-based theorem transformations to code - generation. The result of preprocessing is a - structured collection of code equations. + \item The foundations of the code generator are described in + \secref{sec:foundations}. + + \item In particular \secref{sec:utterly_wrong} gives hints how to + debug situations where code generation does not succeed as + expected. - \item These code equations are \qn{translated} to a program in an - abstract intermediate language. Think of it as a kind - of \qt{Mini-Haskell} with four \qn{statements}: @{text data} - (for datatypes), @{text fun} (stemming from code equations), - also @{text class} and @{text inst} (for type classes). + \item The scope and quality of generated code can be increased + dramatically by applying refinement techniques, which are + introduced in \secref{sec:refinement}. - \item Finally, the abstract program is \qn{serialised} into concrete - source code of a target language. - This step only produces concrete syntax but does not change the - program in essence; all conceptual transformations occur in the - translation step. + \item Inductive predicates can be turned executable using an + extension of the code generator \secref{sec:inductive}. + + \item You may want to skim over the more technical sections + \secref{sec:adaptation} and \secref{sec:further}. + + \item For exhaustive syntax diagrams etc. you should visit the + Isabelle/Isar Reference Manual \cite{isabelle-isar-ref}. \end{itemize} - \noindent From these steps, only the two last are carried out outside the logic; by - keeping this layer as thin as possible, the amount of code to trust is - kept to a minimum. + \bigskip + + \begin{center}\fbox{\fbox{\begin{minipage}{8cm} + + \begin{center}\textit{Happy proving, happy hacking!}\end{center} + + \end{minipage}}}\end{center} + + \begin{warn} + There is also a more ancient code generator in Isabelle by Stefan + Berghofer \cite{Berghofer-Nipkow:2002}. Although its + functionality is covered by the code generator presented here, it + will sometimes show up as an artifact. In case of ambiguity, we + will refer to the framework described here as @{text "generic code + generator"}, to the other as @{text "SML code generator"}. + \end{warn} *} end