started tutorial
authorhaftmann
Fri, 20 Oct 2006 10:44:35 +0200
changeset 21058 a32d357dfd70
parent 21057 c45591716692
child 21059 361e62500ab7
started tutorial
doc-src/IsarAdvanced/Codegen/Thy/Codegen.thy
doc-src/IsarAdvanced/Codegen/Thy/ROOT.ML
doc-src/IsarAdvanced/Codegen/codegen.tex
--- a/doc-src/IsarAdvanced/Codegen/Thy/Codegen.thy	Fri Oct 20 10:44:34 2006 +0200
+++ b/doc-src/IsarAdvanced/Codegen/Thy/Codegen.thy	Fri Oct 20 10:44:35 2006 +0200
@@ -9,39 +9,394 @@
 
 section {* Introduction *}
 
+subsection {* Motivation *}
+
 text {*
-  \cite{isa-tutorial}
-*} (* add graphics here *)
+  Executing formal specifications as programs is a well-established
+  topic in the theorem proving community.  With increasing
+  application of theorem proving systems in the area of
+  software development and verification, its relevance manifests
+  for running test cases and rapid prototyping.  In logical
+  calculi like constructive type theory,
+  a notion of executability is implicit due to the nature
+  of the calculus.  In contrast, specifications in Isabelle/HOL
+  can be highly non-executable.  In order to bridge
+  the gap between logic and executable specifications,
+  an explicit non-trivial transformation has to be applied:
+  code generation.
+
+  This tutorial introduces a generic code generator for the
+  Isabelle system \cite{isa-tutorial}.
+  Generic in the sense that the
+  \qn{target language} for which code shall ultimately be
+  generated is not fixed but may be an arbitrary state-of-the-art
+  functional programming language (currently, the implementation
+  supports SML \cite{web:sml} and Haskell \cite{web:haskell}).
+  We aim to provide a
+  versatile environment
+  suitable for software development and verification,
+  structuring the process
+  of code generation into a small set of orthogonal principles
+  while achieving a big coverage of application areas
+  with maximum flexibility.
+*}
+
+
+subsection {* Overview *}
+
+text {*
+  The code generator aims to be usable with no further ado
+  in most cases while allowing for detailed customization.
+  This manifests in the structure of this tutorial: this introduction
+  continous with a short introduction of concepts.  Section
+  \secref{sec:basics} explains how to use the framework naivly,
+  presuming a reasonable default setup.  Then, section
+  \secref{sec:advanced} deals with advanced topics,
+  introducing further aspects of the code generator framework
+  in a motivation-driven manner.  Last, section \secref{sec:ml}
+  introduces the framework's internal programming interfaces.
 
-section {* Basics *}
+  \begin{warn}
+    Ultimately, the code generator which this tutorial deals with
+    is supposed to replace the already established code generator
+    by Stefan Berghofer \cite{Berghofer-Nipkow:2002}.
+    So, for the momennt, there are two distinct code generators
+    in Isabelle.
+    Also note that while the framework itself is largely
+    object-logic independent, only HOL provides a reasonable
+    framework setup.    
+  \end{warn}
+*}
+
+
+subsection {* Code generation process *}
+
+text {*
+  The code generator employs a notion of executability
+  for three foundational executable ingredients known
+  from functional programming:
+  \emph{function equations}, \emph{datatypes}, and
+  \emph{type classes}. A function equation as a first approximation
+  is a theorem of the form @{text "f t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n \<equiv> t"}
+  (an equation headed by a constant @{text f} with arguments
+  @{text "t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n"} and right hand side @{text t}.
+  Code generation aims to turn function equations
+  into a functional program by running through
+  a process (see figure \ref{fig:process}):
+
+  \begin{itemize}
+
+    \item Out of the vast collection of theorems proven in a
+      \qn{theory}, a reasonable subset modeling
+      function equations is \qn{selected}.
+
+    \item On those selected theorems, certain
+      transformations are carried out
+      (\qn{preprocessing}).  Their purpose is to turn theorems
+      representing non- or badly executable
+      specifications into equivalent but executable counterparts.
+      The result is a structured collection of \qn{code theorems}.
+
+    \item These \qn{code theorems} then are extracted
+      into an Haskell-like intermediate
+      language.
+
+    \item Finally, out of the intermediate language the final
+      code in the desired \qn{target language} is \qn{serialized}.
+
+  \end{itemize}
+
+  \begin{figure}[h]
+  \centering
+  \includegraphics[width=0.3\textwidth]{codegen_process}
+  \caption{code generator -- processing overview}
+  \label{fig:process}
+  \end{figure}
+
+  From these steps, only the two last are carried out
+  outside the logic; by keeping this layer as
+  thin as possible, the amount of code to trust is
+  kept to a minimum.
+*}
+
+
+
+section {* Basics \label{sec:basics} *}
 
 subsection {* Invoking the code generator *}
 
+text {*
+  Thanks to a reasonable setup of the HOL theories, in
+  most cases code generation proceeds without further ado:
+*}
+
+consts
+  fac :: "nat \<Rightarrow> nat"
+
+primrec
+  "fac 0 = 1"
+  "fac (Suc n) = Suc n * fac n"
+
+text {*
+  This executable specification is now turned to SML code:
+*}
+
+code_gen fac (SML "examples/fac.ML")
+
+text {*
+  The \isasymCODEGEN command takes a space-seperated list of
+  constants together with \qn{serialization directives}
+  in parentheses. These start with a \qn{target language}
+  identifer, followed by arguments, their semantics
+  depending on the target. In the SML case, a filename
+  is given where to write the generated code to.
+
+  Internally, the function equations for all selected
+  constants are taken, including any tranitivly required
+  constants, datatypes and classes, resulting in the following
+  code:
+
+  \lstsml{Thy/examples/fac.ML}
+
+  The code generator will complain when a required
+  ingredient does not provide a executable counterpart.
+  This is the case if an involved type is not a datatype:
+*}
+
+(*<*)
+setup {* Sign.add_path "foo" *}
+(*>*)
+
+typedecl 'a foo
+
+definition
+  bar :: "'a foo \<Rightarrow> 'a \<Rightarrow> 'a"
+  "bar x y = y"
+
+(*<*)
+hide type foo
+hide const bar
+
+setup {* Sign.parent_path *}
+
+datatype 'a foo = Foo
+
+definition
+  bar :: "'a foo \<Rightarrow> 'a \<Rightarrow> 'a"
+  "bar x y = y"
+(*>*)
+
+code_gen bar (SML "examples/fail_type.ML")
+
+text {*
+  \noindent will result in an error. Likewise, generating code
+  for constants not yielding
+  a function equation will fail, e.g.~the Hilbert choice
+  operation @{text "SOME"}:
+*}
+
+(*<*)
+setup {* Sign.add_path "foo" *}
+(*>*)
+
+definition
+  pick_some :: "'a list \<Rightarrow> 'a"
+  "pick_some xs = (SOME x. x \<in> set xs)"
+
+(*<*)
+hide const pick_some
+
+setup {* Sign.parent_path *}
+
+definition
+  pick_some :: "'a list \<Rightarrow> 'a"
+  "pick_some = hd"
+(*>*)
+
+code_gen pick_some (SML "examples/fail_const.ML")
+
 subsection {* Theorem selection *}
 
-(* print_codethms, code func, default setup code nofunc *)
+text {*
+  The list of all function equations in a theory may be inspected
+  using the \isasymPRINTCODETHMS command:
+*}
+
+print_codethms
+
+text {*
+  \noindent which displays a table of constant with corresponding
+  function equations (the additional stuff displayed
+  shall not bother us for the moment). If this table does
+  not provide at least one function
+  equation, the table of primititve definitions is searched
+  whether it provides one.
+
+  The typical HOL tools are already set up in a way that
+  function definitions introduced by \isasymFUN, \isasymFUNCTION,
+  \isasymPRIMREC, \isasymRECDEF are implicitly propagated
+  to this function equation table. Specific theorems may be
+  selected using an attribute: \emph{code func}. As example,
+  a weight selector function:
+*}
+
+consts
+  pick :: "(nat \<times> 'a) list \<Rightarrow> nat \<Rightarrow> 'a"
+
+primrec
+  "pick (x#xs) n = (let (k, v) = x in
+    if n < k then v else pick xs (n - k))"
+
+text {*
+  We want to eliminate the explicit destruction
+  of @{term x} to @{term "(k, v)"}:
+*}
+
+lemma [code func]:
+  "pick ((k, v)#xs) n = (if n < k then v else pick xs (n - k))"
+  by simp
+
+code_gen pick (SML "examples/pick1.ML")
+
+text {*
+  This theorem is now added to the function equation table:
+
+  \lstsml{Thy/examples/pick1.ML}
+
+  It might be convenient to remove the pointless original
+  equation, using the \emph{nofunc} attribute:
+*}
+
+lemmas [code nofunc] = pick.simps 
+
+code_gen pick (SML "examples/pick2.ML")
+
+text {*
+  \lstsml{Thy/examples/pick2.ML}
+
+  Syntactic redundancies are implicitly dropped. For example,
+  using a modified version of the @{const fac} function
+  as function equation, the then redundant (since
+  syntactically more subsumed) original function equations
+  are dropped, resulting in a warning:
+*}
+
+lemma [code func]:
+  "fac n = (case n of 0 \<Rightarrow> 1 | Suc m \<Rightarrow> n * fac m)"
+  by (cases n) simp_all
+
+code_gen fac (SML "examples/fac_case.ML")
+
+text {*
+  \lstsml{Thy/examples/fac_case.ML}
+
+  \begin{warn}
+    Some statements in this section have to be treated with some
+    caution. First, since the HOL function package is still
+    under development, its setup with respect to code generation
+    may differ from what is presumed here.
+    Further, the attributes \emph{code} and \emph{code del}
+    associated with the existing code generator also apply to
+    the new one: \emph{code} implies \emph{code func},
+    and \emph{code del} implies \emph{code nofunc}.
+  \end{warn}
+*}
 
 subsection {* Type classes *}
 
+text {*
+  Type classes enter the game via the Isar class package.
+  For a short introduction how to use it, see \fixme[ref];
+  here we just illustrate its relation on code generation.
+
+  In a target language, type classes may be represented
+  nativly (as in the case of Haskell). For languages
+  like SML, they are implemented using \emph{dictionaries}.
+  Our following example specified a class \qt{null},
+  assigning to each of its inhabitants a \qt{null} value:
+*}
+
+class null =
+  fixes null :: 'a
+
+consts
+  head :: "'a\<Colon>null list \<Rightarrow> 'a"
+
+primrec
+  "head [] = null"
+  "head (x#xs) = x"
+
+text {*
+  We provide some instances for our @{text null}:
+*}
+
+instance option :: (type) null
+  "null \<equiv> None" ..
+
+instance list :: (type) null
+  "null \<equiv> []" ..
+
+text {*
+  Constructing a dummy example:
+*}
+
+definition
+  "dummy = head [Some (Suc 0), None]"
+
+text {*
+  Type classes offer a suitable occassion to introduce
+  the Haskell serializer.  Its usage is almost the same
+  as SML, but, in accordance with conventions some
+  common Haskell compilers enforce, each module ends
+  up in a single file which the file given by the user
+  then imports. The module hierarchy is reflected in
+  the file system.
+*}
+
+code_gen dummy (Haskell "examples/codegen.hs")
+  (* NOTE: you may use Haskell only once in this document *)
+
+text {*
+  \lsthaskell{Thy/examples/Codegen.hs}
+
+  (we have left out all other modules).
+
+  The whole code in SML with explicit dictionary passing:
+*}
+
+code_gen dummy (SML "examples/class.ML")
+
+text {*
+  \lstsml{Thy/examples/class.ML}
+*}
+
+subsection {* Incremental code generation *}
+
+(* print_codethms (\<dots>) and code_gen, 2 *)
+
+
+section {* Recipes and advanced topics \label{sec:advanced} *}
+
+(* no reference, IsarRef, but see paper *)
+
+subsection {* Library theories *}
+
 subsection {* Preprocessing *}
 
 (* preprocessing, print_codethms () *)
 
-subsection {* Incremental code generation *}
+subsection {* Customizing serialization  *}
 
-(* print_codethms (\<dots>) *)
-
+(* existing libraries, understanding the type game, reflexive equations, code inline code_constsubst, code_abstype*)
 
-section {* Customizing serialization  *}
-
-section {* Recipes and advanced topics *}
+section {* ML interfaces \label{sec:ml} *}
 
-(* understanding the type game, reflexive equations, code inline code_constsubst, code_abstype*)
-
-section {* ML interfaces *}
+(* under developement *)
 
 subsection {* codegen\_data.ML *}
 
 subsection {* Implementing code generator applications *}
 
+(* hooks *)
+
 end
--- a/doc-src/IsarAdvanced/Codegen/Thy/ROOT.ML	Fri Oct 20 10:44:34 2006 +0200
+++ b/doc-src/IsarAdvanced/Codegen/Thy/ROOT.ML	Fri Oct 20 10:44:35 2006 +0200
@@ -1,4 +1,6 @@
 
 (* $Id$ *)
 
+CodegenSerializer.sml_code_width := 74;
+
 use_thy "Codegen";
--- a/doc-src/IsarAdvanced/Codegen/codegen.tex	Fri Oct 20 10:44:34 2006 +0200
+++ b/doc-src/IsarAdvanced/Codegen/codegen.tex	Fri Oct 20 10:44:35 2006 +0200
@@ -3,6 +3,7 @@
 
 \documentclass[12pt,a4paper,fleqn]{report}
 \usepackage{latexsym,graphicx}
+\usepackage{listings}
 \usepackage[refpage]{nomencl}
 \usepackage{../../iman,../../extra,../../isar,../../proof}
 \usepackage{Thy/document/isabelle,Thy/document/isabellesym}
@@ -31,6 +32,12 @@
 \newcommand{\isasymSHOW}{\cmd{show}}
 \newcommand{\isasymNOTE}{\cmd{note}}
 \newcommand{\isasymIN}{\cmd{in}}
+\newcommand{\isasymCODEGEN}{\cmd{code\_gen}}
+\newcommand{\isasymPRINTCODETHMS}{\cmd{print\_codethms}}
+\newcommand{\isasymFUN}{\cmd{fun}}
+\newcommand{\isasymFUNCTION}{\cmd{function}}
+\newcommand{\isasymPRIMREC}{\cmd{primrec}}
+\newcommand{\isasymRECDEF}{\cmd{recdef}}
 
 \newcommand{\qt}[1]{``#1''}
 \newcommand{\qtt}[1]{"{}{#1}"{}}
@@ -38,6 +45,10 @@
 \newcommand{\strong}[1]{{\bfseries #1}}
 \newcommand{\fixme}[1][!]{\strong{FIXME: #1}}
 
+\lstset{basicstyle=\scriptsize\ttfamily, keywordstyle=\itshape, commentstyle=\itshape\sffamily, frame=shadowbox}
+\newcommand{\lstsml}[1]{\lstset{language=ML}\lstinputlisting{#1}}
+\newcommand{\lsthaskell}[1]{\lstset{language=Haskell}\lstinputlisting{#1}}
+
 \hyphenation{Isabelle}
 \hyphenation{Isar}
 
@@ -52,7 +63,10 @@
 \maketitle
 
 \begin{abstract}
-  \fixme
+  This tutorial gives a motivation-driven introduction
+  to a generic code generator framework in Isabelle
+  for generating executable code in functional
+  programming languages from logical specifications.
 \end{abstract}
 
 \thispagestyle{empty}\clearpage