summary |
shortlog |
changelog |
graph |
tags |
bookmarks |
branches |
files |
changeset |
file |
latest |
revisions |
annotate |
diff |
comparison |
raw |
help

doc-src/TutorialI/fp.tex

author | nipkow |

Tue, 12 Sep 2000 15:43:15 +0200 | |

changeset 9933 | 9feb1e0c4cb3 |

parent 9844 | 8016321c7de1 |

child 10181 | c07860c826c5 |

permissions | -rw-r--r-- |

*** empty log message ***

\chapter{Functional Programming in HOL} Although on the surface this chapter is mainly concerned with how to write functional programs in HOL and how to verify them, most of the constructs and proof procedures introduced are general purpose and recur in any specification or verification task. The dedicated functional programmer should be warned: HOL offers only \emph{total functional programming} --- all functions in HOL must be total; lazy data structures are not directly available. On the positive side, functions in HOL need not be computable: HOL is a specification language that goes well beyond what can be expressed as a program. However, for the time being we concentrate on the computable. \section{An introductory theory} \label{sec:intro-theory} Functional programming needs datatypes and functions. Both of them can be defined in a theory with a syntax reminiscent of languages like ML or Haskell. As an example consider the theory in figure~\ref{fig:ToyList}. We will now examine it line by line. \begin{figure}[htbp] \begin{ttbox}\makeatother \input{ToyList2/ToyList1}\end{ttbox} \caption{A theory of lists} \label{fig:ToyList} \end{figure} {\makeatother\input{ToyList/document/ToyList.tex}} The complete proof script is shown in figure~\ref{fig:ToyList-proofs}. The concatenation of figures \ref{fig:ToyList} and \ref{fig:ToyList-proofs} constitutes the complete theory \texttt{ToyList} and should reside in file \texttt{ToyList.thy}. It is good practice to present all declarations and definitions at the beginning of a theory to facilitate browsing. \begin{figure}[htbp] \begin{ttbox}\makeatother \input{ToyList2/ToyList2}\end{ttbox} \caption{Proofs about lists} \label{fig:ToyList-proofs} \end{figure} \subsubsection*{Review} This is the end of our toy proof. It should have familiarized you with \begin{itemize} \item the standard theorem proving procedure: state a goal (lemma or theorem); proceed with proof until a separate lemma is required; prove that lemma; come back to the original goal. \item a specific procedure that works well for functional programs: induction followed by all-out simplification via \isa{auto}. \item a basic repertoire of proof commands. \end{itemize} \section{Some helpful commands} \label{sec:commands-and-hints} This section discusses a few basic commands for manipulating the proof state and can be skipped by casual readers. There are two kinds of commands used during a proof: the actual proof commands and auxiliary commands for examining the proof state and controlling the display. Simple proof commands are of the form \isacommand{apply}\isa{(method)}\indexbold{apply} where \bfindex{method} is a synonym for ``theorem proving function''. Typical examples are \isa{induct_tac} and \isa{auto}. Further methods are introduced throughout the tutorial. Unless stated otherwise you may assume that a method attacks merely the first subgoal. An exception is \isa{auto} which tries to solve all subgoals. The most useful auxiliary commands are: \begin{description} \item[Undoing:] \isacommand{undo}\indexbold{*undo} undoes the effect of the last command; \isacommand{undo} can be undone by \isacommand{redo}\indexbold{*redo}. Both are only needed at the shell level and should not occur in the final theory. \item[Printing the current state:] \isacommand{pr}\indexbold{*pr} redisplays the current proof state, for example when it has disappeared off the screen. \item[Limiting the number of subgoals:] \isacommand{pr}~$n$ tells Isabelle to print only the first $n$ subgoals from now on and redisplays the current proof state. This is helpful when there are many subgoals. \item[Modifying the order of subgoals:] \isacommand{defer}\indexbold{*defer} moves the first subgoal to the end and \isacommand{prefer}\indexbold{*prefer}~$n$ moves subgoal $n$ to the front. \item[Printing theorems:] \isacommand{thm}\indexbold{*thm}~\textit{name}$@1$~\dots~\textit{name}$@n$ prints the named theorems. \item[Displaying types:] We have already mentioned the flag \ttindex{show_types} above. It can also be useful for detecting typos in formulae early on. For example, if \texttt{show_types} is set and the goal \isa{rev(rev xs) = xs} is started, Isabelle prints the additional output \par\noindent \begin{isabelle}% Variables:\isanewline ~~xs~::~'a~list \end{isabelle}% \par\noindent which tells us that Isabelle has correctly inferred that \isa{xs} is a variable of list type. On the other hand, had we made a typo as in \isa{rev(re xs) = xs}, the response \par\noindent \begin{isabelle}% Variables:\isanewline ~~re~::~'a~list~{\isasymRightarrow}~'a~list\isanewline ~~xs~::~'a~list% \end{isabelle}% \par\noindent would have alerted us because of the unexpected variable \isa{re}. \item[Reading terms and types:] \isacommand{term}\indexbold{*term} \textit{string} reads, type-checks and prints the given string as a term in the current context; the inferred type is output as well. \isacommand{typ}\indexbold{*typ} \textit{string} reads and prints the given string as a type in the current context. \item[(Re)loading theories:] When you start your interaction you must open a named theory with the line \isa{\isacommand{theory}~T~=~\dots~:}. Isabelle automatically loads all the required parent theories from their respective files (which may take a moment, unless the theories are already loaded and the files have not been modified). If you suddenly discover that you need to modify a parent theory of your current theory you must first abandon your current theory\indexbold{abandon theory}\indexbold{theory!abandon} (at the shell level type \isacommand{kill}\indexbold{*kill}). After the parent theory has been modified you go back to your original theory. When its first line \isacommand{theory}\texttt{~T~=}~\dots~\texttt{:} is processed, the modified parent is reloaded automatically. The only time when you need to load a theory by hand is when you simply want to check if it loads successfully without wanting to make use of the theory itself. This you can do by typing \isa{\isacommand{use\_thy}\indexbold{*use_thy}~"T"}. \end{description} Further commands are found in the Isabelle/Isar Reference Manual. We now examine Isabelle's functional programming constructs systematically, starting with inductive datatypes. \section{Datatypes} \label{sec:datatype} Inductive datatypes are part of almost every non-trivial application of HOL. First we take another look at a very important example, the datatype of lists, before we turn to datatypes in general. The section closes with a case study. \subsection{Lists} \indexbold{*list} Lists are one of the essential datatypes in computing. Readers of this tutorial and users of HOL need to be familiar with their basic operations. Theory \isa{ToyList} is only a small fragment of HOL's predefined theory \isa{List}\footnote{\url{http://isabelle.in.tum.de/library/HOL/List.html}}. The latter contains many further operations. For example, the functions \isaindexbold{hd} (``head'') and \isaindexbold{tl} (``tail'') return the first element and the remainder of a list. (However, pattern-matching is usually preferable to \isa{hd} and \isa{tl}.) Theory \isa{List} also contains more syntactic sugar: \isa{[}$x@1$\isa{,}\dots\isa{,}$x@n$\isa{]} abbreviates $x@1$\isa{\#}\dots\isa{\#}$x@n$\isa{\#[]}. In the rest of the tutorial we always use HOL's predefined lists. \subsection{The general format} \label{sec:general-datatype} The general HOL \isacommand{datatype} definition is of the form \[ \isacommand{datatype}~(\alpha@1, \dots, \alpha@n) \, t ~=~ C@1~\tau@{11}~\dots~\tau@{1k@1} ~\mid~ \dots ~\mid~ C@m~\tau@{m1}~\dots~\tau@{mk@m} \] where $\alpha@i$ are distinct type variables (the parameters), $C@i$ are distinct constructor names and $\tau@{ij}$ are types; it is customary to capitalize the first letter in constructor names. There are a number of restrictions (such as that the type should not be empty) detailed elsewhere~\cite{isabelle-HOL}. Isabelle notifies you if you violate them. Laws about datatypes, such as \isa{[] \isasymnoteq~x\#xs} and \isa{(x\#xs = y\#ys) = (x=y \isasymand~xs=ys)}, are used automatically during proofs by simplification. The same is true for the equations in primitive recursive function definitions. Every datatype $t$ comes equipped with a \isa{size} function from $t$ into the natural numbers (see~\S\ref{sec:nat} below). For lists, \isa{size} is just the length, i.e.\ \isa{size [] = 0} and \isa{size(x \# xs) = size xs + 1}. In general, \isa{size} returns \isa{0} for all constructors that do not have an argument of type $t$, and for all other constructors \isa{1 +} the sum of the sizes of all arguments of type $t$. Note that because \isa{size} is defined on every datatype, it is overloaded; on lists \isa{size} is also called \isa{length}, which is not overloaded. \subsection{Primitive recursion} Functions on datatypes are usually defined by recursion. In fact, most of the time they are defined by what is called \bfindex{primitive recursion}. The keyword \isacommand{primrec}\indexbold{*primrec} is followed by a list of equations \[ f \, x@1 \, \dots \, (C \, y@1 \, \dots \, y@k)\, \dots \, x@n = r \] such that $C$ is a constructor of the datatype $t$ and all recursive calls of $f$ in $r$ are of the form $f \, \dots \, y@i \, \dots$ for some $i$. Thus Isabelle immediately sees that $f$ terminates because one (fixed!) argument becomes smaller with every recursive call. There must be exactly one equation for each constructor. Their order is immaterial. A more general method for defining total recursive functions is introduced in \S\ref{sec:recdef}. \begin{exercise}\label{ex:Tree} \input{Misc/document/Tree.tex}% \end{exercise} \input{Misc/document/case_exprs.tex} \begin{warn} Induction is only allowed on free (or \isasymAnd-bound) variables that should not occur among the assumptions of the subgoal; see \S\ref{sec:ind-var-in-prems} for details. Case distinction (\isa{case_tac}) works for arbitrary terms, which need to be quoted if they are non-atomic. \end{warn} \input{Ifexpr/document/Ifexpr.tex} \section{Some basic types} \subsection{Natural numbers} \label{sec:nat} \index{arithmetic|(} \input{Misc/document/fakenat.tex} \input{Misc/document/natsum.tex} The usual arithmetic operations \ttindexboldpos{+}{$HOL2arithfun}, \ttindexboldpos{-}{$HOL2arithfun}, \ttindexboldpos{*}{$HOL2arithfun}, \isaindexbold{div}, \isaindexbold{mod}, \isaindexbold{min} and \isaindexbold{max} are predefined, as are the relations \indexboldpos{\isasymle}{$HOL2arithrel} and \ttindexboldpos{<}{$HOL2arithrel}. There is even a least number operation \isaindexbold{LEAST}. For example, \isa{(LEAST n.$\,$1 < n) = 2}, although Isabelle does not prove this completely automatically. Note that \isa{1} and \isa{2} are available as abbreviations for the corresponding \isa{Suc}-expressions. If you need the full set of numerals, see~\S\ref{nat-numerals}. \begin{warn} The constant \ttindexbold{0} and the operations \ttindexboldpos{+}{$HOL2arithfun}, \ttindexboldpos{-}{$HOL2arithfun}, \ttindexboldpos{*}{$HOL2arithfun}, \isaindexbold{min}, \isaindexbold{max}, \indexboldpos{\isasymle}{$HOL2arithrel} and \ttindexboldpos{<}{$HOL2arithrel} are overloaded, i.e.\ they are available not just for natural numbers but at other types as well (see \S\ref{sec:TypeClasses}). For example, given the goal \isa{x+0 = x}, there is nothing to indicate that you are talking about natural numbers. Hence Isabelle can only infer that \isa{x} is of some arbitrary type where \isa{0} and \isa{+} are declared. As a consequence, you will be unable to prove the goal (although it may take you some time to realize what has happened if \texttt{show_types} is not set). In this particular example, you need to include an explicit type constraint, for example \isa{x+0 = (x::nat)}. If there is enough contextual information this may not be necessary: \isa{Suc x = x} automatically implies \isa{x::nat} because \isa{Suc} is not overloaded. \end{warn} Simple arithmetic goals are proved automatically by both \isa{auto} and the simplification methods introduced in \S\ref{sec:Simplification}. For example, \input{Misc/document/arith1.tex}% is proved automatically. The main restriction is that only addition is taken into account; other arithmetic operations and quantified formulae are ignored. For more complex goals, there is the special method \isaindexbold{arith} (which attacks the first subgoal). It proves arithmetic goals involving the usual logical connectives (\isasymnot, \isasymand, \isasymor, \isasymimp), the relations \isasymle\ and \isa{<}, and the operations \isa{+}, \isa{-}, \isa{min} and \isa{max}. For example, \input{Misc/document/arith2.tex}% succeeds because \isa{k*k} can be treated as atomic. In contrast, \input{Misc/document/arith3.tex}% is not even proved by \isa{arith} because the proof relies essentially on properties of multiplication. \begin{warn} The running time of \isa{arith} is exponential in the number of occurrences of \ttindexboldpos{-}{$HOL2arithfun}, \isaindexbold{min} and \isaindexbold{max} because they are first eliminated by case distinctions. \isa{arith} is incomplete even for the restricted class of formulae described above (known as ``linear arithmetic''). If divisibility plays a role, it may fail to prove a valid formula, for example $m+m \neq n+n+1$. Fortunately, such examples are rare in practice. \end{warn} \index{arithmetic|)} \subsection{Products} \input{Misc/document/pairs.tex} %FIXME move stuff below into section on proofs about products? \begin{warn} Abstraction over pairs and tuples is merely a convenient shorthand for a more complex internal representation. Thus the internal and external form of a term may differ, which can affect proofs. If you want to avoid this complication, use \isa{fst} and \isa{snd}, i.e.\ write \isa{\isasymlambda{}p.~fst p + snd p} instead of \isa{\isasymlambda(x,y).~x + y}. See~\S\ref{proofs-about-products} for theorem proving with tuple patterns. \end{warn} Note that products, like natural numbers, are datatypes, which means in particular that \isa{induct_tac} and \isa{case_tac} are applicable to products (see \S\ref{proofs-about-products}). Instead of tuples with many components (where ``many'' is not much above 2), it is far preferable to use record types (see \S\ref{sec:records}). \section{Definitions} \label{sec:Definitions} A definition is simply an abbreviation, i.e.\ a new name for an existing construction. In particular, definitions cannot be recursive. Isabelle offers definitions on the level of types and terms. Those on the type level are called type synonyms, those on the term level are called (constant) definitions. \subsection{Type synonyms} \indexbold{type synonym} Type synonyms are similar to those found in ML. Their syntax is fairly self explanatory: \input{Misc/document/types.tex}% Note that pattern-matching is not allowed, i.e.\ each definition must be of the form $f\,x@1\,\dots\,x@n~\isasymequiv~t$. Section~\S\ref{sec:Simplification} explains how definitions are used in proofs. \input{Misc/document/prime_def.tex} \chapter{More Functional Programming} The purpose of this chapter is to deepen the reader's understanding of the concepts encountered so far and to introduce advanced forms of datatypes and recursive functions. The first two sections give a structured presentation of theorem proving by simplification (\S\ref{sec:Simplification}) and discuss important heuristics for induction (\S\ref{sec:InductionHeuristics}). They can be skipped by readers less interested in proofs. They are followed by a case study, a compiler for expressions (\S\ref{sec:ExprCompiler}). Advanced datatypes, including those involving function spaces, are covered in \S\ref{sec:advanced-datatypes}, which closes with another case study, search trees (``tries''). Finally we introduce \isacommand{recdef}, a very general form of recursive function definition which goes well beyond what \isacommand{primrec} allows (\S\ref{sec:recdef}). \section{Simplification} \label{sec:Simplification} \index{simplification|(} So far we have proved our theorems by \isa{auto}, which ``simplifies'' \emph{all} subgoals. In fact, \isa{auto} can do much more than that, except that it did not need to so far. However, when you go beyond toy examples, you need to understand the ingredients of \isa{auto}. This section covers the method that \isa{auto} always applies first, namely simplification. Simplification is one of the central theorem proving tools in Isabelle and many other systems. The tool itself is called the \bfindex{simplifier}. The purpose of this section is introduce the many features of the simplifier. Anybody intending to use HOL should read this section. Later on (\S\ref{sec:simplification-II}) we explain some more advanced features and a little bit of how the simplifier works. The serious student should read that section as well, in particular in order to understand what happened if things do not simplify as expected. \subsubsection{What is simplification} In its most basic form, simplification means repeated application of equations from left to right. For example, taking the rules for \isa{\at} and applying them to the term \isa{[0,1] \at\ []} results in a sequence of simplification steps: \begin{ttbox}\makeatother (0#1#[]) @ [] \(\leadsto\) 0#((1#[]) @ []) \(\leadsto\) 0#(1#([] @ [])) \(\leadsto\) 0#1#[] \end{ttbox} This is also known as \bfindex{term rewriting}\indexbold{rewriting} and the equations are referred to as \textbf{rewrite rules}\indexbold{rewrite rule}. ``Rewriting'' is more honest than ``simplification'' because the terms do not necessarily become simpler in the process. \input{Misc/document/simp.tex} \index{simplification|)} \input{Misc/document/Itrev.tex} \begin{exercise} \input{Misc/document/Tree2.tex}% \end{exercise} \input{CodeGen/document/CodeGen.tex} \section{Advanced datatypes} \label{sec:advanced-datatypes} \index{*datatype|(} \index{*primrec|(} %|) This section presents advanced forms of \isacommand{datatype}s. \subsection{Mutual recursion} \label{sec:datatype-mut-rec} \input{Datatype/document/ABexpr.tex} \subsection{Nested recursion} \label{sec:nested-datatype} {\makeatother\input{Datatype/document/Nested.tex}} \subsection{The limits of nested recursion} How far can we push nested recursion? By the unfolding argument above, we can reduce nested to mutual recursion provided the nested recursion only involves previously defined datatypes. This does not include functions: \begin{isabelle} \isacommand{datatype} t = C "t \isasymRightarrow\ bool" \end{isabelle} is a real can of worms: in HOL it must be ruled out because it requires a type \isa{t} such that \isa{t} and its power set \isa{t \isasymFun\ bool} have the same cardinality---an impossibility. For the same reason it is not possible to allow recursion involving the type \isa{set}, which is isomorphic to \isa{t \isasymFun\ bool}. Fortunately, a limited form of recursion involving function spaces is permitted: the recursive type may occur on the right of a function arrow, but never on the left. Hence the above can of worms is ruled out but the following example of a potentially infinitely branching tree is accepted: \smallskip \input{Datatype/document/Fundata.tex} \bigskip If you need nested recursion on the left of a function arrow, there are alternatives to pure HOL: LCF~\cite{paulson87} is a logic where types like \begin{isabelle} \isacommand{datatype} lam = C "lam \isasymrightarrow\ lam" \end{isabelle} do indeed make sense (but note the intentionally different arrow \isa{\isasymrightarrow}). There is even a version of LCF on top of HOL, called HOLCF~\cite{MuellerNvOS99}. \index{*primrec|)} \index{*datatype|)} \subsection{Case study: Tries} Tries are a classic search tree data structure~\cite{Knuth3-75} for fast indexing with strings. Figure~\ref{fig:trie} gives a graphical example of a trie containing the words ``all'', ``an'', ``ape'', ``can'', ``car'' and ``cat''. When searching a string in a trie, the letters of the string are examined sequentially. Each letter determines which subtrie to search next. In this case study we model tries as a datatype, define a lookup and an update function, and prove that they behave as expected. \begin{figure}[htbp] \begin{center} \unitlength1mm \begin{picture}(60,30) \put( 5, 0){\makebox(0,0)[b]{l}} \put(25, 0){\makebox(0,0)[b]{e}} \put(35, 0){\makebox(0,0)[b]{n}} \put(45, 0){\makebox(0,0)[b]{r}} \put(55, 0){\makebox(0,0)[b]{t}} % \put( 5, 9){\line(0,-1){5}} \put(25, 9){\line(0,-1){5}} \put(44, 9){\line(-3,-2){9}} \put(45, 9){\line(0,-1){5}} \put(46, 9){\line(3,-2){9}} % \put( 5,10){\makebox(0,0)[b]{l}} \put(15,10){\makebox(0,0)[b]{n}} \put(25,10){\makebox(0,0)[b]{p}} \put(45,10){\makebox(0,0)[b]{a}} % \put(14,19){\line(-3,-2){9}} \put(15,19){\line(0,-1){5}} \put(16,19){\line(3,-2){9}} \put(45,19){\line(0,-1){5}} % \put(15,20){\makebox(0,0)[b]{a}} \put(45,20){\makebox(0,0)[b]{c}} % \put(30,30){\line(-3,-2){13}} \put(30,30){\line(3,-2){13}} \end{picture} \end{center} \caption{A sample trie} \label{fig:trie} \end{figure} Proper tries associate some value with each string. Since the information is stored only in the final node associated with the string, many nodes do not carry any value. This distinction is captured by the following predefined datatype (from theory \isa{Option}, which is a parent of \isa{Main}): \smallskip \input{Trie/document/Option2.tex} \indexbold{*option}\indexbold{*None}\indexbold{*Some}% \input{Trie/document/Trie.tex} \begin{exercise} Write an improved version of \isa{update} that does not suffer from the space leak in the version above. Prove the main theorem for your improved \isa{update}. \end{exercise} \begin{exercise} Write a function to \emph{delete} entries from a trie. An easy solution is a clever modification of \isa{update} which allows both insertion and deletion with a single function. Prove (a modified version of) the main theorem above. Optimize you function such that it shrinks tries after deletion, if possible. \end{exercise} \section{Total recursive functions} \label{sec:recdef} \index{*recdef|(} Although many total functions have a natural primitive recursive definition, this is not always the case. Arbitrary total recursive functions can be defined by means of \isacommand{recdef}: you can use full pattern-matching, recursion need not involve datatypes, and termination is proved by showing that the arguments of all recursive calls are smaller in a suitable (user supplied) sense. \subsection{Defining recursive functions} \input{Recdef/document/examples.tex} \subsection{Proving termination} \input{Recdef/document/termination.tex} \subsection{Simplification with recdef} \input{Recdef/document/simplification.tex} \subsection{Induction} \index{induction!recursion|(} \index{recursion induction|(} \input{Recdef/document/Induction.tex} \label{sec:recdef-induction} \index{induction!recursion|)} \index{recursion induction|)} \index{*recdef|)}