doc-src/TutorialI/fp.tex
author paulson
Tue, 24 Jul 2001 11:25:54 +0200
changeset 11450 1b02a6c4032f
parent 11428 332347b9b942
child 11456 7eb63f63e6c6
permissions -rw-r--r--
tweaks and indexing

\chapter{Functional Programming in HOL}

This chapter describes how to write
functional programs in HOL and how to verify them.  However, 
most of the constructs and
proof procedures introduced are general and recur in any specification
or verification task.  We really should speak of functional
\emph{modelling} rather than functional \emph{programming}: 
our primary aim is not
to write programs but to design abstract models of systems.  HOL is
a specification language that goes well beyond what can be expressed as a
program. However, for the time being we concentrate on the computable.

If you are a purist functional programmer, please note that all functions
in HOL must be total:
they must terminate for all inputs.  Lazy data structures are not
directly available.

\section{An Introductory Theory}
\label{sec:intro-theory}

Functional programming needs datatypes and functions. Both of them can be
defined in a theory with a syntax reminiscent of languages like ML or
Haskell. As an example consider the theory in figure~\ref{fig:ToyList}.
We will now examine it line by line.

\begin{figure}[htbp]
\begin{ttbox}\makeatother
\input{ToyList2/ToyList1}\end{ttbox}
\caption{A Theory of Lists}
\label{fig:ToyList}
\end{figure}

\index{*ToyList (example)|(}
{\makeatother\input{ToyList/document/ToyList.tex}}

The complete proof script is shown in Fig.\ts\ref{fig:ToyList-proofs}. The
concatenation of Figs.\ts\ref{fig:ToyList} and~\ref{fig:ToyList-proofs}
constitutes the complete theory \texttt{ToyList} and should reside in file
\texttt{ToyList.thy}. It is good practice to present all declarations and
definitions at the beginning of a theory to facilitate browsing.%
\index{*ToyList (example)|)}

\begin{figure}[htbp]
\begin{ttbox}\makeatother
\input{ToyList2/ToyList2}\end{ttbox}
\caption{Proofs about Lists}
\label{fig:ToyList-proofs}
\end{figure}

\subsubsection*{Review}

This is the end of our toy proof. It should have familiarized you with
\begin{itemize}
\item the standard theorem proving procedure:
state a goal (lemma or theorem); proceed with proof until a separate lemma is
required; prove that lemma; come back to the original goal.
\item a specific procedure that works well for functional programs:
induction followed by all-out simplification via \isa{auto}.
\item a basic repertoire of proof commands.
\end{itemize}


\section{Some Helpful Commands}
\label{sec:commands-and-hints}

This section discusses a few basic commands for manipulating the proof state
and can be skipped by casual readers.

There are two kinds of commands used during a proof: the actual proof
commands and auxiliary commands for examining the proof state and controlling
the display. Simple proof commands are of the form
\commdx{apply}\isa{(method)}, where \isa{method} is typically 
\isa{induct_tac} or \isa{auto}.  All such theorem proving operations
are referred to as \bfindex{methods}, and further ones are
introduced throughout the tutorial.  Unless stated otherwise, you may
assume that a method attacks merely the first subgoal. An exception is
\isa{auto}, which tries to solve all subgoals.

The most useful auxiliary commands are as follows:
\begin{description}
\item[Undoing:] \commdx{undo} undoes the effect of
the
  last command; \isacommand{undo} can be undone by
  \commdx{redo}.  Both are only needed at the shell
  level and should not occur in the final theory.
\item[Printing the current state:] \commdx{pr}
redisplays
  the current proof state, for example when it has scrolled past the top of
  the screen.
\item[Limiting the number of subgoals:] \isacommand{pr}~$n$ tells Isabelle to
  print only the first $n$ subgoals from now on and redisplays the current
  proof state. This is helpful when there are many subgoals.
\item[Modifying the order of subgoals:]
\commdx{defer} moves the first subgoal to the end and
\commdx{prefer}~$n$ moves subgoal $n$ to the front.
\item[Printing theorems:]
  \commdx{thm}~\textit{name}$@1$~\dots~\textit{name}$@n$
  prints the named theorems.
\item[Displaying types:] We have already mentioned the flag
  \ttindex{show_types} above. It can also be useful for detecting typos in
  formulae early on. For example, if \texttt{show_types} is set and the goal
  \isa{rev(rev xs) = xs} is started, Isabelle prints the additional output
\par\noindent
\begin{isabelle}%
Variables:\isanewline
~~xs~::~'a~list
\end{isabelle}%
\par\noindent
which tells us that Isabelle has correctly inferred that
\isa{xs} is a variable of list type. On the other hand, had we
made a typo as in \isa{rev(re xs) = xs}, the response
\par\noindent
\begin{isabelle}%
Variables:\isanewline
~~re~::~'a~list~{\isasymRightarrow}~'a~list\isanewline
~~xs~::~'a~list%
\end{isabelle}%
\par\noindent
would have alerted us because of the unexpected variable \isa{re}.
\item[Reading terms and types:] \commdx{term}
  \textit{string} reads, type-checks and prints the given string as a term in
  the current context; the inferred type is output as well.
  \commdx{typ} \textit{string} reads and prints the given
  string as a type in the current context.
\item[(Re)loading theories:] When you start your interaction you must open a
  named theory with the line \isa{\isacommand{theory}~T~=~\dots~:}. Isabelle
  automatically loads all the required parent theories from their respective
  files (which may take a moment, unless the theories are already loaded and
  the files have not been modified).
  
  If you suddenly discover that you need to modify a parent theory of your
  current theory, you must first abandon your current theory%
  \indexbold{abandoning a theory}\indexbold{theories!abandoning} 
  (at the shell
  level type \commdx{kill}). After the parent theory has
  been modified, you go back to your original theory. When its first line
  \isa{\isacommand{theory}~T~=~\dots~:} is processed, the
  modified parent is reloaded automatically.
  
%  The only time when you need to load a theory by hand is when you simply
%  want to check if it loads successfully without wanting to make use of the
%  theory itself. This you can do by typing
%  \isa{\commdx{use\_thy}~"T"}.
\end{description}
Further commands are found in the Isabelle/Isar Reference Manual.

We now examine Isabelle's functional programming constructs systematically,
starting with inductive datatypes.


\section{Datatypes}
\label{sec:datatype}

Inductive datatypes are part of almost every non-trivial application of HOL.
First we take another look at a very important example, the datatype of
lists, before we turn to datatypes in general. The section closes with a
case study.


\subsection{Lists}

\index{*list (type)}%
Lists are one of the essential datatypes in computing. Readers of this
tutorial and users of HOL need to be familiar with their basic operations.
Theory \isa{ToyList} is only a small fragment of HOL's predefined theory
\thydx{List}\footnote{\url{http://isabelle.in.tum.de/library/HOL/List.html}}.
The latter contains many further operations. For example, the functions
\cdx{hd} (``head'') and \cdx{tl} (``tail'') return the first
element and the remainder of a list. (However, pattern-matching is usually
preferable to \isa{hd} and \isa{tl}.)  
Also available are higher-order functions like \isa{map} and \isa{filter}.
Theory \isa{List} also contains
more syntactic sugar: \isa{[}$x@1$\isa{,}\dots\isa{,}$x@n$\isa{]} abbreviates
$x@1$\isa{\#}\dots\isa{\#}$x@n$\isa{\#[]}.  In the rest of the tutorial we
always use HOL's predefined lists.


\subsection{The General Format}
\label{sec:general-datatype}

The general HOL \isacommand{datatype} definition is of the form
\[
\isacommand{datatype}~(\alpha@1, \dots, \alpha@n) \, t ~=~
C@1~\tau@{11}~\dots~\tau@{1k@1} ~\mid~ \dots ~\mid~
C@m~\tau@{m1}~\dots~\tau@{mk@m}
\]
where $\alpha@i$ are distinct type variables (the parameters), $C@i$ are distinct
constructor names and $\tau@{ij}$ are types; it is customary to capitalize
the first letter in constructor names. There are a number of
restrictions (such as that the type should not be empty) detailed
elsewhere~\cite{isabelle-HOL}. Isabelle notifies you if you violate them.

Laws about datatypes, such as \isa{[] \isasymnoteq~x\#xs} and
\isa{(x\#xs = y\#ys) = (x=y \isasymand~xs=ys)}, are used automatically
during proofs by simplification.  The same is true for the equations in
primitive recursive function definitions.

Every datatype $t$ comes equipped with a \isa{size} function from $t$ into
the natural numbers (see~{\S}\ref{sec:nat} below). For lists, \isa{size} is
just the length, i.e.\ \isa{size [] = 0} and \isa{size(x \# xs) = size xs +
  1}.  In general, \cdx{size} returns \isa{0} for all constructors
that do not have an argument of type $t$, and for all other constructors
\isa{1 +} the sum of the sizes of all arguments of type $t$. Note that because
\isa{size} is defined on every datatype, it is overloaded; on lists
\isa{size} is also called \sdx{length}, which is not overloaded.
Isabelle will always show \isa{size} on lists as \isa{length}.


\subsection{Primitive Recursion}

Functions on datatypes are usually defined by recursion. In fact, most of the
time they are defined by what is called \textbf{primitive recursion}.
The keyword \commdx{primrec} is followed by a list of
equations
\[ f \, x@1 \, \dots \, (C \, y@1 \, \dots \, y@k)\, \dots \, x@n = r \]
such that $C$ is a constructor of the datatype $t$ and all recursive calls of
$f$ in $r$ are of the form $f \, \dots \, y@i \, \dots$ for some $i$. Thus
Isabelle immediately sees that $f$ terminates because one (fixed!) argument
becomes smaller with every recursive call. There must be at most one equation
for each constructor.  Their order is immaterial.
A more general method for defining total recursive functions is introduced in
{\S}\ref{sec:recdef}.

\begin{exercise}\label{ex:Tree}
\input{Misc/document/Tree.tex}%
\end{exercise}

\input{Misc/document/case_exprs.tex}

\input{Ifexpr/document/Ifexpr.tex}

\section{Some Basic Types}


\subsection{Natural Numbers}
\label{sec:nat}
\index{linear arithmetic|(}

\input{Misc/document/fakenat.tex}
\input{Misc/document/natsum.tex}

\index{linear arithmetic|)}


\subsection{Pairs}
\input{Misc/document/pairs.tex}

\subsection{Datatype {\tt\slshape option}}
\label{sec:option}
\input{Misc/document/Option2.tex}

\section{Definitions}
\label{sec:Definitions}

A definition is simply an abbreviation, i.e.\ a new name for an existing
construction. In particular, definitions cannot be recursive. Isabelle offers
definitions on the level of types and terms. Those on the type level are
called type synonyms, those on the term level are called (constant)
definitions.


\subsection{Type Synonyms}

\indexbold{type synonyms|(}%
Type synonyms are similar to those found in ML\@. They are issued by a 
\commdx{types} command:

\input{Misc/document/types.tex}%

Note that pattern-matching is not allowed, i.e.\ each definition must be of
the form $f\,x@1\,\dots\,x@n~\isasymequiv~t$.
Section~{\S}\ref{sec:Simp-with-Defs} explains how definitions are used
in proofs.%
\indexbold{type synonyms|)}

\input{Misc/document/prime_def.tex}

\input{Misc/document/Translations.tex}


\section{The Definitional Approach}
\label{sec:definitional}

As we pointed out at the beginning of the chapter, asserting arbitrary
axioms, e.g. $f(n) = f(n) + 1$, may easily lead to contradictions. In order
to avoid this danger, this tutorial advocates the definitional rather than
the axiomatic approach: introduce new concepts by definitions, thus
preserving consistency. However, on the face of it, Isabelle/HOL seems to
support many more constructs than just definitions, for example
\isacommand{primrec}. The crucial point is that internally, everything
(except real axioms) is reduced to a definition. For example, given some
\isacommand{primrec} function definition, this is turned into a proper
(nonrecursive!) definition, and the user-supplied recursion equations are
derived as theorems from that definition. This tricky process is completely
hidden from the user and it is not necessary to understand the details. The
result of the definitional approach is that \isacommand{primrec} is as safe
as pure \isacommand{defs} are, but more convenient. And this is not just the
case for \isacommand{primrec} but also for the other commands described
later, like \isacommand{recdef} and \isacommand{inductive}.
This strict adherence to the definitional approach reduces the risk of 
soundness errors inside Isabelle/HOL.

\chapter{More Functional Programming}

The purpose of this chapter is to deepen the reader's understanding of the
concepts encountered so far and to introduce advanced forms of datatypes and
recursive functions. The first two sections give a structured presentation of
theorem proving by simplification ({\S}\ref{sec:Simplification}) and discuss
important heuristics for induction ({\S}\ref{sec:InductionHeuristics}). They can
be skipped by readers less interested in proofs. They are followed by a case
study, a compiler for expressions ({\S}\ref{sec:ExprCompiler}). Advanced
datatypes, including those involving function spaces, are covered in
{\S}\ref{sec:advanced-datatypes}, which closes with another case study, search
trees (``tries'').  Finally we introduce \isacommand{recdef}, a very general
form of recursive function definition which goes well beyond what
\isacommand{primrec} allows ({\S}\ref{sec:recdef}).


\section{Simplification}
\label{sec:Simplification}
\index{simplification|(}

So far we have proved our theorems by \isa{auto}, which simplifies
\emph{all} subgoals. In fact, \isa{auto} can do much more than that, except
that it did not need to so far. However, when you go beyond toy examples, you
need to understand the ingredients of \isa{auto}.  This section covers the
method that \isa{auto} always applies first, simplification.

Simplification is one of the central theorem proving tools in Isabelle and
many other systems. The tool itself is called the \bfindex{simplifier}. The
purpose of this section is to introduce the many features of the simplifier.
Anybody intending to perform proofs in HOL should read this section. Later on
({\S}\ref{sec:simplification-II}) we explain some more advanced features and a
little bit of how the simplifier works. The serious student should read that
section as well, in particular in order to understand what happened if things
do not simplify as expected.

\subsection{What is Simplification}

In its most basic form, simplification means repeated application of
equations from left to right. For example, taking the rules for \isa{\at}
and applying them to the term \isa{[0,1] \at\ []} results in a sequence of
simplification steps:
\begin{ttbox}\makeatother
(0#1#[]) @ []  \(\leadsto\)  0#((1#[]) @ [])  \(\leadsto\)  0#(1#([] @ []))  \(\leadsto\)  0#1#[]
\end{ttbox}
This is also known as \bfindex{term rewriting}\indexbold{rewriting} and the
equations are referred to as \textbf{rewrite rules}\indexbold{rewrite rule}.
``Rewriting'' is more honest than ``simplification'' because the terms do not
necessarily become simpler in the process.

\input{Misc/document/simp.tex}

\index{simplification|)}

\input{Misc/document/Itrev.tex}

\begin{exercise}
\input{Misc/document/Tree2.tex}%
\end{exercise}

\input{CodeGen/document/CodeGen.tex}


\section{Advanced Datatypes}
\label{sec:advanced-datatypes}
\index{datatype@\isacommand {datatype} (command)|(}
\index{primrec@\isacommand {primrec} (command)|(}
%|)

This section presents advanced forms of datatypes: mutual and nested
recursion.  A series of examples will culminate in a treatment of the trie
data structure.


\subsection{Mutual Recursion}
\label{sec:datatype-mut-rec}

\input{Datatype/document/ABexpr.tex}

\subsection{Nested Recursion}
\label{sec:nested-datatype}

{\makeatother\input{Datatype/document/Nested.tex}}


\subsection{The Limits of Nested Recursion}

How far can we push nested recursion? By the unfolding argument above, we can
reduce nested to mutual recursion provided the nested recursion only involves
previously defined datatypes. This does not include functions:
\begin{isabelle}
\isacommand{datatype} t = C "t \isasymRightarrow\ bool"
\end{isabelle}
This declaration is a real can of worms.
In HOL it must be ruled out because it requires a type
\isa{t} such that \isa{t} and its power set \isa{t \isasymFun\ bool} have the
same cardinality --- an impossibility. For the same reason it is not possible
to allow recursion involving the type \isa{set}, which is isomorphic to
\isa{t \isasymFun\ bool}.

Fortunately, a limited form of recursion
involving function spaces is permitted: the recursive type may occur on the
right of a function arrow, but never on the left. Hence the above can of worms
is ruled out but the following example of a potentially infinitely branching tree is
accepted:
\smallskip

\input{Datatype/document/Fundata.tex}
\bigskip

If you need nested recursion on the left of a function arrow, there are
alternatives to pure HOL\@.  In the Logic for Computable Functions 
(LCF), types like
\begin{isabelle}
\isacommand{datatype} lam = C "lam \isasymrightarrow\ lam"
\end{isabelle}
do indeed make sense~\cite{paulson87}.  Note the different arrow,
\isa{\isasymrightarrow} instead of \isa{\isasymRightarrow},
expressing the type of \emph{continuous} functions. 
There is even a version of LCF on top of HOL,
called HOLCF~\cite{MuellerNvOS99}.
\index{datatype@\isacommand {datatype} (command)|)}
\index{primrec@\isacommand {primrec} (command)|)}


\subsection{Case Study: Tries}
\label{sec:Trie}

Tries are a classic search tree data structure~\cite{Knuth3-75} for fast
indexing with strings. Figure~\ref{fig:trie} gives a graphical example of a
trie containing the words ``all'', ``an'', ``ape'', ``can'', ``car'' and
``cat''.  When searching a string in a trie, the letters of the string are
examined sequentially. Each letter determines which subtrie to search next.
In this case study we model tries as a datatype, define a lookup and an
update function, and prove that they behave as expected.

\begin{figure}[htbp]
\begin{center}
\unitlength1mm
\begin{picture}(60,30)
\put( 5, 0){\makebox(0,0)[b]{l}}
\put(25, 0){\makebox(0,0)[b]{e}}
\put(35, 0){\makebox(0,0)[b]{n}}
\put(45, 0){\makebox(0,0)[b]{r}}
\put(55, 0){\makebox(0,0)[b]{t}}
%
\put( 5, 9){\line(0,-1){5}}
\put(25, 9){\line(0,-1){5}}
\put(44, 9){\line(-3,-2){9}}
\put(45, 9){\line(0,-1){5}}
\put(46, 9){\line(3,-2){9}}
%
\put( 5,10){\makebox(0,0)[b]{l}}
\put(15,10){\makebox(0,0)[b]{n}}
\put(25,10){\makebox(0,0)[b]{p}}
\put(45,10){\makebox(0,0)[b]{a}}
%
\put(14,19){\line(-3,-2){9}}
\put(15,19){\line(0,-1){5}}
\put(16,19){\line(3,-2){9}}
\put(45,19){\line(0,-1){5}}
%
\put(15,20){\makebox(0,0)[b]{a}}
\put(45,20){\makebox(0,0)[b]{c}}
%
\put(30,30){\line(-3,-2){13}}
\put(30,30){\line(3,-2){13}}
\end{picture}
\end{center}
\caption{A Sample Trie}
\label{fig:trie}
\end{figure}

Proper tries associate some value with each string. Since the
information is stored only in the final node associated with the string, many
nodes do not carry any value. This distinction is modeled with the help
of the predefined datatype \isa{option} (see {\S}\ref{sec:option}).
\input{Trie/document/Trie.tex}

\section{Total Recursive Functions}
\label{sec:recdef}
\index{recdef@\isacommand {recdef} (command)|(}\index{functions!total|(}

Although many total functions have a natural primitive recursive definition,
this is not always the case. Arbitrary total recursive functions can be
defined by means of \isacommand{recdef}: you can use full pattern-matching,
recursion need not involve datatypes, and termination is proved by showing
that the arguments of all recursive calls are smaller in a suitable (user
supplied) sense. In this section we restrict ourselves to measure functions;
more advanced termination proofs are discussed in {\S}\ref{sec:beyond-measure}.

\subsection{Defining Recursive Functions}
\label{sec:recdef-examples}
\input{Recdef/document/examples.tex}

\subsection{Proving Termination}

\input{Recdef/document/termination.tex}

\subsection{Simplification with Recdef}
\label{sec:recdef-simplification}

\input{Recdef/document/simplification.tex}

\subsection{Induction}
\index{induction!recursion|(}
\index{recursion induction|(}

\input{Recdef/document/Induction.tex}
\label{sec:recdef-induction}

\index{induction!recursion|)}
\index{recursion induction|)}
\index{recdef@\isacommand {recdef} (command)|)}\index{functions!total|)}