(* Title: doc-src/IsarAdvanced/Functions/Thy/Fundefs.thy
Author: Alexander Krauss, TU Muenchen
Tutorial for function definitions with the new "function" package.
*)
theory Functions
imports Main
begin
section {* Function Definitions for Dummies *}
text {*
In most cases, defining a recursive function is just as simple as making any other definition:
*}
fun fib :: "nat \<Rightarrow> nat"
where
"fib 0 = 1"
| "fib (Suc 0) = 1"
| "fib (Suc (Suc n)) = fib n + fib (Suc n)"
text {*
The syntax is rather self-explanatory: We introduce a function by
giving its name, its type,
and a set of defining recursive equations.
If we leave out the type, the most general type will be
inferred, which can sometimes lead to surprises: Since both @{term
"1::nat"} and @{text "+"} are overloaded, we would end up
with @{text "fib :: nat \<Rightarrow> 'a::{one,plus}"}.
*}
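text {*
For instance, the same equations without a type signature are
accepted, but with the overly general type mentioned above (a small
illustration; the name @{text fib_gen} is ours):
*}

(* identical equations, no type constraint: the inferred type is
   "nat \<Rightarrow> 'a::{one,plus}" *)
fun fib_gen
where
  "fib_gen 0 = 1"
| "fib_gen (Suc 0) = 1"
| "fib_gen (Suc (Suc n)) = fib_gen n + fib_gen (Suc n)"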
text {*
The function always terminates, since its argument gets smaller in
every recursive call.
Since HOL is a logic of total functions, termination is a
fundamental requirement to prevent inconsistencies\footnote{From the
\qt{definition} @{text "f(n) = f(n) + 1"} we could prove
@{text "0 = 1"} by subtracting @{text "f(n)"} on both sides.}.
Isabelle tries to prove termination automatically when a definition
is made. In \S\ref{termination}, we will look at cases where this
fails and see what to do then.
*}
subsection {* Pattern matching *}
text {* \label{patmatch}
Like in functional programming, we can use pattern matching to
define functions. At the moment we will only consider \emph{constructor
patterns}, which only consist of datatype constructors and
variables. Furthermore, patterns must be linear, i.e.\ all variables
on the left hand side of an equation must be distinct. In
\S\ref{genpats} we discuss more general pattern matching.
If patterns overlap, the order of the equations is taken into
account. The following function inserts a fixed element between any
two elements of a list:
*}
fun sep :: "'a \<Rightarrow> 'a list \<Rightarrow> 'a list"
where
"sep a (x#y#xs) = x # a # sep a (y # xs)"
| "sep a xs = xs"
text {*
Overlapping patterns are interpreted as \qt{increments} to what is
already there: The second equation is only meant for the cases where
the first one does not match. Consequently, Isabelle replaces it
internally by the remaining cases, making the patterns disjoint:
*}
thm sep.simps
text {* @{thm [display] sep.simps[no_vars]} *}
text {*
\noindent The equations from function definitions are automatically used in
simplification:
*}
lemma "sep 0 [1, 2, 3] = [1, 0, 2, 0, 3]"
by simp
subsection {* Induction *}
text {*
Isabelle provides customized induction rules for recursive
functions. These rules follow the recursive structure of the
definition. Here is the rule @{text sep.induct} arising from the
above definition of @{const sep}:
@{thm [display] sep.induct}
We have a step case for lists with at least two elements, and two
base cases for the zero- and the one-element list. Here is a simple
proof about @{const sep} and @{const map}:
*}
lemma "map f (sep x ys) = sep (f x) (map f ys)"
apply (induct x ys rule: sep.induct)
txt {*
We get three cases, like in the definition.
@{subgoals [display]}
*}
apply auto
done
text {*
With the \cmd{fun} command, you can define about 80\% of the
functions that occur in practice. The rest of this tutorial explains
the remaining 20\%.
*}
section {* fun vs.\ function *}
text {*
The \cmd{fun} command provides a
convenient shorthand notation for simple function definitions. In
this mode, Isabelle tries to solve all the necessary proof obligations
automatically. If any proof fails, the definition is
rejected. This can either mean that the definition is indeed faulty,
or that the default proof procedures are just not smart enough (or
rather: not designed) to handle the definition.
If we expand the abbreviation to the more verbose \cmd{function} command, these proof obligations become visible and can be analyzed or
solved manually. The expansion from \cmd{fun} to \cmd{function} is as follows:
\end{isamarkuptext}
\[\left[\;\begin{minipage}{0.25\textwidth}\vspace{6pt}
\cmd{fun} @{text "f :: \<tau>"}\\%
\cmd{where}\\%
\hspace*{2ex}{\it equations}\\%
\hspace*{2ex}\vdots\vspace*{6pt}
\end{minipage}\right]
\quad\equiv\quad
\left[\;\begin{minipage}{0.48\textwidth}\vspace{6pt}
\cmd{function} @{text "("}\cmd{sequential}@{text ") f :: \<tau>"}\\%
\cmd{where}\\%
\hspace*{2ex}{\it equations}\\%
\hspace*{2ex}\vdots\\%
\cmd{by} @{text "pat_completeness auto"}\\%
\cmd{termination by} @{text "lexicographic_order"}\vspace{6pt}
\end{minipage}
\right]\]
\begin{isamarkuptext}
\vspace*{1em}
\noindent Some details have now become explicit:
\begin{enumerate}
\item The \cmd{sequential} option enables the preprocessing of
pattern overlaps which we already saw. Without this option, the equations
must already be disjoint and complete. The automatic completion only
works with constructor patterns.
\item A function definition produces a proof obligation which
expresses completeness and compatibility of patterns (we talk about
this later). The combination of the methods @{text "pat_completeness"} and
@{text "auto"} is used to solve this proof obligation.
\item A termination proof follows the definition, started by the
\cmd{termination} command. This will be explained in \S\ref{termination}.
\end{enumerate}
Whenever a \cmd{fun} command fails, it is usually a good idea to
expand the syntax to the more verbose \cmd{function} form, to see
what is actually going on.
*}
section {* Termination *}
text {*\label{termination}
The method @{text "lexicographic_order"} is the default method for
termination proofs. It can prove termination of a
certain class of functions by searching for a suitable lexicographic
combination of size measures. Of course, not all functions have such
a simple termination argument. For them, we can specify the termination
relation manually.
*}
subsection {* The {\tt relation} method *}
text{*
Consider the following function, which sums up natural numbers up to
@{text "N"}, using a counter @{text "i"}:
*}
function sum :: "nat \<Rightarrow> nat \<Rightarrow> nat"
where
"sum i N = (if i > N then 0 else i + sum (Suc i) N)"
by pat_completeness auto
text {*
\noindent The @{text "lexicographic_order"} method fails on this example, because neither of the
arguments decreases in the recursive call with respect to the standard size ordering.
To prove termination manually, we must provide a custom wellfounded relation.
The termination argument for @{text "sum"} is based on the fact that
the \emph{difference} between @{text "i"} and @{text "N"} gets
smaller in every step, and that the recursion stops when @{text "i"}
is greater than @{text "N"}. Phrased differently, the expression
@{text "N + 1 - i"} always decreases.
We can use this expression as a measure function suitable to prove termination.
*}
termination sum
apply (relation "measure (\<lambda>(i,N). N + 1 - i)")
txt {*
The \cmd{termination} command sets up the termination goal for the
specified function @{text "sum"}. If the function name is omitted, it
implicitly refers to the last function definition.
The @{text relation} method takes a relation of
type @{typ "('a \<times> 'a) set"}, where @{typ "'a"} is the argument type of
the function. If the function has multiple curried arguments, then
these are packed together into a tuple, as in the example above.
The predefined function @{term[source] "measure :: ('a \<Rightarrow> nat) \<Rightarrow> ('a \<times> 'a) set"} constructs a
wellfounded relation from a mapping into the natural numbers (a
\emph{measure function}).
After the invocation of @{text "relation"}, we must prove (a) that
the relation we supplied is wellfounded, and (b) that the arguments
of recursive calls indeed decrease with respect to the
relation:
@{subgoals[display,indent=0]}
These goals are all solved by @{text "auto"}:
*}
apply auto
done
text {*
Let us complicate the function a little, by adding some more
recursive calls:
*}
function foo :: "nat \<Rightarrow> nat \<Rightarrow> nat"
where
"foo i N = (if i > N
then (if N = 0 then 0 else foo 0 (N - 1))
else i + foo (Suc i) N)"
by pat_completeness auto
text {*
When @{text "i"} exceeds @{text "N"}, it is reset to zero
and @{text "N"} is decremented.
This corresponds to a nested
loop where one index counts up and the other down. Termination can
be proved using a lexicographic combination of two measures, namely
the value of @{text "N"} and the above difference. The @{const
"measures"} combinator generalizes @{text "measure"} by taking a
list of measure functions.
*}
termination
by (relation "measures [\<lambda>(i, N). N, \<lambda>(i,N). N + 1 - i]") auto
subsection {* How @{text "lexicographic_order"} works *}
(*fun fails :: "nat \<Rightarrow> nat list \<Rightarrow> nat"
where
"fails a [] = a"
| "fails a (x#xs) = fails (x + a) (x # xs)"
*)
text {*
To see how the automatic termination proofs work, let's look at an
example where they fail\footnote{For a detailed discussion of the
termination prover, see \cite{bulwahnKN07}.}:
\end{isamarkuptext}
\cmd{fun} @{text "fails :: \"nat \<Rightarrow> nat list \<Rightarrow> nat\""}\\%
\cmd{where}\\%
\hspace*{2ex}@{text "\"fails a [] = a\""}\\%
|\hspace*{1.5ex}@{text "\"fails a (x#xs) = fails (x + a) (x#xs)\""}\\
\begin{isamarkuptext}
\noindent Isabelle responds with the following error:
\begin{isabelle}
*** Unfinished subgoals:\newline
*** (a, 1, <):\newline
*** \ 1.~@{text "\<And>x. x = 0"}\newline
*** (a, 1, <=):\newline
*** \ 1.~False\newline
*** (a, 2, <):\newline
*** \ 1.~False\newline
*** Calls:\newline
*** a) @{text "(a, x # xs) -->> (x + a, x # xs)"}\newline
*** Measures:\newline
*** 1) @{text "\<lambda>x. size (fst x)"}\newline
*** 2) @{text "\<lambda>x. size (snd x)"}\newline
*** Result matrix:\newline
*** \ \ \ \ 1\ \ 2 \newline
*** a: ? <= \newline
*** Could not find lexicographic termination order.\newline
*** At command "fun".\newline
\end{isabelle}
*}
text {*
The key to this error message is the matrix at the bottom. The rows
of that matrix correspond to the different recursive calls (in our
case, there is just one). The columns are the function's arguments
(expressed through different measure functions, which map the
argument tuple to a natural number).
The contents of the matrix summarize what is known about argument
descents: The second argument has a weak descent (@{text "<="}) at the
recursive call, and for the first argument nothing could be proved,
which is expressed by @{text "?"}. In general, there are the values
@{text "<"}, @{text "<="} and @{text "?"}.
For the failed proof attempts, the unfinished subgoals are also
printed. Looking at these will often point to a missing lemma.
*}
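text {*
In this example the result matrix tells us that we should not pass
@{text "x # xs"} unchanged to the recursive call. A terminating
variant recurses on the tail instead (a sketch of ours; the name
@{text accsum} is not part of the original):
*}

fun accsum :: "nat \<Rightarrow> nat list \<Rightarrow> nat"
where
  "accsum a [] = a"
| "accsum a (x#xs) = accsum (x + a) xs"  (* the list argument shrinks *)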
section {* Mutual Recursion *}
text {*
If two or more functions call one another mutually, they have to be defined
in one step. Here are @{text "even"} and @{text "odd"}:
*}
function even :: "nat \<Rightarrow> bool"
and odd :: "nat \<Rightarrow> bool"
where
"even 0 = True"
| "odd 0 = False"
| "even (Suc n) = odd n"
| "odd (Suc n) = even n"
by pat_completeness auto
text {*
To eliminate the mutual dependencies, Isabelle internally
creates a single function operating on the sum
type @{typ "nat + nat"}. Then, @{const even} and @{const odd} are
defined as projections. Consequently, termination has to be proved
simultaneously for both functions, by specifying a measure on the
sum type:
*}
termination
by (relation "measure (\<lambda>x. case x of Inl n \<Rightarrow> n | Inr n \<Rightarrow> n)") auto
text {*
We could also have used @{text lexicographic_order}, which
supports termination proofs for mutual recursion to a certain extent.
*}
subsection {* Induction for mutual recursion *}
text {*
When functions are mutually recursive, proving properties about them
generally requires simultaneous induction. The induction rule @{text "even_odd.induct"}
generated from the above definition reflects this.
Let us prove something about @{const even} and @{const odd}:
*}
lemma even_odd_mod2:
"even n = (n mod 2 = 0)"
"odd n = (n mod 2 = 1)"
txt {*
We apply simultaneous induction, specifying the induction variable
for both goals, separated by \cmd{and}: *}
apply (induct n and n rule: even_odd.induct)
txt {*
We get four subgoals, which correspond to the clauses in the
definition of @{const even} and @{const odd}:
@{subgoals[display,indent=0]}
Simplification solves the first two goals, leaving us with two
statements about the @{text "mod"} operation to prove:
*}
apply simp_all
txt {*
@{subgoals[display,indent=0]}
\noindent These can be handled by Isabelle's arithmetic decision procedures.
*}
apply arith
apply arith
done
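text {*
The simultaneously proved statements can now be combined; for
instance (a small corollary of ours, using @{text arith} for the
@{text mod} reasoning, as above):
*}

lemma "odd n = (\<not> even n)"
unfolding even_odd_mod2 by arith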
text {*
In proofs like this, the simultaneous induction is really essential:
Even if we are just interested in one of the results, the other
one is necessary to strengthen the induction hypothesis. If we leave
out the statement about @{const odd} and just write @{term True} instead,
the same proof fails:
*}
lemma failed_attempt:
"even n = (n mod 2 = 0)"
"True"
apply (induct n rule: even_odd.induct)
txt {*
\noindent Now the third subgoal is a dead end, since we have no
useful induction hypothesis available:
@{subgoals[display,indent=0]}
*}
oops
section {* General pattern matching *}
text{*\label{genpats} *}
subsection {* Avoiding automatic pattern splitting *}
text {*
Up to now, we have used pattern matching only on datatypes, and the
patterns were always disjoint and complete; when they were not,
they were made disjoint automatically, as in the definition of
@{const "sep"} in \S\ref{patmatch}.
This automatic splitting can significantly increase the number of
equations involved, which is not always desirable. The following
example shows the problem:
Suppose we are modeling incomplete knowledge about the world by a
three-valued datatype, which has values @{term "T"}, @{term "F"}
and @{term "X"} for true, false and uncertain propositions, respectively.
*}
datatype P3 = T | F | X
text {* \noindent Then the conjunction of such values can be defined as follows: *}
fun And :: "P3 \<Rightarrow> P3 \<Rightarrow> P3"
where
"And T p = p"
| "And p T = p"
| "And p F = F"
| "And F p = F"
| "And X X = X"
text {*
This definition is useful, because the equations can directly be used
as simplification rules. But the patterns overlap: For example,
the expression @{term "And T T"} is matched by both the first and
the second equation. By default, Isabelle makes the patterns disjoint by
splitting them up, producing instances:
*}
thm And.simps
text {*
@{thm[indent=4] And.simps}
\vspace*{1em}
\noindent There are several problems with this:
\begin{enumerate}
\item If the datatype has many constructors, there can be an
explosion of equations. For @{const "And"}, we get seven instead of
five equations, which can be tolerated, but this is just a small
example.
\item Since splitting makes the equations \qt{less general}, they
do not always match in rewriting. While the term @{term "And x F"}
can be simplified to @{term "F"} with the original equations, a
(manual) case split on @{term "x"} is now necessary, as illustrated
right after this list.
\item The splitting also concerns the induction rule @{text
"And.induct"}. Instead of five premises it now has seven, which
means that our induction proofs will have more cases.
\item In general, it increases clarity if we get the same definition
back that we put in.
\end{enumerate}
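For instance, with the split equations, simplifying @{term "And x F"}
requires an explicit case split (a small illustration of ours):
*}

lemma "And x F = F"
by (cases x) simp_all

text {*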
If we do not want the automatic splitting, we can switch it off by
leaving out the \cmd{sequential} option. However, we will have to
prove that our pattern matching is consistent\footnote{This prevents
us from defining something like @{term "f x = True"} and @{term "f x
= False"} simultaneously.}:
*}
function And2 :: "P3 \<Rightarrow> P3 \<Rightarrow> P3"
where
"And2 T p = p"
| "And2 p T = p"
| "And2 p F = F"
| "And2 F p = F"
| "And2 X X = X"
txt {*
\noindent Now let's look at the proof obligations generated by a
function definition. In this case, they are:
@{subgoals[display,indent=0]}\vspace{-1.2em}\hspace{3cm}\vdots\vspace{1.2em}
The first subgoal expresses the completeness of the patterns. It has
the form of an elimination rule and states that every @{term x} of
the function's input type must match at least one of the patterns\footnote{Completeness could
be equivalently stated as a disjunction of existential statements:
@{term "(\<exists>p. x = (T, p)) \<or> (\<exists>p. x = (p, T)) \<or> (\<exists>p. x = (p, F)) \<or>
(\<exists>p. x = (F, p)) \<or> (x = (X, X))"}, and you can use the method @{text atomize_elim} to get that form instead.}. If the patterns just involve
datatypes, we can solve it with the @{text "pat_completeness"}
method:
*}
apply pat_completeness
txt {*
The remaining subgoals express \emph{pattern compatibility}. We do
allow an input value to match multiple patterns, but in this
case the results (i.e.~the right hand sides of the equations) must
also be equal. For each pair of overlapping patterns, there is one such
subgoal. Usually this needs injectivity of the constructors, which
is used automatically by @{text "auto"}.
*}
by auto
subsection {* Non-constructor patterns *}
text {*
Most of Isabelle's basic types take the form of inductive datatypes,
and usually pattern matching works on the constructors of such types.
However, this need not always be the case, and the \cmd{function}
command handles other kinds of patterns, too.
One well-known instance of non-constructor patterns are the
so-called \emph{$n+k$-patterns}, which are a little controversial in
the functional programming world. Here is the initial Fibonacci
example with $n+k$-patterns:
*}
function fib2 :: "nat \<Rightarrow> nat"
where
"fib2 0 = 1"
| "fib2 1 = 1"
| "fib2 (n + 2) = fib2 n + fib2 (Suc n)"
(*<*)ML_val "goals_limit := 1"(*>*)
txt {*
This kind of matching is again justified by the proof of pattern
completeness and compatibility.
The proof obligation for pattern completeness states that every natural number is
either @{term "0::nat"}, @{term "1::nat"} or @{term "n +
(2::nat)"}:
@{subgoals[display,indent=0]}
This is an arithmetic triviality, but unfortunately the
@{text arith} method cannot handle this specific form of an
elimination rule. However, we can use the method @{text
"atomize_elim"} to do an ad-hoc conversion to a disjunction of
existentials, which can then be solved by the arithmetic decision procedure.
Pattern compatibility and termination are automatic as usual.
*}
(*<*)ML_val "goals_limit := 10"(*>*)
apply atomize_elim
apply arith
apply auto
done
termination by lexicographic_order
text {*
We can stretch the notion of pattern matching even more. The
following function is not a sensible functional program, but a
perfectly valid mathematical definition:
*}
function ev :: "nat \<Rightarrow> bool"
where
"ev (2 * n) = True"
| "ev (2 * n + 1) = False"
apply atomize_elim
by arith+
termination by (relation "{}") simp
text {*
This general notion of pattern matching gives you a certain freedom
in writing down specifications. However, as always, such freedom should
be used with care:
If we leave the area of constructor
patterns, we have effectively departed from the world of functional
programming. This means that it is no longer possible to use the
code generator and expect it to generate ML code for our
definitions. Also, such a specification might not work very well together with
simplification. Your mileage may vary.
*}
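text {*
For instance, rewriting with the equations of @{const ev} only works
when a term syntactically matches one of the patterns:
@{term "ev (2 * k)"} simplifies directly, whereas @{text "ev 4"}
does not, since the numeral does not literally have the form
@{text "2 * n"} (a small illustration of ours):
*}

lemma "ev (2 * k)"
by simp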
subsection {* Conditional equations *}
text {*
The function package also supports conditional equations, which are
similar to guards in a language like Haskell. Here is Euclid's
algorithm written with conditional patterns\footnote{Note that the
patterns also overlap in the base case.}:
*}
function gcd :: "nat \<Rightarrow> nat \<Rightarrow> nat"
where
"gcd x 0 = x"
| "gcd 0 y = y"
| "x < y \<Longrightarrow> gcd (Suc x) (Suc y) = gcd (Suc x) (y - x)"
| "\<not> x < y \<Longrightarrow> gcd (Suc x) (Suc y) = gcd (x - y) (Suc y)"
by (atomize_elim, auto, arith)
termination by lexicographic_order
text {*
By now, you can probably guess what the proof obligations for the
pattern completeness and compatibility look like.
Again, functions with conditional patterns are not supported by the
code generator.
*}
subsection {* Pattern matching on strings *}
text {*
Since strings are lists of characters and thus normal datatypes,
pattern matching on them is possible, but somewhat problematic.
Consider the following definition:
\end{isamarkuptext}
\noindent\cmd{fun} @{text "check :: \"string \<Rightarrow> bool\""}\\%
\cmd{where}\\%
\hspace*{2ex}@{text "\"check (''good'') = True\""}\\%
@{text "| \"check s = False\""}
\begin{isamarkuptext}
\noindent An invocation of the above \cmd{fun} command does not
terminate. What is the problem? Strings are lists of characters, and
characters are a datatype with a lot of constructors. Splitting the
catch-all pattern thus leads to an explosion of cases, which cannot
be handled by Isabelle.
There are two things we can do here. Either we write an explicit
@{text "if"} on the right hand side, or we can use conditional patterns:
*}
function check :: "string \<Rightarrow> bool"
where
"check (''good'') = True"
| "s \<noteq> ''good'' \<Longrightarrow> check s = False"
by auto
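text {*
\noindent For comparison, the alternative with an explicit
@{text "if"} could look as follows (the name @{text check2} is ours):
*}

fun check2 :: "string \<Rightarrow> bool"
where
  "check2 s = (if s = ''good'' then True else False)"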
section {* Partiality *}
text {*
In HOL, all functions are total. A function @{term "f"} applied to
@{term "x"} always has the value @{term "f x"}, and there is no notion
of undefinedness.
This is why we have to do termination
proofs when defining functions: The proof justifies that the
function can be defined by wellfounded recursion.
However, the \cmd{function} package does support partiality to a
certain extent. Let us look at the following function, which
searches for a zero of a given function @{text "f"}:
*}
function (*<*)(domintros, tailrec)(*>*)findzero :: "(nat \<Rightarrow> nat) \<Rightarrow> nat \<Rightarrow> nat"
where
"findzero f n = (if f n = 0 then n else findzero f (Suc n))"
by pat_completeness auto
(*<*)declare findzero.simps[simp del](*>*)
text {*
\noindent Clearly, any attempt at a termination proof must fail. And without
one, we do not get the usual rules @{text "findzero.simps"} and
@{text "findzero.induct"}. So what was the definition good for at all?
*}
subsection {* Domain predicates *}
text {*
The trick is that Isabelle has not only defined the function @{const findzero}, but also
a predicate @{term "findzero_dom"} that characterizes the values where the function
terminates: the \emph{domain} of the function. If we treat a
partial function just as a total function with an additional domain
predicate, we can derive simplification and
induction rules as we do for total functions. They are guarded
by domain conditions and are called @{text psimps} and @{text
pinduct}:
*}
text {*
\noindent\begin{minipage}{0.79\textwidth}@{thm[display,margin=85] findzero.psimps}\end{minipage}
\hfill(@{text "findzero.psimps"})
\vspace{1em}
\noindent\begin{minipage}{0.79\textwidth}@{thm[display,margin=85] findzero.pinduct}\end{minipage}
\hfill(@{text "findzero.pinduct"})
*}
text {*
Remember that all we
are doing here is using some tricks to make a total function appear
as if it were partial. We can still write the term @{term "findzero
(\<lambda>x. 1) 0"} and, like any other term of type @{typ nat}, it is equal
to some natural number, although we might not be able to find out
which one. The function is \emph{underdefined}.
But it is defined enough to prove something interesting about it. We
can prove that if @{term "findzero f n"}
terminates, it indeed returns a zero of @{term f}:
*}
lemma findzero_zero: "findzero_dom (f, n) \<Longrightarrow> f (findzero f n) = 0"
txt {* \noindent We apply induction as usual, but using the partial induction
rule: *}
apply (induct f n rule: findzero.pinduct)
txt {* \noindent This gives the following subgoals:
@{subgoals[display,indent=0]}
\noindent The hypothesis in our lemma was used to satisfy the first premise in
the induction rule. However, we also get @{term
"findzero_dom (f, n)"} as a local assumption in the induction step. This
allows us to unfold @{term "findzero f n"} using the @{text psimps}
rule, and the rest is trivial. Since the @{text psimps} rules carry the
@{text "[simp]"} attribute by default, we just need a single step:
*}
apply simp
done
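text {*
\noindent As a small corollary of ours, partial correctness already
implies that a zero exists whenever @{const findzero} terminates:
*}

lemma "findzero_dom (f, n) \<Longrightarrow> \<exists>x. f x = 0"
by (blast dest: findzero_zero)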
text {*
Proofs about partial functions are often no harder than those about
total functions. Fig.~\ref{findzero_isar} shows a slightly more
complicated proof written in Isar. It is verbose enough to show how
partiality comes into play: From the partial induction, we get an
additional domain condition hypothesis. Observe how this condition
is applied when calls to @{term findzero} are unfolded.
*}
text_raw {*
\begin{figure}
\hrule\vspace{6pt}
\begin{minipage}{0.8\textwidth}
\isabellestyle{it}
\isastyle\isamarkuptrue
*}
lemma "\<lbrakk>findzero_dom (f, n); x \<in> {n ..< findzero f n}\<rbrakk> \<Longrightarrow> f x \<noteq> 0"
proof (induct rule: findzero.pinduct)
fix f n assume dom: "findzero_dom (f, n)"
and IH: "\<lbrakk>f n \<noteq> 0; x \<in> {Suc n ..< findzero f (Suc n)}\<rbrakk> \<Longrightarrow> f x \<noteq> 0"
and x_range: "x \<in> {n ..< findzero f n}"
have "f n \<noteq> 0"
proof
assume "f n = 0"
with dom have "findzero f n = n" by simp
with x_range show False by auto
qed
from x_range have "x = n \<or> x \<in> {Suc n ..< findzero f n}" by auto
thus "f x \<noteq> 0"
proof
assume "x = n"
with `f n \<noteq> 0` show ?thesis by simp
next
assume "x \<in> {Suc n ..< findzero f n}"
with dom and `f n \<noteq> 0` have "x \<in> {Suc n ..< findzero f (Suc n)}" by simp
with IH and `f n \<noteq> 0`
show ?thesis by simp
qed
qed
text_raw {*
\isamarkupfalse\isabellestyle{tt}
\end{minipage}\vspace{6pt}\hrule
\caption{A proof about a partial function}\label{findzero_isar}
\end{figure}
*}
subsection {* Partial termination proofs *}
text {*
Now that we have proved some interesting properties about our
function, we should turn to the domain predicate and see if it is
actually true for some values. Otherwise we would have just proved
lemmas with @{term False} as a premise.
Essentially, we need some introduction rules for @{text
findzero_dom}. The function package can prove such domain
introduction rules automatically. But since they are not used very
often (they are almost never needed if the function is total), this
functionality is disabled by default for efficiency reasons. So we have to go
back and ask for them explicitly by passing the @{text
"(domintros)"} option to the function package:
\vspace{1ex}
\noindent\cmd{function} @{text "(domintros) findzero :: \"(nat \<Rightarrow> nat) \<Rightarrow> nat \<Rightarrow> nat\""}\\%
\cmd{where}\isanewline%
\ \ \ldots\\
\noindent Now the package has proved an introduction rule for @{text findzero_dom}:
*}
thm findzero.domintros
text {*
@{thm[display] findzero.domintros}
Domain introduction rules allow us to show that a given value lies in the
domain of a function if the arguments of all recursive calls
are in the domain as well. They let us perform a \qt{single step} in a
termination proof. Usually, you want to combine them with a suitable
induction principle.
Since our function increases its argument at recursive calls, we
need an induction principle which works \qt{backwards}. We will use
@{text inc_induct}, which allows us to do induction from a fixed number
\qt{downwards}:
\begin{center}@{thm inc_induct}\hfill(@{text "inc_induct"})\end{center}
Figure \ref{findzero_term} gives a detailed Isar proof of the fact
that @{text findzero} terminates if there is a zero greater than
or equal to @{term n}. First we derive two useful rules which will
solve the base case and the step case of the induction. The
induction is then straightforward, except for the unusual induction
principle.
*}
text_raw {*
\begin{figure}
\hrule\vspace{6pt}
\begin{minipage}{0.8\textwidth}
\isabellestyle{it}
\isastyle\isamarkuptrue
*}
lemma findzero_termination:
assumes "x \<ge> n" and "f x = 0"
shows "findzero_dom (f, n)"
proof -
have base: "findzero_dom (f, x)"
by (rule findzero.domintros) (simp add:`f x = 0`)
have step: "\<And>i. findzero_dom (f, Suc i)
\<Longrightarrow> findzero_dom (f, i)"
by (rule findzero.domintros) simp
from `x \<ge> n` show ?thesis
proof (induct rule:inc_induct)
show "findzero_dom (f, x)" by (rule base)
next
fix i assume "findzero_dom (f, Suc i)"
thus "findzero_dom (f, i)" by (rule step)
qed
qed
text_raw {*
\isamarkupfalse\isabellestyle{tt}
\end{minipage}\vspace{6pt}\hrule
\caption{Termination proof for @{text findzero}}\label{findzero_term}
\end{figure}
*}
text {*
Again, the proof given in Fig.~\ref{findzero_term} has a lot of
detail in order to explain the principles. Using more automation, we
can also have a short proof:
*}
lemma findzero_termination_short:
assumes zero: "x \<ge> n"
assumes [simp]: "f x = 0"
shows "findzero_dom (f, n)"
using zero
by (induct rule:inc_induct) (auto intro: findzero.domintros)
text {*
\noindent It is simple to combine the partial correctness result with the
termination lemma:
*}
lemma findzero_total_correctness:
"f x = 0 \<Longrightarrow> f (findzero f 0) = 0"
by (blast intro: findzero_zero findzero_termination)
subsection {* Definition of the domain predicate *}
text {*
Sometimes it is useful to know what the definition of the domain
predicate looks like. Actually, @{text findzero_dom} is just an
abbreviation:
@{abbrev[display] findzero_dom}
The domain predicate is the \emph{accessible part} of a relation @{const
findzero_rel}, which was also created internally by the function
package. @{const findzero_rel} is just a normal
inductive predicate, so we can inspect its definition by
looking at the introduction rules @{text findzero_rel.intros}.
In our case there is just a single rule:
@{thm[display] findzero_rel.intros}
The predicate @{const findzero_rel}
describes the \emph{recursion relation} of the function
definition. The recursion relation is a binary relation on
the arguments of the function that relates each argument to its
recursive calls. In general, there is one introduction rule for each
recursive call.
The predicate @{term "accp findzero_rel"} is the accessible part of
that relation. An argument belongs to the accessible part, if it can
be reached in a finite number of steps (cf.~its definition in @{text
"Wellfounded.thy"}).
Since the domain predicate is just an abbreviation, you can use
lemmas for @{const accp} and @{const findzero_rel} directly. Some
lemmas which are occasionally useful are @{text accpI}, @{text
accp_downward}, and of course the introduction and elimination rules
for the recursion relation @{text "findzero_rel.intros"} and @{text "findzero_rel.cases"}.
*}
(*lemma findzero_nicer_domintros:
"f x = 0 \<Longrightarrow> findzero_dom (f, x)"
"findzero_dom (f, Suc x) \<Longrightarrow> findzero_dom (f, x)"
by (rule accpI, erule findzero_rel.cases, auto)+
*)
subsection {* A Useful Special Case: Tail recursion *}
text {*
The domain predicate is our trick that allows us to model partiality
in a world of total functions. The downside of this is that we have
to carry it around all the time. The termination proof above allowed
us to replace the abstract @{term "findzero_dom (f, n)"} by the more
concrete @{term "(x \<ge> n \<and> f x = (0::nat))"}, but the condition is still
there and can only be discharged for special cases.
In particular, the domain predicate guards the unfolding of our
function, since it is there as a condition in the @{text psimp}
rules.
Now there is an important special case: We can actually get rid
of the condition in the simplification rules, \emph{if the function
is tail-recursive}. The reason is that for all tail-recursive
equations there is a total function satisfying them, even if they
are non-terminating.
The function package internally does the right construction and can
derive the unconditional simp rules, if we ask it to do so. Luckily,
our @{const "findzero"} function is tail-recursive, so we can just go
back and add another option to the \cmd{function} command:
\vspace{1ex}
\noindent\cmd{function} @{text "(domintros, tailrec) findzero :: \"(nat \<Rightarrow> nat) \<Rightarrow> nat \<Rightarrow> nat\""}\\%
\cmd{where}\isanewline%
\ \ \ldots\\%
\noindent Now, we actually get unconditional simplification rules, even
though the function is partial:
*}
thm findzero.simps
text {*
@{thm[display] findzero.simps}
\noindent Of course these would make the simplifier loop, so we had better remove
them from the simpset:
*}
declare findzero.simps[simp del]
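text {*
\noindent A controlled, single unfolding step is still possible,
e.g.\ with the @{text subst} method (a small illustration of ours):
*}

lemma "findzero (\<lambda>x. 1) 0 = findzero (\<lambda>x. 1) (Suc 0)"
by (subst findzero.simps) simp  (* unfold once, then simplify *)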
text {*
Getting rid of the domain conditions in the simplification rules is
not only useful because it simplifies proofs. It is also required in
order to use Isabelle's code generator to generate ML code
from a function definition.
Since the code generator only works with unconditional equations, it cannot be
used with @{text "psimp"} rules. Thus, in order to generate code for
partial functions, they must be defined as a tail recursion.
Luckily, many functions have a relatively natural tail recursive
definition.
*}
section {* Nested recursion *}
text {*
Recursive calls which are nested in one another frequently cause
complications, since their termination proof can depend on a partial
correctness property of the function itself.
As a small example, we define the \qt{nested zero} function:
*}
function nz :: "nat \<Rightarrow> nat"
where
"nz 0 = 0"
| "nz (Suc n) = nz (nz n)"
by pat_completeness auto
text {*
If we attempt to prove termination using the identity measure on
naturals, this fails:
*}
termination
apply (relation "measure (\<lambda>n. n)")
apply auto
txt {*
We get stuck with the subgoal
@{subgoals[display]}
Of course this statement is true, since we know that @{const nz} is
the zero function. And in fact we have no problem proving this
property by induction.
*}
(*<*)oops(*>*)
lemma nz_is_zero: "nz_dom n \<Longrightarrow> nz n = 0"
by (induct rule:nz.pinduct) auto
text {*
We formulate this as a partial correctness lemma with the condition
@{term "nz_dom n"}. This allows us to prove it with the @{text
pinduct} rule before we have proved termination. With this lemma,
the termination proof works as expected:
*}
termination
by (relation "measure (\<lambda>n. n)") (auto simp: nz_is_zero)
text {*
As a general strategy, one should prove the statements needed for
termination as a partial property first. Then they can be used to do
the termination proof. This also works for less trivial
examples. Figure \ref{f91} defines the 91-function, a well-known
challenge problem due to John McCarthy, and proves its termination.
*}
text_raw {*
\begin{figure}
\hrule\vspace{6pt}
\begin{minipage}{0.8\textwidth}
\isabellestyle{it}
\isastyle\isamarkuptrue
*}
function f91 :: "nat \<Rightarrow> nat"
where
"f91 n = (if 100 < n then n - 10 else f91 (f91 (n + 11)))"
by pat_completeness auto
lemma f91_estimate:
assumes trm: "f91_dom n"
shows "n < f91 n + 11"
using trm by induct auto
termination
proof
let ?R = "measure (\<lambda>x. 101 - x)"
show "wf ?R" ..
fix n :: nat assume "\<not> 100 < n" -- "Assumptions for both calls"
thus "(n + 11, n) \<in> ?R" by simp -- "Inner call"
assume inner_trm: "f91_dom (n + 11)" -- "Outer call"
with f91_estimate have "n + 11 < f91 (n + 11) + 11" .
with `\<not> 100 < n` show "(f91 (n + 11), n) \<in> ?R" by simp
qed
text_raw {*
\isamarkupfalse\isabellestyle{tt}
\end{minipage}
\vspace{6pt}\hrule
\caption{McCarthy's 91-function}\label{f91}
\end{figure}
*}
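text {*
Once termination is proved, the unconditional simplification rules
can be used to evaluate the function; for instance (a quick check of
ours):
*}

lemma "f91 100 = 91"
by simp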
section {* Higher-Order Recursion *}
text {*
Higher-order recursion occurs when recursive calls
are passed as arguments to higher-order combinators such as
@{const map} or @{term filter}.
As an example, imagine a datatype of n-ary trees:
*}
datatype 'a tree =
Leaf 'a
| Branch "'a tree list"
text {* \noindent We can define a function which recursively mirrors a tree, reversing the
list of subtrees at every node with the list functions @{const rev} and @{const map}: *}
fun mirror :: "'a tree \<Rightarrow> 'a tree"
where
"mirror (Leaf n) = Leaf n"
| "mirror (Branch l) = Branch (rev (map mirror l))"
text {*
Although the definition is accepted without problems, let us look at the termination proof:
*}
termination proof
txt {*
As usual, we have to give a wellfounded relation, such that the
arguments of the recursive calls get smaller. But what exactly are
the arguments of the recursive calls when mirror is given as an
argument to @{const map}? Isabelle gives us the
subgoals
@{subgoals[display,indent=0]}
So the system seems to know that @{const map} only
applies the recursive call @{term "mirror"} to elements
of @{term "l"}, which is essential for the termination proof.
This knowledge about @{const map} is encoded in so-called congruence rules,
which are special theorems known to the \cmd{function} command. The
rule for @{const map} is
@{thm[display] map_cong}
You can read this in the following way: Two applications of @{const
map} are equal, if the list arguments are equal and the functions
coincide on the elements of the list. This means that for the value
@{term "map f l"} we only have to know how @{term f} behaves on
the elements of @{term l}.
Usually, one such congruence rule is
needed for each higher-order construct that is used when defining
new functions. In fact, even basic functions like @{const
If} and @{const Let} are handled by this mechanism. The congruence
rule for @{const If} states that the @{text then} branch is only
relevant if the condition is true, and the @{text else} branch only if it
is false:
@{thm[display] if_cong}
Congruence rules can be added to the
function package by giving them the @{term fundef_cong} attribute.
The constructs that are predefined in Isabelle usually
come with the respective congruence rules.
But if you define your own higher-order functions, you may have to
state and prove the required congruence rules yourself if you want to use your
functions in recursive definitions; a sketch follows below.
*}
(*<*)oops(*>*)
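text {*
As a small, hypothetical example, suppose we introduce our own copy
of @{const map}. To use it in recursive definitions, we prove a
congruence rule for it and declare the rule with the
@{text fundef_cong} attribute (the names @{text mymap} and
@{text mirror2} are ours):
*}

definition mymap :: "('a \<Rightarrow> 'b) \<Rightarrow> 'a list \<Rightarrow> 'b list"
where "mymap = map"

(* the congruence rule has the same shape as the one for @{const map} *)
lemma mymap_cong [fundef_cong]:
  "xs = ys \<Longrightarrow> (\<And>x. x \<in> set ys \<Longrightarrow> f x = g x)
    \<Longrightarrow> mymap f xs = mymap g ys"
unfolding mymap_def by (fact map_cong)

fun mirror2 :: "'a tree \<Rightarrow> 'a tree"
where
  "mirror2 (Leaf n) = Leaf n"
| "mirror2 (Branch l) = Branch (rev (mymap mirror2 l))"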
subsection {* Congruence Rules and Evaluation Order *}
text {*
Higher-order logic differs from functional programming languages in
that it has no built-in notion of evaluation order. A program is
just a set of equations, and it is not specified how they must be
evaluated.
However, for the purpose of function definitions, evaluation order
enters the picture implicitly when we reason about termination.
Congruence rules express that a certain evaluation order is
consistent with the logical definition.
Consider the following function:
*}
function f :: "nat \<Rightarrow> bool"
where
"f n = (n = 0 \<or> f (n - 1))"
(*<*)by pat_completeness auto(*>*)
text {*
For this definition, the termination proof fails. The default configuration
specifies no congruence rule for disjunction. We have to add a
congruence rule that specifies left-to-right evaluation order:
\vspace{1ex}
\noindent @{thm disj_cong}\hfill(@{text "disj_cong"})
\vspace{1ex}
Now the definition works without problems. Note how the termination
proof depends on the extra condition that we get from the congruence
rule.
However, as evaluation is not a hard-wired concept, we
could just turn everything around by declaring a different
congruence rule. Then we can make the reverse definition:
*}
lemma disj_cong2[fundef_cong]:
"(\<not> Q' \<Longrightarrow> P = P') \<Longrightarrow> (Q = Q') \<Longrightarrow> (P \<or> Q) = (P' \<or> Q')"
by blast
fun f' :: "nat \<Rightarrow> bool"
where
"f' n = (f' (n - 1) \<or> n = 0)"
text {*
\noindent These examples show that, in general, there is no \qt{best} set of
congruence rules.
However, such tweaking should rarely be necessary in
practice, as most of the time, the default set of congruence rules
works well.
*}
end