diff -r 2775062fd3a9 -r 2f4684e2ea95 doc-src/Functions/Thy/Functions.thy
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/doc-src/Functions/Thy/Functions.thy	Tue Mar 03 11:00:51 2009 +0100
@@ -0,0 +1,1264 @@
+(*  Title:      doc-src/Functions/Thy/Functions.thy
+    Author:     Alexander Krauss, TU Muenchen
+
+Tutorial for function definitions with the new "function" package.
+*)
+
+theory Functions
+imports Main
+begin
+
+section {* Function Definitions for Dummies *}
+
+text {*
+  In most cases, defining a recursive function is just as simple as
+  any other definition:
+*}
+
+fun fib :: "nat \<Rightarrow> nat"
+where
+  "fib 0 = 1"
+| "fib (Suc 0) = 1"
+| "fib (Suc (Suc n)) = fib n + fib (Suc n)"
+
+text {*
+  The syntax is rather self-explanatory: We introduce a function by
+  giving its name, its type, and a set of defining recursive
+  equations. If we leave out the type, the most general type will be
+  inferred, which can sometimes lead to surprises: Since both @{term
+  "1::nat"} and @{text "+"} are overloaded, we would end up with
+  @{text "fib :: nat \<Rightarrow> 'a::{one,plus}"}.
+*}
+
+text {*
+  The function always terminates, since its argument gets smaller in
+  every recursive call. Since HOL is a logic of total functions,
+  termination is a fundamental requirement to prevent
+  inconsistencies\footnote{From the \qt{definition} @{text "f(n) =
+  f(n) + 1"} we could prove @{text "0 = 1"} by subtracting @{text
+  "f(n)"} on both sides.}. Isabelle tries to prove termination
+  automatically when a definition is made. In \S\ref{termination}, we
+  will look at cases where this fails and see what to do then.
+*}
+
+subsection {* Pattern matching *}
+
+text {* \label{patmatch}
+  As in functional programming, we can use pattern matching to define
+  functions. At the moment we will only consider \emph{constructor
+  patterns}, which consist only of datatype constructors and
+  variables. Furthermore, patterns must be linear, i.e.\ all variables
+  on the left hand side of an equation must be distinct. In
+  \S\ref{genpats} we discuss more general pattern matching.
+
+  If patterns overlap, the order of the equations is taken into
+  account. The following function inserts a fixed element between any
+  two elements of a list:
+*}
+
+fun sep :: "'a \<Rightarrow> 'a list \<Rightarrow> 'a list"
+where
+  "sep a (x#y#xs) = x # a # sep a (y # xs)"
+| "sep a xs = xs"
+
+text {*
+  Overlapping patterns are interpreted as \qt{increments} to what is
+  already there: The second equation is only meant for the cases where
+  the first one does not match. Consequently, Isabelle replaces it
+  internally by the remaining cases, making the patterns disjoint:
+*}
+
+thm sep.simps
+
+text {* @{thm [display] sep.simps[no_vars]} *}
+
+text {*
+  \noindent The equations from function definitions are automatically
+  used in simplification:
+*}
+
+lemma "sep 0 [1, 2, 3] = [1, 0, 2, 0, 3]"
+by simp
+
+subsection {* Induction *}
+
+text {*
+  Isabelle provides customized induction rules for recursive
+  functions. These rules follow the recursive structure of the
+  definition. Here is the rule @{text sep.induct} arising from the
+  above definition of @{const sep}:
+
+  @{thm [display] sep.induct}
+
+  We have a step case for lists with at least two elements, and two
+  base cases for the zero- and the one-element list. Here is a simple
+  proof about @{const sep} and @{const map}:
+*}
+
+lemma "map f (sep x ys) = sep (f x) (map f ys)"
+apply (induct x ys rule: sep.induct)
+
+txt {*
+  We get three cases, like in the definition.
+
+  @{subgoals [display]}
+*}
+
+apply auto
+done
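+
+text {*
+  \noindent The same kind of rule is useful for other simple facts.
+  As an additional exercise (not part of the original text; the
+  one-line proof is a sketch that may need minor adjustment), we can
+  show that @{const fib} always returns a positive number:
+*}
+
+lemma fib_pos: "0 < fib n"
+by (induct n rule: fib.induct) simp_all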
+
+text {*
+  With the \cmd{fun} command, you can define about 80\% of the
+  functions that occur in practice. The rest of this tutorial explains
+  the remaining 20\%.
+*}
+
+
+section {* fun vs.\ function *}
+
+text {*
+  The \cmd{fun} command provides a convenient shorthand notation for
+  simple function definitions. In this mode, Isabelle tries to solve
+  all the necessary proof obligations automatically. If any proof
+  fails, the definition is rejected. This can either mean that the
+  definition is indeed faulty, or that the default proof procedures
+  are just not smart enough (or rather: not designed) to handle the
+  definition.
+
+  By expanding the abbreviation to the more verbose \cmd{function}
+  command, these proof obligations become visible and can be analyzed
+  or solved manually. The expansion from \cmd{fun} to \cmd{function}
+  is as follows:
+
+\end{isamarkuptext}
+
+
+\[\left[\;\begin{minipage}{0.25\textwidth}\vspace{6pt}
+\cmd{fun} @{text "f :: \<tau>"}\\%
+\cmd{where}\\%
+\hspace*{2ex}{\it equations}\\%
+\hspace*{2ex}\vdots\vspace*{6pt}
+\end{minipage}\right]
+\quad\equiv\quad
+\left[\;\begin{minipage}{0.48\textwidth}\vspace{6pt}
+\cmd{function} @{text "("}\cmd{sequential}@{text ") f :: \<tau>"}\\%
+\cmd{where}\\%
+\hspace*{2ex}{\it equations}\\%
+\hspace*{2ex}\vdots\\%
+\cmd{by} @{text "pat_completeness auto"}\\%
+\cmd{termination by} @{text "lexicographic_order"}\vspace{6pt}
+\end{minipage}
+\right]\]
+
+\begin{isamarkuptext}
+  \vspace*{1em}
+  \noindent Some details have now become explicit:
+
+  \begin{enumerate}
+  \item The \cmd{sequential} option enables the preprocessing of
+  pattern overlaps which we already saw. Without this option, the
+  equations must already be disjoint and complete. The automatic
+  completion only works with constructor patterns.
+
+  \item A function definition produces a proof obligation which
+  expresses completeness and compatibility of patterns (we discuss
+  this later). The combination of the methods @{text
+  "pat_completeness"} and @{text "auto"} is used to solve this proof
+  obligation.
+
+  \item A termination proof follows the definition, started by the
+  \cmd{termination} command. This will be explained in
+  \S\ref{termination}.
+  \end{enumerate}
+  Whenever a \cmd{fun} command fails, it is usually a good idea to
+  expand the syntax to the more verbose \cmd{function} form, to see
+  what is actually going on.
+*}
+
+
+section {* Termination *}
+
+text {*\label{termination}
+  The method @{text "lexicographic_order"} is the default method for
+  termination proofs. It can prove termination of a certain class of
+  functions by searching for a suitable lexicographic combination of
+  size measures.
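+  For example, the Ackermann function needs a lexicographic
+  combination of its two arguments; this standard example (an
+  illustration added here, not part of the original text) is still
+  found automatically:
+*}
+
+fun ack :: "nat \<Rightarrow> nat \<Rightarrow> nat"
+where
+  "ack 0 n = Suc n"
+| "ack (Suc m) 0 = ack m (Suc 0)"
+| "ack (Suc m) (Suc n) = ack m (ack (Suc m) n)"
+
+text {*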
+  Of course, not all functions have such a simple termination
+  argument. For them, we can specify the termination relation
+  manually.
+*}
+
+subsection {* The {\tt relation} method *}
+text{*
+  Consider the following function, which sums up natural numbers up to
+  @{text "N"}, using a counter @{text "i"}:
+*}
+
+function sum :: "nat \<Rightarrow> nat \<Rightarrow> nat"
+where
+  "sum i N = (if i > N then 0 else i + sum (Suc i) N)"
+by pat_completeness auto
+
+text {*
+  \noindent The @{text "lexicographic_order"} method fails on this
+  example, because neither argument decreases in the recursive call
+  with respect to the standard size ordering. To prove termination
+  manually, we must provide a custom wellfounded relation.
+
+  The termination argument for @{text "sum"} is based on the fact that
+  the \emph{difference} between @{text "i"} and @{text "N"} gets
+  smaller in every step, and that the recursion stops when @{text "i"}
+  is greater than @{text "N"}. Phrased differently, the expression
+  @{text "N + 1 - i"} always decreases.
+
+  We can use this expression as a measure function suitable to prove
+  termination.
+*}
+
+termination sum
+apply (relation "measure (\<lambda>(i,N). N + 1 - i)")
+
+txt {*
+  The \cmd{termination} command sets up the termination goal for the
+  specified function @{text "sum"}. If the function name is omitted,
+  it implicitly refers to the last function definition.
+
+  The @{text relation} method takes a relation of type @{typ "('a \<times> 'a) set"},
+  where @{typ "'a"} is the argument type of the function. If the
+  function has multiple curried arguments, then these are packed
+  together into a tuple, as happened in the above example.
+
+  The predefined function @{term[source] "measure :: ('a \<Rightarrow> nat) \<Rightarrow> ('a \<times> 'a) set"}
+  constructs a wellfounded relation from a mapping into the natural
+  numbers (a \emph{measure function}).
+
+  After the invocation of @{text "relation"}, we must prove that (a)
+  the relation we supplied is wellfounded and (b) the arguments of
+  recursive calls indeed decrease with respect to the relation:
+
+  @{subgoals[display,indent=0]}
+
+  These goals are all solved by @{text "auto"}:
+*}
+
+apply auto
+done
+
+text {*
+  Let us complicate the function a little, by adding some more
+  recursive calls:
+*}
+
+function foo :: "nat \<Rightarrow> nat \<Rightarrow> nat"
+where
+  "foo i N = (if i > N
+              then (if N = 0 then 0 else foo 0 (N - 1))
+              else i + foo (Suc i) N)"
+by pat_completeness auto
+
+text {*
+  When @{text "i"} has reached @{text "N"}, it starts at zero again
+  and @{text "N"} is decremented. This corresponds to a nested loop
+  where one index counts up and the other down. Termination can be
+  proved using a lexicographic combination of two measures, namely the
+  value of @{text "N"} and the above difference. The @{const
+  "measures"} combinator generalizes @{text "measure"} by taking a
+  list of measure functions.
+*}
+
+termination
+by (relation "measures [\<lambda>(i, N). N, \<lambda>(i, N). N + 1 - i]") auto
+
+subsection {* How @{text "lexicographic_order"} works *}
+
+text {*
+  To see how the automatic termination proofs work, let's look at an
+  example where it fails\footnote{For a detailed discussion of the
+  termination prover, see \cite{bulwahnKN07}}:
+
+\end{isamarkuptext}
+\cmd{fun} @{text "fails :: \"nat \<Rightarrow> nat list \<Rightarrow> nat\""}\\%
+\cmd{where}\\%
+\hspace*{2ex}@{text "\"fails a [] = a\""}\\%
+|\hspace*{1.5ex}@{text "\"fails a (x#xs) = fails (x + a) (x#xs)\""}\\
+\begin{isamarkuptext}
+
+\noindent Isabelle responds with the following error:
+
+\begin{isabelle}
+*** Unfinished subgoals:\newline
+*** (a, 1, <):\newline
+*** \ 1.~@{text "\<And>x. x = 0"}\newline
+*** (a, 1, <=):\newline
+*** \ 1.~False\newline
+*** (a, 2, <):\newline
+*** \ 1.~False\newline
+*** Calls:\newline
+*** a) @{text "(a, x # xs) -->> (x + a, x # xs)"}\newline
+*** Measures:\newline
+*** 1) @{text "\<lambda>x. size (fst x)"}\newline
+*** 2) @{text "\<lambda>x. size (snd x)"}\newline
+*** Result matrix:\newline
+*** \ \ \ \ 1\ \ 2 \newline
+*** a: ? <= \newline
+*** Could not find lexicographic termination order.\newline
+*** At command "fun".\newline
+\end{isabelle}
+*}
+
+text {*
+  The key to this error message is the matrix at the bottom. The rows
+  of that matrix correspond to the different recursive calls (in our
+  case, there is just one). The columns are the function's arguments
+  (expressed through different measure functions, which map the
+  argument tuple to a natural number).
+
+  The contents of the matrix summarize what is known about argument
+  descents: The second argument has a weak descent (@{text "<="}) at
+  the recursive call, and for the first argument nothing could be
+  proved, which is expressed by @{text "?"}. In general, the entries
+  can be @{text "<"}, @{text "<="} and @{text "?"}.
+
+  For the failed proof attempts, the unfinished subgoals are also
+  printed. Looking at these will often point to a missing lemma.
+*}
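+
+text {*
+  In this example the prover is right to complain: the list argument
+  never gets smaller, so the function indeed does not terminate. A
+  repaired variant that consumes its list argument goes through
+  automatically (an illustration added here, with a hypothetical name,
+  not part of the original text):
+*}
+
+fun fails' :: "nat \<Rightarrow> nat list \<Rightarrow> nat"
+where
+  "fails' a [] = a"
+| "fails' a (x#xs) = fails' (x + a) xs"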
+
+section {* Mutual Recursion *}
+
+text {*
+  If two or more functions call one another mutually, they have to be
+  defined in one step. Here are @{text "even"} and @{text "odd"}:
+*}
+
+function even :: "nat \<Rightarrow> bool"
+    and odd  :: "nat \<Rightarrow> bool"
+where
+  "even 0 = True"
+| "odd 0 = False"
+| "even (Suc n) = odd n"
+| "odd (Suc n) = even n"
+by pat_completeness auto
+
+text {*
+  To eliminate the mutual dependencies, Isabelle internally creates a
+  single function operating on the sum type @{typ "nat + nat"}. Then,
+  @{const even} and @{const odd} are defined as projections.
+  Consequently, termination has to be proved simultaneously for both
+  functions, by specifying a measure on the sum type:
+*}
+
+termination
+by (relation "measure (\<lambda>x. case x of Inl n \<Rightarrow> n | Inr n \<Rightarrow> n)") auto
+
+text {*
+  We could also have used @{text lexicographic_order}, which supports
+  mutually recursive termination proofs to a certain extent.
+*}
+
+subsection {* Induction for mutual recursion *}
+
+text {*
+  When functions are mutually recursive, proving properties about them
+  generally requires simultaneous induction. The induction rule @{text
+  "even_odd.induct"} generated from the above definition reflects
+  this.
+
+  Let us prove something about @{const even} and @{const odd}:
+*}
+
+lemma even_odd_mod2:
+  "even n = (n mod 2 = 0)"
+  "odd n = (n mod 2 = 1)"
+
+txt {*
+  We apply simultaneous induction, specifying the induction variable
+  for both goals, separated by \cmd{and}:
+*}
+
+apply (induct n and n rule: even_odd.induct)
+
+txt {*
+  We get four subgoals, which correspond to the clauses in the
+  definition of @{const even} and @{const odd}:
+  @{subgoals[display,indent=0]}
+  Simplification solves the first two goals, leaving us with two
+  statements about the @{text "mod"} operation to prove:
+*}
+
+apply simp_all
+
+txt {*
+  @{subgoals[display,indent=0]}
+
+  \noindent These can be handled by Isabelle's arithmetic decision
+  procedures.
+*}
+
+apply arith
+apply arith
+done
+
+text {*
+  In proofs like this, the simultaneous induction is really essential:
+  Even if we are just interested in one of the results, the other one
+  is necessary to strengthen the induction hypothesis. If we leave out
+  the statement about @{const odd} and just write @{term True}
+  instead, the same proof fails:
+*}
+
+lemma failed_attempt:
+  "even n = (n mod 2 = 0)"
+  "True"
+apply (induct n rule: even_odd.induct)
+
+txt {*
+  \noindent Now the third subgoal is a dead end, since we have no
+  useful induction hypothesis available:
+
+  @{subgoals[display,indent=0]}
+*}
+
+oops
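+
+text {*
+  In contrast, a pair of mutually dependent statements goes through
+  smoothly. Here is a second example, added for illustration and not
+  part of the original text; the one-line proof is a sketch that
+  should work as given:
+*}
+
+lemma even_odd_neq:
+  "even n = (\<not> odd n)"
+  "odd n = (\<not> even n)"
+by (induct n and n rule: even_odd.induct) simp_all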
+
+section {* General pattern matching *}
+text{*\label{genpats} *}
+
+subsection {* Avoiding automatic pattern splitting *}
+
+text {*
+  Up to now, we used pattern matching only on datatypes, and the
+  patterns were always disjoint and complete; if they were not, they
+  were made disjoint automatically, as in the definition of @{const
+  "sep"} in \S\ref{patmatch}.
+
+  This automatic splitting can significantly increase the number of
+  equations involved, and this is not always desirable. The following
+  example shows the problem:
+
+  Suppose we are modeling incomplete knowledge about the world by a
+  three-valued datatype, which has values @{term "T"}, @{term "F"} and
+  @{term "X"} for true, false and uncertain propositions,
+  respectively.
+*}
+
+datatype P3 = T | F | X
+
+text {* \noindent Then the conjunction of such values can be defined
+  as follows: *}
+
+fun And :: "P3 \<Rightarrow> P3 \<Rightarrow> P3"
+where
+  "And T p = p"
+| "And p T = p"
+| "And p F = F"
+| "And F p = F"
+| "And X X = X"
+
+
+text {*
+  This definition is useful, because the equations can be used
+  directly as simplification rules. But the patterns overlap: For
+  example, the expression @{term "And T T"} is matched by both the
+  first and the second equation. By default, Isabelle makes the
+  patterns disjoint by splitting them up, producing instances:
+*}
+
+thm And.simps
+
+text {*
+  @{thm[indent=4] And.simps}
+
+  \vspace*{1em}
+  \noindent There are several problems with this:
+
+  \begin{enumerate}
+  \item If the datatype has many constructors, there can be an
+  explosion of equations. For @{const "And"}, we get seven instead of
+  five equations, which can be tolerated, but this is just a small
+  example.
+
+  \item Since splitting makes the equations \qt{less general}, they
+  do not always match in rewriting. While the term @{term "And x F"}
+  can be simplified to @{term "F"} with the original equations, a
+  (manual) case split on @{term "x"} is now necessary.
+
+  \item The splitting also concerns the induction rule @{text
+  "And.induct"}. Instead of five premises it now has seven, which
+  means that our induction proofs will have more cases.
+
+  \item In general, it increases clarity if we get the same definition
+  back that we put in.
+  \end{enumerate}
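+  To see the second problem in action, here is a small check (added
+  for illustration, not part of the original text): with the split
+  equations, @{term "And x F"} is only simplified after an explicit
+  case distinction:
+*}
+
+lemma "And x F = F" by (cases x) simp_all
+
+text {*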
+  If we do not want the automatic splitting, we can switch it off by
+  leaving out the \cmd{sequential} option. However, we will have to
+  prove that our pattern matching is consistent\footnote{This prevents
+  us from defining something like @{term "f x = True"} and @{term "f x
+  = False"} simultaneously.}:
+*}
+
+function And2 :: "P3 \<Rightarrow> P3 \<Rightarrow> P3"
+where
+  "And2 T p = p"
+| "And2 p T = p"
+| "And2 p F = F"
+| "And2 F p = F"
+| "And2 X X = X"
+
+txt {*
+  \noindent Now let's look at the proof obligations generated by a
+  function definition. In this case, they are:
+
+  @{subgoals[display,indent=0]}\vspace{-1.2em}\hspace{3cm}\vdots\vspace{1.2em}
+
+  The first subgoal expresses the completeness of the patterns. It has
+  the form of an elimination rule and states that every @{term x} of
+  the function's input type must match at least one of the
+  patterns\footnote{Completeness could be equivalently stated as a
+  disjunction of existential statements: @{term "(\<exists>p. x = (T, p)) \<or>
+  (\<exists>p. x = (p, T)) \<or> (\<exists>p. x = (p, F)) \<or> (\<exists>p. x = (F, p)) \<or> (x =
+  (X, X))"}, and you can use the method @{text atomize_elim} to get
+  that form instead.}. If the patterns just involve datatypes, we can
+  solve it with the @{text "pat_completeness"} method:
+*}
+
+apply pat_completeness
+
+txt {*
+  The remaining subgoals express \emph{pattern compatibility}. We do
+  allow an input value to match multiple patterns, but in this case,
+  the result (i.e.~the right hand sides of the equations) must also be
+  equal. For each pair of patterns, there is one such subgoal. Usually
+  this needs injectivity of the constructors, which is used
+  automatically by @{text "auto"}.
+*}
+
+by auto
+
+
+subsection {* Non-constructor patterns *}
+
+text {*
+  Most of Isabelle's basic types take the form of inductive datatypes,
+  and usually pattern matching works on the constructors of such
+  types. However, this need not always be the case, and the
+  \cmd{function} command handles other kinds of patterns, too.
+
+  One well-known instance of non-constructor patterns is the class of
+  so-called \emph{$n+k$-patterns}, which are a little controversial in
+  the functional programming world. Here is the initial Fibonacci
+  example with $n+k$-patterns:
+*}
+
+function fib2 :: "nat \<Rightarrow> nat"
+where
+  "fib2 0 = 1"
+| "fib2 1 = 1"
+| "fib2 (n + 2) = fib2 n + fib2 (Suc n)"
+
+(*<*)ML_val "goals_limit := 1"(*>*)
+txt {*
+  This kind of matching is again justified by the proof of pattern
+  completeness and compatibility. The proof obligation for pattern
+  completeness states that every natural number is either @{term
+  "0::nat"}, @{term "1::nat"} or @{term "n + (2::nat)"}:
+
+  @{subgoals[display,indent=0]}
+
+  This is an arithmetic triviality, but unfortunately the @{text
+  arith} method cannot handle this specific form of an elimination
+  rule. However, we can use the method @{text "atomize_elim"} to do an
+  ad-hoc conversion to a disjunction of existentials, which can then
+  be solved by the arithmetic decision procedure. Pattern
+  compatibility and termination are automatic as usual.
+*}
+(*<*)ML_val "goals_limit := 10"(*>*)
+apply atomize_elim
+apply arith
+apply auto
+done
+termination by lexicographic_order
+text {*
+  We can stretch the notion of pattern matching even more. The
+  following function is not a sensible functional program, but a
+  perfectly valid mathematical definition:
+*}
+
+function ev :: "nat \<Rightarrow> bool"
+where
+  "ev (2 * n) = True"
+| "ev (2 * n + 1) = False"
+apply atomize_elim
+by arith+
+termination by (relation "{}") simp
+
+text {*
+  This general notion of pattern matching gives you a certain freedom
+  in writing down specifications. However, as always, such freedom
+  should be used with care:
+
+  If we leave the area of constructor patterns, we have effectively
+  departed from the world of functional programming. This means that
+  it is no longer possible to use the code generator and expect it to
+  generate ML code for our definitions. Also, such a specification
+  might not work very well together with simplification. Your mileage
+  may vary.
+*}
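+
+text {*
+  For instance (a small check added here, not part of the original
+  text), the first equation rewrites any term that is syntactically a
+  double, while a numeral like @{term "ev 4"} is not matched directly
+  and would first have to be brought into the form @{term "2 * (2::nat)"}:
+*}
+
+lemma "ev (2 * k)" by simp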
+
+subsection {* Conditional equations *}
+
+text {*
+  The function package also supports conditional equations, which are
+  similar to guards in a language like Haskell. Here is Euclid's
+  algorithm written with conditional patterns\footnote{Note that the
+  patterns are also overlapping in the base case}:
+*}
+
+function gcd :: "nat \<Rightarrow> nat \<Rightarrow> nat"
+where
+  "gcd x 0 = x"
+| "gcd 0 y = y"
+| "x < y \<Longrightarrow> gcd (Suc x) (Suc y) = gcd (Suc x) (y - x)"
+| "\<not> x < y \<Longrightarrow> gcd (Suc x) (Suc y) = gcd (x - y) (Suc y)"
+by (atomize_elim, auto, arith)
+termination by lexicographic_order
+
+text {*
+  By now, you can probably guess what the proof obligations for
+  pattern completeness and compatibility look like.
+
+  Again, functions with conditional patterns are not supported by the
+  code generator.
+*}
+
+
+subsection {* Pattern matching on strings *}
+
+text {*
+  As strings (as lists of characters) are normal datatypes, pattern
+  matching on them is possible, but somewhat problematic. Consider the
+  following definition:
+
+\end{isamarkuptext}
+\noindent\cmd{fun} @{text "check :: \"string \<Rightarrow> bool\""}\\%
+\cmd{where}\\%
+\hspace*{2ex}@{text "\"check (''good'') = True\""}\\%
+@{text "| \"check s = False\""}
+\begin{isamarkuptext}
+
+  \noindent An invocation of the above \cmd{fun} command does not
+  terminate. What is the problem? Strings are lists of characters, and
+  characters are a datatype with a lot of constructors. Splitting the
+  catch-all pattern thus leads to an explosion of cases, which cannot
+  be handled by Isabelle.
+
+  There are two things we can do here. Either we write an explicit
+  @{text "if"} on the right hand side, or we can use conditional
+  patterns:
+*}
+
+function check :: "string \<Rightarrow> bool"
+where
+  "check (''good'') = True"
+| "s \<noteq> ''good'' \<Longrightarrow> check s = False"
+by auto
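+
+text {*
+  As with @{const ev} above, a trivial termination proof makes the
+  equations available as (conditional) rewrite rules; the following
+  small check was added for illustration and is not part of the
+  original text:
+*}
+
+termination by (relation "{}") simp
+
+lemma "check ''good''" by simp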
+
+
+section {* Partiality *}
+
+text {*
+  In HOL, all functions are total. A function @{term "f"} applied to
+  @{term "x"} always has the value @{term "f x"}, and there is no
+  notion of undefinedness. This is why we have to do termination
+  proofs when defining functions: The proof justifies that the
+  function can be defined by wellfounded recursion.
+
+  However, the \cmd{function} package does support partiality to a
+  certain extent. Let's look at the following function, which searches
+  for a zero of a given function @{term f}:
+*}
+
+function (*<*)(domintros, tailrec)(*>*)findzero :: "(nat \<Rightarrow> nat) \<Rightarrow> nat \<Rightarrow> nat"
+where
+  "findzero f n = (if f n = 0 then n else findzero f (Suc n))"
+by pat_completeness auto
+(*<*)declare findzero.simps[simp del](*>*)
+
+text {*
+  \noindent Clearly, any attempt at a termination proof must fail. And
+  without that, we do not get the usual rules @{text
+  "findzero.simps"} and @{text "findzero.induct"}. So what was the
+  definition good for at all?
+*}
+
+subsection {* Domain predicates *}
+
+text {*
+  The trick is that Isabelle has not only defined the function @{const
+  findzero}, but also a predicate @{term "findzero_dom"} that
+  characterizes the values where the function terminates: the
+  \emph{domain} of the function. If we treat a partial function just
+  as a total function with an additional domain predicate, we can
+  derive simplification and induction rules as we do for total
+  functions. They are guarded by domain conditions and are called
+  @{text psimps} and @{text pinduct}:
+*}
+
+text {*
+  \noindent\begin{minipage}{0.79\textwidth}@{thm[display,margin=85] findzero.psimps}\end{minipage}
+  \hfill(@{text "findzero.psimps"})
+  \vspace{1em}
+
+  \noindent\begin{minipage}{0.79\textwidth}@{thm[display,margin=85] findzero.pinduct}\end{minipage}
+  \hfill(@{text "findzero.pinduct"})
+*}
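+
+text {*
+  \noindent For example, under an explicit domain assumption, the
+  @{text psimps} rule applies as usual (a small check added for
+  illustration, not part of the original text):
+*}
+
+lemma "findzero_dom (f, n) \<Longrightarrow> f n = 0 \<Longrightarrow> findzero f n = n"
+by (simp add: findzero.psimps)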
+
+text {*
+  Remember that all we are doing here is to use some tricks to make a
+  total function appear as if it were partial. We can still write the
+  term @{term "findzero (\<lambda>x. 1) 0"} and like any other term of type
+  @{typ nat} it is equal to some natural number, although we might not
+  be able to find out which one. The function is \emph{underdefined}.
+
+  But it is defined enough to prove something interesting about it. We
+  can prove that if @{term "findzero f n"} terminates, it indeed
+  returns a zero of @{term f}:
+*}
+
+lemma findzero_zero: "findzero_dom (f, n) \<Longrightarrow> f (findzero f n) = 0"
+
+txt {* \noindent We apply induction as usual, but using the partial
+  induction rule: *}
+
+apply (induct f n rule: findzero.pinduct)
+
+txt {* \noindent This gives the following subgoals:
+
+  @{subgoals[display,indent=0]}
+
+  \noindent The hypothesis in our lemma was used to satisfy the first
+  premise in the induction rule. However, we also get @{term
+  "findzero_dom (f, n)"} as a local assumption in the induction step.
+  This allows us to unfold @{term "findzero f n"} using the @{text
+  psimps} rule, and the rest is trivial. Since the @{text psimps}
+  rules carry the @{text "[simp]"} attribute by default, we just need
+  a single step:
+*}
+apply simp
+done
+
+text {*
+  Proofs about partial functions are often not harder than for total
+  functions. Fig.~\ref{findzero_isar} shows a slightly more
+  complicated proof written in Isar. It is verbose enough to show how
+  partiality comes into play: From the partial induction, we get an
+  additional domain condition hypothesis. Observe how this condition
+  is applied when calls to @{term findzero} are unfolded.
+*}
+
+text_raw {*
+\begin{figure}
+\hrule\vspace{6pt}
+\begin{minipage}{0.8\textwidth}
+\isabellestyle{it}
+\isastyle\isamarkuptrue
+*}
+lemma "\<lbrakk>findzero_dom (f, n); x \<in> {n ..< findzero f n}\<rbrakk> \<Longrightarrow> f x \<noteq> 0"
+proof (induct rule: findzero.pinduct)
+  fix f n assume dom: "findzero_dom (f, n)"
+    and IH: "\<lbrakk>f n \<noteq> 0; x \<in> {Suc n ..< findzero f (Suc n)}\<rbrakk> \<Longrightarrow> f x \<noteq> 0"
+    and x_range: "x \<in> {n ..< findzero f n}"
+  have "f n \<noteq> 0"
+  proof
+    assume "f n = 0"
+    with dom have "findzero f n = n" by simp
+    with x_range show False by auto
+  qed
+
+  from x_range have "x = n \<or> x \<in> {Suc n ..< findzero f n}" by auto
+  thus "f x \<noteq> 0"
+  proof
+    assume "x = n"
+    with `f n \<noteq> 0` show ?thesis by simp
+  next
+    assume "x \<in> {Suc n ..< findzero f n}"
+    with dom and `f n \<noteq> 0` have "x \<in> {Suc n ..< findzero f (Suc n)}" by simp
+    with IH and `f n \<noteq> 0`
+    show ?thesis by simp
+  qed
+qed
+text_raw {*
+\isamarkupfalse\isabellestyle{tt}
+\end{minipage}\vspace{6pt}\hrule
+\caption{A proof about a partial function}\label{findzero_isar}
+\end{figure}
+*}
+
+subsection {* Partial termination proofs *}
+
+text {*
+  Now that we have proved some interesting properties about our
+  function, we should turn to the domain predicate and see if it is
+  actually true for some values. Otherwise we would have just proved
+  lemmas with @{term False} as a premise.
+
+  Essentially, we need some introduction rules for @{text
+  findzero_dom}. The function package can prove such domain
+  introduction rules automatically. But since they are not used very
+  often (they are almost never needed if the function is total), this
+  functionality is disabled by default for efficiency reasons. So we
+  have to go back and ask for them explicitly by passing the @{text
+  "(domintros)"} option to the function package:
+
+\vspace{1ex}
+\noindent\cmd{function} @{text "(domintros) findzero :: \"(nat \<Rightarrow> nat) \<Rightarrow> nat \<Rightarrow> nat\""}\\%
+\cmd{where}\isanewline%
+\ \ \ldots\\
+
+  \noindent Now the package has proved an introduction rule for @{text
+  findzero_dom}:
+*}
+
+thm findzero.domintros
+
+text {*
+  @{thm[display] findzero.domintros}
+
+  Domain introduction rules allow us to show that a given value lies
+  in the domain of a function, if the arguments of all recursive calls
+  are in the domain as well. They allow us to do a \qt{single step} in
+  a termination proof. Usually, you want to combine them with a
+  suitable induction principle.
+
+  Since our function increases its argument at recursive calls, we
+  need an induction principle which works \qt{backwards}. We will use
+  @{text inc_induct}, which allows us to do induction from a fixed
+  number \qt{downwards}:
+
+  \begin{center}@{thm inc_induct}\hfill(@{text "inc_induct"})\end{center}
+
+  Figure \ref{findzero_term} gives a detailed Isar proof of the fact
+  that @{text findzero} terminates if there is a zero which is greater
+  than or equal to @{term n}. First we derive two useful rules which
+  will solve the base case and the step case of the induction. The
+  induction is then straightforward, except for the unusual induction
+  principle.
+*}
+
+text_raw {*
+\begin{figure}
+\hrule\vspace{6pt}
+\begin{minipage}{0.8\textwidth}
+\isabellestyle{it}
+\isastyle\isamarkuptrue
+*}
+lemma findzero_termination:
+  assumes "x \<ge> n" and "f x = 0"
+  shows "findzero_dom (f, n)"
+proof -
+  have base: "findzero_dom (f, x)"
+    by (rule findzero.domintros) (simp add:`f x = 0`)
+
+  have step: "\<And>i. findzero_dom (f, Suc i)
+    \<Longrightarrow> findzero_dom (f, i)"
+    by (rule findzero.domintros) simp
+
+  from `x \<ge> n` show ?thesis
+  proof (induct rule:inc_induct)
+    show "findzero_dom (f, x)" by (rule base)
+  next
+    fix i assume "findzero_dom (f, Suc i)"
+    thus "findzero_dom (f, i)" by (rule step)
+  qed
+qed
+text_raw {*
+\isamarkupfalse\isabellestyle{tt}
+\end{minipage}\vspace{6pt}\hrule
+\caption{Termination proof for @{text findzero}}\label{findzero_term}
+\end{figure}
+*}
+
+text {*
+  Again, the proof given in Fig.~\ref{findzero_term} has a lot of
+  detail in order to explain the principles. Using more automation, we
+  can also have a short proof:
+*}
+
+lemma findzero_termination_short:
+  assumes zero: "x >= n"
+  assumes [simp]: "f x = 0"
+  shows "findzero_dom (f, n)"
+using zero
+by (induct rule:inc_induct) (auto intro: findzero.domintros)
+
+text {*
+  \noindent It is simple to combine the partial correctness result
+  with the termination lemma:
+*}
+
+lemma findzero_total_correctness:
+  "f x = 0 \<Longrightarrow> f (findzero f 0) = 0"
+by (blast intro: findzero_zero findzero_termination)
+
+subsection {* Definition of the domain predicate *}
+
+text {*
+  Sometimes it is useful to know what the definition of the domain
+  predicate looks like. Actually, @{text findzero_dom} is just an
+  abbreviation:
+
+  @{abbrev[display] findzero_dom}
+
+  The domain predicate is the \emph{accessible part} of a relation
+  @{const findzero_rel}, which was also created internally by the
+  function package.
+  @{const findzero_rel} is just a normal inductive predicate, so we
+  can inspect its definition by looking at the introduction rules
+  @{text findzero_rel.intros}. In our case there is just a single
+  rule:
+
+  @{thm[display] findzero_rel.intros}
+
+  The predicate @{const findzero_rel} describes the \emph{recursion
+  relation} of the function definition. The recursion relation is a
+  binary relation on the arguments of the function that relates each
+  argument to its recursive calls. In general, there is one
+  introduction rule for each recursive call.
+
+  The predicate @{term "accp findzero_rel"} is the accessible part of
+  that relation. An argument belongs to the accessible part if it can
+  be reached in a finite number of steps (cf.~its definition in @{text
+  "Wellfounded.thy"}).
+
+  Since the domain predicate is just an abbreviation, you can use
+  lemmas for @{const accp} and @{const findzero_rel} directly. Some
+  lemmas which are occasionally useful are @{text accpI}, @{text
+  accp_downward}, and of course the introduction and elimination rules
+  for the recursion relation @{text "findzero_rel.intros"} and @{text
+  "findzero_rel.cases"}.
+*}
+
+subsection {* A Useful Special Case: Tail recursion *}
+
+text {*
+  The domain predicate is our trick that allows us to model partiality
+  in a world of total functions. The downside of this is that we have
+  to carry it around all the time. The termination proof above allowed
+  us to replace the abstract @{term "findzero_dom (f, n)"} by the more
+  concrete @{term "(x \<ge> n \<and> f x = (0::nat))"}, but the condition is
+  still there and can only be discharged for special cases. In
+  particular, the domain predicate guards the unfolding of our
+  function, since it is there as a condition in the @{text psimp}
+  rules.
+
+  Now there is an important special case: We can actually get rid of
+  the condition in the simplification rules, \emph{if the function is
+  tail-recursive}. The reason is that for all tail-recursive equations
+  there is a total function satisfying them, even if they are
+  non-terminating.
+
+  The function package internally does the right construction and can
+  derive the unconditional simp rules, if we ask it to do so. Luckily,
+  our @{const "findzero"} function is tail-recursive, so we can just
+  go back and add another option to the \cmd{function} command:
+
+\vspace{1ex}
+\noindent\cmd{function} @{text "(domintros, tailrec) findzero :: \"(nat \<Rightarrow> nat) \<Rightarrow> nat \<Rightarrow> nat\""}\\%
+\cmd{where}\isanewline%
+\ \ \ldots\\%
+
+  \noindent Now, we actually get unconditional simplification rules,
+  even though the function is partial:
+*}
+
+thm findzero.simps
+
+text {*
+  @{thm[display] findzero.simps}
+
+  \noindent Of course these would make the simplifier loop, so we had
+  better remove them from the simpset:
+*}
+
+declare findzero.simps[simp del]
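+
+text {*
+  \noindent They can still be applied in a controlled way, e.g.\ with
+  the @{text subst} method; here is a small check, added for
+  illustration and not part of the original text:
+*}
+
+lemma "findzero (\<lambda>x. x) 0 = 0"
+by (subst findzero.simps) simp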
+
+text {*
+  Getting rid of the domain conditions in the simplification rules is
+  not only useful because it simplifies proofs. It is also required in
+  order to use Isabelle's code generator to generate ML code from a
+  function definition. Since the code generator only works with
+  equations, it cannot be used with @{text "psimp"} rules. Thus, in
+  order to generate code for partial functions, they must be defined
+  by tail recursion. Luckily, many functions have a relatively natural
+  tail recursive definition.
+*}
+
+section {* Nested recursion *}
+
+text {*
+  Recursive calls which are nested in one another frequently cause
+  complications, since their termination proof can depend on a partial
+  correctness property of the function itself.
+
+  As a small example, we define the \qt{nested zero} function:
+*}
+
+function nz :: "nat \<Rightarrow> nat"
+where
+  "nz 0 = 0"
+| "nz (Suc n) = nz (nz n)"
+by pat_completeness auto
+
+text {*
+  If we attempt to prove termination using the identity measure on
+  naturals, this fails:
+*}
+
+termination
+  apply (relation "measure (\<lambda>n. n)")
+  apply auto
+
+txt {*
+  We get stuck with the subgoal
+
+  @{subgoals[display]}
+
+  Of course this statement is true, since we know that @{const nz} is
+  the zero function. And in fact we have no problem proving this
+  property by induction.
+*}
+(*<*)oops(*>*)
+lemma nz_is_zero: "nz_dom n \<Longrightarrow> nz n = 0"
+  by (induct rule:nz.pinduct) auto
+
+text {*
+  We formulate this as a partial correctness lemma with the condition
+  @{term "nz_dom n"}. This allows us to prove it with the @{text
+  pinduct} rule before we have proved termination. With this lemma,
+  the termination proof works as expected:
+*}
+
+termination
+  by (relation "measure (\<lambda>n. n)") (auto simp: nz_is_zero)
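+
+text {*
+  Now that termination is established, the domain condition can also
+  be dropped from the correctness lemma itself (a small check added
+  for illustration, not part of the original text; the one-line proof
+  is a sketch):
+*}
+
+lemma "nz n = 0"
+by (induct n rule: nz.induct) auto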
+
+text {*
+  As a general strategy, one should prove the statements needed for
+  termination as partial properties first. Then they can be used to do
+  the termination proof. This also works for less trivial examples.
+  Figure \ref{f91} defines the 91-function, a well-known challenge
+  problem due to John McCarthy, and proves its termination.
+*}
+
+text_raw {*
+\begin{figure}
+\hrule\vspace{6pt}
+\begin{minipage}{0.8\textwidth}
+\isabellestyle{it}
+\isastyle\isamarkuptrue
+*}
+
+function f91 :: "nat \<Rightarrow> nat"
+where
+  "f91 n = (if 100 < n then n - 10 else f91 (f91 (n + 11)))"
+by pat_completeness auto
+
+lemma f91_estimate:
+  assumes trm: "f91_dom n"
+  shows "n < f91 n + 11"
+using trm by induct auto
+
+termination
+proof
+  let ?R = "measure (\<lambda>x. 101 - x)"
+  show "wf ?R" ..
+
+  fix n :: nat assume "\<not> 100 < n" -- "Assumptions for both calls"
+
+  thus "(n + 11, n) \<in> ?R" by simp -- "Inner call"
+
+  assume inner_trm: "f91_dom (n + 11)" -- "Outer call"
+  with f91_estimate have "n + 11 < f91 (n + 11) + 11" .
+  with `\<not> 100 < n` show "(f91 (n + 11), n) \<in> ?R" by simp
+qed
+
+text_raw {*
+\isamarkupfalse\isabellestyle{tt}
+\end{minipage}
+\vspace{6pt}\hrule
+\caption{McCarthy's 91-function}\label{f91}
+\end{figure}
+*}
+
+
+section {* Higher-Order Recursion *}
+
+text {*
+  Higher-order recursion occurs when recursive calls are passed as
+  arguments to higher-order combinators such as @{const map}, @{term
+  filter} etc. As an example, imagine a datatype of n-ary trees:
+*}
+
+datatype 'a tree =
+  Leaf 'a
+| Branch "'a tree list"
+
+
+text {* \noindent We can define a function which recursively reverses
+  the order of the subtrees at every node, using the list functions
+  @{const rev} and @{const map}: *}
+
+fun mirror :: "'a tree \<Rightarrow> 'a tree"
+where
+  "mirror (Leaf n) = Leaf n"
+| "mirror (Branch l) = Branch (rev (map mirror l))"
+
+text {*
+  Although the definition is accepted without problems, let us look at
+  the termination proof:
+*}
+
+termination proof
+  txt {*
+
+  As usual, we have to give a wellfounded relation, such that the
+  arguments of the recursive calls get smaller. But what exactly are
+  the arguments of the recursive calls when @{const mirror} is given
+  as an argument to @{const map}? Isabelle gives us the subgoals
+
+  @{subgoals[display,indent=0]}
+
+  So the system seems to know that @{const map} only applies the
+  recursive call @{term "mirror"} to elements of @{term "l"}, which is
+  essential for the termination proof.
+
+  This knowledge about @{const map} is encoded in so-called congruence
+  rules, which are special theorems known to the \cmd{function}
+  command. The rule for @{const map} is
+
+  @{thm[display] map_cong}
+
+  You can read this in the following way: Two applications of @{const
+  map} are equal if the list arguments are equal and the functions
+  coincide on the elements of the list. This means that for the value
+  @{term "map f l"} we only have to know how @{term f} behaves on the
+  elements of @{term l}.
+
+  Usually, one such congruence rule is needed for each higher-order
+  construct that is used when defining new functions. In fact, even
+  basic functions like @{const If} and @{const Let} are handled by
+  this mechanism. The congruence rule for @{const If} states that the
+  @{text then} branch is only relevant if the condition is true, and
+  the @{text else} branch only if it is false:
+
+  @{thm[display] if_cong}
+
+  Congruence rules can be added to the function package by giving them
+  the @{term fundef_cong} attribute.
+
+  The constructs that are predefined in Isabelle usually come with the
+  respective congruence rules. But if you define your own higher-order
+  functions, you may have to state and prove the required congruence
+  rules yourself, if you want to use your functions in recursive
+  definitions. A sketch of this follows below.
+*}
+(*<*)oops(*>*)
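+
+text {*
+  To illustrate, consider a hand-written copy of @{const map}. This
+  example is added here and is not part of the original text; the
+  names are hypothetical and the congruence proof is a sketch that may
+  need adjustment. Once a rule analogous to @{text "map_cong"} is
+  proved and declared, the new combinator can be used in recursive
+  definitions such as a variant of @{const mirror}:
+*}
+
+fun mymap :: "('a \<Rightarrow> 'b) \<Rightarrow> 'a list \<Rightarrow> 'b list"
+where
+  "mymap f [] = []"
+| "mymap f (x#xs) = f x # mymap f xs"
+
+lemma mymap_cong [fundef_cong]:
+  "xs = ys \<Longrightarrow> (\<And>x. x \<in> set ys \<Longrightarrow> f x = g x)
+    \<Longrightarrow> mymap f xs = mymap g ys"
+by (induct ys arbitrary: xs) auto
+
+fun mirror2 :: "'a tree \<Rightarrow> 'a tree"
+where
+  "mirror2 (Leaf n) = Leaf n"
+| "mirror2 (Branch l) = Branch (rev (mymap mirror2 l))"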
+
+subsection {* Congruence Rules and Evaluation Order *}
+
+text {*
+  Higher-order logic differs from functional programming languages in
+  that it has no built-in notion of evaluation order. A program is
+  just a set of equations, and it is not specified how they must be
+  evaluated.
+
+  However, for the purpose of function definitions, evaluation order
+  matters implicitly when we reason about termination. Congruence
+  rules express that a certain evaluation order is consistent with the
+  logical definition.
+
+  Consider the following function:
+*}
+
+function f :: "nat \<Rightarrow> bool"
+where
+  "f n = (n = 0 \<or> f (n - 1))"
+(*<*)by pat_completeness auto(*>*)
+
+text {*
+  For this definition, the termination proof fails. The default
+  configuration specifies no congruence rule for disjunction. We have
+  to add a congruence rule that specifies left-to-right evaluation
+  order:
+
+  \vspace{1ex}
+  \noindent @{thm disj_cong}\hfill(@{text "disj_cong"})
+  \vspace{1ex}
+
+  Now the definition works without problems. Note how the termination
+  proof depends on the extra condition that we get from the congruence
+  rule.
+
+  However, as evaluation is not a hard-wired concept, we could just
+  turn everything around by declaring a different congruence rule.
+  Then we can make the reverse definition:
+*}
+
+lemma disj_cong2[fundef_cong]:
+  "(\<not> Q' \<Longrightarrow> P = P') \<Longrightarrow> (Q = Q') \<Longrightarrow> (P \<or> Q) = (P' \<or> Q')"
+  by blast
+
+fun f' :: "nat \<Rightarrow> bool"
+where
+  "f' n = (f' (n - 1) \<or> n = 0)"
+
+text {*
+  \noindent These examples show that, in general, there is no
+  \qt{best} set of congruence rules.
+
+  However, such tweaking should rarely be necessary in practice, as
+  most of the time the default set of congruence rules works well.
+*}
+
+end