\chapter{Functional Programming in HOL}

Although on the surface this chapter is mainly concerned with how to write
functional programs in HOL and how to verify them, most of the
constructs and proof procedures introduced are general purpose and recur in
any specification or verification task.

The dedicated functional programmer should be warned: HOL offers only what
could be called {\em total functional programming}. All functions in HOL
must be total; lazy data structures are not directly available. On the
positive side, functions in HOL need not be computable: HOL is a
specification language that goes well beyond what can be expressed as a
program. However, for the time being we concentrate on the computable.
\section{An introductory theory}
\label{sec:intro-theory}

Functional programming needs datatypes and functions. Both of them can be
defined in a theory with a syntax reminiscent of languages like ML or
Haskell. As an example consider the theory in Fig.~\ref{fig:ToyList}.

\begin{figure}[htbp]
\begin{ttbox}\makeatother
\input{ToyList/ToyList.thy}\end{ttbox}
\caption{A theory of lists}
\label{fig:ToyList}
\end{figure}

HOL already has a predefined theory of lists called \texttt{List};
\texttt{ToyList} is merely a small fragment of it chosen as an example. In
contrast to what is recommended in \S\ref{sec:Basic:Theories},
\texttt{ToyList} is not based on \texttt{Main} but on \texttt{Datatype}, a
theory that contains everything required for datatype definitions but does
not have \texttt{List} as a parent, thus avoiding ambiguities caused by
defining lists twice.

The \ttindexbold{datatype} \texttt{list} introduces two constructors
\texttt{Nil} and \texttt{Cons}, the empty list and the operator that adds an
element to the front of a list. For example, the term \texttt{Cons True (Cons
False Nil)} is a value of type \texttt{bool~list}, namely the list with the
elements \texttt{True} and \texttt{False}. Because this notation becomes
unwieldy very quickly, the datatype declaration is annotated with an
alternative syntax: instead of \texttt{Nil} and \texttt{Cons}~$x$~$xs$ we can
write \index{#@{\tt[]}|bold}\texttt{[]} and
\texttt{$x$~\#~$xs$}\index{#@{\tt\#}|bold}. In fact, this alternative syntax
is the standard syntax. Thus the list \texttt{Cons True (Cons False Nil)}
becomes \texttt{True \# False \# []}. The annotation \ttindexbold{infixr}
means that \texttt{\#} associates to the right, i.e.\ the term \texttt{$x$ \#
$y$ \# $z$} is read as \texttt{$x$ \# ($y$ \# $z$)} and not as \texttt{($x$
\# $y$) \# $z$}.
\begin{warn}
  Syntax annotations are a powerful but completely optional feature. You
  could drop them from theory \texttt{ToyList} and go back to the identifiers
  \texttt{Nil} and \texttt{Cons}. However, lists are such a central datatype
  that their syntax is highly customized. We recommend that novices
  avoid syntax annotations in their own theories.
\end{warn}

Next, the functions \texttt{app} and \texttt{rev} are declared. In contrast
to ML, Isabelle insists on explicit declarations of all functions (keyword
\ttindexbold{consts}). (Apart from the declaration-before-use restriction,
the order of items in a theory file is unconstrained.) Function \texttt{app}
is annotated with concrete syntax too. Instead of the prefix syntax
\texttt{app}~$xs$~$ys$ the infix $xs$~\texttt{\at}~$ys$ becomes the preferred
form.

Both functions are defined recursively. The equations for \texttt{app} and
\texttt{rev} hardly need comments: \texttt{app} appends two lists and
\texttt{rev} reverses a list. The keyword \texttt{primrec} indicates that
the recursion is of a particularly primitive kind where each recursive call
peels off a datatype constructor from one of the arguments (see
\S\ref{sec:datatype}). Thus the recursion always terminates, i.e.\ the
function is \bfindex{total}.

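
For readers who do not have Fig.~\ref{fig:ToyList} at hand, the following is
a sketch of what \texttt{ToyList.thy} presumably looks like. It is
reconstructed from the surrounding description (parent theory
\texttt{Datatype}, constructors \texttt{Nil} and \texttt{Cons}, infix
priority \texttt{65}); layout and variable names are assumptions and may
differ from the actual file:
\begin{ttbox}\makeatother
ToyList = Datatype +

datatype 'a list = Nil                  ("[]")
                 | Cons 'a ('a list)    (infixr "#" 65)

consts app :: 'a list => 'a list => 'a list   (infixr "@" 65)
       rev :: 'a list => 'a list

primrec
"[] @ ys       = ys"
"(x # xs) @ ys = x # (xs @ ys)"

primrec
"rev []       = []"
"rev (x # xs) = (rev xs) @ (x # [])"

end
\end{ttbox}
In both \texttt{primrec} blocks the only recursive call is applied to
\texttt{xs}, the tail of the constructor pattern \texttt{x \# xs}, which is
what makes the recursion primitive.
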
The termination requirement is absolutely essential in HOL, a logic of total
functions. If we were to drop it, inconsistencies could quickly arise: the
``definition'' $f(n) = f(n)+1$ immediately leads to $0 = 1$ by subtracting
$f(n)$ from both sides.
% However, this is a subtle issue that we cannot discuss here further.
\begin{warn}
  As we have indicated, the desire for total functions is not a gratuitously
  imposed restriction but an essential characteristic of HOL. It is only
  because of totality that reasoning in HOL is comparatively easy. More
  generally, the philosophy in HOL is not to allow arbitrary axioms (such as
  function definitions whose totality has not been proved) because they
  quickly lead to inconsistencies. Instead, fixed constructs for introducing
  types and functions are offered (such as \texttt{datatype} and
  \texttt{primrec}) which are guaranteed to preserve consistency.
\end{warn}

A remark about syntax. The textual definition of a theory follows a fixed
syntax with keywords like \texttt{datatype} and \texttt{end} (see
Fig.~\ref{fig:keywords} in Appendix~\ref{sec:Appendix} for a full list).
Embedded in this syntax are the types and formulae of HOL, whose syntax is
extensible, e.g.\ by new user-defined infix operators
(see~\S\ref{sec:infix-syntax}). To distinguish the two levels, everything
HOL-specific should be enclosed in \texttt{"}\dots\texttt{"}. The same holds
for identifiers that happen to be keywords, as in
\begin{ttbox}
consts "end" :: 'a list => 'a
\end{ttbox}
To lessen this burden, quotation marks around types can be dropped,
provided their syntax does not go beyond what is described in
\S\ref{sec:TypesTermsForms}. Types containing further operators, e.g.\
\texttt{*} for Cartesian products, need quotation marks.

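
As a hedged illustration of the quoting rule, consider the following two
declarations. The constants \texttt{dup} and \texttt{swap} are hypothetical
and serve only to show where quotation marks are needed:
\begin{ttbox}
consts dup  :: 'a list => 'a list
       swap :: "'a * 'b => 'b * 'a"
\end{ttbox}
The first type has the same shape as the one in the \texttt{"end"} example
above and can therefore stand unquoted; the second contains \texttt{*} and
must be quoted.
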
When Isabelle prints a syntax error message, it refers to the HOL syntax as
the \bfindex{inner syntax}.

\section{An introductory proof}
\label{sec:intro-proof}

Having defined \texttt{ToyList}, we load it with the ML command
\begin{ttbox}
use_thy "ToyList";
\end{ttbox}
and are ready to prove a few simple theorems. This will illustrate not just
the basic proof commands but also the typical proof process.
\subsubsection*{Main goal: \texttt{rev(rev xs) = xs}}

Our goal is to show that reversing a list twice produces the original
list. Typing
\begin{ttbox}
\input{ToyList/thm}\end{ttbox}
establishes a new goal to be proved in the context of the current theory,
which is the one we just loaded. Isabelle's response is to print the current
proof state:
\begin{ttbox}
{\out Level 0}
{\out rev (rev xs) = xs}
{\out 1. rev (rev xs) = xs}
\end{ttbox}
Until we have finished a proof, the proof state always looks like this:
\begin{ttbox}
{\out Level \(i\)}
{\out \(G\)}
{\out 1. \(G@1\)}
{\out \(\vdots\)}
{\out \(n\). \(G@n\)}
\end{ttbox}
where \texttt{Level}~$i$ indicates that we are $i$ steps into the proof, $G$
is the overall goal that we are trying to prove, and the numbered lines
contain the subgoals $G@1$, \dots, $G@n$ that we need to prove to establish
$G$. At \texttt{Level 0} there is only one subgoal, which is identical to
the overall goal. Normally $G$ is constant and only serves as a reminder.
Hence we rarely show it in this tutorial.

Let us now get back to \texttt{rev(rev xs) = xs}. Properties of recursively
defined functions are best established by induction. In this case there is
not much choice except to induct on \texttt{xs}:
\begin{ttbox}
\input{ToyList/inductxs}\end{ttbox}
This tells Isabelle to perform induction on variable \texttt{xs} in subgoal
1. The new proof state contains two subgoals, namely the base case
(\texttt{Nil}) and the induction step (\texttt{Cons}):
\begin{ttbox}
{\out 1. rev (rev []) = []}
{\out 2. !!a list. rev (rev list) = list ==> rev (rev (a # list)) = a # list}
\end{ttbox}
The induction step is an example of the general format of a subgoal:
\begin{ttbox}
{\out \(i\). !!\(x@1 \dots x@n\). {\it assumptions} ==> {\it conclusion}}
\end{ttbox}\index{==>@{\tt==>}|bold}
The prefix of bound variables \texttt{!!\(x@1 \dots x@n\)} can be ignored
most of the time, or simply treated as a list of variables local to this
subgoal. Their deeper significance is explained in \S\ref{sec:PCproofs}. The
{\it assumptions} are the local assumptions for this subgoal and {\it
conclusion} is the actual proposition to be proved. Typical proof steps
that add new assumptions are induction or case distinction. In our example
the only assumption is the induction hypothesis \texttt{rev (rev list) =
list}, where \texttt{list} is a variable name chosen by Isabelle. If there
are multiple assumptions, they are enclosed in the bracket pair
\texttt{[|}\index{==>@\ttlbr|bold} and \texttt{|]}\index{==>@\ttrbr|bold}
and separated by semicolons.

Let us try to solve both goals automatically:
\begin{ttbox}
\input{ToyList/autotac}\end{ttbox}
This command tells Isabelle to apply a proof strategy called
\texttt{Auto_tac} to all subgoals. Essentially, \texttt{Auto_tac} tries to
`simplify' the subgoals. In our case, subgoal~1 is solved completely (thanks
to the equation \texttt{rev [] = []}) and disappears; the simplified version
of subgoal~2 becomes the new subgoal~1:
\begin{ttbox}\makeatother
{\out 1. !!a list. rev(rev list) = list ==> rev(rev list @ a # []) = a # list}
\end{ttbox}
In order to simplify this subgoal further, a lemma suggests itself.
\subsubsection*{First lemma: \texttt{rev(xs \at~ys) = (rev ys) \at~(rev xs)}}

We start the proof as usual:
\begin{ttbox}\makeatother
\input{ToyList/lemma1}\end{ttbox}
There are two variables that we could induct on: \texttt{xs} and
\texttt{ys}. Because \texttt{\at} is defined by recursion on
the first argument, \texttt{xs} is the correct one:
\begin{ttbox}
\input{ToyList/inductxs}\end{ttbox}
This time not even the base case is solved automatically:
\begin{ttbox}\makeatother
by(Auto_tac);
{\out 1. rev ys = rev ys @ []}
{\out 2. \dots}
\end{ttbox}
We need another lemma.
\subsubsection*{Second lemma: \texttt{xs \at~[] = xs}}

This time the canonical proof procedure
\begin{ttbox}\makeatother
\input{ToyList/lemma2}\input{ToyList/inductxs}\input{ToyList/autotac}\end{ttbox}
leads to the desired message \texttt{No subgoals!}:
\begin{ttbox}\makeatother
{\out Level 2}
{\out xs @ [] = xs}
{\out No subgoals!}
\end{ttbox}
Now we can give the lemma just proved a suitable name
\begin{ttbox}
\input{ToyList/qed2}\end{ttbox}
and tell Isabelle to use this lemma in all future proofs by simplification:
\begin{ttbox}
\input{ToyList/addsimps2}\end{ttbox}
Note that in the theorem \texttt{app_Nil2} the free variable \texttt{xs} has
been replaced by the unknown \texttt{?xs}, just as explained in
\S\ref{sec:variables}.

Going back to the proof of the first lemma
\begin{ttbox}\makeatother
\input{ToyList/lemma1}\input{ToyList/inductxs}\input{ToyList/autotac}\end{ttbox}
we find that this time \texttt{Auto_tac} solves the base case, but the
induction step merely simplifies to
\begin{ttbox}\makeatother
{\out 1. !!a list.}
{\out rev (list @ ys) = rev ys @ rev list}
{\out ==> (rev ys @ rev list) @ a # [] = rev ys @ rev list @ a # []}
\end{ttbox}
Now we need to remember that \texttt{\at} associates to the right, and that
\texttt{\#} and \texttt{\at} have the same priority (namely the \texttt{65}
in the definition of \texttt{ToyList}). Thus the conclusion really is
\begin{ttbox}\makeatother
{\out ==> (rev ys @ rev list) @ (a # []) = rev ys @ (rev list @ (a # []))}
\end{ttbox}
and the missing lemma is associativity of \texttt{\at}.
\subsubsection*{Third lemma: \texttt{(xs \at~ys) \at~zs = xs \at~(ys \at~zs)}}

This time the canonical proof procedure
\begin{ttbox}\makeatother
\input{ToyList/lemma3}\end{ttbox}
succeeds without further ado. Again we name the lemma and add it to
the set of lemmas used during simplification:
\begin{ttbox}
\input{ToyList/qed3}\end{ttbox}
Now we can go back, prove the first lemma
\begin{ttbox}\makeatother
\input{ToyList/lemma1}\input{ToyList/inductxs}\input{ToyList/autotac}\end{ttbox}
add it to the simplification lemmas
\begin{ttbox}
\input{ToyList/qed1}\end{ttbox}
and then solve our main theorem:
\begin{ttbox}\makeatother
\input{ToyList/thm}\input{ToyList/inductxs}\input{ToyList/autotac}\end{ttbox}
\subsubsection*{Review}

This is the end of our toy proof. It should have familiarized you with
\begin{itemize}
\item the standard theorem proving procedure:
state a goal; proceed with the proof until a new lemma is required; prove that
lemma; come back to the original goal.
\item a specific procedure that works well for functional programs:
induction followed by all-out simplification via \texttt{Auto_tac}.
\item a basic repertoire of proof commands.
\end{itemize}
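
Put together, the whole development can be replayed with a script along the
following lines. This is only a sketch: the goal statements are those of the
headings above and the name \texttt{app_Nil2} is taken from the text, whereas
\texttt{Goal} as the goal-stating command and the names \texttt{app_assoc},
\texttt{rev_app} and \texttt{rev_rev} are assumptions, because the included
command files are not reproduced here.
\begin{ttbox}\makeatother
Goal "xs @ [] = xs";
by(induct_tac "xs" 1);
by(Auto_tac);
qed "app_Nil2";
Addsimps [app_Nil2];

Goal "(xs @ ys) @ zs = xs @ (ys @ zs)";
by(induct_tac "xs" 1);
by(Auto_tac);
qed "app_assoc";
Addsimps [app_assoc];

Goal "rev(xs @ ys) = (rev ys) @ (rev xs)";
by(induct_tac "xs" 1);
by(Auto_tac);
qed "rev_app";
Addsimps [rev_app];

Goal "rev(rev xs) = xs";
by(induct_tac "xs" 1);
by(Auto_tac);
qed "rev_rev";
\end{ttbox}
The lemmas appear in the order in which they can be proved, which is the
reverse of the order in which they were discovered.
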
\section{Some helpful commands}
\label{sec:commands-and-hints}

This section discusses a few basic commands for manipulating the proof state
and can be skipped by casual readers.

There are two kinds of commands used during a proof: the actual proof
commands and auxiliary commands for examining the proof state and controlling
the display. Proof commands are always of the form
\texttt{by(\textit{tactic});}\indexbold{tactic} where \textbf{tactic} is a
synonym for ``theorem proving function''. Typical examples are
\texttt{induct_tac} and \texttt{Auto_tac}; the suffix \texttt{_tac} is
merely a mnemonic. Further tactics are introduced throughout the tutorial.

%Tactics can also be modified. For example,
%\begin{ttbox}
%by(ALLGOALS Asm_simp_tac);
%\end{ttbox}
%tells Isabelle to apply \texttt{Asm_simp_tac} to all subgoals. For more on
%tactics and how to combine them see~\S\ref{sec:Tactics}.
The most useful auxiliary commands are:
\begin{description}
\item[Printing the current state]
Type \texttt{pr();} to redisplay the current proof state, for example when it
has disappeared off the screen.
\item[Limiting the number of subgoals]
Typing \texttt{prlim $k$;} tells Isabelle to print only the first $k$
subgoals from now on and redisplays the current proof state. This is helpful
when there are many subgoals.
\item[Undoing] Typing \texttt{undo();} undoes the effect of the last
tactic.
\item[Context switch] Every proof happens in the context of a
  \bfindex{current theory}. By default, this is the last theory loaded. If
  you want to prove a theorem in the context of a different theory
  \texttt{T}, you need to type \texttt{context T.thy;}\index{*context|bold}
  first. Of course you need to change the context again if you want to go
  back to your original theory.
\item[Displaying types] We have already mentioned the flag
  \ttindex{show_types} above. It can also be useful for detecting typos in
  formulae early on. For example, if \texttt{show_types} is set and the goal
  \texttt{rev(rev xs) = xs} is started, Isabelle prints the additional output
\begin{ttbox}
{\out Variables:}
{\out xs :: 'a list}
\end{ttbox}
which tells us that Isabelle has correctly inferred that
\texttt{xs} is a variable of list type. On the other hand, had we
made a typo as in \texttt{rev(re xs) = xs}, the response
\begin{ttbox}
{\out Variables:}
{\out re :: 'a list => 'a list}
{\out xs :: 'a list}
\end{ttbox}
would have alerted us because of the unexpected variable \texttt{re}.
\item[(Re)loading theories]\index{loading theories}\index{reloading theories}
Initially you load theory \texttt{T} by typing \ttindex{use_thy}~\texttt{"T";},
which loads all parent theories of \texttt{T} automatically, if they are not
loaded already. If you modify \texttt{T.thy} or \texttt{T.ML}, you can
reload it by typing \texttt{use_thy~"T";} again. This time, however, only
\texttt{T} is reloaded. If some of \texttt{T}'s parents have changed as well,
type \ttindexbold{update}\texttt{();} to reload all theories that have
changed.
\end{description}
Further commands are found in the Reference Manual.
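
The following short session sketches how these commands might be combined.
The commands \texttt{pr}, \texttt{prlim}, \texttt{undo} and \texttt{context}
are those described above; \texttt{set show_types} as the way to switch on
the flag and \texttt{Goal} as the goal-stating command are assumptions about
the ML interface.
\begin{ttbox}
context ToyList.thy;      (* make ToyList the current theory *)
set show_types;           (* from now on, display type information *)
Goal "rev(rev xs) = xs";  (* assumed goal-stating command *)
by(induct_tac "xs" 1);
prlim 1;                  (* show only the first subgoal from now on *)
undo();                   (* take back the induction step *)
pr();                     (* redisplay the current proof state *)
\end{ttbox}
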
\section{Datatypes}
\label{sec:datatype}

Inductive datatypes are part of almost every non-trivial application of HOL.
First we take another look at a very important example, the datatype of
lists, before we turn to datatypes in general. The section closes with a
case study.


\subsection{Lists}

Lists are one of the essential datatypes in computing. Readers of this tutorial
and users of HOL need to be familiar with their basic operations. Theory
\texttt{ToyList} is only a small fragment of HOL's predefined theory
\texttt{List}\footnote{\texttt{http://www.in.tum.de/\~\relax
isabelle/library/HOL/List.html}}.
The latter contains many further operations. For example, the functions
\ttindexbold{hd} (`head') and \ttindexbold{tl} (`tail') return the first
element and the remainder of a list. (However, pattern-matching is usually
preferable to \texttt{hd} and \texttt{tl}.)
Theory \texttt{List} also contains more syntactic sugar:
\texttt{[}$x@1$\texttt{,}\dots\texttt{,}$x@n$\texttt{]} abbreviates
$x@1$\texttt{\#}\dots\texttt{\#}$x@n$\texttt{\#[]}.
In the rest of the tutorial we always use HOL's predefined lists.
\subsection{The general format}
\label{sec:general-datatype}

The general HOL \texttt{datatype} definition is of the form
\[
\mathtt{datatype}~(\alpha@1, \dots, \alpha@n) \, t ~=~
C@1~\tau@{11}~\dots~\tau@{1k@1} ~\mid~ \dots ~\mid~
C@m~\tau@{m1}~\dots~\tau@{mk@m}
\]
where $\alpha@i$ are type variables (the parameters), $C@i$ are distinct
constructor names and $\tau@{ij}$ are types; it is customary to capitalize
the first letter in constructor names. There are a number of
restrictions (such as the requirement that the type be non-empty) detailed
elsewhere~\cite{Isa-Logics-Man}. Isabelle notifies you if you violate them.

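
To make the schema concrete, here are two instances. The type and
constructor names are hypothetical and chosen for illustration only; they
are not claimed to be part of HOL's library:
\begin{ttbox}
datatype ('a,'b) either = Left 'a | Right 'b
datatype 'a maybe       = Nothing | Just 'a
\end{ttbox}
In the first declaration $n = 2$ and $m = 2$: the constructors $C@1$ and
$C@2$ are \texttt{Left} and \texttt{Right}, each with one argument
($k@1 = k@2 = 1$). In the second, $n = 1$, so the parentheses around the
single parameter are dropped as in \texttt{'a~list}, and \texttt{Nothing} is
a constructor without arguments ($k@1 = 0$).
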
Laws about datatypes, such as \verb$[] ~= x#xs$ and \texttt{(x\#xs = y\#ys) =
(x=y \& xs=ys)}, are used automatically during proofs by simplification.
The same is true for the equations in primitive recursive function
definitions.
\subsection{Primitive recursion}

Functions on datatypes are usually defined by recursion. In fact, most of the
time they are defined by what is called \bfindex{primitive recursion}.
The keyword \texttt{primrec} is followed by a list of equations
\[ f \, x@1 \, \dots \, (C \, y@1 \, \dots \, y@k)\, \dots \, x@n = r \]
such that $C$ is a constructor of the datatype $t$ and all recursive calls of
$f$ in $r$ are of the form $f \, \dots \, y@i \, \dots$ for some $i$. Thus
Isabelle immediately sees that $f$ terminates because one (fixed!) argument
becomes smaller with every recursive call. There must be exactly one equation
for each constructor. Their order is immaterial.
A more general method for defining total recursive functions is explained in
\S\ref{sec:recdef}.
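
As a minimal sketch of this schema, consider a function that counts the
elements of a list. The name \texttt{len} is hypothetical and chosen so as
not to suggest any particular library function:
\begin{ttbox}
consts len :: 'a list => nat
primrec
"len []       = 0"
"len (x # xs) = Suc (len xs)"
\end{ttbox}
There is one equation for each of the two list constructors, and the only
recursive call, \texttt{len xs}, is applied to the constructor argument
\texttt{xs}. By contrast, an equation whose right-hand side calls
\texttt{len} on a term other than a constructor argument, for example
\texttt{len (x \# xs)} itself, does not fit the schema.
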
\begin{exercise}
Given the datatype of binary trees
\begin{ttbox}
\input{Misc/tree}\end{ttbox}
define a function \texttt{mirror} that mirrors the structure of a binary tree
by swapping subtrees (recursively). Prove \texttt{mirror(mirror(t)) = t}.
\end{exercise}
\subsection{\texttt{case}-expressions}
\label{sec:case-expressions}

HOL also features \ttindexbold{case}-expressions for analyzing elements of a
datatype. For example,
\begin{ttbox}
case xs of [] => 0 | y#ys => y
\end{ttbox}
evaluates to \texttt{0} if \texttt{xs} is \texttt{[]} and to \texttt{y} if
\texttt{xs} is \texttt{y\#ys}. (Since the result in both branches must be of
the same type, it follows that \texttt{y::nat} and hence
\texttt{xs::(nat)list}.)

In general, if $e$ is a term of the datatype $t$ defined in
\S\ref{sec:general-datatype} above, the corresponding
\texttt{case}-expression analyzing $e$ is
\[
\begin{array}{rrcl}
\mbox{\tt case}~e~\mbox{\tt of} & C@1~x@{11}~\dots~x@{1k@1} & \To & e@1 \\
\vdots \\
\mid & C@m~x@{m1}~\dots~x@{mk@m} & \To & e@m
\end{array}
\]

\begin{warn}
{\em All} constructors must be present, their order is fixed, and nested
patterns are not supported. Violating these restrictions results in strange
error messages.
\end{warn}
\noindent
Nested patterns can be simulated by nested \texttt{case}-expressions: instead
of
\begin{ttbox}
case xs of [] => 0 | [x] => x | x#(y#zs) => y
\end{ttbox}
write
\begin{ttbox}
case xs of [] => 0 | x#ys => (case ys of [] => x | y#zs => y)
\end{ttbox}
Note that \texttt{case}-expressions should be enclosed in parentheses to
indicate their scope.
\subsection{Structural induction}

Almost all the basic laws about a datatype are applied automatically during
simplification. Only induction is invoked by hand via \texttt{induct_tac},
which works for any datatype. In some cases, induction is overkill and a case
distinction over all constructors of the datatype suffices. This is performed
by \ttindexbold{exhaust_tac}. A trivial example:
\begin{ttbox}
\input{Misc/exhaust.ML}{\out 1. xs = [] ==> (case xs of [] => [] | y # ys => xs) = xs}
{\out 2. !!a list. xs = a # list ==> (case xs of [] => [] | y # ys => xs) = xs}
\input{Misc/autotac.ML}\end{ttbox}
Note that this particular case distinction could have been automated
completely. See~\S\ref{sec:SimpFeatures}.

\begin{warn}
  Induction is only allowed on a free variable that does not occur among
  the assumptions of the subgoal. Exhaustion works for arbitrary terms.
\end{warn}
\subsection{Case study: boolean expressions}
\label{sec:boolex}

The aim of this case study is twofold: it shows how to model boolean
expressions and some algorithms for manipulating them, and it demonstrates
the constructs introduced above.

\subsubsection{How can we model boolean expressions?}

We want to represent boolean expressions built up from variables and
constants by negation and conjunction. The following datatype serves exactly
that purpose:
\begin{ttbox}
\input{Ifexpr/boolex}\end{ttbox}
The two constants are represented by the terms \texttt{Const~True} and
\texttt{Const~False}. Variables are represented by terms of the form
\texttt{Var}~$n$, where $n$ is a natural number (type \texttt{nat}).
For example, the formula $P@0 \land \neg P@1$ is represented by the term
\texttt{And~(Var~0)~(Neg(Var~1))}.

\subsubsection{What is the value of boolean expressions?}

The value of a boolean expression depends on the values of its variables.
Hence the function \texttt{value} takes an additional parameter, an {\em
environment} of type \texttt{nat~=>~bool}, which maps variables to their
values:
\begin{ttbox}
\input{Ifexpr/value}\end{ttbox}
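
Since the two included files are not reproduced here, the following sketch
shows what the datatype and the evaluation function might look like. The
constructor names and the type of \texttt{value} are taken from the text;
the bound variable names and the use of \texttt{~} and \texttt{&} for
negation and conjunction are assumptions.
\begin{ttbox}
datatype boolex = Const bool | Var nat
                | Neg boolex | And boolex boolex

consts value :: boolex => (nat => bool) => bool
primrec
"value (Const b) env = b"
"value (Var x)   env = env x"
"value (Neg b)   env = (~ value b env)"
"value (And b c) env = (value b env & value c env)"
\end{ttbox}
With these equations, \texttt{value (And (Var 0) (Neg (Var 1))) env} unfolds
to \texttt{env 0 \& \~{}env 1}, i.e.\ the value of $P@0 \land \neg P@1$ under
the environment \texttt{env}.
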
\subsubsection{If-expressions}

An alternative and often more efficient (because in a certain sense
canonical) representation is given by so-called \textit{If-expressions\/}
built up from constants (\texttt{CIF}), variables (\texttt{VIF}) and
conditionals (\texttt{IF}):
\begin{ttbox}
\input{Ifexpr/ifex}\end{ttbox}
The evaluation of If-expressions proceeds as for \texttt{boolex}:
\begin{ttbox}
\input{Ifexpr/valif}\end{ttbox}
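
Again as a sketch only, the datatype and its evaluation function might read
as follows. The constructor names \texttt{CIF}, \texttt{VIF} and \texttt{IF}
come from the text; the name \texttt{valif} is inferred from the included
file name, and the argument order and variable names are assumptions.
\begin{ttbox}
datatype ifex = CIF bool | VIF nat | IF ifex ifex ifex

consts valif :: ifex => (nat => bool) => bool
primrec
"valif (CIF b)    env = b"
"valif (VIF x)    env = env x"
"valif (IF b t e) env = (if valif b env then valif t env else valif e env)"
\end{ttbox}
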
\subsubsection{Transformation into and of If-expressions}

The type \texttt{boolex} is close to the customary representation of logical
formulae, whereas \texttt{ifex} is designed for efficiency. Thus we need to
translate from \texttt{boolex} into \texttt{ifex}:
\begin{ttbox}
\input{Ifexpr/bool2if}\end{ttbox}
At last, we have something we can verify: that \texttt{bool2if} preserves the
value of its argument.
\begin{ttbox}
\input{Ifexpr/bool2if.ML}\end{ttbox}
The proof is canonical:
\begin{ttbox}
\input{Ifexpr/proof.ML}\end{ttbox}
In fact, all proofs in this case study look exactly like this. Hence we do
not show them below.

More interesting is the transformation of If-expressions into a normal form
where the first argument of \texttt{IF} cannot be another \texttt{IF} but
must be a constant or variable. Such a normal form can be computed by
repeatedly replacing a subterm of the form \texttt{IF~(IF~b~x~y)~z~u} by
\texttt{IF b (IF x z u) (IF y z u)}, which has the same value. The following
primitive recursive functions perform this task:
\begin{ttbox}
\input{Ifexpr/normif}
\input{Ifexpr/norm}\end{ttbox}
Their interplay is a bit tricky, and we leave it to the reader to develop an
intuitive understanding. Fortunately, Isabelle can help us to verify that the
transformation preserves the value of the expression:
\begin{ttbox}
\input{Ifexpr/norm.ML}\end{ttbox}
The proof is canonical, provided we first show the following lemma (which
also helps to understand what \texttt{normif} does) and make it available
for simplification via \texttt{Addsimps}:
\begin{ttbox}
\input{Ifexpr/normif.ML}\end{ttbox}

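
One plausible way to write the two functions is sketched below. It is an
assumption about the contents of the included files, reconstructed from the
rewrite step described above and using the constructor names \texttt{CIF},
\texttt{VIF} and \texttt{IF}; the actual definitions may differ in detail.
\begin{ttbox}
consts normif :: ifex => ifex => ifex => ifex
primrec
"normif (CIF b)    t e = IF (CIF b) t e"
"normif (VIF x)    t e = IF (VIF x) t e"
"normif (IF b t e) u f = normif b (normif t u f) (normif e u f)"

consts norm :: ifex => ifex
primrec
"norm (CIF b)    = CIF b"
"norm (VIF x)    = VIF x"
"norm (IF b t e) = normif b (norm t) (norm e)"
\end{ttbox}
Under these definitions, and assuming that \texttt{b}, \texttt{x},
\texttt{y}, \texttt{z} and \texttt{u} are constants or variables, the
rewrite step from the text is obtained by unfolding:
\begin{ttbox}
norm (IF (IF b x y) z u)
  = normif (IF b x y) (norm z) (norm u)
  = normif (IF b x y) z u
  = normif b (normif x z u) (normif y z u)
  = IF b (IF x z u) (IF y z u)
\end{ttbox}
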
But how can we be sure that \texttt{norm} really produces a normal form in
the above sense? We have to prove
\begin{ttbox}
\input{Ifexpr/normal_norm.ML}\end{ttbox}
where \texttt{normal} expresses that an If-expression is in normal form:
\begin{ttbox}
\input{Ifexpr/normal}\end{ttbox}
Of course, this requires a lemma about normality of \texttt{normif}
\begin{ttbox}
\input{Ifexpr/normal_normif.ML}\end{ttbox}
that has to be made available for simplification via \texttt{Addsimps}.

How does one come up with the required lemmas? Try to prove the main theorems
without them and study carefully what \texttt{Auto_tac} leaves unproved. This
should provide the clue.
The necessity of universal quantification (\texttt{!t e}) in the two lemmas
is explained in \S\ref{sec:InductionHeuristics}.

\begin{exercise}
  We strengthen the definition of a {\em normal\/} If-expression as follows:
  the first argument of all \texttt{IF}s must be a variable. Adapt the above
  development to this changed requirement. (Hint: you may need to formulate
  some of the goals as implications (\texttt{-->}) rather than equalities
  (\texttt{=}).)
\end{exercise}
\section{Some basic types}

\subsection{Natural numbers}

The type \ttindexbold{nat} of natural numbers is predefined and behaves like
\begin{ttbox}
datatype nat = 0 | Suc nat
\end{ttbox}
In particular, there are \texttt{case}-expressions, for example
\begin{ttbox}
case n of 0 => 0 | Suc m => m
\end{ttbox}
primitive recursion, for example
\begin{ttbox}
\input{Misc/natsum}\end{ttbox}
and induction, for example
\begin{ttbox}
\input{Misc/NatSum.ML}\ttbreak
{\out sum n + sum n = n * Suc n}
{\out No subgoals!}
\end{ttbox}

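
The included files are not reproduced here, so the following is a sketch of
a definition and proof script that would be consistent with the output
shown. The function name \texttt{sum} and the goal are taken from that
output; the defining equations and the use of \texttt{Goal} are assumptions.
\begin{ttbox}
consts sum :: nat => nat
primrec
"sum 0       = 0"
"sum (Suc n) = Suc n + sum n"
\end{ttbox}
\begin{ttbox}
Goal "sum n + sum n = n * Suc n";
by(induct_tac "n" 1);
by(Auto_tac);
\end{ttbox}
The goal states Gauss's formula $2 \cdot (0 + 1 + \dots + n) = n \cdot (n+1)$
without using division.
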
602 |
The usual arithmetic operations \ttindexbold{+}, \ttindexbold{-},
|
|
603 |
\ttindexbold{*}, \ttindexbold{div} and \ttindexbold{mod} are predefined, as
|
|
604 |
are the relations \ttindexbold{<=} and \ttindexbold{<}. There is even a least
|
|
605 |
number operation \ttindexbold{LEAST}. For example, \texttt{(LEAST n.$\,$1 <
|
|
606 |
n) = 2} (HOL does not prove this completely automatically).
|
|
607 |
|
|
608 |
\begin{warn}
|
|
609 |
The operations \ttindexbold{+}, \ttindexbold{-}, \ttindexbold{*} are
|
|
610 |
overloaded, i.e.\ they are available not just for natural numbers but at
|
|
611 |
other types as well (see \S\ref{sec:TypeClasses}). For example, given
|
|
612 |
the goal \texttt{x+y = y+x}, there is nothing to indicate that you are
|
|
613 |
talking about natural numbers. Hence Isabelle can only infer that
|
|
614 |
\texttt{x} and \texttt{y} are of some arbitrary type where \texttt{+} is
|
|
615 |
declared. As a consequence, you will be unable to prove the goal (although
|
|
616 |
it may take you some time to realize what has happened if
|
|
617 |
\texttt{show_types} is not set). In this particular example, you need to
|
|
618 |
include an explicit type constraint, for example \texttt{x+y = y+(x::nat)}.
|
|
619 |
If there is enough contextual information this may not be necessary:
|
|
620 |
\texttt{x+0 = x} automatically implies \texttt{x::nat}.
|
|
621 |
\end{warn}
|
|
622 |
|
|
623 |
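The following lines sketch how the remedy looks at the ML prompt. They assume
the usual top-level commands \texttt{set} and \texttt{Goal}; the goal itself is
the one from the warning above.
\begin{ttbox}
set show_types;              (* print the types Isabelle has inferred *)
Goal "x+y = y+(x::nat)";     (* the constraint fixes x, and hence y, to nat *)
\end{ttbox}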
|
\subsection{Products}

HOL also has pairs: \texttt{($a@1$,$a@2$)} is of type \texttt{$\tau@1$ *
$\tau@2$} provided each $a@i$ is of type $\tau@i$. The components of a pair
are extracted by \texttt{fst} and \texttt{snd}:
\texttt{fst($x$,$y$) = $x$} and \texttt{snd($x$,$y$) = $y$}. Tuples
are simulated by pairs nested to the right:
\texttt{($a@1$,$a@2$,$a@3$)} and \texttt{$\tau@1$ * $\tau@2$ * $\tau@3$}
stand for \texttt{($a@1$,($a@2$,$a@3$))} and \texttt{$\tau@1$ * ($\tau@2$ *
$\tau@3$)}. Therefore \texttt{fst(snd($a@1$,$a@2$,$a@3$)) = $a@2$}.
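The projection laws above explain this last equation step by step. As a small
worked sketch (with \texttt{a}, \texttt{b} and \texttt{c} standing for arbitrary
terms), the nesting is first made explicit and then \texttt{snd} and
\texttt{fst} are applied in turn:
\begin{ttbox}
fst(snd(a,b,c)) = fst(snd(a,(b,c))) \(\leadsto\) fst(b,c) \(\leadsto\) b
snd(snd(a,b,c)) = snd(snd(a,(b,c))) \(\leadsto\) snd(b,c) \(\leadsto\) c
\end{ttbox}
Note that \texttt{fst(a,b,c)} is \texttt{a}, but \texttt{snd(a,b,c)} is the pair
\texttt{(b,c)}, not \texttt{c}.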
|
It is possible to use (nested) tuples as patterns in abstractions, for
example \texttt{\%(x,y,z).x+y+z} and \texttt{\%((x,y),z).x+y+z}.

In addition to explicit $\lambda$-abstractions, tuple patterns can be used in
most variable binding constructs. Typical examples are
\begin{ttbox}
let (x,y) = f z in (y,x)

case xs of [] => 0 | (x,y)\#zs => x+y
\end{ttbox}
Further important examples are quantifiers and sets.
|
\begin{warn}
Abstraction over pairs and tuples is merely a convenient shorthand for a more
complex internal representation. Thus the internal and external form of a
term may differ, which can affect proofs. If you want to avoid this
complication, use \texttt{fst} and \texttt{snd}, i.e.\ write
\texttt{\%p.~fst p + snd p} instead of \texttt{\%(x,y).~x + y}.
See~\S\ref{} for theorem proving with tuple patterns.
\end{warn}
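To give an idea of what the internal form looks like: in Isabelle/HOL the
pattern abstraction is, as far as we are aware, expressed with a constant
\texttt{split} that turns a curried function into one on pairs. The following
sketch rests on that assumption; it is not spelled out in this section.
\begin{ttbox}
\%(x,y). x + y                 (* external form *)
split (\%x y. x + y)           (* internal form *)
split (\%x y. x + y) (a,b) \(\leadsto\) a + b
\end{ttbox}
A goal that mentions \texttt{\%(x,y).~x + y} therefore really contains
\texttt{split}, which is why the variant with \texttt{fst} and \texttt{snd} can
be easier to reason about.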
|
\section{Definitions}
\label{sec:Definitions}

A definition is simply an abbreviation, i.e.\ a new name for an existing
construction. In particular, definitions cannot be recursive. Isabelle offers
definitions on the level of types and terms. Those on the type level are
called type synonyms, those on the term level are called (constant)
definitions.

\subsection{Type synonyms}
\indexbold{type synonyms}

Type synonyms are similar to those found in ML. Their syntax is fairly
self-explanatory:
\begin{ttbox}
\input{Misc/types}\end{ttbox}\indexbold{*types}
The synonym \texttt{alist} shows that in general the type on the right-hand
side needs to be enclosed in double quotation marks
(see the end of~\S\ref{sec:intro-theory}).

Internally all synonyms are fully expanded. As a consequence Isabelle's
output never contains synonyms. Their main purpose is to improve the
readability of theory definitions. Synonyms can be used just like any other
type:
\begin{ttbox}
\input{Misc/consts}\end{ttbox}
|
\subsection{Constant definitions}
\label{sec:ConstDefinitions}

The above constants \texttt{nand} and \texttt{exor} are non-recursive and can
therefore be defined directly by
\begin{ttbox}
\input{Misc/defs}\end{ttbox}\indexbold{*defs}
where \texttt{defs} is a keyword and \texttt{nand_def} and \texttt{exor_def}
are arbitrary user-supplied names.
The symbol \texttt{==}\index{==>@{\tt==}|bold} is a special form of equality
that should only be used in constant definitions.
Declarations and definitions can also be merged
\begin{ttbox}
\input{Misc/constdefs}\end{ttbox}\indexbold{*constdefs}
in which case the default name of each definition is $f$\texttt{_def}, where
$f$ is the name of the defined constant.

Note that pattern-matching is not allowed, i.e.\ each definition must be of
the form $f\,x@1\,\dots\,x@n$\texttt{~==~}$t$.

Section~\ref{sec:Simplification} explains how definitions are used
in proofs.
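As an illustration of the restriction on pattern-matching, consider a
hypothetical predicate \texttt{iszero} on natural numbers (the name is made up
for this example). A definition by cases on the argument is not of the required
form, whereas a single equation with a variable argument is:
\begin{ttbox}
"iszero 0 == True"           (* not allowed: the argument 0 is a pattern *)
"iszero n == (n = 0)"        (* allowed: of the form f x == t *)
\end{ttbox}
Functions that genuinely need pattern-matching are better defined with
\texttt{primrec}.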
|
\begin{warn}
A common mistake when writing definitions is to introduce extra free variables
on the right-hand side as in the following fictitious definition:
\begin{ttbox}
defs prime_def "prime(p) == (m divides p) --> (m=1 | m=p)"
\end{ttbox}
Isabelle rejects this `definition' because of the extra {\tt m} on the
right-hand side, which would introduce an inconsistency. What you should have
written is
\begin{ttbox}
defs prime_def "prime(p) == !m. (m divides p) --> (m=1 | m=p)"
\end{ttbox}
\end{warn}
|
\chapter{More Functional Programming}

The purpose of this chapter is to deepen the reader's understanding of the
concepts encountered so far and to introduce an advanced method for defining
recursive functions. The first two sections give a structured presentation of
theorem proving by simplification (\S\ref{sec:Simplification}) and
discuss important heuristics for induction (\S\ref{sec:InductionHeuristics}). They
can be skipped by readers less interested in proofs. They are followed by a
case study, a compiler for expressions (\S\ref{sec:ExprCompiler}).
Finally we present a very general method for defining recursive functions
that goes well beyond what \texttt{primrec} allows (\S\ref{sec:recdef}).
|
\section{Simplification}
\label{sec:Simplification}

So far we have proved our theorems by \texttt{Auto_tac}, which
`simplifies' all subgoals. In fact, \texttt{Auto_tac} can do much more than
that; it simply has not needed to so far. However, when you go beyond toy
examples, you need to understand the ingredients of \texttt{Auto_tac}.
This section covers the tactic that \texttt{Auto_tac} always applies first,
namely simplification.

Simplification is one of the central theorem proving tools in Isabelle and
many other systems. The tool itself is called the \bfindex{simplifier}. The
purpose of this section is twofold: to introduce the many features of the
simplifier (\S\ref{sec:SimpFeatures}) and to explain a little bit how the
simplifier works (\S\ref{sec:SimpHow}). Anybody intending to use HOL should
read \S\ref{sec:SimpFeatures}, and the serious student should read
\S\ref{sec:SimpHow} as well in order to understand what has happened when
things do not simplify as expected.
|
\subsection{Using the simplifier}
\label{sec:SimpFeatures}

In its most basic form, simplification means repeated application of
equations from left to right. For example, taking the rules for \texttt{\at}
and applying them to the term \texttt{[0,1] \at\ []} results in a sequence of
simplification steps:
\begin{ttbox}\makeatother
(0#1#[]) @ [] \(\leadsto\) 0#((1#[]) @ []) \(\leadsto\) 0#(1#([] @ [])) \(\leadsto\) 0#1#[]
\end{ttbox}
This is also known as {\em term rewriting} and the equations are referred
to as {\em rewrite rules}. This is more honest than `simplification' because
the terms do not necessarily become simpler in the process.
|
\subsubsection{Simpsets}

To facilitate simplification, each theory has an associated set of
simplification rules, known as a \bfindex{simpset}. Within a theory,
proofs by simplification refer to the associated simpset by default.
The simpset of a theory is built up as follows: starting with the union of
the simpsets of the parent theories, each occurrence of a \texttt{datatype}
or \texttt{primrec} construct augments the simpset. Explicit definitions are
not added automatically. Users can add new theorems via \texttt{Addsimps} and
delete them again later by \texttt{Delsimps}.

You may augment a simpset not just by equations but by pretty much any
theorem. The simplifier will try to make sense of it. For example, a theorem
\verb$~$$P$ is automatically turned into \texttt{$P$ = False}. The details are
explained in \S\ref{sec:SimpHow}.

As a rule of thumb, rewrite rules that really simplify a term (like
\texttt{xs \at\ [] = xs} and \texttt{rev(rev xs) = xs}) should be added to the
current simpset right after they have been proved. Those of a more specific
nature (e.g.\ distributivity laws, which alter the structure of terms
considerably) should only be added for specific proofs and deleted again
afterwards. Conversely, it may also happen that a generally useful rule
needs to be removed for a certain proof and is added again afterwards. A
need for frequent temporary additions or deletions may indicate a badly
designed simpset.
\begin{warn}
Simplification may not terminate, for example if both $f(x) = g(x)$ and
$g(x) = f(x)$ are in the simpset. It is the user's responsibility not to
include rules that can lead to nontermination, either on their own or in
combination with other rules.
\end{warn}
|
\subsubsection{Simplification tactics}

There are four main simplification tactics:
\begin{ttdescription}
\item[\ttindexbold{Simp_tac} $i$] simplifies the conclusion of subgoal~$i$
using the theory's simpset. It may solve the subgoal completely if it has
become trivial. For example:
\begin{ttbox}\makeatother
{\out 1. [] @ [] = []}
by(Simp_tac 1);
{\out No subgoals!}
\end{ttbox}

\item[\ttindexbold{Asm_simp_tac}]
is like \verb$Simp_tac$, but extracts additional rewrite rules from
the assumptions of the subgoal. For example, it solves
\begin{ttbox}\makeatother
{\out 1. xs = [] ==> xs @ xs = xs}
\end{ttbox}
which \texttt{Simp_tac} does not do.

\item[\ttindexbold{Full_simp_tac}] is like \verb$Simp_tac$, but also
simplifies the assumptions (without using the assumptions to
simplify each other or the actual goal).

\item[\ttindexbold{Asm_full_simp_tac}] is like \verb$Asm_simp_tac$,
but also simplifies the assumptions. In particular, assumptions can
simplify each other. For example:
\begin{ttbox}\makeatother
{\out 1. [| xs @ zs = ys @ xs; [] @ xs = [] @ [] |] ==> ys = zs}
by(Asm_full_simp_tac 1);
{\out No subgoals!}
\end{ttbox}
The second assumption simplifies to \texttt{xs = []}, which in turn
simplifies the first assumption to \texttt{zs = ys}, thus reducing the
conclusion to \texttt{ys = ys} and hence to \texttt{True}.
(See also the paragraph on tracing below.)
\end{ttdescription}
\texttt{Asm_full_simp_tac} is the most powerful of this quartet of
tactics. In fact, \texttt{Auto_tac} starts by applying
\texttt{Asm_full_simp_tac} to all subgoals. The only reason for the existence
of the other three tactics is that sometimes one wants to limit the amount of
simplification, for example to avoid nontermination:
\begin{ttbox}\makeatother
{\out 1. ! x. f x = g (f (g x)) ==> f [] = f [] @ []}
\end{ttbox}
is solved by \texttt{Simp_tac}, but \texttt{Asm_simp_tac} and
\texttt{Asm_full_simp_tac} loop because the rewrite rule \texttt{f x = g(f(g
x))} extracted from the assumption does not terminate. Isabelle notices
certain simple forms of nontermination, but not this one.
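To see why the extracted rule loops, it helps to unfold it by hand on the term
\texttt{f []}. This is only a sketch of the rewriting steps, not actual
simplifier output: each application of \texttt{f x = g(f(g x))} creates a new
instance of the left-hand side, so the process never stops.
\begin{ttbox}
f [] \(\leadsto\) g(f(g [])) \(\leadsto\) g(g(f(g(g [])))) \(\leadsto\) \(\dots\)
\end{ttbox}
\texttt{Simp_tac} is not affected because it ignores the assumption; presumably
it only has to rewrite \texttt{f [] \at\ []} to \texttt{f []} in the conclusion.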
|
\subsubsection{Modifying simpsets locally}

If a certain theorem is merely needed in one proof by simplification, the
pattern
\begin{ttbox}
Addsimps [\(rare_theorem\)];
by(Simp_tac 1);
Delsimps [\(rare_theorem\)];
\end{ttbox}
is awkward. Therefore there are lower-case versions of the simplification
tactics (\ttindexbold{simp_tac}, \ttindexbold{asm_simp_tac},
\ttindexbold{full_simp_tac}, \ttindexbold{asm_full_simp_tac}) and of the
simpset modifiers (\ttindexbold{addsimps}, \ttindexbold{delsimps})
that do not access or modify the implicit simpset but explicitly take a
simpset as an argument. For example, the above three lines become
\begin{ttbox}
by(simp_tac (simpset() addsimps [\(rare_theorem\)]) 1);
\end{ttbox}
where the result of the function call \texttt{simpset()} is the simpset of
the current theory and \texttt{addsimps} is an infix function. The implicit
simpset is read once but not modified.
This is far preferable to pairs of \texttt{Addsimps} and \texttt{Delsimps}.
Local modifications can be stacked as in
\begin{ttbox}
by(simp_tac (simpset() addsimps [\(rare_theorem\)] delsimps [\(some_thm\)]) 1);
\end{ttbox}
|
\subsubsection{Rewriting with definitions}

Constant definitions (\S\ref{sec:ConstDefinitions}) are not automatically
included in the simpset of a theory. Hence such definitions are not expanded
automatically either, just as it should be: definitions are introduced for
the purpose of abbreviating complex concepts. Of course we need to expand the
definitions initially to derive enough lemmas that characterize the concept
sufficiently for us to forget the original definition completely. For
example, given the theory
\begin{ttbox}
\input{Misc/Exor.thy}\end{ttbox}
we may want to prove \verb$exor A (~A)$. Instead of \texttt{Goal} we use
\begin{ttbox}
\input{Misc/exorgoal.ML}\end{ttbox}
which tells Isabelle to expand the definition of \texttt{exor} in the initial
goal (the first argument of \texttt{Goalw} can be a list of definitions):
\begin{ttbox}
{\out exor A (~ A)}
{\out 1. A & ~ ~ A | ~ A & ~ A}
\end{ttbox}
In this simple example, the goal is proved by \texttt{Simp_tac}.
Of course the resulting theorem is insufficient to characterize \texttt{exor}
completely.

In case we want to expand a definition in the middle of a proof, we can
simply add the definition locally to the simpset:
\begin{ttbox}
\input{Misc/exorproof.ML}\end{ttbox}
You should normally not add the definition permanently to the simpset
using \texttt{Addsimps} because this defeats the whole purpose of an
abbreviation.

\begin{warn}
If you have defined $f\,x\,y$\texttt{~==~}$t$ then you can only expand
occurrences of $f$ with at least two arguments. Thus it is safer to define
$f$\texttt{~==~\%$x\,y$.}$\;t$.
\end{warn}
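The difference between the two styles can be sketched with a hypothetical
constant \texttt{twice} (the name and its definition are invented for this
illustration and do not occur elsewhere in the text):
\begin{ttbox}
"twice f x == f(f x)"        (* unfolds twice g a, but not twice g on its own *)
"twice == \%f x. f(f x)"      (* unfolds every occurrence of twice *)
\end{ttbox}
With the first form, a term such as \texttt{map (twice g) xs} cannot be
expanded because \texttt{twice} is applied to only one argument there.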
|
\subsubsection{Simplifying \texttt{let}-expressions}

Proving a goal containing \ttindex{let}-expressions invariably requires the
\texttt{let}-constructs to be expanded at some point. Since
\texttt{let}-\texttt{in} is just syntactic sugar for a defined constant
(called \texttt{Let}), expanding \texttt{let}-constructs means rewriting with
\texttt{Let_def}:
%context List.thy;
%Goal "(let xs = [] in xs@xs) = ys";
\begin{ttbox}\makeatother
{\out 1. (let xs = [] in xs @ xs) = ys}
by(simp_tac (simpset() addsimps [Let_def]) 1);
{\out 1. [] = ys}
\end{ttbox}
If, in a particular context, there is no danger of a combinatorial explosion
of nested \texttt{let}s one could even add \texttt{Let_def} permanently via
\texttt{Addsimps}.
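The effect of \texttt{Let_def} can be traced by hand. The sketch below assumes
that \texttt{let x = s in t} abbreviates \texttt{Let s (\%x.~t)} and that
\texttt{Let_def} is the equation \texttt{Let s f == f s}; neither is stated
explicitly in this section.
\begin{ttbox}\makeatother
let xs = [] in xs @ xs  =  Let [] (\%xs. xs @ xs)
                        \(\leadsto\)  (\%xs. xs @ xs) []
                        \(\leadsto\)  [] @ []
                        \(\leadsto\)  []
\end{ttbox}
Every occurrence of the bound variable is replaced by a copy of \texttt{s},
which is where the possible blow-up for nested \texttt{let}s comes from.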
|
\subsubsection{Conditional equations}

So far all examples of rewrite rules were equations. The simplifier also
accepts {\em conditional\/} equations, for example
\begin{ttbox}
xs ~= [] ==> hd xs # tl xs = xs \hfill \((*)\)
\end{ttbox}
(which is proved by \texttt{exhaust_tac} on \texttt{xs} followed by
\texttt{Asm_full_simp_tac} twice). Assuming that this theorem and
%\begin{ttbox}\makeatother
\texttt{(rev xs = []) = (xs = [])}
%\end{ttbox}
are both part of the simpset, the subgoal
\begin{ttbox}\makeatother
{\out 1. xs ~= [] ==> hd(rev xs) # tl(rev xs) = rev xs}
\end{ttbox}
is proved by simplification:
the conditional equation $(*)$ above
can simplify \texttt{hd(rev~xs)~\#~tl(rev~xs)} to \texttt{rev xs}
because the corresponding precondition \verb$rev xs ~= []$ simplifies to
\verb$xs ~= []$, which is exactly the local assumption of the subgoal.
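For concreteness, the proof of $(*)$ described above might be typed in as
follows. This is a sketch: the goal statement and the tactic sequence are taken
from the text, while the name \texttt{hd_Cons_tl} given to \texttt{qed} is an
arbitrary choice made here.
\begin{ttbox}
Goal "xs ~= [] ==> hd xs # tl xs = xs";
by(exhaust_tac "xs" 1);
by(Asm_full_simp_tac 1);
by(Asm_full_simp_tac 1);
qed "hd_Cons_tl";
\end{ttbox}
The case distinction produces one subgoal for \texttt{xs = []}, which
contradicts the assumption, and one for \texttt{xs = a \# list}, which
simplification with the rules for \texttt{hd} and \texttt{tl} settles.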
|
\subsubsection{Automatic case splits}

Goals containing \ttindex{if}-expressions are usually proved by case
distinction on the condition of the \texttt{if}. For example the goal
\begin{ttbox}
{\out 1. ! xs. if xs = [] then rev xs = [] else rev xs ~= []}
\end{ttbox}
can be split into
\begin{ttbox}
{\out 1. ! xs. (xs = [] --> rev xs = []) \& (xs ~= [] --> rev xs ~= [])}
\end{ttbox}
by typing
\begin{ttbox}
\input{Misc/splitif.ML}\end{ttbox}
Because this is almost always the right proof strategy, the simplifier
performs case-splitting on \texttt{if}s automatically. Try \texttt{Simp_tac}
on the initial goal above.

This splitting idea generalizes from \texttt{if} to \ttindex{case}:
\begin{ttbox}\makeatother
{\out 1. (case xs of [] => zs | y#ys => y#(ys@zs)) = xs@zs}
\end{ttbox}
becomes
\begin{ttbox}\makeatother
{\out 1. (xs = [] --> zs = xs @ zs) &}
{\out (! a list. xs = a # list --> a # list @ zs = xs @ zs)}
\end{ttbox}
by typing
\begin{ttbox}
\input{Misc/splitlist.ML}\end{ttbox}
In contrast to \texttt{if}-expressions, the simplifier does not split
\texttt{case}-expressions by default because this can lead to nontermination
in case of recursive datatypes.
Nevertheless the simplifier can be instructed to perform \texttt{case}-splits
by adding the appropriate rule to the simpset:
\begin{ttbox}
by(simp_tac (simpset() addsplits [split_list_case]) 1);
\end{ttbox}\indexbold{*addsplits}
solves the initial goal outright, which \texttt{Simp_tac} alone will not do.

In general, every datatype $t$ comes with a rule
\texttt{$t$.split} that can be added to the simpset either
locally via \texttt{addsplits} (see above), or permanently via
\begin{ttbox}
Addsplits [\(t\).split];
\end{ttbox}\indexbold{*Addsplits}
Split-rules can be removed globally via \ttindexbold{Delsplits} and locally
via \ttindexbold{delsplits} as, for example, in
\begin{ttbox}
by(simp_tac (simpset() addsimps [\(\dots\)] delsplits [split_if]) 1);
\end{ttbox}
|
\subsubsection{Permutative rewrite rules}

A rewrite rule is {\bf permutative} if the left-hand side and right-hand side
are the same up to renaming of variables. The most common permutative rule
is commutativity: $x+y = y+x$. Another example is $(x-y)-z = (x-z)-y$. Such
rules are problematic because once they apply, they can be used forever.
The simplifier is aware of this danger and treats permutative rules
separately. For details see~\cite{Isa-Ref-Man}.
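The danger is easy to see by applying commutativity by hand. In this sketch
\texttt{a} and \texttt{b} are arbitrary terms; the rule matches its own result,
so unrestricted rewriting would cycle:
\begin{ttbox}
a + b \(\leadsto\) b + a \(\leadsto\) a + b \(\leadsto\) \(\dots\)
\end{ttbox}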
|
\subsubsection{Tracing}
\indexbold{tracing the simplifier}

Using the simplifier effectively may take a bit of experimentation. Set the
\verb$trace_simp$ flag to get a better idea of what is going on:
\begin{ttbox}\makeatother
{\out 1. rev [x] = []}
\ttbreak
set trace_simp;
by(Simp_tac 1);
\ttbreak\makeatother
{\out Applying instance of rewrite rule:}
{\out rev (?x # ?xs) == rev ?xs @ [?x]}
{\out Rewriting:}
{\out rev [x] == rev [] @ [x]}
\ttbreak
{\out Applying instance of rewrite rule:}
{\out rev [] == []}
{\out Rewriting:}
{\out rev [] == []}
\ttbreak\makeatother
{\out Applying instance of rewrite rule:}
{\out [] @ ?y == ?y}
{\out Rewriting:}
{\out [] @ [x] == [x]}
\ttbreak
{\out Applying instance of rewrite rule:}
{\out ?x # ?t = ?t == False}
{\out Rewriting:}
{\out [x] = [] == False}
\ttbreak
{\out Level 1}
{\out rev [x] = []}
{\out 1. False}
\end{ttbox}
In more complicated cases, the trace can be enormous, especially since
invocations of the simplifier are often nested (e.g.\ when solving conditions
of rewrite rules).
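Since the trace quickly becomes large, it is advisable to switch the flag off
again once the problem has been understood. A minimal sketch, assuming the
standard ML function \texttt{reset} as the counterpart of \texttt{set}:
\begin{ttbox}
set trace_simp;      (* start tracing *)
by(Simp_tac 1);
reset trace_simp;    (* stop tracing *)
\end{ttbox}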
|
\subsection{How it works}
\label{sec:SimpHow}

\subsubsection{Higher-order patterns}

\subsubsection{Local assumptions}

\subsubsection{The preprocessor}

\section{Induction heuristics}
\label{sec:InductionHeuristics}

The purpose of this section is to illustrate some simple heuristics for
inductive proofs. The first one we have already mentioned in our initial
example:
\begin{quote}
{\em 1. Theorems about recursive functions are proved by induction.}
\end{quote}
In case the function has more than one argument
\begin{quote}
{\em 2. Do induction on argument number $i$ if the function is defined by
recursion in argument number $i$.}
\end{quote}
When we look at the proof of
\begin{ttbox}\makeatother
(xs @ ys) @ zs = xs @ (ys @ zs)
\end{ttbox}
in \S\ref{sec:intro-proof} we find (a) \texttt{\at} is recursive in the first
argument, (b) \texttt{xs} occurs only as the first argument of \texttt{\at},
and (c) both \texttt{ys} and \texttt{zs} occur at least once as the second
argument of \texttt{\at}. Hence it is natural to perform induction on
\texttt{xs}.
|
The key heuristic, and the main point of this section, is to
generalize the goal before induction. The reason is simple: if the goal is
too specific, the induction hypothesis is too weak to allow the induction
step to go through. Let us now illustrate the idea with an example.

We define a tail-recursive version of list-reversal,
i.e.\ one that can be compiled into a loop:
\begin{ttbox}
\input{Misc/Itrev.thy}\end{ttbox}
The behaviour of \texttt{itrev} is simple: it reverses its first argument by
stacking its elements onto the second argument, and returning that second
argument when the first one becomes empty.
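A small hand evaluation illustrates this. It assumes the two defining equations
\texttt{itrev [] ys = ys} and \texttt{itrev (x\#xs) ys = itrev xs (x\#ys)}, which
is how the description above reads; the included theory file is authoritative.
\begin{ttbox}
itrev [1,2,3] [] \(\leadsto\) itrev [2,3] [1] \(\leadsto\) itrev [3] [2,1]
                 \(\leadsto\) itrev [] [3,2,1] \(\leadsto\) [3,2,1]
\end{ttbox}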
We need to show that \texttt{itrev} does indeed reverse its first argument
provided the second one is empty:
\begin{ttbox}
\input{Misc/itrev1.ML}\end{ttbox}
There is no choice as to the induction variable, and we immediately simplify:
\begin{ttbox}
\input{Misc/induct_auto.ML}\ttbreak\makeatother
{\out 1. !!a list. itrev list [] = rev list\(\;\)==> itrev list [a] = rev list @ [a]}
\end{ttbox}
Just as predicted above, the overall goal, and hence the induction
hypothesis, is too weak to solve the induction step because of the fixed
\texttt{[]}. The corresponding heuristic:
\begin{quote}
{\em 3. Generalize goals for induction by replacing constants by variables.}
\end{quote}
Of course one cannot do this na\"{\i}vely: \texttt{itrev xs ys = rev xs} is
just not true; the correct generalization is
\begin{ttbox}\makeatother
\input{Misc/itrev2.ML}\end{ttbox}
If \texttt{ys} is replaced by \texttt{[]}, the right-hand side simplifies to
\texttt{rev xs}, just as required.

In this particular instance it is easy to guess the right generalization,
but in more complex situations a good deal of creativity is needed. This is
the main source of complications in inductive proofs.

Although we now have two variables, only \texttt{xs} is suitable for
induction, and we repeat our above proof attempt. Unfortunately, we are still
not there:
\begin{ttbox}\makeatother
{\out 1. !!a list.}
{\out itrev list ys = rev list @ ys}
{\out ==> itrev list (a # ys) = rev list @ a # ys}
\end{ttbox}
The induction hypothesis is still too weak, but this time it takes no
intuition to generalize: the problem is that \texttt{ys} is fixed throughout
the subgoal, but the induction hypothesis needs to be applied with
\texttt{a \# ys} instead of \texttt{ys}. Hence we prove the theorem
for all \texttt{ys} instead of a fixed one:
\begin{ttbox}\makeatother
\input{Misc/itrev3.ML}\end{ttbox}
This time induction on \texttt{xs} followed by simplification succeeds. This
leads to another heuristic for generalization:
\begin{quote}
{\em 4. Generalize goals for induction by universally quantifying all free
variables {\em(except the induction variable itself!)}.}
\end{quote}
This prevents trivial failures like the above and does not change the
provability of the goal. Because it is not always required, and may even
complicate matters in some cases, this heuristic is often not
applied blindly.
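It is instructive to check by hand why the quantified hypothesis suffices. The
following sketch of the induction step assumes the defining equations
\texttt{itrev (x\#xs) ys = itrev xs (x\#ys)} and
\texttt{rev (x\#xs) = rev xs \at\ [x]} together with associativity of
\texttt{\at}; the induction hypothesis is
\texttt{!ys. itrev list ys = rev list \at\ ys}.
\begin{ttbox}\makeatother
itrev (a#list) ys = itrev list (a#ys)        (* definition of itrev *)
                  = rev list @ (a#ys)        (* hypothesis with a#ys for ys *)
                  = (rev list @ [a]) @ ys    (* associativity, definition of @ *)
                  = rev (a#list) @ ys        (* definition of rev *)
\end{ttbox}
The second step is exactly the one that fails when \texttt{ys} is fixed.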
|
A final point worth mentioning is the orientation of the equation we just
proved: the more complex notion (\texttt{itrev}) is on the left-hand
side, the simpler \texttt{rev} on the right-hand side. This constitutes
another, albeit weak heuristic that is not restricted to induction:
\begin{quote}
{\em 5. The right-hand side of an equation should (in some sense) be
simpler than the left-hand side.}
\end{quote}
The heuristic is tricky to apply because it is not obvious that
\texttt{rev xs \at\ ys} is simpler than \texttt{itrev xs ys}. But see what
happens if you try to prove the symmetric equation!
|
\section{Case study: compiling expressions}
\label{sec:ExprCompiler}

The task is to develop a compiler from a generic type of expressions (built
up from variables, constants and binary operations) to a stack machine. This
generic type of expressions is a generalization of the boolean expressions in
\S\ref{sec:boolex}. This time we do not commit ourselves to a particular
type of variables or values but make them type parameters. Neither is there
a fixed set of binary operations: instead the expression contains the
appropriate function itself.
\begin{ttbox}
\input{CodeGen/expr}\end{ttbox}
The three constructors represent constants, variables and the combination of
two subexpressions with a binary operation.

The value of an expression w.r.t.\ an environment that maps variables to
values is easily defined:
\begin{ttbox}
\input{CodeGen/value}\end{ttbox}

The stack machine has three instructions: load a constant value onto the
stack, load the contents of a certain address onto the stack, and apply a
binary operation to the two topmost elements of the stack, replacing them by
the result. As for \texttt{expr}, addresses and values are type parameters:
\begin{ttbox}
\input{CodeGen/instr}\end{ttbox}

The execution of the stack machine is modelled by a function \texttt{exec}
that takes a store (modelled as a function from addresses to values, just
like the environment for evaluating expressions), a stack (modelled as a
list) of values and a list of instructions and returns the stack at the end
of the execution; the store remains unchanged:
\begin{ttbox}
\input{CodeGen/exec}\end{ttbox}
Recall that \texttt{hd} and \texttt{tl}
return the first element and the remainder of a list.

Because all functions are total, \texttt{hd} is defined even for the empty
list, although we do not know what the result is. Thus our model of the
machine always terminates properly, although the above definition does not
tell us much about the result in situations where \texttt{Apply} was executed
with fewer than two elements on the stack.

The compiler is a function from expressions to a list of instructions. Its
definition is pretty much obvious:
\begin{ttbox}\makeatother
\input{CodeGen/comp}\end{ttbox}
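To make the interplay of \texttt{comp} and \texttt{exec} concrete, here is a
hand-worked example. Since the definitions are in included files, the sketch
fixes some assumptions of its own: the constructor names \texttt{Cex},
\texttt{Vex} and \texttt{Bex} for expressions and \texttt{Const} and
\texttt{Load} for instructions are hypothetical, \texttt{comp} is assumed to
emit the code for the second operand first, and \texttt{Apply f} is assumed to
replace the two topmost stack elements \texttt{x} and \texttt{y} by
\texttt{f x y}.
\begin{ttbox}
comp (Bex f (Cex 1) (Vex a)) = [Load a, Const 1, Apply f]
\ttbreak
exec s [] [Load a, Const 1, Apply f]
  \(\leadsto\) exec s [s a] [Const 1, Apply f]
  \(\leadsto\) exec s [1, s a] [Apply f]
  \(\leadsto\) exec s [f 1 (s a)] []
  \(\leadsto\) [f 1 (s a)]
\end{ttbox}
Under these assumptions the result agrees with the value of the expression in
the environment \texttt{s}, namely \texttt{f 1 (s a)}, as the correctness
statement below requires.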
|
|
1220 |
Now we have to prove the correctness of the compiler, i.e.\ that the
execution of a compiled expression results in the value of the expression:
\begin{ttbox}
exec s [] (comp e) = [value s e]
\end{ttbox}
This is generalized to
\begin{ttbox}
\input{CodeGen/goal2.ML}\end{ttbox}
and proved by induction on \texttt{e} followed by simplification, once we
have the following lemma about executing the concatenation of two instruction
sequences:
\begin{ttbox}\makeatother
\input{CodeGen/goal2.ML}\end{ttbox}
This requires induction on \texttt{xs} and ordinary simplification for the
base case. In the induction step, simplification leaves us with a formula
that contains two \texttt{case}-expressions over instructions. Thus we add
automatic case splitting as well, which finishes the proof:
\begin{ttbox}
\input{CodeGen/simpsplit.ML}\end{ttbox}
|
We could now go back and prove \texttt{exec s [] (comp e) = [value s e]}
merely by simplification with the generalized version we just proved.
However, this is unnecessary because the generalized version fully subsumes
its instance.
|
\section{Total recursive functions}
\label{sec:recdef}
\index{*recdef|(}
|
|
Although many total functions have a natural primitive recursive definition,
this is not always the case. Arbitrary total recursive functions can be
defined by means of \texttt{recdef}: you can use full pattern-matching,
recursion need not involve datatypes, and termination is proved by showing
that each recursive call makes the argument smaller in a suitable
(user-supplied) sense.
|
\subsection{Defining recursive functions}
|
Here is a simple example, the Fibonacci function:
\begin{ttbox}
consts fib :: "nat => nat"
recdef fib "measure(\%n. n)"
    "fib 0 = 0"
    "fib 1 = 1"
    "fib (Suc(Suc x)) = fib x + fib (Suc x)"
\end{ttbox}
The definition of \texttt{fib} is accompanied by a \bfindex{measure function}
\texttt{\%n.$\;$n} that maps the argument of \texttt{fib} to a natural
number. The requirement is that in each equation the measure of the argument
on the left-hand side is strictly greater than the measure of the argument of
each recursive call. In the case of \texttt{fib} this is obviously true
because the measure function is the identity and \texttt{Suc(Suc~x)} is
strictly greater than both \texttt{x} and \texttt{Suc~x}.
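As an illustrative hand calculation (not Isabelle output), the two
termination conditions arising from the third equation, and the unfolding of
\texttt{fib} at the argument $4$, read as follows, where $2$, $3$ and $4$
abbreviate the corresponding \texttt{Suc} terms:
\[
\begin{array}{rcl}
x & < & \mathit{Suc}(\mathit{Suc}\;x) \\
\mathit{Suc}\;x & < & \mathit{Suc}(\mathit{Suc}\;x) \\[1ex]
\mathit{fib}\;2 & = & \mathit{fib}\;0 + \mathit{fib}\;1 \;=\; 0 + 1 \;=\; 1 \\
\mathit{fib}\;3 & = & \mathit{fib}\;1 + \mathit{fib}\;2 \;=\; 1 + 1 \;=\; 2 \\
\mathit{fib}\;4 & = & \mathit{fib}\;2 + \mathit{fib}\;3 \;=\; 1 + 2 \;=\; 3
\end{array}
\]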
|
Slightly more interesting is the insertion of a fixed element
between any two elements of a list:
\begin{ttbox}
consts sep :: "'a * 'a list => 'a list"
recdef sep "measure (\%(a,xs). length xs)"
    "sep(a, [])     = []"
    "sep(a, [x])    = [x]"
    "sep(a, x#y#zs) = x # a # sep(a,y#zs)"
\end{ttbox}
This time the measure is the length of the list, which decreases with the
recursive call; the first component of the argument tuple is irrelevant.
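For example, on a three-element list the equations unfold as follows (a hand
calculation for illustration only); the lengths of the list arguments in
the successive calls are $3$, $2$ and $1$:
\begin{ttbox}
sep(a, [x,y,z]) = x # a # sep(a, [y,z])
                = x # a # y # a # sep(a, [z])
                = x # a # y # a # [z]
                = [x,a,y,a,z]
\end{ttbox}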
|
Pattern matching need not be exhaustive:
\begin{ttbox}
consts last :: "'a list => 'a"
recdef last "measure (\%xs. length xs)"
    "last [x]      = x"
    "last (x#y#zs) = last (y#zs)"
\end{ttbox}
|
Overlapping patterns are disambiguated by taking the order of equations into
account, just as in functional programming:
\begin{ttbox}
recdef sep "measure (\%(a,xs). length xs)"
    "sep(a, x#y#zs) = x # a # sep(a,y#zs)"
    "sep(a, xs)     = xs"
\end{ttbox}
This defines exactly the same function \texttt{sep} as the three-equation
definition given earlier.
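To spell this out: the second equation is only used where the first does not
apply, that is, for lists with fewer than two elements. It therefore stands
for the two instances
\begin{ttbox}
sep(a, [])  = []
sep(a, [x]) = [x]
\end{ttbox}
which together with the first equation cover every list exactly once.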
|
\begin{warn}
Currently \texttt{recdef} only accepts functions with a single argument,
possibly of tuple type.
\end{warn}
|
When loading a theory containing a \texttt{recdef} of a function $f$,
Isabelle proves the recursion equations and stores the result as a list of
theorems $f$.\texttt{rules}. It can be viewed by typing
\begin{ttbox}
prths \(f\).rules;
\end{ttbox}
All of the above examples are simple enough that Isabelle can determine
automatically that the measure of the argument goes down in each recursive
call. In that case $f$.\texttt{rules} contains precisely the defining
equations.
|
In general, Isabelle may not be able to prove all termination conditions
automatically. For example, termination of
\begin{ttbox}
consts gcd :: "nat*nat => nat"
recdef gcd "measure ((\%(m,n).n))"
    "gcd (m, n) = (if n=0 then m else gcd(n, m mod n))"
\end{ttbox}
relies on the lemma \texttt{mod_less_divisor}
\begin{ttbox}
0 < n ==> m mod n < n
\end{ttbox}
which is not part of the default simpset. As a result, Isabelle prints a
warning and \texttt{gcd.rules} contains a precondition:
\begin{ttbox}
(! m n. 0 < n --> m mod n < n) ==> gcd (m, n) = (if n=0 \dots)
\end{ttbox}
We need to instruct \texttt{recdef} to use an extended simpset to prove the
termination condition:
\begin{ttbox}
recdef gcd "measure ((\%(m,n).n))"
    simpset "simpset() addsimps [mod_less_divisor]"
    "gcd (m, n) = (if n=0 then m else gcd(n, m mod n))"
\end{ttbox}
This time everything works fine and \texttt{gcd.rules} contains precisely the
stated recursion equation for \texttt{gcd}.
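To see why the second component is a suitable measure, consider the following
hand calculation (for illustration only, not Isabelle output); the second
components of the successive arguments are $18$, $12$, $6$ and $0$:
\begin{ttbox}
gcd(12, 18) = gcd(18, 12 mod 18) = gcd(18, 12)
            = gcd(12, 18 mod 12) = gcd(12, 6)
            = gcd(6, 12 mod 6)   = gcd(6, 0)
            = 6
\end{ttbox}
Each step with a nonzero second component \texttt{n} replaces it by
\texttt{m mod n}, which is smaller than \texttt{n} by
\texttt{mod_less_divisor}.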
|
When defining some nontrivial total recursive function, the first attempt
will usually generate a number of termination conditions, some of which may
require new lemmas to be proved in some of the parent theories. Those lemmas
can then be added to the simpset used by \texttt{recdef} for its
proofs, as shown for \texttt{gcd}.
|
Although all the above examples employ measure functions, \texttt{recdef}
allows arbitrary wellfounded relations. For example, termination of
Ackermann's function requires the lexicographic product \texttt{**}:
\begin{ttbox}
recdef ack "measure(\%m. m) ** measure(\%n. n)"
    "ack(0,n)         = Suc n"
    "ack(Suc m,0)     = ack(m, 1)"
    "ack(Suc m,Suc n) = ack(m,ack(Suc m,n))"
\end{ttbox}
For details see~\cite{Isa-Logics-Man} and the examples in the library.
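As a brief illustration, the lexicographic product compares the first
components and falls back on the second components only if the first ones
are equal. Spelled out by hand for the three recursive calls of \texttt{ack}
(a sketch, not Isabelle output), and writing $\prec$ for this ordering on
pairs, a notation introduced here only for the purpose of the example:
\[
\begin{array}{rcll}
(m,\,1) & \prec & (\mathit{Suc}\;m,\,0) &
  \mbox{since } m < \mathit{Suc}\;m \\
(m,\,\mathit{ack}(\mathit{Suc}\;m,\,n)) & \prec &
  (\mathit{Suc}\;m,\,\mathit{Suc}\;n) &
  \mbox{since } m < \mathit{Suc}\;m \\
(\mathit{Suc}\;m,\,n) & \prec & (\mathit{Suc}\;m,\,\mathit{Suc}\;n) &
  \mbox{since the first components agree and } n < \mathit{Suc}\;n
\end{array}
\]
No single measure on one component would do: the second component grows in
the first two calls, and the first component stays the same in the third.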
|
|
\subsection{Deriving simplification rules}
|
Once we have succeeded in proving all termination conditions, we can start to
derive some theorems. In contrast to \texttt{primrec} definitions, which are
automatically added to the simpset, \texttt{recdef} rules must be included
explicitly, for example as in
\begin{ttbox}
Addsimps fib.rules;
\end{ttbox}
However, some care is necessary now, in contrast to \texttt{primrec}.
Although \texttt{gcd} is a total function, its defining equation leads to
nontermination of the simplifier, because the subterm
\texttt{gcd(n, m mod n)} on the right-hand side can again be simplified by
the same equation, and so on. The reason: the simplifier rewrites the
\texttt{then} and \texttt{else} branches of a conditional if the condition
simplifies to neither \texttt{True} nor \texttt{False}. Therefore it is
recommended to derive an alternative formulation that replaces case
distinctions on the right-hand side by conditional equations. For
\texttt{gcd} this means we have to prove
\begin{ttbox}
           gcd (m, 0) = m
n ~= 0 ==> gcd (m, n) = gcd(n, m mod n)
\end{ttbox}
To avoid nontermination during those proofs, we have to resort to some
low-level tactics:
\begin{ttbox}
Goal "gcd(m,0) = m";
by(resolve_tac [trans] 1);
by(resolve_tac gcd.rules 1);
by(Simp_tac 1);
\end{ttbox}
At this point it is not necessary to understand what exactly
\texttt{resolve_tac} is doing. The main point is that the above proof works
not just for this one example but in general (except that we have to use
\texttt{Asm_simp_tac} and $f$\texttt{.rules} in general). Try the second
\texttt{gcd}-equation above!
|
\subsection{Induction}
|
Assuming we have added the recursion equations (or some suitable derived
equations) to the simpset, we might like to prove something about our
function. Since the function is recursive, the natural proof principle is
again induction. But this time the structural form of induction that comes
with datatypes is unlikely to work well; otherwise we could have defined the
function by \texttt{primrec}. Therefore \texttt{recdef} automatically proves
a suitable induction rule $f$\texttt{.induct} that follows the recursion
pattern of the particular function $f$. Roughly speaking, it requires you to
prove for each \texttt{recdef} equation that the property you are trying to
establish holds for the left-hand side provided it holds for all recursive
calls on the right-hand side. Applying $f$\texttt{.induct} requires its
explicit instantiation. See \S\ref{sec:explicit-inst} for details.
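For \texttt{sep} as defined above, for instance, such a rule has roughly the
following shape, written here in ordinary mathematical notation for a
property $P$ of argument pairs (a sketch only; the precise form of the
theorem generated by Isabelle may differ):
\[
\begin{array}{l}
\forall a.\; P(a,\, [\,]) \\
\forall a\,x.\; P(a,\, [x]) \\
\forall a\,x\,y\,\mathit{zs}.\;
  P(a,\, y\#\mathit{zs}) \Longrightarrow P(a,\, x\#y\#\mathit{zs}) \\
\hline
\forall a\,\mathit{xs}.\; P(a,\, \mathit{xs})
\end{array}
\]
There is one premise per equation of \texttt{sep}, and only the third one
carries an induction hypothesis, namely for the argument of its single
recursive call.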
|
\index{*recdef|)}