1240 We could now go back and prove \texttt{exec s [] (comp e) = [value s e]} |
1240 We could now go back and prove \texttt{exec s [] (comp e) = [value s e]} |
1241 merely by simplification with the generalized version we just proved. |
1241 merely by simplification with the generalized version we just proved. |
1242 However, this is unnecessary because the generalized version fully subsumes |
1242 However, this is unnecessary because the generalized version fully subsumes |
1243 its instance. |
1243 its instance. |
1244 |
1244 |
|
1245 |
|
1246 \section{Advanced datatypes} |
|
1247 \index{*datatype|(} |
|
1248 \index{*primrec|(} |
|
1249 |
|
1250 \subsection{Mutual recursion} |
|
1251 |
|
1252 Sometimes it is necessary to define two datatypes that depend on each |
|
1253 other. This is called \textbf{mutual recursion}. As an example consider a |
|
1254 language of arithmetic and boolean expressions where |
|
1255 \begin{itemize} |
|
1256 \item arithmetic expressions contain boolean expressions because there are |
|
1257 conditional arithmetic expressions like ``if $m<n$ then $n-m$ else $m-n$'', |
|
1258 and |
|
1259 \item boolean expressions contain arithmetic expressions because of |
|
1260 comparisons like ``$m<n$''. |
|
1261 \end{itemize} |
|
1262 In Isabelle this becomes |
|
1263 \begin{ttbox} |
|
1264 \input{Datatype/abdata}\end{ttbox}\indexbold{*and} |
|
1265 Type \texttt{aexp} is similar to \texttt{expr} in \S\ref{sec:ExprCompiler}, |
|
1266 except that we have fixed the values to be of type \texttt{nat} and that we |
|
1267 have fixed the two binary operations \texttt{Sum} and \texttt{Diff}. Boolean |
|
1268 expressions can be arithmetic comparisons, conjunctions and negations. |
|
1269 The semantics is fixed via two evaluation functions |
|
1270 \begin{ttbox} |
|
1271 \input{Datatype/abconstseval}\end{ttbox} |
|
1272 that take an environment (a mapping from variables \texttt{'a} to values |
|
1273 \texttt{nat}) and an expression and return its arithmetic/boolean |
|
1274 value. Since the datatypes are mutually recursive, so are functions that |
|
1275 operate on them. Hence they need to be defined in a single \texttt{primrec} |
|
1276 section: |
|
1277 \begin{ttbox} |
|
1278 \input{Datatype/abevala} |
|
1279 \input{Datatype/abevalb}\end{ttbox} |
|
1280 |
|
1281 %In general, given $n$ mutually recursive datatypes, whenever you define a |
|
1282 %\texttt{primrec} function on one of them, Isabelle expects you to define $n$ |
|
1283 %(possibly mutually recursive) functions simultaneously. |
|
1284 |
|
1285 In the same fashion we also define two functions that perform substitution: |
|
1286 \begin{ttbox} |
|
1287 \input{Datatype/abconstssubst}\end{ttbox} |
|
1288 The first argument is a function mapping variables to expressions, the |
|
1289 substitution. It is applied to all variables in the second argument. As a |
|
1290 result, the type of variables in the expression may change from \texttt{'a} |
|
1291 to \texttt{'b}. Note that there are only arithmetic and no boolean variables. |
|
1292 \begin{ttbox} |
|
1293 \input{Datatype/absubsta} |
|
1294 \input{Datatype/absubstb}\end{ttbox} |
|
1295 |
|
1296 Now we can prove a fundamental theorem about the interaction between |
|
1297 evaluation and substitution: applying a substitution $s$ to an expression $a$ |
|
1298 and evaluating the result in an environment $env$ yields the same result as |
|
1299 evaluation $a$ in the environment that maps every variable $x$ to the value |
|
1300 of $s(x)$ under $env$. If you try to prove this separately for arithmetic or |
|
1301 boolean expressions (by induction), you find that you always need the other |
|
1302 theorem in the induction step. Therefore you need to state and prove both |
|
1303 theorems simultaneously: |
|
1304 \begin{quote}\small |
|
1305 \verbatiminput{Datatype/abgoalind.ML} |
|
1306 \end{quote}\indexbold{*mutual_induct_tac} |
|
1307 The resulting 8 goals (one for each constructor) are proved in one fell swoop |
|
1308 \texttt{by Auto_tac;}. |
|
1309 |
|
1310 In general, given $n$ mutually recursive datatypes $\tau@1$, \dots, $\tau@n$, |
|
1311 Isabelle expects an inductive proof to start with a goal of the form |
|
1312 \[ P@1(x@1)\ \texttt{\&}\ \dots\ \texttt{\&}\ P@n(x@n) \] |
|
1313 where each variable $x@i$ is of type $\tau@i$. Induction is started by |
|
1314 \begin{ttbox} |
|
1315 by(mutual_induct_tac ["\(x@1\)",\(\dots\),"\(x@n\)"] \(k\)); |
|
1316 \end{ttbox} |
|
1317 |
|
1318 \begin{exercise} |
|
1319 Define a function \texttt{norma} of type \texttt{'a aexp => 'a aexp} that |
|
1320 replaces \texttt{IF}s with complex boolean conditions by nested |
|
1321 \texttt{IF}s where each condition is a \texttt{Less} --- \texttt{And} and |
|
1322 \texttt{Neg} should be eliminated completely. Prove that \texttt{norma} |
|
1323 preserves the value of an expression and that the result of \texttt{norma} |
|
1324 is really normal, i.e.\ no more \texttt{And}s and \texttt{Neg}s occur in |
|
1325 it. ({\em Hint:} proceed as in \S\ref{sec:boolex}). |
|
1326 \end{exercise} |
|
1327 |
|
1328 \subsection{Nested recursion} |
|
1329 |
|
1330 So far, all datatypes had the property that on the right-hand side of their |
|
1331 definition they occurred only at the top-level, i.e.\ directly below a |
|
1332 constructor. This is not the case any longer for the following model of terms |
|
1333 where function symbols can be applied to a list of arguments: |
|
1334 \begin{ttbox} |
|
1335 \input{Datatype/tdata}\end{ttbox} |
|
1336 Parameter \texttt{'a} is the type of variables and \texttt{'b} the type of |
|
1337 function symbols. |
|
1338 A mathematical term like $f(x,g(y))$ becomes \texttt{App f [Var x, App g |
|
1339 [Var y]]}, where \texttt{f}, \texttt{g}, \texttt{x}, \texttt{y} are |
|
1340 suitable values, e.g.\ numbers or strings. |
|
1341 |
|
1342 What complicates the definition of \texttt{term} is the nested occurrence of |
|
1343 \texttt{term} inside \texttt{list} on the right-hand side. In principle, |
|
1344 nested recursion can be eliminated in favour of mutual recursion by unfolding |
|
1345 the offending datatypes, here \texttt{list}. The result for \texttt{term} |
|
1346 would be something like |
|
1347 \begin{ttbox} |
|
1348 \input{Datatype/tunfoldeddata}\end{ttbox} |
|
1349 Although we do not recommend this unfolding to the user, this is how Isabelle |
|
1350 deals with nested recursion internally. Strictly speaking, this information |
|
1351 is irrelevant at the user level (and might change in the future), but it |
|
1352 motivates why \texttt{primrec} and induction work for nested types they way |
|
1353 they do. We now return to the initial definition of \texttt{term} using |
|
1354 nested recursion. |
|
1355 |
|
1356 Let us define a substitution function on terms. Because terms involve term |
|
1357 lists, we need to define two substitution functions simultaneously: |
|
1358 \begin{ttbox} |
|
1359 \input{Datatype/tconstssubst} |
|
1360 \input{Datatype/tsubst} |
|
1361 \input{Datatype/tsubsts}\end{ttbox} |
|
1362 Similarly, when proving a statement about terms inductively, we need |
|
1363 to prove a related statement about term lists simultaneously. For example, |
|
1364 the fact that the identity substitution does not change a term needs to be |
|
1365 strengthened and proved as follows: |
|
1366 \begin{quote}\small |
|
1367 \verbatiminput{Datatype/tidproof.ML} |
|
1368 \end{quote} |
|
1369 Note that \texttt{Var} is the identity substitution because by definition it |
|
1370 leaves variables unchanged: \texttt{subst Var (Var $x$) = Var $x$}. Note also |
|
1371 that the type annotations are necessary because otherwise there is nothing in |
|
1372 the goal to enforce that both halfs of the goal talk about the same type |
|
1373 parameters \texttt{('a,'b)}. As a result, induction would fail |
|
1374 because the two halfs of the goal would be unrelated. |
|
1375 |
|
1376 \begin{exercise} |
|
1377 Substitution distributes over composition can be expressed roughly as |
|
1378 follows: |
|
1379 \begin{ttbox} |
|
1380 subst (f o g) t = subst f (subst g t) |
|
1381 \end{ttbox} |
|
1382 Correct this statement (you will find that it does not type-check), |
|
1383 strengthen it and prove it. (Note: \texttt{o} is function composition; its |
|
1384 definition is found in the theorem \texttt{o_def}). |
|
1385 \end{exercise} |
|
1386 |
|
1387 If you feel that the \texttt{App}-equation in the definition of substitution |
|
1388 is overly complicated, you are right: the simpler |
|
1389 \begin{ttbox} |
|
1390 \input{Datatype/appmap}\end{ttbox} |
|
1391 would have done the job. Unfortunately, Isabelle insists on the more verbose |
|
1392 equation given above. Nevertheless, we can easily {\em prove} that the |
|
1393 \texttt{map}-equation holds: simply by induction on \texttt{ts} followed by |
|
1394 \texttt{Auto_tac}. |
|
1395 |
|
1396 %FIXME: forward pointer to section where better induction principles are |
|
1397 %derived? |
|
1398 |
|
1399 \begin{exercise} |
|
1400 Define a function \texttt{rev_term} of type \texttt{term -> term} that |
|
1401 recursively reverses the order of arguments of all function symbols in a |
|
1402 term. Prove that \texttt{rev_term(rev_term t) = t}. |
|
1403 \end{exercise} |
|
1404 |
|
1405 Of course, you may also combine mutual and nested recursion as in the |
|
1406 following example |
|
1407 \begin{ttbox} |
|
1408 \input{Datatype/mutnested}\end{ttbox} |
|
1409 taken from an operational semantics of applied lambda terms. Note that double |
|
1410 quotes are required around the type involving \texttt{*}, as explained on |
|
1411 page~\pageref{startype}. |
|
1412 |
|
1413 \subsection{The limits of nested recursion} |
|
1414 |
|
1415 How far can we push nested recursion? By the unfolding argument above, we can |
|
1416 reduce nested to mutual recursion provided the nested recursion only involves |
|
1417 previously defined datatypes. Isabelle is a bit more generous and also permits |
|
1418 products as in the \texttt{data} example above. |
|
1419 However, functions are most emphatically not allowed: |
|
1420 \begin{ttbox} |
|
1421 datatype t = C (t => bool) |
|
1422 \end{ttbox} |
|
1423 is a real can of worms: in HOL it must be ruled out because it requires a |
|
1424 type \texttt{t} such that \texttt{t} and its power set \texttt{t => bool} |
|
1425 have the same cardinality---an impossibility. |
|
1426 In theory, we could allow limited forms of recursion involving function |
|
1427 spaces. For example |
|
1428 \begin{ttbox} |
|
1429 datatype t = C (nat => t) | D |
|
1430 \end{ttbox} |
|
1431 is unproblematic. However, Isabelle does not support recursion involving |
|
1432 \texttt{=>} at all at the moment. |
|
1433 |
|
1434 For a theoretical analysis of what kinds of datatypes are feasible in HOL |
|
1435 see, for example,~\cite{Gunter-HOL92}. There are alternatives to pure HOL: |
|
1436 LCF~\cite{Paulson-LCF} is a logic where types like |
|
1437 \begin{ttbox} |
|
1438 datatype t = C (t -> t) |
|
1439 \end{ttbox} |
|
1440 makes enormous sense (note the intentionally different arrow \texttt{->}!). |
|
1441 There is even a version of LCF on top of HOL, called |
|
1442 HOLCF~\cite{MuellerNvOS98}. |
|
1443 |
|
1444 \index{*primrec|)} |
|
1445 \index{*datatype|)} |
|
1446 |
|
1447 \subsection{Case study: Tries} |
|
1448 |
|
1449 Tries are a classic search tree data structure~\cite{Knuth3-75} for fast |
|
1450 indexing with strings. Figure~\ref{fig:trie} gives a graphical example of a |
|
1451 trie containing the words ``all'', ``an'', ``ape'', ``can'', ``car'' and |
|
1452 ``cat''. When searching a string in a trie, the letters of the string are |
|
1453 examined sequentially. Each letter determines which subtrie to search next. |
|
1454 In this case study we model tries as a datatype, define a lookup and an |
|
1455 update function, and prove that they behave as expected. |
|
1456 |
|
1457 \begin{figure}[htbp] |
|
1458 \begin{center} |
|
1459 \unitlength1mm |
|
1460 \begin{picture}(60,30) |
|
1461 \put( 5, 0){\makebox(0,0)[b]{l}} |
|
1462 \put(25, 0){\makebox(0,0)[b]{e}} |
|
1463 \put(35, 0){\makebox(0,0)[b]{n}} |
|
1464 \put(45, 0){\makebox(0,0)[b]{r}} |
|
1465 \put(55, 0){\makebox(0,0)[b]{t}} |
|
1466 % |
|
1467 \put( 5, 9){\line(0,-1){5}} |
|
1468 \put(25, 9){\line(0,-1){5}} |
|
1469 \put(44, 9){\line(-3,-2){9}} |
|
1470 \put(45, 9){\line(0,-1){5}} |
|
1471 \put(46, 9){\line(3,-2){9}} |
|
1472 % |
|
1473 \put( 5,10){\makebox(0,0)[b]{l}} |
|
1474 \put(15,10){\makebox(0,0)[b]{n}} |
|
1475 \put(25,10){\makebox(0,0)[b]{p}} |
|
1476 \put(45,10){\makebox(0,0)[b]{a}} |
|
1477 % |
|
1478 \put(14,19){\line(-3,-2){9}} |
|
1479 \put(15,19){\line(0,-1){5}} |
|
1480 \put(16,19){\line(3,-2){9}} |
|
1481 \put(45,19){\line(0,-1){5}} |
|
1482 % |
|
1483 \put(15,20){\makebox(0,0)[b]{a}} |
|
1484 \put(45,20){\makebox(0,0)[b]{c}} |
|
1485 % |
|
1486 \put(30,30){\line(-3,-2){13}} |
|
1487 \put(30,30){\line(3,-2){13}} |
|
1488 \end{picture} |
|
1489 \end{center} |
|
1490 \caption{A sample trie} |
|
1491 \label{fig:trie} |
|
1492 \end{figure} |
|
1493 |
|
1494 Proper tries associates some value with each string. Since the |
|
1495 information is stored only in the final node associated with the string, many |
|
1496 nodes do not carry any value. This distinction is captured by the |
|
1497 following predefined datatype: |
|
1498 \begin{ttbox} |
|
1499 datatype 'a option = None | Some 'a |
|
1500 \end{ttbox} |
|
1501 |
|
1502 To minimize running time, each node of a trie should contain an array that maps |
|
1503 letters to subtries. We have chose a more space efficient representation |
|
1504 where the subtries are held in an association list, i.e.\ a list of |
|
1505 (letter,trie) pairs. Abstracting over the alphabet \texttt{'a} and the |
|
1506 values \texttt{'v} we define a trie as follows: |
|
1507 \begin{ttbox} |
|
1508 \input{Datatype/trie}\end{ttbox} |
|
1509 The first component is the optional value, the second component the |
|
1510 association list of subtries. We define two corresponding selector functions: |
|
1511 \begin{ttbox} |
|
1512 \input{Datatype/triesels}\end{ttbox} |
|
1513 Association lists come with a generic lookup function: |
|
1514 \begin{ttbox} |
|
1515 \input{Datatype/assoc}\end{ttbox} |
|
1516 |
|
1517 Now we can define the lookup function for tries. It descends into the trie |
|
1518 examining the letters of the search string one by one. As |
|
1519 recursion on lists is simpler than on tries, let us express this as primitive |
|
1520 recursion on the search string argument: |
|
1521 \begin{ttbox} |
|
1522 \input{Datatype/lookup}\end{ttbox} |
|
1523 As a first simple property we prove that looking up a string in the empty |
|
1524 trie \texttt{Trie~None~[]} always returns \texttt{None}. The proof merely |
|
1525 distinguishes the two cases whether the search string is empty or not: |
|
1526 \begin{ttbox} |
|
1527 \input{Datatype/lookupempty.ML}\end{ttbox} |
|
1528 This lemma is added to the simpset as usual. |
|
1529 |
|
1530 Things begin to get interesting with the definition of an update function |
|
1531 that adds a new (string,value) pair to a trie, overwriting the old value |
|
1532 associated with that string: |
|
1533 \begin{ttbox} |
|
1534 \input{Datatype/update}\end{ttbox} |
|
1535 The base case is obvious. In the recursive case the subtrie |
|
1536 \texttt{tt} associated with the first letter \texttt{a} is extracted, |
|
1537 recursively updated, and then placed in front of the association list. |
|
1538 The old subtrie associated with \texttt{a} is still in the association list |
|
1539 but no longer accessible via \texttt{assoc}. Clearly, there is room here for |
|
1540 optimizations! |
|
1541 |
|
1542 Our main goal is to prove the correct interaction of \texttt{update} and |
|
1543 \texttt{lookup}: |
|
1544 \begin{quote}\small |
|
1545 \verbatiminput{Datatype/triemain.ML} |
|
1546 \end{quote} |
|
1547 Our plan will be to induct on \texttt{as}; hence the remaining variables are |
|
1548 quantified. From the definitions it is clear that induction on either |
|
1549 \texttt{as} or \texttt{bs} is required. The choice of \texttt{as} is merely |
|
1550 guided by the intuition that simplification of \texttt{lookup} might be easier |
|
1551 if \texttt{update} has already been simplified, which can only happen if |
|
1552 \texttt{as} is instantiated. |
|
1553 The start of the proof is completely conventional: |
|
1554 \begin{ttbox} |
|
1555 \input{Datatype/trieinduct.ML}\end{ttbox} |
|
1556 Unfortunately, this time we are left with three intimidating looking subgoals: |
|
1557 \begin{ttbox} |
|
1558 {\out 1. ... ==> ... lookup (...) bs = lookup t bs} |
|
1559 {\out 2. ... ==> ... lookup (...) bs = lookup t bs} |
|
1560 {\out 3. ... ==> ... lookup (...) bs = lookup t bs} |
|
1561 \end{ttbox} |
|
1562 Clearly, if we want to make headway we have to instantiate \texttt{bs} as |
|
1563 well now. It turns out that instead of induction, case distinction |
|
1564 suffices: |
|
1565 \begin{ttbox} |
|
1566 \input{Datatype/trieexhaust.ML}\end{ttbox} |
|
1567 The {\em tactical} \texttt{ALLGOALS} merely applies the tactic following it |
|
1568 to all subgoals, which results in a total of six subgoals; \texttt{Auto_tac} |
|
1569 solves them all. |
|
1570 |
|
1571 This proof may look surprisingly straightforward. The reason is that we |
|
1572 have told the simplifier (without telling the reader) to expand all |
|
1573 \texttt{let}s and to split all \texttt{case}-constructs over options before |
|
1574 we started on the main goal: |
|
1575 \begin{ttbox} |
|
1576 \input{Datatype/trieAdds.ML}\end{ttbox} |
|
1577 |
|
1578 \begin{exercise} |
|
1579 Write an improved version of \texttt{update} that does not suffer from the |
|
1580 space leak in the version above. Prove the main theorem for your improved |
|
1581 \texttt{update}. |
|
1582 \end{exercise} |
|
1583 |
|
1584 \begin{exercise} |
|
1585 Modify \texttt{update} so that it can also {\em delete} entries from a |
|
1586 trie. It is up to you if you want to shrink tries if possible. Prove (a |
|
1587 modified version of) the main theorem above. |
|
1588 \end{exercise} |
|
1589 |
1245 \section{Total recursive functions} |
1590 \section{Total recursive functions} |
1246 \label{sec:recdef} |
1591 \label{sec:recdef} |
1247 \index{*recdef|(} |
1592 \index{*recdef|(} |
1248 |
1593 |
1249 |
1594 |