| 
8745
 | 
     1  | 
(*<*)
  | 
| 
58860
 | 
     2  | 
theory Trie imports Main begin
  | 
| 
8745
 | 
     3  | 
(*>*)
  | 
| 
 | 
     4  | 
text{*
 | 
| 
 | 
     5  | 
To minimize running time, each node of a trie should contain an array that maps
  | 
| 
10795
 | 
     6  | 
letters to subtries. We have chosen a
  | 
| 
8745
 | 
     7  | 
representation where the subtries are held in an association list, i.e.\ a
  | 
| 
9792
 | 
     8  | 
list of (letter,trie) pairs.  Abstracting over the alphabet @{typ"'a"} and the
 | 
| 
 | 
     9  | 
values @{typ"'v"} we define a trie as follows:
 | 
| 
58860
 | 
    10  | 
*}
  | 
| 
8745
 | 
    11  | 
  | 
| 
58860
 | 
    12  | 
datatype ('a,'v)trie = Trie  "'v option"  "('a * ('a,'v)trie)list"
 | 
| 
8745
 | 
    13  | 
  | 
| 
 | 
    14  | 
text{*\noindent
 | 
| 
11458
 | 
    15  | 
\index{datatypes!and nested recursion}%
 | 
| 
8745
 | 
    16  | 
The first component is the optional value, the second component the
  | 
| 
 | 
    17  | 
association list of subtries.  This is an example of nested recursion involving products,
  | 
| 
 | 
    18  | 
which is fine because products are datatypes as well.
  | 
| 
 | 
    19  | 
We define two selector functions:
  | 
| 
58860
 | 
    20  | 
*}
  | 
| 
8745
 | 
    21  | 
  | 
| 
27015
 | 
    22  | 
primrec "value" :: "('a,'v)trie \<Rightarrow> 'v option" where
 | 
| 
 | 
    23  | 
"value(Trie ov al) = ov"
  | 
| 
 | 
    24  | 
primrec alist :: "('a,'v)trie \<Rightarrow> ('a * ('a,'v)trie)list" where
 | 
| 
 | 
    25  | 
"alist(Trie ov al) = al"
  | 
| 
8745
 | 
    26  | 
  | 
| 
 | 
    27  | 
text{*\noindent
 | 
| 
11309
 | 
    28  | 
Association lists come with a generic lookup function.  Its result
  | 
| 
 | 
    29  | 
involves type @{text option} because a lookup can fail:
 | 
| 
58860
 | 
    30  | 
*}
  | 
| 
8745
 | 
    31  | 
  | 
| 
27015
 | 
    32  | 
primrec assoc :: "('key * 'val)list \<Rightarrow> 'key \<Rightarrow> 'val option" where
 | 
| 
 | 
    33  | 
"assoc [] x = None" |
  | 
| 
 | 
    34  | 
"assoc (p#ps) x =
  | 
| 
 | 
    35  | 
   (let (a,b) = p in if a=x then Some b else assoc ps x)"
  | 
| 
8745
 | 
    36  | 
  | 
| 
 | 
    37  | 
text{*
 | 
| 
 | 
    38  | 
Now we can define the lookup function for tries. It descends into the trie
  | 
| 
 | 
    39  | 
examining the letters of the search string one by one. As
  | 
| 
 | 
    40  | 
recursion on lists is simpler than on tries, let us express this as primitive
  | 
| 
 | 
    41  | 
recursion on the search string argument:
  | 
| 
58860
 | 
    42  | 
*}
  | 
| 
8745
 | 
    43  | 
  | 
| 
27015
 | 
    44  | 
primrec lookup :: "('a,'v)trie \<Rightarrow> 'a list \<Rightarrow> 'v option" where
 | 
| 
 | 
    45  | 
"lookup t [] = value t" |
  | 
| 
 | 
    46  | 
"lookup t (a#as) = (case assoc (alist t) a of
  | 
| 
 | 
    47  | 
                      None \<Rightarrow> None
  | 
| 
 | 
    48  | 
                    | Some at \<Rightarrow> lookup at as)"
  | 
| 
8745
 | 
    49  | 
  | 
| 
 | 
    50  | 
text{*
 | 
| 
 | 
    51  | 
As a first simple property we prove that looking up a string in the empty
  | 
| 
15905
 | 
    52  | 
trie @{term"Trie None []"} always returns @{const None}. The proof merely
 | 
| 
8745
 | 
    53  | 
distinguishes the two cases whether the search string is empty or not:
  | 
| 
58860
 | 
    54  | 
*}
  | 
| 
8745
 | 
    55  | 
  | 
| 
58860
 | 
    56  | 
lemma [simp]: "lookup (Trie None []) as = None"
  | 
| 
 | 
    57  | 
apply(case_tac as, simp_all)
  | 
| 
10171
 | 
    58  | 
done
  | 
| 
8745
 | 
    59  | 
  | 
| 
 | 
    60  | 
text{*
 | 
| 
 | 
    61  | 
Things begin to get interesting with the definition of an update function
  | 
| 
12328
 | 
    62  | 
that adds a new (string, value) pair to a trie, overwriting the old value
  | 
| 
8745
 | 
    63  | 
associated with that string:
  | 
| 
58860
 | 
    64  | 
*}
  | 
| 
8745
 | 
    65  | 
  | 
| 
27015
 | 
    66  | 
primrec update:: "('a,'v)trie \<Rightarrow> 'a list \<Rightarrow> 'v \<Rightarrow> ('a,'v)trie" where
 | 
| 
 | 
    67  | 
"update t []     v = Trie (Some v) (alist t)" |
  | 
| 
 | 
    68  | 
"update t (a#as) v =
  | 
| 
 | 
    69  | 
   (let tt = (case assoc (alist t) a of
  | 
| 
 | 
    70  | 
                None \<Rightarrow> Trie None [] | Some at \<Rightarrow> at)
  | 
| 
 | 
    71  | 
    in Trie (value t) ((a,update tt as v) # alist t))"
  | 
| 
8745
 | 
    72  | 
  | 
| 
 | 
    73  | 
text{*\noindent
 | 
| 
 | 
    74  | 
The base case is obvious. In the recursive case the subtrie
  | 
| 
10171
 | 
    75  | 
@{term tt} associated with the first letter @{term a} is extracted,
 | 
| 
8745
 | 
    76  | 
recursively updated, and then placed in front of the association list.
  | 
| 
10171
 | 
    77  | 
The old subtrie associated with @{term a} is still in the association list
 | 
| 
15905
 | 
    78  | 
but no longer accessible via @{const assoc}. Clearly, there is room here for
 | 
| 
8745
 | 
    79  | 
optimizations!
  | 
| 
 | 
    80  | 
  | 
| 
15905
 | 
    81  | 
Before we start on any proofs about @{const update} we tell the simplifier to
 | 
| 
10171
 | 
    82  | 
expand all @{text let}s and to split all @{text case}-constructs over
 | 
| 
8745
 | 
    83  | 
options:
  | 
| 
58860
 | 
    84  | 
*}
  | 
| 
8745
 | 
    85  | 
  | 
| 
9933
 | 
    86  | 
declare Let_def[simp] option.split[split]
  | 
| 
8745
 | 
    87  | 
  | 
| 
 | 
    88  | 
text{*\noindent
 | 
| 
 | 
    89  | 
The reason becomes clear when looking (probably after a failed proof
  | 
| 
15905
 | 
    90  | 
attempt) at the body of @{const update}: it contains both
 | 
| 
10171
 | 
    91  | 
@{text let} and a case distinction over type @{text option}.
 | 
| 
8745
 | 
    92  | 
  | 
| 
15905
 | 
    93  | 
Our main goal is to prove the correct interaction of @{const update} and
 | 
| 
 | 
    94  | 
@{const lookup}:
 | 
| 
58860
 | 
    95  | 
*}
  | 
| 
8745
 | 
    96  | 
  | 
| 
10171
 | 
    97  | 
theorem "\<forall>t v bs. lookup (update t as v) bs =
  | 
| 
58860
 | 
    98  | 
                    (if as=bs then Some v else lookup t bs)"
  | 
| 
8745
 | 
    99  | 
  | 
| 
 | 
   100  | 
txt{*\noindent
 | 
| 
10171
 | 
   101  | 
Our plan is to induct on @{term as}; hence the remaining variables are
 | 
| 
8745
 | 
   102  | 
quantified. From the definitions it is clear that induction on either
  | 
| 
11458
 | 
   103  | 
@{term as} or @{term bs} is required. The choice of @{term as} is 
 | 
| 
15905
 | 
   104  | 
guided by the intuition that simplification of @{const lookup} might be easier
 | 
| 
 | 
   105  | 
if @{const update} has already been simplified, which can only happen if
 | 
| 
10171
 | 
   106  | 
@{term as} is instantiated.
 | 
| 
11458
 | 
   107  | 
The start of the proof is conventional:
  | 
| 
58860
 | 
   108  | 
*}
  | 
| 
 | 
   109  | 
apply(induct_tac as, auto)
  | 
| 
8745
 | 
   110  | 
  | 
| 
 | 
   111  | 
txt{*\noindent
 | 
| 
 | 
   112  | 
Unfortunately, this time we are left with three intimidating looking subgoals:
  | 
| 
9723
 | 
   113  | 
\begin{isabelle}
 | 
| 
8745
 | 
   114  | 
~1.~\dots~{\isasymLongrightarrow}~lookup~\dots~bs~=~lookup~t~bs\isanewline
 | 
| 
 | 
   115  | 
~2.~\dots~{\isasymLongrightarrow}~lookup~\dots~bs~=~lookup~t~bs\isanewline
 | 
| 
9792
 | 
   116  | 
~3.~\dots~{\isasymLongrightarrow}~lookup~\dots~bs~=~lookup~t~bs
 | 
| 
9723
 | 
   117  | 
\end{isabelle}
 | 
| 
10171
 | 
   118  | 
Clearly, if we want to make headway we have to instantiate @{term bs} as
 | 
| 
8745
 | 
   119  | 
well now. It turns out that instead of induction, case distinction
  | 
| 
 | 
   120  | 
suffices:
  | 
| 
58860
 | 
   121  | 
*}
  | 
| 
 | 
   122  | 
apply(case_tac[!] bs, auto)
  | 
| 
10171
 | 
   123  | 
done
  | 
| 
8745
 | 
   124  | 
  | 
| 
 | 
   125  | 
text{*\noindent
 | 
| 
11458
 | 
   126  | 
\index{subgoal numbering}%
 | 
| 
10171
 | 
   127  | 
All methods ending in @{text tac} take an optional first argument that
 | 
| 
9792
 | 
   128  | 
specifies the range of subgoals they are applied to, where @{text"[!]"} means
 | 
| 
 | 
   129  | 
all subgoals, i.e.\ @{text"[1-3]"} in our case. Individual subgoal numbers,
 | 
| 
 | 
   130  | 
e.g. @{text"[2]"} are also allowed.
 | 
| 
8745
 | 
   131  | 
  | 
| 
 | 
   132  | 
This proof may look surprisingly straightforward. However, note that this
  | 
| 
9792
 | 
   133  | 
comes at a cost: the proof script is unreadable because the intermediate
  | 
| 
 | 
   134  | 
proof states are invisible, and we rely on the (possibly brittle) magic of
  | 
| 
10971
 | 
   135  | 
@{text auto} (@{text simp_all} will not do --- try it) to split the subgoals
 | 
| 
10171
 | 
   136  | 
of the induction up in such a way that case distinction on @{term bs} makes
 | 
| 
11458
 | 
   137  | 
sense and solves the proof. 
  | 
| 
11294
 | 
   138  | 
  | 
| 
 | 
   139  | 
\begin{exercise}
 | 
| 
15905
 | 
   140  | 
  Modify @{const update} (and its type) such that it allows both insertion and
 | 
| 
11303
 | 
   141  | 
  deletion of entries with a single function.  Prove the corresponding version 
  | 
| 
 | 
   142  | 
  of the main theorem above.
  | 
| 
 | 
   143  | 
  Optimize your function such that it shrinks tries after
  | 
| 
 | 
   144  | 
  deletion if possible.
  | 
| 
11294
 | 
   145  | 
\end{exercise}
 | 
| 
 | 
   146  | 
  | 
| 
 | 
   147  | 
\begin{exercise}
 | 
| 
15905
 | 
   148  | 
  Write an improved version of @{const update} that does not suffer from the
 | 
| 
11294
 | 
   149  | 
  space leak (pointed out above) caused by not deleting overwritten entries
  | 
| 
 | 
   150  | 
  from the association list. Prove the main theorem for your improved
  | 
| 
15905
 | 
   151  | 
  @{const update}.
 | 
| 
11294
 | 
   152  | 
\end{exercise}
 | 
| 
 | 
   153  | 
  | 
| 
 | 
   154  | 
\begin{exercise}
 | 
| 
 | 
   155  | 
  Conceptually, each node contains a mapping from letters to optional
  | 
| 
 | 
   156  | 
  subtries. Above we have implemented this by means of an association
  | 
| 
 | 
   157  | 
  list. Replay the development replacing @{typ "('a * ('a,'v)trie)list"}
 | 
| 
 | 
   158  | 
  with @{typ"'a \<Rightarrow> ('a,'v)trie option"}.
 | 
| 
 | 
   159  | 
\end{exercise}
 | 
| 
 | 
   160  | 
  | 
| 
58860
 | 
   161  | 
*}
  | 
| 
8745
 | 
   162  | 
  | 
| 
 | 
   163  | 
(*<*)
  | 
| 
11294
 | 
   164  | 
  | 
| 
 | 
   165  | 
(* Exercise 1. Solution by Getrud Bauer *)
  | 
| 
 | 
   166  | 
  | 
| 
27015
 | 
   167  | 
primrec update1 :: "('a, 'v) trie \<Rightarrow> 'a list \<Rightarrow> 'v option \<Rightarrow> ('a, 'v) trie"
 | 
| 
 | 
   168  | 
where
  | 
| 
 | 
   169  | 
  "update1 t []     vo = Trie vo (alist t)" |
  | 
| 
11294
 | 
   170  | 
  "update1 t (a#as) vo =
  | 
| 
 | 
   171  | 
     (let tt = (case assoc (alist t) a of
  | 
| 
 | 
   172  | 
                  None \<Rightarrow> Trie None [] 
  | 
| 
 | 
   173  | 
                | Some at \<Rightarrow> at)
  | 
| 
 | 
   174  | 
      in Trie (value t) ((a, update1 tt as vo) # alist t))"
  | 
| 
 | 
   175  | 
  | 
| 
 | 
   176  | 
theorem [simp]: "\<forall>t v bs. lookup (update1 t as v) bs =
  | 
| 
58860
 | 
   177  | 
                    (if as = bs then v else lookup t bs)"
  | 
| 
 | 
   178  | 
apply (induct_tac as, auto)
  | 
| 
 | 
   179  | 
apply (case_tac[!] bs, auto)
  | 
| 
11294
 | 
   180  | 
done
  | 
| 
 | 
   181  | 
  | 
| 
 | 
   182  | 
  | 
| 
 | 
   183  | 
(* Exercise 2. Solution by Getrud Bauer *)
  | 
| 
 | 
   184  | 
  | 
| 
27015
 | 
   185  | 
primrec overwrite :: "'a \<Rightarrow> 'b \<Rightarrow> ('a * 'b) list \<Rightarrow> ('a * 'b) list" where
 | 
| 
 | 
   186  | 
"overwrite a v [] = [(a,v)]" |
  | 
| 
 | 
   187  | 
"overwrite a v (p#ps) = (if a = fst p then (a,v)#ps else p # overwrite a v ps)"
  | 
| 
11294
 | 
   188  | 
  | 
| 
 | 
   189  | 
lemma [simp]: "\<forall> a v b. assoc (overwrite a v ps) b = assoc ((a,v)#ps) b"
  | 
| 
 | 
   190  | 
apply (induct_tac ps, auto)
  | 
| 
 | 
   191  | 
apply (case_tac[!] a)
  | 
| 
 | 
   192  | 
done
  | 
| 
 | 
   193  | 
  | 
| 
27015
 | 
   194  | 
primrec update2 :: "('a, 'v) trie \<Rightarrow> 'a list \<Rightarrow> 'v option \<Rightarrow> ('a, 'v) trie"
 | 
| 
 | 
   195  | 
where
  | 
| 
 | 
   196  | 
  "update2 t []     vo = Trie vo (alist t)" |
  | 
| 
11294
 | 
   197  | 
  "update2 t (a#as) vo =
  | 
| 
 | 
   198  | 
     (let tt = (case assoc (alist t) a of 
  | 
| 
 | 
   199  | 
                  None \<Rightarrow> Trie None []  
  | 
| 
 | 
   200  | 
                | Some at \<Rightarrow> at) 
  | 
| 
58860
 | 
   201  | 
      in Trie (value t) (overwrite a (update2 tt as vo) (alist t)))" 
  | 
| 
11294
 | 
   202  | 
  | 
| 
 | 
   203  | 
theorem "\<forall>t v bs. lookup (update2 t as vo) bs =
  | 
| 
58860
 | 
   204  | 
                    (if as = bs then vo else lookup t bs)"
  | 
| 
 | 
   205  | 
apply (induct_tac as, auto)
  | 
| 
 | 
   206  | 
apply (case_tac[!] bs, auto)
  | 
| 
11294
 | 
   207  | 
done
  | 
| 
 | 
   208  | 
  | 
| 
 | 
   209  | 
  | 
| 
 | 
   210  | 
(* Exercise 3. Solution by Getrud Bauer *)
  | 
| 
58860
 | 
   211  | 
datatype ('a,dead 'v) triem = Triem  "'v option" "'a \<Rightarrow> ('a,'v) triem option"
 | 
| 
11294
 | 
   212  | 
  | 
| 
27015
 | 
   213  | 
primrec valuem :: "('a, 'v) triem \<Rightarrow> 'v option" where
 | 
| 
 | 
   214  | 
"valuem (Triem ov m) = ov"
  | 
| 
11294
 | 
   215  | 
  | 
| 
27015
 | 
   216  | 
primrec mapping :: "('a,'v) triem \<Rightarrow> 'a \<Rightarrow> ('a, 'v) triem option" where
 | 
| 
 | 
   217  | 
"mapping (Triem ov m) = m"
  | 
| 
11294
 | 
   218  | 
  | 
| 
27015
 | 
   219  | 
primrec lookupm :: "('a,'v) triem \<Rightarrow> 'a list \<Rightarrow> 'v option" where
 | 
| 
 | 
   220  | 
  "lookupm t [] = valuem t" |
  | 
| 
11294
 | 
   221  | 
  "lookupm t (a#as) = (case mapping t a of
  | 
| 
 | 
   222  | 
                        None \<Rightarrow> None
  | 
| 
58860
 | 
   223  | 
                      | Some at \<Rightarrow> lookupm at as)"
  | 
| 
11294
 | 
   224  | 
  | 
| 
58860
 | 
   225  | 
lemma [simp]: "lookupm (Triem None  (\<lambda>c. None)) as = None"
  | 
| 
 | 
   226  | 
apply (case_tac as, simp_all)
  | 
| 
11294
 | 
   227  | 
done
  | 
| 
 | 
   228  | 
  | 
| 
27015
 | 
   229  | 
primrec updatem :: "('a,'v)triem \<Rightarrow> 'a list \<Rightarrow> 'v \<Rightarrow> ('a,'v)triem" where
 | 
| 
 | 
   230  | 
  "updatem t []     v = Triem (Some v) (mapping t)" |
  | 
| 
11294
 | 
   231  | 
  "updatem t (a#as) v =
  | 
| 
 | 
   232  | 
     (let tt = (case mapping t a of
  | 
| 
 | 
   233  | 
                  None \<Rightarrow> Triem None (\<lambda>c. None) 
  | 
| 
 | 
   234  | 
                | Some at \<Rightarrow> at)
  | 
| 
 | 
   235  | 
      in Triem (valuem t) 
  | 
| 
58860
 | 
   236  | 
              (\<lambda>c. if c = a then Some (updatem tt as v) else mapping t c))"
  | 
| 
11294
 | 
   237  | 
  | 
| 
 | 
   238  | 
theorem "\<forall>t v bs. lookupm (updatem t as v) bs = 
  | 
| 
58860
 | 
   239  | 
                    (if as = bs then Some v else lookupm t bs)"
  | 
| 
 | 
   240  | 
apply (induct_tac as, auto)
  | 
| 
 | 
   241  | 
apply (case_tac[!] bs, auto)
  | 
| 
11294
 | 
   242  | 
done
  | 
| 
 | 
   243  | 
  | 
| 
58860
 | 
   244  | 
end
  | 
| 
8745
 | 
   245  | 
(*>*)
  |