| author | wenzelm |
| Thu, 16 Oct 2014 12:09:57 +0200 | |
| changeset 58694 | 983e98da2a42 |
| parent 58305 | 57752a91eec4 |
| child 58860 | fee7cfa69c50 |
| permissions | -rw-r--r-- |
| 8745 | 1 |
(*<*) |
| 16417 | 2 |
theory Trie imports Main begin; |
| 8745 | 3 |
(*>*) |
4 |
text{*
|
|
5 |
To minimize running time, each node of a trie should contain an array that maps |
|
| 10795 | 6 |
letters to subtries. We have chosen a |
| 8745 | 7 |
representation where the subtries are held in an association list, i.e.\ a |
| 9792 | 8 |
list of (letter,trie) pairs. Abstracting over the alphabet @{typ"'a"} and the
|
9 |
values @{typ"'v"} we define a trie as follows:
|
|
| 9458 | 10 |
*}; |
| 8745 | 11 |
|
12 |
datatype ('a,'v)trie = Trie "'v option" "('a * ('a,'v)trie)list";
|
|
13 |
||
14 |
text{*\noindent
|
|
| 11458 | 15 |
\index{datatypes!and nested recursion}%
|
| 8745 | 16 |
The first component is the optional value, the second component the |
17 |
association list of subtries. This is an example of nested recursion involving products, |
|
18 |
which is fine because products are datatypes as well. |
|
19 |
We define two selector functions: |
|
| 9458 | 20 |
*}; |
| 8745 | 21 |
|
| 27015 | 22 |
primrec "value" :: "('a,'v)trie \<Rightarrow> 'v option" where
|
23 |
"value(Trie ov al) = ov" |
|
24 |
primrec alist :: "('a,'v)trie \<Rightarrow> ('a * ('a,'v)trie)list" where
|
|
25 |
"alist(Trie ov al) = al" |
|
| 8745 | 26 |
|
27 |
text{*\noindent
|
|
| 11309 | 28 |
Association lists come with a generic lookup function. Its result |
29 |
involves type @{text option} because a lookup can fail:
|
|
| 9458 | 30 |
*}; |
| 8745 | 31 |
|
| 27015 | 32 |
primrec assoc :: "('key * 'val)list \<Rightarrow> 'key \<Rightarrow> 'val option" where
|
33 |
"assoc [] x = None" | |
|
34 |
"assoc (p#ps) x = |
|
35 |
(let (a,b) = p in if a=x then Some b else assoc ps x)" |
|
| 8745 | 36 |
|
37 |
text{*
|
|
38 |
Now we can define the lookup function for tries. It descends into the trie |
|
39 |
examining the letters of the search string one by one. As |
|
40 |
recursion on lists is simpler than on tries, let us express this as primitive |
|
41 |
recursion on the search string argument: |
|
| 9458 | 42 |
*}; |
| 8745 | 43 |
|
| 27015 | 44 |
primrec lookup :: "('a,'v)trie \<Rightarrow> 'a list \<Rightarrow> 'v option" where
|
45 |
"lookup t [] = value t" | |
|
46 |
"lookup t (a#as) = (case assoc (alist t) a of |
|
47 |
None \<Rightarrow> None |
|
48 |
| Some at \<Rightarrow> lookup at as)" |
|
| 8745 | 49 |
|
50 |
text{*
|
|
51 |
As a first simple property we prove that looking up a string in the empty |
|
| 15905 | 52 |
trie @{term"Trie None []"} always returns @{const None}. The proof merely
|
| 8745 | 53 |
distinguishes the two cases whether the search string is empty or not: |
| 9458 | 54 |
*}; |
| 8745 | 55 |
|
56 |
lemma [simp]: "lookup (Trie None []) as = None"; |
|
| 10171 | 57 |
apply(case_tac as, simp_all); |
58 |
done |
|
| 8745 | 59 |
|
60 |
text{*
|
|
61 |
Things begin to get interesting with the definition of an update function |
|
| 12328 | 62 |
that adds a new (string, value) pair to a trie, overwriting the old value |
| 8745 | 63 |
associated with that string: |
| 9458 | 64 |
*}; |
| 8745 | 65 |
|
| 27015 | 66 |
primrec update:: "('a,'v)trie \<Rightarrow> 'a list \<Rightarrow> 'v \<Rightarrow> ('a,'v)trie" where
|
67 |
"update t [] v = Trie (Some v) (alist t)" | |
|
68 |
"update t (a#as) v = |
|
69 |
(let tt = (case assoc (alist t) a of |
|
70 |
None \<Rightarrow> Trie None [] | Some at \<Rightarrow> at) |
|
71 |
in Trie (value t) ((a,update tt as v) # alist t))" |
|
| 8745 | 72 |
|
73 |
text{*\noindent
|
|
74 |
The base case is obvious. In the recursive case the subtrie |
|
| 10171 | 75 |
@{term tt} associated with the first letter @{term a} is extracted,
|
| 8745 | 76 |
recursively updated, and then placed in front of the association list. |
| 10171 | 77 |
The old subtrie associated with @{term a} is still in the association list
|
| 15905 | 78 |
but no longer accessible via @{const assoc}. Clearly, there is room here for
|
| 8745 | 79 |
optimizations! |
80 |
||
| 15905 | 81 |
Before we start on any proofs about @{const update} we tell the simplifier to
|
| 10171 | 82 |
expand all @{text let}s and to split all @{text case}-constructs over
|
| 8745 | 83 |
options: |
| 9458 | 84 |
*}; |
| 8745 | 85 |
|
| 9933 | 86 |
declare Let_def[simp] option.split[split] |
| 8745 | 87 |
|
88 |
text{*\noindent
|
|
89 |
The reason becomes clear when looking (probably after a failed proof |
|
| 15905 | 90 |
attempt) at the body of @{const update}: it contains both
|
| 10171 | 91 |
@{text let} and a case distinction over type @{text option}.
|
| 8745 | 92 |
|
| 15905 | 93 |
Our main goal is to prove the correct interaction of @{const update} and
|
94 |
@{const lookup}:
|
|
| 9458 | 95 |
*}; |
| 8745 | 96 |
|
| 10171 | 97 |
theorem "\<forall>t v bs. lookup (update t as v) bs = |
| 8745 | 98 |
(if as=bs then Some v else lookup t bs)"; |
99 |
||
100 |
txt{*\noindent
|
|
| 10171 | 101 |
Our plan is to induct on @{term as}; hence the remaining variables are
|
| 8745 | 102 |
quantified. From the definitions it is clear that induction on either |
| 11458 | 103 |
@{term as} or @{term bs} is required. The choice of @{term as} is
|
| 15905 | 104 |
guided by the intuition that simplification of @{const lookup} might be easier
|
105 |
if @{const update} has already been simplified, which can only happen if
|
|
| 10171 | 106 |
@{term as} is instantiated.
|
| 11458 | 107 |
The start of the proof is conventional: |
| 9458 | 108 |
*}; |
| 8771 | 109 |
apply(induct_tac as, auto); |
| 8745 | 110 |
|
111 |
txt{*\noindent
|
|
112 |
Unfortunately, this time we are left with three intimidating looking subgoals: |
|
| 9723 | 113 |
\begin{isabelle}
|
| 8745 | 114 |
~1.~\dots~{\isasymLongrightarrow}~lookup~\dots~bs~=~lookup~t~bs\isanewline
|
115 |
~2.~\dots~{\isasymLongrightarrow}~lookup~\dots~bs~=~lookup~t~bs\isanewline
|
|
| 9792 | 116 |
~3.~\dots~{\isasymLongrightarrow}~lookup~\dots~bs~=~lookup~t~bs
|
| 9723 | 117 |
\end{isabelle}
|
| 10171 | 118 |
Clearly, if we want to make headway we have to instantiate @{term bs} as
|
| 8745 | 119 |
well now. It turns out that instead of induction, case distinction |
120 |
suffices: |
|
| 9458 | 121 |
*}; |
| 10171 | 122 |
apply(case_tac[!] bs, auto); |
123 |
done |
|
| 8745 | 124 |
|
125 |
text{*\noindent
|
|
| 11458 | 126 |
\index{subgoal numbering}%
|
| 10171 | 127 |
All methods ending in @{text tac} take an optional first argument that
|
| 9792 | 128 |
specifies the range of subgoals they are applied to, where @{text"[!]"} means
|
129 |
all subgoals, i.e.\ @{text"[1-3]"} in our case. Individual subgoal numbers,
|
|
130 |
e.g. @{text"[2]"} are also allowed.
|
|
| 8745 | 131 |
|
132 |
This proof may look surprisingly straightforward. However, note that this |
|
| 9792 | 133 |
comes at a cost: the proof script is unreadable because the intermediate |
134 |
proof states are invisible, and we rely on the (possibly brittle) magic of |
|
| 10971 | 135 |
@{text auto} (@{text simp_all} will not do --- try it) to split the subgoals
|
| 10171 | 136 |
of the induction up in such a way that case distinction on @{term bs} makes
|
| 11458 | 137 |
sense and solves the proof. |
| 11294 | 138 |
|
139 |
\begin{exercise}
|
|
| 15905 | 140 |
Modify @{const update} (and its type) such that it allows both insertion and
|
| 11303 | 141 |
deletion of entries with a single function. Prove the corresponding version |
142 |
of the main theorem above. |
|
143 |
Optimize your function such that it shrinks tries after |
|
144 |
deletion if possible. |
|
| 11294 | 145 |
\end{exercise}
|
146 |
||
147 |
\begin{exercise}
|
|
| 15905 | 148 |
Write an improved version of @{const update} that does not suffer from the
|
| 11294 | 149 |
space leak (pointed out above) caused by not deleting overwritten entries |
150 |
from the association list. Prove the main theorem for your improved |
|
| 15905 | 151 |
@{const update}.
|
| 11294 | 152 |
\end{exercise}
|
153 |
||
154 |
\begin{exercise}
|
|
155 |
Conceptually, each node contains a mapping from letters to optional |
|
156 |
subtries. Above we have implemented this by means of an association |
|
157 |
list. Replay the development replacing @{typ "('a * ('a,'v)trie)list"}
|
|
158 |
with @{typ"'a \<Rightarrow> ('a,'v)trie option"}.
|
|
159 |
\end{exercise}
|
|
160 |
||
| 9458 | 161 |
*}; |
| 8745 | 162 |
|
163 |
(*<*) |
|
| 11294 | 164 |
|
165 |
(* Exercise 1. Solution by Getrud Bauer *) |
|
166 |
||
| 27015 | 167 |
primrec update1 :: "('a, 'v) trie \<Rightarrow> 'a list \<Rightarrow> 'v option \<Rightarrow> ('a, 'v) trie"
|
168 |
where |
|
169 |
"update1 t [] vo = Trie vo (alist t)" | |
|
| 11294 | 170 |
"update1 t (a#as) vo = |
171 |
(let tt = (case assoc (alist t) a of |
|
172 |
None \<Rightarrow> Trie None [] |
|
173 |
| Some at \<Rightarrow> at) |
|
174 |
in Trie (value t) ((a, update1 tt as vo) # alist t))" |
|
175 |
||
176 |
theorem [simp]: "\<forall>t v bs. lookup (update1 t as v) bs = |
|
177 |
(if as = bs then v else lookup t bs)"; |
|
178 |
apply (induct_tac as, auto); |
|
179 |
apply (case_tac[!] bs, auto); |
|
180 |
done |
|
181 |
||
182 |
||
183 |
(* Exercise 2. Solution by Getrud Bauer *) |
|
184 |
||
| 27015 | 185 |
primrec overwrite :: "'a \<Rightarrow> 'b \<Rightarrow> ('a * 'b) list \<Rightarrow> ('a * 'b) list" where
|
186 |
"overwrite a v [] = [(a,v)]" | |
|
187 |
"overwrite a v (p#ps) = (if a = fst p then (a,v)#ps else p # overwrite a v ps)" |
|
| 11294 | 188 |
|
189 |
lemma [simp]: "\<forall> a v b. assoc (overwrite a v ps) b = assoc ((a,v)#ps) b" |
|
190 |
apply (induct_tac ps, auto) |
|
191 |
apply (case_tac[!] a) |
|
192 |
done |
|
193 |
||
| 27015 | 194 |
primrec update2 :: "('a, 'v) trie \<Rightarrow> 'a list \<Rightarrow> 'v option \<Rightarrow> ('a, 'v) trie"
|
195 |
where |
|
196 |
"update2 t [] vo = Trie vo (alist t)" | |
|
| 11294 | 197 |
"update2 t (a#as) vo = |
198 |
(let tt = (case assoc (alist t) a of |
|
199 |
None \<Rightarrow> Trie None [] |
|
200 |
| Some at \<Rightarrow> at) |
|
201 |
in Trie (value t) (overwrite a (update2 tt as vo) (alist t)))"; |
|
202 |
||
203 |
theorem "\<forall>t v bs. lookup (update2 t as vo) bs = |
|
204 |
(if as = bs then vo else lookup t bs)"; |
|
205 |
apply (induct_tac as, auto); |
|
206 |
apply (case_tac[!] bs, auto); |
|
207 |
done |
|
208 |
||
209 |
||
210 |
(* Exercise 3. Solution by Getrud Bauer *) |
|
|
58305
57752a91eec4
renamed 'datatype' to 'old_datatype'; 'datatype' is now alias for 'datatype_new'
blanchet
parents:
48985
diff
changeset
|
211 |
datatype ('a,dead 'v) triem = Triem "'v option" "'a \<Rightarrow> ('a,'v) triem option";
|
| 11294 | 212 |
|
| 27015 | 213 |
primrec valuem :: "('a, 'v) triem \<Rightarrow> 'v option" where
|
214 |
"valuem (Triem ov m) = ov" |
|
| 11294 | 215 |
|
| 27015 | 216 |
primrec mapping :: "('a,'v) triem \<Rightarrow> 'a \<Rightarrow> ('a, 'v) triem option" where
|
217 |
"mapping (Triem ov m) = m" |
|
| 11294 | 218 |
|
| 27015 | 219 |
primrec lookupm :: "('a,'v) triem \<Rightarrow> 'a list \<Rightarrow> 'v option" where
|
220 |
"lookupm t [] = valuem t" | |
|
| 11294 | 221 |
"lookupm t (a#as) = (case mapping t a of |
222 |
None \<Rightarrow> None |
|
223 |
| Some at \<Rightarrow> lookupm at as)"; |
|
224 |
||
225 |
lemma [simp]: "lookupm (Triem None (\<lambda>c. None)) as = None"; |
|
226 |
apply (case_tac as, simp_all); |
|
227 |
done |
|
228 |
||
| 27015 | 229 |
primrec updatem :: "('a,'v)triem \<Rightarrow> 'a list \<Rightarrow> 'v \<Rightarrow> ('a,'v)triem" where
|
230 |
"updatem t [] v = Triem (Some v) (mapping t)" | |
|
| 11294 | 231 |
"updatem t (a#as) v = |
232 |
(let tt = (case mapping t a of |
|
233 |
None \<Rightarrow> Triem None (\<lambda>c. None) |
|
234 |
| Some at \<Rightarrow> at) |
|
235 |
in Triem (valuem t) |
|
236 |
(\<lambda>c. if c = a then Some (updatem tt as v) else mapping t c))"; |
|
237 |
||
238 |
theorem "\<forall>t v bs. lookupm (updatem t as v) bs = |
|
239 |
(if as = bs then Some v else lookupm t bs)"; |
|
240 |
apply (induct_tac as, auto); |
|
241 |
apply (case_tac[!] bs, auto); |
|
242 |
done |
|
243 |
||
| 9458 | 244 |
end; |
| 8745 | 245 |
(*>*) |