| author | wenzelm |
| Sat, 05 Jan 2019 14:50:36 +0100 | |
| changeset 69594 | 1d340f7f8dce |
| parent 69505 | cc2d676d5395 |
| child 69597 | ff784d5a5bfb |
| permissions | -rw-r--r-- |
| 38405 | 1 |
theory Refinement |
|
69422
472af2d7835d
clarified session dependencies: faster build_doc/build_release;
wenzelm
parents:
68484
diff
changeset
|
2 |
imports Setup |
| 38405 | 3 |
begin |
4 |
||
| 59377 | 5 |
section \<open>Program and datatype refinement \label{sec:refinement}\<close>
|
| 38405 | 6 |
|
| 59377 | 7 |
text \<open> |
| 38451 | 8 |
Code generation by shallow embedding (cf.~\secref{sec:principle})
|
9 |
allows to choose code equations and datatype constructors freely, |
|
10 |
given that some very basic syntactic properties are met; this |
|
11 |
flexibility opens up mechanisms for refinement which allow to extend |
|
12 |
the scope and quality of generated code dramatically. |
|
| 59377 | 13 |
\<close> |
| 38451 | 14 |
|
15 |
||
| 59377 | 16 |
subsection \<open>Program refinement\<close> |
| 38451 | 17 |
|
| 59377 | 18 |
text \<open> |
| 38451 | 19 |
Program refinement works by choosing appropriate code equations |
| 40352 | 20 |
explicitly (cf.~\secref{sec:equations}); as example, we use Fibonacci
|
| 38451 | 21 |
numbers: |
| 59377 | 22 |
\<close> |
| 38451 | 23 |
|
24 |
fun %quote fib :: "nat \<Rightarrow> nat" where |
|
25 |
"fib 0 = 0" |
|
26 |
| "fib (Suc 0) = Suc 0" |
|
27 |
| "fib (Suc (Suc n)) = fib n + fib (Suc n)" |
|
28 |
||
| 59377 | 29 |
text \<open> |
| 38451 | 30 |
\noindent The runtime of the corresponding code grows exponential due |
31 |
to two recursive calls: |
|
| 59377 | 32 |
\<close> |
| 38451 | 33 |
|
| 59377 | 34 |
text %quotetypewriter \<open> |
| 39683 | 35 |
@{code_stmts fib (consts) fib (Haskell)}
|
| 59377 | 36 |
\<close> |
| 38451 | 37 |
|
| 59377 | 38 |
text \<open> |
| 38451 | 39 |
\noindent A more efficient implementation would use dynamic |
40 |
programming, e.g.~sharing of common intermediate results between |
|
41 |
recursive calls. This idea is expressed by an auxiliary operation |
|
42 |
which computes a Fibonacci number and its successor simultaneously: |
|
| 59377 | 43 |
\<close> |
| 38451 | 44 |
|
45 |
definition %quote fib_step :: "nat \<Rightarrow> nat \<times> nat" where |
|
46 |
"fib_step n = (fib (Suc n), fib n)" |
|
47 |
||
| 59377 | 48 |
text \<open> |
| 38451 | 49 |
\noindent This operation can be implemented by recursion using |
50 |
dynamic programming: |
|
| 59377 | 51 |
\<close> |
| 38451 | 52 |
|
53 |
lemma %quote [code]: |
|
54 |
"fib_step 0 = (Suc 0, 0)" |
|
55 |
"fib_step (Suc n) = (let (m, q) = fib_step n in (m + q, m))" |
|
56 |
by (simp_all add: fib_step_def) |
|
57 |
||
| 59377 | 58 |
text \<open> |
| 38451 | 59 |
\noindent What remains is to implement @{const fib} by @{const
|
60 |
fib_step} as follows: |
|
| 59377 | 61 |
\<close> |
| 38451 | 62 |
|
63 |
lemma %quote [code]: |
|
64 |
"fib 0 = 0" |
|
65 |
"fib (Suc n) = fst (fib_step n)" |
|
66 |
by (simp_all add: fib_step_def) |
|
67 |
||
| 59377 | 68 |
text \<open> |
| 38451 | 69 |
\noindent The resulting code shows only linear growth of runtime: |
| 59377 | 70 |
\<close> |
| 38451 | 71 |
|
| 59377 | 72 |
text %quotetypewriter \<open> |
| 39683 | 73 |
@{code_stmts fib (consts) fib fib_step (Haskell)}
|
| 59377 | 74 |
\<close> |
| 38451 | 75 |
|
76 |
||
| 59377 | 77 |
subsection \<open>Datatype refinement\<close> |
| 38437 | 78 |
|
| 59377 | 79 |
text \<open> |
| 38459 | 80 |
Selecting specific code equations \emph{and} datatype constructors
|
81 |
leads to datatype refinement. As an example, we will develop an |
|
82 |
alternative representation of the queue example given in |
|
83 |
\secref{sec:queue_example}. The amortised representation is
|
|
84 |
convenient for generating code but exposes its \qt{implementation}
|
|
85 |
details, which may be cumbersome when proving theorems about it. |
|
86 |
Therefore, here is a simple, straightforward representation of |
|
87 |
queues: |
|
| 59377 | 88 |
\<close> |
| 38437 | 89 |
|
| 58310 | 90 |
datatype %quote 'a queue = Queue "'a list" |
| 38437 | 91 |
|
92 |
definition %quote empty :: "'a queue" where |
|
93 |
"empty = Queue []" |
|
94 |
||
95 |
primrec %quote enqueue :: "'a \<Rightarrow> 'a queue \<Rightarrow> 'a queue" where |
|
96 |
"enqueue x (Queue xs) = Queue (xs @ [x])" |
|
97 |
||
98 |
fun %quote dequeue :: "'a queue \<Rightarrow> 'a option \<times> 'a queue" where |
|
99 |
"dequeue (Queue []) = (None, Queue [])" |
|
100 |
| "dequeue (Queue (x # xs)) = (Some x, Queue xs)" |
|
101 |
||
| 59377 | 102 |
text \<open> |
| 38437 | 103 |
\noindent This we can use directly for proving; for executing, |
104 |
we provide an alternative characterisation: |
|
| 59377 | 105 |
\<close> |
| 38437 | 106 |
|
107 |
definition %quote AQueue :: "'a list \<Rightarrow> 'a list \<Rightarrow> 'a queue" where |
|
108 |
"AQueue xs ys = Queue (ys @ rev xs)" |
|
109 |
||
110 |
code_datatype %quote AQueue |
|
111 |
||
| 59377 | 112 |
text \<open> |
| 38437 | 113 |
\noindent Here we define a \qt{constructor} @{const "AQueue"} which
|
| 69505 | 114 |
is defined in terms of \<open>Queue\<close> and interprets its arguments |
| 38437 | 115 |
according to what the \emph{content} of an amortised queue is supposed
|
| 38459 | 116 |
to be. |
117 |
||
118 |
The prerequisite for datatype constructors is only syntactical: a |
|
| 69505 | 119 |
constructor must be of type \<open>\<tau> = \<dots> \<Rightarrow> \<kappa> \<alpha>\<^sub>1 \<dots> \<alpha>\<^sub>n\<close> where \<open>{\<alpha>\<^sub>1, \<dots>, \<alpha>\<^sub>n}\<close> is exactly the set of \emph{all} type variables in
|
120 |
\<open>\<tau>\<close>; then \<open>\<kappa>\<close> is its corresponding datatype. The |
|
| 38459 | 121 |
HOL datatype package by default registers any new datatype with its |
| 38511 | 122 |
constructors, but this may be changed using @{command_def
|
| 38459 | 123 |
code_datatype}; the currently chosen constructors can be inspected |
124 |
using the @{command print_codesetup} command.
|
|
125 |
||
126 |
Equipped with this, we are able to prove the following equations |
|
| 38437 | 127 |
for our primitive queue operations which \qt{implement} the simple
|
128 |
queues in an amortised fashion: |
|
| 59377 | 129 |
\<close> |
| 38437 | 130 |
|
131 |
lemma %quote empty_AQueue [code]: |
|
132 |
"empty = AQueue [] []" |
|
| 40754 | 133 |
by (simp add: AQueue_def empty_def) |
| 38437 | 134 |
|
135 |
lemma %quote enqueue_AQueue [code]: |
|
136 |
"enqueue x (AQueue xs ys) = AQueue (x # xs) ys" |
|
| 40754 | 137 |
by (simp add: AQueue_def) |
| 38437 | 138 |
|
139 |
lemma %quote dequeue_AQueue [code]: |
|
140 |
"dequeue (AQueue xs []) = |
|
141 |
(if xs = [] then (None, AQueue [] []) |
|
142 |
else dequeue (AQueue [] (rev xs)))" |
|
143 |
"dequeue (AQueue xs (y # ys)) = (Some y, AQueue xs ys)" |
|
| 40754 | 144 |
by (simp_all add: AQueue_def) |
| 38437 | 145 |
|
| 59377 | 146 |
text \<open> |
| 40754 | 147 |
\noindent It is good style, although no absolute requirement, to |
148 |
provide code equations for the original artefacts of the implemented |
|
149 |
type, if possible; in our case, these are the datatype constructor |
|
| 55422 | 150 |
@{const Queue} and the case combinator @{const case_queue}:
|
| 59377 | 151 |
\<close> |
| 38437 | 152 |
|
| 40754 | 153 |
lemma %quote Queue_AQueue [code]: |
154 |
"Queue = AQueue []" |
|
155 |
by (simp add: AQueue_def fun_eq_iff) |
|
156 |
||
| 55422 | 157 |
lemma %quote case_queue_AQueue [code]: |
158 |
"case_queue f (AQueue xs ys) = f (ys @ rev xs)" |
|
| 40754 | 159 |
by (simp add: AQueue_def) |
| 38437 | 160 |
|
| 59377 | 161 |
text \<open> |
| 38437 | 162 |
\noindent The resulting code looks as expected: |
| 59377 | 163 |
\<close> |
| 38437 | 164 |
|
| 59377 | 165 |
text %quotetypewriter \<open> |
| 55422 | 166 |
@{code_stmts empty enqueue dequeue Queue case_queue (SML)}
|
| 59377 | 167 |
\<close> |
| 38437 | 168 |
|
| 59377 | 169 |
text \<open> |
| 38459 | 170 |
The same techniques can also be applied to types which are not |
171 |
specified as datatypes, e.g.~type @{typ int} is originally specified
|
|
| 38511 | 172 |
as quotient type by means of @{command_def typedef}, but for code
|
| 38459 | 173 |
generation constants allowing construction of binary numeral values |
174 |
are used as constructors for @{typ int}.
|
|
| 38437 | 175 |
|
| 38459 | 176 |
This approach however fails if the representation of a type demands |
177 |
invariants; this issue is discussed in the next section. |
|
| 59377 | 178 |
\<close> |
| 38459 | 179 |
|
| 38437 | 180 |
|
| 59377 | 181 |
subsection \<open>Datatype refinement involving invariants \label{sec:invariant}\<close>
|
| 38437 | 182 |
|
| 59377 | 183 |
text \<open> |
| 38502 | 184 |
Datatype representation involving invariants require a dedicated |
185 |
setup for the type and its primitive operations. As a running |
|
| 66405 | 186 |
example, we implement a type @{typ "'a dlist"} of lists consisting
|
| 38502 | 187 |
of distinct elements. |
188 |
||
| 66405 | 189 |
The specification of @{typ "'a dlist"} itself can be found in theory
|
| 68484 | 190 |
@{theory "HOL-Library.Dlist"}.
|
| 66405 | 191 |
|
| 38502 | 192 |
The first step is to decide on which representation the abstract |
| 66405 | 193 |
type (in our example @{typ "'a dlist"}) should be implemented.
|
194 |
Here we choose @{typ "'a list"}. Then a conversion from the concrete
|
|
| 38502 | 195 |
type to the abstract type must be specified, here: |
| 59377 | 196 |
\<close> |
| 38502 | 197 |
|
| 59377 | 198 |
text %quote \<open> |
| 38502 | 199 |
@{term_type Dlist}
|
| 59377 | 200 |
\<close> |
| 38502 | 201 |
|
| 59377 | 202 |
text \<open> |
| 38502 | 203 |
\noindent Next follows the specification of a suitable \emph{projection},
|
204 |
i.e.~a conversion from abstract to concrete type: |
|
| 59377 | 205 |
\<close> |
| 38502 | 206 |
|
| 59377 | 207 |
text %quote \<open> |
| 38502 | 208 |
@{term_type list_of_dlist}
|
| 59377 | 209 |
\<close> |
| 38502 | 210 |
|
| 59377 | 211 |
text \<open> |
| 38502 | 212 |
\noindent This projection must be specified such that the following |
213 |
\emph{abstract datatype certificate} can be proven:
|
|
| 59377 | 214 |
\<close> |
| 38502 | 215 |
|
216 |
lemma %quote [code abstype]: |
|
217 |
"Dlist (list_of_dlist dxs) = dxs" |
|
218 |
by (fact Dlist_list_of_dlist) |
|
219 |
||
| 59377 | 220 |
text \<open> |
| 38502 | 221 |
\noindent Note that so far the invariant on representations |
222 |
(@{term_type distinct}) has never been mentioned explicitly:
|
|
223 |
the invariant is only referred to implicitly: all values in |
|
224 |
set @{term "{xs. list_of_dlist (Dlist xs) = xs}"} are invariant,
|
|
225 |
and in our example this is exactly @{term "{xs. distinct xs}"}.
|
|
226 |
||
227 |
The primitive operations on @{typ "'a dlist"} are specified
|
|
228 |
indirectly using the projection @{const list_of_dlist}. For
|
|
| 69505 | 229 |
the empty \<open>dlist\<close>, @{const Dlist.empty}, we finally want
|
| 38502 | 230 |
the code equation |
| 59377 | 231 |
\<close> |
| 38502 | 232 |
|
| 59377 | 233 |
text %quote \<open> |
| 38502 | 234 |
@{term "Dlist.empty = Dlist []"}
|
| 59377 | 235 |
\<close> |
| 38502 | 236 |
|
| 59377 | 237 |
text \<open> |
| 38502 | 238 |
\noindent This we have to prove indirectly as follows: |
| 59377 | 239 |
\<close> |
| 38502 | 240 |
|
|
52637
1501ebe39711
attribute "code" declares concrete and abstract code equations uniformly; added explicit "code equation" instead
haftmann
parents:
48985
diff
changeset
|
241 |
lemma %quote [code]: |
| 38502 | 242 |
"list_of_dlist Dlist.empty = []" |
243 |
by (fact list_of_dlist_empty) |
|
244 |
||
| 59377 | 245 |
text \<open> |
| 38502 | 246 |
\noindent This equation logically encodes both the desired code |
247 |
equation and that the expression @{const Dlist} is applied to obeys
|
|
248 |
the implicit invariant. Equations for insertion and removal are |
|
249 |
similar: |
|
| 59377 | 250 |
\<close> |
| 38502 | 251 |
|
|
52637
1501ebe39711
attribute "code" declares concrete and abstract code equations uniformly; added explicit "code equation" instead
haftmann
parents:
48985
diff
changeset
|
252 |
lemma %quote [code]: |
| 38502 | 253 |
"list_of_dlist (Dlist.insert x dxs) = List.insert x (list_of_dlist dxs)" |
254 |
by (fact list_of_dlist_insert) |
|
255 |
||
|
52637
1501ebe39711
attribute "code" declares concrete and abstract code equations uniformly; added explicit "code equation" instead
haftmann
parents:
48985
diff
changeset
|
256 |
lemma %quote [code]: |
| 38502 | 257 |
"list_of_dlist (Dlist.remove x dxs) = remove1 x (list_of_dlist dxs)" |
258 |
by (fact list_of_dlist_remove) |
|
259 |
||
| 59377 | 260 |
text \<open> |
| 38502 | 261 |
\noindent Then the corresponding code is as follows: |
| 59377 | 262 |
\<close> |
| 38502 | 263 |
|
| 59377 | 264 |
text %quotetypewriter \<open> |
| 39683 | 265 |
@{code_stmts Dlist.empty Dlist.insert Dlist.remove list_of_dlist (Haskell)}
|
| 59377 | 266 |
\<close> |
| 38502 | 267 |
|
| 59377 | 268 |
text \<open> |
| 58620 | 269 |
See further @{cite "Haftmann-Kraus-Kuncar-Nipkow:2013:data_refinement"}
|
| 53125 | 270 |
for the meta theory of datatype refinement involving invariants. |
271 |
||
| 38502 | 272 |
Typical data structures implemented by representations involving |
| 68484 | 273 |
invariants are available in the library, theory @{theory "HOL-Library.Mapping"}
|
| 46516 | 274 |
specifies key-value-mappings (type @{typ "('a, 'b) mapping"});
|
| 68484 | 275 |
these can be implemented by red-black-trees (theory @{theory "HOL-Library.RBT"}).
|
| 59377 | 276 |
\<close> |
| 38437 | 277 |
|
| 38405 | 278 |
end |