Theory Refinement
theory Refinement
imports Setup
begin
section ‹Program and datatype refinement \label{sec:refinement}›
text ‹
Code generation by shallow embedding (cf.~\secref{sec:principle})
allows to choose code equations and datatype constructors freely,
given that some very basic syntactic properties are met; this
flexibility opens up mechanisms for refinement which allow to extend
the scope and quality of generated code dramatically.
›
subsection ‹Program refinement›
text ‹
Program refinement works by choosing appropriate code equations
explicitly (cf.~\secref{sec:equations}); as example, we use Fibonacci
numbers:
›
fun %quote fib :: "nat ⇒ nat" where
"fib 0 = 0"
| "fib (Suc 0) = Suc 0"
| "fib (Suc (Suc n)) = fib n + fib (Suc n)"
text ‹
\noindent The runtime of the corresponding code grows exponential due
to two recursive calls:
›
text %quote ‹
@{code_stmts fib constant: fib (Haskell)}
›
text ‹
\noindent A more efficient implementation would use dynamic
programming, e.g.~sharing of common intermediate results between
recursive calls. This idea is expressed by an auxiliary operation
which computes a Fibonacci number and its successor simultaneously:
›
definition %quote fib_step :: "nat ⇒ nat × nat" where
"fib_step n = (fib (Suc n), fib n)"
text ‹
\noindent This operation can be implemented by recursion using
dynamic programming:
›
lemma %quote [code]:
"fib_step 0 = (Suc 0, 0)"
"fib_step (Suc n) = (let (m, q) = fib_step n in (m + q, m))"
by (simp_all add: fib_step_def)
text ‹
\noindent What remains is to implement \<^const>‹fib› by \<^const>‹fib_step› as follows:
›
lemma %quote [code]:
"fib 0 = 0"
"fib (Suc n) = fst (fib_step n)"
by (simp_all add: fib_step_def)
text ‹
\noindent The resulting code shows only linear growth of runtime:
›
text %quote ‹
@{code_stmts fib constant: fib fib_step (Haskell)}
›
subsection ‹Datatype refinement›
text ‹
Selecting specific code equations \emph{and} datatype constructors
leads to datatype refinement. As an example, we will develop an
alternative representation of the queue example given in
\secref{sec:queue_example}. The amortised representation is
convenient for generating code but exposes its \qt{implementation}
details, which may be cumbersome when proving theorems about it.
Therefore, here is a simple, straightforward representation of
queues:
›
datatype %quote 'a queue = Queue "'a list"
definition %quote empty :: "'a queue" where
"empty = Queue []"
primrec %quote enqueue :: "'a ⇒ 'a queue ⇒ 'a queue" where
"enqueue x (Queue xs) = Queue (xs @ [x])"
fun %quote dequeue :: "'a queue ⇒ 'a option × 'a queue" where
"dequeue (Queue []) = (None, Queue [])"
| "dequeue (Queue (x # xs)) = (Some x, Queue xs)"
text ‹
\noindent This we can use directly for proving; for executing,
we provide an alternative characterisation:
›
definition %quote AQueue :: "'a list ⇒ 'a list ⇒ 'a queue" where
"AQueue xs ys = Queue (ys @ rev xs)"
code_datatype %quote AQueue
text ‹
\noindent Here we define a \qt{constructor} \<^const>‹AQueue› which
is defined in terms of ‹Queue› and interprets its arguments
according to what the \emph{content} of an amortised queue is supposed
to be.
The prerequisite for datatype constructors is only syntactical: a
constructor must be of type ‹τ = … ⇒ κ α⇩1 … α⇩n› where ‹{α⇩1, …, α⇩n}› is exactly the set of \emph{all} type variables in
‹τ›; then ‹κ› is its corresponding datatype. The
HOL datatype package by default registers any new datatype with its
constructors, but this may be changed using @{command_def
code_datatype}; the currently chosen constructors can be inspected
using the @{command print_codesetup} command.
Equipped with this, we are able to prove the following equations
for our primitive queue operations which \qt{implement} the simple
queues in an amortised fashion:
›
lemma %quote empty_AQueue [code]:
"empty = AQueue [] []"
by (simp add: AQueue_def empty_def)
lemma %quote enqueue_AQueue [code]:
"enqueue x (AQueue xs ys) = AQueue (x # xs) ys"
by (simp add: AQueue_def)
lemma %quote dequeue_AQueue [code]:
"dequeue (AQueue xs []) =
(if xs = [] then (None, AQueue [] [])
else dequeue (AQueue [] (rev xs)))"
"dequeue (AQueue xs (y # ys)) = (Some y, AQueue xs ys)"
by (simp_all add: AQueue_def)
text ‹
\noindent It is good style, although no absolute requirement, to
provide code equations for the original artefacts of the implemented
type, if possible; in our case, these are the datatype constructor
\<^const>‹Queue› and the case combinator \<^const>‹case_queue›:
›
lemma %quote Queue_AQueue [code]:
"Queue = AQueue []"
by (simp add: AQueue_def fun_eq_iff)
lemma %quote case_queue_AQueue [code]:
"case_queue f (AQueue xs ys) = f (ys @ rev xs)"
by (simp add: AQueue_def)
text ‹
\noindent The resulting code looks as expected:
›
text %quote ‹
@{code_stmts empty enqueue dequeue Queue case_queue (SML)}
›
text ‹
The same techniques can also be applied to types which are not
specified as datatypes, e.g.~type \<^typ>‹int› is originally specified
as quotient type by means of @{command_def typedef}, but for code
generation constants allowing construction of binary numeral values
are used as constructors for \<^typ>‹int›.
This approach however fails if the representation of a type demands
invariants; this issue is discussed in the next section.
›
subsection ‹Datatype refinement involving invariants \label{sec:invariant}›
text ‹
Datatype representation involving invariants require a dedicated
setup for the type and its primitive operations. As a running
example, we implement a type \<^typ>‹'a dlist› of lists consisting
of distinct elements.
The specification of \<^typ>‹'a dlist› itself can be found in theory
\<^theory>‹HOL-Library.Dlist›.
The first step is to decide on which representation the abstract
type (in our example \<^typ>‹'a dlist›) should be implemented.
Here we choose \<^typ>‹'a list›. Then a conversion from the concrete
type to the abstract type must be specified, here:
›
text %quote ‹
\<^term_type>‹Dlist›
›
text ‹
\noindent Next follows the specification of a suitable \emph{projection},
i.e.~a conversion from abstract to concrete type:
›
text %quote ‹
\<^term_type>‹list_of_dlist›
›
text ‹
\noindent This projection must be specified such that the following
\emph{abstract datatype certificate} can be proven:
›
lemma %quote [code abstype]:
"Dlist (list_of_dlist dxs) = dxs"
by (fact Dlist_list_of_dlist)
text ‹
\noindent Note that so far the invariant on representations
(\<^term_type>‹distinct›) has never been mentioned explicitly:
the invariant is only referred to implicitly: all values in
set \<^term>‹{xs. list_of_dlist (Dlist xs) = xs}› are invariant,
and in our example this is exactly \<^term>‹{xs. distinct xs}›.
The primitive operations on \<^typ>‹'a dlist› are specified
indirectly using the projection \<^const>‹list_of_dlist›. For
the empty ‹dlist›, \<^const>‹Dlist.empty›, we finally want
the code equation
›
text %quote ‹
\<^term>‹Dlist.empty = Dlist []›
›
text ‹
\noindent This we have to prove indirectly as follows:
›
lemma %quote [code]:
"list_of_dlist Dlist.empty = []"
by (fact list_of_dlist_empty)
text ‹
\noindent This equation logically encodes both the desired code
equation and that the expression \<^const>‹Dlist› is applied to obeys
the implicit invariant. Equations for insertion and removal are
similar:
›
lemma %quote [code]:
"list_of_dlist (Dlist.insert x dxs) = List.insert x (list_of_dlist dxs)"
by (fact list_of_dlist_insert)
lemma %quote [code]:
"list_of_dlist (Dlist.remove x dxs) = remove1 x (list_of_dlist dxs)"
by (fact list_of_dlist_remove)
text ‹
\noindent Then the corresponding code is as follows:
›
text %quote ‹
@{code_stmts Dlist.empty Dlist.insert Dlist.remove list_of_dlist (Haskell)}
›
text ‹
See further @{cite "Haftmann-Kraus-Kuncar-Nipkow:2013:data_refinement"}
for the meta theory of datatype refinement involving invariants.
Typical data structures implemented by representations involving
invariants are available in the library, theory \<^theory>‹HOL-Library.Mapping›
specifies key-value-mappings (type \<^typ>‹('a, 'b) mapping›);
these can be implemented by red-black-trees (theory \<^theory>‹HOL-Library.RBT›).
›
end