
src/Doc/Codegen/Refinement.thy

author: haftmann
date: Mon Feb 06 20:56:34 2017 +0100
changeset: 64990:c6a7de505796
parent: 59377:056945909f60
child: 66405:82e2291cabff


more explicit errors in pathological cases

theory Refinement
imports Setup
begin

section \<open>Program and datatype refinement \label{sec:refinement}\<close>

text \<open>
  Code generation by shallow embedding (cf.~\secref{sec:principle})
  allows choosing code equations and datatype constructors freely,
  provided that some very basic syntactic properties are met; this
  flexibility opens up refinement mechanisms that can extend the
  scope and quality of generated code dramatically.
\<close>

subsection \<open>Program refinement\<close>

text \<open>
  Program refinement works by choosing appropriate code equations
  explicitly (cf.~\secref{sec:equations}); as an example, we use the
  Fibonacci numbers:
\<close>

fun %quote fib :: "nat \<Rightarrow> nat" where
    "fib 0 = 0"
  | "fib (Suc 0) = Suc 0"
  | "fib (Suc (Suc n)) = fib n + fib (Suc n)"

text \<open>
  \noindent The runtime of the corresponding code grows exponentially
  due to the two recursive calls:
\<close>

text %quotetypewriter \<open>
  @{code_stmts fib (consts) fib (Haskell)}
\<close>
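
text \<open>
  \noindent To make the exponential growth tangible, the following
  hand-written Haskell sketch (not the output of the code generator)
  transcribes the equations of @{const fib} and counts the calls they
  cause. The names \texttt{fibNaive} and \texttt{callsNaive} are
  hypothetical, and non-negative \texttt{Integer} values stand in for
  @{typ nat}.

```haskell
-- Hypothetical hand-written analogue of fib (not the generated code);
-- non-negative Integer values stand in for nat, so use n >= 0 only.
fibNaive :: Integer -> Integer
fibNaive 0 = 0
fibNaive 1 = 1
fibNaive n = fibNaive (n - 2) + fibNaive (n - 1)

-- Number of calls that fibNaive n performs, counting the outermost call.
-- It obeys callsNaive n = 1 + callsNaive (n - 2) + callsNaive (n - 1),
-- whose solution is 2 * fibNaive (n + 1) - 1: exponential growth.
callsNaive :: Integer -> Integer
callsNaive 0 = 1
callsNaive 1 = 1
callsNaive n = 1 + callsNaive (n - 2) + callsNaive (n - 1)
```

  Loaded into GHCi, \texttt{callsNaive 30} evaluates to 2692537,
  i.e.~more than two million calls to compute a single value.
\<close>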

text \<open>
  \noindent A more efficient implementation would use dynamic
  programming, i.e.~share common intermediate results between
  recursive calls. This idea is expressed by an auxiliary operation
  which computes a Fibonacci number and its successor simultaneously:
\<close>

definition %quote fib_step :: "nat \<Rightarrow> nat \<times> nat" where
  "fib_step n = (fib (Suc n), fib n)"

text \<open>
  \noindent This operation can be implemented by recursion using
  dynamic programming:
\<close>

lemma %quote [code]:
  "fib_step 0 = (Suc 0, 0)"
  "fib_step (Suc n) = (let (m, q) = fib_step n in (m + q, m))"
  by (simp_all add: fib_step_def)
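
text \<open>
  \noindent A hand-written Haskell counterpart of the two code
  equations for @{const fib_step} can be checked against the
  specifying definition on small inputs. This is a sketch, not
  generated code; all Haskell names are hypothetical.

```haskell
-- Hypothetical analogue of the two code equations for fib_step;
-- non-negative Integer values stand in for nat.
fibStep :: Integer -> (Integer, Integer)
fibStep 0 = (1, 0)
fibStep n = let (m, q) = fibStep (n - 1) in (m + q, m)

-- Reference transcription of the original fib equations, used only
-- to test the specification.
fibNaive :: Integer -> Integer
fibNaive 0 = 0
fibNaive 1 = 1
fibNaive n = fibNaive (n - 2) + fibNaive (n - 1)

-- Does fibStep n agree with the specification (fib (Suc n), fib n)?
meetsSpec :: Integer -> Bool
meetsSpec n = fibStep n == (fibNaive (n + 1), fibNaive n)
```

  Each call of \texttt{fibStep} makes exactly one recursive call,
  so evaluating \texttt{fibStep n} takes n + 1 calls in total.
\<close>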

text \<open>
  \noindent What remains is to implement @{const fib} by @{const
  fib_step} as follows:
\<close>

lemma %quote [code]:
  "fib 0 = 0"
  "fib (Suc n) = fst (fib_step n)"
  by (simp_all add: fib_step_def)

text \<open>
  \noindent The runtime of the resulting code grows only linearly:
\<close>

text %quotetypewriter \<open>
  @{code_stmts fib (consts) fib fib_step (Haskell)}
\<close>
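
text \<open>
  \noindent The effect of this refinement can be mimicked by hand.
  The following Haskell sketch (hypothetical names, not the generated
  code) implements \texttt{fibFast} via \texttt{fibStep} and
  instruments the latter to count its calls.

```haskell
-- Hypothetical Haskell analogue (not the generated code); non-negative
-- Integer values stand in for nat.
fibStep :: Integer -> (Integer, Integer)
fibStep 0 = (1, 0)
fibStep n = let (m, q) = fibStep (n - 1) in (m + q, m)

-- Mirrors the code equations fib 0 = 0 and fib (Suc n) = fst (fib_step n).
fibFast :: Integer -> Integer
fibFast 0 = 0
fibFast n = fst (fibStep (n - 1))

-- Instrumented fibStep that also returns the number of calls it made:
-- one per step from n down to 0, i.e. n + 1 calls in total.
fibStepCounted :: Integer -> ((Integer, Integer), Integer)
fibStepCounted 0 = ((1, 0), 1)
fibStepCounted n =
  let ((m, q), c) = fibStepCounted (n - 1)
  in ((m + q, m), c + 1)
```

  Since \texttt{snd (fibStepCounted n)} is n + 1, \texttt{fibFast 100}
  returns at once, whereas the running time of a direct transcription
  of the original equations of @{const fib} grows exponentially in n.
\<close>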

subsection \<open>Datatype refinement\<close>

text \<open>
  Selecting specific code equations \emph{and} datatype constructors
  leads to datatype refinement. As an example, we will develop an
  alternative representation of the queue example given in
  \secref{sec:queue_example}. The amortised representation is
  convenient for generating code but exposes its \qt{implementation}
  details, which may be cumbersome when proving theorems about it.
  Therefore, here is a simple, straightforward representation of
  queues:
\<close>

datatype %quote 'a queue = Queue "'a list"

definition %quote empty :: "'a queue" where
  "empty = Queue []"

primrec %quote enqueue :: "'a \<Rightarrow> 'a queue \<Rightarrow> 'a queue" where
  "enqueue x (Queue xs) = Queue (xs @ [x])"

fun %quote dequeue :: "'a queue \<Rightarrow> 'a option \<times> 'a queue" where
    "dequeue (Queue []) = (None, Queue [])"
  | "dequeue (Queue (x # xs)) = (Some x, Queue xs)"

text \<open>
  \noindent We can use this directly for proving; for execution,
  we provide an alternative characterisation:
\<close>

definition %quote AQueue :: "'a list \<Rightarrow> 'a list \<Rightarrow> 'a queue" where
  "AQueue xs ys = Queue (ys @ rev xs)"

code_datatype %quote AQueue

text \<open>
  \noindent Here we define a \qt{constructor} @{const "AQueue"} in
  terms of @{text "Queue"}; it interprets its arguments according to
  what the \emph{content} of an amortised queue is supposed to be.

  The prerequisite for datatype constructors is only syntactic: a
  constructor must be of type @{text "\<tau> = \<dots> \<Rightarrow> \<kappa> \<alpha>\<^sub>1 \<dots> \<alpha>\<^sub>n"} where @{text
  "{\<alpha>\<^sub>1, \<dots>, \<alpha>\<^sub>n}"} is exactly the set of \emph{all} type variables in
  @{text "\<tau>"}; then @{text "\<kappa>"} is its corresponding datatype. The
  HOL datatype package by default registers any new datatype with its
  constructors, but this may be changed using @{command_def
  code_datatype}; the currently chosen constructors can be inspected
  using the @{command print_codesetup} command.

  Equipped with this, we can prove the following equations
  for our primitive queue operations, which \qt{implement} the simple
  queues in an amortised fashion:
\<close>

lemma %quote empty_AQueue [code]:
  "empty = AQueue [] []"
  by (simp add: AQueue_def empty_def)

lemma %quote enqueue_AQueue [code]:
  "enqueue x (AQueue xs ys) = AQueue (x # xs) ys"
  by (simp add: AQueue_def)

lemma %quote dequeue_AQueue [code]:
  "dequeue (AQueue xs []) =
    (if xs = [] then (None, AQueue [] [])
    else dequeue (AQueue [] (rev xs)))"
  "dequeue (AQueue xs (y # ys)) = (Some y, AQueue xs ys)"
  by (simp_all add: AQueue_def)
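
text \<open>
  \noindent The following hand-written Haskell sketch (not the output
  of the code generator; all names are hypothetical) puts both
  representations side by side. The function \texttt{agree} runs the
  same script of operations on both and compares them via the
  abstraction function, an informal, test-based counterpart of the
  three lemmas above.

```haskell
-- Hypothetical Haskell analogue of the two queue representations
-- (hand-written, not the generated code).

-- Simple representation: the list of elements, front first.
newtype Queue a = Queue [a] deriving (Eq, Show)

enqueueQ :: a -> Queue a -> Queue a
enqueueQ x (Queue xs) = Queue (xs ++ [x])

dequeueQ :: Queue a -> (Maybe a, Queue a)
dequeueQ (Queue []) = (Nothing, Queue [])
dequeueQ (Queue (x : xs)) = (Just x, Queue xs)

-- Amortised representation: AQueue xs ys has content ys ++ reverse xs,
-- i.e. ys is the front part and xs the back part, newest element first.
data AQueue a = AQueue [a] [a] deriving Show

-- Abstraction function, mirroring the definition of AQueue.
abstr :: AQueue a -> Queue a
abstr (AQueue xs ys) = Queue (ys ++ reverse xs)

emptyA :: AQueue a
emptyA = AQueue [] []

-- Mirrors enqueue_AQueue: constant time, no list append.
enqueueA :: a -> AQueue a -> AQueue a
enqueueA x (AQueue xs ys) = AQueue (x : xs) ys

-- Mirrors dequeue_AQueue: the back part is reversed only when the
-- front part is exhausted.
dequeueA :: AQueue a -> (Maybe a, AQueue a)
dequeueA (AQueue xs []) =
  if null xs then (Nothing, AQueue [] [])
  else dequeueA (AQueue [] (reverse xs))
dequeueA (AQueue xs (y : ys)) = (Just y, AQueue xs ys)

-- Runs a script (Just x = enqueue x, Nothing = dequeue) on both
-- representations; checks that dequeued values agree and that the
-- abstraction of each amortised state equals the simple state.
agree :: Eq a => [Maybe a] -> Bool
agree = go (Queue []) emptyA
  where
    go _ _ [] = True
    go q a (Just x : ops) =
      let q' = enqueueQ x q
          a' = enqueueA x a
      in abstr a' == q' && go q' a' ops
    go q a (Nothing : ops) =
      let (r, q') = dequeueQ q
          (s, a') = dequeueA a
      in r == s && abstr a' == q' && go q' a' ops
```

  Only \texttt{dequeueA} ever reverses a list, and each element moves
  from the back part to the front part at most once, which is why the
  amortised cost per operation is constant in single-threaded use.
\<close>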

text \<open>
  \noindent It is good style, although not an absolute requirement, to
  provide code equations for the original artefacts of the implemented
  type, if possible; in our case, these are the datatype constructor
  @{const Queue} and the case combinator @{const case_queue}:
\<close>

lemma %quote Queue_AQueue [code]:
  "Queue = AQueue []"
  by (simp add: AQueue_def fun_eq_iff)

lemma %quote case_queue_AQueue [code]:
  "case_queue f (AQueue xs ys) = f (ys @ rev xs)"
  by (simp add: AQueue_def)
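
text \<open>
  \noindent In a hand-written Haskell analogue (a sketch with
  hypothetical names, not the generated code), the two lemmas above
  correspond to defining the original constructor and case combinator
  on top of the amortised representation.

```haskell
-- Hypothetical Haskell analogue of the amortised representation:
-- AQueue xs ys has content ys ++ reverse xs.
data AQueue a = AQueue [a] [a] deriving Show

-- Counterpart of Queue_AQueue: building a queue from its content list
-- puts the whole list into the front part.
queue :: [a] -> AQueue a
queue = AQueue []

-- Counterpart of case_queue_AQueue: the case combinator only sees the
-- content list ys ++ reverse xs, never the internal split.
caseQueue :: ([a] -> b) -> AQueue a -> b
caseQueue f (AQueue xs ys) = f (ys ++ reverse xs)
```

  Since \texttt{caseQueue} only exposes the content list, the internal
  splits \texttt{AQueue [3] [1, 2]} and \texttt{AQueue [3, 2] [1]}
  cannot be told apart; both denote the queue with content
  \texttt{[1, 2, 3]}.
\<close>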

text \<open>
  \noindent The resulting code looks as expected:
\<close>

text %quotetypewriter \<open>
  @{code_stmts empty enqueue dequeue Queue case_queue (SML)}
\<close>

text \<open>
  The same techniques can also be applied to types which are not
  specified as datatypes: e.g.~type @{typ int} is originally specified
  as a quotient type by means of @{command_def typedef}, but for code
  generation, constants that construct binary numeral values are used
  as constructors for @{typ int}.

  However, this approach fails if the representation of a type demands
  invariants; this issue is discussed in the next section.
\<close>

subsection \<open>Datatype refinement involving invariants \label{sec:invariant}\<close>

text \<open>
  Datatype representations involving invariants require a dedicated
  setup for the type and its primitive operations. As a running
  example, we implement a type @{text "'a dlist"} of lists consisting
  of distinct elements.

  The first step is to decide by which representation the abstract
  type (in our example @{text "'a dlist"}) should be implemented.
  Here we choose @{text "'a list"}. Then a conversion from the concrete
  type to the abstract type must be specified, here:
\<close>

text %quote \<open>
  @{term_type Dlist}
\<close>

text \<open>
  \noindent Next follows the specification of a suitable \emph{projection},
  i.e.~a conversion from abstract to concrete type:
\<close>

text %quote \<open>
  @{term_type list_of_dlist}
\<close>

text \<open>
  \noindent This projection must be specified such that the following
  \emph{abstract datatype certificate} can be proven:
\<close>

lemma %quote [code abstype]:
  "Dlist (list_of_dlist dxs) = dxs"
  by (fact Dlist_list_of_dlist)
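
text \<open>
  \noindent A hand-written Haskell sketch may help to see what the
  certificate and the implicit invariant amount to. It models
  @{text "'a dlist"} as a \texttt{newtype} over lists whose constructor
  \texttt{DL} would be hidden in a real module, and it \emph{assumes}
  that the conversion \texttt{dlist} (playing the role of
  @{const Dlist}) establishes the invariant by removing duplicates.
  All Haskell names are hypothetical.

```haskell
import Data.List (nub)

-- Hypothetical Haskell analogue of 'a dlist: a list assumed to contain
-- no duplicates; the constructor DL would be hidden from clients.
newtype Dlist a = DL [a] deriving (Eq, Show)

-- Projection, analogous to list_of_dlist.
listOfDlist :: Dlist a -> [a]
listOfDlist (DL xs) = xs

-- Conversion from the concrete type, analogous to Dlist. ASSUMPTION: it
-- establishes the invariant by removing duplicates (nub keeps the first
-- occurrence of each element).
dlist :: Eq a => [a] -> Dlist a
dlist xs = DL (nub xs)

distinct :: Eq a => [a] -> Bool
distinct xs = nub xs == xs

-- Abstract datatype certificate: Dlist (list_of_dlist dxs) = dxs.
certificate :: Eq a => Dlist a -> Bool
certificate dxs = dlist (listOfDlist dxs) == dxs

-- Membership in {xs. list_of_dlist (Dlist xs) = xs}; under the
-- assumption above, this holds exactly for distinct lists.
inInvariant :: Eq a => [a] -> Bool
inInvariant xs = listOfDlist (dlist xs) == xs
```

  For a value violating the invariant, such as \texttt{DL [1, 1]}, the
  certificate fails, which is why clients must not be able to build
  such values directly.
\<close>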

text \<open>
  \noindent Note that so far the invariant on representations
  (@{term_type distinct}) has not been mentioned explicitly;
  it is only referred to implicitly: all values in the
  set @{term "{xs. list_of_dlist (Dlist xs) = xs}"} satisfy the invariant,
  and in our example this set is exactly @{term "{xs. distinct xs}"}.

  The primitive operations on @{typ "'a dlist"} are specified
  indirectly using the projection @{const list_of_dlist}. For
  the empty @{text "dlist"}, @{const Dlist.empty}, we finally want
  the code equation
\<close>

text %quote \<open>
  @{term "Dlist.empty = Dlist []"}
\<close>

text \<open>
  \noindent We have to prove this indirectly as follows:
\<close>

lemma %quote [code]:
  "list_of_dlist Dlist.empty = []"
  by (fact list_of_dlist_empty)

text \<open>
  \noindent This equation logically encodes both the desired code
  equation and the fact that the expression to which @{const Dlist} is
  applied obeys the implicit invariant. Equations for insertion and
  removal are similar:
\<close>

lemma %quote [code]:
  "list_of_dlist (Dlist.insert x dxs) = List.insert x (list_of_dlist dxs)"
  by (fact list_of_dlist_insert)

lemma %quote [code]:
  "list_of_dlist (Dlist.remove x dxs) = remove1 x (list_of_dlist dxs)"
  by (fact list_of_dlist_remove)
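
text \<open>
  \noindent In a hand-written Haskell sketch (hypothetical names, not
  the generated code), the three code equations above amount to
  implementing each operation directly on the representation. Note
  that Haskell's \texttt{Data.List.insert} inserts into a sorted list,
  so an analogue of @{const List.insert} is defined explicitly, whereas
  \texttt{Data.List.delete}, like @{const remove1}, removes the first
  occurrence of an element.

```haskell
import Data.List (delete, nub)

-- Hypothetical Haskell analogue: a dlist is represented by a
-- duplicate-free list, reachable only through the operations below.
newtype Dlist a = DL [a] deriving (Eq, Show)

listOfDlist :: Dlist a -> [a]
listOfDlist (DL xs) = xs

-- Analogue of List.insert: prepend x unless it already occurs.
listInsert :: Eq a => a -> [a] -> [a]
listInsert x xs = if x `elem` xs then xs else x : xs

-- Each operation works on the representation, as prescribed by the
-- corresponding code equation via list_of_dlist.
emptyD :: Dlist a
emptyD = DL []

insertD :: Eq a => a -> Dlist a -> Dlist a
insertD x dxs = DL (listInsert x (listOfDlist dxs))

-- delete removes the first occurrence, like remove1.
removeD :: Eq a => a -> Dlist a -> Dlist a
removeD x dxs = DL (delete x (listOfDlist dxs))

distinct :: Eq a => [a] -> Bool
distinct xs = nub xs == xs
```

  Both \texttt{listInsert} and \texttt{delete} map duplicate-free lists
  to duplicate-free lists; this is the informal counterpart of the
  invariant preservation that the code equations above encode
  logically.
\<close>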

text \<open>
  \noindent The corresponding code is then as follows:
\<close>

text %quotetypewriter \<open>
  @{code_stmts Dlist.empty Dlist.insert Dlist.remove list_of_dlist (Haskell)}
\<close>

text \<open>
  See further @{cite "Haftmann-Kraus-Kuncar-Nipkow:2013:data_refinement"}
  for the meta theory of datatype refinement involving invariants.

  Typical data structures implemented by representations involving
  invariants are available in the library: theory @{theory Mapping}
  specifies key-value mappings (type @{typ "('a, 'b) mapping"});
  these can be implemented by red-black trees (theory @{theory RBT}).
\<close>

end