summary |
shortlog |
changelog |
graph |
tags |
branches |
files |
changeset |
file |
revisions |
annotate |
diff |
raw

src/Doc/Codegen/Foundations.thy

author | haftmann |

Fri Feb 15 08:31:31 2013 +0100 (2013-02-15) | |

changeset 51143 | 0a2371e7ced3 |

parent 48985 | 5386df44a037 |

child 51171 | e8b2d90da499 |

permissions | -rw-r--r-- |

two target language numeral types: integer and natural, as replacement for code_numeral;

former theory HOL/Library/Code_Numeral_Types replaces HOL/Code_Numeral;

refined stack of theories implementing int and/or nat by target language numerals;

reduced number of target language numeral types to exactly one

former theory HOL/Library/Code_Numeral_Types replaces HOL/Code_Numeral;

refined stack of theories implementing int and/or nat by target language numerals;

reduced number of target language numeral types to exactly one

1 theory Foundations

2 imports Introduction

3 begin

5 section {* Code generation foundations \label{sec:foundations} *}

7 subsection {* Code generator architecture \label{sec:architecture} *}

9 text {*

10 The code generator is actually a framework consisting of different

11 components which can be customised individually.

13 Conceptually all components operate on Isabelle's logic framework

14 @{theory Pure}. Practically, the object logic @{theory HOL}

15 provides the necessary facilities to make use of the code generator,

16 mainly since it is an extension of @{theory Pure}.

18 The constellation of the different components is visualized in the

19 following picture.

21 \begin{figure}[h]

22 \includegraphics{architecture}

23 \caption{Code generator architecture}

24 \label{fig:arch}

25 \end{figure}

27 \noindent Central to code generation is the notion of \emph{code

28 equations}. A code equation as a first approximation is a theorem

29 of the form @{text "f t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n \<equiv> t"} (an equation headed by a

30 constant @{text f} with arguments @{text "t\<^isub>1 t\<^isub>2 \<dots> t\<^isub>n"} and right

31 hand side @{text t}).

33 \begin{itemize}

35 \item Starting point of code generation is a collection of (raw)

36 code equations in a theory. It is not relevant where they stem

37 from, but typically they were either produced by specification

38 tools or proved explicitly by the user.

40 \item These raw code equations can be subjected to theorem

41 transformations. This \qn{preprocessor} (see

42 \secref{sec:preproc}) can apply the full expressiveness of

43 ML-based theorem transformations to code generation. The result

44 of preprocessing is a structured collection of code equations.

46 \item These code equations are \qn{translated} to a program in an

47 abstract intermediate language. Think of it as a kind of

48 \qt{Mini-Haskell} with four \qn{statements}: @{text data} (for

49 datatypes), @{text fun} (stemming from code equations), also

50 @{text class} and @{text inst} (for type classes).

52 \item Finally, the abstract program is \qn{serialised} into

53 concrete source code of a target language. This step only

54 produces concrete syntax but does not change the program in

55 essence; all conceptual transformations occur in the translation

56 step.

58 \end{itemize}

60 \noindent From these steps, only the last two are carried out

61 outside the logic; by keeping this layer as thin as possible, the

62 amount of code to trust is kept to a minimum.

63 *}

66 subsection {* The preprocessor \label{sec:preproc} *}

68 text {*

69 Before selected function theorems are turned into abstract code, a

70 chain of definitional transformation steps is carried out:

71 \emph{preprocessing}. The preprocessor consists of two

72 components: a \emph{simpset} and \emph{function transformers}.

74 The \emph{simpset} can apply the full generality of the Isabelle

75 simplifier. Due to the interpretation of theorems as code

76 equations, rewrites are applied to the right hand side and the

77 arguments of the left hand side of an equation, but never to the

78 constant heading the left hand side. An important special case are

79 \emph{unfold theorems}, which may be declared and removed using the

80 @{attribute code_unfold} or \emph{@{attribute code_unfold} del}

81 attribute, respectively.

83 Some common applications:

84 *}

86 text_raw {*

87 \begin{itemize}

88 *}

90 text {*

91 \item replacing non-executable constructs by executable ones:

92 *}

94 lemma %quote [code_unfold]:

95 "x \<in> set xs \<longleftrightarrow> List.member xs x" by (fact in_set_member)

97 text {*

98 \item replacing executable but inconvenient constructs:

99 *}

101 lemma %quote [code_unfold]:

102 "xs = [] \<longleftrightarrow> List.null xs" by (fact eq_Nil_null)

104 text {*

105 \item eliminating disturbing expressions:

106 *}

108 lemma %quote [code_unfold]:

109 "1 = Suc 0" by (fact One_nat_def)

111 text_raw {*

112 \end{itemize}

113 *}

115 text {*

116 \noindent \emph{Function transformers} provide a very general

117 interface, transforming a list of function theorems to another list

118 of function theorems, provided that neither the heading constant nor

119 its type change. The @{term "0\<Colon>nat"} / @{const Suc} pattern

120 elimination implemented in theory @{text Code_Binary_Nat} (see

121 \secref{eff_nat}) uses this interface.

123 \noindent The current setup of the preprocessor may be inspected

124 using the @{command_def print_codeproc} command. @{command_def

125 code_thms} (see \secref{sec:equations}) provides a convenient

126 mechanism to inspect the impact of a preprocessor setup on code

127 equations.

128 *}

131 subsection {* Understanding code equations \label{sec:equations} *}

133 text {*

134 As told in \secref{sec:principle}, the notion of code equations is

135 vital to code generation. Indeed most problems which occur in

136 practice can be resolved by an inspection of the underlying code

137 equations.

139 It is possible to exchange the default code equations for constants

140 by explicitly proving alternative ones:

141 *}

143 lemma %quote [code]:

144 "dequeue (AQueue xs []) =

145 (if xs = [] then (None, AQueue [] [])

146 else dequeue (AQueue [] (rev xs)))"

147 "dequeue (AQueue xs (y # ys)) =

148 (Some y, AQueue xs ys)"

149 by (cases xs, simp_all) (cases "rev xs", simp_all)

151 text {*

152 \noindent The annotation @{text "[code]"} is an @{text attribute}

153 which states that the given theorems should be considered as code

154 equations for a @{text fun} statement -- the corresponding constant

155 is determined syntactically. The resulting code:

156 *}

158 text %quotetypewriter {*

159 @{code_stmts dequeue (consts) dequeue (Haskell)}

160 *}

162 text {*

163 \noindent You may note that the equality test @{term "xs = []"} has

164 been replaced by the predicate @{term "List.null xs"}. This is due

165 to the default setup of the \qn{preprocessor}.

167 This possibility to select arbitrary code equations is the key

168 technique for program and datatype refinement (see

169 \secref{sec:refinement}).

171 Due to the preprocessor, there is the distinction of raw code

172 equations (before preprocessing) and code equations (after

173 preprocessing).

175 The first can be listed (among other data) using the @{command_def

176 print_codesetup} command.

178 The code equations after preprocessing are already are blueprint of

179 the generated program and can be inspected using the @{command

180 code_thms} command:

181 *}

183 code_thms %quote dequeue

185 text {*

186 \noindent This prints a table with the code equations for @{const

187 dequeue}, including \emph{all} code equations those equations depend

188 on recursively. These dependencies themselves can be visualized using

189 the @{command_def code_deps} command.

190 *}

193 subsection {* Equality *}

195 text {*

196 Implementation of equality deserves some attention. Here an example

197 function involving polymorphic equality:

198 *}

200 primrec %quote collect_duplicates :: "'a list \<Rightarrow> 'a list \<Rightarrow> 'a list \<Rightarrow> 'a list" where

201 "collect_duplicates xs ys [] = xs"

202 | "collect_duplicates xs ys (z#zs) = (if z \<in> set xs

203 then if z \<in> set ys

204 then collect_duplicates xs ys zs

205 else collect_duplicates xs (z#ys) zs

206 else collect_duplicates (z#xs) (z#ys) zs)"

208 text {*

209 \noindent During preprocessing, the membership test is rewritten,

210 resulting in @{const List.member}, which itself performs an explicit

211 equality check, as can be seen in the corresponding @{text SML} code:

212 *}

214 text %quotetypewriter {*

215 @{code_stmts collect_duplicates (SML)}

216 *}

218 text {*

219 \noindent Obviously, polymorphic equality is implemented the Haskell

220 way using a type class. How is this achieved? HOL introduces an

221 explicit class @{class equal} with a corresponding operation @{const

222 HOL.equal} such that @{thm equal [no_vars]}. The preprocessing

223 framework does the rest by propagating the @{class equal} constraints

224 through all dependent code equations. For datatypes, instances of

225 @{class equal} are implicitly derived when possible. For other types,

226 you may instantiate @{text equal} manually like any other type class.

227 *}

230 subsection {* Explicit partiality \label{sec:partiality} *}

232 text {*

233 Partiality usually enters the game by partial patterns, as

234 in the following example, again for amortised queues:

235 *}

237 definition %quote strict_dequeue :: "'a queue \<Rightarrow> 'a \<times> 'a queue" where

238 "strict_dequeue q = (case dequeue q

239 of (Some x, q') \<Rightarrow> (x, q'))"

241 lemma %quote strict_dequeue_AQueue [code]:

242 "strict_dequeue (AQueue xs (y # ys)) = (y, AQueue xs ys)"

243 "strict_dequeue (AQueue xs []) =

244 (case rev xs of y # ys \<Rightarrow> (y, AQueue [] ys))"

245 by (simp_all add: strict_dequeue_def) (cases xs, simp_all split: list.split)

247 text {*

248 \noindent In the corresponding code, there is no equation

249 for the pattern @{term "AQueue [] []"}:

250 *}

252 text %quotetypewriter {*

253 @{code_stmts strict_dequeue (consts) strict_dequeue (Haskell)}

254 *}

256 text {*

257 \noindent In some cases it is desirable to have this

258 pseudo-\qt{partiality} more explicitly, e.g.~as follows:

259 *}

261 axiomatization %quote empty_queue :: 'a

263 definition %quote strict_dequeue' :: "'a queue \<Rightarrow> 'a \<times> 'a queue" where

264 "strict_dequeue' q = (case dequeue q of (Some x, q') \<Rightarrow> (x, q') | _ \<Rightarrow> empty_queue)"

266 lemma %quote strict_dequeue'_AQueue [code]:

267 "strict_dequeue' (AQueue xs []) = (if xs = [] then empty_queue

268 else strict_dequeue' (AQueue [] (rev xs)))"

269 "strict_dequeue' (AQueue xs (y # ys)) =

270 (y, AQueue xs ys)"

271 by (simp_all add: strict_dequeue'_def split: list.splits)

273 text {*

274 Observe that on the right hand side of the definition of @{const

275 "strict_dequeue'"}, the unspecified constant @{const empty_queue} occurs.

277 Normally, if constants without any code equations occur in a

278 program, the code generator complains (since in most cases this is

279 indeed an error). But such constants can also be thought

280 of as function definitions which always fail,

281 since there is never a successful pattern match on the left hand

282 side. In order to categorise a constant into that category

283 explicitly, use @{command_def "code_abort"}:

284 *}

286 code_abort %quote empty_queue

288 text {*

289 \noindent Then the code generator will just insert an error or

290 exception at the appropriate position:

291 *}

293 text %quotetypewriter {*

294 @{code_stmts strict_dequeue' (consts) empty_queue strict_dequeue' (Haskell)}

295 *}

297 text {*

298 \noindent This feature however is rarely needed in practice. Note

299 also that the HOL default setup already declares @{const undefined}

300 as @{command "code_abort"}, which is most likely to be used in such

301 situations.

302 *}

305 subsection {* If something goes utterly wrong \label{sec:utterly_wrong} *}

307 text {*

308 Under certain circumstances, the code generator fails to produce

309 code entirely. To debug these, the following hints may prove

310 helpful:

312 \begin{description}

314 \ditem{\emph{Check with a different target language}.} Sometimes

315 the situation gets more clear if you switch to another target

316 language; the code generated there might give some hints what

317 prevents the code generator to produce code for the desired

318 language.

320 \ditem{\emph{Inspect code equations}.} Code equations are the central

321 carrier of code generation. Most problems occurring while generating

322 code can be traced to single equations which are printed as part of

323 the error message. A closer inspection of those may offer the key

324 for solving issues (cf.~\secref{sec:equations}).

326 \ditem{\emph{Inspect preprocessor setup}.} The preprocessor might

327 transform code equations unexpectedly; to understand an

328 inspection of its setup is necessary (cf.~\secref{sec:preproc}).

330 \ditem{\emph{Generate exceptions}.} If the code generator

331 complains about missing code equations, in can be helpful to

332 implement the offending constants as exceptions

333 (cf.~\secref{sec:partiality}); this allows at least for a formal

334 generation of code, whose inspection may then give clues what is

335 wrong.

337 \ditem{\emph{Remove offending code equations}.} If code

338 generation is prevented by just a single equation, this can be

339 removed (cf.~\secref{sec:equations}) to allow formal code

340 generation, whose result in turn can be used to trace the

341 problem. The most prominent case here are mismatches in type

342 class signatures (\qt{wellsortedness error}).

344 \end{description}

345 *}

347 end