summary |
shortlog |
changelog |
graph |
tags |
branches |
files |
changeset |
file |
revisions |
annotate |
diff |
raw

src/Doc/Codegen/Foundations.thy

author | blanchet |

Tue Nov 07 15:16:42 2017 +0100 (20 months ago) | |

changeset 67022 | 49309fe530fd |

parent 61781 | e1e6bb36b27a |

child 68254 | 3a7f257dcac7 |

permissions | -rw-r--r-- |

more robust parsing for THF proofs (esp. polymorphic Leo-III proofs)

1 theory Foundations

2 imports Introduction

3 begin

5 section \<open>Code generation foundations \label{sec:foundations}\<close>

7 subsection \<open>Code generator architecture \label{sec:architecture}\<close>

9 text \<open>

10 The code generator is actually a framework consisting of different

11 components which can be customised individually.

13 Conceptually all components operate on Isabelle's logic framework

14 @{theory Pure}. Practically, the object logic @{theory HOL}

15 provides the necessary facilities to make use of the code generator,

16 mainly since it is an extension of @{theory Pure}.

18 The constellation of the different components is visualized in the

19 following picture.

21 \begin{figure}[h]

22 \def\sys#1{\emph{#1}}

23 \begin{tikzpicture}[x = 4cm, y = 1cm]

24 \tikzstyle positive=[color = black, fill = white];

25 \tikzstyle negative=[color = white, fill = black];

26 \tikzstyle entity=[rounded corners, draw, thick];

27 \tikzstyle process=[ellipse, draw, thick];

28 \tikzstyle arrow=[-stealth, semithick];

29 \node (spec) at (0, 3) [entity, positive] {specification tools};

30 \node (user) at (1, 3) [entity, positive] {user proofs};

31 \node (spec_user_join) at (0.5, 3) [shape=coordinate] {};

32 \node (raw) at (0.5, 4) [entity, positive] {raw code equations};

33 \node (pre) at (1.5, 4) [process, positive] {preprocessing};

34 \node (eqn) at (2.5, 4) [entity, positive] {code equations};

35 \node (iml) at (0.5, 0) [entity, positive] {intermediate program};

36 \node (seri) at (1.5, 0) [process, positive] {serialisation};

37 \node (SML) at (2.5, 3) [entity, positive] {\sys{SML}};

38 \node (OCaml) at (2.5, 2) [entity, positive] {\sys{OCaml}};

39 \node (Haskell) at (2.5, 1) [entity, positive] {\sys{Haskell}};

40 \node (Scala) at (2.5, 0) [entity, positive] {\sys{Scala}};

41 \draw [semithick] (spec) -- (spec_user_join);

42 \draw [semithick] (user) -- (spec_user_join);

43 \draw [-diamond, semithick] (spec_user_join) -- (raw);

44 \draw [arrow] (raw) -- (pre);

45 \draw [arrow] (pre) -- (eqn);

46 \draw [arrow] (eqn) -- node (transl) [process, positive] {translation} (iml);

47 \draw [arrow] (iml) -- (seri);

48 \draw [arrow] (seri) -- (SML);

49 \draw [arrow] (seri) -- (OCaml);

50 \draw [arrow] (seri) -- (Haskell);

51 \draw [arrow] (seri) -- (Scala);

52 \end{tikzpicture}

53 \caption{Code generator architecture}

54 \label{fig:arch}

55 \end{figure}

57 \noindent Central to code generation is the notion of \emph{code

58 equations}. A code equation as a first approximation is a theorem

59 of the form @{text "f t\<^sub>1 t\<^sub>2 \<dots> t\<^sub>n \<equiv> t"} (an equation headed by a

60 constant @{text f} with arguments @{text "t\<^sub>1 t\<^sub>2 \<dots> t\<^sub>n"} and right

61 hand side @{text t}).

63 \begin{itemize}

65 \item Starting point of code generation is a collection of (raw)

66 code equations in a theory. It is not relevant where they stem

67 from, but typically they were either produced by specification

68 tools or proved explicitly by the user.

70 \item These raw code equations can be subjected to theorem

71 transformations. This \qn{preprocessor} (see

72 \secref{sec:preproc}) can apply the full expressiveness of

73 ML-based theorem transformations to code generation. The result

74 of preprocessing is a structured collection of code equations.

76 \item These code equations are \qn{translated} to a program in an

77 abstract intermediate language. Think of it as a kind of

78 \qt{Mini-Haskell} with four \qn{statements}: @{text data} (for

79 datatypes), @{text fun} (stemming from code equations), also

80 @{text class} and @{text inst} (for type classes).

82 \item Finally, the abstract program is \qn{serialised} into

83 concrete source code of a target language. This step only

84 produces concrete syntax but does not change the program in

85 essence; all conceptual transformations occur in the translation

86 step.

88 \end{itemize}

90 \noindent From these steps, only the last two are carried out

91 outside the logic; by keeping this layer as thin as possible, the

92 amount of code to trust is kept to a minimum.

93 \<close>

96 subsection \<open>The pre- and postprocessor \label{sec:preproc}\<close>

98 text \<open>

99 Before selected function theorems are turned into abstract code, a

100 chain of definitional transformation steps is carried out:

101 \emph{preprocessing}. The preprocessor consists of two

102 components: a \emph{simpset} and \emph{function transformers}.

104 The preprocessor simpset has a disparate brother, the

105 \emph{postprocessor simpset}.

106 In the theory-to-code scenario depicted in the picture

107 above, it plays no role. But if generated code is used

108 to evaluate expressions (cf.~\secref{sec:evaluation}), the

109 postprocessor simpset is applied to the resulting expression before this

110 is turned back.

112 The pre- and postprocessor \emph{simpsets} can apply the

113 full generality of the Isabelle

114 simplifier. Due to the interpretation of theorems as code

115 equations, rewrites are applied to the right hand side and the

116 arguments of the left hand side of an equation, but never to the

117 constant heading the left hand side.

119 Pre- and postprocessor can be setup to transfer between

120 expressions suitable for logical reasoning and expressions

121 suitable for execution. As example, take list membership; logically

122 it is expressed as @{term "x \<in> set xs"}. But for execution

123 the intermediate set is not desirable. Hence the following

124 specification:

125 \<close>

127 definition %quote member :: "'a list \<Rightarrow> 'a \<Rightarrow> bool"

128 where

129 [code_abbrev]: "member xs x \<longleftrightarrow> x \<in> set xs"

131 text \<open>

132 \noindent The \emph{@{attribute code_abbrev} attribute} declares

133 its theorem a rewrite rule for the postprocessor and the symmetric

134 of its theorem as rewrite rule for the preprocessor. Together,

135 this has the effect that expressions @{thm (rhs) member_def [no_vars]}

136 are replaced by @{thm (lhs) member_def [no_vars]} in generated code, but

137 are turned back into @{thm (rhs) member_def [no_vars]} if generated

138 code is used for evaluation.

140 Rewrite rules for pre- or postprocessor may be

141 declared independently using \emph{@{attribute code_unfold}}

142 or \emph{@{attribute code_post}} respectively.

144 \emph{Function transformers} provide a very general

145 interface, transforming a list of function theorems to another list

146 of function theorems, provided that neither the heading constant nor

147 its type change. The @{term "0::nat"} / @{const Suc} pattern

148 used in theory @{text Code_Abstract_Nat} (see \secref{abstract_nat})

149 uses this interface.

151 \noindent The current setup of the pre- and postprocessor may be inspected

152 using the @{command_def print_codeproc} command. @{command_def

153 code_thms} (see \secref{sec:equations}) provides a convenient

154 mechanism to inspect the impact of a preprocessor setup on code

155 equations.

156 \<close>

159 subsection \<open>Understanding code equations \label{sec:equations}\<close>

161 text \<open>

162 As told in \secref{sec:principle}, the notion of code equations is

163 vital to code generation. Indeed most problems which occur in

164 practice can be resolved by an inspection of the underlying code

165 equations.

167 It is possible to exchange the default code equations for constants

168 by explicitly proving alternative ones:

169 \<close>

171 lemma %quote [code]:

172 "dequeue (AQueue xs []) =

173 (if xs = [] then (None, AQueue [] [])

174 else dequeue (AQueue [] (rev xs)))"

175 "dequeue (AQueue xs (y # ys)) =

176 (Some y, AQueue xs ys)"

177 by (cases xs, simp_all) (cases "rev xs", simp_all)

179 text \<open>

180 \noindent The annotation @{text "[code]"} is an @{text attribute}

181 which states that the given theorems should be considered as code

182 equations for a @{text fun} statement -- the corresponding constant

183 is determined syntactically. The resulting code:

184 \<close>

186 text %quotetypewriter \<open>

187 @{code_stmts dequeue (consts) dequeue (Haskell)}

188 \<close>

190 text \<open>

191 \noindent You may note that the equality test @{term "xs = []"} has

192 been replaced by the predicate @{term "List.null xs"}. This is due

193 to the default setup of the \qn{preprocessor}.

195 This possibility to select arbitrary code equations is the key

196 technique for program and datatype refinement (see

197 \secref{sec:refinement}).

199 Due to the preprocessor, there is the distinction of raw code

200 equations (before preprocessing) and code equations (after

201 preprocessing).

203 The first can be listed (among other data) using the @{command_def

204 print_codesetup} command.

206 The code equations after preprocessing are already are blueprint of

207 the generated program and can be inspected using the @{command

208 code_thms} command:

209 \<close>

211 code_thms %quote dequeue

213 text \<open>

214 \noindent This prints a table with the code equations for @{const

215 dequeue}, including \emph{all} code equations those equations depend

216 on recursively. These dependencies themselves can be visualized using

217 the @{command_def code_deps} command.

218 \<close>

221 subsection \<open>Equality\<close>

223 text \<open>

224 Implementation of equality deserves some attention. Here an example

225 function involving polymorphic equality:

226 \<close>

228 primrec %quote collect_duplicates :: "'a list \<Rightarrow> 'a list \<Rightarrow> 'a list \<Rightarrow> 'a list" where

229 "collect_duplicates xs ys [] = xs"

230 | "collect_duplicates xs ys (z#zs) = (if z \<in> set xs

231 then if z \<in> set ys

232 then collect_duplicates xs ys zs

233 else collect_duplicates xs (z#ys) zs

234 else collect_duplicates (z#xs) (z#ys) zs)"

236 text \<open>

237 \noindent During preprocessing, the membership test is rewritten,

238 resulting in @{const List.member}, which itself performs an explicit

239 equality check, as can be seen in the corresponding @{text SML} code:

240 \<close>

242 text %quotetypewriter \<open>

243 @{code_stmts collect_duplicates (SML)}

244 \<close>

246 text \<open>

247 \noindent Obviously, polymorphic equality is implemented the Haskell

248 way using a type class. How is this achieved? HOL introduces an

249 explicit class @{class equal} with a corresponding operation @{const

250 HOL.equal} such that @{thm equal [no_vars]}. The preprocessing

251 framework does the rest by propagating the @{class equal} constraints

252 through all dependent code equations. For datatypes, instances of

253 @{class equal} are implicitly derived when possible. For other types,

254 you may instantiate @{text equal} manually like any other type class.

255 \<close>

258 subsection \<open>Explicit partiality \label{sec:partiality}\<close>

260 text \<open>

261 Partiality usually enters the game by partial patterns, as

262 in the following example, again for amortised queues:

263 \<close>

265 definition %quote strict_dequeue :: "'a queue \<Rightarrow> 'a \<times> 'a queue" where

266 "strict_dequeue q = (case dequeue q

267 of (Some x, q') \<Rightarrow> (x, q'))"

269 lemma %quote strict_dequeue_AQueue [code]:

270 "strict_dequeue (AQueue xs (y # ys)) = (y, AQueue xs ys)"

271 "strict_dequeue (AQueue xs []) =

272 (case rev xs of y # ys \<Rightarrow> (y, AQueue [] ys))"

273 by (simp_all add: strict_dequeue_def) (cases xs, simp_all split: list.split)

275 text \<open>

276 \noindent In the corresponding code, there is no equation

277 for the pattern @{term "AQueue [] []"}:

278 \<close>

280 text %quotetypewriter \<open>

281 @{code_stmts strict_dequeue (consts) strict_dequeue (Haskell)}

282 \<close>

284 text \<open>

285 \noindent In some cases it is desirable to have this

286 pseudo-\qt{partiality} more explicitly, e.g.~as follows:

287 \<close>

289 axiomatization %quote empty_queue :: 'a

291 definition %quote strict_dequeue' :: "'a queue \<Rightarrow> 'a \<times> 'a queue" where

292 "strict_dequeue' q = (case dequeue q of (Some x, q') \<Rightarrow> (x, q') | _ \<Rightarrow> empty_queue)"

294 lemma %quote strict_dequeue'_AQueue [code]:

295 "strict_dequeue' (AQueue xs []) = (if xs = [] then empty_queue

296 else strict_dequeue' (AQueue [] (rev xs)))"

297 "strict_dequeue' (AQueue xs (y # ys)) =

298 (y, AQueue xs ys)"

299 by (simp_all add: strict_dequeue'_def split: list.splits)

301 text \<open>

302 Observe that on the right hand side of the definition of @{const

303 "strict_dequeue'"}, the unspecified constant @{const empty_queue} occurs.

305 Normally, if constants without any code equations occur in a

306 program, the code generator complains (since in most cases this is

307 indeed an error). But such constants can also be thought

308 of as function definitions which always fail,

309 since there is never a successful pattern match on the left hand

310 side. In order to categorise a constant into that category

311 explicitly, use the @{attribute code} attribute with

312 @{text abort}:

313 \<close>

315 declare %quote [[code abort: empty_queue]]

317 text \<open>

318 \noindent Then the code generator will just insert an error or

319 exception at the appropriate position:

320 \<close>

322 text %quotetypewriter \<open>

323 @{code_stmts strict_dequeue' (consts) empty_queue strict_dequeue' (Haskell)}

324 \<close>

326 text \<open>

327 \noindent This feature however is rarely needed in practice. Note

328 also that the HOL default setup already declares @{const undefined},

329 which is most likely to be used in such situations, as

330 @{text "code abort"}.

331 \<close>

334 subsection \<open>If something goes utterly wrong \label{sec:utterly_wrong}\<close>

336 text \<open>

337 Under certain circumstances, the code generator fails to produce

338 code entirely. To debug these, the following hints may prove

339 helpful:

341 \begin{description}

343 \ditem{\emph{Check with a different target language}.} Sometimes

344 the situation gets more clear if you switch to another target

345 language; the code generated there might give some hints what

346 prevents the code generator to produce code for the desired

347 language.

349 \ditem{\emph{Inspect code equations}.} Code equations are the central

350 carrier of code generation. Most problems occurring while generating

351 code can be traced to single equations which are printed as part of

352 the error message. A closer inspection of those may offer the key

353 for solving issues (cf.~\secref{sec:equations}).

355 \ditem{\emph{Inspect preprocessor setup}.} The preprocessor might

356 transform code equations unexpectedly; to understand an

357 inspection of its setup is necessary (cf.~\secref{sec:preproc}).

359 \ditem{\emph{Generate exceptions}.} If the code generator

360 complains about missing code equations, in can be helpful to

361 implement the offending constants as exceptions

362 (cf.~\secref{sec:partiality}); this allows at least for a formal

363 generation of code, whose inspection may then give clues what is

364 wrong.

366 \ditem{\emph{Remove offending code equations}.} If code

367 generation is prevented by just a single equation, this can be

368 removed (cf.~\secref{sec:equations}) to allow formal code

369 generation, whose result in turn can be used to trace the

370 problem. The most prominent case here are mismatches in type

371 class signatures (\qt{wellsortedness error}).

373 \end{description}

374 \<close>

376 end