author  wenzelm 
Fri, 05 Dec 1997 18:45:19 +0100  
changeset 4375  ef2a7b589004 
parent 3485  f27a30a18a17 
child 4597  a0bdee64194c 
permissions  rwrr 
323  1 
%% $Id$ 
2 
\chapter{Syntax Transformations} \label{chap:syntax} 

3 
\newcommand\ttapp{\mathrel{\hbox{\tt\$}}} 

4 
\newcommand\mtt[1]{\mbox{\tt #1}} 

5 
\newcommand\ttfct[1]{\mathop{\mtt{#1}}\nolimits} 

6 
\newcommand\Constant{\ttfct{Constant}} 

7 
\newcommand\Variable{\ttfct{Variable}} 

8 
\newcommand\Appl[1]{\ttfct{Appl}\,[#1]} 

9 
\index{syntax!transformations(} 

10 

11 
This chapter is intended for experienced Isabelle users who need to define 

12 
macros or code their own translation functions. It describes the 

13 
transformations between parse trees, abstract syntax trees and terms. 

14 

15 

16 
\section{Abstract syntax trees} \label{sec:asts} 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

17 
\index{ASTs(} 
323  18 

19 
The parser, given a token list from the lexer, applies productions to yield 

20 
a parse tree\index{parse trees}. By applying some internal transformations 

21 
the parse tree becomes an abstract syntax tree, or \AST{}. Macro 

22 
expansion, further translations and finally type inference yields a 

23 
welltyped term. The printing process is the reverse, except for some 

24 
subtleties to be discussed later. 

25 

26 
Figure~\ref{fig:parse_print} outlines the parsing and printing process. 

27 
Much of the complexity is due to the macro mechanism. Using macros, you 

28 
can specify most forms of concrete syntax without writing any \ML{} code. 

29 

30 
\begin{figure} 

31 
\begin{center} 

32 
\begin{tabular}{cl} 

33 
string & \\ 

3108  34 
$\downarrow$ & lexer, parser \\ 
323  35 
parse tree & \\ 
36 
$\downarrow$ & parse \AST{} translation \\ 

37 
\AST{} & \\ 

38 
$\downarrow$ & \AST{} rewriting (macros) \\ 

39 
\AST{} & \\ 

40 
$\downarrow$ & parse translation, type inference \\ 

41 
 welltyped term  & \\ 

42 
$\downarrow$ & print translation \\ 

43 
\AST{} & \\ 

44 
$\downarrow$ & \AST{} rewriting (macros) \\ 

45 
\AST{} & \\ 

3108  46 
$\downarrow$ & print \AST{} translation, token translation \\ 
323  47 
string & 
48 
\end{tabular} 

49 

50 
\end{center} 

51 
\caption{Parsing and printing}\label{fig:parse_print} 

52 
\end{figure} 

53 

54 
Abstract syntax trees are an intermediate form between the raw parse trees 

55 
and the typed $\lambda$terms. An \AST{} is either an atom (constant or 

56 
variable) or a list of {\em at least two\/} subtrees. Internally, they 

57 
have type \mltydx{Syntax.ast}: \index{*Constant} \index{*Variable} 

58 
\index{*Appl} 

59 
\begin{ttbox} 

60 
datatype ast = Constant of string 

61 
 Variable of string 

62 
 Appl of ast list 

63 
\end{ttbox} 

64 
% 

65 
Isabelle uses an Sexpression syntax for abstract syntax trees. Constant 

66 
atoms are shown as quoted strings, variable atoms as nonquoted strings and 

332  67 
applications as a parenthesised list of subtrees. For example, the \AST 
323  68 
\begin{ttbox} 
69 
Appl [Constant "_constrain", 

70 
Appl [Constant "_abs", Variable "x", Variable "t"], 

71 
Appl [Constant "fun", Variable "'a", Variable "'b"]] 

72 
\end{ttbox} 

73 
is shown as {\tt ("_constrain" ("_abs" x t) ("fun" 'a 'b))}. 

74 
Both {\tt ()} and {\tt (f)} are illegal because they have too few 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

75 
subtrees. 
323  76 

77 
The resemblance to Lisp's Sexpressions is intentional, but there are two 

78 
kinds of atomic symbols: $\Constant x$ and $\Variable x$. Do not take the 

79 
names {\tt Constant} and {\tt Variable} too literally; in the later 

80 
translation to terms, $\Variable x$ may become a constant, free or bound 

81 
variable, even a type constructor or class name; the actual outcome depends 

82 
on the context. 

83 

84 
Similarly, you can think of ${\tt (} f~x@1~\ldots~x@n{\tt )}$ as the 

85 
application of~$f$ to the arguments $x@1, \ldots, x@n$. But the kind of 

86 
application is determined later by context; it could be a type constructor 

87 
applied to types. 

88 

89 
Forms like {\tt (("_abs" x $t$) $u$)} are legal, but \AST{}s are 

90 
firstorder: the {\tt "_abs"} does not bind the {\tt x} in any way. Later 

91 
at the term level, {\tt ("_abs" x $t$)} will become an {\tt Abs} node and 

92 
occurrences of {\tt x} in $t$ will be replaced by bound variables (the term 

93 
constructor \ttindex{Bound}). 

94 

95 

96 
\section{Transforming parse trees to \AST{}s}\label{sec:astofpt} 

97 
\index{ASTs!made from parse trees} 

98 
\newcommand\astofpt[1]{\lbrakk#1\rbrakk} 

99 

100 
The parse tree is the raw output of the parser. Translation functions, 

101 
called {\bf parse AST translations}\indexbold{translations!parse AST}, 

102 
transform the parse tree into an abstract syntax tree. 

103 

104 
The parse tree is constructed by nesting the righthand sides of the 

105 
productions used to recognize the input. Such parse trees are simply lists 

106 
of tokens and constituent parse trees, the latter representing the 

107 
nonterminals of the productions. Let us refer to the actual productions in 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

108 
the form displayed by {\tt print_syntax} (see \S\ref{sec:inspctthy} for an 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

109 
example). 
323  110 

111 
Ignoring parse \AST{} translations, parse trees are transformed to \AST{}s 

112 
by stripping out delimiters and copy productions. More precisely, the 

113 
mapping $\astofpt{}$ is derived from the productions as follows: 

114 
\begin{itemize} 

115 
\item Name tokens: $\astofpt{t} = \Variable s$, where $t$ is an \ndx{id}, 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

116 
\ndx{var}, \ndx{tid}, \ndx{tvar}, \ndx{xnum} or \ndx{xstr} token, and $s$ 
3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3135
diff
changeset

117 
its associated string. Note that for {\tt xstr} this does not include the 
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

118 
quotes. 
323  119 

120 
\item Copy productions:\index{productions!copy} 

121 
$\astofpt{\ldots P \ldots} = \astofpt{P}$. Here $\ldots$ stands for 

122 
strings of delimiters, which are discarded. $P$ stands for the single 

123 
constituent that is not a delimiter; it is either a nonterminal symbol or 

124 
a name token. 

125 

126 
\item 0ary productions: $\astofpt{\ldots \mtt{=>} c} = \Constant c$. 

127 
Here there are no constituents other than delimiters, which are 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

128 
discarded. 
323  129 

130 
\item $n$ary productions, where $n \ge 1$: delimiters are discarded and 

131 
the remaining constituents $P@1$, \ldots, $P@n$ are built into an 

132 
application whose head constant is~$c$: 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

133 
\[ \astofpt{\ldots P@1 \ldots P@n \ldots \mtt{=>} c} = 
323  134 
\Appl{\Constant c, \astofpt{P@1}, \ldots, \astofpt{P@n}} 
135 
\] 

136 
\end{itemize} 

137 
Figure~\ref{fig:parse_ast} presents some simple examples, where {\tt ==}, 

138 
{\tt _appl}, {\tt _args}, and so forth name productions of the Pure syntax. 

139 
These examples illustrate the need for further translations to make \AST{}s 

140 
closer to the typed $\lambda$calculus. The Pure syntax provides 

141 
predefined parse \AST{} translations\index{translations!parse AST} for 

142 
ordinary applications, type applications, nested abstractions, meta 

143 
implications and function types. Figure~\ref{fig:parse_ast_tr} shows their 

144 
effect on some representative input strings. 

145 

146 

147 
\begin{figure} 

148 
\begin{center} 

149 
\tt\begin{tabular}{ll} 

150 
\rm input string & \rm \AST \\\hline 

151 
"f" & f \\ 

152 
"'a" & 'a \\ 

153 
"t == u" & ("==" t u) \\ 

154 
"f(x)" & ("_appl" f x) \\ 

155 
"f(x, y)" & ("_appl" f ("_args" x y)) \\ 

156 
"f(x, y, z)" & ("_appl" f ("_args" x ("_args" y z))) \\ 

157 
"\%x y.\ t" & ("_lambda" ("_idts" x y) t) \\ 

158 
\end{tabular} 

159 
\end{center} 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

160 
\caption{Parsing examples using the Pure syntax}\label{fig:parse_ast} 
323  161 
\end{figure} 
162 

163 
\begin{figure} 

164 
\begin{center} 

165 
\tt\begin{tabular}{ll} 

166 
\rm input string & \rm \AST{} \\\hline 

167 
"f(x, y, z)" & (f x y z) \\ 

168 
"'a ty" & (ty 'a) \\ 

169 
"('a, 'b) ty" & (ty 'a 'b) \\ 

170 
"\%x y z.\ t" & ("_abs" x ("_abs" y ("_abs" z t))) \\ 

171 
"\%x ::\ 'a.\ t" & ("_abs" ("_constrain" x 'a) t) \\ 

172 
"[ P; Q; R ] => S" & ("==>" P ("==>" Q ("==>" R S))) \\ 

173 
"['a, 'b, 'c] => 'd" & ("fun" 'a ("fun" 'b ("fun" 'c 'd))) 

174 
\end{tabular} 

175 
\end{center} 

176 
\caption{Builtin parse \AST{} translations}\label{fig:parse_ast_tr} 

177 
\end{figure} 

178 

179 
The names of constant heads in the \AST{} control the translation process. 

180 
The list of constants invoking parse \AST{} translations appears in the 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

181 
output of {\tt print_syntax} under {\tt parse_ast_translation}. 
323  182 

183 

184 
\section{Transforming \AST{}s to terms}\label{sec:termofast} 

185 
\index{terms!made from ASTs} 

186 
\newcommand\termofast[1]{\lbrakk#1\rbrakk} 

187 

188 
The \AST{}, after application of macros (see \S\ref{sec:macros}), is 

189 
transformed into a term. This term is probably illtyped since type 

190 
inference has not occurred yet. The term may contain type constraints 

191 
consisting of applications with head {\tt "_constrain"}; the second 

192 
argument is a type encoded as a term. Type inference later introduces 

193 
correct types or rejects the input. 

194 

195 
Another set of translation functions, namely parse 

196 
translations\index{translations!parse}, may affect this process. If we 

197 
ignore parse translations for the time being, then \AST{}s are transformed 

198 
to terms by mapping \AST{} constants to constants, \AST{} variables to 

199 
schematic or free variables and \AST{} applications to applications. 

200 

201 
More precisely, the mapping $\termofast{}$ is defined by 

202 
\begin{itemize} 

203 
\item Constants: $\termofast{\Constant x} = \ttfct{Const} (x, 

204 
\mtt{dummyT})$. 

205 

206 
\item Schematic variables: $\termofast{\Variable \mtt{"?}xi\mtt"} = 

207 
\ttfct{Var} ((x, i), \mtt{dummyT})$, where $x$ is the base name and $i$ 

208 
the index extracted from~$xi$. 

209 

210 
\item Free variables: $\termofast{\Variable x} = \ttfct{Free} (x, 

211 
\mtt{dummyT})$. 

212 

213 
\item Function applications with $n$ arguments: 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

214 
\[ \termofast{\Appl{f, x@1, \ldots, x@n}} = 
323  215 
\termofast{f} \ttapp 
216 
\termofast{x@1} \ttapp \ldots \ttapp \termofast{x@n} 

217 
\] 

218 
\end{itemize} 

219 
Here \ttindex{Const}, \ttindex{Var}, \ttindex{Free} and 

220 
\verb$\index{$@{\tt\$}} are constructors of the datatype \mltydx{term}, 

221 
while \ttindex{dummyT} stands for some dummy type that is ignored during 

222 
type inference. 

223 

224 
So far the outcome is still a firstorder term. Abstractions and bound 

225 
variables (constructors \ttindex{Abs} and \ttindex{Bound}) are introduced 

226 
by parse translations. Such translations are attached to {\tt "_abs"}, 

227 
{\tt "!!"} and userdefined binders. 

228 

229 

230 
\section{Printing of terms} 

231 
\newcommand\astofterm[1]{\lbrakk#1\rbrakk}\index{ASTs!made from terms} 

232 

233 
The output phase is essentially the inverse of the input phase. Terms are 

234 
translated via abstract syntax trees into strings. Finally the strings are 

235 
pretty printed. 

236 

237 
Print translations (\S\ref{sec:tr_funs}) may affect the transformation of 

238 
terms into \AST{}s. Ignoring those, the transformation maps 

239 
term constants, variables and applications to the corresponding constructs 

240 
on \AST{}s. Abstractions are mapped to applications of the special 

241 
constant {\tt _abs}. 

242 

243 
More precisely, the mapping $\astofterm{}$ is defined as follows: 

244 
\begin{itemize} 

245 
\item $\astofterm{\ttfct{Const} (x, \tau)} = \Constant x$. 

246 

247 
\item $\astofterm{\ttfct{Free} (x, \tau)} = constrain (\Variable x, 

248 
\tau)$. 

249 

250 
\item $\astofterm{\ttfct{Var} ((x, i), \tau)} = constrain (\Variable 

251 
\mtt{"?}xi\mtt", \tau)$, where $\mtt?xi$ is the string representation of 

252 
the {\tt indexname} $(x, i)$. 

253 

254 
\item For the abstraction $\lambda x::\tau.t$, let $x'$ be a variant 

255 
of~$x$ renamed to differ from all names occurring in~$t$, and let $t'$ 

256 
be obtained from~$t$ by replacing all bound occurrences of~$x$ by 

257 
the free variable $x'$. This replaces corresponding occurrences of the 

258 
constructor \ttindex{Bound} by the term $\ttfct{Free} (x', 

259 
\mtt{dummyT})$: 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

260 
\[ \astofterm{\ttfct{Abs} (x, \tau, t)} = 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

261 
\Appl{\Constant \mtt{"_abs"}, 
323  262 
constrain(\Variable x', \tau), \astofterm{t'}}. 
263 
\] 

264 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

265 
\item $\astofterm{\ttfct{Bound} i} = \Variable \mtt{"B.}i\mtt"$. 
323  266 
The occurrence of constructor \ttindex{Bound} should never happen 
267 
when printing welltyped terms; it indicates a de Bruijn index with no 

268 
matching abstraction. 

269 

270 
\item Where $f$ is not an application, 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

271 
\[ \astofterm{f \ttapp x@1 \ttapp \ldots \ttapp x@n} = 
323  272 
\Appl{\astofterm{f}, \astofterm{x@1}, \ldots,\astofterm{x@n}} 
273 
\] 

274 
\end{itemize} 

275 
% 

332  276 
Type constraints\index{type constraints} are inserted to allow the printing 
277 
of types. This is governed by the boolean variable \ttindex{show_types}: 

323  278 
\begin{itemize} 
279 
\item $constrain(x, \tau) = x$ \ if $\tau = \mtt{dummyT}$ \index{*dummyT} or 

332  280 
\ttindex{show_types} is set to {\tt false}. 
323  281 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

282 
\item $constrain(x, \tau) = \Appl{\Constant \mtt{"_constrain"}, x, 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

283 
\astofterm{\tau}}$ \ otherwise. 
323  284 

285 
Here, $\astofterm{\tau}$ is the \AST{} encoding of $\tau$: type 

286 
constructors go to {\tt Constant}s; type identifiers go to {\tt 

287 
Variable}s; type applications go to {\tt Appl}s with the type 

288 
constructor as the first element. If \ttindex{show_sorts} is set to 

289 
{\tt true}, some type variables are decorated with an \AST{} encoding 

290 
of their sort. 

291 
\end{itemize} 

292 
% 

293 
The \AST{}, after application of macros (see \S\ref{sec:macros}), is 

294 
transformed into the final output string. The builtin {\bf print AST 

332  295 
translations}\indexbold{translations!print AST} reverse the 
323  296 
parse \AST{} translations of Fig.\ts\ref{fig:parse_ast_tr}. 
297 

298 
For the actual printing process, the names attached to productions 

299 
of the form $\ldots A^{(p@1)}@1 \ldots A^{(p@n)}@n \ldots \mtt{=>} c$ play 

300 
a vital role. Each \AST{} with constant head $c$, namely $\mtt"c\mtt"$ or 

301 
$(\mtt"c\mtt"~ x@1 \ldots x@n)$, is printed according to the production 

302 
for~$c$. Each argument~$x@i$ is converted to a string, and put in 

303 
parentheses if its priority~$(p@i)$ requires this. The resulting strings 

304 
and their syntactic sugar (denoted by \dots{} above) are joined to make a 

305 
single string. 

306 

3108  307 
If an application $(\mtt"c\mtt"~ x@1 \ldots x@m)$ has more arguments 
308 
than the corresponding production, it is first split into 

309 
$((\mtt"c\mtt"~ x@1 \ldots x@n) ~ x@{n+1} \ldots x@m)$. Applications 

310 
with too few arguments or with nonconstant head or without a 

311 
corresponding production are printed as $f(x@1, \ldots, x@l)$ or 

312 
$(\alpha@1, \ldots, \alpha@l) ty$. Multiple productions associated 

313 
with some name $c$ are tried in order of appearance. An occurrence of 

323  314 
$\Variable x$ is simply printed as~$x$. 
315 

316 
Blanks are {\em not\/} inserted automatically. If blanks are required to 

317 
separate tokens, specify them in the mixfix declaration, possibly preceded 

318 
by a slash~({\tt/}) to allow a line break. 

319 
\index{ASTs)} 

320 

321 

322 

3108  323 
\section{Macros: syntactic rewriting} \label{sec:macros} 
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

324 
\index{macros(}\index{rewriting!syntactic(} 
323  325 

326 
Mixfix declarations alone can handle situations where there is a direct 

327 
connection between the concrete syntax and the underlying term. Sometimes 

328 
we require a more elaborate concrete syntax, such as quantifiers and list 

329 
notation. Isabelle's {\bf macros} and {\bf translation functions} can 

330 
perform translations such as 

331 
\begin{center}\tt 

332 
\begin{tabular}{r@{$\quad\protect\rightleftharpoons\quad$}l} 

333 
ALL x:A.P & Ball(A, \%x.P) \\ \relax 

334 
[x, y, z] & Cons(x, Cons(y, Cons(z, Nil))) 

335 
\end{tabular} 

336 
\end{center} 

337 
Translation functions (see \S\ref{sec:tr_funs}) must be coded in ML; they 

338 
are the most powerful translation mechanism but are difficult to read or 

339 
write. Macros are specified by firstorder rewriting systems that operate 

340 
on abstract syntax trees. They are usually easy to read and write, and can 

341 
express all but the most obscure translations. 

342 

3108  343 
Figure~\ref{fig:set_trans} defines a fragment of firstorder logic and 
344 
set theory.\footnote{This and the following theories are complete 

345 
working examples, though they specify only syntax, no axioms. The 

346 
file {\tt ZF/ZF.thy} presents a full set theory definition, 

347 
including many macro rules.} Theory {\tt SetSyntax} declares 

348 
constants for set comprehension ({\tt Collect}), replacement ({\tt 

349 
Replace}) and bounded universal quantification ({\tt Ball}). Each 

350 
of these binds some variables. Without additional syntax we should 

351 
have to write $\forall x \in A. P$ as {\tt Ball(A,\%x.P)}, and 

352 
similarly for the others. 

323  353 

354 
\begin{figure} 

355 
\begin{ttbox} 

3108  356 
SetSyntax = Pure + 
323  357 
types 
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

358 
i o 
323  359 
arities 
360 
i, o :: logic 

361 
consts 

1387  362 
Trueprop :: o => prop ("_" 5) 
363 
Collect :: [i, i => o] => i 

364 
Replace :: [i, [i, i] => o] => i 

365 
Ball :: [i, i => o] => o 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

366 
syntax 
1387  367 
"{\at}Collect" :: [idt, i, o] => i ("(1{\ttlbrace}_:_./ _{\ttrbrace})") 
368 
"{\at}Replace" :: [idt, idt, i, o] => i ("(1{\ttlbrace}_./ _:_, _{\ttrbrace})") 

369 
"{\at}Ball" :: [idt, i, o] => o ("(3ALL _:_./ _)" 10) 

323  370 
translations 
371 
"{\ttlbrace}x:A. P{\ttrbrace}" == "Collect(A, \%x. P)" 

372 
"{\ttlbrace}y. x:A, Q{\ttrbrace}" == "Replace(A, \%x y. Q)" 

373 
"ALL x:A. P" == "Ball(A, \%x. P)" 

374 
end 

375 
\end{ttbox} 

376 
\caption{Macro example: set theory}\label{fig:set_trans} 

377 
\end{figure} 

378 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

379 
The theory specifies a variablebinding syntax through additional productions 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

380 
that have mixfix declarations. Each noncopy production must specify some 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

381 
constant, which is used for building \AST{}s. \index{constants!syntactic} The 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

382 
additional constants are decorated with {\tt\at} to stress their purely 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

383 
syntactic purpose; they may not occur within the final welltyped terms, 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

384 
being declared as {\tt syntax} rather than {\tt consts}. 
323  385 

386 
The translations cause the replacement of external forms by internal forms 

387 
after parsing, and vice versa before printing of terms. As a specification 

388 
of the set theory notation, they should be largely selfexplanatory. The 

389 
syntactic constants, {\tt\at Collect}, {\tt\at Replace} and {\tt\at Ball}, 

390 
appear implicitly in the macro rules via their mixfix forms. 

391 

392 
Macros can define variablebinding syntax because they operate on \AST{}s, 

393 
which have no inbuilt notion of bound variable. The macro variables {\tt 

394 
x} and~{\tt y} have type~{\tt idt} and therefore range over identifiers, 

395 
in this case bound variables. The macro variables {\tt P} and~{\tt Q} 

396 
range over formulae containing bound variable occurrences. 

397 

398 
Other applications of the macro system can be less straightforward, and 

399 
there are peculiarities. The rest of this section will describe in detail 

400 
how Isabelle macros are preprocessed and applied. 

401 

402 

403 
\subsection{Specifying macros} 

404 
Macros are basically rewrite rules on \AST{}s. But unlike other macro 

405 
systems found in programming languages, Isabelle's macros work in both 

406 
directions. Therefore a syntax contains two lists of rewrites: one for 

407 
parsing and one for printing. 

408 

409 
\index{*translations section} 

410 
The {\tt translations} section specifies macros. The syntax for a macro is 

411 
\[ (root)\; string \quad 

412 
\left\{\begin{array}[c]{c} \mtt{=>} \\ \mtt{<=} \\ \mtt{==} \end{array} 

413 
\right\} \quad 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

414 
(root)\; string 
323  415 
\] 
416 
% 

417 
This specifies a parse rule ({\tt =>}), a print rule ({\tt <=}), or both 

418 
({\tt ==}). The two strings specify the left and righthand sides of the 

419 
macro rule. The $(root)$ specification is optional; it specifies the 

420 
nonterminal for parsing the $string$ and if omitted defaults to {\tt 

421 
logic}. \AST{} rewrite rules $(l, r)$ must obey certain conditions: 

422 
\begin{itemize} 

423 
\item Rules must be left linear: $l$ must not contain repeated variables. 

424 

425 
\item Rules must have constant heads, namely $l = \mtt"c\mtt"$ or $l = 

426 
(\mtt"c\mtt" ~ x@1 \ldots x@n)$. 

427 

428 
\item Every variable in~$r$ must also occur in~$l$. 

429 
\end{itemize} 

430 

3108  431 
Macro rules may refer to any syntax from the parent theories. They 
432 
may also refer to anything defined before the current {\tt 

323  433 
translations} section  including any mixfix declarations. 
434 

435 
Upon declaration, both sides of the macro rule undergo parsing and parse 

436 
\AST{} translations (see \S\ref{sec:asts}), but do not themselves undergo 

437 
macro expansion. The lexer runs in a different mode that additionally 

438 
accepts identifiers of the form $\_~letter~quasiletter^*$ (like {\tt _idt}, 

439 
{\tt _K}). Thus, a constant whose name starts with an underscore can 

440 
appear in macro rules but not in ordinary terms. 

441 

442 
Some atoms of the macro rule's \AST{} are designated as constants for 

443 
matching. These are all names that have been declared as classes, types or 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

444 
constants (logical and syntactic). 
323  445 

3108  446 
The result of this preprocessing is two lists of macro rules, each 
447 
stored as a pair of \AST{}s. They can be viewed using {\tt 

448 
print_syntax} (sections \ttindex{parse_rules} and 

449 
\ttindex{print_rules}). For theory~{\tt SetSyntax} of 

450 
Fig.~\ref{fig:set_trans} these are 

323  451 
\begin{ttbox} 
452 
parse_rules: 

453 
("{\at}Collect" x A P) > ("Collect" A ("_abs" x P)) 

454 
("{\at}Replace" y x A Q) > ("Replace" A ("_abs" x ("_abs" y Q))) 

455 
("{\at}Ball" x A P) > ("Ball" A ("_abs" x P)) 

456 
print_rules: 

457 
("Collect" A ("_abs" x P)) > ("{\at}Collect" x A P) 

458 
("Replace" A ("_abs" x ("_abs" y Q))) > ("{\at}Replace" y x A Q) 

459 
("Ball" A ("_abs" x P)) > ("{\at}Ball" x A P) 

460 
\end{ttbox} 

461 

462 
\begin{warn} 

463 
Avoid choosing variable names that have previously been used as 

464 
constants, types or type classes; the {\tt consts} section in the output 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

465 
of {\tt print_syntax} lists all such names. If a macro rule works 
323  466 
incorrectly, inspect its internal form as shown above, recalling that 
467 
constants appear as quoted strings and variables without quotes. 

468 
\end{warn} 

469 

470 
\begin{warn} 

471 
If \ttindex{eta_contract} is set to {\tt true}, terms will be 

472 
$\eta$contracted {\em before\/} the \AST{} rewriter sees them. Thus some 

473 
abstraction nodes needed for print rules to match may vanish. For example, 

332  474 
\verbBall(A, %x. P(x)) contracts to {\tt Ball(A, P)}; the print rule does 
323  475 
not apply and the output will be {\tt Ball(A, P)}. This problem would not 
476 
occur if \ML{} translation functions were used instead of macros (as is 

477 
done for binder declarations). 

478 
\end{warn} 

479 

480 

481 
\begin{warn} 

482 
Another trap concerns type constraints. If \ttindex{show_types} is set to 

483 
{\tt true}, bound variables will be decorated by their meta types at the 

484 
binding place (but not at occurrences in the body). Matching with 

485 
\verbCollect(A, %x. P) binds {\tt x} to something like {\tt ("_constrain" y 

486 
"i")} rather than only {\tt y}. \AST{} rewriting will cause the constraint to 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

487 
appear in the external form, say \verb{y::i:A::i. P::o}. 
323  488 

489 
To allow such constraints to be reread, your syntax should specify bound 

490 
variables using the nonterminal~\ndx{idt}. This is the case in our 

3108  491 
example. Choosing {\tt id} instead of {\tt idt} is a common error. 
323  492 
\end{warn} 
493 

494 

495 

496 
\subsection{Applying rules} 

497 
As a term is being parsed or printed, an \AST{} is generated as an 

498 
intermediate form (recall Fig.\ts\ref{fig:parse_print}). The \AST{} is 

332  499 
normalised by applying macro rules in the manner of a traditional term 
323  500 
rewriting system. We first examine how a single rule is applied. 
501 

332  502 
Let $t$ be the abstract syntax tree to be normalised and $(l, r)$ some 
323  503 
translation rule. A subtree~$u$ of $t$ is a {\bf redex} if it is an 
504 
instance of~$l$; in this case $l$ is said to {\bf match}~$u$. A redex 

505 
matched by $l$ may be replaced by the corresponding instance of~$r$, thus 

506 
{\bf rewriting} the \AST~$t$. Matching requires some notion of {\bf 

507 
placeholders} that may occur in rule patterns but not in ordinary 

508 
\AST{}s; {\tt Variable} atoms serve this purpose. 

509 

510 
The matching of the object~$u$ by the pattern~$l$ is performed as follows: 

511 
\begin{itemize} 

512 
\item Every constant matches itself. 

513 

514 
\item $\Variable x$ in the object matches $\Constant x$ in the pattern. 

515 
This point is discussed further below. 

516 

517 
\item Every \AST{} in the object matches $\Variable x$ in the pattern, 

518 
binding~$x$ to~$u$. 

519 

520 
\item One application matches another if they have the same number of 

521 
subtrees and corresponding subtrees match. 

522 

523 
\item In every other case, matching fails. In particular, {\tt 

524 
Constant}~$x$ can only match itself. 

525 
\end{itemize} 

526 
A successful match yields a substitution that is applied to~$r$, generating 

527 
the instance that replaces~$u$. 

528 

529 
The second case above may look odd. This is where {\tt Variable}s of 

530 
nonrule \AST{}s behave like {\tt Constant}s. Recall that \AST{}s are not 

531 
far removed from parse trees; at this level it is not yet known which 

532 
identifiers will become constants, bounds, frees, types or classes. As 

533 
\S\ref{sec:asts} describes, former parse tree heads appear in \AST{}s as 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

534 
{\tt Constant}s, while the name tokens \ndx{id}, \ndx{var}, \ndx{tid}, 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

535 
\ndx{tvar}, \ndx{xnum} and \ndx{xstr} become {\tt Variable}s. On the other 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

536 
hand, when \AST{}s generated from terms for printing, all constants and type 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

537 
constructors become {\tt Constant}s; see \S\ref{sec:asts}. Thus \AST{}s may 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

538 
contain a messy mixture of {\tt Variable}s and {\tt Constant}s. This is 
323  539 
insignificant at macro level because matching treats them alike. 
540 

541 
Because of this behaviour, different kinds of atoms with the same name are 

542 
indistinguishable, which may make some rules prone to misbehaviour. Example: 

543 
\begin{ttbox} 

544 
types 

545 
Nil 

546 
consts 

1387  547 
Nil :: 'a list 
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

548 
syntax 
1387  549 
"[]" :: 'a list ("[]") 
323  550 
translations 
551 
"[]" == "Nil" 

552 
\end{ttbox} 

553 
The term {\tt Nil} will be printed as {\tt []}, just as expected. 

554 
The term \verb%Nil.t will be printed as \verb%[].t, which might not be 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

555 
expected! Guess how type~{\tt Nil} is printed? 
323  556 

3108  557 
Normalizing an \AST{} involves repeatedly applying macro rules until 
558 
none are applicable. Macro rules are chosen in order of appearance in 

559 
the theory definitions. You can watch the normalization of \AST{}s 

560 
during parsing and printing by setting \ttindex{Syntax.trace_norm_ast} 

561 
to {\tt true}.\index{tracing!of macros} Alternatively, use 

323  562 
\ttindex{Syntax.test_read}. The information displayed when tracing 
3108  563 
includes the \AST{} before normalization ({\tt pre}), redexes with 
564 
results ({\tt rewrote}), the normal form finally reached ({\tt post}) 

565 
and some statistics ({\tt normalize}). If tracing is off, 

566 
\ttindex{Syntax.stat_norm_ast} can be set to {\tt true} in order to 

567 
enable printing of the normal form and statistics only. 

323  568 

569 

570 
\subsection{Example: the syntax of finite sets} 

571 
\index{examples!of macros} 

572 

573 
This example demonstrates the use of recursive macros to implement a 

574 
convenient notation for finite sets. 

575 
\index{*empty constant}\index{*insert constant}\index{{}@\verb'{}' symbol} 

576 
\index{"@Enum@{\tt\at Enum} constant} 

577 
\index{"@Finset@{\tt\at Finset} constant} 

578 
\begin{ttbox} 

3108  579 
FinSyntax = SetSyntax + 
323  580 
types 
581 
is 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

582 
syntax 
1387  583 
"" :: i => is ("_") 
584 
"{\at}Enum" :: [i, is] => is ("_,/ _") 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

585 
consts 
1387  586 
empty :: i ("{\ttlbrace}{\ttrbrace}") 
587 
insert :: [i, i] => i 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

588 
syntax 
1387  589 
"{\at}Finset" :: is => i ("{\ttlbrace}(_){\ttrbrace}") 
323  590 
translations 
591 
"{\ttlbrace}x, xs{\ttrbrace}" == "insert(x, {\ttlbrace}xs{\ttrbrace})" 

592 
"{\ttlbrace}x{\ttrbrace}" == "insert(x, {\ttlbrace}{\ttrbrace})" 

593 
end 

594 
\end{ttbox} 

595 
Finite sets are internally built up by {\tt empty} and {\tt insert}. The 

596 
declarations above specify \verb{x, y, z} as the external representation 

597 
of 

598 
\begin{ttbox} 

599 
insert(x, insert(y, insert(z, empty))) 

600 
\end{ttbox} 

601 
The nonterminal symbol~\ndx{is} stands for one or more objects of type~{\tt 

602 
i} separated by commas. The mixfix declaration \hbox{\verb"_,/ _"} 

603 
allows a line break after the comma for \rmindex{pretty printing}; if no 

604 
line break is required then a space is printed instead. 

605 

606 
The nonterminal is declared as the type~{\tt is}, but with no {\tt arities} 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

607 
declaration. Hence {\tt is} is not a logical type and may be used safely as 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

608 
a new nonterminal for custom syntax. The nonterminal~{\tt is} can later be 
3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3135
diff
changeset

609 
reused for other enumerations of type~{\tt i} like lists or tuples. If we 
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

610 
had needed polymorphic enumerations, we could have used the predefined 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

611 
nonterminal symbol \ndx{args} and skipped this part altogether. 
323  612 

613 
\index{"@Finset@{\tt\at Finset} constant} 

614 
Next follows {\tt empty}, which is already equipped with its syntax 

615 
\verb{}, and {\tt insert} without concrete syntax. The syntactic 

616 
constant {\tt\at Finset} provides concrete syntax for enumerations of~{\tt 

617 
i} enclosed in curly braces. Remember that a pair of parentheses, as in 

618 
\verb"{(_)}", specifies a block of indentation for pretty printing. 

619 

620 
The translations may look strange at first. Macro rules are best 

621 
understood in their internal forms: 

622 
\begin{ttbox} 

623 
parse_rules: 

624 
("{\at}Finset" ("{\at}Enum" x xs)) > ("insert" x ("{\at}Finset" xs)) 

625 
("{\at}Finset" x) > ("insert" x "empty") 

626 
print_rules: 

627 
("insert" x ("{\at}Finset" xs)) > ("{\at}Finset" ("{\at}Enum" x xs)) 

628 
("insert" x "empty") > ("{\at}Finset" x) 

629 
\end{ttbox} 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

630 
This shows that \verb{x,xs} indeed matches any set enumeration of at least 
323  631 
two elements, binding the first to {\tt x} and the rest to {\tt xs}. 
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

632 
Likewise, \verb{xs} and \verb{x} represent any set enumeration. 
323  633 
The parse rules only work in the order given. 
634 

635 
\begin{warn} 

332  636 
The \AST{} rewriter cannot distinguish constants from variables and looks 
323  637 
only for names of atoms. Thus the names of {\tt Constant}s occurring in 
638 
the (internal) lefthand side of translation rules should be regarded as 

639 
\rmindex{reserved words}. Choose nonidentifiers like {\tt\at Finset} or 

640 
sufficiently long and strange names. If a bound variable's name gets 

641 
rewritten, the result will be incorrect; for example, the term 

642 
\begin{ttbox} 

643 
\%empty insert. insert(x, empty) 

644 
\end{ttbox} 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

645 
\par\noindent is incorrectly printed as \verb%empty insert. {x}. 
323  646 
\end{warn} 
647 

648 

649 
\subsection{Example: a parse macro for dependent types}\label{prod_trans} 

650 
\index{examples!of macros} 

651 

652 
As stated earlier, a macro rule may not introduce new {\tt Variable}s on 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

653 
the righthand side. Something like \verb"K(B)" => "%x.B" is illegal; 
323  654 
if allowed, it could cause variable capture. In such cases you usually 
655 
must fall back on translation functions. But a trick can make things 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

656 
readable in some cases: {\em calling\/} translation functions by parse 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

657 
macros: 
323  658 
\begin{ttbox} 
3135  659 
ProdSyntax = SetSyntax + 
323  660 
consts 
1387  661 
Pi :: [i, i => i] => i 
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

662 
syntax 
1387  663 
"{\at}PROD" :: [idt, i, i] => i ("(3PROD _:_./ _)" 10) 
664 
"{\at}>" :: [i, i] => i ("(_ >/ _)" [51, 50] 50) 

323  665 
\ttbreak 
666 
translations 

667 
"PROD x:A. B" => "Pi(A, \%x. B)" 

668 
"A > B" => "Pi(A, _K(B))" 

669 
end 

670 
ML 

671 
val print_translation = [("Pi", dependent_tr' ("{\at}PROD", "{\at}>"))]; 

672 
\end{ttbox} 

673 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

674 
Here {\tt Pi} is a logical constant for constructing general products. 
323  675 
Two external forms exist: the general case {\tt PROD x:A.B} and the 
676 
function space {\tt A > B}, which abbreviates \verbPi(A, %x.B) when {\tt B} 

677 
does not depend on~{\tt x}. 

678 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

679 
The second parse macro introduces {\tt _K(B)}, which later becomes 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

680 
\verb%x.B due to a parse translation associated with \cdx{_K}. 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

681 
Unfortunately there is no such trick for printing, so we have to add a {\tt 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

682 
ML} section for the print translation \ttindex{dependent_tr'}. 
323  683 

684 
Recall that identifiers with a leading {\tt _} are allowed in translation 

685 
rules, but not in ordinary terms. Thus we can create \AST{}s containing 

686 
names that are not directly expressible. 

687 

688 
The parse translation for {\tt _K} is already installed in Pure, and {\tt 

689 
dependent_tr'} is exported by the syntax module for public use. See 

690 
\S\ref{sec:tr_funs} below for more of the arcane lore of translation functions. 

691 
\index{macros)}\index{rewriting!syntactic)} 

692 

693 

694 
\section{Translation functions} \label{sec:tr_funs} 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

695 
\index{translations(} 
323  696 
% 
697 
This section describes the translation function mechanism. By writing 

3108  698 
\ML{} functions, you can do almost everything to terms or \AST{}s 
699 
during parsing and printing. The logic \LK\ is a good example of 

700 
sophisticated transformations between internal and external 

701 
representations of sequents; here, macros would be useless. 

323  702 

703 
A full understanding of translations requires some familiarity 

704 
with Isabelle's internals, especially the datatypes {\tt term}, {\tt typ}, 

705 
{\tt Syntax.ast} and the encodings of types and terms as such at the various 

706 
stages of the parsing or printing process. Most users should never need to 

707 
use translation functions. 

708 

709 
\subsection{Declaring translation functions} 

3108  710 
There are four kinds of translation functions, with one of these 
711 
coming in two variants. Each such function is associated with a name, 

712 
which triggers calls to it. Such names can be constants (logical or 

713 
syntactic) or type constructors. 

323  714 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

715 
{\tt print_syntax} displays the sets of names associated with the 
3108  716 
translation functions of a theory under 
323  717 
\ttindex{parse_ast_translation}, \ttindex{parse_translation}, 
4375  718 
\ttindex{print_translation} (or \ttindex{typed_print_translation}) and 
719 
\ttindex{print_ast_translation}. You can add new ones via the {\tt 

720 
ML} section\index{*ML section} of a theory definition file. There 

721 
may never be more than one function of the same kind per name. Even 

722 
though the {\tt ML} section is the very last part of the file, newly 

723 
installed translation functions are already effective when processing 

724 
all of the preceding sections. 

323  725 

3108  726 
The {\tt ML} section's contents are simply copied verbatim near the 
727 
beginning of the \ML\ file generated from a theory definition file. 

728 
Definitions made here are accessible as components of an \ML\ 

729 
structure; to make some parts private, use an \ML{} {\tt local} 

730 
declaration. The {\ML} code may install translation functions by 

731 
declaring any of the following identifiers: 

323  732 
\begin{ttbox} 
3108  733 
val parse_ast_translation : (string * (ast list > ast)) list 
734 
val print_ast_translation : (string * (ast list > ast)) list 

735 
val parse_translation : (string * (term list > term)) list 

736 
val print_translation : (string * (term list > term)) list 

4375  737 
val typed_print_translation : 
738 
(string * (bool > typ > term list > term)) list 

323  739 
\end{ttbox} 
740 

741 
\subsection{The translation strategy} 

3108  742 
The different kinds of translation functions are called during the 
743 
transformations between parse trees, \AST{}s and terms (recall 

744 
Fig.\ts\ref{fig:parse_print}). Whenever a combination of the form 

745 
$(\mtt"c\mtt"~x@1 \ldots x@n)$ is encountered, and a translation 

746 
function $f$ of appropriate kind exists for $c$, the result is 

747 
computed by the \ML{} function call $f \mtt[ x@1, \ldots, x@n \mtt]$. 

323  748 

3108  749 
For \AST{} translations, the arguments $x@1, \ldots, x@n$ are \AST{}s. 
750 
A combination has the form $\Constant c$ or $\Appl{\Constant c, x@1, 

751 
\ldots, x@n}$. For term translations, the arguments are terms and a 

752 
combination has the form $\ttfct{Const} (c, \tau)$ or $\ttfct{Const} 

753 
(c, \tau) \ttapp x@1 \ttapp \ldots \ttapp x@n$. Terms allow more 

754 
sophisticated transformations than \AST{}s do, typically involving 

755 
abstractions and bound variables. {\em Typed} print translations may 

4375  756 
even peek at the type $\tau$ of the constant they are invoked on; they 
757 
are also passed the current value of the \ttindex{show_sorts} flag. 

323  758 

3108  759 
Regardless of whether they act on terms or \AST{}s, translation 
760 
functions called during the parsing process differ from those for 

761 
printing more fundamentally in their overall behaviour: 

323  762 
\begin{description} 
763 
\item[Parse translations] are applied bottomup. The arguments are already 

764 
in translated form. The translations must not fail; exceptions trigger 

765 
an error message. 

766 

767 
\item[Print translations] are applied topdown. They are supplied with 

768 
arguments that are partly still in internal form. The result again 

769 
undergoes translation; therefore a print translation should not introduce 

770 
as head the very constant that invoked it. The function may raise 

771 
exception \xdx{Match} to indicate failure; in this event it has no 

772 
effect. 

773 
\end{description} 

774 

775 
Only constant atoms  constructor \ttindex{Constant} for \AST{}s and 

776 
\ttindex{Const} for terms  can invoke translation functions. This 

777 
causes another difference between parsing and printing. 

778 

779 
Parsing starts with a string and the constants are not yet identified. 

780 
Only parse tree heads create {\tt Constant}s in the resulting \AST, as 

781 
described in \S\ref{sec:astofpt}. Macros and parse \AST{} translations may 

782 
introduce further {\tt Constant}s. When the final \AST{} is converted to a 

783 
term, all {\tt Constant}s become {\tt Const}s, as described in 

784 
\S\ref{sec:termofast}. 

785 

786 
Printing starts with a welltyped term and all the constants are known. So 

787 
all logical constants and type constructors may invoke print translations. 

788 
These, and macros, may introduce further constants. 

789 

790 

791 
\subsection{Example: a print translation for dependent types} 

792 
\index{examples!of translations}\indexbold{*dependent_tr'} 

793 

794 
Let us continue the dependent type example (page~\pageref{prod_trans}) by 

795 
examining the parse translation for~\cdx{_K} and the print translation 

796 
{\tt dependent_tr'}, which are both builtin. By convention, parse 

797 
translations have names ending with {\tt _tr} and print translations have 

798 
names ending with {\tt _tr'}. Search for such names in the Isabelle 

799 
sources to locate more examples. 

800 

801 
Here is the parse translation for {\tt _K}: 

802 
\begin{ttbox} 

803 
fun k_tr [t] = Abs ("x", dummyT, incr_boundvars 1 t) 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

804 
 k_tr ts = raise TERM ("k_tr", ts); 
323  805 
\end{ttbox} 
806 
If {\tt k_tr} is called with exactly one argument~$t$, it creates a new 

807 
{\tt Abs} node with a body derived from $t$. Since terms given to parse 

808 
translations are not yet typed, the type of the bound variable in the new 

809 
{\tt Abs} is simply {\tt dummyT}. The function increments all {\tt Bound} 

810 
nodes referring to outer abstractions by calling \ttindex{incr_boundvars}, 

811 
a basic term manipulation function defined in {\tt Pure/term.ML}. 

812 

813 
Here is the print translation for dependent types: 

814 
\begin{ttbox} 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

815 
fun dependent_tr' (q, r) (A :: Abs (x, T, B) :: ts) = 
323  816 
if 0 mem (loose_bnos B) then 
3108  817 
let val (x', B') = Syntax.variant_abs' (x, dummyT, B) in 
818 
list_comb 

3135  819 
(Const (q,{\thinspace}dummyT) $ Syntax.mark_boundT (x',{\thinspace}T) $ A $ B',{\thinspace}ts) 
323  820 
end 
821 
else list_comb (Const (r, dummyT) $ A $ B, ts) 

822 
 dependent_tr' _ _ = raise Match; 

823 
\end{ttbox} 

3135  824 
The argument {\tt (q,{\thinspace}r)} is supplied to the curried function {\tt 
3108  825 
dependent_tr'} by a partial application during its installation. 
826 
For example, we could set up print translations for both {\tt Pi} and 

827 
{\tt Sigma} by including 

323  828 
\begin{ttbox}\index{*ML section} 
829 
val print_translation = 

830 
[("Pi", dependent_tr' ("{\at}PROD", "{\at}>")), 

831 
("Sigma", dependent_tr' ("{\at}SUM", "{\at}*"))]; 

832 
\end{ttbox} 

3108  833 
within the {\tt ML} section. The first of these transforms ${\tt 
834 
Pi}(A, \mtt{Abs}(x, T, B))$ into $\hbox{\tt{\at}PROD}(x', A, B')$ or 

835 
$\hbox{\tt{\at}>}(A, B)$, choosing the latter form if $B$ does not 

836 
depend on~$x$. It checks this using \ttindex{loose_bnos}, yet another 

837 
function from {\tt Pure/term.ML}. Note that $x'$ is a version of $x$ 

838 
renamed away from all names in $B$, and $B'$ is the body $B$ with {\tt 

839 
Bound} nodes referring to the {\tt Abs} node replaced by 

840 
$\ttfct{Free} (x', \mtt{dummyT})$ (but marked as representing a bound 

841 
variable). 

323  842 

843 
We must be careful with types here. While types of {\tt Const}s are 

844 
ignored, type constraints may be printed for some {\tt Free}s and 

845 
{\tt Var}s if \ttindex{show_types} is set to {\tt true}. Variables of type 

846 
\ttindex{dummyT} are never printed with constraint, though. The line 

847 
\begin{ttbox} 

3108  848 
let val (x', B') = Syntax.variant_abs' (x, dummyT, B); 
849 
\end{ttbox}\index{*Syntax.variant_abs'} 

323  850 
replaces bound variable occurrences in~$B$ by the free variable $x'$ with 
851 
type {\tt dummyT}. Only the binding occurrence of~$x'$ is given the 

852 
correct type~{\tt T}, so this is the only place where a type 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset

853 
constraint might appear. 
3108  854 

855 
Also note that we are responsible to mark free identifiers that 

3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3135
diff
changeset

856 
actually represent bound variables. This is achieved by 
3108  857 
\ttindex{Syntax.variant_abs'} and \ttindex{Syntax.mark_boundT} above. 
858 
Failing to do so may cause these names to be printed in the wrong 

859 
style. \index{translations)} \index{syntax!transformations)} 

860 

861 

862 
\section{Token translations} \label{sec:tok_tr} 

863 
\index{token translations(} 

864 
% 

865 
Isabelle's metalogic features quite a lot of different kinds of 

866 
identifiers, namely {\em class}, {\em tfree}, {\em tvar}, {\em free}, 

3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3135
diff
changeset

867 
{\em bound}, {\em var}. One might want to have these printed in 
3108  868 
different styles, e.g.\ in bold or italic, or even transcribed into 
869 
something more readable like $\alpha, \alpha', \beta$ instead of {\tt 

3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3135
diff
changeset

870 
'a}, {\tt 'aa}, {\tt 'b} for type variables. Token translations 
3108  871 
provide a means to such ends, enabling the user to install certain 
872 
\ML{} functions associated with any logical \rmindex{token class} and 

873 
depending on some \rmindex{print mode}. 

874 

875 
The logical class of identifiers can not necessarily be determined by 

3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3135
diff
changeset

876 
its syntactic category, though. For example, consider free vs.\ bound 
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3135
diff
changeset

877 
variables. So Isabelle's pretty printing mechanism, starting from 
3108  878 
fully typed terms, has to be careful to preserve this additional 
879 
information\footnote{This is done by marking atoms in abstract syntax 

3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3135
diff
changeset

880 
trees appropriately. The marks are actually visible by print 
3108  881 
translation functions  they are just special constants applied to 
882 
atomic asts, for example \texttt{("_bound" x)}.}. In particular, 

883 
usersupplied print translation functions operating on terms have to 

3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3135
diff
changeset

884 
be wellbehaved in this respect. Free identifiers introduced to 
3108  885 
represent bound variables have to be marked appropriately (cf.\ the 
886 
example at the end of \S\ref{sec:tr_funs}). 

887 

888 
\medskip Token translations may be installed by declaring the 

889 
\ttindex{token_translation} value within the \texttt{ML} section of a 

890 
theory definition file: 

891 
\begin{ttbox} 

892 
val token_translation: (string * string * (string > string * int)) list 

893 
\end{ttbox}\index{token_translation} 

894 
The elements of this list are of the form $(m, c, f)$, where $m$ is a 

895 
print mode identifier, $c$ a token class, and $f\colon string \to 

3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3135
diff
changeset

896 
string \times int$ the actual translation function. Assuming that $x$ 
3108  897 
is of identifier class $c$, and print mode $m$ is the first one of all 
898 
currently active modes that provide some translation for $c$, then $x$ 

3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3135
diff
changeset

899 
is output according to $f(x) = (x', len)$. Thereby $x'$ is the 
3108  900 
modified identifier name and $len$ its visual length approximated in 
3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3135
diff
changeset

901 
terms of whole characters. Thus $x'$ may include nonprinting parts 
3108  902 
like control sequences or markup information for typesetting systems. 
903 

904 
%FIXME example (?) 

905 

906 
\index{token translations)} 