323

1 
%% $Id$


2 
\chapter{Syntax Transformations} \label{chap:syntax}


3 
\newcommand\ttapp{\mathrel{\hbox{\tt\$}}}


4 
\newcommand\mtt[1]{\mbox{\tt #1}}


5 
\newcommand\ttfct[1]{\mathop{\mtt{#1}}\nolimits}


6 
\newcommand\Constant{\ttfct{Constant}}


7 
\newcommand\Variable{\ttfct{Variable}}


8 
\newcommand\Appl[1]{\ttfct{Appl}\,[#1]}


9 
\index{syntax!transformations(}


10 


11 
This chapter is intended for experienced Isabelle users who need to define


12 
macros or code their own translation functions. It describes the


13 
transformations between parse trees, abstract syntax trees and terms.


14 


15 


16 
\section{Abstract syntax trees} \label{sec:asts}


17 
\index{ASTs(}


18 


19 
The parser, given a token list from the lexer, applies productions to yield


20 
a parse tree\index{parse trees}. By applying some internal transformations


21 
the parse tree becomes an abstract syntax tree, or \AST{}. Macro


22 
expansion, further translations and finally type inference yields a


23 
welltyped term. The printing process is the reverse, except for some


24 
subtleties to be discussed later.


25 


26 
Figure~\ref{fig:parse_print} outlines the parsing and printing process.


27 
Much of the complexity is due to the macro mechanism. Using macros, you


28 
can specify most forms of concrete syntax without writing any \ML{} code.


29 


30 
\begin{figure}


31 
\begin{center}


32 
\begin{tabular}{cl}


33 
string & \\


34 
$\downarrow$ & parser \\


35 
parse tree & \\


36 
$\downarrow$ & parse \AST{} translation \\


37 
\AST{} & \\


38 
$\downarrow$ & \AST{} rewriting (macros) \\


39 
\AST{} & \\


40 
$\downarrow$ & parse translation, type inference \\


41 
 welltyped term  & \\


42 
$\downarrow$ & print translation \\


43 
\AST{} & \\


44 
$\downarrow$ & \AST{} rewriting (macros) \\


45 
\AST{} & \\


46 
$\downarrow$ & print \AST{} translation, printer \\


47 
string &


48 
\end{tabular}


49 


50 
\end{center}


51 
\caption{Parsing and printing}\label{fig:parse_print}


52 
\end{figure}


53 


54 
Abstract syntax trees are an intermediate form between the raw parse trees


55 
and the typed $\lambda$terms. An \AST{} is either an atom (constant or


56 
variable) or a list of {\em at least two\/} subtrees. Internally, they


57 
have type \mltydx{Syntax.ast}: \index{*Constant} \index{*Variable}


58 
\index{*Appl}


59 
\begin{ttbox}


60 
datatype ast = Constant of string


61 
 Variable of string


62 
 Appl of ast list


63 
\end{ttbox}


64 
%


65 
Isabelle uses an Sexpression syntax for abstract syntax trees. Constant


66 
atoms are shown as quoted strings, variable atoms as nonquoted strings and


67 
applications as a parenthesized list of subtrees. For example, the \AST


68 
\begin{ttbox}


69 
Appl [Constant "_constrain",


70 
Appl [Constant "_abs", Variable "x", Variable "t"],


71 
Appl [Constant "fun", Variable "'a", Variable "'b"]]


72 
\end{ttbox}


73 
is shown as {\tt ("_constrain" ("_abs" x t) ("fun" 'a 'b))}.


74 
Both {\tt ()} and {\tt (f)} are illegal because they have too few


75 
subtrees.


76 


77 
The resemblance to Lisp's Sexpressions is intentional, but there are two


78 
kinds of atomic symbols: $\Constant x$ and $\Variable x$. Do not take the


79 
names {\tt Constant} and {\tt Variable} too literally; in the later


80 
translation to terms, $\Variable x$ may become a constant, free or bound


81 
variable, even a type constructor or class name; the actual outcome depends


82 
on the context.


83 


84 
Similarly, you can think of ${\tt (} f~x@1~\ldots~x@n{\tt )}$ as the


85 
application of~$f$ to the arguments $x@1, \ldots, x@n$. But the kind of


86 
application is determined later by context; it could be a type constructor


87 
applied to types.


88 


89 
Forms like {\tt (("_abs" x $t$) $u$)} are legal, but \AST{}s are


90 
firstorder: the {\tt "_abs"} does not bind the {\tt x} in any way. Later


91 
at the term level, {\tt ("_abs" x $t$)} will become an {\tt Abs} node and


92 
occurrences of {\tt x} in $t$ will be replaced by bound variables (the term


93 
constructor \ttindex{Bound}).


94 


95 


96 
\section{Transforming parse trees to \AST{}s}\label{sec:astofpt}


97 
\index{ASTs!made from parse trees}


98 
\newcommand\astofpt[1]{\lbrakk#1\rbrakk}


99 


100 
The parse tree is the raw output of the parser. Translation functions,


101 
called {\bf parse AST translations}\indexbold{translations!parse AST},


102 
transform the parse tree into an abstract syntax tree.


103 


104 
The parse tree is constructed by nesting the righthand sides of the


105 
productions used to recognize the input. Such parse trees are simply lists


106 
of tokens and constituent parse trees, the latter representing the


107 
nonterminals of the productions. Let us refer to the actual productions in


108 
the form displayed by {\tt Syntax.print_syntax}.


109 


110 
Ignoring parse \AST{} translations, parse trees are transformed to \AST{}s


111 
by stripping out delimiters and copy productions. More precisely, the


112 
mapping $\astofpt{}$ is derived from the productions as follows:


113 
\begin{itemize}


114 
\item Name tokens: $\astofpt{t} = \Variable s$, where $t$ is an \ndx{id},


115 
\ndx{var}, \ndx{tid} or \ndx{tvar} token, and $s$ its associated string.


116 


117 
\item Copy productions:\index{productions!copy}


118 
$\astofpt{\ldots P \ldots} = \astofpt{P}$. Here $\ldots$ stands for


119 
strings of delimiters, which are discarded. $P$ stands for the single


120 
constituent that is not a delimiter; it is either a nonterminal symbol or


121 
a name token.


122 


123 
\item 0ary productions: $\astofpt{\ldots \mtt{=>} c} = \Constant c$.


124 
Here there are no constituents other than delimiters, which are


125 
discarded.


126 


127 
\item $n$ary productions, where $n \ge 1$: delimiters are discarded and


128 
the remaining constituents $P@1$, \ldots, $P@n$ are built into an


129 
application whose head constant is~$c$:


130 
\[ \astofpt{\ldots P@1 \ldots P@n \ldots \mtt{=>} c} =


131 
\Appl{\Constant c, \astofpt{P@1}, \ldots, \astofpt{P@n}}


132 
\]


133 
\end{itemize}


134 
Figure~\ref{fig:parse_ast} presents some simple examples, where {\tt ==},


135 
{\tt _appl}, {\tt _args}, and so forth name productions of the Pure syntax.


136 
These examples illustrate the need for further translations to make \AST{}s


137 
closer to the typed $\lambda$calculus. The Pure syntax provides


138 
predefined parse \AST{} translations\index{translations!parse AST} for


139 
ordinary applications, type applications, nested abstractions, meta


140 
implications and function types. Figure~\ref{fig:parse_ast_tr} shows their


141 
effect on some representative input strings.


142 


143 


144 
\begin{figure}


145 
\begin{center}


146 
\tt\begin{tabular}{ll}


147 
\rm input string & \rm \AST \\\hline


148 
"f" & f \\


149 
"'a" & 'a \\


150 
"t == u" & ("==" t u) \\


151 
"f(x)" & ("_appl" f x) \\


152 
"f(x, y)" & ("_appl" f ("_args" x y)) \\


153 
"f(x, y, z)" & ("_appl" f ("_args" x ("_args" y z))) \\


154 
"\%x y.\ t" & ("_lambda" ("_idts" x y) t) \\


155 
\end{tabular}


156 
\end{center}


157 
\caption{Parsing examples using the Pure syntax}\label{fig:parse_ast}


158 
\end{figure}


159 


160 
\begin{figure}


161 
\begin{center}


162 
\tt\begin{tabular}{ll}


163 
\rm input string & \rm \AST{} \\\hline


164 
"f(x, y, z)" & (f x y z) \\


165 
"'a ty" & (ty 'a) \\


166 
"('a, 'b) ty" & (ty 'a 'b) \\


167 
"\%x y z.\ t" & ("_abs" x ("_abs" y ("_abs" z t))) \\


168 
"\%x ::\ 'a.\ t" & ("_abs" ("_constrain" x 'a) t) \\


169 
"[ P; Q; R ] => S" & ("==>" P ("==>" Q ("==>" R S))) \\


170 
"['a, 'b, 'c] => 'd" & ("fun" 'a ("fun" 'b ("fun" 'c 'd)))


171 
\end{tabular}


172 
\end{center}


173 
\caption{Builtin parse \AST{} translations}\label{fig:parse_ast_tr}


174 
\end{figure}


175 


176 
The names of constant heads in the \AST{} control the translation process.


177 
The list of constants invoking parse \AST{} translations appears in the


178 
output of {\tt Syntax.print_syntax} under {\tt parse_ast_translation}.


179 


180 


181 
\section{Transforming \AST{}s to terms}\label{sec:termofast}


182 
\index{terms!made from ASTs}


183 
\newcommand\termofast[1]{\lbrakk#1\rbrakk}


184 


185 
The \AST{}, after application of macros (see \S\ref{sec:macros}), is


186 
transformed into a term. This term is probably illtyped since type


187 
inference has not occurred yet. The term may contain type constraints


188 
consisting of applications with head {\tt "_constrain"}; the second


189 
argument is a type encoded as a term. Type inference later introduces


190 
correct types or rejects the input.


191 


192 
Another set of translation functions, namely parse


193 
translations\index{translations!parse}, may affect this process. If we


194 
ignore parse translations for the time being, then \AST{}s are transformed


195 
to terms by mapping \AST{} constants to constants, \AST{} variables to


196 
schematic or free variables and \AST{} applications to applications.


197 


198 
More precisely, the mapping $\termofast{}$ is defined by


199 
\begin{itemize}


200 
\item Constants: $\termofast{\Constant x} = \ttfct{Const} (x,


201 
\mtt{dummyT})$.


202 


203 
\item Schematic variables: $\termofast{\Variable \mtt{"?}xi\mtt"} =


204 
\ttfct{Var} ((x, i), \mtt{dummyT})$, where $x$ is the base name and $i$


205 
the index extracted from~$xi$.


206 


207 
\item Free variables: $\termofast{\Variable x} = \ttfct{Free} (x,


208 
\mtt{dummyT})$.


209 


210 
\item Function applications with $n$ arguments:


211 
\[ \termofast{\Appl{f, x@1, \ldots, x@n}} =


212 
\termofast{f} \ttapp


213 
\termofast{x@1} \ttapp \ldots \ttapp \termofast{x@n}


214 
\]


215 
\end{itemize}


216 
Here \ttindex{Const}, \ttindex{Var}, \ttindex{Free} and


217 
\verb$\index{$@{\tt\$}} are constructors of the datatype \mltydx{term},


218 
while \ttindex{dummyT} stands for some dummy type that is ignored during


219 
type inference.


220 


221 
So far the outcome is still a firstorder term. Abstractions and bound


222 
variables (constructors \ttindex{Abs} and \ttindex{Bound}) are introduced


223 
by parse translations. Such translations are attached to {\tt "_abs"},


224 
{\tt "!!"} and userdefined binders.


225 


226 


227 
\section{Printing of terms}


228 
\newcommand\astofterm[1]{\lbrakk#1\rbrakk}\index{ASTs!made from terms}


229 


230 
The output phase is essentially the inverse of the input phase. Terms are


231 
translated via abstract syntax trees into strings. Finally the strings are


232 
pretty printed.


233 


234 
Print translations (\S\ref{sec:tr_funs}) may affect the transformation of


235 
terms into \AST{}s. Ignoring those, the transformation maps


236 
term constants, variables and applications to the corresponding constructs


237 
on \AST{}s. Abstractions are mapped to applications of the special


238 
constant {\tt _abs}.


239 


240 
More precisely, the mapping $\astofterm{}$ is defined as follows:


241 
\begin{itemize}


242 
\item $\astofterm{\ttfct{Const} (x, \tau)} = \Constant x$.


243 


244 
\item $\astofterm{\ttfct{Free} (x, \tau)} = constrain (\Variable x,


245 
\tau)$.


246 


247 
\item $\astofterm{\ttfct{Var} ((x, i), \tau)} = constrain (\Variable


248 
\mtt{"?}xi\mtt", \tau)$, where $\mtt?xi$ is the string representation of


249 
the {\tt indexname} $(x, i)$.


250 


251 
\item For the abstraction $\lambda x::\tau.t$, let $x'$ be a variant


252 
of~$x$ renamed to differ from all names occurring in~$t$, and let $t'$


253 
be obtained from~$t$ by replacing all bound occurrences of~$x$ by


254 
the free variable $x'$. This replaces corresponding occurrences of the


255 
constructor \ttindex{Bound} by the term $\ttfct{Free} (x',


256 
\mtt{dummyT})$:


257 
\[ \astofterm{\ttfct{Abs} (x, \tau, t)} =


258 
\Appl{\Constant \mtt{"_abs"},


259 
constrain(\Variable x', \tau), \astofterm{t'}}.


260 
\]


261 


262 
\item $\astofterm{\ttfct{Bound} i} = \Variable \mtt{"B.}i\mtt"$.


263 
The occurrence of constructor \ttindex{Bound} should never happen


264 
when printing welltyped terms; it indicates a de Bruijn index with no


265 
matching abstraction.


266 


267 
\item Where $f$ is not an application,


268 
\[ \astofterm{f \ttapp x@1 \ttapp \ldots \ttapp x@n} =


269 
\Appl{\astofterm{f}, \astofterm{x@1}, \ldots,\astofterm{x@n}}


270 
\]


271 
\end{itemize}


272 
%


273 
Type constraints are inserted to allow the printing of types. This is


274 
governed by the boolean variable \ttindex{show_types}:


275 
\begin{itemize}


276 
\item $constrain(x, \tau) = x$ \ if $\tau = \mtt{dummyT}$ \index{*dummyT} or


277 
\ttindex{show_types} not set to {\tt true}.


278 


279 
\item $constrain(x, \tau) = \Appl{\Constant \mtt{"_constrain"}, x,


280 
\astofterm{\tau}}$ \ otherwise.


281 


282 
Here, $\astofterm{\tau}$ is the \AST{} encoding of $\tau$: type


283 
constructors go to {\tt Constant}s; type identifiers go to {\tt


284 
Variable}s; type applications go to {\tt Appl}s with the type


285 
constructor as the first element. If \ttindex{show_sorts} is set to


286 
{\tt true}, some type variables are decorated with an \AST{} encoding


287 
of their sort.


288 
\end{itemize}


289 
%


290 
The \AST{}, after application of macros (see \S\ref{sec:macros}), is


291 
transformed into the final output string. The builtin {\bf print AST


292 
translations}\indexbold{translations!print AST} effectively reverse the


293 
parse \AST{} translations of Fig.\ts\ref{fig:parse_ast_tr}.


294 


295 
For the actual printing process, the names attached to productions


296 
of the form $\ldots A^{(p@1)}@1 \ldots A^{(p@n)}@n \ldots \mtt{=>} c$ play


297 
a vital role. Each \AST{} with constant head $c$, namely $\mtt"c\mtt"$ or


298 
$(\mtt"c\mtt"~ x@1 \ldots x@n)$, is printed according to the production


299 
for~$c$. Each argument~$x@i$ is converted to a string, and put in


300 
parentheses if its priority~$(p@i)$ requires this. The resulting strings


301 
and their syntactic sugar (denoted by \dots{} above) are joined to make a


302 
single string.


303 


304 
If an application $(\mtt"c\mtt"~ x@1 \ldots x@m)$ has more arguments than the


305 
corresponding production, it is first split into $((\mtt"c\mtt"~ x@1 \ldots


306 
x@n) ~ x@{n+1} \ldots x@m)$. Applications with too few arguments or with


307 
nonconstant head or without a corresponding production are printed as


308 
$f(x@1, \ldots, x@l)$ or $(\alpha@1, \ldots, \alpha@l) ty$. An occurrence of


309 
$\Variable x$ is simply printed as~$x$.


310 


311 
Blanks are {\em not\/} inserted automatically. If blanks are required to


312 
separate tokens, specify them in the mixfix declaration, possibly preceded


313 
by a slash~({\tt/}) to allow a line break.


314 
\index{ASTs)}


315 


316 


317 


318 
\section{Macros: Syntactic rewriting} \label{sec:macros}


319 
\index{macros(}\index{rewriting!syntactic(}


320 


321 
Mixfix declarations alone can handle situations where there is a direct


322 
connection between the concrete syntax and the underlying term. Sometimes


323 
we require a more elaborate concrete syntax, such as quantifiers and list


324 
notation. Isabelle's {\bf macros} and {\bf translation functions} can


325 
perform translations such as


326 
\begin{center}\tt


327 
\begin{tabular}{r@{$\quad\protect\rightleftharpoons\quad$}l}


328 
ALL x:A.P & Ball(A, \%x.P) \\ \relax


329 
[x, y, z] & Cons(x, Cons(y, Cons(z, Nil)))


330 
\end{tabular}


331 
\end{center}


332 
Translation functions (see \S\ref{sec:tr_funs}) must be coded in ML; they


333 
are the most powerful translation mechanism but are difficult to read or


334 
write. Macros are specified by firstorder rewriting systems that operate


335 
on abstract syntax trees. They are usually easy to read and write, and can


336 
express all but the most obscure translations.


337 


338 
Figure~\ref{fig:set_trans} defines a fragment of firstorder logic and set


339 
theory.\footnote{This and the following theories are complete working


340 
examples, though they specify only syntax, no axioms. The file {\tt


341 
ZF/zf.thy} presents the full set theory definition, including many


342 
macro rules.} Theory {\tt SET} defines constants for set comprehension


343 
({\tt Collect}), replacement ({\tt Replace}) and bounded universal


344 
quantification ({\tt Ball}). Each of these binds some variables. Without


345 
additional syntax we should have to express $\forall x \in A. P$ as {\tt


346 
Ball(A,\%x.P)}, and similarly for the others.


347 


348 
\begin{figure}


349 
\begin{ttbox}


350 
SET = Pure +


351 
types


352 
i, o


353 
arities


354 
i, o :: logic


355 
consts


356 
Trueprop :: "o => prop" ("_" 5)


357 
Collect :: "[i, i => o] => i"


358 
"{\at}Collect" :: "[idt, i, o] => i" ("(1{\ttlbrace}_:_./ _{\ttrbrace})")


359 
Replace :: "[i, [i, i] => o] => i"


360 
"{\at}Replace" :: "[idt, idt, i, o] => i" ("(1{\ttlbrace}_./ _:_, _{\ttrbrace})")


361 
Ball :: "[i, i => o] => o"


362 
"{\at}Ball" :: "[idt, i, o] => o" ("(3ALL _:_./ _)" 10)


363 
translations


364 
"{\ttlbrace}x:A. P{\ttrbrace}" == "Collect(A, \%x. P)"


365 
"{\ttlbrace}y. x:A, Q{\ttrbrace}" == "Replace(A, \%x y. Q)"


366 
"ALL x:A. P" == "Ball(A, \%x. P)"


367 
end


368 
\end{ttbox}


369 
\caption{Macro example: set theory}\label{fig:set_trans}


370 
\end{figure}


371 


372 
The theory specifies a variablebinding syntax through additional


373 
productions that have mixfix declarations. Each noncopy production must


374 
specify some constant, which is used for building \AST{}s.


375 
\index{constants!syntactic} The additional constants are decorated with


376 
{\tt\at} to stress their purely syntactic purpose; they should never occur


377 
within the final welltyped terms. Furthermore, they cannot be written in


378 
formulae because they are not legal identifiers.


379 


380 
The translations cause the replacement of external forms by internal forms


381 
after parsing, and vice versa before printing of terms. As a specification


382 
of the set theory notation, they should be largely selfexplanatory. The


383 
syntactic constants, {\tt\at Collect}, {\tt\at Replace} and {\tt\at Ball},


384 
appear implicitly in the macro rules via their mixfix forms.


385 


386 
Macros can define variablebinding syntax because they operate on \AST{}s,


387 
which have no inbuilt notion of bound variable. The macro variables {\tt


388 
x} and~{\tt y} have type~{\tt idt} and therefore range over identifiers,


389 
in this case bound variables. The macro variables {\tt P} and~{\tt Q}


390 
range over formulae containing bound variable occurrences.


391 


392 
Other applications of the macro system can be less straightforward, and


393 
there are peculiarities. The rest of this section will describe in detail


394 
how Isabelle macros are preprocessed and applied.


395 


396 


397 
\subsection{Specifying macros}


398 
Macros are basically rewrite rules on \AST{}s. But unlike other macro


399 
systems found in programming languages, Isabelle's macros work in both


400 
directions. Therefore a syntax contains two lists of rewrites: one for


401 
parsing and one for printing.


402 


403 
\index{*translations section}


404 
The {\tt translations} section specifies macros. The syntax for a macro is


405 
\[ (root)\; string \quad


406 
\left\{\begin{array}[c]{c} \mtt{=>} \\ \mtt{<=} \\ \mtt{==} \end{array}


407 
\right\} \quad


408 
(root)\; string


409 
\]


410 
%


411 
This specifies a parse rule ({\tt =>}), a print rule ({\tt <=}), or both


412 
({\tt ==}). The two strings specify the left and righthand sides of the


413 
macro rule. The $(root)$ specification is optional; it specifies the


414 
nonterminal for parsing the $string$ and if omitted defaults to {\tt


415 
logic}. \AST{} rewrite rules $(l, r)$ must obey certain conditions:


416 
\begin{itemize}


417 
\item Rules must be left linear: $l$ must not contain repeated variables.


418 


419 
\item Rules must have constant heads, namely $l = \mtt"c\mtt"$ or $l =


420 
(\mtt"c\mtt" ~ x@1 \ldots x@n)$.


421 


422 
\item Every variable in~$r$ must also occur in~$l$.


423 
\end{itemize}


424 


425 
Macro rules may refer to any syntax from the parent theories. They may


426 
also refer to anything defined before the the {\tt .thy} file's {\tt


427 
translations} section  including any mixfix declarations.


428 


429 
Upon declaration, both sides of the macro rule undergo parsing and parse


430 
\AST{} translations (see \S\ref{sec:asts}), but do not themselves undergo


431 
macro expansion. The lexer runs in a different mode that additionally


432 
accepts identifiers of the form $\_~letter~quasiletter^*$ (like {\tt _idt},


433 
{\tt _K}). Thus, a constant whose name starts with an underscore can


434 
appear in macro rules but not in ordinary terms.


435 


436 
Some atoms of the macro rule's \AST{} are designated as constants for


437 
matching. These are all names that have been declared as classes, types or


438 
constants.


439 


440 
The result of this preprocessing is two lists of macro rules, each stored


441 
as a pair of \AST{}s. They can be viewed using {\tt Syntax.print_syntax}


442 
(sections \ttindex{parse_rules} and \ttindex{print_rules}). For


443 
theory~{\tt SET} of Fig.~\ref{fig:set_trans} these are


444 
\begin{ttbox}


445 
parse_rules:


446 
("{\at}Collect" x A P) > ("Collect" A ("_abs" x P))


447 
("{\at}Replace" y x A Q) > ("Replace" A ("_abs" x ("_abs" y Q)))


448 
("{\at}Ball" x A P) > ("Ball" A ("_abs" x P))


449 
print_rules:


450 
("Collect" A ("_abs" x P)) > ("{\at}Collect" x A P)


451 
("Replace" A ("_abs" x ("_abs" y Q))) > ("{\at}Replace" y x A Q)


452 
("Ball" A ("_abs" x P)) > ("{\at}Ball" x A P)


453 
\end{ttbox}


454 


455 
\begin{warn}


456 
Avoid choosing variable names that have previously been used as


457 
constants, types or type classes; the {\tt consts} section in the output


458 
of {\tt Syntax.print_syntax} lists all such names. If a macro rule works


459 
incorrectly, inspect its internal form as shown above, recalling that


460 
constants appear as quoted strings and variables without quotes.


461 
\end{warn}


462 


463 
\begin{warn}


464 
If \ttindex{eta_contract} is set to {\tt true}, terms will be


465 
$\eta$contracted {\em before\/} the \AST{} rewriter sees them. Thus some


466 
abstraction nodes needed for print rules to match may vanish. For example,


467 
\verbBall(A, %x. P(x)) contracts {\tt Ball(A, P)}; the print rule does


468 
not apply and the output will be {\tt Ball(A, P)}. This problem would not


469 
occur if \ML{} translation functions were used instead of macros (as is


470 
done for binder declarations).


471 
\end{warn}


472 


473 


474 
\begin{warn}


475 
Another trap concerns type constraints. If \ttindex{show_types} is set to


476 
{\tt true}, bound variables will be decorated by their meta types at the


477 
binding place (but not at occurrences in the body). Matching with


478 
\verbCollect(A, %x. P) binds {\tt x} to something like {\tt ("_constrain" y


479 
"i")} rather than only {\tt y}. \AST{} rewriting will cause the constraint to


480 
appear in the external form, say \verb{y::i:A::i. P::o}.


481 


482 
To allow such constraints to be reread, your syntax should specify bound


483 
variables using the nonterminal~\ndx{idt}. This is the case in our


484 
example. Choosing {\tt id} instead of {\tt idt} is a common error,


485 
especially since it appears in former versions of most of Isabelle's


486 
objectlogics.


487 
\end{warn}


488 


489 


490 


491 
\subsection{Applying rules}


492 
As a term is being parsed or printed, an \AST{} is generated as an


493 
intermediate form (recall Fig.\ts\ref{fig:parse_print}). The \AST{} is


494 
normalized by applying macro rules in the manner of a traditional term


495 
rewriting system. We first examine how a single rule is applied.


496 


497 
Let $t$ be the abstract syntax tree to be normalized and $(l, r)$ some


498 
translation rule. A subtree~$u$ of $t$ is a {\bf redex} if it is an


499 
instance of~$l$; in this case $l$ is said to {\bf match}~$u$. A redex


500 
matched by $l$ may be replaced by the corresponding instance of~$r$, thus


501 
{\bf rewriting} the \AST~$t$. Matching requires some notion of {\bf


502 
placeholders} that may occur in rule patterns but not in ordinary


503 
\AST{}s; {\tt Variable} atoms serve this purpose.


504 


505 
The matching of the object~$u$ by the pattern~$l$ is performed as follows:


506 
\begin{itemize}


507 
\item Every constant matches itself.


508 


509 
\item $\Variable x$ in the object matches $\Constant x$ in the pattern.


510 
This point is discussed further below.


511 


512 
\item Every \AST{} in the object matches $\Variable x$ in the pattern,


513 
binding~$x$ to~$u$.


514 


515 
\item One application matches another if they have the same number of


516 
subtrees and corresponding subtrees match.


517 


518 
\item In every other case, matching fails. In particular, {\tt


519 
Constant}~$x$ can only match itself.


520 
\end{itemize}


521 
A successful match yields a substitution that is applied to~$r$, generating


522 
the instance that replaces~$u$.


523 


524 
The second case above may look odd. This is where {\tt Variable}s of


525 
nonrule \AST{}s behave like {\tt Constant}s. Recall that \AST{}s are not


526 
far removed from parse trees; at this level it is not yet known which


527 
identifiers will become constants, bounds, frees, types or classes. As


528 
\S\ref{sec:asts} describes, former parse tree heads appear in \AST{}s as


529 
{\tt Constant}s, while the name tokens \ndx{id}, \ndx{var}, \ndx{tid} and


530 
\ndx{tvar} become {\tt Variable}s. On the other hand, when \AST{}s


531 
generated from terms for printing, all constants and type constructors


532 
become {\tt Constant}s; see \S\ref{sec:asts}. Thus \AST{}s may contain a


533 
messy mixture of {\tt Variable}s and {\tt Constant}s. This is


534 
insignificant at macro level because matching treats them alike.


535 


536 
Because of this behaviour, different kinds of atoms with the same name are


537 
indistinguishable, which may make some rules prone to misbehaviour. Example:


538 
\begin{ttbox}


539 
types


540 
Nil


541 
consts


542 
Nil :: "'a list"


543 
"[]" :: "'a list" ("[]")


544 
translations


545 
"[]" == "Nil"


546 
\end{ttbox}


547 
The term {\tt Nil} will be printed as {\tt []}, just as expected.


548 
The term \verb%Nil.t will be printed as \verb%[].t, which might not be


549 
expected!


550 


551 
Normalizing an \AST{} involves repeatedly applying macro rules until none


552 
is applicable. Macro rules are chosen in the order that they appear in the


553 
{\tt translations} section. You can watch the normalization of \AST{}s


554 
during parsing and printing by setting \ttindex{Syntax.trace_norm_ast} to


555 
{\tt true}.\index{tracing!of macros} Alternatively, use


556 
\ttindex{Syntax.test_read}. The information displayed when tracing


557 
includes the \AST{} before normalization ({\tt pre}), redexes with results


558 
({\tt rewrote}), the normal form finally reached ({\tt post}) and some


559 
statistics ({\tt normalize}). If tracing is off,


560 
\ttindex{Syntax.stat_norm_ast} can be set to {\tt true} in order to enable


561 
printing of the normal form and statistics only.


562 


563 


564 
\subsection{Example: the syntax of finite sets}


565 
\index{examples!of macros}


566 


567 
This example demonstrates the use of recursive macros to implement a


568 
convenient notation for finite sets.


569 
\index{*empty constant}\index{*insert constant}\index{{}@\verb'{}' symbol}


570 
\index{"@Enum@{\tt\at Enum} constant}


571 
\index{"@Finset@{\tt\at Finset} constant}


572 
\begin{ttbox}


573 
FINSET = SET +


574 
types


575 
is


576 
consts


577 
"" :: "i => is" ("_")


578 
"{\at}Enum" :: "[i, is] => is" ("_,/ _")


579 
empty :: "i" ("{\ttlbrace}{\ttrbrace}")


580 
insert :: "[i, i] => i"


581 
"{\at}Finset" :: "is => i" ("{\ttlbrace}(_){\ttrbrace}")


582 
translations


583 
"{\ttlbrace}x, xs{\ttrbrace}" == "insert(x, {\ttlbrace}xs{\ttrbrace})"


584 
"{\ttlbrace}x{\ttrbrace}" == "insert(x, {\ttlbrace}{\ttrbrace})"


585 
end


586 
\end{ttbox}


587 
Finite sets are internally built up by {\tt empty} and {\tt insert}. The


588 
declarations above specify \verb{x, y, z} as the external representation


589 
of


590 
\begin{ttbox}


591 
insert(x, insert(y, insert(z, empty)))


592 
\end{ttbox}


593 
The nonterminal symbol~\ndx{is} stands for one or more objects of type~{\tt


594 
i} separated by commas. The mixfix declaration \hbox{\verb"_,/ _"}


595 
allows a line break after the comma for \rmindex{pretty printing}; if no


596 
line break is required then a space is printed instead.


597 


598 
The nonterminal is declared as the type~{\tt is}, but with no {\tt arities}


599 
declaration. Hence {\tt is} is not a logical type and no default


600 
productions are added. If we had needed enumerations of the nonterminal


601 
{\tt logic}, which would include all the logical types, we could have used


602 
the predefined nonterminal symbol \ndx{args} and skipped this part


603 
altogether. The nonterminal~{\tt is} can later be reused for other


604 
enumerations of type~{\tt i} like lists or tuples.


605 


606 
\index{"@Finset@{\tt\at Finset} constant}


607 
Next follows {\tt empty}, which is already equipped with its syntax


608 
\verb{}, and {\tt insert} without concrete syntax. The syntactic


609 
constant {\tt\at Finset} provides concrete syntax for enumerations of~{\tt


610 
i} enclosed in curly braces. Remember that a pair of parentheses, as in


611 
\verb"{(_)}", specifies a block of indentation for pretty printing.


612 


613 
The translations may look strange at first. Macro rules are best


614 
understood in their internal forms:


615 
\begin{ttbox}


616 
parse_rules:


617 
("{\at}Finset" ("{\at}Enum" x xs)) > ("insert" x ("{\at}Finset" xs))


618 
("{\at}Finset" x) > ("insert" x "empty")


619 
print_rules:


620 
("insert" x ("{\at}Finset" xs)) > ("{\at}Finset" ("{\at}Enum" x xs))


621 
("insert" x "empty") > ("{\at}Finset" x)


622 
\end{ttbox}


623 
This shows that \verb{x, xs} indeed matches any set enumeration of at least


624 
two elements, binding the first to {\tt x} and the rest to {\tt xs}.


625 
Likewise, \verb{xs} and \verb{x} represent any set enumeration.


626 
The parse rules only work in the order given.


627 


628 
\begin{warn}


629 
The \AST{} rewriter cannot discern constants from variables and looks


630 
only for names of atoms. Thus the names of {\tt Constant}s occurring in


631 
the (internal) lefthand side of translation rules should be regarded as


632 
\rmindex{reserved words}. Choose nonidentifiers like {\tt\at Finset} or


633 
sufficiently long and strange names. If a bound variable's name gets


634 
rewritten, the result will be incorrect; for example, the term


635 
\begin{ttbox}


636 
\%empty insert. insert(x, empty)


637 
\end{ttbox}


638 
gets printed as \verb%empty insert. {x}.


639 
\end{warn}


640 


641 


642 
\subsection{Example: a parse macro for dependent types}\label{prod_trans}


643 
\index{examples!of macros}


644 


645 
As stated earlier, a macro rule may not introduce new {\tt Variable}s on


646 
the righthand side. Something like \verb"K(B)" => "%x. B" is illegal;


647 
if allowed, it could cause variable capture. In such cases you usually


648 
must fall back on translation functions. But a trick can make things


649 
readable in some cases: {\em calling translation functions by parse


650 
macros}:


651 
\begin{ttbox}


652 
PROD = FINSET +


653 
consts


654 
Pi :: "[i, i => i] => i"


655 
"{\at}PROD" :: "[idt, i, i] => i" ("(3PROD _:_./ _)" 10)


656 
"{\at}>" :: "[i, i] => i" ("(_ >/ _)" [51, 50] 50)


657 
\ttbreak


658 
translations


659 
"PROD x:A. B" => "Pi(A, \%x. B)"


660 
"A > B" => "Pi(A, _K(B))"


661 
end


662 
ML


663 
val print_translation = [("Pi", dependent_tr' ("{\at}PROD", "{\at}>"))];


664 
\end{ttbox}


665 


666 
Here {\tt Pi} is an internal constant for constructing general products.


667 
Two external forms exist: the general case {\tt PROD x:A.B} and the


668 
function space {\tt A > B}, which abbreviates \verbPi(A, %x.B) when {\tt B}


669 
does not depend on~{\tt x}.


670 


671 
The second parse macro introduces {\tt _K(B)}, which later becomes \verb%x.B


672 
due to a parse translation associated with \cdx{_K}. The order of the


673 
parse rules is critical. Unfortunately there is no such trick for


674 
printing, so we have to add a {\tt ML} section for the print translation


675 
\ttindex{dependent_tr'}.


676 


677 
Recall that identifiers with a leading {\tt _} are allowed in translation


678 
rules, but not in ordinary terms. Thus we can create \AST{}s containing


679 
names that are not directly expressible.


680 


681 
The parse translation for {\tt _K} is already installed in Pure, and {\tt


682 
dependent_tr'} is exported by the syntax module for public use. See


683 
\S\ref{sec:tr_funs} below for more of the arcane lore of translation functions.


684 
\index{macros)}\index{rewriting!syntactic)}


685 


686 


687 
\section{Translation functions} \label{sec:tr_funs}


688 
\index{translations(}


689 
%


690 
This section describes the translation function mechanism. By writing


691 
\ML{} functions, you can do almost everything with terms or \AST{}s during


692 
parsing and printing. The logic \LK\ is a good example of sophisticated


693 
transformations between internal and external representations of


694 
associative sequences; here, macros would be useless.


695 


696 
A full understanding of translations requires some familiarity


697 
with Isabelle's internals, especially the datatypes {\tt term}, {\tt typ},


698 
{\tt Syntax.ast} and the encodings of types and terms as such at the various


699 
stages of the parsing or printing process. Most users should never need to


700 
use translation functions.


701 


702 
\subsection{Declaring translation functions}


703 
There are four kinds of translation functions. Each such function is


704 
associated with a name, which triggers calls to it. Such names can be


705 
constants (logical or syntactic) or type constructors.


706 


707 
{\tt Syntax.print_syntax} displays the sets of names associated with the


708 
translation functions of a {\tt Syntax.syntax} under


709 
\ttindex{parse_ast_translation}, \ttindex{parse_translation},


710 
\ttindex{print_translation} and \ttindex{print_ast_translation}. You can


711 
add new ones via the {\tt ML} section\index{*ML section} of


712 
a {\tt .thy} file. There may never be more than one function of the same


713 
kind per name. Conceptually, the {\tt ML} section should appear between


714 
{\tt consts} and {\tt translations}; newly installed translation functions


715 
are already effective when macros and logical rules are parsed.


716 


717 
The {\tt ML} section is copied verbatim into the \ML\ file generated from a


718 
{\tt .thy} file. Definitions made here are accessible as components of an


719 
\ML\ structure; to make some definitions private, use an \ML{} {\tt local}


720 
declaration. The {\tt ML} section may install translation functions by


721 
declaring any of the following identifiers:


722 
\begin{ttbox}


723 
val parse_ast_translation : (string * (ast list > ast)) list


724 
val print_ast_translation : (string * (ast list > ast)) list


725 
val parse_translation : (string * (term list > term)) list


726 
val print_translation : (string * (term list > term)) list


727 
\end{ttbox}


728 


729 
\subsection{The translation strategy}


730 
All four kinds of translation functions are treated similarly. They are


731 
called during the transformations between parse trees, \AST{}s and terms


732 
(recall Fig.\ts\ref{fig:parse_print}). Whenever a combination of the form


733 
$(\mtt"c\mtt"~x@1 \ldots x@n)$ is encountered, and a translation function


734 
$f$ of appropriate kind exists for $c$, the result is computed by the \ML{}


735 
function call $f \mtt[ x@1, \ldots, x@n \mtt]$.


736 


737 
For \AST{} translations, the arguments $x@1, \ldots, x@n$ are \AST{}s. A


738 
combination has the form $\Constant c$ or $\Appl{\Constant c, x@1, \ldots,


739 
x@n}$. For term translations, the arguments are terms and a combination


740 
has the form $\ttfct{Const} (c, \tau)$ or $\ttfct{Const} (c, \tau) \ttapp


741 
x@1 \ttapp \ldots \ttapp x@n$. Terms allow more sophisticated


742 
transformations than \AST{}s do, typically involving abstractions and bound


743 
variables.


744 


745 
Regardless of whether they act on terms or \AST{}s,


746 
parse translations differ from print translations fundamentally:


747 
\begin{description}


748 
\item[Parse translations] are applied bottomup. The arguments are already


749 
in translated form. The translations must not fail; exceptions trigger


750 
an error message.


751 


752 
\item[Print translations] are applied topdown. They are supplied with


753 
arguments that are partly still in internal form. The result again


754 
undergoes translation; therefore a print translation should not introduce


755 
as head the very constant that invoked it. The function may raise


756 
exception \xdx{Match} to indicate failure; in this event it has no


757 
effect.


758 
\end{description}


759 


760 
Only constant atoms  constructor \ttindex{Constant} for \AST{}s and


761 
\ttindex{Const} for terms  can invoke translation functions. This


762 
causes another difference between parsing and printing.


763 


764 
Parsing starts with a string and the constants are not yet identified.


765 
Only parse tree heads create {\tt Constant}s in the resulting \AST, as


766 
described in \S\ref{sec:astofpt}. Macros and parse \AST{} translations may


767 
introduce further {\tt Constant}s. When the final \AST{} is converted to a


768 
term, all {\tt Constant}s become {\tt Const}s, as described in


769 
\S\ref{sec:termofast}.


770 


771 
Printing starts with a welltyped term and all the constants are known. So


772 
all logical constants and type constructors may invoke print translations.


773 
These, and macros, may introduce further constants.


774 


775 


776 
\subsection{Example: a print translation for dependent types}


777 
\index{examples!of translations}\indexbold{*dependent_tr'}


778 


779 
Let us continue the dependent type example (page~\pageref{prod_trans}) by


780 
examining the parse translation for~\cdx{_K} and the print translation


781 
{\tt dependent_tr'}, which are both builtin. By convention, parse


782 
translations have names ending with {\tt _tr} and print translations have


783 
names ending with {\tt _tr'}. Search for such names in the Isabelle


784 
sources to locate more examples.


785 


786 
Here is the parse translation for {\tt _K}:


787 
\begin{ttbox}


788 
fun k_tr [t] = Abs ("x", dummyT, incr_boundvars 1 t)


789 
 k_tr ts = raise TERM("k_tr",ts);


790 
\end{ttbox}


791 
If {\tt k_tr} is called with exactly one argument~$t$, it creates a new


792 
{\tt Abs} node with a body derived from $t$. Since terms given to parse


793 
translations are not yet typed, the type of the bound variable in the new


794 
{\tt Abs} is simply {\tt dummyT}. The function increments all {\tt Bound}


795 
nodes referring to outer abstractions by calling \ttindex{incr_boundvars},


796 
a basic term manipulation function defined in {\tt Pure/term.ML}.


797 


798 
Here is the print translation for dependent types:


799 
\begin{ttbox}


800 
fun dependent_tr' (q,r) (A :: Abs (x, T, B) :: ts) =


801 
if 0 mem (loose_bnos B) then


802 
let val (x', B') = variant_abs (x, dummyT, B);


803 
in list_comb (Const (q, dummyT) $ Free (x', T) $ A $ B', ts)


804 
end


805 
else list_comb (Const (r, dummyT) $ A $ B, ts)


806 
 dependent_tr' _ _ = raise Match;


807 
\end{ttbox}


808 
The argument {\tt (q,r)} is supplied to {\tt dependent_tr'} by a curried


809 
function application during its installation. We could set up print


810 
translations for both {\tt Pi} and {\tt Sigma} by including


811 
\begin{ttbox}\index{*ML section}


812 
val print_translation =


813 
[("Pi", dependent_tr' ("{\at}PROD", "{\at}>")),


814 
("Sigma", dependent_tr' ("{\at}SUM", "{\at}*"))];


815 
\end{ttbox}


816 
within the {\tt ML} section. The first of these transforms ${\tt Pi}(A,


817 
\mtt{Abs}(x, T, B))$ into $\hbox{\tt{\at}PROD}(x', A, B')$ or


818 
$\hbox{\tt{\at}>}r(A, B)$, choosing the latter form if $B$ does not depend


819 
on~$x$. It checks this using \ttindex{loose_bnos}, yet another function


820 
from {\tt Pure/term.ML}. Note that $x'$ is a version of $x$ renamed away


821 
from all names in $B$, and $B'$ the body $B$ with {\tt Bound} nodes


822 
referring to our {\tt Abs} node replaced by $\ttfct{Free} (x',


823 
\mtt{dummyT})$.


824 


825 
We must be careful with types here. While types of {\tt Const}s are


826 
ignored, type constraints may be printed for some {\tt Free}s and


827 
{\tt Var}s if \ttindex{show_types} is set to {\tt true}. Variables of type


828 
\ttindex{dummyT} are never printed with constraint, though. The line


829 
\begin{ttbox}


830 
let val (x', B') = variant_abs (x, dummyT, B);


831 
\end{ttbox}\index{*variant_abs}


832 
replaces bound variable occurrences in~$B$ by the free variable $x'$ with


833 
type {\tt dummyT}. Only the binding occurrence of~$x'$ is given the


834 
correct type~{\tt T}, so this is the only place where a type


835 
constraint might appear.


836 
\index{translations)}


837 
\index{syntax!transformations)}
