author  wenzelm 
Fri, 10 Nov 2000 19:02:37 +0100  
changeset 10432  3dfbc913d184 
parent 291  a615050a7494 
permissions  rwrr 
104  1 
%% $Id$ 
291  2 
%% \([azAZ][azAZ]}\.\) \([^ ]\) \1 \2 
3 
%% @\([az09]\) ^{(\1)} 

108  4 

291  5 
\newcommand\rmindex[1]{{#1}\index{#1}\@} 
6 
\newcommand\mtt[1]{\mbox{\tt #1}} 

7 
\newcommand\ttfct[1]{\mathop{\mtt{#1}}\nolimits} 

8 
\newcommand\ttapp{\mathrel{\hbox{\tt\$}}} 

9 
\newcommand\Constant{\ttfct{Constant}} 

10 
\newcommand\Variable{\ttfct{Variable}} 

11 
\newcommand\Appl[1]{\ttfct{Appl}\mathopen{\mtt[}#1\mathclose{\mtt]}} 

12 
\newcommand\AST{{\sc ast}} 

13 
\let\rew=\longrightarrow 

104  14 

15 

16 
\chapter{Defining Logics} \label{DefiningLogics} 

17 

291  18 
This chapter explains how to define new formal systems  in particular, 
19 
their concrete syntax. While Isabelle can be regarded as a theorem prover 

20 
for set theory, higherorder logic or the sequent calculus, its 

21 
distinguishing feature is support for the definition of new logics. 

104  22 

291  23 
Isabelle logics are hierarchies of theories, which are described and 
24 
illustrated in {\em Introduction to Isabelle}. That material, together 

25 
with the theory files provided in the examples directories, should suffice 

26 
for all simple applications. The easiest way to define a new theory is by 

27 
modifying a copy of an existing theory. 

28 

29 
This chapter is intended for experienced Isabelle users. It documents all 

30 
aspects of theories concerned with syntax: mixfix declarations, pretty 

31 
printing, macros and translation functions. The extended examples of 

32 
\S\ref{sec:min_logics} demonstrate the logical aspects of the definition of 

33 
theories. Sections marked with * are highly technical and might be skipped 

34 
on the first reading. 

104  35 

36 

291  37 
\section{Priority grammars} \label{sec:priority_grammars} 
38 
\index{grammars!priority(} 

104  39 

291  40 
The syntax of an Isabelle logic is specified by a {\bf priority grammar}. 
41 
A contextfree grammar\index{grammars!contextfree} contains a set of 

42 
productions of the form $A=\gamma$, where $A$ is a nonterminal and 

43 
$\gamma$, the righthand side, is a string of terminals and nonterminals. 

44 
Isabelle uses an extended format permitting {\bf priorities}, or 

45 
precedences. Each nonterminal is decorated by an integer priority, as 

46 
in~$A^{(p)}$. A nonterminal $A^{(p)}$ in a derivation may be replaced 

47 
using a production $A^{(q)} = \gamma$ only if $p \le q$. Any priority 

48 
grammar can be translated into a normal context free grammar by introducing 

49 
new nonterminals and productions. 

104  50 

51 
Formally, a set of context free productions $G$ induces a derivation 

291  52 
relation $\rew@G$. Let $\alpha$ and $\beta$ denote strings of terminal or 
53 
nonterminal symbols. Then 

54 
\[ \alpha\, A^{(p)}\, \beta ~\rew@G~ \alpha\,\gamma\,\beta \] 

55 
if and only if $G$ contains some production $A^{(q)}=\gamma$ for~$q\ge p$. 

104  56 

57 
The following simple grammar for arithmetic expressions demonstrates how 

291  58 
binding power and associativity of operators can be enforced by priorities. 
104  59 
\begin{center} 
60 
\begin{tabular}{rclr} 

291  61 
$A^{(9)}$ & = & {\tt0} \\ 
62 
$A^{(9)}$ & = & {\tt(} $A^{(0)}$ {\tt)} \\ 

63 
$A^{(0)}$ & = & $A^{(0)}$ {\tt+} $A^{(1)}$ \\ 

64 
$A^{(2)}$ & = & $A^{(3)}$ {\tt*} $A^{(2)}$ \\ 

65 
$A^{(3)}$ & = & {\tt} $A^{(3)}$ 

104  66 
\end{tabular} 
67 
\end{center} 

291  68 
The choice of priorities determines that {\tt } binds tighter than {\tt *}, 
69 
which binds tighter than {\tt +}. Furthermore {\tt +} associates to the 

70 
left and {\tt *} to the right. 

104  71 

72 
To minimize the number of subscripts, we adopt the following conventions: 

73 
\begin{itemize} 

291  74 
\item All priorities $p$ must be in the range $0 \leq p \leq max_pri$ for 
75 
some fixed integer $max_pri$. 

76 
\item Priority $0$ on the righthand side and priority $max_pri$ on the 

104  77 
lefthand side may be omitted. 
78 
\end{itemize} 

291  79 
The production $A^{(p)} = \alpha$ is written as $A = \alpha~(p)$; 
80 
the priority of the lefthand side actually appears in a column on the far 

81 
right. Finally, alternatives may be separated by $$, and repetition 

104  82 
indicated by \dots. 
83 

291  84 
Using these conventions and assuming $max_pri=9$, the grammar takes the form 
104  85 
\begin{center} 
86 
\begin{tabular}{rclc} 

87 
$A$ & = & {\tt0} & \hspace*{4em} \\ 

88 
& $$ & {\tt(} $A$ {\tt)} \\ 

291  89 
& $$ & $A$ {\tt+} $A^{(1)}$ & (0) \\ 
90 
& $$ & $A^{(3)}$ {\tt*} $A^{(2)}$ & (2) \\ 

91 
& $$ & {\tt} $A^{(3)}$ & (3) 

104  92 
\end{tabular} 
93 
\end{center} 

291  94 
\index{grammars!priority)} 
104  95 

96 

291  97 
\begin{figure} 
104  98 
\begin{center} 
99 
\begin{tabular}{rclc} 

100 
$prop$ &=& \ttindex{PROP} $aprop$ ~~$$~~ {\tt(} $prop$ {\tt)} \\ 

291  101 
&$$& $logic^{(3)}$ \ttindex{==} $logic^{(2)}$ & (2) \\ 
102 
&$$& $logic^{(3)}$ \ttindex{=?=} $logic^{(2)}$ & (2) \\ 

103 
&$$& $prop^{(2)}$ \ttindex{==>} $prop^{(1)}$ & (1) \\ 

104 
&$$& {\tt[} $prop$ {\tt;} \dots {\tt;} $prop$ {\tt]} {\tt==>} $prop^{(1)}$ & (1) \\ 

104  105 
&$$& {\tt!!} $idts$ {\tt.} $prop$ & (0) \\\\ 
106 
$logic$ &=& $prop$ ~~$$~~ $fun$ \\\\ 

107 
$aprop$ &=& $id$ ~~$$~~ $var$ 

291  108 
~~$$~~ $fun^{(max_pri)}$ {\tt(} $logic$ {\tt,} \dots {\tt,} $logic$ {\tt)} \\\\ 
104  109 
$fun$ &=& $id$ ~~$$~~ $var$ ~~$$~~ {\tt(} $fun$ {\tt)} \\ 
291  110 
&$$& $fun^{(max_pri)}$ {\tt(} $logic$ {\tt,} \dots {\tt,} $logic$ {\tt)} \\ 
111 
&$$& $fun^{(max_pri)}$ {\tt::} $type$ \\ 

104  112 
&$$& \ttindex{\%} $idts$ {\tt.} $logic$ & (0) \\\\ 
291  113 
$idts$ &=& $idt$ ~~$$~~ $idt^{(1)}$ $idts$ \\\\ 
104  114 
$idt$ &=& $id$ ~~$$~~ {\tt(} $idt$ {\tt)} \\ 
115 
&$$& $id$ \ttindex{::} $type$ & (0) \\\\ 

291  116 
$type$ &=& $tid$ ~~$$~~ $tvar$ ~~$$~~ $tid$ {\tt::} $sort$ 
135  117 
~~$$~~ $tvar$ {\tt::} $sort$ \\ 
291  118 
&$$& $id$ ~~$$~~ $type^{(max_pri)}$ $id$ 
104  119 
~~$$~~ {\tt(} $type$ {\tt,} \dots {\tt,} $type$ {\tt)} $id$ \\ 
291  120 
&$$& $type^{(1)}$ \ttindex{=>} $type$ & (0) \\ 
104  121 
&$$& {\tt[} $type$ {\tt,} \dots {\tt,} $type$ {\tt]} {\tt=>} $type$&(0)\\ 
122 
&$$& {\tt(} $type$ {\tt)} \\\\ 

123 
$sort$ &=& $id$ ~~$$~~ {\tt\ttlbrace\ttrbrace} 

124 
~~$$~~ {\tt\ttlbrace} $id$ {\tt,} \dots {\tt,} $id$ {\tt\ttrbrace} 

125 
\end{tabular}\index{*"!"!}\index{*"["}\index{*""]} 

126 
\indexbold{type@$type$} \indexbold{sort@$sort$} \indexbold{idt@$idt$} 

127 
\indexbold{idts@$idts$} \indexbold{logic@$logic$} \indexbold{prop@$prop$} 

128 
\indexbold{fun@$fun$} 

129 
\end{center} 

291  130 
\caption{Metalogic syntax}\label{fig:pure_gram} 
104  131 
\end{figure} 
132 

291  133 

134 
\section{The Pure syntax} \label{sec:basic_syntax} 

135 
\index{syntax!Pure(} 

104  136 

291  137 
At the root of all objectlogics lies the Pure theory,\index{theory!Pure} 
138 
bound to the \ML{} identifier \ttindex{Pure.thy}. It contains, among many 

139 
other things, the Pure syntax. An informal account of this basic syntax 

140 
(metalogic, types, \ldots) may be found in {\em Introduction to Isabelle}. 

141 
A more precise description using a priority grammar is shown in 

142 
Fig.\ts\ref{fig:pure_gram}. The following nonterminals are defined: 

143 
\begin{description} 

144 
\item[$prop$] Terms of type $prop$. These are formulae of the metalogic. 

104  145 

291  146 
\item[$aprop$] Atomic propositions. These typically include the 
147 
judgement forms of the objectlogic; its definition introduces a 

148 
metalevel predicate for each judgement form. 

149 

150 
\item[$logic$] Terms whose type belongs to class $logic$. Initially, 

151 
this category contains just $prop$. As the syntax is extended by new 

152 
objectlogics, more productions for $logic$ are added automatically 

153 
(see below). 

104  154 

155 
\item[$fun$] Terms potentially of function type. 

156 

291  157 
\item[$type$] Types of the metalogic. 
104  158 

291  159 
\item[$idts$] A list of identifiers, possibly constrained by types. 
104  160 
\end{description} 
161 

291  162 
\begin{warn} 
163 
Note that \verbx::nat y is parsed as \verbx::(nat y), treating {\tt 

164 
y} like a type constructor applied to {\tt nat}. The likely result is 

165 
an error message. To avoid this interpretation, use parentheses and 

166 
write \verb(x::nat) y. 

104  167 

291  168 
Similarly, \verbx::nat y::nat is parsed as \verbx::(nat y::nat) and 
169 
yields a syntax error. The correct form is \verb(x::nat) (y::nat). 

170 
\end{warn} 

171 

172 
\subsection{Logical types and default syntax}\label{logicaltypes} 

173 
Isabelle's representation of mathematical languages is based on the typed 

174 
$\lambda$calculus. All logical types, namely those of class $logic$, are 

175 
automatically equipped with a basic syntax of types, identifiers, 

176 
variables, parentheses, $\lambda$abstractions and applications. 

177 

178 
More precisely, for each type constructor $ty$ with arity $(\vec{s})c$, 

179 
where $c$ is a subclass of $logic$, several productions are added: 

104  180 
\begin{center} 
181 
\begin{tabular}{rclc} 

182 
$ty$ &=& $id$ ~~$$~~ $var$ ~~$$~~ {\tt(} $ty$ {\tt)} \\ 

291  183 
&$$& $fun^{(max_pri)}$ {\tt(} $logic$ {\tt,} \dots {\tt,} $logic$ {\tt)}\\ 
184 
&$$& $ty^{(max_pri)}$ {\tt::} $type$\\\\ 

104  185 
$logic$ &=& $ty$ 
186 
\end{tabular} 

187 
\end{center} 

188 

189 

291  190 
\subsection{Lexical matters} 
191 
The parser does not process input strings directly. It operates on token 

192 
lists provided by Isabelle's \bfindex{lexer}. There are two kinds of 

193 
tokens: \bfindex{delimiters} and \bfindex{name tokens}. 

104  194 

291  195 
Delimiters can be regarded as reserved words of the syntax. You can 
196 
add new ones when extending theories. In Fig.\ts\ref{fig:pure_gram} they 

197 
appear in typewriter font, for example {\tt ==}, {\tt =?=} and 

198 
{\tt PROP}\@. 

104  199 

291  200 
Name tokens have a predefined syntax. The lexer distinguishes four 
201 
disjoint classes of names: \rmindex{identifiers}, \rmindex{unknowns}, type 

202 
identifiers\index{identifiers!type}, type unknowns\index{unknowns!type}. 

203 
They are denoted by $id$\index{id@$id$}, $var$\index{var@$var$}, 

204 
$tid$\index{tid@$tid$}, $tvar$\index{tvar@$tvar$}, respectively. Typical 

205 
examples are {\tt x}, {\tt ?x7}, {\tt 'a}, {\tt ?'a3}. Here is the precise 

206 
syntax: 

188
6be0856cdf49
last minute changes, eg literal tokens > delimiters and valued tokens >
nipkow
parents:
142
diff
changeset

207 
\begin{eqnarray*} 
6be0856cdf49
last minute changes, eg literal tokens > delimiters and valued tokens >
nipkow
parents:
142
diff
changeset

208 
id & = & letter~quasiletter^* \\ 
6be0856cdf49
last minute changes, eg literal tokens > delimiters and valued tokens >
nipkow
parents:
142
diff
changeset

209 
var & = & \mbox{\tt ?}id ~~~~ \mbox{\tt ?}id\mbox{\tt .}nat \\ 
291  210 
tid & = & \mbox{\tt '}id \\ 
211 
tvar & = & \mbox{\tt ?}tid ~~~~ 

212 
\mbox{\tt ?}tid\mbox{\tt .}nat \\[1ex] 

188
6be0856cdf49
last minute changes, eg literal tokens > delimiters and valued tokens >
nipkow
parents:
142
diff
changeset

213 
letter & = & \mbox{one of {\tt a}\dots {\tt z} {\tt A}\dots {\tt Z}} \\ 
6be0856cdf49
last minute changes, eg literal tokens > delimiters and valued tokens >
nipkow
parents:
142
diff
changeset

214 
digit & = & \mbox{one of {\tt 0}\dots {\tt 9}} \\ 
6be0856cdf49
last minute changes, eg literal tokens > delimiters and valued tokens >
nipkow
parents:
142
diff
changeset

215 
quasiletter & = & letter ~~~~ digit ~~~~ \mbox{\tt _} ~~~~ \mbox{\tt '} \\ 
6be0856cdf49
last minute changes, eg literal tokens > delimiters and valued tokens >
nipkow
parents:
142
diff
changeset

216 
nat & = & digit^+ 
6be0856cdf49
last minute changes, eg literal tokens > delimiters and valued tokens >
nipkow
parents:
142
diff
changeset

217 
\end{eqnarray*} 
291  218 
A $var$ or $tvar$ describes an unknown, which is internally a pair 
188
6be0856cdf49
last minute changes, eg literal tokens > delimiters and valued tokens >
nipkow
parents:
142
diff
changeset

219 
of base name and index (\ML\ type \ttindex{indexname}). These components are 
291  220 
either separated by a dot as in {\tt ?x.1} or {\tt ?x7.3} or 
221 
run together as in {\tt ?x1}. The latter form is possible if the 

222 
base name does not end with digits. If the index is 0, it may be dropped 

223 
altogether: {\tt ?x} abbreviates both {\tt ?x0} and {\tt ?x.0}. 

104  224 

291  225 
The lexer repeatedly takes the maximal prefix of the input string that 
226 
forms a valid token. A maximal prefix that is both a delimiter and a name 

227 
is treated as a delimiter. Spaces, tabs and newlines are separators; they 

228 
never occur within tokens. 

104  229 

291  230 
Delimiters need not be separated by white space. For example, if {\tt } 
231 
is a delimiter but {\tt } is not, then the string {\tt } is treated as 

232 
two consecutive occurrences of the token~{\tt }. In contrast, \ML\ 

233 
treats {\tt } as a single symbolic name. The consequence of Isabelle's 

234 
more liberal scheme is that the same string may be parsed in different ways 

235 
after extending the syntax: after adding {\tt } as a delimiter, the input 

236 
{\tt } is treated as a single token. 

104  237 

291  238 
Name tokens are terminal symbols, strictly speaking, but we can generally 
239 
regard them as nonterminals. This is because a name token carries with it 

240 
useful information, the name. Delimiters, on the other hand, are nothing 

241 
but than syntactic sugar. 

104  242 

243 

291  244 
\subsection{*Inspecting the syntax} 
104  245 
\begin{ttbox} 
291  246 
syn_of : theory > Syntax.syntax 
247 
Syntax.print_syntax : Syntax.syntax > unit 

248 
Syntax.print_gram : Syntax.syntax > unit 

249 
Syntax.print_trans : Syntax.syntax > unit 

104  250 
\end{ttbox} 
291  251 
The abstract type \ttindex{Syntax.syntax} allows manipulation of syntaxes 
252 
in \ML. You can display values of this type by calling the following 

253 
functions: 

254 
\begin{description} 

255 
\item[\ttindexbold{syn_of} {\it thy}] returns the syntax of the Isabelle 

256 
theory~{\it thy} as an \ML\ value. 

257 

258 
\item[\ttindexbold{Syntax.print_syntax} {\it syn}] shows virtually all 

259 
information contained in the syntax {\it syn}. The displayed output can 

260 
be large. The following two functions are more selective. 

104  261 

291  262 
\item[\ttindexbold{Syntax.print_gram} {\it syn}] shows the grammar part 
263 
of~{\it syn}, namely the lexicon, roots and productions. 

104  264 

291  265 
\item[\ttindexbold{Syntax.print_trans} {\it syn}] shows the translation 
266 
part of~{\it syn}, namely the constants, parse/print macros and 

267 
parse/print translations. 

268 
\end{description} 

269 

270 
Let us demonstrate these functions by inspecting Pure's syntax. Even that 

271 
is too verbose to display in full. 

104  272 
\begin{ttbox} 
273 
Syntax.print_syntax (syn_of Pure.thy); 

291  274 
{\out lexicon: "!!" "\%" "(" ")" "," "." "::" ";" "==" "==>" \dots} 
104  275 
{\out roots: logic type fun prop} 
276 
{\out prods:} 

291  277 
{\out type = tid (1000)} 
104  278 
{\out type = tvar (1000)} 
279 
{\out type = id (1000)} 

291  280 
{\out type = tid "::" sort[0] => "_ofsort" (1000)} 
104  281 
{\out type = tvar "::" sort[0] => "_ofsort" (1000)} 
282 
{\out \vdots} 

291  283 
\ttbreak 
104  284 
{\out consts: "_K" "_appl" "_aprop" "_args" "_asms" "_bigimpl" \dots} 
285 
{\out parse_ast_translation: "_appl" "_bigimpl" "_bracket"} 

286 
{\out "_idtyp" "_lambda" "_tapp" "_tappl"} 

287 
{\out parse_rules:} 

288 
{\out parse_translation: "!!" "_K" "_abs" "_aprop"} 

289 
{\out print_translation: "all"} 

290 
{\out print_rules:} 

291 
{\out print_ast_translation: "==>" "_abs" "_idts" "fun"} 

292 
\end{ttbox} 

293 

291  294 
As you can see, the output is divided into labeled sections. The grammar 
295 
is represented by {\tt lexicon}, {\tt roots} and {\tt prods}. The rest 

296 
refers to syntactic translations and macro expansion. Here is an 

297 
explanation of the various sections. 

104  298 
\begin{description} 
291  299 
\item[\ttindex{lexicon}] lists the delimiters used for lexical 
300 
analysis.\index{delimiters} 

104  301 

291  302 
\item[\ttindex{roots}] lists the grammar's nonterminal symbols. You must 
303 
name the desired root when calling lower level functions or specifying 

304 
macros. Higher level functions usually expect a type and derive the 

305 
actual root as described in~\S\ref{sec:grammar}. 

104  306 

291  307 
\item[\ttindex{prods}] lists the productions of the priority grammar. 
308 
The nonterminal $A^{(n)}$ is rendered in {\sc ascii} as {\tt $A$[$n$]}. 

309 
Each delimiter is quoted. Some productions are shown with {\tt =>} and 

310 
an attached string. These strings later become the heads of parse 

311 
trees; they also play a vital role when terms are printed (see 

312 
\S\ref{sec:asts}). 

104  313 

291  314 
Productions with no strings attached are called {\bf copy 
315 
productions}\indexbold{productions!copy}. Their righthand side must 

316 
have exactly one nonterminal symbol (or name token). The parser does 

317 
not create a new parse tree node for copy productions, but simply 

318 
returns the parse tree of the righthand symbol. 

104  319 

291  320 
If the righthand side consists of a single nonterminal with no 
321 
delimiters, then the copy production is called a {\bf chain 

322 
production}\indexbold{productions!chain}. Chain productions should 

323 
be seen as abbreviations: conceptually, they are removed from the 

324 
grammar by adding new productions. Priority information 

325 
attached to chain productions is ignored, only the dummy value $1$ is 

326 
displayed. 

104  327 

328 
\item[\ttindex{consts}, \ttindex{parse_rules}, \ttindex{print_rules}] 

291  329 
relate to macros (see \S\ref{sec:macros}). 
104  330 

291  331 
\item[\ttindex{parse_ast_translation}, \ttindex{print_ast_translation}] 
332 
list sets of constants that invoke translation functions for abstract 

333 
syntax trees. Section \S\ref{sec:asts} below discusses this obscure 

334 
matter. 

104  335 

291  336 
\item[\ttindex{parse_translation}, \ttindex{print_translation}] list sets 
337 
of constants that invoke translation functions for terms (see 

338 
\S\ref{sec:tr_funs}). 

339 
\end{description} 

340 
\index{syntax!Pure)} 

104  341 

342 

343 
\section{Mixfix declarations} \label{sec:mixfix} 

291  344 
\index{mixfix declaration(} 
104  345 

291  346 
When defining a theory, you declare new constants by giving their names, 
347 
their type, and an optional {\bf mixfix annotation}. Mixfix annotations 

348 
allow you to extend Isabelle's basic $\lambda$calculus syntax with 

349 
readable notation. They can express any contextfree priority grammar. 

350 
Isabelle syntax definitions are inspired by \OBJ~\cite{OBJ}; they are more 

351 
general than the priority declarations of \ML\ and Prolog. 

352 

353 
A mixfix annotation defines a production of the priority grammar. It 

354 
describes the concrete syntax, the translation to abstract syntax, and the 

355 
pretty printing. Special case annotations provide a simple means of 

356 
specifying infix operators, binders and so forth. 

104  357 

291  358 
\subsection{Grammar productions}\label{sec:grammar} 
359 
Let us examine the treatment of the production 

360 
\[ A^{(p)}= w@0\, A@1^{(p@1)}\, w@1\, A@2^{(p@2)}\, \ldots\, 

361 
A@n^{(p@n)}\, w@n. \] 

362 
Here $A@i^{(p@i)}$ is a nonterminal with priority~$p@i$ for $i=1$, 

363 
\ldots,~$n$, while $w@0$, \ldots,~$w@n$ are strings of terminals. 

364 
In the corresponding mixfix annotation, the priorities are given separately 

365 
as $[p@1,\ldots,p@n]$ and~$p$. The nonterminal symbols are identified with 

366 
types~$\tau$, $\tau@1$, \ldots,~$\tau@n$ respectively, and the production's 

367 
effect on nonterminals is expressed as the function type 

368 
\[ [\tau@1, \ldots, \tau@n]\To \tau. \] 

369 
Finally, the template 

370 
\[ w@0 \;_\; w@1 \;_\; \ldots \;_\; w@n \] 

371 
describes the strings of terminals. 

104  372 

291  373 
A simple type is typically declared for each nonterminal symbol. In 
374 
firstorder logic, type~$i$ stands for terms and~$o$ for formulae. Only 

375 
the outermost type constructor is taken into account. For example, any 

376 
type of the form $\sigma list$ stands for a list; productions may refer 

377 
to the symbol {\tt list} and will apply lists of any type. 

378 

379 
The symbol associated with a type is called its {\bf root} since it may 

380 
serve as the root of a parse tree. Precisely, the root of $(\tau@1, \dots, 

381 
\tau@n)ty$ is $ty$, where $\tau@1$, \ldots, $\tau@n$ are types and $ty$ is 

382 
a type constructor. Type infixes are a special case of this; in 

383 
particular, the root of $\tau@1 \To \tau@2$ is {\tt fun}. Finally, the 

384 
root of a type variable is {\tt logic}; general productions might 

385 
refer to this nonterminal. 

104  386 

291  387 
Identifying nonterminals with types allows a constant's type to specify 
388 
syntax as well. We can declare the function~$f$ to have type $[\tau@1, 

389 
\ldots, \tau@n]\To \tau$ and, through a mixfix annotation, specify the 

390 
layout of the function's $n$ arguments. The constant's name, in this 

391 
case~$f$, will also serve as the label in the abstract syntax tree. There 

392 
are two exceptions to this treatment of constants: 

393 
\begin{enumerate} 

394 
\item A production need not map directly to a logical function. In this 

395 
case, you must declare a constant whose purpose is purely syntactic. 

396 
By convention such constants begin with the symbol~{\tt\at}, 

397 
ensuring that they can never be written in formulae. 

398 

399 
\item A copy production has no associated constant. 

400 
\end{enumerate} 

401 
There is something artificial about this representation of productions, 

402 
but it is convenient, particularly for simple theory extensions. 

104  403 

291  404 
\subsection{The general mixfix form} 
405 
Here is a detailed account of the general \bfindex{mixfix declaration} as 

406 
it may occur within the {\tt consts} section of a {\tt .thy} file. 

407 
\begin{center} 

408 
{\tt "$c$" ::\ "$\sigma$" ("$template$" $ps$ $p$)} 

409 
\end{center} 

410 
This constant declaration and mixfix annotation is interpreted as follows: 

411 
\begin{itemize} 

412 
\item The string {\tt "$c$"} is the name of the constant associated with 

413 
the production. If $c$ is empty (given as~{\tt ""}) then this is a copy 

414 
production.\index{productions!copy} Otherwise, parsing an instance of the 

415 
phrase $template$ generates the \AST{} {\tt ("$c$" $a@1$ $\ldots$ 

416 
$a@n$)}, where $a@i$ is the \AST{} generated by parsing the $i$th 

417 
argument. 

418 

419 
\item The constant $c$, if nonempty, is declared to have type $\sigma$. 

104  420 

291  421 
\item The string $template$ specifies the righthand side of 
422 
the production. It has the form 

423 
\[ w@0 \;_\; w@1 \;_\; \ldots \;_\; w@n, \] 

424 
where each occurrence of \ttindex{_} denotes an 

425 
argument\index{argument!mixfix} position and the~$w@i$ do not 

426 
contain~{\tt _}. (If you want a literal~{\tt _} in the concrete 

427 
syntax, you must escape it as described below.) The $w@i$ may 

428 
consist of \rmindex{delimiters}, spaces or \rmindex{pretty 

429 
printing} annotations (see below). 

104  430 

291  431 
\item The type $\sigma$ specifies the production's nonterminal symbols (or name 
432 
tokens). If $template$ is of the form above then $\sigma$ must be a 

433 
function type with at least~$n$ argument positions, say $\sigma = 

434 
[\tau@1, \dots, \tau@n] \To \tau$. Nonterminal symbols are derived 

435 
from the type $\tau@1$, \ldots,~$\tau@n$, $\tau$ as described above. 

436 
Any of these may be function types; the corresponding root is then {\tt 

437 
fun}. 

104  438 

291  439 
\item The optional list~$ps$ may contain at most $n$ integers, say {\tt 
440 
[$p@1$, $\ldots$, $p@m$]}, where $p@i$ is the minimal 

441 
priority\indexbold{priorities} required of any phrase that may appear 

442 
as the $i$th argument. Missing priorities default to~$0$. 

443 

444 
\item The integer $p$ is the priority of this production. If omitted, it 

445 
defaults to the maximal priority. 

446 

447 
Priorities, or precedences, range between $0$ and 

448 
$max_pri$\indexbold{max_pri@$max_pri$} (= 1000). 

104  449 
\end{itemize} 
450 

291  451 
The declaration {\tt $c$ ::\ "$\sigma$" ("$template$")} specifies no 
452 
priorities. The resulting production puts no priority constraints on any 

453 
of its arguments and has maximal priority itself. Omitting priorities in 

454 
this manner will introduce syntactic ambiguities unless the production's 

455 
righthand side is fully bracketed, as in \verb"if _ then _ else _ fi". 

104  456 

291  457 
\begin{warn} 
458 
Theories must sometimes declare types for purely syntactic purposes. One 

459 
example is {\tt type}, the builtin type of types. This is a `type of 

460 
all types' in the syntactic sense only. Do not declare such types under 

461 
{\tt arities} as belonging to class $logic$, for that would allow their 

462 
use in arbitrary Isabelle expressions~(\S\ref{logicaltypes}). 

463 
\end{warn} 

464 

465 
\subsection{Example: arithmetic expressions} 

466 
This theory specification contains a {\tt consts} section with mixfix 

467 
declarations encoding the priority grammar from 

468 
\S\ref{sec:priority_grammars}: 

104  469 
\begin{ttbox} 
470 
EXP = Pure + 

471 
types 

291  472 
exp 
104  473 
arities 
474 
exp :: logic 

475 
consts 

291  476 
"0" :: "exp" ("0" 9) 
477 
"+" :: "[exp, exp] => exp" ("_ + _" [0, 1] 0) 

478 
"*" :: "[exp, exp] => exp" ("_ * _" [3, 2] 2) 

479 
"" :: "exp => exp" (" _" [3] 3) 

104  480 
end 
481 
\end{ttbox} 

482 
Note that the {\tt arities} declaration causes {\tt exp} to be added to the 

291  483 
syntax' roots. If you put the text above into a file {\tt exp.thy} and load 
135  484 
it via {\tt use_thy "EXP"}, you can run some tests: 
104  485 
\begin{ttbox} 
486 
val read_exp = Syntax.test_read (syn_of EXP.thy) "exp"; 

291  487 
{\out val it = fn : string > unit} 
104  488 
read_exp "0 * 0 * 0 * 0 + 0 + 0 + 0"; 
489 
{\out tokens: "0" "*" "0" "*" "0" "*" "0" "+" "0" "+" "0" "+" "0"} 

490 
{\out raw: ("+" ("+" ("+" ("*" "0" ("*" "0" ("*" "0" "0"))) "0") "0") "0")} 

491 
{\out \vdots} 

492 
read_exp "0 +  0 + 0"; 

493 
{\out tokens: "0" "+" "" "0" "+" "0"} 

494 
{\out raw: ("+" ("+" "0" ("" "0")) "0")} 

495 
{\out \vdots} 

496 
\end{ttbox} 

497 
The output of \ttindex{Syntax.test_read} includes the token list ({\tt 

291  498 
tokens}) and the raw \AST{} directly derived from the parse tree, 
499 
ignoring parse \AST{} translations. The rest is tracing information 

500 
provided by the macro expander (see \S\ref{sec:macros}). 

104  501 

291  502 
Executing {\tt Syntax.print_gram} reveals the productions derived 
503 
from our mixfix declarations (lots of additional information deleted): 

104  504 
\begin{ttbox} 
291  505 
Syntax.print_gram (syn_of EXP.thy); 
506 
{\out exp = "0" => "0" (9)} 

507 
{\out exp = exp[0] "+" exp[1] => "+" (0)} 

508 
{\out exp = exp[3] "*" exp[2] => "*" (2)} 

509 
{\out exp = "" exp[3] => "" (3)} 

104  510 
\end{ttbox} 
511 

291  512 

513 
\subsection{The mixfix template} 

514 
Let us take a closer look at the string $template$ appearing in mixfix 

515 
annotations. This string specifies a list of parsing and printing 

516 
directives: delimiters\index{delimiter}, arguments\index{argument!mixfix}, 

517 
spaces, blocks of indentation and line breaks. These are encoded via the 

104  518 
following character sequences: 
519 

291  520 
\index{pretty printing(} 
104  521 
\begin{description} 
291  522 
\item[~\ttindex_~] An argument\index{argument!mixfix} position, which 
523 
stands for a nonterminal symbol or name token. 

104  524 

291  525 
\item[~$d$~] A \rmindex{delimiter}, namely a nonempty sequence of 
526 
nonspecial or escaped characters. Escaping a character\index{escape 

527 
character} means preceding it with a {\tt '} (single quote). Thus 

528 
you have to write {\tt ''} if you really want a single quote. You must 

529 
also escape {\tt _}, {\tt (}, {\tt )} and {\tt /}. Delimiters may 

530 
never contain white space, though. 

104  531 

291  532 
\item[~$s$~] A nonempty sequence of spaces for printing. This 
533 
and the following specifications do not affect parsing at all. 

104  534 

291  535 
\item[~{\ttindex($n$}~] Open a pretty printing block. The optional 
536 
number $n$ specifies how much indentation to add when a line break 

537 
occurs within the block. If {\tt(} is not followed by digits, the 

538 
indentation defaults to~$0$. 

104  539 

291  540 
\item[~\ttindex)~] Close a pretty printing block. 
104  541 

291  542 
\item[~\ttindex{//}~] Force a line break. 
104  543 

291  544 
\item[~\ttindex/$s$~] Allow a line break. Here $s$ stands for the string 
545 
of spaces (zero or more) right after the {\tt /} character. These 

546 
spaces are printed if the break is not taken. 

547 
\end{description} 

548 
Isabelle's pretty printer resembles the one described in 

549 
Paulson~\cite{paulson91}. \index{pretty printing)} 

104  550 

551 

552 
\subsection{Infixes} 

291  553 
\indexbold{infix operators} 
554 
Infix operators associating to the left or right can be declared 

555 
using {\tt infixl} or {\tt infixr}. 

556 
Roughly speaking, the form {\tt $c$ ::\ "$\sigma$" (infixl $p$)} 

557 
abbreviates the declarations 

104  558 
\begin{ttbox} 
291  559 
"op \(c\)" :: "\(\sigma\)" ("op \(c\)") 
560 
"op \(c\)" :: "\(\sigma\)" ("(_ \(c\)/ _)" [\(p\), \(p+1\)] \(p\)) 

104  561 
\end{ttbox} 
291  562 
and {\tt $c$ ::\ "$\sigma$" (infixr $p$)} abbreviates the declarations 
104  563 
\begin{ttbox} 
291  564 
"op \(c\)" :: "\(\sigma\)" ("op \(c\)") 
565 
"op \(c\)" :: "\(\sigma\)" ("(_ \(c\)/ _)" [\(p+1\), \(p\)] \(p\)) 

104  566 
\end{ttbox} 
291  567 
The infix operator is declared as a constant with the prefix {\tt op}. 
104  568 
Thus, prefixing infixes with \ttindex{op} makes them behave like ordinary 
291  569 
function symbols, as in \ML. Special characters occurring in~$c$ must be 
570 
escaped, as in delimiters, using a single quote. 

571 

572 
The expanded forms above would be actually illegal in a {\tt .thy} file 

573 
because they declare the constant \hbox{\tt"op \(c\)"} twice. 

104  574 

575 

576 
\subsection{Binders} 

291  577 
\indexbold{binders} 
578 
\begingroup 

579 
\def\Q{{\cal Q}} 

580 
A {\bf binder} is a variablebinding construct such as a quantifier. The 

581 
binder declaration \indexbold{*binder} 

104  582 
\begin{ttbox} 
291  583 
\(c\) :: "\(\sigma\)" (binder "\(\Q\)" \(p\)) 
104  584 
\end{ttbox} 
291  585 
introduces a constant~$c$ of type~$\sigma$, which must have the form 
586 
$(\tau@1 \To \tau@2) \To \tau@3$. Its concrete syntax is $\Q~x.P$, where 

587 
$x$ is a bound variable of type~$\tau@1$, the body~$P$ has type $\tau@2$ 

588 
and the whole term has type~$\tau@3$. Special characters in $\Q$ must be 

589 
escaped using a single quote. 

590 

591 
Let us declare the quantifier~$\forall$: 

104  592 
\begin{ttbox} 
593 
All :: "('a => o) => o" (binder "ALL " 10) 

594 
\end{ttbox} 

291  595 
This let us write $\forall x.P$ as either {\tt All(\%$x$.$P$)} or {\tt ALL 
596 
$x$.$P$}. When printing, Isabelle prefers the latter form, but must fall 

597 
back on $\mtt{All}(P)$ if $P$ is not an abstraction. Both $P$ and {\tt ALL 

598 
$x$.$P$} have type~$o$, the type of formulae, while the bound variable 

599 
can be polymorphic. 

104  600 

291  601 
The binder~$c$ of type $(\sigma \To \tau) \To \tau$ can be nested. The 
602 
external form $\Q~x@1~x@2 \ldots x@n. P$ corresponds to the internal form 

603 
\[ c(\lambda x@1. c(\lambda x@2. \ldots c(\lambda x@n. P) \ldots)) \] 

104  604 

605 
\medskip 

606 
The general binder declaration 

607 
\begin{ttbox} 

291  608 
\(c\) :: "(\(\tau@1\) => \(\tau@2\)) => \(\tau@3\)" (binder "\(\Q\)" \(p\)) 
104  609 
\end{ttbox} 
610 
is internally expanded to 

611 
\begin{ttbox} 

291  612 
\(c\) :: "(\(\tau@1\) => \(\tau@2\)) => \(\tau@3\)" 
613 
"\(\Q\)"\hskip3pt :: "[idts, \(\tau@2\)] => \(\tau@3\)" ("(3\(\Q\)_./ _)" \(p\)) 

104  614 
\end{ttbox} 
291  615 
with $idts$ being the nonterminal symbol for a list of $id$s optionally 
616 
constrained (see Fig.\ts\ref{fig:pure_gram}). The declaration also 

617 
installs a parse translation\index{translations!parse} for~$\Q$ and a print 

618 
translation\index{translations!print} for~$c$ to translate between the 

619 
internal and external forms. 

620 
\endgroup 

104  621 

291  622 
\index{mixfix declaration)} 
104  623 

624 

625 
\section{Example: some minimal logics} \label{sec:min_logics} 

291  626 
This section presents some examples that have a simple syntax. They 
627 
demonstrate how to define new objectlogics from scratch. 

104  628 

291  629 
First we must define how an objectlogic syntax embedded into the 
630 
metalogic. Since all theorems must conform to the syntax for~$prop$ (see 

631 
Fig.\ts\ref{fig:pure_gram}), that syntax has to be extended with the 

632 
objectlevel syntax. Assume that the syntax of your objectlogic defines a 

633 
nonterminal symbol~$o$ of formulae. These formulae can now appear in 

634 
axioms and theorems wherever $prop$ does if you add the production 

104  635 
\[ prop ~=~ o. \] 
291  636 
This is not a copy production but a coercion from formulae to propositions: 
104  637 
\begin{ttbox} 
638 
Base = Pure + 

639 
types 

291  640 
o 
104  641 
arities 
642 
o :: logic 

643 
consts 

644 
Trueprop :: "o => prop" ("_" 5) 

645 
end 

646 
\end{ttbox} 

647 
The constant {\tt Trueprop} (the name is arbitrary) acts as an invisible 

291  648 
coercion function. Assuming this definition resides in a file {\tt base.thy}, 
135  649 
you have to load it with the command {\tt use_thy "Base"}. 
104  650 

291  651 
One of the simplest nontrivial logics is {\bf minimal logic} of 
652 
implication. Its definition in Isabelle needs no advanced features but 

653 
illustrates the overall mechanism nicely: 

104  654 
\begin{ttbox} 
655 
Hilbert = Base + 

656 
consts 

657 
">" :: "[o, o] => o" (infixr 10) 

658 
rules 

659 
K "P > Q > P" 

660 
S "(P > Q > R) > (P > Q) > P > R" 

661 
MP "[ P > Q; P ] ==> Q" 

662 
end 

663 
\end{ttbox} 

291  664 
After loading this definition from the file {\tt hilbert.thy}, you can 
665 
start to prove theorems in the logic: 

104  666 
\begin{ttbox} 
667 
goal Hilbert.thy "P > P"; 

668 
{\out Level 0} 

669 
{\out P > P} 

670 
{\out 1. P > P} 

291  671 
\ttbreak 
104  672 
by (resolve_tac [Hilbert.MP] 1); 
673 
{\out Level 1} 

674 
{\out P > P} 

675 
{\out 1. ?P > P > P} 

676 
{\out 2. ?P} 

291  677 
\ttbreak 
104  678 
by (resolve_tac [Hilbert.MP] 1); 
679 
{\out Level 2} 

680 
{\out P > P} 

681 
{\out 1. ?P1 > ?P > P > P} 

682 
{\out 2. ?P1} 

683 
{\out 3. ?P} 

291  684 
\ttbreak 
104  685 
by (resolve_tac [Hilbert.S] 1); 
686 
{\out Level 3} 

687 
{\out P > P} 

688 
{\out 1. P > ?Q2 > P} 

689 
{\out 2. P > ?Q2} 

291  690 
\ttbreak 
104  691 
by (resolve_tac [Hilbert.K] 1); 
692 
{\out Level 4} 

693 
{\out P > P} 

694 
{\out 1. P > ?Q2} 

291  695 
\ttbreak 
104  696 
by (resolve_tac [Hilbert.K] 1); 
697 
{\out Level 5} 

698 
{\out P > P} 

699 
{\out No subgoals!} 

700 
\end{ttbox} 

291  701 
As we can see, this Hilbertstyle formulation of minimal logic is easy to 
702 
define but difficult to use. The following natural deduction formulation is 

703 
better: 

104  704 
\begin{ttbox} 
705 
MinI = Base + 

706 
consts 

707 
">" :: "[o, o] => o" (infixr 10) 

708 
rules 

709 
impI "(P ==> Q) ==> P > Q" 

710 
impE "[ P > Q; P ] ==> Q" 

711 
end 

712 
\end{ttbox} 

291  713 
Note, however, that although the two systems are equivalent, this fact 
714 
cannot be proved within Isabelle. Axioms {\tt S} and {\tt K} can be 

715 
derived in {\tt MinI} (exercise!), but {\tt impI} cannot be derived in {\tt 

716 
Hilbert}. The reason is that {\tt impI} is only an {\bf admissible} rule 

717 
in {\tt Hilbert}, something that can only be shown by induction over all 

718 
possible proofs in {\tt Hilbert}. 

104  719 

291  720 
We may easily extend minimal logic with falsity: 
104  721 
\begin{ttbox} 
722 
MinIF = MinI + 

723 
consts 

724 
False :: "o" 

725 
rules 

726 
FalseE "False ==> P" 

727 
end 

728 
\end{ttbox} 

729 
On the other hand, we may wish to introduce conjunction only: 

730 
\begin{ttbox} 

731 
MinC = Base + 

732 
consts 

733 
"&" :: "[o, o] => o" (infixr 30) 

291  734 
\ttbreak 
104  735 
rules 
736 
conjI "[ P; Q ] ==> P & Q" 

737 
conjE1 "P & Q ==> P" 

738 
conjE2 "P & Q ==> Q" 

739 
end 

740 
\end{ttbox} 

291  741 
And if we want to have all three connectives together, we create and load a 
742 
theory file consisting of a single line:\footnote{We can combine the 

743 
theories without creating a theory file using the ML declaration 

744 
\begin{ttbox} 

745 
val MinIFC_thy = merge_theories(MinIF,MinC) 

746 
\end{ttbox} 

747 
\index{*merge_theoriesfnote}} 

104  748 
\begin{ttbox} 
749 
MinIFC = MinIF + MinC 

750 
\end{ttbox} 

751 
Now we can prove mixed theorems like 

752 
\begin{ttbox} 

753 
goal MinIFC.thy "P & False > Q"; 

754 
by (resolve_tac [MinI.impI] 1); 

755 
by (dresolve_tac [MinC.conjE2] 1); 

756 
by (eresolve_tac [MinIF.FalseE] 1); 

757 
\end{ttbox} 

758 
Try this as an exercise! 

759 

291  760 
\medskip 
761 
Unless you need to define macros or syntax translation functions, you may 

762 
skip the rest of this chapter. 

763 

764 

765 
\section{*Abstract syntax trees} \label{sec:asts} 

766 
\index{trees!abstract syntax(} The parser, given a token list from the 

767 
lexer, applies productions to yield a parse tree\index{trees!parse}. By 

768 
applying some internal transformations the parse tree becomes an abstract 

769 
syntax tree, or \AST{}. Macro expansion, further translations and finally 

770 
type inference yields a welltyped term\index{terms!obtained from ASTs}. 

771 
The printing process is the reverse, except for some subtleties to be 

772 
discussed later. 

773 

774 
Figure~\ref{fig:parse_print} outlines the parsing and printing process. 

775 
Much of the complexity is due to the macro mechanism. Using macros, you 

776 
can specify most forms of concrete syntax without writing any \ML{} code. 

777 

778 
\begin{figure} 

779 
\begin{center} 

780 
\begin{tabular}{cl} 

781 
string & \\ 

782 
$\downarrow$ & parser \\ 

783 
parse tree & \\ 

784 
$\downarrow$ & parse \AST{} translation \\ 

785 
\AST{} & \\ 

786 
$\downarrow$ & \AST{} rewriting (macros) \\ 

787 
\AST{} & \\ 

788 
$\downarrow$ & parse translation, type inference \\ 

789 
 welltyped term  & \\ 

790 
$\downarrow$ & print translation \\ 

791 
\AST{} & \\ 

792 
$\downarrow$ & \AST{} rewriting (macros) \\ 

793 
\AST{} & \\ 

794 
$\downarrow$ & print \AST{} translation, printer \\ 

795 
string & 

796 
\end{tabular} 

797 
\index{translations!parse}\index{translations!parse AST} 

798 
\index{translations!print}\index{translations!print AST} 

799 

800 
\end{center} 

801 
\caption{Parsing and printing}\label{fig:parse_print} 

802 
\end{figure} 

803 

804 
Abstract syntax trees are an intermediate form between the raw parse trees 

805 
and the typed $\lambda$terms. An \AST{} is either an atom (constant or 

806 
variable) or a list of {\em at least two\/} subtrees. Internally, they 

807 
have type \ttindex{Syntax.ast}: \index{*Constant} \index{*Variable} 

808 
\index{*Appl} 

809 
\begin{ttbox} 

810 
datatype ast = Constant of string 

811 
 Variable of string 

812 
 Appl of ast list 

813 
\end{ttbox} 

814 

815 
Isabelle uses an Sexpression syntax for abstract syntax trees. Constant 

816 
atoms are shown as quoted strings, variable atoms as nonquoted strings and 

817 
applications as a parenthesized list of subtrees. For example, the \AST 

818 
\begin{ttbox} 

819 
Appl [Constant "_constrain", 

820 
Appl [Constant "_abs", Variable "x", Variable "t"], 

821 
Appl [Constant "fun", Variable "'a", Variable "'b"]] 

822 
\end{ttbox} 

823 
is shown as {\tt ("_constrain" ("_abs" x t) ("fun" 'a 'b))}. 

824 
Both {\tt ()} and {\tt (f)} are illegal because they have too few 

825 
subtrees. 

826 

827 
The resemblance of Lisp's Sexpressions is intentional, but there are two 

828 
kinds of atomic symbols: $\Constant x$ and $\Variable x$. Do not take the 

829 
names ``{\tt Constant}'' and ``{\tt Variable}'' too literally; in the later 

830 
translation to terms, $\Variable x$ may become a constant, free or bound 

831 
variable, even a type constructor or class name; the actual outcome depends 

832 
on the context. 

833 

834 
Similarly, you can think of ${\tt (} f~x@1~\ldots~x@n{\tt )}$ as the 

835 
application of~$f$ to the arguments $x@1, \ldots, x@n$. But the kind of 

836 
application is determined later by context; it could be a type constructor 

837 
applied to types. 

838 

839 
Forms like {\tt (("_abs" x $t$) $u$)} are legal, but \AST{}s are 

840 
firstorder: the {\tt "_abs"} does not bind the {\tt x} in any way. Later 

841 
at the term level, {\tt ("_abs" x $t$)} will become an {\tt Abs} node and 

842 
occurrences of {\tt x} in $t$ will be replaced by bound variables (the term 

843 
constructor \ttindex{Bound}). 

844 

845 

846 
\subsection{Transforming parse trees to \AST{}s} 

847 
The parse tree is the raw output of the parser. Translation functions, 

848 
called {\bf parse AST translations}\indexbold{translations!parse AST}, 

849 
transform the parse tree into an abstract syntax tree. 

850 

851 
The parse tree is constructed by nesting the righthand sides of the 

852 
productions used to recognize the input. Such parse trees are simply lists 

853 
of tokens and constituent parse trees, the latter representing the 

854 
nonterminals of the productions. Let us refer to the actual productions in 

855 
the form displayed by {\tt Syntax.print_syntax}. 

856 

857 
Ignoring parse \AST{} translations, parse trees are transformed to \AST{}s 

858 
by stripping out delimiters and copy productions. More precisely, the 

859 
mapping $ast_of_pt$\index{ast_of_pt@$ast_of_pt$} is derived from the 

860 
productions as follows: 

861 
\begin{itemize} 

862 
\item Name tokens: $ast_of_pt(t) = \Variable s$, where $t$ is an $id$, 

863 
$var$, $tid$ or $tvar$ token, and $s$ its associated string. 

864 

865 
\item Copy productions: $ast_of_pt(\ldots P \ldots) = ast_of_pt(P)$. 

866 
Here $\ldots$ stands for strings of delimiters, which are 

867 
discarded. $P$ stands for the single constituent that is not a 

868 
delimiter; it is either a nonterminal symbol or a name token. 

869 

870 
\item $0$ary productions: $ast_of_pt(\ldots \mtt{=>} c) = \Constant c$. 

871 
Here there are no constituents other than delimiters, which are 

872 
discarded. 

873 

874 
\item $n$ary productions, where $n \ge 1$: delimiters are discarded and 

875 
the remaining constituents $P@1$, \ldots, $P@n$ are built into an 

876 
application whose head constant is~$c$: 

877 
\begin{eqnarray*} 

878 
\lefteqn{ast_of_pt(\ldots P@1 \ldots P@n \ldots \mtt{=>} c)} \\ 

879 
&&\qquad{}= \Appl{\Constant c, ast_of_pt(P@1), \ldots, ast_of_pt(P@n)} 

880 
\end{eqnarray*} 

881 
\end{itemize} 

882 
Figure~\ref{fig:parse_ast} presents some simple examples, where {\tt ==}, 

883 
{\tt _appl}, {\tt _args}, and so forth name productions of the Pure syntax. 

884 
These examples illustrate the need for further translations to make \AST{}s 

885 
closer to the typed $\lambda$calculus. The Pure syntax provides 

886 
predefined parse \AST{} translations\index{translations!parse AST} for 

887 
ordinary applications, type applications, nested abstractions, meta 

888 
implications and function types. Figure~\ref{fig:parse_ast_tr} shows their 

889 
effect on some representative input strings. 

890 

891 

892 
\begin{figure} 

893 
\begin{center} 

894 
\tt\begin{tabular}{ll} 

895 
\rm input string & \rm \AST \\\hline 

896 
"f" & f \\ 

897 
"'a" & 'a \\ 

898 
"t == u" & ("==" t u) \\ 

899 
"f(x)" & ("_appl" f x) \\ 

900 
"f(x, y)" & ("_appl" f ("_args" x y)) \\ 

901 
"f(x, y, z)" & ("_appl" f ("_args" x ("_args" y z))) \\ 

902 
"\%x y.\ t" & ("_lambda" ("_idts" x y) t) \\ 

903 
\end{tabular} 

904 
\end{center} 

905 
\caption{Parsing examples using the Pure syntax}\label{fig:parse_ast} 

906 
\end{figure} 

907 

908 
\begin{figure} 

909 
\begin{center} 

910 
\tt\begin{tabular}{ll} 

911 
\rm input string & \rm \AST{} \\\hline 

912 
"f(x, y, z)" & (f x y z) \\ 

913 
"'a ty" & (ty 'a) \\ 

914 
"('a, 'b) ty" & (ty 'a 'b) \\ 

915 
"\%x y z.\ t" & ("_abs" x ("_abs" y ("_abs" z t))) \\ 

916 
"\%x ::\ 'a.\ t" & ("_abs" ("_constrain" x 'a) t) \\ 

917 
"[ P; Q; R ] => S" & ("==>" P ("==>" Q ("==>" R S))) \\ 

918 
"['a, 'b, 'c] => 'd" & ("fun" 'a ("fun" 'b ("fun" 'c 'd))) 

919 
\end{tabular} 

920 
\end{center} 

921 
\caption{Builtin parse \AST{} translations}\label{fig:parse_ast_tr} 

922 
\end{figure} 

923 

924 
The names of constant heads in the \AST{} control the translation process. 

925 
The list of constants invoking parse \AST{} translations appears in the 

926 
output of {\tt Syntax.print_syntax} under {\tt parse_ast_translation}. 

927 

928 

929 
\subsection{Transforming \AST{}s to terms} 

930 
The \AST{}, after application of macros (see \S\ref{sec:macros}), is 

931 
transformed into a term. This term is probably illtyped since type 

932 
inference has not occurred yet. The term may contain type constraints 

933 
consisting of applications with head {\tt "_constrain"}; the second 

934 
argument is a type encoded as a term. Type inference later introduces 

935 
correct types or rejects the input. 

936 

937 
Another set of translation functions, namely parse 

938 
translations,\index{translations!parse}, may affect this process. If we 

939 
ignore parse translations for the time being, then \AST{}s are transformed 

940 
to terms by mapping \AST{} constants to constants, \AST{} variables to 

941 
schematic or free variables and \AST{} applications to applications. 

942 

943 
More precisely, the mapping $term_of_ast$\index{term_of_ast@$term_of_ast$} 

944 
is defined by 

945 
\begin{itemize} 

946 
\item Constants: $term_of_ast(\Constant x) = \ttfct{Const} (x, 

947 
\mtt{dummyT})$. 

948 

949 
\item Schematic variables: $term_of_ast(\Variable \mtt{"?}xi\mtt") = 

950 
\ttfct{Var} ((x, i), \mtt{dummyT})$, where $x$ is the base name and $i$ 

951 
the index extracted from $xi$. 

952 

953 
\item Free variables: $term_of_ast(\Variable x) = \ttfct{Free} (x, 

954 
\mtt{dummyT})$. 

955 

956 
\item Function applications with $n$ arguments: 

957 
\begin{eqnarray*} 

958 
\lefteqn{term_of_ast(\Appl{f, x@1, \ldots, x@n})} \\ 

959 
&&\qquad{}= term_of_ast(f) \ttapp 

960 
term_of_ast(x@1) \ttapp \ldots \ttapp term_of_ast(x@n) 

961 
\end{eqnarray*} 

962 
\end{itemize} 

963 
Here \ttindex{Const}, \ttindex{Var}, \ttindex{Free} and 

964 
\verb$\index{$@{\tt\$}} are constructors of the datatype {\tt term}, 

965 
while \ttindex{dummyT} stands for some dummy type that is ignored during 

966 
type inference. 

967 

968 
So far the outcome is still a firstorder term. Abstractions and bound 

969 
variables (constructors \ttindex{Abs} and \ttindex{Bound}) are introduced 

970 
by parse translations. Such translations are attached to {\tt "_abs"}, 

971 
{\tt "!!"} and userdefined binders. 

972 

973 

974 
\subsection{Printing of terms} 

975 
The output phase is essentially the inverse of the input phase. Terms are 

976 
translated via abstract syntax trees into strings. Finally the strings are 

977 
pretty printed. 

978 

979 
Print translations (\S\ref{sec:tr_funs}) may affect the transformation of 

980 
terms into \AST{}s. Ignoring those, the transformation maps 

981 
term constants, variables and applications to the corresponding constructs 

982 
on \AST{}s. Abstractions are mapped to applications of the special 

983 
constant {\tt _abs}. 

984 

985 
More precisely, the mapping $ast_of_term$\index{ast_of_term@$ast_of_term$} 

986 
is defined as follows: 

987 
\begin{itemize} 

988 
\item $ast_of_term(\ttfct{Const} (x, \tau)) = \Constant x$. 

989 

990 
\item $ast_of_term(\ttfct{Free} (x, \tau)) = constrain (\Variable x, 

991 
\tau)$. 

992 

993 
\item $ast_of_term(\ttfct{Var} ((x, i), \tau)) = constrain (\Variable 

994 
\mtt{"?}xi\mtt", \tau)$, where $\mtt?xi$ is the string representation of 

995 
the {\tt indexname} $(x, i)$. 

996 

997 
\item For the abstraction $\lambda x::\tau.t$, let $x'$ be a variant 

998 
of~$x$ renamed to differ from all names occurring in~$t$, and let $t'$ 

999 
be obtained from~$t$ by replacing all bound occurrences of~$x$ by 

1000 
the free variable $x'$. This replaces corresponding occurrences of the 

1001 
constructor \ttindex{Bound} by the term $\ttfct{Free} (x', 

1002 
\mtt{dummyT})$: 

1003 
\begin{eqnarray*} 

1004 
\lefteqn{ast_of_term(\ttfct{Abs} (x, \tau, t))} \\ 

1005 
&&\qquad{}= \ttfct{Appl} 

1006 
\mathopen{\mtt[} 

1007 
\Constant \mtt{"_abs"}, constrain(\Variable x', \tau), \\ 

1008 
&&\qquad\qquad\qquad ast_of_term(t') \mathclose{\mtt]}. 

1009 
\end{eqnarray*} 

1010 

1011 
\item $ast_of_term(\ttfct{Bound} i) = \Variable \mtt{"B.}i\mtt"$. 

1012 
The occurrence of constructor \ttindex{Bound} should never happen 

1013 
when printing welltyped terms; it indicates a de Bruijn index with no 

1014 
matching abstraction. 

1015 

1016 
\item Where $f$ is not an application, 

1017 
\begin{eqnarray*} 

1018 
\lefteqn{ast_of_term(f \ttapp x@1 \ttapp \ldots \ttapp x@n)} \\ 

1019 
&&\qquad{}= \ttfct{Appl} 

1020 
\mathopen{\mtt[} ast_of_term(f), 

1021 
ast_of_term(x@1), \ldots,ast_of_term(x@n) 

1022 
\mathclose{\mtt]} 

1023 
\end{eqnarray*} 

1024 
\end{itemize} 

1025 

1026 
Type constraints are inserted to allow the printing of types, which is 

1027 
governed by the boolean variable \ttindex{show_types}. Constraints are 

1028 
treated as follows: 

1029 
\begin{itemize} 

1030 
\item $constrain(x, \tau) = x$, if $\tau = \mtt{dummyT}$ \index{*dummyT} or 

1031 
\ttindex{show_types} not set to {\tt true}. 

1032 

1033 
\item $constrain(x, \tau) = \Appl{\Constant \mtt{"_constrain"}, x, ty}$, 

1034 
where $ty$ is the \AST{} encoding of $\tau$. That is, type constructors as 

1035 
{\tt Constant}s, type identifiers as {\tt Variable}s and type applications 

1036 
as {\tt Appl}s with the head type constructor as first element. 

1037 
Additionally, if \ttindex{show_sorts} is set to {\tt true}, some type 

1038 
variables are decorated with an \AST{} encoding of their sort. 

1039 
\end{itemize} 

1040 

1041 
The \AST{}, after application of macros (see \S\ref{sec:macros}), is 

1042 
transformed into the final output string. The builtin {\bf print AST 

1043 
translations}\indexbold{translations!print AST} effectively reverse the 

1044 
parse \AST{} translations of Fig.\ts\ref{fig:parse_ast_tr}. 

1045 

1046 
For the actual printing process, the names attached to productions 

1047 
of the form $\ldots A^{(p@1)}@1 \ldots A^{(p@n)}@n \ldots \mtt{=>} c$ play 

1048 
a vital role. Each \AST{} with constant head $c$, namely $\mtt"c\mtt"$ or 

1049 
$(\mtt"c\mtt"~ x@1 \ldots x@n)$, is printed according to the production 

1050 
for~$c$. Each argument~$x@i$ is converted to a string, and put in 

1051 
parentheses if its priority~$(p@i)$ requires this. The resulting strings 

1052 
and their syntactic sugar (denoted by ``\dots'' above) are joined to make a 

1053 
single string. 

1054 

1055 
If an application $(\mtt"c\mtt"~ x@1 \ldots x@m)$ has more arguments than the 

1056 
corresponding production, it is first split into $((\mtt"c\mtt"~ x@1 \ldots 

1057 
x@n) ~ x@{n+1} \ldots x@m)$. Applications with too few arguments or with 

1058 
nonconstant head or without a corresponding production are printed as 

1059 
$f(x@1, \ldots, x@l)$ or $(\alpha@1, \ldots, \alpha@l) ty$. An occurrence of 

1060 
$\Variable x$ is simply printed as~$x$. 

1061 

1062 
Blanks are {\em not\/} inserted automatically. If blanks are required to 

1063 
separate tokens, specify them in the mixfix declaration, possibly preceeded 

1064 
by a slash~({\tt/}) to allow a line break. 

1065 
\index{trees!abstract syntax)} 

1066 

1067 

1068 

1069 
\section{*Macros: Syntactic rewriting} \label{sec:macros} 

1070 
\index{macros(}\index{rewriting!syntactic(} 

1071 

1072 
Mixfix declarations alone can handle situations where there is a direct 

1073 
connection between the concrete syntax and the underlying term. Sometimes 

1074 
we require a more elaborate concrete syntax, such as quantifiers and list 

1075 
notation. Isabelle's {\bf macros} and {\bf translation functions} can 

1076 
perform translations such as 

1077 
\begin{center}\tt 

1078 
\begin{tabular}{r@{$\quad\protect\rightleftharpoons\quad$}l} 

1079 
ALL x:A.P & Ball(A, \%x.P) \\ \relax 

1080 
[x, y, z] & Cons(x, Cons(y, Cons(z, Nil))) 

1081 
\end{tabular} 

1082 
\end{center} 

1083 
Translation functions (see \S\ref{sec:tr_funs}) must be coded in ML; they 

1084 
are the most powerful translation mechanism but are difficult to read or 

1085 
write. Macros are specified by firstorder rewriting systems that operate 

1086 
on abstract syntax trees. They are usually easy to read and write, and can 

1087 
express all but the most obscure translations. 

1088 

1089 
Figure~\ref{fig:set_trans} defines a fragment of firstorder logic and set 

1090 
theory.\footnote{This and the following theories are complete working 

1091 
examples, though they specify only syntax, no axioms. The file {\tt 

1092 
ZF/zf.thy} presents the full set theory definition, including many 

1093 
macro rules.} Theory {\tt SET} defines constants for set comprehension 

1094 
({\tt Collect}), replacement ({\tt Replace}) and bounded universal 

1095 
quantification ({\tt Ball}). Each of these binds some variables. Without 

1096 
additional syntax we should have to express $\forall x \in A. P$ as {\tt 

1097 
Ball(A,\%x.P)}, and similarly for the others. 

1098 

1099 
\begin{figure} 

1100 
\begin{ttbox} 

1101 
SET = Pure + 

1102 
types 

1103 
i, o 

1104 
arities 

1105 
i, o :: logic 

1106 
consts 

1107 
Trueprop :: "o => prop" ("_" 5) 

1108 
Collect :: "[i, i => o] => i" 

1109 
"{\at}Collect" :: "[idt, i, o] => i" ("(1{\ttlbrace}_:_./ _{\ttrbrace})") 

1110 
Replace :: "[i, [i, i] => o] => i" 

1111 
"{\at}Replace" :: "[idt, idt, i, o] => i" ("(1{\ttlbrace}_./ _:_, _{\ttrbrace})") 

1112 
Ball :: "[i, i => o] => o" 

1113 
"{\at}Ball" :: "[idt, i, o] => o" ("(3ALL _:_./ _)" 10) 

1114 
translations 

1115 
"{\ttlbrace}x:A. P{\ttrbrace}" == "Collect(A, \%x. P)" 

1116 
"{\ttlbrace}y. x:A, Q{\ttrbrace}" == "Replace(A, \%x y. Q)" 

1117 
"ALL x:A. P" == "Ball(A, \%x. P)" 

1118 
end 

1119 
\end{ttbox} 

1120 
\caption{Macro example: set theory}\label{fig:set_trans} 

1121 
\end{figure} 

1122 

1123 
The theory specifies a variablebinding syntax through additional 

1124 
productions that have mixfix declarations. Each noncopy production must 

1125 
specify some constant, which is used for building \AST{}s. The additional 

1126 
constants are decorated with {\tt\at} to stress their purely syntactic 

1127 
purpose; they should never occur within the final welltyped terms. 

1128 
Furthermore, they cannot be written in formulae because they are not legal 

1129 
identifiers. 

1130 

1131 
The translations cause the replacement of external forms by internal forms 

1132 
after parsing, and vice versa before printing of terms. As a specification 

1133 
of the set theory notation, they should be largely selfexplanatory. The 

1134 
syntactic constants, {\tt\at Collect}, {\tt\at Replace} and {\tt\at Ball}, 

1135 
appear implicitly in the macro rules via their mixfix forms. 

1136 

1137 
Macros can define variablebinding syntax because they operate on \AST{}s, 

1138 
which have no inbuilt notion of bound variable. The macro variables {\tt 

1139 
x} and~{\tt y} have type~{\tt idt} and therefore range over identifiers, 

1140 
in this case bound variables. The macro variables {\tt P} and~{\tt Q} 

1141 
range over formulae containing bound variable occurrences. 

1142 

1143 
Other applications of the macro system can be less straightforward, and 

1144 
there are peculiarities. The rest of this section will describe in detail 

1145 
how Isabelle macros are preprocessed and applied. 

1146 

1147 

1148 
\subsection{Specifying macros} 

1149 
Macros are basically rewrite rules on \AST{}s. But unlike other macro 

1150 
systems found in programming languages, Isabelle's macros work in both 

1151 
directions. Therefore a syntax contains two lists of rewrites: one for 

1152 
parsing and one for printing. 

1153 

1154 
The {\tt translations} section\index{translations section@{\tt translations} 

1155 
section} specifies macros. The syntax for a macro is 

1156 
\[ (root)\; string \quad 

1157 
\left\{\begin{array}[c]{c} \mtt{=>} \\ \mtt{<=} \\ \mtt{==} \end{array} 

1158 
\right\} \quad 

1159 
(root)\; string 

1160 
\] 

1161 
% 

1162 
This specifies a parse rule ({\tt =>}), a print rule ({\tt <=}), or both 

1163 
({\tt ==}). The two strings specify the left and righthand sides of the 

1164 
macro rule. The $(root)$ specification is optional; it specifies the 

1165 
nonterminal for parsing the $string$ and if omitted defaults to {\tt 

1166 
logic}. \AST{} rewrite rules $(l, r)$ must obey certain conditions: 

1167 
\begin{itemize} 

1168 
\item Rules must be left linear: $l$ must not contain repeated variables. 

1169 

1170 
\item Rules must have constant heads, namely $l = \mtt"c\mtt"$ or $l = 

1171 
(\mtt"c\mtt" ~ x@1 \ldots x@n)$. 

1172 

1173 
\item Every variable in~$r$ must also occur in~$l$. 

1174 
\end{itemize} 

1175 

1176 
Macro rules may refer to any syntax from the parent theories. They may 

1177 
also refer to anything defined before the the {\tt .thy} file's {\tt 

1178 
translations} section  including any mixfix declarations. 

1179 

1180 
Upon declaration, both sides of the macro rule undergo parsing and parse 

1181 
\AST{} translations (see \S\ref{sec:asts}), but do not themselves undergo 

1182 
macro expansion. The lexer runs in a different mode that additionally 

1183 
accepts identifiers of the form $\_~letter~quasiletter^*$ (like {\tt _idt}, 

1184 
{\tt _K}). Thus, a constant whose name starts with an underscore can 

1185 
appear in macro rules but not in ordinary terms. 

1186 

1187 
Some atoms of the macro rule's \AST{} are designated as constants for 

1188 
matching. These are all names that have been declared as classes, types or 

1189 
constants. 

1190 

1191 
The result of this preprocessing is two lists of macro rules, each stored 

1192 
as a pair of \AST{}s. They can be viewed using {\tt Syntax.print_syntax} 

1193 
(sections \ttindex{parse_rules} and \ttindex{print_rules}). For 

1194 
theory~{\tt SET} of Fig.~\ref{fig:set_trans} these are 

1195 
\begin{ttbox} 

1196 
parse_rules: 

1197 
("{\at}Collect" x A P) > ("Collect" A ("_abs" x P)) 

1198 
("{\at}Replace" y x A Q) > ("Replace" A ("_abs" x ("_abs" y Q))) 

1199 
("{\at}Ball" x A P) > ("Ball" A ("_abs" x P)) 

1200 
print_rules: 

1201 
("Collect" A ("_abs" x P)) > ("{\at}Collect" x A P) 

1202 
("Replace" A ("_abs" x ("_abs" y Q))) > ("{\at}Replace" y x A Q) 

1203 
("Ball" A ("_abs" x P)) > ("{\at}Ball" x A P) 

1204 
\end{ttbox} 

1205 

1206 
\begin{warn} 

1207 
Avoid choosing variable names that have previously been used as 

1208 
constants, types or type classes; the {\tt consts} section in the output 

1209 
of {\tt Syntax.print_syntax} lists all such names. If a macro rule works 

1210 
incorrectly, inspect its internal form as shown above, recalling that 

1211 
constants appear as quoted strings and variables without quotes. 

1212 
\end{warn} 

1213 

1214 
\begin{warn} 

1215 
If \ttindex{eta_contract} is set to {\tt true}, terms will be 

1216 
$\eta$contracted {\em before\/} the \AST{} rewriter sees them. Thus some 

1217 
abstraction nodes needed for print rules to match may vanish. For example, 

1218 
\verbBall(A, %x. P(x)) contracts {\tt Ball(A, P)}; the print rule does 

1219 
not apply and the output will be {\tt Ball(A, P)}. This problem would not 

1220 
occur if \ML{} translation functions were used instead of macros (as is 

1221 
done for binder declarations). 

1222 
\end{warn} 

1223 

1224 

1225 
\begin{warn} 

1226 
Another trap concerns type constraints. If \ttindex{show_types} is set to 

1227 
{\tt true}, bound variables will be decorated by their meta types at the 

1228 
binding place (but not at occurrences in the body). Matching with 

1229 
\verbCollect(A, %x. P) binds {\tt x} to something like {\tt ("_constrain" y 

1230 
"i")} rather than only {\tt y}. \AST{} rewriting will cause the constraint to 

1231 
appear in the external form, say \verb{y::i:A::i. P::o}. 

1232 

1233 
To allow such constraints to be reread, your syntax should specify bound 

1234 
variables using the nonterminal~\ttindex{idt}. This is the case in our 

1235 
example. Choosing {\tt id} instead of {\tt idt} is a common error, 

1236 
especially since it appears in former versions of most of Isabelle's 

1237 
objectlogics. 

1238 
\end{warn} 

1239 

1240 

1241 

1242 
\subsection{Applying rules} 

1243 
As a term is being parsed or printed, an \AST{} is generated as an 

1244 
intermediate form (recall Fig.\ts\ref{fig:parse_print}). The \AST{} is 

1245 
normalized by applying macro rules in the manner of a traditional term 

1246 
rewriting system. We first examine how a single rule is applied. 

1247 

1248 
Let $t$ be the abstract syntax tree to be normalized and $(l, r)$ some 

1249 
translation rule. A subtree~$u$ of $t$ is a {\bf redex} if it is an 

1250 
instance of~$l$; in this case $l$ is said to {\bf match}~$u$. A redex 

1251 
matched by $l$ may be replaced by the corresponding instance of~$r$, thus 

1252 
{\bf rewriting} the \AST~$t$. Matching requires some notion of {\bf 

1253 
placeholders} that may occur in rule patterns but not in ordinary 

1254 
\AST{}s; {\tt Variable} atoms serve this purpose. 

1255 

1256 
The matching of the object~$u$ by the pattern~$l$ is performed as follows: 

1257 
\begin{itemize} 

1258 
\item Every constant matches itself. 

1259 

1260 
\item $\Variable x$ in the object matches $\Constant x$ in the pattern. 

1261 
This point is discussed further below. 

1262 

1263 
\item Every \AST{} in the object matches $\Variable x$ in the pattern, 

1264 
binding~$x$ to~$u$. 

1265 

1266 
\item One application matches another if they have the same number of 

1267 
subtrees and corresponding subtrees match. 

1268 

1269 
\item In every other case, matching fails. In particular, {\tt 

1270 
Constant}~$x$ can only match itself. 

1271 
\end{itemize} 

1272 
A successful match yields a substitution that is applied to~$r$, generating 

1273 
the instance that replaces~$u$. 

1274 

1275 
The second case above may look odd. This is where {\tt Variable}s of 

1276 
nonrule \AST{}s behave like {\tt Constant}s. Recall that \AST{}s are not 

1277 
far removed from parse trees; at this level it is not yet known which 

1278 
identifiers will become constants, bounds, frees, types or classes. As 

1279 
\S\ref{sec:asts} describes, former parse tree heads appear in \AST{}s as 

1280 
{\tt Constant}s, while $id$s, $var$s, $tid$s and $tvar$s become {\tt 

1281 
Variable}s. On the other hand, when \AST{}s generated from terms for 

1282 
printing, all constants and type constructors become {\tt Constant}s; see 

1283 
\S\ref{sec:asts}. Thus \AST{}s may contain a messy mixture of {\tt 

1284 
Variable}s and {\tt Constant}s. This is insignificant at macro level 

1285 
because matching treats them alike. 

1286 

1287 
Because of this behaviour, different kinds of atoms with the same name are 

1288 
indistinguishable, which may make some rules prone to misbehaviour. Example: 

1289 
\begin{ttbox} 

1290 
types 

1291 
Nil 

1292 
consts 

1293 
Nil :: "'a list" 

1294 
"[]" :: "'a list" ("[]") 

1295 
translations 

1296 
"[]" == "Nil" 

1297 
\end{ttbox} 

1298 
The term {\tt Nil} will be printed as {\tt []}, just as expected. What 

1299 
happens with \verb%Nil.t or {\tt x::Nil} is left as an exercise. 

1300 

1301 
Normalizing an \AST{} involves repeatedly applying macro rules until none 

1302 
is applicable. Macro rules are chosen in the order that they appear in the 

1303 
{\tt translations} section. You can watch the normalization of \AST{}s 

1304 
during parsing and printing by setting \ttindex{Syntax.trace_norm_ast} to 

1305 
{\tt true}.\index{tracing!of macros} Alternatively, use 

1306 
\ttindex{Syntax.test_read}. The information displayed when tracing 

1307 
includes the \AST{} before normalization ({\tt pre}), redexes with results 

1308 
({\tt rewrote}), the normal form finally reached ({\tt post}) and some 

1309 
statistics ({\tt normalize}). If tracing is off, 

1310 
\ttindex{Syntax.stat_norm_ast} can be set to {\tt true} in order to enable 

1311 
printing of the normal form and statistics only. 

1312 

1313 

1314 
\subsection{Example: the syntax of finite sets} 

1315 
This example demonstrates the use of recursive macros to implement a 

1316 
convenient notation for finite sets. 

1317 
\begin{ttbox} 

1318 
FINSET = SET + 

1319 
types 

1320 
is 

1321 
consts 

1322 
"" :: "i => is" ("_") 

1323 
"{\at}Enum" :: "[i, is] => is" ("_,/ _") 

1324 
empty :: "i" ("{\ttlbrace}{\ttrbrace}") 

1325 
insert :: "[i, i] => i" 

1326 
"{\at}Finset" :: "is => i" ("{\ttlbrace}(_){\ttrbrace}") 

1327 
translations 

1328 
"{\ttlbrace}x, xs{\ttrbrace}" == "insert(x, {\ttlbrace}xs{\ttrbrace})" 

1329 
"{\ttlbrace}x{\ttrbrace}" == "insert(x, {\ttlbrace}{\ttrbrace})" 

1330 
end 

1331 
\end{ttbox} 

1332 
Finite sets are internally built up by {\tt empty} and {\tt insert}. The 

1333 
declarations above specify \verb{x, y, z} as the external representation 

1334 
of 

1335 
\begin{ttbox} 

1336 
insert(x, insert(y, insert(z, empty))) 

1337 
\end{ttbox} 

1338 

1339 
The nonterminal symbol~{\tt is} stands for one or more objects of type~{\tt 

1340 
i} separated by commas. The mixfix declaration \hbox{\verb"_,/ _"} 

1341 
allows a line break after the comma for pretty printing; if no line break 

1342 
is required then a space is printed instead. 

1343 

1344 
The nonterminal is declared as the type~{\tt is}, but with no {\tt arities} 

1345 
declaration. Hence {\tt is} is not a logical type and no default 

1346 
productions are added. If we had needed enumerations of the nonterminal 

1347 
{\tt logic}, which would include all the logical types, we could have used 

1348 
the predefined nonterminal symbol \ttindex{args} and skipped this part 

1349 
altogether. The nonterminal~{\tt is} can later be reused for other 

1350 
enumerations of type~{\tt i} like lists or tuples. 

1351 

1352 
Next follows {\tt empty}, which is already equipped with its syntax 

1353 
\verb{}, and {\tt insert} without concrete syntax. The syntactic 

1354 
constant {\tt\at Finset} provides concrete syntax for enumerations of~{\tt 

1355 
i} enclosed in curly braces. Remember that a pair of parentheses, as in 

1356 
\verb"{(_)}", specifies a block of indentation for pretty printing. 

1357 

1358 
The translations may look strange at first. Macro rules are best 

1359 
understood in their internal forms: 

1360 
\begin{ttbox} 

1361 
parse_rules: 

1362 
("{\at}Finset" ("{\at}Enum" x xs)) > ("insert" x ("{\at}Finset" xs)) 

1363 
("{\at}Finset" x) > ("insert" x "empty") 

1364 
print_rules: 

1365 
("insert" x ("{\at}Finset" xs)) > ("{\at}Finset" ("{\at}Enum" x xs)) 

1366 
("insert" x "empty") > ("{\at}Finset" x) 

1367 
\end{ttbox} 

1368 
This shows that \verb{x, xs} indeed matches any set enumeration of at least 

1369 
two elements, binding the first to {\tt x} and the rest to {\tt xs}. 

1370 
Likewise, \verb{xs} and \verb{x} represent any set enumeration. 

1371 
The parse rules only work in the order given. 

1372 

1373 
\begin{warn} 

1374 
The \AST{} rewriter cannot discern constants from variables and looks 

1375 
only for names of atoms. Thus the names of {\tt Constant}s occurring in 

1376 
the (internal) lefthand side of translation rules should be regarded as 

1377 
reserved keywords. Choose nonidentifiers like {\tt\at Finset} or 

1378 
sufficiently long and strange names. If a bound variable's name gets 

1379 
rewritten, the result will be incorrect; for example, the term 

1380 
\begin{ttbox} 

1381 
\%empty insert. insert(x, empty) 

1382 
\end{ttbox} 

1383 
gets printed as \verb%empty insert. {x}. 

1384 
\end{warn} 

1385 
