author | paulson |
Thu, 20 Mar 2003 15:58:25 +0100 | |
changeset 13871 | 26e5f5e624f6 |
parent 9695 | ec7d7f877712 |
child 14893 | 55e83c32cdec |
permissions | -rw-r--r-- |
323 | 1 |
%% $Id$ |
2 |
\chapter{Syntax Transformations} \label{chap:syntax} |
|
3 |
\newcommand\ttapp{\mathrel{\hbox{\tt\$}}} |
|
4 |
\newcommand\mtt[1]{\mbox{\tt #1}} |
|
5 |
\newcommand\ttfct[1]{\mathop{\mtt{#1}}\nolimits} |
|
6 |
\newcommand\Constant{\ttfct{Constant}} |
|
7 |
\newcommand\Variable{\ttfct{Variable}} |
|
8 |
\newcommand\Appl[1]{\ttfct{Appl}\,[#1]} |
|
9 |
\index{syntax!transformations|(} |
|
10 |
||
11 |
This chapter is intended for experienced Isabelle users who need to define |
|
12 |
macros or code their own translation functions. It describes the |
|
13 |
transformations between parse trees, abstract syntax trees and terms. |
|
14 |
||
15 |
||
16 |
\section{Abstract syntax trees} \label{sec:asts} |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
17 |
\index{ASTs|(} |
323 | 18 |
|
19 |
The parser, given a token list from the lexer, applies productions to yield |
|
20 |
a parse tree\index{parse trees}. By applying some internal transformations |
|
21 |
the parse tree becomes an abstract syntax tree, or \AST{}. Macro |
|
22 |
expansion, further translations and finally type inference yields a |
|
23 |
well-typed term. The printing process is the reverse, except for some |
|
24 |
subtleties to be discussed later. |
|
25 |
||
26 |
Figure~\ref{fig:parse_print} outlines the parsing and printing process. |
|
27 |
Much of the complexity is due to the macro mechanism. Using macros, you |
|
28 |
can specify most forms of concrete syntax without writing any \ML{} code. |
|
29 |
||
30 |
\begin{figure} |
|
31 |
\begin{center} |
|
32 |
\begin{tabular}{cl} |
|
33 |
string & \\ |
|
3108 | 34 |
$\downarrow$ & lexer, parser \\ |
323 | 35 |
parse tree & \\ |
36 |
$\downarrow$ & parse \AST{} translation \\ |
|
37 |
\AST{} & \\ |
|
38 |
$\downarrow$ & \AST{} rewriting (macros) \\ |
|
39 |
\AST{} & \\ |
|
40 |
$\downarrow$ & parse translation, type inference \\ |
|
41 |
--- well-typed term --- & \\ |
|
42 |
$\downarrow$ & print translation \\ |
|
43 |
\AST{} & \\ |
|
44 |
$\downarrow$ & \AST{} rewriting (macros) \\ |
|
45 |
\AST{} & \\ |
|
3108 | 46 |
$\downarrow$ & print \AST{} translation, token translation \\ |
323 | 47 |
string & |
48 |
\end{tabular} |
|
49 |
||
50 |
\end{center} |
|
51 |
\caption{Parsing and printing}\label{fig:parse_print} |
|
52 |
\end{figure} |
|
53 |
||
54 |
Abstract syntax trees are an intermediate form between the raw parse trees |
|
55 |
and the typed $\lambda$-terms. An \AST{} is either an atom (constant or |
|
56 |
variable) or a list of {\em at least two\/} subtrees. Internally, they |
|
57 |
have type \mltydx{Syntax.ast}: \index{*Constant} \index{*Variable} |
|
58 |
\index{*Appl} |
|
59 |
\begin{ttbox} |
|
60 |
datatype ast = Constant of string |
|
61 |
| Variable of string |
|
62 |
| Appl of ast list |
|
63 |
\end{ttbox} |
|
64 |
% |
|
65 |
Isabelle uses an S-expression syntax for abstract syntax trees. Constant |
|
66 |
atoms are shown as quoted strings, variable atoms as non-quoted strings and |
|
332 | 67 |
applications as a parenthesised list of subtrees. For example, the \AST |
323 | 68 |
\begin{ttbox} |
69 |
Appl [Constant "_constrain", |
|
70 |
Appl [Constant "_abs", Variable "x", Variable "t"], |
|
71 |
Appl [Constant "fun", Variable "'a", Variable "'b"]] |
|
72 |
\end{ttbox} |
|
73 |
is shown as {\tt ("_constrain" ("_abs" x t) ("fun" 'a 'b))}. |
|
74 |
Both {\tt ()} and {\tt (f)} are illegal because they have too few |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
75 |
subtrees. |
323 | 76 |
|
77 |
The resemblance to Lisp's S-expressions is intentional, but there are two |
|
78 |
kinds of atomic symbols: $\Constant x$ and $\Variable x$. Do not take the |
|
79 |
names {\tt Constant} and {\tt Variable} too literally; in the later |
|
80 |
translation to terms, $\Variable x$ may become a constant, free or bound |
|
81 |
variable, even a type constructor or class name; the actual outcome depends |
|
82 |
on the context. |
|
83 |
||
84 |
Similarly, you can think of ${\tt (} f~x@1~\ldots~x@n{\tt )}$ as the |
|
85 |
application of~$f$ to the arguments $x@1, \ldots, x@n$. But the kind of |
|
86 |
application is determined later by context; it could be a type constructor |
|
87 |
applied to types. |
|
88 |
||
89 |
Forms like {\tt (("_abs" x $t$) $u$)} are legal, but \AST{}s are |
|
90 |
first-order: the {\tt "_abs"} does not bind the {\tt x} in any way. Later |
|
91 |
at the term level, {\tt ("_abs" x $t$)} will become an {\tt Abs} node and |
|
92 |
occurrences of {\tt x} in $t$ will be replaced by bound variables (the term |
|
93 |
constructor \ttindex{Bound}). |
|
94 |
||
95 |
||
6618 | 96 |
\section{Transforming parse trees to ASTs}\label{sec:astofpt} |
323 | 97 |
\index{ASTs!made from parse trees} |
98 |
\newcommand\astofpt[1]{\lbrakk#1\rbrakk} |
|
99 |
||
100 |
The parse tree is the raw output of the parser. Translation functions, |
|
101 |
called {\bf parse AST translations}\indexbold{translations!parse AST}, |
|
102 |
transform the parse tree into an abstract syntax tree. |
|
103 |
||
104 |
The parse tree is constructed by nesting the right-hand sides of the |
|
105 |
productions used to recognize the input. Such parse trees are simply lists |
|
106 |
of tokens and constituent parse trees, the latter representing the |
|
107 |
nonterminals of the productions. Let us refer to the actual productions in |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
108 |
the form displayed by {\tt print_syntax} (see \S\ref{sec:inspct-thy} for an |
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
109 |
example). |
323 | 110 |
|
111 |
Ignoring parse \AST{} translations, parse trees are transformed to \AST{}s |
|
112 |
by stripping out delimiters and copy productions. More precisely, the |
|
113 |
mapping $\astofpt{-}$ is derived from the productions as follows: |
|
114 |
\begin{itemize} |
|
115 |
\item Name tokens: $\astofpt{t} = \Variable s$, where $t$ is an \ndx{id}, |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
116 |
\ndx{var}, \ndx{tid}, \ndx{tvar}, \ndx{xnum} or \ndx{xstr} token, and $s$ |
3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3135
diff
changeset
|
117 |
its associated string. Note that for {\tt xstr} this does not include the |
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
118 |
quotes. |
323 | 119 |
|
120 |
\item Copy productions:\index{productions!copy} |
|
121 |
$\astofpt{\ldots P \ldots} = \astofpt{P}$. Here $\ldots$ stands for |
|
122 |
strings of delimiters, which are discarded. $P$ stands for the single |
|
123 |
constituent that is not a delimiter; it is either a nonterminal symbol or |
|
124 |
a name token. |
|
125 |
||
126 |
\item 0-ary productions: $\astofpt{\ldots \mtt{=>} c} = \Constant c$. |
|
127 |
Here there are no constituents other than delimiters, which are |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
128 |
discarded. |
323 | 129 |
|
130 |
\item $n$-ary productions, where $n \ge 1$: delimiters are discarded and |
|
131 |
the remaining constituents $P@1$, \ldots, $P@n$ are built into an |
|
132 |
application whose head constant is~$c$: |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
133 |
\[ \astofpt{\ldots P@1 \ldots P@n \ldots \mtt{=>} c} = |
323 | 134 |
\Appl{\Constant c, \astofpt{P@1}, \ldots, \astofpt{P@n}} |
135 |
\] |
|
136 |
\end{itemize} |
|
137 |
Figure~\ref{fig:parse_ast} presents some simple examples, where {\tt ==}, |
|
138 |
{\tt _appl}, {\tt _args}, and so forth name productions of the Pure syntax. |
|
139 |
These examples illustrate the need for further translations to make \AST{}s |
|
140 |
closer to the typed $\lambda$-calculus. The Pure syntax provides |
|
141 |
predefined parse \AST{} translations\index{translations!parse AST} for |
|
142 |
ordinary applications, type applications, nested abstractions, meta |
|
143 |
implications and function types. Figure~\ref{fig:parse_ast_tr} shows their |
|
144 |
effect on some representative input strings. |
|
145 |
||
146 |
||
147 |
\begin{figure} |
|
148 |
\begin{center} |
|
149 |
\tt\begin{tabular}{ll} |
|
150 |
\rm input string & \rm \AST \\\hline |
|
151 |
"f" & f \\ |
|
152 |
"'a" & 'a \\ |
|
153 |
"t == u" & ("==" t u) \\ |
|
154 |
"f(x)" & ("_appl" f x) \\ |
|
155 |
"f(x, y)" & ("_appl" f ("_args" x y)) \\ |
|
156 |
"f(x, y, z)" & ("_appl" f ("_args" x ("_args" y z))) \\ |
|
157 |
"\%x y.\ t" & ("_lambda" ("_idts" x y) t) \\ |
|
158 |
\end{tabular} |
|
159 |
\end{center} |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
160 |
\caption{Parsing examples using the Pure syntax}\label{fig:parse_ast} |
323 | 161 |
\end{figure} |
162 |
||
163 |
\begin{figure} |
|
164 |
\begin{center} |
|
165 |
\tt\begin{tabular}{ll} |
|
166 |
\rm input string & \rm \AST{} \\\hline |
|
167 |
"f(x, y, z)" & (f x y z) \\ |
|
168 |
"'a ty" & (ty 'a) \\ |
|
169 |
"('a, 'b) ty" & (ty 'a 'b) \\ |
|
170 |
"\%x y z.\ t" & ("_abs" x ("_abs" y ("_abs" z t))) \\ |
|
171 |
"\%x ::\ 'a.\ t" & ("_abs" ("_constrain" x 'a) t) \\ |
|
172 |
"[| P; Q; R |] => S" & ("==>" P ("==>" Q ("==>" R S))) \\ |
|
173 |
"['a, 'b, 'c] => 'd" & ("fun" 'a ("fun" 'b ("fun" 'c 'd))) |
|
174 |
\end{tabular} |
|
175 |
\end{center} |
|
176 |
\caption{Built-in parse \AST{} translations}\label{fig:parse_ast_tr} |
|
177 |
\end{figure} |
|
178 |
||
179 |
The names of constant heads in the \AST{} control the translation process. |
|
180 |
The list of constants invoking parse \AST{} translations appears in the |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
181 |
output of {\tt print_syntax} under {\tt parse_ast_translation}. |
323 | 182 |
|
183 |
||
6618 | 184 |
\section{Transforming ASTs to terms}\label{sec:termofast} |
323 | 185 |
\index{terms!made from ASTs} |
186 |
\newcommand\termofast[1]{\lbrakk#1\rbrakk} |
|
187 |
||
188 |
The \AST{}, after application of macros (see \S\ref{sec:macros}), is |
|
189 |
transformed into a term. This term is probably ill-typed since type |
|
190 |
inference has not occurred yet. The term may contain type constraints |
|
191 |
consisting of applications with head {\tt "_constrain"}; the second |
|
192 |
argument is a type encoded as a term. Type inference later introduces |
|
193 |
correct types or rejects the input. |
|
194 |
||
195 |
Another set of translation functions, namely parse |
|
196 |
translations\index{translations!parse}, may affect this process. If we |
|
197 |
ignore parse translations for the time being, then \AST{}s are transformed |
|
198 |
to terms by mapping \AST{} constants to constants, \AST{} variables to |
|
199 |
schematic or free variables and \AST{} applications to applications. |
|
200 |
||
201 |
More precisely, the mapping $\termofast{-}$ is defined by |
|
202 |
\begin{itemize} |
|
203 |
\item Constants: $\termofast{\Constant x} = \ttfct{Const} (x, |
|
204 |
\mtt{dummyT})$. |
|
205 |
||
206 |
\item Schematic variables: $\termofast{\Variable \mtt{"?}xi\mtt"} = |
|
207 |
\ttfct{Var} ((x, i), \mtt{dummyT})$, where $x$ is the base name and $i$ |
|
208 |
the index extracted from~$xi$. |
|
209 |
||
210 |
\item Free variables: $\termofast{\Variable x} = \ttfct{Free} (x, |
|
211 |
\mtt{dummyT})$. |
|
212 |
||
213 |
\item Function applications with $n$ arguments: |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
214 |
\[ \termofast{\Appl{f, x@1, \ldots, x@n}} = |
323 | 215 |
\termofast{f} \ttapp |
216 |
\termofast{x@1} \ttapp \ldots \ttapp \termofast{x@n} |
|
217 |
\] |
|
218 |
\end{itemize} |
|
219 |
Here \ttindex{Const}, \ttindex{Var}, \ttindex{Free} and |
|
220 |
\verb|$|\index{$@{\tt\$}} are constructors of the datatype \mltydx{term}, |
|
221 |
while \ttindex{dummyT} stands for some dummy type that is ignored during |
|
222 |
type inference. |
|
223 |
||
224 |
So far the outcome is still a first-order term. Abstractions and bound |
|
225 |
variables (constructors \ttindex{Abs} and \ttindex{Bound}) are introduced |
|
226 |
by parse translations. Such translations are attached to {\tt "_abs"}, |
|
227 |
{\tt "!!"} and user-defined binders. |
|
228 |
||
229 |
||
230 |
\section{Printing of terms} |
|
231 |
\newcommand\astofterm[1]{\lbrakk#1\rbrakk}\index{ASTs!made from terms} |
|
232 |
||
233 |
The output phase is essentially the inverse of the input phase. Terms are |
|
234 |
translated via abstract syntax trees into strings. Finally the strings are |
|
235 |
pretty printed. |
|
236 |
||
237 |
Print translations (\S\ref{sec:tr_funs}) may affect the transformation of |
|
238 |
terms into \AST{}s. Ignoring those, the transformation maps |
|
239 |
term constants, variables and applications to the corresponding constructs |
|
240 |
on \AST{}s. Abstractions are mapped to applications of the special |
|
241 |
constant {\tt _abs}. |
|
242 |
||
243 |
More precisely, the mapping $\astofterm{-}$ is defined as follows: |
|
244 |
\begin{itemize} |
|
245 |
\item $\astofterm{\ttfct{Const} (x, \tau)} = \Constant x$. |
|
246 |
||
247 |
\item $\astofterm{\ttfct{Free} (x, \tau)} = constrain (\Variable x, |
|
248 |
\tau)$. |
|
249 |
||
250 |
\item $\astofterm{\ttfct{Var} ((x, i), \tau)} = constrain (\Variable |
|
251 |
\mtt{"?}xi\mtt", \tau)$, where $\mtt?xi$ is the string representation of |
|
252 |
the {\tt indexname} $(x, i)$. |
|
253 |
||
254 |
\item For the abstraction $\lambda x::\tau.t$, let $x'$ be a variant |
|
255 |
of~$x$ renamed to differ from all names occurring in~$t$, and let $t'$ |
|
256 |
be obtained from~$t$ by replacing all bound occurrences of~$x$ by |
|
257 |
the free variable $x'$. This replaces corresponding occurrences of the |
|
258 |
constructor \ttindex{Bound} by the term $\ttfct{Free} (x', |
|
259 |
\mtt{dummyT})$: |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
260 |
\[ \astofterm{\ttfct{Abs} (x, \tau, t)} = |
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
261 |
\Appl{\Constant \mtt{"_abs"}, |
8136 | 262 |
constrain(\Variable x', \tau), \astofterm{t'}} |
323 | 263 |
\] |
264 |
||
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
265 |
\item $\astofterm{\ttfct{Bound} i} = \Variable \mtt{"B.}i\mtt"$. |
323 | 266 |
The occurrence of constructor \ttindex{Bound} should never happen |
267 |
when printing well-typed terms; it indicates a de Bruijn index with no |
|
268 |
matching abstraction. |
|
269 |
||
270 |
\item Where $f$ is not an application, |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
271 |
\[ \astofterm{f \ttapp x@1 \ttapp \ldots \ttapp x@n} = |
323 | 272 |
\Appl{\astofterm{f}, \astofterm{x@1}, \ldots,\astofterm{x@n}} |
273 |
\] |
|
274 |
\end{itemize} |
|
275 |
% |
|
332 | 276 |
Type constraints\index{type constraints} are inserted to allow the printing |
277 |
of types. This is governed by the boolean variable \ttindex{show_types}: |
|
323 | 278 |
\begin{itemize} |
279 |
\item $constrain(x, \tau) = x$ \ if $\tau = \mtt{dummyT}$ \index{*dummyT} or |
|
332 | 280 |
\ttindex{show_types} is set to {\tt false}. |
323 | 281 |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
282 |
\item $constrain(x, \tau) = \Appl{\Constant \mtt{"_constrain"}, x, |
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
283 |
\astofterm{\tau}}$ \ otherwise. |
323 | 284 |
|
285 |
Here, $\astofterm{\tau}$ is the \AST{} encoding of $\tau$: type |
|
286 |
constructors go to {\tt Constant}s; type identifiers go to {\tt |
|
287 |
Variable}s; type applications go to {\tt Appl}s with the type |
|
288 |
constructor as the first element. If \ttindex{show_sorts} is set to |
|
289 |
{\tt true}, some type variables are decorated with an \AST{} encoding |
|
290 |
of their sort. |
|
291 |
\end{itemize} |
|
292 |
% |
|
293 |
The \AST{}, after application of macros (see \S\ref{sec:macros}), is |
|
294 |
transformed into the final output string. The built-in {\bf print AST |
|
332 | 295 |
translations}\indexbold{translations!print AST} reverse the |
323 | 296 |
parse \AST{} translations of Fig.\ts\ref{fig:parse_ast_tr}. |
297 |
||
298 |
For the actual printing process, the names attached to productions |
|
299 |
of the form $\ldots A^{(p@1)}@1 \ldots A^{(p@n)}@n \ldots \mtt{=>} c$ play |
|
300 |
a vital role. Each \AST{} with constant head $c$, namely $\mtt"c\mtt"$ or |
|
301 |
$(\mtt"c\mtt"~ x@1 \ldots x@n)$, is printed according to the production |
|
302 |
for~$c$. Each argument~$x@i$ is converted to a string, and put in |
|
303 |
parentheses if its priority~$(p@i)$ requires this. The resulting strings |
|
304 |
and their syntactic sugar (denoted by \dots{} above) are joined to make a |
|
305 |
single string. |
|
306 |
||
3108 | 307 |
If an application $(\mtt"c\mtt"~ x@1 \ldots x@m)$ has more arguments |
308 |
than the corresponding production, it is first split into |
|
309 |
$((\mtt"c\mtt"~ x@1 \ldots x@n) ~ x@{n+1} \ldots x@m)$. Applications |
|
310 |
with too few arguments or with non-constant head or without a |
|
311 |
corresponding production are printed as $f(x@1, \ldots, x@l)$ or |
|
312 |
$(\alpha@1, \ldots, \alpha@l) ty$. Multiple productions associated |
|
313 |
with some name $c$ are tried in order of appearance. An occurrence of |
|
323 | 314 |
$\Variable x$ is simply printed as~$x$. |
315 |
||
316 |
Blanks are {\em not\/} inserted automatically. If blanks are required to |
|
317 |
separate tokens, specify them in the mixfix declaration, possibly preceded |
|
318 |
by a slash~({\tt/}) to allow a line break. |
|
319 |
\index{ASTs|)} |
|
320 |
||
321 |
||
322 |
||
3108 | 323 |
\section{Macros: syntactic rewriting} \label{sec:macros} |
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
324 |
\index{macros|(}\index{rewriting!syntactic|(} |
323 | 325 |
|
326 |
Mixfix declarations alone can handle situations where there is a direct |
|
327 |
connection between the concrete syntax and the underlying term. Sometimes |
|
328 |
we require a more elaborate concrete syntax, such as quantifiers and list |
|
329 |
notation. Isabelle's {\bf macros} and {\bf translation functions} can |
|
330 |
perform translations such as |
|
331 |
\begin{center}\tt |
|
332 |
\begin{tabular}{r@{$\quad\protect\rightleftharpoons\quad$}l} |
|
333 |
ALL x:A.P & Ball(A, \%x.P) \\ \relax |
|
334 |
[x, y, z] & Cons(x, Cons(y, Cons(z, Nil))) |
|
335 |
\end{tabular} |
|
336 |
\end{center} |
|
337 |
Translation functions (see \S\ref{sec:tr_funs}) must be coded in ML; they |
|
338 |
are the most powerful translation mechanism but are difficult to read or |
|
339 |
write. Macros are specified by first-order rewriting systems that operate |
|
340 |
on abstract syntax trees. They are usually easy to read and write, and can |
|
341 |
express all but the most obscure translations. |
|
342 |
||
3108 | 343 |
Figure~\ref{fig:set_trans} defines a fragment of first-order logic and |
344 |
set theory.\footnote{This and the following theories are complete |
|
345 |
working examples, though they specify only syntax, no axioms. The |
|
346 |
file {\tt ZF/ZF.thy} presents a full set theory definition, |
|
347 |
including many macro rules.} Theory {\tt SetSyntax} declares |
|
348 |
constants for set comprehension ({\tt Collect}), replacement ({\tt |
|
349 |
Replace}) and bounded universal quantification ({\tt Ball}). Each |
|
350 |
of these binds some variables. Without additional syntax we should |
|
351 |
have to write $\forall x \in A. P$ as {\tt Ball(A,\%x.P)}, and |
|
352 |
similarly for the others. |
|
323 | 353 |
|
354 |
\begin{figure} |
|
355 |
\begin{ttbox} |
|
3108 | 356 |
SetSyntax = Pure + |
323 | 357 |
types |
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
358 |
i o |
323 | 359 |
arities |
360 |
i, o :: logic |
|
361 |
consts |
|
1387 | 362 |
Trueprop :: o => prop ("_" 5) |
363 |
Collect :: [i, i => o] => i |
|
364 |
Replace :: [i, [i, i] => o] => i |
|
365 |
Ball :: [i, i => o] => o |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
366 |
syntax |
1387 | 367 |
"{\at}Collect" :: [idt, i, o] => i ("(1{\ttlbrace}_:_./ _{\ttrbrace})") |
368 |
"{\at}Replace" :: [idt, idt, i, o] => i ("(1{\ttlbrace}_./ _:_, _{\ttrbrace})") |
|
369 |
"{\at}Ball" :: [idt, i, o] => o ("(3ALL _:_./ _)" 10) |
|
323 | 370 |
translations |
371 |
"{\ttlbrace}x:A. P{\ttrbrace}" == "Collect(A, \%x. P)" |
|
372 |
"{\ttlbrace}y. x:A, Q{\ttrbrace}" == "Replace(A, \%x y. Q)" |
|
373 |
"ALL x:A. P" == "Ball(A, \%x. P)" |
|
374 |
end |
|
375 |
\end{ttbox} |
|
376 |
\caption{Macro example: set theory}\label{fig:set_trans} |
|
377 |
\end{figure} |
|
378 |
||
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
379 |
The theory specifies a variable-binding syntax through additional productions |
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
380 |
that have mixfix declarations. Each non-copy production must specify some |
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
381 |
constant, which is used for building \AST{}s. \index{constants!syntactic} The |
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
382 |
additional constants are decorated with {\tt\at} to stress their purely |
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
383 |
syntactic purpose; they may not occur within the final well-typed terms, |
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
384 |
being declared as {\tt syntax} rather than {\tt consts}. |
323 | 385 |
|
386 |
The translations cause the replacement of external forms by internal forms |
|
387 |
after parsing, and vice versa before printing of terms. As a specification |
|
388 |
of the set theory notation, they should be largely self-explanatory. The |
|
389 |
syntactic constants, {\tt\at Collect}, {\tt\at Replace} and {\tt\at Ball}, |
|
390 |
appear implicitly in the macro rules via their mixfix forms. |
|
391 |
||
392 |
Macros can define variable-binding syntax because they operate on \AST{}s, |
|
393 |
which have no inbuilt notion of bound variable. The macro variables {\tt |
|
394 |
x} and~{\tt y} have type~{\tt idt} and therefore range over identifiers, |
|
395 |
in this case bound variables. The macro variables {\tt P} and~{\tt Q} |
|
396 |
range over formulae containing bound variable occurrences. |
|
397 |
||
398 |
Other applications of the macro system can be less straightforward, and |
|
399 |
there are peculiarities. The rest of this section will describe in detail |
|
400 |
how Isabelle macros are preprocessed and applied. |
|
401 |
||
402 |
||
403 |
\subsection{Specifying macros} |
|
404 |
Macros are basically rewrite rules on \AST{}s. But unlike other macro |
|
405 |
systems found in programming languages, Isabelle's macros work in both |
|
406 |
directions. Therefore a syntax contains two lists of rewrites: one for |
|
407 |
parsing and one for printing. |
|
408 |
||
409 |
\index{*translations section} |
|
410 |
The {\tt translations} section specifies macros. The syntax for a macro is |
|
411 |
\[ (root)\; string \quad |
|
412 |
\left\{\begin{array}[c]{c} \mtt{=>} \\ \mtt{<=} \\ \mtt{==} \end{array} |
|
413 |
\right\} \quad |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
414 |
(root)\; string |
323 | 415 |
\] |
416 |
% |
|
417 |
This specifies a parse rule ({\tt =>}), a print rule ({\tt <=}), or both |
|
418 |
({\tt ==}). The two strings specify the left and right-hand sides of the |
|
419 |
macro rule. The $(root)$ specification is optional; it specifies the |
|
420 |
nonterminal for parsing the $string$ and if omitted defaults to {\tt |
|
421 |
logic}. \AST{} rewrite rules $(l, r)$ must obey certain conditions: |
|
422 |
\begin{itemize} |
|
423 |
\item Rules must be left linear: $l$ must not contain repeated variables. |
|
424 |
||
425 |
\item Every variable in~$r$ must also occur in~$l$. |
|
426 |
\end{itemize} |
|
427 |
||
3108 | 428 |
Macro rules may refer to any syntax from the parent theories. They |
429 |
may also refer to anything defined before the current {\tt |
|
323 | 430 |
translations} section --- including any mixfix declarations. |
431 |
||
432 |
Upon declaration, both sides of the macro rule undergo parsing and parse |
|
433 |
\AST{} translations (see \S\ref{sec:asts}), but do not themselves undergo |
|
434 |
macro expansion. The lexer runs in a different mode that additionally |
|
435 |
accepts identifiers of the form $\_~letter~quasiletter^*$ (like {\tt _idt}, |
|
436 |
{\tt _K}). Thus, a constant whose name starts with an underscore can |
|
437 |
appear in macro rules but not in ordinary terms. |
|
438 |
||
439 |
Some atoms of the macro rule's \AST{} are designated as constants for |
|
440 |
matching. These are all names that have been declared as classes, types or |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
441 |
constants (logical and syntactic). |
323 | 442 |
|
3108 | 443 |
The result of this preprocessing is two lists of macro rules, each |
444 |
stored as a pair of \AST{}s. They can be viewed using {\tt |
|
445 |
print_syntax} (sections \ttindex{parse_rules} and |
|
446 |
\ttindex{print_rules}). For theory~{\tt SetSyntax} of |
|
447 |
Fig.~\ref{fig:set_trans} these are |
|
323 | 448 |
\begin{ttbox} |
449 |
parse_rules: |
|
450 |
("{\at}Collect" x A P) -> ("Collect" A ("_abs" x P)) |
|
451 |
("{\at}Replace" y x A Q) -> ("Replace" A ("_abs" x ("_abs" y Q))) |
|
452 |
("{\at}Ball" x A P) -> ("Ball" A ("_abs" x P)) |
|
453 |
print_rules: |
|
454 |
("Collect" A ("_abs" x P)) -> ("{\at}Collect" x A P) |
|
455 |
("Replace" A ("_abs" x ("_abs" y Q))) -> ("{\at}Replace" y x A Q) |
|
456 |
("Ball" A ("_abs" x P)) -> ("{\at}Ball" x A P) |
|
457 |
\end{ttbox} |
|
458 |
||
459 |
\begin{warn} |
|
460 |
Avoid choosing variable names that have previously been used as |
|
461 |
constants, types or type classes; the {\tt consts} section in the output |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
462 |
of {\tt print_syntax} lists all such names. If a macro rule works |
323 | 463 |
incorrectly, inspect its internal form as shown above, recalling that |
464 |
constants appear as quoted strings and variables without quotes. |
|
465 |
\end{warn} |
|
466 |
||
467 |
\begin{warn} |
|
468 |
If \ttindex{eta_contract} is set to {\tt true}, terms will be |
|
469 |
$\eta$-contracted {\em before\/} the \AST{} rewriter sees them. Thus some |
|
470 |
abstraction nodes needed for print rules to match may vanish. For example, |
|
332 | 471 |
\verb|Ball(A, %x. P(x))| contracts to {\tt Ball(A, P)}; the print rule does |
323 | 472 |
not apply and the output will be {\tt Ball(A, P)}. This problem would not |
473 |
occur if \ML{} translation functions were used instead of macros (as is |
|
474 |
done for binder declarations). |
|
475 |
\end{warn} |
|
476 |
||
477 |
||
478 |
\begin{warn} |
|
479 |
Another trap concerns type constraints. If \ttindex{show_types} is set to |
|
480 |
{\tt true}, bound variables will be decorated by their meta types at the |
|
481 |
binding place (but not at occurrences in the body). Matching with |
|
482 |
\verb|Collect(A, %x. P)| binds {\tt x} to something like {\tt ("_constrain" y |
|
483 |
"i")} rather than only {\tt y}. \AST{} rewriting will cause the constraint to |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
484 |
appear in the external form, say \verb|{y::i:A::i. P::o}|. |
323 | 485 |
|
486 |
To allow such constraints to be re-read, your syntax should specify bound |
|
487 |
variables using the nonterminal~\ndx{idt}. This is the case in our |
|
3108 | 488 |
example. Choosing {\tt id} instead of {\tt idt} is a common error. |
323 | 489 |
\end{warn} |
490 |
||
491 |
||
492 |
||
493 |
\subsection{Applying rules} |
|
494 |
As a term is being parsed or printed, an \AST{} is generated as an |
|
495 |
intermediate form (recall Fig.\ts\ref{fig:parse_print}). The \AST{} is |
|
332 | 496 |
normalised by applying macro rules in the manner of a traditional term |
323 | 497 |
rewriting system. We first examine how a single rule is applied. |
498 |
||
332 | 499 |
Let $t$ be the abstract syntax tree to be normalised and $(l, r)$ some |
323 | 500 |
translation rule. A subtree~$u$ of $t$ is a {\bf redex} if it is an |
501 |
instance of~$l$; in this case $l$ is said to {\bf match}~$u$. A redex |
|
502 |
matched by $l$ may be replaced by the corresponding instance of~$r$, thus |
|
503 |
{\bf rewriting} the \AST~$t$. Matching requires some notion of {\bf |
|
504 |
place-holders} that may occur in rule patterns but not in ordinary |
|
505 |
\AST{}s; {\tt Variable} atoms serve this purpose. |
|
506 |
||
507 |
The matching of the object~$u$ by the pattern~$l$ is performed as follows: |
|
508 |
\begin{itemize} |
|
509 |
\item Every constant matches itself. |
|
510 |
||
511 |
\item $\Variable x$ in the object matches $\Constant x$ in the pattern. |
|
512 |
This point is discussed further below. |
|
513 |
||
514 |
\item Every \AST{} in the object matches $\Variable x$ in the pattern, |
|
515 |
binding~$x$ to~$u$. |
|
516 |
||
517 |
\item One application matches another if they have the same number of |
|
518 |
subtrees and corresponding subtrees match. |
|
519 |
||
520 |
\item In every other case, matching fails. In particular, {\tt |
|
521 |
Constant}~$x$ can only match itself. |
|
522 |
\end{itemize} |
|
523 |
A successful match yields a substitution that is applied to~$r$, generating |
|
524 |
the instance that replaces~$u$. |
|
525 |
||
526 |
The second case above may look odd. This is where {\tt Variable}s of |
|
527 |
non-rule \AST{}s behave like {\tt Constant}s. Recall that \AST{}s are not |
|
528 |
far removed from parse trees; at this level it is not yet known which |
|
529 |
identifiers will become constants, bounds, frees, types or classes. As |
|
530 |
\S\ref{sec:asts} describes, former parse tree heads appear in \AST{}s as |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
531 |
{\tt Constant}s, while the name tokens \ndx{id}, \ndx{var}, \ndx{tid}, |
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
532 |
\ndx{tvar}, \ndx{xnum} and \ndx{xstr} become {\tt Variable}s. On the other |
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
533 |
hand, when \AST{}s generated from terms for printing, all constants and type |
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
534 |
constructors become {\tt Constant}s; see \S\ref{sec:asts}. Thus \AST{}s may |
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
535 |
contain a messy mixture of {\tt Variable}s and {\tt Constant}s. This is |
323 | 536 |
insignificant at macro level because matching treats them alike. |
537 |
||
538 |
Because of this behaviour, different kinds of atoms with the same name are |
|
539 |
indistinguishable, which may make some rules prone to misbehaviour. Example: |
|
540 |
\begin{ttbox} |
|
541 |
types |
|
542 |
Nil |
|
543 |
consts |
|
1387 | 544 |
Nil :: 'a list |
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
545 |
syntax |
1387 | 546 |
"[]" :: 'a list ("[]") |
323 | 547 |
translations |
548 |
"[]" == "Nil" |
|
549 |
\end{ttbox} |
|
550 |
The term {\tt Nil} will be printed as {\tt []}, just as expected. |
|
551 |
The term \verb|%Nil.t| will be printed as \verb|%[].t|, which might not be |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
552 |
expected! Guess how type~{\tt Nil} is printed? |
323 | 553 |
|
3108 | 554 |
Normalizing an \AST{} involves repeatedly applying macro rules until |
555 |
none are applicable. Macro rules are chosen in order of appearance in |
|
556 |
the theory definitions. You can watch the normalization of \AST{}s |
|
8996
5a5bbb6b6688
renamed Syntax.trace_norm_ast to Syntax.trace_ast;
wenzelm
parents:
8701
diff
changeset
|
557 |
during parsing and printing by setting \ttindex{Syntax.trace_ast} |
3108 | 558 |
to {\tt true}.\index{tracing!of macros} Alternatively, use |
323 | 559 |
\ttindex{Syntax.test_read}. The information displayed when tracing |
3108 | 560 |
includes the \AST{} before normalization ({\tt pre}), redexes with |
561 |
results ({\tt rewrote}), the normal form finally reached ({\tt post}) |
|
8996
5a5bbb6b6688
renamed Syntax.trace_norm_ast to Syntax.trace_ast;
wenzelm
parents:
8701
diff
changeset
|
562 |
and some statistics ({\tt normalize}). |
323 | 563 |
|
564 |
\subsection{Example: the syntax of finite sets} |
|
565 |
\index{examples!of macros} |
|
566 |
||
567 |
This example demonstrates the use of recursive macros to implement a |
|
568 |
convenient notation for finite sets. |
|
569 |
\index{*empty constant}\index{*insert constant}\index{{}@\verb'{}' symbol} |
|
570 |
\index{"@Enum@{\tt\at Enum} constant} |
|
571 |
\index{"@Finset@{\tt\at Finset} constant} |
|
572 |
\begin{ttbox} |
|
3108 | 573 |
FinSyntax = SetSyntax + |
323 | 574 |
types |
575 |
is |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
576 |
syntax |
1387 | 577 |
"" :: i => is ("_") |
578 |
"{\at}Enum" :: [i, is] => is ("_,/ _") |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
579 |
consts |
1387 | 580 |
empty :: i ("{\ttlbrace}{\ttrbrace}") |
581 |
insert :: [i, i] => i |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
582 |
syntax |
1387 | 583 |
"{\at}Finset" :: is => i ("{\ttlbrace}(_){\ttrbrace}") |
323 | 584 |
translations |
585 |
"{\ttlbrace}x, xs{\ttrbrace}" == "insert(x, {\ttlbrace}xs{\ttrbrace})" |
|
586 |
"{\ttlbrace}x{\ttrbrace}" == "insert(x, {\ttlbrace}{\ttrbrace})" |
|
587 |
end |
|
588 |
\end{ttbox} |
|
589 |
Finite sets are internally built up by {\tt empty} and {\tt insert}. The |
|
590 |
declarations above specify \verb|{x, y, z}| as the external representation |
|
591 |
of |
|
592 |
\begin{ttbox} |
|
593 |
insert(x, insert(y, insert(z, empty))) |
|
594 |
\end{ttbox} |
|
595 |
The nonterminal symbol~\ndx{is} stands for one or more objects of type~{\tt |
|
596 |
i} separated by commas. The mixfix declaration \hbox{\verb|"_,/ _"|} |
|
597 |
allows a line break after the comma for \rmindex{pretty printing}; if no |
|
598 |
line break is required then a space is printed instead. |
|
599 |
||
600 |
The nonterminal is declared as the type~{\tt is}, but with no {\tt arities} |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
601 |
declaration. Hence {\tt is} is not a logical type and may be used safely as |
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
602 |
a new nonterminal for custom syntax. The nonterminal~{\tt is} can later be |
3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3135
diff
changeset
|
603 |
re-used for other enumerations of type~{\tt i} like lists or tuples. If we |
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
604 |
had needed polymorphic enumerations, we could have used the predefined |
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
605 |
nonterminal symbol \ndx{args} and skipped this part altogether. |
323 | 606 |
|
607 |
\index{"@Finset@{\tt\at Finset} constant} |
|
608 |
Next follows {\tt empty}, which is already equipped with its syntax |
|
609 |
\verb|{}|, and {\tt insert} without concrete syntax. The syntactic |
|
610 |
constant {\tt\at Finset} provides concrete syntax for enumerations of~{\tt |
|
611 |
i} enclosed in curly braces. Remember that a pair of parentheses, as in |
|
612 |
\verb|"{(_)}"|, specifies a block of indentation for pretty printing. |
|
613 |
||
614 |
The translations may look strange at first. Macro rules are best |
|
615 |
understood in their internal forms: |
|
616 |
\begin{ttbox} |
|
617 |
parse_rules: |
|
618 |
("{\at}Finset" ("{\at}Enum" x xs)) -> ("insert" x ("{\at}Finset" xs)) |
|
619 |
("{\at}Finset" x) -> ("insert" x "empty") |
|
620 |
print_rules: |
|
621 |
("insert" x ("{\at}Finset" xs)) -> ("{\at}Finset" ("{\at}Enum" x xs)) |
|
622 |
("insert" x "empty") -> ("{\at}Finset" x) |
|
623 |
\end{ttbox} |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
624 |
This shows that \verb|{x,xs}| indeed matches any set enumeration of at least |
323 | 625 |
two elements, binding the first to {\tt x} and the rest to {\tt xs}. |
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
626 |
Likewise, \verb|{xs}| and \verb|{x}| represent any set enumeration. |
323 | 627 |
The parse rules only work in the order given. |
628 |
||
629 |
\begin{warn} |
|
332 | 630 |
The \AST{} rewriter cannot distinguish constants from variables and looks |
323 | 631 |
only for names of atoms. Thus the names of {\tt Constant}s occurring in |
632 |
the (internal) left-hand side of translation rules should be regarded as |
|
633 |
\rmindex{reserved words}. Choose non-identifiers like {\tt\at Finset} or |
|
634 |
sufficiently long and strange names. If a bound variable's name gets |
|
635 |
rewritten, the result will be incorrect; for example, the term |
|
636 |
\begin{ttbox} |
|
637 |
\%empty insert. insert(x, empty) |
|
638 |
\end{ttbox} |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
639 |
\par\noindent is incorrectly printed as \verb|%empty insert. {x}|. |
323 | 640 |
\end{warn} |
641 |
||
642 |
||
643 |
\subsection{Example: a parse macro for dependent types}\label{prod_trans} |
|
644 |
\index{examples!of macros} |
|
645 |
||
646 |
As stated earlier, a macro rule may not introduce new {\tt Variable}s on |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
647 |
the right-hand side. Something like \verb|"K(B)" => "%x.B"| is illegal; |
323 | 648 |
if allowed, it could cause variable capture. In such cases you usually |
649 |
must fall back on translation functions. But a trick can make things |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
650 |
readable in some cases: {\em calling\/} translation functions by parse |
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
651 |
macros: |
323 | 652 |
\begin{ttbox} |
3135 | 653 |
ProdSyntax = SetSyntax + |
323 | 654 |
consts |
1387 | 655 |
Pi :: [i, i => i] => i |
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
656 |
syntax |
1387 | 657 |
"{\at}PROD" :: [idt, i, i] => i ("(3PROD _:_./ _)" 10) |
658 |
"{\at}->" :: [i, i] => i ("(_ ->/ _)" [51, 50] 50) |
|
323 | 659 |
\ttbreak |
660 |
translations |
|
661 |
"PROD x:A. B" => "Pi(A, \%x. B)" |
|
662 |
"A -> B" => "Pi(A, _K(B))" |
|
663 |
end |
|
664 |
ML |
|
665 |
val print_translation = [("Pi", dependent_tr' ("{\at}PROD", "{\at}->"))]; |
|
666 |
\end{ttbox} |
|
667 |
||
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
668 |
Here {\tt Pi} is a logical constant for constructing general products. |
323 | 669 |
Two external forms exist: the general case {\tt PROD x:A.B} and the |
670 |
function space {\tt A -> B}, which abbreviates \verb|Pi(A, %x.B)| when {\tt B} |
|
671 |
does not depend on~{\tt x}. |
|
672 |
||
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
673 |
The second parse macro introduces {\tt _K(B)}, which later becomes |
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
674 |
\verb|%x.B| due to a parse translation associated with \cdx{_K}. |
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
675 |
Unfortunately there is no such trick for printing, so we have to add a {\tt |
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
676 |
ML} section for the print translation \ttindex{dependent_tr'}. |
323 | 677 |
|
678 |
Recall that identifiers with a leading {\tt _} are allowed in translation |
|
679 |
rules, but not in ordinary terms. Thus we can create \AST{}s containing |
|
680 |
names that are not directly expressible. |
|
681 |
||
8136 | 682 |
The parse translation for {\tt _K} is already installed in Pure, and the |
683 |
function {\tt dependent_tr'} is exported by the syntax module for public use. |
|
684 |
See \S\ref{sec:tr_funs} below for more of the arcane lore of translation |
|
685 |
functions. \index{macros|)}\index{rewriting!syntactic|)} |
|
323 | 686 |
|
687 |
||
688 |
\section{Translation functions} \label{sec:tr_funs} |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
689 |
\index{translations|(} |
323 | 690 |
% |
9695 | 691 |
This section describes the translation function mechanism. By writing \ML{} |
692 |
functions, you can do almost everything to terms or \AST{}s during parsing and |
|
693 |
printing. The logic LK is a good example of sophisticated transformations |
|
694 |
between internal and external representations of sequents; here, macros would |
|
695 |
be useless. |
|
323 | 696 |
|
697 |
A full understanding of translations requires some familiarity |
|
698 |
with Isabelle's internals, especially the datatypes {\tt term}, {\tt typ}, |
|
699 |
{\tt Syntax.ast} and the encodings of types and terms as such at the various |
|
700 |
stages of the parsing or printing process. Most users should never need to |
|
701 |
use translation functions. |
|
702 |
||
703 |
\subsection{Declaring translation functions} |
|
3108 | 704 |
There are four kinds of translation functions, with one of these |
705 |
coming in two variants. Each such function is associated with a name, |
|
706 |
which triggers calls to it. Such names can be constants (logical or |
|
707 |
syntactic) or type constructors. |
|
323 | 708 |
|
6343 | 709 |
Function {\tt print_syntax} displays the sets of names associated with the |
710 |
translation functions of a theory under \texttt{parse_ast_translation}, etc. |
|
711 |
You can add new ones via the {\tt ML} section\index{*ML section} of a theory |
|
712 |
definition file. Even though the {\tt ML} section is the very last part of |
|
713 |
the file, newly installed translation functions are already effective when |
|
714 |
processing all of the preceding sections. |
|
323 | 715 |
|
3108 | 716 |
The {\tt ML} section's contents are simply copied verbatim near the |
717 |
beginning of the \ML\ file generated from a theory definition file. |
|
718 |
Definitions made here are accessible as components of an \ML\ |
|
719 |
structure; to make some parts private, use an \ML{} {\tt local} |
|
720 |
declaration. The {\ML} code may install translation functions by |
|
721 |
declaring any of the following identifiers: |
|
323 | 722 |
\begin{ttbox} |
3108 | 723 |
val parse_ast_translation : (string * (ast list -> ast)) list |
724 |
val print_ast_translation : (string * (ast list -> ast)) list |
|
725 |
val parse_translation : (string * (term list -> term)) list |
|
726 |
val print_translation : (string * (term list -> term)) list |
|
4375 | 727 |
val typed_print_translation : |
728 |
(string * (bool -> typ -> term list -> term)) list |
|
323 | 729 |
\end{ttbox} |
730 |
||
731 |
\subsection{The translation strategy} |
|
3108 | 732 |
The different kinds of translation functions are called during the |
733 |
transformations between parse trees, \AST{}s and terms (recall |
|
734 |
Fig.\ts\ref{fig:parse_print}). Whenever a combination of the form |
|
735 |
$(\mtt"c\mtt"~x@1 \ldots x@n)$ is encountered, and a translation |
|
736 |
function $f$ of appropriate kind exists for $c$, the result is |
|
737 |
computed by the \ML{} function call $f \mtt[ x@1, \ldots, x@n \mtt]$. |
|
323 | 738 |
|
3108 | 739 |
For \AST{} translations, the arguments $x@1, \ldots, x@n$ are \AST{}s. |
740 |
A combination has the form $\Constant c$ or $\Appl{\Constant c, x@1, |
|
741 |
\ldots, x@n}$. For term translations, the arguments are terms and a |
|
742 |
combination has the form $\ttfct{Const} (c, \tau)$ or $\ttfct{Const} |
|
743 |
(c, \tau) \ttapp x@1 \ttapp \ldots \ttapp x@n$. Terms allow more |
|
744 |
sophisticated transformations than \AST{}s do, typically involving |
|
745 |
abstractions and bound variables. {\em Typed} print translations may |
|
4375 | 746 |
even peek at the type $\tau$ of the constant they are invoked on; they |
747 |
are also passed the current value of the \ttindex{show_sorts} flag. |
|
323 | 748 |
|
3108 | 749 |
Regardless of whether they act on terms or \AST{}s, translation |
750 |
functions called during the parsing process differ from those for |
|
751 |
printing more fundamentally in their overall behaviour: |
|
323 | 752 |
\begin{description} |
6343 | 753 |
\item[Parse translations] are applied bottom-up. The arguments are already in |
754 |
translated form. The translations must not fail; exceptions trigger an |
|
755 |
error message. There may never be more than one function associated with |
|
756 |
any syntactic name. |
|
757 |
||
323 | 758 |
\item[Print translations] are applied top-down. They are supplied with |
759 |
arguments that are partly still in internal form. The result again |
|
6343 | 760 |
undergoes translation; therefore a print translation should not introduce as |
761 |
head the very constant that invoked it. The function may raise exception |
|
762 |
\xdx{Match} to indicate failure; in this event it has no effect. Multiple |
|
763 |
functions associated with some syntactic name are tried in an unspecified |
|
764 |
order. |
|
323 | 765 |
\end{description} |
766 |
||
767 |
Only constant atoms --- constructor \ttindex{Constant} for \AST{}s and |
|
768 |
\ttindex{Const} for terms --- can invoke translation functions. This |
|
769 |
causes another difference between parsing and printing. |
|
770 |
||
771 |
Parsing starts with a string and the constants are not yet identified. |
|
772 |
Only parse tree heads create {\tt Constant}s in the resulting \AST, as |
|
773 |
described in \S\ref{sec:astofpt}. Macros and parse \AST{} translations may |
|
774 |
introduce further {\tt Constant}s. When the final \AST{} is converted to a |
|
775 |
term, all {\tt Constant}s become {\tt Const}s, as described in |
|
776 |
\S\ref{sec:termofast}. |
|
777 |
||
778 |
Printing starts with a well-typed term and all the constants are known. So |
|
779 |
all logical constants and type constructors may invoke print translations. |
|
780 |
These, and macros, may introduce further constants. |
|
781 |
||
782 |
||
783 |
\subsection{Example: a print translation for dependent types} |
|
784 |
\index{examples!of translations}\indexbold{*dependent_tr'} |
|
785 |
||
786 |
Let us continue the dependent type example (page~\pageref{prod_trans}) by |
|
787 |
examining the parse translation for~\cdx{_K} and the print translation |
|
788 |
{\tt dependent_tr'}, which are both built-in. By convention, parse |
|
789 |
translations have names ending with {\tt _tr} and print translations have |
|
790 |
names ending with {\tt _tr'}. Search for such names in the Isabelle |
|
791 |
sources to locate more examples. |
|
792 |
||
793 |
Here is the parse translation for {\tt _K}: |
|
794 |
\begin{ttbox} |
|
795 |
fun k_tr [t] = Abs ("x", dummyT, incr_boundvars 1 t) |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
796 |
| k_tr ts = raise TERM ("k_tr", ts); |
323 | 797 |
\end{ttbox} |
798 |
If {\tt k_tr} is called with exactly one argument~$t$, it creates a new |
|
799 |
{\tt Abs} node with a body derived from $t$. Since terms given to parse |
|
800 |
translations are not yet typed, the type of the bound variable in the new |
|
801 |
{\tt Abs} is simply {\tt dummyT}. The function increments all {\tt Bound} |
|
802 |
nodes referring to outer abstractions by calling \ttindex{incr_boundvars}, |
|
803 |
a basic term manipulation function defined in {\tt Pure/term.ML}. |
|
804 |
||
805 |
Here is the print translation for dependent types: |
|
806 |
\begin{ttbox} |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
807 |
fun dependent_tr' (q, r) (A :: Abs (x, T, B) :: ts) = |
323 | 808 |
if 0 mem (loose_bnos B) then |
3108 | 809 |
let val (x', B') = Syntax.variant_abs' (x, dummyT, B) in |
810 |
list_comb |
|
8136 | 811 |
(Const (q,dummyT) $ |
812 |
Syntax.mark_boundT (x',{\thinspace}T) $ A $ B', ts) |
|
323 | 813 |
end |
814 |
else list_comb (Const (r, dummyT) $ A $ B, ts) |
|
815 |
| dependent_tr' _ _ = raise Match; |
|
816 |
\end{ttbox} |
|
3135 | 817 |
The argument {\tt (q,{\thinspace}r)} is supplied to the curried function {\tt |
3108 | 818 |
dependent_tr'} by a partial application during its installation. |
819 |
For example, we could set up print translations for both {\tt Pi} and |
|
820 |
{\tt Sigma} by including |
|
323 | 821 |
\begin{ttbox}\index{*ML section} |
822 |
val print_translation = |
|
823 |
[("Pi", dependent_tr' ("{\at}PROD", "{\at}->")), |
|
824 |
("Sigma", dependent_tr' ("{\at}SUM", "{\at}*"))]; |
|
825 |
\end{ttbox} |
|
3108 | 826 |
within the {\tt ML} section. The first of these transforms ${\tt |
827 |
Pi}(A, \mtt{Abs}(x, T, B))$ into $\hbox{\tt{\at}PROD}(x', A, B')$ or |
|
828 |
$\hbox{\tt{\at}->}(A, B)$, choosing the latter form if $B$ does not |
|
829 |
depend on~$x$. It checks this using \ttindex{loose_bnos}, yet another |
|
830 |
function from {\tt Pure/term.ML}. Note that $x'$ is a version of $x$ |
|
831 |
renamed away from all names in $B$, and $B'$ is the body $B$ with {\tt |
|
832 |
Bound} nodes referring to the {\tt Abs} node replaced by |
|
833 |
$\ttfct{Free} (x', \mtt{dummyT})$ (but marked as representing a bound |
|
834 |
variable). |
|
323 | 835 |
|
836 |
We must be careful with types here. While types of {\tt Const}s are |
|
837 |
ignored, type constraints may be printed for some {\tt Free}s and |
|
838 |
{\tt Var}s if \ttindex{show_types} is set to {\tt true}. Variables of type |
|
839 |
\ttindex{dummyT} are never printed with constraint, though. The line |
|
840 |
\begin{ttbox} |
|
3108 | 841 |
let val (x', B') = Syntax.variant_abs' (x, dummyT, B); |
842 |
\end{ttbox}\index{*Syntax.variant_abs'} |
|
323 | 843 |
replaces bound variable occurrences in~$B$ by the free variable $x'$ with |
844 |
type {\tt dummyT}. Only the binding occurrence of~$x'$ is given the |
|
845 |
correct type~{\tt T}, so this is the only place where a type |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
846 |
constraint might appear. |
3108 | 847 |
|
848 |
Also note that we are responsible to mark free identifiers that |
|
3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3135
diff
changeset
|
849 |
actually represent bound variables. This is achieved by |
3108 | 850 |
\ttindex{Syntax.variant_abs'} and \ttindex{Syntax.mark_boundT} above. |
851 |
Failing to do so may cause these names to be printed in the wrong |
|
852 |
style. \index{translations|)} \index{syntax!transformations|)} |
|
853 |
||
854 |
||
855 |
\section{Token translations} \label{sec:tok_tr} |
|
856 |
\index{token translations|(} |
|
857 |
% |
|
858 |
Isabelle's meta-logic features quite a lot of different kinds of |
|
859 |
identifiers, namely {\em class}, {\em tfree}, {\em tvar}, {\em free}, |
|
3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3135
diff
changeset
|
860 |
{\em bound}, {\em var}. One might want to have these printed in |
3108 | 861 |
different styles, e.g.\ in bold or italic, or even transcribed into |
862 |
something more readable like $\alpha, \alpha', \beta$ instead of {\tt |
|
3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3135
diff
changeset
|
863 |
'a}, {\tt 'aa}, {\tt 'b} for type variables. Token translations |
3108 | 864 |
provide a means to such ends, enabling the user to install certain |
865 |
\ML{} functions associated with any logical \rmindex{token class} and |
|
866 |
depending on some \rmindex{print mode}. |
|
867 |
||
868 |
The logical class of identifiers can not necessarily be determined by |
|
3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3135
diff
changeset
|
869 |
its syntactic category, though. For example, consider free vs.\ bound |
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3135
diff
changeset
|
870 |
variables. So Isabelle's pretty printing mechanism, starting from |
3108 | 871 |
fully typed terms, has to be careful to preserve this additional |
872 |
information\footnote{This is done by marking atoms in abstract syntax |
|
3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3135
diff
changeset
|
873 |
trees appropriately. The marks are actually visible by print |
3108 | 874 |
translation functions -- they are just special constants applied to |
875 |
atomic asts, for example \texttt{("_bound" x)}.}. In particular, |
|
876 |
user-supplied print translation functions operating on terms have to |
|
3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3135
diff
changeset
|
877 |
be well-behaved in this respect. Free identifiers introduced to |
3108 | 878 |
represent bound variables have to be marked appropriately (cf.\ the |
879 |
example at the end of \S\ref{sec:tr_funs}). |
|
880 |
||
881 |
\medskip Token translations may be installed by declaring the |
|
6343 | 882 |
\ttindex{token_translation} value within the \texttt{ML} section of a theory |
883 |
definition file: |
|
3108 | 884 |
\begin{ttbox} |
8136 | 885 |
val token_translation: |
886 |
(string * string * (string -> string * real)) list |
|
8701 | 887 |
\end{ttbox} |
6343 | 888 |
The elements of this list are of the form $(m, c, f)$, where $m$ is a print |
889 |
mode identifier, $c$ a token class, and $f\colon string \to string \times |
|
890 |
real$ the actual translation function. Assuming that $x$ is of identifier |
|
891 |
class $c$, and print mode $m$ is the first (active) mode providing some |
|
892 |
translation for $c$, then $x$ is output according to $f(x) = (x', len)$. |
|
893 |
Thereby $x'$ is the modified identifier name and $len$ its visual length in |
|
894 |
terms of characters (e.g.\ length 1.0 would correspond to $1/2$\,em in |
|
895 |
\LaTeX). Thus $x'$ may include non-printing parts like control sequences or |
|
896 |
markup information for typesetting systems. |
|
3108 | 897 |
|
898 |
||
899 |
\index{token translations|)} |
|
5371 | 900 |
|
901 |
||
902 |
%%% Local Variables: |
|
903 |
%%% mode: latex |
|
904 |
%%% TeX-master: "ref" |
|
905 |
%%% End: |