author | wenzelm |
Thu, 20 Nov 1997 15:28:48 +0100 | |
changeset 4256 | e768c42069bb |
parent 3485 | f27a30a18a17 |
child 4375 | ef2a7b589004 |
permissions | -rw-r--r-- |
323 | 1 |
%% $Id$ |
2 |
\chapter{Syntax Transformations} \label{chap:syntax} |
|
3 |
\newcommand\ttapp{\mathrel{\hbox{\tt\$}}} |
|
4 |
\newcommand\mtt[1]{\mbox{\tt #1}} |
|
5 |
\newcommand\ttfct[1]{\mathop{\mtt{#1}}\nolimits} |
|
6 |
\newcommand\Constant{\ttfct{Constant}} |
|
7 |
\newcommand\Variable{\ttfct{Variable}} |
|
8 |
\newcommand\Appl[1]{\ttfct{Appl}\,[#1]} |
|
9 |
\index{syntax!transformations|(} |
|
10 |
||
11 |
This chapter is intended for experienced Isabelle users who need to define |
|
12 |
macros or code their own translation functions. It describes the |
|
13 |
transformations between parse trees, abstract syntax trees and terms. |
|
14 |
||
15 |
||
16 |
\section{Abstract syntax trees} \label{sec:asts} |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
17 |
\index{ASTs|(} |
323 | 18 |
|
19 |
The parser, given a token list from the lexer, applies productions to yield |
|
20 |
a parse tree\index{parse trees}. By applying some internal transformations |
|
21 |
the parse tree becomes an abstract syntax tree, or \AST{}. Macro |
|
22 |
expansion, further translations and finally type inference yields a |
|
23 |
well-typed term. The printing process is the reverse, except for some |
|
24 |
subtleties to be discussed later. |
|
25 |
||
26 |
Figure~\ref{fig:parse_print} outlines the parsing and printing process. |
|
27 |
Much of the complexity is due to the macro mechanism. Using macros, you |
|
28 |
can specify most forms of concrete syntax without writing any \ML{} code. |
|
29 |
||
30 |
\begin{figure} |
|
31 |
\begin{center} |
|
32 |
\begin{tabular}{cl} |
|
33 |
string & \\ |
|
3108 | 34 |
$\downarrow$ & lexer, parser \\ |
323 | 35 |
parse tree & \\ |
36 |
$\downarrow$ & parse \AST{} translation \\ |
|
37 |
\AST{} & \\ |
|
38 |
$\downarrow$ & \AST{} rewriting (macros) \\ |
|
39 |
\AST{} & \\ |
|
40 |
$\downarrow$ & parse translation, type inference \\ |
|
41 |
--- well-typed term --- & \\ |
|
42 |
$\downarrow$ & print translation \\ |
|
43 |
\AST{} & \\ |
|
44 |
$\downarrow$ & \AST{} rewriting (macros) \\ |
|
45 |
\AST{} & \\ |
|
3108 | 46 |
$\downarrow$ & print \AST{} translation, token translation \\ |
323 | 47 |
string & |
48 |
\end{tabular} |
|
49 |
||
50 |
\end{center} |
|
51 |
\caption{Parsing and printing}\label{fig:parse_print} |
|
52 |
\end{figure} |
|
53 |
||
54 |
Abstract syntax trees are an intermediate form between the raw parse trees |
|
55 |
and the typed $\lambda$-terms. An \AST{} is either an atom (constant or |
|
56 |
variable) or a list of {\em at least two\/} subtrees. Internally, they |
|
57 |
have type \mltydx{Syntax.ast}: \index{*Constant} \index{*Variable} |
|
58 |
\index{*Appl} |
|
59 |
\begin{ttbox} |
|
60 |
datatype ast = Constant of string |
|
61 |
| Variable of string |
|
62 |
| Appl of ast list |
|
63 |
\end{ttbox} |
|
64 |
% |
|
65 |
Isabelle uses an S-expression syntax for abstract syntax trees. Constant |
|
66 |
atoms are shown as quoted strings, variable atoms as non-quoted strings and |
|
332 | 67 |
applications as a parenthesised list of subtrees. For example, the \AST |
323 | 68 |
\begin{ttbox} |
69 |
Appl [Constant "_constrain", |
|
70 |
Appl [Constant "_abs", Variable "x", Variable "t"], |
|
71 |
Appl [Constant "fun", Variable "'a", Variable "'b"]] |
|
72 |
\end{ttbox} |
|
73 |
is shown as {\tt ("_constrain" ("_abs" x t) ("fun" 'a 'b))}. |
|
74 |
Both {\tt ()} and {\tt (f)} are illegal because they have too few |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
75 |
subtrees. |
323 | 76 |
|
77 |
The resemblance to Lisp's S-expressions is intentional, but there are two |
|
78 |
kinds of atomic symbols: $\Constant x$ and $\Variable x$. Do not take the |
|
79 |
names {\tt Constant} and {\tt Variable} too literally; in the later |
|
80 |
translation to terms, $\Variable x$ may become a constant, free or bound |
|
81 |
variable, even a type constructor or class name; the actual outcome depends |
|
82 |
on the context. |
|
83 |
||
84 |
Similarly, you can think of ${\tt (} f~x@1~\ldots~x@n{\tt )}$ as the |
|
85 |
application of~$f$ to the arguments $x@1, \ldots, x@n$. But the kind of |
|
86 |
application is determined later by context; it could be a type constructor |
|
87 |
applied to types. |
|
88 |
||
89 |
Forms like {\tt (("_abs" x $t$) $u$)} are legal, but \AST{}s are |
|
90 |
first-order: the {\tt "_abs"} does not bind the {\tt x} in any way. Later |
|
91 |
at the term level, {\tt ("_abs" x $t$)} will become an {\tt Abs} node and |
|
92 |
occurrences of {\tt x} in $t$ will be replaced by bound variables (the term |
|
93 |
constructor \ttindex{Bound}). |
|
94 |
||
95 |
||
96 |
\section{Transforming parse trees to \AST{}s}\label{sec:astofpt} |
|
97 |
\index{ASTs!made from parse trees} |
|
98 |
\newcommand\astofpt[1]{\lbrakk#1\rbrakk} |
|
99 |
||
100 |
The parse tree is the raw output of the parser. Translation functions, |
|
101 |
called {\bf parse AST translations}\indexbold{translations!parse AST}, |
|
102 |
transform the parse tree into an abstract syntax tree. |
|
103 |
||
104 |
The parse tree is constructed by nesting the right-hand sides of the |
|
105 |
productions used to recognize the input. Such parse trees are simply lists |
|
106 |
of tokens and constituent parse trees, the latter representing the |
|
107 |
nonterminals of the productions. Let us refer to the actual productions in |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
108 |
the form displayed by {\tt print_syntax} (see \S\ref{sec:inspct-thy} for an |
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
109 |
example). |
323 | 110 |
|
111 |
Ignoring parse \AST{} translations, parse trees are transformed to \AST{}s |
|
112 |
by stripping out delimiters and copy productions. More precisely, the |
|
113 |
mapping $\astofpt{-}$ is derived from the productions as follows: |
|
114 |
\begin{itemize} |
|
115 |
\item Name tokens: $\astofpt{t} = \Variable s$, where $t$ is an \ndx{id}, |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
116 |
\ndx{var}, \ndx{tid}, \ndx{tvar}, \ndx{xnum} or \ndx{xstr} token, and $s$ |
3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3135
diff
changeset
|
117 |
its associated string. Note that for {\tt xstr} this does not include the |
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
118 |
quotes. |
323 | 119 |
|
120 |
\item Copy productions:\index{productions!copy} |
|
121 |
$\astofpt{\ldots P \ldots} = \astofpt{P}$. Here $\ldots$ stands for |
|
122 |
strings of delimiters, which are discarded. $P$ stands for the single |
|
123 |
constituent that is not a delimiter; it is either a nonterminal symbol or |
|
124 |
a name token. |
|
125 |
||
126 |
\item 0-ary productions: $\astofpt{\ldots \mtt{=>} c} = \Constant c$. |
|
127 |
Here there are no constituents other than delimiters, which are |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
128 |
discarded. |
323 | 129 |
|
130 |
\item $n$-ary productions, where $n \ge 1$: delimiters are discarded and |
|
131 |
the remaining constituents $P@1$, \ldots, $P@n$ are built into an |
|
132 |
application whose head constant is~$c$: |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
133 |
\[ \astofpt{\ldots P@1 \ldots P@n \ldots \mtt{=>} c} = |
323 | 134 |
\Appl{\Constant c, \astofpt{P@1}, \ldots, \astofpt{P@n}} |
135 |
\] |
|
136 |
\end{itemize} |
|
137 |
Figure~\ref{fig:parse_ast} presents some simple examples, where {\tt ==}, |
|
138 |
{\tt _appl}, {\tt _args}, and so forth name productions of the Pure syntax. |
|
139 |
These examples illustrate the need for further translations to make \AST{}s |
|
140 |
closer to the typed $\lambda$-calculus. The Pure syntax provides |
|
141 |
predefined parse \AST{} translations\index{translations!parse AST} for |
|
142 |
ordinary applications, type applications, nested abstractions, meta |
|
143 |
implications and function types. Figure~\ref{fig:parse_ast_tr} shows their |
|
144 |
effect on some representative input strings. |
|
145 |
||
146 |
||
147 |
\begin{figure} |
|
148 |
\begin{center} |
|
149 |
\tt\begin{tabular}{ll} |
|
150 |
\rm input string & \rm \AST \\\hline |
|
151 |
"f" & f \\ |
|
152 |
"'a" & 'a \\ |
|
153 |
"t == u" & ("==" t u) \\ |
|
154 |
"f(x)" & ("_appl" f x) \\ |
|
155 |
"f(x, y)" & ("_appl" f ("_args" x y)) \\ |
|
156 |
"f(x, y, z)" & ("_appl" f ("_args" x ("_args" y z))) \\ |
|
157 |
"\%x y.\ t" & ("_lambda" ("_idts" x y) t) \\ |
|
158 |
\end{tabular} |
|
159 |
\end{center} |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
160 |
\caption{Parsing examples using the Pure syntax}\label{fig:parse_ast} |
323 | 161 |
\end{figure} |
162 |
||
163 |
\begin{figure} |
|
164 |
\begin{center} |
|
165 |
\tt\begin{tabular}{ll} |
|
166 |
\rm input string & \rm \AST{} \\\hline |
|
167 |
"f(x, y, z)" & (f x y z) \\ |
|
168 |
"'a ty" & (ty 'a) \\ |
|
169 |
"('a, 'b) ty" & (ty 'a 'b) \\ |
|
170 |
"\%x y z.\ t" & ("_abs" x ("_abs" y ("_abs" z t))) \\ |
|
171 |
"\%x ::\ 'a.\ t" & ("_abs" ("_constrain" x 'a) t) \\ |
|
172 |
"[| P; Q; R |] => S" & ("==>" P ("==>" Q ("==>" R S))) \\ |
|
173 |
"['a, 'b, 'c] => 'd" & ("fun" 'a ("fun" 'b ("fun" 'c 'd))) |
|
174 |
\end{tabular} |
|
175 |
\end{center} |
|
176 |
\caption{Built-in parse \AST{} translations}\label{fig:parse_ast_tr} |
|
177 |
\end{figure} |
|
178 |
||
179 |
The names of constant heads in the \AST{} control the translation process. |
|
180 |
The list of constants invoking parse \AST{} translations appears in the |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
181 |
output of {\tt print_syntax} under {\tt parse_ast_translation}. |
323 | 182 |
|
183 |
||
184 |
\section{Transforming \AST{}s to terms}\label{sec:termofast} |
|
185 |
\index{terms!made from ASTs} |
|
186 |
\newcommand\termofast[1]{\lbrakk#1\rbrakk} |
|
187 |
||
188 |
The \AST{}, after application of macros (see \S\ref{sec:macros}), is |
|
189 |
transformed into a term. This term is probably ill-typed since type |
|
190 |
inference has not occurred yet. The term may contain type constraints |
|
191 |
consisting of applications with head {\tt "_constrain"}; the second |
|
192 |
argument is a type encoded as a term. Type inference later introduces |
|
193 |
correct types or rejects the input. |
|
194 |
||
195 |
Another set of translation functions, namely parse |
|
196 |
translations\index{translations!parse}, may affect this process. If we |
|
197 |
ignore parse translations for the time being, then \AST{}s are transformed |
|
198 |
to terms by mapping \AST{} constants to constants, \AST{} variables to |
|
199 |
schematic or free variables and \AST{} applications to applications. |
|
200 |
||
201 |
More precisely, the mapping $\termofast{-}$ is defined by |
|
202 |
\begin{itemize} |
|
203 |
\item Constants: $\termofast{\Constant x} = \ttfct{Const} (x, |
|
204 |
\mtt{dummyT})$. |
|
205 |
||
206 |
\item Schematic variables: $\termofast{\Variable \mtt{"?}xi\mtt"} = |
|
207 |
\ttfct{Var} ((x, i), \mtt{dummyT})$, where $x$ is the base name and $i$ |
|
208 |
the index extracted from~$xi$. |
|
209 |
||
210 |
\item Free variables: $\termofast{\Variable x} = \ttfct{Free} (x, |
|
211 |
\mtt{dummyT})$. |
|
212 |
||
213 |
\item Function applications with $n$ arguments: |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
214 |
\[ \termofast{\Appl{f, x@1, \ldots, x@n}} = |
323 | 215 |
\termofast{f} \ttapp |
216 |
\termofast{x@1} \ttapp \ldots \ttapp \termofast{x@n} |
|
217 |
\] |
|
218 |
\end{itemize} |
|
219 |
Here \ttindex{Const}, \ttindex{Var}, \ttindex{Free} and |
|
220 |
\verb|$|\index{$@{\tt\$}} are constructors of the datatype \mltydx{term}, |
|
221 |
while \ttindex{dummyT} stands for some dummy type that is ignored during |
|
222 |
type inference. |
|
223 |
||
224 |
So far the outcome is still a first-order term. Abstractions and bound |
|
225 |
variables (constructors \ttindex{Abs} and \ttindex{Bound}) are introduced |
|
226 |
by parse translations. Such translations are attached to {\tt "_abs"}, |
|
227 |
{\tt "!!"} and user-defined binders. |
|
228 |
||
229 |
||
230 |
\section{Printing of terms} |
|
231 |
\newcommand\astofterm[1]{\lbrakk#1\rbrakk}\index{ASTs!made from terms} |
|
232 |
||
233 |
The output phase is essentially the inverse of the input phase. Terms are |
|
234 |
translated via abstract syntax trees into strings. Finally the strings are |
|
235 |
pretty printed. |
|
236 |
||
237 |
Print translations (\S\ref{sec:tr_funs}) may affect the transformation of |
|
238 |
terms into \AST{}s. Ignoring those, the transformation maps |
|
239 |
term constants, variables and applications to the corresponding constructs |
|
240 |
on \AST{}s. Abstractions are mapped to applications of the special |
|
241 |
constant {\tt _abs}. |
|
242 |
||
243 |
More precisely, the mapping $\astofterm{-}$ is defined as follows: |
|
244 |
\begin{itemize} |
|
245 |
\item $\astofterm{\ttfct{Const} (x, \tau)} = \Constant x$. |
|
246 |
||
247 |
\item $\astofterm{\ttfct{Free} (x, \tau)} = constrain (\Variable x, |
|
248 |
\tau)$. |
|
249 |
||
250 |
\item $\astofterm{\ttfct{Var} ((x, i), \tau)} = constrain (\Variable |
|
251 |
\mtt{"?}xi\mtt", \tau)$, where $\mtt?xi$ is the string representation of |
|
252 |
the {\tt indexname} $(x, i)$. |
|
253 |
||
254 |
\item For the abstraction $\lambda x::\tau.t$, let $x'$ be a variant |
|
255 |
of~$x$ renamed to differ from all names occurring in~$t$, and let $t'$ |
|
256 |
be obtained from~$t$ by replacing all bound occurrences of~$x$ by |
|
257 |
the free variable $x'$. This replaces corresponding occurrences of the |
|
258 |
constructor \ttindex{Bound} by the term $\ttfct{Free} (x', |
|
259 |
\mtt{dummyT})$: |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
260 |
\[ \astofterm{\ttfct{Abs} (x, \tau, t)} = |
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
261 |
\Appl{\Constant \mtt{"_abs"}, |
323 | 262 |
constrain(\Variable x', \tau), \astofterm{t'}}. |
263 |
\] |
|
264 |
||
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
265 |
\item $\astofterm{\ttfct{Bound} i} = \Variable \mtt{"B.}i\mtt"$. |
323 | 266 |
The occurrence of constructor \ttindex{Bound} should never happen |
267 |
when printing well-typed terms; it indicates a de Bruijn index with no |
|
268 |
matching abstraction. |
|
269 |
||
270 |
\item Where $f$ is not an application, |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
271 |
\[ \astofterm{f \ttapp x@1 \ttapp \ldots \ttapp x@n} = |
323 | 272 |
\Appl{\astofterm{f}, \astofterm{x@1}, \ldots,\astofterm{x@n}} |
273 |
\] |
|
274 |
\end{itemize} |
|
275 |
% |
|
332 | 276 |
Type constraints\index{type constraints} are inserted to allow the printing |
277 |
of types. This is governed by the boolean variable \ttindex{show_types}: |
|
323 | 278 |
\begin{itemize} |
279 |
\item $constrain(x, \tau) = x$ \ if $\tau = \mtt{dummyT}$ \index{*dummyT} or |
|
332 | 280 |
\ttindex{show_types} is set to {\tt false}. |
323 | 281 |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
282 |
\item $constrain(x, \tau) = \Appl{\Constant \mtt{"_constrain"}, x, |
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
283 |
\astofterm{\tau}}$ \ otherwise. |
323 | 284 |
|
285 |
Here, $\astofterm{\tau}$ is the \AST{} encoding of $\tau$: type |
|
286 |
constructors go to {\tt Constant}s; type identifiers go to {\tt |
|
287 |
Variable}s; type applications go to {\tt Appl}s with the type |
|
288 |
constructor as the first element. If \ttindex{show_sorts} is set to |
|
289 |
{\tt true}, some type variables are decorated with an \AST{} encoding |
|
290 |
of their sort. |
|
291 |
\end{itemize} |
|
292 |
% |
|
293 |
The \AST{}, after application of macros (see \S\ref{sec:macros}), is |
|
294 |
transformed into the final output string. The built-in {\bf print AST |
|
332 | 295 |
translations}\indexbold{translations!print AST} reverse the |
323 | 296 |
parse \AST{} translations of Fig.\ts\ref{fig:parse_ast_tr}. |
297 |
||
298 |
For the actual printing process, the names attached to productions |
|
299 |
of the form $\ldots A^{(p@1)}@1 \ldots A^{(p@n)}@n \ldots \mtt{=>} c$ play |
|
300 |
a vital role. Each \AST{} with constant head $c$, namely $\mtt"c\mtt"$ or |
|
301 |
$(\mtt"c\mtt"~ x@1 \ldots x@n)$, is printed according to the production |
|
302 |
for~$c$. Each argument~$x@i$ is converted to a string, and put in |
|
303 |
parentheses if its priority~$(p@i)$ requires this. The resulting strings |
|
304 |
and their syntactic sugar (denoted by \dots{} above) are joined to make a |
|
305 |
single string. |
|
306 |
||
3108 | 307 |
If an application $(\mtt"c\mtt"~ x@1 \ldots x@m)$ has more arguments |
308 |
than the corresponding production, it is first split into |
|
309 |
$((\mtt"c\mtt"~ x@1 \ldots x@n) ~ x@{n+1} \ldots x@m)$. Applications |
|
310 |
with too few arguments or with non-constant head or without a |
|
311 |
corresponding production are printed as $f(x@1, \ldots, x@l)$ or |
|
312 |
$(\alpha@1, \ldots, \alpha@l) ty$. Multiple productions associated |
|
313 |
with some name $c$ are tried in order of appearance. An occurrence of |
|
323 | 314 |
$\Variable x$ is simply printed as~$x$. |
315 |
||
316 |
Blanks are {\em not\/} inserted automatically. If blanks are required to |
|
317 |
separate tokens, specify them in the mixfix declaration, possibly preceded |
|
318 |
by a slash~({\tt/}) to allow a line break. |
|
319 |
\index{ASTs|)} |
|
320 |
||
321 |
||
322 |
||
3108 | 323 |
\section{Macros: syntactic rewriting} \label{sec:macros} |
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
324 |
\index{macros|(}\index{rewriting!syntactic|(} |
323 | 325 |
|
326 |
Mixfix declarations alone can handle situations where there is a direct |
|
327 |
connection between the concrete syntax and the underlying term. Sometimes |
|
328 |
we require a more elaborate concrete syntax, such as quantifiers and list |
|
329 |
notation. Isabelle's {\bf macros} and {\bf translation functions} can |
|
330 |
perform translations such as |
|
331 |
\begin{center}\tt |
|
332 |
\begin{tabular}{r@{$\quad\protect\rightleftharpoons\quad$}l} |
|
333 |
ALL x:A.P & Ball(A, \%x.P) \\ \relax |
|
334 |
[x, y, z] & Cons(x, Cons(y, Cons(z, Nil))) |
|
335 |
\end{tabular} |
|
336 |
\end{center} |
|
337 |
Translation functions (see \S\ref{sec:tr_funs}) must be coded in ML; they |
|
338 |
are the most powerful translation mechanism but are difficult to read or |
|
339 |
write. Macros are specified by first-order rewriting systems that operate |
|
340 |
on abstract syntax trees. They are usually easy to read and write, and can |
|
341 |
express all but the most obscure translations. |
|
342 |
||
3108 | 343 |
Figure~\ref{fig:set_trans} defines a fragment of first-order logic and |
344 |
set theory.\footnote{This and the following theories are complete |
|
345 |
working examples, though they specify only syntax, no axioms. The |
|
346 |
file {\tt ZF/ZF.thy} presents a full set theory definition, |
|
347 |
including many macro rules.} Theory {\tt SetSyntax} declares |
|
348 |
constants for set comprehension ({\tt Collect}), replacement ({\tt |
|
349 |
Replace}) and bounded universal quantification ({\tt Ball}). Each |
|
350 |
of these binds some variables. Without additional syntax we should |
|
351 |
have to write $\forall x \in A. P$ as {\tt Ball(A,\%x.P)}, and |
|
352 |
similarly for the others. |
|
323 | 353 |
|
354 |
\begin{figure} |
|
355 |
\begin{ttbox} |
|
3108 | 356 |
SetSyntax = Pure + |
323 | 357 |
types |
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
358 |
i o |
323 | 359 |
arities |
360 |
i, o :: logic |
|
361 |
consts |
|
1387 | 362 |
Trueprop :: o => prop ("_" 5) |
363 |
Collect :: [i, i => o] => i |
|
364 |
Replace :: [i, [i, i] => o] => i |
|
365 |
Ball :: [i, i => o] => o |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
366 |
syntax |
1387 | 367 |
"{\at}Collect" :: [idt, i, o] => i ("(1{\ttlbrace}_:_./ _{\ttrbrace})") |
368 |
"{\at}Replace" :: [idt, idt, i, o] => i ("(1{\ttlbrace}_./ _:_, _{\ttrbrace})") |
|
369 |
"{\at}Ball" :: [idt, i, o] => o ("(3ALL _:_./ _)" 10) |
|
323 | 370 |
translations |
371 |
"{\ttlbrace}x:A. P{\ttrbrace}" == "Collect(A, \%x. P)" |
|
372 |
"{\ttlbrace}y. x:A, Q{\ttrbrace}" == "Replace(A, \%x y. Q)" |
|
373 |
"ALL x:A. P" == "Ball(A, \%x. P)" |
|
374 |
end |
|
375 |
\end{ttbox} |
|
376 |
\caption{Macro example: set theory}\label{fig:set_trans} |
|
377 |
\end{figure} |
|
378 |
||
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
379 |
The theory specifies a variable-binding syntax through additional productions |
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
380 |
that have mixfix declarations. Each non-copy production must specify some |
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
381 |
constant, which is used for building \AST{}s. \index{constants!syntactic} The |
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
382 |
additional constants are decorated with {\tt\at} to stress their purely |
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
383 |
syntactic purpose; they may not occur within the final well-typed terms, |
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
384 |
being declared as {\tt syntax} rather than {\tt consts}. |
323 | 385 |
|
386 |
The translations cause the replacement of external forms by internal forms |
|
387 |
after parsing, and vice versa before printing of terms. As a specification |
|
388 |
of the set theory notation, they should be largely self-explanatory. The |
|
389 |
syntactic constants, {\tt\at Collect}, {\tt\at Replace} and {\tt\at Ball}, |
|
390 |
appear implicitly in the macro rules via their mixfix forms. |
|
391 |
||
392 |
Macros can define variable-binding syntax because they operate on \AST{}s, |
|
393 |
which have no inbuilt notion of bound variable. The macro variables {\tt |
|
394 |
x} and~{\tt y} have type~{\tt idt} and therefore range over identifiers, |
|
395 |
in this case bound variables. The macro variables {\tt P} and~{\tt Q} |
|
396 |
range over formulae containing bound variable occurrences. |
|
397 |
||
398 |
Other applications of the macro system can be less straightforward, and |
|
399 |
there are peculiarities. The rest of this section will describe in detail |
|
400 |
how Isabelle macros are preprocessed and applied. |
|
401 |
||
402 |
||
403 |
\subsection{Specifying macros} |
|
404 |
Macros are basically rewrite rules on \AST{}s. But unlike other macro |
|
405 |
systems found in programming languages, Isabelle's macros work in both |
|
406 |
directions. Therefore a syntax contains two lists of rewrites: one for |
|
407 |
parsing and one for printing. |
|
408 |
||
409 |
\index{*translations section} |
|
410 |
The {\tt translations} section specifies macros. The syntax for a macro is |
|
411 |
\[ (root)\; string \quad |
|
412 |
\left\{\begin{array}[c]{c} \mtt{=>} \\ \mtt{<=} \\ \mtt{==} \end{array} |
|
413 |
\right\} \quad |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
414 |
(root)\; string |
323 | 415 |
\] |
416 |
% |
|
417 |
This specifies a parse rule ({\tt =>}), a print rule ({\tt <=}), or both |
|
418 |
({\tt ==}). The two strings specify the left and right-hand sides of the |
|
419 |
macro rule. The $(root)$ specification is optional; it specifies the |
|
420 |
nonterminal for parsing the $string$ and if omitted defaults to {\tt |
|
421 |
logic}. \AST{} rewrite rules $(l, r)$ must obey certain conditions: |
|
422 |
\begin{itemize} |
|
423 |
\item Rules must be left linear: $l$ must not contain repeated variables. |
|
424 |
||
425 |
\item Rules must have constant heads, namely $l = \mtt"c\mtt"$ or $l = |
|
426 |
(\mtt"c\mtt" ~ x@1 \ldots x@n)$. |
|
427 |
||
428 |
\item Every variable in~$r$ must also occur in~$l$. |
|
429 |
\end{itemize} |
|
430 |
||
3108 | 431 |
Macro rules may refer to any syntax from the parent theories. They |
432 |
may also refer to anything defined before the current {\tt |
|
323 | 433 |
translations} section --- including any mixfix declarations. |
434 |
||
435 |
Upon declaration, both sides of the macro rule undergo parsing and parse |
|
436 |
\AST{} translations (see \S\ref{sec:asts}), but do not themselves undergo |
|
437 |
macro expansion. The lexer runs in a different mode that additionally |
|
438 |
accepts identifiers of the form $\_~letter~quasiletter^*$ (like {\tt _idt}, |
|
439 |
{\tt _K}). Thus, a constant whose name starts with an underscore can |
|
440 |
appear in macro rules but not in ordinary terms. |
|
441 |
||
442 |
Some atoms of the macro rule's \AST{} are designated as constants for |
|
443 |
matching. These are all names that have been declared as classes, types or |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
444 |
constants (logical and syntactic). |
323 | 445 |
|
3108 | 446 |
The result of this preprocessing is two lists of macro rules, each |
447 |
stored as a pair of \AST{}s. They can be viewed using {\tt |
|
448 |
print_syntax} (sections \ttindex{parse_rules} and |
|
449 |
\ttindex{print_rules}). For theory~{\tt SetSyntax} of |
|
450 |
Fig.~\ref{fig:set_trans} these are |
|
323 | 451 |
\begin{ttbox} |
452 |
parse_rules: |
|
453 |
("{\at}Collect" x A P) -> ("Collect" A ("_abs" x P)) |
|
454 |
("{\at}Replace" y x A Q) -> ("Replace" A ("_abs" x ("_abs" y Q))) |
|
455 |
("{\at}Ball" x A P) -> ("Ball" A ("_abs" x P)) |
|
456 |
print_rules: |
|
457 |
("Collect" A ("_abs" x P)) -> ("{\at}Collect" x A P) |
|
458 |
("Replace" A ("_abs" x ("_abs" y Q))) -> ("{\at}Replace" y x A Q) |
|
459 |
("Ball" A ("_abs" x P)) -> ("{\at}Ball" x A P) |
|
460 |
\end{ttbox} |
|
461 |
||
462 |
\begin{warn} |
|
463 |
Avoid choosing variable names that have previously been used as |
|
464 |
constants, types or type classes; the {\tt consts} section in the output |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
465 |
of {\tt print_syntax} lists all such names. If a macro rule works |
323 | 466 |
incorrectly, inspect its internal form as shown above, recalling that |
467 |
constants appear as quoted strings and variables without quotes. |
|
468 |
\end{warn} |
|
469 |
||
470 |
\begin{warn} |
|
471 |
If \ttindex{eta_contract} is set to {\tt true}, terms will be |
|
472 |
$\eta$-contracted {\em before\/} the \AST{} rewriter sees them. Thus some |
|
473 |
abstraction nodes needed for print rules to match may vanish. For example, |
|
332 | 474 |
\verb|Ball(A, %x. P(x))| contracts to {\tt Ball(A, P)}; the print rule does |
323 | 475 |
not apply and the output will be {\tt Ball(A, P)}. This problem would not |
476 |
occur if \ML{} translation functions were used instead of macros (as is |
|
477 |
done for binder declarations). |
|
478 |
\end{warn} |
|
479 |
||
480 |
||
481 |
\begin{warn} |
|
482 |
Another trap concerns type constraints. If \ttindex{show_types} is set to |
|
483 |
{\tt true}, bound variables will be decorated by their meta types at the |
|
484 |
binding place (but not at occurrences in the body). Matching with |
|
485 |
\verb|Collect(A, %x. P)| binds {\tt x} to something like {\tt ("_constrain" y |
|
486 |
"i")} rather than only {\tt y}. \AST{} rewriting will cause the constraint to |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
487 |
appear in the external form, say \verb|{y::i:A::i. P::o}|. |
323 | 488 |
|
489 |
To allow such constraints to be re-read, your syntax should specify bound |
|
490 |
variables using the nonterminal~\ndx{idt}. This is the case in our |
|
3108 | 491 |
example. Choosing {\tt id} instead of {\tt idt} is a common error. |
323 | 492 |
\end{warn} |
493 |
||
494 |
||
495 |
||
496 |
\subsection{Applying rules} |
|
497 |
As a term is being parsed or printed, an \AST{} is generated as an |
|
498 |
intermediate form (recall Fig.\ts\ref{fig:parse_print}). The \AST{} is |
|
332 | 499 |
normalised by applying macro rules in the manner of a traditional term |
323 | 500 |
rewriting system. We first examine how a single rule is applied. |
501 |
||
332 | 502 |
Let $t$ be the abstract syntax tree to be normalised and $(l, r)$ some |
323 | 503 |
translation rule. A subtree~$u$ of $t$ is a {\bf redex} if it is an |
504 |
instance of~$l$; in this case $l$ is said to {\bf match}~$u$. A redex |
|
505 |
matched by $l$ may be replaced by the corresponding instance of~$r$, thus |
|
506 |
{\bf rewriting} the \AST~$t$. Matching requires some notion of {\bf |
|
507 |
place-holders} that may occur in rule patterns but not in ordinary |
|
508 |
\AST{}s; {\tt Variable} atoms serve this purpose. |
|
509 |
||
510 |
The matching of the object~$u$ by the pattern~$l$ is performed as follows: |
|
511 |
\begin{itemize} |
|
512 |
\item Every constant matches itself. |
|
513 |
||
514 |
\item $\Variable x$ in the object matches $\Constant x$ in the pattern. |
|
515 |
This point is discussed further below. |
|
516 |
||
517 |
\item Every \AST{} in the object matches $\Variable x$ in the pattern, |
|
518 |
binding~$x$ to~$u$. |
|
519 |
||
520 |
\item One application matches another if they have the same number of |
|
521 |
subtrees and corresponding subtrees match. |
|
522 |
||
523 |
\item In every other case, matching fails. In particular, {\tt |
|
524 |
Constant}~$x$ can only match itself. |
|
525 |
\end{itemize} |
|
526 |
A successful match yields a substitution that is applied to~$r$, generating |
|
527 |
the instance that replaces~$u$. |
|
528 |
||
529 |
The second case above may look odd. This is where {\tt Variable}s of |
|
530 |
non-rule \AST{}s behave like {\tt Constant}s. Recall that \AST{}s are not |
|
531 |
far removed from parse trees; at this level it is not yet known which |
|
532 |
identifiers will become constants, bounds, frees, types or classes. As |
|
533 |
\S\ref{sec:asts} describes, former parse tree heads appear in \AST{}s as |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
534 |
{\tt Constant}s, while the name tokens \ndx{id}, \ndx{var}, \ndx{tid}, |
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
535 |
\ndx{tvar}, \ndx{xnum} and \ndx{xstr} become {\tt Variable}s. On the other |
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
536 |
hand, when \AST{}s generated from terms for printing, all constants and type |
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
537 |
constructors become {\tt Constant}s; see \S\ref{sec:asts}. Thus \AST{}s may |
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
538 |
contain a messy mixture of {\tt Variable}s and {\tt Constant}s. This is |
323 | 539 |
insignificant at macro level because matching treats them alike. |
540 |
||
541 |
Because of this behaviour, different kinds of atoms with the same name are |
|
542 |
indistinguishable, which may make some rules prone to misbehaviour. Example: |
|
543 |
\begin{ttbox} |
|
544 |
types |
|
545 |
Nil |
|
546 |
consts |
|
1387 | 547 |
Nil :: 'a list |
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
548 |
syntax |
1387 | 549 |
"[]" :: 'a list ("[]") |
323 | 550 |
translations |
551 |
"[]" == "Nil" |
|
552 |
\end{ttbox} |
|
553 |
The term {\tt Nil} will be printed as {\tt []}, just as expected. |
|
554 |
The term \verb|%Nil.t| will be printed as \verb|%[].t|, which might not be |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
555 |
expected! Guess how type~{\tt Nil} is printed? |
323 | 556 |
|
3108 | 557 |
Normalizing an \AST{} involves repeatedly applying macro rules until |
558 |
none are applicable. Macro rules are chosen in order of appearance in |
|
559 |
the theory definitions. You can watch the normalization of \AST{}s |
|
560 |
during parsing and printing by setting \ttindex{Syntax.trace_norm_ast} |
|
561 |
to {\tt true}.\index{tracing!of macros} Alternatively, use |
|
323 | 562 |
\ttindex{Syntax.test_read}. The information displayed when tracing |
3108 | 563 |
includes the \AST{} before normalization ({\tt pre}), redexes with |
564 |
results ({\tt rewrote}), the normal form finally reached ({\tt post}) |
|
565 |
and some statistics ({\tt normalize}). If tracing is off, |
|
566 |
\ttindex{Syntax.stat_norm_ast} can be set to {\tt true} in order to |
|
567 |
enable printing of the normal form and statistics only. |
|
323 | 568 |
|
569 |
||
570 |
\subsection{Example: the syntax of finite sets} |
|
571 |
\index{examples!of macros} |
|
572 |
||
573 |
This example demonstrates the use of recursive macros to implement a |
|
574 |
convenient notation for finite sets. |
|
575 |
\index{*empty constant}\index{*insert constant}\index{{}@\verb'{}' symbol} |
|
576 |
\index{"@Enum@{\tt\at Enum} constant} |
|
577 |
\index{"@Finset@{\tt\at Finset} constant} |
|
578 |
\begin{ttbox} |
|
3108 | 579 |
FinSyntax = SetSyntax + |
323 | 580 |
types |
581 |
is |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
582 |
syntax |
1387 | 583 |
"" :: i => is ("_") |
584 |
"{\at}Enum" :: [i, is] => is ("_,/ _") |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
585 |
consts |
1387 | 586 |
empty :: i ("{\ttlbrace}{\ttrbrace}") |
587 |
insert :: [i, i] => i |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
588 |
syntax |
1387 | 589 |
"{\at}Finset" :: is => i ("{\ttlbrace}(_){\ttrbrace}") |
323 | 590 |
translations |
591 |
"{\ttlbrace}x, xs{\ttrbrace}" == "insert(x, {\ttlbrace}xs{\ttrbrace})" |
|
592 |
"{\ttlbrace}x{\ttrbrace}" == "insert(x, {\ttlbrace}{\ttrbrace})" |
|
593 |
end |
|
594 |
\end{ttbox} |
|
595 |
Finite sets are internally built up by {\tt empty} and {\tt insert}. The |
|
596 |
declarations above specify \verb|{x, y, z}| as the external representation |
|
597 |
of |
|
598 |
\begin{ttbox} |
|
599 |
insert(x, insert(y, insert(z, empty))) |
|
600 |
\end{ttbox} |
|
601 |
The nonterminal symbol~\ndx{is} stands for one or more objects of type~{\tt |
|
602 |
i} separated by commas. The mixfix declaration \hbox{\verb|"_,/ _"|} |
|
603 |
allows a line break after the comma for \rmindex{pretty printing}; if no |
|
604 |
line break is required then a space is printed instead. |
|
605 |
||
606 |
The nonterminal is declared as the type~{\tt is}, but with no {\tt arities} |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
607 |
declaration. Hence {\tt is} is not a logical type and may be used safely as |
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
608 |
a new nonterminal for custom syntax. The nonterminal~{\tt is} can later be |
3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3135
diff
changeset
|
609 |
re-used for other enumerations of type~{\tt i} like lists or tuples. If we |
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
610 |
had needed polymorphic enumerations, we could have used the predefined |
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
611 |
nonterminal symbol \ndx{args} and skipped this part altogether. |
323 | 612 |
|
613 |
\index{"@Finset@{\tt\at Finset} constant} |
|
614 |
Next follows {\tt empty}, which is already equipped with its syntax |
|
615 |
\verb|{}|, and {\tt insert} without concrete syntax. The syntactic |
|
616 |
constant {\tt\at Finset} provides concrete syntax for enumerations of~{\tt |
|
617 |
i} enclosed in curly braces. Remember that a pair of parentheses, as in |
|
618 |
\verb|"{(_)}"|, specifies a block of indentation for pretty printing. |
|
619 |
||
620 |
The translations may look strange at first. Macro rules are best |
|
621 |
understood in their internal forms: |
|
622 |
\begin{ttbox} |
|
623 |
parse_rules: |
|
624 |
("{\at}Finset" ("{\at}Enum" x xs)) -> ("insert" x ("{\at}Finset" xs)) |
|
625 |
("{\at}Finset" x) -> ("insert" x "empty") |
|
626 |
print_rules: |
|
627 |
("insert" x ("{\at}Finset" xs)) -> ("{\at}Finset" ("{\at}Enum" x xs)) |
|
628 |
("insert" x "empty") -> ("{\at}Finset" x) |
|
629 |
\end{ttbox} |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
630 |
This shows that \verb|{x,xs}| indeed matches any set enumeration of at least |
323 | 631 |
two elements, binding the first to {\tt x} and the rest to {\tt xs}. |
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
632 |
Likewise, \verb|{xs}| and \verb|{x}| represent any set enumeration. |
323 | 633 |
The parse rules only work in the order given. |
634 |
||
635 |
\begin{warn} |
|
332 | 636 |
The \AST{} rewriter cannot distinguish constants from variables and looks |
323 | 637 |
only for names of atoms. Thus the names of {\tt Constant}s occurring in |
638 |
the (internal) left-hand side of translation rules should be regarded as |
|
639 |
\rmindex{reserved words}. Choose non-identifiers like {\tt\at Finset} or |
|
640 |
sufficiently long and strange names. If a bound variable's name gets |
|
641 |
rewritten, the result will be incorrect; for example, the term |
|
642 |
\begin{ttbox} |
|
643 |
\%empty insert. insert(x, empty) |
|
644 |
\end{ttbox} |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
645 |
\par\noindent is incorrectly printed as \verb|%empty insert. {x}|. |
323 | 646 |
\end{warn} |
647 |
||
648 |
||
649 |
\subsection{Example: a parse macro for dependent types}\label{prod_trans} |
|
650 |
\index{examples!of macros} |
|
651 |
||
652 |
As stated earlier, a macro rule may not introduce new {\tt Variable}s on |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
653 |
the right-hand side. Something like \verb|"K(B)" => "%x.B"| is illegal; |
323 | 654 |
if allowed, it could cause variable capture. In such cases you usually |
655 |
must fall back on translation functions. But a trick can make things |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
656 |
readable in some cases: {\em calling\/} translation functions by parse |
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
657 |
macros: |
323 | 658 |
\begin{ttbox} |
3135 | 659 |
ProdSyntax = SetSyntax + |
323 | 660 |
consts |
1387 | 661 |
Pi :: [i, i => i] => i |
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
662 |
syntax |
1387 | 663 |
"{\at}PROD" :: [idt, i, i] => i ("(3PROD _:_./ _)" 10) |
664 |
"{\at}->" :: [i, i] => i ("(_ ->/ _)" [51, 50] 50) |
|
323 | 665 |
\ttbreak |
666 |
translations |
|
667 |
"PROD x:A. B" => "Pi(A, \%x. B)" |
|
668 |
"A -> B" => "Pi(A, _K(B))" |
|
669 |
end |
|
670 |
ML |
|
671 |
val print_translation = [("Pi", dependent_tr' ("{\at}PROD", "{\at}->"))]; |
|
672 |
\end{ttbox} |
|
673 |
||
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
674 |
Here {\tt Pi} is a logical constant for constructing general products. |
323 | 675 |
Two external forms exist: the general case {\tt PROD x:A.B} and the |
676 |
function space {\tt A -> B}, which abbreviates \verb|Pi(A, %x.B)| when {\tt B} |
|
677 |
does not depend on~{\tt x}. |
|
678 |
||
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
679 |
The second parse macro introduces {\tt _K(B)}, which later becomes |
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
680 |
\verb|%x.B| due to a parse translation associated with \cdx{_K}. |
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
681 |
Unfortunately there is no such trick for printing, so we have to add a {\tt |
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
682 |
ML} section for the print translation \ttindex{dependent_tr'}. |
323 | 683 |
|
684 |
Recall that identifiers with a leading {\tt _} are allowed in translation |
|
685 |
rules, but not in ordinary terms. Thus we can create \AST{}s containing |
|
686 |
names that are not directly expressible. |
|
687 |
||
688 |
The parse translation for {\tt _K} is already installed in Pure, and {\tt |
|
689 |
dependent_tr'} is exported by the syntax module for public use. See |
|
690 |
\S\ref{sec:tr_funs} below for more of the arcane lore of translation functions. |
|
691 |
\index{macros|)}\index{rewriting!syntactic|)} |
|
692 |
||
693 |
||
694 |
\section{Translation functions} \label{sec:tr_funs} |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
695 |
\index{translations|(} |
323 | 696 |
% |
697 |
This section describes the translation function mechanism. By writing |
|
3108 | 698 |
\ML{} functions, you can do almost everything to terms or \AST{}s |
699 |
during parsing and printing. The logic \LK\ is a good example of |
|
700 |
sophisticated transformations between internal and external |
|
701 |
representations of sequents; here, macros would be useless. |
|
323 | 702 |
|
703 |
A full understanding of translations requires some familiarity |
|
704 |
with Isabelle's internals, especially the datatypes {\tt term}, {\tt typ}, |
|
705 |
{\tt Syntax.ast} and the encodings of types and terms as such at the various |
|
706 |
stages of the parsing or printing process. Most users should never need to |
|
707 |
use translation functions. |
|
708 |
||
709 |
\subsection{Declaring translation functions} |
|
3108 | 710 |
There are four kinds of translation functions, with one of these |
711 |
coming in two variants. Each such function is associated with a name, |
|
712 |
which triggers calls to it. Such names can be constants (logical or |
|
713 |
syntactic) or type constructors. |
|
323 | 714 |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
715 |
{\tt print_syntax} displays the sets of names associated with the |
3108 | 716 |
translation functions of a theory under |
323 | 717 |
\ttindex{parse_ast_translation}, \ttindex{parse_translation}, |
3108 | 718 |
\ttindex{print_translation} and \ttindex{print_ast_translation}. You |
719 |
can add new ones via the {\tt ML} section\index{*ML section} of a |
|
720 |
theory definition file. There may never be more than one function of |
|
721 |
the same kind per name. Even though the {\tt ML} section is the very |
|
722 |
last part of the file, newly installed translation functions are |
|
723 |
already effective when processing all of the preceding sections. |
|
323 | 724 |
|
3108 | 725 |
The {\tt ML} section's contents are simply copied verbatim near the |
726 |
beginning of the \ML\ file generated from a theory definition file. |
|
727 |
Definitions made here are accessible as components of an \ML\ |
|
728 |
structure; to make some parts private, use an \ML{} {\tt local} |
|
729 |
declaration. The {\ML} code may install translation functions by |
|
730 |
declaring any of the following identifiers: |
|
323 | 731 |
\begin{ttbox} |
3108 | 732 |
val parse_ast_translation : (string * (ast list -> ast)) list |
733 |
val print_ast_translation : (string * (ast list -> ast)) list |
|
734 |
val parse_translation : (string * (term list -> term)) list |
|
735 |
val print_translation : (string * (term list -> term)) list |
|
736 |
val typed_print_translation : (string * (typ -> term list -> term)) list |
|
323 | 737 |
\end{ttbox} |
738 |
||
739 |
\subsection{The translation strategy} |
|
3108 | 740 |
The different kinds of translation functions are called during the |
741 |
transformations between parse trees, \AST{}s and terms (recall |
|
742 |
Fig.\ts\ref{fig:parse_print}). Whenever a combination of the form |
|
743 |
$(\mtt"c\mtt"~x@1 \ldots x@n)$ is encountered, and a translation |
|
744 |
function $f$ of appropriate kind exists for $c$, the result is |
|
745 |
computed by the \ML{} function call $f \mtt[ x@1, \ldots, x@n \mtt]$. |
|
323 | 746 |
|
3108 | 747 |
For \AST{} translations, the arguments $x@1, \ldots, x@n$ are \AST{}s. |
748 |
A combination has the form $\Constant c$ or $\Appl{\Constant c, x@1, |
|
749 |
\ldots, x@n}$. For term translations, the arguments are terms and a |
|
750 |
combination has the form $\ttfct{Const} (c, \tau)$ or $\ttfct{Const} |
|
751 |
(c, \tau) \ttapp x@1 \ttapp \ldots \ttapp x@n$. Terms allow more |
|
752 |
sophisticated transformations than \AST{}s do, typically involving |
|
753 |
abstractions and bound variables. {\em Typed} print translations may |
|
754 |
even peek at the type $\tau$ of the constant they are invoked on. |
|
323 | 755 |
|
3108 | 756 |
Regardless of whether they act on terms or \AST{}s, translation |
757 |
functions called during the parsing process differ from those for |
|
758 |
printing more fundamentally in their overall behaviour: |
|
323 | 759 |
\begin{description} |
760 |
\item[Parse translations] are applied bottom-up. The arguments are already |
|
761 |
in translated form. The translations must not fail; exceptions trigger |
|
762 |
an error message. |
|
763 |
||
764 |
\item[Print translations] are applied top-down. They are supplied with |
|
765 |
arguments that are partly still in internal form. The result again |
|
766 |
undergoes translation; therefore a print translation should not introduce |
|
767 |
as head the very constant that invoked it. The function may raise |
|
768 |
exception \xdx{Match} to indicate failure; in this event it has no |
|
769 |
effect. |
|
770 |
\end{description} |
|
771 |
||
772 |
Only constant atoms --- constructor \ttindex{Constant} for \AST{}s and |
|
773 |
\ttindex{Const} for terms --- can invoke translation functions. This |
|
774 |
causes another difference between parsing and printing. |
|
775 |
||
776 |
Parsing starts with a string and the constants are not yet identified. |
|
777 |
Only parse tree heads create {\tt Constant}s in the resulting \AST, as |
|
778 |
described in \S\ref{sec:astofpt}. Macros and parse \AST{} translations may |
|
779 |
introduce further {\tt Constant}s. When the final \AST{} is converted to a |
|
780 |
term, all {\tt Constant}s become {\tt Const}s, as described in |
|
781 |
\S\ref{sec:termofast}. |
|
782 |
||
783 |
Printing starts with a well-typed term and all the constants are known. So |
|
784 |
all logical constants and type constructors may invoke print translations. |
|
785 |
These, and macros, may introduce further constants. |
|
786 |
||
787 |
||
788 |
\subsection{Example: a print translation for dependent types} |
|
789 |
\index{examples!of translations}\indexbold{*dependent_tr'} |
|
790 |
||
791 |
Let us continue the dependent type example (page~\pageref{prod_trans}) by |
|
792 |
examining the parse translation for~\cdx{_K} and the print translation |
|
793 |
{\tt dependent_tr'}, which are both built-in. By convention, parse |
|
794 |
translations have names ending with {\tt _tr} and print translations have |
|
795 |
names ending with {\tt _tr'}. Search for such names in the Isabelle |
|
796 |
sources to locate more examples. |
|
797 |
||
798 |
Here is the parse translation for {\tt _K}: |
|
799 |
\begin{ttbox} |
|
800 |
fun k_tr [t] = Abs ("x", dummyT, incr_boundvars 1 t) |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
801 |
| k_tr ts = raise TERM ("k_tr", ts); |
323 | 802 |
\end{ttbox} |
803 |
If {\tt k_tr} is called with exactly one argument~$t$, it creates a new |
|
804 |
{\tt Abs} node with a body derived from $t$. Since terms given to parse |
|
805 |
translations are not yet typed, the type of the bound variable in the new |
|
806 |
{\tt Abs} is simply {\tt dummyT}. The function increments all {\tt Bound} |
|
807 |
nodes referring to outer abstractions by calling \ttindex{incr_boundvars}, |
|
808 |
a basic term manipulation function defined in {\tt Pure/term.ML}. |
|
809 |
||
810 |
Here is the print translation for dependent types: |
|
811 |
\begin{ttbox} |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
812 |
fun dependent_tr' (q, r) (A :: Abs (x, T, B) :: ts) = |
323 | 813 |
if 0 mem (loose_bnos B) then |
3108 | 814 |
let val (x', B') = Syntax.variant_abs' (x, dummyT, B) in |
815 |
list_comb |
|
3135 | 816 |
(Const (q,{\thinspace}dummyT) $ Syntax.mark_boundT (x',{\thinspace}T) $ A $ B',{\thinspace}ts) |
323 | 817 |
end |
818 |
else list_comb (Const (r, dummyT) $ A $ B, ts) |
|
819 |
| dependent_tr' _ _ = raise Match; |
|
820 |
\end{ttbox} |
|
3135 | 821 |
The argument {\tt (q,{\thinspace}r)} is supplied to the curried function {\tt |
3108 | 822 |
dependent_tr'} by a partial application during its installation. |
823 |
For example, we could set up print translations for both {\tt Pi} and |
|
824 |
{\tt Sigma} by including |
|
323 | 825 |
\begin{ttbox}\index{*ML section} |
826 |
val print_translation = |
|
827 |
[("Pi", dependent_tr' ("{\at}PROD", "{\at}->")), |
|
828 |
("Sigma", dependent_tr' ("{\at}SUM", "{\at}*"))]; |
|
829 |
\end{ttbox} |
|
3108 | 830 |
within the {\tt ML} section. The first of these transforms ${\tt |
831 |
Pi}(A, \mtt{Abs}(x, T, B))$ into $\hbox{\tt{\at}PROD}(x', A, B')$ or |
|
832 |
$\hbox{\tt{\at}->}(A, B)$, choosing the latter form if $B$ does not |
|
833 |
depend on~$x$. It checks this using \ttindex{loose_bnos}, yet another |
|
834 |
function from {\tt Pure/term.ML}. Note that $x'$ is a version of $x$ |
|
835 |
renamed away from all names in $B$, and $B'$ is the body $B$ with {\tt |
|
836 |
Bound} nodes referring to the {\tt Abs} node replaced by |
|
837 |
$\ttfct{Free} (x', \mtt{dummyT})$ (but marked as representing a bound |
|
838 |
variable). |
|
323 | 839 |
|
840 |
We must be careful with types here. While types of {\tt Const}s are |
|
841 |
ignored, type constraints may be printed for some {\tt Free}s and |
|
842 |
{\tt Var}s if \ttindex{show_types} is set to {\tt true}. Variables of type |
|
843 |
\ttindex{dummyT} are never printed with constraint, though. The line |
|
844 |
\begin{ttbox} |
|
3108 | 845 |
let val (x', B') = Syntax.variant_abs' (x, dummyT, B); |
846 |
\end{ttbox}\index{*Syntax.variant_abs'} |
|
323 | 847 |
replaces bound variable occurrences in~$B$ by the free variable $x'$ with |
848 |
type {\tt dummyT}. Only the binding occurrence of~$x'$ is given the |
|
849 |
correct type~{\tt T}, so this is the only place where a type |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
850 |
constraint might appear. |
3108 | 851 |
|
852 |
Also note that we are responsible to mark free identifiers that |
|
3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3135
diff
changeset
|
853 |
actually represent bound variables. This is achieved by |
3108 | 854 |
\ttindex{Syntax.variant_abs'} and \ttindex{Syntax.mark_boundT} above. |
855 |
Failing to do so may cause these names to be printed in the wrong |
|
856 |
style. \index{translations|)} \index{syntax!transformations|)} |
|
857 |
||
858 |
||
859 |
\section{Token translations} \label{sec:tok_tr} |
|
860 |
\index{token translations|(} |
|
861 |
% |
|
862 |
Isabelle's meta-logic features quite a lot of different kinds of |
|
863 |
identifiers, namely {\em class}, {\em tfree}, {\em tvar}, {\em free}, |
|
3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3135
diff
changeset
|
864 |
{\em bound}, {\em var}. One might want to have these printed in |
3108 | 865 |
different styles, e.g.\ in bold or italic, or even transcribed into |
866 |
something more readable like $\alpha, \alpha', \beta$ instead of {\tt |
|
3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3135
diff
changeset
|
867 |
'a}, {\tt 'aa}, {\tt 'b} for type variables. Token translations |
3108 | 868 |
provide a means to such ends, enabling the user to install certain |
869 |
\ML{} functions associated with any logical \rmindex{token class} and |
|
870 |
depending on some \rmindex{print mode}. |
|
871 |
||
872 |
The logical class of identifiers can not necessarily be determined by |
|
3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3135
diff
changeset
|
873 |
its syntactic category, though. For example, consider free vs.\ bound |
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3135
diff
changeset
|
874 |
variables. So Isabelle's pretty printing mechanism, starting from |
3108 | 875 |
fully typed terms, has to be careful to preserve this additional |
876 |
information\footnote{This is done by marking atoms in abstract syntax |
|
3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3135
diff
changeset
|
877 |
trees appropriately. The marks are actually visible by print |
3108 | 878 |
translation functions -- they are just special constants applied to |
879 |
atomic asts, for example \texttt{("_bound" x)}.}. In particular, |
|
880 |
user-supplied print translation functions operating on terms have to |
|
3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3135
diff
changeset
|
881 |
be well-behaved in this respect. Free identifiers introduced to |
3108 | 882 |
represent bound variables have to be marked appropriately (cf.\ the |
883 |
example at the end of \S\ref{sec:tr_funs}). |
|
884 |
||
885 |
\medskip Token translations may be installed by declaring the |
|
886 |
\ttindex{token_translation} value within the \texttt{ML} section of a |
|
887 |
theory definition file: |
|
888 |
\begin{ttbox} |
|
889 |
val token_translation: (string * string * (string -> string * int)) list |
|
890 |
\end{ttbox}\index{token_translation} |
|
891 |
The elements of this list are of the form $(m, c, f)$, where $m$ is a |
|
892 |
print mode identifier, $c$ a token class, and $f\colon string \to |
|
3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3135
diff
changeset
|
893 |
string \times int$ the actual translation function. Assuming that $x$ |
3108 | 894 |
is of identifier class $c$, and print mode $m$ is the first one of all |
895 |
currently active modes that provide some translation for $c$, then $x$ |
|
3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3135
diff
changeset
|
896 |
is output according to $f(x) = (x', len)$. Thereby $x'$ is the |
3108 | 897 |
modified identifier name and $len$ its visual length approximated in |
3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3135
diff
changeset
|
898 |
terms of whole characters. Thus $x'$ may include non-printing parts |
3108 | 899 |
like control sequences or markup information for typesetting systems. |
900 |
||
901 |
%FIXME example (?) |
|
902 |
||
903 |
\index{token translations|)} |