| author | wenzelm | 
| Wed, 18 Jan 1995 10:17:55 +0100 | |
| changeset 864 | d63b111b917a | 
| parent 499 | 5a54c796b808 | 
| child 1387 | 9bcad9c22fd4 | 
| permissions | -rw-r--r-- | 
| 323 | 1  | 
%% $Id$  | 
2  | 
\chapter{Syntax Transformations} \label{chap:syntax}
 | 
|
3  | 
\newcommand\ttapp{\mathrel{\hbox{\tt\$}}}
 | 
|
4  | 
\newcommand\mtt[1]{\mbox{\tt #1}}
 | 
|
5  | 
\newcommand\ttfct[1]{\mathop{\mtt{#1}}\nolimits}
 | 
|
6  | 
\newcommand\Constant{\ttfct{Constant}}
 | 
|
7  | 
\newcommand\Variable{\ttfct{Variable}}
 | 
|
8  | 
\newcommand\Appl[1]{\ttfct{Appl}\,[#1]}
 | 
|
9  | 
\index{syntax!transformations|(}
 | 
|
10  | 
||
11  | 
This chapter is intended for experienced Isabelle users who need to define  | 
|
12  | 
macros or code their own translation functions. It describes the  | 
|
13  | 
transformations between parse trees, abstract syntax trees and terms.  | 
|
14  | 
||
15  | 
||
16  | 
\section{Abstract syntax trees} \label{sec:asts}
 | 
|
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
17  | 
\index{ASTs|(}
 | 
| 323 | 18  | 
|
19  | 
The parser, given a token list from the lexer, applies productions to yield  | 
|
20  | 
a parse tree\index{parse trees}.  By applying some internal transformations
 | 
|
21  | 
the parse tree becomes an abstract syntax tree, or \AST{}.  Macro
 | 
|
22  | 
expansion, further translations and finally type inference yields a  | 
|
23  | 
well-typed term. The printing process is the reverse, except for some  | 
|
24  | 
subtleties to be discussed later.  | 
|
25  | 
||
26  | 
Figure~\ref{fig:parse_print} outlines the parsing and printing process.
 | 
|
27  | 
Much of the complexity is due to the macro mechanism. Using macros, you  | 
|
28  | 
can specify most forms of concrete syntax without writing any \ML{} code.
 | 
|
29  | 
||
30  | 
\begin{figure}
 | 
|
31  | 
\begin{center}
 | 
|
32  | 
\begin{tabular}{cl}
 | 
|
33  | 
string & \\  | 
|
34  | 
$\downarrow$ & parser \\  | 
|
35  | 
parse tree & \\  | 
|
36  | 
$\downarrow$    & parse \AST{} translation \\
 | 
|
37  | 
\AST{}             & \\
 | 
|
38  | 
$\downarrow$    & \AST{} rewriting (macros) \\
 | 
|
39  | 
\AST{}             & \\
 | 
|
40  | 
$\downarrow$ & parse translation, type inference \\  | 
|
41  | 
--- well-typed term --- & \\  | 
|
42  | 
$\downarrow$ & print translation \\  | 
|
43  | 
\AST{}             & \\
 | 
|
44  | 
$\downarrow$    & \AST{} rewriting (macros) \\
 | 
|
45  | 
\AST{}             & \\
 | 
|
46  | 
$\downarrow$    & print \AST{} translation, printer \\
 | 
|
47  | 
string &  | 
|
48  | 
\end{tabular}
 | 
|
49  | 
||
50  | 
\end{center}
 | 
|
51  | 
\caption{Parsing and printing}\label{fig:parse_print}
 | 
|
52  | 
\end{figure}
 | 
|
53  | 
||
54  | 
Abstract syntax trees are an intermediate form between the raw parse trees  | 
|
55  | 
and the typed $\lambda$-terms.  An \AST{} is either an atom (constant or
 | 
|
56  | 
variable) or a list of {\em at least two\/} subtrees.  Internally, they
 | 
|
57  | 
have type \mltydx{Syntax.ast}: \index{*Constant} \index{*Variable}
 | 
|
58  | 
\index{*Appl}
 | 
|
59  | 
\begin{ttbox}
 | 
|
60  | 
datatype ast = Constant of string  | 
|
61  | 
| Variable of string  | 
|
62  | 
| Appl of ast list  | 
|
63  | 
\end{ttbox}
 | 
|
64  | 
%  | 
|
65  | 
Isabelle uses an S-expression syntax for abstract syntax trees. Constant  | 
|
66  | 
atoms are shown as quoted strings, variable atoms as non-quoted strings and  | 
|
| 332 | 67  | 
applications as a parenthesised list of subtrees. For example, the \AST  | 
| 323 | 68  | 
\begin{ttbox}
 | 
69  | 
Appl [Constant "_constrain",  | 
|
70  | 
Appl [Constant "_abs", Variable "x", Variable "t"],  | 
|
71  | 
Appl [Constant "fun", Variable "'a", Variable "'b"]]  | 
|
72  | 
\end{ttbox}
 | 
|
73  | 
is shown as {\tt ("_constrain" ("_abs" x t) ("fun" 'a 'b))}.
 | 
|
74  | 
Both {\tt ()} and {\tt (f)} are illegal because they have too few
 | 
|
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
75  | 
subtrees.  | 
| 323 | 76  | 
|
77  | 
The resemblance to Lisp's S-expressions is intentional, but there are two  | 
|
78  | 
kinds of atomic symbols: $\Constant x$ and $\Variable x$. Do not take the  | 
|
79  | 
names {\tt Constant} and {\tt Variable} too literally; in the later
 | 
|
80  | 
translation to terms, $\Variable x$ may become a constant, free or bound  | 
|
81  | 
variable, even a type constructor or class name; the actual outcome depends  | 
|
82  | 
on the context.  | 
|
83  | 
||
84  | 
Similarly, you can think of ${\tt (} f~x@1~\ldots~x@n{\tt )}$ as the
 | 
|
85  | 
application of~$f$ to the arguments $x@1, \ldots, x@n$. But the kind of  | 
|
86  | 
application is determined later by context; it could be a type constructor  | 
|
87  | 
applied to types.  | 
|
88  | 
||
89  | 
Forms like {\tt (("_abs" x $t$) $u$)} are legal, but \AST{}s are
 | 
|
90  | 
first-order: the {\tt "_abs"} does not bind the {\tt x} in any way.  Later
 | 
|
91  | 
at the term level, {\tt ("_abs" x $t$)} will become an {\tt Abs} node and
 | 
|
92  | 
occurrences of {\tt x} in $t$ will be replaced by bound variables (the term
 | 
|
93  | 
constructor \ttindex{Bound}).
 | 
|
94  | 
||
95  | 
||
96  | 
\section{Transforming parse trees to \AST{}s}\label{sec:astofpt}
 | 
|
97  | 
\index{ASTs!made from parse trees}
 | 
|
98  | 
\newcommand\astofpt[1]{\lbrakk#1\rbrakk}
 | 
|
99  | 
||
100  | 
The parse tree is the raw output of the parser. Translation functions,  | 
|
101  | 
called {\bf parse AST translations}\indexbold{translations!parse AST},
 | 
|
102  | 
transform the parse tree into an abstract syntax tree.  | 
|
103  | 
||
104  | 
The parse tree is constructed by nesting the right-hand sides of the  | 
|
105  | 
productions used to recognize the input. Such parse trees are simply lists  | 
|
106  | 
of tokens and constituent parse trees, the latter representing the  | 
|
107  | 
nonterminals of the productions. Let us refer to the actual productions in  | 
|
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
108  | 
the form displayed by {\tt print_syntax} (see \S\ref{sec:inspct-thy} for an
 | 
| 
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
109  | 
example).  | 
| 323 | 110  | 
|
111  | 
Ignoring parse \AST{} translations, parse trees are transformed to \AST{}s
 | 
|
112  | 
by stripping out delimiters and copy productions. More precisely, the  | 
|
113  | 
mapping $\astofpt{-}$ is derived from the productions as follows:
 | 
|
114  | 
\begin{itemize}
 | 
|
115  | 
\item Name tokens: $\astofpt{t} = \Variable s$, where $t$ is an \ndx{id},
 | 
|
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
116  | 
  \ndx{var}, \ndx{tid}, \ndx{tvar}, \ndx{xnum} or \ndx{xstr} token, and $s$
 | 
| 
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
117  | 
  its associated string. Note that for {\tt xstr} this does not include the
 | 
| 
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
118  | 
quotes.  | 
| 323 | 119  | 
|
120  | 
\item Copy productions:\index{productions!copy}
 | 
|
121  | 
  $\astofpt{\ldots P \ldots} = \astofpt{P}$.  Here $\ldots$ stands for
 | 
|
122  | 
strings of delimiters, which are discarded. $P$ stands for the single  | 
|
123  | 
constituent that is not a delimiter; it is either a nonterminal symbol or  | 
|
124  | 
a name token.  | 
|
125  | 
||
126  | 
  \item 0-ary productions: $\astofpt{\ldots \mtt{=>} c} = \Constant c$.
 | 
|
127  | 
Here there are no constituents other than delimiters, which are  | 
|
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
128  | 
discarded.  | 
| 323 | 129  | 
|
130  | 
\item $n$-ary productions, where $n \ge 1$: delimiters are discarded and  | 
|
131  | 
the remaining constituents $P@1$, \ldots, $P@n$ are built into an  | 
|
132  | 
application whose head constant is~$c$:  | 
|
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
133  | 
    \[ \astofpt{\ldots P@1 \ldots P@n \ldots \mtt{=>} c} =
 | 
| 323 | 134  | 
       \Appl{\Constant c, \astofpt{P@1}, \ldots, \astofpt{P@n}}
 | 
135  | 
\]  | 
|
136  | 
\end{itemize}
 | 
|
137  | 
Figure~\ref{fig:parse_ast} presents some simple examples, where {\tt ==},
 | 
|
138  | 
{\tt _appl}, {\tt _args}, and so forth name productions of the Pure syntax.
 | 
|
139  | 
These examples illustrate the need for further translations to make \AST{}s
 | 
|
140  | 
closer to the typed $\lambda$-calculus. The Pure syntax provides  | 
|
141  | 
predefined parse \AST{} translations\index{translations!parse AST} for
 | 
|
142  | 
ordinary applications, type applications, nested abstractions, meta  | 
|
143  | 
implications and function types.  Figure~\ref{fig:parse_ast_tr} shows their
 | 
|
144  | 
effect on some representative input strings.  | 
|
145  | 
||
146  | 
||
147  | 
\begin{figure}
 | 
|
148  | 
\begin{center}
 | 
|
149  | 
\tt\begin{tabular}{ll}
 | 
|
150  | 
\rm input string & \rm \AST \\\hline  | 
|
151  | 
"f" & f \\  | 
|
152  | 
"'a" & 'a \\  | 
|
153  | 
"t == u"            & ("==" t u) \\
 | 
|
154  | 
"f(x)"              & ("_appl" f x) \\
 | 
|
155  | 
"f(x, y)"           & ("_appl" f ("_args" x y)) \\
 | 
|
156  | 
"f(x, y, z)"        & ("_appl" f ("_args" x ("_args" y z))) \\
 | 
|
157  | 
"\%x y.\ t"         & ("_lambda" ("_idts" x y) t) \\
 | 
|
158  | 
\end{tabular}
 | 
|
159  | 
\end{center}
 | 
|
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
160  | 
\caption{Parsing examples using the Pure syntax}\label{fig:parse_ast}
 | 
| 323 | 161  | 
\end{figure}
 | 
162  | 
||
163  | 
\begin{figure}
 | 
|
164  | 
\begin{center}
 | 
|
165  | 
\tt\begin{tabular}{ll}
 | 
|
166  | 
\rm input string            & \rm \AST{} \\\hline
 | 
|
167  | 
"f(x, y, z)" & (f x y z) \\  | 
|
168  | 
"'a ty" & (ty 'a) \\  | 
|
169  | 
"('a, 'b) ty"               & (ty 'a 'b) \\
 | 
|
170  | 
"\%x y z.\ t"               & ("_abs" x ("_abs" y ("_abs" z t))) \\
 | 
|
171  | 
"\%x ::\ 'a.\ t"            & ("_abs" ("_constrain" x 'a) t) \\
 | 
|
172  | 
"[| P; Q; R |] => S"        & ("==>" P ("==>" Q ("==>" R S))) \\
 | 
|
173  | 
"['a, 'b, 'c] => 'd"        & ("fun" 'a ("fun" 'b ("fun" 'c 'd)))
 | 
|
174  | 
\end{tabular}
 | 
|
175  | 
\end{center}
 | 
|
176  | 
\caption{Built-in parse \AST{} translations}\label{fig:parse_ast_tr}
 | 
|
177  | 
\end{figure}
 | 
|
178  | 
||
179  | 
The names of constant heads in the \AST{} control the translation process.
 | 
|
180  | 
The list of constants invoking parse \AST{} translations appears in the
 | 
|
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
181  | 
output of {\tt print_syntax} under {\tt parse_ast_translation}.
 | 
| 323 | 182  | 
|
183  | 
||
184  | 
\section{Transforming \AST{}s to terms}\label{sec:termofast}
 | 
|
185  | 
\index{terms!made from ASTs}
 | 
|
186  | 
\newcommand\termofast[1]{\lbrakk#1\rbrakk}
 | 
|
187  | 
||
188  | 
The \AST{}, after application of macros (see \S\ref{sec:macros}), is
 | 
|
189  | 
transformed into a term. This term is probably ill-typed since type  | 
|
190  | 
inference has not occurred yet. The term may contain type constraints  | 
|
191  | 
consisting of applications with head {\tt "_constrain"}; the second
 | 
|
192  | 
argument is a type encoded as a term. Type inference later introduces  | 
|
193  | 
correct types or rejects the input.  | 
|
194  | 
||
195  | 
Another set of translation functions, namely parse  | 
|
196  | 
translations\index{translations!parse}, may affect this process.  If we
 | 
|
197  | 
ignore parse translations for the time being, then \AST{}s are transformed
 | 
|
198  | 
to terms by mapping \AST{} constants to constants, \AST{} variables to
 | 
|
199  | 
schematic or free variables and \AST{} applications to applications.
 | 
|
200  | 
||
201  | 
More precisely, the mapping $\termofast{-}$ is defined by
 | 
|
202  | 
\begin{itemize}
 | 
|
203  | 
\item Constants: $\termofast{\Constant x} = \ttfct{Const} (x,
 | 
|
204  | 
  \mtt{dummyT})$.
 | 
|
205  | 
||
206  | 
\item Schematic variables: $\termofast{\Variable \mtt{"?}xi\mtt"} =
 | 
|
207  | 
  \ttfct{Var} ((x, i), \mtt{dummyT})$, where $x$ is the base name and $i$
 | 
|
208  | 
the index extracted from~$xi$.  | 
|
209  | 
||
210  | 
\item Free variables: $\termofast{\Variable x} = \ttfct{Free} (x,
 | 
|
211  | 
  \mtt{dummyT})$.
 | 
|
212  | 
||
213  | 
\item Function applications with $n$ arguments:  | 
|
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
214  | 
    \[ \termofast{\Appl{f, x@1, \ldots, x@n}} =
 | 
| 323 | 215  | 
       \termofast{f} \ttapp
 | 
216  | 
         \termofast{x@1} \ttapp \ldots \ttapp \termofast{x@n}
 | 
|
217  | 
\]  | 
|
218  | 
\end{itemize}
 | 
|
219  | 
Here \ttindex{Const}, \ttindex{Var}, \ttindex{Free} and
 | 
|
220  | 
\verb|$|\index{$@{\tt\$}} are constructors of the datatype \mltydx{term},
 | 
|
221  | 
while \ttindex{dummyT} stands for some dummy type that is ignored during
 | 
|
222  | 
type inference.  | 
|
223  | 
||
224  | 
So far the outcome is still a first-order term. Abstractions and bound  | 
|
225  | 
variables (constructors \ttindex{Abs} and \ttindex{Bound}) are introduced
 | 
|
226  | 
by parse translations.  Such translations are attached to {\tt "_abs"},
 | 
|
227  | 
{\tt "!!"} and user-defined binders.
 | 
|
228  | 
||
229  | 
||
230  | 
\section{Printing of terms}
 | 
|
231  | 
\newcommand\astofterm[1]{\lbrakk#1\rbrakk}\index{ASTs!made from terms}
 | 
|
232  | 
||
233  | 
The output phase is essentially the inverse of the input phase. Terms are  | 
|
234  | 
translated via abstract syntax trees into strings. Finally the strings are  | 
|
235  | 
pretty printed.  | 
|
236  | 
||
237  | 
Print translations (\S\ref{sec:tr_funs}) may affect the transformation of
 | 
|
238  | 
terms into \AST{}s.  Ignoring those, the transformation maps
 | 
|
239  | 
term constants, variables and applications to the corresponding constructs  | 
|
240  | 
on \AST{}s.  Abstractions are mapped to applications of the special
 | 
|
241  | 
constant {\tt _abs}.
 | 
|
242  | 
||
243  | 
More precisely, the mapping $\astofterm{-}$ is defined as follows:
 | 
|
244  | 
\begin{itemize}
 | 
|
245  | 
  \item $\astofterm{\ttfct{Const} (x, \tau)} = \Constant x$.
 | 
|
246  | 
||
247  | 
  \item $\astofterm{\ttfct{Free} (x, \tau)} = constrain (\Variable x,
 | 
|
248  | 
\tau)$.  | 
|
249  | 
||
250  | 
  \item $\astofterm{\ttfct{Var} ((x, i), \tau)} = constrain (\Variable
 | 
|
251  | 
    \mtt{"?}xi\mtt", \tau)$, where $\mtt?xi$ is the string representation of
 | 
|
252  | 
    the {\tt indexname} $(x, i)$.
 | 
|
253  | 
||
254  | 
\item For the abstraction $\lambda x::\tau.t$, let $x'$ be a variant  | 
|
255  | 
of~$x$ renamed to differ from all names occurring in~$t$, and let $t'$  | 
|
256  | 
be obtained from~$t$ by replacing all bound occurrences of~$x$ by  | 
|
257  | 
the free variable $x'$. This replaces corresponding occurrences of the  | 
|
258  | 
    constructor \ttindex{Bound} by the term $\ttfct{Free} (x',
 | 
|
259  | 
    \mtt{dummyT})$:
 | 
|
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
260  | 
   \[ \astofterm{\ttfct{Abs} (x, \tau, t)} =
 | 
| 
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
261  | 
      \Appl{\Constant \mtt{"_abs"},
 | 
| 323 | 262  | 
        constrain(\Variable x', \tau), \astofterm{t'}}.
 | 
263  | 
\]  | 
|
264  | 
||
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
265  | 
  \item $\astofterm{\ttfct{Bound} i} = \Variable \mtt{"B.}i\mtt"$.
 | 
| 323 | 266  | 
    The occurrence of constructor \ttindex{Bound} should never happen
 | 
267  | 
when printing well-typed terms; it indicates a de Bruijn index with no  | 
|
268  | 
matching abstraction.  | 
|
269  | 
||
270  | 
\item Where $f$ is not an application,  | 
|
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
271  | 
    \[ \astofterm{f \ttapp x@1 \ttapp \ldots \ttapp x@n} =
 | 
| 323 | 272  | 
       \Appl{\astofterm{f}, \astofterm{x@1}, \ldots,\astofterm{x@n}}
 | 
273  | 
\]  | 
|
274  | 
\end{itemize}
 | 
|
275  | 
%  | 
|
| 332 | 276  | 
Type constraints\index{type constraints} are inserted to allow the printing
 | 
277  | 
of types.  This is governed by the boolean variable \ttindex{show_types}:
 | 
|
| 323 | 278  | 
\begin{itemize}
 | 
279  | 
  \item $constrain(x, \tau) = x$ \ if $\tau = \mtt{dummyT}$ \index{*dummyT} or
 | 
|
| 332 | 280  | 
    \ttindex{show_types} is set to {\tt false}.
 | 
| 323 | 281  | 
|
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
282  | 
  \item $constrain(x, \tau) = \Appl{\Constant \mtt{"_constrain"}, x,
 | 
| 
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
283  | 
         \astofterm{\tau}}$ \ otherwise.
 | 
| 323 | 284  | 
|
285  | 
    Here, $\astofterm{\tau}$ is the \AST{} encoding of $\tau$: type
 | 
|
286  | 
    constructors go to {\tt Constant}s; type identifiers go to {\tt
 | 
|
287  | 
      Variable}s; type applications go to {\tt Appl}s with the type
 | 
|
288  | 
    constructor as the first element.  If \ttindex{show_sorts} is set to
 | 
|
289  | 
    {\tt true}, some type variables are decorated with an \AST{} encoding
 | 
|
290  | 
of their sort.  | 
|
291  | 
\end{itemize}
 | 
|
292  | 
%  | 
|
293  | 
The \AST{}, after application of macros (see \S\ref{sec:macros}), is
 | 
|
294  | 
transformed into the final output string.  The built-in {\bf print AST
 | 
|
| 332 | 295  | 
  translations}\indexbold{translations!print AST} reverse the
 | 
| 323 | 296  | 
parse \AST{} translations of Fig.\ts\ref{fig:parse_ast_tr}.
 | 
297  | 
||
298  | 
For the actual printing process, the names attached to productions  | 
|
299  | 
of the form $\ldots A^{(p@1)}@1 \ldots A^{(p@n)}@n \ldots \mtt{=>} c$ play
 | 
|
300  | 
a vital role.  Each \AST{} with constant head $c$, namely $\mtt"c\mtt"$ or
 | 
|
301  | 
$(\mtt"c\mtt"~ x@1 \ldots x@n)$, is printed according to the production  | 
|
302  | 
for~$c$. Each argument~$x@i$ is converted to a string, and put in  | 
|
303  | 
parentheses if its priority~$(p@i)$ requires this. The resulting strings  | 
|
304  | 
and their syntactic sugar (denoted by \dots{} above) are joined to make a
 | 
|
305  | 
single string.  | 
|
306  | 
||
307  | 
If an application $(\mtt"c\mtt"~ x@1 \ldots x@m)$ has more arguments than the  | 
|
308  | 
corresponding production, it is first split into $((\mtt"c\mtt"~ x@1 \ldots  | 
|
309  | 
x@n) ~ x@{n+1} \ldots x@m)$.  Applications with too few arguments or with
 | 
|
310  | 
non-constant head or without a corresponding production are printed as  | 
|
311  | 
$f(x@1, \ldots, x@l)$ or $(\alpha@1, \ldots, \alpha@l) ty$. An occurrence of  | 
|
312  | 
$\Variable x$ is simply printed as~$x$.  | 
|
313  | 
||
314  | 
Blanks are {\em not\/} inserted automatically.  If blanks are required to
 | 
|
315  | 
separate tokens, specify them in the mixfix declaration, possibly preceded  | 
|
316  | 
by a slash~({\tt/}) to allow a line break.
 | 
|
317  | 
\index{ASTs|)}
 | 
|
318  | 
||
319  | 
||
320  | 
||
321  | 
\section{Macros: Syntactic rewriting} \label{sec:macros}
 | 
|
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
322  | 
\index{macros|(}\index{rewriting!syntactic|(}
 | 
| 323 | 323  | 
|
324  | 
Mixfix declarations alone can handle situations where there is a direct  | 
|
325  | 
connection between the concrete syntax and the underlying term. Sometimes  | 
|
326  | 
we require a more elaborate concrete syntax, such as quantifiers and list  | 
|
327  | 
notation.  Isabelle's {\bf macros} and {\bf translation functions} can
 | 
|
328  | 
perform translations such as  | 
|
329  | 
\begin{center}\tt
 | 
|
330  | 
  \begin{tabular}{r@{$\quad\protect\rightleftharpoons\quad$}l}
 | 
|
331  | 
ALL x:A.P & Ball(A, \%x.P) \\ \relax  | 
|
332  | 
[x, y, z] & Cons(x, Cons(y, Cons(z, Nil)))  | 
|
333  | 
  \end{tabular}
 | 
|
334  | 
\end{center}
 | 
|
335  | 
Translation functions (see \S\ref{sec:tr_funs}) must be coded in ML; they
 | 
|
336  | 
are the most powerful translation mechanism but are difficult to read or  | 
|
337  | 
write. Macros are specified by first-order rewriting systems that operate  | 
|
338  | 
on abstract syntax trees. They are usually easy to read and write, and can  | 
|
339  | 
express all but the most obscure translations.  | 
|
340  | 
||
341  | 
Figure~\ref{fig:set_trans} defines a fragment of first-order logic and set
 | 
|
342  | 
theory.\footnote{This and the following theories are complete working
 | 
|
343  | 
  examples, though they specify only syntax, no axioms.  The file {\tt
 | 
|
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
344  | 
ZF/ZF.thy} presents a full set theory definition, including many  | 
| 323 | 345  | 
  macro rules.}  Theory {\tt SET} defines constants for set comprehension
 | 
346  | 
({\tt Collect}), replacement ({\tt Replace}) and bounded universal
 | 
|
347  | 
quantification ({\tt Ball}).  Each of these binds some variables.  Without
 | 
|
| 332 | 348  | 
additional syntax we should have to write $\forall x \in A.  P$ as {\tt
 | 
| 323 | 349  | 
Ball(A,\%x.P)}, and similarly for the others.  | 
350  | 
||
351  | 
\begin{figure}
 | 
|
352  | 
\begin{ttbox}
 | 
|
353  | 
SET = Pure +  | 
|
354  | 
types  | 
|
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
355  | 
i o  | 
| 323 | 356  | 
arities  | 
357  | 
i, o :: logic  | 
|
358  | 
consts  | 
|
359  | 
  Trueprop      :: "o => prop"              ("_" 5)
 | 
|
360  | 
Collect :: "[i, i => o] => i"  | 
|
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
361  | 
Replace :: "[i, [i, i] => o] => i"  | 
| 
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
362  | 
Ball :: "[i, i => o] => o"  | 
| 
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
363  | 
syntax  | 
| 323 | 364  | 
  "{\at}Collect"    :: "[idt, i, o] => i"       ("(1{\ttlbrace}_:_./ _{\ttrbrace})")
 | 
365  | 
  "{\at}Replace"    :: "[idt, idt, i, o] => i"  ("(1{\ttlbrace}_./ _:_, _{\ttrbrace})")
 | 
|
366  | 
  "{\at}Ball"       :: "[idt, i, o] => o"       ("(3ALL _:_./ _)" 10)
 | 
|
367  | 
translations  | 
|
368  | 
  "{\ttlbrace}x:A. P{\ttrbrace}"    == "Collect(A, \%x. P)"
 | 
|
369  | 
  "{\ttlbrace}y. x:A, Q{\ttrbrace}" == "Replace(A, \%x y. Q)"
 | 
|
370  | 
"ALL x:A. P" == "Ball(A, \%x. P)"  | 
|
371  | 
end  | 
|
372  | 
\end{ttbox}
 | 
|
373  | 
\caption{Macro example: set theory}\label{fig:set_trans}
 | 
|
374  | 
\end{figure}
 | 
|
375  | 
||
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
376  | 
The theory specifies a variable-binding syntax through additional productions  | 
| 
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
377  | 
that have mixfix declarations. Each non-copy production must specify some  | 
| 
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
378  | 
constant, which is used for building \AST{}s. \index{constants!syntactic} The
 | 
| 
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
379  | 
additional constants are decorated with {\tt\at} to stress their purely
 | 
| 
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
380  | 
syntactic purpose; they may not occur within the final well-typed terms,  | 
| 
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
381  | 
being declared as {\tt syntax} rather than {\tt consts}.
 | 
| 323 | 382  | 
|
383  | 
The translations cause the replacement of external forms by internal forms  | 
|
384  | 
after parsing, and vice versa before printing of terms. As a specification  | 
|
385  | 
of the set theory notation, they should be largely self-explanatory. The  | 
|
386  | 
syntactic constants, {\tt\at Collect}, {\tt\at Replace} and {\tt\at Ball},
 | 
|
387  | 
appear implicitly in the macro rules via their mixfix forms.  | 
|
388  | 
||
389  | 
Macros can define variable-binding syntax because they operate on \AST{}s,
 | 
|
390  | 
which have no inbuilt notion of bound variable.  The macro variables {\tt
 | 
|
391  | 
  x} and~{\tt y} have type~{\tt idt} and therefore range over identifiers,
 | 
|
392  | 
in this case bound variables.  The macro variables {\tt P} and~{\tt Q}
 | 
|
393  | 
range over formulae containing bound variable occurrences.  | 
|
394  | 
||
395  | 
Other applications of the macro system can be less straightforward, and  | 
|
396  | 
there are peculiarities. The rest of this section will describe in detail  | 
|
397  | 
how Isabelle macros are preprocessed and applied.  | 
|
398  | 
||
399  | 
||
400  | 
\subsection{Specifying macros}
 | 
|
401  | 
Macros are basically rewrite rules on \AST{}s.  But unlike other macro
 | 
|
402  | 
systems found in programming languages, Isabelle's macros work in both  | 
|
403  | 
directions. Therefore a syntax contains two lists of rewrites: one for  | 
|
404  | 
parsing and one for printing.  | 
|
405  | 
||
406  | 
\index{*translations section}
 | 
|
407  | 
The {\tt translations} section specifies macros.  The syntax for a macro is
 | 
|
408  | 
\[ (root)\; string \quad  | 
|
409  | 
   \left\{\begin{array}[c]{c} \mtt{=>} \\ \mtt{<=} \\ \mtt{==} \end{array}
 | 
|
410  | 
\right\} \quad  | 
|
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
411  | 
(root)\; string  | 
| 323 | 412  | 
\]  | 
413  | 
%  | 
|
414  | 
This specifies a parse rule ({\tt =>}), a print rule ({\tt <=}), or both
 | 
|
415  | 
({\tt ==}).  The two strings specify the left and right-hand sides of the
 | 
|
416  | 
macro rule. The $(root)$ specification is optional; it specifies the  | 
|
417  | 
nonterminal for parsing the $string$ and if omitted defaults to {\tt
 | 
|
418  | 
  logic}.  \AST{} rewrite rules $(l, r)$ must obey certain conditions:
 | 
|
419  | 
\begin{itemize}
 | 
|
420  | 
\item Rules must be left linear: $l$ must not contain repeated variables.  | 
|
421  | 
||
422  | 
\item Rules must have constant heads, namely $l = \mtt"c\mtt"$ or $l =  | 
|
423  | 
(\mtt"c\mtt" ~ x@1 \ldots x@n)$.  | 
|
424  | 
||
425  | 
\item Every variable in~$r$ must also occur in~$l$.  | 
|
426  | 
\end{itemize}
 | 
|
427  | 
||
428  | 
Macro rules may refer to any syntax from the parent theories. They may  | 
|
| 499 | 429  | 
also refer to anything defined before the {\tt .thy} file's {\tt
 | 
| 323 | 430  | 
translations} section --- including any mixfix declarations.  | 
431  | 
||
432  | 
Upon declaration, both sides of the macro rule undergo parsing and parse  | 
|
433  | 
\AST{} translations (see \S\ref{sec:asts}), but do not themselves undergo
 | 
|
434  | 
macro expansion. The lexer runs in a different mode that additionally  | 
|
435  | 
accepts identifiers of the form $\_~letter~quasiletter^*$ (like {\tt _idt},
 | 
|
436  | 
{\tt _K}).  Thus, a constant whose name starts with an underscore can
 | 
|
437  | 
appear in macro rules but not in ordinary terms.  | 
|
438  | 
||
439  | 
Some atoms of the macro rule's \AST{} are designated as constants for
 | 
|
440  | 
matching. These are all names that have been declared as classes, types or  | 
|
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
441  | 
constants (logical and syntactic).  | 
| 323 | 442  | 
|
443  | 
The result of this preprocessing is two lists of macro rules, each stored  | 
|
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
444  | 
as a pair of \AST{}s.  They can be viewed using {\tt print_syntax}
 | 
| 323 | 445  | 
(sections \ttindex{parse_rules} and \ttindex{print_rules}).  For
 | 
446  | 
theory~{\tt SET} of Fig.~\ref{fig:set_trans} these are
 | 
|
447  | 
\begin{ttbox}
 | 
|
448  | 
parse_rules:  | 
|
449  | 
  ("{\at}Collect" x A P)  ->  ("Collect" A ("_abs" x P))
 | 
|
450  | 
  ("{\at}Replace" y x A Q)  ->  ("Replace" A ("_abs" x ("_abs" y Q)))
 | 
|
451  | 
  ("{\at}Ball" x A P)  ->  ("Ball" A ("_abs" x P))
 | 
|
452  | 
print_rules:  | 
|
453  | 
  ("Collect" A ("_abs" x P))  ->  ("{\at}Collect" x A P)
 | 
|
454  | 
  ("Replace" A ("_abs" x ("_abs" y Q)))  ->  ("{\at}Replace" y x A Q)
 | 
|
455  | 
  ("Ball" A ("_abs" x P))  ->  ("{\at}Ball" x A P)
 | 
|
456  | 
\end{ttbox}
 | 
|
457  | 
||
458  | 
\begin{warn}
 | 
|
459  | 
Avoid choosing variable names that have previously been used as  | 
|
460  | 
  constants, types or type classes; the {\tt consts} section in the output
 | 
|
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
461  | 
  of {\tt print_syntax} lists all such names.  If a macro rule works
 | 
| 323 | 462  | 
incorrectly, inspect its internal form as shown above, recalling that  | 
463  | 
constants appear as quoted strings and variables without quotes.  | 
|
464  | 
\end{warn}
 | 
|
465  | 
||
466  | 
\begin{warn}
 | 
|
467  | 
If \ttindex{eta_contract} is set to {\tt true}, terms will be
 | 
|
468  | 
$\eta$-contracted {\em before\/} the \AST{} rewriter sees them.  Thus some
 | 
|
469  | 
abstraction nodes needed for print rules to match may vanish. For example,  | 
|
| 332 | 470  | 
\verb|Ball(A, %x. P(x))| contracts to {\tt Ball(A, P)}; the print rule does
 | 
| 323 | 471  | 
not apply and the output will be {\tt Ball(A, P)}.  This problem would not
 | 
472  | 
occur if \ML{} translation functions were used instead of macros (as is
 | 
|
473  | 
done for binder declarations).  | 
|
474  | 
\end{warn}
 | 
|
475  | 
||
476  | 
||
477  | 
\begin{warn}
 | 
|
478  | 
Another trap concerns type constraints.  If \ttindex{show_types} is set to
 | 
|
479  | 
{\tt true}, bound variables will be decorated by their meta types at the
 | 
|
480  | 
binding place (but not at occurrences in the body). Matching with  | 
|
481  | 
\verb|Collect(A, %x. P)| binds {\tt x} to something like {\tt ("_constrain" y
 | 
|
482  | 
"i")} rather than only {\tt y}.  \AST{} rewriting will cause the constraint to
 | 
|
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
483  | 
appear in the external form, say \verb|{y::i:A::i. P::o}|.
 | 
| 323 | 484  | 
|
485  | 
To allow such constraints to be re-read, your syntax should specify bound  | 
|
486  | 
variables using the nonterminal~\ndx{idt}.  This is the case in our
 | 
|
487  | 
example.  Choosing {\tt id} instead of {\tt idt} is a common error,
 | 
|
488  | 
especially since it appears in former versions of most of Isabelle's  | 
|
489  | 
object-logics.  | 
|
490  | 
\end{warn}
 | 
|
491  | 
||
492  | 
||
493  | 
||
494  | 
\subsection{Applying rules}
 | 
|
495  | 
As a term is being parsed or printed, an \AST{} is generated as an
 | 
|
496  | 
intermediate form (recall Fig.\ts\ref{fig:parse_print}).  The \AST{} is
 | 
|
| 332 | 497  | 
normalised by applying macro rules in the manner of a traditional term  | 
| 323 | 498  | 
rewriting system. We first examine how a single rule is applied.  | 
499  | 
||
| 332 | 500  | 
Let $t$ be the abstract syntax tree to be normalised and $(l, r)$ some  | 
| 323 | 501  | 
translation rule.  A subtree~$u$ of $t$ is a {\bf redex} if it is an
 | 
502  | 
instance of~$l$; in this case $l$ is said to {\bf match}~$u$.  A redex
 | 
|
503  | 
matched by $l$ may be replaced by the corresponding instance of~$r$, thus  | 
|
504  | 
{\bf rewriting} the \AST~$t$.  Matching requires some notion of {\bf
 | 
|
505  | 
place-holders} that may occur in rule patterns but not in ordinary  | 
|
506  | 
\AST{}s; {\tt Variable} atoms serve this purpose.
 | 
|
507  | 
||
508  | 
The matching of the object~$u$ by the pattern~$l$ is performed as follows:  | 
|
509  | 
\begin{itemize}
 | 
|
510  | 
\item Every constant matches itself.  | 
|
511  | 
||
512  | 
\item $\Variable x$ in the object matches $\Constant x$ in the pattern.  | 
|
513  | 
This point is discussed further below.  | 
|
514  | 
||
515  | 
  \item Every \AST{} in the object matches $\Variable x$ in the pattern,
 | 
|
516  | 
binding~$x$ to~$u$.  | 
|
517  | 
||
518  | 
\item One application matches another if they have the same number of  | 
|
519  | 
subtrees and corresponding subtrees match.  | 
|
520  | 
||
521  | 
  \item In every other case, matching fails.  In particular, {\tt
 | 
|
522  | 
Constant}~$x$ can only match itself.  | 
|
523  | 
\end{itemize}
 | 
|
524  | 
A successful match yields a substitution that is applied to~$r$, generating  | 
|
525  | 
the instance that replaces~$u$.  | 
|
526  | 
||
527  | 
The second case above may look odd.  This is where {\tt Variable}s of
 | 
|
528  | 
non-rule \AST{}s behave like {\tt Constant}s.  Recall that \AST{}s are not
 | 
|
529  | 
far removed from parse trees; at this level it is not yet known which  | 
|
530  | 
identifiers will become constants, bounds, frees, types or classes. As  | 
|
531  | 
\S\ref{sec:asts} describes, former parse tree heads appear in \AST{}s as
 | 
|
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
532  | 
{\tt Constant}s, while the name tokens \ndx{id}, \ndx{var}, \ndx{tid},
 | 
| 
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
533  | 
\ndx{tvar}, \ndx{xnum} and \ndx{xstr} become {\tt Variable}s.  On the other
 | 
| 
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
534  | 
hand, when \AST{}s generated from terms for printing, all constants and type
 | 
| 
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
535  | 
constructors become {\tt Constant}s; see \S\ref{sec:asts}.  Thus \AST{}s may
 | 
| 
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
536  | 
contain a messy mixture of {\tt Variable}s and {\tt Constant}s.  This is
 | 
| 323 | 537  | 
insignificant at macro level because matching treats them alike.  | 
538  | 
||
539  | 
Because of this behaviour, different kinds of atoms with the same name are  | 
|
540  | 
indistinguishable, which may make some rules prone to misbehaviour. Example:  | 
|
541  | 
\begin{ttbox}
 | 
|
542  | 
types  | 
|
543  | 
Nil  | 
|
544  | 
consts  | 
|
545  | 
Nil :: "'a list"  | 
|
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
546  | 
syntax  | 
| 323 | 547  | 
  "[]"    :: "'a list"    ("[]")
 | 
548  | 
translations  | 
|
549  | 
"[]" == "Nil"  | 
|
550  | 
\end{ttbox}
 | 
|
551  | 
The term {\tt Nil} will be printed as {\tt []}, just as expected.
 | 
|
552  | 
The term \verb|%Nil.t| will be printed as \verb|%[].t|, which might not be  | 
|
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
553  | 
expected!  Guess how type~{\tt Nil} is printed?
 | 
| 323 | 554  | 
|
555  | 
Normalizing an \AST{} involves repeatedly applying macro rules until none
 | 
|
| 332 | 556  | 
are applicable. Macro rules are chosen in the order that they appear in the  | 
| 323 | 557  | 
{\tt translations} section.  You can watch the normalization of \AST{}s
 | 
558  | 
during parsing and printing by setting \ttindex{Syntax.trace_norm_ast} to
 | 
|
559  | 
{\tt true}.\index{tracing!of macros} Alternatively, use
 | 
|
560  | 
\ttindex{Syntax.test_read}.  The information displayed when tracing
 | 
|
561  | 
includes the \AST{} before normalization ({\tt pre}), redexes with results
 | 
|
562  | 
({\tt rewrote}), the normal form finally reached ({\tt post}) and some
 | 
|
563  | 
statistics ({\tt normalize}).  If tracing is off,
 | 
|
564  | 
\ttindex{Syntax.stat_norm_ast} can be set to {\tt true} in order to enable
 | 
|
565  | 
printing of the normal form and statistics only.  | 
|
566  | 
||
567  | 
||
568  | 
\subsection{Example: the syntax of finite sets}
 | 
|
569  | 
\index{examples!of macros}
 | 
|
570  | 
||
571  | 
This example demonstrates the use of recursive macros to implement a  | 
|
572  | 
convenient notation for finite sets.  | 
|
573  | 
\index{*empty constant}\index{*insert constant}\index{{}@\verb'{}' symbol}
 | 
|
574  | 
\index{"@Enum@{\tt\at Enum} constant}
 | 
|
575  | 
\index{"@Finset@{\tt\at Finset} constant}
 | 
|
576  | 
\begin{ttbox}
 | 
|
577  | 
FINSET = SET +  | 
|
578  | 
types  | 
|
579  | 
is  | 
|
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
580  | 
syntax  | 
| 323 | 581  | 
  ""            :: "i => is"                ("_")
 | 
582  | 
  "{\at}Enum"       :: "[i, is] => is"          ("_,/ _")
 | 
|
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
583  | 
consts  | 
| 323 | 584  | 
  empty         :: "i"                      ("{\ttlbrace}{\ttrbrace}")
 | 
585  | 
insert :: "[i, i] => i"  | 
|
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
586  | 
syntax  | 
| 323 | 587  | 
  "{\at}Finset"     :: "is => i"                ("{\ttlbrace}(_){\ttrbrace}")
 | 
588  | 
translations  | 
|
589  | 
  "{\ttlbrace}x, xs{\ttrbrace}"     == "insert(x, {\ttlbrace}xs{\ttrbrace})"
 | 
|
590  | 
  "{\ttlbrace}x{\ttrbrace}"         == "insert(x, {\ttlbrace}{\ttrbrace})"
 | 
|
591  | 
end  | 
|
592  | 
\end{ttbox}
 | 
|
593  | 
Finite sets are internally built up by {\tt empty} and {\tt insert}.  The
 | 
|
594  | 
declarations above specify \verb|{x, y, z}| as the external representation
 | 
|
595  | 
of  | 
|
596  | 
\begin{ttbox}
 | 
|
597  | 
insert(x, insert(y, insert(z, empty)))  | 
|
598  | 
\end{ttbox}
 | 
|
599  | 
The nonterminal symbol~\ndx{is} stands for one or more objects of type~{\tt
 | 
|
600  | 
  i} separated by commas.  The mixfix declaration \hbox{\verb|"_,/ _"|}
 | 
|
601  | 
allows a line break after the comma for \rmindex{pretty printing}; if no
 | 
|
602  | 
line break is required then a space is printed instead.  | 
|
603  | 
||
604  | 
The nonterminal is declared as the type~{\tt is}, but with no {\tt arities}
 | 
|
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
605  | 
declaration.  Hence {\tt is} is not a logical type and may be used safely as
 | 
| 
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
606  | 
a new nonterminal for custom syntax.  The nonterminal~{\tt is} can later be
 | 
| 
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
607  | 
re-used for other enumerations of type~{\tt i} like lists or tuples. If we
 | 
| 
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
608  | 
had needed polymorphic enumerations, we could have used the predefined  | 
| 
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
609  | 
nonterminal symbol \ndx{args} and skipped this part altogether.
 | 
| 323 | 610  | 
|
611  | 
\index{"@Finset@{\tt\at Finset} constant}
 | 
|
612  | 
Next follows {\tt empty}, which is already equipped with its syntax
 | 
|
613  | 
\verb|{}|, and {\tt insert} without concrete syntax.  The syntactic
 | 
|
614  | 
constant {\tt\at Finset} provides concrete syntax for enumerations of~{\tt
 | 
|
615  | 
i} enclosed in curly braces. Remember that a pair of parentheses, as in  | 
|
616  | 
\verb|"{(_)}"|, specifies a block of indentation for pretty printing.
 | 
|
617  | 
||
618  | 
The translations may look strange at first. Macro rules are best  | 
|
619  | 
understood in their internal forms:  | 
|
620  | 
\begin{ttbox}
 | 
|
621  | 
parse_rules:  | 
|
622  | 
  ("{\at}Finset" ("{\at}Enum" x xs))  ->  ("insert" x ("{\at}Finset" xs))
 | 
|
623  | 
  ("{\at}Finset" x)  ->  ("insert" x "empty")
 | 
|
624  | 
print_rules:  | 
|
625  | 
  ("insert" x ("{\at}Finset" xs))  ->  ("{\at}Finset" ("{\at}Enum" x xs))
 | 
|
626  | 
  ("insert" x "empty")  ->  ("{\at}Finset" x)
 | 
|
627  | 
\end{ttbox}
 | 
|
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
628  | 
This shows that \verb|{x,xs}| indeed matches any set enumeration of at least
 | 
| 323 | 629  | 
two elements, binding the first to {\tt x} and the rest to {\tt xs}.
 | 
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
630  | 
Likewise, \verb|{xs}| and \verb|{x}| represent any set enumeration.
 | 
| 323 | 631  | 
The parse rules only work in the order given.  | 
632  | 
||
633  | 
\begin{warn}
 | 
|
| 332 | 634  | 
  The \AST{} rewriter cannot distinguish constants from variables and looks
 | 
| 323 | 635  | 
  only for names of atoms.  Thus the names of {\tt Constant}s occurring in
 | 
636  | 
the (internal) left-hand side of translation rules should be regarded as  | 
|
637  | 
  \rmindex{reserved words}.  Choose non-identifiers like {\tt\at Finset} or
 | 
|
638  | 
sufficiently long and strange names. If a bound variable's name gets  | 
|
639  | 
rewritten, the result will be incorrect; for example, the term  | 
|
640  | 
\begin{ttbox}
 | 
|
641  | 
\%empty insert. insert(x, empty)  | 
|
642  | 
\end{ttbox}
 | 
|
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
643  | 
\par\noindent is incorrectly printed as \verb|%empty insert. {x}|.
 | 
| 323 | 644  | 
\end{warn}
 | 
645  | 
||
646  | 
||
647  | 
\subsection{Example: a parse macro for dependent types}\label{prod_trans}
 | 
|
648  | 
\index{examples!of macros}
 | 
|
649  | 
||
650  | 
As stated earlier, a macro rule may not introduce new {\tt Variable}s on
 | 
|
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
651  | 
the right-hand side. Something like \verb|"K(B)" => "%x.B"| is illegal;  | 
| 323 | 652  | 
if allowed, it could cause variable capture. In such cases you usually  | 
653  | 
must fall back on translation functions. But a trick can make things  | 
|
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
654  | 
readable in some cases: {\em calling\/} translation functions by parse
 | 
| 
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
655  | 
macros:  | 
| 323 | 656  | 
\begin{ttbox}
 | 
657  | 
PROD = FINSET +  | 
|
658  | 
consts  | 
|
659  | 
Pi :: "[i, i => i] => i"  | 
|
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
660  | 
syntax  | 
| 323 | 661  | 
  "{\at}PROD"       :: "[idt, i, i] => i"     ("(3PROD _:_./ _)" 10)
 | 
662  | 
  "{\at}->"         :: "[i, i] => i"          ("(_ ->/ _)" [51, 50] 50)
 | 
|
663  | 
\ttbreak  | 
|
664  | 
translations  | 
|
665  | 
"PROD x:A. B" => "Pi(A, \%x. B)"  | 
|
666  | 
"A -> B" => "Pi(A, _K(B))"  | 
|
667  | 
end  | 
|
668  | 
ML  | 
|
669  | 
  val print_translation = [("Pi", dependent_tr' ("{\at}PROD", "{\at}->"))];
 | 
|
670  | 
\end{ttbox}
 | 
|
671  | 
||
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
672  | 
Here {\tt Pi} is a logical constant for constructing general products.
 | 
| 323 | 673  | 
Two external forms exist: the general case {\tt PROD x:A.B} and the
 | 
674  | 
function space {\tt A -> B}, which abbreviates \verb|Pi(A, %x.B)| when {\tt B}
 | 
|
675  | 
does not depend on~{\tt x}.
 | 
|
676  | 
||
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
677  | 
The second parse macro introduces {\tt _K(B)}, which later becomes
 | 
| 
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
678  | 
\verb|%x.B| due to a parse translation associated with \cdx{_K}.
 | 
| 
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
679  | 
Unfortunately there is no such trick for printing, so we have to add a {\tt
 | 
| 
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
680  | 
ML} section for the print translation \ttindex{dependent_tr'}.
 | 
| 323 | 681  | 
|
682  | 
Recall that identifiers with a leading {\tt _} are allowed in translation
 | 
|
683  | 
rules, but not in ordinary terms.  Thus we can create \AST{}s containing
 | 
|
684  | 
names that are not directly expressible.  | 
|
685  | 
||
686  | 
The parse translation for {\tt _K} is already installed in Pure, and {\tt
 | 
|
687  | 
dependent_tr'} is exported by the syntax module for public use. See  | 
|
688  | 
\S\ref{sec:tr_funs} below for more of the arcane lore of translation functions.
 | 
|
689  | 
\index{macros|)}\index{rewriting!syntactic|)}
 | 
|
690  | 
||
691  | 
||
692  | 
\section{Translation functions} \label{sec:tr_funs}
 | 
|
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
693  | 
\index{translations|(}
 | 
| 323 | 694  | 
%  | 
695  | 
This section describes the translation function mechanism. By writing  | 
|
696  | 
\ML{} functions, you can do almost everything with terms or \AST{}s during
 | 
|
697  | 
parsing and printing. The logic \LK\ is a good example of sophisticated  | 
|
| 332 | 698  | 
transformations between internal and external representations of sequents;  | 
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
699  | 
here, macros would be useless.  | 
| 323 | 700  | 
|
701  | 
A full understanding of translations requires some familiarity  | 
|
702  | 
with Isabelle's internals, especially the datatypes {\tt term}, {\tt typ},
 | 
|
703  | 
{\tt Syntax.ast} and the encodings of types and terms as such at the various
 | 
|
704  | 
stages of the parsing or printing process. Most users should never need to  | 
|
705  | 
use translation functions.  | 
|
706  | 
||
707  | 
\subsection{Declaring translation functions}
 | 
|
708  | 
There are four kinds of translation functions. Each such function is  | 
|
709  | 
associated with a name, which triggers calls to it. Such names can be  | 
|
710  | 
constants (logical or syntactic) or type constructors.  | 
|
711  | 
||
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
712  | 
{\tt print_syntax} displays the sets of names associated with the
 | 
| 323 | 713  | 
translation functions of a {\tt Syntax.syntax} under
 | 
714  | 
\ttindex{parse_ast_translation}, \ttindex{parse_translation},
 | 
|
715  | 
\ttindex{print_translation} and \ttindex{print_ast_translation}.  You can
 | 
|
716  | 
add new ones via the {\tt ML} section\index{*ML section} of
 | 
|
717  | 
a {\tt .thy} file.  There may never be more than one function of the same
 | 
|
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
718  | 
kind per name.  Even though the {\tt ML} section is the very last part of a
 | 
| 
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
719  | 
{\tt .thy} file, newly installed translation functions are effective when
 | 
| 
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
720  | 
processing the preceding section.  | 
| 323 | 721  | 
|
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
722  | 
The {\tt ML} section is copied verbatim near the beginning of the \ML\ file
 | 
| 
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
723  | 
generated from a {\tt .thy} file.  Definitions made here are accessible as
 | 
| 
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
724  | 
components of an \ML\ structure; to make some definitions private, use an  | 
| 
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
725  | 
\ML{} {\tt local} declaration.  The {\tt ML} section may install translation
 | 
| 
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
726  | 
functions by declaring any of the following identifiers:  | 
| 323 | 727  | 
\begin{ttbox}
 | 
728  | 
val parse_ast_translation : (string * (ast list -> ast)) list  | 
|
729  | 
val print_ast_translation : (string * (ast list -> ast)) list  | 
|
730  | 
val parse_translation : (string * (term list -> term)) list  | 
|
731  | 
val print_translation : (string * (term list -> term)) list  | 
|
732  | 
\end{ttbox}
 | 
|
733  | 
||
734  | 
\subsection{The translation strategy}
 | 
|
735  | 
All four kinds of translation functions are treated similarly. They are  | 
|
736  | 
called during the transformations between parse trees, \AST{}s and terms
 | 
|
737  | 
(recall Fig.\ts\ref{fig:parse_print}).  Whenever a combination of the form
 | 
|
738  | 
$(\mtt"c\mtt"~x@1 \ldots x@n)$ is encountered, and a translation function  | 
|
739  | 
$f$ of appropriate kind exists for $c$, the result is computed by the \ML{}
 | 
|
740  | 
function call $f \mtt[ x@1, \ldots, x@n \mtt]$.  | 
|
741  | 
||
742  | 
For \AST{} translations, the arguments $x@1, \ldots, x@n$ are \AST{}s.  A
 | 
|
743  | 
combination has the form $\Constant c$ or $\Appl{\Constant c, x@1, \ldots,
 | 
|
744  | 
x@n}$. For term translations, the arguments are terms and a combination  | 
|
745  | 
has the form $\ttfct{Const} (c, \tau)$ or $\ttfct{Const} (c, \tau) \ttapp
 | 
|
746  | 
x@1 \ttapp \ldots \ttapp x@n$. Terms allow more sophisticated  | 
|
747  | 
transformations than \AST{}s do, typically involving abstractions and bound
 | 
|
748  | 
variables.  | 
|
749  | 
||
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
750  | 
Regardless of whether they act on terms or \AST{}s, parse translations differ
 | 
| 
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
751  | 
from print translations in their overall behaviour fundamentally:  | 
| 323 | 752  | 
\begin{description}
 | 
753  | 
\item[Parse translations] are applied bottom-up. The arguments are already  | 
|
754  | 
in translated form. The translations must not fail; exceptions trigger  | 
|
755  | 
an error message.  | 
|
756  | 
||
757  | 
\item[Print translations] are applied top-down. They are supplied with  | 
|
758  | 
arguments that are partly still in internal form. The result again  | 
|
759  | 
undergoes translation; therefore a print translation should not introduce  | 
|
760  | 
as head the very constant that invoked it. The function may raise  | 
|
761  | 
  exception \xdx{Match} to indicate failure; in this event it has no
 | 
|
762  | 
effect.  | 
|
763  | 
\end{description}
 | 
|
764  | 
||
765  | 
Only constant atoms --- constructor \ttindex{Constant} for \AST{}s and
 | 
|
766  | 
\ttindex{Const} for terms --- can invoke translation functions.  This
 | 
|
767  | 
causes another difference between parsing and printing.  | 
|
768  | 
||
769  | 
Parsing starts with a string and the constants are not yet identified.  | 
|
770  | 
Only parse tree heads create {\tt Constant}s in the resulting \AST, as
 | 
|
771  | 
described in \S\ref{sec:astofpt}.  Macros and parse \AST{} translations may
 | 
|
772  | 
introduce further {\tt Constant}s.  When the final \AST{} is converted to a
 | 
|
773  | 
term, all {\tt Constant}s become {\tt Const}s, as described in
 | 
|
774  | 
\S\ref{sec:termofast}.
 | 
|
775  | 
||
776  | 
Printing starts with a well-typed term and all the constants are known. So  | 
|
777  | 
all logical constants and type constructors may invoke print translations.  | 
|
778  | 
These, and macros, may introduce further constants.  | 
|
779  | 
||
780  | 
||
781  | 
\subsection{Example: a print translation for dependent types}
 | 
|
782  | 
\index{examples!of translations}\indexbold{*dependent_tr'}
 | 
|
783  | 
||
784  | 
Let us continue the dependent type example (page~\pageref{prod_trans}) by
 | 
|
785  | 
examining the parse translation for~\cdx{_K} and the print translation
 | 
|
786  | 
{\tt dependent_tr'}, which are both built-in.  By convention, parse
 | 
|
787  | 
translations have names ending with {\tt _tr} and print translations have
 | 
|
788  | 
names ending with {\tt _tr'}.  Search for such names in the Isabelle
 | 
|
789  | 
sources to locate more examples.  | 
|
790  | 
||
791  | 
Here is the parse translation for {\tt _K}:
 | 
|
792  | 
\begin{ttbox}
 | 
|
793  | 
fun k_tr [t] = Abs ("x", dummyT, incr_boundvars 1 t)
 | 
|
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
794  | 
  | k_tr ts = raise TERM ("k_tr", ts);
 | 
| 323 | 795  | 
\end{ttbox}
 | 
796  | 
If {\tt k_tr} is called with exactly one argument~$t$, it creates a new
 | 
|
797  | 
{\tt Abs} node with a body derived from $t$.  Since terms given to parse
 | 
|
798  | 
translations are not yet typed, the type of the bound variable in the new  | 
|
799  | 
{\tt Abs} is simply {\tt dummyT}.  The function increments all {\tt Bound}
 | 
|
800  | 
nodes referring to outer abstractions by calling \ttindex{incr_boundvars},
 | 
|
801  | 
a basic term manipulation function defined in {\tt Pure/term.ML}.
 | 
|
802  | 
||
803  | 
Here is the print translation for dependent types:  | 
|
804  | 
\begin{ttbox}
 | 
|
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
805  | 
fun dependent_tr' (q, r) (A :: Abs (x, T, B) :: ts) =  | 
| 323 | 806  | 
if 0 mem (loose_bnos B) then  | 
807  | 
let val (x', B') = variant_abs (x, dummyT, B);  | 
|
808  | 
in list_comb (Const (q, dummyT) $ Free (x', T) $ A $ B', ts)  | 
|
809  | 
end  | 
|
810  | 
else list_comb (Const (r, dummyT) $ A $ B, ts)  | 
|
811  | 
| dependent_tr' _ _ = raise Match;  | 
|
812  | 
\end{ttbox}
 | 
|
| 332 | 813  | 
The argument {\tt (q,r)} is supplied to the curried function {\tt
 | 
814  | 
dependent_tr'} by a partial application during its installation. We  | 
|
815  | 
can set up print translations for both {\tt Pi} and {\tt Sigma} by
 | 
|
816  | 
including  | 
|
| 323 | 817  | 
\begin{ttbox}\index{*ML section}
 | 
818  | 
val print_translation =  | 
|
819  | 
  [("Pi",    dependent_tr' ("{\at}PROD", "{\at}->")),
 | 
|
820  | 
   ("Sigma", dependent_tr' ("{\at}SUM", "{\at}*"))];
 | 
|
821  | 
\end{ttbox}
 | 
|
822  | 
within the {\tt ML} section.  The first of these transforms ${\tt Pi}(A,
 | 
|
823  | 
\mtt{Abs}(x, T, B))$ into $\hbox{\tt{\at}PROD}(x', A, B')$ or
 | 
|
| 332 | 824  | 
$\hbox{\tt{\at}->}(A, B)$, choosing the latter form if $B$ does not depend
 | 
| 323 | 825  | 
on~$x$.  It checks this using \ttindex{loose_bnos}, yet another function
 | 
826  | 
from {\tt Pure/term.ML}.  Note that $x'$ is a version of $x$ renamed away
 | 
|
| 332 | 827  | 
from all names in $B$, and $B'$ is the body $B$ with {\tt Bound} nodes
 | 
828  | 
referring to the {\tt Abs} node replaced by $\ttfct{Free} (x',
 | 
|
| 323 | 829  | 
\mtt{dummyT})$.
 | 
830  | 
||
831  | 
We must be careful with types here.  While types of {\tt Const}s are
 | 
|
832  | 
ignored, type constraints may be printed for some {\tt Free}s and
 | 
|
833  | 
{\tt Var}s if \ttindex{show_types} is set to {\tt true}.  Variables of type
 | 
|
834  | 
\ttindex{dummyT} are never printed with constraint, though.  The line
 | 
|
835  | 
\begin{ttbox}
 | 
|
836  | 
let val (x', B') = variant_abs (x, dummyT, B);  | 
|
837  | 
\end{ttbox}\index{*variant_abs}
 | 
|
838  | 
replaces bound variable occurrences in~$B$ by the free variable $x'$ with  | 
|
839  | 
type {\tt dummyT}.  Only the binding occurrence of~$x'$ is given the
 | 
|
840  | 
correct type~{\tt T}, so this is the only place where a type
 | 
|
| 
864
 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 
wenzelm 
parents: 
499 
diff
changeset
 | 
841  | 
constraint might appear.  | 
| 323 | 842  | 
\index{translations|)}
 | 
843  | 
\index{syntax!transformations|)}
 |