author | wenzelm |
Tue, 19 Jun 2012 22:06:08 +0200 | |
changeset 48117 | e21f4d5b9636 |
parent 48116 | fd773574cb81 |
child 48118 | 8537313dd261 |
permissions | -rw-r--r-- |
30184
37969710e61f
removed parts of the manual that are clearly obsolete, or covered by
wenzelm
parents:
14947
diff
changeset
|
1 |
|
323 | 2 |
\chapter{Syntax Transformations} \label{chap:syntax} |
3 |
\newcommand\ttapp{\mathrel{\hbox{\tt\$}}} |
|
4 |
\newcommand\mtt[1]{\mbox{\tt #1}} |
|
5 |
\newcommand\ttfct[1]{\mathop{\mtt{#1}}\nolimits} |
|
6 |
\newcommand\Constant{\ttfct{Constant}} |
|
7 |
\newcommand\Variable{\ttfct{Variable}} |
|
8 |
\newcommand\Appl[1]{\ttfct{Appl}\,[#1]} |
|
9 |
\index{syntax!transformations|(} |
|
10 |
||
11 |
||
6618 | 12 |
\section{Transforming parse trees to ASTs}\label{sec:astofpt} |
323 | 13 |
\index{ASTs!made from parse trees} |
14 |
\newcommand\astofpt[1]{\lbrakk#1\rbrakk} |
|
15 |
||
16 |
The parse tree is the raw output of the parser. Translation functions, |
|
17 |
called {\bf parse AST translations}\indexbold{translations!parse AST}, |
|
18 |
transform the parse tree into an abstract syntax tree. |
|
19 |
||
20 |
The parse tree is constructed by nesting the right-hand sides of the |
|
21 |
productions used to recognize the input. Such parse trees are simply lists |
|
22 |
of tokens and constituent parse trees, the latter representing the |
|
23 |
nonterminals of the productions. Let us refer to the actual productions in |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
24 |
the form displayed by {\tt print_syntax} (see \S\ref{sec:inspct-thy} for an |
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
25 |
example). |
323 | 26 |
|
27 |
Ignoring parse \AST{} translations, parse trees are transformed to \AST{}s |
|
28 |
by stripping out delimiters and copy productions. More precisely, the |
|
29 |
mapping $\astofpt{-}$ is derived from the productions as follows: |
|
30 |
\begin{itemize} |
|
31 |
\item Name tokens: $\astofpt{t} = \Variable s$, where $t$ is an \ndx{id}, |
|
14947 | 32 |
\ndx{var}, \ndx{tid}, \ndx{tvar}, \ndx{num}, \ndx{xnum} or \ndx{xstr} token, |
33 |
and $s$ its associated string. Note that for {\tt xstr} this does not |
|
34 |
include the quotes. |
|
323 | 35 |
|
36 |
\item Copy productions:\index{productions!copy} |
|
37 |
$\astofpt{\ldots P \ldots} = \astofpt{P}$. Here $\ldots$ stands for |
|
38 |
strings of delimiters, which are discarded. $P$ stands for the single |
|
39 |
constituent that is not a delimiter; it is either a nonterminal symbol or |
|
40 |
a name token. |
|
41 |
||
42 |
\item 0-ary productions: $\astofpt{\ldots \mtt{=>} c} = \Constant c$. |
|
43 |
Here there are no constituents other than delimiters, which are |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
44 |
discarded. |
323 | 45 |
|
46 |
\item $n$-ary productions, where $n \ge 1$: delimiters are discarded and |
|
47 |
the remaining constituents $P@1$, \ldots, $P@n$ are built into an |
|
48 |
application whose head constant is~$c$: |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
49 |
\[ \astofpt{\ldots P@1 \ldots P@n \ldots \mtt{=>} c} = |
323 | 50 |
\Appl{\Constant c, \astofpt{P@1}, \ldots, \astofpt{P@n}} |
51 |
\] |
|
52 |
\end{itemize} |
|
53 |
Figure~\ref{fig:parse_ast} presents some simple examples, where {\tt ==}, |
|
54 |
{\tt _appl}, {\tt _args}, and so forth name productions of the Pure syntax. |
|
55 |
These examples illustrate the need for further translations to make \AST{}s |
|
56 |
closer to the typed $\lambda$-calculus. The Pure syntax provides |
|
57 |
predefined parse \AST{} translations\index{translations!parse AST} for |
|
58 |
ordinary applications, type applications, nested abstractions, meta |
|
59 |
implications and function types. Figure~\ref{fig:parse_ast_tr} shows their |
|
60 |
effect on some representative input strings. |
|
61 |
||
62 |
||
63 |
\begin{figure} |
|
64 |
\begin{center} |
|
65 |
\tt\begin{tabular}{ll} |
|
66 |
\rm input string & \rm \AST \\\hline |
|
67 |
"f" & f \\ |
|
68 |
"'a" & 'a \\ |
|
69 |
"t == u" & ("==" t u) \\ |
|
70 |
"f(x)" & ("_appl" f x) \\ |
|
71 |
"f(x, y)" & ("_appl" f ("_args" x y)) \\ |
|
72 |
"f(x, y, z)" & ("_appl" f ("_args" x ("_args" y z))) \\ |
|
73 |
"\%x y.\ t" & ("_lambda" ("_idts" x y) t) \\ |
|
74 |
\end{tabular} |
|
75 |
\end{center} |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
76 |
\caption{Parsing examples using the Pure syntax}\label{fig:parse_ast} |
323 | 77 |
\end{figure} |
78 |
||
79 |
\begin{figure} |
|
80 |
\begin{center} |
|
81 |
\tt\begin{tabular}{ll} |
|
82 |
\rm input string & \rm \AST{} \\\hline |
|
83 |
"f(x, y, z)" & (f x y z) \\ |
|
84 |
"'a ty" & (ty 'a) \\ |
|
85 |
"('a, 'b) ty" & (ty 'a 'b) \\ |
|
86 |
"\%x y z.\ t" & ("_abs" x ("_abs" y ("_abs" z t))) \\ |
|
87 |
"\%x ::\ 'a.\ t" & ("_abs" ("_constrain" x 'a) t) \\ |
|
88 |
"[| P; Q; R |] => S" & ("==>" P ("==>" Q ("==>" R S))) \\ |
|
89 |
"['a, 'b, 'c] => 'd" & ("fun" 'a ("fun" 'b ("fun" 'c 'd))) |
|
90 |
\end{tabular} |
|
91 |
\end{center} |
|
92 |
\caption{Built-in parse \AST{} translations}\label{fig:parse_ast_tr} |
|
93 |
\end{figure} |
|
94 |
||
95 |
The names of constant heads in the \AST{} control the translation process. |
|
96 |
The list of constants invoking parse \AST{} translations appears in the |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
97 |
output of {\tt print_syntax} under {\tt parse_ast_translation}. |
323 | 98 |
|
99 |
||
6618 | 100 |
\section{Transforming ASTs to terms}\label{sec:termofast} |
323 | 101 |
\index{terms!made from ASTs} |
102 |
\newcommand\termofast[1]{\lbrakk#1\rbrakk} |
|
103 |
||
104 |
The \AST{}, after application of macros (see \S\ref{sec:macros}), is |
|
105 |
transformed into a term. This term is probably ill-typed since type |
|
106 |
inference has not occurred yet. The term may contain type constraints |
|
107 |
consisting of applications with head {\tt "_constrain"}; the second |
|
108 |
argument is a type encoded as a term. Type inference later introduces |
|
109 |
correct types or rejects the input. |
|
110 |
||
111 |
Another set of translation functions, namely parse |
|
112 |
translations\index{translations!parse}, may affect this process. If we |
|
113 |
ignore parse translations for the time being, then \AST{}s are transformed |
|
114 |
to terms by mapping \AST{} constants to constants, \AST{} variables to |
|
115 |
schematic or free variables and \AST{} applications to applications. |
|
116 |
||
117 |
More precisely, the mapping $\termofast{-}$ is defined by |
|
118 |
\begin{itemize} |
|
119 |
\item Constants: $\termofast{\Constant x} = \ttfct{Const} (x, |
|
120 |
\mtt{dummyT})$. |
|
121 |
||
122 |
\item Schematic variables: $\termofast{\Variable \mtt{"?}xi\mtt"} = |
|
123 |
\ttfct{Var} ((x, i), \mtt{dummyT})$, where $x$ is the base name and $i$ |
|
124 |
the index extracted from~$xi$. |
|
125 |
||
126 |
\item Free variables: $\termofast{\Variable x} = \ttfct{Free} (x, |
|
127 |
\mtt{dummyT})$. |
|
128 |
||
129 |
\item Function applications with $n$ arguments: |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
130 |
\[ \termofast{\Appl{f, x@1, \ldots, x@n}} = |
323 | 131 |
\termofast{f} \ttapp |
132 |
\termofast{x@1} \ttapp \ldots \ttapp \termofast{x@n} |
|
133 |
\] |
|
134 |
\end{itemize} |
|
135 |
Here \ttindex{Const}, \ttindex{Var}, \ttindex{Free} and |
|
136 |
\verb|$|\index{$@{\tt\$}} are constructors of the datatype \mltydx{term}, |
|
137 |
while \ttindex{dummyT} stands for some dummy type that is ignored during |
|
138 |
type inference. |
|
139 |
||
140 |
So far the outcome is still a first-order term. Abstractions and bound |
|
141 |
variables (constructors \ttindex{Abs} and \ttindex{Bound}) are introduced |
|
142 |
by parse translations. Such translations are attached to {\tt "_abs"}, |
|
143 |
{\tt "!!"} and user-defined binders. |
|
144 |
||
145 |
||
146 |
\section{Printing of terms} |
|
147 |
\newcommand\astofterm[1]{\lbrakk#1\rbrakk}\index{ASTs!made from terms} |
|
148 |
||
149 |
The output phase is essentially the inverse of the input phase. Terms are |
|
150 |
translated via abstract syntax trees into strings. Finally the strings are |
|
151 |
pretty printed. |
|
152 |
||
153 |
Print translations (\S\ref{sec:tr_funs}) may affect the transformation of |
|
154 |
terms into \AST{}s. Ignoring those, the transformation maps |
|
155 |
term constants, variables and applications to the corresponding constructs |
|
156 |
on \AST{}s. Abstractions are mapped to applications of the special |
|
157 |
constant {\tt _abs}. |
|
158 |
||
159 |
More precisely, the mapping $\astofterm{-}$ is defined as follows: |
|
160 |
\begin{itemize} |
|
161 |
\item $\astofterm{\ttfct{Const} (x, \tau)} = \Constant x$. |
|
162 |
||
163 |
\item $\astofterm{\ttfct{Free} (x, \tau)} = constrain (\Variable x, |
|
164 |
\tau)$. |
|
165 |
||
166 |
\item $\astofterm{\ttfct{Var} ((x, i), \tau)} = constrain (\Variable |
|
167 |
\mtt{"?}xi\mtt", \tau)$, where $\mtt?xi$ is the string representation of |
|
168 |
the {\tt indexname} $(x, i)$. |
|
169 |
||
170 |
\item For the abstraction $\lambda x::\tau.t$, let $x'$ be a variant |
|
171 |
of~$x$ renamed to differ from all names occurring in~$t$, and let $t'$ |
|
172 |
be obtained from~$t$ by replacing all bound occurrences of~$x$ by |
|
173 |
the free variable $x'$. This replaces corresponding occurrences of the |
|
174 |
constructor \ttindex{Bound} by the term $\ttfct{Free} (x', |
|
175 |
\mtt{dummyT})$: |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
176 |
\[ \astofterm{\ttfct{Abs} (x, \tau, t)} = |
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
177 |
\Appl{\Constant \mtt{"_abs"}, |
8136 | 178 |
constrain(\Variable x', \tau), \astofterm{t'}} |
323 | 179 |
\] |
180 |
||
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
181 |
\item $\astofterm{\ttfct{Bound} i} = \Variable \mtt{"B.}i\mtt"$. |
323 | 182 |
The occurrence of constructor \ttindex{Bound} should never happen |
183 |
when printing well-typed terms; it indicates a de Bruijn index with no |
|
184 |
matching abstraction. |
|
185 |
||
186 |
\item Where $f$ is not an application, |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
187 |
\[ \astofterm{f \ttapp x@1 \ttapp \ldots \ttapp x@n} = |
323 | 188 |
\Appl{\astofterm{f}, \astofterm{x@1}, \ldots,\astofterm{x@n}} |
189 |
\] |
|
190 |
\end{itemize} |
|
191 |
% |
|
332 | 192 |
Type constraints\index{type constraints} are inserted to allow the printing |
193 |
of types. This is governed by the boolean variable \ttindex{show_types}: |
|
323 | 194 |
\begin{itemize} |
195 |
\item $constrain(x, \tau) = x$ \ if $\tau = \mtt{dummyT}$ \index{*dummyT} or |
|
332 | 196 |
\ttindex{show_types} is set to {\tt false}. |
323 | 197 |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
198 |
\item $constrain(x, \tau) = \Appl{\Constant \mtt{"_constrain"}, x, |
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
199 |
\astofterm{\tau}}$ \ otherwise. |
323 | 200 |
|
201 |
Here, $\astofterm{\tau}$ is the \AST{} encoding of $\tau$: type |
|
202 |
constructors go to {\tt Constant}s; type identifiers go to {\tt |
|
203 |
Variable}s; type applications go to {\tt Appl}s with the type |
|
204 |
constructor as the first element. If \ttindex{show_sorts} is set to |
|
205 |
{\tt true}, some type variables are decorated with an \AST{} encoding |
|
206 |
of their sort. |
|
207 |
\end{itemize} |
|
208 |
% |
|
209 |
The \AST{}, after application of macros (see \S\ref{sec:macros}), is |
|
210 |
transformed into the final output string. The built-in {\bf print AST |
|
332 | 211 |
translations}\indexbold{translations!print AST} reverse the |
323 | 212 |
parse \AST{} translations of Fig.\ts\ref{fig:parse_ast_tr}. |
213 |
||
214 |
For the actual printing process, the names attached to productions |
|
215 |
of the form $\ldots A^{(p@1)}@1 \ldots A^{(p@n)}@n \ldots \mtt{=>} c$ play |
|
216 |
a vital role. Each \AST{} with constant head $c$, namely $\mtt"c\mtt"$ or |
|
217 |
$(\mtt"c\mtt"~ x@1 \ldots x@n)$, is printed according to the production |
|
218 |
for~$c$. Each argument~$x@i$ is converted to a string, and put in |
|
219 |
parentheses if its priority~$(p@i)$ requires this. The resulting strings |
|
220 |
and their syntactic sugar (denoted by \dots{} above) are joined to make a |
|
221 |
single string. |
|
222 |
||
3108 | 223 |
If an application $(\mtt"c\mtt"~ x@1 \ldots x@m)$ has more arguments |
224 |
than the corresponding production, it is first split into |
|
225 |
$((\mtt"c\mtt"~ x@1 \ldots x@n) ~ x@{n+1} \ldots x@m)$. Applications |
|
226 |
with too few arguments or with non-constant head or without a |
|
227 |
corresponding production are printed as $f(x@1, \ldots, x@l)$ or |
|
228 |
$(\alpha@1, \ldots, \alpha@l) ty$. Multiple productions associated |
|
229 |
with some name $c$ are tried in order of appearance. An occurrence of |
|
323 | 230 |
$\Variable x$ is simply printed as~$x$. |
231 |
||
232 |
Blanks are {\em not\/} inserted automatically. If blanks are required to |
|
233 |
separate tokens, specify them in the mixfix declaration, possibly preceded |
|
234 |
by a slash~({\tt/}) to allow a line break. |
|
235 |
\index{ASTs|)} |
|
236 |
||
237 |
||
238 |
\section{Translation functions} \label{sec:tr_funs} |
|
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
499
diff
changeset
|
239 |
\index{translations|(} |
323 | 240 |
% |
9695 | 241 |
This section describes the translation function mechanism. By writing \ML{} |
242 |
functions, you can do almost everything to terms or \AST{}s during parsing and |
|
243 |
printing. The logic LK is a good example of sophisticated transformations |
|
244 |
between internal and external representations of sequents; here, macros would |
|
245 |
be useless. |
|
323 | 246 |
|
247 |
A full understanding of translations requires some familiarity |
|
248 |
with Isabelle's internals, especially the datatypes {\tt term}, {\tt typ}, |
|
249 |
{\tt Syntax.ast} and the encodings of types and terms as such at the various |
|
250 |
stages of the parsing or printing process. Most users should never need to |
|
251 |
use translation functions. |
|
252 |
||
253 |
\subsection{Declaring translation functions} |
|
3108 | 254 |
There are four kinds of translation functions, with one of these |
255 |
coming in two variants. Each such function is associated with a name, |
|
256 |
which triggers calls to it. Such names can be constants (logical or |
|
257 |
syntactic) or type constructors. |
|
323 | 258 |
|
6343 | 259 |
Function {\tt print_syntax} displays the sets of names associated with the |
260 |
translation functions of a theory under \texttt{parse_ast_translation}, etc. |
|
261 |
You can add new ones via the {\tt ML} section\index{*ML section} of a theory |
|
262 |
definition file. Even though the {\tt ML} section is the very last part of |
|
263 |
the file, newly installed translation functions are already effective when |
|
264 |
processing all of the preceding sections. |
|
323 | 265 |
|
3108 | 266 |
The {\tt ML} section's contents are simply copied verbatim near the |
267 |
beginning of the \ML\ file generated from a theory definition file. |
|
268 |
Definitions made here are accessible as components of an \ML\ |
|
269 |
structure; to make some parts private, use an \ML{} {\tt local} |
|
270 |
declaration. The {\ML} code may install translation functions by |
|
271 |
declaring any of the following identifiers: |
|
323 | 272 |
\begin{ttbox} |
3108 | 273 |
val parse_ast_translation : (string * (ast list -> ast)) list |
274 |
val print_ast_translation : (string * (ast list -> ast)) list |
|
275 |
val parse_translation : (string * (term list -> term)) list |
|
276 |
val print_translation : (string * (term list -> term)) list |
|
4375 | 277 |
val typed_print_translation : |
278 |
(string * (bool -> typ -> term list -> term)) list |
|
323 | 279 |
\end{ttbox} |
280 |
||
281 |
\subsection{The translation strategy} |
|
3108 | 282 |
The different kinds of translation functions are called during the |
283 |
transformations between parse trees, \AST{}s and terms (recall |
|
284 |
Fig.\ts\ref{fig:parse_print}). Whenever a combination of the form |
|
285 |
$(\mtt"c\mtt"~x@1 \ldots x@n)$ is encountered, and a translation |
|
286 |
function $f$ of appropriate kind exists for $c$, the result is |
|
287 |
computed by the \ML{} function call $f \mtt[ x@1, \ldots, x@n \mtt]$. |
|
323 | 288 |
|
3108 | 289 |
For \AST{} translations, the arguments $x@1, \ldots, x@n$ are \AST{}s. |
290 |
A combination has the form $\Constant c$ or $\Appl{\Constant c, x@1, |
|
291 |
\ldots, x@n}$. For term translations, the arguments are terms and a |
|
292 |
combination has the form $\ttfct{Const} (c, \tau)$ or $\ttfct{Const} |
|
293 |
(c, \tau) \ttapp x@1 \ttapp \ldots \ttapp x@n$. Terms allow more |
|
294 |
sophisticated transformations than \AST{}s do, typically involving |
|
295 |
abstractions and bound variables. {\em Typed} print translations may |
|
4375 | 296 |
even peek at the type $\tau$ of the constant they are invoked on; they |
297 |
are also passed the current value of the \ttindex{show_sorts} flag. |
|
323 | 298 |
|
3108 | 299 |
Regardless of whether they act on terms or \AST{}s, translation |
300 |
functions called during the parsing process differ from those for |
|
301 |
printing more fundamentally in their overall behaviour: |
|
323 | 302 |
\begin{description} |
6343 | 303 |
\item[Parse translations] are applied bottom-up. The arguments are already in |
304 |
translated form. The translations must not fail; exceptions trigger an |
|
305 |
error message. There may never be more than one function associated with |
|
306 |
any syntactic name. |
|
307 |
||
323 | 308 |
\item[Print translations] are applied top-down. They are supplied with |
309 |
arguments that are partly still in internal form. The result again |
|
6343 | 310 |
undergoes translation; therefore a print translation should not introduce as |
311 |
head the very constant that invoked it. The function may raise exception |
|
312 |
\xdx{Match} to indicate failure; in this event it has no effect. Multiple |
|
313 |
functions associated with some syntactic name are tried in an unspecified |
|
314 |
order. |
|
323 | 315 |
\end{description} |
316 |
||
317 |
Only constant atoms --- constructor \ttindex{Constant} for \AST{}s and |
|
318 |
\ttindex{Const} for terms --- can invoke translation functions. This |
|
319 |
causes another difference between parsing and printing. |
|
320 |
||
321 |
Parsing starts with a string and the constants are not yet identified. |
|
322 |
Only parse tree heads create {\tt Constant}s in the resulting \AST, as |
|
323 |
described in \S\ref{sec:astofpt}. Macros and parse \AST{} translations may |
|
324 |
introduce further {\tt Constant}s. When the final \AST{} is converted to a |
|
325 |
term, all {\tt Constant}s become {\tt Const}s, as described in |
|
326 |
\S\ref{sec:termofast}. |
|
327 |
||
328 |
Printing starts with a well-typed term and all the constants are known. So |
|
329 |
all logical constants and type constructors may invoke print translations. |
|
330 |
These, and macros, may introduce further constants. |
|
331 |
||
5371 | 332 |
%%% Local Variables: |
333 |
%%% mode: latex |
|
334 |
%%% TeX-master: "ref" |
|
335 |
%%% End: |