| author | blanchet | 
| Wed, 11 Jul 2012 09:32:29 +0200 | |
| changeset 48240 | 6a8d18798161 | 
| parent 48118 | 8537313dd261 | 
| permissions | -rw-r--r-- | 
| 30184 
37969710e61f
removed parts of the manual that are clearly obsolete, or covered by
 wenzelm parents: 
14947diff
changeset | 1 | |
| 323 | 2 | \chapter{Syntax Transformations} \label{chap:syntax}
 | 
| 3 | \newcommand\ttapp{\mathrel{\hbox{\tt\$}}}
 | |
| 4 | \newcommand\mtt[1]{\mbox{\tt #1}}
 | |
| 5 | \newcommand\ttfct[1]{\mathop{\mtt{#1}}\nolimits}
 | |
| 6 | \newcommand\Constant{\ttfct{Constant}}
 | |
| 7 | \newcommand\Variable{\ttfct{Variable}}
 | |
| 8 | \newcommand\Appl[1]{\ttfct{Appl}\,[#1]}
 | |
| 9 | \index{syntax!transformations|(}
 | |
| 10 | ||
| 11 | ||
| 6618 | 12 | \section{Transforming parse trees to ASTs}\label{sec:astofpt}
 | 
| 323 | 13 | \index{ASTs!made from parse trees}
 | 
| 14 | \newcommand\astofpt[1]{\lbrakk#1\rbrakk}
 | |
| 15 | ||
| 16 | The parse tree is the raw output of the parser. Translation functions, | |
| 17 | called {\bf parse AST translations}\indexbold{translations!parse AST},
 | |
| 18 | transform the parse tree into an abstract syntax tree. | |
| 19 | ||
| 20 | The parse tree is constructed by nesting the right-hand sides of the | |
| 21 | productions used to recognize the input. Such parse trees are simply lists | |
| 22 | of tokens and constituent parse trees, the latter representing the | |
| 23 | nonterminals of the productions. Let us refer to the actual productions in | |
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 24 | the form displayed by {\tt print_syntax} (see \S\ref{sec:inspct-thy} for an
 | 
| 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 25 | example). | 
| 323 | 26 | |
| 27 | Ignoring parse \AST{} translations, parse trees are transformed to \AST{}s
 | |
| 28 | by stripping out delimiters and copy productions. More precisely, the | |
| 29 | mapping $\astofpt{-}$ is derived from the productions as follows:
 | |
| 30 | \begin{itemize}
 | |
| 31 | \item Name tokens: $\astofpt{t} = \Variable s$, where $t$ is an \ndx{id},
 | |
| 14947 | 32 |   \ndx{var}, \ndx{tid}, \ndx{tvar}, \ndx{num}, \ndx{xnum} or \ndx{xstr} token,
 | 
| 33 |   and $s$ its associated string.  Note that for {\tt xstr} this does not
 | |
| 34 | include the quotes. | |
| 323 | 35 | |
| 36 | \item Copy productions:\index{productions!copy}
 | |
| 37 |   $\astofpt{\ldots P \ldots} = \astofpt{P}$.  Here $\ldots$ stands for
 | |
| 38 | strings of delimiters, which are discarded. $P$ stands for the single | |
| 39 | constituent that is not a delimiter; it is either a nonterminal symbol or | |
| 40 | a name token. | |
| 41 | ||
| 42 |   \item 0-ary productions: $\astofpt{\ldots \mtt{=>} c} = \Constant c$.
 | |
| 43 | Here there are no constituents other than delimiters, which are | |
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 44 | discarded. | 
| 323 | 45 | |
| 46 | \item $n$-ary productions, where $n \ge 1$: delimiters are discarded and | |
| 47 | the remaining constituents $P@1$, \ldots, $P@n$ are built into an | |
| 48 | application whose head constant is~$c$: | |
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 49 |     \[ \astofpt{\ldots P@1 \ldots P@n \ldots \mtt{=>} c} =
 | 
| 323 | 50 |        \Appl{\Constant c, \astofpt{P@1}, \ldots, \astofpt{P@n}}
 | 
| 51 | \] | |
| 52 | \end{itemize}
 | |
| 53 | Figure~\ref{fig:parse_ast} presents some simple examples, where {\tt ==},
 | |
| 54 | {\tt _appl}, {\tt _args}, and so forth name productions of the Pure syntax.
 | |
| 55 | These examples illustrate the need for further translations to make \AST{}s
 | |
| 56 | closer to the typed $\lambda$-calculus. The Pure syntax provides | |
| 57 | predefined parse \AST{} translations\index{translations!parse AST} for
 | |
| 58 | ordinary applications, type applications, nested abstractions, meta | |
| 59 | implications and function types.  Figure~\ref{fig:parse_ast_tr} shows their
 | |
| 60 | effect on some representative input strings. | |
| 61 | ||
| 62 | ||
| 63 | \begin{figure}
 | |
| 64 | \begin{center}
 | |
| 65 | \tt\begin{tabular}{ll}
 | |
| 66 | \rm input string & \rm \AST \\\hline | |
| 67 | "f" & f \\ | |
| 68 | "'a" & 'a \\ | |
| 69 | "t == u"            & ("==" t u) \\
 | |
| 70 | "f(x)"              & ("_appl" f x) \\
 | |
| 71 | "f(x, y)"           & ("_appl" f ("_args" x y)) \\
 | |
| 72 | "f(x, y, z)"        & ("_appl" f ("_args" x ("_args" y z))) \\
 | |
| 73 | "\%x y.\ t"         & ("_lambda" ("_idts" x y) t) \\
 | |
| 74 | \end{tabular}
 | |
| 75 | \end{center}
 | |
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 76 | \caption{Parsing examples using the Pure syntax}\label{fig:parse_ast}
 | 
| 323 | 77 | \end{figure}
 | 
| 78 | ||
| 79 | \begin{figure}
 | |
| 80 | \begin{center}
 | |
| 81 | \tt\begin{tabular}{ll}
 | |
| 82 | \rm input string            & \rm \AST{} \\\hline
 | |
| 83 | "f(x, y, z)" & (f x y z) \\ | |
| 84 | "'a ty" & (ty 'a) \\ | |
| 85 | "('a, 'b) ty"               & (ty 'a 'b) \\
 | |
| 86 | "\%x y z.\ t"               & ("_abs" x ("_abs" y ("_abs" z t))) \\
 | |
| 87 | "\%x ::\ 'a.\ t"            & ("_abs" ("_constrain" x 'a) t) \\
 | |
| 88 | "[| P; Q; R |] => S"        & ("==>" P ("==>" Q ("==>" R S))) \\
 | |
| 89 | "['a, 'b, 'c] => 'd"        & ("fun" 'a ("fun" 'b ("fun" 'c 'd)))
 | |
| 90 | \end{tabular}
 | |
| 91 | \end{center}
 | |
| 92 | \caption{Built-in parse \AST{} translations}\label{fig:parse_ast_tr}
 | |
| 93 | \end{figure}
 | |
| 94 | ||
| 95 | The names of constant heads in the \AST{} control the translation process.
 | |
| 96 | The list of constants invoking parse \AST{} translations appears in the
 | |
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 97 | output of {\tt print_syntax} under {\tt parse_ast_translation}.
 | 
| 323 | 98 | |
| 99 | ||
| 6618 | 100 | \section{Transforming ASTs to terms}\label{sec:termofast}
 | 
| 323 | 101 | \index{terms!made from ASTs}
 | 
| 102 | \newcommand\termofast[1]{\lbrakk#1\rbrakk}
 | |
| 103 | ||
| 104 | The \AST{}, after application of macros (see \S\ref{sec:macros}), is
 | |
| 105 | transformed into a term. This term is probably ill-typed since type | |
| 106 | inference has not occurred yet. The term may contain type constraints | |
| 107 | consisting of applications with head {\tt "_constrain"}; the second
 | |
| 108 | argument is a type encoded as a term. Type inference later introduces | |
| 109 | correct types or rejects the input. | |
| 110 | ||
| 111 | Another set of translation functions, namely parse | |
| 112 | translations\index{translations!parse}, may affect this process.  If we
 | |
| 113 | ignore parse translations for the time being, then \AST{}s are transformed
 | |
| 114 | to terms by mapping \AST{} constants to constants, \AST{} variables to
 | |
| 115 | schematic or free variables and \AST{} applications to applications.
 | |
| 116 | ||
| 117 | More precisely, the mapping $\termofast{-}$ is defined by
 | |
| 118 | \begin{itemize}
 | |
| 119 | \item Constants: $\termofast{\Constant x} = \ttfct{Const} (x,
 | |
| 120 |   \mtt{dummyT})$.
 | |
| 121 | ||
| 122 | \item Schematic variables: $\termofast{\Variable \mtt{"?}xi\mtt"} =
 | |
| 123 |   \ttfct{Var} ((x, i), \mtt{dummyT})$, where $x$ is the base name and $i$
 | |
| 124 | the index extracted from~$xi$. | |
| 125 | ||
| 126 | \item Free variables: $\termofast{\Variable x} = \ttfct{Free} (x,
 | |
| 127 |   \mtt{dummyT})$.
 | |
| 128 | ||
| 129 | \item Function applications with $n$ arguments: | |
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 130 |     \[ \termofast{\Appl{f, x@1, \ldots, x@n}} =
 | 
| 323 | 131 |        \termofast{f} \ttapp
 | 
| 132 |          \termofast{x@1} \ttapp \ldots \ttapp \termofast{x@n}
 | |
| 133 | \] | |
| 134 | \end{itemize}
 | |
| 135 | Here \ttindex{Const}, \ttindex{Var}, \ttindex{Free} and
 | |
| 136 | \verb|$|\index{$@{\tt\$}} are constructors of the datatype \mltydx{term},
 | |
| 137 | while \ttindex{dummyT} stands for some dummy type that is ignored during
 | |
| 138 | type inference. | |
| 139 | ||
| 140 | So far the outcome is still a first-order term. Abstractions and bound | |
| 141 | variables (constructors \ttindex{Abs} and \ttindex{Bound}) are introduced
 | |
| 142 | by parse translations.  Such translations are attached to {\tt "_abs"},
 | |
| 143 | {\tt "!!"} and user-defined binders.
 | |
| 144 | ||
| 145 | ||
| 146 | \section{Printing of terms}
 | |
| 147 | \newcommand\astofterm[1]{\lbrakk#1\rbrakk}\index{ASTs!made from terms}
 | |
| 148 | ||
| 149 | The output phase is essentially the inverse of the input phase. Terms are | |
| 150 | translated via abstract syntax trees into strings. Finally the strings are | |
| 151 | pretty printed. | |
| 152 | ||
| 153 | Print translations (\S\ref{sec:tr_funs}) may affect the transformation of
 | |
| 154 | terms into \AST{}s.  Ignoring those, the transformation maps
 | |
| 155 | term constants, variables and applications to the corresponding constructs | |
| 156 | on \AST{}s.  Abstractions are mapped to applications of the special
 | |
| 157 | constant {\tt _abs}.
 | |
| 158 | ||
| 159 | More precisely, the mapping $\astofterm{-}$ is defined as follows:
 | |
| 160 | \begin{itemize}
 | |
| 161 |   \item $\astofterm{\ttfct{Const} (x, \tau)} = \Constant x$.
 | |
| 162 | ||
| 163 |   \item $\astofterm{\ttfct{Free} (x, \tau)} = constrain (\Variable x,
 | |
| 164 | \tau)$. | |
| 165 | ||
| 166 |   \item $\astofterm{\ttfct{Var} ((x, i), \tau)} = constrain (\Variable
 | |
| 167 |     \mtt{"?}xi\mtt", \tau)$, where $\mtt?xi$ is the string representation of
 | |
| 168 |     the {\tt indexname} $(x, i)$.
 | |
| 169 | ||
| 170 | \item For the abstraction $\lambda x::\tau.t$, let $x'$ be a variant | |
| 171 | of~$x$ renamed to differ from all names occurring in~$t$, and let $t'$ | |
| 172 | be obtained from~$t$ by replacing all bound occurrences of~$x$ by | |
| 173 | the free variable $x'$. This replaces corresponding occurrences of the | |
| 174 |     constructor \ttindex{Bound} by the term $\ttfct{Free} (x',
 | |
| 175 |     \mtt{dummyT})$:
 | |
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 176 |    \[ \astofterm{\ttfct{Abs} (x, \tau, t)} =
 | 
| 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 177 |       \Appl{\Constant \mtt{"_abs"},
 | 
| 8136 | 178 |         constrain(\Variable x', \tau), \astofterm{t'}}
 | 
| 323 | 179 | \] | 
| 180 | ||
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 181 |   \item $\astofterm{\ttfct{Bound} i} = \Variable \mtt{"B.}i\mtt"$.
 | 
| 323 | 182 |     The occurrence of constructor \ttindex{Bound} should never happen
 | 
| 183 | when printing well-typed terms; it indicates a de Bruijn index with no | |
| 184 | matching abstraction. | |
| 185 | ||
| 186 | \item Where $f$ is not an application, | |
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 187 |     \[ \astofterm{f \ttapp x@1 \ttapp \ldots \ttapp x@n} =
 | 
| 323 | 188 |        \Appl{\astofterm{f}, \astofterm{x@1}, \ldots,\astofterm{x@n}}
 | 
| 189 | \] | |
| 190 | \end{itemize}
 | |
| 191 | % | |
| 332 | 192 | Type constraints\index{type constraints} are inserted to allow the printing
 | 
| 193 | of types.  This is governed by the boolean variable \ttindex{show_types}:
 | |
| 323 | 194 | \begin{itemize}
 | 
| 195 |   \item $constrain(x, \tau) = x$ \ if $\tau = \mtt{dummyT}$ \index{*dummyT} or
 | |
| 332 | 196 |     \ttindex{show_types} is set to {\tt false}.
 | 
| 323 | 197 | |
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 198 |   \item $constrain(x, \tau) = \Appl{\Constant \mtt{"_constrain"}, x,
 | 
| 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 199 |          \astofterm{\tau}}$ \ otherwise.
 | 
| 323 | 200 | |
| 201 |     Here, $\astofterm{\tau}$ is the \AST{} encoding of $\tau$: type
 | |
| 202 |     constructors go to {\tt Constant}s; type identifiers go to {\tt
 | |
| 203 |       Variable}s; type applications go to {\tt Appl}s with the type
 | |
| 204 |     constructor as the first element.  If \ttindex{show_sorts} is set to
 | |
| 205 |     {\tt true}, some type variables are decorated with an \AST{} encoding
 | |
| 206 | of their sort. | |
| 207 | \end{itemize}
 | |
| 208 | % | |
| 209 | The \AST{}, after application of macros (see \S\ref{sec:macros}), is
 | |
| 210 | transformed into the final output string.  The built-in {\bf print AST
 | |
| 332 | 211 |   translations}\indexbold{translations!print AST} reverse the
 | 
| 323 | 212 | parse \AST{} translations of Fig.\ts\ref{fig:parse_ast_tr}.
 | 
| 213 | ||
| 214 | For the actual printing process, the names attached to productions | |
| 215 | of the form $\ldots A^{(p@1)}@1 \ldots A^{(p@n)}@n \ldots \mtt{=>} c$ play
 | |
| 216 | a vital role.  Each \AST{} with constant head $c$, namely $\mtt"c\mtt"$ or
 | |
| 217 | $(\mtt"c\mtt"~ x@1 \ldots x@n)$, is printed according to the production | |
| 218 | for~$c$. Each argument~$x@i$ is converted to a string, and put in | |
| 219 | parentheses if its priority~$(p@i)$ requires this. The resulting strings | |
| 220 | and their syntactic sugar (denoted by \dots{} above) are joined to make a
 | |
| 221 | single string. | |
| 222 | ||
| 3108 | 223 | If an application $(\mtt"c\mtt"~ x@1 \ldots x@m)$ has more arguments | 
| 224 | than the corresponding production, it is first split into | |
| 225 | $((\mtt"c\mtt"~ x@1 \ldots x@n) ~ x@{n+1} \ldots x@m)$.  Applications
 | |
| 226 | with too few arguments or with non-constant head or without a | |
| 227 | corresponding production are printed as $f(x@1, \ldots, x@l)$ or | |
| 228 | $(\alpha@1, \ldots, \alpha@l) ty$. Multiple productions associated | |
| 229 | with some name $c$ are tried in order of appearance. An occurrence of | |
| 323 | 230 | $\Variable x$ is simply printed as~$x$. | 
| 231 | ||
| 232 | Blanks are {\em not\/} inserted automatically.  If blanks are required to
 | |
| 233 | separate tokens, specify them in the mixfix declaration, possibly preceded | |
| 234 | by a slash~({\tt/}) to allow a line break.
 | |
| 235 | \index{ASTs|)}
 | |
| 236 | ||
| 5371 | 237 | %%% Local Variables: | 
| 238 | %%% mode: latex | |
| 239 | %%% TeX-master: "ref" | |
| 240 | %%% End: |