diff -r f51d4a302962 -r 5386df44a037 src/Doc/Ref/document/syntax.tex --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/src/Doc/Ref/document/syntax.tex Tue Aug 28 18:57:32 2012 +0200 @@ -0,0 +1,240 @@ + +\chapter{Syntax Transformations} \label{chap:syntax} +\newcommand\ttapp{\mathrel{\hbox{\tt\$}}} +\newcommand\mtt[1]{\mbox{\tt #1}} +\newcommand\ttfct[1]{\mathop{\mtt{#1}}\nolimits} +\newcommand\Constant{\ttfct{Constant}} +\newcommand\Variable{\ttfct{Variable}} +\newcommand\Appl[1]{\ttfct{Appl}\,[#1]} +\index{syntax!transformations|(} + + +\section{Transforming parse trees to ASTs}\label{sec:astofpt} +\index{ASTs!made from parse trees} +\newcommand\astofpt[1]{\lbrakk#1\rbrakk} + +The parse tree is the raw output of the parser. Translation functions, +called {\bf parse AST translations}\indexbold{translations!parse AST}, +transform the parse tree into an abstract syntax tree. + +The parse tree is constructed by nesting the right-hand sides of the +productions used to recognize the input. Such parse trees are simply lists +of tokens and constituent parse trees, the latter representing the +nonterminals of the productions. Let us refer to the actual productions in +the form displayed by {\tt print_syntax} (see \S\ref{sec:inspct-thy} for an +example). + +Ignoring parse \AST{} translations, parse trees are transformed to \AST{}s +by stripping out delimiters and copy productions. More precisely, the +mapping $\astofpt{-}$ is derived from the productions as follows: +\begin{itemize} +\item Name tokens: $\astofpt{t} = \Variable s$, where $t$ is an \ndx{id}, + \ndx{var}, \ndx{tid}, \ndx{tvar}, \ndx{num}, \ndx{xnum} or \ndx{xstr} token, + and $s$ its associated string. Note that for {\tt xstr} this does not + include the quotes. + +\item Copy productions:\index{productions!copy} + $\astofpt{\ldots P \ldots} = \astofpt{P}$. Here $\ldots$ stands for + strings of delimiters, which are discarded. $P$ stands for the single + constituent that is not a delimiter; it is either a nonterminal symbol or + a name token. + + \item 0-ary productions: $\astofpt{\ldots \mtt{=>} c} = \Constant c$. + Here there are no constituents other than delimiters, which are + discarded. + + \item $n$-ary productions, where $n \ge 1$: delimiters are discarded and + the remaining constituents $P@1$, \ldots, $P@n$ are built into an + application whose head constant is~$c$: + \[ \astofpt{\ldots P@1 \ldots P@n \ldots \mtt{=>} c} = + \Appl{\Constant c, \astofpt{P@1}, \ldots, \astofpt{P@n}} + \] +\end{itemize} +Figure~\ref{fig:parse_ast} presents some simple examples, where {\tt ==}, +{\tt _appl}, {\tt _args}, and so forth name productions of the Pure syntax. +These examples illustrate the need for further translations to make \AST{}s +closer to the typed $\lambda$-calculus. The Pure syntax provides +predefined parse \AST{} translations\index{translations!parse AST} for +ordinary applications, type applications, nested abstractions, meta +implications and function types. Figure~\ref{fig:parse_ast_tr} shows their +effect on some representative input strings. + + +\begin{figure} +\begin{center} +\tt\begin{tabular}{ll} +\rm input string & \rm \AST \\\hline +"f" & f \\ +"'a" & 'a \\ +"t == u" & ("==" t u) \\ +"f(x)" & ("_appl" f x) \\ +"f(x, y)" & ("_appl" f ("_args" x y)) \\ +"f(x, y, z)" & ("_appl" f ("_args" x ("_args" y z))) \\ +"\%x y.\ t" & ("_lambda" ("_idts" x y) t) \\ +\end{tabular} +\end{center} +\caption{Parsing examples using the Pure syntax}\label{fig:parse_ast} +\end{figure} + +\begin{figure} +\begin{center} +\tt\begin{tabular}{ll} +\rm input string & \rm \AST{} \\\hline +"f(x, y, z)" & (f x y z) \\ +"'a ty" & (ty 'a) \\ +"('a, 'b) ty" & (ty 'a 'b) \\ +"\%x y z.\ t" & ("_abs" x ("_abs" y ("_abs" z t))) \\ +"\%x ::\ 'a.\ t" & ("_abs" ("_constrain" x 'a) t) \\ +"[| P; Q; R |] => S" & ("==>" P ("==>" Q ("==>" R S))) \\ +"['a, 'b, 'c] => 'd" & ("fun" 'a ("fun" 'b ("fun" 'c 'd))) +\end{tabular} +\end{center} +\caption{Built-in parse \AST{} translations}\label{fig:parse_ast_tr} +\end{figure} + +The names of constant heads in the \AST{} control the translation process. +The list of constants invoking parse \AST{} translations appears in the +output of {\tt print_syntax} under {\tt parse_ast_translation}. + + +\section{Transforming ASTs to terms}\label{sec:termofast} +\index{terms!made from ASTs} +\newcommand\termofast[1]{\lbrakk#1\rbrakk} + +The \AST{}, after application of macros (see \S\ref{sec:macros}), is +transformed into a term. This term is probably ill-typed since type +inference has not occurred yet. The term may contain type constraints +consisting of applications with head {\tt "_constrain"}; the second +argument is a type encoded as a term. Type inference later introduces +correct types or rejects the input. + +Another set of translation functions, namely parse +translations\index{translations!parse}, may affect this process. If we +ignore parse translations for the time being, then \AST{}s are transformed +to terms by mapping \AST{} constants to constants, \AST{} variables to +schematic or free variables and \AST{} applications to applications. + +More precisely, the mapping $\termofast{-}$ is defined by +\begin{itemize} +\item Constants: $\termofast{\Constant x} = \ttfct{Const} (x, + \mtt{dummyT})$. + +\item Schematic variables: $\termofast{\Variable \mtt{"?}xi\mtt"} = + \ttfct{Var} ((x, i), \mtt{dummyT})$, where $x$ is the base name and $i$ + the index extracted from~$xi$. + +\item Free variables: $\termofast{\Variable x} = \ttfct{Free} (x, + \mtt{dummyT})$. + +\item Function applications with $n$ arguments: + \[ \termofast{\Appl{f, x@1, \ldots, x@n}} = + \termofast{f} \ttapp + \termofast{x@1} \ttapp \ldots \ttapp \termofast{x@n} + \] +\end{itemize} +Here \ttindex{Const}, \ttindex{Var}, \ttindex{Free} and +\verb|$|\index{$@{\tt\$}} are constructors of the datatype \mltydx{term}, +while \ttindex{dummyT} stands for some dummy type that is ignored during +type inference. + +So far the outcome is still a first-order term. Abstractions and bound +variables (constructors \ttindex{Abs} and \ttindex{Bound}) are introduced +by parse translations. Such translations are attached to {\tt "_abs"}, +{\tt "!!"} and user-defined binders. + + +\section{Printing of terms} +\newcommand\astofterm[1]{\lbrakk#1\rbrakk}\index{ASTs!made from terms} + +The output phase is essentially the inverse of the input phase. Terms are +translated via abstract syntax trees into strings. Finally the strings are +pretty printed. + +Print translations (\S\ref{sec:tr_funs}) may affect the transformation of +terms into \AST{}s. Ignoring those, the transformation maps +term constants, variables and applications to the corresponding constructs +on \AST{}s. Abstractions are mapped to applications of the special +constant {\tt _abs}. + +More precisely, the mapping $\astofterm{-}$ is defined as follows: +\begin{itemize} + \item $\astofterm{\ttfct{Const} (x, \tau)} = \Constant x$. + + \item $\astofterm{\ttfct{Free} (x, \tau)} = constrain (\Variable x, + \tau)$. + + \item $\astofterm{\ttfct{Var} ((x, i), \tau)} = constrain (\Variable + \mtt{"?}xi\mtt", \tau)$, where $\mtt?xi$ is the string representation of + the {\tt indexname} $(x, i)$. + + \item For the abstraction $\lambda x::\tau.t$, let $x'$ be a variant + of~$x$ renamed to differ from all names occurring in~$t$, and let $t'$ + be obtained from~$t$ by replacing all bound occurrences of~$x$ by + the free variable $x'$. This replaces corresponding occurrences of the + constructor \ttindex{Bound} by the term $\ttfct{Free} (x', + \mtt{dummyT})$: + \[ \astofterm{\ttfct{Abs} (x, \tau, t)} = + \Appl{\Constant \mtt{"_abs"}, + constrain(\Variable x', \tau), \astofterm{t'}} + \] + + \item $\astofterm{\ttfct{Bound} i} = \Variable \mtt{"B.}i\mtt"$. + The occurrence of constructor \ttindex{Bound} should never happen + when printing well-typed terms; it indicates a de Bruijn index with no + matching abstraction. + + \item Where $f$ is not an application, + \[ \astofterm{f \ttapp x@1 \ttapp \ldots \ttapp x@n} = + \Appl{\astofterm{f}, \astofterm{x@1}, \ldots,\astofterm{x@n}} + \] +\end{itemize} +% +Type constraints\index{type constraints} are inserted to allow the printing +of types. This is governed by the boolean variable \ttindex{show_types}: +\begin{itemize} + \item $constrain(x, \tau) = x$ \ if $\tau = \mtt{dummyT}$ \index{*dummyT} or + \ttindex{show_types} is set to {\tt false}. + + \item $constrain(x, \tau) = \Appl{\Constant \mtt{"_constrain"}, x, + \astofterm{\tau}}$ \ otherwise. + + Here, $\astofterm{\tau}$ is the \AST{} encoding of $\tau$: type + constructors go to {\tt Constant}s; type identifiers go to {\tt + Variable}s; type applications go to {\tt Appl}s with the type + constructor as the first element. If \ttindex{show_sorts} is set to + {\tt true}, some type variables are decorated with an \AST{} encoding + of their sort. +\end{itemize} +% +The \AST{}, after application of macros (see \S\ref{sec:macros}), is +transformed into the final output string. The built-in {\bf print AST + translations}\indexbold{translations!print AST} reverse the +parse \AST{} translations of Fig.\ts\ref{fig:parse_ast_tr}. + +For the actual printing process, the names attached to productions +of the form $\ldots A^{(p@1)}@1 \ldots A^{(p@n)}@n \ldots \mtt{=>} c$ play +a vital role. Each \AST{} with constant head $c$, namely $\mtt"c\mtt"$ or +$(\mtt"c\mtt"~ x@1 \ldots x@n)$, is printed according to the production +for~$c$. Each argument~$x@i$ is converted to a string, and put in +parentheses if its priority~$(p@i)$ requires this. The resulting strings +and their syntactic sugar (denoted by \dots{} above) are joined to make a +single string. + +If an application $(\mtt"c\mtt"~ x@1 \ldots x@m)$ has more arguments +than the corresponding production, it is first split into +$((\mtt"c\mtt"~ x@1 \ldots x@n) ~ x@{n+1} \ldots x@m)$. Applications +with too few arguments or with non-constant head or without a +corresponding production are printed as $f(x@1, \ldots, x@l)$ or +$(\alpha@1, \ldots, \alpha@l) ty$. Multiple productions associated +with some name $c$ are tried in order of appearance. An occurrence of +$\Variable x$ is simply printed as~$x$. + +Blanks are {\em not\/} inserted automatically. If blanks are required to +separate tokens, specify them in the mixfix declaration, possibly preceded +by a slash~({\tt/}) to allow a line break. +\index{ASTs|)} + +%%% Local Variables: +%%% mode: latex +%%% TeX-master: "ref" +%%% End: