doc-src/Ref/document/syntax.tex
changeset 48939 83bd9eb1c70c
parent 48118 8537313dd261
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/doc-src/Ref/document/syntax.tex	Mon Aug 27 17:24:01 2012 +0200
@@ -0,0 +1,240 @@
+
+\chapter{Syntax Transformations} \label{chap:syntax}
+\newcommand\ttapp{\mathrel{\hbox{\tt\$}}}
+\newcommand\mtt[1]{\mbox{\tt #1}}
+\newcommand\ttfct[1]{\mathop{\mtt{#1}}\nolimits}
+\newcommand\Constant{\ttfct{Constant}}
+\newcommand\Variable{\ttfct{Variable}}
+\newcommand\Appl[1]{\ttfct{Appl}\,[#1]}
+\index{syntax!transformations|(}
+
+
+\section{Transforming parse trees to ASTs}\label{sec:astofpt}
+\index{ASTs!made from parse trees}
+\newcommand\astofpt[1]{\lbrakk#1\rbrakk}
+
+The parse tree is the raw output of the parser.  Translation functions,
+called {\bf parse AST translations}\indexbold{translations!parse AST},
+transform the parse tree into an abstract syntax tree.
+
+The parse tree is constructed by nesting the right-hand sides of the
+productions used to recognize the input.  Such parse trees are simply lists
+of tokens and constituent parse trees, the latter representing the
+nonterminals of the productions.  Let us refer to the actual productions in
+the form displayed by {\tt print_syntax} (see \S\ref{sec:inspct-thy} for an
+example).
+
+Ignoring parse \AST{} translations, parse trees are transformed to \AST{}s
+by stripping out delimiters and copy productions.  More precisely, the
+mapping $\astofpt{-}$ is derived from the productions as follows:
+\begin{itemize}
+\item Name tokens: $\astofpt{t} = \Variable s$, where $t$ is an \ndx{id},
+  \ndx{var}, \ndx{tid}, \ndx{tvar}, \ndx{num}, \ndx{xnum} or \ndx{xstr} token,
+  and $s$ its associated string.  Note that for {\tt xstr} this does not
+  include the quotes.
+
+\item Copy productions:\index{productions!copy}
+  $\astofpt{\ldots P \ldots} = \astofpt{P}$.  Here $\ldots$ stands for
+  strings of delimiters, which are discarded.  $P$ stands for the single
+  constituent that is not a delimiter; it is either a nonterminal symbol or
+  a name token.
+
+  \item 0-ary productions: $\astofpt{\ldots \mtt{=>} c} = \Constant c$.
+    Here there are no constituents other than delimiters, which are
+    discarded.
+
+  \item $n$-ary productions, where $n \ge 1$: delimiters are discarded and
+    the remaining constituents $P@1$, \ldots, $P@n$ are built into an
+    application whose head constant is~$c$:
+    \[ \astofpt{\ldots P@1 \ldots P@n \ldots \mtt{=>} c} =
+       \Appl{\Constant c, \astofpt{P@1}, \ldots, \astofpt{P@n}}
+    \]
+\end{itemize}
+Figure~\ref{fig:parse_ast} presents some simple examples, where {\tt ==},
+{\tt _appl}, {\tt _args}, and so forth name productions of the Pure syntax.
+These examples illustrate the need for further translations to make \AST{}s
+closer to the typed $\lambda$-calculus.  The Pure syntax provides
+predefined parse \AST{} translations\index{translations!parse AST} for
+ordinary applications, type applications, nested abstractions, meta
+implications and function types.  Figure~\ref{fig:parse_ast_tr} shows their
+effect on some representative input strings.
+
+
+\begin{figure}
+\begin{center}
+\tt\begin{tabular}{ll}
+\rm input string    & \rm \AST \\\hline
+"f"                 & f \\
+"'a"                & 'a \\
+"t == u"            & ("==" t u) \\
+"f(x)"              & ("_appl" f x) \\
+"f(x, y)"           & ("_appl" f ("_args" x y)) \\
+"f(x, y, z)"        & ("_appl" f ("_args" x ("_args" y z))) \\
+"\%x y.\ t"         & ("_lambda" ("_idts" x y) t) \\
+\end{tabular}
+\end{center}
+\caption{Parsing examples using the Pure syntax}\label{fig:parse_ast}
+\end{figure}
+
+\begin{figure}
+\begin{center}
+\tt\begin{tabular}{ll}
+\rm input string            & \rm \AST{} \\\hline
+"f(x, y, z)"                & (f x y z) \\
+"'a ty"                     & (ty 'a) \\
+"('a, 'b) ty"               & (ty 'a 'b) \\
+"\%x y z.\ t"               & ("_abs" x ("_abs" y ("_abs" z t))) \\
+"\%x ::\ 'a.\ t"            & ("_abs" ("_constrain" x 'a) t) \\
+"[| P; Q; R |] => S"        & ("==>" P ("==>" Q ("==>" R S))) \\
+"['a, 'b, 'c] => 'd"        & ("fun" 'a ("fun" 'b ("fun" 'c 'd)))
+\end{tabular}
+\end{center}
+\caption{Built-in parse \AST{} translations}\label{fig:parse_ast_tr}
+\end{figure}
+
+The names of constant heads in the \AST{} control the translation process.
+The list of constants invoking parse \AST{} translations appears in the
+output of {\tt print_syntax} under {\tt parse_ast_translation}.
+
+
+\section{Transforming ASTs to terms}\label{sec:termofast}
+\index{terms!made from ASTs}
+\newcommand\termofast[1]{\lbrakk#1\rbrakk}
+
+The \AST{}, after application of macros (see \S\ref{sec:macros}), is
+transformed into a term.  This term is probably ill-typed since type
+inference has not occurred yet.  The term may contain type constraints
+consisting of applications with head {\tt "_constrain"}; the second
+argument is a type encoded as a term.  Type inference later introduces
+correct types or rejects the input.
+
+Another set of translation functions, namely parse
+translations\index{translations!parse}, may affect this process.  If we
+ignore parse translations for the time being, then \AST{}s are transformed
+to terms by mapping \AST{} constants to constants, \AST{} variables to
+schematic or free variables and \AST{} applications to applications.
+
+More precisely, the mapping $\termofast{-}$ is defined by
+\begin{itemize}
+\item Constants: $\termofast{\Constant x} = \ttfct{Const} (x,
+  \mtt{dummyT})$.
+
+\item Schematic variables: $\termofast{\Variable \mtt{"?}xi\mtt"} =
+  \ttfct{Var} ((x, i), \mtt{dummyT})$, where $x$ is the base name and $i$
+  the index extracted from~$xi$.
+
+\item Free variables: $\termofast{\Variable x} = \ttfct{Free} (x,
+  \mtt{dummyT})$.
+
+\item Function applications with $n$ arguments:
+    \[ \termofast{\Appl{f, x@1, \ldots, x@n}} =
+       \termofast{f} \ttapp
+         \termofast{x@1} \ttapp \ldots \ttapp \termofast{x@n}
+    \]
+\end{itemize}
+Here \ttindex{Const}, \ttindex{Var}, \ttindex{Free} and
+\verb|$|\index{$@{\tt\$}} are constructors of the datatype \mltydx{term},
+while \ttindex{dummyT} stands for some dummy type that is ignored during
+type inference.
+
+So far the outcome is still a first-order term.  Abstractions and bound
+variables (constructors \ttindex{Abs} and \ttindex{Bound}) are introduced
+by parse translations.  Such translations are attached to {\tt "_abs"},
+{\tt "!!"} and user-defined binders.
+
+
+\section{Printing of terms}
+\newcommand\astofterm[1]{\lbrakk#1\rbrakk}\index{ASTs!made from terms}
+
+The output phase is essentially the inverse of the input phase.  Terms are
+translated via abstract syntax trees into strings.  Finally the strings are
+pretty printed.
+
+Print translations (\S\ref{sec:tr_funs}) may affect the transformation of
+terms into \AST{}s.  Ignoring those, the transformation maps
+term constants, variables and applications to the corresponding constructs
+on \AST{}s.  Abstractions are mapped to applications of the special
+constant {\tt _abs}.
+
+More precisely, the mapping $\astofterm{-}$ is defined as follows:
+\begin{itemize}
+  \item $\astofterm{\ttfct{Const} (x, \tau)} = \Constant x$.
+
+  \item $\astofterm{\ttfct{Free} (x, \tau)} = constrain (\Variable x,
+    \tau)$.
+
+  \item $\astofterm{\ttfct{Var} ((x, i), \tau)} = constrain (\Variable
+    \mtt{"?}xi\mtt", \tau)$, where $\mtt?xi$ is the string representation of
+    the {\tt indexname} $(x, i)$.
+
+  \item For the abstraction $\lambda x::\tau.t$, let $x'$ be a variant
+    of~$x$ renamed to differ from all names occurring in~$t$, and let $t'$
+    be obtained from~$t$ by replacing all bound occurrences of~$x$ by
+    the free variable $x'$.  This replaces corresponding occurrences of the
+    constructor \ttindex{Bound} by the term $\ttfct{Free} (x',
+    \mtt{dummyT})$:
+   \[ \astofterm{\ttfct{Abs} (x, \tau, t)} =
+      \Appl{\Constant \mtt{"_abs"},
+        constrain(\Variable x', \tau), \astofterm{t'}}
+    \]
+
+  \item $\astofterm{\ttfct{Bound} i} = \Variable \mtt{"B.}i\mtt"$.
+    The occurrence of constructor \ttindex{Bound} should never happen
+    when printing well-typed terms; it indicates a de Bruijn index with no
+    matching abstraction.
+
+  \item Where $f$ is not an application,
+    \[ \astofterm{f \ttapp x@1 \ttapp \ldots \ttapp x@n} =
+       \Appl{\astofterm{f}, \astofterm{x@1}, \ldots,\astofterm{x@n}}
+    \]
+\end{itemize}
+%
+Type constraints\index{type constraints} are inserted to allow the printing
+of types.  This is governed by the boolean variable \ttindex{show_types}:
+\begin{itemize}
+  \item $constrain(x, \tau) = x$ \ if $\tau = \mtt{dummyT}$ \index{*dummyT} or
+    \ttindex{show_types} is set to {\tt false}.
+
+  \item $constrain(x, \tau) = \Appl{\Constant \mtt{"_constrain"}, x,
+         \astofterm{\tau}}$ \ otherwise.
+
+    Here, $\astofterm{\tau}$ is the \AST{} encoding of $\tau$: type
+    constructors go to {\tt Constant}s; type identifiers go to {\tt
+      Variable}s; type applications go to {\tt Appl}s with the type
+    constructor as the first element.  If \ttindex{show_sorts} is set to
+    {\tt true}, some type variables are decorated with an \AST{} encoding
+    of their sort.
+\end{itemize}
+%
+The \AST{}, after application of macros (see \S\ref{sec:macros}), is
+transformed into the final output string.  The built-in {\bf print AST
+  translations}\indexbold{translations!print AST} reverse the
+parse \AST{} translations of Fig.\ts\ref{fig:parse_ast_tr}.
+
+For the actual printing process, the names attached to productions
+of the form $\ldots A^{(p@1)}@1 \ldots A^{(p@n)}@n \ldots \mtt{=>} c$ play
+a vital role.  Each \AST{} with constant head $c$, namely $\mtt"c\mtt"$ or
+$(\mtt"c\mtt"~ x@1 \ldots x@n)$, is printed according to the production
+for~$c$.  Each argument~$x@i$ is converted to a string, and put in
+parentheses if its priority~$(p@i)$ requires this.  The resulting strings
+and their syntactic sugar (denoted by \dots{} above) are joined to make a
+single string.
+
+If an application $(\mtt"c\mtt"~ x@1 \ldots x@m)$ has more arguments
+than the corresponding production, it is first split into
+$((\mtt"c\mtt"~ x@1 \ldots x@n) ~ x@{n+1} \ldots x@m)$.  Applications
+with too few arguments or with non-constant head or without a
+corresponding production are printed as $f(x@1, \ldots, x@l)$ or
+$(\alpha@1, \ldots, \alpha@l) ty$.  Multiple productions associated
+with some name $c$ are tried in order of appearance.  An occurrence of
+$\Variable x$ is simply printed as~$x$.
+
+Blanks are {\em not\/} inserted automatically.  If blanks are required to
+separate tokens, specify them in the mixfix declaration, possibly preceded
+by a slash~({\tt/}) to allow a line break.
+\index{ASTs|)}
+
+%%% Local Variables: 
+%%% mode: latex
+%%% TeX-master: "ref"
+%%% End: