doc-src/Ref/syntax.tex
author berghofe
Mon, 23 Jul 2012 18:52:10 +0200
changeset 48453 2421ff8c57a5
parent 48118 8537313dd261
permissions -rw-r--r--
set_vcs now derives prefix from fully qualified procedure / function name


\chapter{Syntax Transformations} \label{chap:syntax}
\newcommand\ttapp{\mathrel{\hbox{\tt\$}}}
\newcommand\mtt[1]{\mbox{\tt #1}}
\newcommand\ttfct[1]{\mathop{\mtt{#1}}\nolimits}
\newcommand\Constant{\ttfct{Constant}}
\newcommand\Variable{\ttfct{Variable}}
\newcommand\Appl[1]{\ttfct{Appl}\,[#1]}
\index{syntax!transformations|(}


\section{Transforming parse trees to ASTs}\label{sec:astofpt}
\index{ASTs!made from parse trees}
\newcommand\astofpt[1]{\lbrakk#1\rbrakk}

The parse tree is the raw output of the parser.  Translation functions,
called {\bf parse AST translations}\indexbold{translations!parse AST},
transform the parse tree into an abstract syntax tree.

The parse tree is constructed by nesting the right-hand sides of the
productions used to recognize the input.  Such parse trees are simply lists
of tokens and constituent parse trees, the latter representing the
nonterminals of the productions.  Let us refer to the actual productions in
the form displayed by {\tt print_syntax} (see \S\ref{sec:inspct-thy} for an
example).

Ignoring parse \AST{} translations, parse trees are transformed to \AST{}s
by stripping out delimiters and copy productions.  More precisely, the
mapping $\astofpt{-}$ is derived from the productions as follows:
\begin{itemize}
\item Name tokens: $\astofpt{t} = \Variable s$, where $t$ is an \ndx{id},
  \ndx{var}, \ndx{tid}, \ndx{tvar}, \ndx{num}, \ndx{xnum} or \ndx{xstr} token,
  and $s$ its associated string.  Note that for {\tt xstr} this does not
  include the quotes.

\item Copy productions:\index{productions!copy}
  $\astofpt{\ldots P \ldots} = \astofpt{P}$.  Here $\ldots$ stands for
  strings of delimiters, which are discarded.  $P$ stands for the single
  constituent that is not a delimiter; it is either a nonterminal symbol or
  a name token.

  \item 0-ary productions: $\astofpt{\ldots \mtt{=>} c} = \Constant c$.
    Here there are no constituents other than delimiters, which are
    discarded.

  \item $n$-ary productions, where $n \ge 1$: delimiters are discarded and
    the remaining constituents $P@1$, \ldots, $P@n$ are built into an
    application whose head constant is~$c$:
    \[ \astofpt{\ldots P@1 \ldots P@n \ldots \mtt{=>} c} =
       \Appl{\Constant c, \astofpt{P@1}, \ldots, \astofpt{P@n}}
    \]
\end{itemize}
Figure~\ref{fig:parse_ast} presents some simple examples, where {\tt ==},
{\tt _appl}, {\tt _args}, and so forth name productions of the Pure syntax.
These examples illustrate the need for further translations to make \AST{}s
closer to the typed $\lambda$-calculus.  The Pure syntax provides
predefined parse \AST{} translations\index{translations!parse AST} for
ordinary applications, type applications, nested abstractions, meta
implications and function types.  Figure~\ref{fig:parse_ast_tr} shows their
effect on some representative input strings.


\begin{figure}
\begin{center}
\tt\begin{tabular}{ll}
\rm input string    & \rm \AST \\\hline
"f"                 & f \\
"'a"                & 'a \\
"t == u"            & ("==" t u) \\
"f(x)"              & ("_appl" f x) \\
"f(x, y)"           & ("_appl" f ("_args" x y)) \\
"f(x, y, z)"        & ("_appl" f ("_args" x ("_args" y z))) \\
"\%x y.\ t"         & ("_lambda" ("_idts" x y) t) \\
\end{tabular}
\end{center}
\caption{Parsing examples using the Pure syntax}\label{fig:parse_ast}
\end{figure}

\begin{figure}
\begin{center}
\tt\begin{tabular}{ll}
\rm input string            & \rm \AST{} \\\hline
"f(x, y, z)"                & (f x y z) \\
"'a ty"                     & (ty 'a) \\
"('a, 'b) ty"               & (ty 'a 'b) \\
"\%x y z.\ t"               & ("_abs" x ("_abs" y ("_abs" z t))) \\
"\%x ::\ 'a.\ t"            & ("_abs" ("_constrain" x 'a) t) \\
"[| P; Q; R |] => S"        & ("==>" P ("==>" Q ("==>" R S))) \\
"['a, 'b, 'c] => 'd"        & ("fun" 'a ("fun" 'b ("fun" 'c 'd)))
\end{tabular}
\end{center}
\caption{Built-in parse \AST{} translations}\label{fig:parse_ast_tr}
\end{figure}

The names of constant heads in the \AST{} control the translation process.
The list of constants invoking parse \AST{} translations appears in the
output of {\tt print_syntax} under {\tt parse_ast_translation}.


\section{Transforming ASTs to terms}\label{sec:termofast}
\index{terms!made from ASTs}
\newcommand\termofast[1]{\lbrakk#1\rbrakk}

The \AST{}, after application of macros (see \S\ref{sec:macros}), is
transformed into a term.  This term is probably ill-typed since type
inference has not occurred yet.  The term may contain type constraints
consisting of applications with head {\tt "_constrain"}; the second
argument is a type encoded as a term.  Type inference later introduces
correct types or rejects the input.

Another set of translation functions, namely parse
translations\index{translations!parse}, may affect this process.  If we
ignore parse translations for the time being, then \AST{}s are transformed
to terms by mapping \AST{} constants to constants, \AST{} variables to
schematic or free variables and \AST{} applications to applications.

More precisely, the mapping $\termofast{-}$ is defined by
\begin{itemize}
\item Constants: $\termofast{\Constant x} = \ttfct{Const} (x,
  \mtt{dummyT})$.

\item Schematic variables: $\termofast{\Variable \mtt{"?}xi\mtt"} =
  \ttfct{Var} ((x, i), \mtt{dummyT})$, where $x$ is the base name and $i$
  the index extracted from~$xi$.

\item Free variables: $\termofast{\Variable x} = \ttfct{Free} (x,
  \mtt{dummyT})$.

\item Function applications with $n$ arguments:
    \[ \termofast{\Appl{f, x@1, \ldots, x@n}} =
       \termofast{f} \ttapp
         \termofast{x@1} \ttapp \ldots \ttapp \termofast{x@n}
    \]
\end{itemize}
Here \ttindex{Const}, \ttindex{Var}, \ttindex{Free} and
\verb|$|\index{$@{\tt\$}} are constructors of the datatype \mltydx{term},
while \ttindex{dummyT} stands for some dummy type that is ignored during
type inference.

So far the outcome is still a first-order term.  Abstractions and bound
variables (constructors \ttindex{Abs} and \ttindex{Bound}) are introduced
by parse translations.  Such translations are attached to {\tt "_abs"},
{\tt "!!"} and user-defined binders.


\section{Printing of terms}
\newcommand\astofterm[1]{\lbrakk#1\rbrakk}\index{ASTs!made from terms}

The output phase is essentially the inverse of the input phase.  Terms are
translated via abstract syntax trees into strings.  Finally the strings are
pretty printed.

Print translations (\S\ref{sec:tr_funs}) may affect the transformation of
terms into \AST{}s.  Ignoring those, the transformation maps
term constants, variables and applications to the corresponding constructs
on \AST{}s.  Abstractions are mapped to applications of the special
constant {\tt _abs}.

More precisely, the mapping $\astofterm{-}$ is defined as follows:
\begin{itemize}
  \item $\astofterm{\ttfct{Const} (x, \tau)} = \Constant x$.

  \item $\astofterm{\ttfct{Free} (x, \tau)} = constrain (\Variable x,
    \tau)$.

  \item $\astofterm{\ttfct{Var} ((x, i), \tau)} = constrain (\Variable
    \mtt{"?}xi\mtt", \tau)$, where $\mtt?xi$ is the string representation of
    the {\tt indexname} $(x, i)$.

  \item For the abstraction $\lambda x::\tau.t$, let $x'$ be a variant
    of~$x$ renamed to differ from all names occurring in~$t$, and let $t'$
    be obtained from~$t$ by replacing all bound occurrences of~$x$ by
    the free variable $x'$.  This replaces corresponding occurrences of the
    constructor \ttindex{Bound} by the term $\ttfct{Free} (x',
    \mtt{dummyT})$:
   \[ \astofterm{\ttfct{Abs} (x, \tau, t)} =
      \Appl{\Constant \mtt{"_abs"},
        constrain(\Variable x', \tau), \astofterm{t'}}
    \]

  \item $\astofterm{\ttfct{Bound} i} = \Variable \mtt{"B.}i\mtt"$.
    The occurrence of constructor \ttindex{Bound} should never happen
    when printing well-typed terms; it indicates a de Bruijn index with no
    matching abstraction.

  \item Where $f$ is not an application,
    \[ \astofterm{f \ttapp x@1 \ttapp \ldots \ttapp x@n} =
       \Appl{\astofterm{f}, \astofterm{x@1}, \ldots,\astofterm{x@n}}
    \]
\end{itemize}
%
Type constraints\index{type constraints} are inserted to allow the printing
of types.  This is governed by the boolean variable \ttindex{show_types}:
\begin{itemize}
  \item $constrain(x, \tau) = x$ \ if $\tau = \mtt{dummyT}$ \index{*dummyT} or
    \ttindex{show_types} is set to {\tt false}.

  \item $constrain(x, \tau) = \Appl{\Constant \mtt{"_constrain"}, x,
         \astofterm{\tau}}$ \ otherwise.

    Here, $\astofterm{\tau}$ is the \AST{} encoding of $\tau$: type
    constructors go to {\tt Constant}s; type identifiers go to {\tt
      Variable}s; type applications go to {\tt Appl}s with the type
    constructor as the first element.  If \ttindex{show_sorts} is set to
    {\tt true}, some type variables are decorated with an \AST{} encoding
    of their sort.
\end{itemize}
%
The \AST{}, after application of macros (see \S\ref{sec:macros}), is
transformed into the final output string.  The built-in {\bf print AST
  translations}\indexbold{translations!print AST} reverse the
parse \AST{} translations of Fig.\ts\ref{fig:parse_ast_tr}.

For the actual printing process, the names attached to productions
of the form $\ldots A^{(p@1)}@1 \ldots A^{(p@n)}@n \ldots \mtt{=>} c$ play
a vital role.  Each \AST{} with constant head $c$, namely $\mtt"c\mtt"$ or
$(\mtt"c\mtt"~ x@1 \ldots x@n)$, is printed according to the production
for~$c$.  Each argument~$x@i$ is converted to a string, and put in
parentheses if its priority~$(p@i)$ requires this.  The resulting strings
and their syntactic sugar (denoted by \dots{} above) are joined to make a
single string.

If an application $(\mtt"c\mtt"~ x@1 \ldots x@m)$ has more arguments
than the corresponding production, it is first split into
$((\mtt"c\mtt"~ x@1 \ldots x@n) ~ x@{n+1} \ldots x@m)$.  Applications
with too few arguments or with non-constant head or without a
corresponding production are printed as $f(x@1, \ldots, x@l)$ or
$(\alpha@1, \ldots, \alpha@l) ty$.  Multiple productions associated
with some name $c$ are tried in order of appearance.  An occurrence of
$\Variable x$ is simply printed as~$x$.

Blanks are {\em not\/} inserted automatically.  If blanks are required to
separate tokens, specify them in the mixfix declaration, possibly preceded
by a slash~({\tt/}) to allow a line break.
\index{ASTs|)}

%%% Local Variables: 
%%% mode: latex
%%% TeX-master: "ref"
%%% End: