doc-src/Ref/syntax.tex
changeset 323 361a71713176
child 332 01b87a921967
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/doc-src/Ref/syntax.tex	Fri Apr 15 17:16:23 1994 +0200
@@ -0,0 +1,837 @@
+%% $Id$
+\chapter{Syntax Transformations} \label{chap:syntax}
+\newcommand\ttapp{\mathrel{\hbox{\tt\$}}}
+\newcommand\mtt[1]{\mbox{\tt #1}}
+\newcommand\ttfct[1]{\mathop{\mtt{#1}}\nolimits}
+\newcommand\Constant{\ttfct{Constant}}
+\newcommand\Variable{\ttfct{Variable}}
+\newcommand\Appl[1]{\ttfct{Appl}\,[#1]}
+\index{syntax!transformations|(}
+
+This chapter is intended for experienced Isabelle users who need to define
+macros or code their own translation functions.  It describes the
+transformations between parse trees, abstract syntax trees and terms.
+
+
+\section{Abstract syntax trees} \label{sec:asts}
+\index{ASTs|(} 
+
+The parser, given a token list from the lexer, applies productions to yield
+a parse tree\index{parse trees}.  By applying some internal transformations
+the parse tree becomes an abstract syntax tree, or \AST{}.  Macro
+expansion, further translations and finally type inference yields a
+well-typed term.  The printing process is the reverse, except for some
+subtleties to be discussed later.
+
+Figure~\ref{fig:parse_print} outlines the parsing and printing process.
+Much of the complexity is due to the macro mechanism.  Using macros, you
+can specify most forms of concrete syntax without writing any \ML{} code.
+
+\begin{figure}
+\begin{center}
+\begin{tabular}{cl}
+string          & \\
+$\downarrow$    & parser \\
+parse tree      & \\
+$\downarrow$    & parse \AST{} translation \\
+\AST{}             & \\
+$\downarrow$    & \AST{} rewriting (macros) \\
+\AST{}             & \\
+$\downarrow$    & parse translation, type inference \\
+--- well-typed term --- & \\
+$\downarrow$    & print translation \\
+\AST{}             & \\
+$\downarrow$    & \AST{} rewriting (macros) \\
+\AST{}             & \\
+$\downarrow$    & print \AST{} translation, printer \\
+string          &
+\end{tabular}
+
+\end{center}
+\caption{Parsing and printing}\label{fig:parse_print}
+\end{figure}
+
+Abstract syntax trees are an intermediate form between the raw parse trees
+and the typed $\lambda$-terms.  An \AST{} is either an atom (constant or
+variable) or a list of {\em at least two\/} subtrees.  Internally, they
+have type \mltydx{Syntax.ast}: \index{*Constant} \index{*Variable}
+\index{*Appl}
+\begin{ttbox}
+datatype ast = Constant of string
+             | Variable of string
+             | Appl of ast list
+\end{ttbox}
+%
+Isabelle uses an S-expression syntax for abstract syntax trees.  Constant
+atoms are shown as quoted strings, variable atoms as non-quoted strings and
+applications as a parenthesized list of subtrees.  For example, the \AST
+\begin{ttbox}
+Appl [Constant "_constrain",
+      Appl [Constant "_abs", Variable "x", Variable "t"],
+      Appl [Constant "fun", Variable "'a", Variable "'b"]]
+\end{ttbox}
+is shown as {\tt ("_constrain" ("_abs" x t) ("fun" 'a 'b))}.
+Both {\tt ()} and {\tt (f)} are illegal because they have too few
+subtrees. 
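+
+For instance, the restriction to at least two subtrees is easily
+expressed as a predicate on the datatype above.  The following \ML{}
+fragment is only an illustrative sketch, not part of Isabelle:
+\begin{ttbox}
+fun wf_ast (Constant _) = true
+  | wf_ast (Variable _) = true
+  | wf_ast (Appl asts)  =
+      let fun all [] = true
+            | all (t :: ts) = wf_ast t andalso all ts
+      in length asts >= 2 andalso all asts end;
+\end{ttbox}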
+
+The resemblance to Lisp's S-expressions is intentional, but there are two
+kinds of atomic symbols: $\Constant x$ and $\Variable x$.  Do not take the
+names {\tt Constant} and {\tt Variable} too literally; in the later
+translation to terms, $\Variable x$ may become a constant, free or bound
+variable, even a type constructor or class name; the actual outcome depends
+on the context.
+
+Similarly, you can think of ${\tt (} f~x@1~\ldots~x@n{\tt )}$ as the
+application of~$f$ to the arguments $x@1, \ldots, x@n$.  But the kind of
+application is determined later by context; it could be a type constructor
+applied to types.
+
+Forms like {\tt (("_abs" x $t$) $u$)} are legal, but \AST{}s are
+first-order: the {\tt "_abs"} does not bind the {\tt x} in any way.  Later
+at the term level, {\tt ("_abs" x $t$)} will become an {\tt Abs} node and
+occurrences of {\tt x} in $t$ will be replaced by bound variables (the term
+constructor \ttindex{Bound}).
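+For example, assuming {\tt f} has not been declared as a constant, the
+input \verb|%x. f(x)| is translated to the term
+\begin{ttbox}
+Abs ("x", dummyT, Free ("f", dummyT) $ Bound 0)
+\end{ttbox}
+where {\tt Bound 0} refers to the enclosing {\tt Abs} node; the types
+are still {\tt dummyT} because type inference has not yet taken place.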
+
+
+\section{Transforming parse trees to \AST{}s}\label{sec:astofpt}
+\index{ASTs!made from parse trees}
+\newcommand\astofpt[1]{\lbrakk#1\rbrakk}
+
+The parse tree is the raw output of the parser.  Translation functions,
+called {\bf parse AST translations}\indexbold{translations!parse AST},
+transform the parse tree into an abstract syntax tree.
+
+The parse tree is constructed by nesting the right-hand sides of the
+productions used to recognize the input.  Such parse trees are simply lists
+of tokens and constituent parse trees, the latter representing the
+nonterminals of the productions.  Let us refer to the actual productions in
+the form displayed by {\tt Syntax.print_syntax}.
+
+Ignoring parse \AST{} translations, parse trees are transformed to \AST{}s
+by stripping out delimiters and copy productions.  More precisely, the
+mapping $\astofpt{-}$ is derived from the productions as follows:
+\begin{itemize}
+\item Name tokens: $\astofpt{t} = \Variable s$, where $t$ is an \ndx{id},
+  \ndx{var}, \ndx{tid} or \ndx{tvar} token, and $s$ its associated string.
+
+\item Copy productions:\index{productions!copy}
+  $\astofpt{\ldots P \ldots} = \astofpt{P}$.  Here $\ldots$ stands for
+  strings of delimiters, which are discarded.  $P$ stands for the single
+  constituent that is not a delimiter; it is either a nonterminal symbol or
+  a name token.
+
+  \item 0-ary productions: $\astofpt{\ldots \mtt{=>} c} = \Constant c$.
+    Here there are no constituents other than delimiters, which are
+    discarded. 
+
+  \item $n$-ary productions, where $n \ge 1$: delimiters are discarded and
+    the remaining constituents $P@1$, \ldots, $P@n$ are built into an
+    application whose head constant is~$c$:
+    \[ \astofpt{\ldots P@1 \ldots P@n \ldots \mtt{=>} c} = 
+       \Appl{\Constant c, \astofpt{P@1}, \ldots, \astofpt{P@n}}
+    \]
+\end{itemize}
+Figure~\ref{fig:parse_ast} presents some simple examples, where {\tt ==},
+{\tt _appl}, {\tt _args}, and so forth name productions of the Pure syntax.
+These examples illustrate the need for further translations to make \AST{}s
+closer to the typed $\lambda$-calculus.  The Pure syntax provides
+predefined parse \AST{} translations\index{translations!parse AST} for
+ordinary applications, type applications, nested abstractions, meta
+implications and function types.  Figure~\ref{fig:parse_ast_tr} shows their
+effect on some representative input strings.
+
+
+\begin{figure}
+\begin{center}
+\tt\begin{tabular}{ll}
+\rm input string    & \rm \AST \\\hline
+"f"                 & f \\
+"'a"                & 'a \\
+"t == u"            & ("==" t u) \\
+"f(x)"              & ("_appl" f x) \\
+"f(x, y)"           & ("_appl" f ("_args" x y)) \\
+"f(x, y, z)"        & ("_appl" f ("_args" x ("_args" y z))) \\
+"\%x y.\ t"         & ("_lambda" ("_idts" x y) t) \\
+\end{tabular}
+\end{center}
+\caption{Parsing examples using the Pure syntax}\label{fig:parse_ast} 
+\end{figure}
+
+\begin{figure}
+\begin{center}
+\tt\begin{tabular}{ll}
+\rm input string            & \rm \AST{} \\\hline
+"f(x, y, z)"                & (f x y z) \\
+"'a ty"                     & (ty 'a) \\
+"('a, 'b) ty"               & (ty 'a 'b) \\
+"\%x y z.\ t"               & ("_abs" x ("_abs" y ("_abs" z t))) \\
+"\%x ::\ 'a.\ t"            & ("_abs" ("_constrain" x 'a) t) \\
+"[| P; Q; R |] => S"        & ("==>" P ("==>" Q ("==>" R S))) \\
+"['a, 'b, 'c] => 'd"        & ("fun" 'a ("fun" 'b ("fun" 'c 'd)))
+\end{tabular}
+\end{center}
+\caption{Built-in parse \AST{} translations}\label{fig:parse_ast_tr}
+\end{figure}
+
+The names of constant heads in the \AST{} control the translation process.
+The list of constants invoking parse \AST{} translations appears in the
+output of {\tt Syntax.print_syntax} under {\tt parse_ast_translation}.
+
+
+\section{Transforming \AST{}s to terms}\label{sec:termofast}
+\index{terms!made from ASTs}
+\newcommand\termofast[1]{\lbrakk#1\rbrakk}
+
+The \AST{}, after application of macros (see \S\ref{sec:macros}), is
+transformed into a term.  This term is probably ill-typed since type
+inference has not occurred yet.  The term may contain type constraints
+consisting of applications with head {\tt "_constrain"}; the second
+argument is a type encoded as a term.  Type inference later introduces
+correct types or rejects the input.
+
+Another set of translation functions, namely parse
+translations\index{translations!parse}, may affect this process.  If we
+ignore parse translations for the time being, then \AST{}s are transformed
+to terms by mapping \AST{} constants to constants, \AST{} variables to
+schematic or free variables and \AST{} applications to applications.
+
+More precisely, the mapping $\termofast{-}$ is defined by
+\begin{itemize}
+\item Constants: $\termofast{\Constant x} = \ttfct{Const} (x,
+  \mtt{dummyT})$.
+
+\item Schematic variables: $\termofast{\Variable \mtt{"?}xi\mtt"} =
+  \ttfct{Var} ((x, i), \mtt{dummyT})$, where $x$ is the base name and $i$
+  the index extracted from~$xi$.
+
+\item Free variables: $\termofast{\Variable x} = \ttfct{Free} (x,
+  \mtt{dummyT})$.
+
+\item Function applications with $n$ arguments:
+    \[ \termofast{\Appl{f, x@1, \ldots, x@n}} = 
+       \termofast{f} \ttapp
+         \termofast{x@1} \ttapp \ldots \ttapp \termofast{x@n}
+    \]
+\end{itemize}
+Here \ttindex{Const}, \ttindex{Var}, \ttindex{Free} and
+\verb|$|\index{$@{\tt\$}} are constructors of the datatype \mltydx{term},
+while \ttindex{dummyT} stands for some dummy type that is ignored during
+type inference.
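+
+Ignoring parse translations and the special treatment of schematic
+variables, the mapping can be pictured by the following \ML{} sketch;
+it is not the actual implementation:
+\begin{ttbox}
+fun term_of_ast (Constant c) = Const (c, dummyT)
+  | term_of_ast (Variable x) = Free (x, dummyT)  (*schematic variables would yield Vars*)
+  | term_of_ast (Appl (f :: args)) =
+      let fun apply (t, [])      = t
+            | apply (t, u :: us) = apply (t $ term_of_ast u, us)
+      in apply (term_of_ast f, args) end
+  | term_of_ast (Appl []) = raise Match;         (*impossible for well-formed asts*)
+\end{ttbox}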
+
+So far the outcome is still a first-order term.  Abstractions and bound
+variables (constructors \ttindex{Abs} and \ttindex{Bound}) are introduced
+by parse translations.  Such translations are attached to {\tt "_abs"},
+{\tt "!!"} and user-defined binders.
+
+
+\section{Printing of terms}
+\newcommand\astofterm[1]{\lbrakk#1\rbrakk}\index{ASTs!made from terms}
+
+The output phase is essentially the inverse of the input phase.  Terms are
+translated via abstract syntax trees into strings.  Finally the strings are
+pretty printed.
+
+Print translations (\S\ref{sec:tr_funs}) may affect the transformation of
+terms into \AST{}s.  Ignoring those, the transformation maps
+term constants, variables and applications to the corresponding constructs
+on \AST{}s.  Abstractions are mapped to applications of the special
+constant {\tt _abs}.
+
+More precisely, the mapping $\astofterm{-}$ is defined as follows:
+\begin{itemize}
+  \item $\astofterm{\ttfct{Const} (x, \tau)} = \Constant x$.
+
+  \item $\astofterm{\ttfct{Free} (x, \tau)} = constrain (\Variable x,
+    \tau)$.
+
+  \item $\astofterm{\ttfct{Var} ((x, i), \tau)} = constrain (\Variable
+    \mtt{"?}xi\mtt", \tau)$, where $\mtt?xi$ is the string representation of
+    the {\tt indexname} $(x, i)$.
+
+  \item For the abstraction $\lambda x::\tau.t$, let $x'$ be a variant
+    of~$x$ renamed to differ from all names occurring in~$t$, and let $t'$
+    be obtained from~$t$ by replacing all bound occurrences of~$x$ by
+    the free variable $x'$.  This replaces corresponding occurrences of the
+    constructor \ttindex{Bound} by the term $\ttfct{Free} (x',
+    \mtt{dummyT})$:
+   \[ \astofterm{\ttfct{Abs} (x, \tau, t)} = 
+      \Appl{\Constant \mtt{"_abs"}, 
+        constrain(\Variable x', \tau), \astofterm{t'}}.
+    \]
+
+  \item $\astofterm{\ttfct{Bound} i} = \Variable \mtt{"B.}i\mtt"$.  
+    The occurrence of constructor \ttindex{Bound} should never happen
+    when printing well-typed terms; it indicates a de Bruijn index with no
+    matching abstraction.
+
+  \item Where $f$ is not an application,
+    \[ \astofterm{f \ttapp x@1 \ttapp \ldots \ttapp x@n} = 
+       \Appl{\astofterm{f}, \astofterm{x@1}, \ldots,\astofterm{x@n}}
+    \]
+\end{itemize}
+%
+Type constraints are inserted to allow the printing of types.  This is
+governed by the boolean variable \ttindex{show_types}:
+\begin{itemize}
+  \item $constrain(x, \tau) = x$ \ if $\tau = \mtt{dummyT}$ \index{*dummyT} or
+    \ttindex{show_types} is not set to {\tt true}.
+
+  \item $constrain(x, \tau) = \Appl{\Constant \mtt{"_constrain"}, x, 
+         \astofterm{\tau}}$ \ otherwise.  
+
+    Here, $\astofterm{\tau}$ is the \AST{} encoding of $\tau$: type
+    constructors go to {\tt Constant}s; type identifiers go to {\tt
+      Variable}s; type applications go to {\tt Appl}s with the type
+    constructor as the first element.  If \ttindex{show_sorts} is set to
+    {\tt true}, some type variables are decorated with an \AST{} encoding
+    of their sort.
+\end{itemize}
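+%
+The $constrain$ operation just described can be sketched in \ML{} as
+follows, assuming that the flag is passed in as a boolean and that
+{\tt ast_of_typ} encodes a {\tt typ} as an \AST{} in the manner
+explained above (neither assumption reflects the actual
+implementation):
+\begin{ttbox}
+(*show_types passed as a boolean; ast_of_typ assumed to encode types as asts*)
+fun constrain show_types ast_of_typ (x, T) =
+      if not show_types orelse T = dummyT then x
+      else Appl [Constant "_constrain", x, ast_of_typ T];
+\end{ttbox}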
+%
+The \AST{}, after application of macros (see \S\ref{sec:macros}), is
+transformed into the final output string.  The built-in {\bf print AST
+  translations}\indexbold{translations!print AST} effectively reverse the
+parse \AST{} translations of Fig.\ts\ref{fig:parse_ast_tr}.
+
+For the actual printing process, the names attached to productions
+of the form $\ldots A^{(p@1)}@1 \ldots A^{(p@n)}@n \ldots \mtt{=>} c$ play
+a vital role.  Each \AST{} with constant head $c$, namely $\mtt"c\mtt"$ or
+$(\mtt"c\mtt"~ x@1 \ldots x@n)$, is printed according to the production
+for~$c$.  Each argument~$x@i$ is converted to a string, and put in
+parentheses if its priority~$(p@i)$ requires this.  The resulting strings
+and their syntactic sugar (denoted by \dots{} above) are joined to make a
+single string.
+
+If an application $(\mtt"c\mtt"~ x@1 \ldots x@m)$ has more arguments than the
+corresponding production, it is first split into $((\mtt"c\mtt"~ x@1 \ldots
+x@n) ~ x@{n+1} \ldots x@m)$.  Applications with too few arguments or with
+non-constant head or without a corresponding production are printed as
+$f(x@1, \ldots, x@l)$ or $(\alpha@1, \ldots, \alpha@l) ty$.  An occurrence of
+$\Variable x$ is simply printed as~$x$.
+
+Blanks are {\em not\/} inserted automatically.  If blanks are required to
+separate tokens, specify them in the mixfix declaration, possibly preceded
+by a slash~({\tt/}) to allow a line break.
+\index{ASTs|)}
+
+
+
+\section{Macros: Syntactic rewriting} \label{sec:macros}
+\index{macros|(}\index{rewriting!syntactic|(} 
+
+Mixfix declarations alone can handle situations where there is a direct
+connection between the concrete syntax and the underlying term.  Sometimes
+we require a more elaborate concrete syntax, such as quantifiers and list
+notation.  Isabelle's {\bf macros} and {\bf translation functions} can
+perform translations such as
+\begin{center}\tt
+  \begin{tabular}{r@{$\quad\protect\rightleftharpoons\quad$}l}
+    ALL x:A.P   & Ball(A, \%x.P)        \\ \relax
+    [x, y, z]   & Cons(x, Cons(y, Cons(z, Nil)))
+  \end{tabular}
+\end{center}
+Translation functions (see \S\ref{sec:tr_funs}) must be coded in ML; they
+are the most powerful translation mechanism but are difficult to read or
+write.  Macros are specified by first-order rewriting systems that operate
+on abstract syntax trees.  They are usually easy to read and write, and can
+express all but the most obscure translations.
+
+Figure~\ref{fig:set_trans} defines a fragment of first-order logic and set
+theory.\footnote{This and the following theories are complete working
+  examples, though they specify only syntax, no axioms.  The file {\tt
+    ZF/zf.thy} presents the full set theory definition, including many
+  macro rules.}  Theory {\tt SET} defines constants for set comprehension
+({\tt Collect}), replacement ({\tt Replace}) and bounded universal
+quantification ({\tt Ball}).  Each of these binds some variables.  Without
+additional syntax we should have to express $\forall x \in A.  P$ as {\tt
+  Ball(A,\%x.P)}, and similarly for the others.
+
+\begin{figure}
+\begin{ttbox}
+SET = Pure +
+types
+  i, o
+arities
+  i, o :: logic
+consts
+  Trueprop      :: "o => prop"              ("_" 5)
+  Collect       :: "[i, i => o] => i"
+  "{\at}Collect"    :: "[idt, i, o] => i"       ("(1{\ttlbrace}_:_./ _{\ttrbrace})")
+  Replace       :: "[i, [i, i] => o] => i"
+  "{\at}Replace"    :: "[idt, idt, i, o] => i"  ("(1{\ttlbrace}_./ _:_, _{\ttrbrace})")
+  Ball          :: "[i, i => o] => o"
+  "{\at}Ball"       :: "[idt, i, o] => o"       ("(3ALL _:_./ _)" 10)
+translations
+  "{\ttlbrace}x:A. P{\ttrbrace}"    == "Collect(A, \%x. P)"
+  "{\ttlbrace}y. x:A, Q{\ttrbrace}" == "Replace(A, \%x y. Q)"
+  "ALL x:A. P"  == "Ball(A, \%x. P)"
+end
+\end{ttbox}
+\caption{Macro example: set theory}\label{fig:set_trans}
+\end{figure}
+
+The theory specifies a variable-binding syntax through additional
+productions that have mixfix declarations.  Each non-copy production must
+specify some constant, which is used for building \AST{}s.
+\index{constants!syntactic} The additional constants are decorated with
+{\tt\at} to stress their purely syntactic purpose; they should never occur
+within the final well-typed terms.  Furthermore, they cannot be written in
+formulae because they are not legal identifiers.
+
+The translations cause the replacement of external forms by internal forms
+after parsing, and vice versa before printing of terms.  As a specification
+of the set theory notation, they should be largely self-explanatory.  The
+syntactic constants, {\tt\at Collect}, {\tt\at Replace} and {\tt\at Ball},
+appear implicitly in the macro rules via their mixfix forms.
+
+Macros can define variable-binding syntax because they operate on \AST{}s,
+which have no inbuilt notion of bound variable.  The macro variables {\tt
+  x} and~{\tt y} have type~{\tt idt} and therefore range over identifiers,
+in this case bound variables.  The macro variables {\tt P} and~{\tt Q}
+range over formulae containing bound variable occurrences.
+
+Other applications of the macro system can be less straightforward, and
+there are peculiarities.  The rest of this section will describe in detail
+how Isabelle macros are preprocessed and applied.
+
+
+\subsection{Specifying macros}
+Macros are basically rewrite rules on \AST{}s.  But unlike the macro
+systems found in programming languages, Isabelle's macros work in both
+directions.  Therefore a syntax contains two lists of rewrites: one for
+parsing and one for printing.
+
+\index{*translations section}
+The {\tt translations} section specifies macros.  The syntax for a macro is
+\[ (root)\; string \quad
+   \left\{\begin{array}[c]{c} \mtt{=>} \\ \mtt{<=} \\ \mtt{==} \end{array}
+   \right\} \quad
+   (root)\; string 
+\]
+%
+This specifies a parse rule ({\tt =>}), a print rule ({\tt <=}), or both
+({\tt ==}).  The two strings specify the left and right-hand sides of the
+macro rule.  The $(root)$ specification is optional; it specifies the
+nonterminal for parsing the $string$ and if omitted defaults to {\tt
+  logic}.  \AST{} rewrite rules $(l, r)$ must obey certain conditions:
+\begin{itemize}
+\item Rules must be left linear: $l$ must not contain repeated variables.
+
+\item Rules must have constant heads, namely $l = \mtt"c\mtt"$ or $l =
+  (\mtt"c\mtt" ~ x@1 \ldots x@n)$.
+
+\item Every variable in~$r$ must also occur in~$l$.
+\end{itemize}
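+These conditions are easily checked; the following \ML{} fragment is
+only an illustrative sketch of such a check, not the code Isabelle
+uses:
+\begin{ttbox}
+fun vars_of (Constant _) = []
+  | vars_of (Variable x) = [x]
+  | vars_of (Appl asts)  = List.concat (map vars_of asts);
+
+fun member (_, [])      = false
+  | member (x, y :: ys) = x = y orelse member (x, ys);
+
+fun distinct []        = true
+  | distinct (x :: xs) = not (member (x, xs)) andalso distinct xs;
+
+fun constant_head (Constant _)             = true
+  | constant_head (Appl (Constant _ :: _)) = true
+  | constant_head _                        = false;
+
+fun ok_rule (l, r) =
+      distinct (vars_of l) andalso constant_head l andalso
+      List.all (fn x => member (x, vars_of l)) (vars_of r);
+\end{ttbox}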
+
+Macro rules may refer to any syntax from the parent theories.  They may
+also refer to anything defined before the {\tt .thy} file's {\tt
+  translations} section --- including any mixfix declarations.
+
+Upon declaration, both sides of the macro rule undergo parsing and parse
+\AST{} translations (see \S\ref{sec:asts}), but do not themselves undergo
+macro expansion.  The lexer runs in a different mode that additionally
+accepts identifiers of the form $\_~letter~quasiletter^*$ (like {\tt _idt},
+{\tt _K}).  Thus, a constant whose name starts with an underscore can
+appear in macro rules but not in ordinary terms.
+
+Some atoms of the macro rule's \AST{} are designated as constants for
+matching.  These are all names that have been declared as classes, types or
+constants.
+
+The result of this preprocessing is two lists of macro rules, each stored
+as a pair of \AST{}s.  They can be viewed using {\tt Syntax.print_syntax}
+(sections \ttindex{parse_rules} and \ttindex{print_rules}).  For
+theory~{\tt SET} of Fig.~\ref{fig:set_trans} these are
+\begin{ttbox}
+parse_rules:
+  ("{\at}Collect" x A P)  ->  ("Collect" A ("_abs" x P))
+  ("{\at}Replace" y x A Q)  ->  ("Replace" A ("_abs" x ("_abs" y Q)))
+  ("{\at}Ball" x A P)  ->  ("Ball" A ("_abs" x P))
+print_rules:
+  ("Collect" A ("_abs" x P))  ->  ("{\at}Collect" x A P)
+  ("Replace" A ("_abs" x ("_abs" y Q)))  ->  ("{\at}Replace" y x A Q)
+  ("Ball" A ("_abs" x P))  ->  ("{\at}Ball" x A P)
+\end{ttbox}
+
+\begin{warn}
+  Avoid choosing variable names that have previously been used as
+  constants, types or type classes; the {\tt consts} section in the output
+  of {\tt Syntax.print_syntax} lists all such names.  If a macro rule works
+  incorrectly, inspect its internal form as shown above, recalling that
+  constants appear as quoted strings and variables without quotes.
+\end{warn}
+
+\begin{warn}
+If \ttindex{eta_contract} is set to {\tt true}, terms will be
+$\eta$-contracted {\em before\/} the \AST{} rewriter sees them.  Thus some
+abstraction nodes needed for print rules to match may vanish.  For example,
+\verb|Ball(A, %x. P(x))| contracts to {\tt Ball(A, P)}; the print rule does
+not apply and the output will be {\tt Ball(A, P)}.  This problem would not
+occur if \ML{} translation functions were used instead of macros (as is
+done for binder declarations).
+\end{warn}
+
+
+\begin{warn}
+Another trap concerns type constraints.  If \ttindex{show_types} is set to
+{\tt true}, bound variables will be decorated by their meta types at the
+binding place (but not at occurrences in the body).  Matching with
+\verb|Collect(A, %x. P)| binds {\tt x} to something like {\tt ("_constrain" y
+"i")} rather than only {\tt y}.  \AST{} rewriting will cause the constraint to
+appear in the external form, say \verb|{y::i:A::i. P::o}|.  
+
+To allow such constraints to be re-read, your syntax should specify bound
+variables using the nonterminal~\ndx{idt}.  This is the case in our
+example.  Choosing {\tt id} instead of {\tt idt} is a common error,
+especially since it appeared in former versions of most of Isabelle's
+object-logics.
+\end{warn}
+
+
+
+\subsection{Applying rules}
+As a term is being parsed or printed, an \AST{} is generated as an
+intermediate form (recall Fig.\ts\ref{fig:parse_print}).  The \AST{} is
+normalized by applying macro rules in the manner of a traditional term
+rewriting system.  We first examine how a single rule is applied.
+
+Let $t$ be the abstract syntax tree to be normalized and $(l, r)$ some
+translation rule.  A subtree~$u$ of $t$ is a {\bf redex} if it is an
+instance of~$l$; in this case $l$ is said to {\bf match}~$u$.  A redex
+matched by $l$ may be replaced by the corresponding instance of~$r$, thus
+{\bf rewriting} the \AST~$t$.  Matching requires some notion of {\bf
+  place-holders} that may occur in rule patterns but not in ordinary
+\AST{}s; {\tt Variable} atoms serve this purpose.
+
+The matching of the object~$u$ by the pattern~$l$ is performed as follows:
+\begin{itemize}
+  \item Every constant matches itself.
+
+  \item $\Variable x$ in the object matches $\Constant x$ in the pattern.
+    This point is discussed further below.
+
+  \item Every \AST{} in the object matches $\Variable x$ in the pattern,
+    binding~$x$ to~$u$.
+
+  \item One application matches another if they have the same number of
+    subtrees and corresponding subtrees match.
+
+  \item In every other case, matching fails.  In particular, {\tt
+      Constant}~$x$ can only match itself.
+\end{itemize}
+A successful match yields a substitution that is applied to~$r$, generating
+the instance that replaces~$u$.
+
+The second case above may look odd.  This is where {\tt Variable}s of
+non-rule \AST{}s behave like {\tt Constant}s.  Recall that \AST{}s are not
+far removed from parse trees; at this level it is not yet known which
+identifiers will become constants, bounds, frees, types or classes.  As
+\S\ref{sec:asts} describes, former parse tree heads appear in \AST{}s as
+{\tt Constant}s, while the name tokens \ndx{id}, \ndx{var}, \ndx{tid} and
+\ndx{tvar} become {\tt Variable}s.  On the other hand, when \AST{}s
+are generated from terms for printing, all constants and type constructors
+become {\tt Constant}s; see \S\ref{sec:asts}.  Thus \AST{}s may contain a
+messy mixture of {\tt Variable}s and {\tt Constant}s.  This is
+insignificant at macro level because matching treats them alike.
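+
+The matching rules listed above may be summarised by the following
+\ML{} sketch; it is not the actual implementation, which is part of
+the syntax module.  Thanks to left linearity, no consistency check for
+repeated pattern variables is needed:
+\begin{ttbox}
+exception FAIL;
+
+fun mtch (u, Variable x, env) = (x, u) :: env
+  | mtch (Constant a, Constant b, env) =
+      if a = b then env else raise FAIL
+  | mtch (Variable a, Constant b, env) =
+      if a = b then env else raise FAIL
+  | mtch (Appl us, Appl ps, env) = mtch_lst (us, ps, env)
+  | mtch _ = raise FAIL
+and mtch_lst ([], [], env) = env
+  | mtch_lst (u :: us, p :: ps, env) = mtch_lst (us, ps, mtch (u, p, env))
+  | mtch_lst _ = raise FAIL;
+\end{ttbox}
+Here the first argument is the object, the second the pattern and the
+third the environment of bindings accumulated so far; the call {\tt
+mtch (u, l, [])} either returns the substitution described above or
+raises {\tt FAIL}.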
+
+Because of this behaviour, different kinds of atoms with the same name are
+indistinguishable, which may make some rules prone to misbehaviour.  Example:
+\begin{ttbox}
+types
+  Nil
+consts
+  Nil     :: "'a list"
+  "[]"    :: "'a list"    ("[]")
+translations
+  "[]"    == "Nil"
+\end{ttbox}
+The term {\tt Nil} will be printed as {\tt []}, just as expected.
+The term \verb|%Nil.t| will be printed as \verb|%[].t|, which might not be
+expected! 
+
+Normalizing an \AST{} involves repeatedly applying macro rules until none
+is applicable.  Macro rules are chosen in the order that they appear in the
+{\tt translations} section.  You can watch the normalization of \AST{}s
+during parsing and printing by setting \ttindex{Syntax.trace_norm_ast} to
+{\tt true}.\index{tracing!of macros} Alternatively, use
+\ttindex{Syntax.test_read}.  The information displayed when tracing
+includes the \AST{} before normalization ({\tt pre}), redexes with results
+({\tt rewrote}), the normal form finally reached ({\tt post}) and some
+statistics ({\tt normalize}).  If tracing is off,
+\ttindex{Syntax.stat_norm_ast} can be set to {\tt true} in order to enable
+printing of the normal form and statistics only.
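+Assuming these flags are ordinary \ML{} references, as elsewhere in
+Isabelle, tracing is switched on from the \ML{} top level by
+\begin{ttbox}
+Syntax.trace_norm_ast := true;
+\end{ttbox}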
+
+
+\subsection{Example: the syntax of finite sets}
+\index{examples!of macros}
+
+This example demonstrates the use of recursive macros to implement a
+convenient notation for finite sets.
+\index{*empty constant}\index{*insert constant}\index{{}@\verb'{}' symbol}
+\index{"@Enum@{\tt\at Enum} constant}
+\index{"@Finset@{\tt\at Finset} constant}
+\begin{ttbox}
+FINSET = SET +
+types
+  is
+consts
+  ""            :: "i => is"                ("_")
+  "{\at}Enum"       :: "[i, is] => is"          ("_,/ _")
+  empty         :: "i"                      ("{\ttlbrace}{\ttrbrace}")
+  insert        :: "[i, i] => i"
+  "{\at}Finset"     :: "is => i"                ("{\ttlbrace}(_){\ttrbrace}")
+translations
+  "{\ttlbrace}x, xs{\ttrbrace}"     == "insert(x, {\ttlbrace}xs{\ttrbrace})"
+  "{\ttlbrace}x{\ttrbrace}"         == "insert(x, {\ttlbrace}{\ttrbrace})"
+end
+\end{ttbox}
+Finite sets are internally built up by {\tt empty} and {\tt insert}.  The
+declarations above specify \verb|{x, y, z}| as the external representation
+of
+\begin{ttbox}
+insert(x, insert(y, insert(z, empty)))
+\end{ttbox}
+The nonterminal symbol~\ndx{is} stands for one or more objects of type~{\tt
+  i} separated by commas.  The mixfix declaration \hbox{\verb|"_,/ _"|}
+allows a line break after the comma for \rmindex{pretty printing}; if no
+line break is required then a space is printed instead.
+
+The nonterminal is declared as the type~{\tt is}, but with no {\tt arities}
+declaration.  Hence {\tt is} is not a logical type and no default
+productions are added.  If we had needed enumerations of the nonterminal
+{\tt logic}, which would include all the logical types, we could have used
+the predefined nonterminal symbol \ndx{args} and skipped this part
+altogether.  The nonterminal~{\tt is} can later be reused for other
+enumerations of type~{\tt i} like lists or tuples.
+
+\index{"@Finset@{\tt\at Finset} constant}
+Next follows {\tt empty}, which is already equipped with its syntax
+\verb|{}|, and {\tt insert} without concrete syntax.  The syntactic
+constant {\tt\at Finset} provides concrete syntax for enumerations of~{\tt
+  i} enclosed in curly braces.  Remember that a pair of parentheses, as in
+\verb|"{(_)}"|, specifies a block of indentation for pretty printing.
+
+The translations may look strange at first.  Macro rules are best
+understood in their internal forms:
+\begin{ttbox}
+parse_rules:
+  ("{\at}Finset" ("{\at}Enum" x xs))  ->  ("insert" x ("{\at}Finset" xs))
+  ("{\at}Finset" x)  ->  ("insert" x "empty")
+print_rules:
+  ("insert" x ("{\at}Finset" xs))  ->  ("{\at}Finset" ("{\at}Enum" x xs))
+  ("insert" x "empty")  ->  ("{\at}Finset" x)
+\end{ttbox}
+This shows that \verb|{x, xs}| indeed matches any set enumeration of at least
+two elements, binding the first to {\tt x} and the rest to {\tt xs}.
+Likewise, \verb|{xs}| and \verb|{x}| represent any set enumeration.  
+The parse rules only work in the order given.
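+
+For example, parsing \verb|{x, y, z}| yields the \AST{}
+{\tt ("{\at}Finset" ("{\at}Enum" x ("{\at}Enum" y z)))}, which the
+parse rules normalize step by step:
+\begin{ttbox}
+("{\at}Finset" ("{\at}Enum" x ("{\at}Enum" y z)))
+   ->  ("insert" x ("{\at}Finset" ("{\at}Enum" y z)))
+   ->  ("insert" x ("insert" y ("{\at}Finset" z)))
+   ->  ("insert" x ("insert" y ("insert" z "empty")))
+\end{ttbox}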
+
+\begin{warn}
+  The \AST{} rewriter cannot discern constants from variables and looks
+  only for names of atoms.  Thus the names of {\tt Constant}s occurring in
+  the (internal) left-hand side of translation rules should be regarded as
+  \rmindex{reserved words}.  Choose non-identifiers like {\tt\at Finset} or
+  sufficiently long and strange names.  If a bound variable's name gets
+  rewritten, the result will be incorrect; for example, the term
+\begin{ttbox}
+\%empty insert. insert(x, empty)
+\end{ttbox}
+  gets printed as \verb|%empty insert. {x}|.
+\end{warn}
+
+
+\subsection{Example: a parse macro for dependent types}\label{prod_trans}
+\index{examples!of macros}
+
+As stated earlier, a macro rule may not introduce new {\tt Variable}s on
+the right-hand side.  Something like \verb|"K(B)" => "%x. B"| is illegal;
+if allowed, it could cause variable capture.  In such cases you usually
+must fall back on translation functions.  But a trick can make things
+readable in some cases: {\em calling translation functions by parse
+  macros}:
+\begin{ttbox}
+PROD = FINSET +
+consts
+  Pi            :: "[i, i => i] => i"
+  "{\at}PROD"       :: "[idt, i, i] => i"     ("(3PROD _:_./ _)" 10)
+  "{\at}->"         :: "[i, i] => i"          ("(_ ->/ _)" [51, 50] 50)
+\ttbreak
+translations
+  "PROD x:A. B" => "Pi(A, \%x. B)"
+  "A -> B"      => "Pi(A, _K(B))"
+end
+ML
+  val print_translation = [("Pi", dependent_tr' ("{\at}PROD", "{\at}->"))];
+\end{ttbox}
+
+Here {\tt Pi} is an internal constant for constructing general products.
+Two external forms exist: the general case {\tt PROD x:A.B} and the
+function space {\tt A -> B}, which abbreviates \verb|Pi(A, %x.B)| when {\tt B}
+does not depend on~{\tt x}.
+
+The second parse macro introduces {\tt _K(B)}, which later becomes \verb|%x.B|
+due to a parse translation associated with \cdx{_K}.  The order of the
+parse rules is critical.  Unfortunately there is no such trick for
+printing, so we have to add an {\tt ML} section for the print translation
+\ttindex{dependent_tr'}.
+
+Recall that identifiers with a leading {\tt _} are allowed in translation
+rules, but not in ordinary terms.  Thus we can create \AST{}s containing
+names that are not directly expressible.
+
+The parse translation for {\tt _K} is already installed in Pure, and {\tt
+dependent_tr'} is exported by the syntax module for public use.  See
+\S\ref{sec:tr_funs} below for more of the arcane lore of translation functions.
+\index{macros|)}\index{rewriting!syntactic|)}
+
+
+\section{Translation functions} \label{sec:tr_funs}
+\index{translations|(} 
+%
+This section describes the translation function mechanism.  By writing
+\ML{} functions, you can do almost everything with terms or \AST{}s during
+parsing and printing.  The logic \LK\ is a good example of sophisticated
+transformations between internal and external representations of
+associative sequences; here, macros would be useless.
+
+A full understanding of translations requires some familiarity
+with Isabelle's internals, especially the datatypes {\tt term}, {\tt typ},
+{\tt Syntax.ast} and the encodings of types and terms at the various
+stages of the parsing or printing process.  Most users should never need to
+use translation functions.
+
+\subsection{Declaring translation functions}
+There are four kinds of translation functions.  Each such function is
+associated with a name, which triggers calls to it.  Such names can be
+constants (logical or syntactic) or type constructors.
+
+{\tt Syntax.print_syntax} displays the sets of names associated with the
+translation functions of a {\tt Syntax.syntax} under
+\ttindex{parse_ast_translation}, \ttindex{parse_translation},
+\ttindex{print_translation} and \ttindex{print_ast_translation}.  You can
+add new ones via the {\tt ML} section\index{*ML section} of
+a {\tt .thy} file.  There may never be more than one function of the same
+kind per name.  Conceptually, the {\tt ML} section should appear between
+{\tt consts} and {\tt translations}; newly installed translation functions
+are already effective when macros and logical rules are parsed.
+
+The {\tt ML} section is copied verbatim into the \ML\ file generated from a
+{\tt .thy} file.  Definitions made here are accessible as components of an
+\ML\ structure; to make some definitions private, use an \ML{} {\tt local}
+declaration.  The {\tt ML} section may install translation functions by
+declaring any of the following identifiers:
+\begin{ttbox}
+val parse_ast_translation : (string * (ast list -> ast)) list
+val print_ast_translation : (string * (ast list -> ast)) list
+val parse_translation     : (string * (term list -> term)) list
+val print_translation     : (string * (term list -> term)) list
+\end{ttbox}
+
+\subsection{The translation strategy}
+All four kinds of translation functions are treated similarly.  They are
+called during the transformations between parse trees, \AST{}s and terms
+(recall Fig.\ts\ref{fig:parse_print}).  Whenever a combination of the form
+$(\mtt"c\mtt"~x@1 \ldots x@n)$ is encountered, and a translation function
+$f$ of appropriate kind exists for $c$, the result is computed by the \ML{}
+function call $f \mtt[ x@1, \ldots, x@n \mtt]$.
+
+For \AST{} translations, the arguments $x@1, \ldots, x@n$ are \AST{}s.  A
+combination has the form $\Constant c$ or $\Appl{\Constant c, x@1, \ldots,
+  x@n}$.  For term translations, the arguments are terms and a combination
+has the form $\ttfct{Const} (c, \tau)$ or $\ttfct{Const} (c, \tau) \ttapp
+x@1 \ttapp \ldots \ttapp x@n$.  Terms allow more sophisticated
+transformations than \AST{}s do, typically involving abstractions and bound
+variables.
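+
+For \AST{} translations, the invocation mechanism may be pictured by
+the following sketch, which looks up the head constant in an
+association list of translation functions; the actual implementation
+differs:
+\begin{ttbox}
+fun apply_ast_tr tab ast =
+      let fun lookup c = List.find (fn (name, _) => name = c) tab
+      in  case ast of
+            Constant c =>
+              (case lookup c of SOME (_, f) => f []   | NONE => ast)
+          | Appl (Constant c :: args) =>
+              (case lookup c of SOME (_, f) => f args | NONE => ast)
+          | _ => ast
+      end;
+\end{ttbox}
+Here {\tt tab} is a list of the kind declared as {\tt
+parse_ast_translation} in the previous subsection.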
+
+Regardless of whether they act on terms or \AST{}s,
+parse translations differ from print translations fundamentally:
+\begin{description}
+\item[Parse translations] are applied bottom-up.  The arguments are already
+  in translated form.  The translations must not fail; exceptions trigger
+  an error message.
+
+\item[Print translations] are applied top-down.  They are supplied with
+  arguments that are partly still in internal form.  The result again
+  undergoes translation; therefore a print translation should not introduce
+  as head the very constant that invoked it.  The function may raise
+  exception \xdx{Match} to indicate failure; in this event it has no
+  effect.
+\end{description}
+
+Only constant atoms --- constructor \ttindex{Constant} for \AST{}s and
+\ttindex{Const} for terms --- can invoke translation functions.  This
+causes another difference between parsing and printing.
+
+Parsing starts with a string and the constants are not yet identified.
+Only parse tree heads create {\tt Constant}s in the resulting \AST, as
+described in \S\ref{sec:astofpt}.  Macros and parse \AST{} translations may
+introduce further {\tt Constant}s.  When the final \AST{} is converted to a
+term, all {\tt Constant}s become {\tt Const}s, as described in
+\S\ref{sec:termofast}.
+
+Printing starts with a well-typed term and all the constants are known.  So
+all logical constants and type constructors may invoke print translations.
+These, and macros, may introduce further constants.
+
+
+\subsection{Example: a print translation for dependent types}
+\index{examples!of translations}\indexbold{*dependent_tr'}
+
+Let us continue the dependent type example (page~\pageref{prod_trans}) by
+examining the parse translation for~\cdx{_K} and the print translation
+{\tt dependent_tr'}, which are both built-in.  By convention, parse
+translations have names ending with {\tt _tr} and print translations have
+names ending with {\tt _tr'}.  Search for such names in the Isabelle
+sources to locate more examples.
+
+Here is the parse translation for {\tt _K}:
+\begin{ttbox}
+fun k_tr [t] = Abs ("x", dummyT, incr_boundvars 1 t)
+  | k_tr ts = raise TERM("k_tr",ts);
+\end{ttbox}
+If {\tt k_tr} is called with exactly one argument~$t$, it creates a new
+{\tt Abs} node with a body derived from $t$.  Since terms given to parse
+translations are not yet typed, the type of the bound variable in the new
+{\tt Abs} is simply {\tt dummyT}.  The function increments all {\tt Bound}
+nodes referring to outer abstractions by calling \ttindex{incr_boundvars},
+a basic term manipulation function defined in {\tt Pure/term.ML}.
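+For instance, assuming the argument {\tt B} is parsed as a free
+variable, the call and its result would be
+\begin{ttbox}
+k_tr [Free ("B", dummyT)]  =  Abs ("x", dummyT, Free ("B", dummyT))
+\end{ttbox}
+where {\tt incr_boundvars} has no effect because {\tt B} contains no
+loose bound variables.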
+
+Here is the print translation for dependent types:
+\begin{ttbox}
+fun dependent_tr' (q,r) (A :: Abs (x, T, B) :: ts) =
+      if 0 mem (loose_bnos B) then
+        let val (x', B') = variant_abs (x, dummyT, B);
+        in list_comb (Const (q, dummyT) $ Free (x', T) $ A $ B', ts)
+        end
+      else list_comb (Const (r, dummyT) $ A $ B, ts)
+  | dependent_tr' _ _ = raise Match;
+\end{ttbox}
+The argument {\tt (q,r)} is supplied to {\tt dependent_tr'} by a curried
+function application during its installation.  We could set up print
+translations for both {\tt Pi} and {\tt Sigma} by including
+\begin{ttbox}\index{*ML section}
+val print_translation =
+  [("Pi",    dependent_tr' ("{\at}PROD", "{\at}->")),
+   ("Sigma", dependent_tr' ("{\at}SUM", "{\at}*"))];
+\end{ttbox}
+within the {\tt ML} section.  The first of these transforms ${\tt Pi}(A,
+\mtt{Abs}(x, T, B))$ into $\hbox{\tt{\at}PROD}(x', A, B')$ or
+$\hbox{\tt{\at}->}(A, B)$, choosing the latter form if $B$ does not depend
+on~$x$.  It checks this using \ttindex{loose_bnos}, yet another function
+from {\tt Pure/term.ML}.  Note that $x'$ is a version of $x$ renamed away
+from all names in $B$, and $B'$ the body $B$ with {\tt Bound} nodes
+referring to our {\tt Abs} node replaced by $\ttfct{Free} (x',
+\mtt{dummyT})$.
+
+We must be careful with types here.  While types of {\tt Const}s are
+ignored, type constraints may be printed for some {\tt Free}s and
+{\tt Var}s if \ttindex{show_types} is set to {\tt true}.  Variables of type
+\ttindex{dummyT} are never printed with a constraint, though.  The line
+\begin{ttbox}
+        let val (x', B') = variant_abs (x, dummyT, B);
+\end{ttbox}\index{*variant_abs}
+replaces bound variable occurrences in~$B$ by the free variable $x'$ with
+type {\tt dummyT}.  Only the binding occurrence of~$x'$ is given the
+correct type~{\tt T}, so this is the only place where a type
+constraint might appear. 
+\index{translations|)}
+\index{syntax!transformations|)}