doc-src/Ref/syntax.tex
 changeset 3108 335efc3f5632 parent 1387 9bcad9c22fd4 child 3128 d01d4c0c4b44
--- a/doc-src/Ref/syntax.tex	Mon May 05 21:18:01 1997 +0200
+++ b/doc-src/Ref/syntax.tex	Tue May 06 12:50:16 1997 +0200
@@ -31,7 +31,7 @@
\begin{center}
\begin{tabular}{cl}
string          & \\
-$\downarrow$    & parser \\
+$\downarrow$    & lexer, parser \\
parse tree      & \\
$\downarrow$    & parse \AST{} translation \\
\AST{}             & \\
@@ -43,7 +43,7 @@
\AST{}             & \\
$\downarrow$    & \AST{} rewriting (macros) \\
\AST{}             & \\
-$\downarrow$    & print \AST{} translation, printer \\
+$\downarrow$    & print \AST{} translation, token translation \\
string          &
\end{tabular}

@@ -304,11 +304,13 @@
and their syntactic sugar (denoted by \dots{} above) are joined to make a
single string.

-If an application $(\mtt"c\mtt"~ x@1 \ldots x@m)$ has more arguments than the
-corresponding production, it is first split into $((\mtt"c\mtt"~ x@1 \ldots
-x@n) ~ x@{n+1} \ldots x@m)$.  Applications with too few arguments or with
-non-constant head or without a corresponding production are printed as
-$f(x@1, \ldots, x@l)$ or $(\alpha@1, \ldots, \alpha@l) ty$.  An occurrence of
+If an application $(\mtt"c\mtt"~ x@1 \ldots x@m)$ has more arguments
+than the corresponding production, it is first split into
+$((\mtt"c\mtt"~ x@1 \ldots x@n) ~ x@{n+1} \ldots x@m)$.  Applications
+with too few arguments or with non-constant head or without a
+corresponding production are printed as $f(x@1, \ldots, x@l)$ or
+$(\alpha@1, \ldots, \alpha@l) ty$.  Multiple productions associated
+with some name $c$ are tried in order of appearance.  An occurrence of
$\Variable x$ is simply printed as~$x$.

Blanks are {\em not\/} inserted automatically.  If blanks are required to
@@ -318,7 +320,7 @@

-\section{Macros: Syntactic rewriting} \label{sec:macros}
+\section{Macros: syntactic rewriting} \label{sec:macros}
\index{macros|(}\index{rewriting!syntactic|(}

Mixfix declarations alone can handle situations where there is a direct
@@ -338,19 +340,20 @@
on abstract syntax trees.  They are usually easy to read and write, and can
express all but the most obscure translations.

-Figure~\ref{fig:set_trans} defines a fragment of first-order logic and set
-theory.\footnote{This and the following theories are complete working
-  examples, though they specify only syntax, no axioms.  The file {\tt
-    ZF/ZF.thy} presents a full set theory definition, including many
-  macro rules.}  Theory {\tt SET} defines constants for set comprehension
-({\tt Collect}), replacement ({\tt Replace}) and bounded universal
-quantification ({\tt Ball}).  Each of these binds some variables.  Without
-additional syntax we should have to write $\forall x \in A. P$ as {\tt
-  Ball(A,\%x.P)}, and similarly for the others.
+Figure~\ref{fig:set_trans} defines a fragment of first-order logic and
+set theory.\footnote{This and the following theories are complete
+  working examples, though they specify only syntax, no axioms.  The
+  file {\tt ZF/ZF.thy} presents a full set theory definition,
+  including many macro rules.} Theory {\tt SetSyntax} declares
+constants for set comprehension ({\tt Collect}), replacement ({\tt
+  Replace}) and bounded universal quantification ({\tt Ball}).  Each
+of these binds some variables.  Without additional syntax we should
+have to write $\forall x \in A. P$ as {\tt Ball(A,\%x.P)}, and
+similarly for the others.

\begin{figure}
\begin{ttbox}
-SET = Pure +
+SetSyntax = Pure +
types
i o
arities
@@ -425,8 +428,8 @@
\item Every variable in~$r$ must also occur in~$l$.
\end{itemize}

-Macro rules may refer to any syntax from the parent theories.  They may
-also refer to anything defined before the {\tt .thy} file's {\tt
+Macro rules may refer to any syntax from the parent theories.  They
+may also refer to anything defined before the current {\tt
translations} section --- including any mixfix declarations.

Upon declaration, both sides of the macro rule undergo parsing and parse
@@ -440,10 +443,11 @@
matching.  These are all names that have been declared as classes, types or
constants (logical and syntactic).

-The result of this preprocessing is two lists of macro rules, each stored
-as a pair of \AST{}s.  They can be viewed using {\tt print_syntax}
-(sections \ttindex{parse_rules} and \ttindex{print_rules}).  For
-theory~{\tt SET} of Fig.~\ref{fig:set_trans} these are
+The result of this preprocessing is two lists of macro rules, each
+stored as a pair of \AST{}s.  They can be viewed using {\tt
+  print_syntax} (sections \ttindex{parse_rules} and
+\ttindex{print_rules}).  For theory~{\tt SetSyntax} of
+Fig.~\ref{fig:set_trans} these are
\begin{ttbox}
parse_rules:
("{\at}Collect" x A P)  ->  ("Collect" A ("_abs" x P))
@@ -484,9 +488,7 @@

To allow such constraints to be re-read, your syntax should specify bound
variables using the nonterminal~\ndx{idt}.  This is the case in our
-example.  Choosing {\tt id} instead of {\tt idt} is a common error,
-especially since it appears in former versions of most of Isabelle's
-object-logics.
+example.  Choosing {\tt id} instead of {\tt idt} is a common error.
\end{warn}

@@ -552,17 +554,17 @@
The term \verb|%Nil.t| will be printed as \verb|%[].t|, which might not be
expected!  Guess how type~{\tt Nil} is printed?

-Normalizing an \AST{} involves repeatedly applying macro rules until none
-are applicable.  Macro rules are chosen in the order that they appear in the
-{\tt translations} section.  You can watch the normalization of \AST{}s
-during parsing and printing by setting \ttindex{Syntax.trace_norm_ast} to
-{\tt true}.\index{tracing!of macros} Alternatively, use
+Normalizing an \AST{} involves repeatedly applying macro rules until
+none are applicable.  Macro rules are chosen in order of appearance in
+the theory definitions.  You can watch the normalization of \AST{}s
+during parsing and printing by setting \ttindex{Syntax.trace_norm_ast}
+to {\tt true}.\index{tracing!of macros} Alternatively, use
\ttindex{Syntax.test_read}.  The information displayed when tracing
-includes the \AST{} before normalization ({\tt pre}), redexes with results
-({\tt rewrote}), the normal form finally reached ({\tt post}) and some
-statistics ({\tt normalize}).  If tracing is off,
-\ttindex{Syntax.stat_norm_ast} can be set to {\tt true} in order to enable
-printing of the normal form and statistics only.
+includes the \AST{} before normalization ({\tt pre}), redexes with
+results ({\tt rewrote}), the normal form finally reached ({\tt post})
+and some statistics ({\tt normalize}).  If tracing is off,
+\ttindex{Syntax.stat_norm_ast} can be set to {\tt true} in order to
+enable printing of the normal form and statistics only.
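+For example, assuming these flags are mutable \ML{} references (as is
+usual for Isabelle flags), tracing may be switched on interactively:
+\begin{ttbox}
+Syntax.trace_norm_ast := true;
+\end{ttbox}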

\subsection{Example: the syntax of finite sets}
@@ -574,7 +576,7 @@
\index{"@Enum@{\tt\at Enum} constant}
\index{"@Finset@{\tt\at Finset} constant}
\begin{ttbox}
-FINSET = SET +
+FinSyntax = SetSyntax +
types
is
syntax
@@ -654,7 +656,7 @@
readable in some cases: {\em calling\/} translation functions by parse
macros:
\begin{ttbox}
-PROD = FINSET +
+ProdSyntax = FinSyntax +
consts
Pi            :: [i, i => i] => i
syntax
@@ -693,10 +695,10 @@
\index{translations|(}
%
This section describes the translation function mechanism.  By writing
-\ML{} functions, you can do almost everything with terms or \AST{}s during
-parsing and printing.  The logic \LK\ is a good example of sophisticated
-transformations between internal and external representations of sequents;
-here, macros would be useless.
+\ML{} functions, you can do almost everything to terms or \AST{}s
+during parsing and printing.  The logic \LK\ is a good example of
+sophisticated transformations between internal and external
+representations of sequents; here, macros would be useless.

A full understanding of translations requires some familiarity
with Isabelle's internals, especially the datatypes {\tt term}, {\tt typ},
@@ -705,50 +707,55 @@
use translation functions.

\subsection{Declaring translation functions}
-There are four kinds of translation functions.  Each such function is
-associated with a name, which triggers calls to it.  Such names can be
-constants (logical or syntactic) or type constructors.
+There are four kinds of translation functions, with one of these
+coming in two variants.  Each such function is associated with a name,
+which triggers calls to it.  Such names can be constants (logical or
+syntactic) or type constructors.

{\tt print_syntax} displays the sets of names associated with the
-translation functions of a {\tt Syntax.syntax} under
+translation functions of a theory under
\ttindex{parse_ast_translation}, \ttindex{parse_translation},
-\ttindex{print_translation} and \ttindex{print_ast_translation}.  You can
-add new ones via the {\tt ML} section\index{*ML section} of
-a {\tt .thy} file.  There may never be more than one function of the same
-kind per name.  Even though the {\tt ML} section is the very last part of a
-{\tt .thy} file, newly installed translation functions are effective when
-processing the preceding section.
+\ttindex{print_translation} and \ttindex{print_ast_translation}.  You
+can add new ones via the {\tt ML} section\index{*ML section} of a
+theory definition file.  There may never be more than one function of
+the same kind per name.  Even though the {\tt ML} section is the very
+last part of the file, newly installed translation functions are
+already effective when processing all of the preceding sections.

-The {\tt ML} section is copied verbatim near the beginning of the \ML\ file
-generated from a {\tt .thy} file.  Definitions made here are accessible as
-components of an \ML\ structure; to make some definitions private, use an
-\ML{} {\tt local} declaration.  The {\tt ML} section may install translation
-functions by declaring any of the following identifiers:
+The {\tt ML} section's contents are simply copied verbatim near the
+beginning of the \ML\ file generated from a theory definition file.
+Definitions made here are accessible as components of an \ML\
+structure; to make some parts private, use an \ML{} {\tt local}
+declaration.  The \ML{} code may install translation functions by
+declaring any of the following identifiers:
\begin{ttbox}
-val parse_ast_translation : (string * (ast list -> ast)) list
-val print_ast_translation : (string * (ast list -> ast)) list
-val parse_translation     : (string * (term list -> term)) list
-val print_translation     : (string * (term list -> term)) list
+val parse_ast_translation   : (string * (ast list -> ast)) list
+val print_ast_translation   : (string * (ast list -> ast)) list
+val parse_translation       : (string * (term list -> term)) list
+val print_translation       : (string * (term list -> term)) list
+val typed_print_translation : (string * (typ -> term list -> term)) list
\end{ttbox}
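+For example, a parse \AST{} translation rewriting applications of a
+hypothetical syntactic constant \verb|"_pair"| to the constant
+\verb|Pair| might be installed as follows (a sketch only; both names
+are assumed for illustration, and we assume the \AST{} constructors
+are available as {\tt Syntax.Constant} and {\tt Syntax.Appl}):
+\begin{ttbox}
+fun pair_ast_tr [x, y] = Syntax.Appl [Syntax.Constant "Pair", x, y]
+  | pair_ast_tr _ = raise Match;
+
+val parse_ast_translation = [("_pair", pair_ast_tr)];
+\end{ttbox}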

\subsection{The translation strategy}
-All four kinds of translation functions are treated similarly.  They are
-called during the transformations between parse trees, \AST{}s and terms
-(recall Fig.\ts\ref{fig:parse_print}).  Whenever a combination of the form
-$(\mtt"c\mtt"~x@1 \ldots x@n)$ is encountered, and a translation function
-$f$ of appropriate kind exists for $c$, the result is computed by the \ML{}
-function call $f \mtt[ x@1, \ldots, x@n \mtt]$.
+The different kinds of translation functions are called during the
+transformations between parse trees, \AST{}s and terms (recall
+Fig.\ts\ref{fig:parse_print}).  Whenever a combination of the form
+$(\mtt"c\mtt"~x@1 \ldots x@n)$ is encountered, and a translation
+function $f$ of appropriate kind exists for $c$, the result is
+computed by the \ML{} function call $f \mtt[ x@1, \ldots, x@n \mtt]$.

-For \AST{} translations, the arguments $x@1, \ldots, x@n$ are \AST{}s.  A
-combination has the form $\Constant c$ or $\Appl{\Constant c, x@1, \ldots,
-  x@n}$.  For term translations, the arguments are terms and a combination
-has the form $\ttfct{Const} (c, \tau)$ or $\ttfct{Const} (c, \tau) \ttapp
-x@1 \ttapp \ldots \ttapp x@n$.  Terms allow more sophisticated
-transformations than \AST{}s do, typically involving abstractions and bound
-variables.
+For \AST{} translations, the arguments $x@1, \ldots, x@n$ are \AST{}s.
+A combination has the form $\Constant c$ or $\Appl{\Constant c, x@1,
+  \ldots, x@n}$.  For term translations, the arguments are terms and a
+combination has the form $\ttfct{Const} (c, \tau)$ or $\ttfct{Const}
+(c, \tau) \ttapp x@1 \ttapp \ldots \ttapp x@n$.  Terms allow more
+sophisticated transformations than \AST{}s do, typically involving
+abstractions and bound variables. {\em Typed} print translations may
+even peek at the type $\tau$ of the constant they are invoked on.

-Regardless of whether they act on terms or \AST{}s, parse translations differ
-from print translations in their overall behaviour fundamentally:
+Regardless of whether they act on terms or \AST{}s, translation
+functions called during the parsing process differ from those for
+printing more fundamentally in their overall behaviour:
\begin{description}
\item[Parse translations] are applied bottom-up.  The arguments are already
in translated form.  The translations must not fail; exceptions trigger
@@ -804,40 +811,93 @@
\begin{ttbox}
fun dependent_tr' (q, r) (A :: Abs (x, T, B) :: ts) =
if 0 mem (loose_bnos B) then
-        let val (x', B') = variant_abs (x, dummyT, B);
-        in list_comb (Const (q, dummyT) $ Free (x', T) $ A $ B', ts)
+        let val (x', B') = Syntax.variant_abs' (x, dummyT, B) in
+          list_comb
+            (Const (q, dummyT) $ Syntax.mark_boundT (x', T) $ A $ B', ts)
end
else list_comb (Const (r, dummyT) $ A $ B, ts)
| dependent_tr' _ _ = raise Match;
\end{ttbox}
-The argument {\tt (q,r)} is supplied to the curried function {\tt
-  dependent_tr'} by a partial application during its installation.  We
-can set up print translations for both {\tt Pi} and {\tt Sigma} by
-including
+The argument {\tt (q, r)} is supplied to the curried function {\tt
+  dependent_tr'} by a partial application during its installation.
+For example, we could set up print translations for both {\tt Pi} and
+{\tt Sigma} by including
\begin{ttbox}\index{*ML section}
val print_translation =
[("Pi",    dependent_tr' ("{\at}PROD", "{\at}->")),
("Sigma", dependent_tr' ("{\at}SUM", "{\at}*"))];
\end{ttbox}
-within the {\tt ML} section.  The first of these transforms ${\tt Pi}(A,
-\mtt{Abs}(x, T, B))$ into $\hbox{\tt{\at}PROD}(x', A, B')$ or
-$\hbox{\tt{\at}->}(A, B)$, choosing the latter form if $B$ does not depend
-on~$x$.  It checks this using \ttindex{loose_bnos}, yet another function
-from {\tt Pure/term.ML}.  Note that $x'$ is a version of $x$ renamed away
-from all names in $B$, and $B'$ is the body $B$ with {\tt Bound} nodes
-referring to the {\tt Abs} node replaced by $\ttfct{Free} (x',
-\mtt{dummyT})$.
+within the {\tt ML} section.  The first of these transforms ${\tt
+  Pi}(A, \mtt{Abs}(x, T, B))$ into $\hbox{\tt{\at}PROD}(x', A, B')$ or
+$\hbox{\tt{\at}->}(A, B)$, choosing the latter form if $B$ does not
+depend on~$x$.  It checks this using \ttindex{loose_bnos}, yet another
+function from {\tt Pure/term.ML}.  Note that $x'$ is a version of $x$
+renamed away from all names in $B$, and $B'$ is the body $B$ with {\tt
+  Bound} nodes referring to the {\tt Abs} node replaced by
+$\ttfct{Free} (x', \mtt{dummyT})$ (but marked as representing a bound
+variable).

We must be careful with types here.  While types of {\tt Const}s are
ignored, type constraints may be printed for some {\tt Free}s and
{\tt Var}s if \ttindex{show_types} is set to {\tt true}.  Variables of type
\ttindex{dummyT} are never printed with constraint, though.  The line
\begin{ttbox}
-        let val (x', B') = variant_abs (x, dummyT, B);
-\end{ttbox}\index{*variant_abs}
+        let val (x', B') = Syntax.variant_abs' (x, dummyT, B);
+\end{ttbox}\index{*Syntax.variant_abs'}
replaces bound variable occurrences in~$B$ by the free variable $x'$ with
type {\tt dummyT}.  Only the binding occurrence of~$x'$ is given the
correct type~{\tt T}, so this is the only place where a type
constraint might appear.
-\index{translations|)}
-\index{syntax!transformations|)}
+
+Also note that we are responsible for marking free identifiers that
+actually represent bound variables.  This is achieved by
+\ttindex{Syntax.variant_abs'} and \ttindex{Syntax.mark_boundT} above.
+Failing to do so may cause these names to be printed in the wrong
+style.
+\index{translations|)} \index{syntax!transformations|)}
+
+
+\section{Token translations} \label{sec:tok_tr}
+\index{token translations|(}
+%
+Isabelle's meta-logic features several kinds of identifiers, namely
+{\em class}, {\em tfree}, {\em tvar}, {\em free}, {\em bound} and
+{\em var}.  One might want to have these printed in different styles,
+e.g.\ in bold or italic, or even transcribed into something more
+readable like $\alpha, \alpha', \beta$ instead of {\tt 'a}, {\tt 'aa},
+{\tt 'b} for type variables.  Token translations provide a means to
+this end, enabling the user to install \ML{} functions associated
+with any logical \rmindex{token class}, depending on the current
+\rmindex{print mode}.
+
+The logical class of an identifier cannot always be determined from
+its syntactic category, though; consider free vs.\ bound variables.
+So Isabelle's pretty printing mechanism, starting from
+fully typed terms, has to be careful to preserve this additional
+information.\footnote{This is done by marking atoms in abstract syntax
+  trees appropriately. The marks are actually visible to print
+  translation functions -- they are just special constants applied to
+  atomic asts, for example \texttt{("_bound" x)}.}  In particular,
+user-supplied print translation functions operating on terms have to
+be well-behaved in this respect. Free identifiers introduced to
+represent bound variables have to be marked appropriately (cf.\ the
+example at the end of \S\ref{sec:tr_funs}).
+
+Token translations may be installed by declaring the
+\ttindex{token_translation} value within the \texttt{ML} section of a
+theory definition file:
+\begin{ttbox}
+val token_translation: (string * string * (string -> string * int)) list
+\end{ttbox}\index{token_translation}
+The elements of this list are of the form $(m, c, f)$, where $m$ is a
+print mode identifier, $c$ a token class, and $f\colon string \to
+string \times int$ the actual translation function.  Assuming that $x$
+is of identifier class $c$, and that print mode $m$ is the first of
+the currently active modes that provides a translation for $c$, the
+identifier $x$ is output according to $f(x) = (x', len)$.  Here $x'$
+is the modified identifier name and $len$ its visual length,
+approximated in whole characters.  Thus $x'$ may include non-printing
+parts like control sequences or markup information for typesetting
+systems.
+
+%FIXME example (?)
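+As a sketch of such a declaration, the following would transcribe type
+variables into bold markup for a hypothetical ``html'' print mode (the
+mode name and markup strings are illustrative only, not part of
+Isabelle):
+\begin{ttbox}
+fun bold_tr s = ("<b>" ^ s ^ "</b>", size s);
+
+val token_translation =
+  [("html", "tfree", bold_tr), ("html", "tvar", bold_tr)];
+\end{ttbox}
+Note that the returned length is {\tt size s}: the markup is
+non-printing, so only the identifier itself contributes to the visual
+length.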
+
+\index{token translations|)}