--- a/doc-src/IsarRef/Thy/Inner_Syntax.thy Mon Jun 18 16:30:20 2012 +0200
+++ b/doc-src/IsarRef/Thy/Inner_Syntax.thy Mon Jun 18 21:17:34 2012 +0200
@@ -1062,11 +1062,123 @@
subsection {* Abstract syntax trees *}
-text {*
- FIXME
+text {* The ML datatype @{ML_type Ast.ast} explicitly represents the
+ intermediate AST format that is used for syntax rewriting
+ (\secref{sec:syn-trans}). It is defined in ML as follows:
+ \begin{ttbox}
+ datatype ast =
+ Constant of string |
+ Variable of string |
+ Appl of ast list
+ \end{ttbox}
+
+ An AST is either an atom (constant or variable) or a list of (at
+ least two) subtrees. Occasional diagnostic output of ASTs uses
+ notation that resembles S-expression of LISP. Constant atoms are
+ shown as quoted strings, variable atoms as non-quoted strings and
+ applications as a parenthesized list of subtrees. For example, the
+ AST
+ @{ML [display] "Ast.Appl
+ [Ast.Constant \"_abs\", Ast.Variable \"x\", Ast.Variable \"t\"]"}
+ is pretty-printed as @{verbatim "(\"_abs\" x t)"}. Note that
+ @{verbatim "()"} and @{verbatim "(x)"} are excluded as ASTs, because
+ they have too few subtrees.
+
+ \medskip AST application is merely a pro-forma mechanism to indicate
+ certain syntactic structures. Thus @{verbatim "(c a b)"} could mean
+ either term application or type application, depending on the
+ syntactic context.
+
+ Nested application like @{verbatim "((\"_abs\" x t) u)"} is also
+ possible, but ASTs are definitely first-order: the syntax constant
+ @{verbatim "\"_abs\""} does not bind the @{verbatim x} in any way.
+ Proper bindings are introduced in later stages of the term syntax,
+ where @{verbatim "(\"_abs\" x t)"} becomes an @{ML Abs} node and
+ occurrences of @{verbatim x} in @{verbatim t} are replaced by bound
+ variables (represented as de-Bruijn indices).
*}
+subsubsection {* Ast constants versus variables *}
+
+text {* Depending on the situation --- input syntax, output syntax,
+ translation patterns --- the distinction of atomic asts as @{ML
+ Ast.Constant} versus @{ML Ast.Variable} serves slightly different
+ purposes.
+
+ Input syntax of a term such as @{text "f a b = c"} does not yet
+ indicate the scopes of atomic entities @{text "f, a, b, c"}: they
+ could be global constants or local variables, even bound ones
+ depending on the context of the term. @{ML Ast.Variable} leaves
+ this choice still open: later syntax layers (or translation
+ functions) may capture such a variable to determine its role
+ specifically, to make it a constant, bound variable, free variable
+ etc. In contrast, syntax translations that introduce already known
+ constants would rather do it via @{ML Ast.Constant} to prevent
+ accidental re-interpretation later on.
+
+ Output syntax turns term constants into @{ML Ast.Constant} and
+ variables (free or schematic) into @{ML Ast.Variable}. This
+ information is precise when printing fully formal @{text "\<lambda>"}-terms.
+
+ In AST translation patterns (\secref{sec:syn-trans}) the system
+ guesses from the current theory context which atoms should be
+ treated as constant versus variable for the matching process.
+ Sometimes this needs to be indicated more explicitly using @{text
+ "CONST c"} inside the term language. It is also possible to use
+ @{command syntax} declarations (without mixfix annotation) to
+ enforce that certain unqualified names are always treated as
+ constant within the syntax machinery.
+
+ \medskip For ASTs that represent the language of types or sorts, the
+ situation is much simpler, since the concrete syntax already
+ distinguishes type variables from type constants (constructors). So
+ @{text "('a, 'b) foo"} corresponds to an AST application of some
+ constant for @{text foo} and variable arguments for @{text "'a"} and
+ @{text "'b"}. Note that the postfix application is merely a feature
+ of the concrete syntax, while in the AST the constructor occurs in
+ head position. *}
+
+
+subsubsection {* Authentic syntax names *}
+
+text {* Naming constant entities within ASTs is another delicate
+ issue. Unqualified names are looked up in the name space tables in
+ the last stage of parsing, after all translations have been applied.
+ Since syntax transformations do not know about this later name
+ resolution yet, there can be surprises in boundary cases.
+
+ \emph{Authentic syntax names} for @{ML Ast.Constant} avoid this
+ problem: the fully-qualified constant name with a special prefix for
+ its formal category (@{text "class"}, @{text "type"}, @{text
+ "const"}, @{text "fixed"}) represents the information faithfully
+ within the untyped AST format. Accidental overlap with free or
+ bound variables is excluded as well. Authentic syntax names work
+ implicitly in the following situations:
+
+ \begin{itemize}
+
+ \item Input of term constants (or fixed variables) that are
+ introduced by concrete syntax via @{command notation}: the
+ correspondence of a particular grammar production to some known term
+ entity is preserved.
+
+ \item Input type constants (constructors) and type classes ---
+ thanks to explicit syntactic distinction independently on the
+ context.
+
+ \item Output of term constants, type constants, type classes ---
+ this information is already available from the internal term to be
+ printed.
+
+ \end{itemize}
+
+ In other words, syntax transformations that operate on input terms
+ written as prefix applications are difficult to achieve. Luckily,
+ this case rarely occurs in practice, because syntax forms to be
+ translated usually correspond to some bits of concrete notation. *}
+
+
subsection {* Raw syntax and translations \label{sec:syn-trans} *}
text {*
--- a/doc-src/IsarRef/Thy/document/Inner_Syntax.tex Mon Jun 18 16:30:20 2012 +0200
+++ b/doc-src/IsarRef/Thy/document/Inner_Syntax.tex Mon Jun 18 21:17:34 2012 +0200
@@ -1244,7 +1244,127 @@
\isamarkuptrue%
%
\begin{isamarkuptext}%
-FIXME%
+The ML datatype \verb|Ast.ast| explicitly represents the
+ intermediate AST format that is used for syntax rewriting
+ (\secref{sec:syn-trans}). It is defined in ML as follows:
+ \begin{ttbox}
+ datatype ast =
+ Constant of string |
+ Variable of string |
+ Appl of ast list
+ \end{ttbox}
+
+ An AST is either an atom (constant or variable) or a list of (at
+ least two) subtrees. Occasional diagnostic output of ASTs uses
+ notation that resembles S-expression of LISP. Constant atoms are
+ shown as quoted strings, variable atoms as non-quoted strings and
+ applications as a parenthesized list of subtrees. For example, the
+ AST
+ \begin{verbatim}
+Ast.Appl
+ [Ast.Constant "_abs", Ast.Variable "x", Ast.Variable "t"]
+\end{verbatim}
+ is pretty-printed as \verb|("_abs" x t)|. Note that
+ \verb|()| and \verb|(x)| are excluded as ASTs, because
+ they have too few subtrees.
+
+ \medskip AST application is merely a pro-forma mechanism to indicate
+ certain syntactic structures. Thus \verb|(c a b)| could mean
+ either term application or type application, depending on the
+ syntactic context.
+
+ Nested application like \verb|(("_abs" x t) u)| is also
+ possible, but ASTs are definitely first-order: the syntax constant
+ \verb|"_abs"| does not bind the \verb|x| in any way.
+ Proper bindings are introduced in later stages of the term syntax,
+ where \verb|("_abs" x t)| becomes an \verb|Abs| node and
+ occurrences of \verb|x| in \verb|t| are replaced by bound
+ variables (represented as de-Bruijn indices).%
+\end{isamarkuptext}%
+\isamarkuptrue%
+%
+\isamarkupsubsubsection{Ast constants versus variables%
+}
+\isamarkuptrue%
+%
+\begin{isamarkuptext}%
+Depending on the situation --- input syntax, output syntax,
+ translation patterns --- the distinction of atomic asts as \verb|Ast.Constant| versus \verb|Ast.Variable| serves slightly different
+ purposes.
+
+ Input syntax of a term such as \isa{{\isaliteral{22}{\isachardoublequote}}f\ a\ b\ {\isaliteral{3D}{\isacharequal}}\ c{\isaliteral{22}{\isachardoublequote}}} does not yet
+ indicate the scopes of atomic entities \isa{{\isaliteral{22}{\isachardoublequote}}f{\isaliteral{2C}{\isacharcomma}}\ a{\isaliteral{2C}{\isacharcomma}}\ b{\isaliteral{2C}{\isacharcomma}}\ c{\isaliteral{22}{\isachardoublequote}}}: they
+ could be global constants or local variables, even bound ones
+ depending on the context of the term. \verb|Ast.Variable| leaves
+ this choice still open: later syntax layers (or translation
+ functions) may capture such a variable to determine its role
+ specifically, to make it a constant, bound variable, free variable
+ etc. In contrast, syntax translations that introduce already known
+ constants would rather do it via \verb|Ast.Constant| to prevent
+ accidental re-interpretation later on.
+
+ Output syntax turns term constants into \verb|Ast.Constant| and
+ variables (free or schematic) into \verb|Ast.Variable|. This
+ information is precise when printing fully formal \isa{{\isaliteral{22}{\isachardoublequote}}{\isaliteral{5C3C6C616D6264613E}{\isasymlambda}}{\isaliteral{22}{\isachardoublequote}}}-terms.
+
+ In AST translation patterns (\secref{sec:syn-trans}) the system
+ guesses from the current theory context which atoms should be
+ treated as constant versus variable for the matching process.
+ Sometimes this needs to be indicated more explicitly using \isa{{\isaliteral{22}{\isachardoublequote}}CONST\ c{\isaliteral{22}{\isachardoublequote}}} inside the term language. It is also possible to use
+ \hyperlink{command.syntax}{\mbox{\isa{\isacommand{syntax}}}} declarations (without mixfix annotation) to
+ enforce that certain unqualified names are always treated as
+ constant within the syntax machinery.
+
+ \medskip For ASTs that represent the language of types or sorts, the
+ situation is much simpler, since the concrete syntax already
+ distinguishes type variables from type constants (constructors). So
+ \isa{{\isaliteral{22}{\isachardoublequote}}{\isaliteral{28}{\isacharparenleft}}{\isaliteral{27}{\isacharprime}}a{\isaliteral{2C}{\isacharcomma}}\ {\isaliteral{27}{\isacharprime}}b{\isaliteral{29}{\isacharparenright}}\ foo{\isaliteral{22}{\isachardoublequote}}} corresponds to an AST application of some
+ constant for \isa{foo} and variable arguments for \isa{{\isaliteral{22}{\isachardoublequote}}{\isaliteral{27}{\isacharprime}}a{\isaliteral{22}{\isachardoublequote}}} and
+ \isa{{\isaliteral{22}{\isachardoublequote}}{\isaliteral{27}{\isacharprime}}b{\isaliteral{22}{\isachardoublequote}}}. Note that the postfix application is merely a feature
+ of the concrete syntax, while in the AST the constructor occurs in
+ head position.%
+\end{isamarkuptext}%
+\isamarkuptrue%
+%
+\isamarkupsubsubsection{Authentic syntax names%
+}
+\isamarkuptrue%
+%
+\begin{isamarkuptext}%
+Naming constant entities within ASTs is another delicate
+ issue. Unqualified names are looked up in the name space tables in
+ the last stage of parsing, after all translations have been applied.
+ Since syntax transformations do not know about this later name
+ resolution yet, there can be surprises in boundary cases.
+
+ \emph{Authentic syntax names} for \verb|Ast.Constant| avoid this
+ problem: the fully-qualified constant name with a special prefix for
+ its formal category (\isa{{\isaliteral{22}{\isachardoublequote}}class{\isaliteral{22}{\isachardoublequote}}}, \isa{{\isaliteral{22}{\isachardoublequote}}type{\isaliteral{22}{\isachardoublequote}}}, \isa{{\isaliteral{22}{\isachardoublequote}}const{\isaliteral{22}{\isachardoublequote}}}, \isa{{\isaliteral{22}{\isachardoublequote}}fixed{\isaliteral{22}{\isachardoublequote}}}) represents the information faithfully
+ within the untyped AST format. Accidental overlap with free or
+ bound variables is excluded as well. Authentic syntax names work
+ implicitly in the following situations:
+
+ \begin{itemize}
+
+ \item Input of term constants (or fixed variables) that are
+ introduced by concrete syntax via \hyperlink{command.notation}{\mbox{\isa{\isacommand{notation}}}}: the
+ correspondence of a particular grammar production to some known term
+ entity is preserved.
+
+ \item Input type constants (constructors) and type classes ---
+ thanks to explicit syntactic distinction independently on the
+ context.
+
+ \item Output of term constants, type constants, type classes ---
+ this information is already available from the internal term to be
+ printed.
+
+ \end{itemize}
+
+ In other words, syntax transformations that operate on input terms
+ written as prefix applications are difficult to achieve. Luckily,
+ this case rarely occurs in practice, because syntax forms to be
+ translated usually correspond to some bits of concrete notation.%
\end{isamarkuptext}%
\isamarkuptrue%
%
--- a/doc-src/Ref/syntax.tex Mon Jun 18 16:30:20 2012 +0200
+++ b/doc-src/Ref/syntax.tex Mon Jun 18 21:17:34 2012 +0200
@@ -8,50 +8,6 @@
\newcommand\Appl[1]{\ttfct{Appl}\,[#1]}
\index{syntax!transformations|(}
-\section{Abstract syntax trees} \label{sec:asts}
-\index{ASTs|(}
-
-Abstract syntax trees are an intermediate form between the raw parse trees
-and the typed $\lambda$-terms. An \AST{} is either an atom (constant or
-variable) or a list of {\em at least two\/} subtrees. Internally, they
-have type \mltydx{Syntax.ast}: \index{*Constant} \index{*Variable}
-\index{*Appl}
-\begin{ttbox}
-datatype ast = Constant of string
- | Variable of string
- | Appl of ast list
-\end{ttbox}
-%
-Isabelle uses an S-expression syntax for abstract syntax trees. Constant
-atoms are shown as quoted strings, variable atoms as non-quoted strings and
-applications as a parenthesised list of subtrees. For example, the \AST
-\begin{ttbox}
-Appl [Constant "_constrain",
- Appl [Constant "_abs", Variable "x", Variable "t"],
- Appl [Constant "fun", Variable "'a", Variable "'b"]]
-\end{ttbox}
-is shown as {\tt ("_constrain" ("_abs" x t) ("fun" 'a 'b))}.
-Both {\tt ()} and {\tt (f)} are illegal because they have too few
-subtrees.
-
-The resemblance to Lisp's S-expressions is intentional, but there are two
-kinds of atomic symbols: $\Constant x$ and $\Variable x$. Do not take the
-names {\tt Constant} and {\tt Variable} too literally; in the later
-translation to terms, $\Variable x$ may become a constant, free or bound
-variable, even a type constructor or class name; the actual outcome depends
-on the context.
-
-Similarly, you can think of ${\tt (} f~x@1~\ldots~x@n{\tt )}$ as the
-application of~$f$ to the arguments $x@1, \ldots, x@n$. But the kind of
-application is determined later by context; it could be a type constructor
-applied to types.
-
-Forms like {\tt (("_abs" x $t$) $u$)} are legal, but \AST{}s are
-first-order: the {\tt "_abs"} does not bind the {\tt x} in any way. Later
-at the term level, {\tt ("_abs" x $t$)} will become an {\tt Abs} node and
-occurrences of {\tt x} in $t$ will be replaced by bound variables (the term
-constructor \ttindex{Bound}).
-
\section{Transforming parse trees to ASTs}\label{sec:astofpt}
\index{ASTs!made from parse trees}