# HG changeset patch # User wenzelm # Date 1340047054 -7200 # Node ID 428e55887f243997bacc2eae64e78a5864ff2f44 # Parent 1c4500446ba479a67ac98ee9151def0f72803382 more on "Abstract syntax trees"; diff -r 1c4500446ba4 -r 428e55887f24 doc-src/IsarRef/Thy/Inner_Syntax.thy --- a/doc-src/IsarRef/Thy/Inner_Syntax.thy Mon Jun 18 16:30:20 2012 +0200 +++ b/doc-src/IsarRef/Thy/Inner_Syntax.thy Mon Jun 18 21:17:34 2012 +0200 @@ -1062,11 +1062,123 @@ subsection {* Abstract syntax trees *} -text {* - FIXME +text {* The ML datatype @{ML_type Ast.ast} explicitly represents the + intermediate AST format that is used for syntax rewriting + (\secref{sec:syn-trans}). It is defined in ML as follows: + \begin{ttbox} + datatype ast = + Constant of string | + Variable of string | + Appl of ast list + \end{ttbox} + + An AST is either an atom (constant or variable) or a list of (at + least two) subtrees. Occasional diagnostic output of ASTs uses + notation that resembles S-expression of LISP. Constant atoms are + shown as quoted strings, variable atoms as non-quoted strings and + applications as a parenthesized list of subtrees. For example, the + AST + @{ML [display] "Ast.Appl + [Ast.Constant \"_abs\", Ast.Variable \"x\", Ast.Variable \"t\"]"} + is pretty-printed as @{verbatim "(\"_abs\" x t)"}. Note that + @{verbatim "()"} and @{verbatim "(x)"} are excluded as ASTs, because + they have too few subtrees. + + \medskip AST application is merely a pro-forma mechanism to indicate + certain syntactic structures. Thus @{verbatim "(c a b)"} could mean + either term application or type application, depending on the + syntactic context. + + Nested application like @{verbatim "((\"_abs\" x t) u)"} is also + possible, but ASTs are definitely first-order: the syntax constant + @{verbatim "\"_abs\""} does not bind the @{verbatim x} in any way. + Proper bindings are introduced in later stages of the term syntax, + where @{verbatim "(\"_abs\" x t)"} becomes an @{ML Abs} node and + occurrences of @{verbatim x} in @{verbatim t} are replaced by bound + variables (represented as de-Bruijn indices). *} +subsubsection {* Ast constants versus variables *} + +text {* Depending on the situation --- input syntax, output syntax, + translation patterns --- the distinction of atomic asts as @{ML + Ast.Constant} versus @{ML Ast.Variable} serves slightly different + purposes. + + Input syntax of a term such as @{text "f a b = c"} does not yet + indicate the scopes of atomic entities @{text "f, a, b, c"}: they + could be global constants or local variables, even bound ones + depending on the context of the term. @{ML Ast.Variable} leaves + this choice still open: later syntax layers (or translation + functions) may capture such a variable to determine its role + specifically, to make it a constant, bound variable, free variable + etc. In contrast, syntax translations that introduce already known + constants would rather do it via @{ML Ast.Constant} to prevent + accidental re-interpretation later on. + + Output syntax turns term constants into @{ML Ast.Constant} and + variables (free or schematic) into @{ML Ast.Variable}. This + information is precise when printing fully formal @{text "\"}-terms. + + In AST translation patterns (\secref{sec:syn-trans}) the system + guesses from the current theory context which atoms should be + treated as constant versus variable for the matching process. + Sometimes this needs to be indicated more explicitly using @{text + "CONST c"} inside the term language. It is also possible to use + @{command syntax} declarations (without mixfix annotation) to + enforce that certain unqualified names are always treated as + constant within the syntax machinery. + + \medskip For ASTs that represent the language of types or sorts, the + situation is much simpler, since the concrete syntax already + distinguishes type variables from type constants (constructors). So + @{text "('a, 'b) foo"} corresponds to an AST application of some + constant for @{text foo} and variable arguments for @{text "'a"} and + @{text "'b"}. Note that the postfix application is merely a feature + of the concrete syntax, while in the AST the constructor occurs in + head position. *} + + +subsubsection {* Authentic syntax names *} + +text {* Naming constant entities within ASTs is another delicate + issue. Unqualified names are looked up in the name space tables in + the last stage of parsing, after all translations have been applied. + Since syntax transformations do not know about this later name + resolution yet, there can be surprises in boundary cases. + + \emph{Authentic syntax names} for @{ML Ast.Constant} avoid this + problem: the fully-qualified constant name with a special prefix for + its formal category (@{text "class"}, @{text "type"}, @{text + "const"}, @{text "fixed"}) represents the information faithfully + within the untyped AST format. Accidental overlap with free or + bound variables is excluded as well. Authentic syntax names work + implicitly in the following situations: + + \begin{itemize} + + \item Input of term constants (or fixed variables) that are + introduced by concrete syntax via @{command notation}: the + correspondence of a particular grammar production to some known term + entity is preserved. + + \item Input type constants (constructors) and type classes --- + thanks to explicit syntactic distinction independently on the + context. + + \item Output of term constants, type constants, type classes --- + this information is already available from the internal term to be + printed. + + \end{itemize} + + In other words, syntax transformations that operate on input terms + written as prefix applications are difficult to achieve. Luckily, + this case rarely occurs in practice, because syntax forms to be + translated usually correspond to some bits of concrete notation. *} + + subsection {* Raw syntax and translations \label{sec:syn-trans} *} text {* diff -r 1c4500446ba4 -r 428e55887f24 doc-src/IsarRef/Thy/document/Inner_Syntax.tex --- a/doc-src/IsarRef/Thy/document/Inner_Syntax.tex Mon Jun 18 16:30:20 2012 +0200 +++ b/doc-src/IsarRef/Thy/document/Inner_Syntax.tex Mon Jun 18 21:17:34 2012 +0200 @@ -1244,7 +1244,127 @@ \isamarkuptrue% % \begin{isamarkuptext}% -FIXME% +The ML datatype \verb|Ast.ast| explicitly represents the + intermediate AST format that is used for syntax rewriting + (\secref{sec:syn-trans}). It is defined in ML as follows: + \begin{ttbox} + datatype ast = + Constant of string | + Variable of string | + Appl of ast list + \end{ttbox} + + An AST is either an atom (constant or variable) or a list of (at + least two) subtrees. Occasional diagnostic output of ASTs uses + notation that resembles S-expression of LISP. Constant atoms are + shown as quoted strings, variable atoms as non-quoted strings and + applications as a parenthesized list of subtrees. For example, the + AST + \begin{verbatim} +Ast.Appl + [Ast.Constant "_abs", Ast.Variable "x", Ast.Variable "t"] +\end{verbatim} + is pretty-printed as \verb|("_abs" x t)|. Note that + \verb|()| and \verb|(x)| are excluded as ASTs, because + they have too few subtrees. + + \medskip AST application is merely a pro-forma mechanism to indicate + certain syntactic structures. Thus \verb|(c a b)| could mean + either term application or type application, depending on the + syntactic context. + + Nested application like \verb|(("_abs" x t) u)| is also + possible, but ASTs are definitely first-order: the syntax constant + \verb|"_abs"| does not bind the \verb|x| in any way. + Proper bindings are introduced in later stages of the term syntax, + where \verb|("_abs" x t)| becomes an \verb|Abs| node and + occurrences of \verb|x| in \verb|t| are replaced by bound + variables (represented as de-Bruijn indices).% +\end{isamarkuptext}% +\isamarkuptrue% +% +\isamarkupsubsubsection{Ast constants versus variables% +} +\isamarkuptrue% +% +\begin{isamarkuptext}% +Depending on the situation --- input syntax, output syntax, + translation patterns --- the distinction of atomic asts as \verb|Ast.Constant| versus \verb|Ast.Variable| serves slightly different + purposes. + + Input syntax of a term such as \isa{{\isaliteral{22}{\isachardoublequote}}f\ a\ b\ {\isaliteral{3D}{\isacharequal}}\ c{\isaliteral{22}{\isachardoublequote}}} does not yet + indicate the scopes of atomic entities \isa{{\isaliteral{22}{\isachardoublequote}}f{\isaliteral{2C}{\isacharcomma}}\ a{\isaliteral{2C}{\isacharcomma}}\ b{\isaliteral{2C}{\isacharcomma}}\ c{\isaliteral{22}{\isachardoublequote}}}: they + could be global constants or local variables, even bound ones + depending on the context of the term. \verb|Ast.Variable| leaves + this choice still open: later syntax layers (or translation + functions) may capture such a variable to determine its role + specifically, to make it a constant, bound variable, free variable + etc. In contrast, syntax translations that introduce already known + constants would rather do it via \verb|Ast.Constant| to prevent + accidental re-interpretation later on. + + Output syntax turns term constants into \verb|Ast.Constant| and + variables (free or schematic) into \verb|Ast.Variable|. This + information is precise when printing fully formal \isa{{\isaliteral{22}{\isachardoublequote}}{\isaliteral{5C3C6C616D6264613E}{\isasymlambda}}{\isaliteral{22}{\isachardoublequote}}}-terms. + + In AST translation patterns (\secref{sec:syn-trans}) the system + guesses from the current theory context which atoms should be + treated as constant versus variable for the matching process. + Sometimes this needs to be indicated more explicitly using \isa{{\isaliteral{22}{\isachardoublequote}}CONST\ c{\isaliteral{22}{\isachardoublequote}}} inside the term language. It is also possible to use + \hyperlink{command.syntax}{\mbox{\isa{\isacommand{syntax}}}} declarations (without mixfix annotation) to + enforce that certain unqualified names are always treated as + constant within the syntax machinery. + + \medskip For ASTs that represent the language of types or sorts, the + situation is much simpler, since the concrete syntax already + distinguishes type variables from type constants (constructors). So + \isa{{\isaliteral{22}{\isachardoublequote}}{\isaliteral{28}{\isacharparenleft}}{\isaliteral{27}{\isacharprime}}a{\isaliteral{2C}{\isacharcomma}}\ {\isaliteral{27}{\isacharprime}}b{\isaliteral{29}{\isacharparenright}}\ foo{\isaliteral{22}{\isachardoublequote}}} corresponds to an AST application of some + constant for \isa{foo} and variable arguments for \isa{{\isaliteral{22}{\isachardoublequote}}{\isaliteral{27}{\isacharprime}}a{\isaliteral{22}{\isachardoublequote}}} and + \isa{{\isaliteral{22}{\isachardoublequote}}{\isaliteral{27}{\isacharprime}}b{\isaliteral{22}{\isachardoublequote}}}. Note that the postfix application is merely a feature + of the concrete syntax, while in the AST the constructor occurs in + head position.% +\end{isamarkuptext}% +\isamarkuptrue% +% +\isamarkupsubsubsection{Authentic syntax names% +} +\isamarkuptrue% +% +\begin{isamarkuptext}% +Naming constant entities within ASTs is another delicate + issue. Unqualified names are looked up in the name space tables in + the last stage of parsing, after all translations have been applied. + Since syntax transformations do not know about this later name + resolution yet, there can be surprises in boundary cases. + + \emph{Authentic syntax names} for \verb|Ast.Constant| avoid this + problem: the fully-qualified constant name with a special prefix for + its formal category (\isa{{\isaliteral{22}{\isachardoublequote}}class{\isaliteral{22}{\isachardoublequote}}}, \isa{{\isaliteral{22}{\isachardoublequote}}type{\isaliteral{22}{\isachardoublequote}}}, \isa{{\isaliteral{22}{\isachardoublequote}}const{\isaliteral{22}{\isachardoublequote}}}, \isa{{\isaliteral{22}{\isachardoublequote}}fixed{\isaliteral{22}{\isachardoublequote}}}) represents the information faithfully + within the untyped AST format. Accidental overlap with free or + bound variables is excluded as well. Authentic syntax names work + implicitly in the following situations: + + \begin{itemize} + + \item Input of term constants (or fixed variables) that are + introduced by concrete syntax via \hyperlink{command.notation}{\mbox{\isa{\isacommand{notation}}}}: the + correspondence of a particular grammar production to some known term + entity is preserved. + + \item Input type constants (constructors) and type classes --- + thanks to explicit syntactic distinction independently on the + context. + + \item Output of term constants, type constants, type classes --- + this information is already available from the internal term to be + printed. + + \end{itemize} + + In other words, syntax transformations that operate on input terms + written as prefix applications are difficult to achieve. Luckily, + this case rarely occurs in practice, because syntax forms to be + translated usually correspond to some bits of concrete notation.% \end{isamarkuptext}% \isamarkuptrue% % diff -r 1c4500446ba4 -r 428e55887f24 doc-src/Ref/syntax.tex --- a/doc-src/Ref/syntax.tex Mon Jun 18 16:30:20 2012 +0200 +++ b/doc-src/Ref/syntax.tex Mon Jun 18 21:17:34 2012 +0200 @@ -8,50 +8,6 @@ \newcommand\Appl[1]{\ttfct{Appl}\,[#1]} \index{syntax!transformations|(} -\section{Abstract syntax trees} \label{sec:asts} -\index{ASTs|(} - -Abstract syntax trees are an intermediate form between the raw parse trees -and the typed $\lambda$-terms. An \AST{} is either an atom (constant or -variable) or a list of {\em at least two\/} subtrees. Internally, they -have type \mltydx{Syntax.ast}: \index{*Constant} \index{*Variable} -\index{*Appl} -\begin{ttbox} -datatype ast = Constant of string - | Variable of string - | Appl of ast list -\end{ttbox} -% -Isabelle uses an S-expression syntax for abstract syntax trees. Constant -atoms are shown as quoted strings, variable atoms as non-quoted strings and -applications as a parenthesised list of subtrees. For example, the \AST -\begin{ttbox} -Appl [Constant "_constrain", - Appl [Constant "_abs", Variable "x", Variable "t"], - Appl [Constant "fun", Variable "'a", Variable "'b"]] -\end{ttbox} -is shown as {\tt ("_constrain" ("_abs" x t) ("fun" 'a 'b))}. -Both {\tt ()} and {\tt (f)} are illegal because they have too few -subtrees. - -The resemblance to Lisp's S-expressions is intentional, but there are two -kinds of atomic symbols: $\Constant x$ and $\Variable x$. Do not take the -names {\tt Constant} and {\tt Variable} too literally; in the later -translation to terms, $\Variable x$ may become a constant, free or bound -variable, even a type constructor or class name; the actual outcome depends -on the context. - -Similarly, you can think of ${\tt (} f~x@1~\ldots~x@n{\tt )}$ as the -application of~$f$ to the arguments $x@1, \ldots, x@n$. But the kind of -application is determined later by context; it could be a type constructor -applied to types. - -Forms like {\tt (("_abs" x $t$) $u$)} are legal, but \AST{}s are -first-order: the {\tt "_abs"} does not bind the {\tt x} in any way. Later -at the term level, {\tt ("_abs" x $t$)} will become an {\tt Abs} node and -occurrences of {\tt x} in $t$ will be replaced by bound variables (the term -constructor \ttindex{Bound}). - \section{Transforming parse trees to ASTs}\label{sec:astofpt} \index{ASTs!made from parse trees}