# HG changeset patch # User wenzelm # Date 1340131389 -7200 # Node ID d868e4f7905b26a3011d12b5a11b36b2d2024d31 # Parent 428e55887f243997bacc2eae64e78a5864ff2f44 more on syntax translations; diff -r 428e55887f24 -r d868e4f7905b doc-src/IsarRef/Thy/Inner_Syntax.thy --- a/doc-src/IsarRef/Thy/Inner_Syntax.thy Mon Jun 18 21:17:34 2012 +0200 +++ b/doc-src/IsarRef/Thy/Inner_Syntax.thy Tue Jun 19 20:43:09 2012 +0200 @@ -1060,7 +1060,7 @@ Isabelle/Pure. *} -subsection {* Abstract syntax trees *} +subsection {* Abstract syntax trees \label{sec:ast} *} text {* The ML datatype @{ML_type Ast.ast} explicitly represents the intermediate AST format that is used for syntax rewriting @@ -1099,7 +1099,7 @@ *} -subsubsection {* Ast constants versus variables *} +subsubsection {* AST constants versus variables *} text {* Depending on the situation --- input syntax, output syntax, translation patterns --- the distinction of atomic asts as @{ML @@ -1192,12 +1192,12 @@ Unlike mixfix notation for existing formal entities (\secref{sec:notation}), raw syntax declarations provide full access - to the priority grammar of the inner syntax. This includes - additional syntactic categories (via @{command nonterminal}) and - free-form grammar productions (via @{command syntax}). Additional - syntax translations (or macros, via @{command translations}) are - required to turn resulting parse trees into proper representations - of formal entities again. + to the priority grammar of the inner syntax, without any sanity + checks. This includes additional syntactic categories (via + @{command nonterminal}) and free-form grammar productions (via + @{command syntax}). Additional syntax translations (or macros, via + @{command translations}) are required to turn resulting parse trees + into proper representations of formal entities again. @{rail " @@{command nonterminal} (@{syntax name} + @'and') @@ -1274,10 +1274,51 @@ are interpreted in the same manner as for @{command "syntax"} above. \item @{command "translations"}~@{text rules} specifies syntactic - translation rules (i.e.\ macros): parse~/ print rules (@{text "\"}), - parse rules (@{text "\"}), or print rules (@{text "\"}). + translation rules (i.e.\ macros) as first-order rewrite rules on + ASTs (see also \secref{sec:ast}). The theory context maintains two + independent lists translation rules: parse rules (@{verbatim "=>"} + or @{text "\"}) and print rules (@{verbatim "<="} or @{text "\"}). + For convenience, both can be specified simultaneously as parse~/ + print rules (@{verbatim "=="} or @{text "\"}). + Translation patterns may be prefixed by the syntactic category to be - used for parsing; the default is @{text logic}. + used for parsing; the default is @{text logic} which means that + regular term syntax is used. Both sides of the syntax translation + rule undergo parsing and parse AST translations + \secref{sec:tr-funs}, in order to perform some fundamental + normalization like @{text "\x y. b \ \x. \y. b"}, but other AST + translation rules are \emph{not} applied recursively here. + + When processing AST patterns, the inner syntax lexer runs in a + different mode that allows identifiers to start with underscore. + This accommodates the usual naming convention for auxiliary syntax + constants --- those that do not have a logical counter part --- by + allowing to specify arbitrary AST applications within the term + syntax, independently of the corresponding concrete syntax. + + Atomic ASTs are distinguished as @{ML Ast.Constant} versus @{ML + Ast.Variable} as follows: a qualified name or syntax constant + declared via @{command syntax}, or parse tree head of concrete + notation becomes @{ML Ast.Constant}, anything else @{ML + Ast.Variable}. Note that @{text CONST} and @{text XCONST} within + the term language (\secref{sec:pure-grammar}) allow to enforce + treatment as constants. + + AST rewrite rules @{text "(lhs, rhs)"} need to obey the following + side-conditions: + + \begin{itemize} + + \item Rules must be left linear: @{text "lhs"} must not contain + repeated variables.\footnote{The deeper reason for this is that AST + equality is not well-defined: different occurrences of the ``same'' + AST could be decorated differently by accidental type-constraints or + source position information, for example.} + + \item Every variable in @{text "rhs"} must also occur in @{text + "lhs"}. + + \end{itemize} \item @{command "no_translations"}~@{text rules} removes syntactic translation rules, which are interpreted in the same manner as for diff -r 428e55887f24 -r d868e4f7905b doc-src/IsarRef/Thy/document/Inner_Syntax.tex --- a/doc-src/IsarRef/Thy/document/Inner_Syntax.tex Mon Jun 18 21:17:34 2012 +0200 +++ b/doc-src/IsarRef/Thy/document/Inner_Syntax.tex Tue Jun 19 20:43:09 2012 +0200 @@ -1239,7 +1239,7 @@ \end{isamarkuptext}% \isamarkuptrue% % -\isamarkupsubsection{Abstract syntax trees% +\isamarkupsubsection{Abstract syntax trees \label{sec:ast}% } \isamarkuptrue% % @@ -1283,7 +1283,7 @@ \end{isamarkuptext}% \isamarkuptrue% % -\isamarkupsubsubsection{Ast constants versus variables% +\isamarkupsubsubsection{AST constants versus variables% } \isamarkuptrue% % @@ -1383,12 +1383,12 @@ Unlike mixfix notation for existing formal entities (\secref{sec:notation}), raw syntax declarations provide full access - to the priority grammar of the inner syntax. This includes - additional syntactic categories (via \hyperlink{command.nonterminal}{\mbox{\isa{\isacommand{nonterminal}}}}) and - free-form grammar productions (via \hyperlink{command.syntax}{\mbox{\isa{\isacommand{syntax}}}}). Additional - syntax translations (or macros, via \hyperlink{command.translations}{\mbox{\isa{\isacommand{translations}}}}) are - required to turn resulting parse trees into proper representations - of formal entities again. + to the priority grammar of the inner syntax, without any sanity + checks. This includes additional syntactic categories (via + \hyperlink{command.nonterminal}{\mbox{\isa{\isacommand{nonterminal}}}}) and free-form grammar productions (via + \hyperlink{command.syntax}{\mbox{\isa{\isacommand{syntax}}}}). Additional syntax translations (or macros, via + \hyperlink{command.translations}{\mbox{\isa{\isacommand{translations}}}}) are required to turn resulting parse trees + into proper representations of formal entities again. \begin{railoutput} \rail@begin{2}{} @@ -1525,10 +1525,48 @@ are interpreted in the same manner as for \hyperlink{command.syntax}{\mbox{\isa{\isacommand{syntax}}}} above. \item \hyperlink{command.translations}{\mbox{\isa{\isacommand{translations}}}}~\isa{rules} specifies syntactic - translation rules (i.e.\ macros): parse~/ print rules (\isa{{\isaliteral{22}{\isachardoublequote}}{\isaliteral{5C3C72696768746C656674686172706F6F6E733E}{\isasymrightleftharpoons}}{\isaliteral{22}{\isachardoublequote}}}), - parse rules (\isa{{\isaliteral{22}{\isachardoublequote}}{\isaliteral{5C3C7269676874686172706F6F6E75703E}{\isasymrightharpoonup}}{\isaliteral{22}{\isachardoublequote}}}), or print rules (\isa{{\isaliteral{22}{\isachardoublequote}}{\isaliteral{5C3C6C656674686172706F6F6E646F776E3E}{\isasymleftharpoondown}}{\isaliteral{22}{\isachardoublequote}}}). + translation rules (i.e.\ macros) as first-order rewrite rules on + ASTs (see also \secref{sec:ast}). The theory context maintains two + independent lists translation rules: parse rules (\verb|=>| + or \isa{{\isaliteral{22}{\isachardoublequote}}{\isaliteral{5C3C7269676874686172706F6F6E75703E}{\isasymrightharpoonup}}{\isaliteral{22}{\isachardoublequote}}}) and print rules (\verb|<=| or \isa{{\isaliteral{22}{\isachardoublequote}}{\isaliteral{5C3C6C656674686172706F6F6E646F776E3E}{\isasymleftharpoondown}}{\isaliteral{22}{\isachardoublequote}}}). + For convenience, both can be specified simultaneously as parse~/ + print rules (\verb|==| or \isa{{\isaliteral{22}{\isachardoublequote}}{\isaliteral{5C3C72696768746C656674686172706F6F6E733E}{\isasymrightleftharpoons}}{\isaliteral{22}{\isachardoublequote}}}). + Translation patterns may be prefixed by the syntactic category to be - used for parsing; the default is \isa{logic}. + used for parsing; the default is \isa{logic} which means that + regular term syntax is used. Both sides of the syntax translation + rule undergo parsing and parse AST translations + \secref{sec:tr-funs}, in order to perform some fundamental + normalization like \isa{{\isaliteral{22}{\isachardoublequote}}{\isaliteral{5C3C6C616D6264613E}{\isasymlambda}}x\ y{\isaliteral{2E}{\isachardot}}\ b\ {\isaliteral{5C3C6C65616473746F3E}{\isasymleadsto}}\ {\isaliteral{5C3C6C616D6264613E}{\isasymlambda}}x{\isaliteral{2E}{\isachardot}}\ {\isaliteral{5C3C6C616D6264613E}{\isasymlambda}}y{\isaliteral{2E}{\isachardot}}\ b{\isaliteral{22}{\isachardoublequote}}}, but other AST + translation rules are \emph{not} applied recursively here. + + When processing AST patterns, the inner syntax lexer runs in a + different mode that allows identifiers to start with underscore. + This accommodates the usual naming convention for auxiliary syntax + constants --- those that do not have a logical counter part --- by + allowing to specify arbitrary AST applications within the term + syntax, independently of the corresponding concrete syntax. + + Atomic ASTs are distinguished as \verb|Ast.Constant| versus \verb|Ast.Variable| as follows: a qualified name or syntax constant + declared via \hyperlink{command.syntax}{\mbox{\isa{\isacommand{syntax}}}}, or parse tree head of concrete + notation becomes \verb|Ast.Constant|, anything else \verb|Ast.Variable|. Note that \isa{CONST} and \isa{XCONST} within + the term language (\secref{sec:pure-grammar}) allow to enforce + treatment as constants. + + AST rewrite rules \isa{{\isaliteral{22}{\isachardoublequote}}{\isaliteral{28}{\isacharparenleft}}lhs{\isaliteral{2C}{\isacharcomma}}\ rhs{\isaliteral{29}{\isacharparenright}}{\isaliteral{22}{\isachardoublequote}}} need to obey the following + side-conditions: + + \begin{itemize} + + \item Rules must be left linear: \isa{{\isaliteral{22}{\isachardoublequote}}lhs{\isaliteral{22}{\isachardoublequote}}} must not contain + repeated variables.\footnote{The deeper reason for this is that AST + equality is not well-defined: different occurrences of the ``same'' + AST could be decorated differently by accidental type-constraints or + source position information, for example.} + + \item Every variable in \isa{{\isaliteral{22}{\isachardoublequote}}rhs{\isaliteral{22}{\isachardoublequote}}} must also occur in \isa{{\isaliteral{22}{\isachardoublequote}}lhs{\isaliteral{22}{\isachardoublequote}}}. + + \end{itemize} \item \hyperlink{command.no-translations}{\mbox{\isa{\isacommand{no{\isaliteral{5F}{\isacharunderscore}}translations}}}}~\isa{rules} removes syntactic translation rules, which are interpreted in the same manner as for diff -r 428e55887f24 -r d868e4f7905b doc-src/Ref/syntax.tex --- a/doc-src/Ref/syntax.tex Mon Jun 18 21:17:34 2012 +0200 +++ b/doc-src/Ref/syntax.tex Tue Jun 19 20:43:09 2012 +0200 @@ -239,23 +239,6 @@ \section{Macros: syntactic rewriting} \label{sec:macros} \index{macros|(}\index{rewriting!syntactic|(} -Mixfix declarations alone can handle situations where there is a direct -connection between the concrete syntax and the underlying term. Sometimes -we require a more elaborate concrete syntax, such as quantifiers and list -notation. Isabelle's {\bf macros} and {\bf translation functions} can -perform translations such as -\begin{center}\tt - \begin{tabular}{r@{$\quad\protect\rightleftharpoons\quad$}l} - ALL x:A.P & Ball(A, \%x.P) \\ \relax - [x, y, z] & Cons(x, Cons(y, Cons(z, Nil))) - \end{tabular} -\end{center} -Translation functions (see \S\ref{sec:tr_funs}) must be coded in ML; they -are the most powerful translation mechanism but are difficult to read or -write. Macros are specified by first-order rewriting systems that operate -on abstract syntax trees. They are usually easy to read and write, and can -express all but the most obscure translations. - Figure~\ref{fig:set_trans} defines a fragment of first-order logic and set theory.\footnote{This and the following theories are complete working examples, though they specify only syntax, no axioms. The @@ -317,50 +300,8 @@ \subsection{Specifying macros} -Macros are basically rewrite rules on \AST{}s. But unlike other macro -systems found in programming languages, Isabelle's macros work in both -directions. Therefore a syntax contains two lists of rewrites: one for -parsing and one for printing. -\index{*translations section} -The {\tt translations} section specifies macros. The syntax for a macro is -\[ (root)\; string \quad - \left\{\begin{array}[c]{c} \mtt{=>} \\ \mtt{<=} \\ \mtt{==} \end{array} - \right\} \quad - (root)\; string -\] -% -This specifies a parse rule ({\tt =>}), a print rule ({\tt <=}), or both -({\tt ==}). The two strings specify the left and right-hand sides of the -macro rule. The $(root)$ specification is optional; it specifies the -nonterminal for parsing the $string$ and if omitted defaults to {\tt - logic}. \AST{} rewrite rules $(l, r)$ must obey certain conditions: -\begin{itemize} -\item Rules must be left linear: $l$ must not contain repeated variables. - -\item Every variable in~$r$ must also occur in~$l$. -\end{itemize} - -Macro rules may refer to any syntax from the parent theories. They -may also refer to anything defined before the current {\tt - translations} section --- including any mixfix declarations. - -Upon declaration, both sides of the macro rule undergo parsing and parse -\AST{} translations (see \S\ref{sec:asts}), but do not themselves undergo -macro expansion. The lexer runs in a different mode that additionally -accepts identifiers of the form $\_~letter~quasiletter^*$ (like {\tt _idt}, -{\tt _K}). Thus, a constant whose name starts with an underscore can -appear in macro rules but not in ordinary terms. - -Some atoms of the macro rule's \AST{} are designated as constants for -matching. These are all names that have been declared as classes, types or -constants (logical and syntactic). - -The result of this preprocessing is two lists of macro rules, each -stored as a pair of \AST{}s. They can be viewed using {\tt - print_syntax} (sections \ttindex{parse_rules} and -\ttindex{print_rules}). For theory~{\tt SetSyntax} of -Fig.~\ref{fig:set_trans} these are +For theory~{\tt SetSyntax} of Fig.~\ref{fig:set_trans} these are \begin{ttbox} parse_rules: ("{\at}Collect" x A P) -> ("Collect" A ("_abs" x P))