more on syntax translations;
authorwenzelm
Tue, 19 Jun 2012 20:43:09 +0200
changeset 48115 d868e4f7905b
parent 48114 428e55887f24
child 48116 fd773574cb81
more on syntax translations;
doc-src/IsarRef/Thy/Inner_Syntax.thy
doc-src/IsarRef/Thy/document/Inner_Syntax.tex
doc-src/Ref/syntax.tex
--- a/doc-src/IsarRef/Thy/Inner_Syntax.thy	Mon Jun 18 21:17:34 2012 +0200
+++ b/doc-src/IsarRef/Thy/Inner_Syntax.thy	Tue Jun 19 20:43:09 2012 +0200
@@ -1060,7 +1060,7 @@
   Isabelle/Pure. *}
 
 
-subsection {* Abstract syntax trees *}
+subsection {* Abstract syntax trees \label{sec:ast} *}
 
 text {* The ML datatype @{ML_type Ast.ast} explicitly represents the
   intermediate AST format that is used for syntax rewriting
@@ -1099,7 +1099,7 @@
 *}
 
 
-subsubsection {* Ast constants versus variables *}
+subsubsection {* AST constants versus variables *}
 
 text {* Depending on the situation --- input syntax, output syntax,
   translation patterns --- the distinction of atomic asts as @{ML
@@ -1192,12 +1192,12 @@
 
   Unlike mixfix notation for existing formal entities
   (\secref{sec:notation}), raw syntax declarations provide full access
-  to the priority grammar of the inner syntax.  This includes
-  additional syntactic categories (via @{command nonterminal}) and
-  free-form grammar productions (via @{command syntax}).  Additional
-  syntax translations (or macros, via @{command translations}) are
-  required to turn resulting parse trees into proper representations
-  of formal entities again.
+  to the priority grammar of the inner syntax, without any sanity
+  checks.  This includes additional syntactic categories (via
+  @{command nonterminal}) and free-form grammar productions (via
+  @{command syntax}).  Additional syntax translations (or macros, via
+  @{command translations}) are required to turn resulting parse trees
+  into proper representations of formal entities again.
 
   @{rail "
     @@{command nonterminal} (@{syntax name} + @'and')
@@ -1274,10 +1274,51 @@
   are interpreted in the same manner as for @{command "syntax"} above.
 
   \item @{command "translations"}~@{text rules} specifies syntactic
-  translation rules (i.e.\ macros): parse~/ print rules (@{text "\<rightleftharpoons>"}),
-  parse rules (@{text "\<rightharpoonup>"}), or print rules (@{text "\<leftharpoondown>"}).
+  translation rules (i.e.\ macros) as first-order rewrite rules on
+  ASTs (see also \secref{sec:ast}).  The theory context maintains two
+  independent lists translation rules: parse rules (@{verbatim "=>"}
+  or @{text "\<rightharpoonup>"}) and print rules (@{verbatim "<="} or @{text "\<leftharpoondown>"}).
+  For convenience, both can be specified simultaneously as parse~/
+  print rules (@{verbatim "=="} or @{text "\<rightleftharpoons>"}).
+
   Translation patterns may be prefixed by the syntactic category to be
-  used for parsing; the default is @{text logic}.
+  used for parsing; the default is @{text logic} which means that
+  regular term syntax is used.  Both sides of the syntax translation
+  rule undergo parsing and parse AST translations
+  \secref{sec:tr-funs}, in order to perform some fundamental
+  normalization like @{text "\<lambda>x y. b \<leadsto> \<lambda>x. \<lambda>y. b"}, but other AST
+  translation rules are \emph{not} applied recursively here.
+
+  When processing AST patterns, the inner syntax lexer runs in a
+  different mode that allows identifiers to start with underscore.
+  This accommodates the usual naming convention for auxiliary syntax
+  constants --- those that do not have a logical counter part --- by
+  allowing to specify arbitrary AST applications within the term
+  syntax, independently of the corresponding concrete syntax.
+
+  Atomic ASTs are distinguished as @{ML Ast.Constant} versus @{ML
+  Ast.Variable} as follows: a qualified name or syntax constant
+  declared via @{command syntax}, or parse tree head of concrete
+  notation becomes @{ML Ast.Constant}, anything else @{ML
+  Ast.Variable}.  Note that @{text CONST} and @{text XCONST} within
+  the term language (\secref{sec:pure-grammar}) allow to enforce
+  treatment as constants.
+
+  AST rewrite rules @{text "(lhs, rhs)"} need to obey the following
+  side-conditions:
+
+  \begin{itemize}
+
+  \item Rules must be left linear: @{text "lhs"} must not contain
+  repeated variables.\footnote{The deeper reason for this is that AST
+  equality is not well-defined: different occurrences of the ``same''
+  AST could be decorated differently by accidental type-constraints or
+  source position information, for example.}
+
+  \item Every variable in @{text "rhs"} must also occur in @{text
+  "lhs"}.
+
+  \end{itemize}
 
   \item @{command "no_translations"}~@{text rules} removes syntactic
   translation rules, which are interpreted in the same manner as for
--- a/doc-src/IsarRef/Thy/document/Inner_Syntax.tex	Mon Jun 18 21:17:34 2012 +0200
+++ b/doc-src/IsarRef/Thy/document/Inner_Syntax.tex	Tue Jun 19 20:43:09 2012 +0200
@@ -1239,7 +1239,7 @@
 \end{isamarkuptext}%
 \isamarkuptrue%
 %
-\isamarkupsubsection{Abstract syntax trees%
+\isamarkupsubsection{Abstract syntax trees \label{sec:ast}%
 }
 \isamarkuptrue%
 %
@@ -1283,7 +1283,7 @@
 \end{isamarkuptext}%
 \isamarkuptrue%
 %
-\isamarkupsubsubsection{Ast constants versus variables%
+\isamarkupsubsubsection{AST constants versus variables%
 }
 \isamarkuptrue%
 %
@@ -1383,12 +1383,12 @@
 
   Unlike mixfix notation for existing formal entities
   (\secref{sec:notation}), raw syntax declarations provide full access
-  to the priority grammar of the inner syntax.  This includes
-  additional syntactic categories (via \hyperlink{command.nonterminal}{\mbox{\isa{\isacommand{nonterminal}}}}) and
-  free-form grammar productions (via \hyperlink{command.syntax}{\mbox{\isa{\isacommand{syntax}}}}).  Additional
-  syntax translations (or macros, via \hyperlink{command.translations}{\mbox{\isa{\isacommand{translations}}}}) are
-  required to turn resulting parse trees into proper representations
-  of formal entities again.
+  to the priority grammar of the inner syntax, without any sanity
+  checks.  This includes additional syntactic categories (via
+  \hyperlink{command.nonterminal}{\mbox{\isa{\isacommand{nonterminal}}}}) and free-form grammar productions (via
+  \hyperlink{command.syntax}{\mbox{\isa{\isacommand{syntax}}}}).  Additional syntax translations (or macros, via
+  \hyperlink{command.translations}{\mbox{\isa{\isacommand{translations}}}}) are required to turn resulting parse trees
+  into proper representations of formal entities again.
 
   \begin{railoutput}
 \rail@begin{2}{}
@@ -1525,10 +1525,48 @@
   are interpreted in the same manner as for \hyperlink{command.syntax}{\mbox{\isa{\isacommand{syntax}}}} above.
 
   \item \hyperlink{command.translations}{\mbox{\isa{\isacommand{translations}}}}~\isa{rules} specifies syntactic
-  translation rules (i.e.\ macros): parse~/ print rules (\isa{{\isaliteral{22}{\isachardoublequote}}{\isaliteral{5C3C72696768746C656674686172706F6F6E733E}{\isasymrightleftharpoons}}{\isaliteral{22}{\isachardoublequote}}}),
-  parse rules (\isa{{\isaliteral{22}{\isachardoublequote}}{\isaliteral{5C3C7269676874686172706F6F6E75703E}{\isasymrightharpoonup}}{\isaliteral{22}{\isachardoublequote}}}), or print rules (\isa{{\isaliteral{22}{\isachardoublequote}}{\isaliteral{5C3C6C656674686172706F6F6E646F776E3E}{\isasymleftharpoondown}}{\isaliteral{22}{\isachardoublequote}}}).
+  translation rules (i.e.\ macros) as first-order rewrite rules on
+  ASTs (see also \secref{sec:ast}).  The theory context maintains two
+  independent lists translation rules: parse rules (\verb|=>|
+  or \isa{{\isaliteral{22}{\isachardoublequote}}{\isaliteral{5C3C7269676874686172706F6F6E75703E}{\isasymrightharpoonup}}{\isaliteral{22}{\isachardoublequote}}}) and print rules (\verb|<=| or \isa{{\isaliteral{22}{\isachardoublequote}}{\isaliteral{5C3C6C656674686172706F6F6E646F776E3E}{\isasymleftharpoondown}}{\isaliteral{22}{\isachardoublequote}}}).
+  For convenience, both can be specified simultaneously as parse~/
+  print rules (\verb|==| or \isa{{\isaliteral{22}{\isachardoublequote}}{\isaliteral{5C3C72696768746C656674686172706F6F6E733E}{\isasymrightleftharpoons}}{\isaliteral{22}{\isachardoublequote}}}).
+
   Translation patterns may be prefixed by the syntactic category to be
-  used for parsing; the default is \isa{logic}.
+  used for parsing; the default is \isa{logic} which means that
+  regular term syntax is used.  Both sides of the syntax translation
+  rule undergo parsing and parse AST translations
+  \secref{sec:tr-funs}, in order to perform some fundamental
+  normalization like \isa{{\isaliteral{22}{\isachardoublequote}}{\isaliteral{5C3C6C616D6264613E}{\isasymlambda}}x\ y{\isaliteral{2E}{\isachardot}}\ b\ {\isaliteral{5C3C6C65616473746F3E}{\isasymleadsto}}\ {\isaliteral{5C3C6C616D6264613E}{\isasymlambda}}x{\isaliteral{2E}{\isachardot}}\ {\isaliteral{5C3C6C616D6264613E}{\isasymlambda}}y{\isaliteral{2E}{\isachardot}}\ b{\isaliteral{22}{\isachardoublequote}}}, but other AST
+  translation rules are \emph{not} applied recursively here.
+
+  When processing AST patterns, the inner syntax lexer runs in a
+  different mode that allows identifiers to start with underscore.
+  This accommodates the usual naming convention for auxiliary syntax
+  constants --- those that do not have a logical counter part --- by
+  allowing to specify arbitrary AST applications within the term
+  syntax, independently of the corresponding concrete syntax.
+
+  Atomic ASTs are distinguished as \verb|Ast.Constant| versus \verb|Ast.Variable| as follows: a qualified name or syntax constant
+  declared via \hyperlink{command.syntax}{\mbox{\isa{\isacommand{syntax}}}}, or parse tree head of concrete
+  notation becomes \verb|Ast.Constant|, anything else \verb|Ast.Variable|.  Note that \isa{CONST} and \isa{XCONST} within
+  the term language (\secref{sec:pure-grammar}) allow to enforce
+  treatment as constants.
+
+  AST rewrite rules \isa{{\isaliteral{22}{\isachardoublequote}}{\isaliteral{28}{\isacharparenleft}}lhs{\isaliteral{2C}{\isacharcomma}}\ rhs{\isaliteral{29}{\isacharparenright}}{\isaliteral{22}{\isachardoublequote}}} need to obey the following
+  side-conditions:
+
+  \begin{itemize}
+
+  \item Rules must be left linear: \isa{{\isaliteral{22}{\isachardoublequote}}lhs{\isaliteral{22}{\isachardoublequote}}} must not contain
+  repeated variables.\footnote{The deeper reason for this is that AST
+  equality is not well-defined: different occurrences of the ``same''
+  AST could be decorated differently by accidental type-constraints or
+  source position information, for example.}
+
+  \item Every variable in \isa{{\isaliteral{22}{\isachardoublequote}}rhs{\isaliteral{22}{\isachardoublequote}}} must also occur in \isa{{\isaliteral{22}{\isachardoublequote}}lhs{\isaliteral{22}{\isachardoublequote}}}.
+
+  \end{itemize}
 
   \item \hyperlink{command.no-translations}{\mbox{\isa{\isacommand{no{\isaliteral{5F}{\isacharunderscore}}translations}}}}~\isa{rules} removes syntactic
   translation rules, which are interpreted in the same manner as for
--- a/doc-src/Ref/syntax.tex	Mon Jun 18 21:17:34 2012 +0200
+++ b/doc-src/Ref/syntax.tex	Tue Jun 19 20:43:09 2012 +0200
@@ -239,23 +239,6 @@
 \section{Macros: syntactic rewriting} \label{sec:macros}
 \index{macros|(}\index{rewriting!syntactic|(}
 
-Mixfix declarations alone can handle situations where there is a direct
-connection between the concrete syntax and the underlying term.  Sometimes
-we require a more elaborate concrete syntax, such as quantifiers and list
-notation.  Isabelle's {\bf macros} and {\bf translation functions} can
-perform translations such as
-\begin{center}\tt
-  \begin{tabular}{r@{$\quad\protect\rightleftharpoons\quad$}l}
-    ALL x:A.P   & Ball(A, \%x.P)        \\ \relax
-    [x, y, z]   & Cons(x, Cons(y, Cons(z, Nil)))
-  \end{tabular}
-\end{center}
-Translation functions (see \S\ref{sec:tr_funs}) must be coded in ML; they
-are the most powerful translation mechanism but are difficult to read or
-write.  Macros are specified by first-order rewriting systems that operate
-on abstract syntax trees.  They are usually easy to read and write, and can
-express all but the most obscure translations.
-
 Figure~\ref{fig:set_trans} defines a fragment of first-order logic and
 set theory.\footnote{This and the following theories are complete
   working examples, though they specify only syntax, no axioms.  The
@@ -317,50 +300,8 @@
 
 
 \subsection{Specifying macros}
-Macros are basically rewrite rules on \AST{}s.  But unlike other macro
-systems found in programming languages, Isabelle's macros work in both
-directions.  Therefore a syntax contains two lists of rewrites: one for
-parsing and one for printing.
 
-\index{*translations section}
-The {\tt translations} section specifies macros.  The syntax for a macro is
-\[ (root)\; string \quad
-   \left\{\begin{array}[c]{c} \mtt{=>} \\ \mtt{<=} \\ \mtt{==} \end{array}
-   \right\} \quad
-   (root)\; string
-\]
-%
-This specifies a parse rule ({\tt =>}), a print rule ({\tt <=}), or both
-({\tt ==}).  The two strings specify the left and right-hand sides of the
-macro rule.  The $(root)$ specification is optional; it specifies the
-nonterminal for parsing the $string$ and if omitted defaults to {\tt
-  logic}.  \AST{} rewrite rules $(l, r)$ must obey certain conditions:
-\begin{itemize}
-\item Rules must be left linear: $l$ must not contain repeated variables.
-
-\item Every variable in~$r$ must also occur in~$l$.
-\end{itemize}
-
-Macro rules may refer to any syntax from the parent theories.  They
-may also refer to anything defined before the current {\tt
-  translations} section --- including any mixfix declarations.
-
-Upon declaration, both sides of the macro rule undergo parsing and parse
-\AST{} translations (see \S\ref{sec:asts}), but do not themselves undergo
-macro expansion.  The lexer runs in a different mode that additionally
-accepts identifiers of the form $\_~letter~quasiletter^*$ (like {\tt _idt},
-{\tt _K}).  Thus, a constant whose name starts with an underscore can
-appear in macro rules but not in ordinary terms.
-
-Some atoms of the macro rule's \AST{} are designated as constants for
-matching.  These are all names that have been declared as classes, types or
-constants (logical and syntactic).
-
-The result of this preprocessing is two lists of macro rules, each
-stored as a pair of \AST{}s.  They can be viewed using {\tt
-  print_syntax} (sections \ttindex{parse_rules} and
-\ttindex{print_rules}).  For theory~{\tt SetSyntax} of
-Fig.~\ref{fig:set_trans} these are
+For theory~{\tt SetSyntax} of Fig.~\ref{fig:set_trans} these are
 \begin{ttbox}
 parse_rules:
   ("{\at}Collect" x A P)  ->  ("Collect" A ("_abs" x P))