more on built-in syntax transformations, based on reduced version of old material;
authorwenzelm
Tue, 18 Jun 2013 15:15:36 +0200
changeset 52414 8429123bc58a
parent 52413 a59ba6de9687
child 52415 d9fed6e99a57
more on built-in syntax transformations, based on reduced version of old material;
src/Doc/IsarRef/Inner_Syntax.thy
src/Doc/ROOT
src/Doc/Ref/document/root.tex
src/Doc/Ref/document/syntax.tex
src/HOL/Tools/reflection.ML
--- a/src/Doc/IsarRef/Inner_Syntax.thy	Tue Jun 18 12:21:57 2013 +0200
+++ b/src/Doc/IsarRef/Inner_Syntax.thy	Tue Jun 18 15:15:36 2013 +0200
@@ -1579,4 +1579,131 @@
   are explicitly known from the given term in its fully internal form.
 *}
 
+
+subsection {* Built-in syntax transformations *}
+
+text {*
+  Here are some further details of the main syntax transformation
+  phases of \figref{fig:parse-print}.
+*}
+
+
+subsubsection {* Transforming parse trees to ASTs *}
+
+text {* The parse tree is the raw output of the parser.  It is
+  transformed into an AST according to some basic scheme that may be
+  augmented by AST translation functions as explained in
+  \secref{sec:tr-funs}.
+
+  The parse tree is constructed by nesting the right-hand sides of the
+  productions used to recognize the input.  Such parse trees are
+  simply lists of tokens and constituent parse trees, the latter
+  representing the nonterminals of the productions.  Ignoring AST
+  translation functions, parse trees are transformed to ASTs by
+  stripping out delimiters and copy productions, while retaining some
+  source position information from input tokens.
+
+  The Pure syntax provides predefined AST translations to make the
+  basic @{text "\<lambda>"}-term structure more apparent within the
+  (first-order) AST representation, and thus facilitate the use of
+  @{command translations} (see also \secref{sec:syn-trans}).  This
+  covers ordinary term application, type application, nested
+  abstraction, iterated meta implications and function types.  The
+  effect is illustrated on some representative input strings is as
+  follows:
+
+  \begin{center}
+  \begin{tabular}{ll}
+  input source & AST \\
+  \hline
+  @{text "f x y z"} & @{verbatim "(f x y z)"} \\
+  @{text "'a ty"} & @{verbatim "(ty 'a)"} \\
+  @{text "('a, 'b)ty"} & @{verbatim "(ty 'a 'b)"} \\
+  @{text "\<lambda>x y z. t"} & @{verbatim "(\"_abs\" x (\"_abs\" y (\"_abs\" z t)))"} \\
+  @{text "\<lambda>x :: 'a. t"} & @{verbatim "(\"_abs\" (\"_constrain\" x 'a) t)"} \\
+  @{text "\<lbrakk>P; Q; R\<rbrakk> \<Longrightarrow> S"} & @{verbatim "(\"==>\" P (\"==>\" Q (\"==>\" R S)))"} \\
+   @{text "['a, 'b, 'c] \<Rightarrow> 'd"} & @{verbatim "(\"fun\" 'a (\"fun\" 'b (\"fun\" 'c 'd)))"} \\
+  \end{tabular}
+  \end{center}
+
+  Note that type and sort constraints may occur in further places ---
+  translations need to be ready to cope with them.  The built-in
+  syntax transformation from parse trees to ASTs insert additional
+  constraints that represent source positions.
+*}
+
+
+subsubsection {* Transforming ASTs to terms *}
+
+text {* After application of macros (\secref{sec:syn-trans}), the AST
+  is transformed into a term.  This term still lacks proper type
+  information, but it might contain some constraints consisting of
+  applications with head @{verbatim "_constrain"}, where the second
+  argument is a type encoded as a pre-term within the syntax.  Type
+  inference later introduces correct types, or indicates type errors
+  in the input.
+
+  Ignoring parse translations, ASTs are transformed to terms by
+  mapping AST constants to term constants, AST variables to term
+  variables or constants (according to the name space), and AST
+  applications to iterated term applications.
+
+  The outcome is still a first-order term.  Proper abstractions and
+  bound variables are introduced by parse translations associated with
+  certain syntax constants.  Thus @{verbatim "(_abs x x)"} eventually
+  becomes a de-Bruijn term @{verbatim "Abs (\"x\", _, Bound 0)"}.
+*}
+
+
+subsubsection {* Printing of terms *}
+
+text {* The output phase is essentially the inverse of the input
+  phase.  Terms are translated via abstract syntax trees into
+  pretty-printed text.
+
+  Ignoring print translations, the transformation maps term constants,
+  variables and applications to the corresponding constructs on ASTs.
+  Abstractions are mapped to applications of the special constant
+  @{verbatim "_abs"} as seen before.  Type constraints are represented
+  via special @{verbatim "_constrain"} forms, according to various
+  policies of type annotation determined elsewhere.  Sort constraints
+  of type variables are handled in a similar fashion.
+
+  After application of macros (\secref{sec:syn-trans}), the AST is
+  finally pretty-printed.  The built-in print AST translations reverse
+  the corresponding parse AST translations.
+
+  \medskip For the actual printing process, the priority grammar
+  (\secref{sec:priority-grammar}) plays a vital role: productions are
+  used as templates for pretty printing, with argument slots stemming
+  from nonterminals, and syntactic sugar stemming from literal tokens.
+
+  Each AST application with constant head @{text "c"} and arguments
+  @{text "t\<^sub>1"}, \dots, @{text "t\<^sub>n"} (for @{text "n = 0"} the AST is
+  just the constant @{text "c"} itself) is printed according to the
+  first grammar production of result name @{text "c"}.  The required
+  syntax priority of the argument slot is given by its nonterminal
+  @{text "A\<^sup>(\<^sup>p\<^sup>)"}.  The argument @{text "t\<^sub>i"} that corresponds to the
+  position of @{text "A\<^sup>(\<^sup>p\<^sup>)"} is printed recursively, and then put in
+  parentheses \emph{if} its priority @{text "p"} requires this.  The
+  resulting output is concatenated with the syntactic sugar according
+  to the grammar production.
+
+  If an AST application @{text "(c x\<^sub>1 \<dots> x\<^sub>m)"} has more arguments than
+  the corresponding production, it is first split into @{text "((c x\<^sub>1
+  \<dots> x\<^sub>n) x\<^sub>n\<^sub>+\<^sub>1 \<dots> x\<^sub>m)"} and then printed recursively as above.
+
+  Applications with too few arguments or with non-constant head or
+  without a corresponding production are printed in prefix-form like
+  @{text "f t\<^sub>1 \<dots> t\<^sub>n"} for terms.
+
+  Multiple productions associated with some name @{text "c"} are tried
+  in order of appearance within the grammar.  An occurrence of some
+  AST variable @{text "x"} is printed as @{text "x"} outright.
+
+  \medskip White space is \emph{not} inserted automatically.  If
+  blanks (or breaks) are required to separate tokens, they need to be
+  specified in the mixfix declaration (\secref{sec:mixfix}).
+*}
+
 end
--- a/src/Doc/ROOT	Tue Jun 18 12:21:57 2013 +0200
+++ b/src/Doc/ROOT	Tue Jun 18 15:15:36 2013 +0200
@@ -257,7 +257,6 @@
     "../manual.bib"
     "document/build"
     "document/root.tex"
-    "document/syntax.tex"
 
 session Sledgehammer (doc) in "Sledgehammer" = Pure +
   options [document_variants = "sledgehammer"]
--- a/src/Doc/Ref/document/root.tex	Tue Jun 18 12:21:57 2013 +0200
+++ b/src/Doc/Ref/document/root.tex	Tue Jun 18 15:15:36 2013 +0200
@@ -41,8 +41,6 @@
 
 \pagenumbering{roman} \tableofcontents \clearfirst
 
-\input{syntax}
-
 %%seealso's must be last so that they appear last in the index entries
 \index{meta-rewriting|seealso{tactics, theorems}}
 
--- a/src/Doc/Ref/document/syntax.tex	Tue Jun 18 12:21:57 2013 +0200
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,240 +0,0 @@
-
-\chapter{Syntax Transformations} \label{chap:syntax}
-\newcommand\ttapp{\mathrel{\hbox{\tt\$}}}
-\newcommand\mtt[1]{\mbox{\tt #1}}
-\newcommand\ttfct[1]{\mathop{\mtt{#1}}\nolimits}
-\newcommand\Constant{\ttfct{Constant}}
-\newcommand\Variable{\ttfct{Variable}}
-\newcommand\Appl[1]{\ttfct{Appl}\,[#1]}
-\index{syntax!transformations|(}
-
-
-\section{Transforming parse trees to ASTs}\label{sec:astofpt}
-\index{ASTs!made from parse trees}
-\newcommand\astofpt[1]{\lbrakk#1\rbrakk}
-
-The parse tree is the raw output of the parser.  Translation functions,
-called {\bf parse AST translations}\indexbold{translations!parse AST},
-transform the parse tree into an abstract syntax tree.
-
-The parse tree is constructed by nesting the right-hand sides of the
-productions used to recognize the input.  Such parse trees are simply lists
-of tokens and constituent parse trees, the latter representing the
-nonterminals of the productions.  Let us refer to the actual productions in
-the form displayed by {\tt print_syntax} (see \S\ref{sec:inspct-thy} for an
-example).
-
-Ignoring parse \AST{} translations, parse trees are transformed to \AST{}s
-by stripping out delimiters and copy productions.  More precisely, the
-mapping $\astofpt{-}$ is derived from the productions as follows:
-\begin{itemize}
-\item Name tokens: $\astofpt{t} = \Variable s$, where $t$ is an \ndx{id},
-  \ndx{var}, \ndx{tid}, \ndx{tvar}, \ndx{num}, \ndx{xnum} or \ndx{xstr} token,
-  and $s$ its associated string.  Note that for {\tt xstr} this does not
-  include the quotes.
-
-\item Copy productions:\index{productions!copy}
-  $\astofpt{\ldots P \ldots} = \astofpt{P}$.  Here $\ldots$ stands for
-  strings of delimiters, which are discarded.  $P$ stands for the single
-  constituent that is not a delimiter; it is either a nonterminal symbol or
-  a name token.
-
-  \item 0-ary productions: $\astofpt{\ldots \mtt{=>} c} = \Constant c$.
-    Here there are no constituents other than delimiters, which are
-    discarded.
-
-  \item $n$-ary productions, where $n \ge 1$: delimiters are discarded and
-    the remaining constituents $P@1$, \ldots, $P@n$ are built into an
-    application whose head constant is~$c$:
-    \[ \astofpt{\ldots P@1 \ldots P@n \ldots \mtt{=>} c} =
-       \Appl{\Constant c, \astofpt{P@1}, \ldots, \astofpt{P@n}}
-    \]
-\end{itemize}
-Figure~\ref{fig:parse_ast} presents some simple examples, where {\tt ==},
-{\tt _appl}, {\tt _args}, and so forth name productions of the Pure syntax.
-These examples illustrate the need for further translations to make \AST{}s
-closer to the typed $\lambda$-calculus.  The Pure syntax provides
-predefined parse \AST{} translations\index{translations!parse AST} for
-ordinary applications, type applications, nested abstractions, meta
-implications and function types.  Figure~\ref{fig:parse_ast_tr} shows their
-effect on some representative input strings.
-
-
-\begin{figure}
-\begin{center}
-\tt\begin{tabular}{ll}
-\rm input string    & \rm \AST \\\hline
-"f"                 & f \\
-"'a"                & 'a \\
-"t == u"            & ("==" t u) \\
-"f(x)"              & ("_appl" f x) \\
-"f(x, y)"           & ("_appl" f ("_args" x y)) \\
-"f(x, y, z)"        & ("_appl" f ("_args" x ("_args" y z))) \\
-"\%x y.\ t"         & ("_lambda" ("_idts" x y) t) \\
-\end{tabular}
-\end{center}
-\caption{Parsing examples using the Pure syntax}\label{fig:parse_ast}
-\end{figure}
-
-\begin{figure}
-\begin{center}
-\tt\begin{tabular}{ll}
-\rm input string            & \rm \AST{} \\\hline
-"f(x, y, z)"                & (f x y z) \\
-"'a ty"                     & (ty 'a) \\
-"('a, 'b) ty"               & (ty 'a 'b) \\
-"\%x y z.\ t"               & ("_abs" x ("_abs" y ("_abs" z t))) \\
-"\%x ::\ 'a.\ t"            & ("_abs" ("_constrain" x 'a) t) \\
-"[| P; Q; R |] => S"        & ("==>" P ("==>" Q ("==>" R S))) \\
-"['a, 'b, 'c] => 'd"        & ("fun" 'a ("fun" 'b ("fun" 'c 'd)))
-\end{tabular}
-\end{center}
-\caption{Built-in parse \AST{} translations}\label{fig:parse_ast_tr}
-\end{figure}
-
-The names of constant heads in the \AST{} control the translation process.
-The list of constants invoking parse \AST{} translations appears in the
-output of {\tt print_syntax} under {\tt parse_ast_translation}.
-
-
-\section{Transforming ASTs to terms}\label{sec:termofast}
-\index{terms!made from ASTs}
-\newcommand\termofast[1]{\lbrakk#1\rbrakk}
-
-The \AST{}, after application of macros (see \S\ref{sec:macros}), is
-transformed into a term.  This term is probably ill-typed since type
-inference has not occurred yet.  The term may contain type constraints
-consisting of applications with head {\tt "_constrain"}; the second
-argument is a type encoded as a term.  Type inference later introduces
-correct types or rejects the input.
-
-Another set of translation functions, namely parse
-translations\index{translations!parse}, may affect this process.  If we
-ignore parse translations for the time being, then \AST{}s are transformed
-to terms by mapping \AST{} constants to constants, \AST{} variables to
-schematic or free variables and \AST{} applications to applications.
-
-More precisely, the mapping $\termofast{-}$ is defined by
-\begin{itemize}
-\item Constants: $\termofast{\Constant x} = \ttfct{Const} (x,
-  \mtt{dummyT})$.
-
-\item Schematic variables: $\termofast{\Variable \mtt{"?}xi\mtt"} =
-  \ttfct{Var} ((x, i), \mtt{dummyT})$, where $x$ is the base name and $i$
-  the index extracted from~$xi$.
-
-\item Free variables: $\termofast{\Variable x} = \ttfct{Free} (x,
-  \mtt{dummyT})$.
-
-\item Function applications with $n$ arguments:
-    \[ \termofast{\Appl{f, x@1, \ldots, x@n}} =
-       \termofast{f} \ttapp
-         \termofast{x@1} \ttapp \ldots \ttapp \termofast{x@n}
-    \]
-\end{itemize}
-Here \ttindex{Const}, \ttindex{Var}, \ttindex{Free} and
-\verb|$|\index{$@{\tt\$}} are constructors of the datatype \mltydx{term},
-while \ttindex{dummyT} stands for some dummy type that is ignored during
-type inference.
-
-So far the outcome is still a first-order term.  Abstractions and bound
-variables (constructors \ttindex{Abs} and \ttindex{Bound}) are introduced
-by parse translations.  Such translations are attached to {\tt "_abs"},
-{\tt "!!"} and user-defined binders.
-
-
-\section{Printing of terms}
-\newcommand\astofterm[1]{\lbrakk#1\rbrakk}\index{ASTs!made from terms}
-
-The output phase is essentially the inverse of the input phase.  Terms are
-translated via abstract syntax trees into strings.  Finally the strings are
-pretty printed.
-
-Print translations (\S\ref{sec:tr_funs}) may affect the transformation of
-terms into \AST{}s.  Ignoring those, the transformation maps
-term constants, variables and applications to the corresponding constructs
-on \AST{}s.  Abstractions are mapped to applications of the special
-constant {\tt _abs}.
-
-More precisely, the mapping $\astofterm{-}$ is defined as follows:
-\begin{itemize}
-  \item $\astofterm{\ttfct{Const} (x, \tau)} = \Constant x$.
-
-  \item $\astofterm{\ttfct{Free} (x, \tau)} = constrain (\Variable x,
-    \tau)$.
-
-  \item $\astofterm{\ttfct{Var} ((x, i), \tau)} = constrain (\Variable
-    \mtt{"?}xi\mtt", \tau)$, where $\mtt?xi$ is the string representation of
-    the {\tt indexname} $(x, i)$.
-
-  \item For the abstraction $\lambda x::\tau.t$, let $x'$ be a variant
-    of~$x$ renamed to differ from all names occurring in~$t$, and let $t'$
-    be obtained from~$t$ by replacing all bound occurrences of~$x$ by
-    the free variable $x'$.  This replaces corresponding occurrences of the
-    constructor \ttindex{Bound} by the term $\ttfct{Free} (x',
-    \mtt{dummyT})$:
-   \[ \astofterm{\ttfct{Abs} (x, \tau, t)} =
-      \Appl{\Constant \mtt{"_abs"},
-        constrain(\Variable x', \tau), \astofterm{t'}}
-    \]
-
-  \item $\astofterm{\ttfct{Bound} i} = \Variable \mtt{"B.}i\mtt"$.
-    The occurrence of constructor \ttindex{Bound} should never happen
-    when printing well-typed terms; it indicates a de Bruijn index with no
-    matching abstraction.
-
-  \item Where $f$ is not an application,
-    \[ \astofterm{f \ttapp x@1 \ttapp \ldots \ttapp x@n} =
-       \Appl{\astofterm{f}, \astofterm{x@1}, \ldots,\astofterm{x@n}}
-    \]
-\end{itemize}
-%
-Type constraints\index{type constraints} are inserted to allow the printing
-of types.  This is governed by the boolean variable \ttindex{show_types}:
-\begin{itemize}
-  \item $constrain(x, \tau) = x$ \ if $\tau = \mtt{dummyT}$ \index{*dummyT} or
-    \ttindex{show_types} is set to {\tt false}.
-
-  \item $constrain(x, \tau) = \Appl{\Constant \mtt{"_constrain"}, x,
-         \astofterm{\tau}}$ \ otherwise.
-
-    Here, $\astofterm{\tau}$ is the \AST{} encoding of $\tau$: type
-    constructors go to {\tt Constant}s; type identifiers go to {\tt
-      Variable}s; type applications go to {\tt Appl}s with the type
-    constructor as the first element.  If \ttindex{show_sorts} is set to
-    {\tt true}, some type variables are decorated with an \AST{} encoding
-    of their sort.
-\end{itemize}
-%
-The \AST{}, after application of macros (see \S\ref{sec:macros}), is
-transformed into the final output string.  The built-in {\bf print AST
-  translations}\indexbold{translations!print AST} reverse the
-parse \AST{} translations of Fig.\ts\ref{fig:parse_ast_tr}.
-
-For the actual printing process, the names attached to productions
-of the form $\ldots A^{(p@1)}@1 \ldots A^{(p@n)}@n \ldots \mtt{=>} c$ play
-a vital role.  Each \AST{} with constant head $c$, namely $\mtt"c\mtt"$ or
-$(\mtt"c\mtt"~ x@1 \ldots x@n)$, is printed according to the production
-for~$c$.  Each argument~$x@i$ is converted to a string, and put in
-parentheses if its priority~$(p@i)$ requires this.  The resulting strings
-and their syntactic sugar (denoted by \dots{} above) are joined to make a
-single string.
-
-If an application $(\mtt"c\mtt"~ x@1 \ldots x@m)$ has more arguments
-than the corresponding production, it is first split into
-$((\mtt"c\mtt"~ x@1 \ldots x@n) ~ x@{n+1} \ldots x@m)$.  Applications
-with too few arguments or with non-constant head or without a
-corresponding production are printed as $f(x@1, \ldots, x@l)$ or
-$(\alpha@1, \ldots, \alpha@l) ty$.  Multiple productions associated
-with some name $c$ are tried in order of appearance.  An occurrence of
-$\Variable x$ is simply printed as~$x$.
-
-Blanks are {\em not\/} inserted automatically.  If blanks are required to
-separate tokens, specify them in the mixfix declaration, possibly preceded
-by a slash~({\tt/}) to allow a line break.
-\index{ASTs|)}
-
-%%% Local Variables: 
-%%% mode: latex
-%%% TeX-master: "ref"
-%%% End: