# HG changeset patch # User wenzelm # Date 1371561336 -7200 # Node ID 8429123bc58ab3b3bb671f2b9ad7fb8af93d1721 # Parent a59ba6de968751ae19d7a4defb626fd4d9d914b2 more on built-in syntax transformations, based on reduced version of old material; diff -r a59ba6de9687 -r 8429123bc58a src/Doc/IsarRef/Inner_Syntax.thy --- a/src/Doc/IsarRef/Inner_Syntax.thy Tue Jun 18 12:21:57 2013 +0200 +++ b/src/Doc/IsarRef/Inner_Syntax.thy Tue Jun 18 15:15:36 2013 +0200 @@ -1579,4 +1579,131 @@ are explicitly known from the given term in its fully internal form. *} + +subsection {* Built-in syntax transformations *} + +text {* + Here are some further details of the main syntax transformation + phases of \figref{fig:parse-print}. +*} + + +subsubsection {* Transforming parse trees to ASTs *} + +text {* The parse tree is the raw output of the parser. It is + transformed into an AST according to some basic scheme that may be + augmented by AST translation functions as explained in + \secref{sec:tr-funs}. + + The parse tree is constructed by nesting the right-hand sides of the + productions used to recognize the input. Such parse trees are + simply lists of tokens and constituent parse trees, the latter + representing the nonterminals of the productions. Ignoring AST + translation functions, parse trees are transformed to ASTs by + stripping out delimiters and copy productions, while retaining some + source position information from input tokens. + + The Pure syntax provides predefined AST translations to make the + basic @{text "\"}-term structure more apparent within the + (first-order) AST representation, and thus facilitate the use of + @{command translations} (see also \secref{sec:syn-trans}). This + covers ordinary term application, type application, nested + abstraction, iterated meta implications and function types. The + effect is illustrated on some representative input strings is as + follows: + + \begin{center} + \begin{tabular}{ll} + input source & AST \\ + \hline + @{text "f x y z"} & @{verbatim "(f x y z)"} \\ + @{text "'a ty"} & @{verbatim "(ty 'a)"} \\ + @{text "('a, 'b)ty"} & @{verbatim "(ty 'a 'b)"} \\ + @{text "\x y z. t"} & @{verbatim "(\"_abs\" x (\"_abs\" y (\"_abs\" z t)))"} \\ + @{text "\x :: 'a. t"} & @{verbatim "(\"_abs\" (\"_constrain\" x 'a) t)"} \\ + @{text "\P; Q; R\ \ S"} & @{verbatim "(\"==>\" P (\"==>\" Q (\"==>\" R S)))"} \\ + @{text "['a, 'b, 'c] \ 'd"} & @{verbatim "(\"fun\" 'a (\"fun\" 'b (\"fun\" 'c 'd)))"} \\ + \end{tabular} + \end{center} + + Note that type and sort constraints may occur in further places --- + translations need to be ready to cope with them. The built-in + syntax transformation from parse trees to ASTs insert additional + constraints that represent source positions. +*} + + +subsubsection {* Transforming ASTs to terms *} + +text {* After application of macros (\secref{sec:syn-trans}), the AST + is transformed into a term. This term still lacks proper type + information, but it might contain some constraints consisting of + applications with head @{verbatim "_constrain"}, where the second + argument is a type encoded as a pre-term within the syntax. Type + inference later introduces correct types, or indicates type errors + in the input. + + Ignoring parse translations, ASTs are transformed to terms by + mapping AST constants to term constants, AST variables to term + variables or constants (according to the name space), and AST + applications to iterated term applications. + + The outcome is still a first-order term. Proper abstractions and + bound variables are introduced by parse translations associated with + certain syntax constants. Thus @{verbatim "(_abs x x)"} eventually + becomes a de-Bruijn term @{verbatim "Abs (\"x\", _, Bound 0)"}. +*} + + +subsubsection {* Printing of terms *} + +text {* The output phase is essentially the inverse of the input + phase. Terms are translated via abstract syntax trees into + pretty-printed text. + + Ignoring print translations, the transformation maps term constants, + variables and applications to the corresponding constructs on ASTs. + Abstractions are mapped to applications of the special constant + @{verbatim "_abs"} as seen before. Type constraints are represented + via special @{verbatim "_constrain"} forms, according to various + policies of type annotation determined elsewhere. Sort constraints + of type variables are handled in a similar fashion. + + After application of macros (\secref{sec:syn-trans}), the AST is + finally pretty-printed. The built-in print AST translations reverse + the corresponding parse AST translations. + + \medskip For the actual printing process, the priority grammar + (\secref{sec:priority-grammar}) plays a vital role: productions are + used as templates for pretty printing, with argument slots stemming + from nonterminals, and syntactic sugar stemming from literal tokens. + + Each AST application with constant head @{text "c"} and arguments + @{text "t\<^sub>1"}, \dots, @{text "t\<^sub>n"} (for @{text "n = 0"} the AST is + just the constant @{text "c"} itself) is printed according to the + first grammar production of result name @{text "c"}. The required + syntax priority of the argument slot is given by its nonterminal + @{text "A\<^sup>(\<^sup>p\<^sup>)"}. The argument @{text "t\<^sub>i"} that corresponds to the + position of @{text "A\<^sup>(\<^sup>p\<^sup>)"} is printed recursively, and then put in + parentheses \emph{if} its priority @{text "p"} requires this. The + resulting output is concatenated with the syntactic sugar according + to the grammar production. + + If an AST application @{text "(c x\<^sub>1 \ x\<^sub>m)"} has more arguments than + the corresponding production, it is first split into @{text "((c x\<^sub>1 + \ x\<^sub>n) x\<^sub>n\<^sub>+\<^sub>1 \ x\<^sub>m)"} and then printed recursively as above. + + Applications with too few arguments or with non-constant head or + without a corresponding production are printed in prefix-form like + @{text "f t\<^sub>1 \ t\<^sub>n"} for terms. + + Multiple productions associated with some name @{text "c"} are tried + in order of appearance within the grammar. An occurrence of some + AST variable @{text "x"} is printed as @{text "x"} outright. + + \medskip White space is \emph{not} inserted automatically. If + blanks (or breaks) are required to separate tokens, they need to be + specified in the mixfix declaration (\secref{sec:mixfix}). +*} + end diff -r a59ba6de9687 -r 8429123bc58a src/Doc/ROOT --- a/src/Doc/ROOT Tue Jun 18 12:21:57 2013 +0200 +++ b/src/Doc/ROOT Tue Jun 18 15:15:36 2013 +0200 @@ -257,7 +257,6 @@ "../manual.bib" "document/build" "document/root.tex" - "document/syntax.tex" session Sledgehammer (doc) in "Sledgehammer" = Pure + options [document_variants = "sledgehammer"] diff -r a59ba6de9687 -r 8429123bc58a src/Doc/Ref/document/root.tex --- a/src/Doc/Ref/document/root.tex Tue Jun 18 12:21:57 2013 +0200 +++ b/src/Doc/Ref/document/root.tex Tue Jun 18 15:15:36 2013 +0200 @@ -41,8 +41,6 @@ \pagenumbering{roman} \tableofcontents \clearfirst -\input{syntax} - %%seealso's must be last so that they appear last in the index entries \index{meta-rewriting|seealso{tactics, theorems}} diff -r a59ba6de9687 -r 8429123bc58a src/Doc/Ref/document/syntax.tex --- a/src/Doc/Ref/document/syntax.tex Tue Jun 18 12:21:57 2013 +0200 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,240 +0,0 @@ - -\chapter{Syntax Transformations} \label{chap:syntax} -\newcommand\ttapp{\mathrel{\hbox{\tt\$}}} -\newcommand\mtt[1]{\mbox{\tt #1}} -\newcommand\ttfct[1]{\mathop{\mtt{#1}}\nolimits} -\newcommand\Constant{\ttfct{Constant}} -\newcommand\Variable{\ttfct{Variable}} -\newcommand\Appl[1]{\ttfct{Appl}\,[#1]} -\index{syntax!transformations|(} - - -\section{Transforming parse trees to ASTs}\label{sec:astofpt} -\index{ASTs!made from parse trees} -\newcommand\astofpt[1]{\lbrakk#1\rbrakk} - -The parse tree is the raw output of the parser. Translation functions, -called {\bf parse AST translations}\indexbold{translations!parse AST}, -transform the parse tree into an abstract syntax tree. - -The parse tree is constructed by nesting the right-hand sides of the -productions used to recognize the input. Such parse trees are simply lists -of tokens and constituent parse trees, the latter representing the -nonterminals of the productions. Let us refer to the actual productions in -the form displayed by {\tt print_syntax} (see \S\ref{sec:inspct-thy} for an -example). - -Ignoring parse \AST{} translations, parse trees are transformed to \AST{}s -by stripping out delimiters and copy productions. More precisely, the -mapping $\astofpt{-}$ is derived from the productions as follows: -\begin{itemize} -\item Name tokens: $\astofpt{t} = \Variable s$, where $t$ is an \ndx{id}, - \ndx{var}, \ndx{tid}, \ndx{tvar}, \ndx{num}, \ndx{xnum} or \ndx{xstr} token, - and $s$ its associated string. Note that for {\tt xstr} this does not - include the quotes. - -\item Copy productions:\index{productions!copy} - $\astofpt{\ldots P \ldots} = \astofpt{P}$. Here $\ldots$ stands for - strings of delimiters, which are discarded. $P$ stands for the single - constituent that is not a delimiter; it is either a nonterminal symbol or - a name token. - - \item 0-ary productions: $\astofpt{\ldots \mtt{=>} c} = \Constant c$. - Here there are no constituents other than delimiters, which are - discarded. - - \item $n$-ary productions, where $n \ge 1$: delimiters are discarded and - the remaining constituents $P@1$, \ldots, $P@n$ are built into an - application whose head constant is~$c$: - \[ \astofpt{\ldots P@1 \ldots P@n \ldots \mtt{=>} c} = - \Appl{\Constant c, \astofpt{P@1}, \ldots, \astofpt{P@n}} - \] -\end{itemize} -Figure~\ref{fig:parse_ast} presents some simple examples, where {\tt ==}, -{\tt _appl}, {\tt _args}, and so forth name productions of the Pure syntax. -These examples illustrate the need for further translations to make \AST{}s -closer to the typed $\lambda$-calculus. The Pure syntax provides -predefined parse \AST{} translations\index{translations!parse AST} for -ordinary applications, type applications, nested abstractions, meta -implications and function types. Figure~\ref{fig:parse_ast_tr} shows their -effect on some representative input strings. - - -\begin{figure} -\begin{center} -\tt\begin{tabular}{ll} -\rm input string & \rm \AST \\\hline -"f" & f \\ -"'a" & 'a \\ -"t == u" & ("==" t u) \\ -"f(x)" & ("_appl" f x) \\ -"f(x, y)" & ("_appl" f ("_args" x y)) \\ -"f(x, y, z)" & ("_appl" f ("_args" x ("_args" y z))) \\ -"\%x y.\ t" & ("_lambda" ("_idts" x y) t) \\ -\end{tabular} -\end{center} -\caption{Parsing examples using the Pure syntax}\label{fig:parse_ast} -\end{figure} - -\begin{figure} -\begin{center} -\tt\begin{tabular}{ll} -\rm input string & \rm \AST{} \\\hline -"f(x, y, z)" & (f x y z) \\ -"'a ty" & (ty 'a) \\ -"('a, 'b) ty" & (ty 'a 'b) \\ -"\%x y z.\ t" & ("_abs" x ("_abs" y ("_abs" z t))) \\ -"\%x ::\ 'a.\ t" & ("_abs" ("_constrain" x 'a) t) \\ -"[| P; Q; R |] => S" & ("==>" P ("==>" Q ("==>" R S))) \\ -"['a, 'b, 'c] => 'd" & ("fun" 'a ("fun" 'b ("fun" 'c 'd))) -\end{tabular} -\end{center} -\caption{Built-in parse \AST{} translations}\label{fig:parse_ast_tr} -\end{figure} - -The names of constant heads in the \AST{} control the translation process. -The list of constants invoking parse \AST{} translations appears in the -output of {\tt print_syntax} under {\tt parse_ast_translation}. - - -\section{Transforming ASTs to terms}\label{sec:termofast} -\index{terms!made from ASTs} -\newcommand\termofast[1]{\lbrakk#1\rbrakk} - -The \AST{}, after application of macros (see \S\ref{sec:macros}), is -transformed into a term. This term is probably ill-typed since type -inference has not occurred yet. The term may contain type constraints -consisting of applications with head {\tt "_constrain"}; the second -argument is a type encoded as a term. Type inference later introduces -correct types or rejects the input. - -Another set of translation functions, namely parse -translations\index{translations!parse}, may affect this process. If we -ignore parse translations for the time being, then \AST{}s are transformed -to terms by mapping \AST{} constants to constants, \AST{} variables to -schematic or free variables and \AST{} applications to applications. - -More precisely, the mapping $\termofast{-}$ is defined by -\begin{itemize} -\item Constants: $\termofast{\Constant x} = \ttfct{Const} (x, - \mtt{dummyT})$. - -\item Schematic variables: $\termofast{\Variable \mtt{"?}xi\mtt"} = - \ttfct{Var} ((x, i), \mtt{dummyT})$, where $x$ is the base name and $i$ - the index extracted from~$xi$. - -\item Free variables: $\termofast{\Variable x} = \ttfct{Free} (x, - \mtt{dummyT})$. - -\item Function applications with $n$ arguments: - \[ \termofast{\Appl{f, x@1, \ldots, x@n}} = - \termofast{f} \ttapp - \termofast{x@1} \ttapp \ldots \ttapp \termofast{x@n} - \] -\end{itemize} -Here \ttindex{Const}, \ttindex{Var}, \ttindex{Free} and -\verb|$|\index{$@{\tt\$}} are constructors of the datatype \mltydx{term}, -while \ttindex{dummyT} stands for some dummy type that is ignored during -type inference. - -So far the outcome is still a first-order term. Abstractions and bound -variables (constructors \ttindex{Abs} and \ttindex{Bound}) are introduced -by parse translations. Such translations are attached to {\tt "_abs"}, -{\tt "!!"} and user-defined binders. - - -\section{Printing of terms} -\newcommand\astofterm[1]{\lbrakk#1\rbrakk}\index{ASTs!made from terms} - -The output phase is essentially the inverse of the input phase. Terms are -translated via abstract syntax trees into strings. Finally the strings are -pretty printed. - -Print translations (\S\ref{sec:tr_funs}) may affect the transformation of -terms into \AST{}s. Ignoring those, the transformation maps -term constants, variables and applications to the corresponding constructs -on \AST{}s. Abstractions are mapped to applications of the special -constant {\tt _abs}. - -More precisely, the mapping $\astofterm{-}$ is defined as follows: -\begin{itemize} - \item $\astofterm{\ttfct{Const} (x, \tau)} = \Constant x$. - - \item $\astofterm{\ttfct{Free} (x, \tau)} = constrain (\Variable x, - \tau)$. - - \item $\astofterm{\ttfct{Var} ((x, i), \tau)} = constrain (\Variable - \mtt{"?}xi\mtt", \tau)$, where $\mtt?xi$ is the string representation of - the {\tt indexname} $(x, i)$. - - \item For the abstraction $\lambda x::\tau.t$, let $x'$ be a variant - of~$x$ renamed to differ from all names occurring in~$t$, and let $t'$ - be obtained from~$t$ by replacing all bound occurrences of~$x$ by - the free variable $x'$. This replaces corresponding occurrences of the - constructor \ttindex{Bound} by the term $\ttfct{Free} (x', - \mtt{dummyT})$: - \[ \astofterm{\ttfct{Abs} (x, \tau, t)} = - \Appl{\Constant \mtt{"_abs"}, - constrain(\Variable x', \tau), \astofterm{t'}} - \] - - \item $\astofterm{\ttfct{Bound} i} = \Variable \mtt{"B.}i\mtt"$. - The occurrence of constructor \ttindex{Bound} should never happen - when printing well-typed terms; it indicates a de Bruijn index with no - matching abstraction. - - \item Where $f$ is not an application, - \[ \astofterm{f \ttapp x@1 \ttapp \ldots \ttapp x@n} = - \Appl{\astofterm{f}, \astofterm{x@1}, \ldots,\astofterm{x@n}} - \] -\end{itemize} -% -Type constraints\index{type constraints} are inserted to allow the printing -of types. This is governed by the boolean variable \ttindex{show_types}: -\begin{itemize} - \item $constrain(x, \tau) = x$ \ if $\tau = \mtt{dummyT}$ \index{*dummyT} or - \ttindex{show_types} is set to {\tt false}. - - \item $constrain(x, \tau) = \Appl{\Constant \mtt{"_constrain"}, x, - \astofterm{\tau}}$ \ otherwise. - - Here, $\astofterm{\tau}$ is the \AST{} encoding of $\tau$: type - constructors go to {\tt Constant}s; type identifiers go to {\tt - Variable}s; type applications go to {\tt Appl}s with the type - constructor as the first element. If \ttindex{show_sorts} is set to - {\tt true}, some type variables are decorated with an \AST{} encoding - of their sort. -\end{itemize} -% -The \AST{}, after application of macros (see \S\ref{sec:macros}), is -transformed into the final output string. The built-in {\bf print AST - translations}\indexbold{translations!print AST} reverse the -parse \AST{} translations of Fig.\ts\ref{fig:parse_ast_tr}. - -For the actual printing process, the names attached to productions -of the form $\ldots A^{(p@1)}@1 \ldots A^{(p@n)}@n \ldots \mtt{=>} c$ play -a vital role. Each \AST{} with constant head $c$, namely $\mtt"c\mtt"$ or -$(\mtt"c\mtt"~ x@1 \ldots x@n)$, is printed according to the production -for~$c$. Each argument~$x@i$ is converted to a string, and put in -parentheses if its priority~$(p@i)$ requires this. The resulting strings -and their syntactic sugar (denoted by \dots{} above) are joined to make a -single string. - -If an application $(\mtt"c\mtt"~ x@1 \ldots x@m)$ has more arguments -than the corresponding production, it is first split into -$((\mtt"c\mtt"~ x@1 \ldots x@n) ~ x@{n+1} \ldots x@m)$. Applications -with too few arguments or with non-constant head or without a -corresponding production are printed as $f(x@1, \ldots, x@l)$ or -$(\alpha@1, \ldots, \alpha@l) ty$. Multiple productions associated -with some name $c$ are tried in order of appearance. An occurrence of -$\Variable x$ is simply printed as~$x$. - -Blanks are {\em not\/} inserted automatically. If blanks are required to -separate tokens, specify them in the mixfix declaration, possibly preceded -by a slash~({\tt/}) to allow a line break. -\index{ASTs|)} - -%%% Local Variables: -%%% mode: latex -%%% TeX-master: "ref" -%%% End: