New lemmas connected with the reals and infinite series
\chapter{Syntax Transformations} \label{chap:syntax}\newcommand\ttapp{\mathrel{\hbox{\tt\$}}}\newcommand\mtt[1]{\mbox{\tt #1}}\newcommand\ttfct[1]{\mathop{\mtt{#1}}\nolimits}\newcommand\Constant{\ttfct{Constant}}\newcommand\Variable{\ttfct{Variable}}\newcommand\Appl[1]{\ttfct{Appl}\,[#1]}\index{syntax!transformations|(}This chapter is intended for experienced Isabelle users who need to definemacros or code their own translation functions. It describes thetransformations between parse trees, abstract syntax trees and terms.\section{Abstract syntax trees} \label{sec:asts}\index{ASTs|(}The parser, given a token list from the lexer, applies productions to yielda parse tree\index{parse trees}. By applying some internal transformationsthe parse tree becomes an abstract syntax tree, or \AST{}. Macroexpansion, further translations and finally type inference yields awell-typed term. The printing process is the reverse, except for somesubtleties to be discussed later.Figure~\ref{fig:parse_print} outlines the parsing and printing process.Much of the complexity is due to the macro mechanism. Using macros, youcan specify most forms of concrete syntax without writing any \ML{} code.\begin{figure}\begin{center}\begin{tabular}{cl}string & \\$\downarrow$ & lexer, parser \\parse tree & \\$\downarrow$ & parse \AST{} translation \\\AST{} & \\$\downarrow$ & \AST{} rewriting (macros) \\\AST{} & \\$\downarrow$ & parse translation, type inference \\--- well-typed term --- & \\$\downarrow$ & print translation \\\AST{} & \\$\downarrow$ & \AST{} rewriting (macros) \\\AST{} & \\$\downarrow$ & print \AST{} translation, token translation \\string &\end{tabular}\end{center}\caption{Parsing and printing}\label{fig:parse_print}\end{figure}Abstract syntax trees are an intermediate form between the raw parse treesand the typed $\lambda$-terms. An \AST{} is either an atom (constant orvariable) or a list of {\em at least two\/} subtrees. Internally, theyhave type \mltydx{Syntax.ast}: \index{*Constant} \index{*Variable}\index{*Appl}\begin{ttbox}datatype ast = Constant of string | Variable of string | Appl of ast list\end{ttbox}%Isabelle uses an S-expression syntax for abstract syntax trees. Constantatoms are shown as quoted strings, variable atoms as non-quoted strings andapplications as a parenthesised list of subtrees. For example, the \AST\begin{ttbox}Appl [Constant "_constrain", Appl [Constant "_abs", Variable "x", Variable "t"], Appl [Constant "fun", Variable "'a", Variable "'b"]]\end{ttbox}is shown as {\tt ("_constrain" ("_abs" x t) ("fun" 'a 'b))}.Both {\tt ()} and {\tt (f)} are illegal because they have too fewsubtrees.The resemblance to Lisp's S-expressions is intentional, but there are twokinds of atomic symbols: $\Constant x$ and $\Variable x$. Do not take thenames {\tt Constant} and {\tt Variable} too literally; in the latertranslation to terms, $\Variable x$ may become a constant, free or boundvariable, even a type constructor or class name; the actual outcome dependson the context.Similarly, you can think of ${\tt (} f~x@1~\ldots~x@n{\tt )}$ as theapplication of~$f$ to the arguments $x@1, \ldots, x@n$. But the kind ofapplication is determined later by context; it could be a type constructorapplied to types.Forms like {\tt (("_abs" x $t$) $u$)} are legal, but \AST{}s arefirst-order: the {\tt "_abs"} does not bind the {\tt x} in any way. Laterat the term level, {\tt ("_abs" x $t$)} will become an {\tt Abs} node andoccurrences of {\tt x} in $t$ will be replaced by bound variables (the termconstructor \ttindex{Bound}).\section{Transforming parse trees to ASTs}\label{sec:astofpt}\index{ASTs!made from parse trees}\newcommand\astofpt[1]{\lbrakk#1\rbrakk}The parse tree is the raw output of the parser. Translation functions,called {\bf parse AST translations}\indexbold{translations!parse AST},transform the parse tree into an abstract syntax tree.The parse tree is constructed by nesting the right-hand sides of theproductions used to recognize the input. Such parse trees are simply listsof tokens and constituent parse trees, the latter representing thenonterminals of the productions. Let us refer to the actual productions inthe form displayed by {\tt print_syntax} (see \S\ref{sec:inspct-thy} for anexample).Ignoring parse \AST{} translations, parse trees are transformed to \AST{}sby stripping out delimiters and copy productions. More precisely, themapping $\astofpt{-}$ is derived from the productions as follows:\begin{itemize}\item Name tokens: $\astofpt{t} = \Variable s$, where $t$ is an \ndx{id}, \ndx{var}, \ndx{tid}, \ndx{tvar}, \ndx{num}, \ndx{xnum} or \ndx{xstr} token, and $s$ its associated string. Note that for {\tt xstr} this does not include the quotes.\item Copy productions:\index{productions!copy} $\astofpt{\ldots P \ldots} = \astofpt{P}$. Here $\ldots$ stands for strings of delimiters, which are discarded. $P$ stands for the single constituent that is not a delimiter; it is either a nonterminal symbol or a name token. \item 0-ary productions: $\astofpt{\ldots \mtt{=>} c} = \Constant c$. Here there are no constituents other than delimiters, which are discarded. \item $n$-ary productions, where $n \ge 1$: delimiters are discarded and the remaining constituents $P@1$, \ldots, $P@n$ are built into an application whose head constant is~$c$: \[ \astofpt{\ldots P@1 \ldots P@n \ldots \mtt{=>} c} = \Appl{\Constant c, \astofpt{P@1}, \ldots, \astofpt{P@n}} \]\end{itemize}Figure~\ref{fig:parse_ast} presents some simple examples, where {\tt ==},{\tt _appl}, {\tt _args}, and so forth name productions of the Pure syntax.These examples illustrate the need for further translations to make \AST{}scloser to the typed $\lambda$-calculus. The Pure syntax providespredefined parse \AST{} translations\index{translations!parse AST} forordinary applications, type applications, nested abstractions, metaimplications and function types. Figure~\ref{fig:parse_ast_tr} shows theireffect on some representative input strings.\begin{figure}\begin{center}\tt\begin{tabular}{ll}\rm input string & \rm \AST \\\hline"f" & f \\"'a" & 'a \\"t == u" & ("==" t u) \\"f(x)" & ("_appl" f x) \\"f(x, y)" & ("_appl" f ("_args" x y)) \\"f(x, y, z)" & ("_appl" f ("_args" x ("_args" y z))) \\"\%x y.\ t" & ("_lambda" ("_idts" x y) t) \\\end{tabular}\end{center}\caption{Parsing examples using the Pure syntax}\label{fig:parse_ast}\end{figure}\begin{figure}\begin{center}\tt\begin{tabular}{ll}\rm input string & \rm \AST{} \\\hline"f(x, y, z)" & (f x y z) \\"'a ty" & (ty 'a) \\"('a, 'b) ty" & (ty 'a 'b) \\"\%x y z.\ t" & ("_abs" x ("_abs" y ("_abs" z t))) \\"\%x ::\ 'a.\ t" & ("_abs" ("_constrain" x 'a) t) \\"[| P; Q; R |] => S" & ("==>" P ("==>" Q ("==>" R S))) \\"['a, 'b, 'c] => 'd" & ("fun" 'a ("fun" 'b ("fun" 'c 'd)))\end{tabular}\end{center}\caption{Built-in parse \AST{} translations}\label{fig:parse_ast_tr}\end{figure}The names of constant heads in the \AST{} control the translation process.The list of constants invoking parse \AST{} translations appears in theoutput of {\tt print_syntax} under {\tt parse_ast_translation}.\section{Transforming ASTs to terms}\label{sec:termofast}\index{terms!made from ASTs}\newcommand\termofast[1]{\lbrakk#1\rbrakk}The \AST{}, after application of macros (see \S\ref{sec:macros}), istransformed into a term. This term is probably ill-typed since typeinference has not occurred yet. The term may contain type constraintsconsisting of applications with head {\tt "_constrain"}; the secondargument is a type encoded as a term. Type inference later introducescorrect types or rejects the input.Another set of translation functions, namely parsetranslations\index{translations!parse}, may affect this process. If weignore parse translations for the time being, then \AST{}s are transformedto terms by mapping \AST{} constants to constants, \AST{} variables toschematic or free variables and \AST{} applications to applications.More precisely, the mapping $\termofast{-}$ is defined by\begin{itemize}\item Constants: $\termofast{\Constant x} = \ttfct{Const} (x, \mtt{dummyT})$.\item Schematic variables: $\termofast{\Variable \mtt{"?}xi\mtt"} = \ttfct{Var} ((x, i), \mtt{dummyT})$, where $x$ is the base name and $i$ the index extracted from~$xi$.\item Free variables: $\termofast{\Variable x} = \ttfct{Free} (x, \mtt{dummyT})$.\item Function applications with $n$ arguments: \[ \termofast{\Appl{f, x@1, \ldots, x@n}} = \termofast{f} \ttapp \termofast{x@1} \ttapp \ldots \ttapp \termofast{x@n} \]\end{itemize}Here \ttindex{Const}, \ttindex{Var}, \ttindex{Free} and\verb|$|\index{$@{\tt\$}} are constructors of the datatype \mltydx{term},while \ttindex{dummyT} stands for some dummy type that is ignored duringtype inference.So far the outcome is still a first-order term. Abstractions and boundvariables (constructors \ttindex{Abs} and \ttindex{Bound}) are introducedby parse translations. Such translations are attached to {\tt "_abs"},{\tt "!!"} and user-defined binders.\section{Printing of terms}\newcommand\astofterm[1]{\lbrakk#1\rbrakk}\index{ASTs!made from terms}The output phase is essentially the inverse of the input phase. Terms aretranslated via abstract syntax trees into strings. Finally the strings arepretty printed.Print translations (\S\ref{sec:tr_funs}) may affect the transformation ofterms into \AST{}s. Ignoring those, the transformation mapsterm constants, variables and applications to the corresponding constructson \AST{}s. Abstractions are mapped to applications of the specialconstant {\tt _abs}.More precisely, the mapping $\astofterm{-}$ is defined as follows:\begin{itemize} \item $\astofterm{\ttfct{Const} (x, \tau)} = \Constant x$. \item $\astofterm{\ttfct{Free} (x, \tau)} = constrain (\Variable x, \tau)$. \item $\astofterm{\ttfct{Var} ((x, i), \tau)} = constrain (\Variable \mtt{"?}xi\mtt", \tau)$, where $\mtt?xi$ is the string representation of the {\tt indexname} $(x, i)$. \item For the abstraction $\lambda x::\tau.t$, let $x'$ be a variant of~$x$ renamed to differ from all names occurring in~$t$, and let $t'$ be obtained from~$t$ by replacing all bound occurrences of~$x$ by the free variable $x'$. This replaces corresponding occurrences of the constructor \ttindex{Bound} by the term $\ttfct{Free} (x', \mtt{dummyT})$: \[ \astofterm{\ttfct{Abs} (x, \tau, t)} = \Appl{\Constant \mtt{"_abs"}, constrain(\Variable x', \tau), \astofterm{t'}} \] \item $\astofterm{\ttfct{Bound} i} = \Variable \mtt{"B.}i\mtt"$. The occurrence of constructor \ttindex{Bound} should never happen when printing well-typed terms; it indicates a de Bruijn index with no matching abstraction. \item Where $f$ is not an application, \[ \astofterm{f \ttapp x@1 \ttapp \ldots \ttapp x@n} = \Appl{\astofterm{f}, \astofterm{x@1}, \ldots,\astofterm{x@n}} \]\end{itemize}%Type constraints\index{type constraints} are inserted to allow the printingof types. This is governed by the boolean variable \ttindex{show_types}:\begin{itemize} \item $constrain(x, \tau) = x$ \ if $\tau = \mtt{dummyT}$ \index{*dummyT} or \ttindex{show_types} is set to {\tt false}. \item $constrain(x, \tau) = \Appl{\Constant \mtt{"_constrain"}, x, \astofterm{\tau}}$ \ otherwise. Here, $\astofterm{\tau}$ is the \AST{} encoding of $\tau$: type constructors go to {\tt Constant}s; type identifiers go to {\tt Variable}s; type applications go to {\tt Appl}s with the type constructor as the first element. If \ttindex{show_sorts} is set to {\tt true}, some type variables are decorated with an \AST{} encoding of their sort.\end{itemize}%The \AST{}, after application of macros (see \S\ref{sec:macros}), istransformed into the final output string. The built-in {\bf print AST translations}\indexbold{translations!print AST} reverse theparse \AST{} translations of Fig.\ts\ref{fig:parse_ast_tr}.For the actual printing process, the names attached to productionsof the form $\ldots A^{(p@1)}@1 \ldots A^{(p@n)}@n \ldots \mtt{=>} c$ playa vital role. Each \AST{} with constant head $c$, namely $\mtt"c\mtt"$ or$(\mtt"c\mtt"~ x@1 \ldots x@n)$, is printed according to the productionfor~$c$. Each argument~$x@i$ is converted to a string, and put inparentheses if its priority~$(p@i)$ requires this. The resulting stringsand their syntactic sugar (denoted by \dots{} above) are joined to make asingle string.If an application $(\mtt"c\mtt"~ x@1 \ldots x@m)$ has more argumentsthan the corresponding production, it is first split into$((\mtt"c\mtt"~ x@1 \ldots x@n) ~ x@{n+1} \ldots x@m)$. Applicationswith too few arguments or with non-constant head or without acorresponding production are printed as $f(x@1, \ldots, x@l)$ or$(\alpha@1, \ldots, \alpha@l) ty$. Multiple productions associatedwith some name $c$ are tried in order of appearance. An occurrence of$\Variable x$ is simply printed as~$x$.Blanks are {\em not\/} inserted automatically. If blanks are required toseparate tokens, specify them in the mixfix declaration, possibly precededby a slash~({\tt/}) to allow a line break.\index{ASTs|)}\section{Macros: syntactic rewriting} \label{sec:macros}\index{macros|(}\index{rewriting!syntactic|(}Mixfix declarations alone can handle situations where there is a directconnection between the concrete syntax and the underlying term. Sometimeswe require a more elaborate concrete syntax, such as quantifiers and listnotation. Isabelle's {\bf macros} and {\bf translation functions} canperform translations such as\begin{center}\tt \begin{tabular}{r@{$\quad\protect\rightleftharpoons\quad$}l} ALL x:A.P & Ball(A, \%x.P) \\ \relax [x, y, z] & Cons(x, Cons(y, Cons(z, Nil))) \end{tabular}\end{center}Translation functions (see \S\ref{sec:tr_funs}) must be coded in ML; theyare the most powerful translation mechanism but are difficult to read orwrite. Macros are specified by first-order rewriting systems that operateon abstract syntax trees. They are usually easy to read and write, and canexpress all but the most obscure translations.Figure~\ref{fig:set_trans} defines a fragment of first-order logic andset theory.\footnote{This and the following theories are complete working examples, though they specify only syntax, no axioms. The file {\tt ZF/ZF.thy} presents a full set theory definition, including many macro rules.} Theory {\tt SetSyntax} declaresconstants for set comprehension ({\tt Collect}), replacement ({\tt Replace}) and bounded universal quantification ({\tt Ball}). Eachof these binds some variables. Without additional syntax we shouldhave to write $\forall x \in A. P$ as {\tt Ball(A,\%x.P)}, andsimilarly for the others.\begin{figure}\begin{ttbox}SetSyntax = Pure +types i oarities i, o :: logicconsts Trueprop :: o => prop ("_" 5) Collect :: [i, i => o] => i Replace :: [i, [i, i] => o] => i Ball :: [i, i => o] => osyntax "{\at}Collect" :: [idt, i, o] => i ("(1{\ttlbrace}_:_./ _{\ttrbrace})") "{\at}Replace" :: [idt, idt, i, o] => i ("(1{\ttlbrace}_./ _:_, _{\ttrbrace})") "{\at}Ball" :: [idt, i, o] => o ("(3ALL _:_./ _)" 10)translations "{\ttlbrace}x:A. P{\ttrbrace}" == "Collect(A, \%x. P)" "{\ttlbrace}y. x:A, Q{\ttrbrace}" == "Replace(A, \%x y. Q)" "ALL x:A. P" == "Ball(A, \%x. P)"end\end{ttbox}\caption{Macro example: set theory}\label{fig:set_trans}\end{figure}The theory specifies a variable-binding syntax through additional productionsthat have mixfix declarations. Each non-copy production must specify someconstant, which is used for building \AST{}s. \index{constants!syntactic} Theadditional constants are decorated with {\tt\at} to stress their purelysyntactic purpose; they may not occur within the final well-typed terms,being declared as {\tt syntax} rather than {\tt consts}.The translations cause the replacement of external forms by internal formsafter parsing, and vice versa before printing of terms. As a specificationof the set theory notation, they should be largely self-explanatory. Thesyntactic constants, {\tt\at Collect}, {\tt\at Replace} and {\tt\at Ball},appear implicitly in the macro rules via their mixfix forms.Macros can define variable-binding syntax because they operate on \AST{}s,which have no inbuilt notion of bound variable. The macro variables {\tt x} and~{\tt y} have type~{\tt idt} and therefore range over identifiers,in this case bound variables. The macro variables {\tt P} and~{\tt Q}range over formulae containing bound variable occurrences.Other applications of the macro system can be less straightforward, andthere are peculiarities. The rest of this section will describe in detailhow Isabelle macros are preprocessed and applied.\subsection{Specifying macros}Macros are basically rewrite rules on \AST{}s. But unlike other macrosystems found in programming languages, Isabelle's macros work in bothdirections. Therefore a syntax contains two lists of rewrites: one forparsing and one for printing.\index{*translations section}The {\tt translations} section specifies macros. The syntax for a macro is\[ (root)\; string \quad \left\{\begin{array}[c]{c} \mtt{=>} \\ \mtt{<=} \\ \mtt{==} \end{array} \right\} \quad (root)\; string\]%This specifies a parse rule ({\tt =>}), a print rule ({\tt <=}), or both({\tt ==}). The two strings specify the left and right-hand sides of themacro rule. The $(root)$ specification is optional; it specifies thenonterminal for parsing the $string$ and if omitted defaults to {\tt logic}. \AST{} rewrite rules $(l, r)$ must obey certain conditions:\begin{itemize}\item Rules must be left linear: $l$ must not contain repeated variables.\item Every variable in~$r$ must also occur in~$l$.\end{itemize}Macro rules may refer to any syntax from the parent theories. Theymay also refer to anything defined before the current {\tt translations} section --- including any mixfix declarations.Upon declaration, both sides of the macro rule undergo parsing and parse\AST{} translations (see \S\ref{sec:asts}), but do not themselves undergomacro expansion. The lexer runs in a different mode that additionallyaccepts identifiers of the form $\_~letter~quasiletter^*$ (like {\tt _idt},{\tt _K}). Thus, a constant whose name starts with an underscore canappear in macro rules but not in ordinary terms.Some atoms of the macro rule's \AST{} are designated as constants formatching. These are all names that have been declared as classes, types orconstants (logical and syntactic).The result of this preprocessing is two lists of macro rules, eachstored as a pair of \AST{}s. They can be viewed using {\tt print_syntax} (sections \ttindex{parse_rules} and\ttindex{print_rules}). For theory~{\tt SetSyntax} ofFig.~\ref{fig:set_trans} these are\begin{ttbox}parse_rules: ("{\at}Collect" x A P) -> ("Collect" A ("_abs" x P)) ("{\at}Replace" y x A Q) -> ("Replace" A ("_abs" x ("_abs" y Q))) ("{\at}Ball" x A P) -> ("Ball" A ("_abs" x P))print_rules: ("Collect" A ("_abs" x P)) -> ("{\at}Collect" x A P) ("Replace" A ("_abs" x ("_abs" y Q))) -> ("{\at}Replace" y x A Q) ("Ball" A ("_abs" x P)) -> ("{\at}Ball" x A P)\end{ttbox}\begin{warn} Avoid choosing variable names that have previously been used as constants, types or type classes; the {\tt consts} section in the output of {\tt print_syntax} lists all such names. If a macro rule works incorrectly, inspect its internal form as shown above, recalling that constants appear as quoted strings and variables without quotes.\end{warn}\begin{warn}If \ttindex{eta_contract} is set to {\tt true}, terms will be$\eta$-contracted {\em before\/} the \AST{} rewriter sees them. Thus someabstraction nodes needed for print rules to match may vanish. For example,\verb|Ball(A, %x. P(x))| contracts to {\tt Ball(A, P)}; the print rule doesnot apply and the output will be {\tt Ball(A, P)}. This problem would notoccur if \ML{} translation functions were used instead of macros (as isdone for binder declarations).\end{warn}\begin{warn}Another trap concerns type constraints. If \ttindex{show_types} is set to{\tt true}, bound variables will be decorated by their meta types at thebinding place (but not at occurrences in the body). Matching with\verb|Collect(A, %x. P)| binds {\tt x} to something like {\tt ("_constrain" y"i")} rather than only {\tt y}. \AST{} rewriting will cause the constraint toappear in the external form, say \verb|{y::i:A::i. P::o}|.To allow such constraints to be re-read, your syntax should specify boundvariables using the nonterminal~\ndx{idt}. This is the case in ourexample. Choosing {\tt id} instead of {\tt idt} is a common error.\end{warn}\subsection{Applying rules}As a term is being parsed or printed, an \AST{} is generated as anintermediate form (recall Fig.\ts\ref{fig:parse_print}). The \AST{} isnormalised by applying macro rules in the manner of a traditional termrewriting system. We first examine how a single rule is applied.Let $t$ be the abstract syntax tree to be normalised and $(l, r)$ sometranslation rule. A subtree~$u$ of $t$ is a {\bf redex} if it is aninstance of~$l$; in this case $l$ is said to {\bf match}~$u$. A redexmatched by $l$ may be replaced by the corresponding instance of~$r$, thus{\bf rewriting} the \AST~$t$. Matching requires some notion of {\bf place-holders} that may occur in rule patterns but not in ordinary\AST{}s; {\tt Variable} atoms serve this purpose.The matching of the object~$u$ by the pattern~$l$ is performed as follows:\begin{itemize} \item Every constant matches itself. \item $\Variable x$ in the object matches $\Constant x$ in the pattern. This point is discussed further below. \item Every \AST{} in the object matches $\Variable x$ in the pattern, binding~$x$ to~$u$. \item One application matches another if they have the same number of subtrees and corresponding subtrees match. \item In every other case, matching fails. In particular, {\tt Constant}~$x$ can only match itself.\end{itemize}A successful match yields a substitution that is applied to~$r$, generatingthe instance that replaces~$u$.The second case above may look odd. This is where {\tt Variable}s ofnon-rule \AST{}s behave like {\tt Constant}s. Recall that \AST{}s are notfar removed from parse trees; at this level it is not yet known whichidentifiers will become constants, bounds, frees, types or classes. As\S\ref{sec:asts} describes, former parse tree heads appear in \AST{}s as{\tt Constant}s, while the name tokens \ndx{id}, \ndx{var}, \ndx{tid},\ndx{tvar}, \ndx{num}, \ndx{xnum} and \ndx{xstr} become {\tt Variable}s. On the otherhand, when \AST{}s generated from terms for printing, all constants and typeconstructors become {\tt Constant}s; see \S\ref{sec:asts}. Thus \AST{}s maycontain a messy mixture of {\tt Variable}s and {\tt Constant}s. This isinsignificant at macro level because matching treats them alike.Because of this behaviour, different kinds of atoms with the same name areindistinguishable, which may make some rules prone to misbehaviour. Example:\begin{ttbox}types Nilconsts Nil :: 'a listsyntax "[]" :: 'a list ("[]")translations "[]" == "Nil"\end{ttbox}The term {\tt Nil} will be printed as {\tt []}, just as expected.The term \verb|%Nil.t| will be printed as \verb|%[].t|, which might not beexpected! Guess how type~{\tt Nil} is printed?Normalizing an \AST{} involves repeatedly applying macro rules until none areapplicable. Macro rules are chosen in order of appearance in the theorydefinitions. You can watch the normalization of \AST{}s during parsing andprinting by setting \ttindex{Syntax.trace_ast} to {\tt true}.\index{tracing!of macros} The information displayed when tracing includes the \AST{} beforenormalization ({\tt pre}), redexes with results ({\tt rewrote}), the normalform finally reached ({\tt post}) and some statistics ({\tt normalize}).\subsection{Example: the syntax of finite sets}\index{examples!of macros}This example demonstrates the use of recursive macros to implement aconvenient notation for finite sets.\index{*empty constant}\index{*insert constant}\index{{}@\verb'{}' symbol}\index{"@Enum@{\tt\at Enum} constant}\index{"@Finset@{\tt\at Finset} constant}\begin{ttbox}FinSyntax = SetSyntax +types issyntax "" :: i => is ("_") "{\at}Enum" :: [i, is] => is ("_,/ _")consts empty :: i ("{\ttlbrace}{\ttrbrace}") insert :: [i, i] => isyntax "{\at}Finset" :: is => i ("{\ttlbrace}(_){\ttrbrace}")translations "{\ttlbrace}x, xs{\ttrbrace}" == "insert(x, {\ttlbrace}xs{\ttrbrace})" "{\ttlbrace}x{\ttrbrace}" == "insert(x, {\ttlbrace}{\ttrbrace})"end\end{ttbox}Finite sets are internally built up by {\tt empty} and {\tt insert}. Thedeclarations above specify \verb|{x, y, z}| as the external representationof\begin{ttbox}insert(x, insert(y, insert(z, empty)))\end{ttbox}The nonterminal symbol~\ndx{is} stands for one or more objects of type~{\tt i} separated by commas. The mixfix declaration \hbox{\verb|"_,/ _"|}allows a line break after the comma for \rmindex{pretty printing}; if noline break is required then a space is printed instead.The nonterminal is declared as the type~{\tt is}, but with no {\tt arities}declaration. Hence {\tt is} is not a logical type and may be used safely asa new nonterminal for custom syntax. The nonterminal~{\tt is} can later bere-used for other enumerations of type~{\tt i} like lists or tuples. If wehad needed polymorphic enumerations, we could have used the predefinednonterminal symbol \ndx{args} and skipped this part altogether.\index{"@Finset@{\tt\at Finset} constant}Next follows {\tt empty}, which is already equipped with its syntax\verb|{}|, and {\tt insert} without concrete syntax. The syntacticconstant {\tt\at Finset} provides concrete syntax for enumerations of~{\tt i} enclosed in curly braces. Remember that a pair of parentheses, as in\verb|"{(_)}"|, specifies a block of indentation for pretty printing.The translations may look strange at first. Macro rules are bestunderstood in their internal forms:\begin{ttbox}parse_rules: ("{\at}Finset" ("{\at}Enum" x xs)) -> ("insert" x ("{\at}Finset" xs)) ("{\at}Finset" x) -> ("insert" x "empty")print_rules: ("insert" x ("{\at}Finset" xs)) -> ("{\at}Finset" ("{\at}Enum" x xs)) ("insert" x "empty") -> ("{\at}Finset" x)\end{ttbox}This shows that \verb|{x,xs}| indeed matches any set enumeration of at leasttwo elements, binding the first to {\tt x} and the rest to {\tt xs}.Likewise, \verb|{xs}| and \verb|{x}| represent any set enumeration.The parse rules only work in the order given.\begin{warn} The \AST{} rewriter cannot distinguish constants from variables and looks only for names of atoms. Thus the names of {\tt Constant}s occurring in the (internal) left-hand side of translation rules should be regarded as \rmindex{reserved words}. Choose non-identifiers like {\tt\at Finset} or sufficiently long and strange names. If a bound variable's name gets rewritten, the result will be incorrect; for example, the term\begin{ttbox}\%empty insert. insert(x, empty)\end{ttbox}\par\noindent is incorrectly printed as \verb|%empty insert. {x}|.\end{warn}\subsection{Example: a parse macro for dependent types}\label{prod_trans}\index{examples!of macros}As stated earlier, a macro rule may not introduce new {\tt Variable}s onthe right-hand side. Something like \verb|"K(B)" => "%x.B"| is illegal;if allowed, it could cause variable capture. In such cases you usuallymust fall back on translation functions. But a trick can make thingsreadable in some cases: {\em calling\/} translation functions by parsemacros:\begin{ttbox}ProdSyntax = SetSyntax +consts Pi :: [i, i => i] => isyntax "{\at}PROD" :: [idt, i, i] => i ("(3PROD _:_./ _)" 10) "{\at}->" :: [i, i] => i ("(_ ->/ _)" [51, 50] 50)\ttbreaktranslations "PROD x:A. B" => "Pi(A, \%x. B)" "A -> B" => "Pi(A, _K(B))"endML val print_translation = [("Pi", dependent_tr' ("{\at}PROD", "{\at}->"))];\end{ttbox}Here {\tt Pi} is a logical constant for constructing general products.Two external forms exist: the general case {\tt PROD x:A.B} and thefunction space {\tt A -> B}, which abbreviates \verb|Pi(A, %x.B)| when {\tt B}does not depend on~{\tt x}.The second parse macro introduces {\tt _K(B)}, which later becomes\verb|%x.B| due to a parse translation associated with \cdx{_K}.Unfortunately there is no such trick for printing, so we have to add a {\ttML} section for the print translation \ttindex{dependent_tr'}.Recall that identifiers with a leading {\tt _} are allowed in translationrules, but not in ordinary terms. Thus we can create \AST{}s containingnames that are not directly expressible.The parse translation for {\tt _K} is already installed in Pure, and thefunction {\tt dependent_tr'} is exported by the syntax module for public use.See \S\ref{sec:tr_funs} below for more of the arcane lore of translationfunctions. \index{macros|)}\index{rewriting!syntactic|)}\section{Translation functions} \label{sec:tr_funs}\index{translations|(}%This section describes the translation function mechanism. By writing \ML{}functions, you can do almost everything to terms or \AST{}s during parsing andprinting. The logic LK is a good example of sophisticated transformationsbetween internal and external representations of sequents; here, macros wouldbe useless.A full understanding of translations requires some familiaritywith Isabelle's internals, especially the datatypes {\tt term}, {\tt typ},{\tt Syntax.ast} and the encodings of types and terms as such at the variousstages of the parsing or printing process. Most users should never need touse translation functions.\subsection{Declaring translation functions}There are four kinds of translation functions, with one of thesecoming in two variants. Each such function is associated with a name,which triggers calls to it. Such names can be constants (logical orsyntactic) or type constructors.Function {\tt print_syntax} displays the sets of names associated with thetranslation functions of a theory under \texttt{parse_ast_translation}, etc.You can add new ones via the {\tt ML} section\index{*ML section} of a theorydefinition file. Even though the {\tt ML} section is the very last part ofthe file, newly installed translation functions are already effective whenprocessing all of the preceding sections.The {\tt ML} section's contents are simply copied verbatim near thebeginning of the \ML\ file generated from a theory definition file.Definitions made here are accessible as components of an \ML\ structure; to make some parts private, use an \ML{} {\tt local}declaration. The {\ML} code may install translation functions bydeclaring any of the following identifiers:\begin{ttbox}val parse_ast_translation : (string * (ast list -> ast)) listval print_ast_translation : (string * (ast list -> ast)) listval parse_translation : (string * (term list -> term)) listval print_translation : (string * (term list -> term)) listval typed_print_translation : (string * (bool -> typ -> term list -> term)) list\end{ttbox}\subsection{The translation strategy}The different kinds of translation functions are called during thetransformations between parse trees, \AST{}s and terms (recallFig.\ts\ref{fig:parse_print}). Whenever a combination of the form$(\mtt"c\mtt"~x@1 \ldots x@n)$ is encountered, and a translationfunction $f$ of appropriate kind exists for $c$, the result iscomputed by the \ML{} function call $f \mtt[ x@1, \ldots, x@n \mtt]$.For \AST{} translations, the arguments $x@1, \ldots, x@n$ are \AST{}s.A combination has the form $\Constant c$ or $\Appl{\Constant c, x@1, \ldots, x@n}$. For term translations, the arguments are terms and acombination has the form $\ttfct{Const} (c, \tau)$ or $\ttfct{Const}(c, \tau) \ttapp x@1 \ttapp \ldots \ttapp x@n$. Terms allow moresophisticated transformations than \AST{}s do, typically involvingabstractions and bound variables. {\em Typed} print translations mayeven peek at the type $\tau$ of the constant they are invoked on; theyare also passed the current value of the \ttindex{show_sorts} flag.Regardless of whether they act on terms or \AST{}s, translationfunctions called during the parsing process differ from those forprinting more fundamentally in their overall behaviour:\begin{description}\item[Parse translations] are applied bottom-up. The arguments are already in translated form. The translations must not fail; exceptions trigger an error message. There may never be more than one function associated with any syntactic name.\item[Print translations] are applied top-down. They are supplied with arguments that are partly still in internal form. The result again undergoes translation; therefore a print translation should not introduce as head the very constant that invoked it. The function may raise exception \xdx{Match} to indicate failure; in this event it has no effect. Multiple functions associated with some syntactic name are tried in an unspecified order.\end{description}Only constant atoms --- constructor \ttindex{Constant} for \AST{}s and\ttindex{Const} for terms --- can invoke translation functions. Thiscauses another difference between parsing and printing.Parsing starts with a string and the constants are not yet identified.Only parse tree heads create {\tt Constant}s in the resulting \AST, asdescribed in \S\ref{sec:astofpt}. Macros and parse \AST{} translations mayintroduce further {\tt Constant}s. When the final \AST{} is converted to aterm, all {\tt Constant}s become {\tt Const}s, as described in\S\ref{sec:termofast}.Printing starts with a well-typed term and all the constants are known. Soall logical constants and type constructors may invoke print translations.These, and macros, may introduce further constants.\subsection{Example: a print translation for dependent types}\index{examples!of translations}\indexbold{*dependent_tr'}Let us continue the dependent type example (page~\pageref{prod_trans}) byexamining the parse translation for~\cdx{_K} and the print translation{\tt dependent_tr'}, which are both built-in. By convention, parsetranslations have names ending with {\tt _tr} and print translations havenames ending with {\tt _tr'}. Search for such names in the Isabellesources to locate more examples.Here is the parse translation for {\tt _K}:\begin{ttbox}fun k_tr [t] = Abs ("x", dummyT, incr_boundvars 1 t) | k_tr ts = raise TERM ("k_tr", ts);\end{ttbox}If {\tt k_tr} is called with exactly one argument~$t$, it creates a new{\tt Abs} node with a body derived from $t$. Since terms given to parsetranslations are not yet typed, the type of the bound variable in the new{\tt Abs} is simply {\tt dummyT}. The function increments all {\tt Bound}nodes referring to outer abstractions by calling \ttindex{incr_boundvars},a basic term manipulation function defined in {\tt Pure/term.ML}.Here is the print translation for dependent types:\begin{ttbox}fun dependent_tr' (q, r) (A :: Abs (x, T, B) :: ts) = if 0 mem (loose_bnos B) then let val (x', B') = Syntax.variant_abs' (x, dummyT, B) in list_comb (Const (q,dummyT) $ Syntax.mark_boundT (x',{\thinspace}T) $ A $ B', ts) end else list_comb (Const (r, dummyT) $ A $ B, ts) | dependent_tr' _ _ = raise Match;\end{ttbox}The argument {\tt (q,{\thinspace}r)} is supplied to the curried function {\tt dependent_tr'} by a partial application during its installation.For example, we could set up print translations for both {\tt Pi} and{\tt Sigma} by including\begin{ttbox}\index{*ML section}val print_translation = [("Pi", dependent_tr' ("{\at}PROD", "{\at}->")), ("Sigma", dependent_tr' ("{\at}SUM", "{\at}*"))];\end{ttbox}within the {\tt ML} section. The first of these transforms ${\tt Pi}(A, \mtt{Abs}(x, T, B))$ into $\hbox{\tt{\at}PROD}(x', A, B')$ or$\hbox{\tt{\at}->}(A, B)$, choosing the latter form if $B$ does notdepend on~$x$. It checks this using \ttindex{loose_bnos}, yet anotherfunction from {\tt Pure/term.ML}. Note that $x'$ is a version of $x$renamed away from all names in $B$, and $B'$ is the body $B$ with {\tt Bound} nodes referring to the {\tt Abs} node replaced by$\ttfct{Free} (x', \mtt{dummyT})$ (but marked as representing a boundvariable).We must be careful with types here. While types of {\tt Const}s areignored, type constraints may be printed for some {\tt Free}s and{\tt Var}s if \ttindex{show_types} is set to {\tt true}. Variables of type\ttindex{dummyT} are never printed with constraint, though. The line\begin{ttbox} let val (x', B') = Syntax.variant_abs' (x, dummyT, B);\end{ttbox}\index{*Syntax.variant_abs'}replaces bound variable occurrences in~$B$ by the free variable $x'$ withtype {\tt dummyT}. Only the binding occurrence of~$x'$ is given thecorrect type~{\tt T}, so this is the only place where a typeconstraint might appear.Also note that we are responsible to mark free identifiers thatactually represent bound variables. This is achieved by\ttindex{Syntax.variant_abs'} and \ttindex{Syntax.mark_boundT} above.Failing to do so may cause these names to be printed in the wrongstyle. \index{translations|)} \index{syntax!transformations|)}\section{Token translations} \label{sec:tok_tr}\index{token translations|(}%Isabelle's meta-logic features quite a lot of different kinds ofidentifiers, namely {\em class}, {\em tfree}, {\em tvar}, {\em free},{\em bound}, {\em var}. One might want to have these printed indifferent styles, e.g.\ in bold or italic, or even transcribed intosomething more readable like $\alpha, \alpha', \beta$ instead of {\tt 'a}, {\tt 'aa}, {\tt 'b} for type variables. Token translationsprovide a means to such ends, enabling the user to install certain\ML{} functions associated with any logical \rmindex{token class} anddepending on some \rmindex{print mode}.The logical class of identifiers can not necessarily be determined byits syntactic category, though. For example, consider free vs.\ boundvariables. So Isabelle's pretty printing mechanism, starting fromfully typed terms, has to be careful to preserve this additionalinformation\footnote{This is done by marking atoms in abstract syntax trees appropriately. The marks are actually visible by print translation functions -- they are just special constants applied to atomic asts, for example \texttt{("_bound" x)}.}. In particular,user-supplied print translation functions operating on terms have tobe well-behaved in this respect. Free identifiers introduced torepresent bound variables have to be marked appropriately (cf.\ theexample at the end of \S\ref{sec:tr_funs}).\medskip Token translations may be installed by declaring the\ttindex{token_translation} value within the \texttt{ML} section of a theorydefinition file:\begin{ttbox}val token_translation: (string * string * (string -> string * real)) list\end{ttbox}The elements of this list are of the form $(m, c, f)$, where $m$ is a printmode identifier, $c$ a token class, and $f\colon string \to string \timesreal$ the actual translation function. Assuming that $x$ is of identifierclass $c$, and print mode $m$ is the first (active) mode providing sometranslation for $c$, then $x$ is output according to $f(x) = (x', len)$.Thereby $x'$ is the modified identifier name and $len$ its visual length interms of characters (e.g.\ length 1.0 would correspond to $1/2$\,em in\LaTeX). Thus $x'$ may include non-printing parts like control sequences ormarkup information for typesetting systems.\index{token translations|)}%%% Local Variables: %%% mode: latex%%% TeX-master: "ref"%%% End: