changeset 291:a615050a7494
author:      lcp
date:        Wed, 23 Mar 1994 11:10:16 +0100
parent:      290:37d580c16af5
child:       292:cc69ef31cfc3
first draft of Springer volume
files: doc-src/Logics/LK.tex doc-src/Logics/defining.tex
--- a/doc-src/Logics/LK.tex	Tue Mar 22 12:43:51 1994 +0100
+++ b/doc-src/Logics/LK.tex	Wed Mar 23 11:10:16 1994 +0100
@@ -1,5 +1,5 @@
%% $Id$
-\chapter{First-order sequent calculus}
+\chapter{First-Order Sequent Calculus}
The directory~\ttindexbold{LK} implements classical first-order logic through
Gentzen's sequent calculus (see Gallier~\cite{gallier86} or
Takeuti~\cite{takeuti87}).  Resembling the method of semantic tableaux, the
@@ -244,7 +244,7 @@
\ttindex{res_inst_tac} can instantiate the variable~{\tt?P} in these rules,
specifying the formula to duplicate.

-See the files \ttindexbold{LK/lk.thy} and \ttindexbold{LK/lk.ML}
+See the files {\tt LK/lk.thy} and {\tt LK/lk.ML}
for complete listings of the rules and derived rules.

@@ -279,7 +279,7 @@

\item[\ttindexbold{cutL_tac} {\it P\/} {\it i}]
reads an \LK{} formula~$P$, and applies the cut rule to subgoal~$i$.  It
-then deletes some formula from the let side of the new subgoal $i+1$,
+then deletes some formula from the left side of the new subgoal $i+1$,
replacing that formula by~$P$.
\end{description}
All the structural rules --- cut, contraction, and thinning --- can be
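The effect of the cut rule described above can be pictured with a small Python sketch. This is purely illustrative and invented for this note (Isabelle's tactics are ML functions on theorems, not Python): a sequent $\Gamma \turn \Delta$ is modelled as a pair of formula lists, and cutting with $P$ yields the two expected subgoals.

```python
# Toy model: a sequent Gamma |- Delta as a pair (left, right) of formula
# lists.  Hypothetical representation, not Isabelle's ML implementation.

def cut(sequent, P):
    """Apply the cut rule with cut formula P.

    From  Gamma |- Delta  produce the two subgoals
      Gamma |- Delta, P   (first prove P)
      P, Gamma |- Delta   (then use P).
    """
    left, right = sequent
    return [(left, right + [P]), ([P] + left, right)]

# Cutting with P on the subgoal Q |- R:
subgoals = cut((["Q"], ["R"]), "P")
```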
@@ -309,10 +309,11 @@
checks that each conclusion formula is unifiable (using {\tt could_unify})
with some subgoal formula.

-\item[\ttindexbold{could_resolve} $(t,u)$]
-tests whether two sequents could be resolved.  Sequent $t$ is a premise
-(subgoal), while $u$ is the conclusion of an object-rule.  It uses {\tt
-could_res} to check the left and right sides of the two sequents.
+\item[\ttindexbold{could_resolve_seq} $(t,u)$]
+  tests whether two sequents could be resolved.  Sequent $t$ is a premise
+  (subgoal), while $u$ is the conclusion of an object-rule.  It simply
+  calls {\tt could_res} twice to check that both the left and the right
+  sides of the sequents are compatible.

\item[\ttindexbold{filseq_resolve_tac} {\it thms} {\it maxr} {\it i}]
uses {\tt filter_thms could_resolve} to extract the {\it thms} that are
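The quick filtering that {\tt could_res} and {\tt could_resolve_seq} perform can be sketched as follows. The helper names and the string representation of formulae are invented for illustration; the real functions work on Isabelle terms in ML and use {\tt could_unify} rather than this crude check.

```python
def could_match(pat, fm):
    # Crude stand-in for could_unify on atomic formulae: a schematic
    # formula (written with a leading "?") may match anything.
    return pat.startswith("?") or fm.startswith("?") or pat == fm

def could_res(concl_fms, goal_fms):
    # Each conclusion formula must be matchable with SOME subgoal
    # formula, mirroring the cheap filter described in the text.
    return all(any(could_match(c, g) for g in goal_fms) for c in concl_fms)

# Hypothetical rule table: (name, conclusion formulae).
rules = [("conjR", ["P & Q"]), ("allR", ["ALL x. P"]), ("basic", ["?P"])]
goal = ["P & Q", "R"]
applicable = [name for name, concl in rules if could_res(concl, goal)]
```

Only rules that survive this test are worth handing to full resolution, which is what makes the filtered tactic cheaper than blind resolution.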
@@ -324,10 +325,10 @@

\section{Packaging sequent rules}
-For theorem proving, rules can be classified as {\bf safe} or {\bf
-unsafe}.  An unsafe rule (typically a weakened quantifier rule) may reduce
-a provable goal to an unprovable set of subgoals, and should only be used
-as a last resort.
+For theorem proving, rules can be classified as {\bf safe} or {\bf unsafe}.
+An unsafe rule may reduce a provable goal to an unprovable set of subgoals,
+and should only be used as a last resort.  Typical examples are the
+weakened quantifier rules {\tt allL_thin} and {\tt exR_thin}.

A {\bf pack} is a pair whose first component is a list of safe
rules, and whose second is a list of unsafe rules.  Packs can be extended
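Since a pack is just a pair of rule lists, its intended use is easy to sketch in Python (invented helper names; the actual interface is in ML): safe rules are always tried before any unsafe rule.

```python
# A pack is a pair (safe rules, unsafe rules); rules are just names here.

def empty_pack():
    return ([], [])

def add_safes(pack, rules):
    safe, unsafe = pack
    return (safe + rules, unsafe)

def add_unsafes(pack, rules):
    safe, unsafe = pack
    return (safe, unsafe + rules)

def rule_order(pack):
    # Proof search tries every safe rule before any unsafe one,
    # so unsafe rules act as a last resort.
    safe, unsafe = pack
    return safe + unsafe

pack = add_unsafes(add_safes(empty_pack(), ["basic", "conjR"]), ["exR_thin"])
```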
@@ -541,7 +542,7 @@
$\turn \neg (\exists x. \forall y. y\in x \bimp y\not\in y)$
This does not require special properties of membership; we may
generalize $x\in y$ to an arbitrary predicate~$F(x,y)$.  The theorem has a
-short manual proof.  See the directory \ttindexbold{LK/ex} for many more
+short manual proof.  See the directory {\tt LK/ex} for many more
examples.

We set the main goal and move the negated formula to the left.
--- a/doc-src/Logics/defining.tex	Tue Mar 22 12:43:51 1994 +0100
+++ b/doc-src/Logics/defining.tex	Wed Mar 23 11:10:16 1994 +0100
@@ -1,128 +1,123 @@
%% $Id$
+%%  $$[a-zA-Z][a-zA-Z]}\.$$ $$[^ ]$$             \1  \2
+%%  @$$[a-z0-9]$$       ^{(\1)}

-\newcommand{\rmindex}[1]{{#1}\index{#1}\@}
-\newcommand{\mtt}[1]{\mbox{\tt #1}}
-\newcommand{\ttfct}[1]{\mathop{\mtt{#1}}\nolimits}
-\newcommand{\ttrel}[1]{\mathrel{\mtt{#1}}}
-\newcommand{\Constant}{\ttfct{Constant}}
-\newcommand{\Variable}{\ttfct{Variable}}
-\newcommand{\Appl}[1]{\ttfct{Appl}\mathopen{\mtt[}#1\mathclose{\mtt]}}
-
+\newcommand\rmindex[1]{{#1}\index{#1}\@}
+\newcommand\mtt[1]{\mbox{\tt #1}}
+\newcommand\ttfct[1]{\mathop{\mtt{#1}}\nolimits}
+\newcommand\ttapp{\mathrel{\hbox{\tt\$}}}
+\newcommand\Constant{\ttfct{Constant}}
+\newcommand\Variable{\ttfct{Variable}}
+\newcommand\Appl[1]{\ttfct{Appl}\mathopen{\mtt[}#1\mathclose{\mtt]}}
+\newcommand\AST{{\sc ast}}
+\let\rew=\longrightarrow

\chapter{Defining Logics} \label{Defining-Logics}

-This chapter is intended for Isabelle experts. It explains how to define new
-logical systems, Isabelle's {\em raison d'\^etre}. Isabelle logics are
-hierarchies of theories. A number of simple examples are contained in {\em
-Introduction to Isabelle}; the full syntax of theory definition files ({\tt
-.thy} files) is shown in {\em The Isabelle Reference Manual}. This chapter's
-chief purpose is a thorough description of all syntax related matters
-concerning theories. The most important sections are \S\ref{sec:mixfix} about
-mixfix declarations and \S\ref{sec:macros} describing the macro system. The
-concluding examples of \S\ref{sec:min_logics} are more concerned with the
-logical aspects of the definition of theories. Sections marked with * can be
-skipped on first reading.
+This chapter explains how to define new formal systems --- in particular,
+their concrete syntax. While Isabelle can be regarded as a theorem prover
+for set theory, higher-order logic or the sequent calculus, its
+distinguishing feature is support for the definition of new logics.
+Isabelle logics are hierarchies of theories, which are described and
+illustrated in {\em Introduction to Isabelle}. That material, together
+with the theory files provided in the examples directories, should suffice
+for all simple applications. The easiest way to define a new theory is by
+modifying a copy of an existing theory.
+
+This chapter is intended for experienced Isabelle users. It documents all
+aspects of theories concerned with syntax: mixfix declarations, pretty
+printing, macros and translation functions. The extended examples of
+\S\ref{sec:min_logics} demonstrate the logical aspects of the definition of
+theories. Sections marked with * are highly technical and might be skipped
+on the first reading.

-\section{Precedence grammars} \label{sec:precedence_grammars}
+\section{Priority grammars} \label{sec:priority_grammars}
+\index{grammars!priority|(}

-The precise syntax of a logic is best defined by a \rmindex{context-free
-grammar}. In order to simplify the description of mathematical languages, we
-introduce an extended format which permits {\bf
-precedences}\indexbold{precedence}. This scheme generalizes precedence
-declarations in \ML\ and {\sc prolog}. In this extended grammar format,
-nonterminals are decorated by integers, their precedence. In the sequel,
-precedences are shown as subscripts. A nonterminal $A@p$ on the right-hand
-side of a production may only be replaced using a production $A@q = \gamma$
-where $p \le q$.
+The syntax of an Isabelle logic is specified by a {\bf priority grammar}.
+A context-free grammar\index{grammars!context-free} contains a set of
+productions of the form $A=\gamma$, where $A$ is a nonterminal and
+$\gamma$, the right-hand side, is a string of terminals and nonterminals.
+Isabelle uses an extended format permitting {\bf priorities}, or
+precedences. Each nonterminal is decorated by an integer priority, as
+in~$A^{(p)}$. A nonterminal $A^{(p)}$ in a derivation may be replaced
+using a production $A^{(q)} = \gamma$ only if $p \le q$. Any priority
+grammar can be translated into a normal context free grammar by introducing
+new nonterminals and productions.

Formally, a set of context free productions $G$ induces a derivation
-relation $\rew@G$ on strings as follows:
-$$\alpha A@p \beta ~\rew@G~ \alpha\gamma\beta ~~~iff~~~
-  \exists (A@q=\gamma) \in G.~q \ge p
-$$
-Any extended grammar of this kind can be translated into a normal context
-free grammar. However, this translation may require the introduction of a
-large number of new nonterminals and productions.
+relation $\rew@G$. Let $\alpha$ and $\beta$ denote strings of terminal or
+nonterminal symbols. Then
+$$ \alpha\, A^{(p)}\, \beta ~\rew@G~ \alpha\,\gamma\,\beta $$
+if and only if $G$ contains some production $A^{(q)}=\gamma$ for~$q\ge p$.

-\begin{example} \label{ex:precedence}
The following simple grammar for arithmetic expressions demonstrates how
-binding power and associativity of operators can be enforced by precedences.
+binding power and associativity of operators can be enforced by priorities.
\begin{center}
\begin{tabular}{rclr}
-$A@9$ & = & {\tt0} \\
-$A@9$ & = & {\tt(}$A@0${\tt)} \\
-$A@0$ & = & $A@0$ {\tt+} $A@1$ \\
-$A@2$ & = & $A@3$ {\tt*} $A@2$ \\
-$A@3$ & = & {\tt-}$A@3$
+$A^{(9)}$ & = & {\tt0} \\
+$A^{(9)}$ & = & {\tt(}$A^{(0)}${\tt)} \\
+$A^{(0)}$ & = & $A^{(0)}$ {\tt+} $A^{(1)}$ \\
+$A^{(2)}$ & = & $A^{(3)}$ {\tt*} $A^{(2)}$ \\
+$A^{(3)}$ & = & {\tt-}$A^{(3)}$
\end{tabular}
\end{center}
-The choice of precedences determines that {\tt -} binds tighter than {\tt *}
-which binds tighter than {\tt +}, and that {\tt +} associates to the left and
-{\tt *} to the right.
-\end{example}
+The choice of priorities determines that {\tt -} binds tighter than {\tt *},
+which binds tighter than {\tt +}. Furthermore {\tt +} associates to the
+left and {\tt *} to the right.

To minimize the number of subscripts, we adopt the following conventions:
\begin{itemize}
-\item All precedences $p$ must be in the range $0 \leq p \leq max_pri$ for
-  some fixed $max_pri$.
-\item Precedence $0$ on the right-hand side and precedence $max_pri$ on the
+\item All priorities $p$ must be in the range $0 \leq p \leq max_pri$ for
+  some fixed integer $max_pri$.
+\item Priority $0$ on the right-hand side and priority $max_pri$ on the
  left-hand side may be omitted.
\end{itemize}
-In addition, we write the production $A@p = \alpha$ as $A = \alpha~(p)$,
-i.e.\ the precedence of the left-hand side actually appears in the uttermost
-right. Finally, alternatives may be separated by $|$, and repetition
+The production $A^{(p)} = \alpha$ is written as $A = \alpha~(p)$;
+the priority of the left-hand side actually appears in a column on the far
+right. Finally, alternatives may be separated by $|$, and repetition
indicated by \dots.

-Using these conventions and assuming $max_pri=9$, the grammar in
-Example~\ref{ex:precedence} becomes
+Using these conventions and assuming $max_pri=9$, the grammar takes the form
\begin{center}
\begin{tabular}{rclc}
$A$ & = & {\tt0} & \hspace*{4em} \\
 & $|$ & {\tt(}$A${\tt)} \\
- & $|$ & $A$ {\tt+} $A@1$ & (0) \\
- & $|$ & $A@3$ {\tt*} $A@2$ & (2) \\
- & $|$ & {\tt-}$A@3$ & (3)
+ & $|$ & $A$ {\tt+} $A^{(1)}$ & (0) \\
+ & $|$ & $A^{(3)}$ {\tt*} $A^{(2)}$ & (2) \\
+ & $|$ & {\tt-}$A^{(3)}$ & (3)
\end{tabular}
\end{center}
-
+\index{grammars!priority|)}

-\section{Basic syntax} \label{sec:basic_syntax}
-
-The basis of all extensions by object-logics is the \rmindex{Pure theory},
-bound to the \ML-identifier \ttindex{Pure.thy}. It contains, among many other
-things, the \rmindex{Pure syntax}. An informal account of this basic syntax
-(meta-logic, types etc.) may be found in {\em Introduction to Isabelle}. A
-more precise description using a precedence grammar is shown in
-Figure~\ref{fig:pure_gram}.
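The way priorities control binding power and associativity in the arithmetic grammar above can be demonstrated with a small recursive-descent parser. This Python sketch is illustrative only and appears nowhere in Isabelle; it implements the rule that a nonterminal of priority $p$ may use a production with left-hand priority $q$ only if $p \le q$.

```python
def parse(toks, p=0):
    """Parse an A of priority p; return (tree, remaining tokens)."""
    head, toks = toks[0], toks[1:]
    if head == '0':                      # A(9) = 0
        left = '0'
    elif head == '(':                    # A(9) = ( A(0) )
        left, toks = parse(toks, 0)
        assert toks[0] == ')'
        toks = toks[1:]
    elif head == '-' and p <= 3:         # A(3) = - A(3)
        arg, toks = parse(toks, 3)
        left = ('-', arg)
    else:
        raise SyntaxError(head)
    while toks and toks[0] in '+*':
        if toks[0] == '+' and p <= 0:    # A(0) = A(0) + A(1), left assoc
            right, toks = parse(toks[1:], 1)
            left = ('+', left, right)
        elif toks[0] == '*' and p <= 2:  # A(2) = A(3) * A(2), right assoc
            right, toks = parse(toks[1:], 2)
            left = ('*', left, right)
        else:
            break
    return left, toks

def parse_expr(s):
    tree, rest = parse([c for c in s if c != ' '])
    assert not rest
    return tree
```

Because {\tt +} parses its right argument at priority 1 and {\tt *} at priority 2, the sketch reproduces exactly the binding and associativity the priorities enforce.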
-
-\begin{figure}[htb]
+\begin{figure}
\begin{center}
\begin{tabular}{rclc}
$prop$ &=& \ttindex{PROP} $aprop$ ~~$|$~~ {\tt(} $prop$ {\tt)} \\
- & $|$ & $logic@3$ \ttindex{==} $logic@2$ & (2) \\
- & $|$ & $logic@3$ \ttindex{=?=} $logic@2$ & (2) \\
- & $|$ & $prop@2$ \ttindex{==>} $prop@1$ & (1) \\
- & $|$ & {\tt[|} $prop$ {\tt;} \dots {\tt;} $prop$ {\tt|]} {\tt==>} $prop@1$ & (1) \\
+ & $|$ & $logic^{(3)}$ \ttindex{==} $logic^{(2)}$ & (2) \\
+ & $|$ & $logic^{(3)}$ \ttindex{=?=} $logic^{(2)}$ & (2) \\
+ & $|$ & $prop^{(2)}$ \ttindex{==>} $prop^{(1)}$ & (1) \\
+ & $|$ & {\tt[|} $prop$ {\tt;} \dots {\tt;} $prop$ {\tt|]} {\tt==>} $prop^{(1)}$ & (1) \\
 & $|$ & {\tt!!} $idts$ {\tt.} $prop$ & (0) \\\\
$logic$ &=& $prop$ ~~$|$~~ $fun$ \\\\
$aprop$ &=& $id$ ~~$|$~~ $var$
- ~~$|$~~ $fun@{max_pri}$ {\tt(} $logic$ {\tt,} \dots {\tt,} $logic$ {\tt)} \\\\
+ ~~$|$~~ $fun^{(max_pri)}$ {\tt(} $logic$ {\tt,} \dots {\tt,} $logic$ {\tt)} \\\\
$fun$ &=& $id$ ~~$|$~~ $var$ ~~$|$~~ {\tt(} $fun$ {\tt)} \\
- & $|$ & $fun@{max_pri}$ {\tt(} $logic$ {\tt,} \dots {\tt,} $logic$ {\tt)} \\
- & $|$ & $fun@{max_pri}$ {\tt::} $type$ \\
+ & $|$ & $fun^{(max_pri)}$ {\tt(} $logic$ {\tt,} \dots {\tt,} $logic$ {\tt)} \\
+ & $|$ & $fun^{(max_pri)}$ {\tt::} $type$ \\
 & $|$ & \ttindex{\%} $idts$ {\tt.} $logic$ & (0) \\\\
-$idts$ &=& $idt$ ~~$|$~~ $idt@1$ $idts$ \\\\
+$idts$ &=& $idt$ ~~$|$~~ $idt^{(1)}$ $idts$ \\\\
$idt$ &=& $id$ ~~$|$~~ {\tt(} $idt$ {\tt)} \\
 & $|$ & $id$ \ttindex{::} $type$ & (0) \\\\
-$type$ &=& $tfree$ ~~$|$~~ $tvar$ ~~$|$~~ $tfree$ {\tt::} $sort$
+$type$ &=& $tid$ ~~$|$~~ $tvar$ ~~$|$~~ $tid$ {\tt::} $sort$
 ~~$|$~~ $tvar$ {\tt::} $sort$ \\
- & $|$ & $id$ ~~$|$~~ $type@{max_pri}$ $id$
+ & $|$ & $id$ ~~$|$~~ $type^{(max_pri)}$ $id$
 ~~$|$~~ {\tt(} $type$ {\tt,} \dots {\tt,} $type$ {\tt)} $id$ \\
- & $|$ & $type@1$ \ttindex{=>} $type$ & (0) \\
+ & $|$ & $type^{(1)}$ \ttindex{=>} $type$ & (0) \\
 & $|$ & {\tt[} $type$ {\tt,} \dots {\tt,} $type$ {\tt]} {\tt=>} $type$ & (0) \\
 & $|$ & {\tt(} $type$ {\tt)} \\\\
$sort$ &=& $id$ ~~$|$~~ {\tt\ttlbrace\ttrbrace}
@@ -132,142 +127,160 @@
\indexbold{idts@$idts$} \indexbold{logic@$logic$} \indexbold{prop@$prop$}
\indexbold{fun@$fun$}
\end{center}
-\caption{Meta-Logic Syntax}
-\label{fig:pure_gram}
+\caption{Meta-logic syntax}\label{fig:pure_gram}
\end{figure}

-The following main categories are defined:
-\begin{description}
-  \item[$prop$] Terms of type $prop$, i.e.\ formulae of the meta-logic.
+
+\section{The Pure syntax} \label{sec:basic_syntax}
+\index{syntax!Pure|(}

-  \item[$aprop$] Atomic propositions.
+At the root of all object-logics lies the Pure theory,\index{theory!Pure}
+bound to the \ML{} identifier \ttindex{Pure.thy}. It contains, among many
+other things, the Pure syntax. An informal account of this basic syntax
+(meta-logic, types, \ldots) may be found in {\em Introduction to Isabelle}.
+A more precise description using a priority grammar is shown in
+Fig.\ts\ref{fig:pure_gram}. The following nonterminals are defined:
+\begin{description}
+  \item[$prop$] Terms of type $prop$. These are formulae of the meta-logic.

-  \item[$logic$] Terms of types in class $logic$. Initially, $logic$ contains
-    merely $prop$. As the syntax is extended by new object-logics, more
-    productions for $logic$ are added automatically (see below).
+  \item[$aprop$] Atomic propositions. These typically include the
+    judgement forms of the object-logic; its definition introduces a
+    meta-level predicate for each judgement form.
+
+  \item[$logic$] Terms whose type belongs to class $logic$. Initially,
+    this category contains just $prop$. As the syntax is extended by new
+    object-logics, more productions for $logic$ are added automatically
+    (see below).

  \item[$fun$] Terms potentially of function type.

-  \item[$type$] Meta-types.
+  \item[$type$] Types of the meta-logic.

-  \item[$idts$] A list of identifiers, possibly constrained by types. Note
-    that \verb|x :: nat y| is parsed as \verb|x :: (nat y)|, i.e.\ {\tt y}
-    would be treated like a type constructor applied to {\tt nat}.
+  \item[$idts$] A list of identifiers, possibly constrained by types.
\end{description}
-
-\subsection{Logical types and default syntax}
+\begin{warn}
+  Note that \verb|x::nat y| is parsed as \verb|x::(nat y)|, treating {\tt
+  y} like a type constructor applied to {\tt nat}. The likely result is
+  an error message. To avoid this interpretation, use parentheses and
+  write \verb|(x::nat) y|.
-Isabelle is concerned with mathematical languages which have a certain
-minimal vocabulary: identifiers, variables, parentheses, and the lambda
-calculus. Logical types, i.e.\ those of class $logic$, are automatically
-equipped with this basic syntax. More precisely, for any type constructor
-$ty$ with arity $(\vec{s})c$, where $c$ is a subclass of $logic$, the
-following productions are added:
+  Similarly, \verb|x::nat y::nat| is parsed as \verb|x::(nat y::nat)| and
+  yields a syntax error. The correct form is \verb|(x::nat) (y::nat)|.
+\end{warn}
+
+\subsection{Logical types and default syntax}\label{logical-types}
+Isabelle's representation of mathematical languages is based on the typed
+$\lambda$-calculus. All logical types, namely those of class $logic$, are
+automatically equipped with a basic syntax of types, identifiers,
+variables, parentheses, $\lambda$-abstractions and applications.
+
+More precisely, for each type constructor $ty$ with arity $(\vec{s})c$,
+where $c$ is a subclass of $logic$, several productions are added:
\begin{center}
\begin{tabular}{rclc}
$ty$ &=& $id$ ~~$|$~~ $var$ ~~$|$~~ {\tt(} $ty$ {\tt)} \\
- & $|$ & $fun@{max_pri}$ {\tt(} $logic$ {\tt,} \dots {\tt,} $logic$ {\tt)} \\
- & $|$ & $ty@{max_pri}$ {\tt::} $type$ \\\\
+ & $|$ & $fun^{(max_pri)}$ {\tt(} $logic$ {\tt,} \dots {\tt,} $logic$ {\tt)} \\
+ & $|$ & $ty^{(max_pri)}$ {\tt::} $type$ \\\\
$logic$ &=& $ty$
\end{tabular}
\end{center}

-\subsection{Lexical matters *}
-
-The parser does not process input strings directly, it rather operates on
-token lists provided by Isabelle's \rmindex{lexical analyzer} (the
-\bfindex{lexer}). There are two different kinds of tokens: {\bf
-  delimiters}\indexbold{delimiter} and {\bf names}\indexbold{name}.
+\subsection{Lexical matters}
+The parser does not process input strings directly. It operates on token
+lists provided by Isabelle's \bfindex{lexer}. There are two kinds of
+tokens: \bfindex{delimiters} and \bfindex{name tokens}.

-Delimiters can be regarded as reserved words\index{reserved word} of the
-syntax and the user can add new ones, when extending theories. In
-Figure~\ref{fig:pure_gram} they appear in typewriter font, e.g.\ {\tt PROP},
-{\tt ==}, {\tt =?=}, {\tt ;}.
+Delimiters can be regarded as reserved words of the syntax. You can
+add new ones when extending theories. In Fig.\ts\ref{fig:pure_gram} they
+appear in typewriter font, for example {\tt ==}, {\tt =?=} and
+{\tt PROP}\@.

-Names on the other hand have a fixed predefined syntax. The lexer
-distinguishes four kinds of them: identifiers\index{identifier},
-unknowns\index{unknown}\index{scheme variable|see{unknown}}, type
-variables\index{type variable}, type unknowns\index{type unknown}\index{type
-scheme variable|see{type unknown}}; they are denoted by $id$\index{id@$id$},
-$var$\index{var@$var$}, $tfree$\index{tfree@$tfree$},
-$tvar$\index{tvar@$tvar$}, respectively. Typical examples are {\tt x}, {\tt
-?x7}, {\tt 'a}, {\tt ?'a3}, the exact syntax is:
+Name tokens have a predefined syntax. The lexer distinguishes four
+disjoint classes of names: \rmindex{identifiers}, \rmindex{unknowns}, type
+identifiers\index{identifiers!type}, type unknowns\index{unknowns!type}.
+They are denoted by $id$\index{id@$id$}, $var$\index{var@$var$},
+$tid$\index{tid@$tid$}, $tvar$\index{tvar@$tvar$}, respectively. Typical
+examples are {\tt x}, {\tt ?x7}, {\tt 'a}, {\tt ?'a3}. Here is the precise
+syntax:
\begin{eqnarray*}
id & = & letter~quasiletter^* \\
var & = & \mbox{\tt ?}id ~~|~~ \mbox{\tt ?}id\mbox{\tt .}nat \\
-tfree & = & \mbox{\tt '}id \\
-tvar & = & \mbox{\tt ?}tfree ~~|~~
-  \mbox{\tt ?}tfree\mbox{\tt .}nat \\[1ex]
+tid & = & \mbox{\tt '}id \\
+tvar & = & \mbox{\tt ?}tid ~~|~~
+  \mbox{\tt ?}tid\mbox{\tt .}nat \\[1ex]
letter & = & \mbox{one of {\tt a}\dots {\tt z} {\tt A}\dots {\tt Z}} \\
digit & = & \mbox{one of {\tt 0}\dots {\tt 9}} \\
quasiletter & = & letter ~~|~~ digit ~~|~~ \mbox{\tt _} ~~|~~ \mbox{\tt '} \\
nat & = & digit^+
\end{eqnarray*}
-A var or tvar describes an \rmindex{unknown} which is internally a pair
+A var or tvar describes an unknown, which is internally a pair
of base name and index (\ML\ type \ttindex{indexname}). These components are
-either explicitly separated by a dot as in {\tt ?x.1} or {\tt ?x7.3} or
-directly run together as in {\tt ?x1}. The latter form is possible, if the
-base name does not end with digits; if the index is 0, it may be dropped
-altogether: {\tt ?x} abbreviates {\tt ?x0} or {\tt ?x.0}.
+either separated by a dot as in {\tt ?x.1} or {\tt ?x7.3} or
+run together as in {\tt ?x1}. The latter form is possible if the
+base name does not end with digits. If the index is 0, it may be dropped
+altogether: {\tt ?x} abbreviates both {\tt ?x0} and {\tt ?x.0}.

-Note that id, var, tfree, tvar are mutually disjoint, but it is
-perfectly legal that they overlap with the set of delimiters (e.g.\ {\tt
-  PROP}, {\tt ALL}, {\tt EX}).
+The lexer repeatedly takes the maximal prefix of the input string that
+forms a valid token. A maximal prefix that is both a delimiter and a name
+is treated as a delimiter. Spaces, tabs and newlines are separators; they
+never occur within tokens.

-The lexical analyzer translates input strings to token lists by repeatedly
-taking the maximal prefix of the input string that forms a valid token. A
-maximal prefix that is both a delimiter and a name is treated as a delimiter.
-Spaces, tabs and newlines are separators; they never occur within tokens.
+Delimiters need not be separated by white space. For example, if {\tt -}
+is a delimiter but {\tt --} is not, then the string {\tt --} is treated as
+two consecutive occurrences of the token~{\tt -}. In contrast, \ML\
+treats {\tt --} as a single symbolic name. The consequence of Isabelle's
+more liberal scheme is that the same string may be parsed in different ways
+after extending the syntax: after adding {\tt --} as a delimiter, the input
+{\tt --} is treated as a single token.

-Note that delimiters need not necessarily be surrounded by white space to be
-recognized as separate. For example, if {\tt -} is a delimiter but {\tt --}
-is not, then the string {\tt --} is treated as two consecutive occurrences of
-{\tt -}. This is in contrast to \ML\ which always treats {\tt --} as a single
-symbolic name. The consequence of Isabelle's more liberal scheme is that the
-same string may be parsed in different ways after extending the syntax: after
-adding {\tt --} as a delimiter, the input {\tt --} is treated as a single
-token.
+Name tokens are terminal symbols, strictly speaking, but we can generally
+regard them as nonterminals. This is because a name token carries with it
+useful information, the name. Delimiters, on the other hand, are nothing
+but syntactic sugar.

-\subsection{Inspecting syntax *}
-
-You may get the \ML\ representation of the syntax of any Isabelle theory by
-applying \index{*syn_of}
-\begin{ttbox}
-  syn_of: theory -> Syntax.syntax
-\end{ttbox}
-\ttindex{Syntax.syntax} is an abstract type. Values of this type can be
-displayed by the following functions: \index{*Syntax.print_syntax}
-\index{*Syntax.print_gram} \index{*Syntax.print_trans}
+\subsection{*Inspecting the syntax}
\begin{ttbox}
-  Syntax.print_syntax: Syntax.syntax -> unit
-  Syntax.print_gram: Syntax.syntax -> unit
-  Syntax.print_trans: Syntax.syntax -> unit
+syn_of : theory -> Syntax.syntax
+Syntax.print_syntax : Syntax.syntax -> unit
+Syntax.print_gram : Syntax.syntax -> unit
+Syntax.print_trans : Syntax.syntax -> unit
\end{ttbox}
-{\tt Syntax.print_syntax} shows virtually all information contained in a
-syntax. Its output is divided into labeled sections. The syntax proper is
-represented by {\tt lexicon}, {\tt roots} and {\tt prods}. The rest refers to
-the manifold facilities to apply syntactic translations (macro expansion
-etc.).
+The abstract type \ttindex{Syntax.syntax} allows manipulation of syntaxes
+in \ML. You can display values of this type by calling the following
+functions:
+\begin{description}
+\item[\ttindexbold{syn_of} {\it thy}] returns the syntax of the Isabelle
+  theory~{\it thy} as an \ML\ value.
+
+\item[\ttindexbold{Syntax.print_syntax} {\it syn}] shows virtually all
+  information contained in the syntax {\it syn}. The displayed output can
+  be large. The following two functions are more selective.

-To cope with the verbosity of {\tt Syntax.print_syntax}, there are
-\ttindex{Syntax.print_gram} and \ttindex{Syntax.print_trans} to print the
-syntax proper and the translation related information only.
+\item[\ttindexbold{Syntax.print_gram} {\it syn}] shows the grammar part
+  of~{\it syn}, namely the lexicon, roots and productions.

-Let's have a closer look at part of Pure's syntax:
+\item[\ttindexbold{Syntax.print_trans} {\it syn}] shows the translation
+  part of~{\it syn}, namely the constants, parse/print macros and
+  parse/print translations.
+\end{description}
+
+Let us demonstrate these functions by inspecting Pure's syntax. Even that
+is too verbose to display in full.
\begin{ttbox}
Syntax.print_syntax (syn_of Pure.thy);
-{\out lexicon: "!!" "\%" "(" ")" "," "." "::" ";" "==" "==>" \dots}
{\out roots: logic type fun prop}
{\out prods:}
-{\out   type = tfree  (1000)}
+{\out   type = tid  (1000)}
{\out   type = tvar  (1000)}
{\out   type = id  (1000)}
-{\out   type = tfree "::" sort[0] => "_ofsort"  (1000)}
+{\out   type = tid "::" sort[0] => "_ofsort"  (1000)}
{\out   type = tvar "::" sort[0] => "_ofsort"  (1000)}
{\out \vdots}
+\ttbreak
{\out consts: "_K" "_appl" "_aprop" "_args" "_asms" "_bigimpl" \dots}
{\out parse_ast_translation: "_appl" "_bigimpl" "_bracket"}
{\out   "_idtyp" "_lambda" "_tapp" "_tappl"}
@@ -278,416 +291,200 @@
{\out print_ast_translation: "==>" "_abs" "_idts" "fun"}
\end{ttbox}

+As you can see, the output is divided into labeled sections. The grammar
+is represented by {\tt lexicon}, {\tt roots} and {\tt prods}. The rest
+refers to syntactic translations and macro expansion. Here is an
+explanation of the various sections.
\begin{description}
-  \item[\ttindex{lexicon}] The set of delimiters used for lexical analysis.
+  \item[\ttindex{lexicon}] lists the delimiters used for lexical
+    analysis.\index{delimiters}

-  \item[\ttindex{roots}]
-    The legal syntactic categories to start parsing with. You name the
-    desired root directly as a string when calling lower level functions or
-    specifying macros. Higher level functions usually expect a type and
-    derive the actual root as follows:\index{root_of_type@$root_of_type$}
-    \begin{itemize}
-      \item $root_of_type(\tau@1 \To \tau@2) = \mtt{fun}$.
-      \item $root_of_type(\tau@1 \mathrel{ty} \tau@2) = ty$.
-      \item $root_of_type((\tau@1, \dots, \tau@n)ty) = ty$.
-      \item $root_of_type(\alpha) = \mtt{logic}$.
-    \end{itemize}
-    Thereby $\tau@1, \dots, \tau@n$ are types, $ty$ an infix or ordinary type
-    constructor and $\alpha$ a type variable or unknown. Note that only the
-    outermost type constructor is taken into account.
+  \item[\ttindex{roots}] lists the grammar's nonterminal symbols. You must
+    name the desired root when calling lower level functions or specifying
+    macros. Higher level functions usually expect a type and derive the
+    actual root as described in~\S\ref{sec:grammar}.

-  \item[\ttindex{prods}]
-    The list of productions describing the precedence grammar. Nonterminals
-    $A@n$ are rendered in {\sc ascii} as {\tt A[n]}, delimiters are
-    quoted. Some productions have strings attached after an {\tt =>}. These
-    strings later become the heads of parse trees, but they also play a vital
-    role when terms are printed (see \S\ref{sec:asts}).
+  \item[\ttindex{prods}] lists the productions of the priority grammar.
+    The nonterminal $A^{(n)}$ is rendered in {\sc ascii} as {\tt A[n]}.
+    Each delimiter is quoted. Some productions are shown with {\tt =>} and
+    an attached string. These strings later become the heads of parse
+    trees; they also play a vital role when terms are printed (see
+    \S\ref{sec:asts}).

-    Productions which do not have string attached and thus do not create a
-    new parse tree node are called {\bf copy productions}\indexbold{copy
-    production}. They must have exactly one
-    argument\index{argument!production} (i.e.\ nonterminal or name token)
-    on the right-hand side. The parse tree generated when parsing that
-    argument is simply passed up as the result of parsing the whole copy
-    production.
+    Productions with no strings attached are called {\bf copy
+    productions}\indexbold{productions!copy}. Their right-hand side must
+    have exactly one nonterminal symbol (or name token). The parser does
+    not create a new parse tree node for copy productions, but simply
+    returns the parse tree of the right-hand symbol.

-    A special kind of copy production is one where the argument is a
-    nonterminal and no additional syntactic sugar (delimiters) exists. It is
-    called a \bfindex{chain production}. Chain productions should be seen as
-    an abbreviation mechanism: conceptually, they are removed from the
-    grammar by adding appropriate new rules. Precedence information attached
-    to chain productions is ignored, only the dummy value -1 is displayed.
+    If the right-hand side consists of a single nonterminal with no
+    delimiters, then the copy production is called a {\bf chain
+    production}\indexbold{productions!chain}. Chain productions should
+    be seen as abbreviations: conceptually, they are removed from the
+    grammar by adding new productions. Priority information
+    attached to chain productions is ignored, only the dummy value -1 is
+    displayed.

  \item[\ttindex{consts}, \ttindex{parse_rules}, \ttindex{print_rules}]
-    This is macro related information (see \S\ref{sec:macros}).
-
-  \item[\tt *_translation]
-    \index{*parse_ast_translation} \index{*parse_translation}
-    \index{*print_translation} \index{*print_ast_translation}
-    The sets of constants that invoke translation functions. These are more
-    arcane matters (see \S\ref{sec:asts} and \S\ref{sec:tr_funs}).
-\end{description}
-
-You may inspect the syntax of any theory using the above calling
-sequence. Beware that, as more and more material accumulates, the output
-becomes longer and longer.
-% When extending syntaxes, new {\tt roots}, {\tt prods}, {\tt parse_rules}
-% and {\tt print_rules} are appended to the end. The other lists are
-% displayed sorted.
-
-
-\section{Abstract syntax trees} \label{sec:asts}
-
-Figure~\ref{fig:parse_print} shows a simplified model of the parsing and
-printing process.
-
-\begin{figure}[htb]
-\begin{center}
-\begin{tabular}{cl}
-string                  & \\
-$\downarrow$            & parser \\
-parse tree              & \\
-$\downarrow$            & \rmindex{parse ast translation} \\
-ast                     & \\
-$\downarrow$            & ast rewriting (macros) \\
-ast                     & \\
-$\downarrow$            & \rmindex{parse translation}, type-inference \\
---- well-typed term --- & \\
-$\downarrow$            & \rmindex{print translation} \\
-ast                     & \\
-$\downarrow$            & ast rewriting (macros) \\
-ast                     & \\
-$\downarrow$            & \rmindex{print ast translation}, printer \\
-string                  &
-\end{tabular}
-\end{center}
-\caption{Parsing and Printing}
-\label{fig:parse_print}
-\end{figure}
-
-The parser takes an input string (more precisely the token list produced by
-the lexer) and produces a \rmindex{parse tree}, which is directly derived
-from the productions involved. By applying some internal transformations the
-parse tree becomes an \bfindex{abstract syntax tree} (\bfindex{ast}). Macro
-expansion, further translations and finally type inference yields a
-well-typed term\index{term!well-typed}.
-
-The printing process is the reverse, except for some subtleties to be
-discussed later.
-
-Asts are an intermediate form between the very raw parse trees and the typed
-$\lambda$-terms. An ast is either an atom (constant or variable) or a list of
-{\em at least two\/} subasts. Internally, they have type
-\ttindex{Syntax.ast}: \index{*Constant} \index{*Variable} \index{*Appl}
-\begin{ttbox}
-datatype ast = Constant of string
-             | Variable of string
-             | Appl of ast list
-\end{ttbox}
-
-We write constant atoms as quoted strings, variable atoms as non-quoted
-strings and applications as list of subasts separated by space and enclosed
-in parentheses. For example:
-\begin{ttbox}
-  Appl [Constant "_constrain",
-    Appl [Constant "_abs", Variable "x", Variable "t"],
-    Appl [Constant "fun", Variable "'a", Variable "'b"]]
-\end{ttbox}
-is written as {\tt ("_constrain" ("_abs" x t) ("fun" 'a 'b))}.
-Note that {\tt ()} and {\tt (f)} are both illegal.
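The {\tt Syntax.ast} datatype and the notation for writing asts have direct analogues outside ML. The following Python sketch is illustrative only (it mirrors the ML datatype, it is not part of Isabelle), including the rule that an application must have at least two subasts.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Constant:
    name: str

@dataclass
class Variable:
    name: str

@dataclass
class Appl:
    asts: List["Ast"]   # must contain at least two subasts

Ast = Union[Constant, Variable, Appl]

def show(ast: Ast) -> str:
    # Constants print quoted, variables unquoted, applications as
    # space-separated lists in parentheses.
    if isinstance(ast, Constant):
        return '"%s"' % ast.name
    if isinstance(ast, Variable):
        return ast.name
    assert len(ast.asts) >= 2, "() and (f) are both illegal"
    return "(" + " ".join(show(a) for a in ast.asts) + ")"

example = Appl([Constant("_constrain"),
                Appl([Constant("_abs"), Variable("x"), Variable("t")]),
                Appl([Constant("fun"), Variable("'a"), Variable("'b")])])
```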
-
-The resemblance of Lisp's S-expressions is intentional, but notice the two
-kinds of atomic symbols: $\Constant x$ and $\Variable x$. This distinction
-has some more obscure reasons and you can ignore it about half of the time.
-You should not take the names ``{\tt Constant}'' and ``{\tt Variable}'' too
-literally. In the later translation to terms, $\Variable x$ may become a
-constant, free or bound variable, even a type constructor or class name; the
-actual outcome depends on the context.
-
-Similarly, you can think of {\tt (} $f~x@1~\ldots~x@n$ {\tt )} as some sort of
-application of $f$ to the arguments $x@1, \ldots, x@n$. But the actual kind
-of application (say a type constructor $f$ applied to types $x@1, \ldots,
-x@n$) is determined later by context, too.
-
-Forms like {\tt (("_abs" x t) u)} are perfectly legal, but asts are not
-higher-order: the {\tt "_abs"} does not yet bind the {\tt x} in any way,
-though later at the term level, {\tt ("_abs" x t)} will become an {\tt Abs}
-node and occurrences of {\tt x} in $t$ will be replaced by {\tt Bound}s. Even
-if non-constant heads of applications may seem unusual, asts should be
-regarded as first-order. Purists may think of {\tt (} $f~x@1~\ldots~x@n$ {\tt
-)} as a first-order application of some invisible $(n+1)$-ary constant.
-
-
-\subsection{Parse trees to asts}
-
-Asts are generated from parse trees by applying some translation functions,
-which are internally called {\bf parse ast translations}\indexbold{parse ast
-translation}.
-
-We can think of the raw output of the parser as trees built up by nesting the
-right-hand sides of those productions that were used to derive a word that
-matches the input token list. Then parse trees are simply lists of tokens and
-sub parse trees, the latter replacing the nonterminals of the productions.
-Note that we refer here to the actual productions in their internal form as
-displayed by {\tt Syntax.print_syntax}.
-
-Ignoring parse ast translations, the mapping
-$ast_of_pt$\index{ast_of_pt@$ast_of_pt$} from parse trees to asts is derived
-from the productions involved as follows:
-\begin{itemize}
-  \item Name tokens: $ast_of_pt(t) = \Variable s$, where $t$ is an id,
-    var, tfree or tvar token, and $s$ its associated string.
-
-  \item Copy productions: $ast_of_pt(\ldots P \ldots) = ast_of_pt(P)$.
-
-  \item 0-ary productions: $ast_of_pt(\ldots \mtt{=>} c) = \Constant c$.
+  relate to macros (see \S\ref{sec:macros}).
-  \item n-ary productions: $ast_of_pt(\ldots P@1 \ldots P@n \ldots \mtt{=>}
-    c) = \Appl{\Constant c, ast_of_pt(P@1), \ldots, ast_of_pt(P@n)}$,
-    where $n \ge 1$.
-\end{itemize}
-where $P$, $P@1$, \ldots, $P@n$ denote parse trees or name tokens and ``\dots''
-zero or more delimiters. Delimiters are stripped and do not appear in asts.
-The following table presents some simple examples:
-\begin{center}
-{\tt\begin{tabular}{ll}
-\rm input string & \rm ast \\\hline
-"f" & f \\
-"'a" & 'a \\
-"t == u" & ("==" t u) \\
-"f(x)" & ("_appl" f x) \\
-"f(x, y)" & ("_appl" f ("_args" x y)) \\
-"f(x, y, z)" & ("_appl" f ("_args" x ("_args" y z))) \\
-"\%x y.\ t" & ("_lambda" ("_idts" x y) t) \\
-\end{tabular}}
-\end{center}
-Some of these examples illustrate why further translations are desirable in
-order to provide some nice standard form of asts before macro expansion takes
-place. Hence the Pure syntax provides predefined parse ast
-translations\index{parse ast translation!of Pure} for ordinary applications,
-type applications, nested abstractions, meta implications and function types.
-Their net effect on some representative input strings is shown in
-Figure~\ref{fig:parse_ast_tr}.
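The $ast_of_pt$ equations above can be sketched as a short executable model (an editor's illustration in Python, not Isabelle's ML; the representation of a parse tree as a pair of a production constant and its subtrees, with delimiters already stripped, is an assumption made only for this sketch):

```python
# Illustrative sketch of ast_of_pt.  Assumed representation: a parse tree
# is either a name token (a plain string) or a pair (c, subtrees) for a
# production "... => c"; an empty constant name marks a copy production.
# Asts are modelled as strings (atoms) and lists (applications).

def ast_of_pt(pt):
    if isinstance(pt, str):              # name token: becomes a Variable
        return pt
    c, subtrees = pt
    if c == "":                          # copy production: pass through
        (p,) = subtrees
        return ast_of_pt(p)
    if not subtrees:                     # 0-ary production: Constant c
        return '"%s"' % c
    # n-ary production: Appl [Constant c, ast_of_pt(P1), ..., ast_of_pt(Pn)]
    return ['"%s"' % c] + [ast_of_pt(p) for p in subtrees]

def show(ast):
    """Display an ast in the S-expression-like notation of the text."""
    if isinstance(ast, list):
        return "(" + " ".join(show(a) for a in ast) + ")"
    return ast

# "f(x, y)" from the table, assuming it parsed into _appl/_args nodes:
pt = ("_appl", ["f", ("_args", ["x", "y"])])
print(show(ast_of_pt(pt)))
# ("_appl" f ("_args" x y))
```

The output matches the corresponding row of the table above.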
-
-\begin{figure}[htb]
-\begin{center}
-{\tt\begin{tabular}{ll}
-\rm string & \rm ast \\\hline
-"f(x, y, z)" & (f x y z) \\
-"'a ty" & (ty 'a) \\
-"('a, 'b) ty" & (ty 'a 'b) \\
-"\%x y z.\ t" & ("_abs" x ("_abs" y ("_abs" z t))) \\
-"\%x ::\ 'a.\ t" & ("_abs" ("_constrain" x 'a) t) \\
-"[| P; Q; R |] => S" & ("==>" P ("==>" Q ("==>" R S))) \\
-"['a, 'b, 'c] => 'd" & ("fun" 'a ("fun" 'b ("fun" 'c 'd)))
-\end{tabular}}
-\end{center}
-\caption{Built-in Parse Ast Translations}
-\label{fig:parse_ast_tr}
-\end{figure}
-
-This translation process is guided by names of constant heads of asts. The
-list of constants invoking parse ast translations is shown in the output of
-{\tt Syntax.print_syntax} under {\tt parse_ast_translation}.
-
-
-\subsection{Asts to terms *}
-
-After the ast has been normalized by the macro expander (see
-\S\ref{sec:macros}), it is transformed into a term with yet another set of
-translation functions interspersed: {\bf parse translations}\indexbold{parse
-translation}. The result is a non-typed term\index{term!non-typed} which may
-contain type constraints, that is 2-place applications with head {\tt
-"_constrain"}. The second argument of a constraint is a type encoded as term.
-Real types will be introduced later during type inference, which is not
-discussed here.
-
-If we ignore the built-in parse translations of Pure first, then the mapping
-$term_of_ast$\index{term_of_ast@$term_of_ast$} from asts to (non-typed) terms
-is defined by:
-\begin{itemize}
-  \item $term_of_ast(\Constant x) = \ttfct{Const} (x, \mtt{dummyT})$.
-
-  \item $term_of_ast(\Variable \mtt{"?}xi\mtt") = \ttfct{Var} ((x, i),
-    \mtt{dummyT})$, where $x$ is the base name and $i$ the index extracted
-    from $xi$.
-
-  \item $term_of_ast(\Variable x) = \ttfct{Free} (x, \mtt{dummyT})$.
+  \item[\ttindex{parse_ast_translation}, \ttindex{print_ast_translation}]
+    list sets of constants that invoke translation functions for abstract
+    syntax trees.
Section \S\ref{sec:asts} below discusses this obscure
+    matter.
-  \item $term_of_ast(\Appl{f, x@1, \ldots, x@n}) = term_of_ast(f) ~{\tt\$}~
-    term_of_ast(x@1) {\tt\$} \dots\ {\tt\$} term_of_ast(x@n)$.
-\end{itemize}
-Note that {\tt Const}, {\tt Var}, {\tt Free} belong to the datatype {\tt
-term} and \ttindex{dummyT} is bound to some dummy {\tt typ}, which is ignored
-during type-inference.
-
-So far the outcome is still a first-order term. {\tt Abs}s and {\tt Bound}s
-are introduced by parse translations associated with {\tt "_abs"} and {\tt
-"!!"} (and any other user defined binder).
-
-
-\subsection{Printing of terms *}
-
-When terms are prepared for printing, they are first transformed into asts
-via $ast_of_term$\index{ast_of_term@$ast_of_term$} (ignoring {\bf print
-translations}\indexbold{print translation}):
-\begin{itemize}
-  \item $ast_of_term(\ttfct{Const} (x, \tau)) = \Constant x$.
-
-  \item $ast_of_term(\ttfct{Free} (x, \tau)) = constrain (\Variable x,
-    \tau)$.
-
-  \item $ast_of_term(\ttfct{Var} ((x, i), \tau)) = constrain (\Variable
-    \mtt{"?}xi\mtt", \tau)$, where $\mtt?xi$ is the string representation of
-    the {\tt indexname} $(x, i)$.
-
-  \item $ast_of_term(\ttfct{Abs} (x, \tau, t)) = \ttfct{Appl}
-    \mathopen{\mtt[} \Constant \mtt{"_abs"}, constrain(\Variable x', \tau),
-    ast_of_term(t') \mathclose{\mtt]}$, where $x'$ is a version of $x$
-    renamed away from all names occurring in $t$, and $t'$ the body $t$ with
-    all {\tt Bound}s referring to this {\tt Abs} replaced by \ttfct{Free}
-    $(x', \mtt{dummyT})$.
-
-  \item $ast_of_term(\ttfct{Bound} i) = \Variable \mtt{"B.}i\mtt"$. Note that
-    the occurrence of loose bound variables is abnormal and should never
-    happen when printing well-typed terms.
-
-  \item $ast_of_term(f \ttrel{\$} x@1 \ttrel{\$} \ldots \ttrel{\$} x@n) =
-    \ttfct{Appl} \mathopen{\mtt[} ast_of_term(f), ast_of_term(x@1), \ldots,
-    ast_of_term(x@n) \mathclose{\mtt]}$, where $f$ is not an application.
-
-  \item $constrain(x, \tau) = x$, if $\tau = \mtt{dummyT}$ \index{*dummyT} or
-    \ttindex{show_types} not set to {\tt true}.
-
-  \item $constrain(x, \tau) = \Appl{\Constant \mtt{"_constrain"}, x, ty}$,
-    where $ty$ is the ast encoding of $\tau$. That is: type constructors as
-    {\tt Constant}s, type variables as {\tt Variable}s and type applications
-    as {\tt Appl}s with the head type constructor as first element.
-    Additionally, if \ttindex{show_sorts} is set to {\tt true}, some type
-    variables are decorated with an ast encoding of their sort.
-\end{itemize}
-
-\medskip
-After an ast has been normalized wrt.\ the print macros, it is transformed
-into the final output string. The built-in {\bf print ast
-translations}\indexbold{print ast translation} are essentially the reverse
-ones of the parse ast translations of Figure~\ref{fig:parse_ast_tr}.
-
-For the actual printing process, the names attached to grammar productions of
-the form $\ldots A@{p@1}^1 \ldots A@{p@n}^n \ldots \mtt{=>} c$ play a vital
-role. Whenever an ast with constant head $c$, i.e.\ $\mtt"c\mtt"$ or
-($\mtt"c\mtt"$~$x@1$ \ldots $x@n$), is encountered it is printed according to
-the production for $c$. This means that first the arguments $x@1$ \dots $x@n$
-are printed, then put in parentheses if necessary for reasons of precedence,
-and finally joined to a single string, separated by the syntactic sugar of
-the production (denoted by ``\dots'' above).
-
-If an application ($\mtt"c\mtt"$~$x@1$ \ldots $x@m$) has more arguments than the
-corresponding production, it is first split into (($\mtt"c\mtt"$~$x@1$ \ldots
-$x@n$) ~ $x@{n+1}$ \ldots $x@m$). Applications with too few arguments or with
-non-constant head or without a corresponding production are printed as
-$f(x@1, \ldots, x@l)$ or $(\alpha@1, \ldots, \alpha@l) ty$. A single
-$\Variable x$ is simply printed as $x$.
-
-Note that the system does {\em not\/} insert blanks automatically.
They
-should be part of the mixfix declaration the production has been derived
-from, if they are required to separate tokens. Mixfix declarations may also
-contain pretty printing annotations.
-
+  \item[\ttindex{parse_translation}, \ttindex{print_translation}] list sets
+    of constants that invoke translation functions for terms (see
+    \S\ref{sec:tr_funs}).
+\end{description}
+\index{syntax!Pure|)}
 
 
 \section{Mixfix declarations} \label{sec:mixfix}
+\index{mixfix declaration|(}
 
-When defining theories, the user usually declares new constants as names
-associated with a type followed by an optional \bfindex{mixfix annotation}.
-If none of the constants are introduced with mixfix annotations, there is no
-concrete syntax to speak of: terms can only be abstractions or applications
-of the form $f(t@1, \dots, t@n)$. Since this notation quickly becomes
-unreadable, Isabelle supports syntax definitions in the form of unrestricted
-context-free \index{context-free grammar} \index{precedence grammar}
-precedence grammars using mixfix annotations.
+When defining a theory, you declare new constants by giving their names,
+their type, and an optional {\bf mixfix annotation}.  Mixfix annotations
+allow you to extend Isabelle's basic $\lambda$-calculus syntax with
+readable notation.  They can express any context-free priority grammar.
+Isabelle syntax definitions are inspired by \OBJ~\cite{OBJ}; they are more
+general than the priority declarations of \ML\ and Prolog.
+
+A mixfix annotation defines a production of the priority grammar.  It
+describes the concrete syntax, the translation to abstract syntax, and the
+pretty printing.  Special case annotations provide a simple means of
+specifying infix operators, binders and so forth.
 
-Mixfix annotations describe the {\em concrete\/} syntax, a standard
-translation into the abstract syntax and a pretty printing scheme, all in
-one. Isabelle syntax definitions are inspired by \OBJ's~\cite{OBJ} {\em
-mixfix\/} syntax.
Each mixfix annotation defines a precedence grammar
-production and optionally associates a constant with it.
-
-There is a general form of mixfix annotation and some shortcuts for common
-cases like infix operators.
+
+\subsection{Grammar productions}\label{sec:grammar}
+
+Let us examine the treatment of the production
+\[ A^{(p)}= w@0\, A@1^{(p@1)}\, w@1\, A@2^{(p@2)}\, \ldots\,
+   A@n^{(p@n)}\, w@n. \]
+Here $A@i^{(p@i)}$ is a nonterminal with priority~$p@i$ for $i=1$,
+\ldots,~$n$, while $w@0$, \ldots,~$w@n$ are strings of terminals.
+In the corresponding mixfix annotation, the priorities are given separately
+as $[p@1,\ldots,p@n]$ and~$p$.  The nonterminal symbols are identified with
+types~$\tau$, $\tau@1$, \ldots,~$\tau@n$ respectively, and the production's
+effect on nonterminals is expressed as the function type
+\[ [\tau@1, \ldots, \tau@n]\To \tau. \]
+Finally, the template
+\[ w@0 \;\_\; w@1 \;\_\; \ldots \;\_\; w@n \]
+describes the strings of terminals.
-The general \bfindex{mixfix declaration} as it may occur within the {\tt
-consts} section\index{consts section@{\tt consts} section} of a {\tt .thy}
-file, specifies a constant declaration and a grammar production at the same
-time. It has the form {\tt $c$ ::\ "$\tau$" ("$sy$" $ps$ $p$)} and is
-interpreted as follows:
-\begin{itemize}
-  \item {\tt $c$ ::\ "$\tau$"} is the actual constant declaration without any
-    syntactic significance.
+A simple type is typically declared for each nonterminal symbol.  In
+first-order logic, type~$i$ stands for terms and~$o$ for formulae.  Only
+the outermost type constructor is taken into account.  For example, any
+type of the form $\sigma list$ stands for a list; productions may refer
+to the symbol {\tt list} and will apply to lists of any type.
+
+The symbol associated with a type is called its {\bf root} since it may
+serve as the root of a parse tree.  Precisely, the root of $(\tau@1, \dots,
+\tau@n)ty$is$ty$, where$\tau@1$, \ldots,$\tau@n$are types and$ty$is +a type constructor. Type infixes are a special case of this; in +particular, the root of$\tau@1 \To \tau@2$is {\tt fun}. Finally, the +root of a type variable is {\tt logic}; general productions might +refer to this nonterminal. - \item$sy$is the right-hand side of the production, specified as a - symbolic string. In general,$sy$is of the form$\alpha@0 \_ \alpha@1
-    \dots \alpha@{n-1} \_ \alpha@n$, where each occurrence of \ttindex{_}
-    denotes an argument\index{argument!mixfix} position and the strings
-    $\alpha@i$ do not contain (non-escaped) {\tt _}. The $\alpha@i$ may
-    consist of delimiters\index{delimiter}, spaces\index{space (pretty
-    printing)} or \rmindex{pretty printing} annotations (see below).
+
+Identifying nonterminals with types allows a constant's type to specify
+syntax as well.  We can declare the function~$f$ to have type $[\tau@1,
+\ldots, \tau@n]\To \tau$ and, through a mixfix annotation, specify the
+layout of the function's $n$ arguments.  The constant's name, in this
+case~$f$, will also serve as the label in the abstract syntax tree.  There
+are two exceptions to this treatment of constants:
+\begin{enumerate}
+  \item A production need not map directly to a logical function.  In this
+    case, you must declare a constant whose purpose is purely syntactic.
+    By convention such constants begin with the symbol~{\tt\at},
+    ensuring that they can never be written in formulae.
+
+  \item A copy production has no associated constant.
+\end{enumerate}
+There is something artificial about this representation of productions,
+but it is convenient, particularly for simple theory extensions.
-
-  \item $\tau$ specifies the syntactic categories of the arguments on the
-    left-hand and of the right-hand side. Arguments may be nonterminals or
-    name tokens. If $sy$ is of the form above, $\tau$ must be a nested
-    function type with at least $n$ argument positions, say $\tau = [\tau@1,
-    \dots, \tau@n] \To \tau'$. The syntactic category of argument $i$ is
-    derived from type $\tau@i$ (see $root_of_type$ in
-    \S\ref{sec:basic_syntax}). The result, i.e.\ the left-hand side of the
-    production, is derived from type $\tau'$. Note that the $\tau@i$ and
-    $\tau'$ may be function types.
+
+\subsection{The general mixfix form}
+Here is a detailed account of the general \bfindex{mixfix declaration} as
+it may occur within the {\tt consts} section of a {\tt .thy} file.
+\begin{center}
+  {\tt "$c$" ::\ "$\sigma$" ("$template$" $ps$ $p$)}
+\end{center}
+This constant declaration and mixfix annotation is interpreted as follows:
+\begin{itemize}
+\item The string {\tt "$c$"} is the name of the constant associated with
+  the production.  If $c$ is empty (given as~{\tt ""}) then this is a copy
+  production.\index{productions!copy}  Otherwise, parsing an instance of the
+  phrase $template$ generates the \AST{} {\tt ("$c$" $a@1 \ldots$
+  $a@n$)}, where $a@i$ is the \AST{} generated by parsing the $i$-th
+  argument.
+
+  \item The constant $c$, if non-empty, is declared to have type $\sigma$.
-
-  \item $c$ is the name of the constant associated with the production. If
-    $c$ is the empty string {\tt ""} then this is a \rmindex{copy
-    production}. Otherwise, parsing an instance of the phrase $sy$ generates
-    the ast {\tt ("$c$" $a@1 \ldots a@n$)}, where $a@i$ is the ast
-    generated by parsing the $i$-th argument.
+
+  \item The string $template$ specifies the right-hand side of
+    the production.  It has the form
+    \[ w@0 \;\_\; w@1 \;\_\; \ldots \;\_\; w@n, \]
+    where each occurrence of \ttindex{_} denotes an
+    argument\index{argument!mixfix} position and the~$w@i$ do not
+    contain~{\tt _}.  (If you want a literal~{\tt _} in the concrete
+    syntax, you must escape it as described below.)  The $w@i$ may
+    consist of \rmindex{delimiters}, spaces or \rmindex{pretty
+    printing} annotations (see below).
- \item$ps$is an optional list of at most$n$integers, say {\tt [$p@1$, -$\ldots$,$p@m$]}, where$p@i$is the minimal \rmindex{precedence} - required of any phrase that may appear as the$i$-th argument. Missing - precedences default to$0$. + \item The type$\sigma$specifies the production's nonterminal symbols (or name + tokens). If$template$is of the form above then$\sigma$must be a + function type with at least~$n$argument positions, say$\sigma =
+    [\tau@1, \dots, \tau@n] \To \tau$.  Nonterminal symbols are derived
+    from the types $\tau@1$, \ldots,~$\tau@n$, $\tau$ as described above.
+    Any of these may be function types; the corresponding root is then {\tt
+    fun}.
-
-  \item $p$ is the \rmindex{precedence} the of this production.
+
+  \item The optional list~$ps$ may contain at most $n$ integers, say {\tt
+    [$p@1$, $\ldots$, $p@m$]}, where $p@i$ is the minimal
+    priority\indexbold{priorities} required of any phrase that may appear
+    as the $i$-th argument.  Missing priorities default to~$0$.
+
+  \item The integer $p$ is the priority of this production.  If omitted, it
+    defaults to the maximal priority.
+
+    Priorities, or precedences, range between $0$ and
+    $max_pri$\indexbold{max_pri@$max_pri$} (= 1000).
 \end{itemize}
-Precedences\index{precedence!range of} may range between $0$ and
-$max_pri$\indexbold{max_pri@$max_pri$} (= 1000). If you want to ignore them,
-the safest way to do so is to use the declaration {\tt $c$ ::\ "$\tau$"
-("$sy$")}. The resulting production puts no precedence constraints on any of
-its arguments and has maximal precedence itself.
+The declaration {\tt $c$ ::\ "$\sigma$" ("$template$")} specifies no
+priorities.  The resulting production puts no priority constraints on any
+of its arguments and has maximal priority itself.  Omitting priorities in
+this manner will introduce syntactic ambiguities unless the production's
+right-hand side is fully bracketed, as in \verb|"if _ then _ else _ fi"|.
-
-\begin{example}
-The following theory specification contains a {\tt consts} section that
-encodes the precedence grammar of Example~\ref{ex:precedence} as mixfix
-declarations:
+
+\begin{warn}
+  Theories must sometimes declare types for purely syntactic purposes.  One
+  example is {\tt type}, the built-in type of types.  This is a ``type of
+  all types'' in the syntactic sense only.
Do not declare such types under
+  {\tt arities} as belonging to class $logic$, for that would allow their
+  use in arbitrary Isabelle expressions~(\S\ref{logical-types}).
+\end{warn}
+
+\subsection{Example: arithmetic expressions}
+This theory specification contains a {\tt consts} section with mixfix
+declarations encoding the priority grammar from
+\S\ref{sec:priority_grammars}:
 \begin{ttbox}
 EXP = Pure +
 types
-  exp 0
+  exp
 arities
   exp :: logic
 consts
-  "0" :: "exp"                ("0"      9)
-  "+" :: "[exp, exp] => exp"  ("_ + _"  [0, 1] 0)
-  "*" :: "[exp, exp] => exp"  ("_ * _"  [3, 2] 2)
-  "-" :: "exp => exp"         ("- _"    [3] 3)
+  "0" :: "exp"                ("0"     9)
+  "+" :: "[exp, exp] => exp"  ("_ + _" [0, 1] 0)
+  "*" :: "[exp, exp] => exp"  ("_ * _" [3, 2] 2)
+  "-" :: "exp => exp"         ("- _"   [3] 3)
 end
 \end{ttbox}
 Note that the {\tt arities} declaration causes {\tt exp} to be added to the
-syntax' roots. If you put the above text into a file {\tt exp.thy} and load
+syntax' roots.  If you put the text above into a file {\tt exp.thy} and load
 it via {\tt use_thy "EXP"}, you can run some tests:
 \begin{ttbox}
 val read_exp = Syntax.test_read (syn_of EXP.thy) "exp";
+{\out val it = fn : string -> unit}
 read_exp "0 * 0 * 0 * 0 + 0 + 0 + 0";
 {\out tokens: "0" "*" "0" "*" "0" "*" "0" "+" "0" "+" "0" "+" "0"}
 {\out raw: ("+" ("+" ("+" ("*" "0" ("*" "0" ("*" "0" "0"))) "0") "0") "0")}
@@ -698,711 +495,149 @@
 {\out \vdots}
 \end{ttbox}
 The output of \ttindex{Syntax.test_read} includes the token list ({\tt
-tokens}) and the raw ast directly derived from the parse tree, ignoring parse
-ast translations. The rest is tracing information provided by the macro
-expander (see \S\ref{sec:macros}).
+  tokens}) and the raw \AST{} directly derived from the parse tree,
+ignoring parse \AST{} translations.  The rest is tracing information
+provided by the macro expander (see \S\ref{sec:macros}).
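The raw ast shown in the trace above can be reproduced by a toy priority-grammar parser. The following Python sketch (an editor's illustration; Isabelle's parser is table-driven and far more general) hard-codes the four {\tt EXP} productions and their priorities:

```python
# Illustrative model of priority-grammar parsing for the EXP grammar.
# Asts are modelled as strings (atoms) and lists (applications).

def show(ast):
    if isinstance(ast, list):
        return "(" + " ".join(show(a) for a in ast) + ")"
    return ast

def parse(tokens, min_pri=0):
    """Parse tokens at the given minimal priority; return (ast, pri, rest)."""
    tok, rest = tokens[0], tokens[1:]
    if tok == "0":                               # exp = "0"  => "0" (9)
        left, pri = '"0"', 9
    elif tok == "-":                             # exp = "-" exp[3]  => "-" (3)
        arg, _, rest = parse(rest, 3)
        left, pri = ['"-"', arg], 3
    else:
        raise SyntaxError("unexpected token: " + tok)
    while rest:
        if rest[0] == "+" and pri >= 0 and 0 >= min_pri:
            # exp = exp[0] "+" exp[1]  => "+" (0)
            right, _, rest = parse(rest[1:], 1)
            left, pri = ['"+"', left, right], 0
        elif rest[0] == "*" and pri >= 3 and 2 >= min_pri:
            # exp = exp[3] "*" exp[2]  => "*" (2)
            right, _, rest = parse(rest[1:], 2)
            left, pri = ['"*"', left, right], 2
        else:
            break
    return left, pri, rest

ast, _, _ = parse("0 * 0 * 0 * 0 + 0 + 0 + 0".split())
print(show(ast))
# ("+" ("+" ("+" ("*" "0" ("*" "0" ("*" "0" "0"))) "0") "0") "0")
```

Note how the argument priorities {\tt [0, 1]} make {\tt +} associate to the left while {\tt [3, 2]} makes {\tt *} associate to the right, exactly matching the nesting of the raw ast in the trace.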
-Executing {\tt Syntax.print_gram (syn_of EXP.thy)} reveals the actual grammar
-productions derived from the above mixfix declarations (lots of additional
-information deleted):
+Executing {\tt Syntax.print_gram} reveals the productions derived
+from our mixfix declarations (lots of additional information deleted):
 \begin{ttbox}
-exp = "0"  => "0" (9)
-exp = exp[0] "+" exp[1]  => "+" (0)
-exp = exp[3] "*" exp[2]  => "*" (2)
-exp = "-" exp[3]  => "-" (3)
+Syntax.print_gram (syn_of EXP.thy);
+{\out exp = "0"  => "0" (9)}
+{\out exp = exp[0] "+" exp[1]  => "+" (0)}
+{\out exp = exp[3] "*" exp[2]  => "*" (2)}
+{\out exp = "-" exp[3]  => "-" (3)}
 \end{ttbox}
-\end{example}
-
-Let us now have a closer look at the structure of the string $sy$ appearing
-in mixfix annotations. This string specifies a list of parsing and printing
-directives, namely delimiters\index{delimiter},
-arguments\index{argument!mixfix}, spaces\index{space (pretty printing)},
-blocks of indentation\index{block (pretty printing)} and optional or forced
-line breaks\index{break (pretty printing)}. These are encoded via the
+
+\subsection{The mixfix template}
+Let us take a closer look at the string $template$ appearing in mixfix
+annotations.  This string specifies a list of parsing and printing
+directives: delimiters\index{delimiter}, arguments\index{argument!mixfix},
+spaces, blocks of indentation and line breaks.  These are encoded via the
 following character sequences:
+\index{pretty printing|(}
 \begin{description}
-  \item[~\ttindex_~] An argument\index{argument!mixfix} position.
+  \item[~\ttindex_~] An argument\index{argument!mixfix} position, which
+    stands for a nonterminal symbol or name token.
 
-  \item[~$d$~] A \rmindex{delimiter}, i.e.\ a non-empty sequence of
-    non-special or escaped characters. Escaping a character\index{escape
-    character} means preceding it with a {\tt '} (quote). Thus you have to
-    write {\tt ''} if you really want a single quote. Delimiters may never
-    contain white space, though.
+  \item[~$d$~] A \rmindex{delimiter}, namely a non-empty sequence of
+    non-special or escaped characters.  Escaping a character\index{escape
+    character} means preceding it with a {\tt '} (single quote).  Thus
+    you have to write {\tt ''} if you really want a single quote.  You must
+    also escape {\tt _}, {\tt (}, {\tt )} and {\tt /}.  Delimiters may
+    never contain white space, though.
 
-  \item[~$s$~] A non-empty sequence of spaces\index{space (pretty printing)}
-    for printing.
+  \item[~$s$~] A non-empty sequence of spaces for printing.  This
+    and the following specifications do not affect parsing at all.
 
-  \item[~{\ttindex($n$}~] Open a block\index{block (pretty printing)}. $n$ is
-    an optional sequence of digits that specifies the amount of indentation
-    to be added when a line break occurs within the block. If {\tt(} is not
-    followed by a digit, the indentation defaults to $0$.
+  \item[~{\ttindex($n$}~] Open a pretty printing block.  The optional
+    number $n$ specifies how much indentation to add when a line break
+    occurs within the block.  If {\tt(} is not followed by digits, the
+    indentation defaults to~$0$.
 
-  \item[~\ttindex)~] Close a block.
-
-  \item[~\ttindex{//}~] Force a line break\index{break (pretty printing)}.
+  \item[~\ttindex)~] Close a pretty printing block.
 
-  \item[~\ttindex/$s$~] Allow a line break\index{break (pretty printing)}.
-    Spaces $s$ right after {\tt /} are only printed if the break is not
-    taken.
-\end{description}
+  \item[~\ttindex{//}~] Force a line break.
 
-In terms of parsing, arguments are nonterminals or name tokens and
-delimiters are delimiters. The other directives have only significance for
-printing. The \rmindex{pretty printing} mechanisms of Isabelle is essentially
-the one described in \cite{paulson91}.
+  \item[~\ttindex/$s$~] Allow a line break.  Here $s$ stands for the string
+    of spaces (zero or more) right after the {\tt /} character.  These
+    spaces are printed if the break is not taken.
+\end{description}
+Isabelle's pretty printer resembles the one described in
+Paulson~\cite{paulson91}.
 \index{pretty printing|)}
 
 
 \subsection{Infixes}
-
-Infix\index{infix} operators associating to the left or right can be declared
-conveniently using \ttindex{infixl} or \ttindex{infixr}.
-
-Roughly speaking, the form {\tt $c$ ::\ "$\tau$" (infixl $p$)} abbreviates:
+\indexbold{infix operators}
+Infix operators associating to the left or right can be declared
+using {\tt infixl} or {\tt infixr}.
+Roughly speaking, the form {\tt $c$ ::\ "$\sigma$" (infixl $p$)}
+abbreviates the declarations
 \begin{ttbox}
-"op \(c\)" ::\ "\(\tau\)"   ("op \(c\)")
-"op \(c\)" ::\ "\(\tau\)"   ("(_ \(c\)/ _)" [\(p\), \(p + 1\)] \(p\))
+"op \(c\)" :: "\(\sigma\)"   ("op \(c\)")
+"op \(c\)" :: "\(\sigma\)"   ("(_ \(c\)/ _)" [\(p\), \(p+1\)] \(p\))
 \end{ttbox}
-and {\tt $c$ ::\ "$\tau$" (infixr $p$)} abbreviates:
+and {\tt $c$ ::\ "$\sigma$" (infixr $p$)} abbreviates the declarations
 \begin{ttbox}
-"op \(c\)" ::\ "\(\tau\)"   ("op \(c\)")
-"op \(c\)" ::\ "\(\tau\)"   ("(_ \(c\)/ _)" [\(p + 1\), \(p\)] \(p\))
+"op \(c\)" :: "\(\sigma\)"   ("op \(c\)")
+"op \(c\)" :: "\(\sigma\)"   ("(_ \(c\)/ _)" [\(p+1\), \(p\)] \(p\))
 \end{ttbox}
-
+The infix operator is declared as a constant with the prefix {\tt op}.
 Thus, prefixing infixes with \ttindex{op} makes them behave like ordinary
-function symbols. Special characters occurring in $c$ have to be escaped as
-in delimiters. Also note that the expanded forms above would be actually
-illegal at the user level because of duplicate declarations of constants.
+function symbols, as in \ML.  Special characters occurring in~$c$ must be
+escaped, as in delimiters, using a single quote.
+
+The expanded forms above would be actually illegal in a {\tt .thy} file
+because they declare the constant \hbox{\tt"op \(c\)"} twice.
 
 \subsection{Binders}
-
-A \bfindex{binder} is a variable-binding construct, such as a
-\rmindex{quantifier}.
The constant declaration \indexbold{*binder}
+\indexbold{binders}
+\begingroup
+\def\Q{{\cal Q}}
+A {\bf binder} is a variable-binding construct such as a quantifier.  The
+binder declaration \indexbold{*binder}
 \begin{ttbox}
-\(c\) ::\ "\(\tau\)"   (binder "\(Q\)" \(p\))
+\(c\) :: "\(\sigma\)"   (binder "\(\Q\)" \(p\))
 \end{ttbox}
-introduces a binder $c$ of type $\tau$, which must have the form $(\tau@1 \To
-\tau@2) \To \tau@3$. Its concrete syntax is $Q~x.P$. A binder is like a
-generalized quantifier where $\tau@1$ is the type of the bound variable $x$,
-$\tau@2$ the type of the body $P$, and $\tau@3$ the type of the whole term.
-For example $\forall$ can be declared like this:
+introduces a constant~$c$ of type~$\sigma$, which must have the form
+$(\tau@1 \To \tau@2) \To \tau@3$.  Its concrete syntax is $\Q~x.P$, where
+$x$ is a bound variable of type~$\tau@1$, the body~$P$ has type $\tau@2$
+and the whole term has type~$\tau@3$.  Special characters in $\Q$ must be
+escaped using a single quote.
+
+Let us declare the quantifier~$\forall$:
 \begin{ttbox}
 All :: "('a => o) => o"   (binder "ALL " 10)
 \end{ttbox}
-This allows us to write $\forall x.P$ either as {\tt All(\%$x$.$P$)} or {\tt
-ALL $x$.$P$}. When printing terms, Isabelle usually uses the latter form, but
-has to fall back on $\mtt{All}(P)$, if $P$ is not an abstraction.
+This lets us write $\forall x.P$ as either {\tt All(\%$x$.$P$)} or {\tt ALL
+$x$.$P$}.  When printing, Isabelle prefers the latter form, but must fall
+back on $\mtt{All}(P)$ if $P$ is not an abstraction.  Both $P$ and {\tt ALL
+$x$.$P$} have type~$o$, the type of formulae, while the bound variable
+can be polymorphic.
 
-Binders $c$ of type $(\sigma \To \tau) \To \tau$ can be nested; then the
-internal form $c(\lambda x@1. c(\lambda x@2. \ldots c(\lambda x@n. P)
-\ldots))$ corresponds to external $Q~x@1~x@2 \ldots x@n. P$.
+The binder~$c$ of type $(\sigma \To \tau) \To \tau$ can be nested.  The
+external form $\Q~x@1~x@2 \ldots x@n. P$ corresponds to the internal form
+$c(\lambda x@1. c(\lambda x@2. \ldots c(\lambda x@n. P) \ldots))$.
 
 \medskip
 The general binder declaration
 \begin{ttbox}
-\(c\) ::\ "(\(\tau@1\) => \(\tau@2\)) => \(\tau@3\)"   (binder "\(Q\)" \(p\))
+\(c\) :: "(\(\tau@1\) => \(\tau@2\)) => \(\tau@3\)"   (binder "\(\Q\)" \(p\))
 \end{ttbox}
 is internally expanded to
 \begin{ttbox}
-\(c\)   ::\ "(\(\tau@1\) => \(\tau@2\)) => \(\tau@3\)"
-"\(Q\)" ::\ "[idts, \(\tau@2\)] => \(\tau@3\)"   ("(3\(Q\)_./ _)" \(p\))
-\end{ttbox}
-with $idts$ being the syntactic category for a list of $id$s optionally
-constrained (see Figure~\ref{fig:pure_gram}). Note that special characters in
-$Q$ have to be escaped as in delimiters.
-
-Additionally, a parse translation\index{parse translation!for binder} for $Q$
-and a print translation\index{print translation!for binder} for $c$ is
-installed. These perform behind the scenes the translation between the
-internal and external forms.
-
-
-
-\section{Syntactic translations (macros)} \label{sec:macros}
-
-So far we have pretended that there is a close enough relationship between
-concrete and abstract syntax to allow an automatic translation from one to
-the other using the constant name supplied with each non-copy production. In
-many cases this scheme is not powerful enough. Some typical examples involve
-variable binding constructs (e.g.\ {\tt ALL x:A.P} vs.\ {\tt Ball(A, \%x.P)})
-or convenient notations for enumerations like finite sets, lists etc.\ (e.g.\
-{\tt [x, y, z]} vs.\ {\tt Cons(x, Cons(y, Cons(z, Nil)))}).
-
-Isabelle offers such translation facilities at two different levels, namely
-{\bf macros}\indexbold{macro} and {\bf translation functions}.
-
-Macros are specified by first-order rewriting systems that operate on asts.
-They are usually easy to read and in most cases not very difficult to write.
-Unfortunately, some more obscure translations cannot be expressed as macros
-and you have to fall back on the more powerful mechanism of translation
-functions written in \ML. These are quite unreadable and hard to write (see
-\S\ref{sec:tr_funs}).
-
-\medskip
-Let us now get started with the macro system by a simple example:
-
-\begin{example}~ \label{ex:set_trans}
-
-\begin{ttbox}
-SET = Pure +
-types
-  i, o 0
-arities
-  i, o :: logic
-consts
-  Trueprop      :: "o => prop"              ("_" 5)
-  Collect       :: "[i, i => o] => i"
-  "{\at}Collect"    :: "[idt, i, o] => i"       ("(1{\ttlbrace}_:_./ _{\ttrbrace})")
-  Replace       :: "[i, [i, i] => o] => i"
-  "{\at}Replace"    :: "[idt, idt, i, o] => i"  ("(1{\ttlbrace}_./ _:_, _{\ttrbrace})")
-  Ball          :: "[i, i => o] => o"
-  "{\at}Ball"       :: "[idt, i, o] => o"       ("(3ALL _:_./ _)" 10)
-translations
-  "{\ttlbrace}x:A. P{\ttrbrace}"    == "Collect(A, %x. P)"
-  "{\ttlbrace}y. x:A, Q{\ttrbrace}" == "Replace(A, %x y. Q)"
-  "ALL x:A. P"    == "Ball(A, %x. P)"
-end
-\end{ttbox}
-
-This and the following theories are complete working examples, though they
-are fragmentary as they contain merely syntax. The full theory definition can
-be found in {\tt ZF/zf.thy}.
-
-{\tt SET} defines constants for set comprehension ({\tt Collect}),
-replacement ({\tt Replace}) and bounded universal quantification ({\tt
-Ball}). Without additional syntax you would have to express $\forall x \in A.
-P(x)$as {\tt Ball(A, P)}. Since this is quite awkward, we define additional -constants with appropriate concrete syntax. These constants are decorated -with {\tt\at} to stress their pure syntactic purpose; they should never occur -within the final well-typed terms. Another consequence is that the user -cannot refer to such names directly, since they are not legal identifiers. - -The translations cause the replacement of external forms by internal forms -after parsing, and vice versa before printing of terms. -\end{example} - -This is only a very simple but common instance of a more powerful mechanism. -As a specification of what is to be translated, it should be comprehensible -without further explanations. But there are also some snags and other -peculiarities that are typical for macro systems in general. The purpose of -this section is to explain how Isabelle's macro system really works. - - -\subsection{Specifying macros} - -Basically macros are rewrite rules on asts. But unlike other macro systems of -various programming languages, Isabelle's macros work in both directions. -Therefore a syntax contains two lists of rules: one for parsing and one for -printing. - -The {\tt translations} section\index{translations section@{\tt translations} -section} consists of a list of rule specifications of the form: - -\begin{center} -{\tt$[$($root$)$]string[$=>$|$<=$|$==$][$($root$)$]$-$string$}. -\end{center} - -This specifies a \rmindex{parse rule} ({\tt =>}) a \rmindex{print rule} ({\tt - <=}) or both ({\tt ==}). The two$string$s preceded by optional -parenthesized$root$s are the left-hand and right-hand side of the rule in -user-defined syntax. - -Rules are internalized wrt.\ an intermediate signature that is obtained from -the parent theories' ones by adding all material of all sections preceding -{\tt translations} in the {\tt .thy} file. Especially, new syntax defined in -{\tt consts} is already effective. 
-
-Then part of the process that transforms input strings into terms is applied:
-lexing, parsing and parse ast translations (see \S\ref{sec:asts}). Macros
-specified in the parents are {\em not\/} expanded. Also note that the lexer
-runs in a different mode that additionally accepts identifiers of the form
-$\_~letter~quasiletter^*$ (like {\tt _idt}, {\tt _K}). The syntactic category
-to parse is specified by $root$, which defaults to {\tt logic}.
-
-Finally, Isabelle tries to guess which atoms of the resulting ast of the rule
-should be treated as constants during matching (see below). These names are
-extracted from all class, type and constant declarations made so far.
-
-\medskip
-The result is two lists of translation rules in internal form, that is,
-pairs of asts. They can be viewed using {\tt Syntax.print_syntax} (sections
-\ttindex{parse_rules} and \ttindex{print_rules}). For {\tt SET} of
-Example~\ref{ex:set_trans} these are:
-\begin{ttbox}
-parse_rules:
-  ("{\at}Collect" x A P)  ->  ("Collect" A ("_abs" x P))
-  ("{\at}Replace" y x A Q)  ->  ("Replace" A ("_abs" x ("_abs" y Q)))
-  ("{\at}Ball" x A P)  ->  ("Ball" A ("_abs" x P))
-print_rules:
-  ("Collect" A ("_abs" x P))  ->  ("{\at}Collect" x A P)
-  ("Replace" A ("_abs" x ("_abs" y Q)))  ->  ("{\at}Replace" y x A Q)
-  ("Ball" A ("_abs" x P))  ->  ("{\at}Ball" x A P)
+$$c$$ :: "($$\tau@1$$ => $$\tau@2$$) => $$\tau@3$$"
+"$$\Q$$"\hskip-3pt :: "[idts, $$\tau@2$$] => $$\tau@3$$"  ("(3$$\Q$$_./ _)" $$p$$)
 \end{ttbox}
-
-Note that in this notation all rules are oriented left to right. In the {\tt
-translations} section, which has been split into two parts, print rules
-appeared right to left.
-
-\begin{warn}
-Be careful not to choose names for variables in rules that are actually
-treated as constant. If in doubt, check the rules in their internal form or
-the section labeled {\tt consts} in the output of {\tt Syntax.print_syntax}.
-\end{warn}
-
-
-\subsection{Applying rules}
-
-In the course of parsing and printing terms, asts are generated as an
-intermediate form as pictured in Figure~\ref{fig:parse_print}. These asts are
-normalized wrt.\ the given lists of translation rules in a uniform manner. As
-stated earlier, asts are supposed to be `first-order terms'. The rewriting
-systems derived from {\tt translations} sections essentially resemble
-traditional first-order term rewriting systems. We first examine how a single
-rule is applied.
-
-Let $t$ be the ast to be normalized and $(l, r)$ some translation rule. A
-subast $u$ of $t$ is called a {\bf redex}\indexbold{redex (ast)} (reducible
-expression) if it is an instance of $l$. In this case $l$ is said to {\bf
-match}\indexbold{match (ast)} $u$. A redex matched by $l$ may be replaced by
-the corresponding instance of $r$, thus {\bf rewriting}\index{rewrite (ast)}
-the ast $t$.
-
-Matching requires some notion of {\bf place-holders}\indexbold{place-holder
-(ast)} that may occur in rule patterns but not in ordinary asts, which are
-considered ground. Here we simply use {\tt Variable}s for this purpose.
-
-More formally, the matching of $u$ by $l$ is performed as follows (the rule
-pattern is the second argument): \index{match (ast)@$match$ (ast)}
-\begin{itemize}
-  \item $match(\Constant x, \Constant x) = \mbox{OK}$.
-
-  \item $match(\Variable x, \Constant x) = \mbox{OK}$.
-
-  \item $match(u, \Variable x) = \mbox{OK, bind}~x~\mbox{to}~u$.
-
-  \item $match(\Appl{u@1, \ldots, u@n}, \Appl{l@1, \ldots, l@n}) = match(u@1,
-    l@1), \ldots, match(u@n, l@n)$.
-
-  \item $match(u, l) = \mbox{FAIL}$ in any other case.
-\end{itemize}
-
-This means that a {\tt Constant} pattern matches any atomic ast of the same
-name, while a {\tt Variable} matches any ast. If successful, $match$ yields a
-substitution $\sigma$ that is applied to $r$, generating the appropriate
-instance that replaces $u$.
-
-\medskip
-In order to make things simple and fast, ast rewrite rules $(l, r)$ are
-restricted by the following conditions:
-\begin{itemize}
-  \item Rules have to be left-linear, i.e.\ $l$ must not contain any {\tt
-    Variable} more than once.
-
-  \item Rules must have constant heads, i.e.\ $l = \mtt"c\mtt"$ or $l =
-    (\mtt"c\mtt" ~ x@1 \ldots x@n)$.
-
-  \item The set of variables contained in $r$ has to be a subset of those of
-    $l$.
-\end{itemize}
-
-\medskip
-Having first-order matching in mind, the second case of $match$ may look a
-bit odd. But this is exactly the place where {\tt Variable}s of non-rule
-asts behave like {\tt Constant}s. The deeper meaning of this is that asts
-are `very primitive' in some sense, ignorant of the underlying `semantics',
-not far removed from parse trees. At this level it is not yet known which
-$id$s will become constants, bounds, frees, types or classes. As
-$ast_of_pt$ (see \S\ref{sec:asts}) shows, former parse tree heads appear in
-asts as {\tt Constant}s, while $id$s, $var$s, $tfree$s and $tvar$s become
-{\tt Variable}s.
-
-This is at variance with asts generated from terms before printing (see
-$ast_of_term$ in \S\ref{sec:asts}), where all constants and type constructors
-become {\tt Constant}s.
-
-\begin{warn}
-This means asts may contain quite a messy mixture of {\tt Variable}s and {\tt
-Constant}s, which is insignificant at macro level because $match$ treats them
-alike.
-\end{warn}
-
-Because of this behaviour, different kinds of atoms with the same name are
-indistinguishable, which may make some rules prone to misbehaviour. Consider
-the following fragmentary example:
-\begin{ttbox}
-types
-  Nil 0
-consts
-  Nil   :: "'a list"
-  "[]"  :: "'a list"    ("[]")
-translations
-  "[]" == "Nil"
-\end{ttbox}
-Then the term {\tt Nil} will be printed as {\tt []}, just as expected. What
-happens with \verb|%Nil.t| or {\tt x::Nil} is left as an exercise.
-
-
-\subsection{Rewriting strategy}
-
-When normalizing an ast by repeatedly applying translation rules until no
-more rule is applicable, there are in each step two choices: which rule to
-apply next, and which redex to reduce.
-
-We could assume that the user always supplies terminating and confluent
-rewriting systems, but this would often complicate things unnecessarily.
-Therefore, we reveal part of the actual rewriting strategy: the normalizer
-always applies the first matching rule, reducing an unspecified redex chosen
-first.
-
-Here, `first rule' roughly means first wrt.\ the appearance of the rules in
-the {\tt translations} sections. But this is more tricky than it seems: if
-a given theory is {\em extended}, new rules are simply appended to the end.
-But if theories are {\em merged}, it is not clear which list of rules has
-priority over the other. In fact the merge order is left unspecified. This
-shouldn't cause any problems in practice, since translations of different
-theories usually do not overlap. {\tt Syntax.print_syntax} shows the rules
-in their internal order.
-
-\medskip
-You can watch the normalization of asts during parsing and printing by
-setting \ttindex{Syntax.trace_norm_ast} to {\tt true}. An alternative is the
-use of \ttindex{Syntax.test_read}, which is always in trace mode. The
-information displayed when tracing\index{tracing (ast)} includes: the ast
-before normalization ({\tt pre}), redexes with results ({\tt rewrote}), the
-normal form finally reached ({\tt post}) and some statistics ({\tt
-normalize}). If tracing is off, \ttindex{Syntax.stat_norm_ast} can be set to
-{\tt true} in order to enable printing of the normal form and statistics
-only.
-
-
-\subsection{More examples}
-
-Let us first reconsider Example~\ref{ex:set_trans}, which is concerned with
-variable binding constructs.
-
-Compared with an implementation via translation functions (as done for
-binders), macros have a minor disadvantage:
-
-\begin{warn}
-If \ttindex{eta_contract} is set to {\tt true}, terms will be
-$\eta$-contracted {\em before\/} the ast rewriter sees them. Thus some
-abstraction nodes needed for print rules to match may get lost. E.g.\
-\verb|Ball(A, %x. P(x))| is contracted to {\tt Ball(A, P)}; the print rule is
-no longer applicable and the output will be {\tt Ball(A, P)}.
Note that
-$\eta$-expansion via macros is {\em not\/} possible.
-\end{warn}
+with $idts$ being the nonterminal symbol for a list of $id$s optionally
+constrained (see Fig.\ts\ref{fig:pure_gram}). The declaration also
+installs a parse translation\index{translations!parse} for~$\Q$ and a print
+translation\index{translations!print} for~$c$ to translate between the
+internal and external forms.
+\endgroup
-
-\medskip
-Another common trap is meta constraints. If \ttindex{show_types} is set to
-{\tt true}, bound variables will be decorated by their meta types at the
-binding place (but not at occurrences in the body). E.g.\ matching with
-\verb|Collect(A, %x. P)| binds {\tt x} to something like {\tt ("_constrain" y
-"i")} rather than only {\tt y}. Ast rewriting will cause the constraint to
-appear in the external form, say \verb|{y::i:A::i. P::o}|. Therefore your
-syntax should be ready for such constraints to be re-read. This is the case
-in our example, because of the category {\tt idt} of the first argument.
-
-\begin{warn}
-Choosing {\tt id} instead of {\tt idt} is a very common error, especially
-since it appears in former versions of most of Isabelle's object-logics.
-\end{warn}
-
-\begin{example} \label{ex:finset_trans}
-This example demonstrates the use of recursive macros to implement a
-convenient notation for finite sets.
-\begin{ttbox}
-FINSET = SET +
-types
-  is 0
-consts
-  ""          :: "i => is"          ("_")
-  "{\at}Enum"     :: "[i, is] => is"    ("_,/ _")
-  empty       :: "i"                ("{\ttlbrace}{\ttrbrace}")
-  insert      :: "[i, i] => i"
-  "{\at}Finset"   :: "is => i"          ("{\ttlbrace}(_){\ttrbrace}")
-translations
-  "{\ttlbrace}x, xs{\ttrbrace}" == "insert(x, {\ttlbrace}xs{\ttrbrace})"
-  "{\ttlbrace}x{\ttrbrace}"     == "insert(x, {\ttlbrace}{\ttrbrace})"
-end
-\end{ttbox}
-
-Finite sets are internally built up by {\tt empty} and {\tt insert}.
-Externally we would like to see \verb|{x, y, z}| rather than {\tt insert(x,
-insert(y, insert(z, empty)))}.
-
-First we define the generic syntactic category {\tt is} for one or more
-objects of type {\tt i} separated by commas (including breaks for pretty
-printing). The category has to be declared as a 0-place type constructor,
-but without an {\tt arities} declaration. Hence {\tt is} is not a logical
-type, no default productions will be added, and we can cook our own syntax
-for {\tt is} (first two lines of the {\tt consts} section). If we had
-needed generic enumerations of type $\alpha$ (i.e.\ {\tt logic}), we could
-have used the predefined category \ttindex{args} and skipped this part
-altogether.
-
-Next follows {\tt empty}, which is already equipped with its syntax
-\verb|{}|, and {\tt insert} without concrete syntax. The syntactic constant
-{\tt\at Finset} provides concrete syntax for enumerations of {\tt i} enclosed
-in curly braces. Remember that a pair of parentheses specifies a block of
-indentation for pretty printing. The category {\tt is} can later be reused
-for other enumerations like lists or tuples.
-
-The translations may look a bit odd at first sight, but rules can only be
-fully understood in their internal forms, which are:
-\begin{ttbox}
-parse_rules:
-  ("{\at}Finset" ("{\at}Enum" x xs))  ->  ("insert" x ("{\at}Finset" xs))
-  ("{\at}Finset" x)  ->  ("insert" x "empty")
-print_rules:
-  ("insert" x ("{\at}Finset" xs))  ->  ("{\at}Finset" ("{\at}Enum" x xs))
-  ("insert" x "empty")  ->  ("{\at}Finset" x)
-\end{ttbox}
-This shows that \verb|{x, xs}| indeed matches any set enumeration of at least
-two elements, binding the first to {\tt x} and the rest to {\tt xs}.
-Likewise, \verb|{xs}| and \verb|{x}| represent any set enumeration. Note that
-the parse rules only work in this order.
-
-\medskip
-Some rules are prone to misbehaviour, as
-\verb|%empty insert. insert(x, empty)| shows, which is printed as
-\verb|%empty insert. {x}|. This problem arises because the ast rewriter
-cannot discern constants, frees, bounds etc.\ and looks only for names of
-atoms.
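Since the ast rewriter is essentially a first-order term rewriting engine, its behaviour on the finite-set parse rules can be sketched in a few lines of Python. This is a hypothetical illustration only: the real implementation is ML code inside Isabelle's syntax module, real asts distinguish {\tt Constant}s from {\tt Variable}s (which $match$ then treats alike), and the {\tt ?}-prefix convention for rule variables is invented here for the sketch.

```python
# Toy model of ast rewriting (hypothetical; not Isabelle's ML code).
# An ast is a string (atom) or a tuple (application).  In rule patterns,
# atoms starting with '?' play the role of Variables; everything else is
# treated as a Constant, mirroring how the matcher only compares names.

def match(u, l, env):
    """Try to match ast u against pattern l, extending env; True on success."""
    if isinstance(l, str):
        if l.startswith("?"):            # Variable: binds to any ast
            env[l] = u
            return True
        return u == l                    # Constant: names must coincide
    return (isinstance(u, tuple) and len(u) == len(l)
            and all(match(a, b, env) for a, b in zip(u, l)))

def subst(r, env):
    """Instantiate the right-hand side r with the bindings in env."""
    if isinstance(r, str):
        return env.get(r, r)
    return tuple(subst(a, env) for a in r)

def rewrite_once(t, rules):
    """Reduce one redex, trying rules in order; None if t is in normal form."""
    for l, r in rules:
        env = {}
        if match(t, l, env):
            return subst(r, env)
    if isinstance(t, tuple):
        for i, a in enumerate(t):
            a2 = rewrite_once(a, rules)
            if a2 is not None:
                return t[:i] + (a2,) + t[i + 1:]
    return None

def normalize(t, rules):
    """Apply the first matching rule repeatedly until no redex is left."""
    t2 = rewrite_once(t, rules)
    while t2 is not None:
        t = t2
        t2 = rewrite_once(t, rules)
    return t

# The FINSET parse rules in internal form.  Rule order matters: the second
# pattern would also match an "@Enum" argument, binding it whole.
finset_rules = [
    (("@Finset", ("@Enum", "?x", "?xs")), ("insert", "?x", ("@Finset", "?xs"))),
    (("@Finset", "?x"), ("insert", "?x", "empty")),
]

ast = ("@Finset", ("@Enum", "x", ("@Enum", "y", "z")))
print(normalize(ast, finset_rules))
# -> ('insert', 'x', ('insert', 'y', ('insert', 'z', 'empty')))
```

The first-matching-rule strategy visible in {\tt rewrite_once} is exactly why the parse rules only work in the order given above.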
-
-Thus the names of {\tt Constant}s occurring in the (internal) left-hand side
-of translation rules should be regarded as `reserved keywords'. It is good
-practice to choose non-identifiers here like {\tt\at Finset} or sufficiently
-long and strange names.
-\end{example}
-
-\begin{example} \label{ex:prod_trans}
-One of the well-formedness conditions for ast rewrite rules stated earlier
-implies that you can never introduce new {\tt Variable}s on the right-hand
-side. Something like \verb|"K(B)" => "%x. B"| is illegal and could cause
-variable capture if it were allowed. In such cases you usually have to
-fall back on translation functions. But there is a trick that makes things
-quite readable in some cases: {\em calling parse translations by parse
-rules}. This is demonstrated here.
-\begin{ttbox}
-PROD = FINSET +
-consts
-  Pi        :: "[i, i => i] => i"
-  "{\at}PROD"   :: "[idt, i, i] => i"   ("(3PROD _:_./ _)" 10)
-  "{\at}->"     :: "[i, i] => i"        ("(_ ->/ _)" [51, 50] 50)
-translations
-  "PROD x:A. B" => "Pi(A, %x. B)"
-  "A -> B"      => "Pi(A, _K(B))"
-end
-ML
-  val print_translation = [("Pi", dependent_tr' ("{\at}PROD", "{\at}->"))];
-\end{ttbox}
-
-{\tt Pi} is an internal constant for constructing dependent products. Two
-external forms exist: {\tt PROD x:A.B}, the general case, and {\tt A -> B},
-an abbreviation for \verb|Pi(A, %x.B)| with {\tt B} not actually depending on
-{\tt x}.
-
-Now the second parse rule is where the trick comes in: {\tt _K(B)} is
-introduced during ast rewriting, and later becomes \verb|%x.B| due to a
-parse translation associated with \ttindex{_K}. Note that a leading {\tt _}
-in $id$s is allowed in translation rules, but not in ordinary terms. This
-special behaviour of the lexer is very useful for `forging' asts containing
-names that are normally not directly accessible.
-
-Unfortunately, there is no such trick for printing, so we have to add an {\tt
-ML} section for the print translation \ttindex{dependent_tr'}.
-
-The parse translation for {\tt _K} is already installed in Pure, and {\tt
-dependent_tr'} is exported by the syntax module for public use. See
-\S\ref{sec:tr_funs} for more of the arcane lore of translation functions.
-\end{example}
-
-
-
-\section{Translation functions *} \label{sec:tr_funs}
-
-This section describes the remaining translation mechanism, which enables
-the designer of theories to do almost everything with terms or asts during
-the parsing or printing process, by writing \ML-functions. The logic \LK\ is
-a good example of a quite sophisticated use of this to transform between
-internal and external representations of associative sequences. The high
-level macro system described in \S\ref{sec:macros} fails here completely.
-
-\begin{warn}
-A full understanding of the matters presented here requires some familiarity
-with Isabelle's internals, especially the datatypes {\tt term}, {\tt typ},
-{\tt Syntax.ast} and the encodings of types and terms as such at the various
-stages of the parsing or printing process. You probably do not really want to
-use translation functions at all!
-\end{warn}
-
-As already mentioned in \S\ref{sec:asts}, there are four kinds of translation
-functions. All such functions are associated with a name which specifies an
-ast's or term's head invoking that function. Such names can be (logical or
-syntactic) constants or type constructors.
-
-{\tt Syntax.print_syntax} displays the sets of names associated with the
-translation functions of a {\tt Syntax.syntax} under
-\ttindex{parse_ast_translation}, \ttindex{parse_translation},
-\ttindex{print_translation} and \ttindex{print_ast_translation}. The user can
-add new ones via the {\tt ML} section\index{ML section@{\tt ML} section} of a
-{\tt .thy} file. But there may never be more than one function of the same
-kind per name.
-
-\begin{warn}
-Conceptually, the {\tt ML} section should appear between {\tt consts} and
-{\tt translations}, i.e.\ newly installed translation functions are already
-effective when macros and logical rules are parsed. {\tt ML} has to be the
-last section because the {\tt .thy} file parser is unable to detect the end
-of \ML\ code other than by end-of-file.
-\end{warn}
-
-All text of the {\tt ML} section is simply copied verbatim into the \ML\ file
-generated from a {\tt .thy} file. Definitions made here by the user become
-components of an \ML\ structure of the same name as the theory to be created.
-Therefore local things should be declared within {\tt local}. The following
-special \ML\ values, which are all optional, serve as the interface for the
-installation of user-defined translation functions.
-
-\begin{ttbox}
-val parse_ast_translation: (string * (ast list -> ast)) list
-val parse_translation: (string * (term list -> term)) list
-val print_translation: (string * (term list -> term)) list
-val print_ast_translation: (string * (ast list -> ast)) list
-\end{ttbox}
-
-The basic idea behind all four kinds of functions is relatively simple (see
-also Figure~\ref{fig:parse_print}): Whenever --- during the transformations
-between parse trees, asts and terms --- a combination of the form
-$(\mtt"c\mtt"~x@1 \ldots x@n)$ is encountered, and a translation function $f$
-of appropriate kind exists for $c$, the result will be $f \mtt[ x@1, \ldots,
-x@n \mtt]$. Here, $x@1, \ldots, x@n$ (with $n \ge 0$) are asts for ast
-translations and terms for term translations. A `combination' at ast level is
-of the form $\Constant c$ or $\Appl{\Constant c, x@1, \ldots, x@n}$, and at
-term level $\ttfct{Const} (c, \tau)$ or $\ttfct{Const} (c, \tau) \ttrel{\$}
-x@1 \ttrel{\$} \dots \ttrel{\$} x@n$.
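The dispatch just described can be sketched for the ast level as a short Python function. This is a hypothetical illustration (Isabelle does this in ML on {\tt Syntax.ast} and {\tt term} values), and the {\tt _K} entry below is a toy analogue of the real built-in translation, not its actual code:

```python
# Hypothetical sketch of how parse-style translation functions are invoked
# (illustration only).  An ast is a string (atom) or a tuple whose first
# element names the head constant of a combination.

def apply_parse_trfuns(ast, trtab):
    """Bottom-up: translate the arguments first, then dispatch on the head."""
    if isinstance(ast, tuple):
        head, *args = ast
        args = [apply_parse_trfuns(a, trtab) for a in args]
        f = trtab.get(head)
        return f(args) if f else (head, *args)
    f = trtab.get(ast)                   # 0-ary combination: a bare constant
    return f([]) if f else ast

# Toy analogue of the "_K" translation from the PROD example: wrap the
# argument in a (hypothetical) abstraction node with a dummy bound variable.
trtab = {"_K": lambda ts: ("_abs", "x", ts[0])}

print(apply_parse_trfuns(("Pi", "A", ("_K", "B")), trtab))
# -> ('Pi', 'A', ('_abs', 'x', 'B'))
```

Bottom-up traversal matches the behaviour of parse (ast) translations described below; the arguments a function receives are already translated.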
-
-\medskip
-Translation functions at ast level differ from those at term level only in
-the way that asts and terms themselves differ. Terms, being more complex
-and more specific, allow more sophisticated transformations (typically
-involving abstractions and bound variables).
-
-On the other hand, {\em parse\/} (ast) translations differ from {\em print\/}
-(ast) translations more fundamentally:
-\begin{description}
-  \item[Parse (ast) translations] are applied bottom-up, i.e.\ the arguments
-    supplied ($x@1, \ldots, x@n$ above) are already in translated form.
-    Additionally, they may not fail; any exception is re-raised after an
-    error message has been printed.
-
-  \item[Print (ast) translations] are applied top-down, i.e.\ supplied with
-    arguments that are partly still in internal form. The result is again fed
-    into the translation machinery as a whole. Therefore a print (ast)
-    translation should not yield a result whose head is a constant of the
-    same name as the one that invoked it in the first place. Alternatively,
-    the exception \ttindex{Match} may be raised, indicating failure of the
-    translation.
-\end{description}
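The failure protocol of print (ast) translations can be sketched as follows. This is a hypothetical Python illustration; in Isabelle the driver and the exception {\tt Match} are ML, and the {\tt Pi} function below is a toy analogue of {\tt dependent_tr'}, not its real code:

```python
# Hypothetical sketch of print-translation dispatch (illustration only).

class Match(Exception):
    """Raised by a print translation to signal that it does not apply."""

def apply_print_trfun(head, args, trtab):
    """Dispatch on the head constant; fall back to the untranslated
    combination when no function is installed or the function fails."""
    f = trtab.get(head)
    if f is None:
        return (head, *args)
    try:
        return f(args)
    except Match:
        return (head, *args)

# Toy analogue of dependent_tr': print Pi(A, B) as A -> B only when the
# body is not an abstraction node (so it cannot depend on a bound variable).
def pi_tr(args):
    A, B = args
    if isinstance(B, tuple) and B[0] == "_abs":
        raise Match                      # dependent case: give up here
    return ("@->", A, B)

trtab = {"Pi": pi_tr}
print(apply_print_trfun("Pi", ["A", "B"], trtab))
# -> ('@->', 'A', 'B')
print(apply_print_trfun("Pi", ["A", ("_abs", "x", "B")], trtab))
# -> ('Pi', 'A', ('_abs', 'x', 'B'))
```

Raising {\tt Match} rather than returning the input avoids the looping risk noted above, since the fallback is handled by the driver and not fed through the same translation again.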
-
-Parsing and printing also differ in which atoms are {\tt Constant}s or
-{\tt Const}s, i.e.\ able to invoke translation functions.
-
-For parse ast translations, only former parse tree heads are {\tt Constant}s
-(see also $ast_of_pt$ in \S\ref{sec:asts}). These, together with any {\tt
-Constant}s introduced later (e.g.\ by macros), become {\tt Const}s for parse
-translations (see also $term_of_ast$ in \S\ref{sec:asts}).
-
-The situation is slightly different when terms are prepared for printing,
-since the role of each atom is then known. Initially, all logical constants
-and type constructors may invoke print translations. New constants may be
-introduced by these or by macros, and those in turn may invoke print ast
-translations.
-
-
-\subsection{A simple example *}
-
-Presenting a simple and useful example of translation functions is not that
-easy, since the macro system is sufficient for most simple applications. By
-convention, translation functions always have names ending with {\tt
-_ast_tr}, {\tt _tr}, {\tt _tr'} or {\tt _ast_tr'}. You may look for such
-names in the sources of Pure Isabelle for more examples.
-
-\begin{example} \label{ex:tr_funs}
-
-We continue Example~\ref{ex:prod_trans} by presenting the \ML\ sources of the
-parse translation for \ttindex{_K} and the print translation
-\ttindex{dependent_tr'}:
-\begin{ttbox}
-(* nondependent abstraction *)
-fun k_tr (*"_K"*) [t] = Abs ("x", dummyT, incr_boundvars 1 t)
-  | k_tr (*"_K"*) ts = raise TERM("k_tr",ts);
-
-(* dependent / nondependent quantifiers *)
-fun dependent_tr' (q, r) (A :: Abs (x, T, B) :: ts) =
-      if 0 mem (loose_bnos B) then
-        let val (x', B') = variant_abs (x, dummyT, B);
-        in list_comb (Const (q, dummyT) $ Free (x', T) $ A $ B', ts)
-        end
-      else list_comb (Const (r, dummyT) $ A $ B, ts)
-  | dependent_tr' _ _ = raise Match;
-\end{ttbox}
-This text is taken from the Pure sources. Ordinary user translations
-would typically appear within the {\tt ML} section of the {\tt .thy} file.
-Linking {\tt k_tr} into the parser requires the definition
-\begin{ttbox}
-val parse_translation = [("_K", k_tr)];
-\end{ttbox}
-For {\tt k_tr} this is not necessary, as it is built-in, but user-defined
-translation functions are added in exactly the same way.
-
-\medskip
-If {\tt k_tr} is called with exactly one argument $t$, it creates a new {\tt
-Abs} node with a body derived from $t$: loose {\tt Bound}s, i.e.\ those
-referring to outer {\tt Abs}s, are incremented using
-\ttindex{incr_boundvars}. This and many other basic term manipulation
-functions are defined in {\tt Pure/term.ML}, the comments there being in most
-cases the only documentation.
-
-Since terms fed into parse translations are not yet typed, the type of the
-bound variable in the new {\tt Abs} is simply {\tt dummyT}.
-
-\medskip
-The argument $(q, r)$ for {\tt dependent_tr'} is supplied already during the
-installation within an {\tt ML} section. This yields a print translation
-that transforms something like $c(A, \mtt{Abs}(x, T, B), t@1, \ldots, t@n)$
-into $q(x', A, B', t@1, \ldots, t@n)$ or $r(A, B, t@1, \ldots, t@n)$; the
-latter form is used if $B$ does not actually depend on $x$. This is checked
-using \ttindex{loose_bnos}, yet another function of {\tt Pure/term.ML}. Note
-that $x'$ is a version of $x$ renamed away from all names in $B$, and $B'$ is
-the body $B$ with {\tt Bound}s referring to our {\tt Abs} node replaced by
-$\ttfct{Free} (x', \mtt{dummyT})$.
-
-We have to be more careful with types here. While types of {\tt Const}s are
-completely ignored, type constraints may be printed for some {\tt Free}s and
-{\tt Var}s (if \ttindex{show_types} is set to {\tt true}).
Variables of type
-\ttindex{dummyT} are never printed with a constraint, though. Thus a
-constraint on $x'$ may only appear at its binding place, since the {\tt
-Free}s of $B'$ replacing the appropriate {\tt Bound}s of $B$ via
-\ttindex{variant_abs} all have type {\tt dummyT}.
-\end{example}
-
+\index{mixfix declaration|)}
 
 \section{Example: some minimal logics} \label{sec:min_logics}
 
+This section presents some examples that have a simple syntax. They
+demonstrate how to define new object-logics from scratch.
-This concluding section presents some examples that are very simple from a
-syntactic point of view. They should rather demonstrate how to define new
-object-logics from scratch. In particular we need to say how an object-logic
-syntax is hooked up to the meta-logic. Since all theorems must conform to the
-syntax for $prop$ (see Figure~\ref{fig:pure_gram}), that syntax has to be
-extended with the object-level syntax. Assume that the syntax of your
-object-logic defines a category $o$ of formulae. These formulae can now
-appear in axioms and theorems wherever $prop$ does if you add the production
+First we must define how an object-logic syntax is embedded into the
+meta-logic. Since all theorems must conform to the syntax for~$prop$ (see
+Fig.\ts\ref{fig:pure_gram}), that syntax has to be extended with the
+object-level syntax. Assume that the syntax of your object-logic defines a
+nonterminal symbol~$o$ of formulae. These formulae can now appear in
+axioms and theorems wherever $prop$ does if you add the production
 \[ prop ~=~ o. \]
-More precisely, you need a coercion from formulae to propositions:
+This is not a copy production but a coercion from formulae to propositions:
 \begin{ttbox}
 Base = Pure +
 types
-  o 0
+  o
 arities
   o :: logic
 consts
@@ -1410,12 +645,12 @@
 end
 \end{ttbox}
 The constant {\tt Trueprop} (the name is arbitrary) acts as an invisible
-coercion function.
Assuming this definition resides in a file {\tt base.thy},
 you have to load it with the command {\tt use_thy "Base"}.
 
-One of the simplest nontrivial logics is {\em minimal logic\/} of
-implication. Its definition in Isabelle needs no advanced features but
-illustrates the overall mechanism quite nicely:
+One of the simplest nontrivial logics is {\bf minimal logic} of
+implication. Its definition in Isabelle needs no advanced features but
+illustrates the overall mechanism nicely:
 \begin{ttbox}
 Hilbert = Base +
 consts
@@ -1426,40 +661,46 @@
   MP  "[| P --> Q; P |] ==> Q"
 end
 \end{ttbox}
-After loading this definition you can start to prove theorems in this logic:
+After loading this definition from the file {\tt hilbert.thy}, you can
+start to prove theorems in the logic:
 \begin{ttbox}
 goal Hilbert.thy "P --> P";
 {\out Level 0}
 {\out P --> P}
 {\out  1. P --> P}
+\ttbreak
 by (resolve_tac [Hilbert.MP] 1);
 {\out Level 1}
 {\out P --> P}
 {\out  1. ?P --> P --> P}
 {\out  2. ?P}
+\ttbreak
 by (resolve_tac [Hilbert.MP] 1);
 {\out Level 2}
 {\out P --> P}
 {\out  1. ?P1 --> ?P --> P --> P}
 {\out  2. ?P1}
 {\out  3. ?P}
+\ttbreak
 by (resolve_tac [Hilbert.S] 1);
 {\out Level 3}
 {\out P --> P}
 {\out  1. P --> ?Q2 --> P}
 {\out  2. P --> ?Q2}
+\ttbreak
 by (resolve_tac [Hilbert.K] 1);
 {\out Level 4}
 {\out P --> P}
 {\out  1. P --> ?Q2}
+\ttbreak
 by (resolve_tac [Hilbert.K] 1);
 {\out Level 5}
 {\out P --> P}
 {\out No subgoals!}
 \end{ttbox}
-As you can see, this Hilbert-style formulation of minimal logic is easy to
-define but difficult to use. The following natural deduction formulation is
-far preferable:
+As we can see, this Hilbert-style formulation of minimal logic is easy to
+define but difficult to use.
The following natural deduction formulation is
+better:
 \begin{ttbox}
 MinI = Base +
 consts
@@ -1469,14 +710,14 @@
   impE  "[| P --> Q; P |] ==> Q"
 end
 \end{ttbox}
-Note, however, that although the two systems are equivalent, this fact cannot
-be proved within Isabelle: {\tt S} and {\tt K} can be derived in {\tt MinI}
-(exercise!), but {\tt impI} cannot be derived in {\tt Hilbert}. The reason is
-that {\tt impI} is only an {\em admissible\/} rule in {\tt Hilbert},
-something that can only be shown by induction over all possible proofs in
-{\tt Hilbert}.
+Note, however, that although the two systems are equivalent, this fact
+cannot be proved within Isabelle. Axioms {\tt S} and {\tt K} can be
+derived in {\tt MinI} (exercise!), but {\tt impI} cannot be derived in {\tt
+  Hilbert}. The reason is that {\tt impI} is only an {\bf admissible} rule
+in {\tt Hilbert}, something that can only be shown by induction over all
+possible proofs in {\tt Hilbert}.
 
-It is a very simple matter to extend minimal logic with falsity:
+We may easily extend minimal logic with falsity:
 \begin{ttbox}
 MinIF = MinI +
 consts
@@ -1490,13 +731,20 @@
 MinC = Base +
 consts
   "&" :: "[o, o] => o"   (infixr 30)
+\ttbreak
 rules
   conjI  "[| P; Q |] ==> P & Q"
   conjE1 "P & Q ==> P"
   conjE2 "P & Q ==> Q"
 end
 \end{ttbox}
-And if we want to have all three connectives together, we define:
+And if we want to have all three connectives together, we create and load a
+theory file consisting of a single line:\footnote{We can combine the
+  theories without creating a theory file using the ML declaration
+\begin{ttbox}
+val MinIFC_thy = merge_theories(MinIF,MinC)
+\end{ttbox}
+\index{*merge_theories|fnote}}
 \begin{ttbox}
 MinIFC = MinIF + MinC
 \end{ttbox}
@@ -1509,3 +757,826 @@
 \end{ttbox}
 Try this as an exercise!
+\medskip
+Unless you need to define macros or syntax translation functions, you may
+skip the rest of this chapter.
+
+
+\section{*Abstract syntax trees} \label{sec:asts}
+\index{trees!abstract syntax|(} The parser, given a token list from the
+lexer, applies productions to yield a parse tree\index{trees!parse}. By
+applying some internal transformations the parse tree becomes an abstract
+syntax tree, or \AST{}. Macro expansion, further translations and finally
+type inference yield a well-typed term\index{terms!obtained from ASTs}.
+The printing process is the reverse, except for some subtleties to be
+discussed later.
+
+Figure~\ref{fig:parse_print} outlines the parsing and printing process.
+Much of the complexity is due to the macro mechanism. Using macros, you
+can specify most forms of concrete syntax without writing any \ML{} code.
+
+\begin{figure}
+\begin{center}
+\begin{tabular}{cl}
+string & \\
+$\downarrow$ & parser \\
+parse tree & \\
+$\downarrow$ & parse \AST{} translation \\
+\AST{} & \\
+$\downarrow$ & \AST{} rewriting (macros) \\
+\AST{} & \\
+$\downarrow$ & parse translation, type inference \\
+--- well-typed term --- & \\
+$\downarrow$ & print translation \\
+\AST{} & \\
+$\downarrow$ & \AST{} rewriting (macros) \\
+\AST{} & \\
+$\downarrow$ & print \AST{} translation, printer \\
+string &
+\end{tabular}
+\index{translations!parse}\index{translations!parse AST}
+\index{translations!print}\index{translations!print AST}
+\end{center}
+\caption{Parsing and printing}\label{fig:parse_print}
+\end{figure}
+
+Abstract syntax trees are an intermediate form between the raw parse trees
+and the typed $\lambda$-terms. An \AST{} is either an atom (constant or
+variable) or a list of {\em at least two\/} subtrees. Internally, they
+have type \ttindex{Syntax.ast}: \index{*Constant} \index{*Variable}
+\index{*Appl}
+\begin{ttbox}
+datatype ast = Constant of string
+             | Variable of string
+             | Appl of ast list
+\end{ttbox}
+
+Isabelle uses an S-expression syntax for abstract syntax trees.
Constant
+atoms are shown as quoted strings, variable atoms as non-quoted strings and
+applications as a parenthesized list of subtrees. For example, the \AST
+\begin{ttbox}
+Appl [Constant "_constrain",
+      Appl [Constant "_abs", Variable "x", Variable "t"],
+      Appl [Constant "fun", Variable "'a", Variable "'b"]]
+\end{ttbox}
+is shown as {\tt ("_constrain" ("_abs" x t) ("fun" 'a 'b))}.
+Both {\tt ()} and {\tt (f)} are illegal because they have too few
+subtrees.
+
+The resemblance to Lisp's S-expressions is intentional, but there are two
+kinds of atomic symbols: $\Constant x$ and $\Variable x$. Do not take the
+names ``{\tt Constant}'' and ``{\tt Variable}'' too literally; in the later
+translation to terms, $\Variable x$ may become a constant, free or bound
+variable, even a type constructor or class name; the actual outcome depends
+on the context.
+
+Similarly, you can think of ${\tt (} f~x@1~\ldots~x@n{\tt )}$ as the
+application of~$f$ to the arguments $x@1, \ldots, x@n$. But the kind of
+application is determined later by context; it could be a type constructor
+applied to types.
+
+Forms like {\tt (("_abs" x $t$) $u$)} are legal, but \AST{}s are
+first-order: the {\tt "_abs"} does not bind the {\tt x} in any way. Later
+at the term level, {\tt ("_abs" x $t$)} will become an {\tt Abs} node and
+occurrences of {\tt x} in $t$ will be replaced by bound variables (the term
+constructor \ttindex{Bound}).
+
+
+\subsection{Transforming parse trees to \AST{}s}
+The parse tree is the raw output of the parser. Translation functions,
+called {\bf parse AST translations}\indexbold{translations!parse AST},
+transform the parse tree into an abstract syntax tree.
+
+The parse tree is constructed by nesting the right-hand sides of the
+productions used to recognize the input. Such parse trees are simply lists
+of tokens and constituent parse trees, the latter representing the
+nonterminals of the productions.
Let us refer to the actual productions in
+the form displayed by {\tt Syntax.print_syntax}.
+
+Ignoring parse \AST{} translations, parse trees are transformed to \AST{}s
+by stripping out delimiters and copy productions. More precisely, the
+mapping $ast_of_pt$\index{ast_of_pt@$ast_of_pt$} is derived from the
+productions as follows:
+\begin{itemize}
+  \item Name tokens: $ast_of_pt(t) = \Variable s$, where $t$ is an $id$,
+    $var$, $tid$ or $tvar$ token, and $s$ its associated string.
+
+  \item Copy productions: $ast_of_pt(\ldots P \ldots) = ast_of_pt(P)$.
+    Here $\ldots$ stands for strings of delimiters, which are
+    discarded. $P$ stands for the single constituent that is not a
+    delimiter; it is either a nonterminal symbol or a name token.
+
+  \item $0$-ary productions: $ast_of_pt(\ldots \mtt{=>} c) = \Constant c$.
+    Here there are no constituents other than delimiters, which are
+    discarded.
+
+  \item $n$-ary productions, where $n \ge 1$: delimiters are discarded and
+    the remaining constituents $P@1$, \ldots, $P@n$ are built into an
+    application whose head constant is~$c$:
+    \begin{eqnarray*}
+    \lefteqn{ast_of_pt(\ldots P@1 \ldots P@n \ldots \mtt{=>} c)} \\
+    &&\qquad{}= \Appl{\Constant c, ast_of_pt(P@1), \ldots, ast_of_pt(P@n)}
+    \end{eqnarray*}
+\end{itemize}
+Figure~\ref{fig:parse_ast} presents some simple examples, where {\tt ==},
+{\tt _appl}, {\tt _args}, and so forth name productions of the Pure syntax.
+These examples illustrate the need for further translations to make \AST{}s
+closer to the typed $\lambda$-calculus. The Pure syntax provides
+predefined parse \AST{} translations\index{translations!parse AST} for
+ordinary applications, type applications, nested abstractions, meta
+implications and function types. Figure~\ref{fig:parse_ast_tr} shows their
+effect on some representative input strings.
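The S-expression notation used for the \AST{}s in these figures can be mimicked by a small display function. The following Python classes are a hypothetical mirror of the ML datatype {\tt Syntax.ast} shown earlier, for illustration only:

```python
# Hypothetical Python mirror of the ML datatype Syntax.ast and its
# S-expression display (illustration only; not Isabelle code).

class Constant(str):
    """A constant atom; shown quoted."""

class Variable(str):
    """A variable atom; shown bare."""

def show(ast):
    """Render an ast (atom or list of subtrees) in S-expression notation."""
    if isinstance(ast, Constant):
        return '"%s"' % str(ast)
    if isinstance(ast, Variable):
        return str(ast)
    return "(" + " ".join(show(a) for a in ast) + ")"

ast = [Constant("_constrain"),
       [Constant("_abs"), Variable("x"), Variable("t")],
       [Constant("fun"), Variable("'a"), Variable("'b")]]
print(show(ast))
# -> ("_constrain" ("_abs" x t) ("fun" 'a 'b))
```

Only the two atom kinds carry a printing difference; an application prints as a parenthesized list, exactly as in the figures.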
+
+
+\begin{figure}
+\begin{center}
+\tt\begin{tabular}{ll}
+\rm input string & \rm \AST \\\hline
+"f" & f \\
+"'a" & 'a \\
+"t == u" & ("==" t u) \\
+"f(x)" & ("_appl" f x) \\
+"f(x, y)" & ("_appl" f ("_args" x y)) \\
+"f(x, y, z)" & ("_appl" f ("_args" x ("_args" y z))) \\
+"\%x y.\ t" & ("_lambda" ("_idts" x y) t) \\
+\end{tabular}
+\end{center}
+\caption{Parsing examples using the Pure syntax}\label{fig:parse_ast}
+\end{figure}
+
+\begin{figure}
+\begin{center}
+\tt\begin{tabular}{ll}
+\rm input string & \rm \AST{} \\\hline
+"f(x, y, z)" & (f x y z) \\
+"'a ty" & (ty 'a) \\
+"('a, 'b) ty" & (ty 'a 'b) \\
+"\%x y z.\ t" & ("_abs" x ("_abs" y ("_abs" z t))) \\
+"\%x ::\ 'a.\ t" & ("_abs" ("_constrain" x 'a) t) \\
+"[| P; Q; R |] => S" & ("==>" P ("==>" Q ("==>" R S))) \\
+"['a, 'b, 'c] => 'd" & ("fun" 'a ("fun" 'b ("fun" 'c 'd)))
+\end{tabular}
+\end{center}
+\caption{Built-in parse \AST{} translations}\label{fig:parse_ast_tr}
+\end{figure}
+
+The names of constant heads in the \AST{} control the translation process.
+The list of constants invoking parse \AST{} translations appears in the
+output of {\tt Syntax.print_syntax} under {\tt parse_ast_translation}.
+
+
+\subsection{Transforming \AST{}s to terms}
+The \AST{}, after application of macros (see \S\ref{sec:macros}), is
+transformed into a term.  This term is probably ill-typed since type
+inference has not occurred yet.  The term may contain type constraints
+consisting of applications with head {\tt "_constrain"}; the second
+argument is a type encoded as a term.  Type inference later introduces
+correct types or rejects the input.
+
+Another set of translation functions, namely parse
+translations\index{translations!parse}, may affect this process.  If we
+ignore parse translations for the time being, then \AST{}s are transformed
+to terms by mapping \AST{} constants to constants, \AST{} variables to
+schematic or free variables and \AST{} applications to applications.
+
+More precisely, the mapping $term_of_ast$\index{term_of_ast@$term_of_ast$}
+is defined by
+\begin{itemize}
+\item Constants: $term_of_ast(\Constant x) = \ttfct{Const} (x,
+  \mtt{dummyT})$.
+
+\item Schematic variables: $term_of_ast(\Variable \mtt{"?}xi\mtt") =
+  \ttfct{Var} ((x, i), \mtt{dummyT})$, where $x$ is the base name and $i$
+  the index extracted from~$xi$.
+
+\item Free variables: $term_of_ast(\Variable x) = \ttfct{Free} (x,
+  \mtt{dummyT})$.
+
+\item Function applications with $n$ arguments:
+  \begin{eqnarray*}
+    \lefteqn{term_of_ast(\Appl{f, x@1, \ldots, x@n})} \\
+    &&\qquad{}= term_of_ast(f) \ttapp
+    term_of_ast(x@1) \ttapp \ldots \ttapp term_of_ast(x@n)
+  \end{eqnarray*}
+\end{itemize}
+Here \ttindex{Const}, \ttindex{Var}, \ttindex{Free} and
+\verb|$|\index{$@{\tt\$}} are constructors of the datatype {\tt term},
+while \ttindex{dummyT} stands for some dummy type that is ignored during
+type inference.
+
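The mapping above can be paraphrased outside Isabelle.  The following Python sketch is an illustrative model only (terms are nested tuples, the indexname treatment is simplified, and none of these names belong to Isabelle's actual ML sources):

```python
# Sketch of term_of_ast: AST constants become Const, "?"-variables
# become Var, other variables become Free, and applications become
# nested $ applications.  Every type is dummyT at this stage.
dummyT = "dummyT"

def term_of_ast(ast):
    kind = ast[0]
    if kind == "Constant":
        return ("Const", ast[1], dummyT)
    if kind == "Variable":
        name = ast[1]
        if name.startswith("?"):
            # split "?x1" into base name "x" and index 1 (simplified)
            base = name[1:].rstrip("0123456789")
            index = int(name[1 + len(base):] or "0")
            return ("Var", (base, index), dummyT)
        return ("Free", name, dummyT)
    # Appl: apply the translated head to the translated arguments
    head, *args = [term_of_ast(sub) for sub in ast[1]]
    for arg in args:
        head = ("$", head, arg)
    return head

print(term_of_ast(("Appl", [("Constant", "f"),
                            ("Variable", "?x1"),
                            ("Variable", "y")])))
```

Running the example yields the left-nested application of {\tt Const f} to a {\tt Var} and a {\tt Free}, mirroring the definition item by item.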
+So far the outcome is still a first-order term.  Abstractions and bound
+variables (constructors \ttindex{Abs} and \ttindex{Bound}) are introduced
+by parse translations.  Such translations are attached to {\tt "_abs"},
+{\tt "!!"} and user-defined binders.
+
+
+\subsection{Printing of terms}
+The output phase is essentially the inverse of the input phase.  Terms are
+translated via abstract syntax trees into strings.  Finally the strings are
+pretty printed.
+
+Print translations (\S\ref{sec:tr_funs}) may affect the transformation of
+terms into \AST{}s.  Ignoring those, the transformation maps
+term constants, variables and applications to the corresponding constructs
+on \AST{}s.  Abstractions are mapped to applications of the special
+constant {\tt _abs}.
+
+More precisely, the mapping $ast_of_term$\index{ast_of_term@$ast_of_term$}
+is defined as follows:
+\begin{itemize}
+  \item $ast_of_term(\ttfct{Const} (x, \tau)) = \Constant x$.
+
+  \item $ast_of_term(\ttfct{Free} (x, \tau)) = constrain (\Variable x,
+    \tau)$.
+
+  \item $ast_of_term(\ttfct{Var} ((x, i), \tau)) = constrain (\Variable
+    \mtt{"?}xi\mtt", \tau)$, where $\mtt?xi$ is the string representation of
+    the {\tt indexname} $(x, i)$.
+
+  \item For the abstraction $\lambda x::\tau.t$, let $x'$ be a variant
+    of~$x$ renamed to differ from all names occurring in~$t$, and let $t'$
+    be obtained from~$t$ by replacing all bound occurrences of~$x$ by
+    the free variable $x'$.  This replaces corresponding occurrences of the
+    constructor \ttindex{Bound} by the term $\ttfct{Free} (x',
+    \mtt{dummyT})$:
+    \begin{eqnarray*}
+      \lefteqn{ast_of_term(\ttfct{Abs} (x, \tau, t))} \\
+      &&\qquad{}= \mathopen{\mtt[}
+                  \Constant \mtt{"_abs"}, constrain(\Variable x', \tau),
+                  ast_of_term(t')
+                  \mathclose{\mtt]}
+    \end{eqnarray*}
+
+  \item $ast_of_term(\ttfct{Bound} i) = \Variable \mtt{"B.}i\mtt"$.
+    The occurrence of constructor \ttindex{Bound} should never happen
+    when printing well-typed terms; it indicates a de Bruijn index with no
+    matching abstraction.
+
+  \item Where $f$ is not an application,
+    \begin{eqnarray*}
+      \lefteqn{ast_of_term(f \ttapp x@1 \ttapp \ldots \ttapp x@n)} \\
+      &&\qquad{}= \mathopen{\mtt[} ast_of_term(f),
+                  ast_of_term(x@1), \ldots, ast_of_term(x@n)
+                  \mathclose{\mtt]}
+    \end{eqnarray*}
+\end{itemize}
+
+Type constraints are inserted to allow the printing of types, which is
+governed by the boolean variable \ttindex{show_types}.  Constraints are
+treated as follows:
+\begin{itemize}
+  \item $constrain(x, \tau) = x$, if $\tau = \mtt{dummyT}$ \index{*dummyT} or
+    \ttindex{show_types} is not set to {\tt true}.
+
+  \item $constrain(x, \tau) = \Appl{\Constant \mtt{"_constrain"}, x, ty}$,
+    where $ty$ is the \AST{} encoding of $\tau$.  That is, type constructors as
+    {\tt Constant}s, type identifiers as {\tt Variable}s and type applications
+    as {\tt Appl}s with the head type constructor as first element.
+    Additionally, if \ttindex{show_sorts} is set to {\tt true}, some type
+    variables are decorated with an \AST{} encoding of their sort.
+\end{itemize}
+
+The \AST{}, after application of macros (see \S\ref{sec:macros}), is
+transformed into the final output string.  The built-in {\bf print AST
+  translations}\indexbold{translations!print AST} effectively reverse the
+parse \AST{} translations of Fig.\ts\ref{fig:parse_ast_tr}.
+
+For the actual printing process, the names attached to productions
+of the form $\ldots A^{(p@1)}@1 \ldots A^{(p@n)}@n \ldots \mtt{=>} c$ play
+a vital role.  Each \AST{} with constant head $c$, namely $\mtt"c\mtt"$ or
+$(\mtt"c\mtt"~ x@1 \ldots x@n)$, is printed according to the production
+for~$c$.  Each argument~$x@i$ is converted to a string, and put in
+parentheses if its priority~$(p@i)$ requires this.  The resulting strings
+and their syntactic sugar (denoted by ``\dots'' above) are joined to make a
+single string.
+
+If an application $(\mtt"c\mtt"~ x@1 \ldots x@m)$ has more arguments than the
+corresponding production, it is first split into $((\mtt"c\mtt"~ x@1 \ldots
+x@n) ~ x@{n+1} \ldots x@m)$.  Applications with too few arguments or with
+non-constant head or without a corresponding production are printed as
+$f(x@1, \ldots, x@l)$ or $(\alpha@1, \ldots, \alpha@l) ty$.  An occurrence of
+$\Variable x$ is simply printed as~$x$.
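The argument-splitting step can be sketched as a tiny Python function (an illustrative model; the name {\tt split_appl} is hypothetical and does not occur in Isabelle):

```python
# Sketch of the printing-time argument split: if the production for c
# expects `arity` arguments but the application carries more, the
# first `arity` stay with the head (printed via the production) and
# the remainder are applied to the result from outside.
def split_appl(head, args, arity):
    if len(args) <= arity:
        return head, args, []
    return head, args[:arity], args[arity:]

head, inner, outer = split_appl('"c"', ["x1", "x2", "x3"], 2)
print(inner, outer)
```

Here {\tt inner} holds the arguments consumed by the production and {\tt outer} the surplus ones that are printed as an ordinary application around the result.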
+
+Blanks are {\em not\/} inserted automatically.  If blanks are required to
+separate tokens, specify them in the mixfix declaration, possibly preceded
+by a slash~({\tt/}) to allow a line break.
+\index{trees!abstract syntax|)}
+
+
+
+\section{*Macros: Syntactic rewriting} \label{sec:macros}
+\index{macros|(}\index{rewriting!syntactic|(}
+
+Mixfix declarations alone can handle situations where there is a direct
+connection between the concrete syntax and the underlying term.  Sometimes
+we require a more elaborate concrete syntax, such as quantifiers and list
+notation.  Isabelle's {\bf macros} and {\bf translation functions} can
+perform translations such as
+\begin{center}\tt
+  \begin{tabular}{r@{$\quad\protect\rightleftharpoons\quad$}l}
+    ALL x:A.P   & Ball(A, \%x.P)        \\ \relax
+    [x, y, z]   & Cons(x, Cons(y, Cons(z, Nil)))
+  \end{tabular}
+\end{center}
+Translation functions (see \S\ref{sec:tr_funs}) must be coded in ML; they
+are the most powerful translation mechanism but are difficult to read or
+write.  Macros are specified by first-order rewriting systems that operate
+on abstract syntax trees.  They are usually easy to read and write, and can
+express all but the most obscure translations.
+
+Figure~\ref{fig:set_trans} defines a fragment of first-order logic and set
+theory.\footnote{This and the following theories are complete working
+  examples, though they specify only syntax, no axioms.  The file {\tt
+    ZF/zf.thy} presents the full set theory definition, including many
+  macro rules.}  Theory {\tt SET} defines constants for set comprehension
+({\tt Collect}), replacement ({\tt Replace}) and bounded universal
+quantification ({\tt Ball}).  Each of these binds some variables.  Without
+additional syntax we should have to express $\forall x \in A. P$ as {\tt
+  Ball(A,\%x.P)}, and similarly for the others.
+
+\begin{figure}
+\begin{ttbox}
+SET = Pure +
+types
+  i, o
+arities
+  i, o :: logic
+consts
+  Trueprop      :: "o => prop"              ("_" 5)
+  Collect       :: "[i, i => o] => i"
+  "{\at}Collect"    :: "[idt, i, o] => i"       ("(1{\ttlbrace}_:_./ _{\ttrbrace})")
+  Replace       :: "[i, [i, i] => o] => i"
+  "{\at}Replace"    :: "[idt, idt, i, o] => i"  ("(1{\ttlbrace}_./ _:_, _{\ttrbrace})")
+  Ball          :: "[i, i => o] => o"
+  "{\at}Ball"       :: "[idt, i, o] => o"       ("(3ALL _:_./ _)" 10)
+translations
+  "{\ttlbrace}x:A. P{\ttrbrace}"    == "Collect(A, \%x. P)"
+  "{\ttlbrace}y. x:A, Q{\ttrbrace}" == "Replace(A, \%x y. Q)"
+  "ALL x:A. P"  == "Ball(A, \%x. P)"
+end
+\end{ttbox}
+\caption{Macro example: set theory}\label{fig:set_trans}
+\end{figure}
+
+The theory specifies a variable-binding syntax through additional
+productions that have mixfix declarations.  Each non-copy production must
+specify some constant, which is used for building \AST{}s.  The additional
+constants are decorated with {\tt\at} to stress their purely syntactic
+purpose; they should never occur within the final well-typed terms.
+Furthermore, they cannot be written in formulae because they are not legal
+identifiers.
+
+The translations cause the replacement of external forms by internal forms
+after parsing, and vice versa before printing of terms.  As a specification
+of the set theory notation, they should be largely self-explanatory.  The
+syntactic constants, {\tt\at Collect}, {\tt\at Replace} and {\tt\at Ball},
+appear implicitly in the macro rules via their mixfix forms.
+
+Macros can define variable-binding syntax because they operate on \AST{}s,
+which have no inbuilt notion of bound variable.  The macro variables {\tt
+  x} and~{\tt y} have type~{\tt idt} and therefore range over identifiers,
+in this case bound variables.  The macro variables {\tt P} and~{\tt Q}
+range over formulae containing bound variable occurrences.
+
+Other applications of the macro system can be less straightforward, and
+there are peculiarities.  The rest of this section will describe in detail
+how Isabelle macros are preprocessed and applied.
+
+
+\subsection{Specifying macros}
+Macros are basically rewrite rules on \AST{}s.  But unlike other macro
+systems found in programming languages, Isabelle's macros work in both
+directions.  Therefore a syntax contains two lists of rewrites: one for
+parsing and one for printing.
+
+The {\tt translations} section\index{translations section@{\tt translations}
+section} specifies macros.  The syntax for a macro is
+$(root)\; string \quad
+\left\{\begin{array}[c]{c} \mtt{=>} \\ \mtt{<=} \\ \mtt{==} \end{array}
+\right\} \quad
+(root)\; string
+$
+%
+This specifies a parse rule ({\tt =>}), a print rule ({\tt <=}), or both
+({\tt ==}).  The two strings specify the left and right-hand sides of the
+macro rule.  The $(root)$ specification is optional; it specifies the
+nonterminal for parsing the $string$ and if omitted defaults to {\tt
+  logic}.  \AST{} rewrite rules $(l, r)$ must obey certain conditions:
+\begin{itemize}
+\item Rules must be left linear: $l$ must not contain repeated variables.
+
+\item Rules must have constant heads, namely $l = \mtt"c\mtt"$ or $l =
+  (\mtt"c\mtt" ~ x@1 \ldots x@n)$.
+
+\item Every variable in~$r$ must also occur in~$l$.
+\end{itemize}
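These three conditions are easy to check mechanically.  The Python sketch below models \AST{}s as tuples; it is an informal illustration, not Isabelle's actual well-formedness check:

```python
# Checks the three conditions on an AST rewrite rule (l, r).
# ASTs are modelled as ("Constant", c), ("Variable", x) or
# ("Appl", [subtrees]); the function names are hypothetical.

def variables(ast):
    if ast[0] == "Variable":
        return [ast[1]]
    if ast[0] == "Appl":
        return [v for sub in ast[1] for v in variables(sub)]
    return []

def rule_ok(l, r):
    lvars = variables(l)
    left_linear = len(lvars) == len(set(lvars))
    constant_head = (l[0] == "Constant" or
                     (l[0] == "Appl" and l[1][0][0] == "Constant"))
    no_new_vars = set(variables(r)) <= set(lvars)
    return left_linear and constant_head and no_new_vars

# "ALL x:A. P" == "Ball(A, %x. P)" in internal form:
ball_l = ("Appl", [("Constant", "@Ball"), ("Variable", "x"),
                   ("Variable", "A"), ("Variable", "P")])
ball_r = ("Appl", [("Constant", "Ball"), ("Variable", "A"),
                   ("Appl", [("Constant", "_abs"),
                             ("Variable", "x"), ("Variable", "P")])])
print(rule_ok(ball_l, ball_r))
```

The {\tt Ball} rule of Fig.~\ref{fig:set_trans} passes all three tests; a left-hand side with a repeated variable or a variable head would be rejected.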
+
+Macro rules may refer to any syntax from the parent theories.  They may
+also refer to anything defined before the {\tt .thy} file's {\tt
+  translations} section --- including any mixfix declarations.
+
+Upon declaration, both sides of the macro rule undergo parsing and parse
+\AST{} translations (see \S\ref{sec:asts}), but do not themselves undergo
+macro expansion.  The lexer runs in a different mode that additionally
+accepts identifiers of the form $\_~letter~quasiletter^*$ (like {\tt _idt},
+{\tt _K}).  Thus, a constant whose name starts with an underscore can
+appear in macro rules but not in ordinary terms.
+
+Some atoms of the macro rule's \AST{} are designated as constants for
+matching.  These are all names that have been declared as classes, types or
+constants.
+
+The result of this preprocessing is two lists of macro rules, each stored
+as a pair of \AST{}s.  They can be viewed using {\tt Syntax.print_syntax}
+(sections \ttindex{parse_rules} and \ttindex{print_rules}).  For
+theory~{\tt SET} of Fig.~\ref{fig:set_trans} these are
+\begin{ttbox}
+parse_rules:
+  ("{\at}Collect" x A P)  ->  ("Collect" A ("_abs" x P))
+  ("{\at}Replace" y x A Q)  ->  ("Replace" A ("_abs" x ("_abs" y Q)))
+  ("{\at}Ball" x A P)  ->  ("Ball" A ("_abs" x P))
+print_rules:
+  ("Collect" A ("_abs" x P))  ->  ("{\at}Collect" x A P)
+  ("Replace" A ("_abs" x ("_abs" y Q)))  ->  ("{\at}Replace" y x A Q)
+  ("Ball" A ("_abs" x P))  ->  ("{\at}Ball" x A P)
+\end{ttbox}
+
+\begin{warn}
+  Avoid choosing variable names that have previously been used as
+  constants, types or type classes; the {\tt consts} section in the output
+  of {\tt Syntax.print_syntax} lists all such names.  If a macro rule works
+  incorrectly, inspect its internal form as shown above, recalling that
+  constants appear as quoted strings and variables without quotes.
+\end{warn}
+
+\begin{warn}
+If \ttindex{eta_contract} is set to {\tt true}, terms will be
+$\eta$-contracted {\em before\/} the \AST{} rewriter sees them.  Thus some
+abstraction nodes needed for print rules to match may vanish.  For example,
+\verb|Ball(A, %x. P(x))| contracts to {\tt Ball(A, P)}; the print rule does
+not apply and the output will be {\tt Ball(A, P)}.  This problem would not
+occur if \ML{} translation functions were used instead of macros (as is
+done for binder declarations).
+\end{warn}
+
+
+\begin{warn}
+Another trap concerns type constraints.  If \ttindex{show_types} is set to
+{\tt true}, bound variables will be decorated by their meta types at the
+binding place (but not at occurrences in the body).  Matching with
+\verb|Collect(A, %x. P)| binds {\tt x} to something like {\tt ("_constrain" y
+"i")} rather than only {\tt y}.  \AST{} rewriting will cause the constraint to
+appear in the external form, say \verb|{y::i:A::i. P::o}|.
+
+To allow such constraints to be re-read, your syntax should specify bound
+variables using the nonterminal~\ttindex{idt}.  This is the case in our
+example.  Choosing {\tt id} instead of {\tt idt} is a common error,
+especially since it appears in former versions of most of Isabelle's
+object-logics.
+\end{warn}
+
+
+
+\subsection{Applying rules}
+As a term is being parsed or printed, an \AST{} is generated as an
+intermediate form (recall Fig.\ts\ref{fig:parse_print}).  The \AST{} is
+normalized by applying macro rules in the manner of a traditional term
+rewriting system.  We first examine how a single rule is applied.
+
+Let $t$ be the abstract syntax tree to be normalized and $(l, r)$ some
+translation rule.  A subtree~$u$ of $t$ is a {\bf redex} if it is an
+instance of~$l$; in this case $l$ is said to {\bf match}~$u$.  A redex
+matched by $l$ may be replaced by the corresponding instance of~$r$, thus
+{\bf rewriting} the \AST~$t$.  Matching requires some notion of {\bf
+  place-holders} that may occur in rule patterns but not in ordinary
+\AST{}s; {\tt Variable} atoms serve this purpose.
+
+The matching of the object~$u$ by the pattern~$l$ is performed as follows:
+\begin{itemize}
+  \item Every constant matches itself.
+
+  \item $\Variable x$ in the object matches $\Constant x$ in the pattern.
+    This point is discussed further below.
+
+  \item Every \AST{} in the object matches $\Variable x$ in the pattern,
+    binding~$x$ to~$u$.
+
+  \item One application matches another if they have the same number of
+    subtrees and corresponding subtrees match.
+
+  \item In every other case, matching fails.  In particular, {\tt
+      Constant}~$x$ can only match itself.
+\end{itemize}
+A successful match yields a substitution that is applied to~$r$, generating
+the instance that replaces~$u$.
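The matching algorithm above fits in a few lines of Python.  This is an illustrative model of the description, not Isabelle's implementation; \AST{}s are again tuples and the rules are assumed left-linear:

```python
# A pattern Variable matches any object AST (binding it); a pattern
# Constant matches an object Constant *or* Variable of the same name
# (the odd second case discussed in the text).

def match(obj, pat, subst):
    if pat[0] == "Variable":
        subst[pat[1]] = obj
        return subst
    if pat[0] == "Constant":
        same_name = obj[0] in ("Constant", "Variable") and obj[1] == pat[1]
        return subst if same_name else None
    if obj[0] == "Appl" and pat[0] == "Appl" and len(obj[1]) == len(pat[1]):
        for o, p in zip(obj[1], pat[1]):
            if match(o, p, subst) is None:
                return None
        return subst
    return None          # every other case: matching fails

def instantiate(ast, subst):
    if ast[0] == "Variable" and ast[1] in subst:
        return subst[ast[1]]
    if ast[0] == "Appl":
        return ("Appl", [instantiate(sub, subst) for sub in ast[1]])
    return ast

# rewrite ("@Ball" x A P) -> ("Ball" A ("_abs" x P)) at the root:
lhs = ("Appl", [("Constant", "@Ball"), ("Variable", "x"),
                ("Variable", "A"), ("Variable", "P")])
rhs = ("Appl", [("Constant", "Ball"), ("Variable", "A"),
                ("Appl", [("Constant", "_abs"),
                          ("Variable", "x"), ("Variable", "P")])])
obj = ("Appl", [("Constant", "@Ball"), ("Variable", "y"),
                ("Variable", "S"), ("Variable", "Q")])
print(instantiate(rhs, match(obj, lhs, {})))
```

Matching the object against the left-hand side binds {\tt x}, {\tt A} and {\tt P}; applying the substitution to the right-hand side yields the internal form of the quantifier.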
+
+The second case above may look odd.  This is where {\tt Variable}s of
+non-rule \AST{}s behave like {\tt Constant}s.  Recall that \AST{}s are not
+far removed from parse trees; at this level it is not yet known which
+identifiers will become constants, bounds, frees, types or classes.  As
+\S\ref{sec:asts} describes, former parse tree heads appear in \AST{}s as
+{\tt Constant}s, while $id$s, $var$s, $tid$s and $tvar$s become {\tt
+  Variable}s.  On the other hand, when \AST{}s are generated from terms for
+printing, all constants and type constructors become {\tt Constant}s; see
+\S\ref{sec:asts}.  Thus \AST{}s may contain a messy mixture of {\tt
+  Variable}s and {\tt Constant}s.  This is insignificant at macro level
+because matching treats them alike.
+
+Because of this behaviour, different kinds of atoms with the same name are
+indistinguishable, which may make some rules prone to misbehaviour.  Example:
+\begin{ttbox}
+types
+  Nil
+consts
+  Nil     :: "'a list"
+  "[]"    :: "'a list"    ("[]")
+translations
+  "[]"    == "Nil"
+\end{ttbox}
+The term {\tt Nil} will be printed as {\tt []}, just as expected.  What
+happens with \verb|%Nil.t| or {\tt x::Nil} is left as an exercise.
+
+Normalizing an \AST{} involves repeatedly applying macro rules until none
+is applicable.  Macro rules are chosen in the order that they appear in the
+{\tt translations} section.  You can watch the normalization of \AST{}s
+during parsing and printing by setting \ttindex{Syntax.trace_norm_ast} to
+{\tt true}.\index{tracing!of macros} Alternatively, use
+\ttindex{Syntax.test_read}.  The information displayed when tracing
+includes the \AST{} before normalization ({\tt pre}), redexes with results
+({\tt rewrote}), the normal form finally reached ({\tt post}) and some
+statistics ({\tt normalize}).  If tracing is off,
+\ttindex{Syntax.stat_norm_ast} can be set to {\tt true} in order to enable
+printing of the normal form and statistics only.
+
+
+\subsection{Example: the syntax of finite sets}
+This example demonstrates the use of recursive macros to implement a
+convenient notation for finite sets.
+\begin{ttbox}
+FINSET = SET +
+types
+  is
+consts
+  ""            :: "i => is"                ("_")
+  "{\at}Enum"       :: "[i, is] => is"          ("_,/ _")
+  empty         :: "i"                      ("{\ttlbrace}{\ttrbrace}")
+  insert        :: "[i, i] => i"
+  "{\at}Finset"     :: "is => i"                ("{\ttlbrace}(_){\ttrbrace}")
+translations
+  "{\ttlbrace}x, xs{\ttrbrace}"     == "insert(x, {\ttlbrace}xs{\ttrbrace})"
+  "{\ttlbrace}x{\ttrbrace}"         == "insert(x, {\ttlbrace}{\ttrbrace})"
+end
+\end{ttbox}
+Finite sets are internally built up by {\tt empty} and {\tt insert}.  The
+declarations above specify \verb|{x, y, z}| as the external representation
+of
+\begin{ttbox}
+insert(x, insert(y, insert(z, empty)))
+\end{ttbox}
+
+The nonterminal symbol~{\tt is} stands for one or more objects of type~{\tt
+  i} separated by commas.  The mixfix declaration \hbox{\verb|"_,/ _"|}
+allows a line break after the comma for pretty printing; if no line break
+is required then a space is printed instead.
+
+The nonterminal is declared as the type~{\tt is}, but with no {\tt arities}
+declaration.  Hence {\tt is} is not a logical type and no default
+productions are added for it.  If we had wanted enumerations of
+{\tt logic}, which would include all the logical types, we could have used
+the predefined nonterminal symbol \ttindex{args} and skipped this part
+altogether.  The nonterminal~{\tt is} can later be reused for other
+enumerations of type~{\tt i} like lists or tuples.
+
+Next follows {\tt empty}, which is already equipped with its syntax
+\verb|{}|, and {\tt insert} without concrete syntax.  The syntactic
+constant {\tt\at Finset} provides concrete syntax for enumerations of~{\tt
+  i} enclosed in curly braces.  Remember that a pair of parentheses, as in
+\verb|"{(_)}"|, specifies a block of indentation for pretty printing.
+
+The translations may look strange at first.  Macro rules are best
+understood in their internal forms:
+\begin{ttbox}
+parse_rules:
+  ("{\at}Finset" ("{\at}Enum" x xs))  ->  ("insert" x ("{\at}Finset" xs))
+  ("{\at}Finset" x)  ->  ("insert" x "empty")
+print_rules:
+  ("insert" x ("{\at}Finset" xs))  ->  ("{\at}Finset" ("{\at}Enum" x xs))
+  ("insert" x "empty")  ->  ("{\at}Finset" x)
+\end{ttbox}
+This shows that \verb|{x, xs}| indeed matches any set enumeration of at least
+two elements, binding the first to {\tt x} and the rest to {\tt xs}.
+Likewise, \verb|{xs}| and \verb|{x}| represent any set enumeration.
+The parse rules only work in the order given.
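Read recursively, the two parse rules amount to the following Python sketch (an informal model with terms as strings; the function name is hypothetical):

```python
# The two parse rules as a recursive definition: a one-element
# enumeration becomes insert(x, empty); a longer one peels off its
# first element, exactly as the @Enum rule does.
def finset(elems):
    if len(elems) == 1:                       # ("@Finset" x)
        return "insert(%s, empty)" % elems[0]
    # ("@Finset" ("@Enum" x xs)) -> ("insert" x ("@Finset" xs))
    return "insert(%s, %s)" % (elems[0], finset(elems[1:]))

print(finset(["x", "y", "z"]))
```

The result is the internal form given earlier, {\tt insert(x, insert(y, insert(z, empty)))}.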
+
+\begin{warn}
+  The \AST{} rewriter cannot discern constants from variables and looks
+  only for names of atoms.  Thus the names of {\tt Constant}s occurring in
+  the (internal) left-hand side of translation rules should be regarded as
+  reserved keywords.  Choose non-identifiers like {\tt\at Finset} or
+  sufficiently long and strange names.  If a bound variable's name gets
+  rewritten, the result will be incorrect; for example, the term
+\begin{ttbox}
+\%empty insert. insert(x, empty)
+\end{ttbox}
+  gets printed as \verb|%empty insert. {x}|.
+\end{warn}
+
+
+\subsection{Example: a parse macro for dependent types}\label{prod_trans}
+As stated earlier, a macro rule may not introduce new {\tt Variable}s on
+the right-hand side.  Something like \verb|"K(B)" => "%x. B"| is illegal;
+if it were allowed, it could cause variable capture.  In such cases you usually
+must fall back on translation functions.  But a trick can make things
+readable in some cases: {\em calling translation functions by parse
+  macros}:
+\begin{ttbox}
+PROD = FINSET +
+consts
+  Pi            :: "[i, i => i] => i"
+  "{\at}PROD"       :: "[idt, i, i] => i"     ("(3PROD _:_./ _)" 10)
+  "{\at}->"         :: "[i, i] => i"          ("(_ ->/ _)" [51, 50] 50)
+\ttbreak
+translations
+  "PROD x:A. B" => "Pi(A, \%x. B)"
+  "A -> B"      => "Pi(A, _K(B))"
+end
+ML
+  val print_translation = [("Pi", dependent_tr' ("{\at}PROD", "{\at}->"))];
+\end{ttbox}
+
+Here {\tt Pi} is an internal constant for constructing general products.
+Two external forms exist: the general case {\tt PROD x:A.B} and the
+function space {\tt A -> B}, which abbreviates \verb|Pi(A, %x.B)| when {\tt B}
+does not depend on~{\tt x}.
+
+The second parse macro introduces {\tt _K(B)}, which later becomes \verb|%x.B|
+due to a parse translation associated with \ttindex{_K}.  The order of the
+parse rules is critical.  Unfortunately there is no such trick for
+printing, so we have to add an {\tt ML} section for the print translation
+\ttindex{dependent_tr'}.
+
+Recall that identifiers with a leading {\tt _} are allowed in translation
+rules, but not in ordinary terms.  Thus we can create \AST{}s containing
+names that are not directly expressible.
+
+The parse translation for {\tt _K} is already installed in Pure, and {\tt
+dependent_tr'} is exported by the syntax module for public use.  See
+\S\ref{sec:tr_funs} below for more of the arcane lore of translation functions.
+\index{macros|)}\index{rewriting!syntactic|)}
+
+
+
+\section{*Translation functions} \label{sec:tr_funs}
+\index{translations|(}
+%
+This section describes the translation function mechanism.  By writing
+\ML{} functions, you can do almost everything with terms or \AST{}s during
+parsing and printing.  The logic \LK\ is a good example of sophisticated
+transformations between internal and external representations of
+associative sequences; here, macros would be useless.
+
+A full understanding of translations requires some familiarity
+with Isabelle's internals, especially the datatypes {\tt term}, {\tt typ},
+{\tt Syntax.ast} and the encodings of types and terms as such at the various
+stages of the parsing or printing process.  Most users should never need to
+use translation functions.
+
+\subsection{Declaring translation functions}
+There are four kinds of translation functions.  Each such function is
+associated with a name, which triggers calls to it.  Such names can be
+constants (logical or syntactic) or type constructors.
+
+{\tt Syntax.print_syntax} displays the sets of names associated with the
+translation functions of a {\tt Syntax.syntax} under
+\ttindex{parse_ast_translation}, \ttindex{parse_translation},
+\ttindex{print_translation} and \ttindex{print_ast_translation}.  You can
+add new ones via the {\tt ML} section\index{ML section@{\tt ML} section} of
+a {\tt .thy} file.  There may never be more than one function of the same
+kind per name.  Conceptually, the {\tt ML} section should appear between
+{\tt consts} and {\tt translations}; newly installed translation functions
+are already effective when macros and logical rules are parsed.
+
+The {\tt ML} section is copied verbatim into the \ML\ file generated from a
+{\tt .thy} file.  Definitions made here are accessible as components of an
+\ML\ structure; to make some definitions private, use an \ML{} {\tt local}
+declaration.  The {\tt ML} section may install translation functions by
+declaring any of the following identifiers:
+\begin{ttbox}
+val parse_ast_translation : (string * (ast list -> ast)) list
+val print_ast_translation : (string * (ast list -> ast)) list
+val parse_translation     : (string * (term list -> term)) list
+val print_translation     : (string * (term list -> term)) list
+\end{ttbox}
+
+\subsection{The translation strategy}
+All four kinds of translation functions are treated similarly.  They are
+called during the transformations between parse trees, \AST{}s and terms
+(recall Fig.\ts\ref{fig:parse_print}).  Whenever a combination of the form
+$(\mtt"c\mtt"~x@1 \ldots x@n)$ is encountered, and a translation function
+$f$ of appropriate kind exists for $c$, the result is computed by the \ML{}
+function call $f \mtt[ x@1, \ldots, x@n \mtt]$.
+
+For \AST{} translations, the arguments $x@1, \ldots, x@n$ are \AST{}s.  A
+combination has the form $\Constant c$ or $\Appl{\Constant c, x@1, \ldots,
+  x@n}$.  For term translations, the arguments are terms and a combination
+has the form $\ttfct{Const} (c, \tau)$ or $\ttfct{Const} (c, \tau) \ttapp
+x@1 \ttapp \ldots \ttapp x@n$.  Terms allow more sophisticated
+transformations than \AST{}s do, typically involving abstractions and bound
+variables.
+
+Regardless of whether they act on terms or \AST{}s,
+parse translations differ from print translations fundamentally:
+\begin{description}
+\item[Parse translations] are applied bottom-up.  The arguments are already
+  in translated form.  The translations must not fail; exceptions trigger
+  an error message.
+
+\item[Print translations] are applied top-down.  They are supplied with
+  arguments that are partly still in internal form.  The result again
+  undergoes translation; therefore a print translation should not introduce
+  as head the very constant that invoked it.  The function may raise
+  exception \ttindex{Match} to indicate failure; in this event it has no
+  effect.
+\end{description}
+
+Only constant atoms --- constructor \ttindex{Constant} for \AST{}s and
+\ttindex{Const} for terms --- can invoke translation functions.  This
+causes another difference between parsing and printing.
+
+Parsing starts with a string and the constants are not yet identified.
+Only parse tree heads create {\tt Constant}s in the resulting \AST; recall
+$ast_of_pt$ in \S\ref{sec:asts}.  Macros and parse \AST{} translations may
+introduce further {\tt Constant}s.  When the final \AST{} is converted to a
+term, all {\tt Constant}s become {\tt Const}s; recall $term_of_ast$ in
+\S\ref{sec:asts}.
+
+Printing starts with a well-typed term and all the constants are known.  So
+all logical constants and type constructors may invoke print translations.
+These, and macros, may introduce further constants.
+
+
+\subsection{Example: a print translation for dependent types}
+\indexbold{*_K}\indexbold{*dependent_tr'}
+Let us continue the dependent type example (page~\pageref{prod_trans}) by
+examining the parse translation for {\tt _K} and the print translation
+{\tt dependent_tr'}, which are both built-in.  By convention, parse
+translations have names ending with {\tt _tr} and print translations have
+names ending with {\tt _tr'}.  Search for such names in the Isabelle
+sources to locate more examples.
+
+Here is the parse translation for {\tt _K}:
+\begin{ttbox}
+fun k_tr [t] = Abs ("x", dummyT, incr_boundvars 1 t)
+  | k_tr ts = raise TERM("k_tr",ts);
+\end{ttbox}
+If {\tt k_tr} is called with exactly one argument~$t$, it creates a new
+{\tt Abs} node with a body derived from $t$.  Since terms given to parse
+translations are not yet typed, the type of the bound variable in the new
+{\tt Abs} is simply {\tt dummyT}.  The function increments all {\tt Bound}
+nodes referring to outer abstractions by calling \ttindex{incr_boundvars},
+a basic term manipulation function defined in {\tt Pure/term.ML}.
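The effect of {\tt k_tr} on de Bruijn indices can be modelled in Python (an illustrative sketch mirroring the ML above; the tuple encoding of terms is hypothetical):

```python
# Model of k_tr on de Bruijn terms: wrapping t in a new Abs must
# increment every loose Bound index in t by one, which is what
# incr_boundvars does.

def incr_boundvars(inc, t, depth=0):
    if t[0] == "Bound":
        # only indices reaching past `depth` enclosing Abs are loose
        return ("Bound", t[1] + inc) if t[1] >= depth else t
    if t[0] == "Abs":
        return ("Abs", t[1], t[2], incr_boundvars(inc, t[3], depth + 1))
    if t[0] == "$":
        return ("$", incr_boundvars(inc, t[1], depth),
                     incr_boundvars(inc, t[2], depth))
    return t

def k_tr(ts):
    (t,) = ts                     # exactly one argument, else an error
    return ("Abs", "x", "dummyT", incr_boundvars(1, t))

print(k_tr([("Bound", 0)]))
```

A loose {\tt Bound 0} in the argument becomes {\tt Bound 1} under the new abstraction, while indices already bound inside the argument are left alone.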
+
+Here is the print translation for dependent types:
+\begin{ttbox}
+fun dependent_tr' (q,r) (A :: Abs (x, T, B) :: ts) =
+      if 0 mem (loose_bnos B) then
+        let val (x', B') = variant_abs (x, dummyT, B);
+        in list_comb (Const (q, dummyT) $ Free (x', T) $ A $ B', ts)
+        end
+      else list_comb (Const (r, dummyT) $ A $ B, ts)
+  | dependent_tr' _ _ = raise Match;
+\end{ttbox}
+The argument {\tt (q,r)} is supplied to {\tt dependent_tr'} by a curried
+function application during its installation.  We could set up print
+translations for both {\tt Pi} and {\tt Sigma} by including
+\begin{ttbox}
+val print_translation =
+  [("Pi",    dependent_tr' ("{\at}PROD", "{\at}->")),
+   ("Sigma", dependent_tr' ("{\at}SUM", "{\at}*"))];
+\end{ttbox}
+within the {\tt ML} section.  The first of these transforms ${\tt Pi}(A,
+\mtt{Abs}(x, T, B))$ into $\hbox{\tt{\at}PROD}(x', A, B')$ or
+$\hbox{\tt{\at}->}(A, B)$, choosing the latter form if $B$ does not depend
+on~$x$.  It checks this using \ttindex{loose_bnos}, yet another function
+from {\tt Pure/term.ML}.  Note that $x'$ is a version of $x$ renamed away
+from all names in~$B$, and $B'$ the body $B$ with {\tt Bound} nodes
+referring to our {\tt Abs} node replaced by $\ttfct{Free} (x',
+\mtt{dummyT})$.
+
+We must be careful with types here.  While types of {\tt Const}s are
+ignored, type constraints may be printed for some {\tt Free}s and
+{\tt Var}s if \ttindex{show_types} is set to {\tt true}.  Variables of type
+\ttindex{dummyT} are never printed with a constraint, though.  The line
+\begin{ttbox}
+        let val (x', B') = variant_abs (x, dummyT, B);
+\end{ttbox}\index{*variant_abs}
+replaces bound variable occurrences in~$B$ by the free variable $x'$ with
+type {\tt dummyT}.  Only the binding occurrence of~$x'$ is given the
+correct type~{\tt T}, so this is the only place where a type
+constraint might appear.
+\index{translations|)}
+
+
+