| author | berghofe | 
| Mon, 05 Aug 2002 14:35:33 +0200 | |
| changeset 13455 | f88a91ff8ac6 | 
| parent 291 | a615050a7494 | 
| permissions | -rw-r--r-- | 
| 104 | 1  | 
%% $Id$  | 
| 291 | 2  | 
%% \([a-zA-Z][a-zA-Z]}\.\) \([^ ]\) \1 \2  | 
3  | 
%%  @\([a-z0-9]\)       ^{(\1)}
 | 
|
| 108 | 4  | 
|
| 291 | 5  | 
\newcommand\rmindex[1]{{#1}\index{#1}\@}
 | 
6  | 
\newcommand\mtt[1]{\mbox{\tt #1}}
 | 
|
7  | 
\newcommand\ttfct[1]{\mathop{\mtt{#1}}\nolimits}
 | 
|
8  | 
\newcommand\ttapp{\mathrel{\hbox{\tt\$}}}
 | 
|
9  | 
\newcommand\Constant{\ttfct{Constant}}
 | 
|
10  | 
\newcommand\Variable{\ttfct{Variable}}
 | 
|
11  | 
\newcommand\Appl[1]{\ttfct{Appl}\mathopen{\mtt[}#1\mathclose{\mtt]}}
 | 
|
12  | 
\newcommand\AST{{\sc ast}}
 | 
|
13  | 
\let\rew=\longrightarrow  | 
|
| 104 | 14  | 
|
15  | 
||
16  | 
\chapter{Defining Logics} \label{Defining-Logics}
 | 
|
17  | 
||
| 291 | 18  | 
This chapter explains how to define new formal systems --- in particular,  | 
19  | 
their concrete syntax. While Isabelle can be regarded as a theorem prover  | 
|
20  | 
for set theory, higher-order logic or the sequent calculus, its  | 
|
21  | 
distinguishing feature is support for the definition of new logics.  | 
|
| 104 | 22  | 
|
| 291 | 23  | 
Isabelle logics are hierarchies of theories, which are described and  | 
24  | 
illustrated in {\em Introduction to Isabelle}.  That material, together
 | 
|
25  | 
with the theory files provided in the examples directories, should suffice  | 
|
26  | 
for all simple applications. The easiest way to define a new theory is by  | 
|
27  | 
modifying a copy of an existing theory.  | 
|
28  | 
||
29  | 
This chapter is intended for experienced Isabelle users. It documents all  | 
|
30  | 
aspects of theories concerned with syntax: mixfix declarations, pretty  | 
|
31  | 
printing, macros and translation functions. The extended examples of  | 
|
32  | 
\S\ref{sec:min_logics} demonstrate the logical aspects of the definition of
 | 
|
33  | 
theories. Sections marked with * are highly technical and might be skipped  | 
|
34  | 
on the first reading.  | 
|
| 104 | 35  | 
|
36  | 
||
| 291 | 37  | 
\section{Priority grammars} \label{sec:priority_grammars}
 | 
38  | 
\index{grammars!priority|(} 
 | 
|
| 104 | 39  | 
|
| 291 | 40  | 
The syntax of an Isabelle logic is specified by a {\bf priority grammar}.
 | 
41  | 
A context-free grammar\index{grammars!context-free} contains a set of
 | 
|
42  | 
productions of the form $A=\gamma$, where $A$ is a nonterminal and  | 
|
43  | 
$\gamma$, the right-hand side, is a string of terminals and nonterminals.  | 
|
44  | 
Isabelle uses an extended format permitting {\bf priorities}, or
 | 
|
45  | 
precedences. Each nonterminal is decorated by an integer priority, as  | 
|
46  | 
in~$A^{(p)}$.  A nonterminal $A^{(p)}$ in a derivation may be replaced
 | 
|
47  | 
using a production $A^{(q)} = \gamma$ only if $p \le q$.  Any priority
 | 
|
48  | 
grammar can be translated into a normal context free grammar by introducing  | 
|
49  | 
new nonterminals and productions.  | 
|
| 104 | 50  | 
|
51  | 
Formally, a set of context free productions $G$ induces a derivation  | 
|
| 291 | 52  | 
relation $\rew@G$. Let $\alpha$ and $\beta$ denote strings of terminal or  | 
53  | 
nonterminal symbols. Then  | 
|
54  | 
\[ \alpha\, A^{(p)}\, \beta ~\rew@G~ \alpha\,\gamma\,\beta \] 
 | 
|
55  | 
if and only if $G$ contains some production $A^{(q)}=\gamma$ for~$q\ge p$.
 | 
|
| 104 | 56  | 
|
57  | 
The following simple grammar for arithmetic expressions demonstrates how  | 
|
| 291 | 58  | 
binding power and associativity of operators can be enforced by priorities.  | 
| 104 | 59  | 
\begin{center}
 | 
60  | 
\begin{tabular}{rclr}
 | 
|
| 291 | 61  | 
  $A^{(9)}$ & = & {\tt0} \\
 | 
62  | 
  $A^{(9)}$ & = & {\tt(} $A^{(0)}$ {\tt)} \\
 | 
|
63  | 
  $A^{(0)}$ & = & $A^{(0)}$ {\tt+} $A^{(1)}$ \\
 | 
|
64  | 
  $A^{(2)}$ & = & $A^{(3)}$ {\tt*} $A^{(2)}$ \\
 | 
|
65  | 
  $A^{(3)}$ & = & {\tt-} $A^{(3)}$
 | 
|
| 104 | 66  | 
\end{tabular}
 | 
67  | 
\end{center}
 | 
|
| 291 | 68  | 
The choice of priorities determines that {\tt -} binds tighter than {\tt *},
 | 
69  | 
which binds tighter than {\tt +}.  Furthermore {\tt +} associates to the
 | 
|
70  | 
left and {\tt *} to the right.
 | 
|
| 104 | 71  | 
|
72  | 
To minimize the number of subscripts, we adopt the following conventions:  | 
|
73  | 
\begin{itemize}
 | 
|
| 291 | 74  | 
\item All priorities $p$ must be in the range $0 \leq p \leq max_pri$ for  | 
75  | 
some fixed integer $max_pri$.  | 
|
76  | 
\item Priority $0$ on the right-hand side and priority $max_pri$ on the  | 
|
| 104 | 77  | 
left-hand side may be omitted.  | 
78  | 
\end{itemize}
 | 
|
| 291 | 79  | 
The production $A^{(p)} = \alpha$ is written as $A = \alpha~(p)$;
 | 
80  | 
the priority of the left-hand side actually appears in a column on the far  | 
|
81  | 
right. Finally, alternatives may be separated by $|$, and repetition  | 
|
| 104 | 82  | 
indicated by \dots.  | 
83  | 
||
| 291 | 84  | 
Using these conventions and assuming $max_pri=9$, the grammar takes the form  | 
| 104 | 85  | 
\begin{center}
 | 
86  | 
\begin{tabular}{rclc}
 | 
|
87  | 
$A$ & = & {\tt0} & \hspace*{4em} \\
 | 
|
88  | 
 & $|$ & {\tt(} $A$ {\tt)} \\
 | 
|
| 291 | 89  | 
 & $|$ & $A$ {\tt+} $A^{(1)}$ & (0) \\
 | 
90  | 
 & $|$ & $A^{(3)}$ {\tt*} $A^{(2)}$ & (2) \\
 | 
|
91  | 
 & $|$ & {\tt-} $A^{(3)}$ & (3)
 | 
|
| 104 | 92  | 
\end{tabular}
 | 
93  | 
\end{center}
 | 
|
| 291 | 94  | 
\index{grammars!priority|)}
 | 
| 104 | 95  | 
|
96  | 
||
| 291 | 97  | 
\begin{figure}
 | 
| 104 | 98  | 
\begin{center}
 | 
99  | 
\begin{tabular}{rclc}
 | 
|
100  | 
$prop$ &=& \ttindex{PROP} $aprop$ ~~$|$~~ {\tt(} $prop$ {\tt)} \\
 | 
|
| 291 | 101  | 
     &$|$& $logic^{(3)}$ \ttindex{==} $logic^{(2)}$ & (2) \\
 | 
102  | 
     &$|$& $logic^{(3)}$ \ttindex{=?=} $logic^{(2)}$ & (2) \\
 | 
|
103  | 
     &$|$& $prop^{(2)}$ \ttindex{==>} $prop^{(1)}$ & (1) \\
 | 
|
104  | 
     &$|$& {\tt[|} $prop$ {\tt;} \dots {\tt;} $prop$ {\tt|]} {\tt==>} $prop^{(1)}$ & (1) \\
 | 
|
| 104 | 105  | 
     &$|$& {\tt!!} $idts$ {\tt.} $prop$ & (0) \\\\
 | 
106  | 
$logic$ &=& $prop$ ~~$|$~~ $fun$ \\\\  | 
|
107  | 
$aprop$ &=& $id$ ~~$|$~~ $var$  | 
|
| 291 | 108  | 
    ~~$|$~~ $fun^{(max_pri)}$ {\tt(} $logic$ {\tt,} \dots {\tt,} $logic$ {\tt)} \\\\
 | 
| 104 | 109  | 
$fun$ &=& $id$ ~~$|$~~ $var$ ~~$|$~~ {\tt(} $fun$ {\tt)} \\
 | 
| 291 | 110  | 
    &$|$& $fun^{(max_pri)}$ {\tt(} $logic$ {\tt,} \dots {\tt,} $logic$ {\tt)} \\
 | 
111  | 
    &$|$& $fun^{(max_pri)}$ {\tt::} $type$ \\
 | 
|
| 104 | 112  | 
    &$|$& \ttindex{\%} $idts$ {\tt.} $logic$ & (0) \\\\
 | 
| 291 | 113  | 
$idts$ &=& $idt$ ~~$|$~~ $idt^{(1)}$ $idts$ \\\\
 | 
| 104 | 114  | 
$idt$ &=& $id$ ~~$|$~~ {\tt(} $idt$ {\tt)} \\
 | 
115  | 
    &$|$& $id$ \ttindex{::} $type$ & (0) \\\\
 | 
|
| 291 | 116  | 
$type$ &=& $tid$ ~~$|$~~ $tvar$ ~~$|$~~ $tid$ {\tt::} $sort$
 | 
| 135 | 117  | 
  ~~$|$~~ $tvar$ {\tt::} $sort$ \\
 | 
| 291 | 118  | 
     &$|$& $id$ ~~$|$~~ $type^{(max_pri)}$ $id$
 | 
| 104 | 119  | 
                ~~$|$~~ {\tt(} $type$ {\tt,} \dots {\tt,} $type$ {\tt)} $id$ \\
 | 
| 291 | 120  | 
     &$|$& $type^{(1)}$ \ttindex{=>} $type$ & (0) \\
 | 
| 104 | 121  | 
     &$|$& {\tt[}  $type$ {\tt,} \dots {\tt,} $type$ {\tt]} {\tt=>} $type$&(0)\\
 | 
122  | 
     &$|$& {\tt(} $type$ {\tt)} \\\\
 | 
|
123  | 
$sort$ &=& $id$ ~~$|$~~ {\tt\ttlbrace\ttrbrace}
 | 
|
124  | 
                ~~$|$~~ {\tt\ttlbrace} $id$ {\tt,} \dots {\tt,} $id$ {\tt\ttrbrace}
 | 
|
125  | 
\end{tabular}\index{*"!"!}\index{*"["|}\index{*"|"]}
 | 
|
126  | 
\indexbold{type@$type$} \indexbold{sort@$sort$} \indexbold{idt@$idt$}
 | 
|
127  | 
\indexbold{idts@$idts$} \indexbold{logic@$logic$} \indexbold{prop@$prop$}
 | 
|
128  | 
\indexbold{fun@$fun$}
 | 
|
129  | 
\end{center}
 | 
|
| 291 | 130  | 
\caption{Meta-logic syntax}\label{fig:pure_gram}
 | 
| 104 | 131  | 
\end{figure}
 | 
132  | 
||
| 291 | 133  | 
|
134  | 
\section{The Pure syntax} \label{sec:basic_syntax}
 | 
|
135  | 
\index{syntax!Pure|(}
 | 
|
| 104 | 136  | 
|
| 291 | 137  | 
At the root of all object-logics lies the Pure theory,\index{theory!Pure}
 | 
138  | 
bound to the \ML{} identifier \ttindex{Pure.thy}.  It contains, among many
 | 
|
139  | 
other things, the Pure syntax. An informal account of this basic syntax  | 
|
140  | 
(meta-logic, types, \ldots) may be found in {\em Introduction to Isabelle}.
 | 
|
141  | 
A more precise description using a priority grammar is shown in  | 
|
142  | 
Fig.\ts\ref{fig:pure_gram}.  The following nonterminals are defined:
 | 
|
143  | 
\begin{description}
 | 
|
144  | 
\item[$prop$] Terms of type $prop$. These are formulae of the meta-logic.  | 
|
| 104 | 145  | 
|
| 291 | 146  | 
\item[$aprop$] Atomic propositions. These typically include the  | 
147  | 
judgement forms of the object-logic; its definition introduces a  | 
|
148  | 
meta-level predicate for each judgement form.  | 
|
149  | 
||
150  | 
\item[$logic$] Terms whose type belongs to class $logic$. Initially,  | 
|
151  | 
this category contains just $prop$. As the syntax is extended by new  | 
|
152  | 
object-logics, more productions for $logic$ are added automatically  | 
|
153  | 
(see below).  | 
|
| 104 | 154  | 
|
155  | 
\item[$fun$] Terms potentially of function type.  | 
|
156  | 
||
| 291 | 157  | 
\item[$type$] Types of the meta-logic.  | 
| 104 | 158  | 
|
| 291 | 159  | 
\item[$idts$] A list of identifiers, possibly constrained by types.  | 
| 104 | 160  | 
\end{description}
 | 
161  | 
||
| 291 | 162  | 
\begin{warn}
 | 
163  | 
  Note that \verb|x::nat y| is parsed as \verb|x::(nat y)|, treating {\tt
 | 
|
164  | 
    y} like a type constructor applied to {\tt nat}.  The likely result is
 | 
|
165  | 
an error message. To avoid this interpretation, use parentheses and  | 
|
166  | 
write \verb|(x::nat) y|.  | 
|
| 104 | 167  | 
|
| 291 | 168  | 
Similarly, \verb|x::nat y::nat| is parsed as \verb|x::(nat y::nat)| and  | 
169  | 
yields a syntax error. The correct form is \verb|(x::nat) (y::nat)|.  | 
|
170  | 
\end{warn}
 | 
|
171  | 
||
172  | 
\subsection{Logical types and default syntax}\label{logical-types}
 | 
|
173  | 
Isabelle's representation of mathematical languages is based on the typed  | 
|
174  | 
$\lambda$-calculus. All logical types, namely those of class $logic$, are  | 
|
175  | 
automatically equipped with a basic syntax of types, identifiers,  | 
|
176  | 
variables, parentheses, $\lambda$-abstractions and applications.  | 
|
177  | 
||
178  | 
More precisely, for each type constructor $ty$ with arity $(\vec{s})c$,
 | 
|
179  | 
where $c$ is a subclass of $logic$, several productions are added:  | 
|
| 104 | 180  | 
\begin{center}
 | 
181  | 
\begin{tabular}{rclc}
 | 
|
182  | 
$ty$ &=& $id$ ~~$|$~~ $var$ ~~$|$~~ {\tt(} $ty$ {\tt)} \\
 | 
|
| 291 | 183  | 
  &$|$& $fun^{(max_pri)}$ {\tt(} $logic$ {\tt,} \dots {\tt,} $logic$ {\tt)}\\
 | 
184  | 
  &$|$& $ty^{(max_pri)}$ {\tt::} $type$\\\\
 | 
|
| 104 | 185  | 
$logic$ &=& $ty$  | 
186  | 
\end{tabular}
 | 
|
187  | 
\end{center}
 | 
|
188  | 
||
189  | 
||
| 291 | 190  | 
\subsection{Lexical matters}
 | 
191  | 
The parser does not process input strings directly. It operates on token  | 
|
192  | 
lists provided by Isabelle's \bfindex{lexer}.  There are two kinds of
 | 
|
193  | 
tokens: \bfindex{delimiters} and \bfindex{name tokens}.
 | 
|
| 104 | 194  | 
|
| 291 | 195  | 
Delimiters can be regarded as reserved words of the syntax. You can  | 
196  | 
add new ones when extending theories.  In Fig.\ts\ref{fig:pure_gram} they
 | 
|
197  | 
appear in typewriter font, for example {\tt ==}, {\tt =?=} and
 | 
|
198  | 
{\tt PROP}\@.
 | 
|
| 104 | 199  | 
|
| 291 | 200  | 
Name tokens have a predefined syntax. The lexer distinguishes four  | 
201  | 
disjoint classes of names: \rmindex{identifiers}, \rmindex{unknowns}, type
 | 
|
202  | 
identifiers\index{identifiers!type}, type unknowns\index{unknowns!type}.
 | 
|
203  | 
They are denoted by $id$\index{id@$id$}, $var$\index{var@$var$},
 | 
|
204  | 
$tid$\index{tid@$tid$}, $tvar$\index{tvar@$tvar$}, respectively.  Typical
 | 
|
205  | 
examples are {\tt x}, {\tt ?x7}, {\tt 'a}, {\tt ?'a3}.  Here is the precise
 | 
|
206  | 
syntax:  | 
|
| 
188
 
6be0856cdf49
last minute changes, eg literal tokens -> delimiters and valued tokens ->
 
nipkow 
parents: 
142 
diff
changeset
 | 
207  | 
\begin{eqnarray*}
 | 
| 
 
6be0856cdf49
last minute changes, eg literal tokens -> delimiters and valued tokens ->
 
nipkow 
parents: 
142 
diff
changeset
 | 
208  | 
id & = & letter~quasiletter^* \\  | 
| 
 
6be0856cdf49
last minute changes, eg literal tokens -> delimiters and valued tokens ->
 
nipkow 
parents: 
142 
diff
changeset
 | 
209  | 
var       & =   & \mbox{\tt ?}id ~~|~~ \mbox{\tt ?}id\mbox{\tt .}nat \\
 | 
| 291 | 210  | 
tid     & =   & \mbox{\tt '}id \\
 | 
211  | 
tvar      & =   & \mbox{\tt ?}tid ~~|~~
 | 
|
212  | 
                  \mbox{\tt ?}tid\mbox{\tt .}nat \\[1ex]
 | 
|
| 
188
 
6be0856cdf49
last minute changes, eg literal tokens -> delimiters and valued tokens ->
 
nipkow 
parents: 
142 
diff
changeset
 | 
213  | 
letter    & =   & \mbox{one of {\tt a}\dots {\tt z} {\tt A}\dots {\tt Z}} \\
 | 
| 
 
6be0856cdf49
last minute changes, eg literal tokens -> delimiters and valued tokens ->
 
nipkow 
parents: 
142 
diff
changeset
 | 
214  | 
digit     & =   & \mbox{one of {\tt 0}\dots {\tt 9}} \\
 | 
| 
 
6be0856cdf49
last minute changes, eg literal tokens -> delimiters and valued tokens ->
 
nipkow 
parents: 
142 
diff
changeset
 | 
215  | 
quasiletter & =  & letter ~~|~~ digit ~~|~~ \mbox{\tt _} ~~|~~ \mbox{\tt '} \\
 | 
| 
 
6be0856cdf49
last minute changes, eg literal tokens -> delimiters and valued tokens ->
 
nipkow 
parents: 
142 
diff
changeset
 | 
216  | 
nat & = & digit^+  | 
| 
 
6be0856cdf49
last minute changes, eg literal tokens -> delimiters and valued tokens ->
 
nipkow 
parents: 
142 
diff
changeset
 | 
217  | 
\end{eqnarray*}
 | 
| 291 | 218  | 
A $var$ or $tvar$ describes an unknown, which is internally a pair  | 
| 
188
 
6be0856cdf49
last minute changes, eg literal tokens -> delimiters and valued tokens ->
 
nipkow 
parents: 
142 
diff
changeset
 | 
219  | 
of base name and index (\ML\ type \ttindex{indexname}).  These components are
 | 
| 291 | 220  | 
either separated by a dot as in {\tt ?x.1} or {\tt ?x7.3} or
 | 
221  | 
run together as in {\tt ?x1}.  The latter form is possible if the
 | 
|
222  | 
base name does not end with digits. If the index is 0, it may be dropped  | 
|
223  | 
altogether: {\tt ?x} abbreviates both {\tt ?x0} and {\tt ?x.0}.
 | 
|
| 104 | 224  | 
|
| 291 | 225  | 
The lexer repeatedly takes the maximal prefix of the input string that  | 
226  | 
forms a valid token. A maximal prefix that is both a delimiter and a name  | 
|
227  | 
is treated as a delimiter. Spaces, tabs and newlines are separators; they  | 
|
228  | 
never occur within tokens.  | 
|
| 104 | 229  | 
|
| 291 | 230  | 
Delimiters need not be separated by white space.  For example, if {\tt -}
 | 
231  | 
is a delimiter but {\tt --} is not, then the string {\tt --} is treated as
 | 
|
232  | 
two consecutive occurrences of the token~{\tt -}.  In contrast, \ML\ 
 | 
|
233  | 
treats {\tt --} as a single symbolic name.  The consequence of Isabelle's
 | 
|
234  | 
more liberal scheme is that the same string may be parsed in different ways  | 
|
235  | 
after extending the syntax: after adding {\tt --} as a delimiter, the input
 | 
|
236  | 
{\tt --} is treated as a single token.
 | 
|
| 104 | 237  | 
|
| 291 | 238  | 
Name tokens are terminal symbols, strictly speaking, but we can generally  | 
239  | 
regard them as nonterminals. This is because a name token carries with it  | 
|
240  | 
useful information, the name. Delimiters, on the other hand, are nothing  | 
|
241  | 
but than syntactic sugar.  | 
|
| 104 | 242  | 
|
243  | 
||
| 291 | 244  | 
\subsection{*Inspecting the syntax}
 | 
| 104 | 245  | 
\begin{ttbox}
 | 
| 291 | 246  | 
syn_of : theory -> Syntax.syntax  | 
247  | 
Syntax.print_syntax : Syntax.syntax -> unit  | 
|
248  | 
Syntax.print_gram : Syntax.syntax -> unit  | 
|
249  | 
Syntax.print_trans : Syntax.syntax -> unit  | 
|
| 104 | 250  | 
\end{ttbox}
 | 
| 291 | 251  | 
The abstract type \ttindex{Syntax.syntax} allows manipulation of syntaxes
 | 
252  | 
in \ML. You can display values of this type by calling the following  | 
|
253  | 
functions:  | 
|
254  | 
\begin{description}
 | 
|
255  | 
\item[\ttindexbold{syn_of} {\it thy}] returns the syntax of the Isabelle
 | 
|
256  | 
  theory~{\it thy} as an \ML\ value.
 | 
|
257  | 
||
258  | 
\item[\ttindexbold{Syntax.print_syntax} {\it syn}] shows virtually all
 | 
|
259  | 
  information contained in the syntax {\it syn}.  The displayed output can
 | 
|
260  | 
be large. The following two functions are more selective.  | 
|
| 104 | 261  | 
|
| 291 | 262  | 
\item[\ttindexbold{Syntax.print_gram} {\it syn}] shows the grammar part
 | 
263  | 
  of~{\it syn}, namely the lexicon, roots and productions.
 | 
|
| 104 | 264  | 
|
| 291 | 265  | 
\item[\ttindexbold{Syntax.print_trans} {\it syn}] shows the translation
 | 
266  | 
  part of~{\it syn}, namely the constants, parse/print macros and
 | 
|
267  | 
parse/print translations.  | 
|
268  | 
\end{description}
 | 
|
269  | 
||
270  | 
Let us demonstrate these functions by inspecting Pure's syntax. Even that  | 
|
271  | 
is too verbose to display in full.  | 
|
| 104 | 272  | 
\begin{ttbox}
 | 
273  | 
Syntax.print_syntax (syn_of Pure.thy);  | 
|
| 291 | 274  | 
{\out lexicon: "!!" "\%" "(" ")" "," "." "::" ";" "==" "==>" \dots}
 | 
| 104 | 275  | 
{\out roots: logic type fun prop}
 | 
276  | 
{\out prods:}
 | 
|
| 291 | 277  | 
{\out   type = tid  (1000)}
 | 
| 104 | 278  | 
{\out   type = tvar  (1000)}
 | 
279  | 
{\out   type = id  (1000)}
 | 
|
| 291 | 280  | 
{\out   type = tid "::" sort[0]  => "_ofsort" (1000)}
 | 
| 104 | 281  | 
{\out   type = tvar "::" sort[0]  => "_ofsort" (1000)}
 | 
282  | 
{\out   \vdots}
 | 
|
| 291 | 283  | 
\ttbreak  | 
| 104 | 284  | 
{\out consts: "_K" "_appl" "_aprop" "_args" "_asms" "_bigimpl" \dots}
 | 
285  | 
{\out parse_ast_translation: "_appl" "_bigimpl" "_bracket"}
 | 
|
286  | 
{\out   "_idtyp" "_lambda" "_tapp" "_tappl"}
 | 
|
287  | 
{\out parse_rules:}
 | 
|
288  | 
{\out parse_translation: "!!" "_K" "_abs" "_aprop"}
 | 
|
289  | 
{\out print_translation: "all"}
 | 
|
290  | 
{\out print_rules:}
 | 
|
291  | 
{\out print_ast_translation: "==>" "_abs" "_idts" "fun"}
 | 
|
292  | 
\end{ttbox}
 | 
|
293  | 
||
| 291 | 294  | 
As you can see, the output is divided into labeled sections. The grammar  | 
295  | 
is represented by {\tt lexicon}, {\tt roots} and {\tt prods}.  The rest
 | 
|
296  | 
refers to syntactic translations and macro expansion. Here is an  | 
|
297  | 
explanation of the various sections.  | 
|
| 104 | 298  | 
\begin{description}
 | 
| 291 | 299  | 
  \item[\ttindex{lexicon}] lists the delimiters used for lexical
 | 
300  | 
    analysis.\index{delimiters} 
 | 
|
| 104 | 301  | 
|
| 291 | 302  | 
  \item[\ttindex{roots}] lists the grammar's nonterminal symbols.  You must
 | 
303  | 
name the desired root when calling lower level functions or specifying  | 
|
304  | 
macros. Higher level functions usually expect a type and derive the  | 
|
305  | 
    actual root as described in~\S\ref{sec:grammar}.
 | 
|
| 104 | 306  | 
|
| 291 | 307  | 
  \item[\ttindex{prods}] lists the productions of the priority grammar.
 | 
308  | 
    The nonterminal $A^{(n)}$ is rendered in {\sc ascii} as {\tt $A$[$n$]}.
 | 
|
309  | 
    Each delimiter is quoted.  Some productions are shown with {\tt =>} and
 | 
|
310  | 
an attached string. These strings later become the heads of parse  | 
|
311  | 
trees; they also play a vital role when terms are printed (see  | 
|
312  | 
    \S\ref{sec:asts}).
 | 
|
| 104 | 313  | 
|
| 291 | 314  | 
    Productions with no strings attached are called {\bf copy
 | 
315  | 
      productions}\indexbold{productions!copy}.  Their right-hand side must
 | 
|
316  | 
have exactly one nonterminal symbol (or name token). The parser does  | 
|
317  | 
not create a new parse tree node for copy productions, but simply  | 
|
318  | 
returns the parse tree of the right-hand symbol.  | 
|
| 104 | 319  | 
|
| 291 | 320  | 
If the right-hand side consists of a single nonterminal with no  | 
321  | 
    delimiters, then the copy production is called a {\bf chain
 | 
|
322  | 
      production}\indexbold{productions!chain}.  Chain productions should
 | 
|
323  | 
be seen as abbreviations: conceptually, they are removed from the  | 
|
324  | 
grammar by adding new productions. Priority information  | 
|
325  | 
attached to chain productions is ignored, only the dummy value $-1$ is  | 
|
326  | 
displayed.  | 
|
| 104 | 327  | 
|
328  | 
  \item[\ttindex{consts}, \ttindex{parse_rules}, \ttindex{print_rules}]
 | 
|
| 291 | 329  | 
    relate to macros (see \S\ref{sec:macros}).
 | 
| 104 | 330  | 
|
| 291 | 331  | 
  \item[\ttindex{parse_ast_translation}, \ttindex{print_ast_translation}]
 | 
332  | 
list sets of constants that invoke translation functions for abstract  | 
|
333  | 
    syntax trees.  Section \S\ref{sec:asts} below discusses this obscure
 | 
|
334  | 
matter.  | 
|
| 104 | 335  | 
|
| 291 | 336  | 
  \item[\ttindex{parse_translation}, \ttindex{print_translation}] list sets
 | 
337  | 
of constants that invoke translation functions for terms (see  | 
|
338  | 
    \S\ref{sec:tr_funs}).
 | 
|
339  | 
\end{description}
 | 
|
340  | 
\index{syntax!Pure|)}
 | 
|
| 104 | 341  | 
|
342  | 
||
343  | 
\section{Mixfix declarations} \label{sec:mixfix}
 | 
|
| 291 | 344  | 
\index{mixfix declaration|(} 
 | 
| 104 | 345  | 
|
| 291 | 346  | 
When defining a theory, you declare new constants by giving their names,  | 
347  | 
their type, and an optional {\bf mixfix annotation}.  Mixfix annotations
 | 
|
348  | 
allow you to extend Isabelle's basic $\lambda$-calculus syntax with  | 
|
349  | 
readable notation. They can express any context-free priority grammar.  | 
|
350  | 
Isabelle syntax definitions are inspired by \OBJ~\cite{OBJ}; they are more
 | 
|
351  | 
general than the priority declarations of \ML\ and Prolog.  | 
|
352  | 
||
353  | 
A mixfix annotation defines a production of the priority grammar. It  | 
|
354  | 
describes the concrete syntax, the translation to abstract syntax, and the  | 
|
355  | 
pretty printing. Special case annotations provide a simple means of  | 
|
356  | 
specifying infix operators, binders and so forth.  | 
|
| 104 | 357  | 
|
| 291 | 358  | 
\subsection{Grammar productions}\label{sec:grammar}
 | 
359  | 
Let us examine the treatment of the production  | 
|
360  | 
\[ A^{(p)}= w@0\, A@1^{(p@1)}\, w@1\, A@2^{(p@2)}\, \ldots\,  
 | 
|
361  | 
                  A@n^{(p@n)}\, w@n. \]
 | 
|
362  | 
Here $A@i^{(p@i)}$ is a nonterminal with priority~$p@i$ for $i=1$,
 | 
|
363  | 
\ldots,~$n$, while $w@0$, \ldots,~$w@n$ are strings of terminals.  | 
|
364  | 
In the corresponding mixfix annotation, the priorities are given separately  | 
|
365  | 
as $[p@1,\ldots,p@n]$ and~$p$. The nonterminal symbols are identified with  | 
|
366  | 
types~$\tau$, $\tau@1$, \ldots,~$\tau@n$ respectively, and the production's  | 
|
367  | 
effect on nonterminals is expressed as the function type  | 
|
368  | 
\[ [\tau@1, \ldots, \tau@n]\To \tau. \]  | 
|
369  | 
Finally, the template  | 
|
370  | 
\[ w@0 \;_\; w@1 \;_\; \ldots \;_\; w@n \]  | 
|
371  | 
describes the strings of terminals.  | 
|
| 104 | 372  | 
|
| 291 | 373  | 
A simple type is typically declared for each nonterminal symbol. In  | 
374  | 
first-order logic, type~$i$ stands for terms and~$o$ for formulae. Only  | 
|
375  | 
the outermost type constructor is taken into account. For example, any  | 
|
376  | 
type of the form $\sigma list$ stands for a list; productions may refer  | 
|
377  | 
to the symbol {\tt list} and will apply lists of any type.
 | 
|
378  | 
||
379  | 
The symbol associated with a type is called its {\bf root} since it may
 | 
|
380  | 
serve as the root of a parse tree. Precisely, the root of $(\tau@1, \dots,  | 
|
381  | 
\tau@n)ty$ is $ty$, where $\tau@1$, \ldots, $\tau@n$ are types and $ty$ is  | 
|
382  | 
a type constructor. Type infixes are a special case of this; in  | 
|
383  | 
particular, the root of $\tau@1 \To \tau@2$ is {\tt fun}.  Finally, the
 | 
|
384  | 
root of a type variable is {\tt logic}; general productions might
 | 
|
385  | 
refer to this nonterminal.  | 
|
| 104 | 386  | 
|
| 291 | 387  | 
Identifying nonterminals with types allows a constant's type to specify  | 
388  | 
syntax as well. We can declare the function~$f$ to have type $[\tau@1,  | 
|
389  | 
\ldots, \tau@n]\To \tau$ and, through a mixfix annotation, specify the  | 
|
390  | 
layout of the function's $n$ arguments. The constant's name, in this  | 
|
391  | 
case~$f$, will also serve as the label in the abstract syntax tree. There  | 
|
392  | 
are two exceptions to this treatment of constants:  | 
|
393  | 
\begin{enumerate}
 | 
|
394  | 
\item A production need not map directly to a logical function. In this  | 
|
395  | 
case, you must declare a constant whose purpose is purely syntactic.  | 
|
396  | 
    By convention such constants begin with the symbol~{\tt\at}, 
 | 
|
397  | 
ensuring that they can never be written in formulae.  | 
|
398  | 
||
399  | 
\item A copy production has no associated constant.  | 
|
400  | 
\end{enumerate}
 | 
|
401  | 
There is something artificial about this representation of productions,  | 
|
402  | 
but it is convenient, particularly for simple theory extensions.  | 
|
| 104 | 403  | 
|
| 291 | 404  | 
\subsection{The general mixfix form}
 | 
405  | 
Here is a detailed account of the general \bfindex{mixfix declaration} as
 | 
|
406  | 
it may occur within the {\tt consts} section of a {\tt .thy} file.
 | 
|
407  | 
\begin{center}
 | 
|
408  | 
  {\tt "$c$" ::\ "$\sigma$" ("$template$" $ps$ $p$)}
 | 
|
409  | 
\end{center}
 | 
|
410  | 
This constant declaration and mixfix annotation is interpreted as follows:  | 
|
411  | 
\begin{itemize}
 | 
|
412  | 
\item The string {\tt "$c$"} is the name of the constant associated with
 | 
|
413  | 
  the production.  If $c$ is empty (given as~{\tt ""}) then this is a copy
 | 
|
414  | 
  production.\index{productions!copy} Otherwise, parsing an instance of the
 | 
|
415  | 
  phrase $template$ generates the \AST{} {\tt ("$c$" $a@1$ $\ldots$
 | 
|
416  | 
    $a@n$)}, where $a@i$ is the \AST{} generated by parsing the $i$-th
 | 
|
417  | 
argument.  | 
|
418  | 
||
419  | 
\item The constant $c$, if non-empty, is declared to have type $\sigma$.  | 
|
| 104 | 420  | 
|
| 291 | 421  | 
\item The string $template$ specifies the right-hand side of  | 
422  | 
the production. It has the form  | 
|
423  | 
\[ w@0 \;_\; w@1 \;_\; \ldots \;_\; w@n, \]  | 
|
424  | 
    where each occurrence of \ttindex{_} denotes an
 | 
|
425  | 
    argument\index{argument!mixfix} position and the~$w@i$ do not
 | 
|
426  | 
    contain~{\tt _}.  (If you want a literal~{\tt _} in the concrete
 | 
|
427  | 
syntax, you must escape it as described below.) The $w@i$ may  | 
|
428  | 
    consist of \rmindex{delimiters}, spaces or \rmindex{pretty
 | 
|
429  | 
printing} annotations (see below).  | 
|
| 104 | 430  | 
|
| 291 | 431  | 
\item The type $\sigma$ specifies the production's nonterminal symbols (or name  | 
432  | 
tokens). If $template$ is of the form above then $\sigma$ must be a  | 
|
433  | 
function type with at least~$n$ argument positions, say $\sigma =  | 
|
434  | 
[\tau@1, \dots, \tau@n] \To \tau$. Nonterminal symbols are derived  | 
|
435  | 
from the type $\tau@1$, \ldots,~$\tau@n$, $\tau$ as described above.  | 
|
436  | 
    Any of these may be function types; the corresponding root is then {\tt
 | 
|
437  | 
fun}.  | 
|
| 104 | 438  | 
|
| 291 | 439  | 
  \item The optional list~$ps$ may contain at most $n$ integers, say {\tt
 | 
440  | 
[$p@1$, $\ldots$, $p@m$]}, where $p@i$ is the minimal  | 
|
441  | 
    priority\indexbold{priorities} required of any phrase that may appear
 | 
|
442  | 
as the $i$-th argument. Missing priorities default to~$0$.  | 
|
443  | 
||
444  | 
\item The integer $p$ is the priority of this production. If omitted, it  | 
|
445  | 
defaults to the maximal priority.  | 
|
446  | 
||
447  | 
Priorities, or precedences, range between $0$ and  | 
|
448  | 
    $max_pri$\indexbold{max_pri@$max_pri$} (= 1000).
 | 
|
| 104 | 449  | 
\end{itemize}
 | 
450  | 
||
| 291 | 451  | 
The declaration {\tt $c$ ::\ "$\sigma$" ("$template$")} specifies no
 | 
452  | 
priorities. The resulting production puts no priority constraints on any  | 
|
453  | 
of its arguments and has maximal priority itself. Omitting priorities in  | 
|
454  | 
this manner will introduce syntactic ambiguities unless the production's  | 
|
455  | 
right-hand side is fully bracketed, as in \verb|"if _ then _ else _ fi"|.  | 
|
| 104 | 456  | 
|
| 291 | 457  | 
\begin{warn}
 | 
458  | 
Theories must sometimes declare types for purely syntactic purposes. One  | 
|
459  | 
  example is {\tt type}, the built-in type of types.  This is a `type of
 | 
|
460  | 
all types' in the syntactic sense only. Do not declare such types under  | 
|
461  | 
  {\tt arities} as belonging to class $logic$, for that would allow their
 | 
|
462  | 
  use in arbitrary Isabelle expressions~(\S\ref{logical-types}).
 | 
|
463  | 
\end{warn}
 | 
|
464  | 
||
465  | 
\subsection{Example: arithmetic expressions}
 | 
|
466  | 
This theory specification contains a {\tt consts} section with mixfix
 | 
|
467  | 
declarations encoding the priority grammar from  | 
|
468  | 
\S\ref{sec:priority_grammars}:
 | 
|
| 104 | 469  | 
\begin{ttbox}
 | 
470  | 
EXP = Pure +  | 
|
471  | 
types  | 
|
| 291 | 472  | 
exp  | 
| 104 | 473  | 
arities  | 
474  | 
exp :: logic  | 
|
475  | 
consts  | 
|
| 291 | 476  | 
  "0" :: "exp"                ("0"      9)
 | 
477  | 
  "+" :: "[exp, exp] => exp"  ("_ + _"  [0, 1] 0)
 | 
|
478  | 
  "*" :: "[exp, exp] => exp"  ("_ * _"  [3, 2] 2)
 | 
|
479  | 
  "-" :: "exp => exp"         ("- _"    [3] 3)
 | 
|
| 104 | 480  | 
end  | 
481  | 
\end{ttbox}
 | 
|
482  | 
Note that the {\tt arities} declaration causes {\tt exp} to be added to the
 | 
|
| 291 | 483  | 
syntax' roots.  If you put the text above into a file {\tt exp.thy} and load
 | 
| 135 | 484  | 
it via {\tt use_thy "EXP"}, you can run some tests:
 | 
| 104 | 485  | 
\begin{ttbox}
 | 
486  | 
val read_exp = Syntax.test_read (syn_of EXP.thy) "exp";  | 
|
| 291 | 487  | 
{\out val it = fn : string -> unit}
 | 
| 104 | 488  | 
read_exp "0 * 0 * 0 * 0 + 0 + 0 + 0";  | 
489  | 
{\out tokens: "0" "*" "0" "*" "0" "*" "0" "+" "0" "+" "0" "+" "0"}
 | 
|
490  | 
{\out raw: ("+" ("+" ("+" ("*" "0" ("*" "0" ("*" "0" "0"))) "0") "0") "0")}
 | 
|
491  | 
{\out \vdots}
 | 
|
492  | 
read_exp "0 + - 0 + 0";  | 
|
493  | 
{\out tokens: "0" "+" "-" "0" "+" "0"}
 | 
|
494  | 
{\out raw: ("+" ("+" "0" ("-" "0")) "0")}
 | 
|
495  | 
{\out \vdots}
 | 
|
496  | 
\end{ttbox}
 | 
|
497  | 
The output of \ttindex{Syntax.test_read} includes the token list ({\tt
 | 
|
| 291 | 498  | 
  tokens}) and the raw \AST{} directly derived from the parse tree,
 | 
499  | 
ignoring parse \AST{} translations.  The rest is tracing information
 | 
|
500  | 
provided by the macro expander (see \S\ref{sec:macros}).
 | 
|
| 104 | 501  | 
|
| 291 | 502  | 
Executing {\tt Syntax.print_gram} reveals the productions derived
 | 
503  | 
from our mixfix declarations (lots of additional information deleted):  | 
|
| 104 | 504  | 
\begin{ttbox}
 | 
| 291 | 505  | 
Syntax.print_gram (syn_of EXP.thy);  | 
506  | 
{\out exp = "0"  => "0" (9)}
 | 
|
507  | 
{\out exp = exp[0] "+" exp[1]  => "+" (0)}
 | 
|
508  | 
{\out exp = exp[3] "*" exp[2]  => "*" (2)}
 | 
|
509  | 
{\out exp = "-" exp[3]  => "-" (3)}
 | 
|
| 104 | 510  | 
\end{ttbox}
 | 
511  | 
||
| 291 | 512  | 
|
513  | 
\subsection{The mixfix template}
 | 
|
514  | 
Let us take a closer look at the string $template$ appearing in mixfix  | 
|
515  | 
annotations. This string specifies a list of parsing and printing  | 
|
516  | 
directives: delimiters\index{delimiter}, arguments\index{argument!mixfix},
 | 
|
517  | 
spaces, blocks of indentation and line breaks. These are encoded via the  | 
|
| 104 | 518  | 
following character sequences:  | 
519  | 
||
| 291 | 520  | 
\index{pretty printing|(}
 | 
| 104 | 521  | 
\begin{description}
 | 
| 291 | 522  | 
  \item[~\ttindex_~] An argument\index{argument!mixfix} position, which
 | 
523  | 
stands for a nonterminal symbol or name token.  | 
|
| 104 | 524  | 
|
| 291 | 525  | 
  \item[~$d$~] A \rmindex{delimiter}, namely a non-empty sequence of
 | 
526  | 
    non-special or escaped characters.  Escaping a character\index{escape
 | 
|
527  | 
      character} means preceding it with a {\tt '} (single quote).  Thus
 | 
|
528  | 
    you have to write {\tt ''} if you really want a single quote.  You must
 | 
|
529  | 
    also escape {\tt _}, {\tt (}, {\tt )} and {\tt /}.  Delimiters may
 | 
|
530  | 
never contain white space, though.  | 
|
| 104 | 531  | 
|
| 291 | 532  | 
\item[~$s$~] A non-empty sequence of spaces for printing. This  | 
533  | 
and the following specifications do not affect parsing at all.  | 
|
| 104 | 534  | 
|
| 291 | 535  | 
  \item[~{\ttindex($n$}~] Open a pretty printing block.  The optional
 | 
536  | 
number $n$ specifies how much indentation to add when a line break  | 
|
537  | 
    occurs within the block.  If {\tt(} is not followed by digits, the
 | 
|
538  | 
indentation defaults to~$0$.  | 
|
| 104 | 539  | 
|
| 291 | 540  | 
\item[~\ttindex)~] Close a pretty printing block.  | 
| 104 | 541  | 
|
| 291 | 542  | 
  \item[~\ttindex{//}~] Force a line break.
 | 
| 104 | 543  | 
|
| 291 | 544  | 
\item[~\ttindex/$s$~] Allow a line break. Here $s$ stands for the string  | 
545  | 
    of spaces (zero or more) right after the {\tt /} character.  These
 | 
|
546  | 
spaces are printed if the break is not taken.  | 
|
547  | 
\end{description}
 | 
|
548  | 
Isabelle's pretty printer resembles the one described in  | 
|
549  | 
Paulson~\cite{paulson91}.  \index{pretty printing|)}
 | 
|
| 104 | 550  | 
|
551  | 
||
552  | 
\subsection{Infixes}
 | 
|
| 291 | 553  | 
\indexbold{infix operators}
 | 
554  | 
Infix operators associating to the left or right can be declared  | 
|
555  | 
using {\tt infixl} or {\tt infixr}.
 | 
|
556  | 
Roughly speaking, the form {\tt $c$ ::\ "$\sigma$" (infixl $p$)}
 | 
|
557  | 
abbreviates the declarations  | 
|
| 104 | 558  | 
\begin{ttbox}
 | 
| 291 | 559  | 
"op \(c\)" :: "\(\sigma\)"   ("op \(c\)")
 | 
560  | 
"op \(c\)" :: "\(\sigma\)"   ("(_ \(c\)/ _)" [\(p\), \(p+1\)] \(p\))
 | 
|
| 104 | 561  | 
\end{ttbox}
 | 
| 291 | 562  | 
and {\tt $c$ ::\ "$\sigma$" (infixr $p$)} abbreviates the declarations
 | 
| 104 | 563  | 
\begin{ttbox}
 | 
| 291 | 564  | 
"op \(c\)" :: "\(\sigma\)"   ("op \(c\)")
 | 
565  | 
"op \(c\)" :: "\(\sigma\)"   ("(_ \(c\)/ _)" [\(p+1\), \(p\)] \(p\))
 | 
|
| 104 | 566  | 
\end{ttbox}
 | 
| 291 | 567  | 
The infix operator is declared as a constant with the prefix {\tt op}.
 | 
| 104 | 568  | 
Thus, prefixing infixes with \ttindex{op} makes them behave like ordinary
 | 
| 291 | 569  | 
function symbols, as in \ML. Special characters occurring in~$c$ must be  | 
570  | 
escaped, as in delimiters, using a single quote.  | 
|
571  | 
||
572  | 
The expanded forms above would be actually illegal in a {\tt .thy} file
 | 
|
573  | 
because they declare the constant \hbox{\tt"op \(c\)"} twice.
 | 
|
| 104 | 574  | 
|
575  | 
||
576  | 
\subsection{Binders}
 | 
|
| 291 | 577  | 
\indexbold{binders}
 | 
578  | 
\begingroup  | 
|
579  | 
\def\Q{{\cal Q}}
 | 
|
580  | 
A {\bf binder} is a variable-binding construct such as a quantifier.  The
 | 
|
581  | 
binder declaration \indexbold{*binder}
 | 
|
| 104 | 582  | 
\begin{ttbox}
 | 
| 291 | 583  | 
\(c\) :: "\(\sigma\)" (binder "\(\Q\)" \(p\))  | 
| 104 | 584  | 
\end{ttbox}
 | 
| 291 | 585  | 
introduces a constant~$c$ of type~$\sigma$, which must have the form  | 
586  | 
$(\tau@1 \To \tau@2) \To \tau@3$. Its concrete syntax is $\Q~x.P$, where  | 
|
587  | 
$x$ is a bound variable of type~$\tau@1$, the body~$P$ has type $\tau@2$  | 
|
588  | 
and the whole term has type~$\tau@3$. Special characters in $\Q$ must be  | 
|
589  | 
escaped using a single quote.  | 
|
590  | 
||
591  | 
Let us declare the quantifier~$\forall$:  | 
|
| 104 | 592  | 
\begin{ttbox}
 | 
593  | 
All :: "('a => o) => o"   (binder "ALL " 10)
 | 
|
594  | 
\end{ttbox}
 | 
|
| 291 | 595  | 
This let us write $\forall x.P$ as either {\tt All(\%$x$.$P$)} or {\tt ALL
 | 
596  | 
$x$.$P$}. When printing, Isabelle prefers the latter form, but must fall  | 
|
597  | 
back on $\mtt{All}(P)$ if $P$ is not an abstraction.  Both $P$ and {\tt ALL
 | 
|
598  | 
$x$.$P$} have type~$o$, the type of formulae, while the bound variable  | 
|
599  | 
can be polymorphic.  | 
|
| 104 | 600  | 
|
| 291 | 601  | 
The binder~$c$ of type $(\sigma \To \tau) \To \tau$ can be nested. The  | 
602  | 
external form $\Q~x@1~x@2 \ldots x@n. P$ corresponds to the internal form  | 
|
603  | 
\[ c(\lambda x@1. c(\lambda x@2. \ldots c(\lambda x@n. P) \ldots)) \]  | 
|
| 104 | 604  | 
|
605  | 
\medskip  | 
|
606  | 
The general binder declaration  | 
|
607  | 
\begin{ttbox}
 | 
|
| 291 | 608  | 
\(c\) :: "(\(\tau@1\) => \(\tau@2\)) => \(\tau@3\)" (binder "\(\Q\)" \(p\))  | 
| 104 | 609  | 
\end{ttbox}
 | 
610  | 
is internally expanded to  | 
|
611  | 
\begin{ttbox}
 | 
|
| 291 | 612  | 
\(c\) :: "(\(\tau@1\) => \(\tau@2\)) => \(\tau@3\)"  | 
613  | 
"\(\Q\)"\hskip-3pt  :: "[idts, \(\tau@2\)] => \(\tau@3\)"   ("(3\(\Q\)_./ _)" \(p\))
 | 
|
| 104 | 614  | 
\end{ttbox}
 | 
| 291 | 615  | 
with $idts$ being the nonterminal symbol for a list of $id$s optionally  | 
616  | 
constrained (see Fig.\ts\ref{fig:pure_gram}).  The declaration also
 | 
|
617  | 
installs a parse translation\index{translations!parse} for~$\Q$ and a print
 | 
|
618  | 
translation\index{translations!print} for~$c$ to translate between the
 | 
|
619  | 
internal and external forms.  | 
|
620  | 
\endgroup  | 
|
| 104 | 621  | 
|
| 291 | 622  | 
\index{mixfix declaration|)}
 | 
| 104 | 623  | 
|
624  | 
||
625  | 
\section{Example: some minimal logics} \label{sec:min_logics}
 | 
|
| 291 | 626  | 
This section presents some examples that have a simple syntax. They  | 
627  | 
demonstrate how to define new object-logics from scratch.  | 
|
| 104 | 628  | 
|
| 291 | 629  | 
First we must define how an object-logic syntax embedded into the  | 
630  | 
meta-logic. Since all theorems must conform to the syntax for~$prop$ (see  | 
|
631  | 
Fig.\ts\ref{fig:pure_gram}), that syntax has to be extended with the
 | 
|
632  | 
object-level syntax. Assume that the syntax of your object-logic defines a  | 
|
633  | 
nonterminal symbol~$o$ of formulae. These formulae can now appear in  | 
|
634  | 
axioms and theorems wherever $prop$ does if you add the production  | 
|
| 104 | 635  | 
\[ prop ~=~ o. \]  | 
| 291 | 636  | 
This is not a copy production but a coercion from formulae to propositions:  | 
| 104 | 637  | 
\begin{ttbox}
 | 
638  | 
Base = Pure +  | 
|
639  | 
types  | 
|
| 291 | 640  | 
o  | 
| 104 | 641  | 
arities  | 
642  | 
o :: logic  | 
|
643  | 
consts  | 
|
644  | 
  Trueprop :: "o => prop"   ("_" 5)
 | 
|
645  | 
end  | 
|
646  | 
\end{ttbox}
 | 
|
647  | 
The constant {\tt Trueprop} (the name is arbitrary) acts as an invisible
 | 
|
| 291 | 648  | 
coercion function.  Assuming this definition resides in a file {\tt base.thy},
 | 
| 135 | 649  | 
you have to load it with the command {\tt use_thy "Base"}.
 | 
| 104 | 650  | 
|
| 291 | 651  | 
One of the simplest nontrivial logics is {\bf minimal logic} of
 | 
652  | 
implication. Its definition in Isabelle needs no advanced features but  | 
|
653  | 
illustrates the overall mechanism nicely:  | 
|
| 104 | 654  | 
\begin{ttbox}
 | 
655  | 
Hilbert = Base +  | 
|
656  | 
consts  | 
|
657  | 
"-->" :: "[o, o] => o" (infixr 10)  | 
|
658  | 
rules  | 
|
659  | 
K "P --> Q --> P"  | 
|
660  | 
S "(P --> Q --> R) --> (P --> Q) --> P --> R"  | 
|
661  | 
MP "[| P --> Q; P |] ==> Q"  | 
|
662  | 
end  | 
|
663  | 
\end{ttbox}
 | 
|
| 291 | 664  | 
After loading this definition from the file {\tt hilbert.thy}, you can
 | 
665  | 
start to prove theorems in the logic:  | 
|
| 104 | 666  | 
\begin{ttbox}
 | 
667  | 
goal Hilbert.thy "P --> P";  | 
|
668  | 
{\out Level 0}
 | 
|
669  | 
{\out P --> P}
 | 
|
670  | 
{\out  1.  P --> P}
 | 
|
| 291 | 671  | 
\ttbreak  | 
| 104 | 672  | 
by (resolve_tac [Hilbert.MP] 1);  | 
673  | 
{\out Level 1}
 | 
|
674  | 
{\out P --> P}
 | 
|
675  | 
{\out  1.  ?P --> P --> P}
 | 
|
676  | 
{\out  2.  ?P}
 | 
|
| 291 | 677  | 
\ttbreak  | 
| 104 | 678  | 
by (resolve_tac [Hilbert.MP] 1);  | 
679  | 
{\out Level 2}
 | 
|
680  | 
{\out P --> P}
 | 
|
681  | 
{\out  1.  ?P1 --> ?P --> P --> P}
 | 
|
682  | 
{\out  2.  ?P1}
 | 
|
683  | 
{\out  3.  ?P}
 | 
|
| 291 | 684  | 
\ttbreak  | 
| 104 | 685  | 
by (resolve_tac [Hilbert.S] 1);  | 
686  | 
{\out Level 3}
 | 
|
687  | 
{\out P --> P}
 | 
|
688  | 
{\out  1.  P --> ?Q2 --> P}
 | 
|
689  | 
{\out  2.  P --> ?Q2}
 | 
|
| 291 | 690  | 
\ttbreak  | 
| 104 | 691  | 
by (resolve_tac [Hilbert.K] 1);  | 
692  | 
{\out Level 4}
 | 
|
693  | 
{\out P --> P}
 | 
|
694  | 
{\out  1.  P --> ?Q2}
 | 
|
| 291 | 695  | 
\ttbreak  | 
| 104 | 696  | 
by (resolve_tac [Hilbert.K] 1);  | 
697  | 
{\out Level 5}
 | 
|
698  | 
{\out P --> P}
 | 
|
699  | 
{\out No subgoals!}
 | 
|
700  | 
\end{ttbox}
 | 
|
| 291 | 701  | 
As we can see, this Hilbert-style formulation of minimal logic is easy to  | 
702  | 
define but difficult to use. The following natural deduction formulation is  | 
|
703  | 
better:  | 
|
| 104 | 704  | 
\begin{ttbox}
 | 
705  | 
MinI = Base +  | 
|
706  | 
consts  | 
|
707  | 
"-->" :: "[o, o] => o" (infixr 10)  | 
|
708  | 
rules  | 
|
709  | 
impI "(P ==> Q) ==> P --> Q"  | 
|
710  | 
impE "[| P --> Q; P |] ==> Q"  | 
|
711  | 
end  | 
|
712  | 
\end{ttbox}
 | 
|
| 291 | 713  | 
Note, however, that although the two systems are equivalent, this fact  | 
714  | 
cannot be proved within Isabelle.  Axioms {\tt S} and {\tt K} can be
 | 
|
715  | 
derived in {\tt MinI} (exercise!), but {\tt impI} cannot be derived in {\tt
 | 
|
716  | 
  Hilbert}.  The reason is that {\tt impI} is only an {\bf admissible} rule
 | 
|
717  | 
in {\tt Hilbert}, something that can only be shown by induction over all
 | 
|
718  | 
possible proofs in {\tt Hilbert}.
 | 
|
| 104 | 719  | 
|
| 291 | 720  | 
We may easily extend minimal logic with falsity:  | 
| 104 | 721  | 
\begin{ttbox}
 | 
722  | 
MinIF = MinI +  | 
|
723  | 
consts  | 
|
724  | 
False :: "o"  | 
|
725  | 
rules  | 
|
726  | 
FalseE "False ==> P"  | 
|
727  | 
end  | 
|
728  | 
\end{ttbox}
 | 
|
729  | 
On the other hand, we may wish to introduce conjunction only:  | 
|
730  | 
\begin{ttbox}
 | 
|
731  | 
MinC = Base +  | 
|
732  | 
consts  | 
|
733  | 
"&" :: "[o, o] => o" (infixr 30)  | 
|
| 291 | 734  | 
\ttbreak  | 
| 104 | 735  | 
rules  | 
736  | 
conjI "[| P; Q |] ==> P & Q"  | 
|
737  | 
conjE1 "P & Q ==> P"  | 
|
738  | 
conjE2 "P & Q ==> Q"  | 
|
739  | 
end  | 
|
740  | 
\end{ttbox}
 | 
|
| 291 | 741  | 
And if we want to have all three connectives together, we create and load a  | 
742  | 
theory file consisting of a single line:\footnote{We can combine the
 | 
|
743  | 
theories without creating a theory file using the ML declaration  | 
|
744  | 
\begin{ttbox}
 | 
|
745  | 
val MinIFC_thy = merge_theories(MinIF,MinC)  | 
|
746  | 
\end{ttbox}
 | 
|
747  | 
\index{*merge_theories|fnote}}
 | 
|
| 104 | 748  | 
\begin{ttbox}
 | 
749  | 
MinIFC = MinIF + MinC  | 
|
750  | 
\end{ttbox}
 | 
|
751  | 
Now we can prove mixed theorems like  | 
|
752  | 
\begin{ttbox}
 | 
|
753  | 
goal MinIFC.thy "P & False --> Q";  | 
|
754  | 
by (resolve_tac [MinI.impI] 1);  | 
|
755  | 
by (dresolve_tac [MinC.conjE2] 1);  | 
|
756  | 
by (eresolve_tac [MinIF.FalseE] 1);  | 
|
757  | 
\end{ttbox}
 | 
|
758  | 
Try this as an exercise!  | 
|
759  | 
||
| 291 | 760  | 
\medskip  | 
761  | 
Unless you need to define macros or syntax translation functions, you may  | 
|
762  | 
skip the rest of this chapter.  | 
|
763  | 
||
764  | 
||
765  | 
\section{*Abstract syntax trees} \label{sec:asts}
 | 
|
766  | 
\index{trees!abstract syntax|(} The parser, given a token list from the
 | 
|
767  | 
lexer, applies productions to yield a parse tree\index{trees!parse}.  By
 | 
|
768  | 
applying some internal transformations the parse tree becomes an abstract  | 
|
769  | 
syntax tree, or \AST{}.  Macro expansion, further translations and finally
 | 
|
770  | 
type inference yields a well-typed term\index{terms!obtained from ASTs}.
 | 
|
771  | 
The printing process is the reverse, except for some subtleties to be  | 
|
772  | 
discussed later.  | 
|
773  | 
||
774  | 
Figure~\ref{fig:parse_print} outlines the parsing and printing process.
 | 
|
775  | 
Much of the complexity is due to the macro mechanism. Using macros, you  | 
|
776  | 
can specify most forms of concrete syntax without writing any \ML{} code.
 | 
|
777  | 
||
778  | 
\begin{figure}
 | 
|
779  | 
\begin{center}
 | 
|
780  | 
\begin{tabular}{cl}
 | 
|
781  | 
string & \\  | 
|
782  | 
$\downarrow$ & parser \\  | 
|
783  | 
parse tree & \\  | 
|
784  | 
$\downarrow$    & parse \AST{} translation \\
 | 
|
785  | 
\AST{}             & \\
 | 
|
786  | 
$\downarrow$    & \AST{} rewriting (macros) \\
 | 
|
787  | 
\AST{}             & \\
 | 
|
788  | 
$\downarrow$ & parse translation, type inference \\  | 
|
789  | 
--- well-typed term --- & \\  | 
|
790  | 
$\downarrow$ & print translation \\  | 
|
791  | 
\AST{}             & \\
 | 
|
792  | 
$\downarrow$    & \AST{} rewriting (macros) \\
 | 
|
793  | 
\AST{}             & \\
 | 
|
794  | 
$\downarrow$    & print \AST{} translation, printer \\
 | 
|
795  | 
string &  | 
|
796  | 
\end{tabular}
 | 
|
797  | 
\index{translations!parse}\index{translations!parse AST}
 | 
|
798  | 
\index{translations!print}\index{translations!print AST}
 | 
|
799  | 
||
800  | 
\end{center}
 | 
|
801  | 
\caption{Parsing and printing}\label{fig:parse_print}
 | 
|
802  | 
\end{figure}
 | 
|
803  | 
||
804  | 
Abstract syntax trees are an intermediate form between the raw parse trees  | 
|
805  | 
and the typed $\lambda$-terms.  An \AST{} is either an atom (constant or
 | 
|
806  | 
variable) or a list of {\em at least two\/} subtrees.  Internally, they
 | 
|
807  | 
have type \ttindex{Syntax.ast}: \index{*Constant} \index{*Variable}
 | 
|
808  | 
\index{*Appl}
 | 
|
809  | 
\begin{ttbox}
 | 
|
810  | 
datatype ast = Constant of string  | 
|
811  | 
| Variable of string  | 
|
812  | 
| Appl of ast list  | 
|
813  | 
\end{ttbox}
 | 
|
814  | 
||
815  | 
Isabelle uses an S-expression syntax for abstract syntax trees. Constant  | 
|
816  | 
atoms are shown as quoted strings, variable atoms as non-quoted strings and  | 
|
817  | 
applications as a parenthesized list of subtrees. For example, the \AST  | 
|
818  | 
\begin{ttbox}
 | 
|
819  | 
Appl [Constant "_constrain",  | 
|
820  | 
Appl [Constant "_abs", Variable "x", Variable "t"],  | 
|
821  | 
Appl [Constant "fun", Variable "'a", Variable "'b"]]  | 
|
822  | 
\end{ttbox}
 | 
|
823  | 
is shown as {\tt ("_constrain" ("_abs" x t) ("fun" 'a 'b))}.
 | 
|
824  | 
Both {\tt ()} and {\tt (f)} are illegal because they have too few
 | 
|
825  | 
subtrees.  | 
|
826  | 
||
827  | 
The resemblance of Lisp's S-expressions is intentional, but there are two  | 
|
828  | 
kinds of atomic symbols: $\Constant x$ and $\Variable x$. Do not take the  | 
|
829  | 
names ``{\tt Constant}'' and ``{\tt Variable}'' too literally; in the later
 | 
|
830  | 
translation to terms, $\Variable x$ may become a constant, free or bound  | 
|
831  | 
variable, even a type constructor or class name; the actual outcome depends  | 
|
832  | 
on the context.  | 
|
833  | 
||
834  | 
Similarly, you can think of ${\tt (} f~x@1~\ldots~x@n{\tt )}$ as the
 | 
|
835  | 
application of~$f$ to the arguments $x@1, \ldots, x@n$. But the kind of  | 
|
836  | 
application is determined later by context; it could be a type constructor  | 
|
837  | 
applied to types.  | 
|
838  | 
||
839  | 
Forms like {\tt (("_abs" x $t$) $u$)} are legal, but \AST{}s are
 | 
|
840  | 
first-order: the {\tt "_abs"} does not bind the {\tt x} in any way.  Later
 | 
|
841  | 
at the term level, {\tt ("_abs" x $t$)} will become an {\tt Abs} node and
 | 
|
842  | 
occurrences of {\tt x} in $t$ will be replaced by bound variables (the term
 | 
|
843  | 
constructor \ttindex{Bound}).
 | 
|
844  | 
||
845  | 
||
846  | 
\subsection{Transforming parse trees to \AST{}s}
 | 
|
847  | 
The parse tree is the raw output of the parser. Translation functions,  | 
|
848  | 
called {\bf parse AST translations}\indexbold{translations!parse AST},
 | 
|
849  | 
transform the parse tree into an abstract syntax tree.  | 
|
850  | 
||
851  | 
The parse tree is constructed by nesting the right-hand sides of the  | 
|
852  | 
productions used to recognize the input. Such parse trees are simply lists  | 
|
853  | 
of tokens and constituent parse trees, the latter representing the  | 
|
854  | 
nonterminals of the productions. Let us refer to the actual productions in  | 
|
855  | 
the form displayed by {\tt Syntax.print_syntax}.
 | 
|
856  | 
||
857  | 
Ignoring parse \AST{} translations, parse trees are transformed to \AST{}s
 | 
|
858  | 
by stripping out delimiters and copy productions. More precisely, the  | 
|
859  | 
mapping $ast_of_pt$\index{ast_of_pt@$ast_of_pt$} is derived from the
 | 
|
860  | 
productions as follows:  | 
|
861  | 
\begin{itemize}
 | 
|
862  | 
\item Name tokens: $ast_of_pt(t) = \Variable s$, where $t$ is an $id$,  | 
|
863  | 
$var$, $tid$ or $tvar$ token, and $s$ its associated string.  | 
|
864  | 
||
865  | 
\item Copy productions: $ast_of_pt(\ldots P \ldots) = ast_of_pt(P)$.  | 
|
866  | 
Here $\ldots$ stands for strings of delimiters, which are  | 
|
867  | 
discarded. $P$ stands for the single constituent that is not a  | 
|
868  | 
delimiter; it is either a nonterminal symbol or a name token.  | 
|
869  | 
||
870  | 
  \item $0$-ary productions: $ast_of_pt(\ldots \mtt{=>} c) = \Constant c$.
 | 
|
871  | 
Here there are no constituents other than delimiters, which are  | 
|
872  | 
discarded.  | 
|
873  | 
||
874  | 
\item $n$-ary productions, where $n \ge 1$: delimiters are discarded and  | 
|
875  | 
the remaining constituents $P@1$, \ldots, $P@n$ are built into an  | 
|
876  | 
application whose head constant is~$c$:  | 
|
877  | 
    \begin{eqnarray*}
 | 
|
878  | 
      \lefteqn{ast_of_pt(\ldots P@1 \ldots P@n \ldots \mtt{=>} c)} \\
 | 
|
879  | 
      &&\qquad{}= \Appl{\Constant c, ast_of_pt(P@1), \ldots, ast_of_pt(P@n)}
 | 
|
880  | 
    \end{eqnarray*}
 | 
|
881  | 
\end{itemize}
 | 
|
882  | 
Figure~\ref{fig:parse_ast} presents some simple examples, where {\tt ==},
 | 
|
883  | 
{\tt _appl}, {\tt _args}, and so forth name productions of the Pure syntax.
 | 
|
884  | 
These examples illustrate the need for further translations to make \AST{}s
 | 
|
885  | 
closer to the typed $\lambda$-calculus. The Pure syntax provides  | 
|
886  | 
predefined parse \AST{} translations\index{translations!parse AST} for
 | 
|
887  | 
ordinary applications, type applications, nested abstractions, meta  | 
|
888  | 
implications and function types.  Figure~\ref{fig:parse_ast_tr} shows their
 | 
|
889  | 
effect on some representative input strings.  | 
|
890  | 
||
891  | 
||
892  | 
\begin{figure}
 | 
|
893  | 
\begin{center}
 | 
|
894  | 
\tt\begin{tabular}{ll}
 | 
|
895  | 
\rm input string & \rm \AST \\\hline  | 
|
896  | 
"f" & f \\  | 
|
897  | 
"'a" & 'a \\  | 
|
898  | 
"t == u"            & ("==" t u) \\
 | 
|
899  | 
"f(x)"              & ("_appl" f x) \\
 | 
|
900  | 
"f(x, y)"           & ("_appl" f ("_args" x y)) \\
 | 
|
901  | 
"f(x, y, z)"        & ("_appl" f ("_args" x ("_args" y z))) \\
 | 
|
902  | 
"\%x y.\ t"         & ("_lambda" ("_idts" x y) t) \\
 | 
|
903  | 
\end{tabular}
 | 
|
904  | 
\end{center}
 | 
|
905  | 
\caption{Parsing examples using the Pure syntax}\label{fig:parse_ast} 
 | 
|
906  | 
\end{figure}
 | 
|
907  | 
||
908  | 
\begin{figure}
 | 
|
909  | 
\begin{center}
 | 
|
910  | 
\tt\begin{tabular}{ll}
 | 
|
911  | 
\rm input string            & \rm \AST{} \\\hline
 | 
|
912  | 
"f(x, y, z)" & (f x y z) \\  | 
|
913  | 
"'a ty" & (ty 'a) \\  | 
|
914  | 
"('a, 'b) ty"               & (ty 'a 'b) \\
 | 
|
915  | 
"\%x y z.\ t"               & ("_abs" x ("_abs" y ("_abs" z t))) \\
 | 
|
916  | 
"\%x ::\ 'a.\ t"            & ("_abs" ("_constrain" x 'a) t) \\
 | 
|
917  | 
"[| P; Q; R |] => S"        & ("==>" P ("==>" Q ("==>" R S))) \\
 | 
|
918  | 
"['a, 'b, 'c] => 'd"        & ("fun" 'a ("fun" 'b ("fun" 'c 'd)))
 | 
|
919  | 
\end{tabular}
 | 
|
920  | 
\end{center}
 | 
|
921  | 
\caption{Built-in parse \AST{} translations}\label{fig:parse_ast_tr}
 | 
|
922  | 
\end{figure}
 | 
|
923  | 
||
924  | 
The names of constant heads in the \AST{} control the translation process.
 | 
|
925  | 
The list of constants invoking parse \AST{} translations appears in the
 | 
|
926  | 
output of {\tt Syntax.print_syntax} under {\tt parse_ast_translation}.
 | 
|
927  | 
||
928  | 
||
929  | 
\subsection{Transforming \AST{}s to terms}
 | 
|
930  | 
The \AST{}, after application of macros (see \S\ref{sec:macros}), is
 | 
|
931  | 
transformed into a term. This term is probably ill-typed since type  | 
|
932  | 
inference has not occurred yet. The term may contain type constraints  | 
|
933  | 
consisting of applications with head {\tt "_constrain"}; the second
 | 
|
934  | 
argument is a type encoded as a term. Type inference later introduces  | 
|
935  | 
correct types or rejects the input.  | 
|
936  | 
||
937  | 
Another set of translation functions, namely parse  | 
|
938  | 
translations,\index{translations!parse}, may affect this process.  If we
 | 
|
939  | 
ignore parse translations for the time being, then \AST{}s are transformed
 | 
|
940  | 
to terms by mapping \AST{} constants to constants, \AST{} variables to
 | 
|
941  | 
schematic or free variables and \AST{} applications to applications.
 | 
|
942  | 
||
943  | 
More precisely, the mapping $term_of_ast$\index{term_of_ast@$term_of_ast$}
 | 
|
944  | 
is defined by  | 
|
945  | 
\begin{itemize}
 | 
|
946  | 
\item Constants: $term_of_ast(\Constant x) = \ttfct{Const} (x,
 | 
|
947  | 
  \mtt{dummyT})$.
 | 
|
948  | 
||
949  | 
\item Schematic variables: $term_of_ast(\Variable \mtt{"?}xi\mtt") =
 | 
|
950  | 
  \ttfct{Var} ((x, i), \mtt{dummyT})$, where $x$ is the base name and $i$
 | 
|
951  | 
the index extracted from $xi$.  | 
|
952  | 
||
953  | 
\item Free variables: $term_of_ast(\Variable x) = \ttfct{Free} (x,
 | 
|
954  | 
  \mtt{dummyT})$.
 | 
|
955  | 
||
956  | 
\item Function applications with $n$ arguments:  | 
|
957  | 
    \begin{eqnarray*}
 | 
|
958  | 
      \lefteqn{term_of_ast(\Appl{f, x@1, \ldots, x@n})} \\
 | 
|
959  | 
      &&\qquad{}= term_of_ast(f) \ttapp
 | 
|
960  | 
term_of_ast(x@1) \ttapp \ldots \ttapp term_of_ast(x@n)  | 
|
961  | 
    \end{eqnarray*}
 | 
|
962  | 
\end{itemize}
 | 
|
963  | 
Here \ttindex{Const}, \ttindex{Var}, \ttindex{Free} and
 | 
|
964  | 
\verb|$|\index{$@{\tt\$}} are constructors of the datatype {\tt term},
 | 
|
965  | 
while \ttindex{dummyT} stands for some dummy type that is ignored during
 | 
|
966  | 
type inference.  | 
|
967  | 
||
968  | 
So far the outcome is still a first-order term. Abstractions and bound  | 
|
969  | 
variables (constructors \ttindex{Abs} and \ttindex{Bound}) are introduced
 | 
|
970  | 
by parse translations.  Such translations are attached to {\tt "_abs"},
 | 
|
971  | 
{\tt "!!"} and user-defined binders.
 | 
|
972  | 
||
973  | 
||
974  | 
\subsection{Printing of terms}
 | 
|
975  | 
The output phase is essentially the inverse of the input phase. Terms are  | 
|
976  | 
translated via abstract syntax trees into strings. Finally the strings are  | 
|
977  | 
pretty printed.  | 
|
978  | 
||
979  | 
Print translations (\S\ref{sec:tr_funs}) may affect the transformation of
 | 
|
980  | 
terms into \AST{}s.  Ignoring those, the transformation maps
 | 
|
981  | 
term constants, variables and applications to the corresponding constructs  | 
|
982  | 
on \AST{}s.  Abstractions are mapped to applications of the special
 | 
|
983  | 
constant {\tt _abs}.
 | 
|
984  | 
||
985  | 
More precisely, the mapping $ast_of_term$\index{ast_of_term@$ast_of_term$}
 | 
|
986  | 
is defined as follows:  | 
|
987  | 
\begin{itemize}
 | 
|
988  | 
  \item $ast_of_term(\ttfct{Const} (x, \tau)) = \Constant x$.
 | 
|
989  | 
||
990  | 
  \item $ast_of_term(\ttfct{Free} (x, \tau)) = constrain (\Variable x,
 | 
|
991  | 
\tau)$.  | 
|
992  | 
||
993  | 
  \item $ast_of_term(\ttfct{Var} ((x, i), \tau)) = constrain (\Variable
 | 
|
994  | 
    \mtt{"?}xi\mtt", \tau)$, where $\mtt?xi$ is the string representation of
 | 
|
995  | 
    the {\tt indexname} $(x, i)$.
 | 
|
996  | 
||
997  | 
\item For the abstraction $\lambda x::\tau.t$, let $x'$ be a variant  | 
|
998  | 
of~$x$ renamed to differ from all names occurring in~$t$, and let $t'$  | 
|
999  | 
be obtained from~$t$ by replacing all bound occurrences of~$x$ by  | 
|
1000  | 
the free variable $x'$. This replaces corresponding occurrences of the  | 
|
1001  | 
    constructor \ttindex{Bound} by the term $\ttfct{Free} (x',
 | 
|
1002  | 
    \mtt{dummyT})$:
 | 
|
1003  | 
   \begin{eqnarray*}
 | 
|
1004  | 
      \lefteqn{ast_of_term(\ttfct{Abs} (x, \tau, t))} \\
 | 
|
1005  | 
      &&\qquad{}=   \ttfct{Appl}
 | 
|
1006  | 
                  \mathopen{\mtt[} 
 | 
|
1007  | 
                  \Constant \mtt{"_abs"}, constrain(\Variable x', \tau), \\
 | 
|
1008  | 
      &&\qquad\qquad\qquad ast_of_term(t') \mathclose{\mtt]}.
 | 
|
1009  | 
    \end{eqnarray*}
 | 
|
1010  | 
||
1011  | 
  \item $ast_of_term(\ttfct{Bound} i) = \Variable \mtt{"B.}i\mtt"$.  
 | 
|
1012  | 
    The occurrence of constructor \ttindex{Bound} should never happen
 | 
|
1013  | 
when printing well-typed terms; it indicates a de Bruijn index with no  | 
|
1014  | 
matching abstraction.  | 
|
1015  | 
||
1016  | 
\item Where $f$ is not an application,  | 
|
1017  | 
    \begin{eqnarray*}
 | 
|
1018  | 
      \lefteqn{ast_of_term(f \ttapp x@1 \ttapp \ldots \ttapp x@n)} \\
 | 
|
1019  | 
      &&\qquad{}= \ttfct{Appl} 
 | 
|
1020  | 
                  \mathopen{\mtt[} ast_of_term(f), 
 | 
|
1021  | 
ast_of_term(x@1), \ldots,ast_of_term(x@n)  | 
|
1022  | 
                  \mathclose{\mtt]}
 | 
|
1023  | 
    \end{eqnarray*}
 | 
|
1024  | 
\end{itemize}
 | 
|
1025  | 
||
1026  | 
Type constraints are inserted to allow the printing of types, which is  | 
|
1027  | 
governed by the boolean variable \ttindex{show_types}.  Constraints are
 | 
|
1028  | 
treated as follows:  | 
|
1029  | 
\begin{itemize}
 | 
|
1030  | 
  \item $constrain(x, \tau) = x$, if $\tau = \mtt{dummyT}$ \index{*dummyT} or
 | 
|
1031  | 
    \ttindex{show_types} not set to {\tt true}.
 | 
|
1032  | 
||
1033  | 
  \item $constrain(x, \tau) = \Appl{\Constant \mtt{"_constrain"}, x, ty}$,
 | 
|
1034  | 
    where $ty$ is the \AST{} encoding of $\tau$.  That is, type constructors as
 | 
|
1035  | 
    {\tt Constant}s, type identifiers as {\tt Variable}s and type applications
 | 
|
1036  | 
    as {\tt Appl}s with the head type constructor as first element.
 | 
|
1037  | 
    Additionally, if \ttindex{show_sorts} is set to {\tt true}, some type
 | 
|
1038  | 
    variables are decorated with an \AST{} encoding of their sort.
 | 
|
1039  | 
\end{itemize}
 | 
|
1040  | 
||
1041  | 
The \AST{}, after application of macros (see \S\ref{sec:macros}), is
 | 
|
1042  | 
transformed into the final output string.  The built-in {\bf print AST
 | 
|
1043  | 
  translations}\indexbold{translations!print AST} effectively reverse the
 | 
|
1044  | 
parse \AST{} translations of Fig.\ts\ref{fig:parse_ast_tr}.
 | 
|
1045  | 
||
1046  | 
For the actual printing process, the names attached to productions  | 
|
1047  | 
of the form $\ldots A^{(p@1)}@1 \ldots A^{(p@n)}@n \ldots \mtt{=>} c$ play
 | 
|
1048  | 
a vital role.  Each \AST{} with constant head $c$, namely $\mtt"c\mtt"$ or
 | 
|
1049  | 
$(\mtt"c\mtt"~ x@1 \ldots x@n)$, is printed according to the production  | 
|
1050  | 
for~$c$. Each argument~$x@i$ is converted to a string, and put in  | 
|
1051  | 
parentheses if its priority~$(p@i)$ requires this. The resulting strings  | 
|
1052  | 
and their syntactic sugar (denoted by ``\dots'' above) are joined to make a  | 
|
1053  | 
single string.  | 
|
1054  | 
||
1055  | 
If an application $(\mtt"c\mtt"~ x@1 \ldots x@m)$ has more arguments than the  | 
|
1056  | 
corresponding production, it is first split into $((\mtt"c\mtt"~ x@1 \ldots  | 
|
1057  | 
x@n) ~ x@{n+1} \ldots x@m)$. Applications with too few arguments or with
 | 
|
1058  | 
non-constant head or without a corresponding production are printed as  | 
|
1059  | 
$f(x@1, \ldots, x@l)$ or $(\alpha@1, \ldots, \alpha@l) ty$. An occurrence of  | 
|
1060  | 
$\Variable x$ is simply printed as~$x$.  | 
|
1061  | 
||
1062  | 
Blanks are {\em not\/} inserted automatically.  If blanks are required to
 | 
|
1063  | 
separate tokens, specify them in the mixfix declaration, possibly preceeded  | 
|
1064  | 
by a slash~({\tt/}) to allow a line break.
 | 
|
1065  | 
\index{trees!abstract syntax|)}
 | 
|
1066  | 
||
1067  | 
||
1068  | 
||
1069  | 
\section{*Macros: Syntactic rewriting} \label{sec:macros}
 | 
|
1070  | 
\index{macros|(}\index{rewriting!syntactic|(} 
 | 
|
1071  | 
||
1072  | 
Mixfix declarations alone can handle situations where there is a direct  | 
|
1073  | 
connection between the concrete syntax and the underlying term. Sometimes  | 
|
1074  | 
we require a more elaborate concrete syntax, such as quantifiers and list  | 
|
1075  | 
notation.  Isabelle's {\bf macros} and {\bf translation functions} can
 | 
|
1076  | 
perform translations such as  | 
|
1077  | 
\begin{center}\tt
 | 
|
1078  | 
  \begin{tabular}{r@{$\quad\protect\rightleftharpoons\quad$}l}
 | 
|
1079  | 
ALL x:A.P & Ball(A, \%x.P) \\ \relax  | 
|
1080  | 
[x, y, z] & Cons(x, Cons(y, Cons(z, Nil)))  | 
|
1081  | 
  \end{tabular}
 | 
|
1082  | 
\end{center}
 | 
|
1083  | 
Translation functions (see \S\ref{sec:tr_funs}) must be coded in ML; they
 | 
|
1084  | 
are the most powerful translation mechanism but are difficult to read or  | 
|
1085  | 
write. Macros are specified by first-order rewriting systems that operate  | 
|
1086  | 
on abstract syntax trees. They are usually easy to read and write, and can  | 
|
1087  | 
express all but the most obscure translations.  | 
|
1088  | 
||
1089  | 
Figure~\ref{fig:set_trans} defines a fragment of first-order logic and set
 | 
|
1090  | 
theory.\footnote{This and the following theories are complete working
 | 
|
1091  | 
  examples, though they specify only syntax, no axioms.  The file {\tt
 | 
|
1092  | 
ZF/zf.thy} presents the full set theory definition, including many  | 
|
1093  | 
  macro rules.}  Theory {\tt SET} defines constants for set comprehension
 | 
|
1094  | 
({\tt Collect}), replacement ({\tt Replace}) and bounded universal
 | 
|
1095  | 
quantification ({\tt Ball}).  Each of these binds some variables.  Without
 | 
|
1096  | 
additional syntax we should have to express $\forall x \in A.  P$ as {\tt
 | 
|
1097  | 
Ball(A,\%x.P)}, and similarly for the others.  | 
|
1098  | 
||
1099  | 
\begin{figure}
 | 
|
1100  | 
\begin{ttbox}
 | 
|
1101  | 
SET = Pure +  | 
|
1102  | 
types  | 
|
1103  | 
i, o  | 
|
1104  | 
arities  | 
|
1105  | 
i, o :: logic  | 
|
1106  | 
consts  | 
|
1107  | 
  Trueprop      :: "o => prop"              ("_" 5)
 | 
|
1108  | 
Collect :: "[i, i => o] => i"  | 
|
1109  | 
  "{\at}Collect"    :: "[idt, i, o] => i"       ("(1{\ttlbrace}_:_./ _{\ttrbrace})")
 | 
|
1110  | 
Replace :: "[i, [i, i] => o] => i"  | 
|
1111  | 
  "{\at}Replace"    :: "[idt, idt, i, o] => i"  ("(1{\ttlbrace}_./ _:_, _{\ttrbrace})")
 | 
|
1112  | 
Ball :: "[i, i => o] => o"  | 
|
1113  | 
  "{\at}Ball"       :: "[idt, i, o] => o"       ("(3ALL _:_./ _)" 10)
 | 
|
1114  | 
translations  | 
|
1115  | 
  "{\ttlbrace}x:A. P{\ttrbrace}"    == "Collect(A, \%x. P)"
 | 
|
1116  | 
  "{\ttlbrace}y. x:A, Q{\ttrbrace}" == "Replace(A, \%x y. Q)"
 | 
|
1117  | 
"ALL x:A. P" == "Ball(A, \%x. P)"  | 
|
1118  | 
end  | 
|
1119  | 
\end{ttbox}
 | 
|
1120  | 
\caption{Macro example: set theory}\label{fig:set_trans}
 | 
|
1121  | 
\end{figure}
 | 
|
1122  | 
||
1123  | 
The theory specifies a variable-binding syntax through additional  | 
|
1124  | 
productions that have mixfix declarations. Each non-copy production must  | 
|
1125  | 
specify some constant, which is used for building \AST{}s.  The additional
 | 
|
1126  | 
constants are decorated with {\tt\at} to stress their purely syntactic
 | 
|
1127  | 
purpose; they should never occur within the final well-typed terms.  | 
|
1128  | 
Furthermore, they cannot be written in formulae because they are not legal  | 
|
1129  | 
identifiers.  | 
|
1130  | 
||
1131  | 
The translations cause the replacement of external forms by internal forms  | 
|
1132  | 
after parsing, and vice versa before printing of terms. As a specification  | 
|
1133  | 
of the set theory notation, they should be largely self-explanatory. The  | 
|
1134  | 
syntactic constants, {\tt\at Collect}, {\tt\at Replace} and {\tt\at Ball},
 | 
|
1135  | 
appear implicitly in the macro rules via their mixfix forms.  | 
|
1136  | 
||
1137  | 
Macros can define variable-binding syntax because they operate on \AST{}s,
 | 
|
1138  | 
which have no inbuilt notion of bound variable.  The macro variables {\tt
 | 
|
1139  | 
  x} and~{\tt y} have type~{\tt idt} and therefore range over identifiers,
 | 
|
1140  | 
in this case bound variables.  The macro variables {\tt P} and~{\tt Q}
 | 
|
1141  | 
range over formulae containing bound variable occurrences.  | 
|
1142  | 
||
1143  | 
Other applications of the macro system can be less straightforward, and  | 
|
1144  | 
there are peculiarities. The rest of this section will describe in detail  | 
|
1145  | 
how Isabelle macros are preprocessed and applied.  | 
|
1146  | 
||
1147  | 
||
1148  | 
\subsection{Specifying macros}
 | 
|
1149  | 
Macros are basically rewrite rules on \AST{}s.  But unlike other macro
 | 
|
1150  | 
systems found in programming languages, Isabelle's macros work in both  | 
|
1151  | 
directions. Therefore a syntax contains two lists of rewrites: one for  | 
|
1152  | 
parsing and one for printing.  | 
|
1153  | 
||
1154  | 
The {\tt translations} section\index{translations section@{\tt translations}
 | 
|
1155  | 
section} specifies macros. The syntax for a macro is  | 
|
1156  | 
\[ (root)\; string \quad  | 
|
1157  | 
   \left\{\begin{array}[c]{c} \mtt{=>} \\ \mtt{<=} \\ \mtt{==} \end{array}
 | 
|
1158  | 
\right\} \quad  | 
|
1159  | 
(root)\; string  | 
|
1160  | 
\]  | 
|
1161  | 
%  | 
|
1162  | 
This specifies a parse rule ({\tt =>}), a print rule ({\tt <=}), or both
 | 
|
1163  | 
({\tt ==}).  The two strings specify the left and right-hand sides of the
 | 
|
1164  | 
macro rule. The $(root)$ specification is optional; it specifies the  | 
|
1165  | 
nonterminal for parsing the $string$ and if omitted defaults to {\tt
 | 
|
1166  | 
  logic}.  \AST{} rewrite rules $(l, r)$ must obey certain conditions:
 | 
|
1167  | 
\begin{itemize}
 | 
|
1168  | 
\item Rules must be left linear: $l$ must not contain repeated variables.  | 
|
1169  | 
||
1170  | 
\item Rules must have constant heads, namely $l = \mtt"c\mtt"$ or $l =  | 
|
1171  | 
(\mtt"c\mtt" ~ x@1 \ldots x@n)$.  | 
|
1172  | 
||
1173  | 
\item Every variable in~$r$ must also occur in~$l$.  | 
|
1174  | 
\end{itemize}
 | 
|
1175  | 
||
1176  | 
Macro rules may refer to any syntax from the parent theories. They may  | 
|
1177  | 
also refer to anything defined before the the {\tt .thy} file's {\tt
 | 
|
1178  | 
translations} section --- including any mixfix declarations.  | 
|
1179  | 
||
1180  | 
Upon declaration, both sides of the macro rule undergo parsing and parse  | 
|
1181  | 
\AST{} translations (see \S\ref{sec:asts}), but do not themselves undergo
 | 
|
1182  | 
macro expansion. The lexer runs in a different mode that additionally  | 
|
1183  | 
accepts identifiers of the form $\_~letter~quasiletter^*$ (like {\tt _idt},
 | 
|
1184  | 
{\tt _K}).  Thus, a constant whose name starts with an underscore can
 | 
|
1185  | 
appear in macro rules but not in ordinary terms.  | 
|
1186  | 
||
1187  | 
Some atoms of the macro rule's \AST{} are designated as constants for
 | 
|
1188  | 
matching. These are all names that have been declared as classes, types or  | 
|
1189  | 
constants.  | 
|
1190  | 
||
1191  | 
The result of this preprocessing is two lists of macro rules, each stored  | 
|
1192  | 
as a pair of \AST{}s.  They can be viewed using {\tt Syntax.print_syntax}
 | 
|
1193  | 
(sections \ttindex{parse_rules} and \ttindex{print_rules}).  For
 | 
|
1194  | 
theory~{\tt SET} of Fig.~\ref{fig:set_trans} these are
 | 
|
1195  | 
\begin{ttbox}
 | 
|
1196  | 
parse_rules:  | 
|
1197  | 
  ("{\at}Collect" x A P)  ->  ("Collect" A ("_abs" x P))
 | 
|
1198  | 
  ("{\at}Replace" y x A Q)  ->  ("Replace" A ("_abs" x ("_abs" y Q)))
 | 
|
1199  | 
  ("{\at}Ball" x A P)  ->  ("Ball" A ("_abs" x P))
 | 
|
1200  | 
print_rules:  | 
|
1201  | 
  ("Collect" A ("_abs" x P))  ->  ("{\at}Collect" x A P)
 | 
|
1202  | 
  ("Replace" A ("_abs" x ("_abs" y Q)))  ->  ("{\at}Replace" y x A Q)
 | 
|
1203  | 
  ("Ball" A ("_abs" x P))  ->  ("{\at}Ball" x A P)
 | 
|
1204  | 
\end{ttbox}
 | 
|
1205  | 
||
1206  | 
\begin{warn}
 | 
|
1207  | 
Avoid choosing variable names that have previously been used as  | 
|
1208  | 
  constants, types or type classes; the {\tt consts} section in the output
 | 
|
1209  | 
  of {\tt Syntax.print_syntax} lists all such names.  If a macro rule works
 | 
|
1210  | 
incorrectly, inspect its internal form as shown above, recalling that  | 
|
1211  | 
constants appear as quoted strings and variables without quotes.  | 
|
1212  | 
\end{warn}
 | 
|
1213  | 
||
1214  | 
\begin{warn}
 | 
|
1215  | 
If \ttindex{eta_contract} is set to {\tt true}, terms will be
 | 
|
1216  | 
$\eta$-contracted {\em before\/} the \AST{} rewriter sees them.  Thus some
 | 
|
1217  | 
abstraction nodes needed for print rules to match may vanish. For example,  | 
|
1218  | 
\verb|Ball(A, %x. P(x))| contracts {\tt Ball(A, P)}; the print rule does
 | 
|
1219  | 
not apply and the output will be {\tt Ball(A, P)}.  This problem would not
 | 
|
1220  | 
occur if \ML{} translation functions were used instead of macros (as is
 | 
|
1221  | 
done for binder declarations).  | 
|
1222  | 
\end{warn}
 | 
|
1223  | 
||
1224  | 
||
1225  | 
\begin{warn}
 | 
|
1226  | 
Another trap concerns type constraints.  If \ttindex{show_types} is set to
 | 
|
1227  | 
{\tt true}, bound variables will be decorated by their meta types at the
 | 
|
1228  | 
binding place (but not at occurrences in the body). Matching with  | 
|
1229  | 
\verb|Collect(A, %x. P)| binds {\tt x} to something like {\tt ("_constrain" y
 | 
|
1230  | 
"i")} rather than only {\tt y}.  \AST{} rewriting will cause the constraint to
 | 
|
1231  | 
appear in the external form, say \verb|{y::i:A::i. P::o}|.  
 | 
|
1232  | 
||
1233  | 
To allow such constraints to be re-read, your syntax should specify bound  | 
|
1234  | 
variables using the nonterminal~\ttindex{idt}.  This is the case in our
 | 
|
1235  | 
example.  Choosing {\tt id} instead of {\tt idt} is a common error,
 | 
|
1236  | 
especially since it appears in former versions of most of Isabelle's  | 
|
1237  | 
object-logics.  | 
|
1238  | 
\end{warn}
 | 
|
1239  | 
||
1240  | 
||
1241  | 
||
1242  | 
\subsection{Applying rules}
 | 
|
1243  | 
As a term is being parsed or printed, an \AST{} is generated as an
 | 
|
1244  | 
intermediate form (recall Fig.\ts\ref{fig:parse_print}).  The \AST{} is
 | 
|
1245  | 
normalized by applying macro rules in the manner of a traditional term  | 
|
1246  | 
rewriting system. We first examine how a single rule is applied.  | 
|
1247  | 
||
1248  | 
Let $t$ be the abstract syntax tree to be normalized and $(l, r)$ some  | 
|
1249  | 
translation rule.  A subtree~$u$ of $t$ is a {\bf redex} if it is an
 | 
|
1250  | 
instance of~$l$; in this case $l$ is said to {\bf match}~$u$.  A redex
 | 
|
1251  | 
matched by $l$ may be replaced by the corresponding instance of~$r$, thus  | 
|
1252  | 
{\bf rewriting} the \AST~$t$.  Matching requires some notion of {\bf
 | 
|
1253  | 
place-holders} that may occur in rule patterns but not in ordinary  | 
|
1254  | 
\AST{}s; {\tt Variable} atoms serve this purpose.
 | 
|
1255  | 
||
1256  | 
The matching of the object~$u$ by the pattern~$l$ is performed as follows:  | 
|
1257  | 
\begin{itemize}
 | 
|
1258  | 
\item Every constant matches itself.  | 
|
1259  | 
||
1260  | 
\item $\Variable x$ in the object matches $\Constant x$ in the pattern.  | 
|
1261  | 
This point is discussed further below.  | 
|
1262  | 
||
1263  | 
  \item Every \AST{} in the object matches $\Variable x$ in the pattern,
 | 
|
1264  | 
binding~$x$ to~$u$.  | 
|
1265  | 
||
1266  | 
\item One application matches another if they have the same number of  | 
|
1267  | 
subtrees and corresponding subtrees match.  | 
|
1268  | 
||
1269  | 
  \item In every other case, matching fails.  In particular, {\tt
 | 
|
1270  | 
Constant}~$x$ can only match itself.  | 
|
1271  | 
\end{itemize}
 | 
|
1272  | 
A successful match yields a substitution that is applied to~$r$, generating  | 
|
1273  | 
the instance that replaces~$u$.  | 
|
1274  | 
||
1275  | 
The second case above may look odd.  This is where {\tt Variable}s of
 | 
|
1276  | 
non-rule \AST{}s behave like {\tt Constant}s.  Recall that \AST{}s are not
 | 
|
1277  | 
far removed from parse trees; at this level it is not yet known which  | 
|
1278  | 
identifiers will become constants, bounds, frees, types or classes. As  | 
|
1279  | 
\S\ref{sec:asts} describes, former parse tree heads appear in \AST{}s as
 | 
|
1280  | 
{\tt Constant}s, while $id$s, $var$s, $tid$s and $tvar$s become {\tt
 | 
|
1281  | 
  Variable}s.  On the other hand, when \AST{}s generated from terms for
 | 
|
1282  | 
printing, all constants and type constructors become {\tt Constant}s; see
 | 
|
1283  | 
\S\ref{sec:asts}.  Thus \AST{}s may contain a messy mixture of {\tt
 | 
|
1284  | 
  Variable}s and {\tt Constant}s.  This is insignificant at macro level
 | 
|
1285  | 
because matching treats them alike.  | 
|
1286  | 
||
1287  | 
Because of this behaviour, different kinds of atoms with the same name are  | 
|
1288  | 
indistinguishable, which may make some rules prone to misbehaviour. Example:  | 
|
1289  | 
\begin{ttbox}
 | 
|
1290  | 
types  | 
|
1291  | 
Nil  | 
|
1292  | 
consts  | 
|
1293  | 
Nil :: "'a list"  | 
|
1294  | 
  "[]"    :: "'a list"    ("[]")
 | 
|
1295  | 
translations  | 
|
1296  | 
"[]" == "Nil"  | 
|
1297  | 
\end{ttbox}
 | 
|
1298  | 
The term {\tt Nil} will be printed as {\tt []}, just as expected.  What
 | 
|
1299  | 
happens with \verb|%Nil.t| or {\tt x::Nil} is left as an exercise.
 | 
|
1300  | 
||
1301  | 
Normalizing an \AST{} involves repeatedly applying macro rules until none
 | 
|
1302  | 
is applicable. Macro rules are chosen in the order that they appear in the  | 
|
1303  | 
{\tt translations} section.  You can watch the normalization of \AST{}s
 | 
|
1304  | 
during parsing and printing by setting \ttindex{Syntax.trace_norm_ast} to
 | 
|
1305  | 
{\tt true}.\index{tracing!of macros} Alternatively, use
 | 
|
1306  | 
\ttindex{Syntax.test_read}.  The information displayed when tracing
 | 
|
1307  | 
includes the \AST{} before normalization ({\tt pre}), redexes with results
 | 
|
1308  | 
({\tt rewrote}), the normal form finally reached ({\tt post}) and some
 | 
|
1309  | 
statistics ({\tt normalize}).  If tracing is off,
 | 
|
1310  | 
\ttindex{Syntax.stat_norm_ast} can be set to {\tt true} in order to enable
 | 
|
1311  | 
printing of the normal form and statistics only.  | 
|
1312  | 
||
1313  | 
||
1314  | 
\subsection{Example: the syntax of finite sets}
 | 
|
1315  | 
This example demonstrates the use of recursive macros to implement a  | 
|
1316  | 
convenient notation for finite sets.  | 
|
1317  | 
\begin{ttbox}
 | 
|
1318  | 
FINSET = SET +  | 
|
1319  | 
types  | 
|
1320  | 
is  | 
|
1321  | 
consts  | 
|
1322  | 
  ""            :: "i => is"                ("_")
 | 
|
1323  | 
  "{\at}Enum"       :: "[i, is] => is"          ("_,/ _")
 | 
|
1324  | 
  empty         :: "i"                      ("{\ttlbrace}{\ttrbrace}")
 | 
|
1325  | 
insert :: "[i, i] => i"  | 
|
1326  | 
  "{\at}Finset"     :: "is => i"                ("{\ttlbrace}(_){\ttrbrace}")
 | 
|
1327  | 
translations  | 
|
1328  | 
  "{\ttlbrace}x, xs{\ttrbrace}"     == "insert(x, {\ttlbrace}xs{\ttrbrace})"
 | 
|
1329  | 
  "{\ttlbrace}x{\ttrbrace}"         == "insert(x, {\ttlbrace}{\ttrbrace})"
 | 
|
1330  | 
end  | 
|
1331  | 
\end{ttbox}
 | 
|
1332  | 
Finite sets are internally built up by {\tt empty} and {\tt insert}.  The
 | 
|
1333  | 
declarations above specify \verb|{x, y, z}| as the external representation
 | 
|
1334  | 
of  | 
|
1335  | 
\begin{ttbox}
 | 
|
1336  | 
insert(x, insert(y, insert(z, empty)))  | 
|
1337  | 
\end{ttbox}
 | 
|
1338  | 
||
1339  | 
The nonterminal symbol~{\tt is} stands for one or more objects of type~{\tt
 | 
|
1340  | 
  i} separated by commas.  The mixfix declaration \hbox{\verb|"_,/ _"|}
 | 
|
1341  | 
allows a line break after the comma for pretty printing; if no line break  | 
|
1342  | 
is required then a space is printed instead.  | 
|
1343  | 
||
1344  | 
The nonterminal is declared as the type~{\tt is}, but with no {\tt arities}
 | 
|
1345  | 
declaration.  Hence {\tt is} is not a logical type and no default
 | 
|
1346  | 
productions are added. If we had needed enumerations of the nonterminal  | 
|
1347  | 
{\tt logic}, which would include all the logical types, we could have used
 | 
|
1348  | 
the predefined nonterminal symbol \ttindex{args} and skipped this part
 | 
|
1349  | 
altogether.  The nonterminal~{\tt is} can later be reused for other
 | 
|
1350  | 
enumerations of type~{\tt i} like lists or tuples.
 | 
|
1351  | 
||
1352  | 
Next follows {\tt empty}, which is already equipped with its syntax
 | 
|
1353  | 
\verb|{}|, and {\tt insert} without concrete syntax.  The syntactic
 | 
|
1354  | 
constant {\tt\at Finset} provides concrete syntax for enumerations of~{\tt
 | 
|
1355  | 
i} enclosed in curly braces. Remember that a pair of parentheses, as in  | 
|
1356  | 
\verb|"{(_)}"|, specifies a block of indentation for pretty printing.
 | 
|
1357  | 
||
1358  | 
The translations may look strange at first. Macro rules are best  | 
|
1359  | 
understood in their internal forms:  | 
|
1360  | 
\begin{ttbox}
 | 
|
1361  | 
parse_rules:  | 
|
1362  | 
  ("{\at}Finset" ("{\at}Enum" x xs))  ->  ("insert" x ("{\at}Finset" xs))
 | 
|
1363  | 
  ("{\at}Finset" x)  ->  ("insert" x "empty")
 | 
|
1364  | 
print_rules:  | 
|
1365  | 
  ("insert" x ("{\at}Finset" xs))  ->  ("{\at}Finset" ("{\at}Enum" x xs))
 | 
|
1366  | 
  ("insert" x "empty")  ->  ("{\at}Finset" x)
 | 
|
1367  | 
\end{ttbox}
 | 
|
1368  | 
This shows that \verb|{x, xs}| indeed matches any set enumeration of at least
 | 
|
1369  | 
two elements, binding the first to {\tt x} and the rest to {\tt xs}.
 | 
|
1370  | 
Likewise, \verb|{xs}| and \verb|{x}| represent any set enumeration.  
 | 
|
1371  | 
The parse rules only work in the order given.  | 
|
1372  | 
||
1373  | 
\begin{warn}
 | 
|
1374  | 
  The \AST{} rewriter cannot discern constants from variables and looks
 | 
|
1375  | 
  only for names of atoms.  Thus the names of {\tt Constant}s occurring in
 | 
|
1376  | 
the (internal) left-hand side of translation rules should be regarded as  | 
|
1377  | 
  reserved keywords.  Choose non-identifiers like {\tt\at Finset} or
 | 
|
1378  | 
sufficiently long and strange names. If a bound variable's name gets  | 
|
1379  | 
rewritten, the result will be incorrect; for example, the term  | 
|
1380  | 
\begin{ttbox}
 | 
|
1381  | 
\%empty insert. insert(x, empty)  | 
|
1382  | 
\end{ttbox}
 | 
|
1383  | 
  gets printed as \verb|%empty insert. {x}|.
 | 
|
1384  | 
\end{warn}
 | 
|
1385  | 
||
1386  | 
||
1387  | 
\subsection{Example: a parse macro for dependent types}\label{prod_trans}
 | 
|
1388  | 
As stated earlier, a macro rule may not introduce new {\tt Variable}s on
 | 
|
1389  | 
the right-hand side. Something like \verb|"K(B)" => "%x. B"| is illegal;  | 
|
1390  | 
it allowed, it could cause variable capture. In such cases you usually  | 
|
1391  | 
must fall back on translation functions. But a trick can make things  | 
|
1392  | 
readable in some cases: {\em calling translation functions by parse
 | 
|
1393  | 
macros}:  | 
|
1394  | 
\begin{ttbox}
 | 
|
1395  | 
PROD = FINSET +  | 
|
1396  | 
consts  | 
|
1397  | 
Pi :: "[i, i => i] => i"  | 
|
1398  | 
  "{\at}PROD"       :: "[idt, i, i] => i"     ("(3PROD _:_./ _)" 10)
 | 
|
1399  | 
  "{\at}->"         :: "[i, i] => i"          ("(_ ->/ _)" [51, 50] 50)
 | 
|
1400  | 
\ttbreak  | 
|
1401  | 
translations  | 
|
1402  | 
"PROD x:A. B" => "Pi(A, \%x. B)"  | 
|
1403  | 
"A -> B" => "Pi(A, _K(B))"  | 
|
1404  | 
end  | 
|
1405  | 
ML  | 
|
1406  | 
  val print_translation = [("Pi", dependent_tr' ("{\at}PROD", "{\at}->"))];
 | 
|
1407  | 
\end{ttbox}
 | 
|
1408  | 
||
1409  | 
Here {\tt Pi} is an internal constant for constructing general products.
 | 
|
1410  | 
Two external forms exist: the general case {\tt PROD x:A.B} and the
 | 
|
1411  | 
function space {\tt A -> B}, which abbreviates \verb|Pi(A, %x.B)| when {\tt B}
 | 
|
1412  | 
does not depend on~{\tt x}.
 | 
|
1413  | 
||
1414  | 
The second parse macro introduces {\tt _K(B)}, which later becomes \verb|%x.B|
 | 
|
1415  | 
due to a parse translation associated with \ttindex{_K}.  The order of the
 | 
|
1416  | 
parse rules is critical. Unfortunately there is no such trick for  | 
|
1417  | 
printing, so we have to add a {\tt ML} section for the print translation
 | 
|
1418  | 
\ttindex{dependent_tr'}.
 | 
|
1419  | 
||
1420  | 
Recall that identifiers with a leading {\tt _} are allowed in translation
 | 
|
1421  | 
rules, but not in ordinary terms.  Thus we can create \AST{}s containing
 | 
|
1422  | 
names that are not directly expressible.  | 
|
1423  | 
||
1424  | 
The parse translation for {\tt _K} is already installed in Pure, and {\tt
 | 
|
1425  | 
dependent_tr'} is exported by the syntax module for public use. See  | 
|
1426  | 
\S\ref{sec:tr_funs} below for more of the arcane lore of translation functions.
 | 
|
1427  | 
\index{macros|)}\index{rewriting!syntactic|)}
 | 
|
1428  | 
||
1429  | 
||
1430  | 
||
1431  | 
\section{*Translation functions} \label{sec:tr_funs}
 | 
|
1432  | 
\index{translations|(} 
 | 
|
1433  | 
%  | 
|
1434  | 
This section describes the translation function mechanism. By writing  | 
|
1435  | 
\ML{} functions, you can do almost everything with terms or \AST{}s during
 | 
|
1436  | 
parsing and printing. The logic \LK\ is a good example of sophisticated  | 
|
1437  | 
transformations between internal and external representations of  | 
|
1438  | 
associative sequences; here, macros would be useless.  | 
|
1439  | 
||
1440  | 
A full understanding of translations requires some familiarity  | 
|
1441  | 
with Isabelle's internals, especially the datatypes {\tt term}, {\tt typ},
 | 
|
1442  | 
{\tt Syntax.ast} and the encodings of types and terms as such at the various
 | 
|
1443  | 
stages of the parsing or printing process. Most users should never need to  | 
|
1444  | 
use translation functions.  | 
|
1445  | 
||
1446  | 
\subsection{Declaring translation functions}
 | 
|
1447  | 
There are four kinds of translation functions. Each such function is  | 
|
1448  | 
associated with a name, which triggers calls to it. Such names can be  | 
|
1449  | 
constants (logical or syntactic) or type constructors.  | 
|
1450  | 
||
1451  | 
{\tt Syntax.print_syntax} displays the sets of names associated with the
 | 
|
1452  | 
translation functions of a {\tt Syntax.syntax} under
 | 
|
1453  | 
\ttindex{parse_ast_translation}, \ttindex{parse_translation},
 | 
|
1454  | 
\ttindex{print_translation} and \ttindex{print_ast_translation}.  You can
 | 
|
1455  | 
add new ones via the {\tt ML} section\index{ML section@{\tt ML} section} of
 | 
|
1456  | 
a {\tt .thy} file.  There may never be more than one function of the same
 | 
|
1457  | 
kind per name.  Conceptually, the {\tt ML} section should appear between
 | 
|
1458  | 
{\tt consts} and {\tt translations}; newly installed translation functions
 | 
|
1459  | 
are already effective when macros and logical rules are parsed.  | 
|
1460  | 
||
1461  | 
The {\tt ML} section is copied verbatim into the \ML\ file generated from a
 | 
|
1462  | 
{\tt .thy} file.  Definitions made here are accessible as components of an
 | 
|
1463  | 
\ML\ structure; to make some definitions private, use an \ML{} {\tt local}
 | 
|
1464  | 
declaration.  The {\tt ML} section may install translation functions by
 | 
|
1465  | 
declaring any of the following identifiers:  | 
|
1466  | 
\begin{ttbox}
 | 
|
1467  | 
val parse_ast_translation : (string * (ast list -> ast)) list  | 
|
1468  | 
val print_ast_translation : (string * (ast list -> ast)) list  | 
|
1469  | 
val parse_translation : (string * (term list -> term)) list  | 
|
1470  | 
val print_translation : (string * (term list -> term)) list  | 
|
1471  | 
\end{ttbox}
 | 
|
1472  | 
||
1473  | 
\subsection{The translation strategy}
 | 
|
1474  | 
All four kinds of translation functions are treated similarly. They are  | 
|
1475  | 
called during the transformations between parse trees, \AST{}s and terms
 | 
|
1476  | 
(recall Fig.\ts\ref{fig:parse_print}).  Whenever a combination of the form
 | 
|
1477  | 
$(\mtt"c\mtt"~x@1 \ldots x@n)$ is encountered, and a translation function  | 
|
1478  | 
$f$ of appropriate kind exists for $c$, the result is computed by the \ML{}
 | 
|
1479  | 
function call $f \mtt[ x@1, \ldots, x@n \mtt]$.  | 
|
1480  | 
||
1481  | 
For \AST{} translations, the arguments $x@1, \ldots, x@n$ are \AST{}s.  A
 | 
|
1482  | 
combination has the form $\Constant c$ or $\Appl{\Constant c, x@1, \ldots,
 | 
|
1483  | 
x@n}$. For term translations, the arguments are terms and a combination  | 
|
1484  | 
has the form $\ttfct{Const} (c, \tau)$ or $\ttfct{Const} (c, \tau) \ttapp
 | 
|
1485  | 
x@1 \ttapp \ldots \ttapp x@n$. Terms allow more sophisticated  | 
|
1486  | 
transformations than \AST{}s do, typically involving abstractions and bound
 | 
|
1487  | 
variables.  | 
|
1488  | 
||
1489  | 
Regardless of whether they act on terms or \AST{}s,
 | 
|
1490  | 
parse translations differ from print translations fundamentally:  | 
|
1491  | 
\begin{description}
 | 
|
1492  | 
\item[Parse translations] are applied bottom-up. The arguments are already  | 
|
1493  | 
in translated form. The translations must not fail; exceptions trigger  | 
|
1494  | 
an error message.  | 
|
1495  | 
||
1496  | 
\item[Print translations] are applied top-down. They are supplied with  | 
|
1497  | 
arguments that are partly still in internal form. The result again  | 
|
1498  | 
undergoes translation; therefore a print translation should not introduce  | 
|
1499  | 
as head the very constant that invoked it. The function may raise  | 
|
1500  | 
  exception \ttindex{Match} to indicate failure; in this event it has no
 | 
|
1501  | 
effect.  | 
|
1502  | 
\end{description}
 | 
|
1503  | 
||
1504  | 
Only constant atoms --- constructor \ttindex{Constant} for \AST{}s and
 | 
|
1505  | 
\ttindex{Const} for terms --- can invoke translation functions.  This
 | 
|
1506  | 
causes another difference between parsing and printing.  | 
|
1507  | 
||
1508  | 
Parsing starts with a string and the constants are not yet identified.  | 
|
1509  | 
Only parse tree heads create {\tt Constant}s in the resulting \AST; recall
 | 
|
1510  | 
$ast_of_pt$ in \S\ref{sec:asts}.  Macros and parse \AST{} translations may
 | 
|
1511  | 
introduce further {\tt Constant}s.  When the final \AST{} is converted to a
 | 
|
1512  | 
term, all {\tt Constant}s become {\tt Const}s; recall $term_of_ast$ in
 | 
|
1513  | 
\S\ref{sec:asts}.
 | 
|
1514  | 
||
1515  | 
Printing starts with a well-typed term and all the constants are known. So  | 
|
1516  | 
all logical constants and type constructors may invoke print translations.  | 
|
1517  | 
These, and macros, may introduce further constants.  | 
|
1518  | 
||
1519  | 
||
1520  | 
\subsection{Example: a print translation for dependent types}
 | 
|
1521  | 
\indexbold{*_K}\indexbold{*dependent_tr'}
 | 
|
1522  | 
Let us continue the dependent type example (page~\pageref{prod_trans}) by
 | 
|
1523  | 
examining the parse translation for {\tt _K} and the print translation
 | 
|
1524  | 
{\tt dependent_tr'}, which are both built-in.  By convention, parse
 | 
|
1525  | 
translations have names ending with {\tt _tr} and print translations have
 | 
|
1526  | 
names ending with {\tt _tr'}.  Search for such names in the Isabelle
 | 
|
1527  | 
sources to locate more examples.  | 
|
1528  | 
||
1529  | 
Here is the parse translation for {\tt _K}:
 | 
|
1530  | 
\begin{ttbox}
 | 
|
1531  | 
fun k_tr [t] = Abs ("x", dummyT, incr_boundvars 1 t)
 | 
|
1532  | 
  | k_tr ts = raise TERM("k_tr",ts);
 | 
|
1533  | 
\end{ttbox}
 | 
|
1534  | 
If {\tt k_tr} is called with exactly one argument~$t$, it creates a new
 | 
|
1535  | 
{\tt Abs} node with a body derived from $t$.  Since terms given to parse
 | 
|
1536  | 
translations are not yet typed, the type of the bound variable in the new  | 
|
1537  | 
{\tt Abs} is simply {\tt dummyT}.  The function increments all {\tt Bound}
 | 
|
1538  | 
nodes referring to outer abstractions by calling \ttindex{incr_boundvars},
 | 
|
1539  | 
a basic term manipulation function defined in {\tt Pure/term.ML}.
 | 
|
1540  | 
||
1541  | 
Here is the print translation for dependent types:  | 
|
1542  | 
\begin{ttbox}
 | 
|
1543  | 
fun dependent_tr' (q,r) (A :: Abs (x, T, B) :: ts) =  | 
|
1544  | 
if 0 mem (loose_bnos B) then  | 
|
1545  | 
let val (x', B') = variant_abs (x, dummyT, B);  | 
|
1546  | 
in list_comb (Const (q, dummyT) $ Free (x', T) $ A $ B', ts)  | 
|
1547  | 
end  | 
|
1548  | 
else list_comb (Const (r, dummyT) $ A $ B, ts)  | 
|
1549  | 
| dependent_tr' _ _ = raise Match;  | 
|
1550  | 
\end{ttbox}
 | 
|
1551  | 
The argument {\tt (q,r)} is supplied to {\tt dependent_tr'} by a curried
 | 
|
1552  | 
function application during its installation. We could set up print  | 
|
1553  | 
translations for both {\tt Pi} and {\tt Sigma} by including
 | 
|
1554  | 
\begin{ttbox}
 | 
|
1555  | 
val print_translation =  | 
|
1556  | 
  [("Pi",    dependent_tr' ("{\at}PROD", "{\at}->")),
 | 
|
1557  | 
   ("Sigma", dependent_tr' ("{\at}SUM", "{\at}*"))];
 | 
|
1558  | 
\end{ttbox}
 | 
|
1559  | 
within the {\tt ML} section.  The first of these transforms ${\tt Pi}(A,
 | 
|
1560  | 
\mtt{Abs}(x, T, B))$ into $\hbox{\tt{\at}PROD}(x', A, B')$ or
 | 
|
1561  | 
$\hbox{\tt{\at}->}r(A, B)$, choosing the latter form if $B$ does not depend
 | 
|
1562  | 
on~$x$.  It checks this using \ttindex{loose_bnos}, yet another function
 | 
|
1563  | 
from {\tt Pure/term.ML}.  Note that $x'$ is a version of $x$ renamed away
 | 
|
1564  | 
from all names in $B$, and $B'$ the body $B$ with {\tt Bound} nodes
 | 
|
1565  | 
referring to our {\tt Abs} node replaced by $\ttfct{Free} (x',
 | 
|
1566  | 
\mtt{dummyT})$.
 | 
|
1567  | 
||
1568  | 
We must be careful with types here.  While types of {\tt Const}s are
 | 
|
1569  | 
ignored, type constraints may be printed for some {\tt Free}s and
 | 
|
1570  | 
{\tt Var}s if \ttindex{show_types} is set to {\tt true}.  Variables of type
 | 
|
1571  | 
\ttindex{dummyT} are never printed with constraint, though.  The line
 | 
|
1572  | 
\begin{ttbox}
 | 
|
1573  | 
let val (x', B') = variant_abs (x, dummyT, B);  | 
|
1574  | 
\end{ttbox}\index{*variant_abs}
 | 
|
1575  | 
replaces bound variable occurrences in~$B$ by the free variable $x'$ with  | 
|
1576  | 
type {\tt dummyT}.  Only the binding occurrence of~$x'$ is given the
 | 
|
1577  | 
correct type~{\tt T}, so this is the only place where a type
 | 
|
1578  | 
constraint might appear.  | 
|
1579  | 
\index{translations|)}
 | 
|
1580  | 
||
1581  | 
||
1582  |