| author | lcp |
| Fri, 14 Jan 1994 12:36:30 +0100 | |
| changeset 226 | cc87161971e4 |
| parent 188 | 6be0856cdf49 |
| child 291 | a615050a7494 |
| permissions | -rw-r--r-- |
%% $Id$
\newcommand{\rmindex}[1]{{#1}\index{#1}\@}
\newcommand{\mtt}[1]{\mbox{\tt #1}}
\newcommand{\ttfct}[1]{\mathop{\mtt{#1}}\nolimits}
\newcommand{\ttrel}[1]{\mathrel{\mtt{#1}}}
\newcommand{\Constant}{\ttfct{Constant}}
\newcommand{\Variable}{\ttfct{Variable}}
\newcommand{\Appl}[1]{\ttfct{Appl}\mathopen{\mtt[}#1\mathclose{\mtt]}}


\chapter{Defining Logics} \label{Defining-Logics}

This chapter is intended for Isabelle experts.  It explains how to define new
logical systems, Isabelle's {\em raison d'\^etre}.  Isabelle logics are
hierarchies of theories.  A number of simple examples are contained in {\em
Introduction to Isabelle}; the full syntax of theory definition files ({\tt
.thy} files) is shown in {\em The Isabelle Reference Manual}.  This chapter's
chief purpose is a thorough description of all syntax-related matters
concerning theories.  The most important sections are \S\ref{sec:mixfix} about
mixfix declarations and \S\ref{sec:macros} describing the macro system.  The
concluding examples of \S\ref{sec:min_logics} are more concerned with the
logical aspects of the definition of theories.  Sections marked with * can be
skipped on first reading.


\section{Precedence grammars} \label{sec:precedence_grammars}

The precise syntax of a logic is best defined by a \rmindex{context-free
grammar}.  In order to simplify the description of mathematical languages, we
introduce an extended format which permits {\bf
precedences}\indexbold{precedence}.  This scheme generalizes precedence
declarations in \ML\ and {\sc prolog}.  In this extended grammar format,
nonterminals are decorated by integers, their precedence.  In the sequel,
precedences are shown as subscripts.  A nonterminal $A@p$ on the right-hand
side of a production may only be replaced using a production $A@q = \gamma$
where $p \le q$.

Formally, a set of context-free productions $G$ induces a derivation
relation $\rew@G$ on strings as follows:
\[ \alpha A@p \beta ~\rew@G~ \alpha\gamma\beta ~~~\mbox{iff}~~~
   \exists (A@q=\gamma) \in G. ~ q \ge p
\]
Any extended grammar of this kind can be translated into a normal context-free
grammar.  However, this translation may require the introduction of a
large number of new nonterminals and productions.

\begin{example} \label{ex:precedence}
The following simple grammar for arithmetic expressions demonstrates how
binding power and associativity of operators can be enforced by precedences.
\begin{center}
\begin{tabular}{rclr}
$A@9$ & = & {\tt0} \\
$A@9$ & = & {\tt(} $A@0$ {\tt)} \\
$A@0$ & = & $A@0$ {\tt+} $A@1$ \\
$A@2$ & = & $A@3$ {\tt*} $A@2$ \\
$A@3$ & = & {\tt-} $A@3$
\end{tabular}
\end{center}
The choice of precedences determines that {\tt -} binds tighter than {\tt *},
which binds tighter than {\tt +}, and that {\tt +} associates to the left and
{\tt *} to the right.
\end{example}
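For readers who like to experiment, the precedence discipline can be simulated in a few lines of Python.  This is purely illustrative and not part of Isabelle: {\tt parse(tokens, p)} accepts exactly the words derivable from $A@p$ in the grammar of the example, using the rule that a production $A@q = \gamma$ applies iff $q \ge p$.

```python
# Illustrative only -- not part of Isabelle.  parse(tokens, p) accepts
# exactly the words derivable from the nonterminal A_p of the example
# grammar; a production A_q = gamma is applicable iff q >= p.

def parse(tokens, p=0):
    """Parse one A_p; return (parse tree, remaining tokens)."""
    if tokens and tokens[0] == "0":                 # A_9 = 0   (9 >= p always)
        tree, rest = "0", tokens[1:]
    elif tokens and tokens[0] == "(":               # A_9 = ( A_0 )
        tree, rest = parse(tokens[1:], 0)
        assert rest and rest[0] == ")", "missing )"
        rest = rest[1:]
    elif tokens and tokens[0] == "-" and p <= 3:    # A_3 = - A_3
        arg, rest = parse(tokens[1:], 3)
        tree = ("-", arg)
    else:
        raise SyntaxError(f"unexpected input: {tokens!r}")
    while True:                                     # infix productions
        if rest and rest[0] == "+" and p <= 0:      # A_0 = A_0 + A_1
            right, rest = parse(rest[1:], 1)        # left recursion: + left-assoc
            tree = ("+", tree, right)
        elif rest and rest[0] == "*" and p <= 2:    # A_2 = A_3 * A_2
            right, rest = parse(rest[1:], 2)        # right arg at level 2: * right-assoc
            tree = ("*", tree, right)
        else:
            return tree, rest
```

For instance, parsing {\tt 0+0*0} produces the tree for {\tt 0+(0*0)}, confirming that {\tt *} binds tighter than {\tt +}.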

To minimize the number of subscripts, we adopt the following conventions:
\begin{itemize}
\item All precedences $p$ must be in the range $0 \leq p \leq max_pri$ for
  some fixed $max_pri$.
\item Precedence $0$ on the right-hand side and precedence $max_pri$ on the
  left-hand side may be omitted.
\end{itemize}
In addition, we write the production $A@p = \alpha$ as $A = \alpha~(p)$,
i.e.\ the precedence of the left-hand side is stated at the far right.
Finally, alternatives may be separated by $|$, and repetition indicated
by \dots.

Using these conventions and assuming $max_pri=9$, the grammar in
Example~\ref{ex:precedence} becomes
\begin{center}
\begin{tabular}{rclc}
$A$ & = & {\tt0} & \hspace*{4em} \\
 & $|$ & {\tt(} $A$ {\tt)} \\
 & $|$ & $A$ {\tt+} $A@1$ & (0) \\
 & $|$ & $A@3$ {\tt*} $A@2$ & (2) \\
 & $|$ & {\tt-} $A@3$ & (3)
\end{tabular}
\end{center}


\section{Basic syntax} \label{sec:basic_syntax}

The basis of all extensions by object-logics is the \rmindex{Pure theory},
bound to the \ML-identifier \ttindex{Pure.thy}.  It contains, among many other
things, the \rmindex{Pure syntax}.  An informal account of this basic syntax
(meta-logic, types etc.) may be found in {\em Introduction to Isabelle}.  A
more precise description using a precedence grammar is shown in
Figure~\ref{fig:pure_gram}.

\begin{figure}[htb]
\begin{center}
\begin{tabular}{rclc}
$prop$ &=& \ttindex{PROP} $aprop$ ~~$|$~~ {\tt(} $prop$ {\tt)} \\
     &$|$& $logic@3$ \ttindex{==} $logic@2$ & (2) \\
     &$|$& $logic@3$ \ttindex{=?=} $logic@2$ & (2) \\
     &$|$& $prop@2$ \ttindex{==>} $prop@1$ & (1) \\
     &$|$& {\tt[|} $prop$ {\tt;} \dots {\tt;} $prop$ {\tt|]} {\tt==>} $prop@1$ & (1) \\
     &$|$& {\tt!!} $idts$ {\tt.} $prop$ & (0) \\\\
$logic$ &=& $prop$ ~~$|$~~ $fun$ \\\\
$aprop$ &=& $id$ ~~$|$~~ $var$
    ~~$|$~~ $fun@{max_pri}$ {\tt(} $logic$ {\tt,} \dots {\tt,} $logic$ {\tt)} \\\\
$fun$ &=& $id$ ~~$|$~~ $var$ ~~$|$~~ {\tt(} $fun$ {\tt)} \\
    &$|$& $fun@{max_pri}$ {\tt(} $logic$ {\tt,} \dots {\tt,} $logic$ {\tt)} \\
    &$|$& $fun@{max_pri}$ {\tt::} $type$ \\
    &$|$& \ttindex{\%} $idts$ {\tt.} $logic$ & (0) \\\\
$idts$ &=& $idt$ ~~$|$~~ $idt@1$ $idts$ \\\\
$idt$ &=& $id$ ~~$|$~~ {\tt(} $idt$ {\tt)} \\
    &$|$& $id$ \ttindex{::} $type$ & (0) \\\\
$type$ &=& $tfree$ ~~$|$~~ $tvar$ ~~$|$~~ $tfree$ {\tt::} $sort$
    ~~$|$~~ $tvar$ {\tt::} $sort$ \\
    &$|$& $id$ ~~$|$~~ $type@{max_pri}$ $id$
    ~~$|$~~ {\tt(} $type$ {\tt,} \dots {\tt,} $type$ {\tt)} $id$ \\
    &$|$& $type@1$ \ttindex{=>} $type$ & (0) \\
    &$|$& {\tt[} $type$ {\tt,} \dots {\tt,} $type$ {\tt]} {\tt=>} $type$ & (0) \\
    &$|$& {\tt(} $type$ {\tt)} \\\\
$sort$ &=& $id$ ~~$|$~~ {\tt\ttlbrace\ttrbrace}
    ~~$|$~~ {\tt\ttlbrace} $id$ {\tt,} \dots {\tt,} $id$ {\tt\ttrbrace}
\end{tabular}\index{*"!"!}\index{*"["|}\index{*"|"]}
\indexbold{type@$type$} \indexbold{sort@$sort$} \indexbold{idt@$idt$}
\indexbold{idts@$idts$} \indexbold{logic@$logic$} \indexbold{prop@$prop$}
\indexbold{fun@$fun$}
\end{center}
\caption{Meta-Logic Syntax}
\label{fig:pure_gram}
\end{figure}

The following main categories are defined:
\begin{description}
\item[$prop$] Terms of type $prop$, i.e.\ formulae of the meta-logic.

\item[$aprop$] Atomic propositions.

\item[$logic$] Terms of types in class $logic$.  Initially, $logic$ contains
  merely $prop$.  As the syntax is extended by new object-logics, more
  productions for $logic$ are added automatically (see below).

\item[$fun$] Terms potentially of function type.

\item[$type$] Meta-types.

\item[$idts$] A list of identifiers, possibly constrained by types.  Note
  that \verb|x :: nat y| is parsed as \verb|x :: (nat y)|, i.e.\ {\tt y}
  would be treated like a type constructor applied to {\tt nat}.
\end{description}

\subsection{Logical types and default syntax}

Isabelle is concerned with mathematical languages which have a certain
minimal vocabulary: identifiers, variables, parentheses, and the lambda
calculus.  Logical types, i.e.\ those of class $logic$, are automatically
equipped with this basic syntax.  More precisely, for any type constructor
$ty$ with arity $(\vec{s})c$, where $c$ is a subclass of $logic$, the
following productions are added:
\begin{center}
\begin{tabular}{rclc}
$ty$ &=& $id$ ~~$|$~~ $var$ ~~$|$~~ {\tt(} $ty$ {\tt)} \\
  &$|$& $fun@{max_pri}$ {\tt(} $logic$ {\tt,} \dots {\tt,} $logic$ {\tt)}\\
  &$|$& $ty@{max_pri}$ {\tt::} $type$\\\\
$logic$ &=& $ty$
\end{tabular}
\end{center}

\subsection{Lexical matters *}

The parser does not process input strings directly; rather, it operates on
token lists provided by Isabelle's \rmindex{lexical analyzer} (the
\bfindex{lexer}).  There are two different kinds of tokens: {\bf
delimiters}\indexbold{delimiter} and {\bf names}\indexbold{name}.

Delimiters can be regarded as reserved words\index{reserved word} of the
syntax, and the user can add new ones when extending theories.  In
Figure~\ref{fig:pure_gram} they appear in typewriter font, e.g.\ {\tt PROP},
{\tt ==}, {\tt =?=}, {\tt ;}.

Names on the other hand have a fixed predefined syntax.  The lexer
distinguishes four kinds of them: identifiers\index{identifier},
unknowns\index{unknown}\index{scheme variable|see{unknown}}, type
variables\index{type variable}, and type unknowns\index{type unknown}\index{type
scheme variable|see{type unknown}}; they are denoted by $id$\index{id@$id$},
$var$\index{var@$var$}, $tfree$\index{tfree@$tfree$},
$tvar$\index{tvar@$tvar$}, respectively.  Typical examples are {\tt x}, {\tt
?x7}, {\tt 'a}, {\tt ?'a3}; the exact syntax is:
\begin{eqnarray*}
id          & = & letter~quasiletter^* \\
var         & = & \mbox{\tt ?}id ~~|~~ \mbox{\tt ?}id\mbox{\tt .}nat \\
tfree       & = & \mbox{\tt '}id \\
tvar        & = & \mbox{\tt ?}tfree ~~|~~ \mbox{\tt ?}tfree\mbox{\tt .}nat \\[1ex]
letter      & = & \mbox{one of {\tt a}\dots {\tt z} {\tt A}\dots {\tt Z}} \\
digit       & = & \mbox{one of {\tt 0}\dots {\tt 9}} \\
quasiletter & = & letter ~~|~~ digit ~~|~~ \mbox{\tt _} ~~|~~ \mbox{\tt '} \\
nat         & = & digit^+
\end{eqnarray*}
A $var$ or $tvar$ describes an \rmindex{unknown}, which is internally a pair
of base name and index (\ML\ type \ttindex{indexname}).  These components are
either explicitly separated by a dot, as in {\tt ?x.1} or {\tt ?x7.3}, or
directly run together, as in {\tt ?x1}.  The latter form is possible only if
the base name does not end with digits.  If the index is 0, it may be dropped
altogether: {\tt ?x} abbreviates {\tt ?x0} or {\tt ?x.0}.

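As an illustration of this convention, the split into base name and index can be reproduced by a small hypothetical helper.  This is not Isabelle's code; the lexer performs the split internally.

```python
import re

def indexname(tok):
    """Split an unknown such as ?x.1, ?x7.3, ?x1 or ?x into (base, index).

    A hypothetical helper for illustration only; Isabelle's lexer does
    this internally (ML type indexname)."""
    body = tok.lstrip("?")
    if "." in body:                               # explicit form: ?x.1, ?x7.3
        base, idx = body.rsplit(".", 1)
        return (base, int(idx))
    # run-together form: trailing digits are the index, default index 0
    base, digits = re.match(r"(.*?)(\d*)$", body).groups()
    return (base, int(digits) if digits else 0)
```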
Note that $id$, $var$, $tfree$ and $tvar$ are mutually disjoint, but it is
perfectly legal for them to overlap with the set of delimiters (e.g.\ {\tt
PROP}, {\tt ALL}, {\tt EX}).

The lexical analyzer translates input strings to token lists by repeatedly
taking the maximal prefix of the input string that forms a valid token.  A
maximal prefix that is both a delimiter and a name is treated as a delimiter.
Spaces, tabs and newlines are separators; they never occur within tokens.

Note that delimiters need not be surrounded by white space to be
recognized as separate tokens.  For example, if {\tt -} is a delimiter but
{\tt --} is not, then the string {\tt --} is treated as two consecutive
occurrences of {\tt -}.  This is in contrast to \ML, which always treats
{\tt --} as a single symbolic name.  The consequence of Isabelle's more
liberal scheme is that the same string may be parsed in different ways after
extending the syntax: after adding {\tt --} as a delimiter, the input
{\tt --} is treated as a single token.

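The maximal-munch rule just described can be sketched in Python.  The sketch is illustrative only: the delimiter sets are hypothetical, and {\tt NAME} merely approximates the $id$ syntax above.

```python
import re

# Illustrative sketch of maximal-munch tokenization -- not Isabelle's lexer.
# NAME approximates the id syntax above; the delimiter set is a parameter.
NAME = re.compile(r"[A-Za-z][A-Za-z0-9_']*")

def tokenize(s, delims):
    tokens, i = [], 0
    while i < len(s):
        if s[i] in " \t\n":                       # separators never occur in tokens
            i += 1
            continue
        delim = max((d for d in delims if s.startswith(d, i)),
                    key=len, default=None)        # longest delimiter prefix
        m = NAME.match(s, i)
        if m and (delim is None or len(m.group()) > len(delim)):
            tokens.append(m.group())              # the name is strictly longer
            i = m.end()
        elif delim is not None:                   # on a tie, the delimiter wins
            tokens.append(delim)
            i += len(delim)
        else:
            raise ValueError(f"no token at {s[i:]!r}")
    return tokens
```

With {\tt -} but not {\tt --} in the delimiter set, {\tt x--y} yields four tokens; after adding {\tt --}, it yields three.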
\subsection{Inspecting syntax *}

You may get the \ML\ representation of the syntax of any Isabelle theory by
applying \index{*syn_of}
\begin{ttbox}
syn_of: theory -> Syntax.syntax
\end{ttbox}
\ttindex{Syntax.syntax} is an abstract type.  Values of this type can be
displayed by the following functions: \index{*Syntax.print_syntax}
\index{*Syntax.print_gram} \index{*Syntax.print_trans}
\begin{ttbox}
Syntax.print_syntax: Syntax.syntax -> unit
Syntax.print_gram:   Syntax.syntax -> unit
Syntax.print_trans:  Syntax.syntax -> unit
\end{ttbox}
{\tt Syntax.print_syntax} shows virtually all information contained in a
syntax.  Its output is divided into labeled sections.  The syntax proper is
represented by {\tt lexicon}, {\tt roots} and {\tt prods}.  The rest refers to
the manifold facilities for applying syntactic translations (macro expansion
etc.).

To cope with the verbosity of {\tt Syntax.print_syntax}, there are
\ttindex{Syntax.print_gram} and \ttindex{Syntax.print_trans} to print only the
syntax proper and only the translation-related information, respectively.

Let's have a closer look at part of Pure's syntax:
\begin{ttbox}
Syntax.print_syntax (syn_of Pure.thy);
{\out lexicon: "!!" "%" "(" ")" "," "." "::" ";" "==" "==>" \dots}
{\out roots: logic type fun prop}
{\out prods:}
{\out type = tfree  (1000)}
{\out type = tvar  (1000)}
{\out type = id  (1000)}
{\out type = tfree "::" sort[0] => "_ofsort"  (1000)}
{\out type = tvar "::" sort[0] => "_ofsort"  (1000)}
{\out \vdots}
{\out consts: "_K" "_appl" "_aprop" "_args" "_asms" "_bigimpl" \dots}
{\out parse_ast_translation: "_appl" "_bigimpl" "_bracket"}
{\out "_idtyp" "_lambda" "_tapp" "_tappl"}
{\out parse_rules:}
{\out parse_translation: "!!" "_K" "_abs" "_aprop"}
{\out print_translation: "all"}
{\out print_rules:}
{\out print_ast_translation: "==>" "_abs" "_idts" "fun"}
\end{ttbox}

\begin{description}
\item[\ttindex{lexicon}] The set of delimiters used for lexical analysis.

\item[\ttindex{roots}]
  The legal syntactic categories to start parsing with.  You name the
  desired root directly as a string when calling lower level functions or
  specifying macros.  Higher level functions usually expect a type and
  derive the actual root as follows:\index{root_of_type@$root_of_type$}
  \begin{itemize}
  \item $root_of_type(\tau@1 \To \tau@2) = \mtt{fun}$.

  \item $root_of_type(\tau@1 \mathrel{ty} \tau@2) = ty$.

  \item $root_of_type((\tau@1, \dots, \tau@n)ty) = ty$.

  \item $root_of_type(\alpha) = \mtt{logic}$.
  \end{itemize}
  Here $\tau@1, \dots, \tau@n$ are types, $ty$ is an infix or ordinary type
  constructor, and $\alpha$ is a type variable or unknown.  Note that only
  the outermost type constructor is taken into account.

\item[\ttindex{prods}]
  The list of productions describing the precedence grammar.  Nonterminals
  $A@n$ are rendered in {\sc ascii} as {\tt $A$[$n$]}; delimiters are
  quoted.  Some productions have strings attached after an {\tt =>}.  These
  strings later become the heads of parse trees, but they also play a vital
  role when terms are printed (see \S\ref{sec:asts}).

  Productions which do not have a string attached and thus do not create a
  new parse tree node are called {\bf copy productions}\indexbold{copy
  production}.  They must have exactly one
  argument\index{argument!production} (i.e.\ nonterminal or name token)
  on the right-hand side.  The parse tree generated when parsing that
  argument is simply passed up as the result of parsing the whole copy
  production.

  A special kind of copy production is one where the argument is a
  nonterminal and no additional syntactic sugar (delimiters) is present.  It
  is called a \bfindex{chain production}.  Chain productions should be seen
  as an abbreviation mechanism: conceptually, they are removed from the
  grammar by adding appropriate new rules.  Precedence information attached
  to chain productions is ignored; only the dummy value $-1$ is displayed.

\item[\ttindex{consts}, \ttindex{parse_rules}, \ttindex{print_rules}]
  This is macro-related information (see \S\ref{sec:macros}).

\item[\tt *_translation]
  \index{*parse_ast_translation} \index{*parse_translation}
  \index{*print_translation} \index{*print_ast_translation}
  The sets of constants that invoke translation functions.  These are more
  arcane matters (see \S\ref{sec:asts} and \S\ref{sec:tr_funs}).
\end{description}

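The $root_of_type$ mapping above is easy to model on a toy type representation.  The representation here is an assumption made for illustration, not Isabelle's \ML\ one: a type is either a plain string (a type variable or unknown) or a pair of a constructor name and an argument list, with function types using the constructor {\tt fun}.

```python
# Sketch of root_of_type on a toy type representation (an assumption for
# illustration, not Isabelle's ML representation).  A type is a string
# (variable/unknown) or (constructor, [argument types]); function types
# tau1 => tau2 are encoded as ("fun", [tau1, tau2]).

def root_of_type(T):
    if isinstance(T, str):          # type variable or unknown
        return "logic"
    constructor, _args = T          # only the outermost constructor counts
    return constructor              # "fun" for function types, ty otherwise
```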
You may inspect the syntax of any theory using the above calling
sequence.  Beware that, as more and more material accumulates, the output
becomes longer and longer.
% When extending syntaxes, new {\tt roots}, {\tt prods}, {\tt parse_rules}
% and {\tt print_rules} are appended to the end.  The other lists are
% displayed sorted.

\section{Abstract syntax trees} \label{sec:asts}

Figure~\ref{fig:parse_print} shows a simplified model of the parsing and
printing process.

\begin{figure}[htb]
\begin{center}
\begin{tabular}{cl}
string                  & \\
$\downarrow$            & parser \\
parse tree              & \\
$\downarrow$            & \rmindex{parse ast translation} \\
ast                     & \\
$\downarrow$            & ast rewriting (macros) \\
ast                     & \\
$\downarrow$            & \rmindex{parse translation}, type-inference \\
--- well-typed term --- & \\
$\downarrow$            & \rmindex{print translation} \\
ast                     & \\
$\downarrow$            & ast rewriting (macros) \\
ast                     & \\
$\downarrow$            & \rmindex{print ast translation}, printer \\
string                  &
\end{tabular}
\end{center}
\caption{Parsing and Printing}
\label{fig:parse_print}
\end{figure}

The parser takes an input string (more precisely, the token list produced by
the lexer) and produces a \rmindex{parse tree}, which is directly derived
from the productions involved.  By applying some internal transformations the
parse tree becomes an \bfindex{abstract syntax tree} (\bfindex{ast}).  Macro
expansion, further translations and finally type inference yield a
well-typed term\index{term!well-typed}.

The printing process is the reverse, except for some subtleties to be
discussed later.

Asts are an intermediate form between the very raw parse trees and the typed
$\lambda$-terms.  An ast is either an atom (constant or variable) or a list of
{\em at least two\/} subasts.  Internally, they have type
\ttindex{Syntax.ast}: \index{*Constant} \index{*Variable} \index{*Appl}
\begin{ttbox}
datatype ast = Constant of string
             | Variable of string
             | Appl of ast list
\end{ttbox}

We write constant atoms as quoted strings, variable atoms as non-quoted
strings, and applications as lists of subasts separated by spaces and
enclosed in parentheses.  For example:
\begin{ttbox}
Appl [Constant "_constrain",
      Appl [Constant "_abs", Variable "x", Variable "t"],
      Appl [Constant "fun", Variable "'a", Variable "'b"]]
\end{ttbox}
is written as {\tt ("_constrain" ("_abs" x t) ("fun" 'a 'b))}.
Note that {\tt ()} and {\tt (f)} are both illegal.
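The \ML\ datatype above can be mirrored in Python for experimentation.  This is illustrative only and not part of Isabelle; {\tt sexp} prints an ast in the notation just introduced.

```python
from dataclasses import dataclass

# Python mirror of the ML datatype above (illustrative only, not Isabelle).
@dataclass
class Constant:
    name: str

@dataclass
class Variable:
    name: str

@dataclass
class Appl:
    subasts: list        # a well-formed ast has at least two subasts

def sexp(ast):
    """Print an ast in the notation used in the text."""
    if isinstance(ast, Constant):
        return '"%s"' % ast.name             # constant atoms are quoted
    if isinstance(ast, Variable):
        return ast.name                      # variable atoms are not
    assert len(ast.subasts) >= 2, "() and (f) are illegal"
    return "(" + " ".join(sexp(a) for a in ast.subasts) + ")"

example = Appl([Constant("_constrain"),
                Appl([Constant("_abs"), Variable("x"), Variable("t")]),
                Appl([Constant("fun"), Variable("'a"), Variable("'b")])])
```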
The resemblance to Lisp's S-expressions is intentional, but note the two
kinds of atomic symbols: $\Constant x$ and $\Variable x$.  This distinction
has some more obscure reasons and you can ignore it about half of the time.
You should not take the names ``{\tt Constant}'' and ``{\tt Variable}'' too
literally.  In the later translation to terms, $\Variable x$ may become a
constant, free or bound variable, even a type constructor or class name; the
actual outcome depends on the context.

Similarly, you can think of ${\tt (} f~x@1~\ldots~x@n{\tt )}$ as some sort of
application of $f$ to the arguments $x@1, \ldots, x@n$.  But the actual kind
of application (say a type constructor $f$ applied to types $x@1, \ldots,
x@n$) is determined later by context, too.

Forms like {\tt (("_abs" x $t$) $u$)} are perfectly legal, but asts are not
higher-order: the {\tt "_abs"} does not yet bind the {\tt x} in any way,
though later at the term level, {\tt ("_abs" x $t$)} will become an {\tt Abs}
node and occurrences of {\tt x} in $t$ will be replaced by {\tt Bound}s.
Even though non-constant heads of applications may seem unusual, asts should
be regarded as first-order.  Purists may think of ${\tt (}
f~x@1~\ldots~x@n{\tt )}$ as a first-order application of some invisible
$(n+1)$-ary constant.

\subsection{Parse trees to asts}

Asts are generated from parse trees by applying some translation functions,
which are internally called {\bf parse ast translations}\indexbold{parse ast
translation}.

We can think of the raw output of the parser as trees built up by nesting the
right-hand sides of those productions that were used to derive a word that
matches the input token list.  Then parse trees are simply lists of tokens and
sub parse trees, the latter replacing the nonterminals of the productions.
Note that we refer here to the actual productions in their internal form as
displayed by {\tt Syntax.print_syntax}.

Ignoring parse ast translations, the mapping
$ast_of_pt$\index{ast_of_pt@$ast_of_pt$} from parse trees to asts is derived
from the productions involved as follows:
\begin{itemize}
\item Name tokens: $ast_of_pt(t) = \Variable s$, where $t$ is an $id$,
  $var$, $tfree$ or $tvar$ token, and $s$ its associated string.

\item Copy productions: $ast_of_pt(\ldots P \ldots) = ast_of_pt(P)$.

\item $0$-ary productions: $ast_of_pt(\ldots \mtt{=>} c) = \Constant c$.

\item $n$-ary productions: $ast_of_pt(\ldots P@1 \ldots P@n \ldots \mtt{=>}
  c) = \Appl{\Constant c,$ $ast_of_pt(P@1),$ $\ldots,$ $ast_of_pt(P@n)}$,
  where $n \ge 1$.
\end{itemize}
Here $P, P@1, \ldots, P@n$ denote parse trees or name tokens and ``\dots''
stands for zero or more delimiters.  Delimiters are stripped and do not
appear in asts.  The following table presents some simple examples:
\begin{center}
|
| 104 | 456 |
{\tt\begin{tabular}{ll}
|
457 |
\rm input string & \rm ast \\\hline |
|
458 |
"f" & f \\ |
|
459 |
"'a" & 'a \\ |
|
460 |
"t == u" & ("==" t u) \\
|
|
461 |
"f(x)" & ("_appl" f x) \\
|
|
462 |
"f(x, y)" & ("_appl" f ("_args" x y)) \\
|
|
463 |
"f(x, y, z)" & ("_appl" f ("_args" x ("_args" y z))) \\
|
|
464 |
"\%x y.\ t" & ("_lambda" ("_idts" x y) t) \\
|
|
|
188
6be0856cdf49
last minute changes, eg literal tokens -> delimiters and valued tokens ->
nipkow
parents:
142
diff
changeset
|
465 |
\end{tabular}}
|
|
6be0856cdf49
last minute changes, eg literal tokens -> delimiters and valued tokens ->
nipkow
parents:
142
diff
changeset
|
466 |
\end{center}
|
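
To make the mapping concrete, here is a small Python model of $ast_of_pt$. It
is an illustrative sketch only (Isabelle's implementation is in \ML), and the
parse tree encoding used here is hypothetical:

```python
# Hypothetical model of ast_of_pt.  Asts are encoded as ("C", name) for
# Constants, ("V", name) for Variables, and Python lists for Appl nodes.
# A parse tree is either ("tok", s), a name token with string s, or
# ("node", c, children), where c is the constant name of the production
# ("" for copy productions); delimiters are assumed already stripped.

def ast_of_pt(pt):
    if pt[0] == "tok":                     # id, var, tfree or tvar token
        return ("V", pt[1])
    c, children = pt[1], pt[2]
    args = [ast_of_pt(p) for p in children]
    if c == "":                            # copy production: pass through
        [arg] = args
        return arg
    if not args:                           # 0-ary production
        return ("C", c)
    return [("C", c)] + args               # n-ary production

# "f(x, y)" via the "_appl"/"_args" productions:
pt = ("node", "_appl", [("tok", "f"),
                        ("node", "_args", [("tok", "x"), ("tok", "y")])])
print(ast_of_pt(pt))
```

Running this on the parse tree of {\tt "f(x, y)"} reproduces the
corresponding row of the table above.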

Some of these examples illustrate why further translations are desirable in
order to provide some nice standard form of asts before macro expansion takes
place. Hence the Pure syntax provides predefined parse ast
translations\index{parse ast translation!of Pure} for ordinary applications,
type applications, nested abstractions, meta implications and function types.
Their net effect on some representative input strings is shown in
Figure~\ref{fig:parse_ast_tr}.

\begin{figure}[htb]
\begin{center}
{\tt\begin{tabular}{ll}
\rm string & \rm ast \\\hline
"f(x, y, z)" & (f x y z) \\
"'a ty" & (ty 'a) \\
"('a, 'b) ty" & (ty 'a 'b) \\
"\%x y z.\ t" & ("_abs" x ("_abs" y ("_abs" z t))) \\
"\%x ::\ 'a.\ t" & ("_abs" ("_constrain" x 'a) t) \\
"[| P; Q; R |] => S" & ("==>" P ("==>" Q ("==>" R S))) \\
"['a, 'b, 'c] => 'd" & ("fun" 'a ("fun" 'b ("fun" 'c 'd)))
\end{tabular}}
\end{center}
\caption{Built-in Parse Ast Translations}
\label{fig:parse_ast_tr}
\end{figure}

This translation process is guided by the names of constant heads of asts.
The list of constants invoking parse ast translations is shown in the output
of {\tt Syntax.print_syntax} under {\tt parse_ast_translation}.

\subsection{Asts to terms *}

After the ast has been normalized by the macro expander (see
\S\ref{sec:macros}), it is transformed into a term with yet another set of
translation functions interspersed: {\bf parse translations}\indexbold{parse
translation}. The result is a non-typed term\index{term!non-typed} which may
contain type constraints, that is, 2-place applications with head {\tt
"_constrain"}. The second argument of a constraint is a type encoded as a
term. Real types will be introduced later during type inference, which is not
discussed here.

If we ignore the built-in parse translations of Pure for the moment, the
mapping $term_of_ast$\index{term_of_ast@$term_of_ast$} from asts to
(non-typed) terms is defined by:
\begin{itemize}
\item $term_of_ast(\Constant x) = \ttfct{Const} (x, \mtt{dummyT})$.

\item $term_of_ast(\Variable \mtt{"?}xi\mtt") = \ttfct{Var} ((x, i),
  \mtt{dummyT})$, where $x$ is the base name and $i$ the index extracted
  from $xi$.

\item $term_of_ast(\Variable x) = \ttfct{Free} (x, \mtt{dummyT})$.

\item $term_of_ast(\Appl{f, x@1, \ldots, x@n}) = term_of_ast(f) ~{\tt\$}~
  term_of_ast(x@1)$ {\tt\$} \dots\ {\tt\$} $term_of_ast(x@n)$.
\end{itemize}
Note that {\tt Const}, {\tt Var} and {\tt Free} belong to the datatype {\tt
term} and \ttindex{dummyT} is bound to some dummy {\tt typ}, which is ignored
during type inference.
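
For illustration, $term_of_ast$ can be sketched in Python over the same ast
encoding as before (tagged tuples for atoms, lists for applications). This is
a hypothetical sketch, and the handling of indexnames is deliberately
simplified:

```python
# Hypothetical sketch of term_of_ast.  Terms are modelled as tagged
# tuples; the string "dummyT" plays the role of the ignored dummy type.

dummyT = "dummyT"

def term_of_ast(ast):
    if isinstance(ast, list):              # Appl: fold into iterated $
        t = term_of_ast(ast[0])
        for x in ast[1:]:
            t = ("$", t, term_of_ast(x))
        return t
    kind, x = ast
    if kind == "C":
        return ("Const", x, dummyT)
    if x.startswith("?"):                  # "?xi": split base name / index
        base = x[1:].rstrip("0123456789")  # simplification of indexnames
        idx = int(x[1 + len(base):] or 0)
        return ("Var", (base, idx), dummyT)
    return ("Free", x, dummyT)

print(term_of_ast([("C", "f"), ("V", "x"), ("V", "?y1")]))
```

Note how an $n$-ary {\tt Appl} becomes a left-nested chain of {\tt \$}
applications, mirroring the last clause above.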

So far the outcome is still a first-order term. {\tt Abs}s and {\tt Bound}s
are introduced by parse translations associated with {\tt "_abs"} and {\tt
"!!"} (and any other user-defined binder).


\subsection{Printing of terms *}

When terms are prepared for printing, they are first transformed into asts
via $ast_of_term$\index{ast_of_term@$ast_of_term$} (ignoring {\bf print
translations}\indexbold{print translation}):
\begin{itemize}
\item $ast_of_term(\ttfct{Const} (x, \tau)) = \Constant x$.

\item $ast_of_term(\ttfct{Free} (x, \tau)) = constrain (\Variable x,
  \tau)$.

\item $ast_of_term(\ttfct{Var} ((x, i), \tau)) = constrain (\Variable
  \mtt{"?}xi\mtt", \tau)$, where $\mtt?xi$ is the string representation of
  the {\tt indexname} $(x, i)$.

\item $ast_of_term(\ttfct{Abs} (x, \tau, t)) = \ttfct{Appl}
  \mathopen{\mtt[} \Constant \mtt{"_abs"}, constrain(\Variable x', \tau),$
  $ast_of_term(t') \mathclose{\mtt]}$, where $x'$ is a version of $x$
  renamed away from all names occurring in $t$, and $t'$ the body $t$ with
  all {\tt Bound}s referring to this {\tt Abs} replaced by $\ttfct{Free}
  (x', \mtt{dummyT})$.

\item $ast_of_term(\ttfct{Bound} i) = \Variable \mtt{"B.}i\mtt"$. Note that
  the occurrence of loose bound variables is abnormal and should never
  happen when printing well-typed terms.

\item $ast_of_term(f \ttrel{\$} x@1 \ttrel{\$} \ldots \ttrel{\$} x@n) =
  \ttfct{Appl} \mathopen{\mtt[} ast_of_term(f), ast_of_term(x@1), \ldots,$
  $ast_of_term(x@n) \mathclose{\mtt]}$, where $f$ is not an application.

\item $constrain(x, \tau) = x$, if $\tau = \mtt{dummyT}$ \index{*dummyT} or
  \ttindex{show_types} is not set to {\tt true}.

\item $constrain(x, \tau) = \Appl{\Constant \mtt{"_constrain"}, x, ty}$,
  where $ty$ is the ast encoding of $\tau$, that is: type constructors as
  {\tt Constant}s, type variables as {\tt Variable}s and type applications
  as {\tt Appl}s with the head type constructor as first element.
  Additionally, if \ttindex{show_sorts} is set to {\tt true}, some type
  variables are decorated with an ast encoding of their sort.
\end{itemize}

\medskip
After an ast has been normalized wrt.\ the print macros, it is transformed
into the final output string. The built-in {\bf print ast
translations}\indexbold{print ast translation} are essentially the inverses
of the parse ast translations of Figure~\ref{fig:parse_ast_tr}.

For the actual printing process, the names attached to grammar productions of
the form $\ldots A@{p@1}^1 \ldots A@{p@n}^n \ldots \mtt{=>} c$ play a vital
role. Whenever an ast with constant head $c$, i.e.\ $\mtt"c\mtt"$ or
$(\mtt"c\mtt"~ x@1 \ldots x@n)$, is encountered, it is printed according to
the production for $c$. This means that first the arguments $x@1$ \dots $x@n$
are printed, then put in parentheses if necessary for reasons of precedence,
and finally joined to a single string, separated by the syntactic sugar of
the production (denoted by ``\dots'' above).

If an application $(\mtt"c\mtt"~ x@1 \ldots x@m)$ has more arguments than the
corresponding production, it is first split into $((\mtt"c\mtt"~ x@1 \ldots
x@n) ~ x@{n+1} \ldots x@m)$. Applications with too few arguments, with a
non-constant head or without a corresponding production are printed as
$f(x@1, \ldots, x@l)$ or $(\alpha@1, \ldots, \alpha@l) ty$. A single
$\Variable x$ is simply printed as $x$.

Note that the system does {\em not\/} insert blanks automatically. If they
are required to separate tokens, they should be part of the mixfix
declaration from which the production has been derived. Mixfix declarations
may also contain pretty printing annotations.


\section{Mixfix declarations} \label{sec:mixfix}

When defining theories, the user usually declares new constants as names
associated with a type followed by an optional \bfindex{mixfix annotation}.
If none of the constants are introduced with mixfix annotations, there is no
concrete syntax to speak of: terms can only be abstractions or applications
of the form $f(t@1, \dots, t@n)$. Since this notation quickly becomes
unreadable, Isabelle supports syntax definitions in the form of unrestricted
context-free \index{context-free grammar} \index{precedence grammar}
precedence grammars using mixfix annotations.

Mixfix annotations describe the {\em concrete\/} syntax, a standard
translation into the abstract syntax and a pretty printing scheme, all in
one. Isabelle syntax definitions are inspired by \OBJ's~\cite{OBJ} {\em
mixfix\/} syntax. Each mixfix annotation defines a precedence grammar
production and optionally associates a constant with it.

There is a general form of mixfix annotation and some shortcuts for common
cases like infix operators.

The general \bfindex{mixfix declaration} as it may occur within the {\tt
consts} section\index{consts section@{\tt consts} section} of a {\tt .thy}
file specifies a constant declaration and a grammar production at the same
time. It has the form {\tt $c$ ::\ "$\tau$" ("$sy$" $ps$ $p$)} and is
interpreted as follows:
\begin{itemize}
\item {\tt $c$ ::\ "$\tau$"} is the actual constant declaration without any
  syntactic significance.

\item $sy$ is the right-hand side of the production, specified as a
  symbolic string. In general, $sy$ is of the form $\alpha@0 \_ \alpha@1
  \dots \alpha@{n-1} \_ \alpha@n$, where each occurrence of \ttindex{_}
  denotes an argument\index{argument!mixfix} position and the strings
  $\alpha@i$ do not contain (non-escaped) {\tt _}. The $\alpha@i$ may
  consist of delimiters\index{delimiter}, spaces\index{space (pretty
  printing)} or \rmindex{pretty printing} annotations (see below).

\item $\tau$ specifies the syntactic categories of the arguments and of the
  result. Arguments may be nonterminals or name tokens. If $sy$ is of the
  form above, $\tau$ must be a nested function type with at least $n$
  argument positions, say $\tau = [\tau@1, \dots, \tau@n] \To \tau'$. The
  syntactic category of argument $i$ is derived from type $\tau@i$ (see
  $root_of_type$ in \S\ref{sec:basic_syntax}). The result, i.e.\ the
  left-hand side of the production, is derived from type $\tau'$. Note that
  the $\tau@i$ and $\tau'$ may be function types.

\item $c$ is the name of the constant associated with the production. If
  $c$ is the empty string {\tt ""} then this is a \rmindex{copy
  production}. Otherwise, parsing an instance of the phrase $sy$ generates
  the ast {\tt ("$c$" $a@1$ $\ldots$ $a@n$)}, where $a@i$ is the ast
  generated by parsing the $i$-th argument.

\item $ps$ is an optional list of at most $n$ integers, say {\tt [$p@1$,
  $\ldots$, $p@m$]}, where $p@i$ is the minimal \rmindex{precedence}
  required of any phrase that may appear as the $i$-th argument. Missing
  precedences default to $0$.

\item $p$ is the \rmindex{precedence} of this production.
\end{itemize}

Precedences\index{precedence!range of} may range between $0$ and
$max_pri$\indexbold{max_pri@$max_pri$} (= 1000). If you want to ignore them,
the safest way to do so is to use the declaration {\tt $c$ ::\ "$\tau$"
("$sy$")}. The resulting production puts no precedence constraints on any of
its arguments and has maximal precedence itself.

\begin{example}
The following theory specification contains a {\tt consts} section that
encodes the precedence grammar of Example~\ref{ex:precedence} as mixfix
declarations:
\begin{ttbox}
EXP = Pure +
types
  exp 0
arities
  exp :: logic
consts
  "0" :: "exp"                ("0" 9)
  "+" :: "[exp, exp] => exp"  ("_ + _" [0, 1] 0)
  "*" :: "[exp, exp] => exp"  ("_ * _" [3, 2] 2)
  "-" :: "exp => exp"         ("- _" [3] 3)
end
\end{ttbox}
Note that the {\tt arities} declaration causes {\tt exp} to be added to the
syntax's roots. If you put the above text into a file {\tt exp.thy} and load
it via {\tt use_thy "EXP"}, you can run some tests:
\begin{ttbox}
val read_exp = Syntax.test_read (syn_of EXP.thy) "exp";
read_exp "0 * 0 * 0 * 0 + 0 + 0 + 0";
{\out tokens: "0" "*" "0" "*" "0" "*" "0" "+" "0" "+" "0" "+" "0"}
{\out raw: ("+" ("+" ("+" ("*" "0" ("*" "0" ("*" "0" "0"))) "0") "0") "0")}
{\out \vdots}
read_exp "0 + - 0 + 0";
{\out tokens: "0" "+" "-" "0" "+" "0"}
{\out raw: ("+" ("+" "0" ("-" "0")) "0")}
{\out \vdots}
\end{ttbox}
The output of \ttindex{Syntax.test_read} includes the token list ({\tt
tokens}) and the raw ast directly derived from the parse tree, ignoring parse
ast translations. The rest is tracing information provided by the macro
expander (see \S\ref{sec:macros}).

Executing {\tt Syntax.print_gram (syn_of EXP.thy)} reveals the actual grammar
productions derived from the above mixfix declarations (lots of additional
information deleted):
\begin{ttbox}
exp = "0" => "0" (9)
exp = exp[0] "+" exp[1] => "+" (0)
exp = exp[3] "*" exp[2] => "*" (2)
exp = "-" exp[3] => "-" (3)
\end{ttbox}
\end{example}
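
The effect of these precedence annotations can be reproduced with a toy
precedence-climbing parser for the same grammar. The following Python sketch
is purely illustrative (Isabelle itself uses a general chart parser); asts
are modelled as nested tuples:

```python
# Toy parser for the EXP grammar above.  Each infix production carries
# (production precedence, minimal left precedence, minimal right
# precedence), exactly as in the printed grammar productions.

INFIX = {"+": (0, 0, 1), "*": (2, 3, 2)}

def parse(toks, p=0):
    """Parse an exp of precedence >= p; return (ast, remaining tokens)."""
    tok, rest = toks[0], toks[1:]
    if tok == "0":                         # exp = "0" (9)
        left, lp = "0", 9
    elif tok == "-" and p <= 3:            # exp = "-" exp[3] (3)
        body, rest = parse(rest, 3)
        left, lp = ("-", body), 3
    else:
        raise SyntaxError(tok)
    while rest and rest[0] in INFIX:
        prod_p, lmin, rmin = INFIX[rest[0]]
        if prod_p < p or lp < lmin:        # production not applicable here
            break
        op, rest = rest[0], rest[1:]
        right, rest = parse(rest, rmin)
        left, lp = (op, left, right), prod_p
    return left, rest

print(parse("0 + - 0 + 0".split())[0])     # ('+', ('+', '0', ('-', '0')), '0')
```

Note how the {\tt [0, 1]} annotation makes {\tt +} associate to the left,
while {\tt [3, 2]} makes {\tt *} associate to the right, matching the raw
asts shown by {\tt Syntax.test_read} above.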

Let us now have a closer look at the structure of the string $sy$ appearing
in mixfix annotations. This string specifies a list of parsing and printing
directives, namely delimiters\index{delimiter},
arguments\index{argument!mixfix}, spaces\index{space (pretty printing)},
blocks of indentation\index{block (pretty printing)} and optional or forced
line breaks\index{break (pretty printing)}. These are encoded via the
following character sequences:

\begin{description}
\item[~\ttindex_~] An argument\index{argument!mixfix} position.

\item[~$d$~] A \rmindex{delimiter}, i.e.\ a non-empty sequence of
  non-special or escaped characters. Escaping a character\index{escape
  character} means preceding it with a {\tt '} (quote). Thus you have to
  write {\tt ''} if you really want a single quote. Delimiters may never
  contain white space, though.

\item[~$s$~] A non-empty sequence of spaces\index{space (pretty printing)}
  for printing.

\item[~{\ttindex($n$}~] Open a block\index{block (pretty printing)}. $n$ is
  an optional sequence of digits that specifies the amount of indentation
  to be added when a line break occurs within the block. If {\tt(} is not
  followed by a digit, the indentation defaults to $0$.

\item[~\ttindex)~] Close a block.

\item[~\ttindex{//}~] Force a line break\index{break (pretty printing)}.

\item[~\ttindex/$s$~] Allow a line break\index{break (pretty printing)}.
  Spaces $s$ right after {\tt /} are only printed if the break is not
  taken.
\end{description}

In terms of parsing, arguments are nonterminals or name tokens and
delimiters are matched literally. The other directives have significance
only for printing. The \rmindex{pretty printing} mechanism of Isabelle is
essentially the one described in \cite{paulson91}.

756 |
\subsection{Infixes}
|
|
757 |
||
758 |
Infix\index{infix} operators associating to the left or right can be declared
|
|
759 |
conveniently using \ttindex{infixl} or \ttindex{infixr}.
|
|
760 |
||
761 |
Roughly speaking, the form {\tt $c$ ::\ "$\tau$" (infixl $p$)} abbreviates:
|
|
762 |
\begin{ttbox}
|
|
763 |
"op \(c\)" ::\ "\(\tau\)" ("op \(c\)")
|
|
764 |
"op \(c\)" ::\ "\(\tau\)" ("(_ \(c\)/ _)" [\(p\), \(p + 1\)] \(p\))
|
|
765 |
\end{ttbox}
|
|
766 |
and {\tt $c$ ::\ "$\tau$" (infixr $p$)} abbreviates:
|
|
767 |
\begin{ttbox}
|
|
768 |
"op \(c\)" ::\ "\(\tau\)" ("op \(c\)")
|
|
769 |
"op \(c\)" ::\ "\(\tau\)" ("(_ \(c\)/ _)" [\(p + 1\), \(p\)] \(p\))
|
|
770 |
\end{ttbox}
|
|
771 |
||
772 |
Thus, prefixing infixes with \ttindex{op} makes them behave like ordinary
|
|
773 |
function symbols. Special characters occurring in $c$ have to be escaped as |
|
| 135 | 774 |
in delimiters. Also note that the expanded forms above would be actually |
775 |
illegal at the user level because of duplicate declarations of constants. |
|
| 104 | 776 |
|
777 |
||
778 |
\subsection{Binders}

A \bfindex{binder} is a variable-binding construct, such as a
\rmindex{quantifier}. The constant declaration \indexbold{*binder}
\begin{ttbox}
\(c\) ::\ "\(\tau\)"   (binder "\(Q\)" \(p\))
\end{ttbox}
introduces a binder $c$ of type $\tau$, which must have the form $(\tau@1 \To
\tau@2) \To \tau@3$. Its concrete syntax is $Q~x.P$. A binder is like a
generalized quantifier where $\tau@1$ is the type of the bound variable $x$,
$\tau@2$ the type of the body $P$, and $\tau@3$ the type of the whole term.
For example $\forall$ can be declared like this:
\begin{ttbox}
All :: "('a => o) => o"   (binder "ALL " 10)
\end{ttbox}
This allows us to write $\forall x.P$ either as {\tt All(\%$x$.$P$)} or {\tt
ALL $x$.$P$}. When printing terms, Isabelle usually uses the latter form, but
has to fall back on $\mtt{All}(P)$ if $P$ is not an abstraction.

Binders $c$ of type $(\sigma \To \tau) \To \tau$ can be nested; then the
internal form $c(\lambda x@1. c(\lambda x@2. \ldots c(\lambda x@n. P)
\ldots))$ corresponds to the external $Q~x@1~x@2 \ldots x@n. P$.

\medskip
The general binder declaration
\begin{ttbox}
\(c\) ::\ "(\(\tau@1\) => \(\tau@2\)) => \(\tau@3\)"   (binder "\(Q\)" \(p\))
\end{ttbox}
is internally expanded to
\begin{ttbox}
\(c\)   ::\ "(\(\tau@1\) => \(\tau@2\)) => \(\tau@3\)"
"\(Q\)" ::\ "[idts, \(\tau@2\)] => \(\tau@3\)"   ("(3\(Q\)_./ _)" \(p\))
\end{ttbox}
with $idts$ being the syntactic category for a list of $id$s, optionally
constrained (see Figure~\ref{fig:pure_gram}). Note that special characters in
$Q$ have to be escaped as in delimiters.

Additionally, a parse translation\index{parse translation!for binder} for $Q$
and a print translation\index{print translation!for binder} for $c$ are
installed. These perform the translation between the internal and external
forms behind the scenes.

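The back and forth between the nested internal form and the flat external
form can be pictured with a small Python sketch over the ast encoding used
earlier in this chapter. The function names are made up, and the real
translations also handle the {\tt _idts} and {\tt _constrain} details
omitted here:

```python
# Hypothetical sketch of the two binder translations.  Asts are ("C", x)
# / ("V", x) atoms and lists for applications, as in the earlier sketches.

def binder_parse(c, xs, body):
    """External  Q x1 ... xn. P  ->  internal  c(%x1. ... c(%xn. P))."""
    for x in reversed(xs):
        body = [("C", c), [("C", "_abs"), ("V", x), body]]
    return body

def binder_print(c, ast):
    """Invert binder_parse, collecting the bound variable names."""
    xs = []
    while (isinstance(ast, list) and ast[0] == ("C", c)
           and isinstance(ast[1], list) and ast[1][0] == ("C", "_abs")):
        xs.append(ast[1][1][1])
        ast = ast[1][2]
    return xs, ast
```

For instance, {\tt ALL x y. P} corresponds to two nested {\tt All}
applications internally, and printing collects the bound names back into one
external binder.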
\section{Syntactic translations (macros)} \label{sec:macros}

So far we have pretended that there is a close enough relationship between
concrete and abstract syntax to allow an automatic translation from one to
the other using the constant name supplied with each non-copy production. In
many cases this scheme is not powerful enough. Some typical examples involve
variable binding constructs (e.g.\ {\tt ALL x:A.P} vs.\ {\tt Ball(A, \%x.P)})
or convenient notations for enumerations like finite sets, lists etc.\ (e.g.\
{\tt [x, y, z]} vs.\ {\tt Cons(x, Cons(y, Cons(z, Nil)))}).

Isabelle offers such translation facilities at two different levels, namely
{\bf macros}\indexbold{macro} and {\bf translation functions}.

Macros are specified by first-order rewriting systems that operate on asts.
They are usually easy to read and in most cases not very difficult to write.
Unfortunately, some more obscure translations cannot be expressed as macros
and you have to fall back on the more powerful mechanism of translation
functions written in \ML. These are quite unreadable and hard to write (see
\S\ref{sec:tr_funs}).

\medskip
Let us now get started with the macro system by a simple example:

\begin{example}~ \label{ex:set_trans}

\begin{ttbox}
SET = Pure +
types
  i, o 0
arities
  i, o :: logic
consts
  Trueprop     :: "o => prop"              ("_" 5)
  Collect      :: "[i, i => o] => i"
  "{\at}Collect"   :: "[idt, i, o] => i"       ("(1{\ttlbrace}_:_./ _{\ttrbrace})")
  Replace      :: "[i, [i, i] => o] => i"
  "{\at}Replace"   :: "[idt, idt, i, o] => i"  ("(1{\ttlbrace}_./ _:_, _{\ttrbrace})")
  Ball         :: "[i, i => o] => o"
  "{\at}Ball"      :: "[idt, i, o] => o"       ("(3ALL _:_./ _)" 10)
translations
  "{\ttlbrace}x:A. P{\ttrbrace}"    == "Collect(A, %x. P)"
  "{\ttlbrace}y. x:A, Q{\ttrbrace}" == "Replace(A, %x y. Q)"
  "ALL x:A. P"    == "Ball(A, %x. P)"
end
\end{ttbox}

This and the following theories are complete working examples, though they
are fragmentary as they contain merely syntax. The full theory definition can
be found in {\tt ZF/zf.thy}.

{\tt SET} defines constants for set comprehension ({\tt Collect}),
replacement ({\tt Replace}) and bounded universal quantification ({\tt
Ball}). Without additional syntax you would have to express $\forall x \in A.
P(x)$ as {\tt Ball(A, P)}. Since this is quite awkward, we define additional
constants with appropriate concrete syntax. These constants are decorated
with {\tt\at} to stress their purely syntactic purpose; they should never
occur within the final well-typed terms. Another consequence is that the user
cannot refer to such names directly, since they are not legal identifiers.

The translations cause the replacement of external forms by internal forms
after parsing, and vice versa before the printing of terms.
\end{example}

This is only a very simple but common instance of a more powerful mechanism.
As a specification of what is to be translated, it should be comprehensible
without further explanations. But there are also some snags and other
peculiarities that are typical of macro systems in general. The purpose of
this section is to explain how Isabelle's macro system really works.


\subsection{Specifying macros}

Basically macros are rewrite rules on asts. But unlike the macro systems of
various programming languages, Isabelle's macros work in both directions.
Therefore a syntax contains two lists of rules: one for parsing and one for
printing.

The {\tt translations} section\index{translations section@{\tt translations}
section} consists of a list of rule specifications of the form:
\begin{center}
{\tt $[$ ($root$) $]$ $string$ $[$ => $|$ <= $|$ == $]$ $[$ ($root$) $]$
  $string$}.
\end{center}
This specifies a \rmindex{parse rule} ({\tt =>}), a \rmindex{print rule}
({\tt <=}) or both ({\tt ==}). The two $string$s preceded by optional
parenthesized $root$s are the left-hand and right-hand side of the rule in
user-defined syntax.

Rules are internalized wrt.\ an intermediate signature that is obtained from
the parent theories' ones by adding all material of all sections preceding
{\tt translations} in the {\tt .thy} file. In particular, new syntax defined
in {\tt consts} is already effective.

Then part of the process that transforms input strings into terms is applied:
lexing, parsing and parse ast translations (see \S\ref{sec:asts}). Macros
specified in the parents are {\em not\/} expanded. Also note that the lexer
runs in a different mode that additionally accepts identifiers of the form
$\_~letter~quasiletter^*$ (like {\tt _idt}, {\tt _K}). The syntactic category
to parse is specified by $root$, which defaults to {\tt logic}.

Finally, Isabelle tries to guess which atoms of the resulting ast of the rule
should be treated as constants during matching (see below). These names are
extracted from all class, type and constant declarations made so far.

\medskip
The result is two lists of translation rules in internal form, that is, pairs
of asts. They can be viewed using {\tt Syntax.print_syntax} (sections
\ttindex{parse_rules} and \ttindex{print_rules}). For {\tt SET} of
Example~\ref{ex:set_trans} these are:
\begin{ttbox}
parse_rules:
("{\at}Collect" x A P) -> ("Collect" A ("_abs" x P))
("{\at}Replace" y x A Q) -> ("Replace" A ("_abs" x ("_abs" y Q)))
("{\at}Ball" x A P) -> ("Ball" A ("_abs" x P))
print_rules:
("Collect" A ("_abs" x P)) -> ("{\at}Collect" x A P)
("Replace" A ("_abs" x ("_abs" y Q))) -> ("{\at}Replace" y x A Q)
("Ball" A ("_abs" x P)) -> ("{\at}Ball" x A P)
\end{ttbox}

Note that in this notation all rules are oriented left to right. In the {\tt
translations} section, which has been split into two parts, the print rules
appeared right to left.

\begin{warn}
Be careful not to choose names for variables in rules that are actually
treated as constants. If in doubt, check the rules in their internal form or
the section labeled {\tt consts} in the output of {\tt Syntax.print_syntax}.
\end{warn}


\subsection{Applying rules}

In the course of parsing and printing terms, asts are generated as an
intermediate form as pictured in Figure~\ref{fig:parse_print}. These asts are
normalized wrt.\ the given lists of translation rules in a uniform manner. As
stated earlier, asts are supposed to be first-order `terms'. The rewriting
systems derived from {\tt translations} sections essentially resemble
traditional first-order term rewriting systems. We first examine how a single
rule is applied.

Let $t$ be the ast to be normalized and $(l, r)$ some translation rule. A
subast $u$ of $t$ is called a {\bf redex}\indexbold{redex (ast)} (reducible
expression) if it is an instance of $l$. In this case $l$ is said to {\bf
match}\indexbold{match (ast)} $u$. A redex matched by $l$ may be replaced by
the corresponding instance of $r$, thus {\bf rewriting}\index{rewrite (ast)}
the ast $t$.

Matching requires some notion of {\bf place-holders}\indexbold{place-holder
(ast)} that may occur in rule patterns but not in ordinary asts, which are
considered ground. Here we simply use {\tt Variable}s for this purpose.

More formally, the matching of $u$ by $l$ is performed as follows (the rule
pattern is the second argument): \index{match (ast)@$match$ (ast)}
\begin{itemize}
\item $match(\Constant x, \Constant x) = \mbox{OK}$.

\item $match(\Variable x, \Constant x) = \mbox{OK}$.

\item $match(u, \Variable x) = \mbox{OK, bind}~x~\mbox{to}~u$.

\item $match(\Appl{u@1, \ldots, u@n}, \Appl{l@1, \ldots, l@n}) = match(u@1,
  l@1), \ldots, match(u@n, l@n)$.

\item $match(u, l) = \mbox{FAIL}$ in any other case.
\end{itemize}

991 |
This means that a {\tt Constant} pattern matches any atomic asts of the same
|
|
992 |
name, while a {\tt Variable} matches any ast. If successful, $match$ yields a
|
|
| 108 | 993 |
substitution $\sigma$ that is applied to $r$, generating the appropriate |
| 104 | 994 |
instance that replaces $u$. |
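The clauses above can be transcribed almost literally into executable form.
The following Python sketch is an illustration only (not Isabelle's \ML\
implementation); it assumes a hypothetical encoding of asts as tuples:
{\tt ("C", name)} for {\tt Constant}, {\tt ("V", name)} for {\tt Variable}
and {\tt ("A", [\ldots])} for {\tt Appl}.

```python
# A sketch of the five match clauses and of single-rule rewriting, under
# the tuple encoding ("C", name) / ("V", name) / ("A", [ast, ...]).
# Illustration only; this is not Isabelle's ML implementation.

def match(u, l, sigma=None):
    """Match ast u against rule pattern l; return a substitution or None."""
    if sigma is None:
        sigma = {}
    if l[0] == "V":                   # match(u, Variable x): bind x to u
        sigma[l[1]] = u
        return sigma
    if l[0] == "C":                   # Constant pattern: any atomic ast
        if u[0] in ("C", "V") and u[1] == l[1]:   # of the same name matches
            return sigma
        return None                   # FAIL
    if u[0] == "A" and len(u[1]) == len(l[1]):    # Appl: match componentwise
        for ui, li in zip(u[1], l[1]):
            if match(ui, li, sigma) is None:
                return None           # FAIL
        return sigma
    return None                       # FAIL in any other case

def rewrite(u, l, r):
    """Replace redex u by the corresponding instance of r, if l matches."""
    sigma = match(u, l)
    if sigma is None:
        return None
    def inst(t):                      # apply the substitution sigma to r
        if t[0] == "V":
            return sigma[t[1]]
        if t[0] == "A":
            return ("A", [inst(s) for s in t[1]])
        return t
    return inst(r)
```

For instance, a print rule of the shape {\tt ("insert" x "empty") ->
("{\at}Finset" x)}, as used in the finite-set example later in this chapter,
rewrites the ast of {\tt insert(a, empty)} to a one-element enumeration.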

\medskip
In order to make things simple and fast, ast rewrite rules $(l, r)$ are
restricted by the following conditions:
\begin{itemize}
\item Rules have to be left-linear, i.e.\ $l$ must not contain any {\tt
  Variable} more than once.

\item Rules must have constant heads, i.e.\ $l = \mtt"c\mtt"$ or $l =
  (\mtt"c\mtt" ~ x@1 \ldots x@n)$.

\item The set of variables contained in $r$ has to be a subset of those of
  $l$.
\end{itemize}

\medskip
Having first-order matching in mind, the second case of $match$ may look a
bit odd. But this is exactly the place where {\tt Variable}s of non-rule
asts behave like {\tt Constant}s. The deeper meaning of this is related to
asts being very `primitive' in some sense, ignorant of the underlying
`semantics', not far removed from parse trees. At this level it is not yet
known which $id$s will become constants, bounds, frees, types or classes. As
$ast_of_pt$ (see \S\ref{sec:asts}) shows, former parse tree heads appear in
asts as {\tt Constant}s, while $id$s, $var$s, $tfree$s and $tvar$s become
{\tt Variable}s.

This is at variance with asts generated from terms before printing (see
$ast_of_term$ in \S\ref{sec:asts}), where all constants and type constructors
become {\tt Constant}s.

\begin{warn}
This means asts may contain quite a messy mixture of {\tt Variable}s and {\tt
Constant}s, which is insignificant at macro level because $match$ treats them
alike.
\end{warn}

Because of this behaviour, different kinds of atoms with the same name are
indistinguishable, which may make some rules prone to misbehaviour. Consider
the following fragmentary example:
\begin{ttbox}
types
  Nil 0
consts
  Nil :: "'a list"
  "[]" :: "'a list" ("[]")
translations
  "[]" == "Nil"
\end{ttbox}
Then the term {\tt Nil} will be printed as {\tt []}, just as expected. What
happens with \verb|%Nil.t| or {\tt x::Nil} is left as an exercise.


\subsection{Rewriting strategy}

When normalizing an ast by repeatedly applying translation rules until no
more rule is applicable, there are in each step two choices: which rule to
apply next, and which redex to reduce.

We could assume that the user always supplies terminating and confluent
rewriting systems, but this would often complicate things unnecessarily.
Therefore, we reveal part of the actual rewriting strategy: the normalizer
always applies the first matching rule, reducing an unspecified redex.

Here `first rule' roughly means first with respect to the order in which the
rules appear in the {\tt translations} sections. But this is trickier than it
seems: if a given theory is {\em extended}, new rules are simply appended to
the end. But if theories are {\em merged}, it is not clear which list of
rules has priority over the other. In fact the merge order is left
unspecified. This shouldn't cause any problems in practice, since
translations of different theories usually do not overlap. {\tt
Syntax.print_syntax} shows the rules in their internal order.

\medskip
You can watch the normalization of asts during parsing and printing by
setting \ttindex{Syntax.trace_norm_ast} to {\tt true}. An alternative is the
use of \ttindex{Syntax.test_read}, which is always in trace mode. The
information displayed when tracing\index{tracing (ast)} includes: the ast
before normalization ({\tt pre}), redexes with results ({\tt rewrote}), the
normal form finally reached ({\tt post}) and some statistics ({\tt
normalize}). If tracing is off, \ttindex{Syntax.stat_norm_ast} can be set to
{\tt true} in order to enable printing of the normal form and statistics
only.
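The strategy just described can be made concrete in a small executable
sketch. The following Python code is an illustration under the hypothetical
tuple encoding {\tt ("C", name)}, {\tt ("V", name)}, {\tt ("A", [\ldots])}
for {\tt Constant}, {\tt Variable} and {\tt Appl}; it is not Isabelle's
actual \ML\ code, and it simply assumes a terminating, well-formed rule set
as required above.

```python
# Normalization by repeated rewriting: at each step the first rule (in the
# given order) that matches some redex is applied.  Asts are tuples:
# ("C", name), ("V", name), or ("A", [ast, ...]).  A sketch only.

def match(u, l, sigma):
    """Extend substitution sigma so that pattern l matches ast u."""
    if l[0] == "V":
        sigma[l[1]] = u
        return True
    if l[0] == "C":
        return u[0] in ("C", "V") and u[1] == l[1]
    return (u[0] == "A" and len(u[1]) == len(l[1])
            and all(match(x, p, sigma) for x, p in zip(u[1], l[1])))

def subst(t, sigma):
    """Instantiate the right-hand side t under sigma."""
    if t[0] == "V":
        return sigma[t[1]]
    if t[0] == "A":
        return ("A", [subst(s, sigma) for s in t[1]])
    return t

def step(t, rules):
    """One rewrite step: first matching rule, at the first redex found."""
    for l, r in rules:                    # rule order decides ties
        sigma = {}
        if match(t, l, sigma):
            return subst(r, sigma)
    if t[0] == "A":                       # otherwise look for an inner redex
        for i, s in enumerate(t[1]):
            s2 = step(s, rules)
            if s2 is not None:
                return ("A", t[1][:i] + [s2] + t[1][i + 1:])
    return None                           # no redex: t is in normal form

def normalize(t, rules):
    """Rewrite until no rule matches any redex."""
    while True:
        t2 = step(t, rules)
        if t2 is None:
            return t
        t = t2
```

When two rules both match a redex, the one listed first wins, mirroring the
`first rule' behaviour described above.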


\subsection{More examples}

Let us first reconsider Example~\ref{ex:set_trans}, which is concerned with
variable binding constructs.

There is a minor disadvantage compared with an implementation via translation
functions (as done for binders):

\begin{warn}
If \ttindex{eta_contract} is set to {\tt true}, terms will be
$\eta$-contracted {\em before\/} the ast rewriter sees them. Thus some
abstraction nodes needed for print rules to match may get lost. E.g.\
\verb|Ball(A, %x. P(x))| is contracted to {\tt Ball(A, P)}, the print rule is
no longer applicable and the output will be {\tt Ball(A, P)}. Note that
$\eta$-expansion via macros is {\em not\/} possible.
\end{warn}

\medskip
Meta constraints are another common trap. If \ttindex{show_types} is set to
{\tt true}, bound variables will be decorated by their meta types at the
binding place (but not at occurrences in the body). E.g.\ matching with
\verb|Collect(A, %x. P)| binds {\tt x} to something like {\tt ("_constrain" y
"i")} rather than only {\tt y}. Ast rewriting will cause the constraint to
appear in the external form, say \verb|{y::i:A::i. P::o}|. Therefore your
syntax should be ready for such constraints to be re-read. This is the case
in our example, because of the category {\tt idt} of the first argument.

\begin{warn}
Choosing {\tt id} instead of {\tt idt} is a very common error, especially
since it appeared in former versions of most of Isabelle's object-logics.
\end{warn}

\begin{example} \label{ex:finset_trans}
This example demonstrates the use of recursive macros to implement a
convenient notation for finite sets.
\begin{ttbox}
FINSET = SET +
types
  is 0
consts
  "" :: "i => is" ("_")
  "{\at}Enum" :: "[i, is] => is" ("_,/ _")
  empty :: "i" ("{\ttlbrace}{\ttrbrace}")
  insert :: "[i, i] => i"
  "{\at}Finset" :: "is => i" ("{\ttlbrace}(_){\ttrbrace}")
translations
  "{\ttlbrace}x, xs{\ttrbrace}" == "insert(x, {\ttlbrace}xs{\ttrbrace})"
  "{\ttlbrace}x{\ttrbrace}" == "insert(x, {\ttlbrace}{\ttrbrace})"
end
\end{ttbox}

Finite sets are internally built up by {\tt empty} and {\tt insert}.
Externally we would like to see \verb|{x, y, z}| rather than {\tt insert(x,
insert(y, insert(z, empty)))}.

First we define the generic syntactic category {\tt is} for one or more
objects of type {\tt i} separated by commas (including breaks for pretty
printing). The category has to be declared as a 0-place type constructor, but
without an {\tt arities} declaration. Hence {\tt is} is not a logical type, no
default productions will be added, and we can cook our own syntax for {\tt
is} (first two lines of the {\tt consts} section). If we had needed generic
enumerations of type $\alpha$ (i.e.\ {\tt logic}), we could have used the
predefined category \ttindex{args} and skipped this part altogether.

Next follows {\tt empty}, which is already equipped with its syntax
\verb|{}|, and {\tt insert} without concrete syntax. The syntactic constant
{\tt\at Finset} provides concrete syntax for enumerations of {\tt i} enclosed
in curly braces. Remember that a pair of parentheses specifies a block of
indentation for pretty printing. The category {\tt is} can later be reused
for other enumerations like lists or tuples.

The translations may look a bit odd at first sight, but rules can only be
fully understood in their internal forms, which are:
\begin{ttbox}
parse_rules:
  ("{\at}Finset" ("{\at}Enum" x xs)) -> ("insert" x ("{\at}Finset" xs))
  ("{\at}Finset" x) -> ("insert" x "empty")
print_rules:
  ("insert" x ("{\at}Finset" xs)) -> ("{\at}Finset" ("{\at}Enum" x xs))
  ("insert" x "empty") -> ("{\at}Finset" x)
\end{ttbox}
This shows that \verb|{x, xs}| indeed matches any set enumeration of at least
two elements, binding the first to {\tt x} and the rest to {\tt xs}.
Likewise, \verb|{xs}| and \verb|{x}| represent any set enumeration. Note that
the parse rules only work in this order.
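For instance, parsing \verb|{x, y, z}| first yields the ast in the top line
below; normalization by the two parse rules then proceeds roughly as follows
(one rewrite per line; an illustration, not actual trace output):
\begin{ttbox}
("{\at}Finset" ("{\at}Enum" x ("{\at}Enum" y z)))
("insert" x ("{\at}Finset" ("{\at}Enum" y z)))
("insert" x ("insert" y ("{\at}Finset" z)))
("insert" x ("insert" y ("insert" z "empty")))
\end{ttbox}
The first rule fires twice and the second once, producing the internal
{\tt insert} representation.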

\medskip
Some rules are prone to misbehaviour, as \verb|%empty insert. insert(x, empty)|
shows: it is printed as \verb|%empty insert. {x}|. This problem arises because
the ast rewriter cannot discern constants, frees, bounds etc.\ and looks only
for names of atoms.

Thus the names of {\tt Constant}s occurring in the (internal) left-hand side
of translation rules should be regarded as `reserved keywords'. It is good
practice to choose non-identifiers here like {\tt\at Finset} or sufficiently
long and strange names.
\end{example}

\begin{example} \label{ex:prod_trans}
One of the well-formedness conditions for ast rewrite rules stated earlier
implies that you can never introduce new {\tt Variable}s on the right-hand
side. Something like \verb|"K(B)" => "%x. B"| is illegal and could cause
variable capture if it were allowed. In such cases you usually have to
fall back on translation functions. But there is a trick that makes things
quite readable in some cases: {\em calling parse translations by parse
rules}. This is demonstrated here.
\begin{ttbox}
PROD = FINSET +
consts
  Pi :: "[i, i => i] => i"
  "{\at}PROD" :: "[idt, i, i] => i" ("(3PROD _:_./ _)" 10)
  "{\at}->" :: "[i, i] => i" ("(_ ->/ _)" [51, 50] 50)
translations
  "PROD x:A. B" => "Pi(A, %x. B)"
  "A -> B" => "Pi(A, _K(B))"
end
ML
val print_translation = [("Pi", dependent_tr' ("{\at}PROD", "{\at}->"))];
\end{ttbox}

{\tt Pi} is an internal constant for constructing dependent products. Two
external forms exist: {\tt PROD x:A.B}, the general case, and {\tt A -> B},
an abbreviation for \verb|Pi(A, %x.B)| with {\tt B} not actually depending on
{\tt x}.

Now the second parse rule is where the trick comes in: {\tt _K(B)} is
introduced during ast rewriting, which later becomes \verb|%x.B| due to a
parse translation associated with \ttindex{_K}. Note that a leading {\tt _}
in $id$s is allowed in translation rules, but not in ordinary terms. This
special behaviour of the lexer is very useful for `forging' asts containing
names that are normally not directly accessible.
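To see the trick in action, consider how {\tt A -> B} is processed during
parsing; the stages are roughly as follows (an illustration, not actual
trace output):
\begin{ttbox}
("{\at}->" A B)
("Pi" A ("_K" B))
Pi(A, %x. B)
\end{ttbox}
The first line is the ast produced by the parser, the second the result of
the second parse rule, and the third the term delivered by the parse
translation for {\tt _K}.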

Unfortunately, there is no such trick for printing, so we have to add an {\tt
ML} section for the print translation \ttindex{dependent_tr'}.

The parse translation for {\tt _K} is already installed in Pure, and {\tt
dependent_tr'} is exported by the syntax module for public use. See
\S\ref{sec:tr_funs} for more of the arcane lore of translation functions.
\end{example}



\section{Translation functions *} \label{sec:tr_funs}

This section describes the remaining translation mechanism, which enables the
designer of theories to do almost everything with terms or asts during the
parsing or printing process, by writing \ML\ functions. The logic \LK\ is a
good example of a quite sophisticated use of this to transform between
internal and external representations of associative sequences. The
high-level macro system described in \S\ref{sec:macros} fails here completely.

\begin{warn}
A full understanding of the matters presented here requires some familiarity
with Isabelle's internals, especially the datatypes {\tt term}, {\tt typ},
{\tt Syntax.ast} and the encodings of types and terms as such at the various
stages of the parsing or printing process. You probably do not really want to
use translation functions at all!
\end{warn}

As already mentioned in \S\ref{sec:asts}, there are four kinds of translation
functions. All such functions are associated with a name, which specifies the
ast or term head that invokes the function. Such names can be (logical or
syntactic) constants or type constructors.

{\tt Syntax.print_syntax} displays the sets of names associated with the
translation functions of a {\tt Syntax.syntax} under
\ttindex{parse_ast_translation}, \ttindex{parse_translation},
\ttindex{print_translation} and \ttindex{print_ast_translation}. The user can
add new ones via the {\tt ML} section\index{ML section@{\tt ML} section} of a
{\tt .thy} file. But there may never be more than one function of the same
kind per name.

\begin{warn}
Conceptually, the {\tt ML} section should appear between {\tt consts} and
{\tt translations}, i.e.\ newly installed translation functions are already
effective when macros and logical rules are parsed. {\tt ML} has to be the
last section because the {\tt .thy} file parser is unable to detect the end
of \ML\ code other than by end-of-file.
\end{warn}

All text of the {\tt ML} section is simply copied verbatim into the \ML\ file
generated from a {\tt .thy} file. Definitions made here by the user become
components of an \ML\ structure of the same name as the theory to be created.
Therefore local things should be declared within {\tt local}. The following
special \ML\ values, which are all optional, serve as the interface for the
installation of user-defined translation functions.

\begin{ttbox}
val parse_ast_translation: (string * (ast list -> ast)) list
val parse_translation: (string * (term list -> term)) list
val print_translation: (string * (term list -> term)) list
val print_ast_translation: (string * (ast list -> ast)) list
\end{ttbox}

The basic idea behind all four kinds of functions is relatively simple (see
also Figure~\ref{fig:parse_print}): whenever a combination of the form
$(\mtt"c\mtt"~x@1 \ldots x@n)$ is encountered during the transformations
between parse trees, asts and terms, and a translation function $f$
of appropriate kind exists for $c$, the result will be $f \mtt[ x@1, \ldots,
x@n \mtt]$. Here $x@1, \ldots, x@n$ (with $n \ge 0$) are asts for ast
translations and terms for term translations. A `combination' at ast level is
of the form $\Constant c$ or $\Appl{\Constant c, x@1, \ldots, x@n}$, and at
term level $\ttfct{Const} (c, \tau)$ or $\ttfct{Const} (c, \tau) \ttrel{\$}
x@1 \ttrel{\$} \dots \ttrel{\$} x@n$.

\medskip
Translation functions at ast level differ from those at term level only in
the same way as asts and terms differ. Terms, being more complex and more
specific, allow more sophisticated transformations (typically involving
abstractions and bound variables).

On the other hand, {\em parse\/} (ast) translations differ from {\em print\/}
(ast) translations more fundamentally:
\begin{description}
\item[Parse (ast) translations] are applied bottom-up, i.e.\ the arguments
  supplied ($x@1, \ldots, x@n$ above) are already in translated form.
  Additionally, they may not fail; exceptions are re-raised after printing
  an error message.

\item[Print (ast) translations] are applied top-down, i.e.\ supplied with
  arguments that are partly still in internal form. The result is again fed
  into the translation machinery as a whole. Therefore a print (ast)
  translation should not introduce as head a constant of the same name that
  invoked it in the first place. Alternatively, exception \ttindex{Match}
  may be raised, indicating failure of translation.
\end{description}

Another difference between the parsing and the printing process is which
atoms are {\tt Constant}s or {\tt Const}s, i.e.\ able to invoke translation
functions.

For parse ast translations, only former parse tree heads are {\tt Constant}s
(see also $ast_of_pt$ in \S\ref{sec:asts}). These and additionally introduced
{\tt Constant}s (e.g.\ by macros) become {\tt Const}s for parse translations
(see also $term_of_ast$ in \S\ref{sec:asts}).

The situation is slightly different when terms are prepared for printing,
since the role of atoms is known. Initially, all logical constants and type
constructors may invoke print translations. New constants may be introduced
by these or by macros, able to invoke print ast translations.


|

Presenting a simple and useful example of translation functions is not that
easy, since the macro system is sufficient for most simple applications. By
convention, translation functions always have names ending with {\tt
_ast_tr}, {\tt _tr}, {\tt _tr'} or {\tt _ast_tr'}. You may look for such
names in the sources of Pure Isabelle for more examples.

\begin{example} \label{ex:tr_funs}

We continue Example~\ref{ex:prod_trans} by presenting the \ML\ sources of the
parse translation for \ttindex{_K} and the print translation
\ttindex{dependent_tr'}:
\begin{ttbox}
(* nondependent abstraction *)
fun k_tr (*"_K"*) [t] = Abs ("x", dummyT, incr_boundvars 1 t)
  | k_tr (*"_K"*) ts = raise TERM("k_tr",ts);

(* dependent / nondependent quantifiers *)
fun dependent_tr' (q, r) (A :: Abs (x, T, B) :: ts) =
      if 0 mem (loose_bnos B) then
        let val (x', B') = variant_abs (x, dummyT, B);
        in list_comb (Const (q, dummyT) $ Free (x', T) $ A $ B', ts)
        end
      else list_comb (Const (r, dummyT) $ A $ B, ts)
  | dependent_tr' _ _ = raise Match;
\end{ttbox}
This text is taken from the Pure sources. Ordinary user translations
would typically appear within the {\tt ML} section of the {\tt .thy} file.
Linking {\tt k_tr} into the parser requires the definition
\begin{ttbox}
val parse_translation = [("_K", k_tr)];
\end{ttbox}
For {\tt k_tr} this is not necessary, as it is built-in, but user-defined
translation functions are added in exactly the same way. \medskip

If {\tt k_tr} is called with exactly one argument $t$, it creates a new {\tt
Abs} node with a body derived from $t$: loose {\tt Bound}s, i.e.\ those
referring to outer {\tt Abs}s, are incremented using
\ttindex{incr_boundvars}. This and many other basic term manipulation
functions are defined in {\tt Pure/term.ML}, the comments there being in most
cases the only documentation.

Since terms fed into parse translations are not yet typed, the type of the
bound variable in the new {\tt Abs} is simply {\tt dummyT}.

\medskip
The argument $(q, r)$ for {\tt dependent_tr'} is supplied already during the
installation within an {\tt ML} section. This yields a print translation that
transforms something like $c(A, \mtt{Abs}(x, T, B), t@1, \ldots, t@n)$ into
$q(x', A, B', t@1, \ldots, t@n)$ or $r(A, B, t@1, \ldots, t@n)$. The latter
form is chosen if $B$ does not actually depend on $x$. This is checked using
\ttindex{loose_bnos}, yet another function of {\tt Pure/term.ML}. Note that
$x'$ is a version of $x$ renamed away from all names in $B$, and $B'$ the
body $B$ with {\tt Bound}s referring to our {\tt Abs} node replaced by
$\ttfct{Free} (x', \mtt{dummyT})$.

We have to be more careful with types here. While types of {\tt Const}s are
completely ignored, type constraints may be printed for some {\tt Free}s and
{\tt Var}s (if \ttindex{show_types} is set to {\tt true}). Variables of type
\ttindex{dummyT} are never printed with a constraint, though. Thus, a
constraint of $x'$ may only appear at its binding place, since the {\tt Free}s
of $B'$ replacing the appropriate {\tt Bound}s of $B$ via \ttindex{variant_abs}
all have type {\tt dummyT}.
\end{example}




\section{Example: some minimal logics} \label{sec:min_logics}

This concluding section presents some examples that are very simple from a
syntactic point of view. Rather, they demonstrate how to define new
object-logics from scratch. In particular we need to say how an object-logic
syntax is hooked up to the meta-logic. Since all theorems must conform to the
syntax for $prop$ (see Figure~\ref{fig:pure_gram}), that syntax has to be
extended with the object-level syntax. Assume that the syntax of your
object-logic defines a category $o$ of formulae. These formulae can now
appear in axioms and theorems wherever $prop$ does if you add the production
\[ prop ~=~ o. \]
More precisely, you need a coercion from formulae to propositions:
\begin{ttbox}
Base = Pure +
types
  o 0
arities
  o :: logic
consts
  Trueprop :: "o => prop" ("_" 5)
end
\end{ttbox}
The constant {\tt Trueprop} (the name is arbitrary) acts as an invisible
coercion function. Assuming this definition resides in a file {\tt base.thy},
you have to load it with the command {\tt use_thy "Base"}.

One of the simplest nontrivial logics is {\em minimal logic\/} of
implication. Its definition in Isabelle needs no advanced features but
illustrates the overall mechanism quite nicely:
\begin{ttbox}
Hilbert = Base +
consts
  "-->" :: "[o, o] => o" (infixr 10)
rules
  K "P --> Q --> P"
  S "(P --> Q --> R) --> (P --> Q) --> P --> R"
  MP "[| P --> Q; P |] ==> Q"
end
\end{ttbox}
After loading this definition you can start to prove theorems in this logic:
\begin{ttbox}
goal Hilbert.thy "P --> P";
{\out Level 0}
{\out P --> P}
{\out 1. P --> P}
by (resolve_tac [Hilbert.MP] 1);
{\out Level 1}
{\out P --> P}
{\out 1. ?P --> P --> P}
{\out 2. ?P}
by (resolve_tac [Hilbert.MP] 1);
{\out Level 2}
{\out P --> P}
{\out 1. ?P1 --> ?P --> P --> P}
{\out 2. ?P1}
{\out 3. ?P}
by (resolve_tac [Hilbert.S] 1);
{\out Level 3}
{\out P --> P}
{\out 1. P --> ?Q2 --> P}
{\out 2. P --> ?Q2}
by (resolve_tac [Hilbert.K] 1);
{\out Level 4}
{\out P --> P}
{\out 1. P --> ?Q2}
by (resolve_tac [Hilbert.K] 1);
{\out Level 5}
{\out P --> P}
{\out No subgoals!}
\end{ttbox}
As you can see, this Hilbert-style formulation of minimal logic is easy to
define but difficult to use. The following natural deduction formulation is
far preferable:
\begin{ttbox}
MinI = Base +
consts
  "-->" :: "[o, o] => o" (infixr 10)
rules
  impI "(P ==> Q) ==> P --> Q"
  impE "[| P --> Q; P |] ==> Q"
end
\end{ttbox}
Note, however, that although the two systems are equivalent, this fact cannot
be proved within Isabelle: {\tt S} and {\tt K} can be derived in {\tt MinI}
(exercise!), but {\tt impI} cannot be derived in {\tt Hilbert}. The reason is
that {\tt impI} is only an {\em admissible\/} rule in {\tt Hilbert},
something that can only be shown by induction over all possible proofs in
{\tt Hilbert}.
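For instance, {\tt K} has a short natural deduction derivation in {\tt MinI};
one possible tactic script, in the style of the session above (a sketch, not
the transcript of an actual run), is:
\begin{ttbox}
goal MinI.thy "P --> Q --> P";
by (resolve_tac [MinI.impI] 1);
by (resolve_tac [MinI.impI] 1);
by (assume_tac 1);
\end{ttbox}
Each {\tt impI} step moves a premise into the meta-level assumptions, after
which the goal is closed by assumption; deriving {\tt S} works similarly.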

It is a very simple matter to extend minimal logic with falsity:
\begin{ttbox}
MinIF = MinI +
consts
  False :: "o"
rules
  FalseE "False ==> P"
end
\end{ttbox}
On the other hand, we may wish to introduce conjunction only:
\begin{ttbox}
MinC = Base +
consts
  "&" :: "[o, o] => o" (infixr 30)
rules
  conjI "[| P; Q |] ==> P & Q"
  conjE1 "P & Q ==> P"
  conjE2 "P & Q ==> Q"
end
\end{ttbox}
And if we want to have all three connectives together, we define:
\begin{ttbox}
MinIFC = MinIF + MinC
\end{ttbox}
Now we can prove mixed theorems like
\begin{ttbox}
goal MinIFC.thy "P & False --> Q";
by (resolve_tac [MinI.impI] 1);
by (dresolve_tac [MinC.conjE2] 1);
by (eresolve_tac [MinIF.FalseE] 1);
\end{ttbox}
Try this as an exercise!