author  wenzelm 
Fri, 10 Nov 2000 19:02:37 +0100  
changeset 10432  3dfbc913d184 
parent 8136  8c65f3ca13f2 
child 11621  a19bc891e4bf 
permissions  rwrr 
320  1 
%% $Id$ 
2 
\chapter{Defining Logics} \label{DefiningLogics} 

3 
This chapter explains how to define new formal systems  in particular, 

4 
their concrete syntax. While Isabelle can be regarded as a theorem prover 

5 
for set theory, higherorder logic or the sequent calculus, its 

6 
distinguishing feature is support for the definition of new logics. 

7 

8 
Isabelle logics are hierarchies of theories, which are described and 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

9 
illustrated in 
320  10 
\iflabelundefined{sec:definingtheories}{{\em Introduction to Isabelle}}% 
11 
{\S\ref{sec:definingtheories}}. That material, together with the theory 

12 
files provided in the examples directories, should suffice for all simple 

13 
applications. The easiest way to define a new theory is by modifying a 

14 
copy of an existing theory. 

15 

16 
This chapter documents the metalogic syntax, mixfix declarations and 

17 
pretty printing. The extended examples in \S\ref{sec:min_logics} 

18 
demonstrate the logical aspects of the definition of theories. 

19 

20 

21 
\section{Priority grammars} \label{sec:priority_grammars} 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

22 
\index{priority grammars(} 
320  23 

24 
A contextfree grammar contains a set of {\bf nonterminal symbols}, a set of 

25 
{\bf terminal symbols} and a set of {\bf productions}\index{productions}. 

26 
Productions have the form ${A=\gamma}$, where $A$ is a nonterminal and 

27 
$\gamma$ is a string of terminals and nonterminals. One designated 

28 
nonterminal is called the {\bf start symbol}. The language defined by the 

29 
grammar consists of all strings of terminals that can be derived from the 

30 
start symbol by applying productions as rewrite rules. 

31 

32 
The syntax of an Isabelle logic is specified by a {\bf priority 

33 
grammar}.\index{priorities} Each nonterminal is decorated by an integer 

34 
priority, as in~$A^{(p)}$. A nonterminal $A^{(p)}$ in a derivation may be 

35 
rewritten using a production $A^{(q)} = \gamma$ only if~$p \le q$. Any 

36 
priority grammar can be translated into a normal context free grammar by 

37 
introducing new nonterminals and productions. 

38 

39 
Formally, a set of context free productions $G$ induces a derivation 

40 
relation $\longrightarrow@G$. Let $\alpha$ and $\beta$ denote strings of 

41 
terminal or nonterminal symbols. Then 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

42 
\[ \alpha\, A^{(p)}\, \beta ~\longrightarrow@G~ \alpha\,\gamma\,\beta \] 
320  43 
if and only if $G$ contains some production $A^{(q)}=\gamma$ for~$p \le q$. 
44 

45 
The following simple grammar for arithmetic expressions demonstrates how 

46 
binding power and associativity of operators can be enforced by priorities. 

47 
\begin{center} 

48 
\begin{tabular}{rclr} 

49 
$A^{(9)}$ & = & {\tt0} \\ 

50 
$A^{(9)}$ & = & {\tt(} $A^{(0)}$ {\tt)} \\ 

51 
$A^{(0)}$ & = & $A^{(0)}$ {\tt+} $A^{(1)}$ \\ 

52 
$A^{(2)}$ & = & $A^{(3)}$ {\tt*} $A^{(2)}$ \\ 

53 
$A^{(3)}$ & = & {\tt} $A^{(3)}$ 

54 
\end{tabular} 

55 
\end{center} 

56 
The choice of priorities determines that {\tt } binds tighter than {\tt *}, 

57 
which binds tighter than {\tt +}. Furthermore {\tt +} associates to the 

58 
left and {\tt *} to the right. 

59 

60 
For clarity, grammars obey these conventions: 

61 
\begin{itemize} 

62 
\item All priorities must lie between~0 and \ttindex{max_pri}, which is a 

63 
some fixed integer. Sometimes {\tt max_pri} is written as $\infty$. 

64 
\item Priority 0 on the righthand side and priority \ttindex{max_pri} on 

65 
the lefthand side may be omitted. 

66 
\item The production $A^{(p)} = \alpha$ is written as $A = \alpha~(p)$; the 

67 
priority of the lefthand side actually appears in a column on the far 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

68 
right. 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

69 
\item Alternatives are separated by~$$. 
320  70 
\item Repetition is indicated by dots~(\dots) in an informal but obvious 
71 
way. 

72 
\end{itemize} 

73 

74 
Using these conventions and assuming $\infty=9$, the grammar 

75 
takes the form 

76 
\begin{center} 

77 
\begin{tabular}{rclc} 

78 
$A$ & = & {\tt0} & \hspace*{4em} \\ 

79 
& $$ & {\tt(} $A$ {\tt)} \\ 

80 
& $$ & $A$ {\tt+} $A^{(1)}$ & (0) \\ 

81 
& $$ & $A^{(3)}$ {\tt*} $A^{(2)}$ & (2) \\ 

82 
& $$ & {\tt} $A^{(3)}$ & (3) 

83 
\end{tabular} 

84 
\end{center} 

85 
\index{priority grammars)} 

86 

87 

4597
a0bdee64194c
Fixed a lot of overfull and underfull lines (hboxes)
paulson
parents:
4543
diff
changeset

88 
\begin{figure}\small 
320  89 
\begin{center} 
90 
\begin{tabular}{rclc} 

711
bb868a30e66f
updated remarks about grammar; added section about ambiguities
clasohm
parents:
452
diff
changeset

91 
$any$ &=& $prop$ ~~$$~~ $logic$ \\\\ 
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

92 
$prop$ &=& {\tt(} $prop$ {\tt)} \\ 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

93 
&$$& $prop^{(4)}$ {\tt::} $type$ & (3) \\ 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

94 
&$$& {\tt PROP} $aprop$ \\ 
711
bb868a30e66f
updated remarks about grammar; added section about ambiguities
clasohm
parents:
452
diff
changeset

95 
&$$& $any^{(3)}$ {\tt ==} $any^{(2)}$ & (2) \\ 
bb868a30e66f
updated remarks about grammar; added section about ambiguities
clasohm
parents:
452
diff
changeset

96 
&$$& $any^{(3)}$ {\tt =?=} $any^{(2)}$ & (2) \\ 
320  97 
&$$& $prop^{(2)}$ {\tt ==>} $prop^{(1)}$ & (1) \\ 
98 
&$$& {\tt[} $prop$ {\tt;} \dots {\tt;} $prop$ {\tt]} {\tt==>} $prop^{(1)}$ & (1) \\ 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

99 
&$$& {\tt!!} $idts$ {\tt.} $prop$ & (0) \\ 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

100 
&$$& {\tt OFCLASS} {\tt(} $type$ {\tt,} $logic$ {\tt)} \\\\ 
4543  101 
$aprop$ &=& $id$ ~~$$~~ $longid$ ~~$$~~ $var$ 
711
bb868a30e66f
updated remarks about grammar; added section about ambiguities
clasohm
parents:
452
diff
changeset

102 
~~$$~~ $logic^{(\infty)}$ {\tt(} $any$ {\tt,} \dots {\tt,} $any$ {\tt)} \\\\ 
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

103 
$logic$ &=& {\tt(} $logic$ {\tt)} \\ 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

104 
&$$& $logic^{(4)}$ {\tt::} $type$ & (3) \\ 
4543  105 
&$$& $id$ ~~$$~~ $longid$ ~~$$~~ $var$ 
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

106 
~~$$~~ $logic^{(\infty)}$ {\tt(} $any$ {\tt,} \dots {\tt,} $any$ {\tt)} \\ 
3694  107 
&$$& {\tt \%} $pttrns$ {\tt.} $any^{(3)}$ & (3) \\\\ 
320  108 
$idts$ &=& $idt$ ~~$$~~ $idt^{(1)}$ $idts$ \\\\ 
109 
$idt$ &=& $id$ ~~$$~~ {\tt(} $idt$ {\tt)} \\ 

110 
&$$& $id$ {\tt ::} $type$ & (0) \\\\ 

3694  111 
$pttrns$ &=& $pttrn$ ~~$$~~ $pttrn^{(1)}$ $pttrns$ \\\\ 
112 
$pttrn$ &=& $idt$ \\\\ 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

113 
$type$ &=& {\tt(} $type$ {\tt)} \\ 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

114 
&$$& $tid$ ~~$$~~ $tvar$ ~~$$~~ $tid$ {\tt::} $sort$ 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

115 
~~$$~~ $tvar$ {\tt::} $sort$ \\ 
320  116 
&$$& $id$ ~~$$~~ $type^{(\infty)}$ $id$ 
117 
~~$$~~ {\tt(} $type$ {\tt,} \dots {\tt,} $type$ {\tt)} $id$ \\ 

4543  118 
&$$& $longid$ ~~$$~~ $type^{(\infty)}$ $longid$ 
119 
~~$$~~ {\tt(} $type$ {\tt,} \dots {\tt,} $type$ {\tt)} $longid$ \\ 

320  120 
&$$& $type^{(1)}$ {\tt =>} $type$ & (0) \\ 
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

121 
&$$& {\tt[} $type$ {\tt,} \dots {\tt,} $type$ {\tt]} {\tt=>} $type$&(0) \\\\ 
4543  122 
$sort$ &=& $id$ ~~$$~~ $longid$ ~~$$~~ {\tt\ttlbrace\ttrbrace} ~~$$~~ 
8136  123 
{\tt\ttlbrace} $id$ ~$$~ $longid${\tt,}\dots{\tt,} $id$ ~$$~$longid$ {\tt\ttrbrace} 
320  124 
\end{tabular} 
125 
\index{*PROP symbol} 

126 
\index{*== symbol}\index{*=?= symbol}\index{*==> symbol} 

127 
\index{*:: symbol}\index{*=> symbol} 

332  128 
\index{sort constraints} 
129 
%the index command: a percent is permitted, but braces must match! 

320  130 
\index{%@{\tt\%} symbol} 
131 
\index{{}@{\tt\ttlbrace} symbol}\index{{}@{\tt\ttrbrace} symbol} 

132 
\index{*[ symbol}\index{*] symbol} 

133 
\index{*"!"! symbol} 

134 
\index{*"[" symbol} 

135 
\index{*""] symbol} 

136 
\end{center} 

137 
\caption{Metalogic syntax}\label{fig:pure_gram} 

138 
\end{figure} 

139 

140 

141 
\section{The Pure syntax} \label{sec:basic_syntax} 

142 
\index{syntax!Pure(} 

143 

144 
At the root of all objectlogics lies the theory \thydx{Pure}. It 

145 
contains, among many other things, the Pure syntax. An informal account of 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

146 
this basic syntax (types, terms and formulae) appears in 
320  147 
\iflabelundefined{sec:forward}{{\em Introduction to Isabelle}}% 
148 
{\S\ref{sec:forward}}. A more precise description using a priority grammar 

149 
appears in Fig.\ts\ref{fig:pure_gram}. It defines the following 

150 
nonterminals: 

151 
\begin{ttdescription} 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

152 
\item[\ndxbold{any}] denotes any term. 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

153 

711
bb868a30e66f
updated remarks about grammar; added section about ambiguities
clasohm
parents:
452
diff
changeset

154 
\item[\ndxbold{prop}] denotes terms of type {\tt prop}. These are formulae 
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

155 
of the metalogic. Note that user constants of result type {\tt prop} 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

156 
(i.e.\ $c :: \ldots \To prop$) should always provide concrete syntax. 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

157 
Otherwise atomic propositions with head $c$ may be printed incorrectly. 
320  158 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

159 
\item[\ndxbold{aprop}] denotes atomic propositions. 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

160 

d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

161 
%% FIXME huh!? 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

162 
% These typically 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

163 
% include the judgement forms of the objectlogic; its definition 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

164 
% introduces a metalevel predicate for each judgement form. 
320  165 

711
bb868a30e66f
updated remarks about grammar; added section about ambiguities
clasohm
parents:
452
diff
changeset

166 
\item[\ndxbold{logic}] denotes terms whose type belongs to class 
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

167 
\cldx{logic}, excluding type \tydx{prop}. 
320  168 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

169 
\item[\ndxbold{idts}] denotes a list of identifiers, possibly constrained 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

170 
by types. 
3694  171 

172 
\item[\ndxbold{pttrn}, \ndxbold{pttrns}] denote patterns for 

173 
abstraction, cases etc. Initially the same as $idt$ and $idts$, 

174 
these are indetended to be augmented by user extensions. 

320  175 

176 
\item[\ndxbold{type}] denotes types of the metalogic. 

177 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

178 
\item[\ndxbold{sort}] denotes metalevel sorts. 
320  179 
\end{ttdescription} 
180 

181 
\begin{warn} 

182 
In {\tt idts}, note that \verbx::nat y is parsed as \verbx::(nat y), 

183 
treating {\tt y} like a type constructor applied to {\tt nat}. The 

184 
likely result is an error message. To avoid this interpretation, use 

185 
parentheses and write \verb(x::nat) y. 

332  186 
\index{type constraints}\index{*:: symbol} 
320  187 

188 
Similarly, \verbx::nat y::nat is parsed as \verbx::(nat y::nat) and 

189 
yields an error. The correct form is \verb(x::nat) (y::nat). 

190 
\end{warn} 

191 

452  192 
\begin{warn} 
3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3318
diff
changeset

193 
Type constraints bind very weakly. For example, \verb!x<y::nat! is normally 
711
bb868a30e66f
updated remarks about grammar; added section about ambiguities
clasohm
parents:
452
diff
changeset

194 
parsed as \verb!(x<y)::nat!, unless \verb$<$ has priority of 3 or less, in 
3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3318
diff
changeset

195 
which case the string is likely to be ambiguous. The correct form is 
452  196 
\verb!x<(y::nat)!. 
197 
\end{warn} 

320  198 

867  199 
\subsection{Logical types and default syntax}\label{logicaltypes} 
200 
\index{lambda calc@$\lambda$calculus} 

201 

202 
Isabelle's representation of mathematical languages is based on the 

203 
simply typed $\lambda$calculus. All logical types, namely those of 

204 
class \cldx{logic}, are automatically equipped with a basic syntax of 

205 
types, identifiers, variables, parentheses, $\lambda$abstraction and 

206 
application. 

207 
\begin{warn} 

208 
Isabelle combines the syntaxes for all types of class \cldx{logic} by 

209 
mapping all those types to the single nonterminal $logic$. Thus all 

210 
productions of $logic$, in particular $id$, $var$ etc, become available. 

211 
\end{warn} 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

212 

d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

213 

320  214 
\subsection{Lexical matters} 
215 
The parser does not process input strings directly. It operates on token 

216 
lists provided by Isabelle's \bfindex{lexer}. There are two kinds of 

217 
tokens: \bfindex{delimiters} and \bfindex{name tokens}. 

218 

219 
\index{reserved words} 

220 
Delimiters can be regarded as reserved words of the syntax. You can 

221 
add new ones when extending theories. In Fig.\ts\ref{fig:pure_gram} they 

222 
appear in typewriter font, for example {\tt ==}, {\tt =?=} and 

223 
{\tt PROP}\@. 

224 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

225 
Name tokens have a predefined syntax. The lexer distinguishes six disjoint 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

226 
classes of names: \rmindex{identifiers}, \rmindex{unknowns}, type 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

227 
identifiers\index{type identifiers}, type unknowns\index{type unknowns}, 
3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3318
diff
changeset

228 
\rmindex{numerals}, \rmindex{strings}. They are denoted by \ndxbold{id}, 
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

229 
\ndxbold{var}, \ndxbold{tid}, \ndxbold{tvar}, \ndxbold{xnum}, \ndxbold{xstr}, 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

230 
respectively. Typical examples are {\tt x}, {\tt ?x7}, {\tt 'a}, {\tt ?'a3}, 
3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3318
diff
changeset

231 
{\tt \#42}, {\tt ''foo bar''}. Here is the precise syntax: 
320  232 
\begin{eqnarray*} 
233 
id & = & letter~quasiletter^* \\ 

4543  234 
longid & = & id\mbox{\tt .}id~\dots~id \\ 
320  235 
var & = & \mbox{\tt ?}id ~~~~ \mbox{\tt ?}id\mbox{\tt .}nat \\ 
236 
tid & = & \mbox{\tt '}id \\ 

237 
tvar & = & \mbox{\tt ?}tid ~~~~ 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

238 
\mbox{\tt ?}tid\mbox{\tt .}nat \\ 
5542  239 
xnum & = & \mbox{\tt \#}nat ~~~~ \mbox{\tt \#}nat \\ 
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

240 
xstr & = & \mbox{\tt ''\rm text\tt ''} \\[1ex] 
320  241 
letter & = & \mbox{one of {\tt a}\dots {\tt z} {\tt A}\dots {\tt Z}} \\ 
242 
digit & = & \mbox{one of {\tt 0}\dots {\tt 9}} \\ 

243 
quasiletter & = & letter ~~~~ digit ~~~~ \mbox{\tt _} ~~~~ \mbox{\tt '} \\ 

244 
nat & = & digit^+ 

245 
\end{eqnarray*} 

4543  246 
The lexer repeatedly takes the longest prefix of the input string that 
247 
forms a valid token. A maximal prefix that is both a delimiter and a 

248 
name is treated as a delimiter. Spaces, tabs, newlines and formfeeds 

249 
are separators; they never occur within tokens, except those of class 

250 
$xstr$. 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

251 

d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

252 
\medskip 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

253 
Delimiters need not be separated by white space. For example, if {\tt } 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

254 
is a delimiter but {\tt } is not, then the string {\tt } is treated as 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

255 
two consecutive occurrences of the token~{\tt }. In contrast, \ML\ 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

256 
treats {\tt } as a single symbolic name. The consequence of Isabelle's 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

257 
more liberal scheme is that the same string may be parsed in different ways 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

258 
after extending the syntax: after adding {\tt } as a delimiter, the input 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

259 
{\tt } is treated as a single token. 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

260 

320  261 
A \ndxbold{var} or \ndxbold{tvar} describes an unknown, which is internally 
262 
a pair of base name and index (\ML\ type \mltydx{indexname}). These 

263 
components are either separated by a dot as in {\tt ?x.1} or {\tt ?x7.3} or 

264 
run together as in {\tt ?x1}. The latter form is possible if the base name 

265 
does not end with digits. If the index is 0, it may be dropped altogether: 

266 
{\tt ?x} abbreviates both {\tt ?x0} and {\tt ?x.0}. 

267 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

268 
Tokens of class $xnum$ or $xstr$ are not used by the metalogic. 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

269 
Objectlogics may provide numerals and string constants by adding appropriate 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

270 
productions and translation functions. 
320  271 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

272 
\medskip 
320  273 
Although name tokens are returned from the lexer rather than the parser, it 
274 
is more logical to regard them as nonterminals. Delimiters, however, are 

275 
terminals; they are just syntactic sugar and contribute nothing to the 

276 
abstract syntax tree. 

277 

278 

3108  279 
\subsection{*Inspecting the syntax} \label{pg:print_syn} 
320  280 
\begin{ttbox} 
281 
syn_of : theory > Syntax.syntax 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

282 
print_syntax : theory > unit 
320  283 
Syntax.print_syntax : Syntax.syntax > unit 
284 
Syntax.print_gram : Syntax.syntax > unit 

285 
Syntax.print_trans : Syntax.syntax > unit 

286 
\end{ttbox} 

287 
The abstract type \mltydx{Syntax.syntax} allows manipulation of syntaxes 

288 
in \ML. You can display values of this type by calling the following 

289 
functions: 

290 
\begin{ttdescription} 

291 
\item[\ttindexbold{syn_of} {\it thy}] returns the syntax of the Isabelle 

292 
theory~{\it thy} as an \ML\ value. 

293 

8136  294 
\item[\ttindexbold{print_syntax} $thy$] uses {\tt Syntax.print_syntax} 
295 
to display the syntax part of theory $thy$. 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

296 

320  297 
\item[\ttindexbold{Syntax.print_syntax} {\it syn}] shows virtually all 
298 
information contained in the syntax {\it syn}. The displayed output can 

299 
be large. The following two functions are more selective. 

300 

301 
\item[\ttindexbold{Syntax.print_gram} {\it syn}] shows the grammar part 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

302 
of~{\it syn}, namely the lexicon, logical types and productions. These are 
320  303 
discussed below. 
304 

305 
\item[\ttindexbold{Syntax.print_trans} {\it syn}] shows the translation 

306 
part of~{\it syn}, namely the constants, parse/print macros and 

307 
parse/print translations. 

308 
\end{ttdescription} 

309 

310 
Let us demonstrate these functions by inspecting Pure's syntax. Even that 

311 
is too verbose to display in full. 

312 
\begin{ttbox}\index{*Pure theory} 

313 
Syntax.print_syntax (syn_of Pure.thy); 

314 
{\out lexicon: "!!" "\%" "(" ")" "," "." "::" ";" "==" "==>" \dots} 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

315 
{\out logtypes: fun itself} 
320  316 
{\out prods:} 
317 
{\out type = tid (1000)} 

318 
{\out type = tvar (1000)} 

319 
{\out type = id (1000)} 

320 
{\out type = tid "::" sort[0] => "_ofsort" (1000)} 

321 
{\out type = tvar "::" sort[0] => "_ofsort" (1000)} 

322 
{\out \vdots} 

323 
\ttbreak 

3108  324 
{\out print modes: "symbols" "xterm"} 
320  325 
{\out consts: "_K" "_appl" "_aprop" "_args" "_asms" "_bigimpl" \dots} 
326 
{\out parse_ast_translation: "_appl" "_bigimpl" "_bracket"} 

327 
{\out "_idtyp" "_lambda" "_tapp" "_tappl"} 

328 
{\out parse_rules:} 

329 
{\out parse_translation: "!!" "_K" "_abs" "_aprop"} 

330 
{\out print_translation: "all"} 

331 
{\out print_rules:} 

332 
{\out print_ast_translation: "==>" "_abs" "_idts" "fun"} 

333 
\end{ttbox} 

334 

332  335 
As you can see, the output is divided into labelled sections. The grammar 
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

336 
is represented by {\tt lexicon}, {\tt logtypes} and {\tt prods}. The rest 
320  337 
refers to syntactic translations and macro expansion. Here is an 
338 
explanation of the various sections. 

339 
\begin{description} 

340 
\item[{\tt lexicon}] lists the delimiters used for lexical 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

341 
analysis.\index{delimiters} 
320  342 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

343 
\item[{\tt logtypes}] lists the types that are regarded the same as {\tt 
3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3318
diff
changeset

344 
logic} syntactically. Thus types of objectlogics (e.g.\ {\tt nat}, say) 
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

345 
will be automatically equipped with the standard syntax of 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

346 
$\lambda$calculus. 
320  347 

348 
\item[{\tt prods}] lists the \rmindex{productions} of the priority grammar. 

349 
The nonterminal $A^{(n)}$ is rendered in {\sc ascii} as {\tt $A$[$n$]}. 

350 
Each delimiter is quoted. Some productions are shown with {\tt =>} and 

351 
an attached string. These strings later become the heads of parse 

352 
trees; they also play a vital role when terms are printed (see 

353 
\S\ref{sec:asts}). 

354 

355 
Productions with no strings attached are called {\bf copy 

356 
productions}\indexbold{productions!copy}. Their righthand side must 

357 
have exactly one nonterminal symbol (or name token). The parser does 

358 
not create a new parse tree node for copy productions, but simply 

359 
returns the parse tree of the righthand symbol. 

360 

361 
If the righthand side consists of a single nonterminal with no 

362 
delimiters, then the copy production is called a {\bf chain 

363 
production}. Chain productions act as abbreviations: 

364 
conceptually, they are removed from the grammar by adding new 

365 
productions. Priority information attached to chain productions is 

366 
ignored; only the dummy value $1$ is displayed. 

3108  367 

368 
\item[\ttindex{print_modes}] lists the alternative print modes 

369 
provided by this syntax (see \S\ref{sec:prmodes}). 

320  370 

371 
\item[{\tt consts}, {\tt parse_rules}, {\tt print_rules}] 

372 
relate to macros (see \S\ref{sec:macros}). 

373 

374 
\item[{\tt parse_ast_translation}, {\tt print_ast_translation}] 

375 
list sets of constants that invoke translation functions for abstract 

376 
syntax trees. Section \S\ref{sec:asts} below discusses this obscure 

377 
matter.\index{constants!for translations} 

378 

4597
a0bdee64194c
Fixed a lot of overfull and underfull lines (hboxes)
paulson
parents:
4543
diff
changeset

379 
\item[{\tt parse_translation}, {\tt print_translation}] list the sets 
320  380 
of constants that invoke translation functions for terms (see 
381 
\S\ref{sec:tr_funs}). 

382 
\end{description} 

383 
\index{syntax!Pure)} 

384 

385 

386 
\section{Mixfix declarations} \label{sec:mixfix} 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

387 
\index{mixfix declarations(} 
320  388 

389 
When defining a theory, you declare new constants by giving their names, 

390 
their type, and an optional {\bf mixfix annotation}. Mixfix annotations 

391 
allow you to extend Isabelle's basic $\lambda$calculus syntax with 

392 
readable notation. They can express any contextfree priority grammar. 

393 
Isabelle syntax definitions are inspired by \OBJ~\cite{OBJ}; they are more 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

394 
general than the priority declarations of \ML\ and Prolog. 
320  395 

396 
A mixfix annotation defines a production of the priority grammar. It 

397 
describes the concrete syntax, the translation to abstract syntax, and the 

398 
pretty printing. Special case annotations provide a simple means of 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

399 
specifying infix operators and binders. 
320  400 

401 
\subsection{The general mixfix form} 

402 
Here is a detailed account of mixfix declarations. Suppose the following 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

403 
line occurs within a {\tt consts} or {\tt syntax} section of a {\tt .thy} 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

404 
file: 
320  405 
\begin{center} 
406 
{\tt $c$ ::\ "$\sigma$" ("$template$" $ps$ $p$)} 

407 
\end{center} 

332  408 
This constant declaration and mixfix annotation are interpreted as follows: 
320  409 
\begin{itemize}\index{productions} 
410 
\item The string {\tt $c$} is the name of the constant associated with the 

411 
production; unless it is a valid identifier, it must be enclosed in 

412 
quotes. If $c$ is empty (given as~{\tt ""}) then this is a copy 

413 
production.\index{productions!copy} Otherwise, parsing an instance of the 

414 
phrase $template$ generates the \AST{} {\tt ("$c$" $a@1$ $\ldots$ 

415 
$a@n$)}, where $a@i$ is the \AST{} generated by parsing the $i$th 

416 
argument. 

417 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

418 
\item The constant $c$, if nonempty, is declared to have type $\sigma$ 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

419 
({\tt consts} section only). 
320  420 

421 
\item The string $template$ specifies the righthand side of 

422 
the production. It has the form 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

423 
\[ w@0 \;_\; w@1 \;_\; \ldots \;_\; w@n, \] 
320  424 
where each occurrence of {\tt_} denotes an argument position and 
425 
the~$w@i$ do not contain~{\tt _}. (If you want a literal~{\tt _} in 

426 
the concrete syntax, you must escape it as described below.) The $w@i$ 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

427 
may consist of \rmindex{delimiters}, spaces or 
320  428 
\rmindex{pretty printing} annotations (see below). 
429 

430 
\item The type $\sigma$ specifies the production's nonterminal symbols 

431 
(or name tokens). If $template$ is of the form above then $\sigma$ 

432 
must be a function type with at least~$n$ argument positions, say 

433 
$\sigma = [\tau@1, \dots, \tau@n] \To \tau$. Nonterminal symbols are 

434 
derived from the types $\tau@1$, \ldots,~$\tau@n$, $\tau$ as described 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

435 
below. Any of these may be function types. 
320  436 

437 
\item The optional list~$ps$ may contain at most $n$ integers, say {\tt 

438 
[$p@1$, $\ldots$, $p@m$]}, where $p@i$ is the minimal 

439 
priority\indexbold{priorities} required of any phrase that may appear 

440 
as the $i$th argument. Missing priorities default to~0. 

4543  441 

442 
\item The integer $p$ is the priority of this production. If 

443 
omitted, it defaults to the maximal priority. Priorities range 

444 
between 0 and \ttindexbold{max_pri} (= 1000). 

320  445 

446 
\end{itemize} 

447 
% 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

448 
The resulting production is \[ A^{(p)}= w@0\, A@1^{(p@1)}\, w@1\, 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

449 
A@2^{(p@2)}\, \dots\, A@n^{(p@n)}\, w@n \] where $A$ and the $A@i$ are the 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

450 
nonterminals corresponding to the types $\tau$ and $\tau@i$ respectively. 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

451 
The nonterminal symbol associated with a type $(\ldots)ty$ is {\tt logic}, if 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

452 
this is a logical type (namely one of class {\tt logic} excluding {\tt 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

453 
prop}). Otherwise it is $ty$ (note that only the outermost type constructor 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

454 
is taken into account). Finally, the nonterminal of a type variable is {\tt 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

455 
any}. 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

456 

911
55754d6d399c
new in mixfix annotations: "' " (quote space) separates delimiters without
wenzelm
parents:
885
diff
changeset

457 
\begin{warn} 
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

458 
Theories must sometimes declare types for purely syntactic purposes  
3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3318
diff
changeset

459 
merely playing the role of nonterminals. One example is \tydx{type}, the 
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

460 
builtin type of types. This is a `type of all types' in the syntactic 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

461 
sense only. Do not declare such types under {\tt arities} as belonging to 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

462 
class {\tt logic}\index{*logic class}, for that would make them useless as 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

463 
separate nonterminal symbols. 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

464 
\end{warn} 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

465 

d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

466 
Associating nonterminals with types allows a constant's type to specify 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

467 
syntax as well. We can declare the function~$f$ to have type $[\tau@1, 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

468 
\ldots, \tau@n]\To \tau$ and, through a mixfix annotation, specify the layout 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

469 
of the function's $n$ arguments. The constant's name, in this case~$f$, will 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

470 
also serve as the label in the abstract syntax tree. 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

471 

d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

472 
You may also declare mixfix syntax without adding constants to the theory's 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

473 
signature, by using a {\tt syntax} section instead of {\tt consts}. Thus a 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

474 
production need not map directly to a logical function (this typically 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

475 
requires additional syntactic translations, see also 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

476 
Chapter~\ref{chap:syntax}). 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

477 

d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

478 

911
55754d6d399c
new in mixfix annotations: "' " (quote space) separates delimiters without
wenzelm
parents:
885
diff
changeset

479 
\medskip 
55754d6d399c
new in mixfix annotations: "' " (quote space) separates delimiters without
wenzelm
parents:
885
diff
changeset

480 
As a special case of the general mixfix declaration, the form 
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

481 
\begin{center} 
911
55754d6d399c
new in mixfix annotations: "' " (quote space) separates delimiters without
wenzelm
parents:
885
diff
changeset

482 
{\tt $c$ ::\ "$\sigma$" ("$template$")} 
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

483 
\end{center} 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

484 
specifies no priorities. The resulting production puts no priority 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

485 
constraints on any of its arguments and has maximal priority itself. 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

486 
Omitting priorities in this manner is prone to syntactic ambiguities unless 
3098  487 
the production's righthand side is fully bracketed, as in 
488 
\verb"if _ then _ else _ fi". 

320  489 

490 
Omitting the mixfix annotation completely, as in {\tt $c$ ::\ "$\sigma$"}, 

491 
is sensible only if~$c$ is an identifier. Otherwise you will be unable to 

492 
write terms involving~$c$. 

493 

494 

495 
\subsection{Example: arithmetic expressions} 

496 
\index{examples!of mixfix declarations} 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

497 
This theory specification contains a {\tt syntax} section with mixfix 
320  498 
declarations encoding the priority grammar from 
499 
\S\ref{sec:priority_grammars}: 

500 
\begin{ttbox} 

3108  501 
ExpSyntax = Pure + 
320  502 
types 
503 
exp 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

504 
syntax 
1387  505 
"0" :: exp ("0" 9) 
506 
"+" :: [exp, exp] => exp ("_ + _" [0, 1] 0) 

507 
"*" :: [exp, exp] => exp ("_ * _" [3, 2] 2) 

508 
"" :: exp => exp (" _" [3] 3) 

320  509 
end 
510 
\end{ttbox} 

8136  511 
If you put this into a file {\tt ExpSyntax.thy} and load it by {\tt 
3108  512 
use_thy"ExpSyntax"}, you can run some tests: 
320  513 
\begin{ttbox} 
3108  514 
val read_exp = Syntax.test_read (syn_of ExpSyntax.thy) "exp"; 
320  515 
{\out val it = fn : string > unit} 
516 
read_exp "0 * 0 * 0 * 0 + 0 + 0 + 0"; 

517 
{\out tokens: "0" "*" "0" "*" "0" "*" "0" "+" "0" "+" "0" "+" "0"} 

518 
{\out raw: ("+" ("+" ("+" ("*" "0" ("*" "0" ("*" "0" "0"))) "0") "0") "0")} 

519 
{\out \vdots} 

520 
read_exp "0 +  0 + 0"; 

521 
{\out tokens: "0" "+" "" "0" "+" "0"} 

522 
{\out raw: ("+" ("+" "0" ("" "0")) "0")} 

523 
{\out \vdots} 

524 
\end{ttbox} 

525 
The output of \ttindex{Syntax.test_read} includes the token list ({\tt 

526 
tokens}) and the raw \AST{} directly derived from the parse tree, 

527 
ignoring parse \AST{} translations. The rest is tracing information 

528 
provided by the macro expander (see \S\ref{sec:macros}). 

529 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

530 
Executing {\tt Syntax.print_gram} reveals the productions derived from the 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

531 
above mixfix declarations (lots of additional information deleted): 
320  532 
\begin{ttbox} 
3108  533 
Syntax.print_gram (syn_of ExpSyntax.thy); 
320  534 
{\out exp = "0" => "0" (9)} 
535 
{\out exp = exp[0] "+" exp[1] => "+" (0)} 

536 
{\out exp = exp[3] "*" exp[2] => "*" (2)} 

537 
{\out exp = "" exp[3] => "" (3)} 

538 
\end{ttbox} 

539 

3108  540 
Note that because {\tt exp} is not of class {\tt logic}, it has been 
3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3318
diff
changeset

541 
retained as a separate nonterminal. This also entails that the syntax 
3108  542 
does not provide for identifiers or paranthesized expressions. 
543 
Normally you would also want to add the declaration {\tt arities 

544 
exp::logic} after {\tt types} and use {\tt consts} instead of {\tt 

3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3318
diff
changeset

545 
syntax}. Try this as an exercise and study the changes in the 
867  546 
grammar. 
320  547 

548 
\subsection{The mixfix template} 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

549 
Let us now take a closer look at the string $template$ appearing in mixfix 
320  550 
annotations. This string specifies a list of parsing and printing 
551 
directives: delimiters\index{delimiters}, arguments, spaces, blocks of 

552 
indentation and line breaks. These are encoded by the following character 

553 
sequences: 

554 
\index{pretty printing(} 

555 
\begin{description} 

556 
\item[~$d$~] is a delimiter, namely a nonempty sequence of characters 

557 
other than the special characters {\tt _}, {\tt(}, {\tt)} and~{\tt/}. 

558 
Even these characters may appear if escaped; this means preceding it with 

559 
a~{\tt '} (single quote). Thus you have to write {\tt ''} if you really 

911
55754d6d399c
new in mixfix annotations: "' " (quote space) separates delimiters without
wenzelm
parents:
885
diff
changeset

560 
want a single quote. Furthermore, a~{\tt '} followed by a space separates 
55754d6d399c
new in mixfix annotations: "' " (quote space) separates delimiters without
wenzelm
parents:
885
diff
changeset

561 
delimiters without extra white space being added for printing. 
320  562 

563 
\item[~{\tt_}~] is an argument position, which stands for a nonterminal symbol 

564 
or name token. 

565 

566 
\item[~$s$~] is a nonempty sequence of spaces for printing. This and the 

567 
following specifications do not affect parsing at all. 

568 

569 
\item[~{\tt(}$n$~] opens a pretty printing block. The optional number $n$ 

570 
specifies how much indentation to add when a line break occurs within the 

571 
block. If {\tt(} is not followed by digits, the indentation defaults 

572 
to~0. 

573 

574 
\item[~{\tt)}~] closes a pretty printing block. 

575 

576 
\item[~{\tt//}~] forces a line break. 

577 

578 
\item[~{\tt/}$s$~] allows a line break. Here $s$ stands for the string of 

579 
spaces (zero or more) right after the {\tt /} character. These spaces 

580 
are printed if the break is not taken. 

581 
\end{description} 

582 
For example, the template {\tt"(_ +/ _)"} specifies an infix operator. 

583 
There are two argument positions; the delimiter~{\tt+} is preceded by a 

584 
space and followed by a space or line break; the entire phrase is a pretty 

585 
printing block. Other examples appear in Fig.\ts\ref{fig:set_trans} below. 

586 
Isabelle's pretty printer resembles the one described in 

6592  587 
Paulson~\cite{paulsonml2}. 
320  588 

589 
\index{pretty printing)} 

590 

591 

592 
\subsection{Infixes} 

593 
\indexbold{infixes} 

594 

3108  595 
Infix operators associating to the left or right can be declared using 
596 
{\tt infixl} or {\tt infixr}. Basically, the form {\tt $c$ ::\ 

597 
$\sigma$ (infixl $p$)} abbreviates the mixfix declarations 

320  598 
\begin{ttbox} 
1387  599 
"op \(c\)" :: \(\sigma\) ("(_ \(c\)/ _)" [\(p\), \(p+1\)] \(p\)) 
600 
"op \(c\)" :: \(\sigma\) ("op \(c\)") 

320  601 
\end{ttbox} 
1387  602 
and {\tt $c$ ::\ $\sigma$ (infixr $p$)} abbreviates the mixfix declarations 
320  603 
\begin{ttbox} 
1387  604 
"op \(c\)" :: \(\sigma\) ("(_ \(c\)/ _)" [\(p+1\), \(p\)] \(p\)) 
605 
"op \(c\)" :: \(\sigma\) ("op \(c\)") 

320  606 
\end{ttbox} 
607 
The infix operator is declared as a constant with the prefix {\tt op}. 

608 
Thus, prefixing infixes with \sdx{op} makes them behave like ordinary 

609 
function symbols, as in \ML. Special characters occurring in~$c$ must be 

610 
escaped, as in delimiters, using a single quote. 

611 

3108  612 
A slightly more general form of infix declarations allows constant 
613 
names to be independent from their concrete syntax, namely \texttt{$c$ 

3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3318
diff
changeset

614 
::\ $\sigma$\ (infixl "$sy$" $p$)}, the same for \texttt{infixr}. As 
3108  615 
an example consider: 
616 
\begin{ttbox} 

617 
and :: [bool, bool] => bool (infixr "&" 35) 

618 
\end{ttbox} 

619 
The internal constant name will then be just \texttt{and}, without any 

620 
\texttt{op} prefixed. 

621 

320  622 

623 
\subsection{Binders} 

624 
\indexbold{binders} 

625 
\begingroup 

626 
\def\Q{{\cal Q}} 

627 
A {\bf binder} is a variablebinding construct such as a quantifier. The 

628 
constant declaration 

629 
\begin{ttbox} 

1387  630 
\(c\) :: \(\sigma\) (binder "\(\Q\)" [\(pb\)] \(p\)) 
320  631 
\end{ttbox} 
632 
introduces a constant~$c$ of type~$\sigma$, which must have the form 

633 
$(\tau@1 \To \tau@2) \To \tau@3$. Its concrete syntax is $\Q~x.P$, where 

634 
$x$ is a bound variable of type~$\tau@1$, the body~$P$ has type $\tau@2$ 

3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3318
diff
changeset

635 
and the whole term has type~$\tau@3$. The optional integer $pb$ 
1060
a122584b5cc5
In binders, the default body priority is now p instead of 0.
lcp
parents:
911
diff
changeset

636 
specifies the body's priority, by default~$p$. Special characters 
877  637 
in $\Q$ must be escaped using a single quote. 
320  638 

864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

639 
The declaration is expanded internally to something like 
320  640 
\begin{ttbox} 
3098  641 
\(c\)\hskip3pt :: (\(\tau@1\) => \(\tau@2\)) => \(\tau@3\) 
642 
"\(\Q\)" :: [idts, \(\tau@2\)] => \(\tau@3\) ("(3\(\Q\)_./ _)" [0, \(pb\)] \(p\)) 

320  643 
\end{ttbox} 
644 
Here \ndx{idts} is the nonterminal symbol for a list of identifiers with 

332  645 
\index{type constraints} 
320  646 
optional type constraints (see Fig.\ts\ref{fig:pure_gram}). The 
647 
declaration also installs a parse translation\index{translations!parse} 

648 
for~$\Q$ and a print translation\index{translations!print} for~$c$ to 

649 
translate between the internal and external forms. 

650 

651 
A binder of type $(\sigma \To \tau) \To \tau$ can be nested by giving a 

652 
list of variables. The external form $\Q~x@1~x@2 \ldots x@n. P$ 

653 
corresponds to the internal form 

654 
\[ c(\lambda x@1. c(\lambda x@2. \ldots c(\lambda x@n. P) \ldots)). \] 

655 

656 
\medskip 

657 
For example, let us declare the quantifier~$\forall$:\index{quantifiers} 

658 
\begin{ttbox} 

1387  659 
All :: ('a => o) => o (binder "ALL " 10) 
320  660 
\end{ttbox} 
661 
This lets us write $\forall x.P$ as either {\tt All(\%$x$.$P$)} or {\tt ALL 

662 
$x$.$P$}. When printing, Isabelle prefers the latter form, but must fall 

663 
back on ${\tt All}(P)$ if $P$ is not an abstraction. Both $P$ and {\tt ALL 

664 
$x$.$P$} have type~$o$, the type of formulae, while the bound variable 

665 
can be polymorphic. 

666 
\endgroup 

667 

668 
\index{mixfix declarations)} 

669 

3108  670 

671 
\section{*Alternative print modes} \label{sec:prmodes} 

672 
\index{print modes(} 

673 
% 

3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3318
diff
changeset

674 
Isabelle's pretty printer supports alternative output syntaxes. These 
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3318
diff
changeset

675 
may be used independently or in cooperation. The currently active 
3108  676 
print modes (with precedence from left to right) are determined by a 
677 
reference variable. 

678 
\begin{ttbox}\index{*print_mode} 

679 
print_mode: string list ref 

680 
\end{ttbox} 

681 
Initially this may already contain some print mode identifiers, 

682 
depending on how Isabelle has been invoked (e.g.\ by some user 

3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3318
diff
changeset

683 
interface). So changes should be incremental  adding or deleting 
3108  684 
modes relative to the current value. 
685 

686 
Any \ML{} string is a legal print mode identifier, without any 

687 
predeclaration required. The following names should be considered 

688 
reserved, though: \texttt{""} (yes, the empty string), 

689 
\texttt{symbols}, \texttt{latex}, \texttt{xterm}. 

690 

691 
There is a separate table of mixfix productions for pretty printing 

3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3318
diff
changeset

692 
associated with each print mode. The currently active ones are 
3108  693 
conceptually just concatenated from left to right, with the standard 
694 
syntax output table always coming last as default. Thus mixfix 

695 
productions of preceding modes in the list may override those of later 

696 
ones. Also note that token translations are always relative to some 

697 
print mode (see \S\ref{sec:tok_tr}). 

698 

699 
\medskip The canonical application of print modes is optional printing 

700 
of mathematical symbols from a special screen font instead of {\sc 

3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3318
diff
changeset

701 
ascii}. Another example is to reuse Isabelle's advanced 
3108  702 
$\lambda$term printing mechanisms to generate completely different 
3228  703 
output, say for interfacing external tools like \rmindex{model 
704 
checkers} (see also \texttt{HOL/Modelcheck}). 

3108  705 

706 
\index{print modes)} 

707 

708 

711
bb868a30e66f
updated remarks about grammar; added section about ambiguities
clasohm
parents:
452
diff
changeset

709 
\section{Ambiguity of parsed expressions} \label{sec:ambiguity} 
bb868a30e66f
updated remarks about grammar; added section about ambiguities
clasohm
parents:
452
diff
changeset

710 
\index{ambiguity!of parsed expressions} 
bb868a30e66f
updated remarks about grammar; added section about ambiguities
clasohm
parents:
452
diff
changeset

711 

bb868a30e66f
updated remarks about grammar; added section about ambiguities
clasohm
parents:
452
diff
changeset

712 
To keep the grammar small and allow common productions to be shared 
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

713 
all logical types (except {\tt prop}) are internally represented 
3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3318
diff
changeset

714 
by one nonterminal, namely {\tt logic}. This and omitted or too freely 
711
bb868a30e66f
updated remarks about grammar; added section about ambiguities
clasohm
parents:
452
diff
changeset

715 
chosen priorities may lead to ways of parsing an expression that were 
3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3318
diff
changeset

716 
not intended by the theory's maker. In most cases Isabelle is able to 
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

717 
select one of multiple parse trees that an expression has lead 
3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3318
diff
changeset

718 
to by checking which of them can be typed correctly. But this may not 
711
bb868a30e66f
updated remarks about grammar; added section about ambiguities
clasohm
parents:
452
diff
changeset

719 
work in every case and always slows down parsing. 
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

720 
The warning and error messages that can be produced during this process are 
711
bb868a30e66f
updated remarks about grammar; added section about ambiguities
clasohm
parents:
452
diff
changeset

721 
as follows: 
bb868a30e66f
updated remarks about grammar; added section about ambiguities
clasohm
parents:
452
diff
changeset

722 

880  723 
If an ambiguity can be resolved by type inference the following 
724 
warning is shown to remind the user that parsing is (unnecessarily) 

3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3318
diff
changeset

725 
slowed down. In cases where it's not easily possible to eliminate the 
880  726 
ambiguity the frequency of the warning can be controlled by changing 
883
92abd2ad9d08
renamed Sign.ambiguity_level to Syntax.ambiguity_level
clasohm
parents:
880
diff
changeset

727 
the value of {\tt Syntax.ambiguity_level} which has type {\tt int 
3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3318
diff
changeset

728 
ref}. Its default value is 1 and by increasing it one can control how 
883
92abd2ad9d08
renamed Sign.ambiguity_level to Syntax.ambiguity_level
clasohm
parents:
880
diff
changeset

729 
many parse trees are necessary to generate the warning. 
711
bb868a30e66f
updated remarks about grammar; added section about ambiguities
clasohm
parents:
452
diff
changeset

730 

bb868a30e66f
updated remarks about grammar; added section about ambiguities
clasohm
parents:
452
diff
changeset

731 
\begin{ttbox} 
3801  732 
{\out Ambiguous input "\dots"} 
711
bb868a30e66f
updated remarks about grammar; added section about ambiguities
clasohm
parents:
452
diff
changeset

733 
{\out produces the following parse trees:} 
3801  734 
{\out \dots} 
711
bb868a30e66f
updated remarks about grammar; added section about ambiguities
clasohm
parents:
452
diff
changeset

735 
{\out Fortunately, only one parse tree is type correct.} 
3801  736 
{\out You may still want to disambiguate your grammar or your input.} 
711
bb868a30e66f
updated remarks about grammar; added section about ambiguities
clasohm
parents:
452
diff
changeset

737 
\end{ttbox} 
bb868a30e66f
updated remarks about grammar; added section about ambiguities
clasohm
parents:
452
diff
changeset

738 

bb868a30e66f
updated remarks about grammar; added section about ambiguities
clasohm
parents:
452
diff
changeset

739 
The following message is normally caused by using the same 
bb868a30e66f
updated remarks about grammar; added section about ambiguities
clasohm
parents:
452
diff
changeset

740 
syntax in two different productions: 
bb868a30e66f
updated remarks about grammar; added section about ambiguities
clasohm
parents:
452
diff
changeset

741 

bb868a30e66f
updated remarks about grammar; added section about ambiguities
clasohm
parents:
452
diff
changeset

742 
\begin{ttbox} 
3802  743 
{\out Ambiguous input "..."} 
711
bb868a30e66f
updated remarks about grammar; added section about ambiguities
clasohm
parents:
452
diff
changeset

744 
{\out produces the following parse trees:} 
3802  745 
{\out \dots} 
746 
{\out More than one term is type correct:} 

747 
{\out \dots} 

711
bb868a30e66f
updated remarks about grammar; added section about ambiguities
clasohm
parents:
452
diff
changeset

748 
\end{ttbox} 
bb868a30e66f
updated remarks about grammar; added section about ambiguities
clasohm
parents:
452
diff
changeset

749 

866
2d3d020eef11
added documentation of bind_thm, qed, qed_goal, get_thm, thms_of
clasohm
parents:
864
diff
changeset

750 
Ambiguities occuring in syntax translation rules cannot be resolved by 
2d3d020eef11
added documentation of bind_thm, qed, qed_goal, get_thm, thms_of
clasohm
parents:
864
diff
changeset

751 
type inference because it is not necessary for these rules to be type 
3485
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
paulson
parents:
3318
diff
changeset

752 
correct. Therefore Isabelle always generates an error message and the 
866
2d3d020eef11
added documentation of bind_thm, qed, qed_goal, get_thm, thms_of
clasohm
parents:
864
diff
changeset

753 
ambiguity should be eliminated by changing the grammar or the rule. 
711
bb868a30e66f
updated remarks about grammar; added section about ambiguities
clasohm
parents:
452
diff
changeset

754 

320  755 

756 
\section{Example: some minimal logics} \label{sec:min_logics} 

757 
\index{examples!of logic definitions} 

758 

759 
This section presents some examples that have a simple syntax. They 

760 
demonstrate how to define new objectlogics from scratch. 

761 

711
bb868a30e66f
updated remarks about grammar; added section about ambiguities
clasohm
parents:
452
diff
changeset

762 
First we must define how an objectlogic syntax is embedded into the 
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

763 
metalogic. Since all theorems must conform to the syntax for~\ndx{prop} 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

764 
(see Fig.\ts\ref{fig:pure_gram}), that syntax has to be extended with the 
320  765 
objectlevel syntax. Assume that the syntax of your objectlogic defines a 
864
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

766 
metatype~\tydx{o} of formulae which refers to the nonterminal {\tt logic}. 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

767 
These formulae can now appear in axioms and theorems wherever \ndx{prop} does 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

768 
if you add the production 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

769 
\[ prop ~=~ logic. \] 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

770 
This is not supposed to be a copy production but an implicit coercion from 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
wenzelm
parents:
711
diff
changeset

771 
formulae to propositions: 
320  772 
\begin{ttbox} 
773 
Base = Pure + 

774 
types 

775 
o 

776 
arities 

777 
o :: logic 

778 
consts 

1387  779 
Trueprop :: o => prop ("_" 5) 
320  780 
end 
781 
\end{ttbox} 

782 
The constant \cdx{Trueprop} (the name is arbitrary) acts as an invisible 

332  783 
coercion function. Assuming this definition resides in a file {\tt Base.thy}, 
320  784 
you have to load it with the command {\tt use_thy "Base"}. 
785 

786 
One of the simplest nontrivial logics is {\bf minimal logic} of 

787 
implication. Its definition in Isabelle needs no advanced features but 

788 
illustrates the overall mechanism nicely: 

789 
\begin{ttbox} 

790 
Hilbert = Base + 

791 
consts 

1387  792 
">" :: [o, o] => o (infixr 10) 
320  793 
rules 
794 
K "P > Q > P" 

795 
S "(P > Q > R) > (P > Q) > P > R" 

796 
MP "[ P > Q; P ] ==> Q" 

797 
end 

798 
\end{ttbox} 

332  799 
After loading this definition from the file {\tt Hilbert.thy}, you can 
320  800 
start to prove theorems in the logic: 
801 
\begin{ttbox} 

5205  802 
Goal "P > P"; 
320  803 
{\out Level 0} 
804 
{\out P > P} 

805 
{\out 1. P > P} 

806 
\ttbreak 

807 
by (resolve_tac [Hilbert.MP] 1); 

808 
{\out Level 1} 

809 
{\out P > P} 

810 
{\out 1. ?P > P > P} 

811 
{\out 2. ?P} 

812 
\ttbreak 

813 
by (resolve_tac [Hilbert.MP] 1); 

814 
{\out Level 2} 

815 
{\out P > P} 

816 
{\out 1. ?P1 > ?P > P > P} 

817 
{\out 2. ?P1} 

818 
{\out 3. ?P} 

819 
\ttbreak 

820 
by (resolve_tac [Hilbert.S] 1); 

821 
{\out Level 3} 

822 
{\out P > P} 

823 
{\out 1. P > ?Q2 > P} 

824 
{\out 2. P > ?Q2} 

825 
\ttbreak 

826 
by (resolve_tac [Hilbert.K] 1); 

827 
{\out Level 4} 

828 
{\out P > P} 

829 
{\out 1. P > ?Q2} 

830 
\ttbreak 

831 
by (resolve_tac [Hilbert.K] 1); 

832 
{\out Level 5} 

833 
{\out P > P} 

834 
{\out No subgoals!} 

835 
\end{ttbox} 

836 
As we can see, this Hilbertstyle formulation of minimal logic is easy to 

837 
define but difficult to use. The following natural deduction formulation is 

838 
better: 

839 
\begin{ttbox} 

840 
MinI = Base + 

841 
consts 

1387  842 
">" :: [o, o] => o (infixr 10) 
320  843 
rules 
844 
impI "(P ==> Q) ==> P > Q" 

845 
impE "[ P > Q; P ] ==> Q" 

846 
end 

847 
\end{ttbox} 

848 
Note, however, that although the two systems are equivalent, this fact 

849 
cannot be proved within Isabelle. Axioms {\tt S} and {\tt K} can be 

850 
derived in {\tt MinI} (exercise!), but {\tt impI} cannot be derived in {\tt 

851 
Hilbert}. The reason is that {\tt impI} is only an {\bf admissible} rule 

852 
in {\tt Hilbert}, something that can only be shown by induction over all 

853 
possible proofs in {\tt Hilbert}. 

854 

855 
We may easily extend minimal logic with falsity: 

856 
\begin{ttbox} 

857 
MinIF = MinI + 

858 
consts 

1387  859 
False :: o 
320  860 
rules 
861 
FalseE "False ==> P" 

862 
end 

863 
\end{ttbox} 

864 
On the other hand, we may wish to introduce conjunction only: 

865 
\begin{ttbox} 

866 
MinC = Base + 

867 
consts 

1387  868 
"&" :: [o, o] => o (infixr 30) 
320  869 
\ttbreak 
870 
rules 

871 
conjI "[ P; Q ] ==> P & Q" 

872 
conjE1 "P & Q ==> P" 

873 
conjE2 "P & Q ==> Q" 

874 
end 

875 
\end{ttbox} 

876 
And if we want to have all three connectives together, we create and load a 

3108  877 
theory file consisting of a single line: 
320  878 
\begin{ttbox} 
879 
MinIFC = MinIF + MinC 

880 
\end{ttbox} 

881 
Now we can prove mixed theorems like 

882 
\begin{ttbox} 

5205  883 
Goal "P & False > Q"; 
320  884 
by (resolve_tac [MinI.impI] 1); 
885 
by (dresolve_tac [MinC.conjE2] 1); 

886 
by (eresolve_tac [MinIF.FalseE] 1); 

887 
\end{ttbox} 

888 
Try this as an exercise! 

5371  889 

890 

891 
%%% Local Variables: 

892 
%%% mode: latex 

893 
%%% TeXmaster: "ref" 

894 
%%% End: 