%% $Id$

\newcommand{\rmindex}[1]{{#1}\index{#1}\@}
\newcommand{\mtt}[1]{\mbox{\tt #1}}
\newcommand{\ttfct}[1]{\mathop{\mtt{#1}}\nolimits}
\newcommand{\ttrel}[1]{\mathrel{\mtt{#1}}}
\newcommand{\Constant}{\ttfct{Constant}}
\newcommand{\Variable}{\ttfct{Variable}}
\newcommand{\Appl}[1]{\ttfct{Appl}\mathopen{\mtt[}#1\mathclose{\mtt]}}
\chapter{Defining Logics} \label{DefiningLogics}


This chapter is intended for Isabelle experts. It explains how to define new
logical systems, Isabelle's {\em raison d'\^etre}. Isabelle logics are
hierarchies of theories. A number of simple examples are contained in {\em
Introduction to Isabelle}; the full syntax of theory definition files ({\tt
.thy} files) is shown in {\em The Isabelle Reference Manual}. This chapter's
chief purpose is a thorough description of all syntax-related matters
concerning theories. The most important sections are \S\ref{sec:mixfix} about
mixfix declarations and \S\ref{sec:macros} describing the macro system. The
concluding examples of \S\ref{sec:min_logics} are more concerned with the
logical aspects of the definition of theories. Sections marked with * can be
skipped on the first reading.


%% FIXME move to Refman
% \section{Classes and types *}
% \index{*arities!context conditions}
%
% Type declarations are subject to the following two well-formedness
% conditions:
% \begin{itemize}
% \item There are no two declarations $ty :: (\vec{r})c$ and $ty :: (\vec{s})c$
%   with $\vec{r} \neq \vec{s}$.  For example
% \begin{ttbox}
% types ty 1
% arities ty :: ({\ttlbrace}logic{\ttrbrace}) logic
%         ty :: ({\ttlbrace}{\ttrbrace})logic
% \end{ttbox}
% leads to an error message and fails.
% \item If there are two declarations $ty :: (s@1,\dots,s@n)c$ and $ty ::
%   (s@1',\dots,s@n')c'$ such that $c' < c$ then $s@i' \preceq s@i$ must hold
%   for $i=1,\dots,n$.  The relationship $\preceq$, defined as
% \[ s' \preceq s \iff \forall c\in s. \exists c'\in s'.~ c'\le c, \]
% expresses that the set of types represented by $s'$ is a subset of the set of
% types represented by $s$.  For example
% \begin{ttbox}
% classes term < logic
% types ty 1
% arities ty :: ({\ttlbrace}logic{\ttrbrace})logic
%         ty :: ({\ttlbrace}{\ttrbrace})term
% \end{ttbox}
% leads to an error message and fails.
% \end{itemize}
% These conditions guarantee principal types~\cite{nipkowprehofer}.

\section{Precedence grammars} \label{sec:precedence_grammars}


The precise syntax of a logic is best defined by a \rmindex{context-free
grammar}. In order to simplify the description of mathematical languages, we
introduce an extended format which permits {\bf
precedences}\indexbold{precedence}. This scheme generalizes precedence
declarations in \ML\ and {\sc prolog}. In this extended grammar format,
nonterminals are decorated by integers, their precedence. In the sequel,
precedences are shown as subscripts. A nonterminal $A@p$ on the right-hand
side of a production may only be replaced using a production $A@q = \gamma$
where $p \le q$.


Formally, a set of context-free productions $G$ induces a derivation
relation $\rew@G$ on strings as follows:
\[ \alpha A@p \beta ~\rew@G~ \alpha\gamma\beta ~~~\mbox{iff}~~~
   \exists (A@q=\gamma) \in G.~ q \ge p
\]
Any extended grammar of this kind can be translated into a normal context-free
grammar. However, this translation may require the introduction of a
large number of new nonterminals and productions.


\begin{example} \label{ex:precedence}
The following simple grammar for arithmetic expressions demonstrates how
binding power and associativity of operators can be enforced by precedences.
\begin{center}
\begin{tabular}{rclr}
$A@9$ & = & {\tt0} \\
$A@9$ & = & {\tt(} $A@0$ {\tt)} \\
$A@0$ & = & $A@0$ {\tt+} $A@1$ \\
$A@2$ & = & $A@3$ {\tt*} $A@2$ \\
$A@3$ & = & {\tt-} $A@3$
\end{tabular}
\end{center}
The choice of precedences determines that {\tt -} binds tighter than {\tt *},
which binds tighter than {\tt +}, and that {\tt +} associates to the left and
{\tt *} to the right.
\end{example}


To minimize the number of subscripts, we adopt the following conventions:
\begin{itemize}
\item All precedences $p$ must be in the range $0 \leq p \leq max_pri$ for
  some fixed $max_pri$.
\item Precedence $0$ on the right-hand side and precedence $max_pri$ on the
  left-hand side may be omitted.
\end{itemize}
In addition, we write the production $A@p = \alpha$ as $A = \alpha~(p)$,
i.e.\ the precedence of the left-hand side is moved to the far right of the
production. Finally, alternatives may be separated by $|$, and repetition
indicated by \dots.

Using these conventions and assuming $max_pri=9$, the grammar in
Example~\ref{ex:precedence} becomes
\begin{center}
\begin{tabular}{rclc}
$A$ & = & {\tt0} & \hspace*{4em} \\
 & $|$ & {\tt(} $A$ {\tt)} \\
 & $|$ & $A$ {\tt+} $A@1$ & (0) \\
 & $|$ & $A@3$ {\tt*} $A@2$ & (2) \\
 & $|$ & {\tt-} $A@3$ & (3)
\end{tabular}
\end{center}
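
The precedence conventions can be made concrete with a small executable
model. The following Python sketch (illustrative only, not Isabelle code)
is a recursive-descent parser for the grammar of the example above: a call
with precedence argument $p$ plays the role of the nonterminal $A@p$, and a
production with left-hand side $A@q$ may be applied only when $q \ge p$.

```python
# Illustrative model, not Isabelle code: a recursive-descent parser for the
# arithmetic grammar above.  parse(tokens, p) corresponds to nonterminal A_p;
# a production with left-hand side A_q is applicable only if q >= p.

def parse(tokens, p=0):
    """Parse a prefix of `tokens` as A_p; return (tree, remaining tokens)."""
    if tokens and tokens[0] == '0':                 # A_9 = 0
        tree, rest = '0', tokens[1:]
    elif tokens and tokens[0] == '(':               # A_9 = ( A_0 )
        tree, rest = parse(tokens[1:], 0)
        if not rest or rest[0] != ')':
            raise SyntaxError('missing )')
        rest = rest[1:]
    elif tokens and tokens[0] == '-' and p <= 3:    # A_3 = - A_3
        arg, rest = parse(tokens[1:], 3)
        tree = ('-', arg)
    else:
        raise SyntaxError('unexpected input: %r' % tokens)
    while rest:
        if rest[0] == '+' and p <= 0:               # A_0 = A_0 + A_1
            right, rest = parse(rest[1:], 1)        # left-associative
            tree = ('+', tree, right)
        elif rest[0] == '*' and p <= 2:             # A_2 = A_3 * A_2
            right, rest = parse(rest[1:], 2)        # right-associative
            tree = ('*', tree, right)
        else:
            break
    return tree, rest
```

For instance, the input {\tt 0+0*0} yields the tree {\tt ('+', '0', ('*',
'0', '0'))}, confirming that {\tt *} binds tighter than {\tt +}.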


\section{Basic syntax} \label{sec:basic_syntax}

The basis of all extensions by object-logics is the \rmindex{Pure theory},
bound to the \ML-identifier \ttindex{Pure.thy}. It contains, among many other
things, the \rmindex{Pure syntax}. An informal account of this basic syntax
(meta-logic, types etc.) may be found in {\em Introduction to Isabelle}. A
more precise description using a precedence grammar is shown in
Figure~\ref{fig:pure_gram}.


\begin{figure}[htb]
\begin{center}
\begin{tabular}{rclc}
$prop$ &=& \ttindex{PROP} $aprop$ ~~$|$~~ {\tt(} $prop$ {\tt)} \\
     &$|$& $logic@3$ \ttindex{==} $logic@2$ & (2) \\
     &$|$& $logic@3$ \ttindex{=?=} $logic@2$ & (2) \\
     &$|$& $prop@2$ \ttindex{==>} $prop@1$ & (1) \\
     &$|$& {\tt[|} $prop$ {\tt;} \dots {\tt;} $prop$ {\tt|]} {\tt==>} $prop@1$ & (1) \\
     &$|$& {\tt!!} $idts$ {\tt.} $prop$ & (0) \\\\
$logic$ &=& $prop$ ~~$|$~~ $fun$ \\\\
$aprop$ &=& $id$ ~~$|$~~ $var$
    ~~$|$~~ $fun@{max_pri}$ {\tt(} $logic$ {\tt,} \dots {\tt,} $logic$ {\tt)} \\\\
$fun$ &=& $id$ ~~$|$~~ $var$ ~~$|$~~ {\tt(} $fun$ {\tt)} \\
    &$|$& \ttindex{\%} $idts$ {\tt.} $logic$ & (0) \\\\
$idts$ &=& $idt$ ~~$|$~~ $idt@1$ $idts$ \\\\
$idt$ &=& $id$ ~~$|$~~ {\tt(} $idt$ {\tt)} \\
    &$|$& $id$ \ttindex{::} $type$ & (0) \\\\
$type$ &=& $tfree$ ~~$|$~~ $tvar$ \\
     &$|$& $tfree$ {\tt::} $sort$ ~~$|$~~ $tvar$ {\tt::} $sort$ \\
     &$|$& $id$ ~~$|$~~ $type@{max_pri}$ $id$
    ~~$|$~~ {\tt(} $type$ {\tt,} \dots {\tt,} $type$ {\tt)} $id$ \\
     &$|$& $type@1$ \ttindex{=>} $type$ & (0) \\
     &$|$& {\tt[} $type$ {\tt,} \dots {\tt,} $type$ {\tt]} {\tt=>} $type$ & (0) \\
     &$|$& {\tt(} $type$ {\tt)} \\\\
$sort$ &=& $id$ ~~$|$~~ {\tt\ttlbrace\ttrbrace}
    ~~$|$~~ {\tt\ttlbrace} $id$ {\tt,} \dots {\tt,} $id$ {\tt\ttrbrace}
\end{tabular}\index{*"!"!}\index{*"[""|}\index{*""|"]}
\indexbold{type@$type$} \indexbold{sort@$sort$} \indexbold{idt@$idt$}
\indexbold{idts@$idts$} \indexbold{logic@$logic$} \indexbold{prop@$prop$}
\indexbold{fun@$fun$}
\end{center}
\caption{Meta-Logic Syntax}
\label{fig:pure_gram}
\end{figure}


The following main categories are defined:
\begin{description}
\item[$prop$] Terms of type $prop$, i.e.\ formulae of the meta-logic.

\item[$aprop$] Atomic propositions.

\item[$logic$] Terms of types in class $logic$. Initially, $logic$ contains
  merely $prop$. As the syntax is extended by new object-logics, more
  productions for $logic$ are added automatically (see below).

\item[$fun$] Terms potentially of function type.

\item[$type$] Meta-types.

\item[$idts$] A list of identifiers, possibly constrained by types. Note
  that \verb|x :: nat y| is parsed as \verb|x :: (nat y)|, i.e.\ {\tt y}
  would be treated like a type constructor applied to {\tt nat}.
\end{description}


\subsection{Logical types and default syntax}

Isabelle is concerned with mathematical languages which have a certain
minimal vocabulary: identifiers, variables, parentheses, and the lambda
calculus. Logical types, i.e.\ those of class $logic$, are automatically
equipped with this basic syntax. More precisely, for any type constructor
$ty$ with arity $(\dots)c$, where $c$ is a subclass of $logic$, the following
productions are added:
\begin{center}
\begin{tabular}{rclc}
$ty$ &=& $id$ ~~$|$~~ $var$ ~~$|$~~ {\tt(} $ty$ {\tt)} \\
  &$|$& $fun@{max_pri}$ {\tt(} $logic$ {\tt,} \dots {\tt,} $logic$ {\tt)}\\
  &$|$& $ty@{max_pri}$ {\tt::} $type$\\\\
$logic$ &=& $ty$
\end{tabular}
\end{center}


\subsection{Lexical matters *}

The parser does not process input strings directly; rather, it operates on
token lists provided by Isabelle's \rmindex{lexical analyzer} (the
\bfindex{lexer}). There are two different kinds of tokens: {\bf
literals}\indexbold{literal token}\indexbold{token!literal} and {\bf valued
tokens}\indexbold{valued token}\indexbold{token!valued}.

Literals can be regarded as reserved words\index{reserved word} of the syntax,
and the user can add new ones when extending theories. In
Figure~\ref{fig:pure_gram} they appear in typewriter type, e.g.\ {\tt PROP},
{\tt ==}, {\tt =?=}, {\tt ;}.

Valued tokens on the other hand have a fixed predefined syntax. The lexer
distinguishes four kinds of them: identifiers\index{identifier},
unknowns\index{unknown}\index{scheme variable|see{unknown}}, type
variables\index{type variable}, type unknowns\index{type unknown}\index{type
scheme variable|see{type unknown}}; they are denoted by $id$\index{id@$id$},
$var$\index{var@$var$}, $tfree$\index{tfree@$tfree$},
$tvar$\index{tvar@$tvar$}, respectively. Typical examples are {\tt x}, {\tt
?x7}, {\tt 'a}, {\tt ?'a3}; the exact syntax is:

\begin{tabular}{rcl}
$id$ & = & $letter~quasiletter^*$ \\
$var$ & = & ${\tt ?}id ~~|~~ {\tt ?}id{\tt .}nat$ \\
$tfree$ & = & ${\tt '}id$ \\
$tvar$ & = & ${\tt ?}tfree ~~|~~ {\tt ?}tfree{\tt .}nat$ \\[1ex]
$letter$ & = & one of {\tt a}\dots {\tt z} {\tt A}\dots {\tt Z} \\
$digit$ & = & one of {\tt 0}\dots {\tt 9} \\
$quasiletter$ & = & $letter ~~|~~ digit ~~|~~ {\tt _} ~~|~~ {\tt '}$ \\
$nat$ & = & $digit^+$
\end{tabular}


A string of $var$ or $tvar$ describes an \rmindex{unknown} which is
internally a pair of base name and index (\ML\ type \ttindex{indexname}).
These components are either explicitly separated by a dot as in {\tt ?x.1} or
{\tt ?x7.3}, or directly run together as in {\tt ?x1}. The latter form is
possible only if the base name does not end with digits. If the index is 0,
it may be dropped altogether: {\tt ?x} abbreviates {\tt ?x0} or {\tt ?x.0}.
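
The decomposition of an unknown into base name and index can be modelled in
a few lines. The following Python sketch is ours, not Isabelle's lexer; the
helper name {\tt indexname} is chosen after the \ML\ type mentioned above.

```python
# Illustrative sketch of the indexname convention described above; the
# helper name `indexname` is ours, not Isabelle's code.

def indexname(tok):
    """Split a var token such as ?x.1, ?x7.3, ?x1 or ?x into (base, index)."""
    assert tok.startswith('?')
    body = tok[1:]
    if '.' in body:                          # explicitly separated by a dot
        base, idx = body.rsplit('.', 1)
        return (base, int(idx))
    base = body.rstrip('0123456789')         # run-together form, e.g. ?x1
    return (base, int(body[len(base):] or 0))
```

For example, {\tt ?x7.3} splits into base {\tt x7} with index 3, while
{\tt ?x7} splits into base {\tt x} with index 7.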

104

247 


248 
Note that $id$, $var$, $tfree$, $tvar$ are mutually disjoint, but it is


249 
perfectly legal that they overlap with the set of literal tokens (e.g.\ {\tt


250 
PROP}, {\tt ALL}, {\tt EX}).


251 


252 
The lexical analyzer translates input strings to token lists by repeatedly


253 
taking the maximal prefix of the input string that forms a valid token. A


254 
maximal prefix that is both a literal and a valued token is treated as a


255 
literal. Spaces, tabs and newlines are separators; they never occur within


256 
tokens.


257 


258 
Note that literals need not necessarily be surrounded by white space to be


259 
recognized as separate. For example, if {\tt } is a literal but {\tt } is


260 
not, then the string {\tt } is treated as two consecutive occurrences of


261 
{\tt }. This is in contrast to \ML\ which would treat {\tt } always as a


262 
single symbolic name. The consequence of Isabelle's more liberal scheme is


263 
that the same string may be parsed in different ways after extending the


264 
syntax: after adding {\tt } as a literal, the input {\tt } is treated as


265 
a single token.
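
The maximal-munch behaviour just described, including the treatment of
{\tt -} versus {\tt --}, can be modelled by a short Python sketch. This is
an illustration only, not Isabelle's lexer, and the regular expression for
valued tokens is a simplification of the exact syntax given above.

```python
# Illustrative model of the maximal-munch rule: repeatedly take the longest
# prefix that is a valid token; a prefix that is both a literal and a valued
# token counts as a literal.  Not Isabelle's actual lexer.
import re

VALUED = re.compile(r"\?'?[A-Za-z][A-Za-z0-9_']*(\.[0-9]+)?"   # var / tvar
                    r"|'[A-Za-z][A-Za-z0-9_']*"                # tfree
                    r"|[A-Za-z][A-Za-z0-9_']*")                # id

def tokenize(s, literals):
    toks, i = [], 0
    while i < len(s):
        if s[i].isspace():               # separators never occur in tokens
            i += 1
            continue
        # longest literal starting here, if any
        lit = max((l for l in literals if s.startswith(l, i)),
                  key=len, default=None)
        m = VALUED.match(s, i)
        if m and (lit is None or len(m.group()) > len(lit)):
            toks.append(('valued', m.group())); i = m.end()
        elif lit is not None:            # ties go to the literal
            toks.append(('literal', lit)); i += len(lit)
        else:
            raise ValueError('bad input at ' + s[i:])
    return toks
```

With {\tt -} as the only literal, the input {\tt --} yields two tokens;
after adding {\tt --} to the literals, the same input yields one.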


\subsection{Inspecting syntax *}

You may get the \ML\ representation of the syntax of any Isabelle theory by
applying \index{*syn_of}
\begin{ttbox}
syn_of: theory -> Syntax.syntax
\end{ttbox}
\ttindex{Syntax.syntax} is an abstract type. Values of this type can be
displayed by the following functions: \index{*Syntax.print_syntax}
\index{*Syntax.print_gram} \index{*Syntax.print_trans}
\begin{ttbox}
Syntax.print_syntax: Syntax.syntax -> unit
Syntax.print_gram: Syntax.syntax -> unit
Syntax.print_trans: Syntax.syntax -> unit
\end{ttbox}
{\tt Syntax.print_syntax} shows virtually all information contained in a
syntax and is therefore quite verbose. Its output is divided into labeled
sections. The syntax proper is represented by {\tt lexicon}, {\tt roots} and
{\tt prods}. The rest refers to the manifold facilities to apply syntactic
translations (macro expansion etc.). See \S\ref{sec:macros} and
\S\ref{sec:tr_funs} for more details on translations.

To simplify coping with the verbosity of {\tt Syntax.print_syntax}, there are
\ttindex{Syntax.print_gram} to print the syntax proper only and
\ttindex{Syntax.print_trans} to print the translation-related information
only.

Let's have a closer look at part of Pure's syntax:
\begin{ttbox}
Syntax.print_syntax (syn_of Pure.thy);
{\out lexicon: "!!" "%" "(" ")" "," "." "::" ";" "==" "==>" \dots}
{\out roots: logic type fun prop}
{\out prods:}
{\out   type = tfree  (1000)}
{\out   type = tvar  (1000)}
{\out   type = id  (1000)}
{\out   type = tfree "::" sort[0] => "_ofsort"  (1000)}
{\out   type = tvar "::" sort[0] => "_ofsort"  (1000)}
{\out   \vdots}
{\out consts: "_K" "_appl" "_aprop" "_args" "_asms" "_bigimpl" \dots}
{\out parse_ast_translation: "_appl" "_bigimpl" "_bracket"}
{\out   "_idtyp" "_lambda" "_tapp" "_tappl"}
{\out parse_rules:}
{\out parse_translation: "!!" "_K" "_abs" "_aprop"}
{\out print_translation: "all"}
{\out print_rules:}
{\out print_ast_translation: "==>" "_abs" "_idts" "fun"}
\end{ttbox}


\begin{description}
\item[\ttindex{lexicon}]
  The set of literal tokens (i.e.\ reserved words, delimiters) used for
  lexical analysis.

\item[\ttindex{roots}]
  The legal syntactic categories to start parsing with. You name the
  desired root directly as a string when calling lower level functions or
  specifying macros. Higher level functions usually expect a type and
  derive the actual root as follows:\index{root_of_type@$root_of_type$}
  \begin{itemize}
  \item $root_of_type(\tau@1 \To \tau@2) = \mtt{fun}$.

  \item $root_of_type(\tau@1 \mathrel{ty} \tau@2) = ty$.

  \item $root_of_type((\tau@1, \dots, \tau@n)ty) = ty$.

  \item $root_of_type(\alpha) = \mtt{logic}$.
  \end{itemize}
  Thereby $\tau@1, \dots, \tau@n$ are types, $ty$ a type infix or ordinary
  type constructor and $\alpha$ a type variable or unknown. Note that only
  the outermost type constructor is taken into account.


\item[\ttindex{prods}]
  The list of productions describing the precedence grammar. Nonterminals
  $A@n$ are rendered in ASCII as {\tt $A$[$n$]}, literal tokens are quoted.
  Note that some productions have strings attached after an {\tt =>}. These
  strings later become the heads of parse trees, but they also play a vital
  role when terms are printed (see \S\ref{sec:asts}).

  Productions which do not have a string attached and thus do not create a
  new parse tree node are called {\bf copy productions}\indexbold{copy
  production}. They must have exactly one
  argument\index{argument!production} (i.e.\ nonterminal or valued token)
  on the right-hand side. The parse tree generated when parsing that
  argument is simply passed up as the result of parsing the whole copy
  production.

  A special kind of copy production is one where the argument is a
  nonterminal and no additional syntactic sugar (literals) exists. It is
  called a \bfindex{chain production}. Chain productions should be seen as
  an abbreviation mechanism: conceptually, they are removed from the
  grammar by adding appropriate new rules. Precedence information attached
  to chain productions is ignored; only the dummy value $-1$ is displayed.

\item[\ttindex{consts}, \ttindex{parse_rules}, \ttindex{print_rules}]
  This is macro-related information (see \S\ref{sec:macros}).

\item[\tt *_translation]
  \index{*parse_ast_translation} \index{*parse_translation}
  \index{*print_translation} \index{*print_ast_translation}
  The sets of constants that invoke translation functions. These are more
  arcane matters (see \S\ref{sec:asts} and \S\ref{sec:tr_funs}).
\end{description}
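
The $root_of_type$ mapping above can be sketched directly. The following
Python model (ours, not Isabelle's) uses a toy type representation: a
constructor application is a tuple whose first component is the constructor
name, with function types appearing as ordinary applications of {\tt fun},
and a type variable or unknown is a plain string.

```python
# Illustrative model of root_of_type with a toy type representation (not
# Isabelle's): ('fun', "'a", "'b") stands for 'a => 'b, ('list', "'a") for
# 'a list, and a plain string such as "'a" or "?'a" for a type variable
# or type unknown.

def root_of_type(T):
    if isinstance(T, tuple):   # only the outermost constructor matters
        return T[0]
    return 'logic'             # type variables and unknowns map to logic
```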


Of course you may inspect the syntax of any theory using the above calling
sequence. Beware that, as more and more material accumulates, the output
becomes even more verbose. When extending syntaxes, new {\tt roots}, {\tt
prods}, {\tt parse_rules} and {\tt print_rules} are appended to the end. The
other lists are displayed sorted.


380 
\section{Abstract syntax trees} \label{sec:asts}


Figure~\ref{fig:parse_print} shows a simplified model of the parsing and
printing process.


\begin{figure}[htb]
\begin{center}
\begin{tabular}{cl}
string & \\
$\downarrow$ & parser \\
parse tree & \\
$\downarrow$ & \rmindex{parse ast translation} \\
ast & \\
$\downarrow$ & ast rewriting (macros) \\
ast & \\
$\downarrow$ & \rmindex{parse translation}, type-inference \\
--- well-typed term --- & \\
$\downarrow$ & \rmindex{print translation} \\
ast & \\
$\downarrow$ & ast rewriting (macros) \\
ast & \\
$\downarrow$ & \rmindex{print ast translation}, printer \\
string &
\end{tabular}
\end{center}
\caption{Parsing and Printing}
\label{fig:parse_print}
\end{figure}


The parser takes an input string (more precisely the token list produced by
the lexer) and produces a \rmindex{parse tree}, which is directly derived
from the productions involved. By applying some internal transformations the
parse tree becomes an \bfindex{abstract syntax tree} (\bfindex{ast}). Macro
expansion, further translations and finally type inference yield a
well-typed term\index{term!well-typed}.

The printing process is the reverse, except for some subtleties to be
discussed later.


Asts are an intermediate form between the very raw parse trees and the typed
$\lambda$-terms. An ast is either an atom (constant or variable) or a list of
{\em at least two\/} sub-asts. Internally, they have type
\ttindex{Syntax.ast}, which is defined as: \index{*Constant}
\index{*Variable} \index{*Appl}
\begin{ttbox}
datatype ast =
  Constant of string |
  Variable of string |
  Appl of ast list
\end{ttbox}

Notation: we write constant atoms as quoted strings, variable atoms as
non-quoted strings and applications as lists of sub-asts separated by spaces
and enclosed in parentheses. For example:
\begin{ttbox}
Appl [Constant "_constrain",
  Appl [Constant "_abs", Variable "x", Variable "t"],
  Appl [Constant "fun", Variable "'a", Variable "'b"]]
{\rm is written as} ("_constrain" ("_abs" x t) ("fun" 'a 'b))
\end{ttbox}

Note that {\tt ()} and {\tt (f)} are both illegal.
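
The notation just introduced can be generated mechanically. The following
Python sketch (an illustration, not Isabelle's ML) mirrors the {\tt ast}
datatype and prints asts in the format used throughout this chapter.

```python
# Illustrative Python counterpart of the ast datatype and the notation used
# in the text: constants print quoted, variables unquoted, applications as
# parenthesized lists of at least two sub-asts.

class Constant(str): pass    # constant atom
class Variable(str): pass    # variable atom
                             # applications are plain Python lists

def show(ast):
    if isinstance(ast, Constant):
        return '"%s"' % ast
    if isinstance(ast, Variable):
        return str(ast)
    assert len(ast) >= 2     # () and (f) are both illegal
    return '(' + ' '.join(show(a) for a in ast) + ')'
```

Applied to the example above, {\tt show} yields exactly
{\tt ("_constrain" ("_abs" x t) ("fun" 'a 'b))}.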


The resemblance to LISP's S-expressions is intentional, but notice the two
kinds of atomic symbols: $\Constant x$ and $\Variable x$. This distinction
has some more obscure reasons and you can ignore it about half of the time.
You should not take the names ``{\tt Constant}'' and ``{\tt Variable}'' too
literally. In the later translation to terms, $\Variable x$ may become a
constant, free or bound variable, even a type constructor or class name; the
actual outcome depends on the context.

Similarly, you can think of ${\tt (} f~x@1~\ldots~x@n{\tt )}$ as some sort of
application of $f$ to the arguments $x@1, \ldots, x@n$. But the actual kind
of application (say a type constructor $f$ applied to types $x@1, \ldots,
x@n$) is determined later by context, too.


Forms like {\tt (("_abs" x $t$) $u$)} are perfectly legal, but asts are not
higher-order: the {\tt "_abs"} does not yet bind the {\tt x} in any way,
though later at the term level, {\tt ("_abs" x $t$)} will become an {\tt Abs}
node and occurrences of {\tt x} in $t$ will be replaced by {\tt Bound}s. Even
though non-constant heads of applications may seem unusual, asts should be
regarded as first-order. Purists may think of ${\tt (} f~x@1~\ldots~x@n{\tt
)}$ as a first-order application of some invisible $(n+1)$-ary constant.

\subsection{Parse trees to asts}

Asts are generated from parse trees by applying some translation functions,
which are internally called {\bf parse ast translations}\indexbold{parse ast
translation}.

We can think of the raw output of the parser as trees built up by nesting the
right-hand sides of those productions that were used to derive a word that
matches the input token list. Then parse trees are simply lists of tokens and
sub parse trees, the latter replacing the nonterminals of the productions.
Note that we refer here to the actual productions in their internal form as
displayed by {\tt Syntax.print_syntax}.


Ignoring parse ast translations, the mapping
$ast_of_pt$\index{ast_of_pt@$ast_of_pt$} from parse trees to asts is derived
from the productions involved as follows:
\begin{itemize}
\item Valued tokens: $ast_of_pt(t) = \Variable s$, where $t$ is an $id$,
  $var$, $tfree$ or $tvar$ token, and $s$ its value.

\item Copy productions: $ast_of_pt(\ldots P \ldots) = ast_of_pt(P)$.

\item $0$-ary productions: $ast_of_pt(\ldots \mtt{=>} c) = \Constant c$.

\item $n$-ary productions: $ast_of_pt(\ldots P@1 \ldots P@n \ldots \mtt{=>}
  c) = \Appl{\Constant c,$ $ast_of_pt(P@1),$ $\ldots,$ $ast_of_pt(P@n)}$,
  where $n \ge 1$.
\end{itemize}
Thereby $P, P@1, \ldots, P@n$ denote sub parse trees or valued tokens and
``\dots'' zero or more literal tokens. That means literal tokens are stripped
and do not appear in asts.
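
These clauses are easily rendered executable. In the following illustrative
Python sketch (our own modelling, not Isabelle's implementation), a parse
tree is a pair of the constant attached to the production ({\tt None} for
copy productions) and its argument list, with valued tokens represented by
their values as plain strings and literal tokens already stripped.

```python
# Illustrative model of ast_of_pt.  A parse tree is (c, args): c is the
# constant attached to the production (None for copy productions), args the
# sub parse trees; a valued token is just its value as a string.  Asts are
# ('Constant', c), ('Variable', x), or a list for applications.

def ast_of_pt(pt):
    if isinstance(pt, str):                  # valued token
        return ('Variable', pt)
    c, args = pt
    if c is None:                            # copy production: pass up
        assert len(args) == 1
        return ast_of_pt(args[0])
    if not args:                             # 0-ary production
        return ('Constant', c)
    return [('Constant', c)] + [ast_of_pt(a) for a in args]   # n-ary
```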


The following table presents some simple examples:

{\tt\begin{tabular}{ll}
\rm input string & \rm ast \\\hline
"f" & f \\
"'a" & 'a \\
"t == u" & ("==" t u) \\
"f(x)" & ("_appl" f x) \\
"f(x, y)" & ("_appl" f ("_args" x y)) \\
"f(x, y, z)" & ("_appl" f ("_args" x ("_args" y z))) \\
"\%x y.\ t" & ("_lambda" ("_idts" x y) t) \\
\end{tabular}}


Some of these examples illustrate why further translations are desirable in
order to provide some nice standard form of asts before macro expansion takes
place. Hence the Pure syntax provides predefined parse ast
translations\index{parse ast translation!of Pure} for ordinary applications,
type applications, nested abstractions, meta implications and function types.
Their net effect on some representative input strings is shown in
Figure~\ref{fig:parse_ast_tr}.


\begin{figure}[htb]
\begin{center}
{\tt\begin{tabular}{ll}
\rm string & \rm ast \\\hline
"f(x, y, z)" & (f x y z) \\
"'a ty" & (ty 'a) \\
"('a, 'b) ty" & (ty 'a 'b) \\
"\%x y z.\ t" & ("_abs" x ("_abs" y ("_abs" z t))) \\
"\%x ::\ 'a.\ t" & ("_abs" ("_constrain" x 'a) t) \\
"[| P; Q; R |] => S" & ("==>" P ("==>" Q ("==>" R S))) \\
"['a, 'b, 'c] => 'd" & ("fun" 'a ("fun" 'b ("fun" 'c 'd)))
\end{tabular}}
\end{center}
\caption{Built-in Parse Ast Translations}
\label{fig:parse_ast_tr}
\end{figure}


This translation process is guided by names of constant heads of asts. The
list of constants invoking parse ast translations is shown in the output of
{\tt Syntax.print_syntax} under {\tt parse_ast_translation}.
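
As an illustration of what such a translation does, the following Python
sketch (not Isabelle's implementation) flattens the {\tt "_appl"}/{\tt
"_args"} scheme of the table above into an ordinary application, using the
same toy ast encoding as in the earlier sketches: tagged tuples for atoms
and lists for applications.

```python
# Illustrative sketch of the effect of the built-in "_appl"/"_args" parse
# ast translation: ("_appl" f ("_args" x ("_args" y z))) becomes (f x y z).
# Asts are modelled as ('Constant', c), ('Variable', x), or lists.

def args_list(ast):
    """Unfold nested ("_args" x ("_args" y z)) into [x, y, z]."""
    if isinstance(ast, list) and ast[0] == ('Constant', '_args'):
        return [ast[1]] + args_list(ast[2])
    return [ast]

def appl_tr(f, arg):
    """Translate the arguments of ("_appl" f arg) into (f x1 ... xn)."""
    return [f] + args_list(arg)
```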


\subsection{Asts to terms *}

After the ast has been normalized by the macro expander (see
\S\ref{sec:macros}), it is transformed into a term with yet another set of
translation functions interspersed: {\bf parse translations}\indexbold{parse
translation}. The result is a non-typed term\index{term!non-typed} which may
contain type constraints, that is, 2-place applications with head {\tt
"_constrain"}. The second argument of a constraint is a type encoded as a
term. Real types will be introduced later during type inference, which is not
discussed here.


If we ignore the built-in parse translations of Pure for the moment, then the
mapping $term_of_ast$\index{term_of_ast@$term_of_ast$} from asts to
(non-typed) terms is defined by:
\begin{itemize}
\item $term_of_ast(\Constant x) = \ttfct{Const} (x, \mtt{dummyT})$.

\item $term_of_ast(\Variable \mtt{"?}xi\mtt") = \ttfct{Var} ((x, i),
  \mtt{dummyT})$, where $x$ is the base name and $i$ the index extracted
  from $xi$.

\item $term_of_ast(\Variable x) = \ttfct{Free} (x, \mtt{dummyT})$.

\item $term_of_ast(\Appl{f, x@1, \ldots, x@n}) = term_of_ast(f) ~{\tt\$}~
  term_of_ast(x@1)$ {\tt\$} \dots\ {\tt\$} $term_of_ast(x@n)$.
\end{itemize}
Note that {\tt Const}, {\tt Var}, {\tt Free} belong to the datatype {\tt
term} and \ttindex{dummyT} is bound to some dummy {\tt typ}, which is ignored
during type inference.
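
These clauses can again be modelled directly. The following Python sketch
(ours, not Isabelle's ML) renders terms as tagged tuples with the dummy
type left implicit, and splits unknowns into base name and index as in the
lexical section above.

```python
# Illustrative model of term_of_ast.  Terms are rendered as ('Const', x),
# ('Free', x), ('Var', (base, index)) and ('$', f, x) for application;
# dummyT is left implicit.  Asts are ('Constant', c), ('Variable', x), or
# lists for applications.

def split_indexname(s):
    """?x1 -> ('x', 1), ?x7.3 -> ('x7', 3), ?x -> ('x', 0)."""
    body = s[1:]
    if '.' in body:
        base, idx = body.rsplit('.', 1)
        return (base, int(idx))
    base = body.rstrip('0123456789')
    return (base, int(body[len(base):] or 0))

def term_of_ast(ast):
    if ast[0] == 'Constant':
        return ('Const', ast[1])
    if ast[0] == 'Variable':
        if ast[1].startswith('?'):           # unknown ?xi becomes a Var
            return ('Var', split_indexname(ast[1]))
        return ('Free', ast[1])
    f, *xs = ast                             # Appl: build left-nested $
    term = term_of_ast(f)
    for x in xs:
        term = ('$', term, term_of_ast(x))
    return term
```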


So far the outcome is still a first-order term. {\tt Abs}s and {\tt Bound}s
are introduced by parse translations associated with {\tt "_abs"} and {\tt
"!!"} (and any other user-defined binder).


\subsection{Printing of terms *}

When terms are prepared for printing, they are first transformed into asts
via $ast_of_term$\index{ast_of_term@$ast_of_term$} (ignoring {\bf print
translations}\indexbold{print translation}):
\begin{itemize}
\item $ast_of_term(\ttfct{Const} (x, \tau)) = \Constant x$.

\item $ast_of_term(\ttfct{Free} (x, \tau)) = constrain (\Variable x,
  \tau)$.

\item $ast_of_term(\ttfct{Var} ((x, i), \tau)) = constrain (\Variable
  \mtt{"?}xi\mtt", \tau)$, where $\mtt?xi$ is the string representation of
  the {\tt indexname} $(x, i)$.

\item $ast_of_term(\ttfct{Abs} (x, \tau, t)) = \ttfct{Appl}
  \mathopen{\mtt[} \Constant \mtt{"_abs"}, constrain(\Variable x', \tau),$
  $ast_of_term(t') \mathclose{\mtt]}$, where $x'$ is a version of $x$
  renamed away from all names occurring in $t$, and $t'$ the body $t$ with
  all {\tt Bound}s referring to this {\tt Abs} replaced by $\ttfct{Free}
  (x', \mtt{dummyT})$.

\item $ast_of_term(\ttfct{Bound} i) = \Variable \mtt{"B.}i\mtt"$. Note that
  the occurrence of loose bound variables is abnormal and should never
  happen when printing well-typed terms.

\item $ast_of_term(f \ttrel{\$} x@1 \ttrel{\$} \ldots \ttrel{\$} x@n) =
  \ttfct{Appl} \mathopen{\mtt[} ast_of_term(f), ast_of_term(x@1), \ldots,$
  $ast_of_term(x@n) \mathclose{\mtt]}$, where $f$ is not an application.

\item $constrain(x, \tau) = x$, if $\tau = \mtt{dummyT}$ \index{*dummyT} or
  \ttindex{show_types} is not set to {\tt true}.

\item $constrain(x, \tau) = \Appl{\Constant \mtt{"_constrain"}, x, ty}$
  otherwise, where $ty$ is the ast encoding of $\tau$. That is: type
  constructors as {\tt Constant}s, type variables as {\tt Variable}s and
  type applications as {\tt Appl}s with the head type constructor as first
  element. Additionally, if \ttindex{show_sorts} is set to {\tt true}, some
  type variables are decorated with an ast encoding their sort.
\end{itemize}


\medskip
After an ast has been normalized wrt.\ the print macros, it is transformed
into the final output string. The built-in {\bf print ast
translations}\indexbold{print ast translation} are essentially the reverse
ones of the parse ast translations of Figure~\ref{fig:parse_ast_tr}.

For the actual printing process, the names attached to grammar productions of
the form $\ldots A@{p@1}^1 \ldots A@{p@n}^n \ldots \mtt{=>} c$ play a vital
role. Whenever an ast with constant head $c$, i.e.\ $\mtt"c\mtt"$ or
$(\mtt"c\mtt"~ x@1 \ldots x@n)$, is encountered it is printed according to
the production for $c$. This means that first the arguments $x@1$ \dots\
$x@n$ are printed, then put in parentheses if necessary for reasons of
precedence, and finally joined to a single string, separated by the syntactic
sugar of the production (denoted by ``\dots'' above).


If an application $(\mtt"c\mtt"~ x@1 \ldots x@m)$ has more arguments than the
corresponding production, it is first split into $((\mtt"c\mtt"~ x@1 \ldots
x@n) ~ x@{n+1} \ldots x@m)$. Applications with too few arguments or with
non-constant head or without a corresponding production are printed as
$f(x@1, \ldots, x@l)$ or $(\alpha@1, \ldots, \alpha@l) ty$. A single
$\Variable x$ is simply printed as $x$.


Note that the system does {\em not\/} insert blanks automatically. They
should be part of the mixfix declaration (which provides the user interface
for defining syntax) if they are required to separate tokens. Mixfix
declarations may also contain pretty printing annotations. See
\S\ref{sec:mixfix} for details.

\section{Mixfix declarations} \label{sec:mixfix}


When defining theories, the user usually declares new constants as names
associated with a type followed by an optional \bfindex{mixfix annotation}.
If none of the constants are introduced with mixfix annotations, there is no
concrete syntax to speak of: terms can only be abstractions or applications
of the form $f(t@1, \dots, t@n)$. Since this notation quickly becomes
unreadable, Isabelle supports syntax definitions in the form of unrestricted
context-free \index{context-free grammar} \index{precedence grammar}
precedence grammars using mixfix annotations.


655 

108

656 
Mixfix annotations describe the {\em concrete\/} syntax, a standard


657 
translation into the abstract syntax and a pretty printing scheme, all in


658 
one. Isabelle syntax definitions are inspired by \OBJ's~\cite{OBJ} {\em


659 
mixfix\/} syntax. Each mixfix annotation defines a precedence grammar


660 
production and optionally associates a constant with it.

104

661 


662 
There is a general form of mixfix annotation exhibiting the full power of
extending a theory's syntax, and some shortcuts for common cases like infix
operators.

The general \bfindex{mixfix declaration} as it may occur within the {\tt
consts} section\index{consts section@{\tt consts} section} of a {\tt .thy}
file specifies a constant declaration and a grammar production at the same
time. It has the form {\tt $c$ ::\ "$\tau$" ("$sy$" $ps$ $p$)} and is
interpreted as follows:
\begin{itemize}
\item {\tt $c$ ::\ "$\tau$"} is the actual constant declaration without any
  syntactic significance.

\item $sy$ is the right-hand side of the production, specified as a
  symbolic string. In general, $sy$ is of the form $\alpha@0 \_ \alpha@1
  \dots \alpha@{n-1} \_ \alpha@n$, where each occurrence of \ttindex{_}
  denotes an argument\index{argument!mixfix} position and the strings
  $\alpha@i$ do not contain (non-escaped) {\tt _}. The $\alpha@i$ may
  consist of delimiters\index{delimiter},
  spaces\index{space (pretty printing)} or \rmindex{pretty printing}
  annotations (see below).

\item $\tau$ specifies the syntactic categories of the left-hand and the
  right-hand side of the production. Arguments may be nonterminals or
  valued tokens. If $sy$ is of the form above, $\tau$ must be a nested
  function type with at least $n$ argument positions, say $\tau = [\tau@1,
  \dots, \tau@n] \To \tau'$. The syntactic category of argument $i$ is
  derived from type $\tau@i$ (see $root_of_type$ in
  \S\ref{sec:basic_syntax}). The result, i.e.\ the left-hand side of the
  production, is derived from type $\tau'$. Note that the $\tau@i$ and
  $\tau'$ may be function types.

\item $c$ is the name of the constant associated with the production. If
  $c$ is the empty string {\tt ""} then this is a \rmindex{copy
  production}. Otherwise, parsing an instance of the phrase $sy$ generates
  the ast {\tt ("$c$" $a@1$ $\ldots$ $a@n$)}, where $a@i$ is the ast
  generated by parsing the $i$th argument.

\item $ps$ is an optional list of at most $n$ integers, say {\tt [$p@1$,
  $\ldots$, $p@m$]}, where $p@i$ is the minimal \rmindex{precedence}
  required of any phrase that may appear as the $i$th argument. Missing
  precedences default to $0$.

\item $p$ is the \rmindex{precedence} of this production.
\end{itemize}

Precedences\index{precedence!range of} may range between $0$ and
$max_pri$\indexbold{max_pri@$max_pri$} (= 1000). If you want to ignore them,
the safest way to do so is to use the declaration {\tt $c$ ::\ "$\tau$"
("$sy$")}. The resulting production puts no precedence constraints on any of
its arguments and has maximal precedence itself.

\begin{example}
The following theory specification contains a {\tt consts} section that
encodes the precedence grammar of Example~\ref{ex:precedence} as mixfix
declarations:
\begin{ttbox}
EXP = Pure +
types
  exp 0
arities
  exp :: logic
consts
  "0" :: "exp"                ("0" 9)
  "+" :: "[exp, exp] => exp"  ("_ + _" [0, 1] 0)
  "*" :: "[exp, exp] => exp"  ("_ * _" [3, 2] 2)
  "-" :: "exp => exp"         ("- _" [3] 3)
end
\end{ttbox}
Note that the {\tt arities} declaration causes {\tt exp} to be added to the
syntax's roots. If you put the above text into a file {\tt exp.thy} and load
it via {\tt use_thy "exp"}, you can run some tests:
\begin{ttbox}
val read_exp = Syntax.test_read (syn_of EXP.thy) "exp";
read_exp "0 * 0 * 0 * 0 + 0 + 0 + 0";
{\out tokens: "0" "*" "0" "*" "0" "*" "0" "+" "0" "+" "0" "+" "0"}
{\out raw: ("+" ("+" ("+" ("*" "0" ("*" "0" ("*" "0" "0"))) "0") "0") "0")}
{\out \vdots}
read_exp "0 + - 0 + 0";
{\out tokens: "0" "+" "-" "0" "+" "0"}
{\out raw: ("+" ("+" "0" ("-" "0")) "0")}
{\out \vdots}
\end{ttbox}
The output of \ttindex{Syntax.test_read} includes the token list ({\tt
tokens}) and the raw ast directly derived from the parse tree, ignoring parse
ast translations. The rest is tracing information provided by the macro
expander (see \S\ref{sec:macros}).

Executing {\tt Syntax.print_gram (syn_of EXP.thy)} reveals the actual grammar
productions derived from the above mixfix declarations (lots of additional
information deleted):
\begin{ttbox}
exp = "0"  =>  "0" (9)
exp = exp[0] "+" exp[1]  =>  "+" (0)
exp = exp[3] "*" exp[2]  =>  "*" (2)
exp = "-" exp[3]  =>  "-" (3)
\end{ttbox}
\end{example}

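How the precedences in these productions steer parsing can be mimicked by a
tiny precedence-climbing parser. This is a plain Python sketch, not
Isabelle's actual parser, and all names are invented for illustration; the
left-argument precedences are enforced only implicitly by where the loop may
continue, which happens to suffice for this grammar:

```python
# Illustrative precedence-climbing parser for the EXP grammar (plain
# Python sketch, not Isabelle's parser).  Raw asts appear as nested tuples.
def parse(tokens, min_pri=0):
    """Parse a token list as an exp phrase of precedence >= min_pri."""
    tok, rest = tokens[0], tokens[1:]
    if tok == "0":                       # exp = "0"  =>  "0" (9)
        ast = "0"
    elif tok == "-" and min_pri <= 3:    # exp = "-" exp[3]  =>  "-" (3)
        ast, rest = parse(rest, 3)
        ast = ("-", ast)
    else:
        raise SyntaxError("unexpected token: " + tok)
    while rest:
        # exp = exp[0] "+" exp[1]  =>  "+" (0)   -- left associative
        if rest[0] == "+" and min_pri <= 0:
            right, rest = parse(rest[1:], 1)
            ast = ("+", ast, right)
        # exp = exp[3] "*" exp[2]  =>  "*" (2)   -- right associative
        elif rest[0] == "*" and min_pri <= 2:
            right, rest = parse(rest[1:], 2)
            ast = ("*", ast, right)
        else:
            break
    return ast, rest
```

Parsing the token lists of the session above reproduces the nesting of the
raw asts: sums group to the left, products to the right.
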
Let us now have a closer look at the structure of the string $sy$ appearing
in mixfix annotations. This string specifies a list of parsing and printing
directives, namely delimiters\index{delimiter},
arguments\index{argument!mixfix}, spaces\index{space (pretty printing)},
blocks of indentation\index{block (pretty printing)} and optional or forced
line breaks\index{break (pretty printing)}. These are encoded via the
following character sequences:

\begin{description}
\item[~\ttindex{_}~] An argument\index{argument!mixfix} position.

\item[~$d$~] A \rmindex{delimiter}, i.e.\ a non-empty sequence of
  non-special or escaped characters. Escaping a
  character\index{escape character} means preceding it with a {\tt '}
  (quote). Thus you have to write {\tt ''} if you really want a single
  quote. Delimiters may never contain white space, though.

\item[~$s$~] A non-empty sequence of spaces\index{space (pretty printing)}
  for printing.

\item[~\ttindex{(}$n$~] Open a block\index{block (pretty printing)}. $n$ is
  an optional sequence of digits that specifies the amount of indentation
  to be added when a line break occurs within the block. If {\tt(} is not
  followed by a digit, the indentation defaults to $0$.

\item[~\ttindex{)}~] Close a block.

\item[~\ttindex{//}~] Force a line break\index{break (pretty printing)}.

\item[~\ttindex{/}$s$~] Allow a line break\index{break (pretty printing)}.
  Spaces $s$ right after {\tt /} are only printed if the break is not
  taken.
\end{description}

In terms of parsing, arguments are nonterminals or valued tokens, while
delimiters are literal tokens. The other directives are significant only
for printing. The \rmindex{pretty printing} mechanism of Isabelle is
essentially the one described in \cite{paulson91}.

\subsection{Infixes}

Infix\index{infix} operators associating to the left or right can be declared
conveniently using \ttindex{infixl} or \ttindex{infixr}.

Roughly speaking, the form {\tt $c$ ::\ "$\tau$" (infixl $p$)} abbreviates:
\begin{ttbox}
"op \(c\)" ::\ "\(\tau\)"   ("op \(c\)")
"op \(c\)" ::\ "\(\tau\)"   ("(_ \(c\)/ _)" [\(p\), \(p + 1\)] \(p\))
\end{ttbox}
and {\tt $c$ ::\ "$\tau$" (infixr $p$)} abbreviates:
\begin{ttbox}
"op \(c\)" ::\ "\(\tau\)"   ("op \(c\)")
"op \(c\)" ::\ "\(\tau\)"   ("(_ \(c\)/ _)" [\(p + 1\), \(p\)] \(p\))
\end{ttbox}

Thus, prefixing infixes with \ttindex{op} makes them behave like ordinary
function symbols. Special characters occurring in $c$ have to be escaped as
in delimiters. Also note that the expanded forms above are illegal at the
user level because of duplicate declarations of constants.

\subsection{Binders}

A \bfindex{binder} is a variable-binding construct, such as a
\rmindex{quantifier}. The constant declaration \indexbold{*binder}
\begin{ttbox}
\(c\) ::\ "\(\tau\)"   (binder "\(Q\)" \(p\))
\end{ttbox}
introduces a binder $c$ of type $\tau$, which must have the form $(\tau@1 \To
\tau@2) \To \tau@3$. Its concrete syntax is $Q~x.P$. A binder is like a
generalized quantifier where $\tau@1$ is the type of the bound variable $x$,
$\tau@2$ the type of the body $P$, and $\tau@3$ the type of the whole term.
For example $\forall$ can be declared like this:
\begin{ttbox}
All :: "('a => o) => o"   (binder "ALL " 10)
\end{ttbox}
This allows us to write $\forall x.P$ either as {\tt All(\%$x$.$P$)} or {\tt
ALL $x$.$P$}. When printing terms, Isabelle usually uses the latter form, but
has to fall back on $\mtt{All}(P)$ if $P$ is not an abstraction.

Binders $c$ of type $(\sigma \To \tau) \To \tau$ can be nested; then the
internal form $c(\lambda x@1. c(\lambda x@2. \ldots c(\lambda x@n. P)
\ldots))$ corresponds to the external $Q~x@1~x@2 \ldots x@n. P$.

\medskip
The general binder declaration
\begin{ttbox}
\(c\) ::\ "(\(\tau@1\) => \(\tau@2\)) => \(\tau@3\)"   (binder "\(Q\)" \(p\))
\end{ttbox}
is internally expanded to
\begin{ttbox}
\(c\)   ::\ "(\(\tau@1\) => \(\tau@2\)) => \(\tau@3\)"
"\(Q\)" ::\ "[idts, \(\tau@2\)] => \(\tau@3\)"   ("(3\(Q\)_./ _)" \(p\))
\end{ttbox}
with $idts$ being the syntactic category for a list of $id$s, optionally
constrained (see Figure~\ref{fig:pure_gram}). Note that special characters in
$Q$ have to be escaped as in delimiters.

Additionally, a parse translation\index{parse translation!for binder} for $Q$
and a print translation\index{print translation!for binder} for $c$ are
installed. These perform the translation between the internal and external
forms behind the scenes.

\section{Syntactic translations (macros)} \label{sec:macros}

So far we have pretended that there is a close enough relationship between
concrete and abstract syntax to allow an automatic translation from one to
the other, using the constant name supplied with each non-copy production. In
many cases this scheme is not powerful enough. Some typical examples involve
variable binding constructs (e.g.\ {\tt ALL x:A.P} vs.\ {\tt Ball(A, \%x.P)})
or convenient notations for enumerations like finite sets, lists etc.\ (e.g.\
{\tt [x, y, z]} vs.\ {\tt Cons(x, Cons(y, Cons(z, Nil)))}).

Isabelle offers such translation facilities at two different levels, namely
{\bf macros}\indexbold{macro} and {\bf translation functions}.

Macros are specified by first-order rewriting systems that operate on asts.
They are usually easy to read and in most cases not very difficult to write.
Unfortunately, some more obscure translations cannot be expressed as macros
and you have to fall back on the more powerful mechanism of translation
functions written in \ML. These are quite unreadable and hard to write (see
\S\ref{sec:tr_funs}).

\medskip
Let us now get started with the macro system by a simple example:

\begin{example}~ \label{ex:set_trans}

\begin{ttbox}
SET = Pure +
types
  i, o 0
arities
  i, o :: logic
consts
  Trueprop       :: "o => prop"             ("_" 5)
  Collect        :: "[i, i => o] => i"
  "{\at}Collect"     :: "[idt, i, o] => i"      ("(1{\ttlbrace}_:_./ _{\ttrbrace})")
  Replace        :: "[i, [i, i] => o] => i"
  "{\at}Replace"     :: "[idt, idt, i, o] => i" ("(1{\ttlbrace}_./ _:_, _{\ttrbrace})")
  Ball           :: "[i, i => o] => o"
  "{\at}Ball"        :: "[idt, i, o] => o"      ("(3ALL _:_./ _)" 10)
translations
  "{\ttlbrace}x:A. P{\ttrbrace}"    == "Collect(A, %x. P)"
  "{\ttlbrace}y. x:A, Q{\ttrbrace}" == "Replace(A, %x y. Q)"
  "ALL x:A. P"   == "Ball(A, %x. P)"
end
\end{ttbox}

This and the following theories are complete working examples, though they
are fragmentary as they contain merely syntax. They are somewhat fashioned
after {\tt ZF/zf.thy}, where you should look for a good real-world example.

{\tt SET} defines constants for set comprehension ({\tt Collect}),
replacement ({\tt Replace}) and bounded universal quantification ({\tt
Ball}). Without additional syntax you would have to express $\forall x \in A.
P(x)$ as {\tt Ball(A, P)}. Since this is quite awkward, we define additional
constants with appropriate concrete syntax. These constants are decorated
with {\tt\at} to stress their purely syntactic purpose; they should never
occur within the final well-typed terms. Another consequence is that the user
cannot refer to such names directly, since they are not legal identifiers.

The translations cause the replacement of external forms by internal forms
after parsing, and vice versa before printing of terms.
\end{example}

930 
This is only a very simple but common instance of a more powerful mechanism.


931 
As a specification of what is to be translated, it should be comprehensible


932 
without further explanations. But there are also some snags and other


933 
peculiarities that are typical for macro systems in general. The purpose of


934 
this section is to explain how Isabelle's macro system really works.


935 


936 


937 
\subsection{Specifying macros}


938 


939 
Basically macros are rewrite rules on asts. But unlike other macro systems of


940 
various programming languages, Isabelle's macros work two way. Therefore a


941 
syntax contains two lists of rules: one for parsing and one for printing.


942 


943 
The {\tt translations} section\index{translations section@{\tt translations}

108

944 
section} consists of a list of rule specifications of the form:


945 


946 
\begin{center}

104

947 
{\tt $[$ ($root$) $]$ $string$ $[$ => $$ <= $$ == $]$ $[$ ($root$) $]$


948 
$string$}.

108

949 
\end{center}

104

950 


951 
This specifies a \rmindex{parse rule} ({\tt =>}) a \rmindex{print rule} ({\tt


952 
<=}) or both ({\tt ==}). The two $string$s preceded by optional parenthesized


953 
$root$s denote the lefthand and righthand side of the rule as 'source

108

954 
code', i.e.\ in the usual syntax of terms.

104

955 


956 
Rules are internalized wrt.\ an intermediate signature that is obtained from


957 
the parent theories' ones by adding all material of all sections preceding


958 
{\tt translations} in the {\tt .thy} file, especially new syntax defined in


959 
{\tt consts} is already effective.


960 


961 
Then part of the process that transforms input strings into terms is applied:

108

962 
lexing, parsing and parse ast translations (see \S\ref{sec:asts}). Macros


963 
specified in the parents are {\em not\/} expanded. Also note that the lexer

104

964 
runs in a different mode that additionally accepts identifiers of the form


965 
$\_~letter~quasiletter^*$ (like {\tt _idt}, {\tt _K}). The syntactic category


966 
to parse is specified by $root$, which defaults to {\tt logic}.


967 


968 
Finally, Isabelle tries to guess which atoms of the resulting ast of the rule


969 
should be treated as constants during matching (see below). These names are


970 
extracted from all class, type and constant declarations made so far.


971 


972 
\medskip


973 
The result are two lists of translation rules in internal form, that is pairs


974 
of asts. They can be viewed using {\tt Syntax.print_syntax}


975 
(\ttindex{parse_rules} and \ttindex{print_rules}). For {\tt SET} of


976 
Example~\ref{ex:set_trans} these are:


977 
\begin{ttbox}


978 
parse_rules:


979 
("{\at}Collect" x A P) > ("Collect" A ("_abs" x P))


980 
("{\at}Replace" y x A Q) > ("Replace" A ("_abs" x ("_abs" y Q)))


981 
("{\at}Ball" x A P) > ("Ball" A ("_abs" x P))


982 
print_rules:


983 
("Collect" A ("_abs" x P)) > ("{\at}Collect" x A P)


984 
("Replace" A ("_abs" x ("_abs" y Q))) > ("{\at}Replace" y x A Q)


985 
("Ball" A ("_abs" x P)) > ("{\at}Ball" x A P)


986 
\end{ttbox}


987 


988 
Note that in this notation all rules are oriented left to right. In the {\tt


989 
translations} section, which has been split into two parts, print rules


990 
appeared right to left.


991 


992 
\begin{warn}


993 
Be careful not to choose names for variables in rules that are actually


994 
treated as constant. If in doubt, check the rules in their internal form or


995 
the section labeled {\tt consts} in the output of {\tt Syntax.print_syntax}.


996 
\end{warn}


997 


998 


999 
\subsection{Applying rules}


1000 


1001 
In the course of parsing and printing terms, asts are generated as an


1002 
intermediate form as pictured in Figure~\ref{fig:parse_print}. These asts are


1003 
normalized wrt.\ the given lists of translation rules in a uniform manner. As

108

1004 
stated earlier, asts are supposed to be firstorder 'terms'. The rewriting


1005 
systems derived from {\tt translations} sections essentially resemble


1006 
traditional firstorder term rewriting systems. We first examine how a single


1007 
rule is applied.

104

1008 


1009 
Let $t$ be the ast to be normalized and $(l, r)$ some translation rule. A

108

1010 
subast $u$ of $t$ is called {\bf redex}\indexbold{redex (ast)} (reducible


1011 
expression), if it is an instance of $l$. In this case $l$ is said to {\bf


1012 
match}\indexbold{match (ast)} $u$. A redex matched by $l$ may be replaced by


1013 
the corresponding instance of $r$, thus {\bf rewriting}\index{rewrite (ast)}


1014 
the ast $t$.

104

1015 


1016 
Matching requires some notion of {\bf placeholders}\indexbold{placeholder


1017 
(ast)} that may occur in rule patterns but not in ordinary asts, which are


1018 
considered ground. Here we simply use {\tt Variable}s for this purpose.


1019 


1020 
More formally, the matching of $u$ by $l$ is performed as follows (the rule


1021 
pattern is the second argument): \index{match (ast)@$match$ (ast)}


1022 
\begin{itemize}


1023 
\item $match(\Constant x, \Constant x) = \mbox{OK}$.


1024 


1025 
\item $match(\Variable x, \Constant x) = \mbox{OK}$.


1026 


1027 
\item $match(u, \Variable x) = \mbox{OK, bind}~x~\mbox{to}~u$.


1028 


1029 
\item $match(\Appl{u@1, \ldots, u@n}, \Appl{l@1, \ldots, l@n}) = match(u@1,


1030 
l@1), \ldots, match(u@n, l@n)$.


1031 


1032 
\item $match(u, l) = \mbox{FAIL}$ in any other case.


1033 
\end{itemize}


1034 


1035 
This means that a {\tt Constant} pattern matches any atomic asts of the same


1036 
name, while a {\tt Variable} matches any ast. If successful, $match$ yields a

108

1037 
substitution $\sigma$ that is applied to $r$, generating the appropriate

104

1038 
instance that replaces $u$.


1039 


1040 
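The clauses of $match$ can be transcribed almost literally. The following is
a plain Python sketch, not Isabelle's implementation; the encoding of asts as
tagged pairs and lists is invented for illustration. Note how the second
clause makes a Constant pattern match a Variable of the same name:

```python
# Illustrative transcription of the match clauses (plain Python, not
# Isabelle's code).  Atoms: ("C", name) = Constant, ("V", name) = Variable;
# applications are Python lists.  Rules are assumed left-linear, so a
# placeholder is never bound twice.
def match(u, pat, binds=None):
    """Match ast u against rule pattern pat; return bindings or None."""
    binds = {} if binds is None else binds
    if isinstance(pat, tuple) and pat[0] == "V":     # pattern Variable:
        binds[pat[1]] = u                            # OK, bind x to u
        return binds
    if isinstance(pat, list):                        # application pattern:
        ok = (isinstance(u, list) and len(u) == len(pat)
              and all(match(a, p, binds) is not None
                      for a, p in zip(u, pat)))      # match componentwise
        return binds if ok else None
    # pattern Constant x matches Constant x *or* Variable x -- at this
    # level non-rule Variables behave like Constants
    if isinstance(u, tuple) and u[1] == pat[1]:
        return binds
    return None                                      # FAIL
```

Matching the internal left-hand side of the Ball print rule against a
concrete ast binds the three placeholders to the corresponding sub-asts.
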
\medskip
In order to make things simple and fast, ast rewrite rules $(l, r)$ are
restricted by the following conditions:
\begin{itemize}
\item Rules have to be left-linear, i.e.\ $l$ must not contain any {\tt
  Variable} more than once.

\item Rules must have constant heads, i.e.\ $l = \mtt"c\mtt"$ or $l =
  (\mtt"c\mtt" ~ x@1 \ldots x@n)$.

\item The set of variables contained in $r$ has to be a subset of those of
  $l$.
\end{itemize}

\medskip
Having first-order matching in mind, the second case of $match$ may look a
bit odd. But this is exactly the place where {\tt Variable}s of non-rule
asts behave like {\tt Constant}s. The deeper meaning of this is related to
asts being very ``primitive'' in some sense, ignorant of the underlying
``semantics'', not far removed from parse trees. At this level it is not yet
known which $id$s will become constants, bounds, frees, types or classes. As
$ast_of_pt$ (see \S\ref{sec:asts}) shows, former parse tree heads appear in
asts as {\tt Constant}s, while $id$s, $var$s, $tfree$s and $tvar$s become
{\tt Variable}s.

This is at variance with asts generated from terms before printing (see
$ast_of_term$ in \S\ref{sec:asts}), where all constants and type constructors
become {\tt Constant}s.

\begin{warn}
This means asts may contain quite a messy mixture of {\tt Variable}s and {\tt
Constant}s, which is insignificant at macro level because $match$ treats them
alike.
\end{warn}

Because of this behaviour, different kinds of atoms with the same name are
indistinguishable, which may make some rules prone to misbehaviour. Consider
the following fragmentary example:
\begin{ttbox}
types
  Nil 0
consts
  Nil   :: "'a list"
  "[]"  :: "'a list"    ("[]")
translations
  "[]" == "Nil"
\end{ttbox}
Then the term {\tt Nil} will be printed as {\tt []}, just as expected. What
happens with \verb|%Nil.t| or {\tt x::Nil} is left as an exercise.

\subsection{Rewriting strategy}

When normalizing an ast by repeatedly applying translation rules until no
more rule is applicable, there are in each step two choices: which rule to
apply next, and which redex to reduce.

We could assume that the user always supplies terminating and confluent
rewriting systems, but this would often complicate things unnecessarily.
Therefore, we reveal part of the actual rewriting strategy: the normalizer
always applies the first matching rule, reducing an unspecified redex chosen
first.

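This strategy can be modelled concisely. The sketch below is plain Python,
not Isabelle's normalizer; the convention of marking placeholders with a
leading question mark is invented for illustration, and like the real
normalizer it terminates only if the supplied rules do:

```python
# Illustrative normalizer (plain Python, not Isabelle's code).  Asts are
# strings (atoms) or lists (applications); pattern atoms starting with
# "?" act as placeholders.  Rules are (lhs, rhs) pairs tried in order:
# the first matching rule is applied, anywhere, until a normal form.
def match(u, pat, binds):
    if isinstance(pat, str):
        if pat.startswith("?"):          # placeholder: bind it to u
            binds[pat] = u
            return True
        return u == pat                  # ordinary atom: names must agree
    return (isinstance(u, list) and len(u) == len(pat)
            and all(match(x, p, binds) for x, p in zip(u, pat)))

def subst(pat, binds):
    if isinstance(pat, str):
        return binds.get(pat, pat)
    return [subst(p, binds) for p in pat]

def normalize(ast, rules):
    """Normalize subtrees first, then try the rules at this node."""
    if isinstance(ast, list):
        ast = [normalize(a, rules) for a in ast]
    for lhs, rhs in rules:               # first matching rule wins
        binds = {}
        if match(ast, lhs, binds):
            return normalize(subst(rhs, binds), rules)
    return ast
```

With the Ball parse rule of Example~\ref{ex:set_trans}, normalizing the
external form produces the internal Ball application in one step.
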
Thereby, ``first rule'' is roughly speaking meant wrt.\ the appearance of the
rules in the {\tt translations} sections. But this is more tricky than it
seems: if a given theory is {\em extended}, new rules are simply appended to
the end. But if theories are {\em merged}, it is not clear which list of
rules has priority over the other. In fact the merge order is left
unspecified. This shouldn't cause any problems in practice, since
translations of different theories usually do not overlap. {\tt
Syntax.print_syntax} shows the rules in their internal order.

\medskip
You can watch the normalization of asts during parsing and printing by
setting \ttindex{Syntax.trace_norm_ast} to {\tt true}. An alternative is the
use of \ttindex{Syntax.test_read}, which is always in trace mode. The
information displayed when tracing\index{tracing (ast)} includes: the ast
before normalization ({\tt pre}), redexes with results ({\tt rewrote}), the
normal form finally reached ({\tt post}) and some statistics ({\tt
normalize}). If tracing is off, \ttindex{Syntax.stat_norm_ast} can be set to
{\tt true} in order to enable printing of the normal form and statistics
only.

1124 
\subsection{More examples}


1125 


1126 
Let us first reconsider Example~\ref{ex:set_trans}, which is concerned with


1127 
variable binding constructs.


1128 


1129 
There is a minor disadvantage over an implementation via translation


1130 
functions (as done for binders):


1131 


1132 
\begin{warn}


1133 
If \ttindex{eta_contract} is set to {\tt true}, terms will be

108

1134 
$\eta$contracted {\em before\/} the ast rewriter sees them. Thus some

104

1135 
abstraction nodes needed for print rules to match may get lost. E.g.\


1136 
\verbBall(A, %x. P(x)) is contracted to {\tt Ball(A, P)}, the print rule is


1137 
no longer applicable and the output will be {\tt Ball(A, P)}. Note that

108

1138 
$\eta$expansion via macros is {\em not\/} possible.

104

1139 
\end{warn}


1140 


1141 
\medskip


1142 
Another common trap are meta constraints. If \ttindex{show_types} is set to


1143 
{\tt true}, bound variables will be decorated by their meta types at the


1144 
binding place (but not at occurrences in the body). E.g.\ matching with


1145 
\verbCollect(A, %x. P) binds {\tt x} to something like {\tt ("_constrain" y


1146 
"i")} rather than only {\tt y}. Ast rewriting will cause the constraint to


1147 
appear in the external form, say \verb{y::i:A::i. P::o}. Therefore your


1148 
syntax should be ready for such constraints to be reread. This is the case


1149 
in our example, because of the category {\tt idt} of the first argument.


1150 


1151 
\begin{warn}
Choosing {\tt id} instead of {\tt idt} is a very common error, especially
since it appears in former versions of most of Isabelle's object-logics.
\end{warn}

\begin{example} \label{ex:finset_trans}
This example demonstrates the use of recursive macros to implement a
convenient notation for finite sets.
\begin{ttbox}
FINSET = SET +
types
  is 0
consts
  ""           :: "i => is"        ("_")
  "{\at}Enum"      :: "[i, is] => is"  ("_,/ _")
  empty        :: "i"              ("{\ttlbrace}{\ttrbrace}")
  insert       :: "[i, i] => i"
  "{\at}Finset"    :: "is => i"        ("{\ttlbrace}(_){\ttrbrace}")
translations
  "{\ttlbrace}x, xs{\ttrbrace}" == "insert(x, {\ttlbrace}xs{\ttrbrace})"
  "{\ttlbrace}x{\ttrbrace}"     == "insert(x, {\ttlbrace}{\ttrbrace})"
end
\end{ttbox}

Finite sets are internally built up by {\tt empty} and {\tt insert}.
Externally we would like to see \verb|{x, y, z}| rather than {\tt insert(x,
insert(y, insert(z, empty)))}.

First we define the generic syntactic category {\tt is} for one or more
objects of type {\tt i} separated by commas (including breaks for pretty
printing). The category has to be declared as a 0-place type constructor, but
without an {\tt arities} declaration. Hence {\tt is} is not a logical type,
no default productions will be added, and we can cook our own syntax for {\tt
is} (first two lines of the {\tt consts} section). If we had needed generic
enumerations of type $\alpha$ (i.e.\ {\tt logic}), we could have used the
predefined category \ttindex{args} and skipped this part altogether.

Next follows {\tt empty}, which is already equipped with its syntax
\verb|{}|, and {\tt insert} without concrete syntax. The syntactic constant
{\tt\at Finset} provides concrete syntax for enumerations of {\tt i} enclosed
in curly braces. Remember that a pair of parentheses specifies a block of
indentation for pretty printing. The category {\tt is} can later be reused
for other enumerations like lists or tuples.

The translations may look a bit odd at first sight, but rules can only be
fully understood in their internal forms, which are:
\begin{ttbox}
parse_rules:
  ("{\at}Finset" ("{\at}Enum" x xs))  ->  ("insert" x ("{\at}Finset" xs))
  ("{\at}Finset" x)  ->  ("insert" x "empty")
print_rules:
  ("insert" x ("{\at}Finset" xs))  ->  ("{\at}Finset" ("{\at}Enum" x xs))
  ("insert" x "empty")  ->  ("{\at}Finset" x)
\end{ttbox}
This shows that \verb|{x, xs}| indeed matches any set enumeration of at least
two elements, binding the first to {\tt x} and the rest to {\tt xs}.
Likewise, \verb|{xs}| and \verb|{x}| represent any set enumeration. Note that
the parse rules only work in this order.

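The recursion scheme behind the two parse rules can be written out directly.
The following plain Python sketch is purely illustrative (the function name
is invented): the first rule peels off one element of the enumeration, the
second translates the final singleton.

```python
# Direct transcription (plain Python, illustrative) of the recursion in
# the two parse rules: the Enum case peels off one element, the singleton
# case terminates the recursion.
def finset(elems):
    """Translate an enumeration like ["x", "y", "z"] to insert/empty form."""
    x, xs = elems[0], elems[1:]
    if xs:
        return ("insert", x, finset(xs))
    return ("insert", x, "empty")
```

Applied to a three-element enumeration, this yields the nested insert term
shown at the start of the example.
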
\medskip
Some rules are prone to misbehaviour, as
\verb|%empty insert. insert(x, empty)| shows, which is printed as
\verb|%empty insert. {x}|. This problem arises because the ast rewriter
cannot discern constants, frees, bounds etc.\ and looks only for names of
atoms.

Thus the names of {\tt Constant}s occurring in the (internal) left-hand side
of translation rules should be regarded as ``reserved keywords''. It is good
practice to choose non-identifiers here like {\tt\at Finset} or sufficiently
long and strange names.
\end{example}

\begin{example} \label{ex:prod_trans}
One of the well-formedness conditions for ast rewrite rules stated earlier
implies that you can never introduce new {\tt Variable}s on the right-hand
side. Something like \verb|"K(B)" => "%x. B"| is illegal and could cause
variable capturing, if it were allowed. In such cases you usually have to
fall back on translation functions. But there is a trick that makes things
quite readable in some cases: {\em calling parse translations by parse
rules}. This is demonstrated here.
\begin{ttbox}
PROD = FINSET +
consts
  Pi         :: "[i, i => i] => i"
  "{\at}PROD"    :: "[idt, i, i] => i"  ("(3PROD _:_./ _)" 10)
  "{\at}->"      :: "[i, i] => i"       ("(_ ->/ _)" [51, 50] 50)
translations
  "PROD x:A. B" => "Pi(A, %x. B)"
  "A -> B"      => "Pi(A, _K(B))"
end
ML
  val print_translation = [("Pi", dependent_tr' ("{\at}PROD", "{\at}->"))];
\end{ttbox}

{\tt Pi} is an internal constant for constructing dependent products. Two
external forms exist: {\tt PROD x:A.B}, the general case, and {\tt A -> B},
an abbreviation for \verb|Pi(A, %x.B)| with {\tt B} not actually depending on
{\tt x}.

Now the second parse rule is where the trick comes in: {\tt _K(B)} is
introduced during ast rewriting, which later becomes \verb|%x.B| due to a
parse translation associated with \ttindex{_K}. Note that a leading {\tt _}
in $id$s is allowed in translation rules, but not in ordinary terms. This
special behaviour of the lexer is very useful for ``forging'' asts containing
names that are not directly accessible normally. Unfortunately, there is no
such trick for printing, so we have to add an {\tt ML} section for the print
translation \ttindex{dependent_tr'}.

The parse translation for {\tt _K} is already installed in Pure, and {\tt
dependent_tr'} is exported by the syntax module for public use. See
\S\ref{sec:tr_funs} for more of the arcane lore of translation functions.
\end{example}

\section{Translation functions *} \label{sec:tr_funs}

This section is about the remaining translation mechanism, which enables the
designer of theories to do almost everything with terms or asts during the
parsing or printing process, by writing \ML\ functions. The logic \LK\ is a
good example of a quite sophisticated use of this to transform between
internal and external representations of associative sequences. The
high-level macro system described in \S\ref{sec:macros} fails here
completely.

\begin{warn}
A full understanding of the matters presented here requires some familiarity
with Isabelle's internals, especially the datatypes {\tt term}, {\tt typ},
{\tt Syntax.ast} and the encodings of types and terms as such at the various
stages of the parsing or printing process. You probably do not really want to
use translation functions at all!
\end{warn}

As already mentioned in \S\ref{sec:asts}, there are four kinds of translation
functions. All such functions are associated with a name which specifies an
ast's or term's head invoking that function. Such names can be (logical or
syntactic) constants or type constructors.

{\tt Syntax.print_syntax} displays the sets of names associated with the
translation functions of a {\tt Syntax.syntax} under
\ttindex{parse_ast_translation}, \ttindex{parse_translation},
\ttindex{print_translation} and \ttindex{print_ast_translation}. The user can
add new ones via the {\tt ML} section\index{ML section@{\tt ML} section} of a
{\tt .thy} file. But there may never be more than one function of the same
kind per name.

\begin{warn}
Conceptually, the {\tt ML} section should appear between {\tt consts} and
{\tt translations}, i.e.\ newly installed translation functions are already
effective when macros and logical rules are parsed. {\tt ML} has to be the
last section because the {\tt .thy} file parser is unable to detect the end
of \ML\ code other than by end-of-file.
\end{warn}

All text of the {\tt ML} section is simply copied verbatim into the \ML\ file
generated from a {\tt .thy} file. Definitions made here by the user become
components of an \ML\ structure with the same name as the theory to be
created. Therefore local things should be declared within {\tt local}. The
following special \ML\ values, which are all optional, serve as the interface
for the installation of user-defined translation functions.

\begin{ttbox}
val parse_ast_translation: (string * (ast list -> ast)) list
val parse_translation: (string * (term list -> term)) list
val print_translation: (string * (term list -> term)) list
val print_ast_translation: (string * (ast list -> ast)) list
\end{ttbox}

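For instance, an {\tt ML} section that installs the function {\tt k_tr} of
Example~\ref{ex:tr_funs} as a parse translation for the syntactic constant
{\tt _K} might read as follows. This is only a sketch: Pure already provides
this particular binding, so a user theory would substitute its own function
and constant name.
\begin{ttbox}
ML
val parse_translation = [("_K", k_tr)];
\end{ttbox}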
The basic idea behind all four kinds of functions is relatively simple (see
also Figure~\ref{fig:parse_print}): whenever, during the transformations
between parse trees, asts and terms, a combination of the form
$(\mtt"c\mtt"~x@1 \ldots x@n)$ is encountered, and a translation function $f$
of appropriate kind exists for $c$, the result will be $f \mtt[ x@1, \ldots,
x@n \mtt]$. Thereby, $x@1, \ldots, x@n$ (with $n \ge 0$) are asts for ast
translations and terms for term translations. A ``combination'' at ast level
is of the form $\Constant c$ or $\Appl{\Constant c, x@1, \ldots, x@n}$, and
at term level $\ttfct{Const} (c, \tau)$ or $\ttfct{Const} (c, \tau)
\ttrel{\$} x@1 \ttrel{\$} \dots \ttrel{\$} x@n$.

\medskip
Translation functions at ast level differ from those at term level only in
the same way as asts and terms differ. Terms, being more complex and more
specific, allow more sophisticated transformations (typically involving
abstractions and bound variables).

On the other hand, {\em parse\/} (ast) translations differ from {\em print\/}
(ast) translations more fundamentally:
\begin{description}
\item[Parse (ast) translations] are applied bottom-up, i.e.\ the arguments
  supplied ($x@1, \ldots, x@n$ above) are already in translated form.
  Additionally, they may not fail; exceptions are re-raised after printing
  an error message.

\item[Print (ast) translations] are applied top-down, i.e.\ supplied with
  arguments that are partly still in internal form. The result is again fed
  into the translation machinery as a whole. Therefore a print (ast)
  translation should not introduce as head a constant of the same name that
  invoked it in the first place. Alternatively, exception \ttindex{Match}
  may be raised, indicating failure of translation.
\end{description}

Another difference between the parsing and the printing process is which
atoms are {\tt Constant}s or {\tt Const}s, i.e.\ able to invoke translation
functions.

For parse ast translations only former parse tree heads are {\tt Constant}s
(see also $ast_of_pt$ in \S\ref{sec:asts}). These, and additionally
introduced {\tt Constant}s (e.g.\ by macros), become {\tt Const}s for parse
translations (see also $term_of_ast$ in \S\ref{sec:asts}).

The situation is slightly different when terms are prepared for printing,
since the role of atoms is known. Initially, all logical constants and type
constructors may invoke print translations. New constants may be introduced
by these or by macros, able to invoke parse ast translations.

\subsection{A simple example *}

Presenting a simple and useful example of translation functions is not that
easy, since the macro system is sufficient for most simple applications. By
convention, translation functions always have names ending with {\tt
_ast_tr}, {\tt _tr}, {\tt _tr'} or {\tt _ast_tr'}. You may look for such
names in the sources of Pure Isabelle for more examples.

\begin{example} \label{ex:tr_funs}

We continue Example~\ref{ex:prod_trans} by presenting the \ML\ sources of the
parse translation for \ttindex{_K} and the print translation
\ttindex{dependent_tr'}:

\begin{ttbox}
(* nondependent abstraction *)

fun k_tr (*"_K"*) [t] = Abs ("x", dummyT, incr_boundvars 1 t)
  | k_tr (*"_K"*) ts = raise_term "k_tr" ts;

(* dependent / nondependent quantifiers *)

fun dependent_tr' (q, r) (A :: Abs (x, T, B) :: ts) =
      if 0 mem (loose_bnos B) then
        let val (x', B') = variant_abs (x, dummyT, B);
        in list_comb (Const (q, dummyT) $ Free (x', T) $ A $ B', ts)
        end
      else list_comb (Const (r, dummyT) $ A $ B, ts)
  | dependent_tr' _ _ = raise Match;
\end{ttbox}

This text is taken from the Pure sources; ordinary user translations would
typically appear within the {\tt ML} section of a {\tt .thy} file.

\medskip
If {\tt k_tr} is called with exactly one argument $t$, it creates a new {\tt
Abs} node with a body derived from $t$: loose {\tt Bound}s, i.e.\ those
referring to outer {\tt Abs}s, are incremented using
\ttindex{incr_boundvars}. This and many other basic term manipulation
functions are defined in {\tt Pure/term.ML}, the comments there being in most
cases the only documentation.

Since terms fed into parse translations are not yet typed, the type of the
bound variable in the new {\tt Abs} is simply {\tt dummyT}.

\medskip
The argument $(q, r)$ for {\tt dependent_tr'} is supplied already during the
installation within an {\tt ML} section. This yields a print translation that
transforms something like $c(A, \mtt{Abs}(x, T, B), t@1, \ldots, t@n)$ into
$q(x', A, B', t@1, \ldots, t@n)$ or $r(A, B, t@1, \ldots, t@n)$, the latter
if $B$ does not actually depend on $x$. This is checked using
\ttindex{loose_bnos}, yet another function of {\tt Pure/term.ML}. Note that
$x'$ is a version of $x$ renamed away from all names in $B$, and $B'$ the
body $B$ with {\tt Bound}s referring to our {\tt Abs} node replaced by
$\ttfct{Free} (x', \mtt{dummyT})$.

We have to be more careful with types here. While types of {\tt Const}s are
completely ignored, type constraints may be printed for some {\tt Free}s and
{\tt Var}s (only if \ttindex{show_types} is set to {\tt true}). Variables of
type \ttindex{dummyT} are never printed with a constraint, though. Thus, a
constraint of $x'$ may only appear at its binding place, since the {\tt
Free}s of $B'$ replacing the appropriate {\tt Bound}s of $B$ via
\ttindex{variant_abs} all have type {\tt dummyT}.
\end{example}

\section{Example: some minimal logics} \label{sec:min_logics}

This concluding section presents some examples that are very simple from a
syntactic point of view. Rather, they demonstrate how to define new
object-logics from scratch. In particular we need to say how an object-logic
syntax is hooked up to the meta-logic. Since all theorems must conform to the
syntax for $prop$ (see Figure~\ref{fig:pure_gram}), that syntax has to be
extended with the object-level syntax. Assume that the syntax of your
object-logic defines a category $o$ of formulae. These formulae can now
appear in axioms and theorems wherever $prop$ does if you add the production
\[ prop ~=~ o. \]
More precisely, you need a coercion from formulae to propositions:
\begin{ttbox}
Base = Pure +
types
  o 0
arities
  o :: logic
consts
  Trueprop :: "o => prop"   ("_" 5)
end
\end{ttbox}
The constant {\tt Trueprop} (the name is arbitrary) acts as an invisible
coercion function. Assuming this definition resides in a file {\tt base.thy},
you have to load it with the command {\tt use_thy "base"}.

One of the simplest non-trivial logics is {\em minimal logic\/} of
implication. Its definition in Isabelle needs no advanced features but
illustrates the overall mechanism quite nicely:
\begin{ttbox}
Hilbert = Base +
consts
  "-->" :: "[o, o] => o"   (infixr 10)
rules
  K  "P --> Q --> P"
  S  "(P --> Q --> R) --> (P --> Q) --> P --> R"
  MP "[| P --> Q; P |] ==> Q"
end
\end{ttbox}
After loading this definition you can start to prove theorems in this logic:
\begin{ttbox}
goal Hilbert.thy "P --> P";
{\out Level 0}
{\out P --> P}
{\out  1. P --> P}
by (resolve_tac [Hilbert.MP] 1);
{\out Level 1}
{\out P --> P}
{\out  1. ?P --> P --> P}
{\out  2. ?P}
by (resolve_tac [Hilbert.MP] 1);
{\out Level 2}
{\out P --> P}
{\out  1. ?P1 --> ?P --> P --> P}
{\out  2. ?P1}
{\out  3. ?P}
by (resolve_tac [Hilbert.S] 1);
{\out Level 3}
{\out P --> P}
{\out  1. P --> ?Q2 --> P}
{\out  2. P --> ?Q2}
by (resolve_tac [Hilbert.K] 1);
{\out Level 4}
{\out P --> P}
{\out  1. P --> ?Q2}
by (resolve_tac [Hilbert.K] 1);
{\out Level 5}
{\out P --> P}
{\out No subgoals!}
\end{ttbox}
As you can see, this Hilbert-style formulation of minimal logic is easy to
define but difficult to use. The following natural deduction formulation is
far preferable:
\begin{ttbox}
MinI = Base +
consts
  "-->" :: "[o, o] => o"   (infixr 10)
rules
  impI "(P ==> Q) ==> P --> Q"
  impE "[| P --> Q; P |] ==> Q"
end
\end{ttbox}
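In this formulation the theorem proved so laboriously above has, for example,
a two-step proof. The following is only a sketch in the style of the session
shown before, with the intermediate output omitted:
\begin{ttbox}
goal MinI.thy "P --> P";
by (resolve_tac [MinI.impI] 1);
by (assume_tac 1);
\end{ttbox}
Applying {\tt impI} reduces the goal to {\tt P ==> P}, which is closed by
assumption.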
Note, however, that although the two systems are equivalent, this fact cannot
be proved within Isabelle: {\tt S} and {\tt K} can be derived in {\tt MinI}
(exercise!), but {\tt impI} cannot be derived in {\tt Hilbert}. The reason is
that {\tt impI} is only an {\em admissible\/} rule in {\tt Hilbert},
something that can only be shown by induction over all possible proofs in
{\tt Hilbert}.

It is a very simple matter to extend minimal logic with falsity:
\begin{ttbox}
MinIF = MinI +
consts
  False :: "o"
rules
  FalseE "False ==> P"
end
\end{ttbox}
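In {\tt MinIF} the {\em ex falso\/} theorem {\tt False --> P} can now be
derived by combining {\tt impI} with {\tt FalseE}. Again only a sketch, in
the style of the session above:
\begin{ttbox}
goal MinIF.thy "False --> P";
by (resolve_tac [MinI.impI] 1);
by (eresolve_tac [MinIF.FalseE] 1);
\end{ttbox}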
On the other hand, we may wish to introduce conjunction only:
\begin{ttbox}
MinC = Base +
consts
  "&" :: "[o, o] => o"   (infixr 30)
rules
  conjI  "[| P; Q |] ==> P & Q"
  conjE1 "P & Q ==> P"
  conjE2 "P & Q ==> Q"
end
\end{ttbox}
And if we want to have all three connectives together, we define:
\begin{ttbox}
MinIFC = MinIF + MinC