%% $Id$
%%  \([a-zA-Z][a-zA-Z]}\.\) \([^ ]\)             \1  \2
%%  @\([a-z0-9]\)       ^{(\1)}

\newcommand\rmindex[1]{{#1}\index{#1}\@}
\newcommand\mtt[1]{\mbox{\tt #1}}
\newcommand\ttfct[1]{\mathop{\mtt{#1}}\nolimits}
\newcommand\ttapp{\mathrel{\hbox{\tt\$}}}
\newcommand\Constant{\ttfct{Constant}}
\newcommand\Variable{\ttfct{Variable}}
\newcommand\Appl[1]{\ttfct{Appl}\mathopen{\mtt[}#1\mathclose{\mtt]}}
\newcommand\AST{{\sc ast}}
\let\rew=\longrightarrow


\chapter{Defining Logics} \label{Defining-Logics}
This chapter explains how to define new formal systems, and in particular
their concrete syntax.  While Isabelle can be regarded as a theorem prover
for set theory, higher-order logic or the sequent calculus, its
distinguishing feature is support for the definition of new logics.

Isabelle logics are hierarchies of theories, which are described and
illustrated in {\em Introduction to Isabelle}.  That material, together
with the theory files provided in the examples directories, should suffice
for all simple applications.  The easiest way to define a new theory is by
modifying a copy of an existing theory.

This chapter is intended for experienced Isabelle users.  It documents all
aspects of theories concerned with syntax: mixfix declarations, pretty
printing, macros and translation functions.  The extended examples of
\S\ref{sec:min_logics} demonstrate the logical aspects of the definition of
theories.  Sections marked with * are highly technical and might be skipped
on the first reading.
\section{Priority grammars} \label{sec:priority_grammars}
\index{grammars!priority|(} 

The syntax of an Isabelle logic is specified by a {\bf priority grammar}.
A context-free grammar\index{grammars!context-free} contains a set of
productions of the form $A=\gamma$, where $A$ is a nonterminal and
$\gamma$, the right-hand side, is a string of terminals and nonterminals.
Isabelle uses an extended format permitting {\bf priorities}, or
precedences.  Each nonterminal is decorated by an integer priority, as
in~$A^{(p)}$.  A nonterminal $A^{(p)}$ in a derivation may be replaced
using a production $A^{(q)} = \gamma$ only if $p \le q$.  Any priority
grammar can be translated into an ordinary context-free grammar by
introducing new nonterminals and productions.

Formally, a set of context-free productions $G$ induces a derivation
relation $\rew@G$.  Let $\alpha$ and $\beta$ denote strings of terminal or
nonterminal symbols.  Then
\[ \alpha\, A^{(p)}\, \beta ~\rew@G~ \alpha\,\gamma\,\beta \] 
if and only if $G$ contains some production $A^{(q)}=\gamma$ for~$q\ge p$.

The following simple grammar for arithmetic expressions demonstrates how
binding power and associativity of operators can be enforced by priorities.
\begin{center}
\begin{tabular}{rclr}
  $A^{(9)}$ & = & {\tt0} \\
  $A^{(9)}$ & = & {\tt(} $A^{(0)}$ {\tt)} \\
  $A^{(0)}$ & = & $A^{(0)}$ {\tt+} $A^{(1)}$ \\
  $A^{(2)}$ & = & $A^{(3)}$ {\tt*} $A^{(2)}$ \\
  $A^{(3)}$ & = & {\tt-} $A^{(3)}$
\end{tabular}
\end{center}
The choice of priorities determines that {\tt -} binds tighter than {\tt *},
which binds tighter than {\tt +}.  Furthermore {\tt +} associates to the
left and {\tt *} to the right.
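
As an illustration, here is one derivation of the string {\tt 0+0+0},
worked out from the five productions above and the definition of~$\rew@G$.
Each step replaces some $A^{(p)}$ using a production whose left-hand side
has priority $q\ge p$.
\begin{eqnarray*}
A^{(0)} & \rew@G & A^{(0)} \;\mtt{+}\; A^{(1)} \\
        & \rew@G & A^{(0)} \;\mtt{+}\; A^{(1)} \;\mtt{+}\; A^{(1)} \\
        & \rew@G & \mtt{0} \;\mtt{+}\; A^{(1)} \;\mtt{+}\; A^{(1)} \\
        & \rew@G & \mtt{0} \;\mtt{+}\; \mtt{0} \;\mtt{+}\; A^{(1)} \\
        & \rew@G & \mtt{0} \;\mtt{+}\; \mtt{0} \;\mtt{+}\; \mtt{0}
\end{eqnarray*}
The first two steps use the production for {\tt +}, which has priority~0;
the remaining steps use $A^{(9)} = \mtt{0}$, which is admissible because
$9\ge 0$ and $9\ge 1$.  Grouping to the right is not derivable: the right
operand $A^{(1)}$ can never be replaced using the production for {\tt +},
since that would require $1 \le 0$.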

To minimize the number of priority superscripts, we adopt the following
conventions:
\begin{itemize}
\item All priorities $p$ must be in the range $0 \leq p \leq max_pri$ for
  some fixed integer $max_pri$.
\item Priority $0$ on the right-hand side and priority $max_pri$ on the
  left-hand side may be omitted.
\end{itemize}
The production $A^{(p)} = \alpha$ is written as $A = \alpha~(p)$;
the priority of the left-hand side actually appears in a column on the far
right.  Finally, alternatives may be separated by $|$, and repetition
indicated by \dots.

Using these conventions and assuming $max_pri=9$, the grammar takes the form
\begin{center}
\begin{tabular}{rclc}
$A$ & = & {\tt0} & \hspace*{4em} \\
 & $|$ & {\tt(} $A$ {\tt)} \\
 & $|$ & $A$ {\tt+} $A^{(1)}$ & (0) \\
 & $|$ & $A^{(3)}$ {\tt*} $A^{(2)}$ & (2) \\
 & $|$ & {\tt-} $A^{(3)}$ & (3)
\end{tabular}
\end{center}
\index{grammars!priority|)}
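
The translation into an ordinary context-free grammar mentioned earlier can
be sketched on this example.  One possible construction, given here as an
assumption for illustration and not necessarily the one used by Isabelle's
parser, introduces a fresh nonterminal $A@p$ for each priority~$p$ that
occurs on a right-hand side and gives it a copy of every production whose
priority is at least~$p$:
\begin{center}
\begin{tabular}{rcl}
$A@0$ & = & $A@0$ {\tt+} $A@1$ ~~$|$~~ $A@3$ {\tt*} $A@2$ ~~$|$~~
            {\tt-} $A@3$ ~~$|$~~ {\tt0} ~~$|$~~ {\tt(} $A@0$ {\tt)} \\
$A@1$ & = & $A@3$ {\tt*} $A@2$ ~~$|$~~ {\tt-} $A@3$ ~~$|$~~ {\tt0}
            ~~$|$~~ {\tt(} $A@0$ {\tt)} \\
$A@2$ & = & $A@3$ {\tt*} $A@2$ ~~$|$~~ {\tt-} $A@3$ ~~$|$~~ {\tt0}
            ~~$|$~~ {\tt(} $A@0$ {\tt)} \\
$A@3$ & = & {\tt-} $A@3$ ~~$|$~~ {\tt0} ~~$|$~~ {\tt(} $A@0$ {\tt)}
\end{tabular}
\end{center}
Here $A@p$ derives exactly the strings that $A^{(p)}$ derives in the
priority grammar.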
\begin{figure}
\begin{center}
\begin{tabular}{rclc}
$prop$ &=& \ttindex{PROP} $aprop$ ~~$|$~~ {\tt(} $prop$ {\tt)} \\
     &$|$& $logic^{(3)}$ \ttindex{==} $logic^{(2)}$ & (2) \\
     &$|$& $logic^{(3)}$ \ttindex{=?=} $logic^{(2)}$ & (2) \\
     &$|$& $prop^{(2)}$ \ttindex{==>} $prop^{(1)}$ & (1) \\
     &$|$& {\tt[|} $prop$ {\tt;} \dots {\tt;} $prop$ {\tt|]} {\tt==>} $prop^{(1)}$ & (1) \\
     &$|$& {\tt!!} $idts$ {\tt.} $prop$ & (0) \\\\
$logic$ &=& $prop$ ~~$|$~~ $fun$ \\\\
$aprop$ &=& $id$ ~~$|$~~ $var$
    ~~$|$~~ $fun^{(max_pri)}$ {\tt(} $logic$ {\tt,} \dots {\tt,} $logic$ {\tt)} \\\\
$fun$ &=& $id$ ~~$|$~~ $var$ ~~$|$~~ {\tt(} $fun$ {\tt)} \\
    &$|$& $fun^{(max_pri)}$ {\tt(} $logic$ {\tt,} \dots {\tt,} $logic$ {\tt)} \\
    &$|$& $fun^{(max_pri)}$ {\tt::} $type$ \\
    &$|$& \ttindex{\%} $idts$ {\tt.} $logic$ & (0) \\\\
$idts$ &=& $idt$ ~~$|$~~ $idt^{(1)}$ $idts$ \\\\
$idt$ &=& $id$ ~~$|$~~ {\tt(} $idt$ {\tt)} \\
    &$|$& $id$ \ttindex{::} $type$ & (0) \\\\
$type$ &=& $tid$ ~~$|$~~ $tvar$ ~~$|$~~ $tid$ {\tt::} $sort$
  ~~$|$~~ $tvar$ {\tt::} $sort$ \\
     &$|$& $id$ ~~$|$~~ $type^{(max_pri)}$ $id$
                ~~$|$~~ {\tt(} $type$ {\tt,} \dots {\tt,} $type$ {\tt)} $id$ \\
     &$|$& $type^{(1)}$ \ttindex{=>} $type$ & (0) \\
     &$|$& {\tt[}  $type$ {\tt,} \dots {\tt,} $type$ {\tt]} {\tt=>} $type$&(0)\\
     &$|$& {\tt(} $type$ {\tt)} \\\\
$sort$ &=& $id$ ~~$|$~~ {\tt\ttlbrace\ttrbrace}
                ~~$|$~~ {\tt\ttlbrace} $id$ {\tt,} \dots {\tt,} $id$ {\tt\ttrbrace}
\end{tabular}\index{*"!"!}\index{*"["|}\index{*"|"]}
\indexbold{type@$type$} \indexbold{sort@$sort$} \indexbold{idt@$idt$}
\indexbold{idts@$idts$} \indexbold{logic@$logic$} \indexbold{prop@$prop$}
\indexbold{fun@$fun$}
\end{center}
\caption{Meta-logic syntax}\label{fig:pure_gram}
\end{figure}
\section{The Pure syntax} \label{sec:basic_syntax}
\index{syntax!Pure|(}

At the root of all object-logics lies the Pure theory,\index{theory!Pure}
bound to the \ML{} identifier \ttindex{Pure.thy}.  It contains, among many
other things, the Pure syntax.  An informal account of this basic syntax
(meta-logic, types, \ldots) may be found in {\em Introduction to Isabelle}.
A more precise description using a priority grammar is shown in
Fig.\ts\ref{fig:pure_gram}.  The following nonterminals are defined:
\begin{description}
  \item[$prop$] Terms of type $prop$.  These are formulae of the meta-logic.

  \item[$aprop$] Atomic propositions.  These typically include the
    judgement forms of the object-logic; its definition introduces a
    meta-level predicate for each judgement form.

  \item[$logic$] Terms whose type belongs to class $logic$.  Initially,
    this category contains just $prop$.  As the syntax is extended by new
    object-logics, more productions for $logic$ are added automatically
    (see below).

  \item[$fun$] Terms potentially of function type.

  \item[$type$] Types of the meta-logic.

  \item[$idts$] A list of identifiers, possibly constrained by types.  
\end{description}

\begin{warn}
  Note that \verb|x::nat y| is parsed as \verb|x::(nat y)|, treating {\tt
    y} like a type constructor applied to {\tt nat}.  The likely result is
  an error message.  To avoid this interpretation, use parentheses and
  write \verb|(x::nat) y|.

  Similarly, \verb|x::nat y::nat| is parsed as \verb|x::(nat y::nat)| and
  yields a syntax error.  The correct form is \verb|(x::nat) (y::nat)|.
\end{warn}
\subsection{Logical types and default syntax}\label{logical-types}
Isabelle's representation of mathematical languages is based on the typed
$\lambda$-calculus.  All logical types, namely those of class $logic$, are
automatically equipped with a basic syntax of types, identifiers,
variables, parentheses, $\lambda$-abstractions and applications.  

More precisely, for each type constructor $ty$ with arity $(\vec{s})c$,
where $c$ is a subclass of $logic$, several productions are added:
\begin{center}
\begin{tabular}{rclc}
$ty$ &=& $id$ ~~$|$~~ $var$ ~~$|$~~ {\tt(} $ty$ {\tt)} \\
  &$|$& $fun^{(max_pri)}$ {\tt(} $logic$ {\tt,} \dots {\tt,} $logic$ {\tt)}\\
  &$|$& $ty^{(max_pri)}$ {\tt::} $type$\\\\
$logic$ &=& $ty$
\end{tabular}
\end{center}
\subsection{Lexical matters}
The parser does not process input strings directly.  It operates on token
lists provided by Isabelle's \bfindex{lexer}.  There are two kinds of
tokens: \bfindex{delimiters} and \bfindex{name tokens}.

Delimiters can be regarded as reserved words of the syntax.  You can
add new ones when extending theories.  In Fig.\ts\ref{fig:pure_gram} they
appear in typewriter font, for example {\tt ==}, {\tt =?=} and
{\tt PROP}\@.

Name tokens have a predefined syntax.  The lexer distinguishes four
disjoint classes of names: \rmindex{identifiers}, \rmindex{unknowns}, type
identifiers\index{identifiers!type}, type unknowns\index{unknowns!type}.
They are denoted by $id$\index{id@$id$}, $var$\index{var@$var$},
$tid$\index{tid@$tid$}, $tvar$\index{tvar@$tvar$}, respectively.  Typical
examples are {\tt x}, {\tt ?x7}, {\tt 'a}, {\tt ?'a3}.  Here is the precise
syntax:
\begin{eqnarray*}
id        & =   & letter~quasiletter^* \\
var       & =   & \mbox{\tt ?}id ~~|~~ \mbox{\tt ?}id\mbox{\tt .}nat \\
tid     & =   & \mbox{\tt '}id \\
tvar      & =   & \mbox{\tt ?}tid ~~|~~
                  \mbox{\tt ?}tid\mbox{\tt .}nat \\[1ex]
letter    & =   & \mbox{one of {\tt a}\dots {\tt z} {\tt A}\dots {\tt Z}} \\
digit     & =   & \mbox{one of {\tt 0}\dots {\tt 9}} \\
quasiletter & =  & letter ~~|~~ digit ~~|~~ \mbox{\tt _} ~~|~~ \mbox{\tt '} \\
nat       & =   & digit^+
\end{eqnarray*}
A $var$ or $tvar$ describes an unknown, which is internally a pair
of base name and index (\ML\ type \ttindex{indexname}).  These components are
either separated by a dot as in {\tt ?x.1} or {\tt ?x7.3} or
run together as in {\tt ?x1}.  The latter form is possible if the
base name does not end with digits.  If the index is 0, it may be dropped
altogether: {\tt ?x} abbreviates both {\tt ?x0} and {\tt ?x.0}.
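
The following table restates these conventions for the unknowns mentioned
in this section.  It is derived from the rules above, not from
a session transcript.
\begin{center}
\begin{tabular}{lll}
  input         & base name  & index \\
  {\tt ?x}      & {\tt x}    & 0 \\
  {\tt ?x1}     & {\tt x}    & 1 \\
  {\tt ?x.1}    & {\tt x}    & 1 \\
  {\tt ?x7}     & {\tt x}    & 7 \\
  {\tt ?x7.3}   & {\tt x7}   & 3
\end{tabular}
\end{center}
The last row shows why the dotted form is needed: a base name such as
{\tt x7}, which ends with a digit, cannot be run together with its index.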

The lexer repeatedly takes the maximal prefix of the input string that
forms a valid token.  A maximal prefix that is both a delimiter and a name
is treated as a delimiter.  Spaces, tabs and newlines are separators; they
never occur within tokens.

Delimiters need not be separated by white space.  For example, if {\tt -}
is a delimiter but {\tt --} is not, then the string {\tt --} is treated as
two consecutive occurrences of the token~{\tt -}.  In contrast, \ML\ 
treats {\tt --} as a single symbolic name.  The consequence of Isabelle's
more liberal scheme is that the same string may be parsed in different ways
after extending the syntax: after adding {\tt --} as a delimiter, the input
{\tt --} is treated as a single token.
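
A few further cases may help to fix the maximal-prefix rule.  They are
worked out by hand from the rule as stated, assuming only the Pure
delimiters {\tt ==}, {\tt ==>} and {\tt PROP}; they are not copied from
the lexer's output.
\begin{center}
\begin{tabular}{ll}
  input           & tokens \\
  {\tt x==>y}     & {\tt x}, {\tt ==>}, {\tt y} \\
  {\tt PROP x}    & {\tt PROP} (delimiter), {\tt x} \\
  {\tt PROPx}     & {\tt PROPx} (a single identifier)
\end{tabular}
\end{center}
In the first row the longer delimiter {\tt ==>} wins over its prefix
{\tt ==}.  In the last row the identifier {\tt PROPx} is a longer valid
token than the delimiter {\tt PROP}, so the white space in the second row
is significant.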

Name tokens are terminal symbols, strictly speaking, but we can generally
regard them as nonterminals.  This is because a name token carries with it
useful information, the name.  Delimiters, on the other hand, are nothing
but syntactic sugar.
\subsection{*Inspecting the syntax}
\begin{ttbox}
syn_of              : theory -> Syntax.syntax
Syntax.print_syntax : Syntax.syntax -> unit
Syntax.print_gram   : Syntax.syntax -> unit
Syntax.print_trans  : Syntax.syntax -> unit
\end{ttbox}
The abstract type \ttindex{Syntax.syntax} allows manipulation of syntaxes
in \ML.  You can display values of this type by calling the following
functions:
\begin{description}
\item[\ttindexbold{syn_of} {\it thy}] returns the syntax of the Isabelle
  theory~{\it thy} as an \ML\ value.

\item[\ttindexbold{Syntax.print_syntax} {\it syn}] shows virtually all
  information contained in the syntax {\it syn}.  The displayed output can
  be large.  The following two functions are more selective.

\item[\ttindexbold{Syntax.print_gram} {\it syn}] shows the grammar part
  of~{\it syn}, namely the lexicon, roots and productions.

\item[\ttindexbold{Syntax.print_trans} {\it syn}] shows the translation
  part of~{\it syn}, namely the constants, parse/print macros and
  parse/print translations.
\end{description}

Let us demonstrate these functions by inspecting Pure's syntax.  Even that
is too verbose to display in full.
\begin{ttbox}
Syntax.print_syntax (syn_of Pure.thy);
{\out lexicon: "!!" "\%" "(" ")" "," "." "::" ";" "==" "==>" \dots}
{\out roots: logic type fun prop}
{\out prods:}
{\out   type = tid  (1000)}
{\out   type = tvar  (1000)}
{\out   type = id  (1000)}
{\out   type = tid "::" sort[0]  => "_ofsort" (1000)}
{\out   type = tvar "::" sort[0]  => "_ofsort" (1000)}
{\out   \vdots}
\ttbreak
{\out consts: "_K" "_appl" "_aprop" "_args" "_asms" "_bigimpl" \dots}
{\out parse_ast_translation: "_appl" "_bigimpl" "_bracket"}
{\out   "_idtyp" "_lambda" "_tapp" "_tappl"}
{\out parse_rules:}
{\out parse_translation: "!!" "_K" "_abs" "_aprop"}
{\out print_translation: "all"}
{\out print_rules:}
{\out print_ast_translation: "==>" "_abs" "_idts" "fun"}
\end{ttbox}

As you can see, the output is divided into labeled sections.  The grammar
is represented by {\tt lexicon}, {\tt roots} and {\tt prods}.  The rest
refers to syntactic translations and macro expansion.  Here is an
explanation of the various sections.
\begin{description}
  \item[\ttindex{lexicon}] lists the delimiters used for lexical
    analysis.\index{delimiters} 

  \item[\ttindex{roots}] lists the grammar's nonterminal symbols.  You must
    name the desired root when calling lower level functions or specifying
    macros.  Higher level functions usually expect a type and derive the
    actual root as described in~\S\ref{sec:grammar}.

  \item[\ttindex{prods}] lists the productions of the priority grammar.
    The nonterminal $A^{(n)}$ is rendered in {\sc ascii} as {\tt $A$[$n$]}.
    Each delimiter is quoted.  Some productions are shown with {\tt =>} and
    an attached string.  These strings later become the heads of parse
    trees; they also play a vital role when terms are printed (see
    \S\ref{sec:asts}).

    Productions with no strings attached are called {\bf copy
      productions}\indexbold{productions!copy}.  Their right-hand side must
    have exactly one nonterminal symbol (or name token).  The parser does
    not create a new parse tree node for copy productions, but simply
    returns the parse tree of the right-hand symbol.

    If the right-hand side consists of a single nonterminal with no
    delimiters, then the copy production is called a {\bf chain
      production}\indexbold{productions!chain}.  Chain productions should
    be seen as abbreviations: conceptually, they are removed from the
    grammar by adding new productions.  Priority information
    attached to chain productions is ignored; only the dummy value $-1$ is
    displayed.

  \item[\ttindex{consts}, \ttindex{parse_rules}, \ttindex{print_rules}]
    relate to macros (see \S\ref{sec:macros}).

  \item[\ttindex{parse_ast_translation}, \ttindex{print_ast_translation}]
    list sets of constants that invoke translation functions for abstract
    syntax trees.  This obscure matter is discussed in \S\ref{sec:asts}
    below.

  \item[\ttindex{parse_translation}, \ttindex{print_translation}] list sets
    of constants that invoke translation functions for terms (see
    \S\ref{sec:tr_funs}).
\end{description}
\index{syntax!Pure|)}
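
When only one part of such a listing is of interest, the two selective
functions can be applied to the same syntax value.  The following is a
minimal sketch using only the functions listed at the start of this
subsection; the \ML\ identifier {\tt pure_syn} is our own choice, and the
output is omitted.
\begin{ttbox}
val pure_syn = syn_of Pure.thy;
Syntax.print_gram pure_syn;
Syntax.print_trans pure_syn;
\end{ttbox}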
\section{Mixfix declarations} \label{sec:mixfix}
\index{mixfix declaration|(} 

When defining a theory, you declare new constants by giving their names,
their type, and an optional {\bf mixfix annotation}.  Mixfix annotations
allow you to extend Isabelle's basic $\lambda$-calculus syntax with
readable notation.  They can express any context-free priority grammar.
Isabelle syntax definitions are inspired by \OBJ~\cite{OBJ}; they are more
general than the priority declarations of \ML\ and Prolog.  

A mixfix annotation defines a production of the priority grammar.  It
describes the concrete syntax, the translation to abstract syntax, and the
pretty printing.  Special case annotations provide a simple means of
specifying infix operators, binders and so forth.

\subsection{Grammar productions}\label{sec:grammar}
Let us examine the treatment of the production
\[ A^{(p)}= w@0\, A@1^{(p@1)}\, w@1\, A@2^{(p@2)}\, \ldots\,  
                  A@n^{(p@n)}\, w@n. \]
Here $A@i^{(p@i)}$ is a nonterminal with priority~$p@i$ for $i=1$,
\ldots,~$n$, while $w@0$, \ldots,~$w@n$ are strings of terminals.
In the corresponding mixfix annotation, the priorities are given separately
as $[p@1,\ldots,p@n]$ and~$p$.  The nonterminal symbols are identified with
types~$\tau$, $\tau@1$, \ldots,~$\tau@n$ respectively, and the production's
effect on nonterminals is expressed as the function type
\[ [\tau@1, \ldots, \tau@n]\To \tau. \]
Finally, the template
\[ w@0  \;_\; w@1 \;_\; \ldots \;_\; w@n \]
describes the strings of terminals.
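
As a worked instance, take the production
$A^{(0)} = A^{(0)} \;\mtt{+}\; A^{(1)}$ of \S\ref{sec:priority_grammars} and
suppose, as an assumption of this sketch, that the nonterminal~$A$ is
identified with a type~$exp$.  Here $n=2$, the strings $w@0$ and $w@2$ are
empty and $w@1$ is~{\tt +}, so the three components of the annotation are
\begin{center}
\begin{tabular}{ll}
  priorities    & $[0,1]$ and $0$ \\
  function type & $[exp, exp]\To exp$ \\
  template      & {\tt _ + _}
\end{tabular}
\end{center}
The spaces around {\tt +} in the template are an addition of ours; they
influence printing only.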

A simple type is typically declared for each nonterminal symbol.  In
first-order logic, type~$i$ stands for terms and~$o$ for formulae.  Only
the outermost type constructor is taken into account.  For example, any
type of the form $\sigma list$ stands for a list;  productions may refer
to the symbol {\tt list} and will apply to lists of any type.

The symbol associated with a type is called its {\bf root} since it may
serve as the root of a parse tree.  Precisely, the root of $(\tau@1, \dots,
\tau@n)ty$ is $ty$, where $\tau@1$, \ldots, $\tau@n$ are types and $ty$ is
a type constructor.  Type infixes are a special case of this; in
particular, the root of $\tau@1 \To \tau@2$ is {\tt fun}.  Finally, the
root of a type variable is {\tt logic}; general productions might
refer to this nonterminal.

Identifying nonterminals with types allows a constant's type to specify
syntax as well.  We can declare the function~$f$ to have type $[\tau@1,
\ldots, \tau@n]\To \tau$ and, through a mixfix annotation, specify the
layout of the function's $n$ arguments.  The constant's name, in this
case~$f$, will also serve as the label in the abstract syntax tree.  There
are two exceptions to this treatment of constants:
\begin{enumerate}
  \item A production need not map directly to a logical function.  In this
    case, you must declare a constant whose purpose is purely syntactic.
    By convention such constants begin with the symbol~{\tt\at}, 
    ensuring that they can never be written in formulae.

  \item A copy production has no associated constant.
\end{enumerate}
There is something artificial about this representation of productions,
but it is convenient, particularly for simple theory extensions.
\subsection{The general mixfix form}
Here is a detailed account of the general \bfindex{mixfix declaration} as
it may occur within the {\tt consts} section of a {\tt .thy} file.
\begin{center}
  {\tt "$c$" ::\ "$\sigma$" ("$template$" $ps$ $p$)}
\end{center}
This constant declaration and mixfix annotation is interpreted as follows:
\begin{itemize}
\item The string {\tt "$c$"} is the name of the constant associated with
  the production.  If $c$ is empty (given as~{\tt ""}) then this is a copy
  production.\index{productions!copy} Otherwise, parsing an instance of the
  phrase $template$ generates the \AST{} {\tt ("$c$" $a@1$ $\ldots$
    $a@n$)}, where $a@i$ is the \AST{} generated by parsing the $i$-th
  argument.

  \item The constant $c$, if non-empty, is declared to have type $\sigma$.

  \item The string $template$ specifies the right-hand side of
    the production.  It has the form
    \[ w@0 \;_\; w@1 \;_\; \ldots \;_\; w@n, \] 
    where each occurrence of \ttindex{_} denotes an
    argument\index{argument!mixfix} position and the~$w@i$ do not
    contain~{\tt _}.  (If you want a literal~{\tt _} in the concrete
    syntax, you must escape it as described below.)  The $w@i$ may
    consist of \rmindex{delimiters}, spaces or \rmindex{pretty
      printing} annotations (see below).

  \item The type $\sigma$ specifies the production's nonterminal symbols (or name
    tokens).  If $template$ is of the form above then $\sigma$ must be a
    function type with at least~$n$ argument positions, say $\sigma =
    [\tau@1, \dots, \tau@n] \To \tau$.  Nonterminal symbols are derived
    from the types $\tau@1$, \ldots,~$\tau@n$, $\tau$ as described above.
    Any of these may be function types; the corresponding root is then {\tt
      fun}. 

  \item The optional list~$ps$ may contain at most $n$ integers, say {\tt
      [$p@1$, $\ldots$, $p@m$]}, where $p@i$ is the minimal
    priority\indexbold{priorities} required of any phrase that may appear
    as the $i$-th argument.  Missing priorities default to~$0$.

  \item The integer $p$ is the priority of this production.  If omitted, it
    defaults to the maximal priority.

    Priorities, or precedences, range between $0$ and
    $max_pri$\indexbold{max_pri@$max_pri$} (= 1000).
\end{itemize}

The declaration {\tt $c$ ::\ "$\sigma$" ("$template$")} specifies no
priorities.  The resulting production puts no priority constraints on any
of its arguments and has maximal priority itself.  Omitting priorities in
this manner will introduce syntactic ambiguities unless the production's
right-hand side is fully bracketed, as in \verb|"if _ then _ else _ fi"|.
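
To see the ambiguity concretely, consider the following hypothetical
declaration of an infix operator with all priorities left out.  The type
{\tt exp} and a constant {\tt 0} of that type are assumed to be declared
elsewhere; the line is a sketch and not taken from any distributed theory.
\begin{ttbox}
"+" :: "[exp, exp] => exp"  ("_ + _")
\end{ttbox}
Both argument positions receive priority~$0$ and the production itself
the maximal priority, so either operand of {\tt +} may again be a sum.
The input {\tt 0 + 0 + 0} can therefore be grouped to the left or to the
right, and nothing in the grammar prefers one reading.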
\begin{warn}
  Theories must sometimes declare types for purely syntactic purposes.  One
  example is {\tt type}, the built-in type of types.  This is a `type of
  all types' in the syntactic sense only.  Do not declare such types under
  {\tt arities} as belonging to class $logic$, for that would allow their
  use in arbitrary Isabelle expressions~(\S\ref{logical-types}).
\end{warn}
\subsection{Example: arithmetic expressions}
This theory specification contains a {\tt consts} section with mixfix
declarations encoding the priority grammar from
\S\ref{sec:priority_grammars}:
\begin{ttbox}
EXP = Pure +
types
  exp
arities
  exp :: logic
consts
  "0" :: "exp"                ("0"      9)
  "+" :: "[exp, exp] => exp"  ("_ + _"  [0, 1] 0)
  "*" :: "[exp, exp] => exp"  ("_ * _"  [3, 2] 2)
  "-" :: "exp => exp"         ("- _"    [3] 3)
end
\end{ttbox}
Note that the {\tt arities} declaration causes {\tt exp} to be added to the
syntax's roots.  If you put the text above into a file {\tt exp.thy} and load
it via {\tt use_thy "EXP"}, you can run some tests:
\begin{ttbox}
val read_exp = Syntax.test_read (syn_of EXP.thy) "exp";
{\out val read_exp = fn : string -> unit}
read_exp "0 * 0 * 0 * 0 + 0 + 0 + 0";
{\out tokens: "0" "*" "0" "*" "0" "*" "0" "+" "0" "+" "0" "+" "0"}
{\out raw: ("+" ("+" ("+" ("*" "0" ("*" "0" ("*" "0" "0"))) "0") "0") "0")}
{\out \vdots}
read_exp "0 + - 0 + 0";
{\out tokens: "0" "+" "-" "0" "+" "0"}
{\out raw: ("+" ("+" "0" ("-" "0")) "0")}
{\out \vdots}
\end{ttbox}
The output of \ttindex{Syntax.test_read} includes the token list ({\tt
  tokens}) and the raw \AST{} directly derived from the parse tree,
ignoring parse \AST{} translations.  The rest is tracing information
provided by the macro expander (see \S\ref{sec:macros}).

Executing {\tt Syntax.print_gram} reveals the productions derived
from our mixfix declarations (lots of additional information deleted):
\begin{ttbox}
Syntax.print_gram (syn_of EXP.thy);
{\out exp = "0"  => "0" (9)}
{\out exp = exp[0] "+" exp[1]  => "+" (0)}
{\out exp = exp[3] "*" exp[2]  => "*" (2)}
{\out exp = "-" exp[3]  => "-" (3)}
\end{ttbox}
lcp@104
   511
lcp@291
   512
lcp@291
   513
\subsection{The mixfix template}
lcp@291
   514
Let us take a closer look at the string $template$ appearing in mixfix
lcp@291
   515
annotations.  This string specifies a list of parsing and printing
lcp@291
   516
directives: delimiters\index{delimiter}, arguments\index{argument!mixfix},
lcp@291
   517
spaces, blocks of indentation and line breaks.  These are encoded via the
lcp@104
   518
following character sequences:
lcp@104
   519
lcp@291
   520
\index{pretty printing|(}
lcp@104
   521
\begin{description}
lcp@291
   522
  \item[~\ttindex_~] An argument\index{argument!mixfix} position, which
lcp@291
   523
    stands for a nonterminal symbol or name token.
lcp@104
   524
lcp@291
   525
  \item[~$d$~] A \rmindex{delimiter}, namely a non-empty sequence of
lcp@291
   526
    non-special or escaped characters.  Escaping a character\index{escape
lcp@291
   527
      character} means preceding it with a {\tt '} (single quote).  Thus
lcp@291
   528
    you have to write {\tt ''} if you really want a single quote.  You must
lcp@291
   529
    also escape {\tt _}, {\tt (}, {\tt )} and {\tt /}.  Delimiters may
lcp@291
   530
    never contain white space, though.
lcp@104
   531
lcp@291
   532
  \item[~$s$~] A non-empty sequence of spaces for printing.  This
lcp@291
   533
    and the following specifications do not affect parsing at all.
lcp@104
   534
lcp@291
   535
  \item[~{\ttindex($n$}~] Open a pretty printing block.  The optional
lcp@291
   536
    number $n$ specifies how much indentation to add when a line break
lcp@291
   537
    occurs within the block.  If {\tt(} is not followed by digits, the
lcp@291
   538
    indentation defaults to~$0$.
lcp@104
   539
lcp@291
   540
  \item[~\ttindex)~] Close a pretty printing block.
lcp@104
   541
lcp@291
   542
  \item[~\ttindex{//}~] Force a line break.
lcp@104
   543
lcp@291
   544
  \item[~\ttindex/$s$~] Allow a line break.  Here $s$ stands for the string
lcp@291
   545
    of spaces (zero or more) right after the {\tt /} character.  These
lcp@291
   546
    spaces are printed if the break is not taken.
lcp@291
   547
\end{description}
lcp@291
   548
Isabelle's pretty printer resembles the one described in
lcp@291
   549
Paulson~\cite{paulson91}.  \index{pretty printing|)}
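
As an illustration of these directives, consider a mixfix declaration for a
conditional expression.  This is only a sketch: the constant {\tt If} and
its type are hypothetical and do not belong to any theory discussed here.
\begin{ttbox}
If :: "[o, 'a, 'a] => 'a"   ("(if _/ then _/ else _)" 10)
\end{ttbox}
The template contains the delimiters {\tt if}, {\tt then} and {\tt else},
three argument positions and a single pretty printing block.  Since
{\tt(} is not followed by digits, the block has indentation~0.  Each
{\tt /} followed by a space allows a line break; if the break is not taken,
the space is printed instead.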
lcp@104
   550
lcp@104
   551
lcp@104
   552
\subsection{Infixes}
lcp@291
   553
\indexbold{infix operators}
lcp@291
   554
Infix operators associating to the left or right can be declared
lcp@291
   555
using {\tt infixl} or {\tt infixr}.
lcp@291
   556
Roughly speaking, the form {\tt $c$ ::\ "$\sigma$" (infixl $p$)}
lcp@291
   557
abbreviates the declarations
lcp@104
   558
\begin{ttbox}
lcp@291
   559
"op \(c\)" :: "\(\sigma\)"   ("op \(c\)")
lcp@291
   560
"op \(c\)" :: "\(\sigma\)"   ("(_ \(c\)/ _)" [\(p\), \(p+1\)] \(p\))
lcp@104
   561
\end{ttbox}
lcp@291
   562
and {\tt $c$ ::\ "$\sigma$" (infixr $p$)} abbreviates the declarations
lcp@104
   563
\begin{ttbox}
lcp@291
   564
"op \(c\)" :: "\(\sigma\)"   ("op \(c\)")
lcp@291
   565
"op \(c\)" :: "\(\sigma\)"   ("(_ \(c\)/ _)" [\(p+1\), \(p\)] \(p\))
lcp@104
   566
\end{ttbox}
lcp@291
   567
The infix operator is declared as a constant with the prefix {\tt op}.
lcp@104
   568
Thus, prefixing infixes with \ttindex{op} makes them behave like ordinary
lcp@291
   569
function symbols, as in \ML.  Special characters occurring in~$c$ must be
lcp@291
   570
escaped, as in delimiters, using a single quote.
lcp@291
   571
lcp@291
   572
The expanded forms above would actually be illegal in a {\tt .thy} file
lcp@291
   573
because they declare the constant \hbox{\tt"op \(c\)"} twice.
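
For a concrete instance of the {\tt infixr} scheme, take $c$ to be
{\tt -->}, $\sigma$ to be {\tt [o, o] => o} and $p$ to be~10.  Substituting
these values into the expanded form gives the following pair, which (like
the schemes above) only illustrates the abbreviation and is not meant to be
typed into a theory file:
\begin{ttbox}
"op -->" :: "[o, o] => o"   ("op -->")
"op -->" :: "[o, o] => o"   ("(_ -->/ _)" [11, 10] 10)
\end{ttbox}
The left argument requires priority~11 while the result has priority~10, so
an unparenthesized implication cannot be a left argument: {\tt P --> Q --> R}
is read as {\tt P --> (Q --> R)}.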
lcp@104
   574
lcp@104
   575
lcp@104
   576
\subsection{Binders}
lcp@291
   577
\indexbold{binders}
lcp@291
   578
\begingroup
lcp@291
   579
\def\Q{{\cal Q}}
lcp@291
   580
A {\bf binder} is a variable-binding construct such as a quantifier.  The
lcp@291
   581
binder declaration \indexbold{*binder}
lcp@104
   582
\begin{ttbox}
lcp@291
   583
\(c\) :: "\(\sigma\)"   (binder "\(\Q\)" \(p\))
lcp@104
   584
\end{ttbox}
lcp@291
   585
introduces a constant~$c$ of type~$\sigma$, which must have the form
lcp@291
   586
$(\tau@1 \To \tau@2) \To \tau@3$.  Its concrete syntax is $\Q~x.P$, where
lcp@291
   587
$x$ is a bound variable of type~$\tau@1$, the body~$P$ has type $\tau@2$
lcp@291
   588
and the whole term has type~$\tau@3$.  Special characters in $\Q$ must be
lcp@291
   589
escaped using a single quote.
lcp@291
   590
lcp@291
   591
Let us declare the quantifier~$\forall$:
lcp@104
   592
\begin{ttbox}
lcp@104
   593
All :: "('a => o) => o"   (binder "ALL " 10)
lcp@104
   594
\end{ttbox}
lcp@291
   595
This lets us write $\forall x.P$ as either {\tt All(\%$x$.$P$)} or {\tt ALL
lcp@291
   596
  $x$.$P$}.  When printing, Isabelle prefers the latter form, but must fall
lcp@291
   597
back on $\mtt{All}(P)$ if $P$ is not an abstraction.  Both $P$ and {\tt ALL
lcp@291
   598
  $x$.$P$} have type~$o$, the type of formulae, while the bound variable
lcp@291
   599
can be polymorphic.
lcp@104
   600
lcp@291
   601
The binder~$c$ of type $(\sigma \To \tau) \To \tau$ can be nested.  The
lcp@291
   602
external form $\Q~x@1~x@2 \ldots x@n. P$ corresponds to the internal form
lcp@291
   603
\[ c(\lambda x@1. c(\lambda x@2. \ldots c(\lambda x@n. P) \ldots)) \]
lcp@104
   604
lcp@104
   605
\medskip
lcp@104
   606
The general binder declaration
lcp@104
   607
\begin{ttbox}
lcp@291
   608
\(c\)    :: "(\(\tau@1\) => \(\tau@2\)) => \(\tau@3\)"   (binder "\(\Q\)" \(p\))
lcp@104
   609
\end{ttbox}
lcp@104
   610
is internally expanded to
lcp@104
   611
\begin{ttbox}
lcp@291
   612
\(c\)    :: "(\(\tau@1\) => \(\tau@2\)) => \(\tau@3\)"
lcp@291
   613
"\(\Q\)"\hskip-3pt  :: "[idts, \(\tau@2\)] => \(\tau@3\)"   ("(3\(\Q\)_./ _)" \(p\))
lcp@104
   614
\end{ttbox}
lcp@291
   615
with $idts$ being the nonterminal symbol for a list of $id$s optionally
lcp@291
   616
constrained (see Fig.\ts\ref{fig:pure_gram}).  The declaration also
lcp@291
   617
installs a parse translation\index{translations!parse} for~$\Q$ and a print
lcp@291
   618
translation\index{translations!print} for~$c$ to translate between the
lcp@291
   619
internal and external forms.
lcp@291
   620
\endgroup
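
As a sketch obtained purely by substitution, applying this expansion to the
declaration of {\tt All} given above (binder name {\tt ALL} followed by a
space, priority~10, with $\tau@2$ and $\tau@3$ both equal to~$o$) would yield
\begin{ttbox}
All    :: "('a => o) => o"
"ALL " :: "[idts, o] => o"   ("(3ALL _./ _)" 10)
\end{ttbox}
Consequently the external form {\tt ALL x y.P} stands for the internal form
{\tt All(\%x.All(\%y.P))}.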
lcp@104
   621
lcp@291
   622
\index{mixfix declaration|)}
lcp@104
   623
lcp@104
   624
lcp@104
   625
\section{Example: some minimal logics} \label{sec:min_logics}
lcp@291
   626
This section presents some examples that have a simple syntax.  They
lcp@291
   627
demonstrate how to define new object-logics from scratch.
lcp@104
   628
lcp@291
   629
First we must define how an object-logic syntax is embedded into the
lcp@291
   630
meta-logic.  Since all theorems must conform to the syntax for~$prop$ (see
lcp@291
   631
Fig.\ts\ref{fig:pure_gram}), that syntax has to be extended with the
lcp@291
   632
object-level syntax.  Assume that the syntax of your object-logic defines a
lcp@291
   633
nonterminal symbol~$o$ of formulae.  These formulae can now appear in
lcp@291
   634
axioms and theorems wherever $prop$ does if you add the production
lcp@104
   635
\[ prop ~=~ o. \]
lcp@291
   636
This is not a copy production but a coercion from formulae to propositions:
lcp@104
   637
\begin{ttbox}
lcp@104
   638
Base = Pure +
lcp@104
   639
types
lcp@291
   640
  o
lcp@104
   641
arities
lcp@104
   642
  o :: logic
lcp@104
   643
consts
lcp@104
   644
  Trueprop :: "o => prop"   ("_" 5)
lcp@104
   645
end
lcp@104
   646
\end{ttbox}
lcp@104
   647
The constant {\tt Trueprop} (the name is arbitrary) acts as an invisible
lcp@291
   648
coercion function.  Assuming this definition resides in a file {\tt base.thy},
wenzelm@135
   649
you have to load it with the command {\tt use_thy "Base"}.
lcp@104
   650
lcp@291
   651
One of the simplest nontrivial logics is {\bf minimal logic} of
lcp@291
   652
implication.  Its definition in Isabelle needs no advanced features but
lcp@291
   653
illustrates the overall mechanism nicely:
lcp@104
   654
\begin{ttbox}
lcp@104
   655
Hilbert = Base +
lcp@104
   656
consts
lcp@104
   657
  "-->" :: "[o, o] => o"   (infixr 10)
lcp@104
   658
rules
lcp@104
   659
  K     "P --> Q --> P"
lcp@104
   660
  S     "(P --> Q --> R) --> (P --> Q) --> P --> R"
lcp@104
   661
  MP    "[| P --> Q; P |] ==> Q"
lcp@104
   662
end
lcp@104
   663
\end{ttbox}
lcp@291
   664
After loading this definition from the file {\tt hilbert.thy}, you can
lcp@291
   665
start to prove theorems in the logic:
lcp@104
   666
\begin{ttbox}
lcp@104
   667
goal Hilbert.thy "P --> P";
lcp@104
   668
{\out Level 0}
lcp@104
   669
{\out P --> P}
lcp@104
   670
{\out  1.  P --> P}
lcp@291
   671
\ttbreak
lcp@104
   672
by (resolve_tac [Hilbert.MP] 1);
lcp@104
   673
{\out Level 1}
lcp@104
   674
{\out P --> P}
lcp@104
   675
{\out  1.  ?P --> P --> P}
lcp@104
   676
{\out  2.  ?P}
lcp@291
   677
\ttbreak
lcp@104
   678
by (resolve_tac [Hilbert.MP] 1);
lcp@104
   679
{\out Level 2}
lcp@104
   680
{\out P --> P}
lcp@104
   681
{\out  1.  ?P1 --> ?P --> P --> P}
lcp@104
   682
{\out  2.  ?P1}
lcp@104
   683
{\out  3.  ?P}
lcp@291
   684
\ttbreak
lcp@104
   685
by (resolve_tac [Hilbert.S] 1);
lcp@104
   686
{\out Level 3}
lcp@104
   687
{\out P --> P}
lcp@104
   688
{\out  1.  P --> ?Q2 --> P}
lcp@104
   689
{\out  2.  P --> ?Q2}
lcp@291
   690
\ttbreak
lcp@104
   691
by (resolve_tac [Hilbert.K] 1);
lcp@104
   692
{\out Level 4}
lcp@104
   693
{\out P --> P}
lcp@104
   694
{\out  1.  P --> ?Q2}
lcp@291
   695
\ttbreak
lcp@104
   696
by (resolve_tac [Hilbert.K] 1);
lcp@104
   697
{\out Level 5}
lcp@104
   698
{\out P --> P}
lcp@104
   699
{\out No subgoals!}
lcp@104
   700
\end{ttbox}
lcp@291
   701
As we can see, this Hilbert-style formulation of minimal logic is easy to
lcp@291
   702
define but difficult to use.  The following natural deduction formulation is
lcp@291
   703
better:
lcp@104
   704
\begin{ttbox}
lcp@104
   705
MinI = Base +
lcp@104
   706
consts
lcp@104
   707
  "-->" :: "[o, o] => o"   (infixr 10)
lcp@104
   708
rules
lcp@104
   709
  impI  "(P ==> Q) ==> P --> Q"
lcp@104
   710
  impE  "[| P --> Q; P |] ==> Q"
lcp@104
   711
end
lcp@104
   712
\end{ttbox}
lcp@291
   713
Note, however, that although the two systems are equivalent, this fact
lcp@291
   714
cannot be proved within Isabelle.  Axioms {\tt S} and {\tt K} can be
lcp@291
   715
derived in {\tt MinI} (exercise!), but {\tt impI} cannot be derived in {\tt
lcp@291
   716
  Hilbert}.  The reason is that {\tt impI} is only an {\bf admissible} rule
lcp@291
   717
in {\tt Hilbert}, something that can only be shown by induction over all
lcp@291
   718
possible proofs in {\tt Hilbert}.
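
As a hint for the exercise, here is one possible derivation of {\tt K} in
{\tt MinI}.  It is a sketch that assumes theory {\tt MinI} has been loaded;
the intermediate proof states are omitted.
\begin{ttbox}
goal MinI.thy "P --> Q --> P";
by (resolve_tac [MinI.impI] 1);
by (resolve_tac [MinI.impI] 1);
by (assume_tac 1);
\end{ttbox}
The two applications of {\tt impI} should reduce the goal to proving~$P$
from the assumptions $P$ and~$Q$, which {\tt assume_tac} then solves.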
lcp@104
   719
lcp@291
   720
We may easily extend minimal logic with falsity:
lcp@104
   721
\begin{ttbox}
lcp@104
   722
MinIF = MinI +
lcp@104
   723
consts
lcp@104
   724
  False :: "o"
lcp@104
   725
rules
lcp@104
   726
  FalseE "False ==> P"
lcp@104
   727
end
lcp@104
   728
\end{ttbox}
lcp@104
   729
On the other hand, we may wish to introduce conjunction only:
lcp@104
   730
\begin{ttbox}
lcp@104
   731
MinC = Base +
lcp@104
   732
consts
lcp@104
   733
  "&" :: "[o, o] => o"   (infixr 30)
lcp@291
   734
\ttbreak
lcp@104
   735
rules
lcp@104
   736
  conjI  "[| P; Q |] ==> P & Q"
lcp@104
   737
  conjE1 "P & Q ==> P"
lcp@104
   738
  conjE2 "P & Q ==> Q"
lcp@104
   739
end
lcp@104
   740
\end{ttbox}
lcp@291
   741
And if we want to have all three connectives together, we create and load a
lcp@291
   742
theory file consisting of a single line:\footnote{We can combine the
lcp@291
   743
  theories without creating a theory file using the ML declaration
lcp@291
   744
\begin{ttbox}
lcp@291
   745
val MinIFC_thy = merge_theories(MinIF,MinC)
lcp@291
   746
\end{ttbox}
lcp@291
   747
\index{*merge_theories|fnote}}
lcp@104
   748
\begin{ttbox}
lcp@104
   749
MinIFC = MinIF + MinC
lcp@104
   750
\end{ttbox}
lcp@104
   751
Now we can prove mixed theorems like
lcp@104
   752
\begin{ttbox}
lcp@104
   753
goal MinIFC.thy "P & False --> Q";
lcp@104
   754
by (resolve_tac [MinI.impI] 1);
lcp@104
   755
by (dresolve_tac [MinC.conjE2] 1);
lcp@104
   756
by (eresolve_tac [MinIF.FalseE] 1);
lcp@104
   757
\end{ttbox}
lcp@104
   758
Try this as an exercise!
lcp@104
   759
lcp@291
   760
\medskip
lcp@291
   761
Unless you need to define macros or syntax translation functions, you may
lcp@291
   762
skip the rest of this chapter.
lcp@291
   763
lcp@291
   764
lcp@291
   765
\section{*Abstract syntax trees} \label{sec:asts}
lcp@291
   766
\index{trees!abstract syntax|(} The parser, given a token list from the
lcp@291
   767
lexer, applies productions to yield a parse tree\index{trees!parse}.  By
lcp@291
   768
applying some internal transformations the parse tree becomes an abstract
lcp@291
   769
syntax tree, or \AST{}.  Macro expansion, further translations and finally
lcp@291
   770
type inference yield a well-typed term\index{terms!obtained from ASTs}.
lcp@291
   771
The printing process is the reverse, except for some subtleties to be
lcp@291
   772
discussed later.
lcp@291
   773
lcp@291
   774
Figure~\ref{fig:parse_print} outlines the parsing and printing process.
lcp@291
   775
Much of the complexity is due to the macro mechanism.  Using macros, you
lcp@291
   776
can specify most forms of concrete syntax without writing any \ML{} code.
lcp@291
   777
lcp@291
   778
\begin{figure}
lcp@291
   779
\begin{center}
lcp@291
   780
\begin{tabular}{cl}
lcp@291
   781
string          & \\
lcp@291
   782
$\downarrow$    & parser \\
lcp@291
   783
parse tree      & \\
lcp@291
   784
$\downarrow$    & parse \AST{} translation \\
lcp@291
   785
\AST{}             & \\
lcp@291
   786
$\downarrow$    & \AST{} rewriting (macros) \\
lcp@291
   787
\AST{}             & \\
lcp@291
   788
$\downarrow$    & parse translation, type inference \\
lcp@291
   789
--- well-typed term --- & \\
lcp@291
   790
$\downarrow$    & print translation \\
lcp@291
   791
\AST{}             & \\
lcp@291
   792
$\downarrow$    & \AST{} rewriting (macros) \\
lcp@291
   793
\AST{}             & \\
lcp@291
   794
$\downarrow$    & print \AST{} translation, printer \\
lcp@291
   795
string          &
lcp@291
   796
\end{tabular}
lcp@291
   797
\index{translations!parse}\index{translations!parse AST}
lcp@291
   798
\index{translations!print}\index{translations!print AST}
lcp@291
   799
lcp@291
   800
\end{center}
lcp@291
   801
\caption{Parsing and printing}\label{fig:parse_print}
lcp@291
   802
\end{figure}
lcp@291
   803
lcp@291
   804
Abstract syntax trees are an intermediate form between the raw parse trees
lcp@291
   805
and the typed $\lambda$-terms.  An \AST{} is either an atom (constant or
lcp@291
   806
variable) or a list of {\em at least two\/} subtrees.  Internally, they
lcp@291
   807
have type \ttindex{Syntax.ast}: \index{*Constant} \index{*Variable}
lcp@291
   808
\index{*Appl}
lcp@291
   809
\begin{ttbox}
lcp@291
   810
datatype ast = Constant of string
lcp@291
   811
             | Variable of string
lcp@291
   812
             | Appl of ast list
lcp@291
   813
\end{ttbox}
lcp@291
   814
lcp@291
   815
Isabelle uses an S-expression syntax for abstract syntax trees.  Constant
lcp@291
   816
atoms are shown as quoted strings, variable atoms as non-quoted strings and
lcp@291
   817
applications as a parenthesized list of subtrees.  For example, the \AST
lcp@291
   818
\begin{ttbox}
lcp@291
   819
Appl [Constant "_constrain",
lcp@291
   820
  Appl [Constant "_abs", Variable "x", Variable "t"],
lcp@291
   821
  Appl [Constant "fun", Variable "'a", Variable "'b"]]
lcp@291
   822
\end{ttbox}
lcp@291
   823
is shown as {\tt ("_constrain" ("_abs" x t) ("fun" 'a 'b))}.
lcp@291
   824
Both {\tt ()} and {\tt (f)} are illegal because they have too few
lcp@291
   825
subtrees. 
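
The notation can be reproduced by a small \ML{} function.  The following is
a sketch only: the names {\tt show_ast} and {\tt dq} are hypothetical, the
code relies on the Standard ML Basis Library rather than on Isabelle's own
utilities, and it is not Isabelle's actual printing function.
\begin{ttbox}
datatype ast = Constant of string
             | Variable of string
             | Appl of ast list;

(* the double quote character, built without an escape sequence *)
val dq = String.str (Char.chr 34);

(* render an ast in the S-expression notation described above *)
fun show_ast (Constant a) = dq ^ a ^ dq
  | show_ast (Variable x) = x
  | show_ast (Appl asts) =
      "(" ^ String.concatWith " " (map show_ast asts) ^ ")";

val example =
  Appl [Constant "_constrain",
    Appl [Constant "_abs", Variable "x", Variable "t"],
    Appl [Constant "fun", Variable "'a", Variable "'b"]];

show_ast example;
\end{ttbox}
Applied to {\tt example}, the function should return the string
{\tt ("_constrain" ("_abs" x t) ("fun" 'a 'b))}.  The sketch does not
reject the illegal forms {\tt ()} and {\tt (f)}.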
lcp@291
   826
lcp@291
   827
The resemblance to Lisp's S-expressions is intentional, but there are two
lcp@291
   828
kinds of atomic symbols: $\Constant x$ and $\Variable x$.  Do not take the
lcp@291
   829
names ``{\tt Constant}'' and ``{\tt Variable}'' too literally; in the later
lcp@291
   830
translation to terms, $\Variable x$ may become a constant, free or bound
lcp@291
   831
variable, even a type constructor or class name; the actual outcome depends
lcp@291
   832
on the context.
lcp@291
   833
lcp@291
   834
Similarly, you can think of ${\tt (} f~x@1~\ldots~x@n{\tt )}$ as the
lcp@291
   835
application of~$f$ to the arguments $x@1, \ldots, x@n$.  But the kind of
lcp@291
   836
application is determined later by context; it could be a type constructor
lcp@291
   837
applied to types.
lcp@291
   838
lcp@291
   839
Forms like {\tt (("_abs" x $t$) $u$)} are legal, but \AST{}s are
lcp@291
   840
first-order: the {\tt "_abs"} does not bind the {\tt x} in any way.  Later
lcp@291
   841
at the term level, {\tt ("_abs" x $t$)} will become an {\tt Abs} node and
lcp@291
   842
occurrences of {\tt x} in $t$ will be replaced by bound variables (the term
lcp@291
   843
constructor \ttindex{Bound}).
lcp@291
   844
lcp@291
   845
lcp@291
   846
\subsection{Transforming parse trees to \AST{}s}
lcp@291
   847
The parse tree is the raw output of the parser.  Translation functions,
lcp@291
   848
called {\bf parse AST translations}\indexbold{translations!parse AST},
lcp@291
   849
transform the parse tree into an abstract syntax tree.
lcp@291
   850
lcp@291
   851
The parse tree is constructed by nesting the right-hand sides of the
lcp@291
   852
productions used to recognize the input.  Such parse trees are simply lists
lcp@291
   853
of tokens and constituent parse trees, the latter representing the
lcp@291
   854
nonterminals of the productions.  Let us refer to the actual productions in
lcp@291
   855
the form displayed by {\tt Syntax.print_syntax}.
lcp@291
   856
lcp@291
   857
Ignoring parse \AST{} translations, parse trees are transformed to \AST{}s
lcp@291
   858
by stripping out delimiters and copy productions.  More precisely, the
lcp@291
   859
mapping $ast_of_pt$\index{ast_of_pt@$ast_of_pt$} is derived from the
lcp@291
   860
productions as follows:
lcp@291
   861
\begin{itemize}
lcp@291
   862
  \item Name tokens: $ast_of_pt(t) = \Variable s$, where $t$ is an $id$,
lcp@291
   863
    $var$, $tid$ or $tvar$ token, and $s$ its associated string.
lcp@291
   864
lcp@291
   865
  \item Copy productions: $ast_of_pt(\ldots P \ldots) = ast_of_pt(P)$.
lcp@291
   866
    Here $\ldots$ stands for strings of delimiters, which are
lcp@291
   867
    discarded.  $P$ stands for the single constituent that is not a
lcp@291
   868
    delimiter; it is either a nonterminal symbol or a name token.
lcp@291
   869
lcp@291
   870
  \item $0$-ary productions: $ast_of_pt(\ldots \mtt{=>} c) = \Constant c$.
lcp@291
   871
    Here there are no constituents other than delimiters, which are
lcp@291
   872
    discarded. 
lcp@291
   873
lcp@291
   874
  \item $n$-ary productions, where $n \ge 1$: delimiters are discarded and
lcp@291
   875
    the remaining constituents $P@1$, \ldots, $P@n$ are built into an
lcp@291
   876
    application whose head constant is~$c$:
lcp@291
   877
    \begin{eqnarray*}
lcp@291
   878
      \lefteqn{ast_of_pt(\ldots P@1 \ldots P@n \ldots \mtt{=>} c)} \\
lcp@291
   879
      &&\qquad{}= \Appl{\Constant c, ast_of_pt(P@1), \ldots, ast_of_pt(P@n)}
lcp@291
   880
    \end{eqnarray*}
lcp@291
   881
\end{itemize}
lcp@291
   882
Figure~\ref{fig:parse_ast} presents some simple examples, where {\tt ==},
lcp@291
   883
{\tt _appl}, {\tt _args}, and so forth name productions of the Pure syntax.
lcp@291
   884
These examples illustrate the need for further translations to make \AST{}s
lcp@291
   885
closer to the typed $\lambda$-calculus.  The Pure syntax provides
lcp@291
   886
predefined parse \AST{} translations\index{translations!parse AST} for
lcp@291
   887
ordinary applications, type applications, nested abstractions, meta
lcp@291
   888
implications and function types.  Figure~\ref{fig:parse_ast_tr} shows their
lcp@291
   889
effect on some representative input strings.
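
To see the mapping at work on a smaller grammar, consider the input
{\tt - 0} parsed with the productions {\tt exp = "0" => "0"} and
{\tt exp = "-" exp[3] => "-"} printed earlier for {\tt EXP}.  The following
worked instance is a sketch in which $P$ denotes the parse tree of the
operand~{\tt 0}.  Delimiters are dropped, the $0$-ary production yields a
constant and the unary production yields an application:
\begin{eqnarray*}
  ast_of_pt(\mtt{"0"}~\mtt{=>}~\mtt{"0"}) &=& \Constant \mtt{"0"} \\
  ast_of_pt(\mtt{"-"}~P~\mtt{=>}~\mtt{"-"})
    &=& \Appl{\Constant \mtt{"-"}, ast_of_pt(P)} \\
    &=& \Appl{\Constant \mtt{"-"}, \Constant \mtt{"0"}}
\end{eqnarray*}
The result is displayed as {\tt ("-" "0")}, in agreement with the raw \AST{}
reported for {\tt read_exp "0 + - 0 + 0"}.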
lcp@291
   890
lcp@291
   891
lcp@291
   892
\begin{figure}
lcp@291
   893
\begin{center}
lcp@291
   894
\tt\begin{tabular}{ll}
lcp@291
   895
\rm input string    & \rm \AST \\\hline
lcp@291
   896
"f"                 & f \\
lcp@291
   897
"'a"                & 'a \\
lcp@291
   898
"t == u"            & ("==" t u) \\
lcp@291
   899
"f(x)"              & ("_appl" f x) \\
lcp@291
   900
"f(x, y)"           & ("_appl" f ("_args" x y)) \\
lcp@291
   901
"f(x, y, z)"        & ("_appl" f ("_args" x ("_args" y z))) \\
lcp@291
   902
"\%x y.\ t"         & ("_lambda" ("_idts" x y) t) \\
lcp@291
   903
\end{tabular}
lcp@291
   904
\end{center}
lcp@291
   905
\caption{Parsing examples using the Pure syntax}\label{fig:parse_ast} 
lcp@291
   906
\end{figure}
lcp@291
   907
lcp@291
   908
\begin{figure}
lcp@291
   909
\begin{center}
lcp@291
   910
\tt\begin{tabular}{ll}
lcp@291
   911
\rm input string            & \rm \AST{} \\\hline
lcp@291
   912
"f(x, y, z)"                & (f x y z) \\
lcp@291
   913
"'a ty"                     & (ty 'a) \\
lcp@291
   914
"('a, 'b) ty"               & (ty 'a 'b) \\
lcp@291
   915
"\%x y z.\ t"               & ("_abs" x ("_abs" y ("_abs" z t))) \\
lcp@291
   916
"\%x ::\ 'a.\ t"            & ("_abs" ("_constrain" x 'a) t) \\
"[| P; Q; R |] ==> S"       & ("==>" P ("==>" Q ("==>" R S))) \\
lcp@291
   918
"['a, 'b, 'c] => 'd"        & ("fun" 'a ("fun" 'b ("fun" 'c 'd)))
lcp@291
   919
\end{tabular}
lcp@291
   920
\end{center}
lcp@291
   921
\caption{Built-in parse \AST{} translations}\label{fig:parse_ast_tr}
lcp@291
   922
\end{figure}
lcp@291
   923
lcp@291
   924
The names of constant heads in the \AST{} control the translation process.
lcp@291
   925
The list of constants invoking parse \AST{} translations appears in the
lcp@291
   926
output of {\tt Syntax.print_syntax} under {\tt parse_ast_translation}.
lcp@291
   927
lcp@291
   928
lcp@291
   929
\subsection{Transforming \AST{}s to terms}
lcp@291
   930
The \AST{}, after application of macros (see \S\ref{sec:macros}), is
lcp@291
   931
transformed into a term.  This term is probably ill-typed since type
lcp@291
   932
inference has not occurred yet.  The term may contain type constraints
lcp@291
   933
consisting of applications with head {\tt "_constrain"}; the second
lcp@291
   934
argument is a type encoded as a term.  Type inference later introduces
lcp@291
   935
correct types or rejects the input.
lcp@291
   936
lcp@291
   937
Another set of translation functions, namely parse
lcp@291
   938
translations\index{translations!parse}, may affect this process.  If we
lcp@291
   939
ignore parse translations for the time being, then \AST{}s are transformed
lcp@291
   940
to terms by mapping \AST{} constants to constants, \AST{} variables to
lcp@291
   941
schematic or free variables and \AST{} applications to applications.
lcp@291
   942
lcp@291
   943
More precisely, the mapping $term_of_ast$\index{term_of_ast@$term_of_ast$}
lcp@291
   944
is defined by
lcp@291
   945
\begin{itemize}
lcp@291
   946
\item Constants: $term_of_ast(\Constant x) = \ttfct{Const} (x,
lcp@291
   947
  \mtt{dummyT})$.
lcp@291
   948
lcp@291
   949
\item Schematic variables: $term_of_ast(\Variable \mtt{"?}xi\mtt") =
lcp@291
   950
  \ttfct{Var} ((x, i), \mtt{dummyT})$, where $x$ is the base name and $i$
lcp@291
   951
  the index extracted from $xi$.
lcp@291
   952
lcp@291
   953
\item Free variables: $term_of_ast(\Variable x) = \ttfct{Free} (x,
lcp@291
   954
  \mtt{dummyT})$.
lcp@291
   955
lcp@291
   956
\item Function applications with $n$ arguments:
lcp@291
   957
    \begin{eqnarray*}
lcp@291
   958
      \lefteqn{term_of_ast(\Appl{f, x@1, \ldots, x@n})} \\
lcp@291
   959
      &&\qquad{}= term_of_ast(f) \ttapp
lcp@291
   960
         term_of_ast(x@1) \ttapp \ldots \ttapp term_of_ast(x@n)
lcp@291
   961
    \end{eqnarray*}
lcp@291
   962
\end{itemize}
lcp@291
   963
Here \ttindex{Const}, \ttindex{Var}, \ttindex{Free} and
lcp@291
   964
\verb|$|\index{$@{\tt\$}} are constructors of the datatype {\tt term},
lcp@291
   965
while \ttindex{dummyT} stands for some dummy type that is ignored during
lcp@291
   966
type inference.
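
As a worked instance of these rules, take the \AST{} {\tt ("==" t ?u1)},
assuming that the string {\tt ?u1} is read as base name~{\tt u} with
index~1.  Each atom is mapped separately and the results are combined by
application:
\begin{eqnarray*}
  \lefteqn{term_of_ast(\Appl{\Constant \mtt{"=="}, \Variable \mtt{"t"},
           \Variable \mtt{"?u1"}})} \\
  &&\qquad{}= \ttfct{Const} (\mtt{"=="}, \mtt{dummyT}) \ttapp
              \ttfct{Free} (\mtt{"t"}, \mtt{dummyT}) \ttapp
              \ttfct{Var} ((\mtt{"u"}, 1), \mtt{dummyT})
\end{eqnarray*}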
lcp@291
   967
lcp@291
   968
So far the outcome is still a first-order term.  Abstractions and bound
lcp@291
   969
variables (constructors \ttindex{Abs} and \ttindex{Bound}) are introduced
lcp@291
   970
by parse translations.  Such translations are attached to {\tt "_abs"},
lcp@291
   971
{\tt "!!"} and user-defined binders.
lcp@291
   972
lcp@291
   973
lcp@291
   974
\subsection{Printing of terms}
lcp@291
   975
The output phase is essentially the inverse of the input phase.  Terms are
lcp@291
   976
translated via abstract syntax trees into strings.  Finally the strings are
lcp@291
   977
pretty printed.
lcp@291
   978
lcp@291
   979
Print translations (\S\ref{sec:tr_funs}) may affect the transformation of
lcp@291
   980
terms into \AST{}s.  Ignoring those, the transformation maps
lcp@291
   981
term constants, variables and applications to the corresponding constructs
lcp@291
   982
on \AST{}s.  Abstractions are mapped to applications of the special
lcp@291
   983
constant {\tt _abs}.
lcp@291
   984
lcp@291
   985
More precisely, the mapping $ast_of_term$\index{ast_of_term@$ast_of_term$}
lcp@291
   986
is defined as follows:
lcp@291
   987
\begin{itemize}
lcp@291
   988
  \item $ast_of_term(\ttfct{Const} (x, \tau)) = \Constant x$.
lcp@291
   989
lcp@291
   990
  \item $ast_of_term(\ttfct{Free} (x, \tau)) = constrain (\Variable x,
lcp@291
   991
    \tau)$.
lcp@291
   992
lcp@291
   993
  \item $ast_of_term(\ttfct{Var} ((x, i), \tau)) = constrain (\Variable
lcp@291
   994
    \mtt{"?}xi\mtt", \tau)$, where $\mtt?xi$ is the string representation of
lcp@291
   995
    the {\tt indexname} $(x, i)$.
lcp@291
   996
lcp@291
   997
  \item For the abstraction $\lambda x::\tau.t$, let $x'$ be a variant
lcp@291
   998
    of~$x$ renamed to differ from all names occurring in~$t$, and let $t'$
lcp@291
   999
    be obtained from~$t$ by replacing all bound occurrences of~$x$ by
lcp@291
  1000
    the free variable $x'$.  This replaces corresponding occurrences of the
lcp@291
  1001
    constructor \ttindex{Bound} by the term $\ttfct{Free} (x',
lcp@291
  1002
    \mtt{dummyT})$:
lcp@291
  1003
   \begin{eqnarray*}
lcp@291
  1004
      \lefteqn{ast_of_term(\ttfct{Abs} (x, \tau, t))} \\
lcp@291
  1005
      &&\qquad{}=   \ttfct{Appl}
lcp@291
  1006
                  \mathopen{\mtt[} 
lcp@291
  1007
                  \Constant \mtt{"_abs"}, constrain(\Variable x', \tau), \\
lcp@291
  1008
      &&\qquad\qquad\qquad ast_of_term(t') \mathclose{\mtt]}.
lcp@291
  1009
    \end{eqnarray*}
lcp@291
  1010
lcp@291
  1011
  \item $ast_of_term(\ttfct{Bound} i) = \Variable \mtt{"B.}i\mtt"$.  
lcp@291
  1012
    The occurrence of constructor \ttindex{Bound} should never happen
lcp@291
  1013
    when printing well-typed terms; it indicates a de Bruijn index with no
lcp@291
  1014
    matching abstraction.
lcp@291
  1015
lcp@291
  1016
  \item Where $f$ is not an application,
lcp@291
  1017
    \begin{eqnarray*}
lcp@291
  1018
      \lefteqn{ast_of_term(f \ttapp x@1 \ttapp \ldots \ttapp x@n)} \\
lcp@291
  1019
      &&\qquad{}= \ttfct{Appl} 
lcp@291
  1020
                  \mathopen{\mtt[} ast_of_term(f), 
lcp@291
  1021
                  ast_of_term(x@1), \ldots,ast_of_term(x@n) 
lcp@291
  1022
                  \mathclose{\mtt]}
lcp@291
  1023
    \end{eqnarray*}
lcp@291
  1024
\end{itemize}
lcp@291
  1025
lcp@291
  1026
Type constraints are inserted to allow the printing of types, which is
lcp@291
  1027
governed by the boolean variable \ttindex{show_types}.  Constraints are
lcp@291
  1028
treated as follows:
lcp@291
  1029
\begin{itemize}
lcp@291
  1030
  \item $constrain(x, \tau) = x$, if $\tau = \mtt{dummyT}$ \index{*dummyT} or
lcp@291
  1031
    \ttindex{show_types} not set to {\tt true}.
lcp@291
  1032
lcp@291
  1033
  \item $constrain(x, \tau) = \Appl{\Constant \mtt{"_constrain"}, x, ty}$,
lcp@291
  1034
    where $ty$ is the \AST{} encoding of $\tau$.  That is, type constructors
    are encoded as {\tt Constant}s, type identifiers as {\tt Variable}s and
    type applications as {\tt Appl}s with the head type constructor as first element.
lcp@291
  1037
    Additionally, if \ttindex{show_sorts} is set to {\tt true}, some type
lcp@291
  1038
    variables are decorated with an \AST{} encoding of their sort.
lcp@291
  1039
\end{itemize}
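
For example, suppose that {\tt show_types} is not set, so that $constrain$
returns its first argument unchanged.  Consider the abstraction
$\lambda x::\tau. P(x)$, where $P$ is a hypothetical constant of some
type~$\sigma$.  The name~$x$ does not occur in the body, so $x' = x$ and the
bound occurrence $\ttfct{Bound} 0$ is replaced by
$\ttfct{Free} (\mtt{"x"}, \mtt{dummyT})$:
\begin{eqnarray*}
  \lefteqn{ast_of_term(\ttfct{Abs} (\mtt{"x"}, \tau,
           \ttfct{Const} (\mtt{"P"}, \sigma) \ttapp \ttfct{Bound} 0))} \\
  &&\qquad{}= \Appl{\Constant \mtt{"_abs"}, \Variable \mtt{"x"},
              \Appl{\Constant \mtt{"P"}, \Variable \mtt{"x"}}}
\end{eqnarray*}
This \AST{} is displayed as {\tt ("_abs" x ("P" x))}.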
lcp@291
  1040
lcp@291
  1041
The \AST{}, after application of macros (see \S\ref{sec:macros}), is
lcp@291
  1042
transformed into the final output string.  The built-in {\bf print AST
lcp@291
  1043
  translations}\indexbold{translations!print AST} effectively reverse the
lcp@291
  1044
parse \AST{} translations of Fig.\ts\ref{fig:parse_ast_tr}.
lcp@291
  1045
lcp@291
  1046
For the actual printing process, the names attached to productions
lcp@291
  1047
of the form $\ldots A^{(p@1)}@1 \ldots A^{(p@n)}@n \ldots \mtt{=>} c$ play
lcp@291
  1048
a vital role.  Each \AST{} with constant head $c$, namely $\mtt"c\mtt"$ or
lcp@291
  1049
$(\mtt"c\mtt"~ x@1 \ldots x@n)$, is printed according to the production
lcp@291
  1050
for~$c$.  Each argument~$x@i$ is converted to a string, and put in
lcp@291
  1051
parentheses if its priority~$(p@i)$ requires this.  The resulting strings
lcp@291
  1052
and their syntactic sugar (denoted by ``\dots'' above) are joined to make a
lcp@291
  1053
single string.
lcp@291
  1054
lcp@291
  1055
If an application $(\mtt"c\mtt"~ x@1 \ldots x@m)$ has more arguments than the
lcp@291
  1056
corresponding production, it is first split into $((\mtt"c\mtt"~ x@1 \ldots
lcp@291
  1057
x@n) ~ x@{n+1} \ldots x@m)$. Applications with too few arguments or with
lcp@291
  1058
non-constant head or without a corresponding production are printed as
lcp@291
  1059
$f(x@1, \ldots, x@l)$ or $(\alpha@1, \ldots, \alpha@l) ty$.  An occurrence of
lcp@291
  1060
$\Variable x$ is simply printed as~$x$.
lcp@291
  1061
lcp@291
  1062
Blanks are {\em not\/} inserted automatically.  If blanks are required to
lcp@291
  1063
separate tokens, specify them in the mixfix declaration, possibly preceded
lcp@291
  1064
by a slash~({\tt/}) to allow a line break.
lcp@291
  1065
\index{trees!abstract syntax|)}
lcp@291
  1066
lcp@291
  1067
lcp@291
  1068
lcp@291
  1069
\section{*Macros: Syntactic rewriting} \label{sec:macros}
lcp@291
  1070
\index{macros|(}\index{rewriting!syntactic|(} 
lcp@291
  1071
lcp@291
  1072
Mixfix declarations alone can handle situations where there is a direct
lcp@291
  1073
connection between the concrete syntax and the underlying term.  Sometimes
lcp@291
  1074
we require a more elaborate concrete syntax, such as quantifiers and list
lcp@291
  1075
notation.  Isabelle's {\bf macros} and {\bf translation functions} can
lcp@291
  1076
perform translations such as
lcp@291
  1077
\begin{center}\tt
lcp@291
  1078
  \begin{tabular}{r@{$\quad\protect\rightleftharpoons\quad$}l}
lcp@291
  1079
    ALL x:A.P   & Ball(A, \%x.P)        \\ \relax
lcp@291
  1080
    [x, y, z]   & Cons(x, Cons(y, Cons(z, Nil)))
lcp@291
  1081
  \end{tabular}
lcp@291
  1082
\end{center}
lcp@291
  1083
Translation functions (see \S\ref{sec:tr_funs}) must be coded in ML; they
lcp@291
  1084
are the most powerful translation mechanism but are difficult to read or
lcp@291
  1085
write.  Macros are specified by first-order rewriting systems that operate
lcp@291
  1086
on abstract syntax trees.  They are usually easy to read and write, and can
lcp@291
  1087
express all but the most obscure translations.
lcp@291
  1088
lcp@291
  1089
Figure~\ref{fig:set_trans} defines a fragment of first-order logic and set
lcp@291
  1090
theory.\footnote{This and the following theories are complete working
lcp@291
  1091
  examples, though they specify only syntax, no axioms.  The file {\tt
lcp@291
  1092
    ZF/zf.thy} presents the full set theory definition, including many
lcp@291
  1093
  macro rules.}  Theory {\tt SET} defines constants for set comprehension
lcp@291
  1094
({\tt Collect}), replacement ({\tt Replace}) and bounded universal
lcp@291
  1095
quantification ({\tt Ball}).  Each of these binds some variables.  Without
lcp@291
  1096
additional syntax we should have to express $\forall x \in A.  P$ as {\tt
lcp@291
  1097
  Ball(A,\%x.P)}, and similarly for the others.
lcp@291
  1098
lcp@291
  1099
\begin{figure}
lcp@291
  1100
\begin{ttbox}
lcp@291
  1101
SET = Pure +
lcp@291
  1102
types
lcp@291
  1103
  i, o
lcp@291
  1104
arities
lcp@291
  1105
  i, o :: logic
lcp@291
  1106
consts
lcp@291
  1107
  Trueprop      :: "o => prop"              ("_" 5)
lcp@291
  1108
  Collect       :: "[i, i => o] => i"
lcp@291
  1109
  "{\at}Collect"    :: "[idt, i, o] => i"       ("(1{\ttlbrace}_:_./ _{\ttrbrace})")
lcp@291
  1110
  Replace       :: "[i, [i, i] => o] => i"
lcp@291
  1111
  "{\at}Replace"    :: "[idt, idt, i, o] => i"  ("(1{\ttlbrace}_./ _:_, _{\ttrbrace})")
lcp@291
  1112
  Ball          :: "[i, i => o] => o"
lcp@291
  1113
  "{\at}Ball"       :: "[idt, i, o] => o"       ("(3ALL _:_./ _)" 10)
lcp@291
  1114
translations
lcp@291
  1115
  "{\ttlbrace}x:A. P{\ttrbrace}"    == "Collect(A, \%x. P)"
lcp@291
  1116
  "{\ttlbrace}y. x:A, Q{\ttrbrace}" == "Replace(A, \%x y. Q)"
lcp@291
  1117
  "ALL x:A. P"  == "Ball(A, \%x. P)"
lcp@291
  1118
end
lcp@291
  1119
\end{ttbox}
lcp@291
  1120
\caption{Macro example: set theory}\label{fig:set_trans}
lcp@291
  1121
\end{figure}
lcp@291
  1122
lcp@291
  1123
The theory specifies a variable-binding syntax through additional
lcp@291
  1124
productions that have mixfix declarations.  Each non-copy production must
lcp@291
  1125
specify some constant, which is used for building \AST{}s.  The additional
lcp@291
  1126
constants are decorated with {\tt\at} to stress their purely syntactic
lcp@291
  1127
purpose; they should never occur within the final well-typed terms.
lcp@291
  1128
Furthermore, they cannot be written in formulae because they are not legal
lcp@291
  1129
identifiers.
lcp@291
  1130
lcp@291
  1131
The translations cause the replacement of external forms by internal forms
lcp@291
  1132
after parsing, and vice versa before printing of terms.  As a specification
lcp@291
  1133
of the set theory notation, they should be largely self-explanatory.  The
lcp@291
  1134
syntactic constants, {\tt\at Collect}, {\tt\at Replace} and {\tt\at Ball},
lcp@291
  1135
appear implicitly in the macro rules via their mixfix forms.
lcp@291
  1136
lcp@291
  1137
Macros can define variable-binding syntax because they operate on \AST{}s,
lcp@291
  1138
which have no inbuilt notion of bound variable.  The macro variables {\tt
lcp@291
  1139
  x} and~{\tt y} have type~{\tt idt} and therefore range over identifiers,
lcp@291
  1140
in this case bound variables.  The macro variables {\tt P} and~{\tt Q}
lcp@291
  1141
range over formulae containing bound variable occurrences.
lcp@291
  1142
lcp@291
  1143
Other applications of the macro system can be less straightforward, and
lcp@291
  1144
there are peculiarities.  The rest of this section will describe in detail
lcp@291
  1145
how Isabelle macros are preprocessed and applied.
lcp@291
  1146
lcp@291
  1147
lcp@291
  1148
\subsection{Specifying macros}
lcp@291
  1149
Macros are basically rewrite rules on \AST{}s.  But unlike other macro
lcp@291
  1150
systems found in programming languages, Isabelle's macros work in both
lcp@291
  1151
directions.  Therefore a syntax contains two lists of rewrites: one for
lcp@291
  1152
parsing and one for printing.
lcp@291
  1153
lcp@291
  1154
The {\tt translations} section\index{translations section@{\tt translations}
lcp@291
  1155
section} specifies macros.  The syntax for a macro is
lcp@291
  1156
\[ (root)\; string \quad
lcp@291
  1157
   \left\{\begin{array}[c]{c} \mtt{=>} \\ \mtt{<=} \\ \mtt{==} \end{array}
lcp@291
  1158
   \right\} \quad
lcp@291
  1159
   (root)\; string 
lcp@291
  1160
\]
lcp@291
  1161
%
lcp@291
  1162
This specifies a parse rule ({\tt =>}), a print rule ({\tt <=}), or both
lcp@291
  1163
({\tt ==}).  The two strings specify the left and right-hand sides of the
lcp@291
  1164
macro rule.  The $(root)$ specification is optional; it specifies the
lcp@291
  1165
nonterminal for parsing the $string$ and if omitted defaults to {\tt
lcp@291
  1166
  logic}.  \AST{} rewrite rules $(l, r)$ must obey certain conditions:
lcp@291
  1167
\begin{itemize}
lcp@291
  1168
\item Rules must be left linear: $l$ must not contain repeated variables.
lcp@291
  1169
lcp@291
  1170
\item Rules must have constant heads, namely $l = \mtt"c\mtt"$ or $l =
lcp@291
  1171
  (\mtt"c\mtt" ~ x@1 \ldots x@n)$.
lcp@291
  1172
lcp@291
  1173
\item Every variable in~$r$ must also occur in~$l$.
lcp@291
  1174
\end{itemize}
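
The three conditions can be checked mechanically.  The following Standard
\ML{} sketch is an illustration only, not Isabelle's implementation: the
datatype {\tt ast} merely imitates {\tt Syntax.ast}, while {\tt rule_ok},
its helper functions and the constant {\tt single} are names invented for
this example.
\begin{ttbox}
datatype ast = Constant of string | Variable of string | Appl of ast list;

(* all variable names of an ast, left to right, with repetitions *)
fun vars (Variable x) = [x]
  | vars (Constant _) = []
  | vars (Appl ts) = List.concat (map vars ts);

fun member xs (x: string) = List.exists (fn y => y = x) xs;

fun distinct [] = true
  | distinct (x :: xs) = not (member xs x) andalso distinct xs;

(* l is a constant, or an application whose head is a constant *)
fun const_head (Constant _) = true
  | const_head (Appl (Constant _ :: _)) = true
  | const_head _ = false;

(* left linear, constant head, no new variables on the right *)
fun rule_ok (l, r) =
  distinct (vars l) andalso const_head l andalso
  List.all (member (vars l)) (vars r);

val ok  = rule_ok (Appl [Constant "insert", Variable "x", Constant "empty"],
                   Appl [Constant "single", Variable "x"]);
val bad = rule_ok (Appl [Constant "K", Variable "B"],
                   Appl [Constant "_abs", Variable "x", Variable "B"]);
\end{ttbox}
Here {\tt ok} is {\tt true}, while {\tt bad} is {\tt false} because the
right-hand side mentions the variable {\tt x}, which does not occur on the
left.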
lcp@291
  1175
lcp@291
  1176
Macro rules may refer to any syntax from the parent theories.  They may
lcp@291
  1177
also refer to anything defined before the {\tt .thy} file's {\tt
lcp@291
  1178
  translations} section --- including any mixfix declarations.
lcp@291
  1179
lcp@291
  1180
Upon declaration, both sides of the macro rule undergo parsing and parse
lcp@291
  1181
\AST{} translations (see \S\ref{sec:asts}), but do not themselves undergo
lcp@291
  1182
macro expansion.  The lexer runs in a different mode that additionally
lcp@291
  1183
accepts identifiers of the form $\_~letter~quasiletter^*$ (like {\tt _idt},
lcp@291
  1184
{\tt _K}).  Thus, a constant whose name starts with an underscore can
lcp@291
  1185
appear in macro rules but not in ordinary terms.
lcp@291
  1186
lcp@291
  1187
Some atoms of the macro rule's \AST{} are designated as constants for
lcp@291
  1188
matching.  These are all names that have been declared as classes, types or
lcp@291
  1189
constants.
lcp@291
  1190
lcp@291
  1191
The result of this preprocessing is two lists of macro rules, each stored
lcp@291
  1192
as a pair of \AST{}s.  They can be viewed using {\tt Syntax.print_syntax}
lcp@291
  1193
(sections \ttindex{parse_rules} and \ttindex{print_rules}).  For
lcp@291
  1194
theory~{\tt SET} of Fig.~\ref{fig:set_trans} these are
lcp@291
  1195
\begin{ttbox}
lcp@291
  1196
parse_rules:
lcp@291
  1197
  ("{\at}Collect" x A P)  ->  ("Collect" A ("_abs" x P))
lcp@291
  1198
  ("{\at}Replace" y x A Q)  ->  ("Replace" A ("_abs" x ("_abs" y Q)))
lcp@291
  1199
  ("{\at}Ball" x A P)  ->  ("Ball" A ("_abs" x P))
lcp@291
  1200
print_rules:
lcp@291
  1201
  ("Collect" A ("_abs" x P))  ->  ("{\at}Collect" x A P)
lcp@291
  1202
  ("Replace" A ("_abs" x ("_abs" y Q)))  ->  ("{\at}Replace" y x A Q)
lcp@291
  1203
  ("Ball" A ("_abs" x P))  ->  ("{\at}Ball" x A P)
lcp@291
  1204
\end{ttbox}
lcp@291
  1205
lcp@291
  1206
\begin{warn}
lcp@291
  1207
  Avoid choosing variable names that have previously been used as
lcp@291
  1208
  constants, types or type classes; the {\tt consts} section in the output
lcp@291
  1209
  of {\tt Syntax.print_syntax} lists all such names.  If a macro rule works
lcp@291
  1210
  incorrectly, inspect its internal form as shown above, recalling that
lcp@291
  1211
  constants appear as quoted strings and variables without quotes.
lcp@291
  1212
\end{warn}
lcp@291
  1213
lcp@291
  1214
\begin{warn}
lcp@291
  1215
If \ttindex{eta_contract} is set to {\tt true}, terms will be
lcp@291
  1216
$\eta$-contracted {\em before\/} the \AST{} rewriter sees them.  Thus some
lcp@291
  1217
abstraction nodes needed for print rules to match may vanish.  For example,
lcp@291
  1218
\verb|Ball(A, %x. P(x))| contracts to {\tt Ball(A, P)}; the print rule does
lcp@291
  1219
not apply and the output will be {\tt Ball(A, P)}.  This problem would not
lcp@291
  1220
occur if \ML{} translation functions were used instead of macros (as is
lcp@291
  1221
done for binder declarations).
lcp@291
  1222
\end{warn}
lcp@291
  1223
lcp@291
  1224
lcp@291
  1225
\begin{warn}
lcp@291
  1226
Another trap concerns type constraints.  If \ttindex{show_types} is set to
lcp@291
  1227
{\tt true}, bound variables will be decorated by their meta types at the
lcp@291
  1228
binding place (but not at occurrences in the body).  Matching with
lcp@291
  1229
\verb|Collect(A, %x. P)| binds {\tt x} to something like {\tt ("_constrain" y
lcp@291
  1230
"i")} rather than only {\tt y}.  \AST{} rewriting will cause the constraint to
lcp@291
  1231
appear in the external form, say \verb|{y::i:A::i. P::o}|.  
lcp@291
  1232
lcp@291
  1233
To allow such constraints to be re-read, your syntax should specify bound
lcp@291
  1234
variables using the nonterminal~\ttindex{idt}.  This is the case in our
lcp@291
  1235
example.  Choosing {\tt id} instead of {\tt idt} is a common error,
lcp@291
  1236
especially since it appears in former versions of most of Isabelle's
lcp@291
  1237
object-logics.
lcp@291
  1238
\end{warn}
lcp@291
  1239
lcp@291
  1240
lcp@291
  1241
lcp@291
  1242
\subsection{Applying rules}
lcp@291
  1243
As a term is being parsed or printed, an \AST{} is generated as an
lcp@291
  1244
intermediate form (recall Fig.\ts\ref{fig:parse_print}).  The \AST{} is
lcp@291
  1245
normalized by applying macro rules in the manner of a traditional term
lcp@291
  1246
rewriting system.  We first examine how a single rule is applied.
lcp@291
  1247
lcp@291
  1248
Let $t$ be the abstract syntax tree to be normalized and $(l, r)$ some
lcp@291
  1249
translation rule.  A subtree~$u$ of $t$ is a {\bf redex} if it is an
lcp@291
  1250
instance of~$l$; in this case $l$ is said to {\bf match}~$u$.  A redex
lcp@291
  1251
matched by $l$ may be replaced by the corresponding instance of~$r$, thus
lcp@291
  1252
{\bf rewriting} the \AST~$t$.  Matching requires some notion of {\bf
lcp@291
  1253
  place-holders} that may occur in rule patterns but not in ordinary
lcp@291
  1254
\AST{}s; {\tt Variable} atoms serve this purpose.
lcp@291
  1255
lcp@291
  1256
The matching of the object~$u$ by the pattern~$l$ is performed as follows:
lcp@291
  1257
\begin{itemize}
lcp@291
  1258
  \item Every constant matches itself.
lcp@291
  1259
lcp@291
  1260
  \item $\Variable x$ in the object matches $\Constant x$ in the pattern.
lcp@291
  1261
    This point is discussed further below.
lcp@291
  1262
lcp@291
  1263
  \item Every \AST{} in the object matches $\Variable x$ in the pattern,
lcp@291
  1264
    binding~$x$ to~$u$.
lcp@291
  1265
lcp@291
  1266
  \item One application matches another if they have the same number of
lcp@291
  1267
    subtrees and corresponding subtrees match.
lcp@291
  1268
lcp@291
  1269
  \item In every other case, matching fails.  In particular, {\tt
lcp@291
  1270
      Constant}~$x$ can only match itself.
lcp@291
  1271
\end{itemize}
lcp@291
  1272
A successful match yields a substitution that is applied to~$r$, generating
lcp@291
  1273
the instance that replaces~$u$.
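
The matching cases listed above can be sketched in Standard \ML{} as
follows.  This is a simplified model under stated assumptions, not the code
of the syntax module: the datatype {\tt ast} imitates {\tt Syntax.ast}, and
the names {\tt match}, {\tt match_list}, {\tt subst} and {\tt rewrite} are
invented here.  Bindings are collected in an association list; since rules
are left linear, no variable can be bound twice.
\begin{ttbox}
datatype ast = Constant of string | Variable of string | Appl of ast list;

(* match object pattern env: extended bindings, or NONE on failure *)
fun match (Constant a) (Constant b) env = if a = b then SOME env else NONE
  | match (Variable a) (Constant b) env = if a = b then SOME env else NONE
  | match u (Variable x) env = SOME ((x, u) :: env)
  | match (Appl us) (Appl ps) env = match_list us ps env
  | match _ _ _ = NONE
and match_list [] [] env = SOME env
  | match_list (u :: us) (p :: ps) env =
      (case match u p env of
         SOME env1 => match_list us ps env1
       | NONE => NONE)
  | match_list _ _ _ = NONE;

fun lookup [] x = Variable x
  | lookup ((y, t) :: env) x = if x = y then t else lookup env x;

(* the instance of r under the bindings env *)
fun subst env (Variable x) = lookup env x
  | subst env (Appl ts) = Appl (map (subst env) ts)
  | subst env c = c;

(* rewrite u at its root with the rule (l, r), if l matches u *)
fun rewrite (l, r) u =
  (case match u l [] of
     SOME env => SOME (subst env r)
   | NONE => NONE);

val shown = rewrite (Constant "Nil", Constant "[]") (Variable "Nil");
\end{ttbox}
The value of {\tt shown} is {\tt SOME (Constant "[]")}: the object
{\tt Variable "Nil"} is matched by the pattern {\tt Constant "Nil"}, which
is the second case of the list.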
lcp@291
  1274
lcp@291
  1275
The second case above may look odd.  This is where {\tt Variable}s of
lcp@291
  1276
non-rule \AST{}s behave like {\tt Constant}s.  Recall that \AST{}s are not
lcp@291
  1277
far removed from parse trees; at this level it is not yet known which
lcp@291
  1278
identifiers will become constants, bounds, frees, types or classes.  As
lcp@291
  1279
\S\ref{sec:asts} describes, former parse tree heads appear in \AST{}s as
lcp@291
  1280
{\tt Constant}s, while $id$s, $var$s, $tid$s and $tvar$s become {\tt
lcp@291
  1281
  Variable}s.  On the other hand, when \AST{}s are generated from terms for
lcp@291
  1282
printing, all constants and type constructors become {\tt Constant}s; see
lcp@291
  1283
\S\ref{sec:asts}.  Thus \AST{}s may contain a messy mixture of {\tt
lcp@291
  1284
  Variable}s and {\tt Constant}s.  This is insignificant at macro level
lcp@291
  1285
because matching treats them alike.
lcp@291
  1286
lcp@291
  1287
Because of this behaviour, different kinds of atoms with the same name are
lcp@291
  1288
indistinguishable, which may make some rules prone to misbehaviour.  Example:
lcp@291
  1289
\begin{ttbox}
lcp@291
  1290
types
lcp@291
  1291
  Nil
lcp@291
  1292
consts
lcp@291
  1293
  Nil     :: "'a list"
lcp@291
  1294
  "[]"    :: "'a list"    ("[]")
lcp@291
  1295
translations
lcp@291
  1296
  "[]"    == "Nil"
lcp@291
  1297
\end{ttbox}
lcp@291
  1298
The term {\tt Nil} will be printed as {\tt []}, just as expected.  What
lcp@291
  1299
happens with \verb|%Nil.t| or {\tt x::Nil} is left as an exercise.
lcp@291
  1300
lcp@291
  1301
Normalizing an \AST{} involves repeatedly applying macro rules until none
lcp@291
  1302
is applicable.  Macro rules are chosen in the order that they appear in the
lcp@291
  1303
{\tt translations} section.  You can watch the normalization of \AST{}s
lcp@291
  1304
during parsing and printing by setting \ttindex{Syntax.trace_norm_ast} to
lcp@291
  1305
{\tt true}.\index{tracing!of macros} Alternatively, use
lcp@291
  1306
\ttindex{Syntax.test_read}.  The information displayed when tracing
lcp@291
  1307
includes the \AST{} before normalization ({\tt pre}), redexes with results
lcp@291
  1308
({\tt rewrote}), the normal form finally reached ({\tt post}) and some
lcp@291
  1309
statistics ({\tt normalize}).  If tracing is off,
lcp@291
  1310
\ttindex{Syntax.stat_norm_ast} can be set to {\tt true} in order to enable
lcp@291
  1311
printing of the normal form and statistics only.
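
The normalization loop can be pictured with a deliberately small Standard
\ML{} sketch.  It makes strong simplifying assumptions: each rule is given
as a function that returns {\tt SOME} result or {\tt NONE}, atoms are plain
strings, and rewriting happens only at the root, whereas the real rewriter
also visits subtrees.  The names {\tt first_rule}, {\tt normalize} and
{\tt nil_rule} are invented for this illustration.
\begin{ttbox}
(* try the rules in the order given; the first applicable one wins *)
fun first_rule [] t = NONE
  | first_rule (rule :: rules) t =
      (case rule t of
         SOME t1 => SOME t1
       | NONE => first_rule rules t);

(* rewrite repeatedly until no rule is applicable *)
fun normalize rules t =
  (case first_rule rules t of
     SOME t1 => normalize rules t1
   | NONE => t);

fun nil_rule "Nil" = SOME "[]"
  | nil_rule _ = NONE;

val shown = normalize [nil_rule] "Nil";
\end{ttbox}
Here {\tt shown} is {\tt "[]"}; a second round finds no applicable rule, so
the loop stops.  A set of rules that keeps producing new redexes would make
such a loop run forever.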
lcp@291
  1312
lcp@291
  1313
lcp@291
  1314
\subsection{Example: the syntax of finite sets}
lcp@291
  1315
This example demonstrates the use of recursive macros to implement a
lcp@291
  1316
convenient notation for finite sets.
lcp@291
  1317
\begin{ttbox}
lcp@291
  1318
FINSET = SET +
lcp@291
  1319
types
lcp@291
  1320
  is
lcp@291
  1321
consts
lcp@291
  1322
  ""            :: "i => is"                ("_")
lcp@291
  1323
  "{\at}Enum"       :: "[i, is] => is"          ("_,/ _")
lcp@291
  1324
  empty         :: "i"                      ("{\ttlbrace}{\ttrbrace}")
lcp@291
  1325
  insert        :: "[i, i] => i"
lcp@291
  1326
  "{\at}Finset"     :: "is => i"                ("{\ttlbrace}(_){\ttrbrace}")
lcp@291
  1327
translations
lcp@291
  1328
  "{\ttlbrace}x, xs{\ttrbrace}"     == "insert(x, {\ttlbrace}xs{\ttrbrace})"
lcp@291
  1329
  "{\ttlbrace}x{\ttrbrace}"         == "insert(x, {\ttlbrace}{\ttrbrace})"
lcp@291
  1330
end
lcp@291
  1331
\end{ttbox}
lcp@291
  1332
Finite sets are internally built up by {\tt empty} and {\tt insert}.  The
lcp@291
  1333
declarations above specify \verb|{x, y, z}| as the external representation
lcp@291
  1334
of
lcp@291
  1335
\begin{ttbox}
lcp@291
  1336
insert(x, insert(y, insert(z, empty)))
lcp@291
  1337
\end{ttbox}
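
The shape of this internal form is a simple recursion over the listed
elements.  As a rough illustration in Standard \ML{}, working on strings
rather than \AST{}s, the hypothetical function {\tt finset} below builds the
internal notation from a list of element names; it is not part of Isabelle.
\begin{ttbox}
(* internal form of a finite set enumeration, written out as a string *)
fun finset [] = "empty"
  | finset (x :: xs) = String.concat ["insert(", x, ", ", finset xs, ")"];

val example = finset ["x", "y", "z"];
\end{ttbox}
The value of {\tt example} is the string
{\tt "insert(x, insert(y, insert(z, empty)))"}.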
lcp@291
  1338
lcp@291
  1339
The nonterminal symbol~{\tt is} stands for one or more objects of type~{\tt
lcp@291
  1340
  i} separated by commas.  The mixfix declaration \hbox{\verb|"_,/ _"|}
lcp@291
  1341
allows a line break after the comma for pretty printing; if no line break
lcp@291
  1342
is required then a space is printed instead.
lcp@291
  1343
lcp@291
  1344
The nonterminal is declared as the type~{\tt is}, but with no {\tt arities}
lcp@291
  1345
declaration.  Hence {\tt is} is not a logical type and no default
lcp@291
  1346
productions are added.  If we had needed enumerations of the nonterminal
lcp@291
  1347
{\tt logic}, which would include all the logical types, we could have used
lcp@291
  1348
the predefined nonterminal symbol \ttindex{args} and skipped this part
lcp@291
  1349
altogether.  The nonterminal~{\tt is} can later be reused for other
lcp@291
  1350
enumerations of type~{\tt i} like lists or tuples.
lcp@291
  1351
lcp@291
  1352
Next follows {\tt empty}, which is already equipped with its syntax
lcp@291
  1353
\verb|{}|, and {\tt insert} without concrete syntax.  The syntactic
lcp@291
  1354
constant {\tt\at Finset} provides concrete syntax for enumerations of~{\tt
lcp@291
  1355
  i} enclosed in curly braces.  Remember that a pair of parentheses, as in
lcp@291
  1356
\verb|"{(_)}"|, specifies a block of indentation for pretty printing.
lcp@291
  1357
lcp@291
  1358
The translations may look strange at first.  Macro rules are best
lcp@291
  1359
understood in their internal forms:
lcp@291
  1360
\begin{ttbox}
lcp@291
  1361
parse_rules:
lcp@291
  1362
  ("{\at}Finset" ("{\at}Enum" x xs))  ->  ("insert" x ("{\at}Finset" xs))
lcp@291
  1363
  ("{\at}Finset" x)  ->  ("insert" x "empty")
lcp@291
  1364
print_rules:
lcp@291
  1365
  ("insert" x ("{\at}Finset" xs))  ->  ("{\at}Finset" ("{\at}Enum" x xs))
lcp@291
  1366
  ("insert" x "empty")  ->  ("{\at}Finset" x)
lcp@291
  1367
\end{ttbox}
lcp@291
  1368
This shows that \verb|{x, xs}| indeed matches any set enumeration of at least
lcp@291
  1369
two elements, binding the first to {\tt x} and the rest to {\tt xs}.
lcp@291
  1370
Likewise, \verb|{xs}| and \verb|{x}| represent any set enumeration.  
lcp@291
  1371
The parse rules only work in the order given.
lcp@291
  1372
lcp@291
  1373
\begin{warn}
lcp@291
  1374
  The \AST{} rewriter cannot discern constants from variables and looks
lcp@291
  1375
  only for names of atoms.  Thus the names of {\tt Constant}s occurring in
lcp@291
  1376
  the (internal) left-hand side of translation rules should be regarded as
lcp@291
  1377
  reserved keywords.  Choose non-identifiers like {\tt\at Finset} or
lcp@291
  1378
  sufficiently long and strange names.  If a bound variable's name gets
lcp@291
  1379
  rewritten, the result will be incorrect; for example, the term
lcp@291
  1380
\begin{ttbox}
lcp@291
  1381
\%empty insert. insert(x, empty)
lcp@291
  1382
\end{ttbox}
lcp@291
  1383
  gets printed as \verb|%empty insert. {x}|.
lcp@291
  1384
\end{warn}
lcp@291
  1385
lcp@291
  1386
lcp@291
  1387
\subsection{Example: a parse macro for dependent types}\label{prod_trans}
lcp@291
  1388
As stated earlier, a macro rule may not introduce new {\tt Variable}s on
lcp@291
  1389
the right-hand side.  Something like \verb|"K(B)" => "%x. B"| is illegal;
lcp@291
  1390
if allowed, it could cause variable capture.  In such cases you usually
lcp@291
  1391
must fall back on translation functions.  But a trick can make things
lcp@291
  1392
readable in some cases: {\em calling translation functions by parse
lcp@291
  1393
  macros}:
lcp@291
  1394
\begin{ttbox}
lcp@291
  1395
PROD = FINSET +
lcp@291
  1396
consts
lcp@291
  1397
  Pi            :: "[i, i => i] => i"
lcp@291
  1398
  "{\at}PROD"       :: "[idt, i, i] => i"     ("(3PROD _:_./ _)" 10)
lcp@291
  1399
  "{\at}->"         :: "[i, i] => i"          ("(_ ->/ _)" [51, 50] 50)
lcp@291
  1400
\ttbreak
lcp@291
  1401
translations
lcp@291
  1402
  "PROD x:A. B" => "Pi(A, \%x. B)"
lcp@291
  1403
  "A -> B"      => "Pi(A, _K(B))"
lcp@291
  1404
end
lcp@291
  1405
ML
lcp@291
  1406
  val print_translation = [("Pi", dependent_tr' ("{\at}PROD", "{\at}->"))];
lcp@291
  1407
\end{ttbox}
lcp@291
  1408
lcp@291
  1409
Here {\tt Pi} is an internal constant for constructing general products.
lcp@291
  1410
Two external forms exist: the general case {\tt PROD x:A.B} and the
lcp@291
  1411
function space {\tt A -> B}, which abbreviates \verb|Pi(A, %x.B)| when {\tt B}
lcp@291
  1412
does not depend on~{\tt x}.
lcp@291
  1413
lcp@291
  1414
The second parse macro introduces {\tt _K(B)}, which later becomes \verb|%x.B|
lcp@291
  1415
due to a parse translation associated with \ttindex{_K}.  The order of the
lcp@291
  1416
parse rules is critical.  Unfortunately there is no such trick for
lcp@291
  1417
printing, so we have to add an {\tt ML} section for the print translation
lcp@291
  1418
\ttindex{dependent_tr'}.
lcp@291
  1419
lcp@291
  1420
Recall that identifiers with a leading {\tt _} are allowed in translation
lcp@291
  1421
rules, but not in ordinary terms.  Thus we can create \AST{}s containing
lcp@291
  1422
names that are not directly expressible.
lcp@291
  1423
lcp@291
  1424
The parse translation for {\tt _K} is already installed in Pure, and {\tt
lcp@291
  1425
dependent_tr'} is exported by the syntax module for public use.  See
lcp@291
  1426
\S\ref{sec:tr_funs} below for more of the arcane lore of translation functions.
lcp@291
  1427
\index{macros|)}\index{rewriting!syntactic|)}
lcp@291
  1428
lcp@291
  1429
lcp@291
  1430
lcp@291
  1431
\section{*Translation functions} \label{sec:tr_funs}
lcp@291
  1432
\index{translations|(} 
lcp@291
  1433
%
lcp@291
  1434
This section describes the translation function mechanism.  By writing
lcp@291
  1435
\ML{} functions, you can do almost everything with terms or \AST{}s during
lcp@291
  1436
parsing and printing.  The logic \LK\ is a good example of sophisticated
lcp@291
  1437
transformations between internal and external representations of
lcp@291
  1438
associative sequences; here, macros would be useless.
lcp@291
  1439
lcp@291
  1440
A full understanding of translations requires some familiarity
lcp@291
  1441
with Isabelle's internals, especially the datatypes {\tt term}, {\tt typ},
lcp@291
  1442
{\tt Syntax.ast} and the encodings of types and terms as such at the various
lcp@291
  1443
stages of the parsing or printing process.  Most users should never need to
lcp@291
  1444
use translation functions.
lcp@291
  1445
lcp@291
  1446
\subsection{Declaring translation functions}
lcp@291
  1447
There are four kinds of translation functions.  Each such function is
lcp@291
  1448
associated with a name, which triggers calls to it.  Such names can be
lcp@291
  1449
constants (logical or syntactic) or type constructors.
lcp@291
  1450
lcp@291
  1451
{\tt Syntax.print_syntax} displays the sets of names associated with the
lcp@291
  1452
translation functions of a {\tt Syntax.syntax} under
lcp@291
  1453
\ttindex{parse_ast_translation}, \ttindex{parse_translation},
lcp@291
  1454
\ttindex{print_translation} and \ttindex{print_ast_translation}.  You can
lcp@291
  1455
add new ones via the {\tt ML} section\index{ML section@{\tt ML} section} of
lcp@291
  1456
a {\tt .thy} file.  There may never be more than one function of the same
lcp@291
  1457
kind per name.  Conceptually, the {\tt ML} section should appear between
lcp@291
  1458
{\tt consts} and {\tt translations}; newly installed translation functions
lcp@291
  1459
are already effective when macros and logical rules are parsed.
lcp@291
  1460
lcp@291
  1461
The {\tt ML} section is copied verbatim into the \ML\ file generated from a
lcp@291
  1462
{\tt .thy} file.  Definitions made here are accessible as components of an
lcp@291
  1463
\ML\ structure; to make some definitions private, use an \ML{} {\tt local}
lcp@291
  1464
declaration.  The {\tt ML} section may install translation functions by
lcp@291
  1465
declaring any of the following identifiers:
lcp@291
  1466
\begin{ttbox}
lcp@291
  1467
val parse_ast_translation : (string * (ast list -> ast)) list
lcp@291
  1468
val print_ast_translation : (string * (ast list -> ast)) list
lcp@291
  1469
val parse_translation     : (string * (term list -> term)) list
lcp@291
  1470
val print_translation     : (string * (term list -> term)) list
lcp@291
  1471
\end{ttbox}
lcp@291
  1472
lcp@291
  1473
\subsection{The translation strategy}
lcp@291
  1474
All four kinds of translation functions are treated similarly.  They are
lcp@291
  1475
called during the transformations between parse trees, \AST{}s and terms
lcp@291
  1476
(recall Fig.\ts\ref{fig:parse_print}).  Whenever a combination of the form
lcp@291
  1477
$(\mtt"c\mtt"~x@1 \ldots x@n)$ is encountered, and a translation function
lcp@291
  1478
$f$ of appropriate kind exists for $c$, the result is computed by the \ML{}
lcp@291
  1479
function call $f \mtt[ x@1, \ldots, x@n \mtt]$.
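
For \AST{} translations, this dispatch on the head constant can be modelled
by the following Standard \ML{} sketch.  It is not the code of the syntax
module: the datatype {\tt ast} imitates {\tt Syntax.ast}, the functions
{\tt lookup}, {\tt translate} and {\tt swap_tr} are invented for the
example, and so are the constants {\tt swap} and {\tt pair}.  The table has
the same type as the lists of \AST{} translations declared in the {\tt ML}
section.
\begin{ttbox}
datatype ast = Constant of string | Variable of string | Appl of ast list;

(* find the translation function registered for the name c, if any *)
fun lookup [] c = NONE
  | lookup ((name, f) :: tab) (c: string) =
      if name = c then SOME f else lookup tab c;

(* call the function for the head constant on the list of arguments *)
fun translate tab (t as Appl (Constant c :: args)) =
      (case lookup tab c of SOME f => f args | NONE => t)
  | translate tab (t as Constant c) =
      (case lookup tab c of SOME f => f [] | NONE => t)
  | translate tab t = t;

(* a made-up translation that exchanges two arguments *)
fun swap_tr [a, b] = Appl [Constant "pair", b, a]
  | swap_tr ts = Appl (Constant "swap" :: ts);

val result =
  translate [("swap", swap_tr)]
    (Appl [Constant "swap", Variable "x", Variable "y"]);
\end{ttbox}
The value of {\tt result} is
{\tt Appl [Constant "pair", Variable "y", Variable "x"]}.  Trees whose head
has no entry in the table are returned unchanged.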
lcp@291
  1480
lcp@291
  1481
For \AST{} translations, the arguments $x@1, \ldots, x@n$ are \AST{}s.  A
lcp@291
  1482
combination has the form $\Constant c$ or $\Appl{\Constant c, x@1, \ldots,
lcp@291
  1483
  x@n}$.  For term translations, the arguments are terms and a combination
lcp@291
  1484
has the form $\ttfct{Const} (c, \tau)$ or $\ttfct{Const} (c, \tau) \ttapp
lcp@291
  1485
x@1 \ttapp \ldots \ttapp x@n$.  Terms allow more sophisticated
lcp@291
  1486
transformations than \AST{}s do, typically involving abstractions and bound
lcp@291
  1487
variables.
lcp@291
  1488
lcp@291
  1489
Regardless of whether they act on terms or \AST{}s,
lcp@291
  1490
parse translations differ from print translations fundamentally:
lcp@291
  1491
\begin{description}
lcp@291
  1492
\item[Parse translations] are applied bottom-up.  The arguments are already
lcp@291
  1493
  in translated form.  The translations must not fail; exceptions trigger
lcp@291
  1494
  an error message.
lcp@291
  1495
lcp@291
  1496
\item[Print translations] are applied top-down.  They are supplied with
lcp@291
  1497
  arguments that are partly still in internal form.  The result again
lcp@291
  1498
  undergoes translation; therefore a print translation should not introduce
lcp@291
  1499
  as head the very constant that invoked it.  The function may raise
lcp@291
  1500
  exception \ttindex{Match} to indicate failure; in this event it has no
lcp@291
  1501
  effect.
lcp@291
  1502
\end{description}
lcp@291
  1503
lcp@291
  1504
Only constant atoms --- constructor \ttindex{Constant} for \AST{}s and
lcp@291
  1505
\ttindex{Const} for terms --- can invoke translation functions.  This
lcp@291
  1506
causes another difference between parsing and printing.
lcp@291
  1507
lcp@291
  1508
Parsing starts with a string and the constants are not yet identified.
lcp@291
  1509
Only parse tree heads create {\tt Constant}s in the resulting \AST; recall
lcp@291
  1510
$ast_of_pt$ in \S\ref{sec:asts}.  Macros and parse \AST{} translations may
lcp@291
  1511
introduce further {\tt Constant}s.  When the final \AST{} is converted to a
lcp@291
  1512
term, all {\tt Constant}s become {\tt Const}s; recall $term_of_ast$ in
lcp@291
  1513
\S\ref{sec:asts}.
lcp@291
  1514
lcp@291
  1515
Printing starts with a well-typed term and all the constants are known.  So
lcp@291
  1516
all logical constants and type constructors may invoke print translations.
lcp@291
  1517
These, and macros, may introduce further constants.
lcp@291
  1518
lcp@291
  1519
lcp@291
  1520
\subsection{Example: a print translation for dependent types}
lcp@291
  1521
\indexbold{*_K}\indexbold{*dependent_tr'}
lcp@291
  1522
Let us continue the dependent type example (page~\pageref{prod_trans}) by
lcp@291
  1523
examining the parse translation for {\tt _K} and the print translation
lcp@291
  1524
{\tt dependent_tr'}, which are both built-in.  By convention, parse
lcp@291
  1525
translations have names ending with {\tt _tr} and print translations have
lcp@291
  1526
names ending with {\tt _tr'}.  Search for such names in the Isabelle
lcp@291
  1527
sources to locate more examples.
lcp@291
  1528
lcp@291
  1529
Here is the parse translation for {\tt _K}:
lcp@291
  1530
\begin{ttbox}
lcp@291
  1531
fun k_tr [t] = Abs ("x", dummyT, incr_boundvars 1 t)
lcp@291
  1532
  | k_tr ts = raise TERM("k_tr",ts);
lcp@291
  1533
\end{ttbox}
lcp@291
  1534
If {\tt k_tr} is called with exactly one argument~$t$, it creates a new
lcp@291
  1535
{\tt Abs} node with a body derived from $t$.  Since terms given to parse
lcp@291
  1536
translations are not yet typed, the type of the bound variable in the new
lcp@291
  1537
{\tt Abs} is simply {\tt dummyT}.  The function increments all {\tt Bound}
lcp@291
  1538
nodes referring to outer abstractions by calling \ttindex{incr_boundvars},
lcp@291
  1539
a basic term manipulation function defined in {\tt Pure/term.ML}.
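
To see why the increment is needed, consider this self-contained Standard
\ML{} model.  It is a sketch under simplifying assumptions: the datatype
{\tt trm} is a cut-down stand-in for Isabelle's {\tt term} without types,
and {\tt incr} is a hypothetical replacement for {\tt incr_boundvars} that
always increments by one.
\begin{ttbox}
datatype trm = Bound of int | Free of string
             | Abs of string * trm | App of trm * trm;

(* increment the loose bound variables of a term; lev counts the
   abstractions passed on the way down *)
fun incr lev (Bound i) = if i >= lev then Bound (i + 1) else Bound i
  | incr lev (Abs (x, b)) = Abs (x, incr (lev + 1) b)
  | incr lev (App (f, a)) = App (incr lev f, incr lev a)
  | incr lev t = t;

(* wrap t in an abstraction whose variable does not occur in t *)
fun k_tr t = Abs ("x", incr 0 t);

val wrapped = k_tr (App (Free "f", Bound 0));
\end{ttbox}
The value of {\tt wrapped} is {\tt Abs ("x", App (Free "f", Bound 1))}.
Without the increment, {\tt Bound 0} would be captured by the new
abstraction instead of still referring to the enclosing one.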
lcp@291
  1540
lcp@291
  1541
Here is the print translation for dependent types:
lcp@291
  1542
\begin{ttbox}
lcp@291
  1543
fun dependent_tr' (q,r) (A :: Abs (x, T, B) :: ts) =
lcp@291
  1544
      if 0 mem (loose_bnos B) then
lcp@291
  1545
        let val (x', B') = variant_abs (x, dummyT, B);
lcp@291
  1546
        in list_comb (Const (q, dummyT) $ Free (x', T) $ A $ B', ts)
lcp@291
  1547
        end
lcp@291
  1548
      else list_comb (Const (r, dummyT) $ A $ B, ts)
lcp@291
  1549
  | dependent_tr' _ _ = raise Match;
lcp@291
  1550
\end{ttbox}
lcp@291
  1551
The argument {\tt (q,r)} is supplied to {\tt dependent_tr'} by a curried
lcp@291
  1552
function application during its installation.  We could set up print
lcp@291
  1553
translations for both {\tt Pi} and {\tt Sigma} by including
lcp@291
  1554
\begin{ttbox}
lcp@291
  1555
val print_translation =
lcp@291
  1556
  [("Pi",    dependent_tr' ("{\at}PROD", "{\at}->")),
lcp@291
  1557
   ("Sigma", dependent_tr' ("{\at}SUM", "{\at}*"))];
lcp@291
  1558
\end{ttbox}
lcp@291
  1559
within the {\tt ML} section.  The first of these transforms ${\tt Pi}(A,
lcp@291
  1560
\mtt{Abs}(x, T, B))$ into $\hbox{\tt{\at}PROD}(x', A, B')$ or
lcp@291
  1561
$\hbox{\tt{\at}->}(A, B)$, choosing the latter form if $B$ does not depend
lcp@291
  1562
on~$x$.  It checks this using \ttindex{loose_bnos}, yet another function
lcp@291
  1563
from {\tt Pure/term.ML}.  Note that $x'$ is a version of $x$ renamed away
lcp@291
  1564
from all names in $B$, and $B'$ the body $B$ with {\tt Bound} nodes
lcp@291
  1565
referring to our {\tt Abs} node replaced by $\ttfct{Free} (x',
lcp@291
  1566
\mtt{dummyT})$.
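
The dependency test behind this choice can be sketched in Standard \ML{}.
The code below is a model only: {\tt trm} is a cut-down stand-in for
{\tt term} without types, {\tt depends} is a hypothetical substitute for the
test based on {\tt loose_bnos}, and {\tt pi_form} returns the name of the
chosen notation as a plain string.
\begin{ttbox}
datatype trm = Bound of int | Free of string
             | Abs of string * trm | App of trm * trm;

(* does the loose bound variable with index lev occur in the term? *)
fun depends lev (Bound i) = (i = lev)
  | depends lev (Abs (_, b)) = depends (lev + 1) b
  | depends lev (App (f, a)) = depends lev f orelse depends lev a
  | depends lev _ = false;

(* the notation for Pi(A, Abs(x, B)), judged from the body B alone *)
fun pi_form body = if depends 0 body then "PROD" else "->";

val dependent = pi_form (App (Free "B", Bound 0));
val constant  = pi_form (Free "B");
\end{ttbox}
Here {\tt dependent} is {\tt "PROD"} and {\tt constant} is {\tt "->"}.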
lcp@291
  1567
lcp@291
  1568
We must be careful with types here.  While types of {\tt Const}s are
lcp@291
  1569
ignored, type constraints may be printed for some {\tt Free}s and
lcp@291
  1570
{\tt Var}s if \ttindex{show_types} is set to {\tt true}.  Variables of type
lcp@291
  1571
\ttindex{dummyT} are never printed with constraint, though.  The line
lcp@291
  1572
\begin{ttbox}
lcp@291
  1573
        let val (x', B') = variant_abs (x, dummyT, B);
lcp@291
  1574
\end{ttbox}\index{*variant_abs}
lcp@291
  1575
replaces bound variable occurrences in~$B$ by the free variable $x'$ with
lcp@291
  1576
type {\tt dummyT}.  Only the binding occurrence of~$x'$ is given the
lcp@291
  1577
correct type~{\tt T}, so this is the only place where a type
lcp@291
  1578
constraint might appear. 
lcp@291
  1579
\index{translations|)}
lcp@291
  1580
lcp@291
  1581
lcp@291
  1582