| author | wenzelm | 
| Thu, 20 Feb 1997 17:02:42 +0100 | |
| changeset 2667 | b2172eab9ba6 | 
| parent 291 | a615050a7494 | 
| permissions | -rw-r--r-- | 
| 104 | 1 | %% $Id$ | 
| 291 | 2 | %% \([a-zA-Z][a-zA-Z]}\.\) \([^ ]\) \1 \2 | 
| 3 | %%  @\([a-z0-9]\)       ^{(\1)}
 | |
| 108 | 4 | |
| 291 | 5 | \newcommand\rmindex[1]{{#1}\index{#1}\@}
 | 
| 6 | \newcommand\mtt[1]{\mbox{\tt #1}}
 | |
| 7 | \newcommand\ttfct[1]{\mathop{\mtt{#1}}\nolimits}
 | |
| 8 | \newcommand\ttapp{\mathrel{\hbox{\tt\$}}}
 | |
| 9 | \newcommand\Constant{\ttfct{Constant}}
 | |
| 10 | \newcommand\Variable{\ttfct{Variable}}
 | |
| 11 | \newcommand\Appl[1]{\ttfct{Appl}\mathopen{\mtt[}#1\mathclose{\mtt]}}
 | |
| 12 | \newcommand\AST{{\sc ast}}
 | |
| 13 | \let\rew=\longrightarrow | |
| 104 | 14 | |
| 15 | ||
| 16 | \chapter{Defining Logics} \label{Defining-Logics}
 | |
| 17 | ||
| 291 | 18 | This chapter explains how to define new formal systems --- in particular, | 
| 19 | their concrete syntax. While Isabelle can be regarded as a theorem prover | |
| 20 | for set theory, higher-order logic or the sequent calculus, its | |
| 21 | distinguishing feature is support for the definition of new logics. | |
| 104 | 22 | |
| 291 | 23 | Isabelle logics are hierarchies of theories, which are described and | 
| 24 | illustrated in {\em Introduction to Isabelle}.  That material, together
 | |
| 25 | with the theory files provided in the examples directories, should suffice | |
| 26 | for all simple applications. The easiest way to define a new theory is by | |
| 27 | modifying a copy of an existing theory. | |
| 28 | ||
| 29 | This chapter is intended for experienced Isabelle users. It documents all | |
| 30 | aspects of theories concerned with syntax: mixfix declarations, pretty | |
| 31 | printing, macros and translation functions. The extended examples of | |
| 32 | \S\ref{sec:min_logics} demonstrate the logical aspects of the definition of
 | |
| 33 | theories. Sections marked with * are highly technical and might be skipped | |
| 34 | on the first reading. | |
| 104 | 35 | |
| 36 | ||
| 291 | 37 | \section{Priority grammars} \label{sec:priority_grammars}
 | 
| 38 | \index{grammars!priority|(} 
 | |
| 104 | 39 | |
| 291 | 40 | The syntax of an Isabelle logic is specified by a {\bf priority grammar}.
 | 
| 41 | A context-free grammar\index{grammars!context-free} contains a set of
 | |
| 42 | productions of the form $A=\gamma$, where $A$ is a nonterminal and | |
| 43 | $\gamma$, the right-hand side, is a string of terminals and nonterminals. | |
| 44 | Isabelle uses an extended format permitting {\bf priorities}, or
 | |
| 45 | precedences. Each nonterminal is decorated by an integer priority, as | |
| 46 | in~$A^{(p)}$.  A nonterminal $A^{(p)}$ in a derivation may be replaced
 | |
| 47 | using a production $A^{(q)} = \gamma$ only if $p \le q$.  Any priority
 | |
| 48 | grammar can be translated into a normal context free grammar by introducing | |
| 49 | new nonterminals and productions. | |
| 104 | 50 | |
| 51 | Formally, a set of context free productions $G$ induces a derivation | |
| 291 | 52 | relation $\rew@G$. Let $\alpha$ and $\beta$ denote strings of terminal or | 
| 53 | nonterminal symbols. Then | |
| 54 | \[ \alpha\, A^{(p)}\, \beta ~\rew@G~ \alpha\,\gamma\,\beta \] 
 | |
| 55 | if and only if $G$ contains some production $A^{(q)}=\gamma$ for~$q\ge p$.
 | |
| 104 | 56 | |
| 57 | The following simple grammar for arithmetic expressions demonstrates how | |
| 291 | 58 | binding power and associativity of operators can be enforced by priorities. | 
| 104 | 59 | \begin{center}
 | 
| 60 | \begin{tabular}{rclr}
 | |
| 291 | 61 |   $A^{(9)}$ & = & {\tt0} \\
 | 
| 62 |   $A^{(9)}$ & = & {\tt(} $A^{(0)}$ {\tt)} \\
 | |
| 63 |   $A^{(0)}$ & = & $A^{(0)}$ {\tt+} $A^{(1)}$ \\
 | |
| 64 |   $A^{(2)}$ & = & $A^{(3)}$ {\tt*} $A^{(2)}$ \\
 | |
| 65 |   $A^{(3)}$ & = & {\tt-} $A^{(3)}$
 | |
| 104 | 66 | \end{tabular}
 | 
| 67 | \end{center}
 | |
| 291 | 68 | The choice of priorities determines that {\tt -} binds tighter than {\tt *},
 | 
| 69 | which binds tighter than {\tt +}.  Furthermore {\tt +} associates to the
 | |
| 70 | left and {\tt *} to the right.
 | |
| 104 | 71 | |
| 72 | To minimize the number of subscripts, we adopt the following conventions: | |
| 73 | \begin{itemize}
 | |
| 291 | 74 | \item All priorities $p$ must be in the range $0 \leq p \leq max_pri$ for | 
| 75 | some fixed integer $max_pri$. | |
| 76 | \item Priority $0$ on the right-hand side and priority $max_pri$ on the | |
| 104 | 77 | left-hand side may be omitted. | 
| 78 | \end{itemize}
 | |
| 291 | 79 | The production $A^{(p)} = \alpha$ is written as $A = \alpha~(p)$;
 | 
| 80 | the priority of the left-hand side actually appears in a column on the far | |
| 81 | right. Finally, alternatives may be separated by $|$, and repetition | |
| 104 | 82 | indicated by \dots. | 
| 83 | ||
| 291 | 84 | Using these conventions and assuming $max_pri=9$, the grammar takes the form | 
| 104 | 85 | \begin{center}
 | 
| 86 | \begin{tabular}{rclc}
 | |
| 87 | $A$ & = & {\tt0} & \hspace*{4em} \\
 | |
| 88 |  & $|$ & {\tt(} $A$ {\tt)} \\
 | |
| 291 | 89 |  & $|$ & $A$ {\tt+} $A^{(1)}$ & (0) \\
 | 
| 90 |  & $|$ & $A^{(3)}$ {\tt*} $A^{(2)}$ & (2) \\
 | |
| 91 |  & $|$ & {\tt-} $A^{(3)}$ & (3)
 | |
| 104 | 92 | \end{tabular}
 | 
| 93 | \end{center}
 | |
| 291 | 94 | \index{grammars!priority|)}
 | 
| 104 | 95 | |
| 96 | ||
| 291 | 97 | \begin{figure}
 | 
| 104 | 98 | \begin{center}
 | 
| 99 | \begin{tabular}{rclc}
 | |
| 100 | $prop$ &=& \ttindex{PROP} $aprop$ ~~$|$~~ {\tt(} $prop$ {\tt)} \\
 | |
| 291 | 101 |      &$|$& $logic^{(3)}$ \ttindex{==} $logic^{(2)}$ & (2) \\
 | 
| 102 |      &$|$& $logic^{(3)}$ \ttindex{=?=} $logic^{(2)}$ & (2) \\
 | |
| 103 |      &$|$& $prop^{(2)}$ \ttindex{==>} $prop^{(1)}$ & (1) \\
 | |
| 104 |      &$|$& {\tt[|} $prop$ {\tt;} \dots {\tt;} $prop$ {\tt|]} {\tt==>} $prop^{(1)}$ & (1) \\
 | |
| 104 | 105 |      &$|$& {\tt!!} $idts$ {\tt.} $prop$ & (0) \\\\
 | 
| 106 | $logic$ &=& $prop$ ~~$|$~~ $fun$ \\\\ | |
| 107 | $aprop$ &=& $id$ ~~$|$~~ $var$ | |
| 291 | 108 |     ~~$|$~~ $fun^{(max_pri)}$ {\tt(} $logic$ {\tt,} \dots {\tt,} $logic$ {\tt)} \\\\
 | 
| 104 | 109 | $fun$ &=& $id$ ~~$|$~~ $var$ ~~$|$~~ {\tt(} $fun$ {\tt)} \\
 | 
| 291 | 110 |     &$|$& $fun^{(max_pri)}$ {\tt(} $logic$ {\tt,} \dots {\tt,} $logic$ {\tt)} \\
 | 
| 111 |     &$|$& $fun^{(max_pri)}$ {\tt::} $type$ \\
 | |
| 104 | 112 |     &$|$& \ttindex{\%} $idts$ {\tt.} $logic$ & (0) \\\\
 | 
| 291 | 113 | $idts$ &=& $idt$ ~~$|$~~ $idt^{(1)}$ $idts$ \\\\
 | 
| 104 | 114 | $idt$ &=& $id$ ~~$|$~~ {\tt(} $idt$ {\tt)} \\
 | 
| 115 |     &$|$& $id$ \ttindex{::} $type$ & (0) \\\\
 | |
| 291 | 116 | $type$ &=& $tid$ ~~$|$~~ $tvar$ ~~$|$~~ $tid$ {\tt::} $sort$
 | 
| 135 | 117 |   ~~$|$~~ $tvar$ {\tt::} $sort$ \\
 | 
| 291 | 118 |      &$|$& $id$ ~~$|$~~ $type^{(max_pri)}$ $id$
 | 
| 104 | 119 |                 ~~$|$~~ {\tt(} $type$ {\tt,} \dots {\tt,} $type$ {\tt)} $id$ \\
 | 
| 291 | 120 |      &$|$& $type^{(1)}$ \ttindex{=>} $type$ & (0) \\
 | 
| 104 | 121 |      &$|$& {\tt[}  $type$ {\tt,} \dots {\tt,} $type$ {\tt]} {\tt=>} $type$&(0)\\
 | 
| 122 |      &$|$& {\tt(} $type$ {\tt)} \\\\
 | |
| 123 | $sort$ &=& $id$ ~~$|$~~ {\tt\ttlbrace\ttrbrace}
 | |
| 124 |                 ~~$|$~~ {\tt\ttlbrace} $id$ {\tt,} \dots {\tt,} $id$ {\tt\ttrbrace}
 | |
| 125 | \end{tabular}\index{*"!"!}\index{*"["|}\index{*"|"]}
 | |
| 126 | \indexbold{type@$type$} \indexbold{sort@$sort$} \indexbold{idt@$idt$}
 | |
| 127 | \indexbold{idts@$idts$} \indexbold{logic@$logic$} \indexbold{prop@$prop$}
 | |
| 128 | \indexbold{fun@$fun$}
 | |
| 129 | \end{center}
 | |
| 291 | 130 | \caption{Meta-logic syntax}\label{fig:pure_gram}
 | 
| 104 | 131 | \end{figure}
 | 
| 132 | ||
| 291 | 133 | |
| 134 | \section{The Pure syntax} \label{sec:basic_syntax}
 | |
| 135 | \index{syntax!Pure|(}
 | |
| 104 | 136 | |
| 291 | 137 | At the root of all object-logics lies the Pure theory,\index{theory!Pure}
 | 
| 138 | bound to the \ML{} identifier \ttindex{Pure.thy}.  It contains, among many
 | |
| 139 | other things, the Pure syntax. An informal account of this basic syntax | |
| 140 | (meta-logic, types, \ldots) may be found in {\em Introduction to Isabelle}.
 | |
| 141 | A more precise description using a priority grammar is shown in | |
| 142 | Fig.\ts\ref{fig:pure_gram}.  The following nonterminals are defined:
 | |
| 143 | \begin{description}
 | |
| 144 | \item[$prop$] Terms of type $prop$. These are formulae of the meta-logic. | |
| 104 | 145 | |
| 291 | 146 | \item[$aprop$] Atomic propositions. These typically include the | 
| 147 | judgement forms of the object-logic; its definition introduces a | |
| 148 | meta-level predicate for each judgement form. | |
| 149 | ||
| 150 | \item[$logic$] Terms whose type belongs to class $logic$. Initially, | |
| 151 | this category contains just $prop$. As the syntax is extended by new | |
| 152 | object-logics, more productions for $logic$ are added automatically | |
| 153 | (see below). | |
| 104 | 154 | |
| 155 | \item[$fun$] Terms potentially of function type. | |
| 156 | ||
| 291 | 157 | \item[$type$] Types of the meta-logic. | 
| 104 | 158 | |
| 291 | 159 | \item[$idts$] A list of identifiers, possibly constrained by types. | 
| 104 | 160 | \end{description}
 | 
| 161 | ||
| 291 | 162 | \begin{warn}
 | 
| 163 |   Note that \verb|x::nat y| is parsed as \verb|x::(nat y)|, treating {\tt
 | |
| 164 |     y} like a type constructor applied to {\tt nat}.  The likely result is
 | |
| 165 | an error message. To avoid this interpretation, use parentheses and | |
| 166 | write \verb|(x::nat) y|. | |
| 104 | 167 | |
| 291 | 168 | Similarly, \verb|x::nat y::nat| is parsed as \verb|x::(nat y::nat)| and | 
| 169 | yields a syntax error. The correct form is \verb|(x::nat) (y::nat)|. | |
| 170 | \end{warn}
 | |
| 171 | ||
| 172 | \subsection{Logical types and default syntax}\label{logical-types}
 | |
| 173 | Isabelle's representation of mathematical languages is based on the typed | |
| 174 | $\lambda$-calculus. All logical types, namely those of class $logic$, are | |
| 175 | automatically equipped with a basic syntax of types, identifiers, | |
| 176 | variables, parentheses, $\lambda$-abstractions and applications. | |
| 177 | ||
| 178 | More precisely, for each type constructor $ty$ with arity $(\vec{s})c$,
 | |
| 179 | where $c$ is a subclass of $logic$, several productions are added: | |
| 104 | 180 | \begin{center}
 | 
| 181 | \begin{tabular}{rclc}
 | |
| 182 | $ty$ &=& $id$ ~~$|$~~ $var$ ~~$|$~~ {\tt(} $ty$ {\tt)} \\
 | |
| 291 | 183 |   &$|$& $fun^{(max_pri)}$ {\tt(} $logic$ {\tt,} \dots {\tt,} $logic$ {\tt)}\\
 | 
| 184 |   &$|$& $ty^{(max_pri)}$ {\tt::} $type$\\\\
 | |
| 104 | 185 | $logic$ &=& $ty$ | 
| 186 | \end{tabular}
 | |
| 187 | \end{center}
 | |
| 188 | ||
| 189 | ||
| 291 | 190 | \subsection{Lexical matters}
 | 
| 191 | The parser does not process input strings directly. It operates on token | |
| 192 | lists provided by Isabelle's \bfindex{lexer}.  There are two kinds of
 | |
| 193 | tokens: \bfindex{delimiters} and \bfindex{name tokens}.
 | |
| 104 | 194 | |
| 291 | 195 | Delimiters can be regarded as reserved words of the syntax. You can | 
| 196 | add new ones when extending theories.  In Fig.\ts\ref{fig:pure_gram} they
 | |
| 197 | appear in typewriter font, for example {\tt ==}, {\tt =?=} and
 | |
| 198 | {\tt PROP}\@.
 | |
| 104 | 199 | |
| 291 | 200 | Name tokens have a predefined syntax. The lexer distinguishes four | 
| 201 | disjoint classes of names: \rmindex{identifiers}, \rmindex{unknowns}, type
 | |
| 202 | identifiers\index{identifiers!type}, type unknowns\index{unknowns!type}.
 | |
| 203 | They are denoted by $id$\index{id@$id$}, $var$\index{var@$var$},
 | |
| 204 | $tid$\index{tid@$tid$}, $tvar$\index{tvar@$tvar$}, respectively.  Typical
 | |
| 205 | examples are {\tt x}, {\tt ?x7}, {\tt 'a}, {\tt ?'a3}.  Here is the precise
 | |
| 206 | syntax: | |
| 188 
6be0856cdf49
last minute changes, eg literal tokens -> delimiters and valued tokens ->
 nipkow parents: 
142diff
changeset | 207 | \begin{eqnarray*}
 | 
| 
6be0856cdf49
last minute changes, eg literal tokens -> delimiters and valued tokens ->
 nipkow parents: 
142diff
changeset | 208 | id & = & letter~quasiletter^* \\ | 
| 
6be0856cdf49
last minute changes, eg literal tokens -> delimiters and valued tokens ->
 nipkow parents: 
142diff
changeset | 209 | var       & =   & \mbox{\tt ?}id ~~|~~ \mbox{\tt ?}id\mbox{\tt .}nat \\
 | 
| 291 | 210 | tid     & =   & \mbox{\tt '}id \\
 | 
| 211 | tvar      & =   & \mbox{\tt ?}tid ~~|~~
 | |
| 212 |                   \mbox{\tt ?}tid\mbox{\tt .}nat \\[1ex]
 | |
| 188 
6be0856cdf49
last minute changes, eg literal tokens -> delimiters and valued tokens ->
 nipkow parents: 
142diff
changeset | 213 | letter    & =   & \mbox{one of {\tt a}\dots {\tt z} {\tt A}\dots {\tt Z}} \\
 | 
| 
6be0856cdf49
last minute changes, eg literal tokens -> delimiters and valued tokens ->
 nipkow parents: 
142diff
changeset | 214 | digit     & =   & \mbox{one of {\tt 0}\dots {\tt 9}} \\
 | 
| 
6be0856cdf49
last minute changes, eg literal tokens -> delimiters and valued tokens ->
 nipkow parents: 
142diff
changeset | 215 | quasiletter & =  & letter ~~|~~ digit ~~|~~ \mbox{\tt _} ~~|~~ \mbox{\tt '} \\
 | 
| 
6be0856cdf49
last minute changes, eg literal tokens -> delimiters and valued tokens ->
 nipkow parents: 
142diff
changeset | 216 | nat & = & digit^+ | 
| 
6be0856cdf49
last minute changes, eg literal tokens -> delimiters and valued tokens ->
 nipkow parents: 
142diff
changeset | 217 | \end{eqnarray*}
 | 
| 291 | 218 | A $var$ or $tvar$ describes an unknown, which is internally a pair | 
| 188 
6be0856cdf49
last minute changes, eg literal tokens -> delimiters and valued tokens ->
 nipkow parents: 
142diff
changeset | 219 | of base name and index (\ML\ type \ttindex{indexname}).  These components are
 | 
| 291 | 220 | either separated by a dot as in {\tt ?x.1} or {\tt ?x7.3} or
 | 
| 221 | run together as in {\tt ?x1}.  The latter form is possible if the
 | |
| 222 | base name does not end with digits. If the index is 0, it may be dropped | |
| 223 | altogether: {\tt ?x} abbreviates both {\tt ?x0} and {\tt ?x.0}.
 | |
| 104 | 224 | |
| 291 | 225 | The lexer repeatedly takes the maximal prefix of the input string that | 
| 226 | forms a valid token. A maximal prefix that is both a delimiter and a name | |
| 227 | is treated as a delimiter. Spaces, tabs and newlines are separators; they | |
| 228 | never occur within tokens. | |
| 104 | 229 | |
| 291 | 230 | Delimiters need not be separated by white space.  For example, if {\tt -}
 | 
| 231 | is a delimiter but {\tt --} is not, then the string {\tt --} is treated as
 | |
| 232 | two consecutive occurrences of the token~{\tt -}.  In contrast, \ML\ 
 | |
| 233 | treats {\tt --} as a single symbolic name.  The consequence of Isabelle's
 | |
| 234 | more liberal scheme is that the same string may be parsed in different ways | |
| 235 | after extending the syntax: after adding {\tt --} as a delimiter, the input
 | |
| 236 | {\tt --} is treated as a single token.
 | |
| 104 | 237 | |
| 291 | 238 | Name tokens are terminal symbols, strictly speaking, but we can generally | 
| 239 | regard them as nonterminals. This is because a name token carries with it | |
| 240 | useful information, the name. Delimiters, on the other hand, are nothing | |
| 241 | but than syntactic sugar. | |
| 104 | 242 | |
| 243 | ||
| 291 | 244 | \subsection{*Inspecting the syntax}
 | 
| 104 | 245 | \begin{ttbox}
 | 
| 291 | 246 | syn_of : theory -> Syntax.syntax | 
| 247 | Syntax.print_syntax : Syntax.syntax -> unit | |
| 248 | Syntax.print_gram : Syntax.syntax -> unit | |
| 249 | Syntax.print_trans : Syntax.syntax -> unit | |
| 104 | 250 | \end{ttbox}
 | 
| 291 | 251 | The abstract type \ttindex{Syntax.syntax} allows manipulation of syntaxes
 | 
| 252 | in \ML. You can display values of this type by calling the following | |
| 253 | functions: | |
| 254 | \begin{description}
 | |
| 255 | \item[\ttindexbold{syn_of} {\it thy}] returns the syntax of the Isabelle
 | |
| 256 |   theory~{\it thy} as an \ML\ value.
 | |
| 257 | ||
| 258 | \item[\ttindexbold{Syntax.print_syntax} {\it syn}] shows virtually all
 | |
| 259 |   information contained in the syntax {\it syn}.  The displayed output can
 | |
| 260 | be large. The following two functions are more selective. | |
| 104 | 261 | |
| 291 | 262 | \item[\ttindexbold{Syntax.print_gram} {\it syn}] shows the grammar part
 | 
| 263 |   of~{\it syn}, namely the lexicon, roots and productions.
 | |
| 104 | 264 | |
| 291 | 265 | \item[\ttindexbold{Syntax.print_trans} {\it syn}] shows the translation
 | 
| 266 |   part of~{\it syn}, namely the constants, parse/print macros and
 | |
| 267 | parse/print translations. | |
| 268 | \end{description}
 | |
| 269 | ||
| 270 | Let us demonstrate these functions by inspecting Pure's syntax. Even that | |
| 271 | is too verbose to display in full. | |
| 104 | 272 | \begin{ttbox}
 | 
| 273 | Syntax.print_syntax (syn_of Pure.thy); | |
| 291 | 274 | {\out lexicon: "!!" "\%" "(" ")" "," "." "::" ";" "==" "==>" \dots}
 | 
| 104 | 275 | {\out roots: logic type fun prop}
 | 
| 276 | {\out prods:}
 | |
| 291 | 277 | {\out   type = tid  (1000)}
 | 
| 104 | 278 | {\out   type = tvar  (1000)}
 | 
| 279 | {\out   type = id  (1000)}
 | |
| 291 | 280 | {\out   type = tid "::" sort[0]  => "_ofsort" (1000)}
 | 
| 104 | 281 | {\out   type = tvar "::" sort[0]  => "_ofsort" (1000)}
 | 
| 282 | {\out   \vdots}
 | |
| 291 | 283 | \ttbreak | 
| 104 | 284 | {\out consts: "_K" "_appl" "_aprop" "_args" "_asms" "_bigimpl" \dots}
 | 
| 285 | {\out parse_ast_translation: "_appl" "_bigimpl" "_bracket"}
 | |
| 286 | {\out   "_idtyp" "_lambda" "_tapp" "_tappl"}
 | |
| 287 | {\out parse_rules:}
 | |
| 288 | {\out parse_translation: "!!" "_K" "_abs" "_aprop"}
 | |
| 289 | {\out print_translation: "all"}
 | |
| 290 | {\out print_rules:}
 | |
| 291 | {\out print_ast_translation: "==>" "_abs" "_idts" "fun"}
 | |
| 292 | \end{ttbox}
 | |
| 293 | ||
| 291 | 294 | As you can see, the output is divided into labeled sections. The grammar | 
| 295 | is represented by {\tt lexicon}, {\tt roots} and {\tt prods}.  The rest
 | |
| 296 | refers to syntactic translations and macro expansion. Here is an | |
| 297 | explanation of the various sections. | |
| 104 | 298 | \begin{description}
 | 
| 291 | 299 |   \item[\ttindex{lexicon}] lists the delimiters used for lexical
 | 
| 300 |     analysis.\index{delimiters} 
 | |
| 104 | 301 | |
| 291 | 302 |   \item[\ttindex{roots}] lists the grammar's nonterminal symbols.  You must
 | 
| 303 | name the desired root when calling lower level functions or specifying | |
| 304 | macros. Higher level functions usually expect a type and derive the | |
| 305 |     actual root as described in~\S\ref{sec:grammar}.
 | |
| 104 | 306 | |
| 291 | 307 |   \item[\ttindex{prods}] lists the productions of the priority grammar.
 | 
| 308 |     The nonterminal $A^{(n)}$ is rendered in {\sc ascii} as {\tt $A$[$n$]}.
 | |
| 309 |     Each delimiter is quoted.  Some productions are shown with {\tt =>} and
 | |
| 310 | an attached string. These strings later become the heads of parse | |
| 311 | trees; they also play a vital role when terms are printed (see | |
| 312 |     \S\ref{sec:asts}).
 | |
| 104 | 313 | |
| 291 | 314 |     Productions with no strings attached are called {\bf copy
 | 
| 315 |       productions}\indexbold{productions!copy}.  Their right-hand side must
 | |
| 316 | have exactly one nonterminal symbol (or name token). The parser does | |
| 317 | not create a new parse tree node for copy productions, but simply | |
| 318 | returns the parse tree of the right-hand symbol. | |
| 104 | 319 | |
| 291 | 320 | If the right-hand side consists of a single nonterminal with no | 
| 321 |     delimiters, then the copy production is called a {\bf chain
 | |
| 322 |       production}\indexbold{productions!chain}.  Chain productions should
 | |
| 323 | be seen as abbreviations: conceptually, they are removed from the | |
| 324 | grammar by adding new productions. Priority information | |
| 325 | attached to chain productions is ignored, only the dummy value $-1$ is | |
| 326 | displayed. | |
| 104 | 327 | |
| 328 |   \item[\ttindex{consts}, \ttindex{parse_rules}, \ttindex{print_rules}]
 | |
| 291 | 329 |     relate to macros (see \S\ref{sec:macros}).
 | 
| 104 | 330 | |
| 291 | 331 |   \item[\ttindex{parse_ast_translation}, \ttindex{print_ast_translation}]
 | 
| 332 | list sets of constants that invoke translation functions for abstract | |
| 333 |     syntax trees.  Section \S\ref{sec:asts} below discusses this obscure
 | |
| 334 | matter. | |
| 104 | 335 | |
| 291 | 336 |   \item[\ttindex{parse_translation}, \ttindex{print_translation}] list sets
 | 
| 337 | of constants that invoke translation functions for terms (see | |
| 338 |     \S\ref{sec:tr_funs}).
 | |
| 339 | \end{description}
 | |
| 340 | \index{syntax!Pure|)}
 | |
| 104 | 341 | |
| 342 | ||
| 343 | \section{Mixfix declarations} \label{sec:mixfix}
 | |
| 291 | 344 | \index{mixfix declaration|(} 
 | 
| 104 | 345 | |
| 291 | 346 | When defining a theory, you declare new constants by giving their names, | 
| 347 | their type, and an optional {\bf mixfix annotation}.  Mixfix annotations
 | |
| 348 | allow you to extend Isabelle's basic $\lambda$-calculus syntax with | |
| 349 | readable notation. They can express any context-free priority grammar. | |
| 350 | Isabelle syntax definitions are inspired by \OBJ~\cite{OBJ}; they are more
 | |
| 351 | general than the priority declarations of \ML\ and Prolog. | |
| 352 | ||
| 353 | A mixfix annotation defines a production of the priority grammar. It | |
| 354 | describes the concrete syntax, the translation to abstract syntax, and the | |
| 355 | pretty printing. Special case annotations provide a simple means of | |
| 356 | specifying infix operators, binders and so forth. | |
| 104 | 357 | |
| 291 | 358 | \subsection{Grammar productions}\label{sec:grammar}
 | 
| 359 | Let us examine the treatment of the production | |
| 360 | \[ A^{(p)}= w@0\, A@1^{(p@1)}\, w@1\, A@2^{(p@2)}\, \ldots\,  
 | |
| 361 |                   A@n^{(p@n)}\, w@n. \]
 | |
| 362 | Here $A@i^{(p@i)}$ is a nonterminal with priority~$p@i$ for $i=1$,
 | |
| 363 | \ldots,~$n$, while $w@0$, \ldots,~$w@n$ are strings of terminals. | |
| 364 | In the corresponding mixfix annotation, the priorities are given separately | |
| 365 | as $[p@1,\ldots,p@n]$ and~$p$. The nonterminal symbols are identified with | |
| 366 | types~$\tau$, $\tau@1$, \ldots,~$\tau@n$ respectively, and the production's | |
| 367 | effect on nonterminals is expressed as the function type | |
| 368 | \[ [\tau@1, \ldots, \tau@n]\To \tau. \] | |
| 369 | Finally, the template | |
| 370 | \[ w@0 \;_\; w@1 \;_\; \ldots \;_\; w@n \] | |
| 371 | describes the strings of terminals. | |
| 104 | 372 | |
| 291 | 373 | A simple type is typically declared for each nonterminal symbol. In | 
| 374 | first-order logic, type~$i$ stands for terms and~$o$ for formulae. Only | |
| 375 | the outermost type constructor is taken into account. For example, any | |
| 376 | type of the form $\sigma list$ stands for a list; productions may refer | |
| 377 | to the symbol {\tt list} and will apply lists of any type.
 | |
| 378 | ||
| 379 | The symbol associated with a type is called its {\bf root} since it may
 | |
| 380 | serve as the root of a parse tree. Precisely, the root of $(\tau@1, \dots, | |
| 381 | \tau@n)ty$ is $ty$, where $\tau@1$, \ldots, $\tau@n$ are types and $ty$ is | |
| 382 | a type constructor. Type infixes are a special case of this; in | |
| 383 | particular, the root of $\tau@1 \To \tau@2$ is {\tt fun}.  Finally, the
 | |
| 384 | root of a type variable is {\tt logic}; general productions might
 | |
| 385 | refer to this nonterminal. | |
| 104 | 386 | |
| 291 | 387 | Identifying nonterminals with types allows a constant's type to specify | 
| 388 | syntax as well. We can declare the function~$f$ to have type $[\tau@1, | |
| 389 | \ldots, \tau@n]\To \tau$ and, through a mixfix annotation, specify the | |
| 390 | layout of the function's $n$ arguments. The constant's name, in this | |
| 391 | case~$f$, will also serve as the label in the abstract syntax tree. There | |
| 392 | are two exceptions to this treatment of constants: | |
| 393 | \begin{enumerate}
 | |
| 394 | \item A production need not map directly to a logical function. In this | |
| 395 | case, you must declare a constant whose purpose is purely syntactic. | |
| 396 |     By convention such constants begin with the symbol~{\tt\at}, 
 | |
| 397 | ensuring that they can never be written in formulae. | |
| 398 | ||
| 399 | \item A copy production has no associated constant. | |
| 400 | \end{enumerate}
 | |
| 401 | There is something artificial about this representation of productions, | |
| 402 | but it is convenient, particularly for simple theory extensions. | |
| 104 | 403 | |
| 291 | 404 | \subsection{The general mixfix form}
 | 
| 405 | Here is a detailed account of the general \bfindex{mixfix declaration} as
 | |
| 406 | it may occur within the {\tt consts} section of a {\tt .thy} file.
 | |
| 407 | \begin{center}
 | |
| 408 |   {\tt "$c$" ::\ "$\sigma$" ("$template$" $ps$ $p$)}
 | |
| 409 | \end{center}
 | |
| 410 | This constant declaration and mixfix annotation is interpreted as follows: | |
| 411 | \begin{itemize}
 | |
| 412 | \item The string {\tt "$c$"} is the name of the constant associated with
 | |
| 413 |   the production.  If $c$ is empty (given as~{\tt ""}) then this is a copy
 | |
| 414 |   production.\index{productions!copy} Otherwise, parsing an instance of the
 | |
| 415 |   phrase $template$ generates the \AST{} {\tt ("$c$" $a@1$ $\ldots$
 | |
| 416 |     $a@n$)}, where $a@i$ is the \AST{} generated by parsing the $i$-th
 | |
| 417 | argument. | |
| 418 | ||
| 419 | \item The constant $c$, if non-empty, is declared to have type $\sigma$. | |
| 104 | 420 | |
| 291 | 421 | \item The string $template$ specifies the right-hand side of | 
| 422 | the production. It has the form | |
| 423 | \[ w@0 \;_\; w@1 \;_\; \ldots \;_\; w@n, \] | |
| 424 |     where each occurrence of \ttindex{_} denotes an
 | |
| 425 |     argument\index{argument!mixfix} position and the~$w@i$ do not
 | |
| 426 |     contain~{\tt _}.  (If you want a literal~{\tt _} in the concrete
 | |
| 427 | syntax, you must escape it as described below.) The $w@i$ may | |
| 428 |     consist of \rmindex{delimiters}, spaces or \rmindex{pretty
 | |
| 429 | printing} annotations (see below). | |
| 104 | 430 | |
| 291 | 431 | \item The type $\sigma$ specifies the production's nonterminal symbols (or name | 
| 432 | tokens). If $template$ is of the form above then $\sigma$ must be a | |
| 433 | function type with at least~$n$ argument positions, say $\sigma = | |
| 434 | [\tau@1, \dots, \tau@n] \To \tau$. Nonterminal symbols are derived | |
| 435 | from the type $\tau@1$, \ldots,~$\tau@n$, $\tau$ as described above. | |
| 436 |     Any of these may be function types; the corresponding root is then {\tt
 | |
| 437 | fun}. | |
| 104 | 438 | |
| 291 | 439 |   \item The optional list~$ps$ may contain at most $n$ integers, say {\tt
 | 
| 440 | [$p@1$, $\ldots$, $p@m$]}, where $p@i$ is the minimal | |
| 441 |     priority\indexbold{priorities} required of any phrase that may appear
 | |
| 442 | as the $i$-th argument. Missing priorities default to~$0$. | |
| 443 | ||
| 444 | \item The integer $p$ is the priority of this production. If omitted, it | |
| 445 | defaults to the maximal priority. | |
| 446 | ||
| 447 | Priorities, or precedences, range between $0$ and | |
| 448 |     $max_pri$\indexbold{max_pri@$max_pri$} (= 1000).
 | |
| 104 | 449 | \end{itemize}
 | 
| 450 | ||
| 291 | 451 | The declaration {\tt $c$ ::\ "$\sigma$" ("$template$")} specifies no
 | 
| 452 | priorities. The resulting production puts no priority constraints on any | |
| 453 | of its arguments and has maximal priority itself. Omitting priorities in | |
| 454 | this manner will introduce syntactic ambiguities unless the production's | |
| 455 | right-hand side is fully bracketed, as in \verb|"if _ then _ else _ fi"|. | |
| 104 | 456 | |
| 291 | 457 | \begin{warn}
 | 
| 458 | Theories must sometimes declare types for purely syntactic purposes. One | |
| 459 |   example is {\tt type}, the built-in type of types.  This is a `type of
 | |
| 460 | all types' in the syntactic sense only. Do not declare such types under | |
| 461 |   {\tt arities} as belonging to class $logic$, for that would allow their
 | |
| 462 |   use in arbitrary Isabelle expressions~(\S\ref{logical-types}).
 | |
| 463 | \end{warn}
 | |
| 464 | ||
| 465 | \subsection{Example: arithmetic expressions}
 | |
| 466 | This theory specification contains a {\tt consts} section with mixfix
 | |
| 467 | declarations encoding the priority grammar from | |
| 468 | \S\ref{sec:priority_grammars}:
 | |
| 104 | 469 | \begin{ttbox}
 | 
| 470 | EXP = Pure + | |
| 471 | types | |
| 291 | 472 | exp | 
| 104 | 473 | arities | 
| 474 | exp :: logic | |
| 475 | consts | |
| 291 | 476 |   "0" :: "exp"                ("0"      9)
 | 
| 477 |   "+" :: "[exp, exp] => exp"  ("_ + _"  [0, 1] 0)
 | |
| 478 |   "*" :: "[exp, exp] => exp"  ("_ * _"  [3, 2] 2)
 | |
| 479 |   "-" :: "exp => exp"         ("- _"    [3] 3)
 | |
| 104 | 480 | end | 
| 481 | \end{ttbox}
 | |
| 482 | Note that the {\tt arities} declaration causes {\tt exp} to be added to the
 | |
| 291 | 483 | syntax' roots.  If you put the text above into a file {\tt exp.thy} and load
 | 
| 135 | 484 | it via {\tt use_thy "EXP"}, you can run some tests:
 | 
| 104 | 485 | \begin{ttbox}
 | 
| 486 | val read_exp = Syntax.test_read (syn_of EXP.thy) "exp"; | |
| 291 | 487 | {\out val it = fn : string -> unit}
 | 
| 104 | 488 | read_exp "0 * 0 * 0 * 0 + 0 + 0 + 0"; | 
| 489 | {\out tokens: "0" "*" "0" "*" "0" "*" "0" "+" "0" "+" "0" "+" "0"}
 | |
| 490 | {\out raw: ("+" ("+" ("+" ("*" "0" ("*" "0" ("*" "0" "0"))) "0") "0") "0")}
 | |
| 491 | {\out \vdots}
 | |
| 492 | read_exp "0 + - 0 + 0"; | |
| 493 | {\out tokens: "0" "+" "-" "0" "+" "0"}
 | |
| 494 | {\out raw: ("+" ("+" "0" ("-" "0")) "0")}
 | |
| 495 | {\out \vdots}
 | |
| 496 | \end{ttbox}
 | |
| 497 | The output of \ttindex{Syntax.test_read} includes the token list ({\tt
 | |
| 291 | 498 |   tokens}) and the raw \AST{} directly derived from the parse tree,
 | 
| 499 | ignoring parse \AST{} translations.  The rest is tracing information
 | |
| 500 | provided by the macro expander (see \S\ref{sec:macros}).
 | |
| 104 | 501 | |
| 291 | 502 | Executing {\tt Syntax.print_gram} reveals the productions derived
 | 
| 503 | from our mixfix declarations (lots of additional information deleted): | |
| 104 | 504 | \begin{ttbox}
 | 
| 291 | 505 | Syntax.print_gram (syn_of EXP.thy); | 
| 506 | {\out exp = "0"  => "0" (9)}
 | |
| 507 | {\out exp = exp[0] "+" exp[1]  => "+" (0)}
 | |
| 508 | {\out exp = exp[3] "*" exp[2]  => "*" (2)}
 | |
| 509 | {\out exp = "-" exp[3]  => "-" (3)}
 | |
| 104 | 510 | \end{ttbox}
 | 
| 511 | ||
| 291 | 512 | |
| 513 | \subsection{The mixfix template}
 | |
| 514 | Let us take a closer look at the string $template$ appearing in mixfix | |
| 515 | annotations. This string specifies a list of parsing and printing | |
| 516 | directives: delimiters\index{delimiter}, arguments\index{argument!mixfix},
 | |
| 517 | spaces, blocks of indentation and line breaks. These are encoded via the | |
| 104 | 518 | following character sequences: | 
| 519 | ||
| 291 | 520 | \index{pretty printing|(}
 | 
| 104 | 521 | \begin{description}
 | 
| 291 | 522 |   \item[~\ttindex_~] An argument\index{argument!mixfix} position, which
 | 
| 523 | stands for a nonterminal symbol or name token. | |
| 104 | 524 | |
| 291 | 525 |   \item[~$d$~] A \rmindex{delimiter}, namely a non-empty sequence of
 | 
| 526 |     non-special or escaped characters.  Escaping a character\index{escape
 | |
| 527 |       character} means preceding it with a {\tt '} (single quote).  Thus
 | |
| 528 |     you have to write {\tt ''} if you really want a single quote.  You must
 | |
| 529 |     also escape {\tt _}, {\tt (}, {\tt )} and {\tt /}.  Delimiters may
 | |
| 530 | never contain white space, though. | |
| 104 | 531 | |
| 291 | 532 | \item[~$s$~] A non-empty sequence of spaces for printing. This | 
| 533 | and the following specifications do not affect parsing at all. | |
| 104 | 534 | |
| 291 | 535 |   \item[~{\ttindex($n$}~] Open a pretty printing block.  The optional
 | 
| 536 | number $n$ specifies how much indentation to add when a line break | |
| 537 |     occurs within the block.  If {\tt(} is not followed by digits, the
 | |
| 538 | indentation defaults to~$0$. | |
| 104 | 539 | |
| 291 | 540 | \item[~\ttindex)~] Close a pretty printing block. | 
| 104 | 541 | |
| 291 | 542 |   \item[~\ttindex{//}~] Force a line break.
 | 
| 104 | 543 | |
| 291 | 544 | \item[~\ttindex/$s$~] Allow a line break. Here $s$ stands for the string | 
| 545 |     of spaces (zero or more) right after the {\tt /} character.  These
 | |
| 546 | spaces are printed if the break is not taken. | |
| 547 | \end{description}
 | |
| 548 | Isabelle's pretty printer resembles the one described in | |
| 549 | Paulson~\cite{paulson91}.  \index{pretty printing|)}
 | |
| 104 | 550 | |
| 551 | ||
| 552 | \subsection{Infixes}
 | |
| 291 | 553 | \indexbold{infix operators}
 | 
| 554 | Infix operators associating to the left or right can be declared | |
| 555 | using {\tt infixl} or {\tt infixr}.
 | |
| 556 | Roughly speaking, the form {\tt $c$ ::\ "$\sigma$" (infixl $p$)}
 | |
| 557 | abbreviates the declarations | |
| 104 | 558 | \begin{ttbox}
 | 
| 291 | 559 | "op \(c\)" :: "\(\sigma\)"   ("op \(c\)")
 | 
| 560 | "op \(c\)" :: "\(\sigma\)"   ("(_ \(c\)/ _)" [\(p\), \(p+1\)] \(p\))
 | |
| 104 | 561 | \end{ttbox}
 | 
| 291 | 562 | and {\tt $c$ ::\ "$\sigma$" (infixr $p$)} abbreviates the declarations
 | 
| 104 | 563 | \begin{ttbox}
 | 
| 291 | 564 | "op \(c\)" :: "\(\sigma\)"   ("op \(c\)")
 | 
| 565 | "op \(c\)" :: "\(\sigma\)"   ("(_ \(c\)/ _)" [\(p+1\), \(p\)] \(p\))
 | |
| 104 | 566 | \end{ttbox}
 | 
| 291 | 567 | The infix operator is declared as a constant with the prefix {\tt op}.
 | 
| 104 | 568 | Thus, prefixing infixes with \ttindex{op} makes them behave like ordinary
 | 
| 291 | 569 | function symbols, as in \ML. Special characters occurring in~$c$ must be | 
| 570 | escaped, as in delimiters, using a single quote. | |
| 571 | ||
| 572 | The expanded forms above would be actually illegal in a {\tt .thy} file
 | |
| 573 | because they declare the constant \hbox{\tt"op \(c\)"} twice.
 | |
| 104 | 574 | |
| 575 | ||
| 576 | \subsection{Binders}
 | |
| 291 | 577 | \indexbold{binders}
 | 
| 578 | \begingroup | |
| 579 | \def\Q{{\cal Q}}
 | |
| 580 | A {\bf binder} is a variable-binding construct such as a quantifier.  The
 | |
| 581 | binder declaration \indexbold{*binder}
 | |
| 104 | 582 | \begin{ttbox}
 | 
| 291 | 583 | \(c\) :: "\(\sigma\)" (binder "\(\Q\)" \(p\)) | 
| 104 | 584 | \end{ttbox}
 | 
| 291 | 585 | introduces a constant~$c$ of type~$\sigma$, which must have the form | 
| 586 | $(\tau@1 \To \tau@2) \To \tau@3$. Its concrete syntax is $\Q~x.P$, where | |
| 587 | $x$ is a bound variable of type~$\tau@1$, the body~$P$ has type $\tau@2$ | |
| 588 | and the whole term has type~$\tau@3$. Special characters in $\Q$ must be | |
| 589 | escaped using a single quote. | |
| 590 | ||
| 591 | Let us declare the quantifier~$\forall$: | |
| 104 | 592 | \begin{ttbox}
 | 
| 593 | All :: "('a => o) => o"   (binder "ALL " 10)
 | |
| 594 | \end{ttbox}
 | |
| 291 | 595 | This let us write $\forall x.P$ as either {\tt All(\%$x$.$P$)} or {\tt ALL
 | 
| 596 | $x$.$P$}. When printing, Isabelle prefers the latter form, but must fall | |
| 597 | back on $\mtt{All}(P)$ if $P$ is not an abstraction.  Both $P$ and {\tt ALL
 | |
| 598 | $x$.$P$} have type~$o$, the type of formulae, while the bound variable | |
| 599 | can be polymorphic. | |
| 104 | 600 | |
| 291 | 601 | The binder~$c$ of type $(\sigma \To \tau) \To \tau$ can be nested. The | 
| 602 | external form $\Q~x@1~x@2 \ldots x@n. P$ corresponds to the internal form | |
| 603 | \[ c(\lambda x@1. c(\lambda x@2. \ldots c(\lambda x@n. P) \ldots)) \] | |
| 104 | 604 | |
| 605 | \medskip | |
| 606 | The general binder declaration | |
| 607 | \begin{ttbox}
 | |
| 291 | 608 | \(c\) :: "(\(\tau@1\) => \(\tau@2\)) => \(\tau@3\)" (binder "\(\Q\)" \(p\)) | 
| 104 | 609 | \end{ttbox}
 | 
| 610 | is internally expanded to | |
| 611 | \begin{ttbox}
 | |
| 291 | 612 | \(c\) :: "(\(\tau@1\) => \(\tau@2\)) => \(\tau@3\)" | 
| 613 | "\(\Q\)"\hskip-3pt  :: "[idts, \(\tau@2\)] => \(\tau@3\)"   ("(3\(\Q\)_./ _)" \(p\))
 | |
| 104 | 614 | \end{ttbox}
 | 
| 291 | 615 | with $idts$ being the nonterminal symbol for a list of $id$s optionally | 
| 616 | constrained (see Fig.\ts\ref{fig:pure_gram}).  The declaration also
 | |
| 617 | installs a parse translation\index{translations!parse} for~$\Q$ and a print
 | |
| 618 | translation\index{translations!print} for~$c$ to translate between the
 | |
| 619 | internal and external forms. | |
| 620 | \endgroup | |
| 104 | 621 | |
| 291 | 622 | \index{mixfix declaration|)}
 | 
| 104 | 623 | |
| 624 | ||
| 625 | \section{Example: some minimal logics} \label{sec:min_logics}
 | |
| 291 | 626 | This section presents some examples that have a simple syntax. They | 
| 627 | demonstrate how to define new object-logics from scratch. | |
| 104 | 628 | |
| 291 | 629 | First we must define how an object-logic syntax embedded into the | 
| 630 | meta-logic. Since all theorems must conform to the syntax for~$prop$ (see | |
| 631 | Fig.\ts\ref{fig:pure_gram}), that syntax has to be extended with the
 | |
| 632 | object-level syntax. Assume that the syntax of your object-logic defines a | |
| 633 | nonterminal symbol~$o$ of formulae. These formulae can now appear in | |
| 634 | axioms and theorems wherever $prop$ does if you add the production | |
| 104 | 635 | \[ prop ~=~ o. \] | 
| 291 | 636 | This is not a copy production but a coercion from formulae to propositions: | 
| 104 | 637 | \begin{ttbox}
 | 
| 638 | Base = Pure + | |
| 639 | types | |
| 291 | 640 | o | 
| 104 | 641 | arities | 
| 642 | o :: logic | |
| 643 | consts | |
| 644 |   Trueprop :: "o => prop"   ("_" 5)
 | |
| 645 | end | |
| 646 | \end{ttbox}
 | |
| 647 | The constant {\tt Trueprop} (the name is arbitrary) acts as an invisible
 | |
| 291 | 648 | coercion function.  Assuming this definition resides in a file {\tt base.thy},
 | 
| 135 | 649 | you have to load it with the command {\tt use_thy "Base"}.
 | 
| 104 | 650 | |
| 291 | 651 | One of the simplest nontrivial logics is {\bf minimal logic} of
 | 
| 652 | implication. Its definition in Isabelle needs no advanced features but | |
| 653 | illustrates the overall mechanism nicely: | |
| 104 | 654 | \begin{ttbox}
 | 
| 655 | Hilbert = Base + | |
| 656 | consts | |
| 657 | "-->" :: "[o, o] => o" (infixr 10) | |
| 658 | rules | |
| 659 | K "P --> Q --> P" | |
| 660 | S "(P --> Q --> R) --> (P --> Q) --> P --> R" | |
| 661 | MP "[| P --> Q; P |] ==> Q" | |
| 662 | end | |
| 663 | \end{ttbox}
 | |
| 291 | 664 | After loading this definition from the file {\tt hilbert.thy}, you can
 | 
| 665 | start to prove theorems in the logic: | |
| 104 | 666 | \begin{ttbox}
 | 
| 667 | goal Hilbert.thy "P --> P"; | |
| 668 | {\out Level 0}
 | |
| 669 | {\out P --> P}
 | |
| 670 | {\out  1.  P --> P}
 | |
| 291 | 671 | \ttbreak | 
| 104 | 672 | by (resolve_tac [Hilbert.MP] 1); | 
| 673 | {\out Level 1}
 | |
| 674 | {\out P --> P}
 | |
| 675 | {\out  1.  ?P --> P --> P}
 | |
| 676 | {\out  2.  ?P}
 | |
| 291 | 677 | \ttbreak | 
| 104 | 678 | by (resolve_tac [Hilbert.MP] 1); | 
| 679 | {\out Level 2}
 | |
| 680 | {\out P --> P}
 | |
| 681 | {\out  1.  ?P1 --> ?P --> P --> P}
 | |
| 682 | {\out  2.  ?P1}
 | |
| 683 | {\out  3.  ?P}
 | |
| 291 | 684 | \ttbreak | 
| 104 | 685 | by (resolve_tac [Hilbert.S] 1); | 
| 686 | {\out Level 3}
 | |
| 687 | {\out P --> P}
 | |
| 688 | {\out  1.  P --> ?Q2 --> P}
 | |
| 689 | {\out  2.  P --> ?Q2}
 | |
| 291 | 690 | \ttbreak | 
| 104 | 691 | by (resolve_tac [Hilbert.K] 1); | 
| 692 | {\out Level 4}
 | |
| 693 | {\out P --> P}
 | |
| 694 | {\out  1.  P --> ?Q2}
 | |
| 291 | 695 | \ttbreak | 
| 104 | 696 | by (resolve_tac [Hilbert.K] 1); | 
| 697 | {\out Level 5}
 | |
| 698 | {\out P --> P}
 | |
| 699 | {\out No subgoals!}
 | |
| 700 | \end{ttbox}
 | |
| 291 | 701 | As we can see, this Hilbert-style formulation of minimal logic is easy to | 
| 702 | define but difficult to use. The following natural deduction formulation is | |
| 703 | better: | |
| 104 | 704 | \begin{ttbox}
 | 
| 705 | MinI = Base + | |
| 706 | consts | |
| 707 | "-->" :: "[o, o] => o" (infixr 10) | |
| 708 | rules | |
| 709 | impI "(P ==> Q) ==> P --> Q" | |
| 710 | impE "[| P --> Q; P |] ==> Q" | |
| 711 | end | |
| 712 | \end{ttbox}
 | |
| 291 | 713 | Note, however, that although the two systems are equivalent, this fact | 
| 714 | cannot be proved within Isabelle.  Axioms {\tt S} and {\tt K} can be
 | |
| 715 | derived in {\tt MinI} (exercise!), but {\tt impI} cannot be derived in {\tt
 | |
| 716 |   Hilbert}.  The reason is that {\tt impI} is only an {\bf admissible} rule
 | |
| 717 | in {\tt Hilbert}, something that can only be shown by induction over all
 | |
| 718 | possible proofs in {\tt Hilbert}.
 | |
| 104 | 719 | |
| 291 | 720 | We may easily extend minimal logic with falsity: | 
| 104 | 721 | \begin{ttbox}
 | 
| 722 | MinIF = MinI + | |
| 723 | consts | |
| 724 | False :: "o" | |
| 725 | rules | |
| 726 | FalseE "False ==> P" | |
| 727 | end | |
| 728 | \end{ttbox}
 | |
| 729 | On the other hand, we may wish to introduce conjunction only: | |
| 730 | \begin{ttbox}
 | |
| 731 | MinC = Base + | |
| 732 | consts | |
| 733 | "&" :: "[o, o] => o" (infixr 30) | |
| 291 | 734 | \ttbreak | 
| 104 | 735 | rules | 
| 736 | conjI "[| P; Q |] ==> P & Q" | |
| 737 | conjE1 "P & Q ==> P" | |
| 738 | conjE2 "P & Q ==> Q" | |
| 739 | end | |
| 740 | \end{ttbox}
 | |
| 291 | 741 | And if we want to have all three connectives together, we create and load a | 
| 742 | theory file consisting of a single line:\footnote{We can combine the
 | |
| 743 | theories without creating a theory file using the ML declaration | |
| 744 | \begin{ttbox}
 | |
| 745 | val MinIFC_thy = merge_theories(MinIF,MinC) | |
| 746 | \end{ttbox}
 | |
| 747 | \index{*merge_theories|fnote}}
 | |
| 104 | 748 | \begin{ttbox}
 | 
| 749 | MinIFC = MinIF + MinC | |
| 750 | \end{ttbox}
 | |
| 751 | Now we can prove mixed theorems like | |
| 752 | \begin{ttbox}
 | |
| 753 | goal MinIFC.thy "P & False --> Q"; | |
| 754 | by (resolve_tac [MinI.impI] 1); | |
| 755 | by (dresolve_tac [MinC.conjE2] 1); | |
| 756 | by (eresolve_tac [MinIF.FalseE] 1); | |
| 757 | \end{ttbox}
 | |
| 758 | Try this as an exercise! | |
| 759 | ||
| 291 | 760 | \medskip | 
| 761 | Unless you need to define macros or syntax translation functions, you may | |
| 762 | skip the rest of this chapter. | |
| 763 | ||
| 764 | ||
| 765 | \section{*Abstract syntax trees} \label{sec:asts}
 | |
| 766 | \index{trees!abstract syntax|(} The parser, given a token list from the
 | |
| 767 | lexer, applies productions to yield a parse tree\index{trees!parse}.  By
 | |
| 768 | applying some internal transformations the parse tree becomes an abstract | |
| 769 | syntax tree, or \AST{}.  Macro expansion, further translations and finally
 | |
| 770 | type inference yields a well-typed term\index{terms!obtained from ASTs}.
 | |
| 771 | The printing process is the reverse, except for some subtleties to be | |
| 772 | discussed later. | |
| 773 | ||
| 774 | Figure~\ref{fig:parse_print} outlines the parsing and printing process.
 | |
| 775 | Much of the complexity is due to the macro mechanism. Using macros, you | |
| 776 | can specify most forms of concrete syntax without writing any \ML{} code.
 | |
| 777 | ||
| 778 | \begin{figure}
 | |
| 779 | \begin{center}
 | |
| 780 | \begin{tabular}{cl}
 | |
| 781 | string & \\ | |
| 782 | $\downarrow$ & parser \\ | |
| 783 | parse tree & \\ | |
| 784 | $\downarrow$    & parse \AST{} translation \\
 | |
| 785 | \AST{}             & \\
 | |
| 786 | $\downarrow$    & \AST{} rewriting (macros) \\
 | |
| 787 | \AST{}             & \\
 | |
| 788 | $\downarrow$ & parse translation, type inference \\ | |
| 789 | --- well-typed term --- & \\ | |
| 790 | $\downarrow$ & print translation \\ | |
| 791 | \AST{}             & \\
 | |
| 792 | $\downarrow$    & \AST{} rewriting (macros) \\
 | |
| 793 | \AST{}             & \\
 | |
| 794 | $\downarrow$    & print \AST{} translation, printer \\
 | |
| 795 | string & | |
| 796 | \end{tabular}
 | |
| 797 | \index{translations!parse}\index{translations!parse AST}
 | |
| 798 | \index{translations!print}\index{translations!print AST}
 | |
| 799 | ||
| 800 | \end{center}
 | |
| 801 | \caption{Parsing and printing}\label{fig:parse_print}
 | |
| 802 | \end{figure}
 | |
| 803 | ||
| 804 | Abstract syntax trees are an intermediate form between the raw parse trees | |
| 805 | and the typed $\lambda$-terms.  An \AST{} is either an atom (constant or
 | |
| 806 | variable) or a list of {\em at least two\/} subtrees.  Internally, they
 | |
| 807 | have type \ttindex{Syntax.ast}: \index{*Constant} \index{*Variable}
 | |
| 808 | \index{*Appl}
 | |
| 809 | \begin{ttbox}
 | |
| 810 | datatype ast = Constant of string | |
| 811 | | Variable of string | |
| 812 | | Appl of ast list | |
| 813 | \end{ttbox}
 | |
| 814 | ||
| 815 | Isabelle uses an S-expression syntax for abstract syntax trees. Constant | |
| 816 | atoms are shown as quoted strings, variable atoms as non-quoted strings and | |
| 817 | applications as a parenthesized list of subtrees. For example, the \AST | |
| 818 | \begin{ttbox}
 | |
| 819 | Appl [Constant "_constrain", | |
| 820 | Appl [Constant "_abs", Variable "x", Variable "t"], | |
| 821 | Appl [Constant "fun", Variable "'a", Variable "'b"]] | |
| 822 | \end{ttbox}
 | |
| 823 | is shown as {\tt ("_constrain" ("_abs" x t) ("fun" 'a 'b))}.
 | |
| 824 | Both {\tt ()} and {\tt (f)} are illegal because they have too few
 | |
| 825 | subtrees. | |
| 826 | ||
| 827 | The resemblance of Lisp's S-expressions is intentional, but there are two | |
| 828 | kinds of atomic symbols: $\Constant x$ and $\Variable x$. Do not take the | |
| 829 | names ``{\tt Constant}'' and ``{\tt Variable}'' too literally; in the later
 | |
| 830 | translation to terms, $\Variable x$ may become a constant, free or bound | |
| 831 | variable, even a type constructor or class name; the actual outcome depends | |
| 832 | on the context. | |
| 833 | ||
| 834 | Similarly, you can think of ${\tt (} f~x@1~\ldots~x@n{\tt )}$ as the
 | |
| 835 | application of~$f$ to the arguments $x@1, \ldots, x@n$. But the kind of | |
| 836 | application is determined later by context; it could be a type constructor | |
| 837 | applied to types. | |
| 838 | ||
| 839 | Forms like {\tt (("_abs" x $t$) $u$)} are legal, but \AST{}s are
 | |
| 840 | first-order: the {\tt "_abs"} does not bind the {\tt x} in any way.  Later
 | |
| 841 | at the term level, {\tt ("_abs" x $t$)} will become an {\tt Abs} node and
 | |
| 842 | occurrences of {\tt x} in $t$ will be replaced by bound variables (the term
 | |
| 843 | constructor \ttindex{Bound}).
 | |
| 844 | ||
| 845 | ||
| 846 | \subsection{Transforming parse trees to \AST{}s}
 | |
| 847 | The parse tree is the raw output of the parser. Translation functions, | |
| 848 | called {\bf parse AST translations}\indexbold{translations!parse AST},
 | |
| 849 | transform the parse tree into an abstract syntax tree. | |
| 850 | ||
| 851 | The parse tree is constructed by nesting the right-hand sides of the | |
| 852 | productions used to recognize the input. Such parse trees are simply lists | |
| 853 | of tokens and constituent parse trees, the latter representing the | |
| 854 | nonterminals of the productions. Let us refer to the actual productions in | |
| 855 | the form displayed by {\tt Syntax.print_syntax}.
 | |
| 856 | ||
| 857 | Ignoring parse \AST{} translations, parse trees are transformed to \AST{}s
 | |
| 858 | by stripping out delimiters and copy productions. More precisely, the | |
| 859 | mapping $ast_of_pt$\index{ast_of_pt@$ast_of_pt$} is derived from the
 | |
| 860 | productions as follows: | |
| 861 | \begin{itemize}
 | |
| 862 | \item Name tokens: $ast_of_pt(t) = \Variable s$, where $t$ is an $id$, | |
| 863 | $var$, $tid$ or $tvar$ token, and $s$ its associated string. | |
| 864 | ||
| 865 | \item Copy productions: $ast_of_pt(\ldots P \ldots) = ast_of_pt(P)$. | |
| 866 | Here $\ldots$ stands for strings of delimiters, which are | |
| 867 | discarded. $P$ stands for the single constituent that is not a | |
| 868 | delimiter; it is either a nonterminal symbol or a name token. | |
| 869 | ||
| 870 |   \item $0$-ary productions: $ast_of_pt(\ldots \mtt{=>} c) = \Constant c$.
 | |
| 871 | Here there are no constituents other than delimiters, which are | |
| 872 | discarded. | |
| 873 | ||
| 874 | \item $n$-ary productions, where $n \ge 1$: delimiters are discarded and | |
| 875 | the remaining constituents $P@1$, \ldots, $P@n$ are built into an | |
| 876 | application whose head constant is~$c$: | |
| 877 |     \begin{eqnarray*}
 | |
| 878 |       \lefteqn{ast_of_pt(\ldots P@1 \ldots P@n \ldots \mtt{=>} c)} \\
 | |
| 879 |       &&\qquad{}= \Appl{\Constant c, ast_of_pt(P@1), \ldots, ast_of_pt(P@n)}
 | |
| 880 |     \end{eqnarray*}
 | |
| 881 | \end{itemize}
 | |
| 882 | Figure~\ref{fig:parse_ast} presents some simple examples, where {\tt ==},
 | |
| 883 | {\tt _appl}, {\tt _args}, and so forth name productions of the Pure syntax.
 | |
| 884 | These examples illustrate the need for further translations to make \AST{}s
 | |
| 885 | closer to the typed $\lambda$-calculus. The Pure syntax provides | |
| 886 | predefined parse \AST{} translations\index{translations!parse AST} for
 | |
| 887 | ordinary applications, type applications, nested abstractions, meta | |
| 888 | implications and function types.  Figure~\ref{fig:parse_ast_tr} shows their
 | |
| 889 | effect on some representative input strings. | |
| 890 | ||
| 891 | ||
| 892 | \begin{figure}
 | |
| 893 | \begin{center}
 | |
| 894 | \tt\begin{tabular}{ll}
 | |
| 895 | \rm input string & \rm \AST \\\hline | |
| 896 | "f" & f \\ | |
| 897 | "'a" & 'a \\ | |
| 898 | "t == u"            & ("==" t u) \\
 | |
| 899 | "f(x)"              & ("_appl" f x) \\
 | |
| 900 | "f(x, y)"           & ("_appl" f ("_args" x y)) \\
 | |
| 901 | "f(x, y, z)"        & ("_appl" f ("_args" x ("_args" y z))) \\
 | |
| 902 | "\%x y.\ t"         & ("_lambda" ("_idts" x y) t) \\
 | |
| 903 | \end{tabular}
 | |
| 904 | \end{center}
 | |
| 905 | \caption{Parsing examples using the Pure syntax}\label{fig:parse_ast} 
 | |
| 906 | \end{figure}
 | |
| 907 | ||
| 908 | \begin{figure}
 | |
| 909 | \begin{center}
 | |
| 910 | \tt\begin{tabular}{ll}
 | |
| 911 | \rm input string            & \rm \AST{} \\\hline
 | |
| 912 | "f(x, y, z)" & (f x y z) \\ | |
| 913 | "'a ty" & (ty 'a) \\ | |
| 914 | "('a, 'b) ty"               & (ty 'a 'b) \\
 | |
| 915 | "\%x y z.\ t"               & ("_abs" x ("_abs" y ("_abs" z t))) \\
 | |
| 916 | "\%x ::\ 'a.\ t"            & ("_abs" ("_constrain" x 'a) t) \\
 | |
| 917 | "[| P; Q; R |] => S"        & ("==>" P ("==>" Q ("==>" R S))) \\
 | |
| 918 | "['a, 'b, 'c] => 'd"        & ("fun" 'a ("fun" 'b ("fun" 'c 'd)))
 | |
| 919 | \end{tabular}
 | |
| 920 | \end{center}
 | |
| 921 | \caption{Built-in parse \AST{} translations}\label{fig:parse_ast_tr}
 | |
| 922 | \end{figure}
 | |
| 923 | ||
| 924 | The names of constant heads in the \AST{} control the translation process.
 | |
| 925 | The list of constants invoking parse \AST{} translations appears in the
 | |
| 926 | output of {\tt Syntax.print_syntax} under {\tt parse_ast_translation}.
 | |
| 927 | ||
| 928 | ||
| 929 | \subsection{Transforming \AST{}s to terms}
 | |
| 930 | The \AST{}, after application of macros (see \S\ref{sec:macros}), is
 | |
| 931 | transformed into a term. This term is probably ill-typed since type | |
| 932 | inference has not occurred yet. The term may contain type constraints | |
| 933 | consisting of applications with head {\tt "_constrain"}; the second
 | |
| 934 | argument is a type encoded as a term. Type inference later introduces | |
| 935 | correct types or rejects the input. | |
| 936 | ||
| 937 | Another set of translation functions, namely parse | |
| 938 | translations,\index{translations!parse}, may affect this process.  If we
 | |
| 939 | ignore parse translations for the time being, then \AST{}s are transformed
 | |
| 940 | to terms by mapping \AST{} constants to constants, \AST{} variables to
 | |
| 941 | schematic or free variables and \AST{} applications to applications.
 | |
| 942 | ||
| 943 | More precisely, the mapping $term_of_ast$\index{term_of_ast@$term_of_ast$}
 | |
| 944 | is defined by | |
| 945 | \begin{itemize}
 | |
| 946 | \item Constants: $term_of_ast(\Constant x) = \ttfct{Const} (x,
 | |
| 947 |   \mtt{dummyT})$.
 | |
| 948 | ||
| 949 | \item Schematic variables: $term_of_ast(\Variable \mtt{"?}xi\mtt") =
 | |
| 950 |   \ttfct{Var} ((x, i), \mtt{dummyT})$, where $x$ is the base name and $i$
 | |
| 951 | the index extracted from $xi$. | |
| 952 | ||
| 953 | \item Free variables: $term_of_ast(\Variable x) = \ttfct{Free} (x,
 | |
| 954 |   \mtt{dummyT})$.
 | |
| 955 | ||
| 956 | \item Function applications with $n$ arguments: | |
| 957 |     \begin{eqnarray*}
 | |
| 958 |       \lefteqn{term_of_ast(\Appl{f, x@1, \ldots, x@n})} \\
 | |
| 959 |       &&\qquad{}= term_of_ast(f) \ttapp
 | |
| 960 | term_of_ast(x@1) \ttapp \ldots \ttapp term_of_ast(x@n) | |
| 961 |     \end{eqnarray*}
 | |
| 962 | \end{itemize}
 | |
| 963 | Here \ttindex{Const}, \ttindex{Var}, \ttindex{Free} and
 | |
| 964 | \verb|$|\index{$@{\tt\$}} are constructors of the datatype {\tt term},
 | |
| 965 | while \ttindex{dummyT} stands for some dummy type that is ignored during
 | |
| 966 | type inference. | |
| 967 | ||
| 968 | So far the outcome is still a first-order term. Abstractions and bound | |
| 969 | variables (constructors \ttindex{Abs} and \ttindex{Bound}) are introduced
 | |
| 970 | by parse translations.  Such translations are attached to {\tt "_abs"},
 | |
| 971 | {\tt "!!"} and user-defined binders.
 | |
| 972 | ||
| 973 | ||
| 974 | \subsection{Printing of terms}
 | |
| 975 | The output phase is essentially the inverse of the input phase. Terms are | |
| 976 | translated via abstract syntax trees into strings. Finally the strings are | |
| 977 | pretty printed. | |
| 978 | ||
| 979 | Print translations (\S\ref{sec:tr_funs}) may affect the transformation of
 | |
| 980 | terms into \AST{}s.  Ignoring those, the transformation maps
 | |
| 981 | term constants, variables and applications to the corresponding constructs | |
| 982 | on \AST{}s.  Abstractions are mapped to applications of the special
 | |
| 983 | constant {\tt _abs}.
 | |
| 984 | ||
| 985 | More precisely, the mapping $ast_of_term$\index{ast_of_term@$ast_of_term$}
 | |
| 986 | is defined as follows: | |
| 987 | \begin{itemize}
 | |
| 988 |   \item $ast_of_term(\ttfct{Const} (x, \tau)) = \Constant x$.
 | |
| 989 | ||
| 990 |   \item $ast_of_term(\ttfct{Free} (x, \tau)) = constrain (\Variable x,
 | |
| 991 | \tau)$. | |
| 992 | ||
| 993 |   \item $ast_of_term(\ttfct{Var} ((x, i), \tau)) = constrain (\Variable
 | |
| 994 |     \mtt{"?}xi\mtt", \tau)$, where $\mtt?xi$ is the string representation of
 | |
| 995 |     the {\tt indexname} $(x, i)$.
 | |
| 996 | ||
| 997 | \item For the abstraction $\lambda x::\tau.t$, let $x'$ be a variant | |
| 998 | of~$x$ renamed to differ from all names occurring in~$t$, and let $t'$ | |
| 999 | be obtained from~$t$ by replacing all bound occurrences of~$x$ by | |
| 1000 | the free variable $x'$. This replaces corresponding occurrences of the | |
| 1001 |     constructor \ttindex{Bound} by the term $\ttfct{Free} (x',
 | |
| 1002 |     \mtt{dummyT})$:
 | |
| 1003 |    \begin{eqnarray*}
 | |
| 1004 |       \lefteqn{ast_of_term(\ttfct{Abs} (x, \tau, t))} \\
 | |
| 1005 |       &&\qquad{}=   \ttfct{Appl}
 | |
| 1006 |                   \mathopen{\mtt[} 
 | |
| 1007 |                   \Constant \mtt{"_abs"}, constrain(\Variable x', \tau), \\
 | |
| 1008 |       &&\qquad\qquad\qquad ast_of_term(t') \mathclose{\mtt]}.
 | |
| 1009 |     \end{eqnarray*}
 | |
| 1010 | ||
| 1011 |   \item $ast_of_term(\ttfct{Bound} i) = \Variable \mtt{"B.}i\mtt"$.  
 | |
| 1012 |     The occurrence of constructor \ttindex{Bound} should never happen
 | |
| 1013 | when printing well-typed terms; it indicates a de Bruijn index with no | |
| 1014 | matching abstraction. | |
| 1015 | ||
| 1016 | \item Where $f$ is not an application, | |
| 1017 |     \begin{eqnarray*}
 | |
| 1018 |       \lefteqn{ast_of_term(f \ttapp x@1 \ttapp \ldots \ttapp x@n)} \\
 | |
| 1019 |       &&\qquad{}= \ttfct{Appl} 
 | |
| 1020 |                   \mathopen{\mtt[} ast_of_term(f), 
 | |
| 1021 | ast_of_term(x@1), \ldots,ast_of_term(x@n) | |
| 1022 |                   \mathclose{\mtt]}
 | |
| 1023 |     \end{eqnarray*}
 | |
| 1024 | \end{itemize}
 | |
| 1025 | ||
| 1026 | Type constraints are inserted to allow the printing of types, which is | |
| 1027 | governed by the boolean variable \ttindex{show_types}.  Constraints are
 | |
| 1028 | treated as follows: | |
| 1029 | \begin{itemize}
 | |
| 1030 |   \item $constrain(x, \tau) = x$, if $\tau = \mtt{dummyT}$ \index{*dummyT} or
 | |
| 1031 |     \ttindex{show_types} not set to {\tt true}.
 | |
| 1032 | ||
| 1033 |   \item $constrain(x, \tau) = \Appl{\Constant \mtt{"_constrain"}, x, ty}$,
 | |
| 1034 |     where $ty$ is the \AST{} encoding of $\tau$.  That is, type constructors as
 | |
| 1035 |     {\tt Constant}s, type identifiers as {\tt Variable}s and type applications
 | |
| 1036 |     as {\tt Appl}s with the head type constructor as first element.
 | |
| 1037 |     Additionally, if \ttindex{show_sorts} is set to {\tt true}, some type
 | |
| 1038 |     variables are decorated with an \AST{} encoding of their sort.
 | |
| 1039 | \end{itemize}
 | |
| 1040 | ||
| 1041 | The \AST{}, after application of macros (see \S\ref{sec:macros}), is
 | |
| 1042 | transformed into the final output string.  The built-in {\bf print AST
 | |
| 1043 |   translations}\indexbold{translations!print AST} effectively reverse the
 | |
| 1044 | parse \AST{} translations of Fig.\ts\ref{fig:parse_ast_tr}.
 | |
| 1045 | ||
| 1046 | For the actual printing process, the names attached to productions | |
| 1047 | of the form $\ldots A^{(p@1)}@1 \ldots A^{(p@n)}@n \ldots \mtt{=>} c$ play
 | |
| 1048 | a vital role.  Each \AST{} with constant head $c$, namely $\mtt"c\mtt"$ or
 | |
| 1049 | $(\mtt"c\mtt"~ x@1 \ldots x@n)$, is printed according to the production | |
| 1050 | for~$c$. Each argument~$x@i$ is converted to a string, and put in | |
| 1051 | parentheses if its priority~$(p@i)$ requires this. The resulting strings | |
| 1052 | and their syntactic sugar (denoted by ``\dots'' above) are joined to make a | |
| 1053 | single string. | |
| 1054 | ||
| 1055 | If an application $(\mtt"c\mtt"~ x@1 \ldots x@m)$ has more arguments than the | |
| 1056 | corresponding production, it is first split into $((\mtt"c\mtt"~ x@1 \ldots | |
| 1057 | x@n) ~ x@{n+1} \ldots x@m)$. Applications with too few arguments or with
 | |
| 1058 | non-constant head or without a corresponding production are printed as | |
| 1059 | $f(x@1, \ldots, x@l)$ or $(\alpha@1, \ldots, \alpha@l) ty$. An occurrence of | |
| 1060 | $\Variable x$ is simply printed as~$x$. | |
| 1061 | ||
| 1062 | Blanks are {\em not\/} inserted automatically.  If blanks are required to
 | |
| 1063 | separate tokens, specify them in the mixfix declaration, possibly preceeded | |
| 1064 | by a slash~({\tt/}) to allow a line break.
 | |
| 1065 | \index{trees!abstract syntax|)}
 | |
| 1066 | ||
| 1067 | ||
| 1068 | ||
| 1069 | \section{*Macros: Syntactic rewriting} \label{sec:macros}
 | |
| 1070 | \index{macros|(}\index{rewriting!syntactic|(} 
 | |
| 1071 | ||
| 1072 | Mixfix declarations alone can handle situations where there is a direct | |
| 1073 | connection between the concrete syntax and the underlying term. Sometimes | |
| 1074 | we require a more elaborate concrete syntax, such as quantifiers and list | |
| 1075 | notation.  Isabelle's {\bf macros} and {\bf translation functions} can
 | |
| 1076 | perform translations such as | |
| 1077 | \begin{center}\tt
 | |
| 1078 |   \begin{tabular}{r@{$\quad\protect\rightleftharpoons\quad$}l}
 | |
| 1079 | ALL x:A.P & Ball(A, \%x.P) \\ \relax | |
| 1080 | [x, y, z] & Cons(x, Cons(y, Cons(z, Nil))) | |
| 1081 |   \end{tabular}
 | |
| 1082 | \end{center}
 | |
| 1083 | Translation functions (see \S\ref{sec:tr_funs}) must be coded in ML; they
 | |
| 1084 | are the most powerful translation mechanism but are difficult to read or | |
| 1085 | write. Macros are specified by first-order rewriting systems that operate | |
| 1086 | on abstract syntax trees. They are usually easy to read and write, and can | |
| 1087 | express all but the most obscure translations. | |
| 1088 | ||
| 1089 | Figure~\ref{fig:set_trans} defines a fragment of first-order logic and set
 | |
| 1090 | theory.\footnote{This and the following theories are complete working
 | |
| 1091 |   examples, though they specify only syntax, no axioms.  The file {\tt
 | |
| 1092 | ZF/zf.thy} presents the full set theory definition, including many | |
| 1093 |   macro rules.}  Theory {\tt SET} defines constants for set comprehension
 | |
| 1094 | ({\tt Collect}), replacement ({\tt Replace}) and bounded universal
 | |
| 1095 | quantification ({\tt Ball}).  Each of these binds some variables.  Without
 | |
| 1096 | additional syntax we should have to express $\forall x \in A.  P$ as {\tt
 | |
| 1097 | Ball(A,\%x.P)}, and similarly for the others. | |
| 1098 | ||
| 1099 | \begin{figure}
 | |
| 1100 | \begin{ttbox}
 | |
| 1101 | SET = Pure + | |
| 1102 | types | |
| 1103 | i, o | |
| 1104 | arities | |
| 1105 | i, o :: logic | |
| 1106 | consts | |
| 1107 |   Trueprop      :: "o => prop"              ("_" 5)
 | |
| 1108 | Collect :: "[i, i => o] => i" | |
| 1109 |   "{\at}Collect"    :: "[idt, i, o] => i"       ("(1{\ttlbrace}_:_./ _{\ttrbrace})")
 | |
| 1110 | Replace :: "[i, [i, i] => o] => i" | |
| 1111 |   "{\at}Replace"    :: "[idt, idt, i, o] => i"  ("(1{\ttlbrace}_./ _:_, _{\ttrbrace})")
 | |
| 1112 | Ball :: "[i, i => o] => o" | |
| 1113 |   "{\at}Ball"       :: "[idt, i, o] => o"       ("(3ALL _:_./ _)" 10)
 | |
| 1114 | translations | |
| 1115 |   "{\ttlbrace}x:A. P{\ttrbrace}"    == "Collect(A, \%x. P)"
 | |
| 1116 |   "{\ttlbrace}y. x:A, Q{\ttrbrace}" == "Replace(A, \%x y. Q)"
 | |
| 1117 | "ALL x:A. P" == "Ball(A, \%x. P)" | |
| 1118 | end | |
| 1119 | \end{ttbox}
 | |
| 1120 | \caption{Macro example: set theory}\label{fig:set_trans}
 | |
| 1121 | \end{figure}
 | |
| 1122 | ||
| 1123 | The theory specifies a variable-binding syntax through additional | |
| 1124 | productions that have mixfix declarations. Each non-copy production must | |
| 1125 | specify some constant, which is used for building \AST{}s.  The additional
 | |
| 1126 | constants are decorated with {\tt\at} to stress their purely syntactic
 | |
| 1127 | purpose; they should never occur within the final well-typed terms. | |
| 1128 | Furthermore, they cannot be written in formulae because they are not legal | |
| 1129 | identifiers. | |
| 1130 | ||
| 1131 | The translations cause the replacement of external forms by internal forms | |
| 1132 | after parsing, and vice versa before printing of terms. As a specification | |
| 1133 | of the set theory notation, they should be largely self-explanatory. The | |
| 1134 | syntactic constants, {\tt\at Collect}, {\tt\at Replace} and {\tt\at Ball},
 | |
| 1135 | appear implicitly in the macro rules via their mixfix forms. | |
| 1136 | ||
| 1137 | Macros can define variable-binding syntax because they operate on \AST{}s,
 | |
| 1138 | which have no inbuilt notion of bound variable.  The macro variables {\tt
 | |
| 1139 |   x} and~{\tt y} have type~{\tt idt} and therefore range over identifiers,
 | |
| 1140 | in this case bound variables.  The macro variables {\tt P} and~{\tt Q}
 | |
| 1141 | range over formulae containing bound variable occurrences. | |
| 1142 | ||
| 1143 | Other applications of the macro system can be less straightforward, and | |
| 1144 | there are peculiarities. The rest of this section will describe in detail | |
| 1145 | how Isabelle macros are preprocessed and applied. | |
| 1146 | ||
| 1147 | ||
| 1148 | \subsection{Specifying macros}
 | |
| 1149 | Macros are basically rewrite rules on \AST{}s.  But unlike other macro
 | |
| 1150 | systems found in programming languages, Isabelle's macros work in both | |
| 1151 | directions. Therefore a syntax contains two lists of rewrites: one for | |
| 1152 | parsing and one for printing. | |
| 1153 | ||
| 1154 | The {\tt translations} section\index{translations section@{\tt translations}
 | |
| 1155 | section} specifies macros. The syntax for a macro is | |
| 1156 | \[ (root)\; string \quad | |
| 1157 |    \left\{\begin{array}[c]{c} \mtt{=>} \\ \mtt{<=} \\ \mtt{==} \end{array}
 | |
| 1158 | \right\} \quad | |
| 1159 | (root)\; string | |
| 1160 | \] | |
| 1161 | % | |
| 1162 | This specifies a parse rule ({\tt =>}), a print rule ({\tt <=}), or both
 | |
| 1163 | ({\tt ==}).  The two strings specify the left and right-hand sides of the
 | |
| 1164 | macro rule. The $(root)$ specification is optional; it specifies the | |
| 1165 | nonterminal for parsing the $string$ and if omitted defaults to {\tt
 | |
| 1166 |   logic}.  \AST{} rewrite rules $(l, r)$ must obey certain conditions:
 | |
| 1167 | \begin{itemize}
 | |
| 1168 | \item Rules must be left linear: $l$ must not contain repeated variables. | |
| 1169 | ||
| 1170 | \item Rules must have constant heads, namely $l = \mtt"c\mtt"$ or $l = | |
| 1171 | (\mtt"c\mtt" ~ x@1 \ldots x@n)$. | |
| 1172 | ||
| 1173 | \item Every variable in~$r$ must also occur in~$l$. | |
| 1174 | \end{itemize}
 | |
| 1175 | ||
| 1176 | Macro rules may refer to any syntax from the parent theories. They may | |
| 1177 | also refer to anything defined before the the {\tt .thy} file's {\tt
 | |
| 1178 | translations} section --- including any mixfix declarations. | |
| 1179 | ||
| 1180 | Upon declaration, both sides of the macro rule undergo parsing and parse | |
| 1181 | \AST{} translations (see \S\ref{sec:asts}), but do not themselves undergo
 | |
| 1182 | macro expansion. The lexer runs in a different mode that additionally | |
| 1183 | accepts identifiers of the form $\_~letter~quasiletter^*$ (like {\tt _idt},
 | |
| 1184 | {\tt _K}).  Thus, a constant whose name starts with an underscore can
 | |
| 1185 | appear in macro rules but not in ordinary terms. | |
| 1186 | ||
| 1187 | Some atoms of the macro rule's \AST{} are designated as constants for
 | |
| 1188 | matching. These are all names that have been declared as classes, types or | |
| 1189 | constants. | |
| 1190 | ||
| 1191 | The result of this preprocessing is two lists of macro rules, each stored | |
| 1192 | as a pair of \AST{}s.  They can be viewed using {\tt Syntax.print_syntax}
 | |
| 1193 | (sections \ttindex{parse_rules} and \ttindex{print_rules}).  For
 | |
| 1194 | theory~{\tt SET} of Fig.~\ref{fig:set_trans} these are
 | |
| 1195 | \begin{ttbox}
 | |
| 1196 | parse_rules: | |
| 1197 |   ("{\at}Collect" x A P)  ->  ("Collect" A ("_abs" x P))
 | |
| 1198 |   ("{\at}Replace" y x A Q)  ->  ("Replace" A ("_abs" x ("_abs" y Q)))
 | |
| 1199 |   ("{\at}Ball" x A P)  ->  ("Ball" A ("_abs" x P))
 | |
| 1200 | print_rules: | |
| 1201 |   ("Collect" A ("_abs" x P))  ->  ("{\at}Collect" x A P)
 | |
| 1202 |   ("Replace" A ("_abs" x ("_abs" y Q)))  ->  ("{\at}Replace" y x A Q)
 | |
| 1203 |   ("Ball" A ("_abs" x P))  ->  ("{\at}Ball" x A P)
 | |
| 1204 | \end{ttbox}
 | |
| 1205 | ||
| 1206 | \begin{warn}
 | |
| 1207 | Avoid choosing variable names that have previously been used as | |
| 1208 |   constants, types or type classes; the {\tt consts} section in the output
 | |
| 1209 |   of {\tt Syntax.print_syntax} lists all such names.  If a macro rule works
 | |
| 1210 | incorrectly, inspect its internal form as shown above, recalling that | |
| 1211 | constants appear as quoted strings and variables without quotes. | |
| 1212 | \end{warn}
 | |
| 1213 | ||
| 1214 | \begin{warn}
 | |
| 1215 | If \ttindex{eta_contract} is set to {\tt true}, terms will be
 | |
| 1216 | $\eta$-contracted {\em before\/} the \AST{} rewriter sees them.  Thus some
 | |
| 1217 | abstraction nodes needed for print rules to match may vanish. For example, | |
| 1218 | \verb|Ball(A, %x. P(x))| contracts {\tt Ball(A, P)}; the print rule does
 | |
| 1219 | not apply and the output will be {\tt Ball(A, P)}.  This problem would not
 | |
| 1220 | occur if \ML{} translation functions were used instead of macros (as is
 | |
| 1221 | done for binder declarations). | |
| 1222 | \end{warn}
 | |
| 1223 | ||
| 1224 | ||
| 1225 | \begin{warn}
 | |
| 1226 | Another trap concerns type constraints.  If \ttindex{show_types} is set to
 | |
| 1227 | {\tt true}, bound variables will be decorated by their meta types at the
 | |
| 1228 | binding place (but not at occurrences in the body). Matching with | |
| 1229 | \verb|Collect(A, %x. P)| binds {\tt x} to something like {\tt ("_constrain" y
 | |
| 1230 | "i")} rather than only {\tt y}.  \AST{} rewriting will cause the constraint to
 | |
| 1231 | appear in the external form, say \verb|{y::i:A::i. P::o}|.  
 | |
| 1232 | ||
| 1233 | To allow such constraints to be re-read, your syntax should specify bound | |
| 1234 | variables using the nonterminal~\ttindex{idt}.  This is the case in our
 | |
| 1235 | example.  Choosing {\tt id} instead of {\tt idt} is a common error,
 | |
| 1236 | especially since it appears in former versions of most of Isabelle's | |
| 1237 | object-logics. | |
| 1238 | \end{warn}
 | |
| 1239 | ||
| 1240 | ||
| 1241 | ||
| 1242 | \subsection{Applying rules}
 | |
| 1243 | As a term is being parsed or printed, an \AST{} is generated as an
 | |
| 1244 | intermediate form (recall Fig.\ts\ref{fig:parse_print}).  The \AST{} is
 | |
| 1245 | normalized by applying macro rules in the manner of a traditional term | |
| 1246 | rewriting system. We first examine how a single rule is applied. | |
| 1247 | ||
| 1248 | Let $t$ be the abstract syntax tree to be normalized and $(l, r)$ some | |
| 1249 | translation rule.  A subtree~$u$ of $t$ is a {\bf redex} if it is an
 | |
| 1250 | instance of~$l$; in this case $l$ is said to {\bf match}~$u$.  A redex
 | |
| 1251 | matched by $l$ may be replaced by the corresponding instance of~$r$, thus | |
| 1252 | {\bf rewriting} the \AST~$t$.  Matching requires some notion of {\bf
 | |
| 1253 | place-holders} that may occur in rule patterns but not in ordinary | |
| 1254 | \AST{}s; {\tt Variable} atoms serve this purpose.
 | |
| 1255 | ||
| 1256 | The matching of the object~$u$ by the pattern~$l$ is performed as follows: | |
| 1257 | \begin{itemize}
 | |
| 1258 | \item Every constant matches itself. | |
| 1259 | ||
| 1260 | \item $\Variable x$ in the object matches $\Constant x$ in the pattern. | |
| 1261 | This point is discussed further below. | |
| 1262 | ||
| 1263 |   \item Every \AST{} in the object matches $\Variable x$ in the pattern,
 | |
| 1264 | binding~$x$ to~$u$. | |
| 1265 | ||
| 1266 | \item One application matches another if they have the same number of | |
| 1267 | subtrees and corresponding subtrees match. | |
| 1268 | ||
| 1269 |   \item In every other case, matching fails.  In particular, {\tt
 | |
| 1270 | Constant}~$x$ can only match itself. | |
| 1271 | \end{itemize}
 | |
| 1272 | A successful match yields a substitution that is applied to~$r$, generating | |
| 1273 | the instance that replaces~$u$. | |
| 1274 | ||
| 1275 | The second case above may look odd.  This is where {\tt Variable}s of
 | |
| 1276 | non-rule \AST{}s behave like {\tt Constant}s.  Recall that \AST{}s are not
 | |
| 1277 | far removed from parse trees; at this level it is not yet known which | |
| 1278 | identifiers will become constants, bounds, frees, types or classes. As | |
| 1279 | \S\ref{sec:asts} describes, former parse tree heads appear in \AST{}s as
 | |
| 1280 | {\tt Constant}s, while $id$s, $var$s, $tid$s and $tvar$s become {\tt
 | |
| 1281 |   Variable}s.  On the other hand, when \AST{}s generated from terms for
 | |
| 1282 | printing, all constants and type constructors become {\tt Constant}s; see
 | |
| 1283 | \S\ref{sec:asts}.  Thus \AST{}s may contain a messy mixture of {\tt
 | |
| 1284 |   Variable}s and {\tt Constant}s.  This is insignificant at macro level
 | |
| 1285 | because matching treats them alike. | |
| 1286 | ||
| 1287 | Because of this behaviour, different kinds of atoms with the same name are | |
| 1288 | indistinguishable, which may make some rules prone to misbehaviour. Example: | |
| 1289 | \begin{ttbox}
 | |
| 1290 | types | |
| 1291 | Nil | |
| 1292 | consts | |
| 1293 | Nil :: "'a list" | |
| 1294 |   "[]"    :: "'a list"    ("[]")
 | |
| 1295 | translations | |
| 1296 | "[]" == "Nil" | |
| 1297 | \end{ttbox}
 | |
| 1298 | The term {\tt Nil} will be printed as {\tt []}, just as expected.  What
 | |
| 1299 | happens with \verb|%Nil.t| or {\tt x::Nil} is left as an exercise.
 | |
| 1300 | ||
| 1301 | Normalizing an \AST{} involves repeatedly applying macro rules until none
 | |
| 1302 | is applicable. Macro rules are chosen in the order that they appear in the | |
| 1303 | {\tt translations} section.  You can watch the normalization of \AST{}s
 | |
| 1304 | during parsing and printing by setting \ttindex{Syntax.trace_norm_ast} to
 | |
| 1305 | {\tt true}.\index{tracing!of macros} Alternatively, use
 | |
| 1306 | \ttindex{Syntax.test_read}.  The information displayed when tracing
 | |
| 1307 | includes the \AST{} before normalization ({\tt pre}), redexes with results
 | |
| 1308 | ({\tt rewrote}), the normal form finally reached ({\tt post}) and some
 | |
| 1309 | statistics ({\tt normalize}).  If tracing is off,
 | |
| 1310 | \ttindex{Syntax.stat_norm_ast} can be set to {\tt true} in order to enable
 | |
| 1311 | printing of the normal form and statistics only. | |
| 1312 | ||
| 1313 | ||
| 1314 | \subsection{Example: the syntax of finite sets}
 | |
| 1315 | This example demonstrates the use of recursive macros to implement a | |
| 1316 | convenient notation for finite sets. | |
| 1317 | \begin{ttbox}
 | |
| 1318 | FINSET = SET + | |
| 1319 | types | |
| 1320 | is | |
| 1321 | consts | |
| 1322 |   ""            :: "i => is"                ("_")
 | |
| 1323 |   "{\at}Enum"       :: "[i, is] => is"          ("_,/ _")
 | |
| 1324 |   empty         :: "i"                      ("{\ttlbrace}{\ttrbrace}")
 | |
| 1325 | insert :: "[i, i] => i" | |
| 1326 |   "{\at}Finset"     :: "is => i"                ("{\ttlbrace}(_){\ttrbrace}")
 | |
| 1327 | translations | |
| 1328 |   "{\ttlbrace}x, xs{\ttrbrace}"     == "insert(x, {\ttlbrace}xs{\ttrbrace})"
 | |
| 1329 |   "{\ttlbrace}x{\ttrbrace}"         == "insert(x, {\ttlbrace}{\ttrbrace})"
 | |
| 1330 | end | |
| 1331 | \end{ttbox}
 | |
| 1332 | Finite sets are internally built up by {\tt empty} and {\tt insert}.  The
 | |
| 1333 | declarations above specify \verb|{x, y, z}| as the external representation
 | |
| 1334 | of | |
| 1335 | \begin{ttbox}
 | |
| 1336 | insert(x, insert(y, insert(z, empty))) | |
| 1337 | \end{ttbox}
 | |
| 1338 | ||
| 1339 | The nonterminal symbol~{\tt is} stands for one or more objects of type~{\tt
 | |
| 1340 |   i} separated by commas.  The mixfix declaration \hbox{\verb|"_,/ _"|}
 | |
| 1341 | allows a line break after the comma for pretty printing; if no line break | |
| 1342 | is required then a space is printed instead. | |
| 1343 | ||
| 1344 | The nonterminal is declared as the type~{\tt is}, but with no {\tt arities}
 | |
| 1345 | declaration.  Hence {\tt is} is not a logical type and no default
 | |
| 1346 | productions are added. If we had needed enumerations of the nonterminal | |
| 1347 | {\tt logic}, which would include all the logical types, we could have used
 | |
| 1348 | the predefined nonterminal symbol \ttindex{args} and skipped this part
 | |
| 1349 | altogether.  The nonterminal~{\tt is} can later be reused for other
 | |
| 1350 | enumerations of type~{\tt i} like lists or tuples.
 | |
| 1351 | ||
| 1352 | Next follows {\tt empty}, which is already equipped with its syntax
 | |
| 1353 | \verb|{}|, and {\tt insert} without concrete syntax.  The syntactic
 | |
| 1354 | constant {\tt\at Finset} provides concrete syntax for enumerations of~{\tt
 | |
| 1355 | i} enclosed in curly braces. Remember that a pair of parentheses, as in | |
| 1356 | \verb|"{(_)}"|, specifies a block of indentation for pretty printing.
 | |
| 1357 | ||
| 1358 | The translations may look strange at first. Macro rules are best | |
| 1359 | understood in their internal forms: | |
| 1360 | \begin{ttbox}
 | |
| 1361 | parse_rules: | |
| 1362 |   ("{\at}Finset" ("{\at}Enum" x xs))  ->  ("insert" x ("{\at}Finset" xs))
 | |
| 1363 |   ("{\at}Finset" x)  ->  ("insert" x "empty")
 | |
| 1364 | print_rules: | |
| 1365 |   ("insert" x ("{\at}Finset" xs))  ->  ("{\at}Finset" ("{\at}Enum" x xs))
 | |
| 1366 |   ("insert" x "empty")  ->  ("{\at}Finset" x)
 | |
| 1367 | \end{ttbox}
 | |
| 1368 | This shows that \verb|{x, xs}| indeed matches any set enumeration of at least
 | |
| 1369 | two elements, binding the first to {\tt x} and the rest to {\tt xs}.
 | |
| 1370 | Likewise, \verb|{xs}| and \verb|{x}| represent any set enumeration.  
 | |
| 1371 | The parse rules only work in the order given. | |
| 1372 | ||
| 1373 | \begin{warn}
 | |
| 1374 |   The \AST{} rewriter cannot discern constants from variables and looks
 | |
| 1375 |   only for names of atoms.  Thus the names of {\tt Constant}s occurring in
 | |
| 1376 | the (internal) left-hand side of translation rules should be regarded as | |
| 1377 |   reserved keywords.  Choose non-identifiers like {\tt\at Finset} or
 | |
| 1378 | sufficiently long and strange names. If a bound variable's name gets | |
| 1379 | rewritten, the result will be incorrect; for example, the term | |
| 1380 | \begin{ttbox}
 | |
| 1381 | \%empty insert. insert(x, empty) | |
| 1382 | \end{ttbox}
 | |
| 1383 |   gets printed as \verb|%empty insert. {x}|.
 | |
| 1384 | \end{warn}
 | |
| 1385 | ||
| 1386 | ||
| 1387 | \subsection{Example: a parse macro for dependent types}\label{prod_trans}
 | |
| 1388 | As stated earlier, a macro rule may not introduce new {\tt Variable}s on
 | |
| 1389 | the right-hand side. Something like \verb|"K(B)" => "%x. B"| is illegal; | |
| 1390 | it allowed, it could cause variable capture. In such cases you usually | |
| 1391 | must fall back on translation functions. But a trick can make things | |
| 1392 | readable in some cases: {\em calling translation functions by parse
 | |
| 1393 | macros}: | |
| 1394 | \begin{ttbox}
 | |
| 1395 | PROD = FINSET + | |
| 1396 | consts | |
| 1397 | Pi :: "[i, i => i] => i" | |
| 1398 |   "{\at}PROD"       :: "[idt, i, i] => i"     ("(3PROD _:_./ _)" 10)
 | |
| 1399 |   "{\at}->"         :: "[i, i] => i"          ("(_ ->/ _)" [51, 50] 50)
 | |
| 1400 | \ttbreak | |
| 1401 | translations | |
| 1402 | "PROD x:A. B" => "Pi(A, \%x. B)" | |
| 1403 | "A -> B" => "Pi(A, _K(B))" | |
| 1404 | end | |
| 1405 | ML | |
| 1406 |   val print_translation = [("Pi", dependent_tr' ("{\at}PROD", "{\at}->"))];
 | |
| 1407 | \end{ttbox}
 | |
| 1408 | ||
| 1409 | Here {\tt Pi} is an internal constant for constructing general products.
 | |
| 1410 | Two external forms exist: the general case {\tt PROD x:A.B} and the
 | |
| 1411 | function space {\tt A -> B}, which abbreviates \verb|Pi(A, %x.B)| when {\tt B}
 | |
| 1412 | does not depend on~{\tt x}.
 | |
| 1413 | ||
| 1414 | The second parse macro introduces {\tt _K(B)}, which later becomes \verb|%x.B|
 | |
| 1415 | due to a parse translation associated with \ttindex{_K}.  The order of the
 | |
| 1416 | parse rules is critical. Unfortunately there is no such trick for | |
| 1417 | printing, so we have to add a {\tt ML} section for the print translation
 | |
| 1418 | \ttindex{dependent_tr'}.
 | |
| 1419 | ||
| 1420 | Recall that identifiers with a leading {\tt _} are allowed in translation
 | |
| 1421 | rules, but not in ordinary terms.  Thus we can create \AST{}s containing
 | |
| 1422 | names that are not directly expressible. | |
| 1423 | ||
| 1424 | The parse translation for {\tt _K} is already installed in Pure, and {\tt
 | |
| 1425 | dependent_tr'} is exported by the syntax module for public use. See | |
| 1426 | \S\ref{sec:tr_funs} below for more of the arcane lore of translation functions.
 | |
| 1427 | \index{macros|)}\index{rewriting!syntactic|)}
 | |
| 1428 | ||
| 1429 | ||
| 1430 | ||
| 1431 | \section{*Translation functions} \label{sec:tr_funs}
 | |
| 1432 | \index{translations|(} 
 | |
| 1433 | % | |
| 1434 | This section describes the translation function mechanism. By writing | |
| 1435 | \ML{} functions, you can do almost everything with terms or \AST{}s during
 | |
| 1436 | parsing and printing. The logic \LK\ is a good example of sophisticated | |
| 1437 | transformations between internal and external representations of | |
| 1438 | associative sequences; here, macros would be useless. | |
| 1439 | ||
| 1440 | A full understanding of translations requires some familiarity | |
| 1441 | with Isabelle's internals, especially the datatypes {\tt term}, {\tt typ},
 | |
| 1442 | {\tt Syntax.ast} and the encodings of types and terms as such at the various
 | |
| 1443 | stages of the parsing or printing process. Most users should never need to | |
| 1444 | use translation functions. | |
| 1445 | ||
| 1446 | \subsection{Declaring translation functions}
 | |
| 1447 | There are four kinds of translation functions. Each such function is | |
| 1448 | associated with a name, which triggers calls to it. Such names can be | |
| 1449 | constants (logical or syntactic) or type constructors. | |
| 1450 | ||
| 1451 | {\tt Syntax.print_syntax} displays the sets of names associated with the
 | |
| 1452 | translation functions of a {\tt Syntax.syntax} under
 | |
| 1453 | \ttindex{parse_ast_translation}, \ttindex{parse_translation},
 | |
| 1454 | \ttindex{print_translation} and \ttindex{print_ast_translation}.  You can
 | |
| 1455 | add new ones via the {\tt ML} section\index{ML section@{\tt ML} section} of
 | |
| 1456 | a {\tt .thy} file.  There may never be more than one function of the same
 | |
| 1457 | kind per name.  Conceptually, the {\tt ML} section should appear between
 | |
| 1458 | {\tt consts} and {\tt translations}; newly installed translation functions
 | |
| 1459 | are already effective when macros and logical rules are parsed. | |
| 1460 | ||
| 1461 | The {\tt ML} section is copied verbatim into the \ML\ file generated from a
 | |
| 1462 | {\tt .thy} file.  Definitions made here are accessible as components of an
 | |
| 1463 | \ML\ structure; to make some definitions private, use an \ML{} {\tt local}
 | |
| 1464 | declaration.  The {\tt ML} section may install translation functions by
 | |
| 1465 | declaring any of the following identifiers: | |
| 1466 | \begin{ttbox}
 | |
| 1467 | val parse_ast_translation : (string * (ast list -> ast)) list | |
| 1468 | val print_ast_translation : (string * (ast list -> ast)) list | |
| 1469 | val parse_translation : (string * (term list -> term)) list | |
| 1470 | val print_translation : (string * (term list -> term)) list | |
| 1471 | \end{ttbox}
 | |
| 1472 | ||
| 1473 | \subsection{The translation strategy}
 | |
| 1474 | All four kinds of translation functions are treated similarly. They are | |
| 1475 | called during the transformations between parse trees, \AST{}s and terms
 | |
| 1476 | (recall Fig.\ts\ref{fig:parse_print}).  Whenever a combination of the form
 | |
| 1477 | $(\mtt"c\mtt"~x@1 \ldots x@n)$ is encountered, and a translation function | |
| 1478 | $f$ of appropriate kind exists for $c$, the result is computed by the \ML{}
 | |
| 1479 | function call $f \mtt[ x@1, \ldots, x@n \mtt]$. | |
| 1480 | ||
| 1481 | For \AST{} translations, the arguments $x@1, \ldots, x@n$ are \AST{}s.  A
 | |
| 1482 | combination has the form $\Constant c$ or $\Appl{\Constant c, x@1, \ldots,
 | |
| 1483 | x@n}$. For term translations, the arguments are terms and a combination | |
| 1484 | has the form $\ttfct{Const} (c, \tau)$ or $\ttfct{Const} (c, \tau) \ttapp
 | |
| 1485 | x@1 \ttapp \ldots \ttapp x@n$. Terms allow more sophisticated | |
| 1486 | transformations than \AST{}s do, typically involving abstractions and bound
 | |
| 1487 | variables. | |
| 1488 | ||
| 1489 | Regardless of whether they act on terms or \AST{}s,
 | |
| 1490 | parse translations differ from print translations fundamentally: | |
| 1491 | \begin{description}
 | |
| 1492 | \item[Parse translations] are applied bottom-up. The arguments are already | |
| 1493 | in translated form. The translations must not fail; exceptions trigger | |
| 1494 | an error message. | |
| 1495 | ||
| 1496 | \item[Print translations] are applied top-down. They are supplied with | |
| 1497 | arguments that are partly still in internal form. The result again | |
| 1498 | undergoes translation; therefore a print translation should not introduce | |
| 1499 | as head the very constant that invoked it. The function may raise | |
| 1500 |   exception \ttindex{Match} to indicate failure; in this event it has no
 | |
| 1501 | effect. | |
| 1502 | \end{description}
 | |
| 1503 | ||
| 1504 | Only constant atoms --- constructor \ttindex{Constant} for \AST{}s and
 | |
| 1505 | \ttindex{Const} for terms --- can invoke translation functions.  This
 | |
| 1506 | causes another difference between parsing and printing. | |
| 1507 | ||
| 1508 | Parsing starts with a string and the constants are not yet identified. | |
| 1509 | Only parse tree heads create {\tt Constant}s in the resulting \AST; recall
 | |
| 1510 | $ast_of_pt$ in \S\ref{sec:asts}.  Macros and parse \AST{} translations may
 | |
| 1511 | introduce further {\tt Constant}s.  When the final \AST{} is converted to a
 | |
| 1512 | term, all {\tt Constant}s become {\tt Const}s; recall $term_of_ast$ in
 | |
| 1513 | \S\ref{sec:asts}.
 | |
| 1514 | ||
| 1515 | Printing starts with a well-typed term and all the constants are known. So | |
| 1516 | all logical constants and type constructors may invoke print translations. | |
| 1517 | These, and macros, may introduce further constants. | |
| 1518 | ||
| 1519 | ||
| 1520 | \subsection{Example: a print translation for dependent types}
 | |
| 1521 | \indexbold{*_K}\indexbold{*dependent_tr'}
 | |
| 1522 | Let us continue the dependent type example (page~\pageref{prod_trans}) by
 | |
| 1523 | examining the parse translation for {\tt _K} and the print translation
 | |
| 1524 | {\tt dependent_tr'}, which are both built-in.  By convention, parse
 | |
| 1525 | translations have names ending with {\tt _tr} and print translations have
 | |
| 1526 | names ending with {\tt _tr'}.  Search for such names in the Isabelle
 | |
| 1527 | sources to locate more examples. | |
| 1528 | ||
| 1529 | Here is the parse translation for {\tt _K}:
 | |
| 1530 | \begin{ttbox}
 | |
| 1531 | fun k_tr [t] = Abs ("x", dummyT, incr_boundvars 1 t)
 | |
| 1532 |   | k_tr ts = raise TERM("k_tr",ts);
 | |
| 1533 | \end{ttbox}
 | |
| 1534 | If {\tt k_tr} is called with exactly one argument~$t$, it creates a new
 | |
| 1535 | {\tt Abs} node with a body derived from $t$.  Since terms given to parse
 | |
| 1536 | translations are not yet typed, the type of the bound variable in the new | |
| 1537 | {\tt Abs} is simply {\tt dummyT}.  The function increments all {\tt Bound}
 | |
| 1538 | nodes referring to outer abstractions by calling \ttindex{incr_boundvars},
 | |
| 1539 | a basic term manipulation function defined in {\tt Pure/term.ML}.
 | |
| 1540 | ||
| 1541 | Here is the print translation for dependent types: | |
| 1542 | \begin{ttbox}
 | |
| 1543 | fun dependent_tr' (q,r) (A :: Abs (x, T, B) :: ts) = | |
| 1544 | if 0 mem (loose_bnos B) then | |
| 1545 | let val (x', B') = variant_abs (x, dummyT, B); | |
| 1546 | in list_comb (Const (q, dummyT) $ Free (x', T) $ A $ B', ts) | |
| 1547 | end | |
| 1548 | else list_comb (Const (r, dummyT) $ A $ B, ts) | |
| 1549 | | dependent_tr' _ _ = raise Match; | |
| 1550 | \end{ttbox}
 | |
| 1551 | The argument {\tt (q,r)} is supplied to {\tt dependent_tr'} by a curried
 | |
| 1552 | function application during its installation. We could set up print | |
| 1553 | translations for both {\tt Pi} and {\tt Sigma} by including
 | |
| 1554 | \begin{ttbox}
 | |
| 1555 | val print_translation = | |
| 1556 |   [("Pi",    dependent_tr' ("{\at}PROD", "{\at}->")),
 | |
| 1557 |    ("Sigma", dependent_tr' ("{\at}SUM", "{\at}*"))];
 | |
| 1558 | \end{ttbox}
 | |
| 1559 | within the {\tt ML} section.  The first of these transforms ${\tt Pi}(A,
 | |
| 1560 | \mtt{Abs}(x, T, B))$ into $\hbox{\tt{\at}PROD}(x', A, B')$ or
 | |
| 1561 | $\hbox{\tt{\at}->}r(A, B)$, choosing the latter form if $B$ does not depend
 | |
| 1562 | on~$x$.  It checks this using \ttindex{loose_bnos}, yet another function
 | |
| 1563 | from {\tt Pure/term.ML}.  Note that $x'$ is a version of $x$ renamed away
 | |
| 1564 | from all names in $B$, and $B'$ the body $B$ with {\tt Bound} nodes
 | |
| 1565 | referring to our {\tt Abs} node replaced by $\ttfct{Free} (x',
 | |
| 1566 | \mtt{dummyT})$.
 | |
| 1567 | ||
| 1568 | We must be careful with types here.  While types of {\tt Const}s are
 | |
| 1569 | ignored, type constraints may be printed for some {\tt Free}s and
 | |
| 1570 | {\tt Var}s if \ttindex{show_types} is set to {\tt true}.  Variables of type
 | |
| 1571 | \ttindex{dummyT} are never printed with constraint, though.  The line
 | |
| 1572 | \begin{ttbox}
 | |
| 1573 | let val (x', B') = variant_abs (x, dummyT, B); | |
| 1574 | \end{ttbox}\index{*variant_abs}
 | |
| 1575 | replaces bound variable occurrences in~$B$ by the free variable $x'$ with | |
| 1576 | type {\tt dummyT}.  Only the binding occurrence of~$x'$ is given the
 | |
| 1577 | correct type~{\tt T}, so this is the only place where a type
 | |
| 1578 | constraint might appear. | |
| 1579 | \index{translations|)}
 | |
| 1580 | ||
| 1581 | ||
| 1582 |