| author | huffman | 
| Thu, 29 Mar 2012 14:29:36 +0200 | |
| changeset 47193 | 9ae03b37b4f6 | 
| parent 42840 | e87888b4152f | 
| child 48113 | 1c4500446ba4 | 
| permissions | -rw-r--r-- | 
| 30184 
37969710e61f
removed parts of the manual that are clearly obsolete, or covered by
 wenzelm parents: 
14947diff
changeset | 1 | |
| 323 | 2 | \chapter{Syntax Transformations} \label{chap:syntax}
 | 
| 3 | \newcommand\ttapp{\mathrel{\hbox{\tt\$}}}
 | |
| 4 | \newcommand\mtt[1]{\mbox{\tt #1}}
 | |
| 5 | \newcommand\ttfct[1]{\mathop{\mtt{#1}}\nolimits}
 | |
| 6 | \newcommand\Constant{\ttfct{Constant}}
 | |
| 7 | \newcommand\Variable{\ttfct{Variable}}
 | |
| 8 | \newcommand\Appl[1]{\ttfct{Appl}\,[#1]}
 | |
| 9 | \index{syntax!transformations|(}
 | |
| 10 | ||
| 11 | This chapter is intended for experienced Isabelle users who need to define | |
| 12 | macros or code their own translation functions. It describes the | |
| 13 | transformations between parse trees, abstract syntax trees and terms. | |
| 14 | ||
| 15 | ||
| 16 | \section{Abstract syntax trees} \label{sec:asts}
 | |
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 17 | \index{ASTs|(}
 | 
| 323 | 18 | |
| 19 | The parser, given a token list from the lexer, applies productions to yield | |
| 20 | a parse tree\index{parse trees}.  By applying some internal transformations
 | |
| 21 | the parse tree becomes an abstract syntax tree, or \AST{}.  Macro
 | |
| 22 | expansion, further translations and finally type inference yields a | |
| 23 | well-typed term. The printing process is the reverse, except for some | |
| 24 | subtleties to be discussed later. | |
| 25 | ||
| 26 | Figure~\ref{fig:parse_print} outlines the parsing and printing process.
 | |
| 27 | Much of the complexity is due to the macro mechanism. Using macros, you | |
| 28 | can specify most forms of concrete syntax without writing any \ML{} code.
 | |
| 29 | ||
| 30 | \begin{figure}
 | |
| 31 | \begin{center}
 | |
| 32 | \begin{tabular}{cl}
 | |
| 33 | string & \\ | |
| 3108 | 34 | $\downarrow$ & lexer, parser \\ | 
| 323 | 35 | parse tree & \\ | 
| 36 | $\downarrow$    & parse \AST{} translation \\
 | |
| 37 | \AST{}             & \\
 | |
| 38 | $\downarrow$    & \AST{} rewriting (macros) \\
 | |
| 39 | \AST{}             & \\
 | |
| 40 | $\downarrow$ & parse translation, type inference \\ | |
| 41 | --- well-typed term --- & \\ | |
| 42 | $\downarrow$ & print translation \\ | |
| 43 | \AST{}             & \\
 | |
| 44 | $\downarrow$    & \AST{} rewriting (macros) \\
 | |
| 45 | \AST{}             & \\
 | |
| 42840 | 46 | $\downarrow$    & print \AST{} translation \\
 | 
| 323 | 47 | string & | 
| 48 | \end{tabular}
 | |
| 49 | ||
| 50 | \end{center}
 | |
| 51 | \caption{Parsing and printing}\label{fig:parse_print}
 | |
| 52 | \end{figure}
 | |
| 53 | ||
| 54 | Abstract syntax trees are an intermediate form between the raw parse trees | |
| 55 | and the typed $\lambda$-terms.  An \AST{} is either an atom (constant or
 | |
| 56 | variable) or a list of {\em at least two\/} subtrees.  Internally, they
 | |
| 57 | have type \mltydx{Syntax.ast}: \index{*Constant} \index{*Variable}
 | |
| 58 | \index{*Appl}
 | |
| 59 | \begin{ttbox}
 | |
| 60 | datatype ast = Constant of string | |
| 61 | | Variable of string | |
| 62 | | Appl of ast list | |
| 63 | \end{ttbox}
 | |
| 64 | % | |
| 65 | Isabelle uses an S-expression syntax for abstract syntax trees. Constant | |
| 66 | atoms are shown as quoted strings, variable atoms as non-quoted strings and | |
| 332 | 67 | applications as a parenthesised list of subtrees. For example, the \AST | 
| 323 | 68 | \begin{ttbox}
 | 
| 69 | Appl [Constant "_constrain", | |
| 70 | Appl [Constant "_abs", Variable "x", Variable "t"], | |
| 71 | Appl [Constant "fun", Variable "'a", Variable "'b"]] | |
| 72 | \end{ttbox}
 | |
| 73 | is shown as {\tt ("_constrain" ("_abs" x t) ("fun" 'a 'b))}.
 | |
| 74 | Both {\tt ()} and {\tt (f)} are illegal because they have too few
 | |
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 75 | subtrees. | 
| 323 | 76 | |
| 77 | The resemblance to Lisp's S-expressions is intentional, but there are two | |
| 78 | kinds of atomic symbols: $\Constant x$ and $\Variable x$. Do not take the | |
| 79 | names {\tt Constant} and {\tt Variable} too literally; in the later
 | |
| 80 | translation to terms, $\Variable x$ may become a constant, free or bound | |
| 81 | variable, even a type constructor or class name; the actual outcome depends | |
| 82 | on the context. | |
| 83 | ||
| 84 | Similarly, you can think of ${\tt (} f~x@1~\ldots~x@n{\tt )}$ as the
 | |
| 85 | application of~$f$ to the arguments $x@1, \ldots, x@n$. But the kind of | |
| 86 | application is determined later by context; it could be a type constructor | |
| 87 | applied to types. | |
| 88 | ||
| 89 | Forms like {\tt (("_abs" x $t$) $u$)} are legal, but \AST{}s are
 | |
| 90 | first-order: the {\tt "_abs"} does not bind the {\tt x} in any way.  Later
 | |
| 91 | at the term level, {\tt ("_abs" x $t$)} will become an {\tt Abs} node and
 | |
| 92 | occurrences of {\tt x} in $t$ will be replaced by bound variables (the term
 | |
| 93 | constructor \ttindex{Bound}).
 | |
| 94 | ||
| 95 | ||
| 6618 | 96 | \section{Transforming parse trees to ASTs}\label{sec:astofpt}
 | 
| 323 | 97 | \index{ASTs!made from parse trees}
 | 
| 98 | \newcommand\astofpt[1]{\lbrakk#1\rbrakk}
 | |
| 99 | ||
| 100 | The parse tree is the raw output of the parser. Translation functions, | |
| 101 | called {\bf parse AST translations}\indexbold{translations!parse AST},
 | |
| 102 | transform the parse tree into an abstract syntax tree. | |
| 103 | ||
| 104 | The parse tree is constructed by nesting the right-hand sides of the | |
| 105 | productions used to recognize the input. Such parse trees are simply lists | |
| 106 | of tokens and constituent parse trees, the latter representing the | |
| 107 | nonterminals of the productions. Let us refer to the actual productions in | |
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 108 | the form displayed by {\tt print_syntax} (see \S\ref{sec:inspct-thy} for an
 | 
| 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 109 | example). | 
| 323 | 110 | |
| 111 | Ignoring parse \AST{} translations, parse trees are transformed to \AST{}s
 | |
| 112 | by stripping out delimiters and copy productions. More precisely, the | |
| 113 | mapping $\astofpt{-}$ is derived from the productions as follows:
 | |
| 114 | \begin{itemize}
 | |
| 115 | \item Name tokens: $\astofpt{t} = \Variable s$, where $t$ is an \ndx{id},
 | |
| 14947 | 116 |   \ndx{var}, \ndx{tid}, \ndx{tvar}, \ndx{num}, \ndx{xnum} or \ndx{xstr} token,
 | 
| 117 |   and $s$ its associated string.  Note that for {\tt xstr} this does not
 | |
| 118 | include the quotes. | |
| 323 | 119 | |
| 120 | \item Copy productions:\index{productions!copy}
 | |
| 121 |   $\astofpt{\ldots P \ldots} = \astofpt{P}$.  Here $\ldots$ stands for
 | |
| 122 | strings of delimiters, which are discarded. $P$ stands for the single | |
| 123 | constituent that is not a delimiter; it is either a nonterminal symbol or | |
| 124 | a name token. | |
| 125 | ||
| 126 |   \item 0-ary productions: $\astofpt{\ldots \mtt{=>} c} = \Constant c$.
 | |
| 127 | Here there are no constituents other than delimiters, which are | |
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 128 | discarded. | 
| 323 | 129 | |
| 130 | \item $n$-ary productions, where $n \ge 1$: delimiters are discarded and | |
| 131 | the remaining constituents $P@1$, \ldots, $P@n$ are built into an | |
| 132 | application whose head constant is~$c$: | |
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 133 |     \[ \astofpt{\ldots P@1 \ldots P@n \ldots \mtt{=>} c} =
 | 
| 323 | 134 |        \Appl{\Constant c, \astofpt{P@1}, \ldots, \astofpt{P@n}}
 | 
| 135 | \] | |
| 136 | \end{itemize}
 | |
| 137 | Figure~\ref{fig:parse_ast} presents some simple examples, where {\tt ==},
 | |
| 138 | {\tt _appl}, {\tt _args}, and so forth name productions of the Pure syntax.
 | |
| 139 | These examples illustrate the need for further translations to make \AST{}s
 | |
| 140 | closer to the typed $\lambda$-calculus. The Pure syntax provides | |
| 141 | predefined parse \AST{} translations\index{translations!parse AST} for
 | |
| 142 | ordinary applications, type applications, nested abstractions, meta | |
| 143 | implications and function types.  Figure~\ref{fig:parse_ast_tr} shows their
 | |
| 144 | effect on some representative input strings. | |
| 145 | ||
| 146 | ||
| 147 | \begin{figure}
 | |
| 148 | \begin{center}
 | |
| 149 | \tt\begin{tabular}{ll}
 | |
| 150 | \rm input string & \rm \AST \\\hline | |
| 151 | "f" & f \\ | |
| 152 | "'a" & 'a \\ | |
| 153 | "t == u"            & ("==" t u) \\
 | |
| 154 | "f(x)"              & ("_appl" f x) \\
 | |
| 155 | "f(x, y)"           & ("_appl" f ("_args" x y)) \\
 | |
| 156 | "f(x, y, z)"        & ("_appl" f ("_args" x ("_args" y z))) \\
 | |
| 157 | "\%x y.\ t"         & ("_lambda" ("_idts" x y) t) \\
 | |
| 158 | \end{tabular}
 | |
| 159 | \end{center}
 | |
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 160 | \caption{Parsing examples using the Pure syntax}\label{fig:parse_ast}
 | 
| 323 | 161 | \end{figure}
 | 
| 162 | ||
| 163 | \begin{figure}
 | |
| 164 | \begin{center}
 | |
| 165 | \tt\begin{tabular}{ll}
 | |
| 166 | \rm input string            & \rm \AST{} \\\hline
 | |
| 167 | "f(x, y, z)" & (f x y z) \\ | |
| 168 | "'a ty" & (ty 'a) \\ | |
| 169 | "('a, 'b) ty"               & (ty 'a 'b) \\
 | |
| 170 | "\%x y z.\ t"               & ("_abs" x ("_abs" y ("_abs" z t))) \\
 | |
| 171 | "\%x ::\ 'a.\ t"            & ("_abs" ("_constrain" x 'a) t) \\
 | |
| 172 | "[| P; Q; R |] => S"        & ("==>" P ("==>" Q ("==>" R S))) \\
 | |
| 173 | "['a, 'b, 'c] => 'd"        & ("fun" 'a ("fun" 'b ("fun" 'c 'd)))
 | |
| 174 | \end{tabular}
 | |
| 175 | \end{center}
 | |
| 176 | \caption{Built-in parse \AST{} translations}\label{fig:parse_ast_tr}
 | |
| 177 | \end{figure}
 | |
| 178 | ||
| 179 | The names of constant heads in the \AST{} control the translation process.
 | |
| 180 | The list of constants invoking parse \AST{} translations appears in the
 | |
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 181 | output of {\tt print_syntax} under {\tt parse_ast_translation}.
 | 
| 323 | 182 | |
| 183 | ||
| 6618 | 184 | \section{Transforming ASTs to terms}\label{sec:termofast}
 | 
| 323 | 185 | \index{terms!made from ASTs}
 | 
| 186 | \newcommand\termofast[1]{\lbrakk#1\rbrakk}
 | |
| 187 | ||
| 188 | The \AST{}, after application of macros (see \S\ref{sec:macros}), is
 | |
| 189 | transformed into a term. This term is probably ill-typed since type | |
| 190 | inference has not occurred yet. The term may contain type constraints | |
| 191 | consisting of applications with head {\tt "_constrain"}; the second
 | |
| 192 | argument is a type encoded as a term. Type inference later introduces | |
| 193 | correct types or rejects the input. | |
| 194 | ||
| 195 | Another set of translation functions, namely parse | |
| 196 | translations\index{translations!parse}, may affect this process.  If we
 | |
| 197 | ignore parse translations for the time being, then \AST{}s are transformed
 | |
| 198 | to terms by mapping \AST{} constants to constants, \AST{} variables to
 | |
| 199 | schematic or free variables and \AST{} applications to applications.
 | |
| 200 | ||
| 201 | More precisely, the mapping $\termofast{-}$ is defined by
 | |
| 202 | \begin{itemize}
 | |
| 203 | \item Constants: $\termofast{\Constant x} = \ttfct{Const} (x,
 | |
| 204 |   \mtt{dummyT})$.
 | |
| 205 | ||
| 206 | \item Schematic variables: $\termofast{\Variable \mtt{"?}xi\mtt"} =
 | |
| 207 |   \ttfct{Var} ((x, i), \mtt{dummyT})$, where $x$ is the base name and $i$
 | |
| 208 | the index extracted from~$xi$. | |
| 209 | ||
| 210 | \item Free variables: $\termofast{\Variable x} = \ttfct{Free} (x,
 | |
| 211 |   \mtt{dummyT})$.
 | |
| 212 | ||
| 213 | \item Function applications with $n$ arguments: | |
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 214 |     \[ \termofast{\Appl{f, x@1, \ldots, x@n}} =
 | 
| 323 | 215 |        \termofast{f} \ttapp
 | 
| 216 |          \termofast{x@1} \ttapp \ldots \ttapp \termofast{x@n}
 | |
| 217 | \] | |
| 218 | \end{itemize}
 | |
| 219 | Here \ttindex{Const}, \ttindex{Var}, \ttindex{Free} and
 | |
| 220 | \verb|$|\index{$@{\tt\$}} are constructors of the datatype \mltydx{term},
 | |
| 221 | while \ttindex{dummyT} stands for some dummy type that is ignored during
 | |
| 222 | type inference. | |
| 223 | ||
| 224 | So far the outcome is still a first-order term. Abstractions and bound | |
| 225 | variables (constructors \ttindex{Abs} and \ttindex{Bound}) are introduced
 | |
| 226 | by parse translations.  Such translations are attached to {\tt "_abs"},
 | |
| 227 | {\tt "!!"} and user-defined binders.
 | |
| 228 | ||
| 229 | ||
| 230 | \section{Printing of terms}
 | |
| 231 | \newcommand\astofterm[1]{\lbrakk#1\rbrakk}\index{ASTs!made from terms}
 | |
| 232 | ||
| 233 | The output phase is essentially the inverse of the input phase. Terms are | |
| 234 | translated via abstract syntax trees into strings. Finally the strings are | |
| 235 | pretty printed. | |
| 236 | ||
| 237 | Print translations (\S\ref{sec:tr_funs}) may affect the transformation of
 | |
| 238 | terms into \AST{}s.  Ignoring those, the transformation maps
 | |
| 239 | term constants, variables and applications to the corresponding constructs | |
| 240 | on \AST{}s.  Abstractions are mapped to applications of the special
 | |
| 241 | constant {\tt _abs}.
 | |
| 242 | ||
| 243 | More precisely, the mapping $\astofterm{-}$ is defined as follows:
 | |
| 244 | \begin{itemize}
 | |
| 245 |   \item $\astofterm{\ttfct{Const} (x, \tau)} = \Constant x$.
 | |
| 246 | ||
| 247 |   \item $\astofterm{\ttfct{Free} (x, \tau)} = constrain (\Variable x,
 | |
| 248 | \tau)$. | |
| 249 | ||
| 250 |   \item $\astofterm{\ttfct{Var} ((x, i), \tau)} = constrain (\Variable
 | |
| 251 |     \mtt{"?}xi\mtt", \tau)$, where $\mtt?xi$ is the string representation of
 | |
| 252 |     the {\tt indexname} $(x, i)$.
 | |
| 253 | ||
| 254 | \item For the abstraction $\lambda x::\tau.t$, let $x'$ be a variant | |
| 255 | of~$x$ renamed to differ from all names occurring in~$t$, and let $t'$ | |
| 256 | be obtained from~$t$ by replacing all bound occurrences of~$x$ by | |
| 257 | the free variable $x'$. This replaces corresponding occurrences of the | |
| 258 |     constructor \ttindex{Bound} by the term $\ttfct{Free} (x',
 | |
| 259 |     \mtt{dummyT})$:
 | |
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 260 |    \[ \astofterm{\ttfct{Abs} (x, \tau, t)} =
 | 
| 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 261 |       \Appl{\Constant \mtt{"_abs"},
 | 
| 8136 | 262 |         constrain(\Variable x', \tau), \astofterm{t'}}
 | 
| 323 | 263 | \] | 
| 264 | ||
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 265 |   \item $\astofterm{\ttfct{Bound} i} = \Variable \mtt{"B.}i\mtt"$.
 | 
| 323 | 266 |     The occurrence of constructor \ttindex{Bound} should never happen
 | 
| 267 | when printing well-typed terms; it indicates a de Bruijn index with no | |
| 268 | matching abstraction. | |
| 269 | ||
| 270 | \item Where $f$ is not an application, | |
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 271 |     \[ \astofterm{f \ttapp x@1 \ttapp \ldots \ttapp x@n} =
 | 
| 323 | 272 |        \Appl{\astofterm{f}, \astofterm{x@1}, \ldots,\astofterm{x@n}}
 | 
| 273 | \] | |
| 274 | \end{itemize}
 | |
| 275 | % | |
| 332 | 276 | Type constraints\index{type constraints} are inserted to allow the printing
 | 
| 277 | of types.  This is governed by the boolean variable \ttindex{show_types}:
 | |
| 323 | 278 | \begin{itemize}
 | 
| 279 |   \item $constrain(x, \tau) = x$ \ if $\tau = \mtt{dummyT}$ \index{*dummyT} or
 | |
| 332 | 280 |     \ttindex{show_types} is set to {\tt false}.
 | 
| 323 | 281 | |
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 282 |   \item $constrain(x, \tau) = \Appl{\Constant \mtt{"_constrain"}, x,
 | 
| 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 283 |          \astofterm{\tau}}$ \ otherwise.
 | 
| 323 | 284 | |
| 285 |     Here, $\astofterm{\tau}$ is the \AST{} encoding of $\tau$: type
 | |
| 286 |     constructors go to {\tt Constant}s; type identifiers go to {\tt
 | |
| 287 |       Variable}s; type applications go to {\tt Appl}s with the type
 | |
| 288 |     constructor as the first element.  If \ttindex{show_sorts} is set to
 | |
| 289 |     {\tt true}, some type variables are decorated with an \AST{} encoding
 | |
| 290 | of their sort. | |
| 291 | \end{itemize}
 | |
| 292 | % | |
| 293 | The \AST{}, after application of macros (see \S\ref{sec:macros}), is
 | |
| 294 | transformed into the final output string.  The built-in {\bf print AST
 | |
| 332 | 295 |   translations}\indexbold{translations!print AST} reverse the
 | 
| 323 | 296 | parse \AST{} translations of Fig.\ts\ref{fig:parse_ast_tr}.
 | 
| 297 | ||
| 298 | For the actual printing process, the names attached to productions | |
| 299 | of the form $\ldots A^{(p@1)}@1 \ldots A^{(p@n)}@n \ldots \mtt{=>} c$ play
 | |
| 300 | a vital role.  Each \AST{} with constant head $c$, namely $\mtt"c\mtt"$ or
 | |
| 301 | $(\mtt"c\mtt"~ x@1 \ldots x@n)$, is printed according to the production | |
| 302 | for~$c$. Each argument~$x@i$ is converted to a string, and put in | |
| 303 | parentheses if its priority~$(p@i)$ requires this. The resulting strings | |
| 304 | and their syntactic sugar (denoted by \dots{} above) are joined to make a
 | |
| 305 | single string. | |
| 306 | ||
| 3108 | 307 | If an application $(\mtt"c\mtt"~ x@1 \ldots x@m)$ has more arguments | 
| 308 | than the corresponding production, it is first split into | |
| 309 | $((\mtt"c\mtt"~ x@1 \ldots x@n) ~ x@{n+1} \ldots x@m)$.  Applications
 | |
| 310 | with too few arguments or with non-constant head or without a | |
| 311 | corresponding production are printed as $f(x@1, \ldots, x@l)$ or | |
| 312 | $(\alpha@1, \ldots, \alpha@l) ty$. Multiple productions associated | |
| 313 | with some name $c$ are tried in order of appearance. An occurrence of | |
| 323 | 314 | $\Variable x$ is simply printed as~$x$. | 
| 315 | ||
| 316 | Blanks are {\em not\/} inserted automatically.  If blanks are required to
 | |
| 317 | separate tokens, specify them in the mixfix declaration, possibly preceded | |
| 318 | by a slash~({\tt/}) to allow a line break.
 | |
| 319 | \index{ASTs|)}
 | |
| 320 | ||
| 321 | ||
| 322 | ||
| 3108 | 323 | \section{Macros: syntactic rewriting} \label{sec:macros}
 | 
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 324 | \index{macros|(}\index{rewriting!syntactic|(}
 | 
| 323 | 325 | |
| 326 | Mixfix declarations alone can handle situations where there is a direct | |
| 327 | connection between the concrete syntax and the underlying term. Sometimes | |
| 328 | we require a more elaborate concrete syntax, such as quantifiers and list | |
| 329 | notation.  Isabelle's {\bf macros} and {\bf translation functions} can
 | |
| 330 | perform translations such as | |
| 331 | \begin{center}\tt
 | |
| 332 |   \begin{tabular}{r@{$\quad\protect\rightleftharpoons\quad$}l}
 | |
| 333 | ALL x:A.P & Ball(A, \%x.P) \\ \relax | |
| 334 | [x, y, z] & Cons(x, Cons(y, Cons(z, Nil))) | |
| 335 |   \end{tabular}
 | |
| 336 | \end{center}
 | |
| 337 | Translation functions (see \S\ref{sec:tr_funs}) must be coded in ML; they
 | |
| 338 | are the most powerful translation mechanism but are difficult to read or | |
| 339 | write. Macros are specified by first-order rewriting systems that operate | |
| 340 | on abstract syntax trees. They are usually easy to read and write, and can | |
| 341 | express all but the most obscure translations. | |
| 342 | ||
| 3108 | 343 | Figure~\ref{fig:set_trans} defines a fragment of first-order logic and
 | 
| 344 | set theory.\footnote{This and the following theories are complete
 | |
| 345 | working examples, though they specify only syntax, no axioms. The | |
| 346 |   file {\tt ZF/ZF.thy} presents a full set theory definition,
 | |
| 347 |   including many macro rules.} Theory {\tt SetSyntax} declares
 | |
| 348 | constants for set comprehension ({\tt Collect}), replacement ({\tt
 | |
| 349 |   Replace}) and bounded universal quantification ({\tt Ball}).  Each
 | |
| 350 | of these binds some variables. Without additional syntax we should | |
| 351 | have to write $\forall x \in A.  P$ as {\tt Ball(A,\%x.P)}, and
 | |
| 352 | similarly for the others. | |
| 323 | 353 | |
| 354 | \begin{figure}
 | |
| 355 | \begin{ttbox}
 | |
| 3108 | 356 | SetSyntax = Pure + | 
| 323 | 357 | types | 
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 358 | i o | 
| 323 | 359 | arities | 
| 360 | i, o :: logic | |
| 361 | consts | |
| 1387 | 362 |   Trueprop      :: o => prop                ("_" 5)
 | 
| 363 | Collect :: [i, i => o] => i | |
| 364 | Replace :: [i, [i, i] => o] => i | |
| 365 | Ball :: [i, i => o] => o | |
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 366 | syntax | 
| 1387 | 367 |   "{\at}Collect"    :: [idt, i, o] => i         ("(1{\ttlbrace}_:_./ _{\ttrbrace})")
 | 
| 368 |   "{\at}Replace"    :: [idt, idt, i, o] => i    ("(1{\ttlbrace}_./ _:_, _{\ttrbrace})")
 | |
| 369 |   "{\at}Ball"       :: [idt, i, o] => o         ("(3ALL _:_./ _)" 10)
 | |
| 323 | 370 | translations | 
| 371 |   "{\ttlbrace}x:A. P{\ttrbrace}"    == "Collect(A, \%x. P)"
 | |
| 372 |   "{\ttlbrace}y. x:A, Q{\ttrbrace}" == "Replace(A, \%x y. Q)"
 | |
| 373 | "ALL x:A. P" == "Ball(A, \%x. P)" | |
| 374 | end | |
| 375 | \end{ttbox}
 | |
| 376 | \caption{Macro example: set theory}\label{fig:set_trans}
 | |
| 377 | \end{figure}
 | |
| 378 | ||
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 379 | The theory specifies a variable-binding syntax through additional productions | 
| 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 380 | that have mixfix declarations. Each non-copy production must specify some | 
| 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 381 | constant, which is used for building \AST{}s. \index{constants!syntactic} The
 | 
| 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 382 | additional constants are decorated with {\tt\at} to stress their purely
 | 
| 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 383 | syntactic purpose; they may not occur within the final well-typed terms, | 
| 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 384 | being declared as {\tt syntax} rather than {\tt consts}.
 | 
| 323 | 385 | |
| 386 | The translations cause the replacement of external forms by internal forms | |
| 387 | after parsing, and vice versa before printing of terms. As a specification | |
| 388 | of the set theory notation, they should be largely self-explanatory. The | |
| 389 | syntactic constants, {\tt\at Collect}, {\tt\at Replace} and {\tt\at Ball},
 | |
| 390 | appear implicitly in the macro rules via their mixfix forms. | |
| 391 | ||
| 392 | Macros can define variable-binding syntax because they operate on \AST{}s,
 | |
| 393 | which have no inbuilt notion of bound variable.  The macro variables {\tt
 | |
| 394 |   x} and~{\tt y} have type~{\tt idt} and therefore range over identifiers,
 | |
| 395 | in this case bound variables.  The macro variables {\tt P} and~{\tt Q}
 | |
| 396 | range over formulae containing bound variable occurrences. | |
| 397 | ||
| 398 | Other applications of the macro system can be less straightforward, and | |
| 399 | there are peculiarities. The rest of this section will describe in detail | |
| 400 | how Isabelle macros are preprocessed and applied. | |
| 401 | ||
| 402 | ||
| 403 | \subsection{Specifying macros}
 | |
| 404 | Macros are basically rewrite rules on \AST{}s.  But unlike other macro
 | |
| 405 | systems found in programming languages, Isabelle's macros work in both | |
| 406 | directions. Therefore a syntax contains two lists of rewrites: one for | |
| 407 | parsing and one for printing. | |
| 408 | ||
| 409 | \index{*translations section}
 | |
| 410 | The {\tt translations} section specifies macros.  The syntax for a macro is
 | |
| 411 | \[ (root)\; string \quad | |
| 412 |    \left\{\begin{array}[c]{c} \mtt{=>} \\ \mtt{<=} \\ \mtt{==} \end{array}
 | |
| 413 | \right\} \quad | |
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 414 | (root)\; string | 
| 323 | 415 | \] | 
| 416 | % | |
| 417 | This specifies a parse rule ({\tt =>}), a print rule ({\tt <=}), or both
 | |
| 418 | ({\tt ==}).  The two strings specify the left and right-hand sides of the
 | |
| 419 | macro rule. The $(root)$ specification is optional; it specifies the | |
| 420 | nonterminal for parsing the $string$ and if omitted defaults to {\tt
 | |
| 421 |   logic}.  \AST{} rewrite rules $(l, r)$ must obey certain conditions:
 | |
| 422 | \begin{itemize}
 | |
| 423 | \item Rules must be left linear: $l$ must not contain repeated variables. | |
| 424 | ||
| 425 | \item Every variable in~$r$ must also occur in~$l$. | |
| 426 | \end{itemize}
 | |
| 427 | ||
| 3108 | 428 | Macro rules may refer to any syntax from the parent theories. They | 
| 429 | may also refer to anything defined before the current {\tt
 | |
| 323 | 430 | translations} section --- including any mixfix declarations. | 
| 431 | ||
| 432 | Upon declaration, both sides of the macro rule undergo parsing and parse | |
| 433 | \AST{} translations (see \S\ref{sec:asts}), but do not themselves undergo
 | |
| 434 | macro expansion. The lexer runs in a different mode that additionally | |
| 435 | accepts identifiers of the form $\_~letter~quasiletter^*$ (like {\tt _idt},
 | |
| 436 | {\tt _K}).  Thus, a constant whose name starts with an underscore can
 | |
| 437 | appear in macro rules but not in ordinary terms. | |
| 438 | ||
| 439 | Some atoms of the macro rule's \AST{} are designated as constants for
 | |
| 440 | matching. These are all names that have been declared as classes, types or | |
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 441 | constants (logical and syntactic). | 
| 323 | 442 | |
| 3108 | 443 | The result of this preprocessing is two lists of macro rules, each | 
| 444 | stored as a pair of \AST{}s.  They can be viewed using {\tt
 | |
| 445 |   print_syntax} (sections \ttindex{parse_rules} and
 | |
| 446 | \ttindex{print_rules}).  For theory~{\tt SetSyntax} of
 | |
| 447 | Fig.~\ref{fig:set_trans} these are
 | |
| 323 | 448 | \begin{ttbox}
 | 
| 449 | parse_rules: | |
| 450 |   ("{\at}Collect" x A P)  ->  ("Collect" A ("_abs" x P))
 | |
| 451 |   ("{\at}Replace" y x A Q)  ->  ("Replace" A ("_abs" x ("_abs" y Q)))
 | |
| 452 |   ("{\at}Ball" x A P)  ->  ("Ball" A ("_abs" x P))
 | |
| 453 | print_rules: | |
| 454 |   ("Collect" A ("_abs" x P))  ->  ("{\at}Collect" x A P)
 | |
| 455 |   ("Replace" A ("_abs" x ("_abs" y Q)))  ->  ("{\at}Replace" y x A Q)
 | |
| 456 |   ("Ball" A ("_abs" x P))  ->  ("{\at}Ball" x A P)
 | |
| 457 | \end{ttbox}
 | |
| 458 | ||
| 459 | \begin{warn}
 | |
| 460 | Avoid choosing variable names that have previously been used as | |
| 461 |   constants, types or type classes; the {\tt consts} section in the output
 | |
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 462 |   of {\tt print_syntax} lists all such names.  If a macro rule works
 | 
| 323 | 463 | incorrectly, inspect its internal form as shown above, recalling that | 
| 464 | constants appear as quoted strings and variables without quotes. | |
| 465 | \end{warn}
 | |
| 466 | ||
| 467 | \begin{warn}
 | |
| 468 | If \ttindex{eta_contract} is set to {\tt true}, terms will be
 | |
| 469 | $\eta$-contracted {\em before\/} the \AST{} rewriter sees them.  Thus some
 | |
| 470 | abstraction nodes needed for print rules to match may vanish. For example, | |
| 332 | 471 | \verb|Ball(A, %x. P(x))| contracts to {\tt Ball(A, P)}; the print rule does
 | 
| 323 | 472 | not apply and the output will be {\tt Ball(A, P)}.  This problem would not
 | 
| 473 | occur if \ML{} translation functions were used instead of macros (as is
 | |
| 474 | done for binder declarations). | |
| 475 | \end{warn}
 | |
| 476 | ||
| 477 | ||
| 478 | \begin{warn}
 | |
| 479 | Another trap concerns type constraints.  If \ttindex{show_types} is set to
 | |
| 480 | {\tt true}, bound variables will be decorated by their meta types at the
 | |
| 481 | binding place (but not at occurrences in the body). Matching with | |
| 482 | \verb|Collect(A, %x. P)| binds {\tt x} to something like {\tt ("_constrain" y
 | |
| 483 | "i")} rather than only {\tt y}.  \AST{} rewriting will cause the constraint to
 | |
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 484 | appear in the external form, say \verb|{y::i:A::i. P::o}|.
 | 
| 323 | 485 | |
| 486 | To allow such constraints to be re-read, your syntax should specify bound | |
| 487 | variables using the nonterminal~\ndx{idt}.  This is the case in our
 | |
| 3108 | 488 | example.  Choosing {\tt id} instead of {\tt idt} is a common error.
 | 
| 323 | 489 | \end{warn}
 | 
| 490 | ||
| 491 | ||
| 492 | ||
| 493 | \subsection{Applying rules}
 | |
| 494 | As a term is being parsed or printed, an \AST{} is generated as an
 | |
| 495 | intermediate form (recall Fig.\ts\ref{fig:parse_print}).  The \AST{} is
 | |
| 332 | 496 | normalised by applying macro rules in the manner of a traditional term | 
| 323 | 497 | rewriting system. We first examine how a single rule is applied. | 
| 498 | ||
| 332 | 499 | Let $t$ be the abstract syntax tree to be normalised and $(l, r)$ some | 
| 323 | 500 | translation rule.  A subtree~$u$ of $t$ is a {\bf redex} if it is an
 | 
| 501 | instance of~$l$; in this case $l$ is said to {\bf match}~$u$.  A redex
 | |
| 502 | matched by $l$ may be replaced by the corresponding instance of~$r$, thus | |
| 503 | {\bf rewriting} the \AST~$t$.  Matching requires some notion of {\bf
 | |
| 504 | place-holders} that may occur in rule patterns but not in ordinary | |
| 505 | \AST{}s; {\tt Variable} atoms serve this purpose.
 | |
| 506 | ||
| 507 | The matching of the object~$u$ by the pattern~$l$ is performed as follows: | |
| 508 | \begin{itemize}
 | |
| 509 | \item Every constant matches itself. | |
| 510 | ||
| 511 | \item $\Variable x$ in the object matches $\Constant x$ in the pattern. | |
| 512 | This point is discussed further below. | |
| 513 | ||
| 514 |   \item Every \AST{} in the object matches $\Variable x$ in the pattern,
 | |
| 515 | binding~$x$ to~$u$. | |
| 516 | ||
| 517 | \item One application matches another if they have the same number of | |
| 518 | subtrees and corresponding subtrees match. | |
| 519 | ||
| 520 |   \item In every other case, matching fails.  In particular, {\tt
 | |
| 521 | Constant}~$x$ can only match itself. | |
| 522 | \end{itemize}
 | |
| 523 | A successful match yields a substitution that is applied to~$r$, generating | |
| 524 | the instance that replaces~$u$. | |
| 525 | ||
| 526 | The second case above may look odd.  This is where {\tt Variable}s of
 | |
| 527 | non-rule \AST{}s behave like {\tt Constant}s.  Recall that \AST{}s are not
 | |
| 528 | far removed from parse trees; at this level it is not yet known which | |
| 529 | identifiers will become constants, bounds, frees, types or classes. As | |
| 530 | \S\ref{sec:asts} describes, former parse tree heads appear in \AST{}s as
 | |
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 531 | {\tt Constant}s, while the name tokens \ndx{id}, \ndx{var}, \ndx{tid},
 | 
| 14947 | 532 | \ndx{tvar}, \ndx{num}, \ndx{xnum} and \ndx{xstr} become {\tt Variable}s.  On the other
 | 
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 533 | hand, when \AST{}s generated from terms for printing, all constants and type
 | 
| 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 534 | constructors become {\tt Constant}s; see \S\ref{sec:asts}.  Thus \AST{}s may
 | 
| 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 535 | contain a messy mixture of {\tt Variable}s and {\tt Constant}s.  This is
 | 
| 323 | 536 | insignificant at macro level because matching treats them alike. | 
| 537 | ||
| 538 | Because of this behaviour, different kinds of atoms with the same name are | |
| 539 | indistinguishable, which may make some rules prone to misbehaviour. Example: | |
| 540 | \begin{ttbox}
 | |
| 541 | types | |
| 542 | Nil | |
| 543 | consts | |
| 1387 | 544 | Nil :: 'a list | 
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 545 | syntax | 
| 1387 | 546 |   "[]"    :: 'a list    ("[]")
 | 
| 323 | 547 | translations | 
| 548 | "[]" == "Nil" | |
| 549 | \end{ttbox}
 | |
| 550 | The term {\tt Nil} will be printed as {\tt []}, just as expected.
 | |
| 551 | The term \verb|%Nil.t| will be printed as \verb|%[].t|, which might not be | |
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 552 | expected!  Guess how type~{\tt Nil} is printed?
 | 
| 323 | 553 | |
| 14893 | 554 | Normalizing an \AST{} involves repeatedly applying macro rules until none are
 | 
| 555 | applicable. Macro rules are chosen in order of appearance in the theory | |
| 556 | definitions.  You can watch the normalization of \AST{}s during parsing and
 | |
| 557 | printing by setting \ttindex{Syntax.trace_ast} to {\tt true}.\index{tracing!of
 | |
| 558 |   macros} The information displayed when tracing includes the \AST{} before
 | |
| 559 | normalization ({\tt pre}), redexes with results ({\tt rewrote}), the normal
 | |
| 560 | form finally reached ({\tt post}) and some statistics ({\tt normalize}).
 | |
| 323 | 561 | |
| 562 | \subsection{Example: the syntax of finite sets}
 | |
| 563 | \index{examples!of macros}
 | |
| 564 | ||
| 565 | This example demonstrates the use of recursive macros to implement a | |
| 566 | convenient notation for finite sets. | |
| 567 | \begin{ttbox}
 | |
| 3108 | 568 | FinSyntax = SetSyntax + | 
| 323 | 569 | types | 
| 570 | is | |
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 571 | syntax | 
| 1387 | 572 |   ""            :: i => is                  ("_")
 | 
| 573 |   "{\at}Enum"       :: [i, is] => is            ("_,/ _")
 | |
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 574 | consts | 
| 1387 | 575 |   empty         :: i                        ("{\ttlbrace}{\ttrbrace}")
 | 
| 576 | insert :: [i, i] => i | |
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 577 | syntax | 
| 1387 | 578 |   "{\at}Finset"     :: is => i                  ("{\ttlbrace}(_){\ttrbrace}")
 | 
| 323 | 579 | translations | 
| 580 |   "{\ttlbrace}x, xs{\ttrbrace}"     == "insert(x, {\ttlbrace}xs{\ttrbrace})"
 | |
| 581 |   "{\ttlbrace}x{\ttrbrace}"         == "insert(x, {\ttlbrace}{\ttrbrace})"
 | |
| 582 | end | |
| 583 | \end{ttbox}
 | |
| 584 | Finite sets are internally built up by {\tt empty} and {\tt insert}.  The
 | |
| 585 | declarations above specify \verb|{x, y, z}| as the external representation
 | |
| 586 | of | |
| 587 | \begin{ttbox}
 | |
| 588 | insert(x, insert(y, insert(z, empty))) | |
| 589 | \end{ttbox}
 | |
| 590 | The nonterminal symbol~\ndx{is} stands for one or more objects of type~{\tt
 | |
| 591 |   i} separated by commas.  The mixfix declaration \hbox{\verb|"_,/ _"|}
 | |
| 592 | allows a line break after the comma for \rmindex{pretty printing}; if no
 | |
| 593 | line break is required then a space is printed instead. | |
| 594 | ||
| 595 | The nonterminal is declared as the type~{\tt is}, but with no {\tt arities}
 | |
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 596 | declaration.  Hence {\tt is} is not a logical type and may be used safely as
 | 
| 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 597 | a new nonterminal for custom syntax.  The nonterminal~{\tt is} can later be
 | 
| 3485 
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
 paulson parents: 
3135diff
changeset | 598 | re-used for other enumerations of type~{\tt i} like lists or tuples.  If we
 | 
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 599 | had needed polymorphic enumerations, we could have used the predefined | 
| 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 600 | nonterminal symbol \ndx{args} and skipped this part altogether.
 | 
| 323 | 601 | |
| 602 | Next follows {\tt empty}, which is already equipped with its syntax
 | |
| 603 | \verb|{}|, and {\tt insert} without concrete syntax.  The syntactic
 | |
| 604 | constant {\tt\at Finset} provides concrete syntax for enumerations of~{\tt
 | |
| 605 | i} enclosed in curly braces. Remember that a pair of parentheses, as in | |
| 606 | \verb|"{(_)}"|, specifies a block of indentation for pretty printing.
 | |
| 607 | ||
| 608 | The translations may look strange at first. Macro rules are best | |
| 609 | understood in their internal forms: | |
| 610 | \begin{ttbox}
 | |
| 611 | parse_rules: | |
| 612 |   ("{\at}Finset" ("{\at}Enum" x xs))  ->  ("insert" x ("{\at}Finset" xs))
 | |
| 613 |   ("{\at}Finset" x)  ->  ("insert" x "empty")
 | |
| 614 | print_rules: | |
| 615 |   ("insert" x ("{\at}Finset" xs))  ->  ("{\at}Finset" ("{\at}Enum" x xs))
 | |
| 616 |   ("insert" x "empty")  ->  ("{\at}Finset" x)
 | |
| 617 | \end{ttbox}
 | |
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 618 | This shows that \verb|{x,xs}| indeed matches any set enumeration of at least
 | 
| 323 | 619 | two elements, binding the first to {\tt x} and the rest to {\tt xs}.
 | 
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 620 | Likewise, \verb|{xs}| and \verb|{x}| represent any set enumeration.
 | 
| 323 | 621 | The parse rules only work in the order given. | 
| 622 | ||
| 623 | \begin{warn}
 | |
| 332 | 624 |   The \AST{} rewriter cannot distinguish constants from variables and looks
 | 
| 323 | 625 |   only for names of atoms.  Thus the names of {\tt Constant}s occurring in
 | 
| 626 | the (internal) left-hand side of translation rules should be regarded as | |
| 627 |   \rmindex{reserved words}.  Choose non-identifiers like {\tt\at Finset} or
 | |
| 628 | sufficiently long and strange names. If a bound variable's name gets | |
| 629 | rewritten, the result will be incorrect; for example, the term | |
| 630 | \begin{ttbox}
 | |
| 631 | \%empty insert. insert(x, empty) | |
| 632 | \end{ttbox}
 | |
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 633 | \par\noindent is incorrectly printed as \verb|%empty insert. {x}|.
 | 
| 323 | 634 | \end{warn}
 | 
| 635 | ||
| 636 | ||
| 637 | \subsection{Example: a parse macro for dependent types}\label{prod_trans}
 | |
| 638 | \index{examples!of macros}
 | |
| 639 | ||
| 640 | As stated earlier, a macro rule may not introduce new {\tt Variable}s on
 | |
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 641 | the right-hand side. Something like \verb|"K(B)" => "%x.B"| is illegal; | 
| 323 | 642 | if allowed, it could cause variable capture. In such cases you usually | 
| 643 | must fall back on translation functions. But a trick can make things | |
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 644 | readable in some cases: {\em calling\/} translation functions by parse
 | 
| 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 645 | macros: | 
| 323 | 646 | \begin{ttbox}
 | 
| 3135 | 647 | ProdSyntax = SetSyntax + | 
| 323 | 648 | consts | 
| 1387 | 649 | Pi :: [i, i => i] => i | 
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 650 | syntax | 
| 1387 | 651 |   "{\at}PROD"       :: [idt, i, i] => i       ("(3PROD _:_./ _)" 10)
 | 
| 652 |   "{\at}->"         :: [i, i] => i            ("(_ ->/ _)" [51, 50] 50)
 | |
| 323 | 653 | \ttbreak | 
| 654 | translations | |
| 655 | "PROD x:A. B" => "Pi(A, \%x. B)" | |
| 656 | "A -> B" => "Pi(A, _K(B))" | |
| 657 | end | |
| 658 | ML | |
| 659 |   val print_translation = [("Pi", dependent_tr' ("{\at}PROD", "{\at}->"))];
 | |
| 660 | \end{ttbox}
 | |
| 661 | ||
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 662 | Here {\tt Pi} is a logical constant for constructing general products.
 | 
| 323 | 663 | Two external forms exist: the general case {\tt PROD x:A.B} and the
 | 
| 664 | function space {\tt A -> B}, which abbreviates \verb|Pi(A, %x.B)| when {\tt B}
 | |
| 665 | does not depend on~{\tt x}.
 | |
| 666 | ||
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 667 | The second parse macro introduces {\tt _K(B)}, which later becomes
 | 
| 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 668 | \verb|%x.B| due to a parse translation associated with \cdx{_K}.
 | 
| 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 669 | Unfortunately there is no such trick for printing, so we have to add a {\tt
 | 
| 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 670 | ML} section for the print translation \ttindex{dependent_tr'}.
 | 
| 323 | 671 | |
| 672 | Recall that identifiers with a leading {\tt _} are allowed in translation
 | |
| 673 | rules, but not in ordinary terms.  Thus we can create \AST{}s containing
 | |
| 674 | names that are not directly expressible. | |
| 675 | ||
| 8136 | 676 | The parse translation for {\tt _K} is already installed in Pure, and the
 | 
| 677 | function {\tt dependent_tr'} is exported by the syntax module for public use.
 | |
| 678 | See \S\ref{sec:tr_funs} below for more of the arcane lore of translation
 | |
| 679 | functions.  \index{macros|)}\index{rewriting!syntactic|)}
 | |
| 323 | 680 | |
| 681 | ||
| 682 | \section{Translation functions} \label{sec:tr_funs}
 | |
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 683 | \index{translations|(}
 | 
| 323 | 684 | % | 
| 9695 | 685 | This section describes the translation function mechanism.  By writing \ML{}
 | 
| 686 | functions, you can do almost everything to terms or \AST{}s during parsing and
 | |
| 687 | printing. The logic LK is a good example of sophisticated transformations | |
| 688 | between internal and external representations of sequents; here, macros would | |
| 689 | be useless. | |
| 323 | 690 | |
| 691 | A full understanding of translations requires some familiarity | |
| 692 | with Isabelle's internals, especially the datatypes {\tt term}, {\tt typ},
 | |
| 693 | {\tt Syntax.ast} and the encodings of types and terms as such at the various
 | |
| 694 | stages of the parsing or printing process. Most users should never need to | |
| 695 | use translation functions. | |
| 696 | ||
| 697 | \subsection{Declaring translation functions}
 | |
| 3108 | 698 | There are four kinds of translation functions, with one of these | 
| 699 | coming in two variants. Each such function is associated with a name, | |
| 700 | which triggers calls to it. Such names can be constants (logical or | |
| 701 | syntactic) or type constructors. | |
| 323 | 702 | |
| 6343 | 703 | Function {\tt print_syntax} displays the sets of names associated with the
 | 
| 704 | translation functions of a theory under \texttt{parse_ast_translation}, etc.
 | |
| 705 | You can add new ones via the {\tt ML} section\index{*ML section} of a theory
 | |
| 706 | definition file.  Even though the {\tt ML} section is the very last part of
 | |
| 707 | the file, newly installed translation functions are already effective when | |
| 708 | processing all of the preceding sections. | |
| 323 | 709 | |
| 3108 | 710 | The {\tt ML} section's contents are simply copied verbatim near the
 | 
| 711 | beginning of the \ML\ file generated from a theory definition file. | |
| 712 | Definitions made here are accessible as components of an \ML\ | |
| 713 | structure; to make some parts private, use an \ML{} {\tt local}
 | |
| 714 | declaration.  The {\ML} code may install translation functions by
 | |
| 715 | declaring any of the following identifiers: | |
| 323 | 716 | \begin{ttbox}
 | 
| 3108 | 717 | val parse_ast_translation : (string * (ast list -> ast)) list | 
| 718 | val print_ast_translation : (string * (ast list -> ast)) list | |
| 719 | val parse_translation : (string * (term list -> term)) list | |
| 720 | val print_translation : (string * (term list -> term)) list | |
| 4375 | 721 | val typed_print_translation : | 
| 722 | (string * (bool -> typ -> term list -> term)) list | |
| 323 | 723 | \end{ttbox}
 | 
| 724 | ||
| 725 | \subsection{The translation strategy}
 | |
| 3108 | 726 | The different kinds of translation functions are called during the | 
| 727 | transformations between parse trees, \AST{}s and terms (recall
 | |
| 728 | Fig.\ts\ref{fig:parse_print}).  Whenever a combination of the form
 | |
| 729 | $(\mtt"c\mtt"~x@1 \ldots x@n)$ is encountered, and a translation | |
| 730 | function $f$ of appropriate kind exists for $c$, the result is | |
| 731 | computed by the \ML{} function call $f \mtt[ x@1, \ldots, x@n \mtt]$.
 | |
| 323 | 732 | |
| 3108 | 733 | For \AST{} translations, the arguments $x@1, \ldots, x@n$ are \AST{}s.
 | 
| 734 | A combination has the form $\Constant c$ or $\Appl{\Constant c, x@1,
 | |
| 735 | \ldots, x@n}$. For term translations, the arguments are terms and a | |
| 736 | combination has the form $\ttfct{Const} (c, \tau)$ or $\ttfct{Const}
 | |
| 737 | (c, \tau) \ttapp x@1 \ttapp \ldots \ttapp x@n$. Terms allow more | |
| 738 | sophisticated transformations than \AST{}s do, typically involving
 | |
| 739 | abstractions and bound variables. {\em Typed} print translations may
 | |
| 4375 | 740 | even peek at the type $\tau$ of the constant they are invoked on; they | 
| 741 | are also passed the current value of the \ttindex{show_sorts} flag.
 | |
| 323 | 742 | |
| 3108 | 743 | Regardless of whether they act on terms or \AST{}s, translation
 | 
| 744 | functions called during the parsing process differ from those for | |
| 745 | printing more fundamentally in their overall behaviour: | |
| 323 | 746 | \begin{description}
 | 
| 6343 | 747 | \item[Parse translations] are applied bottom-up. The arguments are already in | 
| 748 | translated form. The translations must not fail; exceptions trigger an | |
| 749 | error message. There may never be more than one function associated with | |
| 750 | any syntactic name. | |
| 751 | ||
| 323 | 752 | \item[Print translations] are applied top-down. They are supplied with | 
| 753 | arguments that are partly still in internal form. The result again | |
| 6343 | 754 | undergoes translation; therefore a print translation should not introduce as | 
| 755 | head the very constant that invoked it. The function may raise exception | |
| 756 |   \xdx{Match} to indicate failure; in this event it has no effect.  Multiple
 | |
| 757 | functions associated with some syntactic name are tried in an unspecified | |
| 758 | order. | |
| 323 | 759 | \end{description}
 | 
| 760 | ||
| 761 | Only constant atoms --- constructor \ttindex{Constant} for \AST{}s and
 | |
| 762 | \ttindex{Const} for terms --- can invoke translation functions.  This
 | |
| 763 | causes another difference between parsing and printing. | |
| 764 | ||
| 765 | Parsing starts with a string and the constants are not yet identified. | |
| 766 | Only parse tree heads create {\tt Constant}s in the resulting \AST, as
 | |
| 767 | described in \S\ref{sec:astofpt}.  Macros and parse \AST{} translations may
 | |
| 768 | introduce further {\tt Constant}s.  When the final \AST{} is converted to a
 | |
| 769 | term, all {\tt Constant}s become {\tt Const}s, as described in
 | |
| 770 | \S\ref{sec:termofast}.
 | |
| 771 | ||
| 772 | Printing starts with a well-typed term and all the constants are known. So | |
| 773 | all logical constants and type constructors may invoke print translations. | |
| 774 | These, and macros, may introduce further constants. | |
| 775 | ||
| 776 | ||
| 777 | \subsection{Example: a print translation for dependent types}
 | |
| 778 | \index{examples!of translations}\indexbold{*dependent_tr'}
 | |
| 779 | ||
| 780 | Let us continue the dependent type example (page~\pageref{prod_trans}) by
 | |
| 781 | examining the parse translation for~\cdx{_K} and the print translation
 | |
| 782 | {\tt dependent_tr'}, which are both built-in.  By convention, parse
 | |
| 783 | translations have names ending with {\tt _tr} and print translations have
 | |
| 784 | names ending with {\tt _tr'}.  Search for such names in the Isabelle
 | |
| 785 | sources to locate more examples. | |
| 786 | ||
| 787 | Here is the parse translation for {\tt _K}:
 | |
| 788 | \begin{ttbox}
 | |
| 789 | fun k_tr [t] = Abs ("x", dummyT, incr_boundvars 1 t)
 | |
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 790 |   | k_tr ts = raise TERM ("k_tr", ts);
 | 
| 323 | 791 | \end{ttbox}
 | 
| 792 | If {\tt k_tr} is called with exactly one argument~$t$, it creates a new
 | |
| 793 | {\tt Abs} node with a body derived from $t$.  Since terms given to parse
 | |
| 794 | translations are not yet typed, the type of the bound variable in the new | |
| 795 | {\tt Abs} is simply {\tt dummyT}.  The function increments all {\tt Bound}
 | |
| 796 | nodes referring to outer abstractions by calling \ttindex{incr_boundvars},
 | |
| 797 | a basic term manipulation function defined in {\tt Pure/term.ML}.
 | |
| 798 | ||
| 799 | Here is the print translation for dependent types: | |
| 800 | \begin{ttbox}
 | |
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 801 | fun dependent_tr' (q, r) (A :: Abs (x, T, B) :: ts) = | 
| 323 | 802 | if 0 mem (loose_bnos B) then | 
| 3108 | 803 | let val (x', B') = Syntax.variant_abs' (x, dummyT, B) in | 
| 804 | list_comb | |
| 8136 | 805 | (Const (q,dummyT) $ | 
| 806 |              Syntax.mark_boundT (x',{\thinspace}T) $ A $ B', ts)
 | |
| 323 | 807 | end | 
| 808 | else list_comb (Const (r, dummyT) $ A $ B, ts) | |
| 809 | | dependent_tr' _ _ = raise Match; | |
| 810 | \end{ttbox}
 | |
| 3135 | 811 | The argument {\tt (q,{\thinspace}r)} is supplied to the curried function {\tt
 | 
| 3108 | 812 | dependent_tr'} by a partial application during its installation. | 
| 813 | For example, we could set up print translations for both {\tt Pi} and
 | |
| 814 | {\tt Sigma} by including
 | |
| 323 | 815 | \begin{ttbox}\index{*ML section}
 | 
| 816 | val print_translation = | |
| 817 |   [("Pi",    dependent_tr' ("{\at}PROD", "{\at}->")),
 | |
| 818 |    ("Sigma", dependent_tr' ("{\at}SUM", "{\at}*"))];
 | |
| 819 | \end{ttbox}
 | |
| 3108 | 820 | within the {\tt ML} section.  The first of these transforms ${\tt
 | 
| 821 |   Pi}(A, \mtt{Abs}(x, T, B))$ into $\hbox{\tt{\at}PROD}(x', A, B')$ or
 | |
| 822 | $\hbox{\tt{\at}->}(A, B)$, choosing the latter form if $B$ does not
 | |
| 823 | depend on~$x$.  It checks this using \ttindex{loose_bnos}, yet another
 | |
| 824 | function from {\tt Pure/term.ML}.  Note that $x'$ is a version of $x$
 | |
| 825 | renamed away from all names in $B$, and $B'$ is the body $B$ with {\tt
 | |
| 826 |   Bound} nodes referring to the {\tt Abs} node replaced by
 | |
| 827 | $\ttfct{Free} (x', \mtt{dummyT})$ (but marked as representing a bound
 | |
| 828 | variable). | |
| 323 | 829 | |
| 830 | We must be careful with types here.  While types of {\tt Const}s are
 | |
| 831 | ignored, type constraints may be printed for some {\tt Free}s and
 | |
| 832 | {\tt Var}s if \ttindex{show_types} is set to {\tt true}.  Variables of type
 | |
| 833 | \ttindex{dummyT} are never printed with constraint, though.  The line
 | |
| 834 | \begin{ttbox}
 | |
| 3108 | 835 | let val (x', B') = Syntax.variant_abs' (x, dummyT, B); | 
| 836 | \end{ttbox}\index{*Syntax.variant_abs'}
 | |
| 323 | 837 | replaces bound variable occurrences in~$B$ by the free variable $x'$ with | 
| 838 | type {\tt dummyT}.  Only the binding occurrence of~$x'$ is given the
 | |
| 839 | correct type~{\tt T}, so this is the only place where a type
 | |
| 864 
d63b111b917a
quite a lot of minor and major revisions (inspecting theories, read_axm,
 wenzelm parents: 
499diff
changeset | 840 | constraint might appear. | 
| 3108 | 841 | |
| 842 | Also note that we are responsible to mark free identifiers that | |
| 3485 
f27a30a18a17
Now there are TWO spaces after each full stop, so that the Emacs sentence
 paulson parents: 
3135diff
changeset | 843 | actually represent bound variables. This is achieved by | 
| 3108 | 844 | \ttindex{Syntax.variant_abs'} and \ttindex{Syntax.mark_boundT} above.
 | 
| 845 | Failing to do so may cause these names to be printed in the wrong | |
| 846 | style.  \index{translations|)} \index{syntax!transformations|)}
 | |
| 847 | ||
| 848 | ||
| 5371 | 849 | %%% Local Variables: | 
| 850 | %%% mode: latex | |
| 851 | %%% TeX-master: "ref" | |
| 852 | %%% End: |