doc-src/Logics/defining.tex
changeset 188 6be0856cdf49
parent 142 6dfae8cddec7
child 291 a615050a7494
chief purpose is a thorough description of all syntax-related matters
concerning theories. The most important sections are \S\ref{sec:mixfix} about
mixfix declarations and \S\ref{sec:macros} describing the macro system. The
concluding examples of \S\ref{sec:min_logics} are more concerned with the
logical aspects of defining theories. Sections marked with * can be
skipped on first reading.



\section{Precedence grammars} \label{sec:precedence_grammars}

\subsection{Lexical matters *}

The parser does not process input strings directly; rather, it operates on
token lists provided by Isabelle's \rmindex{lexical analyzer} (the
\bfindex{lexer}). There are two kinds of tokens: {\bf
delimiters}\indexbold{delimiter} and {\bf names}\indexbold{name}.

Delimiters can be regarded as reserved words\index{reserved word} of the
syntax and the user can add new ones when extending theories. In
Figure~\ref{fig:pure_gram} they appear in typewriter font, e.g.\ {\tt PROP},
{\tt ==}, {\tt =?=}, {\tt ;}.

Names on the other hand have a fixed predefined syntax. The lexer
distinguishes four kinds of them: identifiers\index{identifier},
unknowns\index{unknown}\index{scheme variable|see{unknown}}, type
variables\index{type variable} and type unknowns\index{type unknown}\index{type
scheme variable|see{type unknown}}; they are denoted by $id$\index{id@$id$},
$var$\index{var@$var$}, $tfree$\index{tfree@$tfree$},
$tvar$\index{tvar@$tvar$}, respectively. Typical examples are {\tt x}, {\tt
?x7}, {\tt 'a}, {\tt ?'a3}; the exact syntax is:
\begin{eqnarray*}
id        & =   & letter~quasiletter^* \\
var       & =   & \mbox{\tt ?}id ~~|~~ \mbox{\tt ?}id\mbox{\tt .}nat \\
tfree     & =   & \mbox{\tt '}id \\
tvar      & =   & \mbox{\tt ?}tfree ~~|~~
                  \mbox{\tt ?}tfree\mbox{\tt .}nat \\[1ex]
letter    & =   & \mbox{one of {\tt a}\dots {\tt z} {\tt A}\dots {\tt Z}} \\
digit     & =   & \mbox{one of {\tt 0}\dots {\tt 9}} \\
quasiletter & =  & letter ~~|~~ digit ~~|~~ \mbox{\tt _} ~~|~~ \mbox{\tt '} \\
nat       & =   & digit^+
\end{eqnarray*}
A $var$ or $tvar$ describes an \rmindex{unknown} which is internally a pair
of base name and index (\ML\ type \ttindex{indexname}). These components are
either explicitly separated by a dot as in {\tt ?x.1} or {\tt ?x7.3} or
directly run together as in {\tt ?x1}. The latter form is possible if the
base name does not end with digits; if the index is 0, it may be dropped
altogether: {\tt ?x} abbreviates {\tt ?x0} or {\tt ?x.0}.
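
For example, the following tokens all denote unknowns and stand for these
(base name, index) pairs:
\begin{ttbox}
?x.1      ("x", 1)
?x7.3     ("x7", 3)
?x1       ("x", 1)
?x        ("x", 0)
\end{ttbox}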

Note that $id$, $var$, $tfree$ and $tvar$ are mutually disjoint, but they may
legally overlap with the set of delimiters (e.g.\ {\tt PROP}, {\tt ALL},
{\tt EX}).

The lexical analyzer translates input strings to token lists by repeatedly
taking the maximal prefix of the input string that forms a valid token. A
maximal prefix that is both a delimiter and a name is treated as a delimiter.
Spaces, tabs and newlines are separators; they never occur within tokens.
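
For instance, assuming only the delimiters of Pure's syntax shown in
Figure~\ref{fig:pure_gram}, the input below is split into six tokens: an
$id$, a $var$ and a $tfree$, interspersed with the delimiters {\tt (}, {\tt )}
and {\tt ==} (an illustrative decomposition, not actual lexer output):
\begin{ttbox}
f(?x7.3) == 'a

f   (   ?x7.3   )   ==   'a
\end{ttbox}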

Note that delimiters need not be surrounded by white space to be recognized
as separate. For example, if {\tt -} is a delimiter but {\tt --} is not, then
the string {\tt --} is treated as two consecutive occurrences of {\tt -}.
This is in contrast to \ML, which always treats {\tt --} as a single symbolic
name. The consequence of Isabelle's more liberal scheme is that the same
string may be parsed in different ways after extending the syntax: after
adding {\tt --} as a delimiter, the input {\tt --} is treated as a single
token.


\subsection{Inspecting syntax *}

You may get the \ML\ representation of the syntax of any Isabelle theory by
  Syntax.print_syntax: Syntax.syntax -> unit
  Syntax.print_gram: Syntax.syntax -> unit
  Syntax.print_trans: Syntax.syntax -> unit
\end{ttbox}
{\tt Syntax.print_syntax} shows virtually all information contained in a
syntax. Its output is divided into labeled sections. The syntax proper is
represented by {\tt lexicon}, {\tt roots} and {\tt prods}. The rest refers to
the manifold facilities to apply syntactic translations (macro expansion
etc.).

To cope with the verbosity of {\tt Syntax.print_syntax}, there are
\ttindex{Syntax.print_gram}, which prints the syntax proper only, and
\ttindex{Syntax.print_trans}, which prints only the translation related
information.
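
For example, the following call prints just the grammar part of Pure's syntax
(a usage sketch of the functions above):
\begin{ttbox}
Syntax.print_gram (syn_of Pure.thy);
\end{ttbox}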

Let's have a closer look at part of Pure's syntax:
\begin{ttbox}
Syntax.print_syntax (syn_of Pure.thy);
{\out lexicon: "!!" "%" "(" ")" "," "." "::" ";" "==" "==>" \dots}
{\out print_rules:}
{\out print_ast_translation: "==>" "_abs" "_idts" "fun"}
\end{ttbox}

\begin{description}
  \item[\ttindex{lexicon}]
    The set of delimiters used for lexical analysis.

  \item[\ttindex{roots}]
    The legal syntactic categories to start parsing with. You name the
    desired root directly as a string when calling lower-level functions or
    specifying macros. Higher-level functions usually expect a type and
    constructor and $\alpha$ a type variable or unknown. Note that only the
    outermost type constructor is taken into account.

  \item[\ttindex{prods}]
    The list of productions describing the precedence grammar. Nonterminals
    $A@n$ are rendered in {\sc ascii} as {\tt $A$[$n$]}, delimiters are
    quoted. Some productions have strings attached after an {\tt =>}. These
    strings later become the heads of parse trees, but they also play a vital
    role when terms are printed (see \S\ref{sec:asts}).

    Productions which do not have a string attached and thus do not create a
    new parse tree node are called {\bf copy productions}\indexbold{copy
    production}. They must have exactly one
    argument\index{argument!production} (i.e.\ nonterminal or name token)
    on the right-hand side. The parse tree generated when parsing that
    argument is simply passed up as the result of parsing the whole copy
    production.

    A special kind of copy production is one where the argument is a
    nonterminal and no additional syntactic sugar (delimiters) exists. It is
    called a \bfindex{chain production}. Chain productions should be seen as
    an abbreviation mechanism: conceptually, they are removed from the
    grammar by adding appropriate new rules. Precedence information attached
    to chain productions is ignored; only the dummy value $-1$ is displayed.

    \index{*print_translation} \index{*print_ast_translation}
    The sets of constants that invoke translation functions. These are more
    arcane matters (see \S\ref{sec:asts} and \S\ref{sec:tr_funs}).
\end{description}

You may inspect the syntax of any theory using the above calling
sequence. Beware that, as more and more material accumulates, the output
becomes longer and longer.
% When extending syntaxes, new {\tt roots}, {\tt prods}, {\tt parse_rules}
% and {\tt print_rules} are appended to the end. The other lists are
% displayed sorted.


\section{Abstract syntax trees} \label{sec:asts}

Figure~\ref{fig:parse_print} shows a simplified model of the parsing and
The printing process is the reverse, except for some subtleties to be
discussed later.

Asts are an intermediate form between the very raw parse trees and the typed
$\lambda$-terms. An ast is either an atom (constant or variable) or a list of
{\em at least two\/} subasts. Internally, they have type \ttindex{Syntax.ast},
which is defined as: \index{*Constant} \index{*Variable} \index{*Appl}
\begin{ttbox}
datatype ast = Constant of string
             | Variable of string
             | Appl of ast list
\end{ttbox}

We write constant atoms as quoted strings, variable atoms as non-quoted
strings and applications as lists of subasts separated by spaces and enclosed
in parentheses. For example:
\begin{ttbox}
  Appl [Constant "_constrain",
    Appl [Constant "_abs", Variable "x", Variable "t"],
    Appl [Constant "fun", Variable "'a", Variable "'b"]]
\end{ttbox}
is written as {\tt ("_constrain" ("_abs" x t) ("fun" 'a 'b))}.

Note that {\tt ()} and {\tt (f)} are both illegal.

The resemblance to Lisp's S-expressions is intentional, but notice the two
kinds of atomic symbols: $\Constant x$ and $\Variable x$. This distinction
has some more obscure reasons and you can ignore it about half of the time.

Ignoring parse ast translations, the mapping
$ast_of_pt$\index{ast_of_pt@$ast_of_pt$} from parse trees to asts is derived
from the productions involved as follows:
\begin{itemize}
  \item Name tokens: $ast_of_pt(t) = \Variable s$, where $t$ is an $id$,
    $var$, $tfree$ or $tvar$ token, and $s$ its associated string.

  \item Copy productions: $ast_of_pt(\ldots P \ldots) = ast_of_pt(P)$.

  \item $0$-ary productions: $ast_of_pt(\ldots \mtt{=>} c) = \Constant c$.

  \item $n$-ary productions: $ast_of_pt(\ldots P@1 \ldots P@n \ldots \mtt{=>}
    c) = \Appl{\Constant c,$ $ast_of_pt(P@1),$ $\ldots,$ $ast_of_pt(P@n)}$,
    where $n \ge 1$.
\end{itemize}
where $P, P@1, \ldots, P@n$ denote parse trees or name tokens and ``\dots''
zero or more delimiters. Delimiters are stripped and do not appear in asts.
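
To make the three production cases concrete, here is a small illustrative
sketch in \ML: the parse tree type {\tt pt} is invented for this example (the
real type is internal to the Pure syntax module) and copy productions are
marked by an empty head string; {\tt Constant}, {\tt Variable} and {\tt Appl}
are the constructors of the {\tt ast} datatype above.
\begin{ttbox}
(* illustrative sketch only: "pt" is an invented stand-in for the internal
   parse tree type; delimiters are assumed to be stripped already *)
datatype pt = Tok of string              (* name token with its string *)
            | Node of string * pt list;  (* production head, "" = copy *)

fun ast_of_pt (Tok s) = Variable s
  | ast_of_pt (Node ("", [p])) = ast_of_pt p            (* copy production  *)
  | ast_of_pt (Node (c, [])) = Constant c               (* 0-ary production *)
  | ast_of_pt (Node (c, ps)) =                          (* n-ary production *)
      Appl (Constant c :: map ast_of_pt ps);
\end{ttbox}
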
The following table presents some simple examples:
\begin{center}
{\tt\begin{tabular}{ll}
\rm input string    & \rm ast \\\hline
"f"                 & f \\
"'a"                & 'a \\
"t == u"            & ("==" t u) \\
"f(x)"              & ("_appl" f x) \\
"f(x, y)"           & ("_appl" f ("_args" x y)) \\
"f(x, y, z)"        & ("_appl" f ("_args" x ("_args" y z))) \\
"\%x y.\ t"         & ("_lambda" ("_idts" x y) t) \\
\end{tabular}}
\end{center}
Some of these examples illustrate why further translations are desirable in
order to provide a standard form of asts before macro expansion takes
place. Hence the Pure syntax provides predefined parse ast
translations\index{parse ast translation!of Pure} for ordinary applications,
type applications, nested abstractions, meta implications and function types.
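
For example, with these predefined translations the nested {\tt "_appl"} and
{\tt "_args"} asts of the table above are flattened into plain applications
(a sketch of the intended standard form, not verbatim system output):
\begin{ttbox}
("_appl" f ("_args" x y))   {\rm becomes}   (f x y)
\end{ttbox}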
  \item $sy$ is the right-hand side of the production, specified as a
    symbolic string. In general, $sy$ is of the form $\alpha@0 \_ \alpha@1
    \dots \alpha@{n-1} \_ \alpha@n$, where each occurrence of \ttindex{_}
    denotes an argument\index{argument!mixfix} position and the strings
    $\alpha@i$ do not contain (non-escaped) {\tt _}. The $\alpha@i$ may
    consist of delimiters\index{delimiter}, spaces\index{space (pretty
      printing)} or \rmindex{pretty printing} annotations (see below).

  \item $\tau$ specifies the syntactic categories of the arguments on the
    left-hand and the right-hand side. Arguments may be nonterminals or
    name tokens. If $sy$ is of the form above, $\tau$ must be a nested
    function type with at least $n$ argument positions, say $\tau = [\tau@1,
    \dots, \tau@n] \To \tau'$. The syntactic category of argument $i$ is
    derived from type $\tau@i$ (see $root_of_type$ in
    \S\ref{sec:basic_syntax}). The result, i.e.\ the left-hand side of the
    production, is derived from type $\tau'$. Note that the $\tau@i$ and

\begin{description}
  \item[~\ttindex_~] An argument\index{argument!mixfix} position.

  \item[~$d$~] A \rmindex{delimiter}, i.e.\ a non-empty sequence of
    non-special or escaped characters. Escaping a character\index{escape
      character} means preceding it with a {\tt '} (quote). Thus you have to
    write {\tt ''} if you really want a single quote. Delimiters may never
    contain white space, though.

  \item[~$s$~] A non-empty sequence of spaces\index{space (pretty printing)}
    for printing.

  \item[~{\ttindex($n$}~] Open a block\index{block (pretty printing)}. $n$ is

  \item[~\ttindex/$s$~] Allow a line break\index{break (pretty printing)}.
    Spaces $s$ right after {\tt /} are only printed if the break is not
    taken.
\end{description}

In terms of parsing, argument positions stand for nonterminals or name
tokens, while the $d$ strings are delimiters. The other directives are
significant only for printing. The \rmindex{pretty printing} mechanism of
Isabelle is essentially the one described in \cite{paulson91}.
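
For example, a hypothetical declaration combining delimiters, argument
positions, a block and breaks might look as follows (an illustrative sketch
only, not taken from an actual theory; the constant {\tt If}, its type and
the precedences are made up):
\begin{ttbox}
consts
  If :: "[o, o, o] => o"    ("(2if _/ then _/ else _)" [0, 0, 10] 10)
\end{ttbox}
Here {\tt (2} opens a block with an indentation of 2, each {\tt /} allows a
line break, and {\tt if}, {\tt then} and {\tt else} are delimiters.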


\subsection{Infixes}

Infix\index{infix} operators associating to the left or right can be declared
  "ALL x:A. P"  == "Ball(A, %x. P)"
end
\end{ttbox}

This and the following theories are complete working examples, though they
are fragmentary as they contain merely syntax. The full theory definition can
be found in {\tt ZF/zf.thy}.

{\tt SET} defines constants for set comprehension ({\tt Collect}),
replacement ({\tt Replace}) and bounded universal quantification ({\tt
Ball}). Without additional syntax you would have to express $\forall x \in A.
P(x)$ as {\tt Ball(A, P)}. Since this is quite awkward, we define additional


\subsection{Specifying macros}

Basically macros are rewrite rules on asts. But unlike the macro systems of
various programming languages, Isabelle's macros work in both directions.
Therefore a syntax contains two lists of rules: one for parsing and one for
printing.

The {\tt translations} section\index{translations section@{\tt translations}
section} consists of a list of rule specifications of the form:

\begin{center}
{\tt $[$ ($root$) $]$ $string$ $[$ => $|$ <= $|$ == $]$ $[$ ($root$) $]$
$string$}.
\end{center}

This specifies a \rmindex{parse rule} ({\tt =>}), a \rmindex{print rule}
({\tt <=}) or both ({\tt ==}). The two $string$s preceded by optional
parenthesized $root$s are the left-hand and right-hand side of the rule in
user-defined syntax.

Rules are internalized wrt.\ an intermediate signature that is obtained from
the parent theories' ones by adding all material of all sections preceding
{\tt translations} in the {\tt .thy} file. In particular, new syntax defined
in {\tt consts} is already effective.
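
For instance, the {\tt Ball} rule of the {\tt SET} example above could just
as well be split into a separate parse rule and print rule (a sketch; the
single {\tt ==} rule is equivalent to this pair):
\begin{ttbox}
translations
  "ALL x:A. P"  => "Ball(A, %x. P)"
  "ALL x:A. P"  <= "Ball(A, %x. P)"
\end{ttbox}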
\begin{example} \label{ex:tr_funs}

We continue Example~\ref{ex:prod_trans} by presenting the \ML\ sources of the
parse translation for \ttindex{_K} and the print translation
\ttindex{dependent_tr'}:
\begin{ttbox}
(* nondependent abstraction *)
fun k_tr (*"_K"*) [t] = Abs ("x", dummyT, incr_boundvars 1 t)
  | k_tr (*"_K"*) ts = raise TERM ("k_tr", ts);

(* dependent / nondependent quantifiers *)
fun dependent_tr' (q, r) (A :: Abs (x, T, B) :: ts) =
      if 0 mem (loose_bnos B) then
        let val (x', B') = variant_abs (x, dummyT, B);
        in list_comb (Const (q, dummyT) $ Free (x', T) $ A $ B', ts)
        end
      else list_comb (Const (r, dummyT) $ A $ B, ts)
  | dependent_tr' _ _ = raise Match;
\end{ttbox}

This text is taken from the Pure sources. Ordinary user translations
would typically appear within the {\tt ML} section of the {\tt .thy} file.
Linking {\tt k_tr} into the parser requires the definition
\begin{ttbox}
val parse_translation = [("_K", k_tr)];
\end{ttbox}
For {\tt k_tr} this is not necessary, as it is built-in, but user-defined
translation functions are added in exactly the same way.  \medskip
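
A print translation is hooked in analogously through {\tt print_translation};
for instance (a sketch only, with made-up constant names, not the actual
declaration of Example~\ref{ex:prod_trans}):
\begin{ttbox}
(* sketch with made-up names *)
val print_translation = [("Pi", dependent_tr' ("_PROD", "_fun"))];
\end{ttbox}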

If {\tt k_tr} is called with exactly one argument $t$, it creates a new {\tt
Abs} node with a body derived from $t$: loose {\tt Bound}s, i.e.\ those
referring to outer {\tt Abs}s, are incremented using
\ttindex{incr_boundvars}. This and many other basic term manipulation
functions are defined in {\tt Pure/term.ML}, the comments there being in most