doc-src/Logics/defining.tex
changeset 188 6be0856cdf49
parent 142 6dfae8cddec7
child 291 a615050a7494
chief purpose is a thorough description of all syntax-related matters
concerning theories. The most important sections are \S\ref{sec:mixfix} about
mixfix declarations and \S\ref{sec:macros} describing the macro system. The
concluding examples of \S\ref{sec:min_logics} are more concerned with the
logical aspects of defining theories. Sections marked with * can be
skipped on first reading.



\section{Precedence grammars} \label{sec:precedence_grammars}

\subsection{Lexical matters *}

The parser does not process input strings directly; rather, it operates on
token lists provided by Isabelle's \rmindex{lexical analyzer} (the
\bfindex{lexer}). There are two kinds of tokens: {\bf
delimiters}\indexbold{delimiter} and {\bf names}\indexbold{name}.

Delimiters can be regarded as reserved words\index{reserved word} of the
syntax and the user can add new ones when extending theories. In
Figure~\ref{fig:pure_gram} they appear in typewriter font, e.g.\ {\tt PROP},
{\tt ==}, {\tt =?=}, {\tt ;}.

Names on the other hand have a fixed predefined syntax. The lexer
distinguishes four kinds of them: identifiers\index{identifier},
unknowns\index{unknown}\index{scheme variable|see{unknown}}, type
variables\index{type variable} and type unknowns\index{type unknown}\index{type
scheme variable|see{type unknown}}; they are denoted by $id$\index{id@$id$},
$var$\index{var@$var$}, $tfree$\index{tfree@$tfree$},
$tvar$\index{tvar@$tvar$}, respectively. Typical examples are {\tt x}, {\tt
?x7}, {\tt 'a}, {\tt ?'a3}; the exact syntax is:
\begin{eqnarray*}
id        & =   & letter~quasiletter^* \\
var       & =   & \mbox{\tt ?}id ~~|~~ \mbox{\tt ?}id\mbox{\tt .}nat \\
tfree     & =   & \mbox{\tt '}id \\
tvar      & =   & \mbox{\tt ?}tfree ~~|~~
                  \mbox{\tt ?}tfree\mbox{\tt .}nat \\[1ex]
letter    & =   & \mbox{one of {\tt a}\dots {\tt z} {\tt A}\dots {\tt Z}} \\
digit     & =   & \mbox{one of {\tt 0}\dots {\tt 9}} \\
quasiletter & =  & letter ~~|~~ digit ~~|~~ \mbox{\tt _} ~~|~~ \mbox{\tt '} \\
nat       & =   & digit^+
\end{eqnarray*}
A $var$ or $tvar$ describes an \rmindex{unknown} which is internally a pair
of base name and index (\ML\ type \ttindex{indexname}). These components are
either explicitly separated by a dot as in {\tt ?x.1} or {\tt ?x7.3} or
directly run together as in {\tt ?x1}. The latter form is possible if the
base name does not end with digits; if the index is 0, it may be dropped
altogether: {\tt ?x} abbreviates {\tt ?x0} or {\tt ?x.0}.
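
For example, the following tokens all denote unknowns and stand for these
(base name, index) pairs:
\begin{ttbox}
?x.1      ("x", 1)
?x7.3     ("x7", 3)
?x1       ("x", 1)
?x        ("x", 0)
\end{ttbox}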

Note that $id$, $var$, $tfree$ and $tvar$ are mutually disjoint, but they may
legally overlap with the set of delimiters (e.g.\ {\tt PROP}, {\tt ALL},
{\tt EX}).

The lexical analyzer translates input strings to token lists by repeatedly
taking the maximal prefix of the input string that forms a valid token. A
maximal prefix that is both a delimiter and a name is treated as a delimiter.
Spaces, tabs and newlines are separators; they never occur within tokens.
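
For instance, assuming only the delimiters of Pure's syntax shown in
Figure~\ref{fig:pure_gram}, the input below is split into six tokens: an
$id$, a $var$ and a $tfree$, interspersed with the delimiters {\tt (}, {\tt )}
and {\tt ==} (an illustrative decomposition, not actual lexer output):
\begin{ttbox}
f(?x7.3) == 'a

f   (   ?x7.3   )   ==   'a
\end{ttbox}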

Note that delimiters need not be surrounded by white space to be recognized
as separate. For example, if {\tt -} is a delimiter but {\tt --} is not, then
the string {\tt --} is treated as two consecutive occurrences of {\tt -}.
This is in contrast to \ML, which always treats {\tt --} as a single symbolic
name. The consequence of Isabelle's more liberal scheme is that the same
string may be parsed in different ways after extending the syntax: after
adding {\tt --} as a delimiter, the input {\tt --} is treated as a single
token.


\subsection{Inspecting syntax *}

You may get the \ML\ representation of the syntax of any Isabelle theory by
  Syntax.print_syntax: Syntax.syntax -> unit
  Syntax.print_gram: Syntax.syntax -> unit
  Syntax.print_trans: Syntax.syntax -> unit
\end{ttbox}
{\tt Syntax.print_syntax} shows virtually all information contained in a
syntax. Its output is divided into labeled sections. The syntax proper is
represented by {\tt lexicon}, {\tt roots} and {\tt prods}. The rest refers to
the manifold facilities to apply syntactic translations (macro expansion
etc.).

To cope with the verbosity of {\tt Syntax.print_syntax}, there are
\ttindex{Syntax.print_gram}, which prints the syntax proper only, and
\ttindex{Syntax.print_trans}, which prints only the translation related
information.
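
For example, the following call prints just the grammar part of Pure's syntax
(a usage sketch of the functions above):
\begin{ttbox}
Syntax.print_gram (syn_of Pure.thy);
\end{ttbox}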

Let's have a closer look at part of Pure's syntax:
\begin{ttbox}
Syntax.print_syntax (syn_of Pure.thy);
{\out lexicon: "!!" "%" "(" ")" "," "." "::" ";" "==" "==>" \dots}
{\out print_rules:}
{\out print_ast_translation: "==>" "_abs" "_idts" "fun"}
\end{ttbox}

\begin{description}
  \item[\ttindex{lexicon}]
    The set of delimiters used for lexical analysis.

  \item[\ttindex{roots}]
    The legal syntactic categories to start parsing with. You name the
    desired root directly as a string when calling lower-level functions or
    specifying macros. Higher-level functions usually expect a type and
    constructor and $\alpha$ a type variable or unknown. Note that only the
    outermost type constructor is taken into account.

  \item[\ttindex{prods}]
    The list of productions describing the precedence grammar. Nonterminals
    $A@n$ are rendered in {\sc ascii} as {\tt $A$[$n$]}, delimiters are
    quoted. Some productions have strings attached after an {\tt =>}. These
    strings later become the heads of parse trees, but they also play a vital
    role when terms are printed (see \S\ref{sec:asts}).

    Productions which do not have a string attached and thus do not create a
    new parse tree node are called {\bf copy productions}\indexbold{copy
    production}. They must have exactly one
    argument\index{argument!production} (i.e.\ nonterminal or name token)
    on the right-hand side. The parse tree generated when parsing that
    argument is simply passed up as the result of parsing the whole copy
    production.

    A special kind of copy production is one where the argument is a
    nonterminal and no additional syntactic sugar (delimiters) exists. It is
    called a \bfindex{chain production}. Chain productions should be seen as
    an abbreviation mechanism: conceptually, they are removed from the
    grammar by adding appropriate new rules. Precedence information attached
    to chain productions is ignored; only the dummy value $-1$ is displayed.

    \index{*print_translation} \index{*print_ast_translation}
    The sets of constants that invoke translation functions. These are more
    arcane matters (see \S\ref{sec:asts} and \S\ref{sec:tr_funs}).
\end{description}

You may inspect the syntax of any theory using the above calling
sequence. Beware that, as more and more material accumulates, the output
becomes longer and longer.
% When extending syntaxes, new {\tt roots}, {\tt prods}, {\tt parse_rules}
% and {\tt print_rules} are appended to the end. The other lists are
% displayed sorted.


\section{Abstract syntax trees} \label{sec:asts}

Figure~\ref{fig:parse_print} shows a simplified model of the parsing and
The printing process is the reverse, except for some subtleties to be
discussed later.

Asts are an intermediate form between the very raw parse trees and the typed
$\lambda$-terms. An ast is either an atom (constant or variable) or a list of
{\em at least two\/} subasts. Internally, they have type \ttindex{Syntax.ast},
which is defined as: \index{*Constant} \index{*Variable} \index{*Appl}
\begin{ttbox}
datatype ast = Constant of string
             | Variable of string
             | Appl of ast list
\end{ttbox}

We write constant atoms as quoted strings, variable atoms as non-quoted
strings and applications as lists of subasts separated by spaces and enclosed
in parentheses. For example:
\begin{ttbox}
  Appl [Constant "_constrain",
    Appl [Constant "_abs", Variable "x", Variable "t"],
    Appl [Constant "fun", Variable "'a", Variable "'b"]]
\end{ttbox}
is written as {\tt ("_constrain" ("_abs" x t) ("fun" 'a 'b))}.

Note that {\tt ()} and {\tt (f)} are both illegal.

The resemblance to Lisp's S-expressions is intentional, but notice the two
kinds of atomic symbols: $\Constant x$ and $\Variable x$. This distinction
has some more obscure reasons and you can ignore it about half of the time.

Ignoring parse ast translations, the mapping
$ast_of_pt$\index{ast_of_pt@$ast_of_pt$} from parse trees to asts is derived
from the productions involved as follows:
\begin{itemize}
  \item Name tokens: $ast_of_pt(t) = \Variable s$, where $t$ is an $id$,
    $var$, $tfree$ or $tvar$ token, and $s$ its associated string.

  \item Copy productions: $ast_of_pt(\ldots P \ldots) = ast_of_pt(P)$.

  \item $0$-ary productions: $ast_of_pt(\ldots \mtt{=>} c) = \Constant c$.

  \item $n$-ary productions: $ast_of_pt(\ldots P@1 \ldots P@n \ldots \mtt{=>}
    c) = \Appl{\Constant c,$ $ast_of_pt(P@1),$ $\ldots,$ $ast_of_pt(P@n)}$,
    where $n \ge 1$.
\end{itemize}
where $P, P@1, \ldots, P@n$ denote parse trees or name tokens and ``\dots''
zero or more delimiters. Delimiters are stripped and do not appear in asts.
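
To make the three production cases concrete, here is a small illustrative
sketch in \ML: the parse tree type {\tt pt} is invented for this example (the
real type is internal to the Pure syntax module) and copy productions are
marked by an empty head string; {\tt Constant}, {\tt Variable} and {\tt Appl}
are the constructors of the {\tt ast} datatype above.
\begin{ttbox}
(* illustrative sketch only: "pt" is an invented stand-in for the internal
   parse tree type; delimiters are assumed to be stripped already *)
datatype pt = Tok of string              (* name token with its string *)
            | Node of string * pt list;  (* production head, "" = copy *)

fun ast_of_pt (Tok s) = Variable s
  | ast_of_pt (Node ("", [p])) = ast_of_pt p            (* copy production  *)
  | ast_of_pt (Node (c, [])) = Constant c               (* 0-ary production *)
  | ast_of_pt (Node (c, ps)) =                          (* n-ary production *)
      Appl (Constant c :: map ast_of_pt ps);
\end{ttbox}
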
The following table presents some simple examples:
\begin{center}
{\tt\begin{tabular}{ll}
\rm input string    & \rm ast \\\hline
"f"                 & f \\
"'a"                & 'a \\
"t == u"            & ("==" t u) \\
"f(x)"              & ("_appl" f x) \\
"f(x, y)"           & ("_appl" f ("_args" x y)) \\
"f(x, y, z)"        & ("_appl" f ("_args" x ("_args" y z))) \\
"\%x y.\ t"         & ("_lambda" ("_idts" x y) t) \\
\end{tabular}}
\end{center}
Some of these examples illustrate why further translations are desirable in
order to provide a standard form of asts before macro expansion takes
place. Hence the Pure syntax provides predefined parse ast
translations\index{parse ast translation!of Pure} for ordinary applications,
type applications, nested abstractions, meta implications and function types.
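
For example, with these predefined translations the nested {\tt "_appl"} and
{\tt "_args"} asts of the table above are flattened into plain applications
(a sketch of the intended standard form, not verbatim system output):
\begin{ttbox}
("_appl" f ("_args" x y))   {\rm becomes}   (f x y)
\end{ttbox}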
  \item $sy$ is the right-hand side of the production, specified as a
    symbolic string. In general, $sy$ is of the form $\alpha@0 \_ \alpha@1
    \dots \alpha@{n-1} \_ \alpha@n$, where each occurrence of \ttindex{_}
    denotes an argument\index{argument!mixfix} position and the strings
    $\alpha@i$ do not contain (non-escaped) {\tt _}. The $\alpha@i$ may
    consist of delimiters\index{delimiter}, spaces\index{space (pretty
      printing)} or \rmindex{pretty printing} annotations (see below).

  \item $\tau$ specifies the syntactic categories of the arguments on the
    left-hand and the right-hand side. Arguments may be nonterminals or
    name tokens. If $sy$ is of the form above, $\tau$ must be a nested
    function type with at least $n$ argument positions, say $\tau = [\tau@1,
    \dots, \tau@n] \To \tau'$. The syntactic category of argument $i$ is
    derived from type $\tau@i$ (see $root_of_type$ in
    \S\ref{sec:basic_syntax}). The result, i.e.\ the left-hand side of the
    production, is derived from type $\tau'$. Note that the $\tau@i$ and

\begin{description}
  \item[~\ttindex_~] An argument\index{argument!mixfix} position.

  \item[~$d$~] A \rmindex{delimiter}, i.e.\ a non-empty sequence of
    non-special or escaped characters. Escaping a character\index{escape
      character} means preceding it with a {\tt '} (quote). Thus you have to
    write {\tt ''} if you really want a single quote. Delimiters may never
    contain white space, though.

  \item[~$s$~] A non-empty sequence of spaces\index{space (pretty printing)}
    for printing.

  \item[~{\ttindex($n$}~] Open a block\index{block (pretty printing)}. $n$ is

  \item[~\ttindex/$s$~] Allow a line break\index{break (pretty printing)}.
    Spaces $s$ right after {\tt /} are only printed if the break is not
    taken.
\end{description}

In terms of parsing, argument positions stand for nonterminals or name
tokens, while the $d$ strings are delimiters. The other directives are
significant only for printing. The \rmindex{pretty printing} mechanism of
Isabelle is essentially the one described in \cite{paulson91}.
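
For example, a hypothetical declaration combining delimiters, argument
positions, a block and breaks might look as follows (an illustrative sketch
only, not taken from an actual theory; the constant {\tt If}, its type and
the precedences are made up):
\begin{ttbox}
consts
  If :: "[o, o, o] => o"    ("(2if _/ then _/ else _)" [0, 0, 10] 10)
\end{ttbox}
Here {\tt (2} opens a block with an indentation of 2, each {\tt /} allows a
line break, and {\tt if}, {\tt then} and {\tt else} are delimiters.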


\subsection{Infixes}

Infix\index{infix} operators associating to the left or right can be declared
  "ALL x:A. P"  == "Ball(A, %x. P)"
end
\end{ttbox}

This and the following theories are complete working examples, though they
are fragmentary as they contain merely syntax. The full theory definition can
be found in {\tt ZF/zf.thy}.

{\tt SET} defines constants for set comprehension ({\tt Collect}),
replacement ({\tt Replace}) and bounded universal quantification ({\tt
Ball}). Without additional syntax you would have to express $\forall x \in A.
P(x)$ as {\tt Ball(A, P)}. Since this is quite awkward, we define additional


\subsection{Specifying macros}

Basically macros are rewrite rules on asts. But unlike the macro systems of
various programming languages, Isabelle's macros work in both directions.
Therefore a syntax contains two lists of rules: one for parsing and one for
printing.

The {\tt translations} section\index{translations section@{\tt translations}
section} consists of a list of rule specifications of the form:

\begin{center}
{\tt $[$ ($root$) $]$ $string$ $[$ => $|$ <= $|$ == $]$ $[$ ($root$) $]$
$string$}.
\end{center}

This specifies a \rmindex{parse rule} ({\tt =>}), a \rmindex{print rule}
({\tt <=}) or both ({\tt ==}). The two $string$s preceded by optional
parenthesized $root$s are the left-hand and right-hand side of the rule in
user-defined syntax.

Rules are internalized wrt.\ an intermediate signature that is obtained from
the parent theories' ones by adding all material of all sections preceding
{\tt translations} in the {\tt .thy} file. In particular, new syntax defined
in {\tt consts} is already effective.
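
For instance, the {\tt Ball} rule of the {\tt SET} example above could just
as well be split into a separate parse rule and print rule (a sketch; the
single {\tt ==} rule is equivalent to this pair):
\begin{ttbox}
translations
  "ALL x:A. P"  => "Ball(A, %x. P)"
  "ALL x:A. P"  <= "Ball(A, %x. P)"
\end{ttbox}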
\begin{example} \label{ex:tr_funs}

We continue Example~\ref{ex:prod_trans} by presenting the \ML\ sources of the
parse translation for \ttindex{_K} and the print translation
\ttindex{dependent_tr'}:
\begin{ttbox}
(* nondependent abstraction *)
fun k_tr (*"_K"*) [t] = Abs ("x", dummyT, incr_boundvars 1 t)
  | k_tr (*"_K"*) ts = raise TERM ("k_tr", ts);

(* dependent / nondependent quantifiers *)
fun dependent_tr' (q, r) (A :: Abs (x, T, B) :: ts) =
      if 0 mem (loose_bnos B) then
        let val (x', B') = variant_abs (x, dummyT, B);
        in list_comb (Const (q, dummyT) $ Free (x', T) $ A $ B', ts)
        end
      else list_comb (Const (r, dummyT) $ A $ B, ts)
  | dependent_tr' _ _ = raise Match;
\end{ttbox}

This text is taken from the Pure sources. Ordinary user translations
would typically appear within the {\tt ML} section of the {\tt .thy} file.
Linking {\tt k_tr} into the parser requires the definition
\begin{ttbox}
val parse_translation = [("_K", k_tr)];
\end{ttbox}
For {\tt k_tr} this is not necessary, as it is built-in, but user-defined
translation functions are added in exactly the same way.  \medskip
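
A print translation is hooked in analogously through {\tt print_translation};
for instance (a sketch only, with made-up constant names, not the actual
declaration of Example~\ref{ex:prod_trans}):
\begin{ttbox}
(* sketch with made-up names *)
val print_translation = [("Pi", dependent_tr' ("_PROD", "_fun"))];
\end{ttbox}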

If {\tt k_tr} is called with exactly one argument $t$, it creates a new {\tt
Abs} node with a body derived from $t$: loose {\tt Bound}s, i.e.\ those
referring to outer {\tt Abs}s, are incremented using
\ttindex{incr_boundvars}. This and many other basic term manipulation
functions are defined in {\tt Pure/term.ML}, the comments there being in most