doc-src/Ref/syntax.tex
changeset 864 d63b111b917a
parent 499 5a54c796b808
child 1387 9bcad9c22fd4
--- a/doc-src/Ref/syntax.tex	Fri Jan 13 02:02:00 1995 +0100
+++ b/doc-src/Ref/syntax.tex	Wed Jan 18 10:17:55 1995 +0100
@@ -14,7 +14,7 @@
 
 
 \section{Abstract syntax trees} \label{sec:asts}
-\index{ASTs|(} 
+\index{ASTs|(}
 
 The parser, given a token list from the lexer, applies productions to yield
 a parse tree\index{parse trees}.  By applying some internal transformations
@@ -72,7 +72,7 @@
 \end{ttbox}
 is shown as {\tt ("_constrain" ("_abs" x t) ("fun" 'a 'b))}.
 Both {\tt ()} and {\tt (f)} are illegal because they have too few
-subtrees. 
+subtrees.
 
 The resemblance to Lisp's S-expressions is intentional, but there are two
 kinds of atomic symbols: $\Constant x$ and $\Variable x$.  Do not take the
@@ -105,14 +105,17 @@
 productions used to recognize the input.  Such parse trees are simply lists
 of tokens and constituent parse trees, the latter representing the
 nonterminals of the productions.  Let us refer to the actual productions in
-the form displayed by {\tt Syntax.print_syntax}.
+the form displayed by {\tt print_syntax} (see \S\ref{sec:inspct-thy} for an
+example).
 
 Ignoring parse \AST{} translations, parse trees are transformed to \AST{}s
 by stripping out delimiters and copy productions.  More precisely, the
 mapping $\astofpt{-}$ is derived from the productions as follows:
 \begin{itemize}
 \item Name tokens: $\astofpt{t} = \Variable s$, where $t$ is an \ndx{id},
-  \ndx{var}, \ndx{tid} or \ndx{tvar} token, and $s$ its associated string.
+  \ndx{var}, \ndx{tid}, \ndx{tvar}, \ndx{xnum} or \ndx{xstr} token, and $s$
+  its associated string. Note that for {\tt xstr} this does not include the
+  quotes.
 
 \item Copy productions:\index{productions!copy}
   $\astofpt{\ldots P \ldots} = \astofpt{P}$.  Here $\ldots$ stands for
@@ -122,12 +125,12 @@
 
   \item 0-ary productions: $\astofpt{\ldots \mtt{=>} c} = \Constant c$.
     Here there are no constituents other than delimiters, which are
-    discarded. 
+    discarded.
 
   \item $n$-ary productions, where $n \ge 1$: delimiters are discarded and
     the remaining constituents $P@1$, \ldots, $P@n$ are built into an
     application whose head constant is~$c$:
-    \[ \astofpt{\ldots P@1 \ldots P@n \ldots \mtt{=>} c} = 
+    \[ \astofpt{\ldots P@1 \ldots P@n \ldots \mtt{=>} c} =
        \Appl{\Constant c, \astofpt{P@1}, \ldots, \astofpt{P@n}}
     \]
 \end{itemize}
@@ -154,7 +157,7 @@
 "\%x y.\ t"         & ("_lambda" ("_idts" x y) t) \\
 \end{tabular}
 \end{center}
-\caption{Parsing examples using the Pure syntax}\label{fig:parse_ast} 
+\caption{Parsing examples using the Pure syntax}\label{fig:parse_ast}
 \end{figure}
 
 \begin{figure}
@@ -175,7 +178,7 @@
 
 The names of constant heads in the \AST{} control the translation process.
 The list of constants invoking parse \AST{} translations appears in the
-output of {\tt Syntax.print_syntax} under {\tt parse_ast_translation}.
+output of {\tt print_syntax} under {\tt parse_ast_translation}.
 
 
 \section{Transforming \AST{}s to terms}\label{sec:termofast}
@@ -208,7 +211,7 @@
   \mtt{dummyT})$.
 
 \item Function applications with $n$ arguments:
-    \[ \termofast{\Appl{f, x@1, \ldots, x@n}} = 
+    \[ \termofast{\Appl{f, x@1, \ldots, x@n}} =
        \termofast{f} \ttapp
          \termofast{x@1} \ttapp \ldots \ttapp \termofast{x@n}
     \]
@@ -254,18 +257,18 @@
     the free variable $x'$.  This replaces corresponding occurrences of the
     constructor \ttindex{Bound} by the term $\ttfct{Free} (x',
     \mtt{dummyT})$:
-   \[ \astofterm{\ttfct{Abs} (x, \tau, t)} = 
-      \Appl{\Constant \mtt{"_abs"}, 
+   \[ \astofterm{\ttfct{Abs} (x, \tau, t)} =
+      \Appl{\Constant \mtt{"_abs"},
         constrain(\Variable x', \tau), \astofterm{t'}}.
     \]
 
-  \item $\astofterm{\ttfct{Bound} i} = \Variable \mtt{"B.}i\mtt"$.  
+  \item $\astofterm{\ttfct{Bound} i} = \Variable \mtt{"B.}i\mtt"$.
     The occurrence of constructor \ttindex{Bound} should never happen
     when printing well-typed terms; it indicates a de Bruijn index with no
     matching abstraction.
 
   \item Where $f$ is not an application,
-    \[ \astofterm{f \ttapp x@1 \ttapp \ldots \ttapp x@n} = 
+    \[ \astofterm{f \ttapp x@1 \ttapp \ldots \ttapp x@n} =
        \Appl{\astofterm{f}, \astofterm{x@1}, \ldots,\astofterm{x@n}}
     \]
 \end{itemize}
@@ -276,8 +279,8 @@
   \item $constrain(x, \tau) = x$ \ if $\tau = \mtt{dummyT}$ \index{*dummyT} or
     \ttindex{show_types} is set to {\tt false}.
 
-  \item $constrain(x, \tau) = \Appl{\Constant \mtt{"_constrain"}, x, 
-         \astofterm{\tau}}$ \ otherwise.  
+  \item $constrain(x, \tau) = \Appl{\Constant \mtt{"_constrain"}, x,
+         \astofterm{\tau}}$ \ otherwise.
 
     Here, $\astofterm{\tau}$ is the \AST{} encoding of $\tau$: type
     constructors go to {\tt Constant}s; type identifiers go to {\tt
@@ -316,7 +319,7 @@
 
 
 \section{Macros: Syntactic rewriting} \label{sec:macros}
-\index{macros|(}\index{rewriting!syntactic|(} 
+\index{macros|(}\index{rewriting!syntactic|(}
 
 Mixfix declarations alone can handle situations where there is a direct
 connection between the concrete syntax and the underlying term.  Sometimes
@@ -338,7 +341,7 @@
 Figure~\ref{fig:set_trans} defines a fragment of first-order logic and set
 theory.\footnote{This and the following theories are complete working
   examples, though they specify only syntax, no axioms.  The file {\tt
-    ZF/ZF.thy} presents the full set theory definition, including many
+    ZF/ZF.thy} presents a full set theory definition, including many
   macro rules.}  Theory {\tt SET} defines constants for set comprehension
 ({\tt Collect}), replacement ({\tt Replace}) and bounded universal
 quantification ({\tt Ball}).  Each of these binds some variables.  Without
@@ -349,16 +352,17 @@
 \begin{ttbox}
 SET = Pure +
 types
-  i, o
+  i o
 arities
   i, o :: logic
 consts
   Trueprop      :: "o => prop"              ("_" 5)
   Collect       :: "[i, i => o] => i"
+  Replace       :: "[i, [i, i] => o] => i"
+  Ball          :: "[i, i => o] => o"
+syntax
   "{\at}Collect"    :: "[idt, i, o] => i"       ("(1{\ttlbrace}_:_./ _{\ttrbrace})")
-  Replace       :: "[i, [i, i] => o] => i"
   "{\at}Replace"    :: "[idt, idt, i, o] => i"  ("(1{\ttlbrace}_./ _:_, _{\ttrbrace})")
-  Ball          :: "[i, i => o] => o"
   "{\at}Ball"       :: "[idt, i, o] => o"       ("(3ALL _:_./ _)" 10)
 translations
   "{\ttlbrace}x:A. P{\ttrbrace}"    == "Collect(A, \%x. P)"
@@ -369,13 +373,12 @@
 \caption{Macro example: set theory}\label{fig:set_trans}
 \end{figure}
 
-The theory specifies a variable-binding syntax through additional
-productions that have mixfix declarations.  Each non-copy production must
-specify some constant, which is used for building \AST{}s.
-\index{constants!syntactic} The additional constants are decorated with
-{\tt\at} to stress their purely syntactic purpose; they should never occur
-within the final well-typed terms.  Furthermore, they cannot be written in
-formulae because they are not legal identifiers.
+The theory specifies a variable-binding syntax through additional productions
+that have mixfix declarations.  Each non-copy production must specify some
+constant, which is used for building \AST{}s. \index{constants!syntactic} The
+additional constants are decorated with {\tt\at} to stress their purely
+syntactic purpose; they may not occur within the final well-typed terms,
+being declared as {\tt syntax} rather than {\tt consts}.
 
 The translations cause the replacement of external forms by internal forms
 after parsing, and vice versa before printing of terms.  As a specification
@@ -405,7 +408,7 @@
 \[ (root)\; string \quad
    \left\{\begin{array}[c]{c} \mtt{=>} \\ \mtt{<=} \\ \mtt{==} \end{array}
    \right\} \quad
-   (root)\; string 
+   (root)\; string
 \]
 %
 This specifies a parse rule ({\tt =>}), a print rule ({\tt <=}), or both
@@ -435,10 +438,10 @@
 
 Some atoms of the macro rule's \AST{} are designated as constants for
 matching.  These are all names that have been declared as classes, types or
-constants.
+constants (logical and syntactic).
 
 The result of this preprocessing is two lists of macro rules, each stored
-as a pair of \AST{}s.  They can be viewed using {\tt Syntax.print_syntax}
+as a pair of \AST{}s.  They can be viewed using {\tt print_syntax}
 (sections \ttindex{parse_rules} and \ttindex{print_rules}).  For
 theory~{\tt SET} of Fig.~\ref{fig:set_trans} these are
 \begin{ttbox}
@@ -455,7 +458,7 @@
 \begin{warn}
   Avoid choosing variable names that have previously been used as
   constants, types or type classes; the {\tt consts} section in the output
-  of {\tt Syntax.print_syntax} lists all such names.  If a macro rule works
+  of {\tt print_syntax} lists all such names.  If a macro rule works
   incorrectly, inspect its internal form as shown above, recalling that
   constants appear as quoted strings and variables without quotes.
 \end{warn}
@@ -477,7 +480,7 @@
 binding place (but not at occurrences in the body).  Matching with
 \verb|Collect(A, %x. P)| binds {\tt x} to something like {\tt ("_constrain" y
 "i")} rather than only {\tt y}.  \AST{} rewriting will cause the constraint to
-appear in the external form, say \verb|{y::i:A::i. P::o}|.  
+appear in the external form, say \verb|{y::i:A::i. P::o}|.
 
 To allow such constraints to be re-read, your syntax should specify bound
 variables using the nonterminal~\ndx{idt}.  This is the case in our
@@ -526,11 +529,11 @@
 far removed from parse trees; at this level it is not yet known which
 identifiers will become constants, bounds, frees, types or classes.  As
 \S\ref{sec:asts} describes, former parse tree heads appear in \AST{}s as
-{\tt Constant}s, while the name tokens \ndx{id}, \ndx{var}, \ndx{tid} and
-\ndx{tvar} become {\tt Variable}s.  On the other hand, when \AST{}s
-generated from terms for printing, all constants and type constructors
-become {\tt Constant}s; see \S\ref{sec:asts}.  Thus \AST{}s may contain a
-messy mixture of {\tt Variable}s and {\tt Constant}s.  This is
+{\tt Constant}s, while the name tokens \ndx{id}, \ndx{var}, \ndx{tid},
+\ndx{tvar}, \ndx{xnum} and \ndx{xstr} become {\tt Variable}s.  On the other
+hand, when \AST{}s generated from terms for printing, all constants and type
+constructors become {\tt Constant}s; see \S\ref{sec:asts}.  Thus \AST{}s may
+contain a messy mixture of {\tt Variable}s and {\tt Constant}s.  This is
 insignificant at macro level because matching treats them alike.
 
 Because of this behaviour, different kinds of atoms with the same name are
@@ -540,13 +543,14 @@
   Nil
 consts
   Nil     :: "'a list"
+syntax
   "[]"    :: "'a list"    ("[]")
 translations
   "[]"    == "Nil"
 \end{ttbox}
 The term {\tt Nil} will be printed as {\tt []}, just as expected.
 The term \verb|%Nil.t| will be printed as \verb|%[].t|, which might not be
-expected!  How is the type~{\tt Nil} printed?
+expected!  Guess how type~{\tt Nil} is printed?
 
 Normalizing an \AST{} involves repeatedly applying macro rules until none
 are applicable.  Macro rules are chosen in the order that they appear in the
@@ -573,11 +577,13 @@
 FINSET = SET +
 types
   is
-consts
+syntax
   ""            :: "i => is"                ("_")
   "{\at}Enum"       :: "[i, is] => is"          ("_,/ _")
+consts
   empty         :: "i"                      ("{\ttlbrace}{\ttrbrace}")
   insert        :: "[i, i] => i"
+syntax
   "{\at}Finset"     :: "is => i"                ("{\ttlbrace}(_){\ttrbrace}")
 translations
   "{\ttlbrace}x, xs{\ttrbrace}"     == "insert(x, {\ttlbrace}xs{\ttrbrace})"
@@ -596,12 +602,11 @@
 line break is required then a space is printed instead.
 
 The nonterminal is declared as the type~{\tt is}, but with no {\tt arities}
-declaration.  Hence {\tt is} is not a logical type and no default
-productions are added.  If we had needed enumerations of the nonterminal
-{\tt logic}, which would include all the logical types, we could have used
-the predefined nonterminal symbol \ndx{args} and skipped this part
-altogether.  The nonterminal~{\tt is} can later be reused for other
-enumerations of type~{\tt i} like lists or tuples.
+declaration.  Hence {\tt is} is not a logical type and may be used safely as
+a new nonterminal for custom syntax.  The nonterminal~{\tt is} can later be
+re-used for other enumerations of type~{\tt i} like lists or tuples. If we
+had needed polymorphic enumerations, we could have used the predefined
+nonterminal symbol \ndx{args} and skipped this part altogether.
 
 \index{"@Finset@{\tt\at Finset} constant}
 Next follows {\tt empty}, which is already equipped with its syntax
@@ -620,9 +625,9 @@
   ("insert" x ("{\at}Finset" xs))  ->  ("{\at}Finset" ("{\at}Enum" x xs))
   ("insert" x "empty")  ->  ("{\at}Finset" x)
 \end{ttbox}
-This shows that \verb|{x, xs}| indeed matches any set enumeration of at least
+This shows that \verb|{x,xs}| indeed matches any set enumeration of at least
 two elements, binding the first to {\tt x} and the rest to {\tt xs}.
-Likewise, \verb|{xs}| and \verb|{x}| represent any set enumeration.  
+Likewise, \verb|{xs}| and \verb|{x}| represent any set enumeration.
 The parse rules only work in the order given.
 
 \begin{warn}
@@ -635,7 +640,7 @@
 \begin{ttbox}
 \%empty insert. insert(x, empty)
 \end{ttbox}
-\par\noindent is printed as \verb|%empty insert. {x}|.
+\par\noindent is incorrectly printed as \verb|%empty insert. {x}|.
 \end{warn}
 
 
@@ -643,15 +648,16 @@
 \index{examples!of macros}
 
 As stated earlier, a macro rule may not introduce new {\tt Variable}s on
-the right-hand side.  Something like \verb|"K(B)" => "%x. B"| is illegal;
+the right-hand side.  Something like \verb|"K(B)" => "%x.B"| is illegal;
 if allowed, it could cause variable capture.  In such cases you usually
 must fall back on translation functions.  But a trick can make things
-readable in some cases: {\em calling translation functions by parse
-  macros}:
+readable in some cases: {\em calling\/} translation functions by parse
+macros:
 \begin{ttbox}
 PROD = FINSET +
 consts
   Pi            :: "[i, i => i] => i"
+syntax
   "{\at}PROD"       :: "[idt, i, i] => i"     ("(3PROD _:_./ _)" 10)
   "{\at}->"         :: "[i, i] => i"          ("(_ ->/ _)" [51, 50] 50)
 \ttbreak
@@ -663,16 +669,15 @@
   val print_translation = [("Pi", dependent_tr' ("{\at}PROD", "{\at}->"))];
 \end{ttbox}
 
-Here {\tt Pi} is an internal constant for constructing general products.
+Here {\tt Pi} is a logical constant for constructing general products.
 Two external forms exist: the general case {\tt PROD x:A.B} and the
 function space {\tt A -> B}, which abbreviates \verb|Pi(A, %x.B)| when {\tt B}
 does not depend on~{\tt x}.
 
-The second parse macro introduces {\tt _K(B)}, which later becomes \verb|%x.B|
-due to a parse translation associated with \cdx{_K}.  The order of the
-parse rules is critical.  Unfortunately there is no such trick for
-printing, so we have to add a {\tt ML} section for the print translation
-\ttindex{dependent_tr'}.
+The second parse macro introduces {\tt _K(B)}, which later becomes
+\verb|%x.B| due to a parse translation associated with \cdx{_K}.
+Unfortunately there is no such trick for printing, so we have to add a {\tt
+ML} section for the print translation \ttindex{dependent_tr'}.
 
 Recall that identifiers with a leading {\tt _} are allowed in translation
 rules, but not in ordinary terms.  Thus we can create \AST{}s containing
@@ -685,13 +690,13 @@
 
 
 \section{Translation functions} \label{sec:tr_funs}
-\index{translations|(} 
+\index{translations|(}
 %
 This section describes the translation function mechanism.  By writing
 \ML{} functions, you can do almost everything with terms or \AST{}s during
 parsing and printing.  The logic \LK\ is a good example of sophisticated
 transformations between internal and external representations of sequents;
-here, macros would be useless. 
+here, macros would be useless.
 
 A full understanding of translations requires some familiarity
 with Isabelle's internals, especially the datatypes {\tt term}, {\tt typ},
@@ -704,21 +709,21 @@
 associated with a name, which triggers calls to it.  Such names can be
 constants (logical or syntactic) or type constructors.
 
-{\tt Syntax.print_syntax} displays the sets of names associated with the
+{\tt print_syntax} displays the sets of names associated with the
 translation functions of a {\tt Syntax.syntax} under
 \ttindex{parse_ast_translation}, \ttindex{parse_translation},
 \ttindex{print_translation} and \ttindex{print_ast_translation}.  You can
 add new ones via the {\tt ML} section\index{*ML section} of
 a {\tt .thy} file.  There may never be more than one function of the same
-kind per name.  Conceptually, the {\tt ML} section should appear between
-{\tt consts} and {\tt translations}; newly installed translation functions
-are already effective when macros and logical rules are parsed.
+kind per name.  Even though the {\tt ML} section is the very last part of a
+{\tt .thy} file, newly installed translation functions are effective when
+processing the preceding section.
 
-The {\tt ML} section is copied verbatim into the \ML\ file generated from a
-{\tt .thy} file.  Definitions made here are accessible as components of an
-\ML\ structure; to make some definitions private, use an \ML{} {\tt local}
-declaration.  The {\tt ML} section may install translation functions by
-declaring any of the following identifiers:
+The {\tt ML} section is copied verbatim near the beginning of the \ML\ file
+generated from a {\tt .thy} file.  Definitions made here are accessible as
+components of an \ML\ structure; to make some definitions private, use an
+\ML{} {\tt local} declaration.  The {\tt ML} section may install translation
+functions by declaring any of the following identifiers:
 \begin{ttbox}
 val parse_ast_translation : (string * (ast list -> ast)) list
 val print_ast_translation : (string * (ast list -> ast)) list
@@ -742,8 +747,8 @@
 transformations than \AST{}s do, typically involving abstractions and bound
 variables.
 
-Regardless of whether they act on terms or \AST{}s,
-parse translations differ from print translations fundamentally:
+Regardless of whether they act on terms or \AST{}s, parse translations differ
+from print translations in their overall behaviour fundamentally:
 \begin{description}
 \item[Parse translations] are applied bottom-up.  The arguments are already
   in translated form.  The translations must not fail; exceptions trigger
@@ -786,7 +791,7 @@
 Here is the parse translation for {\tt _K}:
 \begin{ttbox}
 fun k_tr [t] = Abs ("x", dummyT, incr_boundvars 1 t)
-  | k_tr ts = raise TERM("k_tr",ts);
+  | k_tr ts = raise TERM ("k_tr", ts);
 \end{ttbox}
 If {\tt k_tr} is called with exactly one argument~$t$, it creates a new
 {\tt Abs} node with a body derived from $t$.  Since terms given to parse
@@ -797,7 +802,7 @@
 
 Here is the print translation for dependent types:
 \begin{ttbox}
-fun dependent_tr' (q,r) (A :: Abs (x, T, B) :: ts) =
+fun dependent_tr' (q, r) (A :: Abs (x, T, B) :: ts) =
       if 0 mem (loose_bnos B) then
         let val (x', B') = variant_abs (x, dummyT, B);
         in list_comb (Const (q, dummyT) $ Free (x', T) $ A $ B', ts)
@@ -833,6 +838,6 @@
 replaces bound variable occurrences in~$B$ by the free variable $x'$ with
 type {\tt dummyT}.  Only the binding occurrence of~$x'$ is given the
 correct type~{\tt T}, so this is the only place where a type
-constraint might appear. 
+constraint might appear.
 \index{translations|)}
 \index{syntax!transformations|)}