doc-src/Ref/syntax.tex
changeset 864 d63b111b917a
parent 499 5a54c796b808
child 1387 9bcad9c22fd4
equal deleted inserted replaced
863:67692db44c70 864:d63b111b917a
    12 macros or code their own translation functions.  It describes the
    12 macros or code their own translation functions.  It describes the
    13 transformations between parse trees, abstract syntax trees and terms.
    13 transformations between parse trees, abstract syntax trees and terms.
    14 
    14 
    15 
    15 
    16 \section{Abstract syntax trees} \label{sec:asts}
    16 \section{Abstract syntax trees} \label{sec:asts}
    17 \index{ASTs|(} 
    17 \index{ASTs|(}
    18 
    18 
    19 The parser, given a token list from the lexer, applies productions to yield
    19 The parser, given a token list from the lexer, applies productions to yield
    20 a parse tree\index{parse trees}.  By applying some internal transformations
    20 a parse tree\index{parse trees}.  By applying some internal transformations
    21 the parse tree becomes an abstract syntax tree, or \AST{}.  Macro
    21 the parse tree becomes an abstract syntax tree, or \AST{}.  Macro
    22 expansion, further translations and finally type inference yields a
    22 expansion, further translations and finally type inference yields a
    70       Appl [Constant "_abs", Variable "x", Variable "t"],
    70       Appl [Constant "_abs", Variable "x", Variable "t"],
    71       Appl [Constant "fun", Variable "'a", Variable "'b"]]
    71       Appl [Constant "fun", Variable "'a", Variable "'b"]]
    72 \end{ttbox}
    72 \end{ttbox}
    73 is shown as {\tt ("_constrain" ("_abs" x t) ("fun" 'a 'b))}.
    73 is shown as {\tt ("_constrain" ("_abs" x t) ("fun" 'a 'b))}.
    74 Both {\tt ()} and {\tt (f)} are illegal because they have too few
    74 Both {\tt ()} and {\tt (f)} are illegal because they have too few
    75 subtrees. 
    75 subtrees.
    76 
    76 
    77 The resemblance to Lisp's S-expressions is intentional, but there are two
    77 The resemblance to Lisp's S-expressions is intentional, but there are two
    78 kinds of atomic symbols: $\Constant x$ and $\Variable x$.  Do not take the
    78 kinds of atomic symbols: $\Constant x$ and $\Variable x$.  Do not take the
    79 names {\tt Constant} and {\tt Variable} too literally; in the later
    79 names {\tt Constant} and {\tt Variable} too literally; in the later
    80 translation to terms, $\Variable x$ may become a constant, free or bound
    80 translation to terms, $\Variable x$ may become a constant, free or bound
   103 
   103 
   104 The parse tree is constructed by nesting the right-hand sides of the
   104 The parse tree is constructed by nesting the right-hand sides of the
   105 productions used to recognize the input.  Such parse trees are simply lists
   105 productions used to recognize the input.  Such parse trees are simply lists
   106 of tokens and constituent parse trees, the latter representing the
   106 of tokens and constituent parse trees, the latter representing the
   107 nonterminals of the productions.  Let us refer to the actual productions in
   107 nonterminals of the productions.  Let us refer to the actual productions in
   108 the form displayed by {\tt Syntax.print_syntax}.
   108 the form displayed by {\tt print_syntax} (see \S\ref{sec:inspct-thy} for an
       
   109 example).
   109 
   110 
   110 Ignoring parse \AST{} translations, parse trees are transformed to \AST{}s
   111 Ignoring parse \AST{} translations, parse trees are transformed to \AST{}s
   111 by stripping out delimiters and copy productions.  More precisely, the
   112 by stripping out delimiters and copy productions.  More precisely, the
   112 mapping $\astofpt{-}$ is derived from the productions as follows:
   113 mapping $\astofpt{-}$ is derived from the productions as follows:
   113 \begin{itemize}
   114 \begin{itemize}
   114 \item Name tokens: $\astofpt{t} = \Variable s$, where $t$ is an \ndx{id},
   115 \item Name tokens: $\astofpt{t} = \Variable s$, where $t$ is an \ndx{id},
   115   \ndx{var}, \ndx{tid} or \ndx{tvar} token, and $s$ its associated string.
   116   \ndx{var}, \ndx{tid}, \ndx{tvar}, \ndx{xnum} or \ndx{xstr} token, and $s$
       
   117   its associated string. Note that for {\tt xstr} this does not include the
       
   118   quotes.
   116 
   119 
   117 \item Copy productions:\index{productions!copy}
   120 \item Copy productions:\index{productions!copy}
   118   $\astofpt{\ldots P \ldots} = \astofpt{P}$.  Here $\ldots$ stands for
   121   $\astofpt{\ldots P \ldots} = \astofpt{P}$.  Here $\ldots$ stands for
   119   strings of delimiters, which are discarded.  $P$ stands for the single
   122   strings of delimiters, which are discarded.  $P$ stands for the single
   120   constituent that is not a delimiter; it is either a nonterminal symbol or
   123   constituent that is not a delimiter; it is either a nonterminal symbol or
   121   a name token.
   124   a name token.
   122 
   125 
   123   \item 0-ary productions: $\astofpt{\ldots \mtt{=>} c} = \Constant c$.
   126   \item 0-ary productions: $\astofpt{\ldots \mtt{=>} c} = \Constant c$.
   124     Here there are no constituents other than delimiters, which are
   127     Here there are no constituents other than delimiters, which are
   125     discarded. 
   128     discarded.
   126 
   129 
   127   \item $n$-ary productions, where $n \ge 1$: delimiters are discarded and
   130   \item $n$-ary productions, where $n \ge 1$: delimiters are discarded and
   128     the remaining constituents $P@1$, \ldots, $P@n$ are built into an
   131     the remaining constituents $P@1$, \ldots, $P@n$ are built into an
   129     application whose head constant is~$c$:
   132     application whose head constant is~$c$:
   130     \[ \astofpt{\ldots P@1 \ldots P@n \ldots \mtt{=>} c} = 
   133     \[ \astofpt{\ldots P@1 \ldots P@n \ldots \mtt{=>} c} =
   131        \Appl{\Constant c, \astofpt{P@1}, \ldots, \astofpt{P@n}}
   134        \Appl{\Constant c, \astofpt{P@1}, \ldots, \astofpt{P@n}}
   132     \]
   135     \]
   133 \end{itemize}
   136 \end{itemize}
   134 Figure~\ref{fig:parse_ast} presents some simple examples, where {\tt ==},
   137 Figure~\ref{fig:parse_ast} presents some simple examples, where {\tt ==},
   135 {\tt _appl}, {\tt _args}, and so forth name productions of the Pure syntax.
   138 {\tt _appl}, {\tt _args}, and so forth name productions of the Pure syntax.
   152 "f(x, y)"           & ("_appl" f ("_args" x y)) \\
   155 "f(x, y)"           & ("_appl" f ("_args" x y)) \\
   153 "f(x, y, z)"        & ("_appl" f ("_args" x ("_args" y z))) \\
   156 "f(x, y, z)"        & ("_appl" f ("_args" x ("_args" y z))) \\
   154 "\%x y.\ t"         & ("_lambda" ("_idts" x y) t) \\
   157 "\%x y.\ t"         & ("_lambda" ("_idts" x y) t) \\
   155 \end{tabular}
   158 \end{tabular}
   156 \end{center}
   159 \end{center}
   157 \caption{Parsing examples using the Pure syntax}\label{fig:parse_ast} 
   160 \caption{Parsing examples using the Pure syntax}\label{fig:parse_ast}
   158 \end{figure}
   161 \end{figure}
   159 
   162 
   160 \begin{figure}
   163 \begin{figure}
   161 \begin{center}
   164 \begin{center}
   162 \tt\begin{tabular}{ll}
   165 \tt\begin{tabular}{ll}
   173 \caption{Built-in parse \AST{} translations}\label{fig:parse_ast_tr}
   176 \caption{Built-in parse \AST{} translations}\label{fig:parse_ast_tr}
   174 \end{figure}
   177 \end{figure}
   175 
   178 
   176 The names of constant heads in the \AST{} control the translation process.
   179 The names of constant heads in the \AST{} control the translation process.
   177 The list of constants invoking parse \AST{} translations appears in the
   180 The list of constants invoking parse \AST{} translations appears in the
   178 output of {\tt Syntax.print_syntax} under {\tt parse_ast_translation}.
   181 output of {\tt print_syntax} under {\tt parse_ast_translation}.
   179 
   182 
   180 
   183 
   181 \section{Transforming \AST{}s to terms}\label{sec:termofast}
   184 \section{Transforming \AST{}s to terms}\label{sec:termofast}
   182 \index{terms!made from ASTs}
   185 \index{terms!made from ASTs}
   183 \newcommand\termofast[1]{\lbrakk#1\rbrakk}
   186 \newcommand\termofast[1]{\lbrakk#1\rbrakk}
   206 
   209 
   207 \item Free variables: $\termofast{\Variable x} = \ttfct{Free} (x,
   210 \item Free variables: $\termofast{\Variable x} = \ttfct{Free} (x,
   208   \mtt{dummyT})$.
   211   \mtt{dummyT})$.
   209 
   212 
   210 \item Function applications with $n$ arguments:
   213 \item Function applications with $n$ arguments:
   211     \[ \termofast{\Appl{f, x@1, \ldots, x@n}} = 
   214     \[ \termofast{\Appl{f, x@1, \ldots, x@n}} =
   212        \termofast{f} \ttapp
   215        \termofast{f} \ttapp
   213          \termofast{x@1} \ttapp \ldots \ttapp \termofast{x@n}
   216          \termofast{x@1} \ttapp \ldots \ttapp \termofast{x@n}
   214     \]
   217     \]
   215 \end{itemize}
   218 \end{itemize}
   216 Here \ttindex{Const}, \ttindex{Var}, \ttindex{Free} and
   219 Here \ttindex{Const}, \ttindex{Var}, \ttindex{Free} and
   252     of~$x$ renamed to differ from all names occurring in~$t$, and let $t'$
   255     of~$x$ renamed to differ from all names occurring in~$t$, and let $t'$
   253     be obtained from~$t$ by replacing all bound occurrences of~$x$ by
   256     be obtained from~$t$ by replacing all bound occurrences of~$x$ by
   254     the free variable $x'$.  This replaces corresponding occurrences of the
   257     the free variable $x'$.  This replaces corresponding occurrences of the
   255     constructor \ttindex{Bound} by the term $\ttfct{Free} (x',
   258     constructor \ttindex{Bound} by the term $\ttfct{Free} (x',
   256     \mtt{dummyT})$:
   259     \mtt{dummyT})$:
   257    \[ \astofterm{\ttfct{Abs} (x, \tau, t)} = 
   260    \[ \astofterm{\ttfct{Abs} (x, \tau, t)} =
   258       \Appl{\Constant \mtt{"_abs"}, 
   261       \Appl{\Constant \mtt{"_abs"},
   259         constrain(\Variable x', \tau), \astofterm{t'}}.
   262         constrain(\Variable x', \tau), \astofterm{t'}}.
   260     \]
   263     \]
   261 
   264 
   262   \item $\astofterm{\ttfct{Bound} i} = \Variable \mtt{"B.}i\mtt"$.  
   265   \item $\astofterm{\ttfct{Bound} i} = \Variable \mtt{"B.}i\mtt"$.
   263     The occurrence of constructor \ttindex{Bound} should never happen
   266     The occurrence of constructor \ttindex{Bound} should never happen
   264     when printing well-typed terms; it indicates a de Bruijn index with no
   267     when printing well-typed terms; it indicates a de Bruijn index with no
   265     matching abstraction.
   268     matching abstraction.
   266 
   269 
   267   \item Where $f$ is not an application,
   270   \item Where $f$ is not an application,
   268     \[ \astofterm{f \ttapp x@1 \ttapp \ldots \ttapp x@n} = 
   271     \[ \astofterm{f \ttapp x@1 \ttapp \ldots \ttapp x@n} =
   269        \Appl{\astofterm{f}, \astofterm{x@1}, \ldots,\astofterm{x@n}}
   272        \Appl{\astofterm{f}, \astofterm{x@1}, \ldots,\astofterm{x@n}}
   270     \]
   273     \]
   271 \end{itemize}
   274 \end{itemize}
   272 %
   275 %
   273 Type constraints\index{type constraints} are inserted to allow the printing
   276 Type constraints\index{type constraints} are inserted to allow the printing
   274 of types.  This is governed by the boolean variable \ttindex{show_types}:
   277 of types.  This is governed by the boolean variable \ttindex{show_types}:
   275 \begin{itemize}
   278 \begin{itemize}
   276   \item $constrain(x, \tau) = x$ \ if $\tau = \mtt{dummyT}$ \index{*dummyT} or
   279   \item $constrain(x, \tau) = x$ \ if $\tau = \mtt{dummyT}$ \index{*dummyT} or
   277     \ttindex{show_types} is set to {\tt false}.
   280     \ttindex{show_types} is set to {\tt false}.
   278 
   281 
   279   \item $constrain(x, \tau) = \Appl{\Constant \mtt{"_constrain"}, x, 
   282   \item $constrain(x, \tau) = \Appl{\Constant \mtt{"_constrain"}, x,
   280          \astofterm{\tau}}$ \ otherwise.  
   283          \astofterm{\tau}}$ \ otherwise.
   281 
   284 
   282     Here, $\astofterm{\tau}$ is the \AST{} encoding of $\tau$: type
   285     Here, $\astofterm{\tau}$ is the \AST{} encoding of $\tau$: type
   283     constructors go to {\tt Constant}s; type identifiers go to {\tt
   286     constructors go to {\tt Constant}s; type identifiers go to {\tt
   284       Variable}s; type applications go to {\tt Appl}s with the type
   287       Variable}s; type applications go to {\tt Appl}s with the type
   285     constructor as the first element.  If \ttindex{show_sorts} is set to
   288     constructor as the first element.  If \ttindex{show_sorts} is set to
   314 \index{ASTs|)}
   317 \index{ASTs|)}
   315 
   318 
   316 
   319 
   317 
   320 
   318 \section{Macros: Syntactic rewriting} \label{sec:macros}
   321 \section{Macros: Syntactic rewriting} \label{sec:macros}
   319 \index{macros|(}\index{rewriting!syntactic|(} 
   322 \index{macros|(}\index{rewriting!syntactic|(}
   320 
   323 
   321 Mixfix declarations alone can handle situations where there is a direct
   324 Mixfix declarations alone can handle situations where there is a direct
   322 connection between the concrete syntax and the underlying term.  Sometimes
   325 connection between the concrete syntax and the underlying term.  Sometimes
   323 we require a more elaborate concrete syntax, such as quantifiers and list
   326 we require a more elaborate concrete syntax, such as quantifiers and list
   324 notation.  Isabelle's {\bf macros} and {\bf translation functions} can
   327 notation.  Isabelle's {\bf macros} and {\bf translation functions} can
   336 express all but the most obscure translations.
   339 express all but the most obscure translations.
   337 
   340 
   338 Figure~\ref{fig:set_trans} defines a fragment of first-order logic and set
   341 Figure~\ref{fig:set_trans} defines a fragment of first-order logic and set
   339 theory.\footnote{This and the following theories are complete working
   342 theory.\footnote{This and the following theories are complete working
   340   examples, though they specify only syntax, no axioms.  The file {\tt
   343   examples, though they specify only syntax, no axioms.  The file {\tt
   341     ZF/ZF.thy} presents the full set theory definition, including many
   344     ZF/ZF.thy} presents a full set theory definition, including many
   342   macro rules.}  Theory {\tt SET} defines constants for set comprehension
   345   macro rules.}  Theory {\tt SET} defines constants for set comprehension
   343 ({\tt Collect}), replacement ({\tt Replace}) and bounded universal
   346 ({\tt Collect}), replacement ({\tt Replace}) and bounded universal
   344 quantification ({\tt Ball}).  Each of these binds some variables.  Without
   347 quantification ({\tt Ball}).  Each of these binds some variables.  Without
   345 additional syntax we should have to write $\forall x \in A.  P$ as {\tt
   348 additional syntax we should have to write $\forall x \in A.  P$ as {\tt
   346   Ball(A,\%x.P)}, and similarly for the others.
   349   Ball(A,\%x.P)}, and similarly for the others.
   347 
   350 
   348 \begin{figure}
   351 \begin{figure}
   349 \begin{ttbox}
   352 \begin{ttbox}
   350 SET = Pure +
   353 SET = Pure +
   351 types
   354 types
   352   i, o
   355   i o
   353 arities
   356 arities
   354   i, o :: logic
   357   i, o :: logic
   355 consts
   358 consts
   356   Trueprop      :: "o => prop"              ("_" 5)
   359   Trueprop      :: "o => prop"              ("_" 5)
   357   Collect       :: "[i, i => o] => i"
   360   Collect       :: "[i, i => o] => i"
       
   361   Replace       :: "[i, [i, i] => o] => i"
       
   362   Ball          :: "[i, i => o] => o"
       
   363 syntax
   358   "{\at}Collect"    :: "[idt, i, o] => i"       ("(1{\ttlbrace}_:_./ _{\ttrbrace})")
   364   "{\at}Collect"    :: "[idt, i, o] => i"       ("(1{\ttlbrace}_:_./ _{\ttrbrace})")
   359   Replace       :: "[i, [i, i] => o] => i"
       
   360   "{\at}Replace"    :: "[idt, idt, i, o] => i"  ("(1{\ttlbrace}_./ _:_, _{\ttrbrace})")
   365   "{\at}Replace"    :: "[idt, idt, i, o] => i"  ("(1{\ttlbrace}_./ _:_, _{\ttrbrace})")
   361   Ball          :: "[i, i => o] => o"
       
   362   "{\at}Ball"       :: "[idt, i, o] => o"       ("(3ALL _:_./ _)" 10)
   366   "{\at}Ball"       :: "[idt, i, o] => o"       ("(3ALL _:_./ _)" 10)
   363 translations
   367 translations
   364   "{\ttlbrace}x:A. P{\ttrbrace}"    == "Collect(A, \%x. P)"
   368   "{\ttlbrace}x:A. P{\ttrbrace}"    == "Collect(A, \%x. P)"
   365   "{\ttlbrace}y. x:A, Q{\ttrbrace}" == "Replace(A, \%x y. Q)"
   369   "{\ttlbrace}y. x:A, Q{\ttrbrace}" == "Replace(A, \%x y. Q)"
   366   "ALL x:A. P"  == "Ball(A, \%x. P)"
   370   "ALL x:A. P"  == "Ball(A, \%x. P)"
   367 end
   371 end
   368 \end{ttbox}
   372 \end{ttbox}
   369 \caption{Macro example: set theory}\label{fig:set_trans}
   373 \caption{Macro example: set theory}\label{fig:set_trans}
   370 \end{figure}
   374 \end{figure}
   371 
   375 
   372 The theory specifies a variable-binding syntax through additional
   376 The theory specifies a variable-binding syntax through additional productions
   373 productions that have mixfix declarations.  Each non-copy production must
   377 that have mixfix declarations.  Each non-copy production must specify some
   374 specify some constant, which is used for building \AST{}s.
   378 constant, which is used for building \AST{}s. \index{constants!syntactic} The
   375 \index{constants!syntactic} The additional constants are decorated with
   379 additional constants are decorated with {\tt\at} to stress their purely
   376 {\tt\at} to stress their purely syntactic purpose; they should never occur
   380 syntactic purpose; they may not occur within the final well-typed terms,
   377 within the final well-typed terms.  Furthermore, they cannot be written in
   381 being declared as {\tt syntax} rather than {\tt consts}.
   378 formulae because they are not legal identifiers.
       
   379 
   382 
   380 The translations cause the replacement of external forms by internal forms
   383 The translations cause the replacement of external forms by internal forms
   381 after parsing, and vice versa before printing of terms.  As a specification
   384 after parsing, and vice versa before printing of terms.  As a specification
   382 of the set theory notation, they should be largely self-explanatory.  The
   385 of the set theory notation, they should be largely self-explanatory.  The
   383 syntactic constants, {\tt\at Collect}, {\tt\at Replace} and {\tt\at Ball},
   386 syntactic constants, {\tt\at Collect}, {\tt\at Replace} and {\tt\at Ball},
   403 \index{*translations section}
   406 \index{*translations section}
   404 The {\tt translations} section specifies macros.  The syntax for a macro is
   407 The {\tt translations} section specifies macros.  The syntax for a macro is
   405 \[ (root)\; string \quad
   408 \[ (root)\; string \quad
   406    \left\{\begin{array}[c]{c} \mtt{=>} \\ \mtt{<=} \\ \mtt{==} \end{array}
   409    \left\{\begin{array}[c]{c} \mtt{=>} \\ \mtt{<=} \\ \mtt{==} \end{array}
   407    \right\} \quad
   410    \right\} \quad
   408    (root)\; string 
   411    (root)\; string
   409 \]
   412 \]
   410 %
   413 %
   411 This specifies a parse rule ({\tt =>}), a print rule ({\tt <=}), or both
   414 This specifies a parse rule ({\tt =>}), a print rule ({\tt <=}), or both
   412 ({\tt ==}).  The two strings specify the left and right-hand sides of the
   415 ({\tt ==}).  The two strings specify the left and right-hand sides of the
   413 macro rule.  The $(root)$ specification is optional; it specifies the
   416 macro rule.  The $(root)$ specification is optional; it specifies the
   433 {\tt _K}).  Thus, a constant whose name starts with an underscore can
   436 {\tt _K}).  Thus, a constant whose name starts with an underscore can
   434 appear in macro rules but not in ordinary terms.
   437 appear in macro rules but not in ordinary terms.
   435 
   438 
   436 Some atoms of the macro rule's \AST{} are designated as constants for
   439 Some atoms of the macro rule's \AST{} are designated as constants for
   437 matching.  These are all names that have been declared as classes, types or
   440 matching.  These are all names that have been declared as classes, types or
   438 constants.
   441 constants (logical and syntactic).
   439 
   442 
   440 The result of this preprocessing is two lists of macro rules, each stored
   443 The result of this preprocessing is two lists of macro rules, each stored
   441 as a pair of \AST{}s.  They can be viewed using {\tt Syntax.print_syntax}
   444 as a pair of \AST{}s.  They can be viewed using {\tt print_syntax}
   442 (sections \ttindex{parse_rules} and \ttindex{print_rules}).  For
   445 (sections \ttindex{parse_rules} and \ttindex{print_rules}).  For
   443 theory~{\tt SET} of Fig.~\ref{fig:set_trans} these are
   446 theory~{\tt SET} of Fig.~\ref{fig:set_trans} these are
   444 \begin{ttbox}
   447 \begin{ttbox}
   445 parse_rules:
   448 parse_rules:
   446   ("{\at}Collect" x A P)  ->  ("Collect" A ("_abs" x P))
   449   ("{\at}Collect" x A P)  ->  ("Collect" A ("_abs" x P))
   453 \end{ttbox}
   456 \end{ttbox}
   454 
   457 
   455 \begin{warn}
   458 \begin{warn}
   456   Avoid choosing variable names that have previously been used as
   459   Avoid choosing variable names that have previously been used as
   457   constants, types or type classes; the {\tt consts} section in the output
   460   constants, types or type classes; the {\tt consts} section in the output
   458   of {\tt Syntax.print_syntax} lists all such names.  If a macro rule works
   461   of {\tt print_syntax} lists all such names.  If a macro rule works
   459   incorrectly, inspect its internal form as shown above, recalling that
   462   incorrectly, inspect its internal form as shown above, recalling that
   460   constants appear as quoted strings and variables without quotes.
   463   constants appear as quoted strings and variables without quotes.
   461 \end{warn}
   464 \end{warn}
   462 
   465 
   463 \begin{warn}
   466 \begin{warn}
   475 Another trap concerns type constraints.  If \ttindex{show_types} is set to
   478 Another trap concerns type constraints.  If \ttindex{show_types} is set to
   476 {\tt true}, bound variables will be decorated by their meta types at the
   479 {\tt true}, bound variables will be decorated by their meta types at the
   477 binding place (but not at occurrences in the body).  Matching with
   480 binding place (but not at occurrences in the body).  Matching with
   478 \verb|Collect(A, %x. P)| binds {\tt x} to something like {\tt ("_constrain" y
   481 \verb|Collect(A, %x. P)| binds {\tt x} to something like {\tt ("_constrain" y
   479 "i")} rather than only {\tt y}.  \AST{} rewriting will cause the constraint to
   482 "i")} rather than only {\tt y}.  \AST{} rewriting will cause the constraint to
   480 appear in the external form, say \verb|{y::i:A::i. P::o}|.  
   483 appear in the external form, say \verb|{y::i:A::i. P::o}|.
   481 
   484 
   482 To allow such constraints to be re-read, your syntax should specify bound
   485 To allow such constraints to be re-read, your syntax should specify bound
   483 variables using the nonterminal~\ndx{idt}.  This is the case in our
   486 variables using the nonterminal~\ndx{idt}.  This is the case in our
   484 example.  Choosing {\tt id} instead of {\tt idt} is a common error,
   487 example.  Choosing {\tt id} instead of {\tt idt} is a common error,
   485 especially since it appears in former versions of most of Isabelle's
   488 especially since it appears in former versions of most of Isabelle's
   524 The second case above may look odd.  This is where {\tt Variable}s of
   527 The second case above may look odd.  This is where {\tt Variable}s of
   525 non-rule \AST{}s behave like {\tt Constant}s.  Recall that \AST{}s are not
   528 non-rule \AST{}s behave like {\tt Constant}s.  Recall that \AST{}s are not
   526 far removed from parse trees; at this level it is not yet known which
   529 far removed from parse trees; at this level it is not yet known which
   527 identifiers will become constants, bounds, frees, types or classes.  As
   530 identifiers will become constants, bounds, frees, types or classes.  As
   528 \S\ref{sec:asts} describes, former parse tree heads appear in \AST{}s as
   531 \S\ref{sec:asts} describes, former parse tree heads appear in \AST{}s as
   529 {\tt Constant}s, while the name tokens \ndx{id}, \ndx{var}, \ndx{tid} and
   532 {\tt Constant}s, while the name tokens \ndx{id}, \ndx{var}, \ndx{tid},
   530 \ndx{tvar} become {\tt Variable}s.  On the other hand, when \AST{}s
   533 \ndx{tvar}, \ndx{xnum} and \ndx{xstr} become {\tt Variable}s.  On the other
   531 generated from terms for printing, all constants and type constructors
   534 hand, when \AST{}s generated from terms for printing, all constants and type
   532 become {\tt Constant}s; see \S\ref{sec:asts}.  Thus \AST{}s may contain a
   535 constructors become {\tt Constant}s; see \S\ref{sec:asts}.  Thus \AST{}s may
   533 messy mixture of {\tt Variable}s and {\tt Constant}s.  This is
   536 contain a messy mixture of {\tt Variable}s and {\tt Constant}s.  This is
   534 insignificant at macro level because matching treats them alike.
   537 insignificant at macro level because matching treats them alike.
   535 
   538 
   536 Because of this behaviour, different kinds of atoms with the same name are
   539 Because of this behaviour, different kinds of atoms with the same name are
   537 indistinguishable, which may make some rules prone to misbehaviour.  Example:
   540 indistinguishable, which may make some rules prone to misbehaviour.  Example:
   538 \begin{ttbox}
   541 \begin{ttbox}
   539 types
   542 types
   540   Nil
   543   Nil
   541 consts
   544 consts
   542   Nil     :: "'a list"
   545   Nil     :: "'a list"
       
   546 syntax
   543   "[]"    :: "'a list"    ("[]")
   547   "[]"    :: "'a list"    ("[]")
   544 translations
   548 translations
   545   "[]"    == "Nil"
   549   "[]"    == "Nil"
   546 \end{ttbox}
   550 \end{ttbox}
   547 The term {\tt Nil} will be printed as {\tt []}, just as expected.
   551 The term {\tt Nil} will be printed as {\tt []}, just as expected.
   548 The term \verb|%Nil.t| will be printed as \verb|%[].t|, which might not be
   552 The term \verb|%Nil.t| will be printed as \verb|%[].t|, which might not be
   549 expected!  How is the type~{\tt Nil} printed?
   553 expected!  Guess how type~{\tt Nil} is printed?
   550 
   554 
   551 Normalizing an \AST{} involves repeatedly applying macro rules until none
   555 Normalizing an \AST{} involves repeatedly applying macro rules until none
   552 are applicable.  Macro rules are chosen in the order that they appear in the
   556 are applicable.  Macro rules are chosen in the order that they appear in the
   553 {\tt translations} section.  You can watch the normalization of \AST{}s
   557 {\tt translations} section.  You can watch the normalization of \AST{}s
   554 during parsing and printing by setting \ttindex{Syntax.trace_norm_ast} to
   558 during parsing and printing by setting \ttindex{Syntax.trace_norm_ast} to
   571 \index{"@Finset@{\tt\at Finset} constant}
   575 \index{"@Finset@{\tt\at Finset} constant}
   572 \begin{ttbox}
   576 \begin{ttbox}
   573 FINSET = SET +
   577 FINSET = SET +
   574 types
   578 types
   575   is
   579   is
   576 consts
   580 syntax
   577   ""            :: "i => is"                ("_")
   581   ""            :: "i => is"                ("_")
   578   "{\at}Enum"       :: "[i, is] => is"          ("_,/ _")
   582   "{\at}Enum"       :: "[i, is] => is"          ("_,/ _")
       
   583 consts
   579   empty         :: "i"                      ("{\ttlbrace}{\ttrbrace}")
   584   empty         :: "i"                      ("{\ttlbrace}{\ttrbrace}")
   580   insert        :: "[i, i] => i"
   585   insert        :: "[i, i] => i"
       
   586 syntax
   581   "{\at}Finset"     :: "is => i"                ("{\ttlbrace}(_){\ttrbrace}")
   587   "{\at}Finset"     :: "is => i"                ("{\ttlbrace}(_){\ttrbrace}")
   582 translations
   588 translations
   583   "{\ttlbrace}x, xs{\ttrbrace}"     == "insert(x, {\ttlbrace}xs{\ttrbrace})"
   589   "{\ttlbrace}x, xs{\ttrbrace}"     == "insert(x, {\ttlbrace}xs{\ttrbrace})"
   584   "{\ttlbrace}x{\ttrbrace}"         == "insert(x, {\ttlbrace}{\ttrbrace})"
   590   "{\ttlbrace}x{\ttrbrace}"         == "insert(x, {\ttlbrace}{\ttrbrace})"
   585 end
   591 end
   594   i} separated by commas.  The mixfix declaration \hbox{\verb|"_,/ _"|}
   600   i} separated by commas.  The mixfix declaration \hbox{\verb|"_,/ _"|}
   595 allows a line break after the comma for \rmindex{pretty printing}; if no
   601 allows a line break after the comma for \rmindex{pretty printing}; if no
   596 line break is required then a space is printed instead.
   602 line break is required then a space is printed instead.
   597 
   603 
   598 The nonterminal is declared as the type~{\tt is}, but with no {\tt arities}
   604 The nonterminal is declared as the type~{\tt is}, but with no {\tt arities}
   599 declaration.  Hence {\tt is} is not a logical type and no default
   605 declaration.  Hence {\tt is} is not a logical type and may be used safely as
   600 productions are added.  If we had needed enumerations of the nonterminal
   606 a new nonterminal for custom syntax.  The nonterminal~{\tt is} can later be
   601 {\tt logic}, which would include all the logical types, we could have used
   607 re-used for other enumerations of type~{\tt i} like lists or tuples. If we
   602 the predefined nonterminal symbol \ndx{args} and skipped this part
   608 had needed polymorphic enumerations, we could have used the predefined
   603 altogether.  The nonterminal~{\tt is} can later be reused for other
   609 nonterminal symbol \ndx{args} and skipped this part altogether.
   604 enumerations of type~{\tt i} like lists or tuples.
       
   605 
   610 
   606 \index{"@Finset@{\tt\at Finset} constant}
   611 \index{"@Finset@{\tt\at Finset} constant}
   607 Next follows {\tt empty}, which is already equipped with its syntax
   612 Next follows {\tt empty}, which is already equipped with its syntax
   608 \verb|{}|, and {\tt insert} without concrete syntax.  The syntactic
   613 \verb|{}|, and {\tt insert} without concrete syntax.  The syntactic
   609 constant {\tt\at Finset} provides concrete syntax for enumerations of~{\tt
   614 constant {\tt\at Finset} provides concrete syntax for enumerations of~{\tt
   618   ("{\at}Finset" x)  ->  ("insert" x "empty")
   623   ("{\at}Finset" x)  ->  ("insert" x "empty")
   619 print_rules:
   624 print_rules:
   620   ("insert" x ("{\at}Finset" xs))  ->  ("{\at}Finset" ("{\at}Enum" x xs))
   625   ("insert" x ("{\at}Finset" xs))  ->  ("{\at}Finset" ("{\at}Enum" x xs))
   621   ("insert" x "empty")  ->  ("{\at}Finset" x)
   626   ("insert" x "empty")  ->  ("{\at}Finset" x)
   622 \end{ttbox}
   627 \end{ttbox}
   623 This shows that \verb|{x, xs}| indeed matches any set enumeration of at least
   628 This shows that \verb|{x,xs}| indeed matches any set enumeration of at least
   624 two elements, binding the first to {\tt x} and the rest to {\tt xs}.
   629 two elements, binding the first to {\tt x} and the rest to {\tt xs}.
   625 Likewise, \verb|{xs}| and \verb|{x}| represent any set enumeration.  
   630 Likewise, \verb|{xs}| and \verb|{x}| represent any set enumeration.
   626 The parse rules only work in the order given.
   631 The parse rules only work in the order given.
   627 
   632 
   628 \begin{warn}
   633 \begin{warn}
   629   The \AST{} rewriter cannot distinguish constants from variables and looks
   634   The \AST{} rewriter cannot distinguish constants from variables and looks
   630   only for names of atoms.  Thus the names of {\tt Constant}s occurring in
   635   only for names of atoms.  Thus the names of {\tt Constant}s occurring in
   633   sufficiently long and strange names.  If a bound variable's name gets
   638   sufficiently long and strange names.  If a bound variable's name gets
   634   rewritten, the result will be incorrect; for example, the term
   639   rewritten, the result will be incorrect; for example, the term
   635 \begin{ttbox}
   640 \begin{ttbox}
   636 \%empty insert. insert(x, empty)
   641 \%empty insert. insert(x, empty)
   637 \end{ttbox}
   642 \end{ttbox}
   638 \par\noindent is printed as \verb|%empty insert. {x}|.
   643 \par\noindent is incorrectly printed as \verb|%empty insert. {x}|.
   639 \end{warn}
   644 \end{warn}
   640 
   645 
   641 
   646 
   642 \subsection{Example: a parse macro for dependent types}\label{prod_trans}
   647 \subsection{Example: a parse macro for dependent types}\label{prod_trans}
   643 \index{examples!of macros}
   648 \index{examples!of macros}
   644 
   649 
   645 As stated earlier, a macro rule may not introduce new {\tt Variable}s on
   650 As stated earlier, a macro rule may not introduce new {\tt Variable}s on
   646 the right-hand side.  Something like \verb|"K(B)" => "%x. B"| is illegal;
   651 the right-hand side.  Something like \verb|"K(B)" => "%x.B"| is illegal;
   647 if allowed, it could cause variable capture.  In such cases you usually
   652 if allowed, it could cause variable capture.  In such cases you usually
   648 must fall back on translation functions.  But a trick can make things
   653 must fall back on translation functions.  But a trick can make things
   649 readable in some cases: {\em calling translation functions by parse
   654 readable in some cases: {\em calling\/} translation functions by parse
   650   macros}:
   655 macros:
   651 \begin{ttbox}
   656 \begin{ttbox}
   652 PROD = FINSET +
   657 PROD = FINSET +
   653 consts
   658 consts
   654   Pi            :: "[i, i => i] => i"
   659   Pi            :: "[i, i => i] => i"
       
   660 syntax
   655   "{\at}PROD"       :: "[idt, i, i] => i"     ("(3PROD _:_./ _)" 10)
   661   "{\at}PROD"       :: "[idt, i, i] => i"     ("(3PROD _:_./ _)" 10)
   656   "{\at}->"         :: "[i, i] => i"          ("(_ ->/ _)" [51, 50] 50)
   662   "{\at}->"         :: "[i, i] => i"          ("(_ ->/ _)" [51, 50] 50)
   657 \ttbreak
   663 \ttbreak
   658 translations
   664 translations
   659   "PROD x:A. B" => "Pi(A, \%x. B)"
   665   "PROD x:A. B" => "Pi(A, \%x. B)"
   661 end
   667 end
   662 ML
   668 ML
   663   val print_translation = [("Pi", dependent_tr' ("{\at}PROD", "{\at}->"))];
   669   val print_translation = [("Pi", dependent_tr' ("{\at}PROD", "{\at}->"))];
   664 \end{ttbox}
   670 \end{ttbox}
   665 
   671 
   666 Here {\tt Pi} is an internal constant for constructing general products.
   672 Here {\tt Pi} is a logical constant for constructing general products.
   667 Two external forms exist: the general case {\tt PROD x:A.B} and the
   673 Two external forms exist: the general case {\tt PROD x:A.B} and the
   668 function space {\tt A -> B}, which abbreviates \verb|Pi(A, %x.B)| when {\tt B}
   674 function space {\tt A -> B}, which abbreviates \verb|Pi(A, %x.B)| when {\tt B}
   669 does not depend on~{\tt x}.
   675 does not depend on~{\tt x}.
   670 
   676 
   671 The second parse macro introduces {\tt _K(B)}, which later becomes \verb|%x.B|
   677 The second parse macro introduces {\tt _K(B)}, which later becomes
   672 due to a parse translation associated with \cdx{_K}.  The order of the
   678 \verb|%x.B| due to a parse translation associated with \cdx{_K}.
   673 parse rules is critical.  Unfortunately there is no such trick for
   679 Unfortunately there is no such trick for printing, so we have to add a {\tt
   674 printing, so we have to add a {\tt ML} section for the print translation
   680 ML} section for the print translation \ttindex{dependent_tr'}.
   675 \ttindex{dependent_tr'}.
       
   676 
   681 
   677 Recall that identifiers with a leading {\tt _} are allowed in translation
   682 Recall that identifiers with a leading {\tt _} are allowed in translation
   678 rules, but not in ordinary terms.  Thus we can create \AST{}s containing
   683 rules, but not in ordinary terms.  Thus we can create \AST{}s containing
   679 names that are not directly expressible.
   684 names that are not directly expressible.
   680 
   685 
   683 \S\ref{sec:tr_funs} below for more of the arcane lore of translation functions.
   688 \S\ref{sec:tr_funs} below for more of the arcane lore of translation functions.
   684 \index{macros|)}\index{rewriting!syntactic|)}
   689 \index{macros|)}\index{rewriting!syntactic|)}
   685 
   690 
   686 
   691 
   687 \section{Translation functions} \label{sec:tr_funs}
   692 \section{Translation functions} \label{sec:tr_funs}
   688 \index{translations|(} 
   693 \index{translations|(}
   689 %
   694 %
   690 This section describes the translation function mechanism.  By writing
   695 This section describes the translation function mechanism.  By writing
   691 \ML{} functions, you can do almost everything with terms or \AST{}s during
   696 \ML{} functions, you can do almost everything with terms or \AST{}s during
   692 parsing and printing.  The logic \LK\ is a good example of sophisticated
   697 parsing and printing.  The logic \LK\ is a good example of sophisticated
   693 transformations between internal and external representations of sequents;
   698 transformations between internal and external representations of sequents;
   694 here, macros would be useless. 
   699 here, macros would be useless.
   695 
   700 
   696 A full understanding of translations requires some familiarity
   701 A full understanding of translations requires some familiarity
   697 with Isabelle's internals, especially the datatypes {\tt term}, {\tt typ},
   702 with Isabelle's internals, especially the datatypes {\tt term}, {\tt typ},
   698 {\tt Syntax.ast} and the encodings of types and terms as such at the various
   703 {\tt Syntax.ast} and the encodings of types and terms as such at the various
   699 stages of the parsing or printing process.  Most users should never need to
   704 stages of the parsing or printing process.  Most users should never need to
   702 \subsection{Declaring translation functions}
   707 \subsection{Declaring translation functions}
   703 There are four kinds of translation functions.  Each such function is
   708 There are four kinds of translation functions.  Each such function is
   704 associated with a name, which triggers calls to it.  Such names can be
   709 associated with a name, which triggers calls to it.  Such names can be
   705 constants (logical or syntactic) or type constructors.
   710 constants (logical or syntactic) or type constructors.
   706 
   711 
   707 {\tt Syntax.print_syntax} displays the sets of names associated with the
   712 {\tt print_syntax} displays the sets of names associated with the
   708 translation functions of a {\tt Syntax.syntax} under
   713 translation functions of a {\tt Syntax.syntax} under
   709 \ttindex{parse_ast_translation}, \ttindex{parse_translation},
   714 \ttindex{parse_ast_translation}, \ttindex{parse_translation},
   710 \ttindex{print_translation} and \ttindex{print_ast_translation}.  You can
   715 \ttindex{print_translation} and \ttindex{print_ast_translation}.  You can
   711 add new ones via the {\tt ML} section\index{*ML section} of
   716 add new ones via the {\tt ML} section\index{*ML section} of
   712 a {\tt .thy} file.  There may never be more than one function of the same
   717 a {\tt .thy} file.  There may never be more than one function of the same
   713 kind per name.  Conceptually, the {\tt ML} section should appear between
   718 kind per name.  Even though the {\tt ML} section is the very last part of a
   714 {\tt consts} and {\tt translations}; newly installed translation functions
   719 {\tt .thy} file, newly installed translation functions are effective when
   715 are already effective when macros and logical rules are parsed.
   720 processing the preceding section.
   716 
   721 
   717 The {\tt ML} section is copied verbatim into the \ML\ file generated from a
   722 The {\tt ML} section is copied verbatim near the beginning of the \ML\ file
   718 {\tt .thy} file.  Definitions made here are accessible as components of an
   723 generated from a {\tt .thy} file.  Definitions made here are accessible as
   719 \ML\ structure; to make some definitions private, use an \ML{} {\tt local}
   724 components of an \ML\ structure; to make some definitions private, use an
   720 declaration.  The {\tt ML} section may install translation functions by
   725 \ML{} {\tt local} declaration.  The {\tt ML} section may install translation
   721 declaring any of the following identifiers:
   726 functions by declaring any of the following identifiers:
   722 \begin{ttbox}
   727 \begin{ttbox}
   723 val parse_ast_translation : (string * (ast list -> ast)) list
   728 val parse_ast_translation : (string * (ast list -> ast)) list
   724 val print_ast_translation : (string * (ast list -> ast)) list
   729 val print_ast_translation : (string * (ast list -> ast)) list
   725 val parse_translation     : (string * (term list -> term)) list
   730 val parse_translation     : (string * (term list -> term)) list
   726 val print_translation     : (string * (term list -> term)) list
   731 val print_translation     : (string * (term list -> term)) list
   740 has the form $\ttfct{Const} (c, \tau)$ or $\ttfct{Const} (c, \tau) \ttapp
   745 has the form $\ttfct{Const} (c, \tau)$ or $\ttfct{Const} (c, \tau) \ttapp
   741 x@1 \ttapp \ldots \ttapp x@n$.  Terms allow more sophisticated
   746 x@1 \ttapp \ldots \ttapp x@n$.  Terms allow more sophisticated
   742 transformations than \AST{}s do, typically involving abstractions and bound
   747 transformations than \AST{}s do, typically involving abstractions and bound
   743 variables.
   748 variables.
   744 
   749 
   745 Regardless of whether they act on terms or \AST{}s,
   750 Regardless of whether they act on terms or \AST{}s, parse translations differ
   746 parse translations differ from print translations fundamentally:
   751 from print translations in their overall behaviour fundamentally:
   747 \begin{description}
   752 \begin{description}
   748 \item[Parse translations] are applied bottom-up.  The arguments are already
   753 \item[Parse translations] are applied bottom-up.  The arguments are already
   749   in translated form.  The translations must not fail; exceptions trigger
   754   in translated form.  The translations must not fail; exceptions trigger
   750   an error message.
   755   an error message.
   751 
   756 
   784 sources to locate more examples.
   789 sources to locate more examples.
   785 
   790 
   786 Here is the parse translation for {\tt _K}:
   791 Here is the parse translation for {\tt _K}:
   787 \begin{ttbox}
   792 \begin{ttbox}
   788 fun k_tr [t] = Abs ("x", dummyT, incr_boundvars 1 t)
   793 fun k_tr [t] = Abs ("x", dummyT, incr_boundvars 1 t)
   789   | k_tr ts = raise TERM("k_tr",ts);
   794   | k_tr ts = raise TERM ("k_tr", ts);
   790 \end{ttbox}
   795 \end{ttbox}
   791 If {\tt k_tr} is called with exactly one argument~$t$, it creates a new
   796 If {\tt k_tr} is called with exactly one argument~$t$, it creates a new
   792 {\tt Abs} node with a body derived from $t$.  Since terms given to parse
   797 {\tt Abs} node with a body derived from $t$.  Since terms given to parse
   793 translations are not yet typed, the type of the bound variable in the new
   798 translations are not yet typed, the type of the bound variable in the new
   794 {\tt Abs} is simply {\tt dummyT}.  The function increments all {\tt Bound}
   799 {\tt Abs} is simply {\tt dummyT}.  The function increments all {\tt Bound}
   795 nodes referring to outer abstractions by calling \ttindex{incr_boundvars},
   800 nodes referring to outer abstractions by calling \ttindex{incr_boundvars},
   796 a basic term manipulation function defined in {\tt Pure/term.ML}.
   801 a basic term manipulation function defined in {\tt Pure/term.ML}.
   797 
   802 
   798 Here is the print translation for dependent types:
   803 Here is the print translation for dependent types:
   799 \begin{ttbox}
   804 \begin{ttbox}
   800 fun dependent_tr' (q,r) (A :: Abs (x, T, B) :: ts) =
   805 fun dependent_tr' (q, r) (A :: Abs (x, T, B) :: ts) =
   801       if 0 mem (loose_bnos B) then
   806       if 0 mem (loose_bnos B) then
   802         let val (x', B') = variant_abs (x, dummyT, B);
   807         let val (x', B') = variant_abs (x, dummyT, B);
   803         in list_comb (Const (q, dummyT) $ Free (x', T) $ A $ B', ts)
   808         in list_comb (Const (q, dummyT) $ Free (x', T) $ A $ B', ts)
   804         end
   809         end
   805       else list_comb (Const (r, dummyT) $ A $ B, ts)
   810       else list_comb (Const (r, dummyT) $ A $ B, ts)
   831         let val (x', B') = variant_abs (x, dummyT, B);
   836         let val (x', B') = variant_abs (x, dummyT, B);
   832 \end{ttbox}\index{*variant_abs}
   837 \end{ttbox}\index{*variant_abs}
   833 replaces bound variable occurrences in~$B$ by the free variable $x'$ with
   838 replaces bound variable occurrences in~$B$ by the free variable $x'$ with
   834 type {\tt dummyT}.  Only the binding occurrence of~$x'$ is given the
   839 type {\tt dummyT}.  Only the binding occurrence of~$x'$ is given the
   835 correct type~{\tt T}, so this is the only place where a type
   840 correct type~{\tt T}, so this is the only place where a type
   836 constraint might appear. 
   841 constraint might appear.
   837 \index{translations|)}
   842 \index{translations|)}
   838 \index{syntax!transformations|)}
   843 \index{syntax!transformations|)}