doc-src/IsarRef/Inner_Syntax.thy
changeset 48958 12afbf6eb7f9
parent 48816 754b09cd616f
equal deleted inserted replaced
48957:c04001b3a753 48958:12afbf6eb7f9
       
     1 theory Inner_Syntax
       
     2 imports Base Main
       
     3 begin
       
     4 
       
     5 chapter {* Inner syntax --- the term language \label{ch:inner-syntax} *}
       
     6 
       
     7 text {* The inner syntax of Isabelle provides concrete notation for
       
     8   the main entities of the logical framework, notably @{text
       
     9   "\<lambda>"}-terms with types and type classes.  Applications may either
       
    10   extend existing syntactic categories by additional notation, or
       
    11   define new sub-languages that are linked to the standard term
       
    12   language via some explicit markers.  For example @{verbatim
       
    13   FOO}~@{text "foo"} could embed the syntax corresponding for some
       
    14   user-defined nonterminal @{text "foo"} --- within the bounds of the
       
    15   given lexical syntax of Isabelle/Pure.
       
    16 
       
    17   The most basic way to specify concrete syntax for logical entities
       
    18   works via mixfix annotations (\secref{sec:mixfix}), which may be
       
    19   usually given as part of the original declaration or via explicit
       
    20   notation commands later on (\secref{sec:notation}).  This already
       
    21   covers many needs of concrete syntax without having to understand
       
    22   the full complexity of inner syntax layers.
       
    23 
       
    24   Further details of the syntax engine involves the classical
       
    25   distinction of lexical language versus context-free grammar (see
       
    26   \secref{sec:pure-syntax}), and various mechanisms for \emph{syntax
       
    27   transformations} (see \secref{sec:syntax-transformations}).
       
    28 *}
       
    29 
       
    30 
       
    31 section {* Printing logical entities *}
       
    32 
       
    33 subsection {* Diagnostic commands \label{sec:print-diag} *}
       
    34 
       
    35 text {*
       
    36   \begin{matharray}{rcl}
       
    37     @{command_def "typ"}@{text "\<^sup>*"} & : & @{text "context \<rightarrow>"} \\
       
    38     @{command_def "term"}@{text "\<^sup>*"} & : & @{text "context \<rightarrow>"} \\
       
    39     @{command_def "prop"}@{text "\<^sup>*"} & : & @{text "context \<rightarrow>"} \\
       
    40     @{command_def "thm"}@{text "\<^sup>*"} & : & @{text "context \<rightarrow>"} \\
       
    41     @{command_def "prf"}@{text "\<^sup>*"} & : & @{text "context \<rightarrow>"} \\
       
    42     @{command_def "full_prf"}@{text "\<^sup>*"} & : & @{text "context \<rightarrow>"} \\
       
    43     @{command_def "pr"}@{text "\<^sup>*"} & : & @{text "any \<rightarrow>"} \\
       
    44   \end{matharray}
       
    45 
       
    46   These diagnostic commands assist interactive development by printing
       
    47   internal logical entities in a human-readable fashion.
       
    48 
       
    49   @{rail "
       
    50     @@{command typ} @{syntax modes}? @{syntax type} ('::' @{syntax sort})?
       
    51     ;
       
    52     @@{command term} @{syntax modes}? @{syntax term}
       
    53     ;
       
    54     @@{command prop} @{syntax modes}? @{syntax prop}
       
    55     ;
       
    56     @@{command thm} @{syntax modes}? @{syntax thmrefs}
       
    57     ;
       
    58     ( @@{command prf} | @@{command full_prf} ) @{syntax modes}? @{syntax thmrefs}?
       
    59     ;
       
    60     @@{command pr} @{syntax modes}? @{syntax nat}?
       
    61     ;
       
    62 
       
    63     @{syntax_def modes}: '(' (@{syntax name} + ) ')'
       
    64   "}
       
    65 
       
    66   \begin{description}
       
    67 
       
    68   \item @{command "typ"}~@{text \<tau>} reads and prints a type expression
       
    69   according to the current context.
       
    70 
       
    71   \item @{command "typ"}~@{text "\<tau> :: s"} uses type-inference to
       
    72   determine the most general way to make @{text "\<tau>"} conform to sort
       
    73   @{text "s"}.  For concrete @{text "\<tau>"} this checks if the type
       
    74   belongs to that sort.  Dummy type parameters ``@{text "_"}''
       
    75   (underscore) are assigned to fresh type variables with most general
       
    76   sorts, according the the principles of type-inference.
       
    77 
       
    78   \item @{command "term"}~@{text t} and @{command "prop"}~@{text \<phi>}
       
    79   read, type-check and print terms or propositions according to the
       
    80   current theory or proof context; the inferred type of @{text t} is
       
    81   output as well.  Note that these commands are also useful in
       
    82   inspecting the current environment of term abbreviations.
       
    83 
       
    84   \item @{command "thm"}~@{text "a\<^sub>1 \<dots> a\<^sub>n"} retrieves
       
    85   theorems from the current theory or proof context.  Note that any
       
    86   attributes included in the theorem specifications are applied to a
       
    87   temporary context derived from the current theory or proof; the
       
    88   result is discarded, i.e.\ attributes involved in @{text "a\<^sub>1,
       
    89   \<dots>, a\<^sub>n"} do not have any permanent effect.
       
    90 
       
    91   \item @{command "prf"} displays the (compact) proof term of the
       
    92   current proof state (if present), or of the given theorems. Note
       
    93   that this requires proof terms to be switched on for the current
       
    94   object logic (see the ``Proof terms'' section of the Isabelle
       
    95   reference manual for information on how to do this).
       
    96 
       
    97   \item @{command "full_prf"} is like @{command "prf"}, but displays
       
    98   the full proof term, i.e.\ also displays information omitted in the
       
    99   compact proof term, which is denoted by ``@{text _}'' placeholders
       
   100   there.
       
   101 
       
   102   \item @{command "pr"}~@{text "goals"} prints the current proof state
       
   103   (if present), including current facts and goals.  The optional limit
       
   104   arguments affect the number of goals to be displayed, which is
       
   105   initially 10.  Omitting limit value leaves the current setting
       
   106   unchanged.
       
   107 
       
   108   \end{description}
       
   109 
       
   110   All of the diagnostic commands above admit a list of @{text modes}
       
   111   to be specified, which is appended to the current print mode; see
       
   112   also \secref{sec:print-modes}.  Thus the output behavior may be
       
   113   modified according particular print mode features.  For example,
       
   114   @{command "pr"}~@{text "(latex xsymbols)"} would print the current
       
   115   proof state with mathematical symbols and special characters
       
   116   represented in {\LaTeX} source, according to the Isabelle style
       
   117   \cite{isabelle-sys}.
       
   118 
       
   119   Note that antiquotations (cf.\ \secref{sec:antiq}) provide a more
       
   120   systematic way to include formal items into the printed text
       
   121   document.
       
   122 *}
       
   123 
       
   124 
       
   125 subsection {* Details of printed content *}
       
   126 
       
   127 text {*
       
   128   \begin{tabular}{rcll}
       
   129     @{attribute_def show_types} & : & @{text attribute} & default @{text false} \\
       
   130     @{attribute_def show_sorts} & : & @{text attribute} & default @{text false} \\
       
   131     @{attribute_def show_consts} & : & @{text attribute} & default @{text false} \\
       
   132     @{attribute_def show_abbrevs} & : & @{text attribute} & default @{text true} \\
       
   133     @{attribute_def show_brackets} & : & @{text attribute} & default @{text false} \\
       
   134     @{attribute_def names_long} & : & @{text attribute} & default @{text false} \\
       
   135     @{attribute_def names_short} & : & @{text attribute} & default @{text false} \\
       
   136     @{attribute_def names_unique} & : & @{text attribute} & default @{text true} \\
       
   137     @{attribute_def eta_contract} & : & @{text attribute} & default @{text true} \\
       
   138     @{attribute_def goals_limit} & : & @{text attribute} & default @{text 10} \\
       
   139     @{attribute_def show_main_goal} & : & @{text attribute} & default @{text false} \\
       
   140     @{attribute_def show_hyps} & : & @{text attribute} & default @{text false} \\
       
   141     @{attribute_def show_tags} & : & @{text attribute} & default @{text false} \\
       
   142     @{attribute_def show_question_marks} & : & @{text attribute} & default @{text true} \\
       
   143   \end{tabular}
       
   144   \medskip
       
   145 
       
   146   These configuration options control the detail of information that
       
   147   is displayed for types, terms, theorems, goals etc.  See also
       
   148   \secref{sec:config}.
       
   149 
       
   150   \begin{description}
       
   151 
       
   152   \item @{attribute show_types} and @{attribute show_sorts} control
       
   153   printing of type constraints for term variables, and sort
       
   154   constraints for type variables.  By default, neither of these are
       
   155   shown in output.  If @{attribute show_sorts} is enabled, types are
       
   156   always shown as well.
       
   157 
       
   158   Note that displaying types and sorts may explain why a polymorphic
       
   159   inference rule fails to resolve with some goal, or why a rewrite
       
   160   rule does not apply as expected.
       
   161 
       
   162   \item @{attribute show_consts} controls printing of types of
       
   163   constants when displaying a goal state.
       
   164 
       
   165   Note that the output can be enormous, because polymorphic constants
       
   166   often occur at several different type instances.
       
   167 
       
   168   \item @{attribute show_abbrevs} controls folding of constant
       
   169   abbreviations.
       
   170 
       
   171   \item @{attribute show_brackets} controls bracketing in pretty
       
   172   printed output.  If enabled, all sub-expressions of the pretty
       
   173   printing tree will be parenthesized, even if this produces malformed
       
   174   term syntax!  This crude way of showing the internal structure of
       
   175   pretty printed entities may occasionally help to diagnose problems
       
   176   with operator priorities, for example.
       
   177 
       
   178   \item @{attribute names_long}, @{attribute names_short}, and
       
   179   @{attribute names_unique} control the way of printing fully
       
   180   qualified internal names in external form.  See also
       
   181   \secref{sec:antiq} for the document antiquotation options of the
       
   182   same names.
       
   183 
       
   184   \item @{attribute eta_contract} controls @{text "\<eta>"}-contracted
       
   185   printing of terms.
       
   186 
       
   187   The @{text \<eta>}-contraction law asserts @{prop "(\<lambda>x. f x) \<equiv> f"},
       
   188   provided @{text x} is not free in @{text f}.  It asserts
       
   189   \emph{extensionality} of functions: @{prop "f \<equiv> g"} if @{prop "f x \<equiv>
       
   190   g x"} for all @{text x}.  Higher-order unification frequently puts
       
   191   terms into a fully @{text \<eta>}-expanded form.  For example, if @{text
       
   192   F} has type @{text "(\<tau> \<Rightarrow> \<tau>) \<Rightarrow> \<tau>"} then its expanded form is @{term
       
   193   "\<lambda>h. F (\<lambda>x. h x)"}.
       
   194 
       
   195   Enabling @{attribute eta_contract} makes Isabelle perform @{text
       
   196   \<eta>}-contractions before printing, so that @{term "\<lambda>h. F (\<lambda>x. h x)"}
       
   197   appears simply as @{text F}.
       
   198 
       
   199   Note that the distinction between a term and its @{text \<eta>}-expanded
       
   200   form occasionally matters.  While higher-order resolution and
       
   201   rewriting operate modulo @{text "\<alpha>\<beta>\<eta>"}-conversion, some other tools
       
   202   might look at terms more discretely.
       
   203 
       
   204   \item @{attribute goals_limit} controls the maximum number of
       
   205   subgoals to be shown in goal output.
       
   206 
       
   207   \item @{attribute show_main_goal} controls whether the main result
       
   208   to be proven should be displayed.  This information might be
       
   209   relevant for schematic goals, to inspect the current claim that has
       
   210   been synthesized so far.
       
   211 
       
   212   \item @{attribute show_hyps} controls printing of implicit
       
   213   hypotheses of local facts.  Normally, only those hypotheses are
       
   214   displayed that are \emph{not} covered by the assumptions of the
       
   215   current context: this situation indicates a fault in some tool being
       
   216   used.
       
   217 
       
   218   By enabling @{attribute show_hyps}, output of \emph{all} hypotheses
       
   219   can be enforced, which is occasionally useful for diagnostic
       
   220   purposes.
       
   221 
       
   222   \item @{attribute show_tags} controls printing of extra annotations
       
   223   within theorems, such as internal position information, or the case
       
   224   names being attached by the attribute @{attribute case_names}.
       
   225 
       
   226   Note that the @{attribute tagged} and @{attribute untagged}
       
   227   attributes provide low-level access to the collection of tags
       
   228   associated with a theorem.
       
   229 
       
   230   \item @{attribute show_question_marks} controls printing of question
       
   231   marks for schematic variables, such as @{text ?x}.  Only the leading
       
   232   question mark is affected, the remaining text is unchanged
       
   233   (including proper markup for schematic variables that might be
       
   234   relevant for user interfaces).
       
   235 
       
   236   \end{description}
       
   237 *}
       
   238 
       
   239 
       
   240 subsection {* Alternative print modes \label{sec:print-modes} *}
       
   241 
       
   242 text {*
       
   243   \begin{mldecls}
       
   244     @{index_ML print_mode_value: "unit -> string list"} \\
       
   245     @{index_ML Print_Mode.with_modes: "string list -> ('a -> 'b) -> 'a -> 'b"} \\
       
   246   \end{mldecls}
       
   247 
       
   248   The \emph{print mode} facility allows to modify various operations
       
   249   for printing.  Commands like @{command typ}, @{command term},
       
   250   @{command thm} (see \secref{sec:print-diag}) take additional print
       
   251   modes as optional argument.  The underlying ML operations are as
       
   252   follows.
       
   253 
       
   254   \begin{description}
       
   255 
       
   256   \item @{ML "print_mode_value ()"} yields the list of currently
       
   257   active print mode names.  This should be understood as symbolic
       
   258   representation of certain individual features for printing (with
       
   259   precedence from left to right).
       
   260 
       
   261   \item @{ML Print_Mode.with_modes}~@{text "modes f x"} evaluates
       
   262   @{text "f x"} in an execution context where the print mode is
       
   263   prepended by the given @{text "modes"}.  This provides a thread-safe
       
   264   way to augment print modes.  It is also monotonic in the set of mode
       
   265   names: it retains the default print mode that certain
       
   266   user-interfaces might have installed for their proper functioning!
       
   267 
       
   268   \end{description}
       
   269 
       
   270   \begin{warn}
       
   271   The old global reference @{ML print_mode} should never be used
       
   272   directly in applications.  Its main reason for being publicly
       
   273   accessible is to support historic versions of Proof~General.
       
   274   \end{warn}
       
   275 
       
   276   \medskip The pretty printer for inner syntax maintains alternative
       
   277   mixfix productions for any print mode name invented by the user, say
       
   278   in commands like @{command notation} or @{command abbreviation}.
       
   279   Mode names can be arbitrary, but the following ones have a specific
       
   280   meaning by convention:
       
   281 
       
   282   \begin{itemize}
       
   283 
       
   284   \item @{verbatim "\"\""} (the empty string): default mode;
       
   285   implicitly active as last element in the list of modes.
       
   286 
       
   287   \item @{verbatim input}: dummy print mode that is never active; may
       
   288   be used to specify notation that is only available for input.
       
   289 
       
   290   \item @{verbatim internal} dummy print mode that is never active;
       
   291   used internally in Isabelle/Pure.
       
   292 
       
   293   \item @{verbatim xsymbols}: enable proper mathematical symbols
       
   294   instead of ASCII art.\footnote{This traditional mode name stems from
       
   295   the ``X-Symbol'' package for old versions Proof~General with XEmacs,
       
   296   although that package has been superseded by Unicode in recent
       
   297   years.}
       
   298 
       
   299   \item @{verbatim HTML}: additional mode that is active in HTML
       
   300   presentation of Isabelle theory sources; allows to provide
       
   301   alternative output notation.
       
   302 
       
   303   \item @{verbatim latex}: additional mode that is active in {\LaTeX}
       
   304   document preparation of Isabelle theory sources; allows to provide
       
   305   alternative output notation.
       
   306 
       
   307   \end{itemize}
       
   308 *}
       
   309 
       
   310 
       
   311 subsection {* Printing limits *}
       
   312 
       
   313 text {*
       
   314   \begin{mldecls}
       
   315     @{index_ML Pretty.margin_default: "int Unsynchronized.ref"} \\
       
   316     @{index_ML print_depth: "int -> unit"} \\
       
   317   \end{mldecls}
       
   318 
       
   319   These ML functions set limits for pretty printed text.
       
   320 
       
   321   \begin{description}
       
   322 
       
   323   \item @{ML Pretty.margin_default} indicates the global default for
       
   324   the right margin of the built-in pretty printer, with initial value
       
   325   76.  Note that user-interfaces typically control margins
       
   326   automatically when resizing windows, or even bypass the formatting
       
   327   engine of Isabelle/ML altogether and do it within the front end via
       
   328   Isabelle/Scala.
       
   329 
       
   330   \item @{ML print_depth}~@{text n} limits the printing depth of the
       
   331   ML toplevel pretty printer; the precise effect depends on the ML
       
   332   compiler and run-time system.  Typically @{text n} should be less
       
   333   than 10.  Bigger values such as 100--1000 are useful for debugging.
       
   334 
       
   335   \end{description}
       
   336 *}
       
   337 
       
   338 
       
   339 section {* Mixfix annotations \label{sec:mixfix} *}
       
   340 
       
   341 text {* Mixfix annotations specify concrete \emph{inner syntax} of
       
   342   Isabelle types and terms.  Locally fixed parameters in toplevel
       
   343   theorem statements, locale and class specifications also admit
       
   344   mixfix annotations in a fairly uniform manner.  A mixfix annotation
       
   345   describes describes the concrete syntax, the translation to abstract
       
   346   syntax, and the pretty printing.  Special case annotations provide a
       
   347   simple means of specifying infix operators and binders.
       
   348 
       
   349   Isabelle mixfix syntax is inspired by {\OBJ} \cite{OBJ}.  It allows
       
   350   to specify any context-free priority grammar, which is more general
       
   351   than the fixity declarations of ML and Prolog.
       
   352 
       
   353   @{rail "
       
   354     @{syntax_def mixfix}: '(' mfix ')'
       
   355     ;
       
   356     @{syntax_def struct_mixfix}: '(' ( mfix | @'structure' ) ')'
       
   357     ;
       
   358 
       
   359     mfix: @{syntax template} prios? @{syntax nat}? |
       
   360       (@'infix' | @'infixl' | @'infixr') @{syntax template} @{syntax nat} |
       
   361       @'binder' @{syntax template} prios? @{syntax nat}
       
   362     ;
       
   363     template: string
       
   364     ;
       
   365     prios: '[' (@{syntax nat} + ',') ']'
       
   366   "}
       
   367 
       
   368   The string given as @{text template} may include literal text,
       
   369   spacing, blocks, and arguments (denoted by ``@{text _}''); the
       
   370   special symbol ``@{verbatim "\<index>"}'' (printed as ``@{text "\<index>"}'')
       
   371   represents an index argument that specifies an implicit structure
       
   372   reference (see also \secref{sec:locale}).  Infix and binder
       
   373   declarations provide common abbreviations for particular mixfix
       
   374   declarations.  So in practice, mixfix templates mostly degenerate to
       
   375   literal text for concrete syntax, such as ``@{verbatim "++"}'' for
       
   376   an infix symbol.
       
   377 *}
       
   378 
       
   379 
       
   380 subsection {* The general mixfix form *}
       
   381 
       
   382 text {* In full generality, mixfix declarations work as follows.
       
   383   Suppose a constant @{text "c :: \<tau>\<^sub>1 \<Rightarrow> \<dots> \<tau>\<^sub>n \<Rightarrow> \<tau>"} is annotated by
       
   384   @{text "(mixfix [p\<^sub>1, \<dots>, p\<^sub>n] p)"}, where @{text "mixfix"} is a string
       
   385   @{text "d\<^sub>0 _ d\<^sub>1 _ \<dots> _ d\<^sub>n"} consisting of delimiters that surround
       
   386   argument positions as indicated by underscores.
       
   387 
       
   388   Altogether this determines a production for a context-free priority
       
   389   grammar, where for each argument @{text "i"} the syntactic category
       
   390   is determined by @{text "\<tau>\<^sub>i"} (with priority @{text "p\<^sub>i"}), and the
       
   391   result category is determined from @{text "\<tau>"} (with priority @{text
       
   392   "p"}).  Priority specifications are optional, with default 0 for
       
   393   arguments and 1000 for the result.\footnote{Omitting priorities is
       
   394   prone to syntactic ambiguities unless the delimiter tokens determine
       
   395   fully bracketed notation, as in @{text "if _ then _ else _ fi"}.}
       
   396 
       
   397   Since @{text "\<tau>"} may be again a function type, the constant
       
   398   type scheme may have more argument positions than the mixfix
       
   399   pattern.  Printing a nested application @{text "c t\<^sub>1 \<dots> t\<^sub>m"} for
       
   400   @{text "m > n"} works by attaching concrete notation only to the
       
   401   innermost part, essentially by printing @{text "(c t\<^sub>1 \<dots> t\<^sub>n) \<dots> t\<^sub>m"}
       
   402   instead.  If a term has fewer arguments than specified in the mixfix
       
   403   template, the concrete syntax is ignored.
       
   404 
       
   405   \medskip A mixfix template may also contain additional directives
       
   406   for pretty printing, notably spaces, blocks, and breaks.  The
       
   407   general template format is a sequence over any of the following
       
   408   entities.
       
   409 
       
   410   \begin{description}
       
   411 
       
   412   \item @{text "d"} is a delimiter, namely a non-empty sequence of
       
   413   characters other than the following special characters:
       
   414 
       
   415   \smallskip
       
   416   \begin{tabular}{ll}
       
   417     @{verbatim "'"} & single quote \\
       
   418     @{verbatim "_"} & underscore \\
       
   419     @{text "\<index>"} & index symbol \\
       
   420     @{verbatim "("} & open parenthesis \\
       
   421     @{verbatim ")"} & close parenthesis \\
       
   422     @{verbatim "/"} & slash \\
       
   423   \end{tabular}
       
   424   \medskip
       
   425 
       
   426   \item @{verbatim "'"} escapes the special meaning of these
       
   427   meta-characters, producing a literal version of the following
       
   428   character, unless that is a blank.
       
   429 
       
   430   A single quote followed by a blank separates delimiters, without
       
   431   affecting printing, but input tokens may have additional white space
       
   432   here.
       
   433 
       
   434   \item @{verbatim "_"} is an argument position, which stands for a
       
   435   certain syntactic category in the underlying grammar.
       
   436 
       
   437   \item @{text "\<index>"} is an indexed argument position; this is the place
       
   438   where implicit structure arguments can be attached.
       
   439 
       
   440   \item @{text "s"} is a non-empty sequence of spaces for printing.
       
   441   This and the following specifications do not affect parsing at all.
       
   442 
       
   443   \item @{verbatim "("}@{text n} opens a pretty printing block.  The
       
   444   optional number specifies how much indentation to add when a line
       
   445   break occurs within the block.  If the parenthesis is not followed
       
   446   by digits, the indentation defaults to 0.  A block specified via
       
   447   @{verbatim "(00"} is unbreakable.
       
   448 
       
   449   \item @{verbatim ")"} closes a pretty printing block.
       
   450 
       
   451   \item @{verbatim "//"} forces a line break.
       
   452 
       
   453   \item @{verbatim "/"}@{text s} allows a line break.  Here @{text s}
       
   454   stands for the string of spaces (zero or more) right after the
       
   455   slash.  These spaces are printed if the break is \emph{not} taken.
       
   456 
       
   457   \end{description}
       
   458 
       
   459   The general idea of pretty printing with blocks and breaks is also
       
   460   described in \cite{paulson-ml2}; it goes back to \cite{Oppen:1980}.
       
   461 *}
       
   462 
       
   463 
       
   464 subsection {* Infixes *}
       
   465 
       
   466 text {* Infix operators are specified by convenient short forms that
       
   467   abbreviate general mixfix annotations as follows:
       
   468 
       
   469   \begin{center}
       
   470   \begin{tabular}{lll}
       
   471 
       
   472   @{verbatim "("}@{keyword_def "infix"}~@{verbatim "\""}@{text sy}@{verbatim "\""} @{text "p"}@{verbatim ")"}
       
   473   & @{text "\<mapsto>"} &
       
   474   @{verbatim "(\"(_ "}@{text sy}@{verbatim "/ _)\" ["}@{text "p + 1"}@{verbatim ", "}@{text "p + 1"}@{verbatim "]"}@{text " p"}@{verbatim ")"} \\
       
   475   @{verbatim "("}@{keyword_def "infixl"}~@{verbatim "\""}@{text sy}@{verbatim "\""} @{text "p"}@{verbatim ")"}
       
   476   & @{text "\<mapsto>"} &
       
   477   @{verbatim "(\"(_ "}@{text sy}@{verbatim "/ _)\" ["}@{text "p"}@{verbatim ", "}@{text "p + 1"}@{verbatim "]"}@{text " p"}@{verbatim ")"} \\
       
   478   @{verbatim "("}@{keyword_def "infixr"}~@{verbatim "\""}@{text sy}@{verbatim "\""} @{text "p"}@{verbatim ")"}
       
   479   & @{text "\<mapsto>"} &
       
   480   @{verbatim "(\"(_ "}@{text sy}@{verbatim "/ _)\" ["}@{text "p + 1"}@{verbatim ", "}@{text "p"}@{verbatim "]"}@{text " p"}@{verbatim ")"} \\
       
   481 
       
   482   \end{tabular}
       
   483   \end{center}
       
   484 
       
   485   The mixfix template @{verbatim "\"(_ "}@{text sy}@{verbatim "/ _)\""}
       
   486   specifies two argument positions; the delimiter is preceded by a
       
   487   space and followed by a space or line break; the entire phrase is a
       
   488   pretty printing block.
       
   489 
       
   490   The alternative notation @{verbatim "op"}~@{text sy} is introduced
       
   491   in addition.  Thus any infix operator may be written in prefix form
       
   492   (as in ML), independently of the number of arguments in the term.
       
   493 *}
       
   494 
       
   495 
       
   496 subsection {* Binders *}
       
   497 
       
   498 text {* A \emph{binder} is a variable-binding construct such as a
       
   499   quantifier.  The idea to formalize @{text "\<forall>x. b"} as @{text "All
       
   500   (\<lambda>x. b)"} for @{text "All :: ('a \<Rightarrow> bool) \<Rightarrow> bool"} already goes back
       
   501   to \cite{church40}.  Isabelle declarations of certain higher-order
       
   502   operators may be annotated with @{keyword_def "binder"} annotations
       
   503   as follows:
       
   504 
       
   505   \begin{center}
       
   506   @{text "c :: "}@{verbatim "\""}@{text "(\<tau>\<^sub>1 \<Rightarrow> \<tau>\<^sub>2) \<Rightarrow> \<tau>\<^sub>3"}@{verbatim "\"  ("}@{keyword "binder"}@{verbatim " \""}@{text "sy"}@{verbatim "\" ["}@{text "p"}@{verbatim "] "}@{text "q"}@{verbatim ")"}
       
   507   \end{center}
       
   508 
       
   509   This introduces concrete binder syntax @{text "sy x. b"}, where
       
   510   @{text x} is a bound variable of type @{text "\<tau>\<^sub>1"}, the body @{text
       
   511   b} has type @{text "\<tau>\<^sub>2"} and the whole term has type @{text "\<tau>\<^sub>3"}.
       
   512   The optional integer @{text p} specifies the syntactic priority of
       
   513   the body; the default is @{text "q"}, which is also the priority of
       
   514   the whole construct.
       
   515 
       
   516   Internally, the binder syntax is expanded to something like this:
       
   517   \begin{center}
       
   518   @{text "c_binder :: "}@{verbatim "\""}@{text "idts \<Rightarrow> \<tau>\<^sub>2 \<Rightarrow> \<tau>\<^sub>3"}@{verbatim "\"  (\"(3"}@{text sy}@{verbatim "_./ _)\" [0, "}@{text "p"}@{verbatim "] "}@{text "q"}@{verbatim ")"}
       
   519   \end{center}
       
   520 
       
   521   Here @{syntax (inner) idts} is the nonterminal symbol for a list of
       
   522   identifiers with optional type constraints (see also
       
   523   \secref{sec:pure-grammar}).  The mixfix template @{verbatim
       
   524   "\"(3"}@{text sy}@{verbatim "_./ _)\""} defines argument positions
       
   525   for the bound identifiers and the body, separated by a dot with
       
   526   optional line break; the entire phrase is a pretty printing block of
       
   527   indentation level 3.  Note that there is no extra space after @{text
       
   528   "sy"}, so it needs to be included user specification if the binder
       
   529   syntax ends with a token that may be continued by an identifier
       
   530   token at the start of @{syntax (inner) idts}.
       
   531 
       
   532   Furthermore, a syntax translation to transforms @{text "c_binder x\<^sub>1
       
   533   \<dots> x\<^sub>n b"} into iterated application @{text "c (\<lambda>x\<^sub>1. \<dots> c (\<lambda>x\<^sub>n. b)\<dots>)"}.
       
   534   This works in both directions, for parsing and printing.  *}
       
   535 
       
   536 
       
   537 section {* Explicit notation \label{sec:notation} *}
       
   538 
       
   539 text {*
       
   540   \begin{matharray}{rcll}
       
   541     @{command_def "type_notation"} & : & @{text "local_theory \<rightarrow> local_theory"} \\
       
   542     @{command_def "no_type_notation"} & : & @{text "local_theory \<rightarrow> local_theory"} \\
       
   543     @{command_def "notation"} & : & @{text "local_theory \<rightarrow> local_theory"} \\
       
   544     @{command_def "no_notation"} & : & @{text "local_theory \<rightarrow> local_theory"} \\
       
   545     @{command_def "write"} & : & @{text "proof(state) \<rightarrow> proof(state)"} \\
       
   546   \end{matharray}
       
   547 
       
   548   Commands that introduce new logical entities (terms or types)
       
   549   usually allow to provide mixfix annotations on the spot, which is
       
   550   convenient for default notation.  Nonetheless, the syntax may be
       
   551   modified later on by declarations for explicit notation.  This
       
   552   allows to add or delete mixfix annotations for of existing logical
       
   553   entities within the current context.
       
   554 
       
   555   @{rail "
       
   556     (@@{command type_notation} | @@{command no_type_notation}) @{syntax target}?
       
   557       @{syntax mode}? \\ (@{syntax nameref} @{syntax mixfix} + @'and')
       
   558     ;
       
   559     (@@{command notation} | @@{command no_notation}) @{syntax target}? @{syntax mode}? \\
       
   560       (@{syntax nameref} @{syntax struct_mixfix} + @'and')
       
   561     ;
       
   562     @@{command write} @{syntax mode}? (@{syntax nameref} @{syntax struct_mixfix} + @'and')
       
   563   "}
       
   564 
       
   565   \begin{description}
       
   566 
       
   567   \item @{command "type_notation"}~@{text "c (mx)"} associates mixfix
       
   568   syntax with an existing type constructor.  The arity of the
       
   569   constructor is retrieved from the context.
       
   570 
       
   571   \item @{command "no_type_notation"} is similar to @{command
       
   572   "type_notation"}, but removes the specified syntax annotation from
       
   573   the present context.
       
   574 
       
   575   \item @{command "notation"}~@{text "c (mx)"} associates mixfix
       
   576   syntax with an existing constant or fixed variable.  The type
       
   577   declaration of the given entity is retrieved from the context.
       
   578 
       
   579   \item @{command "no_notation"} is similar to @{command "notation"},
       
   580   but removes the specified syntax annotation from the present
       
   581   context.
       
   582 
       
   583   \item @{command "write"} is similar to @{command "notation"}, but
       
   584   works within an Isar proof body.
       
   585 
       
   586   \end{description}
       
   587 *}
       
   588 
       
   589 
       
   590 section {* The Pure syntax \label{sec:pure-syntax} *}
       
   591 
       
   592 subsection {* Lexical matters \label{sec:inner-lex} *}
       
   593 
       
   594 text {* The inner lexical syntax vaguely resembles the outer one
       
   595   (\secref{sec:outer-lex}), but some details are different.  There are
       
   596   two main categories of inner syntax tokens:
       
   597 
       
   598   \begin{enumerate}
       
   599 
       
   600   \item \emph{delimiters} --- the literal tokens occurring in
       
   601   productions of the given priority grammar (cf.\
       
   602   \secref{sec:priority-grammar});
       
   603 
       
   604   \item \emph{named tokens} --- various categories of identifiers etc.
       
   605 
       
   606   \end{enumerate}
       
   607 
       
   608   Delimiters override named tokens and may thus render certain
       
   609   identifiers inaccessible.  Sometimes the logical context admits
       
   610   alternative ways to refer to the same entity, potentially via
       
   611   qualified names.
       
   612 
       
   613   \medskip The categories for named tokens are defined once and for
       
   614   all as follows, reusing some categories of the outer token syntax
       
   615   (\secref{sec:outer-lex}).
       
   616 
       
   617   \begin{center}
       
   618   \begin{supertabular}{rcl}
       
   619     @{syntax_def (inner) id} & = & @{syntax_ref ident} \\
       
   620     @{syntax_def (inner) longid} & = & @{syntax_ref longident} \\
       
   621     @{syntax_def (inner) var} & = & @{syntax_ref var} \\
       
   622     @{syntax_def (inner) tid} & = & @{syntax_ref typefree} \\
       
   623     @{syntax_def (inner) tvar} & = & @{syntax_ref typevar} \\
       
   624     @{syntax_def (inner) num_token} & = & @{syntax_ref nat}@{text "  |  "}@{verbatim "-"}@{syntax_ref nat} \\
       
   625     @{syntax_def (inner) float_token} & = & @{syntax_ref nat}@{verbatim "."}@{syntax_ref nat}@{text "  |  "}@{verbatim "-"}@{syntax_ref nat}@{verbatim "."}@{syntax_ref nat} \\
       
   626     @{syntax_def (inner) xnum_token} & = & @{verbatim "#"}@{syntax_ref nat}@{text "  |  "}@{verbatim "#-"}@{syntax_ref nat} \\
       
   627 
       
   628     @{syntax_def (inner) str_token} & = & @{verbatim "''"} @{text "\<dots>"} @{verbatim "''"} \\
       
   629   \end{supertabular}
       
   630   \end{center}
       
   631 
       
   632   The token categories @{syntax (inner) num_token}, @{syntax (inner)
       
   633   float_token}, @{syntax (inner) xnum_token}, and @{syntax (inner)
       
   634   str_token} are not used in Pure.  Object-logics may implement numerals
       
   635   and string constants by adding appropriate syntax declarations,
       
   636   together with some translation functions (e.g.\ see Isabelle/HOL).
       
   637 
       
   638   The derived categories @{syntax_def (inner) num_const}, @{syntax_def
       
   639   (inner) float_const}, and @{syntax_def (inner) num_const} provide
       
   640   robust access to the respective tokens: the syntax tree holds a
       
   641   syntactic constant instead of a free variable.
       
   642 *}
       
   643 
       
   644 
       
   645 subsection {* Priority grammars \label{sec:priority-grammar} *}
       
   646 
       
   647 text {* A context-free grammar consists of a set of \emph{terminal
       
   648   symbols}, a set of \emph{nonterminal symbols} and a set of
       
   649   \emph{productions}.  Productions have the form @{text "A = \<gamma>"},
       
   650   where @{text A} is a nonterminal and @{text \<gamma>} is a string of
       
   651   terminals and nonterminals.  One designated nonterminal is called
       
   652   the \emph{root symbol}.  The language defined by the grammar
       
   653   consists of all strings of terminals that can be derived from the
       
   654   root symbol by applying productions as rewrite rules.
       
   655 
       
   656   The standard Isabelle parser for inner syntax uses a \emph{priority
       
   657   grammar}.  Each nonterminal is decorated by an integer priority:
       
   658   @{text "A\<^sup>(\<^sup>p\<^sup>)"}.  In a derivation, @{text "A\<^sup>(\<^sup>p\<^sup>)"} may be rewritten
       
   659   using a production @{text "A\<^sup>(\<^sup>q\<^sup>) = \<gamma>"} only if @{text "p \<le> q"}.  Any
       
   660   priority grammar can be translated into a normal context-free
       
   661   grammar by introducing new nonterminals and productions.
       
   662 
       
   663   \medskip Formally, a set of context free productions @{text G}
       
   664   induces a derivation relation @{text "\<longrightarrow>\<^sub>G"} as follows.  Let @{text
       
   665   \<alpha>} and @{text \<beta>} denote strings of terminal or nonterminal symbols.
       
   666   Then @{text "\<alpha> A\<^sup>(\<^sup>p\<^sup>) \<beta> \<longrightarrow>\<^sub>G \<alpha> \<gamma> \<beta>"} holds if and only if @{text G}
       
   667   contains some production @{text "A\<^sup>(\<^sup>q\<^sup>) = \<gamma>"} for @{text "p \<le> q"}.
       
   668 
       
   669   \medskip The following grammar for arithmetic expressions
       
   670   demonstrates how binding power and associativity of operators can be
       
   671   enforced by priorities.
       
   672 
       
   673   \begin{center}
       
   674   \begin{tabular}{rclr}
       
   675   @{text "A\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>)"} & @{text "="} & @{verbatim "("} @{text "A\<^sup>(\<^sup>0\<^sup>)"} @{verbatim ")"} \\
       
   676   @{text "A\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>)"} & @{text "="} & @{verbatim 0} \\
       
   677   @{text "A\<^sup>(\<^sup>0\<^sup>)"} & @{text "="} & @{text "A\<^sup>(\<^sup>0\<^sup>)"} @{verbatim "+"} @{text "A\<^sup>(\<^sup>1\<^sup>)"} \\
       
   678   @{text "A\<^sup>(\<^sup>2\<^sup>)"} & @{text "="} & @{text "A\<^sup>(\<^sup>3\<^sup>)"} @{verbatim "*"} @{text "A\<^sup>(\<^sup>2\<^sup>)"} \\
       
   679   @{text "A\<^sup>(\<^sup>3\<^sup>)"} & @{text "="} & @{verbatim "-"} @{text "A\<^sup>(\<^sup>3\<^sup>)"} \\
       
   680   \end{tabular}
       
   681   \end{center}
       
   682   The choice of priorities determines that @{verbatim "-"} binds
       
   683   tighter than @{verbatim "*"}, which binds tighter than @{verbatim
       
   684   "+"}.  Furthermore @{verbatim "+"} associates to the left and
       
   685   @{verbatim "*"} to the right.
       
   686 
       
   687   \medskip For clarity, grammars obey these conventions:
       
   688   \begin{itemize}
       
   689 
       
   690   \item All priorities must lie between 0 and 1000.
       
   691 
       
   692   \item Priority 0 on the right-hand side and priority 1000 on the
       
   693   left-hand side may be omitted.
       
   694 
       
   695   \item The production @{text "A\<^sup>(\<^sup>p\<^sup>) = \<alpha>"} is written as @{text "A = \<alpha>
       
   696   (p)"}, i.e.\ the priority of the left-hand side actually appears in
       
   697   a column on the far right.
       
   698 
       
   699   \item Alternatives are separated by @{text "|"}.
       
   700 
       
   701   \item Repetition is indicated by dots @{text "(\<dots>)"} in an informal
       
   702   but obvious way.
       
   703 
       
   704   \end{itemize}
       
   705 
       
   706   Using these conventions, the example grammar specification above
       
   707   takes the form:
       
   708   \begin{center}
       
   709   \begin{tabular}{rclc}
       
   710     @{text A} & @{text "="} & @{verbatim "("} @{text A} @{verbatim ")"} \\
       
   711               & @{text "|"} & @{verbatim 0} & \qquad\qquad \\
       
   712               & @{text "|"} & @{text A} @{verbatim "+"} @{text "A\<^sup>(\<^sup>1\<^sup>)"} & @{text "(0)"} \\
       
   713               & @{text "|"} & @{text "A\<^sup>(\<^sup>3\<^sup>)"} @{verbatim "*"} @{text "A\<^sup>(\<^sup>2\<^sup>)"} & @{text "(2)"} \\
       
   714               & @{text "|"} & @{verbatim "-"} @{text "A\<^sup>(\<^sup>3\<^sup>)"} & @{text "(3)"} \\
       
   715   \end{tabular}
       
   716   \end{center}
       
   717 *}
       
   718 
       
   719 
       
   720 subsection {* The Pure grammar \label{sec:pure-grammar} *}
       
   721 
       
   722 text {* The priority grammar of the @{text "Pure"} theory is defined
       
   723   approximately like this:
       
   724 
       
   725   \begin{center}
       
   726   \begin{supertabular}{rclr}
       
   727 
       
   728   @{syntax_def (inner) any} & = & @{text "prop  |  logic"} \\\\
       
   729 
       
   730   @{syntax_def (inner) prop} & = & @{verbatim "("} @{text prop} @{verbatim ")"} \\
       
   731     & @{text "|"} & @{text "prop\<^sup>(\<^sup>4\<^sup>)"} @{verbatim "::"} @{text type} & @{text "(3)"} \\
       
   732     & @{text "|"} & @{text "any\<^sup>(\<^sup>3\<^sup>)"} @{verbatim "=="} @{text "any\<^sup>(\<^sup>2\<^sup>)"} & @{text "(2)"} \\
       
   733     & @{text "|"} & @{text "any\<^sup>(\<^sup>3\<^sup>)"} @{text "\<equiv>"} @{text "any\<^sup>(\<^sup>2\<^sup>)"} & @{text "(2)"} \\
       
   734     & @{text "|"} & @{text "prop\<^sup>(\<^sup>3\<^sup>)"} @{verbatim "&&&"} @{text "prop\<^sup>(\<^sup>2\<^sup>)"} & @{text "(2)"} \\
       
   735     & @{text "|"} & @{text "prop\<^sup>(\<^sup>2\<^sup>)"} @{verbatim "==>"} @{text "prop\<^sup>(\<^sup>1\<^sup>)"} & @{text "(1)"} \\
       
   736     & @{text "|"} & @{text "prop\<^sup>(\<^sup>2\<^sup>)"} @{text "\<Longrightarrow>"} @{text "prop\<^sup>(\<^sup>1\<^sup>)"} & @{text "(1)"} \\
       
   737     & @{text "|"} & @{verbatim "[|"} @{text prop} @{verbatim ";"} @{text "\<dots>"} @{verbatim ";"} @{text prop} @{verbatim "|]"} @{verbatim "==>"} @{text "prop\<^sup>(\<^sup>1\<^sup>)"} & @{text "(1)"} \\
       
   738     & @{text "|"} & @{text "\<lbrakk>"} @{text prop} @{verbatim ";"} @{text "\<dots>"} @{verbatim ";"} @{text prop} @{text "\<rbrakk>"} @{text "\<Longrightarrow>"} @{text "prop\<^sup>(\<^sup>1\<^sup>)"} & @{text "(1)"} \\
       
   739     & @{text "|"} & @{verbatim "!!"} @{text idts} @{verbatim "."} @{text prop} & @{text "(0)"} \\
       
   740     & @{text "|"} & @{text "\<And>"} @{text idts} @{verbatim "."} @{text prop} & @{text "(0)"} \\
       
   741     & @{text "|"} & @{verbatim OFCLASS} @{verbatim "("} @{text type} @{verbatim ","} @{text logic} @{verbatim ")"} \\
       
   742     & @{text "|"} & @{verbatim SORT_CONSTRAINT} @{verbatim "("} @{text type} @{verbatim ")"} \\
       
   743     & @{text "|"} & @{verbatim TERM} @{text logic} \\
       
   744     & @{text "|"} & @{verbatim PROP} @{text aprop} \\\\
       
   745 
       
   746   @{syntax_def (inner) aprop} & = & @{verbatim "("} @{text aprop} @{verbatim ")"} \\
       
   747     & @{text "|"} & @{text "id  |  longid  |  var  |  "}@{verbatim "_"}@{text "  |  "}@{verbatim "..."} \\
       
   748     & @{text "|"} & @{verbatim CONST} @{text "id  |  "}@{verbatim CONST} @{text "longid"} \\
       
   749     & @{text "|"} & @{verbatim XCONST} @{text "id  |  "}@{verbatim XCONST} @{text "longid"} \\
       
   750     & @{text "|"} & @{text "logic\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>)  any\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>) \<dots> any\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>)"} & @{text "(999)"} \\\\
       
   751 
       
   752   @{syntax_def (inner) logic} & = & @{verbatim "("} @{text logic} @{verbatim ")"} \\
       
   753     & @{text "|"} & @{text "logic\<^sup>(\<^sup>4\<^sup>)"} @{verbatim "::"} @{text type} & @{text "(3)"} \\
       
   754     & @{text "|"} & @{text "id  |  longid  |  var  |  "}@{verbatim "_"}@{text "  |  "}@{verbatim "..."} \\
       
   755     & @{text "|"} & @{verbatim CONST} @{text "id  |  "}@{verbatim CONST} @{text "longid"} \\
       
   756     & @{text "|"} & @{verbatim XCONST} @{text "id  |  "}@{verbatim XCONST} @{text "longid"} \\
       
   757     & @{text "|"} & @{text "logic\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>)  any\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>) \<dots> any\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>)"} & @{text "(999)"} \\
       
   758     & @{text "|"} & @{text "\<struct> index\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>)"} \\
       
   759     & @{text "|"} & @{verbatim "%"} @{text pttrns} @{verbatim "."} @{text "any\<^sup>(\<^sup>3\<^sup>)"} & @{text "(3)"} \\
       
   760     & @{text "|"} & @{text \<lambda>} @{text pttrns} @{verbatim "."} @{text "any\<^sup>(\<^sup>3\<^sup>)"} & @{text "(3)"} \\
       
   761     & @{text "|"} & @{verbatim op} @{verbatim "=="}@{text "  |  "}@{verbatim op} @{text "\<equiv>"}@{text "  |  "}@{verbatim op} @{verbatim "&&&"} \\
       
   762     & @{text "|"} & @{verbatim op} @{verbatim "==>"}@{text "  |  "}@{verbatim op} @{text "\<Longrightarrow>"} \\
       
   763     & @{text "|"} & @{verbatim TYPE} @{verbatim "("} @{text type} @{verbatim ")"} \\\\
       
   764 
       
   765   @{syntax_def (inner) idt} & = & @{verbatim "("} @{text idt} @{verbatim ")"}@{text "  |  id  |  "}@{verbatim "_"} \\
       
   766     & @{text "|"} & @{text id} @{verbatim "::"} @{text type} & @{text "(0)"} \\
       
   767     & @{text "|"} & @{verbatim "_"} @{verbatim "::"} @{text type} & @{text "(0)"} \\\\
       
   768 
       
   769   @{syntax_def (inner) index} & = & @{verbatim "\<^bsub>"} @{text "logic\<^sup>(\<^sup>0\<^sup>)"} @{verbatim "\<^esub>"}@{text "  |  |  \<index>"} \\\\
       
   770 
       
   771   @{syntax_def (inner) idts} & = & @{text "idt  |  idt\<^sup>(\<^sup>1\<^sup>) idts"} & @{text "(0)"} \\\\
       
   772 
       
   773   @{syntax_def (inner) pttrn} & = & @{text idt} \\\\
       
   774 
       
   775   @{syntax_def (inner) pttrns} & = & @{text "pttrn  |  pttrn\<^sup>(\<^sup>1\<^sup>) pttrns"} & @{text "(0)"} \\\\
       
   776 
       
   777   @{syntax_def (inner) type} & = & @{verbatim "("} @{text type} @{verbatim ")"} \\
       
   778     & @{text "|"} & @{text "tid  |  tvar  |  "}@{verbatim "_"} \\
       
   779     & @{text "|"} & @{text "tid"} @{verbatim "::"} @{text "sort  |  tvar  "}@{verbatim "::"} @{text "sort  |  "}@{verbatim "_"} @{verbatim "::"} @{text "sort"} \\
       
   780     & @{text "|"} & @{text "type_name  |  type\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>) type_name"} \\
       
   781     & @{text "|"} & @{verbatim "("} @{text type} @{verbatim ","} @{text "\<dots>"} @{verbatim ","} @{text type} @{verbatim ")"} @{text type_name} \\
       
   782     & @{text "|"} & @{text "type\<^sup>(\<^sup>1\<^sup>)"} @{verbatim "=>"} @{text type} & @{text "(0)"} \\
       
   783     & @{text "|"} & @{text "type\<^sup>(\<^sup>1\<^sup>)"} @{text "\<Rightarrow>"} @{text type} & @{text "(0)"} \\
       
   784     & @{text "|"} & @{verbatim "["} @{text type} @{verbatim ","} @{text "\<dots>"} @{verbatim ","} @{text type} @{verbatim "]"} @{verbatim "=>"} @{text type} & @{text "(0)"} \\
       
   785     & @{text "|"} & @{verbatim "["} @{text type} @{verbatim ","} @{text "\<dots>"} @{verbatim ","} @{text type} @{verbatim "]"} @{text "\<Rightarrow>"} @{text type} & @{text "(0)"} \\
       
   786   @{syntax_def (inner) type_name} & = & @{text "id  |  longid"} \\\\
       
   787 
       
   788   @{syntax_def (inner) sort} & = & @{syntax class_name}~@{text "  |  "}@{verbatim "{}"} \\
       
   789     & @{text "|"} & @{verbatim "{"} @{syntax class_name} @{verbatim ","} @{text "\<dots>"} @{verbatim ","} @{syntax class_name} @{verbatim "}"} \\
       
   790   @{syntax_def (inner) class_name} & = & @{text "id  |  longid"} \\
       
   791   \end{supertabular}
       
   792   \end{center}
       
   793 
       
   794   \medskip Here literal terminals are printed @{verbatim "verbatim"};
       
   795   see also \secref{sec:inner-lex} for further token categories of the
       
   796   inner syntax.  The meaning of the nonterminals defined by the above
       
   797   grammar is as follows:
       
   798 
       
   799   \begin{description}
       
   800 
       
   801   \item @{syntax_ref (inner) any} denotes any term.
       
   802 
       
   803   \item @{syntax_ref (inner) prop} denotes meta-level propositions,
       
   804   which are terms of type @{typ prop}.  The syntax of such formulae of
       
   805   the meta-logic is carefully distinguished from usual conventions for
       
   806   object-logics.  In particular, plain @{text "\<lambda>"}-term notation is
       
   807   \emph{not} recognized as @{syntax (inner) prop}.
       
   808 
       
   809   \item @{syntax_ref (inner) aprop} denotes atomic propositions, which
       
   810   are embedded into regular @{syntax (inner) prop} by means of an
       
   811   explicit @{verbatim PROP} token.
       
   812 
       
   813   Terms of type @{typ prop} with non-constant head, e.g.\ a plain
       
   814   variable, are printed in this form.  Constants that yield type @{typ
       
   815   prop} are expected to provide their own concrete syntax; otherwise
       
   816   the printed version will appear like @{syntax (inner) logic} and
       
   817   cannot be parsed again as @{syntax (inner) prop}.
       
   818 
       
   819   \item @{syntax_ref (inner) logic} denotes arbitrary terms of a
       
   820   logical type, excluding type @{typ prop}.  This is the main
       
   821   syntactic category of object-logic entities, covering plain @{text
       
   822   \<lambda>}-term notation (variables, abstraction, application), plus
       
   823   anything defined by the user.
       
   824 
       
   825   When specifying notation for logical entities, all logical types
       
   826   (excluding @{typ prop}) are \emph{collapsed} to this single category
       
   827   of @{syntax (inner) logic}.
       
   828 
       
   829   \item @{syntax_ref (inner) index} denotes an optional index term for
       
   830   indexed syntax.  If omitted, it refers to the first @{keyword
       
   831   "structure"} variable in the context.  The special dummy ``@{text
       
   832   "\<index>"}'' serves as pattern variable in mixfix annotations that
       
   833   introduce indexed notation.
       
   834 
       
   835   \item @{syntax_ref (inner) idt} denotes identifiers, possibly
       
   836   constrained by types.
       
   837 
       
   838   \item @{syntax_ref (inner) idts} denotes a sequence of @{syntax_ref
       
   839   (inner) idt}.  This is the most basic category for variables in
       
   840   iterated binders, such as @{text "\<lambda>"} or @{text "\<And>"}.
       
   841 
       
   842   \item @{syntax_ref (inner) pttrn} and @{syntax_ref (inner) pttrns}
       
   843   denote patterns for abstraction, cases bindings etc.  In Pure, these
       
   844   categories start as a merely copy of @{syntax (inner) idt} and
       
   845   @{syntax (inner) idts}, respectively.  Object-logics may add
       
   846   additional productions for binding forms.
       
   847 
       
   848   \item @{syntax_ref (inner) type} denotes types of the meta-logic.
       
   849 
       
   850   \item @{syntax_ref (inner) sort} denotes meta-level sorts.
       
   851 
       
   852   \end{description}
       
   853 
       
   854   Here are some further explanations of certain syntax features.
       
   855 
       
   856   \begin{itemize}
       
   857 
       
   858   \item In @{syntax (inner) idts}, note that @{text "x :: nat y"} is
       
   859   parsed as @{text "x :: (nat y)"}, treating @{text y} like a type
       
   860   constructor applied to @{text nat}.  To avoid this interpretation,
       
   861   write @{text "(x :: nat) y"} with explicit parentheses.
       
   862 
       
   863   \item Similarly, @{text "x :: nat y :: nat"} is parsed as @{text "x ::
       
   864   (nat y :: nat)"}.  The correct form is @{text "(x :: nat) (y ::
       
   865   nat)"}, or @{text "(x :: nat) y :: nat"} if @{text y} is last in the
       
   866   sequence of identifiers.
       
   867 
       
   868   \item Type constraints for terms bind very weakly.  For example,
       
   869   @{text "x < y :: nat"} is normally parsed as @{text "(x < y) ::
       
   870   nat"}, unless @{text "<"} has a very low priority, in which case the
       
   871   input is likely to be ambiguous.  The correct form is @{text "x < (y
       
   872   :: nat)"}.
       
   873 
       
   874   \item Constraints may be either written with two literal colons
       
   875   ``@{verbatim "::"}'' or the double-colon symbol @{verbatim "\<Colon>"},
       
   876   which actually looks exactly the same in some {\LaTeX} styles.
       
   877 
       
   878   \item Dummy variables (written as underscore) may occur in different
       
   879   roles.
       
   880 
       
   881   \begin{description}
       
   882 
       
   883   \item A type ``@{text "_"}'' or ``@{text "_ :: sort"}'' acts like an
       
   884   anonymous inference parameter, which is filled-in according to the
       
   885   most general type produced by the type-checking phase.
       
   886 
       
   887   \item A bound ``@{text "_"}'' refers to a vacuous abstraction, where
       
   888   the body does not refer to the binding introduced here.  As in the
       
   889   term @{term "\<lambda>x _. x"}, which is @{text "\<alpha>"}-equivalent to @{text
       
   890   "\<lambda>x y. x"}.
       
   891 
       
   892   \item A free ``@{text "_"}'' refers to an implicit outer binding.
       
   893   Higher definitional packages usually allow forms like @{text "f x _
       
   894   = x"}.
       
   895 
       
   896   \item A schematic ``@{text "_"}'' (within a term pattern, see
       
   897   \secref{sec:term-decls}) refers to an anonymous variable that is
       
   898   implicitly abstracted over its context of locally bound variables.
       
   899   For example, this allows pattern matching of @{text "{x. f x = g
       
   900   x}"} against @{text "{x. _ = _}"}, or even @{text "{_. _ = _}"} by
       
   901   using both bound and schematic dummies.
       
   902 
       
   903   \end{description}
       
   904 
       
   905   \item The three literal dots ``@{verbatim "..."}'' may be also
       
   906   written as ellipsis symbol @{verbatim "\<dots>"}.  In both cases this
       
   907   refers to a special schematic variable, which is bound in the
       
   908   context.  This special term abbreviation works nicely with
       
   909   calculational reasoning (\secref{sec:calculation}).
       
   910 
       
   911   \item @{verbatim CONST} ensures that the given identifier is treated
       
   912   as constant term, and passed through the parse tree in fully
       
   913   internalized form.  This is particularly relevant for translation
       
   914   rules (\secref{sec:syn-trans}), notably on the RHS.
       
   915 
       
   916   \item @{verbatim XCONST} is similar to @{verbatim CONST}, but
       
   917   retains the constant name as given.  This is only relevant to
       
   918   translation rules (\secref{sec:syn-trans}), notably on the LHS.
       
   919 
       
   920   \end{itemize}
       
   921 *}
       
   922 
       
   923 
       
   924 subsection {* Inspecting the syntax *}
       
   925 
       
   926 text {*
       
   927   \begin{matharray}{rcl}
       
   928     @{command_def "print_syntax"}@{text "\<^sup>*"} & : & @{text "context \<rightarrow>"} \\
       
   929   \end{matharray}
       
   930 
       
   931   \begin{description}
       
   932 
       
   933   \item @{command "print_syntax"} prints the inner syntax of the
       
   934   current context.  The output can be quite large; the most important
       
   935   sections are explained below.
       
   936 
       
   937   \begin{description}
       
   938 
       
   939   \item @{text "lexicon"} lists the delimiters of the inner token
       
   940   language; see \secref{sec:inner-lex}.
       
   941 
       
   942   \item @{text "prods"} lists the productions of the underlying
       
   943   priority grammar; see \secref{sec:priority-grammar}.
       
   944 
       
   945   The nonterminal @{text "A\<^sup>(\<^sup>p\<^sup>)"} is rendered in plain text as @{text
       
   946   "A[p]"}; delimiters are quoted.  Many productions have an extra
       
   947   @{text "\<dots> => name"}.  These names later become the heads of parse
       
   948   trees; they also guide the pretty printer.
       
   949 
       
   950   Productions without such parse tree names are called \emph{copy
       
   951   productions}.  Their right-hand side must have exactly one
       
   952   nonterminal symbol (or named token).  The parser does not create a
       
   953   new parse tree node for copy productions, but simply returns the
       
   954   parse tree of the right-hand symbol.
       
   955 
       
   956   If the right-hand side of a copy production consists of a single
       
   957   nonterminal without any delimiters, then it is called a \emph{chain
       
   958   production}.  Chain productions act as abbreviations: conceptually,
       
   959   they are removed from the grammar by adding new productions.
       
   960   Priority information attached to chain productions is ignored; only
       
   961   the dummy value @{text "-1"} is displayed.
       
   962 
       
   963   \item @{text "print modes"} lists the alternative print modes
       
   964   provided by this grammar; see \secref{sec:print-modes}.
       
   965 
       
   966   \item @{text "parse_rules"} and @{text "print_rules"} relate to
       
   967   syntax translations (macros); see \secref{sec:syn-trans}.
       
   968 
       
   969   \item @{text "parse_ast_translation"} and @{text
       
   970   "print_ast_translation"} list sets of constants that invoke
       
   971   translation functions for abstract syntax trees, which are only
       
   972   required in very special situations; see \secref{sec:tr-funs}.
       
   973 
       
   974   \item @{text "parse_translation"} and @{text "print_translation"}
       
   975   list the sets of constants that invoke regular translation
       
   976   functions; see \secref{sec:tr-funs}.
       
   977 
       
   978   \end{description}
       
   979 
       
   980   \end{description}
       
   981 *}
       
   982 
       
   983 
       
   984 subsection {* Ambiguity of parsed expressions *}
       
   985 
       
   986 text {*
       
   987   \begin{tabular}{rcll}
       
   988     @{attribute_def syntax_ambiguity_warning} & : & @{text attribute} & default @{text true} \\
       
   989     @{attribute_def syntax_ambiguity_limit} & : & @{text attribute} & default @{text 10} \\
       
   990   \end{tabular}
       
   991 
       
   992   Depending on the grammar and the given input, parsing may be
       
   993   ambiguous.  Isabelle lets the Earley parser enumerate all possible
       
   994   parse trees, and then tries to make the best out of the situation.
       
   995   Terms that cannot be type-checked are filtered out, which often
       
   996   leads to a unique result in the end.  Unlike regular type
       
   997   reconstruction, which is applied to the whole collection of input
       
   998   terms simultaneously, the filtering stage only treats each given
       
   999   term in isolation.  Filtering is also not attempted for individual
       
  1000   types or raw ASTs (as required for @{command translations}).
       
  1001 
       
  1002   Certain warning or error messages are printed, depending on the
       
  1003   situation and the given configuration options.  Parsing ultimately
       
  1004   fails, if multiple results remain after the filtering phase.
       
  1005 
       
  1006   \begin{description}
       
  1007 
       
  1008   \item @{attribute syntax_ambiguity_warning} controls output of
       
  1009   explicit warning messages about syntax ambiguity.
       
  1010 
       
  1011   \item @{attribute syntax_ambiguity_limit} determines the number of
       
  1012   resulting parse trees that are shown as part of the printed message
       
  1013   in case of an ambiguity.
       
  1014 
       
  1015   \end{description}
       
  1016 *}
       
  1017 
       
  1018 
       
  1019 section {* Syntax transformations \label{sec:syntax-transformations} *}
       
  1020 
       
  1021 text {* The inner syntax engine of Isabelle provides separate
       
  1022   mechanisms to transform parse trees either as rewrite systems on
       
  1023   first-order ASTs (\secref{sec:syn-trans}), or ML functions on ASTs
       
  1024   or syntactic @{text "\<lambda>"}-terms (\secref{sec:tr-funs}).  This works
       
  1025   both for parsing and printing, as outlined in
       
  1026   \figref{fig:parse-print}.
       
  1027 
       
  1028   \begin{figure}[htbp]
       
  1029   \begin{center}
       
  1030   \begin{tabular}{cl}
       
  1031   string          & \\
       
  1032   @{text "\<down>"}     & lexer + parser \\
       
  1033   parse tree      & \\
       
  1034   @{text "\<down>"}     & parse AST translation \\
       
  1035   AST             & \\
       
  1036   @{text "\<down>"}     & AST rewriting (macros) \\
       
  1037   AST             & \\
       
  1038   @{text "\<down>"}     & parse translation \\
       
  1039   --- pre-term ---    & \\
       
  1040   @{text "\<down>"}     & print translation \\
       
  1041   AST             & \\
       
  1042   @{text "\<down>"}     & AST rewriting (macros) \\
       
  1043   AST             & \\
       
  1044   @{text "\<down>"}     & print AST translation \\
       
  1045   string          &
       
  1046   \end{tabular}
       
  1047   \end{center}
       
  1048   \caption{Parsing and printing with translations}\label{fig:parse-print}
       
  1049   \end{figure}
       
  1050 
       
  1051   These intermediate syntax tree formats eventually lead to a pre-term
       
  1052   with all names and binding scopes resolved, but most type
       
  1053   information still missing.  Explicit type constraints might be given by
       
  1054   the user, or implicit position information by the system --- both
       
  1055   need to be passed-through carefully by syntax transformations.
       
  1056 
       
  1057   Pre-terms are further processed by the so-called \emph{check} and
       
  1058   \emph{unckeck} phases that are intertwined with type-inference (see
       
  1059   also \cite{isabelle-implementation}).  The latter allows to operate
       
  1060   on higher-order abstract syntax with proper binding and type
       
  1061   information already available.
       
  1062 
       
  1063   As a rule of thumb, anything that manipulates bindings of variables
       
  1064   or constants needs to be implemented as syntax transformation (see
       
  1065   below).  Anything else is better done via check/uncheck: a prominent
       
  1066   example application is the @{command abbreviation} concept of
       
  1067   Isabelle/Pure. *}
       
  1068 
       
  1069 
       
  1070 subsection {* Abstract syntax trees \label{sec:ast} *}
       
  1071 
       
  1072 text {* The ML datatype @{ML_type Ast.ast} explicitly represents the
       
  1073   intermediate AST format that is used for syntax rewriting
       
  1074   (\secref{sec:syn-trans}).  It is defined in ML as follows:
       
  1075   \begin{ttbox}
       
  1076   datatype ast =
       
  1077     Constant of string |
       
  1078     Variable of string |
       
  1079     Appl of ast list
       
  1080   \end{ttbox}
       
  1081 
       
  1082   An AST is either an atom (constant or variable) or a list of (at
       
  1083   least two) subtrees.  Occasional diagnostic output of ASTs uses
       
  1084   notation that resembles S-expression of LISP.  Constant atoms are
       
  1085   shown as quoted strings, variable atoms as non-quoted strings and
       
  1086   applications as a parenthesized list of subtrees.  For example, the
       
  1087   AST
       
  1088   @{ML [display] "Ast.Appl
       
  1089   [Ast.Constant \"_abs\", Ast.Variable \"x\", Ast.Variable \"t\"]"}
       
  1090   is pretty-printed as @{verbatim "(\"_abs\" x t)"}.  Note that
       
  1091   @{verbatim "()"} and @{verbatim "(x)"} are excluded as ASTs, because
       
  1092   they have too few subtrees.
       
  1093 
       
  1094   \medskip AST application is merely a pro-forma mechanism to indicate
       
  1095   certain syntactic structures.  Thus @{verbatim "(c a b)"} could mean
       
  1096   either term application or type application, depending on the
       
  1097   syntactic context.
       
  1098 
       
  1099   Nested application like @{verbatim "((\"_abs\" x t) u)"} is also
       
  1100   possible, but ASTs are definitely first-order: the syntax constant
       
  1101   @{verbatim "\"_abs\""} does not bind the @{verbatim x} in any way.
       
  1102   Proper bindings are introduced in later stages of the term syntax,
       
  1103   where @{verbatim "(\"_abs\" x t)"} becomes an @{ML Abs} node and
       
  1104   occurrences of @{verbatim x} in @{verbatim t} are replaced by bound
       
  1105   variables (represented as de-Bruijn indices).
       
  1106 *}
       
  1107 
       
  1108 
       
  1109 subsubsection {* AST constants versus variables *}
       
  1110 
       
  1111 text {* Depending on the situation --- input syntax, output syntax,
       
  1112   translation patterns --- the distinction of atomic asts as @{ML
       
  1113   Ast.Constant} versus @{ML Ast.Variable} serves slightly different
       
  1114   purposes.
       
  1115 
       
  1116   Input syntax of a term such as @{text "f a b = c"} does not yet
       
  1117   indicate the scopes of atomic entities @{text "f, a, b, c"}: they
       
  1118   could be global constants or local variables, even bound ones
       
  1119   depending on the context of the term.  @{ML Ast.Variable} leaves
       
  1120   this choice still open: later syntax layers (or translation
       
  1121   functions) may capture such a variable to determine its role
       
  1122   specifically, to make it a constant, bound variable, free variable
       
  1123   etc.  In contrast, syntax translations that introduce already known
       
  1124   constants would rather do it via @{ML Ast.Constant} to prevent
       
  1125   accidental re-interpretation later on.
       
  1126 
       
  1127   Output syntax turns term constants into @{ML Ast.Constant} and
       
  1128   variables (free or schematic) into @{ML Ast.Variable}.  This
       
  1129   information is precise when printing fully formal @{text "\<lambda>"}-terms.
       
  1130 
       
  1131   In AST translation patterns (\secref{sec:syn-trans}) the system
       
  1132   guesses from the current theory context which atoms should be
       
  1133   treated as constant versus variable for the matching process.
       
  1134   Sometimes this needs to be indicated more explicitly using @{text
       
  1135   "CONST c"} inside the term language.  It is also possible to use
       
  1136   @{command syntax} declarations (without mixfix annotation) to
       
  1137   enforce that certain unqualified names are always treated as
       
  1138   constant within the syntax machinery.
       
  1139 
       
  1140   \medskip For ASTs that represent the language of types or sorts, the
       
  1141   situation is much simpler, since the concrete syntax already
       
  1142   distinguishes type variables from type constants (constructors).  So
       
  1143   @{text "('a, 'b) foo"} corresponds to an AST application of some
       
  1144   constant for @{text foo} and variable arguments for @{text "'a"} and
       
  1145   @{text "'b"}.  Note that the postfix application is merely a feature
       
  1146   of the concrete syntax, while in the AST the constructor occurs in
       
  1147   head position.  *}
       
  1148 
       
  1149 
       
  1150 subsubsection {* Authentic syntax names *}
       
  1151 
       
  1152 text {* Naming constant entities within ASTs is another delicate
       
  1153   issue.  Unqualified names are looked up in the name space tables in
       
  1154   the last stage of parsing, after all translations have been applied.
       
  1155   Since syntax transformations do not know about this later name
       
  1156   resolution yet, there can be surprises in boundary cases.
       
  1157 
       
  1158   \emph{Authentic syntax names} for @{ML Ast.Constant} avoid this
       
  1159   problem: the fully-qualified constant name with a special prefix for
       
  1160   its formal category (@{text "class"}, @{text "type"}, @{text
       
  1161   "const"}, @{text "fixed"}) represents the information faithfully
       
  1162   within the untyped AST format.  Accidental overlap with free or
       
  1163   bound variables is excluded as well.  Authentic syntax names work
       
  1164   implicitly in the following situations:
       
  1165 
       
  1166   \begin{itemize}
       
  1167 
       
  1168   \item Input of term constants (or fixed variables) that are
       
  1169   introduced by concrete syntax via @{command notation}: the
       
  1170   correspondence of a particular grammar production to some known term
       
  1171   entity is preserved.
       
  1172 
       
  1173   \item Input of type constants (constructors) and type classes ---
       
  1174   thanks to explicit syntactic distinction independently on the
       
  1175   context.
       
  1176 
       
  1177   \item Output of term constants, type constants, type classes ---
       
  1178   this information is already available from the internal term to be
       
  1179   printed.
       
  1180 
       
  1181   \end{itemize}
       
  1182 
       
  1183   In other words, syntax transformations that operate on input terms
       
  1184   written as prefix applications are difficult to make robust.
       
  1185   Luckily, this case rarely occurs in practice, because syntax forms
       
  1186   to be translated usually correspond to some bits of concrete
       
  1187   notation. *}
       
  1188 
       
  1189 
       
  1190 subsection {* Raw syntax and translations \label{sec:syn-trans} *}
       
  1191 
       
  1192 text {*
       
  1193   \begin{tabular}{rcll}
       
  1194     @{command_def "nonterminal"} & : & @{text "theory \<rightarrow> theory"} \\
       
  1195     @{command_def "syntax"} & : & @{text "theory \<rightarrow> theory"} \\
       
  1196     @{command_def "no_syntax"} & : & @{text "theory \<rightarrow> theory"} \\
       
  1197     @{command_def "translations"} & : & @{text "theory \<rightarrow> theory"} \\
       
  1198     @{command_def "no_translations"} & : & @{text "theory \<rightarrow> theory"} \\
       
  1199     @{attribute_def syntax_ast_trace} & : & @{text attribute} & default @{text false} \\
       
  1200     @{attribute_def syntax_ast_stats} & : & @{text attribute} & default @{text false} \\
       
  1201   \end{tabular}
       
  1202 
       
  1203   Unlike mixfix notation for existing formal entities
       
  1204   (\secref{sec:notation}), raw syntax declarations provide full access
       
  1205   to the priority grammar of the inner syntax, without any sanity
       
  1206   checks.  This includes additional syntactic categories (via
       
  1207   @{command nonterminal}) and free-form grammar productions (via
       
  1208   @{command syntax}).  Additional syntax translations (or macros, via
       
  1209   @{command translations}) are required to turn resulting parse trees
       
  1210   into proper representations of formal entities again.
       
  1211 
       
  1212   @{rail "
       
  1213     @@{command nonterminal} (@{syntax name} + @'and')
       
  1214     ;
       
  1215     (@@{command syntax} | @@{command no_syntax}) @{syntax mode}? (constdecl +)
       
  1216     ;
       
  1217     (@@{command translations} | @@{command no_translations})
       
  1218       (transpat ('==' | '=>' | '<=' | '\<rightleftharpoons>' | '\<rightharpoonup>' | '\<leftharpoondown>') transpat +)
       
  1219     ;
       
  1220 
       
  1221     constdecl: @{syntax name} '::' @{syntax type} @{syntax mixfix}?
       
  1222     ;
       
  1223     mode: ('(' ( @{syntax name} | @'output' | @{syntax name} @'output' ) ')')
       
  1224     ;
       
  1225     transpat: ('(' @{syntax nameref} ')')? @{syntax string}
       
  1226   "}
       
  1227 
       
  1228   \begin{description}
       
  1229 
       
  1230   \item @{command "nonterminal"}~@{text c} declares a type
       
  1231   constructor @{text c} (without arguments) to act as purely syntactic
       
  1232   type: a nonterminal symbol of the inner syntax.
       
  1233 
       
  1234   \item @{command "syntax"}~@{text "(mode) c :: \<sigma> (mx)"} augments the
       
  1235   priority grammar and the pretty printer table for the given print
       
  1236   mode (default @{verbatim "\"\""}). An optional keyword @{keyword_ref
       
  1237   "output"} means that only the pretty printer table is affected.
       
  1238 
       
  1239   Following \secref{sec:mixfix}, the mixfix annotation @{text "mx =
       
  1240   template ps q"} together with type @{text "\<sigma> = \<tau>\<^sub>1 \<Rightarrow> \<dots> \<tau>\<^sub>n \<Rightarrow> \<tau>"} and
       
  1241   specify a grammar production.  The @{text template} contains
       
  1242   delimiter tokens that surround @{text "n"} argument positions
       
  1243   (@{verbatim "_"}).  The latter correspond to nonterminal symbols
       
  1244   @{text "A\<^sub>i"} derived from the argument types @{text "\<tau>\<^sub>i"} as
       
  1245   follows:
       
  1246   \begin{itemize}
       
  1247 
       
  1248   \item @{text "prop"} if @{text "\<tau>\<^sub>i = prop"}
       
  1249 
       
  1250   \item @{text "logic"} if @{text "\<tau>\<^sub>i = (\<dots>)\<kappa>"} for logical type
       
  1251   constructor @{text "\<kappa> \<noteq> prop"}
       
  1252 
       
  1253   \item @{text any} if @{text "\<tau>\<^sub>i = \<alpha>"} for type variables
       
  1254 
       
  1255   \item @{text "\<kappa>"} if @{text "\<tau>\<^sub>i = \<kappa>"} for nonterminal @{text "\<kappa>"}
       
  1256   (syntactic type constructor)
       
  1257 
       
  1258   \end{itemize}
       
  1259 
       
  1260   Each @{text "A\<^sub>i"} is decorated by priority @{text "p\<^sub>i"} from the
       
  1261   given list @{text "ps"}; misssing priorities default to 0.
       
  1262 
       
  1263   The resulting nonterminal of the production is determined similarly
       
  1264   from type @{text "\<tau>"}, with priority @{text "q"} and default 1000.
       
  1265 
       
  1266   \medskip Parsing via this production produces parse trees @{text
       
  1267   "t\<^sub>1, \<dots>, t\<^sub>n"} for the argument slots.  The resulting parse tree is
       
  1268   composed as @{text "c t\<^sub>1 \<dots> t\<^sub>n"}, by using the syntax constant @{text
       
  1269   "c"} of the syntax declaration.
       
  1270 
       
  1271   Such syntactic constants are invented on the spot, without formal
       
  1272   check wrt.\ existing declarations.  It is conventional to use plain
       
  1273   identifiers prefixed by a single underscore (e.g.\ @{text
       
  1274   "_foobar"}).  Names should be chosen with care, to avoid clashes
       
  1275   with other syntax declarations.
       
  1276 
       
  1277   \medskip The special case of copy production is specified by @{text
       
  1278   "c = "}@{verbatim "\"\""} (empty string).  It means that the
       
  1279   resulting parse tree @{text "t"} is copied directly, without any
       
  1280   further decoration.
       
  1281 
       
  1282   \item @{command "no_syntax"}~@{text "(mode) decls"} removes grammar
       
  1283   declarations (and translations) resulting from @{text decls}, which
       
  1284   are interpreted in the same manner as for @{command "syntax"} above.
       
  1285 
       
  1286   \item @{command "translations"}~@{text rules} specifies syntactic
       
  1287   translation rules (i.e.\ macros) as first-order rewrite rules on
       
  1288   ASTs (\secref{sec:ast}).  The theory context maintains two
       
  1289   independent lists translation rules: parse rules (@{verbatim "=>"}
       
  1290   or @{text "\<rightharpoonup>"}) and print rules (@{verbatim "<="} or @{text "\<leftharpoondown>"}).
       
  1291   For convenience, both can be specified simultaneously as parse~/
       
  1292   print rules (@{verbatim "=="} or @{text "\<rightleftharpoons>"}).
       
  1293 
       
  1294   Translation patterns may be prefixed by the syntactic category to be
       
  1295   used for parsing; the default is @{text logic} which means that
       
  1296   regular term syntax is used.  Both sides of the syntax translation
       
  1297   rule undergo parsing and parse AST translations
       
  1298   \secref{sec:tr-funs}, in order to perform some fundamental
       
  1299   normalization like @{text "\<lambda>x y. b \<leadsto> \<lambda>x. \<lambda>y. b"}, but other AST
       
  1300   translation rules are \emph{not} applied recursively here.
       
  1301 
       
  1302   When processing AST patterns, the inner syntax lexer runs in a
       
  1303   different mode that allows identifiers to start with underscore.
       
  1304   This accommodates the usual naming convention for auxiliary syntax
       
  1305   constants --- those that do not have a logical counter part --- by
       
  1306   allowing to specify arbitrary AST applications within the term
       
  1307   syntax, independently of the corresponding concrete syntax.
       
  1308 
       
  1309   Atomic ASTs are distinguished as @{ML Ast.Constant} versus @{ML
       
  1310   Ast.Variable} as follows: a qualified name or syntax constant
       
  1311   declared via @{command syntax}, or parse tree head of concrete
       
  1312   notation becomes @{ML Ast.Constant}, anything else @{ML
       
  1313   Ast.Variable}.  Note that @{text CONST} and @{text XCONST} within
       
  1314   the term language (\secref{sec:pure-grammar}) allow to enforce
       
  1315   treatment as constants.
       
  1316 
       
  1317   AST rewrite rules @{text "(lhs, rhs)"} need to obey the following
       
  1318   side-conditions:
       
  1319 
       
  1320   \begin{itemize}
       
  1321 
       
  1322   \item Rules must be left linear: @{text "lhs"} must not contain
       
  1323   repeated variables.\footnote{The deeper reason for this is that AST
       
  1324   equality is not well-defined: different occurrences of the ``same''
       
  1325   AST could be decorated differently by accidental type-constraints or
       
  1326   source position information, for example.}
       
  1327 
       
  1328   \item Every variable in @{text "rhs"} must also occur in @{text
       
  1329   "lhs"}.
       
  1330 
       
  1331   \end{itemize}
       
  1332 
       
  1333   \item @{command "no_translations"}~@{text rules} removes syntactic
       
  1334   translation rules, which are interpreted in the same manner as for
       
  1335   @{command "translations"} above.
       
  1336 
       
  1337   \item @{attribute syntax_ast_trace} and @{attribute
       
  1338   syntax_ast_stats} control diagnostic output in the AST normalization
       
  1339   process, when translation rules are applied to concrete input or
       
  1340   output.
       
  1341 
       
  1342   \end{description}
       
  1343 
       
  1344   Raw syntax and translations provides a slightly more low-level
       
  1345   access to the grammar and the form of resulting parse trees.  It is
       
  1346   often possible to avoid this untyped macro mechanism, and use
       
  1347   type-safe @{command abbreviation} or @{command notation} instead.
       
  1348   Some important situations where @{command syntax} and @{command
       
  1349   translations} are really need are as follows:
       
  1350 
       
  1351   \begin{itemize}
       
  1352 
       
  1353   \item Iterated replacement via recursive @{command translations}.
       
  1354   For example, consider list enumeration @{term "[a, b, c, d]"} as
       
  1355   defined in theory @{theory List} in Isabelle/HOL.
       
  1356 
       
  1357   \item Change of binding status of variables: anything beyond the
       
  1358   built-in @{keyword "binder"} mixfix annotation requires explicit
       
  1359   syntax translations.  For example, consider list filter
       
  1360   comprehension @{term "[x \<leftarrow> xs . P]"} as defined in theory @{theory
       
  1361   List} in Isabelle/HOL.
       
  1362 
       
  1363   \end{itemize}
       
  1364 *}
       
  1365 
       
  1366 subsubsection {* Applying translation rules *}
       
  1367 
       
  1368 text {* As a term is being parsed or printed, an AST is generated as
       
  1369   an intermediate form according to \figref{fig:parse-print}.  The AST
       
  1370   is normalized by applying translation rules in the manner of a
       
  1371   first-order term rewriting system.  We first examine how a single
       
  1372   rule is applied.
       
  1373 
       
  1374   Let @{text "t"} be the abstract syntax tree to be normalized and
       
  1375   @{text "(lhs, rhs)"} some translation rule.  A subtree @{text "u"}
       
  1376   of @{text "t"} is called \emph{redex} if it is an instance of @{text
       
  1377   "lhs"}; in this case the pattern @{text "lhs"} is said to match the
       
  1378   object @{text "u"}.  A redex matched by @{text "lhs"} may be
       
  1379   replaced by the corresponding instance of @{text "rhs"}, thus
       
  1380   \emph{rewriting} the AST @{text "t"}.  Matching requires some notion
       
  1381   of \emph{place-holders} in rule patterns: @{ML Ast.Variable} serves
       
  1382   this purpose.
       
  1383 
       
  1384   More precisely, the matching of the object @{text "u"} against the
       
  1385   pattern @{text "lhs"} is performed as follows:
       
  1386 
       
  1387   \begin{itemize}
       
  1388 
       
  1389   \item Objects of the form @{ML Ast.Variable}~@{text "x"} or @{ML
       
  1390   Ast.Constant}~@{text "x"} are matched by pattern @{ML
       
  1391   Ast.Constant}~@{text "x"}.  Thus all atomic ASTs in the object are
       
  1392   treated as (potential) constants, and a successful match makes them
       
  1393   actual constants even before name space resolution (see also
       
  1394   \secref{sec:ast}).
       
  1395 
       
  1396   \item Object @{text "u"} is matched by pattern @{ML
       
  1397   Ast.Variable}~@{text "x"}, binding @{text "x"} to @{text "u"}.
       
  1398 
       
  1399   \item Object @{ML Ast.Appl}~@{text "us"} is matched by @{ML
       
  1400   Ast.Appl}~@{text "ts"} if @{text "us"} and @{text "ts"} have the
       
  1401   same length and each corresponding subtree matches.
       
  1402 
       
  1403   \item In every other case, matching fails.
       
  1404 
       
  1405   \end{itemize}
       
  1406 
       
  1407   A successful match yields a substitution that is applied to @{text
       
  1408   "rhs"}, generating the instance that replaces @{text "u"}.
       
  1409 
       
  1410   Normalizing an AST involves repeatedly applying translation rules
       
  1411   until none are applicable.  This works yoyo-like: top-down,
       
  1412   bottom-up, top-down, etc.  At each subtree position, rules are
       
  1413   chosen in order of appearance in the theory definitions.
       
  1414 
       
  1415   The configuration options @{attribute syntax_ast_trace} and
       
  1416   @{attribute syntax_ast_stats} might help to understand this process
       
  1417   and diagnose problems.
       
  1418 
       
  1419   \begin{warn}
       
  1420   If syntax translation rules work incorrectly, the output of
       
  1421   @{command_ref print_syntax} with its \emph{rules} sections reveals the
       
  1422   actual internal forms of AST pattern, without potentially confusing
       
  1423   concrete syntax.  Recall that AST constants appear as quoted strings
       
  1424   and variables without quotes.
       
  1425   \end{warn}
       
  1426 
       
  1427   \begin{warn}
       
  1428   If @{attribute_ref eta_contract} is set to @{text "true"}, terms
       
  1429   will be @{text "\<eta>"}-contracted \emph{before} the AST rewriter sees
       
  1430   them.  Thus some abstraction nodes needed for print rules to match
       
  1431   may vanish.  For example, @{text "Ball A (\<lambda>x. P x)"} would contract
       
  1432   to @{text "Ball A P"} and the standard print rule would fail to
       
  1433   apply.  This problem can be avoided by hand-written ML translation
       
  1434   functions (see also \secref{sec:tr-funs}), which is in fact the same
       
  1435   mechanism used in built-in @{keyword "binder"} declarations.
       
  1436   \end{warn}
       
  1437 *}
       
  1438 
       
  1439 
       
  1440 subsection {* Syntax translation functions \label{sec:tr-funs} *}
       
  1441 
       
  1442 text {*
       
  1443   \begin{matharray}{rcl}
       
  1444     @{command_def "parse_ast_translation"} & : & @{text "theory \<rightarrow> theory"} \\
       
  1445     @{command_def "parse_translation"} & : & @{text "theory \<rightarrow> theory"} \\
       
  1446     @{command_def "print_translation"} & : & @{text "theory \<rightarrow> theory"} \\
       
  1447     @{command_def "typed_print_translation"} & : & @{text "theory \<rightarrow> theory"} \\
       
  1448     @{command_def "print_ast_translation"} & : & @{text "theory \<rightarrow> theory"} \\
       
  1449     @{ML_antiquotation_def "class_syntax"} & : & @{text ML_antiquotation} \\
       
  1450     @{ML_antiquotation_def "type_syntax"} & : & @{text ML_antiquotation} \\
       
  1451     @{ML_antiquotation_def "const_syntax"} & : & @{text ML_antiquotation} \\
       
  1452     @{ML_antiquotation_def "syntax_const"} & : & @{text ML_antiquotation} \\
       
  1453   \end{matharray}
       
  1454 
       
  1455   Syntax translation functions written in ML admit almost arbitrary
       
  1456   manipulations of inner syntax, at the expense of some complexity and
       
  1457   obscurity in the implementation.
       
  1458 
       
  1459   @{rail "
       
  1460   ( @@{command parse_ast_translation} | @@{command parse_translation} |
       
  1461     @@{command print_translation} | @@{command typed_print_translation} |
       
  1462     @@{command print_ast_translation}) ('(' @'advanced' ')')? @{syntax text}
       
  1463   ;
       
  1464   (@@{ML_antiquotation class_syntax} |
       
  1465    @@{ML_antiquotation type_syntax} |
       
  1466    @@{ML_antiquotation const_syntax} |
       
  1467    @@{ML_antiquotation syntax_const}) name
       
  1468   "}
       
  1469 
       
  1470   \begin{description}
       
  1471 
       
  1472   \item @{command parse_translation} etc. declare syntax translation
       
  1473   functions to the theory.  Any of these commands have a single
       
  1474   @{syntax text} argument that refers to an ML expression of
       
  1475   appropriate type, which are as follows by default:
       
  1476 
       
  1477   \medskip
       
  1478   {\footnotesize
       
  1479   \begin{tabular}{ll}
       
  1480   @{command parse_ast_translation} & : @{ML_type "(string * (Ast.ast list -> Ast.ast)) list"} \\
       
  1481   @{command parse_translation} & : @{ML_type "(string * (term list -> term)) list"} \\
       
  1482   @{command print_translation} & : @{ML_type "(string * (term list -> term)) list"} \\
       
  1483   @{command typed_print_translation} & : @{ML_type "(string * (typ -> term list -> term)) list"} \\
       
  1484   @{command print_ast_translation} & : @{ML_type "(string * (Ast.ast list -> Ast.ast)) list"} \\
       
  1485   \end{tabular}}
       
  1486   \medskip
       
  1487 
       
  1488   The argument list consists of @{text "(c, tr)"} pairs, where @{text
       
  1489   "c"} is the syntax name of the formal entity involved, and @{text
       
  1490   "tr"} a function that translates a syntax form @{text "c args"} into
       
  1491   @{text "tr args"}.  The ML naming convention for parse translations
       
  1492   is @{text "c_tr"} and for print translations @{text "c_tr'"}.
       
  1493 
       
  1494   The @{command_ref print_syntax} command displays the sets of names
       
  1495   associated with the translation functions of a theory under @{text
       
  1496   "parse_ast_translation"} etc.
       
  1497 
       
  1498   If the @{verbatim "("}@{keyword "advanced"}@{verbatim ")"} option is
       
  1499   given, the corresponding translation functions depend on the current
       
  1500   theory or proof context as additional argument.  This allows to
       
  1501   implement advanced syntax mechanisms, as translations functions may
       
  1502   refer to specific theory declarations or auxiliary proof data.
       
  1503 
       
  1504   \item @{text "@{class_syntax c}"}, @{text "@{type_syntax c}"},
       
  1505   @{text "@{const_syntax c}"} inline the authentic syntax name of the
       
  1506   given formal entities into the ML source.  This is the
       
  1507   fully-qualified logical name prefixed by a special marker to
       
  1508   indicate its kind: thus different logical name spaces are properly
       
  1509   distinguished within parse trees.
       
  1510 
       
  1511   \item @{text "@{const_syntax c}"} inlines the name @{text "c"} of
       
  1512   the given syntax constant, having checked that it has been declared
       
  1513   via some @{command syntax} commands within the theory context.  Note
       
  1514   that the usual naming convention makes syntax constants start with
       
  1515   underscore, to reduce the chance of accidental clashes with other
       
  1516   names occurring in parse trees (unqualified constants etc.).
       
  1517 
       
  1518   \end{description}
       
  1519 *}
       
  1520 
       
  1521 
       
  1522 subsubsection {* The translation strategy *}
       
  1523 
       
  1524 text {* The different kinds of translation functions are invoked during
       
  1525   the transformations between parse trees, ASTs and syntactic terms
       
  1526   (cf.\ \figref{fig:parse-print}).  Whenever a combination of the form
       
  1527   @{text "c x\<^sub>1 \<dots> x\<^sub>n"} is encountered, and a translation function
       
  1528   @{text "f"} of appropriate kind is declared for @{text "c"}, the
       
  1529   result is produced by evaluation of @{text "f [x\<^sub>1, \<dots>, x\<^sub>n]"} in ML.
       
  1530 
       
  1531   For AST translations, the arguments @{text "x\<^sub>1, \<dots>, x\<^sub>n"} are ASTs.  A
       
  1532   combination has the form @{ML "Ast.Constant"}~@{text "c"} or @{ML
       
  1533   "Ast.Appl"}~@{text "["}@{ML Ast.Constant}~@{text "c, x\<^sub>1, \<dots>, x\<^sub>n]"}.
       
  1534   For term translations, the arguments are terms and a combination has
       
  1535   the form @{ML Const}~@{text "(c, \<tau>)"} or @{ML Const}~@{text "(c, \<tau>)
       
  1536   $ x\<^sub>1 $ \<dots> $ x\<^sub>n"}.  Terms allow more sophisticated transformations
       
  1537   than ASTs do, typically involving abstractions and bound
       
  1538   variables. \emph{Typed} print translations may even peek at the type
       
  1539   @{text "\<tau>"} of the constant they are invoked on, although that information
       
  1540   may be inaccurate.
       
  1541 
       
  1542   Regardless of whether they act on ASTs or terms, translation
       
  1543   functions called during the parsing process differ from those for
       
  1544   printing in their overall behaviour:
       
  1545 
       
  1546   \begin{description}
       
  1547 
       
  1548   \item [Parse translations] are applied bottom-up.  The arguments are
       
  1549   already in translated form.  The translations must not fail;
       
  1550   exceptions trigger an error message.  There may be at most one
       
  1551   function associated with any syntactic name.
       
  1552 
       
  1553   \item [Print translations] are applied top-down.  They are supplied
       
  1554   with arguments that are partly still in internal form.  The result
       
  1555   again undergoes translation; therefore a print translation should
       
  1556   not introduce as head the very constant that invoked it.  The
       
  1557   function may raise exception @{ML Match} to indicate failure; in
       
  1558   this event it has no effect.  Multiple functions associated with
       
  1559   some syntactic name are tried in the order of declaration in the
       
  1560   theory.
       
  1561 
       
  1562   \end{description}
       
  1563 
       
  1564   Only constant atoms --- constructor @{ML Ast.Constant} for ASTs and
       
  1565   @{ML Const} for terms --- can invoke translation functions.  This
       
  1566   means that parse translations can only be associated with parse tree
       
  1567   heads of concrete syntax, or syntactic constants introduced via
       
  1568   other translations.  For plain identifiers within the term language,
       
  1569   the status of constant versus variable is not yet know during
       
  1570   parsing.  This is in contrast to print translations, where constants
       
  1571   are explicitly known from the given term in its fully internal form.
       
  1572 *}
       
  1573 
       
  1574 end