src/Doc/Isar_Ref/Inner_Syntax.thy
changeset 62106 d6af554512d7
parent 61997 4d9518c3d031
child 62107 f74a33b14200
equal deleted inserted replaced
62105:686681f69d5e 62106:d6af554512d7
     4 imports Base Main
     4 imports Base Main
     5 begin
     5 begin
     6 
     6 
     7 chapter \<open>Inner syntax --- the term language \label{ch:inner-syntax}\<close>
     7 chapter \<open>Inner syntax --- the term language \label{ch:inner-syntax}\<close>
     8 
     8 
     9 text \<open>The inner syntax of Isabelle provides concrete notation for
     9 text \<open>
    10   the main entities of the logical framework, notably \<open>\<lambda>\<close>-terms with types and type classes.  Applications may either
    10   The inner syntax of Isabelle provides concrete notation for the main
    11   extend existing syntactic categories by additional notation, or
    11   entities of the logical framework, notably \<open>\<lambda>\<close>-terms with types and type
    12   define new sub-languages that are linked to the standard term
    12   classes. Applications may either extend existing syntactic categories by
    13   language via some explicit markers.  For example \<^verbatim>\<open>FOO\<close>~\<open>foo\<close> could
    13   additional notation, or define new sub-languages that are linked to the
    14   embed the syntax corresponding for some
    14   standard term language via some explicit markers. For example \<^verbatim>\<open>FOO\<close>~\<open>foo\<close>
    15   user-defined nonterminal \<open>foo\<close> --- within the bounds of the
    15   could embed the syntax corresponding for some user-defined nonterminal \<open>foo\<close>
    16   given lexical syntax of Isabelle/Pure.
    16   --- within the bounds of the given lexical syntax of Isabelle/Pure.
    17 
    17 
    18   The most basic way to specify concrete syntax for logical entities
    18   The most basic way to specify concrete syntax for logical entities works via
    19   works via mixfix annotations (\secref{sec:mixfix}), which may be
    19   mixfix annotations (\secref{sec:mixfix}), which may be usually given as part
    20   usually given as part of the original declaration or via explicit
    20   of the original declaration or via explicit notation commands later on
    21   notation commands later on (\secref{sec:notation}).  This already
    21   (\secref{sec:notation}). This already covers many needs of concrete syntax
    22   covers many needs of concrete syntax without having to understand
    22   without having to understand the full complexity of inner syntax layers.
    23   the full complexity of inner syntax layers.
    23 
    24 
    24   Further details of the syntax engine involves the classical distinction of
    25   Further details of the syntax engine involves the classical
    25   lexical language versus context-free grammar (see \secref{sec:pure-syntax}),
    26   distinction of lexical language versus context-free grammar (see
    26   and various mechanisms for \<^emph>\<open>syntax transformations\<close> (see
    27   \secref{sec:pure-syntax}), and various mechanisms for \<^emph>\<open>syntax
    27   \secref{sec:syntax-transformations}).
    28   transformations\<close> (see \secref{sec:syntax-transformations}).
       
    29 \<close>
    28 \<close>
    30 
    29 
    31 
    30 
    32 section \<open>Printing logical entities\<close>
    31 section \<open>Printing logical entities\<close>
    33 
    32 
    61     @@{command print_state} @{syntax modes}?
    60     @@{command print_state} @{syntax modes}?
    62     ;
    61     ;
    63     @{syntax_def modes}: '(' (@{syntax name} + ) ')'
    62     @{syntax_def modes}: '(' (@{syntax name} + ) ')'
    64   \<close>}
    63   \<close>}
    65 
    64 
    66   \<^descr> @{command "typ"}~\<open>\<tau>\<close> reads and prints a type expression
    65   \<^descr> @{command "typ"}~\<open>\<tau>\<close> reads and prints a type expression according to the
    67   according to the current context.
    66   current context.
    68 
    67 
    69   \<^descr> @{command "typ"}~\<open>\<tau> :: s\<close> uses type-inference to
    68   \<^descr> @{command "typ"}~\<open>\<tau> :: s\<close> uses type-inference to determine the most
    70   determine the most general way to make \<open>\<tau>\<close> conform to sort
    69   general way to make \<open>\<tau>\<close> conform to sort \<open>s\<close>. For concrete \<open>\<tau>\<close> this checks if
    71   \<open>s\<close>.  For concrete \<open>\<tau>\<close> this checks if the type
    70   the type belongs to that sort. Dummy type parameters ``\<open>_\<close>'' (underscore)
    72   belongs to that sort.  Dummy type parameters ``\<open>_\<close>''
    71   are assigned to fresh type variables with most general sorts, according the
    73   (underscore) are assigned to fresh type variables with most general
    72   the principles of type-inference.
    74   sorts, according the the principles of type-inference.
    73 
    75 
    74     \<^descr> @{command "term"}~\<open>t\<close> and @{command "prop"}~\<open>\<phi>\<close> read, type-check and
    76   \<^descr> @{command "term"}~\<open>t\<close> and @{command "prop"}~\<open>\<phi>\<close>
    75     print terms or propositions according to the current theory or proof
    77   read, type-check and print terms or propositions according to the
    76     context; the inferred type of \<open>t\<close> is output as well. Note that these
    78   current theory or proof context; the inferred type of \<open>t\<close> is
    77     commands are also useful in inspecting the current environment of term
    79   output as well.  Note that these commands are also useful in
    78     abbreviations.
    80   inspecting the current environment of term abbreviations.
    79 
    81 
    80     \<^descr> @{command "thm"}~\<open>a\<^sub>1 \<dots> a\<^sub>n\<close> retrieves theorems from the current theory
    82   \<^descr> @{command "thm"}~\<open>a\<^sub>1 \<dots> a\<^sub>n\<close> retrieves
    81     or proof context. Note that any attributes included in the theorem
    83   theorems from the current theory or proof context.  Note that any
    82     specifications are applied to a temporary context derived from the current
    84   attributes included in the theorem specifications are applied to a
    83     theory or proof; the result is discarded, i.e.\ attributes involved in
    85   temporary context derived from the current theory or proof; the
    84     \<open>a\<^sub>1, \<dots>, a\<^sub>n\<close> do not have any permanent effect.
    86   result is discarded, i.e.\ attributes involved in \<open>a\<^sub>1,
    85 
    87   \<dots>, a\<^sub>n\<close> do not have any permanent effect.
    86     \<^descr> @{command "prf"} displays the (compact) proof term of the current proof
    88 
    87     state (if present), or of the given theorems. Note that this requires
    89   \<^descr> @{command "prf"} displays the (compact) proof term of the
    88     proof terms to be switched on for the current object logic (see the
    90   current proof state (if present), or of the given theorems. Note
    89     ``Proof terms'' section of the Isabelle reference manual for information
    91   that this requires proof terms to be switched on for the current
    90     on how to do this).
    92   object logic (see the ``Proof terms'' section of the Isabelle
    91 
    93   reference manual for information on how to do this).
    92     \<^descr> @{command "full_prf"} is like @{command "prf"}, but displays the full
    94 
    93     proof term, i.e.\ also displays information omitted in the compact proof
    95   \<^descr> @{command "full_prf"} is like @{command "prf"}, but displays
    94     term, which is denoted by ``\<open>_\<close>'' placeholders there.
    96   the full proof term, i.e.\ also displays information omitted in the
    95 
    97   compact proof term, which is denoted by ``\<open>_\<close>'' placeholders
    96     \<^descr> @{command "print_state"} prints the current proof state (if present),
    98   there.
    97     including current facts and goals.
    99 
       
   100   \<^descr> @{command "print_state"} prints the current proof state (if
       
   101   present), including current facts and goals.
       
   102 
       
   103 
    98 
   104   All of the diagnostic commands above admit a list of \<open>modes\<close> to be
    99   All of the diagnostic commands above admit a list of \<open>modes\<close> to be
   105   specified, which is appended to the current print mode; see also
   100   specified, which is appended to the current print mode; see also
   106   \secref{sec:print-modes}. Thus the output behavior may be modified according
   101   \secref{sec:print-modes}. Thus the output behavior may be modified according
   107   particular print mode features. For example, @{command
   102   particular print mode features. For example, @{command
   108   "print_state"}~\<open>(latex)\<close> prints the current proof state with mathematical
   103   "print_state"}~\<open>(latex)\<close> prints the current proof state with mathematical
   109   symbols and special characters represented in {\LaTeX} source, according to
   104   symbols and special characters represented in {\LaTeX} source, according to
   110   the Isabelle style @{cite "isabelle-system"}.
   105   the Isabelle style @{cite "isabelle-system"}.
   111 
   106 
   112   Note that antiquotations (cf.\ \secref{sec:antiq}) provide a more
   107   Note that antiquotations (cf.\ \secref{sec:antiq}) provide a more systematic
   113   systematic way to include formal items into the printed text
   108   way to include formal items into the printed text document.
   114   document.
       
   115 \<close>
   109 \<close>
   116 
   110 
   117 
   111 
   118 subsection \<open>Details of printed content\<close>
   112 subsection \<open>Details of printed content\<close>
   119 
   113 
   135     @{attribute_def show_tags} & : & \<open>attribute\<close> & default \<open>false\<close> \\
   129     @{attribute_def show_tags} & : & \<open>attribute\<close> & default \<open>false\<close> \\
   136     @{attribute_def show_question_marks} & : & \<open>attribute\<close> & default \<open>true\<close> \\
   130     @{attribute_def show_question_marks} & : & \<open>attribute\<close> & default \<open>true\<close> \\
   137   \end{tabular}
   131   \end{tabular}
   138   \<^medskip>
   132   \<^medskip>
   139 
   133 
   140   These configuration options control the detail of information that
   134   These configuration options control the detail of information that is
   141   is displayed for types, terms, theorems, goals etc.  See also
   135   displayed for types, terms, theorems, goals etc. See also
   142   \secref{sec:config}.
   136   \secref{sec:config}.
   143 
   137 
   144   \<^descr> @{attribute show_markup} controls direct inlining of markup
   138   \<^descr> @{attribute show_markup} controls direct inlining of markup into the
   145   into the printed representation of formal entities --- notably type
   139   printed representation of formal entities --- notably type and sort
   146   and sort constraints.  This enables Prover IDE users to retrieve
   140   constraints. This enables Prover IDE users to retrieve that information via
   147   that information via tooltips or popups while hovering with the
   141   tooltips or popups while hovering with the mouse over the output window, for
   148   mouse over the output window, for example.  Consequently, this
   142   example. Consequently, this option is enabled by default for Isabelle/jEdit.
   149   option is enabled by default for Isabelle/jEdit.
   143 
   150 
   144   \<^descr> @{attribute show_types} and @{attribute show_sorts} control printing of
   151   \<^descr> @{attribute show_types} and @{attribute show_sorts} control
   145   type constraints for term variables, and sort constraints for type
   152   printing of type constraints for term variables, and sort
   146   variables. By default, neither of these are shown in output. If @{attribute
   153   constraints for type variables.  By default, neither of these are
   147   show_sorts} is enabled, types are always shown as well. In Isabelle/jEdit,
   154   shown in output.  If @{attribute show_sorts} is enabled, types are
   148   manual setting of these options is normally not required thanks to
   155   always shown as well.  In Isabelle/jEdit, manual setting of these
   149   @{attribute show_markup} above.
   156   options is normally not required thanks to @{attribute show_markup}
   150 
   157   above.
   151   Note that displaying types and sorts may explain why a polymorphic inference
   158 
   152   rule fails to resolve with some goal, or why a rewrite rule does not apply
   159   Note that displaying types and sorts may explain why a polymorphic
   153   as expected.
   160   inference rule fails to resolve with some goal, or why a rewrite
   154 
   161   rule does not apply as expected.
   155   \<^descr> @{attribute show_consts} controls printing of types of constants when
   162 
   156   displaying a goal state.
   163   \<^descr> @{attribute show_consts} controls printing of types of
   157 
   164   constants when displaying a goal state.
   158   Note that the output can be enormous, because polymorphic constants often
   165 
   159   occur at several different type instances.
   166   Note that the output can be enormous, because polymorphic constants
   160 
   167   often occur at several different type instances.
   161   \<^descr> @{attribute show_abbrevs} controls folding of constant abbreviations.
   168 
   162 
   169   \<^descr> @{attribute show_abbrevs} controls folding of constant
   163   \<^descr> @{attribute show_brackets} controls bracketing in pretty printed output.
   170   abbreviations.
   164   If enabled, all sub-expressions of the pretty printing tree will be
   171 
   165   parenthesized, even if this produces malformed term syntax! This crude way
   172   \<^descr> @{attribute show_brackets} controls bracketing in pretty
   166   of showing the internal structure of pretty printed entities may
   173   printed output.  If enabled, all sub-expressions of the pretty
   167   occasionally help to diagnose problems with operator priorities, for
   174   printing tree will be parenthesized, even if this produces malformed
   168   example.
   175   term syntax!  This crude way of showing the internal structure of
   169 
   176   pretty printed entities may occasionally help to diagnose problems
   170   \<^descr> @{attribute names_long}, @{attribute names_short}, and @{attribute
   177   with operator priorities, for example.
   171   names_unique} control the way of printing fully qualified internal names in
   178 
   172   external form. See also \secref{sec:antiq} for the document antiquotation
   179   \<^descr> @{attribute names_long}, @{attribute names_short}, and
   173   options of the same names.
   180   @{attribute names_unique} control the way of printing fully
   174 
   181   qualified internal names in external form.  See also
   175   \<^descr> @{attribute eta_contract} controls \<open>\<eta>\<close>-contracted printing of terms.
   182   \secref{sec:antiq} for the document antiquotation options of the
   176 
   183   same names.
   177   The \<open>\<eta>\<close>-contraction law asserts @{prop "(\<lambda>x. f x) \<equiv> f"}, provided \<open>x\<close> is not
   184 
   178   free in \<open>f\<close>. It asserts \<^emph>\<open>extensionality\<close> of functions: @{prop "f \<equiv> g"} if
   185   \<^descr> @{attribute eta_contract} controls \<open>\<eta>\<close>-contracted
   179   @{prop "f x \<equiv> g x"} for all \<open>x\<close>. Higher-order unification frequently puts
   186   printing of terms.
   180   terms into a fully \<open>\<eta>\<close>-expanded form. For example, if \<open>F\<close> has type \<open>(\<tau> \<Rightarrow> \<tau>)
   187 
   181   \<Rightarrow> \<tau>\<close> then its expanded form is @{term "\<lambda>h. F (\<lambda>x. h x)"}.
   188   The \<open>\<eta>\<close>-contraction law asserts @{prop "(\<lambda>x. f x) \<equiv> f"},
   182 
   189   provided \<open>x\<close> is not free in \<open>f\<close>.  It asserts
   183   Enabling @{attribute eta_contract} makes Isabelle perform \<open>\<eta>\<close>-contractions
   190   \<^emph>\<open>extensionality\<close> of functions: @{prop "f \<equiv> g"} if @{prop "f x \<equiv>
   184   before printing, so that @{term "\<lambda>h. F (\<lambda>x. h x)"} appears simply as \<open>F\<close>.
   191   g x"} for all \<open>x\<close>.  Higher-order unification frequently puts
   185 
   192   terms into a fully \<open>\<eta>\<close>-expanded form.  For example, if \<open>F\<close> has type \<open>(\<tau> \<Rightarrow> \<tau>) \<Rightarrow> \<tau>\<close> then its expanded form is @{term
   186   Note that the distinction between a term and its \<open>\<eta>\<close>-expanded form
   193   "\<lambda>h. F (\<lambda>x. h x)"}.
   187   occasionally matters. While higher-order resolution and rewriting operate
   194 
   188   modulo \<open>\<alpha>\<beta>\<eta>\<close>-conversion, some other tools might look at terms more
   195   Enabling @{attribute eta_contract} makes Isabelle perform \<open>\<eta>\<close>-contractions before printing, so that @{term "\<lambda>h. F (\<lambda>x. h x)"}
   189   discretely.
   196   appears simply as \<open>F\<close>.
   190 
   197 
   191   \<^descr> @{attribute goals_limit} controls the maximum number of subgoals to be
   198   Note that the distinction between a term and its \<open>\<eta>\<close>-expanded
   192   printed.
   199   form occasionally matters.  While higher-order resolution and
   193 
   200   rewriting operate modulo \<open>\<alpha>\<beta>\<eta>\<close>-conversion, some other tools
   194   \<^descr> @{attribute show_main_goal} controls whether the main result to be proven
   201   might look at terms more discretely.
   195   should be displayed. This information might be relevant for schematic goals,
   202 
   196   to inspect the current claim that has been synthesized so far.
   203   \<^descr> @{attribute goals_limit} controls the maximum number of
   197 
   204   subgoals to be printed.
   198   \<^descr> @{attribute show_hyps} controls printing of implicit hypotheses of local
   205 
   199   facts. Normally, only those hypotheses are displayed that are \<^emph>\<open>not\<close> covered
   206   \<^descr> @{attribute show_main_goal} controls whether the main result
   200   by the assumptions of the current context: this situation indicates a fault
   207   to be proven should be displayed.  This information might be
   201   in some tool being used.
   208   relevant for schematic goals, to inspect the current claim that has
   202 
   209   been synthesized so far.
   203   By enabling @{attribute show_hyps}, output of \<^emph>\<open>all\<close> hypotheses can be
   210 
   204   enforced, which is occasionally useful for diagnostic purposes.
   211   \<^descr> @{attribute show_hyps} controls printing of implicit
   205 
   212   hypotheses of local facts.  Normally, only those hypotheses are
   206   \<^descr> @{attribute show_tags} controls printing of extra annotations within
   213   displayed that are \<^emph>\<open>not\<close> covered by the assumptions of the
   207   theorems, such as internal position information, or the case names being
   214   current context: this situation indicates a fault in some tool being
   208   attached by the attribute @{attribute case_names}.
   215   used.
   209 
   216 
   210   Note that the @{attribute tagged} and @{attribute untagged} attributes
   217   By enabling @{attribute show_hyps}, output of \<^emph>\<open>all\<close> hypotheses
   211   provide low-level access to the collection of tags associated with a
   218   can be enforced, which is occasionally useful for diagnostic
   212   theorem.
   219   purposes.
   213 
   220 
   214   \<^descr> @{attribute show_question_marks} controls printing of question marks for
   221   \<^descr> @{attribute show_tags} controls printing of extra annotations
   215   schematic variables, such as \<open>?x\<close>. Only the leading question mark is
   222   within theorems, such as internal position information, or the case
   216   affected, the remaining text is unchanged (including proper markup for
   223   names being attached by the attribute @{attribute case_names}.
   217   schematic variables that might be relevant for user interfaces).
   224 
       
   225   Note that the @{attribute tagged} and @{attribute untagged}
       
   226   attributes provide low-level access to the collection of tags
       
   227   associated with a theorem.
       
   228 
       
   229   \<^descr> @{attribute show_question_marks} controls printing of question
       
   230   marks for schematic variables, such as \<open>?x\<close>.  Only the leading
       
   231   question mark is affected, the remaining text is unchanged
       
   232   (including proper markup for schematic variables that might be
       
   233   relevant for user interfaces).
       
   234 \<close>
   218 \<close>
   235 
   219 
   236 
   220 
   237 subsection \<open>Alternative print modes \label{sec:print-modes}\<close>
   221 subsection \<open>Alternative print modes \label{sec:print-modes}\<close>
   238 
   222 
   240   \begin{mldecls}
   224   \begin{mldecls}
   241     @{index_ML print_mode_value: "unit -> string list"} \\
   225     @{index_ML print_mode_value: "unit -> string list"} \\
   242     @{index_ML Print_Mode.with_modes: "string list -> ('a -> 'b) -> 'a -> 'b"} \\
   226     @{index_ML Print_Mode.with_modes: "string list -> ('a -> 'b) -> 'a -> 'b"} \\
   243   \end{mldecls}
   227   \end{mldecls}
   244 
   228 
   245   The \<^emph>\<open>print mode\<close> facility allows to modify various operations
   229   The \<^emph>\<open>print mode\<close> facility allows to modify various operations for printing.
   246   for printing.  Commands like @{command typ}, @{command term},
   230   Commands like @{command typ}, @{command term}, @{command thm} (see
   247   @{command thm} (see \secref{sec:print-diag}) take additional print
   231   \secref{sec:print-diag}) take additional print modes as optional argument.
   248   modes as optional argument.  The underlying ML operations are as
   232   The underlying ML operations are as follows.
   249   follows.
   233 
   250 
   234     \<^descr> @{ML "print_mode_value ()"} yields the list of currently active print
   251   \<^descr> @{ML "print_mode_value ()"} yields the list of currently
   235     mode names. This should be understood as symbolic representation of
   252   active print mode names.  This should be understood as symbolic
   236     certain individual features for printing (with precedence from left to
   253   representation of certain individual features for printing (with
   237     right).
   254   precedence from left to right).
   238 
   255 
   239     \<^descr> @{ML Print_Mode.with_modes}~\<open>modes f x\<close> evaluates \<open>f x\<close> in an execution
   256   \<^descr> @{ML Print_Mode.with_modes}~\<open>modes f x\<close> evaluates
   240     context where the print mode is prepended by the given \<open>modes\<close>. This
   257   \<open>f x\<close> in an execution context where the print mode is
   241     provides a thread-safe way to augment print modes. It is also monotonic in
   258   prepended by the given \<open>modes\<close>.  This provides a thread-safe
   242     the set of mode names: it retains the default print mode that certain
   259   way to augment print modes.  It is also monotonic in the set of mode
   243     user-interfaces might have installed for their proper functioning!
   260   names: it retains the default print mode that certain
   244 
   261   user-interfaces might have installed for their proper functioning!
   245   \<^medskip>
   262 
   246   The pretty printer for inner syntax maintains alternative mixfix productions
   263 
   247   for any print mode name invented by the user, say in commands like @{command
   264   \<^medskip>
   248   notation} or @{command abbreviation}. Mode names can be arbitrary, but the
   265   The pretty printer for inner syntax maintains alternative
   249   following ones have a specific meaning by convention:
   266   mixfix productions for any print mode name invented by the user, say
   250 
   267   in commands like @{command notation} or @{command abbreviation}.
   251     \<^item> \<^verbatim>\<open>""\<close> (the empty string): default mode; implicitly active as last
   268   Mode names can be arbitrary, but the following ones have a specific
   252     element in the list of modes.
   269   meaning by convention:
   253 
   270 
   254     \<^item> \<^verbatim>\<open>input\<close>: dummy print mode that is never active; may be used to specify
   271   \<^item> \<^verbatim>\<open>""\<close> (the empty string): default mode;
   255     notation that is only available for input.
   272   implicitly active as last element in the list of modes.
   256 
   273 
   257     \<^item> \<^verbatim>\<open>internal\<close> dummy print mode that is never active; used internally in
   274   \<^item> \<^verbatim>\<open>input\<close>: dummy print mode that is never active; may
   258     Isabelle/Pure.
   275   be used to specify notation that is only available for input.
   259 
   276 
   260     \<^item> \<^verbatim>\<open>ASCII\<close>: prefer ASCII art over mathematical symbols.
   277   \<^item> \<^verbatim>\<open>internal\<close> dummy print mode that is never active;
   261 
   278   used internally in Isabelle/Pure.
   262     \<^item> \<^verbatim>\<open>latex\<close>: additional mode that is active in {\LaTeX} document
   279 
   263     preparation of Isabelle theory sources; allows to provide alternative
   280   \<^item> \<^verbatim>\<open>ASCII\<close>: prefer ASCII art over mathematical symbols.
   264     output notation.
   281 
       
   282   \<^item> \<^verbatim>\<open>latex\<close>: additional mode that is active in {\LaTeX}
       
   283   document preparation of Isabelle theory sources; allows to provide
       
   284   alternative output notation.
       
   285 \<close>
   265 \<close>
   286 
   266 
   287 
   267 
   288 section \<open>Mixfix annotations \label{sec:mixfix}\<close>
   268 section \<open>Mixfix annotations \label{sec:mixfix}\<close>
   289 
   269 
   290 text \<open>Mixfix annotations specify concrete \<^emph>\<open>inner syntax\<close> of
   270 text \<open>
   291   Isabelle types and terms.  Locally fixed parameters in toplevel
   271   Mixfix annotations specify concrete \<^emph>\<open>inner syntax\<close> of Isabelle types and
   292   theorem statements, locale and class specifications also admit
   272   terms. Locally fixed parameters in toplevel theorem statements, locale and
   293   mixfix annotations in a fairly uniform manner.  A mixfix annotation
   273   class specifications also admit mixfix annotations in a fairly uniform
   294   describes the concrete syntax, the translation to abstract
   274   manner. A mixfix annotation describes the concrete syntax, the translation
   295   syntax, and the pretty printing.  Special case annotations provide a
   275   to abstract syntax, and the pretty printing. Special case annotations
   296   simple means of specifying infix operators and binders.
   276   provide a simple means of specifying infix operators and binders.
   297 
   277 
   298   Isabelle mixfix syntax is inspired by {\OBJ} @{cite OBJ}.  It allows
   278   Isabelle mixfix syntax is inspired by {\OBJ} @{cite OBJ}. It allows to
   299   to specify any context-free priority grammar, which is more general
   279   specify any context-free priority grammar, which is more general than the
   300   than the fixity declarations of ML and Prolog.
   280   fixity declarations of ML and Prolog.
   301 
   281 
   302   @{rail \<open>
   282   @{rail \<open>
   303     @{syntax_def mixfix}: '('
   283     @{syntax_def mixfix}: '('
   304       (@{syntax template} prios? @{syntax nat}? |
   284       (@{syntax template} prios? @{syntax nat}? |
   305         (@'infix' | @'infixl' | @'infixr') @{syntax template} @{syntax nat} |
   285         (@'infix' | @'infixl' | @'infixr') @{syntax template} @{syntax nat} |
   309     template: string
   289     template: string
   310     ;
   290     ;
   311     prios: '[' (@{syntax nat} + ',') ']'
   291     prios: '[' (@{syntax nat} + ',') ']'
   312   \<close>}
   292   \<close>}
   313 
   293 
   314   The string given as \<open>template\<close> may include literal text,
   294   The string given as \<open>template\<close> may include literal text, spacing, blocks,
   315   spacing, blocks, and arguments (denoted by ``\<open>_\<close>''); the
   295   and arguments (denoted by ``\<open>_\<close>''); the special symbol ``\<^verbatim>\<open>\<index>\<close>'' (printed as
   316   special symbol ``\<^verbatim>\<open>\<index>\<close>'' (printed as ``\<open>\<index>\<close>'')
   296   ``\<open>\<index>\<close>'') represents an index argument that specifies an implicit @{keyword
   317   represents an index argument that specifies an implicit @{keyword
   297   "structure"} reference (see also \secref{sec:locale}). Only locally fixed
   318   "structure"} reference (see also \secref{sec:locale}).  Only locally
   298   variables may be declared as @{keyword "structure"}.
   319   fixed variables may be declared as @{keyword "structure"}.
   299 
   320 
   300   Infix and binder declarations provide common abbreviations for particular
   321   Infix and binder declarations provide common abbreviations for
   301   mixfix declarations. So in practice, mixfix templates mostly degenerate to
   322   particular mixfix declarations.  So in practice, mixfix templates
   302   literal text for concrete syntax, such as ``\<^verbatim>\<open>++\<close>'' for an infix symbol.
   323   mostly degenerate to literal text for concrete syntax, such as
       
   324   ``\<^verbatim>\<open>++\<close>'' for an infix symbol.
       
   325 \<close>
   303 \<close>
   326 
   304 
   327 
   305 
   328 subsection \<open>The general mixfix form\<close>
   306 subsection \<open>The general mixfix form\<close>
   329 
   307 
   330 text \<open>In full generality, mixfix declarations work as follows.
   308 text \<open>
   331   Suppose a constant \<open>c :: \<tau>\<^sub>1 \<Rightarrow> \<dots> \<tau>\<^sub>n \<Rightarrow> \<tau>\<close> is annotated by
   309   In full generality, mixfix declarations work as follows. Suppose a constant
   332   \<open>(mixfix [p\<^sub>1, \<dots>, p\<^sub>n] p)\<close>, where \<open>mixfix\<close> is a string
   310   \<open>c :: \<tau>\<^sub>1 \<Rightarrow> \<dots> \<tau>\<^sub>n \<Rightarrow> \<tau>\<close> is annotated by \<open>(mixfix [p\<^sub>1, \<dots>, p\<^sub>n] p)\<close>, where
   333   \<open>d\<^sub>0 _ d\<^sub>1 _ \<dots> _ d\<^sub>n\<close> consisting of delimiters that surround
   311   \<open>mixfix\<close> is a string \<open>d\<^sub>0 _ d\<^sub>1 _ \<dots> _ d\<^sub>n\<close> consisting of delimiters that
   334   argument positions as indicated by underscores.
   312   surround argument positions as indicated by underscores.
   335 
   313 
   336   Altogether this determines a production for a context-free priority
   314   Altogether this determines a production for a context-free priority grammar,
   337   grammar, where for each argument \<open>i\<close> the syntactic category
   315   where for each argument \<open>i\<close> the syntactic category is determined by \<open>\<tau>\<^sub>i\<close>
   338   is determined by \<open>\<tau>\<^sub>i\<close> (with priority \<open>p\<^sub>i\<close>), and the
   316   (with priority \<open>p\<^sub>i\<close>), and the result category is determined from \<open>\<tau>\<close> (with
   339   result category is determined from \<open>\<tau>\<close> (with priority \<open>p\<close>).  Priority specifications are optional, with default 0 for
   317   priority \<open>p\<close>). Priority specifications are optional, with default 0 for
   340   arguments and 1000 for the result.\<^footnote>\<open>Omitting priorities is
   318   arguments and 1000 for the result.\<^footnote>\<open>Omitting priorities is prone to
   341   prone to syntactic ambiguities unless the delimiter tokens determine
   319   syntactic ambiguities unless the delimiter tokens determine fully bracketed
   342   fully bracketed notation, as in \<open>if _ then _ else _ fi\<close>.\<close>
   320   notation, as in \<open>if _ then _ else _ fi\<close>.\<close>
   343 
   321 
   344   Since \<open>\<tau>\<close> may be again a function type, the constant
   322   Since \<open>\<tau>\<close> may be again a function type, the constant type scheme may have
   345   type scheme may have more argument positions than the mixfix
   323   more argument positions than the mixfix pattern. Printing a nested
   346   pattern.  Printing a nested application \<open>c t\<^sub>1 \<dots> t\<^sub>m\<close> for
   324   application \<open>c t\<^sub>1 \<dots> t\<^sub>m\<close> for \<open>m > n\<close> works by attaching concrete notation
   347   \<open>m > n\<close> works by attaching concrete notation only to the
   325   only to the innermost part, essentially by printing \<open>(c t\<^sub>1 \<dots> t\<^sub>n) \<dots> t\<^sub>m\<close>
   348   innermost part, essentially by printing \<open>(c t\<^sub>1 \<dots> t\<^sub>n) \<dots> t\<^sub>m\<close>
   326   instead. If a term has fewer arguments than specified in the mixfix
   349   instead.  If a term has fewer arguments than specified in the mixfix
       
   350   template, the concrete syntax is ignored.
   327   template, the concrete syntax is ignored.
   351 
   328 
   352   \<^medskip>
   329   \<^medskip>
   353   A mixfix template may also contain additional directives
   330   A mixfix template may also contain additional directives for pretty
   354   for pretty printing, notably spaces, blocks, and breaks.  The
   331   printing, notably spaces, blocks, and breaks. The general template format is
   355   general template format is a sequence over any of the following
   332   a sequence over any of the following entities.
   356   entities.
   333 
   357 
   334   \<^descr> \<open>d\<close> is a delimiter, namely a non-empty sequence of characters other than
   358   \<^descr> \<open>d\<close> is a delimiter, namely a non-empty sequence of
   335   the following special characters:
   359   characters other than the following special characters:
       
   360 
   336 
   361   \<^medskip>
   337   \<^medskip>
   362   \begin{tabular}{ll}
   338   \begin{tabular}{ll}
   363     \<^verbatim>\<open>'\<close> & single quote \\
   339     \<^verbatim>\<open>'\<close> & single quote \\
   364     \<^verbatim>\<open>_\<close> & underscore \\
   340     \<^verbatim>\<open>_\<close> & underscore \\
   367     \<^verbatim>\<open>)\<close> & close parenthesis \\
   343     \<^verbatim>\<open>)\<close> & close parenthesis \\
   368     \<^verbatim>\<open>/\<close> & slash \\
   344     \<^verbatim>\<open>/\<close> & slash \\
   369   \end{tabular}
   345   \end{tabular}
   370   \<^medskip>
   346   \<^medskip>
   371 
   347 
   372   \<^descr> \<^verbatim>\<open>'\<close> escapes the special meaning of these
   348   \<^descr> \<^verbatim>\<open>'\<close> escapes the special meaning of these meta-characters, producing a
   373   meta-characters, producing a literal version of the following
   349   literal version of the following character, unless that is a blank.
   374   character, unless that is a blank.
   350 
   375 
   351   A single quote followed by a blank separates delimiters, without affecting
   376   A single quote followed by a blank separates delimiters, without
   352   printing, but input tokens may have additional white space here.
   377   affecting printing, but input tokens may have additional white space
   353 
   378   here.
   354   \<^descr> \<^verbatim>\<open>_\<close> is an argument position, which stands for a certain syntactic
   379 
   355   category in the underlying grammar.
   380   \<^descr> \<^verbatim>\<open>_\<close> is an argument position, which stands for a
   356 
   381   certain syntactic category in the underlying grammar.
   357   \<^descr> \<open>\<index>\<close> is an indexed argument position; this is the place where implicit
   382 
   358   structure arguments can be attached.
   383   \<^descr> \<open>\<index>\<close> is an indexed argument position; this is the place
   359 
   384   where implicit structure arguments can be attached.
   360   \<^descr> \<open>s\<close> is a non-empty sequence of spaces for printing. This and the following
   385 
   361   specifications do not affect parsing at all.
   386   \<^descr> \<open>s\<close> is a non-empty sequence of spaces for printing.
   362 
   387   This and the following specifications do not affect parsing at all.
   363   \<^descr> \<^verbatim>\<open>(\<close>\<open>n\<close> opens a pretty printing block. The optional number specifies how
   388 
   364   much indentation to add when a line break occurs within the block. If the
   389   \<^descr> \<^verbatim>\<open>(\<close>\<open>n\<close> opens a pretty printing block.  The
   365   parenthesis is not followed by digits, the indentation defaults to 0. A
   390   optional number specifies how much indentation to add when a line
   366   block specified via \<^verbatim>\<open>(00\<close> is unbreakable.
   391   break occurs within the block.  If the parenthesis is not followed
       
   392   by digits, the indentation defaults to 0.  A block specified via
       
   393   \<^verbatim>\<open>(00\<close> is unbreakable.
       
   394 
   367 
   395   \<^descr> \<^verbatim>\<open>)\<close> closes a pretty printing block.
   368   \<^descr> \<^verbatim>\<open>)\<close> closes a pretty printing block.
   396 
   369 
   397   \<^descr> \<^verbatim>\<open>//\<close> forces a line break.
   370   \<^descr> \<^verbatim>\<open>//\<close> forces a line break.
   398 
   371 
   399   \<^descr> \<^verbatim>\<open>/\<close>\<open>s\<close> allows a line break.  Here \<open>s\<close>
   372   \<^descr> \<^verbatim>\<open>/\<close>\<open>s\<close> allows a line break. Here \<open>s\<close> stands for the string of spaces
   400   stands for the string of spaces (zero or more) right after the
   373   (zero or more) right after the slash. These spaces are printed if the break
   401   slash.  These spaces are printed if the break is \<^emph>\<open>not\<close> taken.
   374   is \<^emph>\<open>not\<close> taken.
   402 
   375 
   403 
   376 
   404   The general idea of pretty printing with blocks and breaks is also
   377   The general idea of pretty printing with blocks and breaks is also described
   405   described in @{cite "paulson-ml2"}; it goes back to @{cite "Oppen:1980"}.
   378   in @{cite "paulson-ml2"}; it goes back to @{cite "Oppen:1980"}.
   406 \<close>
   379 \<close>
   407 
   380 
   408 
   381 
   409 subsection \<open>Infixes\<close>
   382 subsection \<open>Infixes\<close>
   410 
   383 
   411 text \<open>Infix operators are specified by convenient short forms that
   384 text \<open>
   412   abbreviate general mixfix annotations as follows:
   385   Infix operators are specified by convenient short forms that abbreviate
       
   386   general mixfix annotations as follows:
   413 
   387 
   414   \begin{center}
   388   \begin{center}
   415   \begin{tabular}{lll}
   389   \begin{tabular}{lll}
   416 
   390 
   417   \<^verbatim>\<open>(\<close>@{keyword_def "infix"}~\<^verbatim>\<open>"\<close>\<open>sy\<close>\<^verbatim>\<open>"\<close> \<open>p\<close>\<^verbatim>\<open>)\<close>
   391   \<^verbatim>\<open>(\<close>@{keyword_def "infix"}~\<^verbatim>\<open>"\<close>\<open>sy\<close>\<^verbatim>\<open>"\<close> \<open>p\<close>\<^verbatim>\<open>)\<close>
   425   \<^verbatim>\<open>("(_\<close>~\<open>sy\<close>\<^verbatim>\<open>/ _)" [\<close>\<open>p + 1\<close>\<^verbatim>\<open>,\<close>~\<open>p\<close>\<^verbatim>\<open>]\<close>~\<open>p\<close>\<^verbatim>\<open>)\<close> \\
   399   \<^verbatim>\<open>("(_\<close>~\<open>sy\<close>\<^verbatim>\<open>/ _)" [\<close>\<open>p + 1\<close>\<^verbatim>\<open>,\<close>~\<open>p\<close>\<^verbatim>\<open>]\<close>~\<open>p\<close>\<^verbatim>\<open>)\<close> \\
   426 
   400 
   427   \end{tabular}
   401   \end{tabular}
   428   \end{center}
   402   \end{center}
   429 
   403 
   430   The mixfix template \<^verbatim>\<open>"(_\<close>~\<open>sy\<close>\<^verbatim>\<open>/ _)"\<close>
   404   The mixfix template \<^verbatim>\<open>"(_\<close>~\<open>sy\<close>\<^verbatim>\<open>/ _)"\<close> specifies two argument positions;
   431   specifies two argument positions; the delimiter is preceded by a
   405   the delimiter is preceded by a space and followed by a space or line break;
   432   space and followed by a space or line break; the entire phrase is a
   406   the entire phrase is a pretty printing block.
   433   pretty printing block.
   407 
   434 
   408   The alternative notation \<^verbatim>\<open>op\<close>~\<open>sy\<close> is introduced in addition. Thus any
   435   The alternative notation \<^verbatim>\<open>op\<close>~\<open>sy\<close> is introduced
   409   infix operator may be written in prefix form (as in ML), independently of
   436   in addition.  Thus any infix operator may be written in prefix form
   410   the number of arguments in the term.
   437   (as in ML), independently of the number of arguments in the term.
       
   438 \<close>
   411 \<close>
   439 
   412 
   440 
   413 
   441 subsection \<open>Binders\<close>
   414 subsection \<open>Binders\<close>
   442 
   415 
   443 text \<open>A \<^emph>\<open>binder\<close> is a variable-binding construct such as a
   416 text \<open>
   444   quantifier.  The idea to formalize \<open>\<forall>x. b\<close> as \<open>All
   417   A \<^emph>\<open>binder\<close> is a variable-binding construct such as a quantifier. The idea
   445   (\<lambda>x. b)\<close> for \<open>All :: ('a \<Rightarrow> bool) \<Rightarrow> bool\<close> already goes back
   418   to formalize \<open>\<forall>x. b\<close> as \<open>All (\<lambda>x. b)\<close> for \<open>All :: ('a \<Rightarrow> bool) \<Rightarrow> bool\<close>
   446   to @{cite church40}.  Isabelle declarations of certain higher-order
   419   already goes back to @{cite church40}. Isabelle declarations of certain
   447   operators may be annotated with @{keyword_def "binder"} annotations
   420   higher-order operators may be annotated with @{keyword_def "binder"}
   448   as follows:
   421   annotations as follows:
   449 
   422 
   450   \begin{center}
   423   \begin{center}
   451   \<open>c ::\<close>~\<^verbatim>\<open>"\<close>\<open>(\<tau>\<^sub>1 \<Rightarrow> \<tau>\<^sub>2) \<Rightarrow> \<tau>\<^sub>3\<close>\<^verbatim>\<open>"  (\<close>@{keyword "binder"}~\<^verbatim>\<open>"\<close>\<open>sy\<close>\<^verbatim>\<open>" [\<close>\<open>p\<close>\<^verbatim>\<open>]\<close>~\<open>q\<close>\<^verbatim>\<open>)\<close>
   424   \<open>c ::\<close>~\<^verbatim>\<open>"\<close>\<open>(\<tau>\<^sub>1 \<Rightarrow> \<tau>\<^sub>2) \<Rightarrow> \<tau>\<^sub>3\<close>\<^verbatim>\<open>"  (\<close>@{keyword "binder"}~\<^verbatim>\<open>"\<close>\<open>sy\<close>\<^verbatim>\<open>" [\<close>\<open>p\<close>\<^verbatim>\<open>]\<close>~\<open>q\<close>\<^verbatim>\<open>)\<close>
   452   \end{center}
   425   \end{center}
   453 
   426 
   454   This introduces concrete binder syntax \<open>sy x. b\<close>, where
   427   This introduces concrete binder syntax \<open>sy x. b\<close>, where \<open>x\<close> is a bound
   455   \<open>x\<close> is a bound variable of type \<open>\<tau>\<^sub>1\<close>, the body \<open>b\<close> has type \<open>\<tau>\<^sub>2\<close> and the whole term has type \<open>\<tau>\<^sub>3\<close>.
   428   variable of type \<open>\<tau>\<^sub>1\<close>, the body \<open>b\<close> has type \<open>\<tau>\<^sub>2\<close> and the whole term has
   456   The optional integer \<open>p\<close> specifies the syntactic priority of
   429   type \<open>\<tau>\<^sub>3\<close>. The optional integer \<open>p\<close> specifies the syntactic priority of the
   457   the body; the default is \<open>q\<close>, which is also the priority of
   430   body; the default is \<open>q\<close>, which is also the priority of the whole construct.
   458   the whole construct.
       
   459 
   431 
   460   Internally, the binder syntax is expanded to something like this:
   432   Internally, the binder syntax is expanded to something like this:
   461   \begin{center}
   433   \begin{center}
   462   \<open>c_binder ::\<close>~\<^verbatim>\<open>"\<close>\<open>idts \<Rightarrow> \<tau>\<^sub>2 \<Rightarrow> \<tau>\<^sub>3\<close>\<^verbatim>\<open>"  ("(3\<close>\<open>sy\<close>\<^verbatim>\<open>_./ _)" [0,\<close>~\<open>p\<close>\<^verbatim>\<open>]\<close>~\<open>q\<close>\<^verbatim>\<open>)\<close>
   434   \<open>c_binder ::\<close>~\<^verbatim>\<open>"\<close>\<open>idts \<Rightarrow> \<tau>\<^sub>2 \<Rightarrow> \<tau>\<^sub>3\<close>\<^verbatim>\<open>"  ("(3\<close>\<open>sy\<close>\<^verbatim>\<open>_./ _)" [0,\<close>~\<open>p\<close>\<^verbatim>\<open>]\<close>~\<open>q\<close>\<^verbatim>\<open>)\<close>
   463   \end{center}
   435   \end{center}
   464 
   436 
   465   Here @{syntax (inner) idts} is the nonterminal symbol for a list of
   437   Here @{syntax (inner) idts} is the nonterminal symbol for a list of
   466   identifiers with optional type constraints (see also
   438   identifiers with optional type constraints (see also
   467   \secref{sec:pure-grammar}).  The mixfix template \<^verbatim>\<open>"(3\<close>\<open>sy\<close>\<^verbatim>\<open>_./ _)"\<close>
   439   \secref{sec:pure-grammar}). The mixfix template \<^verbatim>\<open>"(3\<close>\<open>sy\<close>\<^verbatim>\<open>_./ _)"\<close> defines
   468   defines argument positions
   440   argument positions for the bound identifiers and the body, separated by a
   469   for the bound identifiers and the body, separated by a dot with
   441   dot with optional line break; the entire phrase is a pretty printing block
   470   optional line break; the entire phrase is a pretty printing block of
   442   of indentation level 3. Note that there is no extra space after \<open>sy\<close>, so it
   471   indentation level 3.  Note that there is no extra space after \<open>sy\<close>, so it needs to be included user specification if the binder
   443   needs to be included user specification if the binder syntax ends with a
   472   syntax ends with a token that may be continued by an identifier
   444   token that may be continued by an identifier token at the start of @{syntax
   473   token at the start of @{syntax (inner) idts}.
   445   (inner) idts}.
   474 
   446 
   475   Furthermore, a syntax translation to transforms \<open>c_binder x\<^sub>1
   447   Furthermore, a syntax translation to transforms \<open>c_binder x\<^sub>1 \<dots> x\<^sub>n b\<close> into
   476   \<dots> x\<^sub>n b\<close> into iterated application \<open>c (\<lambda>x\<^sub>1. \<dots> c (\<lambda>x\<^sub>n. b)\<dots>)\<close>.
   448   iterated application \<open>c (\<lambda>x\<^sub>1. \<dots> c (\<lambda>x\<^sub>n. b)\<dots>)\<close>. This works in both
   477   This works in both directions, for parsing and printing.\<close>
   449   directions, for parsing and printing.
       
   450 \<close>
   478 
   451 
   479 
   452 
   480 section \<open>Explicit notation \label{sec:notation}\<close>
   453 section \<open>Explicit notation \label{sec:notation}\<close>
   481 
   454 
   482 text \<open>
   455 text \<open>
   486     @{command_def "notation"} & : & \<open>local_theory \<rightarrow> local_theory\<close> \\
   459     @{command_def "notation"} & : & \<open>local_theory \<rightarrow> local_theory\<close> \\
   487     @{command_def "no_notation"} & : & \<open>local_theory \<rightarrow> local_theory\<close> \\
   460     @{command_def "no_notation"} & : & \<open>local_theory \<rightarrow> local_theory\<close> \\
   488     @{command_def "write"} & : & \<open>proof(state) \<rightarrow> proof(state)\<close> \\
   461     @{command_def "write"} & : & \<open>proof(state) \<rightarrow> proof(state)\<close> \\
   489   \end{matharray}
   462   \end{matharray}
   490 
   463 
   491   Commands that introduce new logical entities (terms or types)
   464   Commands that introduce new logical entities (terms or types) usually allow
   492   usually allow to provide mixfix annotations on the spot, which is
   465   to provide mixfix annotations on the spot, which is convenient for default
   493   convenient for default notation.  Nonetheless, the syntax may be
   466   notation. Nonetheless, the syntax may be modified later on by declarations
   494   modified later on by declarations for explicit notation.  This
   467   for explicit notation. This allows to add or delete mixfix annotations for
   495   allows to add or delete mixfix annotations for of existing logical
   468   of existing logical entities within the current context.
   496   entities within the current context.
       
   497 
   469 
   498   @{rail \<open>
   470   @{rail \<open>
   499     (@@{command type_notation} | @@{command no_type_notation}) @{syntax mode}? \<newline>
   471     (@@{command type_notation} | @@{command no_type_notation}) @{syntax mode}? \<newline>
   500       (@{syntax nameref} @{syntax mixfix} + @'and')
   472       (@{syntax nameref} @{syntax mixfix} + @'and')
   501     ;
   473     ;
   503       (@{syntax nameref} @{syntax mixfix} + @'and')
   475       (@{syntax nameref} @{syntax mixfix} + @'and')
   504     ;
   476     ;
   505     @@{command write} @{syntax mode}? (@{syntax nameref} @{syntax mixfix} + @'and')
   477     @@{command write} @{syntax mode}? (@{syntax nameref} @{syntax mixfix} + @'and')
   506   \<close>}
   478   \<close>}
   507 
   479 
   508   \<^descr> @{command "type_notation"}~\<open>c (mx)\<close> associates mixfix
   480   \<^descr> @{command "type_notation"}~\<open>c (mx)\<close> associates mixfix syntax with an
   509   syntax with an existing type constructor.  The arity of the
   481   existing type constructor. The arity of the constructor is retrieved from
   510   constructor is retrieved from the context.
   482   the context.
   511 
   483 
   512   \<^descr> @{command "no_type_notation"} is similar to @{command
   484   \<^descr> @{command "no_type_notation"} is similar to @{command "type_notation"},
   513   "type_notation"}, but removes the specified syntax annotation from
   485   but removes the specified syntax annotation from the present context.
   514   the present context.
   486 
   515 
   487   \<^descr> @{command "notation"}~\<open>c (mx)\<close> associates mixfix syntax with an existing
   516   \<^descr> @{command "notation"}~\<open>c (mx)\<close> associates mixfix
   488   constant or fixed variable. The type declaration of the given entity is
   517   syntax with an existing constant or fixed variable.  The type
   489   retrieved from the context.
   518   declaration of the given entity is retrieved from the context.
   490 
   519 
   491   \<^descr> @{command "no_notation"} is similar to @{command "notation"}, but removes
   520   \<^descr> @{command "no_notation"} is similar to @{command "notation"},
   492   the specified syntax annotation from the present context.
   521   but removes the specified syntax annotation from the present
   493 
   522   context.
   494   \<^descr> @{command "write"} is similar to @{command "notation"}, but works within
   523 
   495   an Isar proof body.
   524   \<^descr> @{command "write"} is similar to @{command "notation"}, but
       
   525   works within an Isar proof body.
       
   526 \<close>
   496 \<close>
   527 
   497 
   528 
   498 
   529 section \<open>The Pure syntax \label{sec:pure-syntax}\<close>
   499 section \<open>The Pure syntax \label{sec:pure-syntax}\<close>
   530 
   500 
   531 subsection \<open>Lexical matters \label{sec:inner-lex}\<close>
   501 subsection \<open>Lexical matters \label{sec:inner-lex}\<close>
   532 
   502 
   533 text \<open>The inner lexical syntax vaguely resembles the outer one
   503 text \<open>
   534   (\secref{sec:outer-lex}), but some details are different.  There are
   504   The inner lexical syntax vaguely resembles the outer one
   535   two main categories of inner syntax tokens:
   505   (\secref{sec:outer-lex}), but some details are different. There are two main
   536 
   506   categories of inner syntax tokens:
   537   \<^enum> \<^emph>\<open>delimiters\<close> --- the literal tokens occurring in
   507 
   538   productions of the given priority grammar (cf.\
   508   \<^enum> \<^emph>\<open>delimiters\<close> --- the literal tokens occurring in productions of the given
   539   \secref{sec:priority-grammar});
   509   priority grammar (cf.\ \secref{sec:priority-grammar});
   540 
   510 
   541   \<^enum> \<^emph>\<open>named tokens\<close> --- various categories of identifiers etc.
   511   \<^enum> \<^emph>\<open>named tokens\<close> --- various categories of identifiers etc.
   542 
   512 
   543 
   513 
   544   Delimiters override named tokens and may thus render certain
   514   Delimiters override named tokens and may thus render certain identifiers
   545   identifiers inaccessible.  Sometimes the logical context admits
   515   inaccessible. Sometimes the logical context admits alternative ways to refer
   546   alternative ways to refer to the same entity, potentially via
   516   to the same entity, potentially via qualified names.
   547   qualified names.
   517 
   548 
   518   \<^medskip>
   549   \<^medskip>
   519   The categories for named tokens are defined once and for all as follows,
   550   The categories for named tokens are defined once and for
   520   reusing some categories of the outer token syntax (\secref{sec:outer-lex}).
   551   all as follows, reusing some categories of the outer token syntax
       
   552   (\secref{sec:outer-lex}).
       
   553 
   521 
   554   \begin{center}
   522   \begin{center}
   555   \begin{supertabular}{rcl}
   523   \begin{supertabular}{rcl}
   556     @{syntax_def (inner) id} & = & @{syntax_ref ident} \\
   524     @{syntax_def (inner) id} & = & @{syntax_ref ident} \\
   557     @{syntax_def (inner) longid} & = & @{syntax_ref longident} \\
   525     @{syntax_def (inner) longid} & = & @{syntax_ref longident} \\
   579 \<close>
   547 \<close>
   580 
   548 
   581 
   549 
   582 subsection \<open>Priority grammars \label{sec:priority-grammar}\<close>
   550 subsection \<open>Priority grammars \label{sec:priority-grammar}\<close>
   583 
   551 
   584 text \<open>A context-free grammar consists of a set of \<^emph>\<open>terminal
   552 text \<open>
   585   symbols\<close>, a set of \<^emph>\<open>nonterminal symbols\<close> and a set of
   553   A context-free grammar consists of a set of \<^emph>\<open>terminal symbols\<close>, a set of
   586   \<^emph>\<open>productions\<close>.  Productions have the form \<open>A = \<gamma>\<close>,
   554   \<^emph>\<open>nonterminal symbols\<close> and a set of \<^emph>\<open>productions\<close>. Productions have the
   587   where \<open>A\<close> is a nonterminal and \<open>\<gamma>\<close> is a string of
   555   form \<open>A = \<gamma>\<close>, where \<open>A\<close> is a nonterminal and \<open>\<gamma>\<close> is a string of terminals
   588   terminals and nonterminals.  One designated nonterminal is called
   556   and nonterminals. One designated nonterminal is called the \<^emph>\<open>root symbol\<close>.
   589   the \<^emph>\<open>root symbol\<close>.  The language defined by the grammar
   557   The language defined by the grammar consists of all strings of terminals
   590   consists of all strings of terminals that can be derived from the
   558   that can be derived from the root symbol by applying productions as rewrite
   591   root symbol by applying productions as rewrite rules.
   559   rules.
   592 
   560 
   593   The standard Isabelle parser for inner syntax uses a \<^emph>\<open>priority
   561   The standard Isabelle parser for inner syntax uses a \<^emph>\<open>priority grammar\<close>.
   594   grammar\<close>.  Each nonterminal is decorated by an integer priority:
   562   Each nonterminal is decorated by an integer priority: \<open>A\<^sup>(\<^sup>p\<^sup>)\<close>. In a
   595   \<open>A\<^sup>(\<^sup>p\<^sup>)\<close>.  In a derivation, \<open>A\<^sup>(\<^sup>p\<^sup>)\<close> may be rewritten
   563   derivation, \<open>A\<^sup>(\<^sup>p\<^sup>)\<close> may be rewritten using a production \<open>A\<^sup>(\<^sup>q\<^sup>) = \<gamma>\<close> only
   596   using a production \<open>A\<^sup>(\<^sup>q\<^sup>) = \<gamma>\<close> only if \<open>p \<le> q\<close>.  Any
   564   if \<open>p \<le> q\<close>. Any priority grammar can be translated into a normal
   597   priority grammar can be translated into a normal context-free
   565   context-free grammar by introducing new nonterminals and productions.
   598   grammar by introducing new nonterminals and productions.
   566 
   599 
   567   \<^medskip>
   600   \<^medskip>
   568   Formally, a set of context free productions \<open>G\<close> induces a derivation
   601   Formally, a set of context free productions \<open>G\<close>
   569   relation \<open>\<longrightarrow>\<^sub>G\<close> as follows. Let \<open>\<alpha>\<close> and \<open>\<beta>\<close> denote strings of terminal or
   602   induces a derivation relation \<open>\<longrightarrow>\<^sub>G\<close> as follows.  Let \<open>\<alpha>\<close> and \<open>\<beta>\<close> denote strings of terminal or nonterminal symbols.
   570   nonterminal symbols. Then \<open>\<alpha> A\<^sup>(\<^sup>p\<^sup>) \<beta> \<longrightarrow>\<^sub>G \<alpha> \<gamma> \<beta>\<close> holds if and only if \<open>G\<close>
   603   Then \<open>\<alpha> A\<^sup>(\<^sup>p\<^sup>) \<beta> \<longrightarrow>\<^sub>G \<alpha> \<gamma> \<beta>\<close> holds if and only if \<open>G\<close>
       
   604   contains some production \<open>A\<^sup>(\<^sup>q\<^sup>) = \<gamma>\<close> for \<open>p \<le> q\<close>.
   571   contains some production \<open>A\<^sup>(\<^sup>q\<^sup>) = \<gamma>\<close> for \<open>p \<le> q\<close>.
   605 
   572 
   606   \<^medskip>
   573   \<^medskip>
   607   The following grammar for arithmetic expressions
   574   The following grammar for arithmetic expressions demonstrates how binding
   608   demonstrates how binding power and associativity of operators can be
   575   power and associativity of operators can be enforced by priorities.
   609   enforced by priorities.
       
   610 
   576 
   611   \begin{center}
   577   \begin{center}
   612   \begin{tabular}{rclr}
   578   \begin{tabular}{rclr}
   613   \<open>A\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>)\<close> & \<open>=\<close> & \<^verbatim>\<open>(\<close> \<open>A\<^sup>(\<^sup>0\<^sup>)\<close> \<^verbatim>\<open>)\<close> \\
   579   \<open>A\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>)\<close> & \<open>=\<close> & \<^verbatim>\<open>(\<close> \<open>A\<^sup>(\<^sup>0\<^sup>)\<close> \<^verbatim>\<open>)\<close> \\
   614   \<open>A\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>)\<close> & \<open>=\<close> & \<^verbatim>\<open>0\<close> \\
   580   \<open>A\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>)\<close> & \<open>=\<close> & \<^verbatim>\<open>0\<close> \\
   615   \<open>A\<^sup>(\<^sup>0\<^sup>)\<close> & \<open>=\<close> & \<open>A\<^sup>(\<^sup>0\<^sup>)\<close> \<^verbatim>\<open>+\<close> \<open>A\<^sup>(\<^sup>1\<^sup>)\<close> \\
   581   \<open>A\<^sup>(\<^sup>0\<^sup>)\<close> & \<open>=\<close> & \<open>A\<^sup>(\<^sup>0\<^sup>)\<close> \<^verbatim>\<open>+\<close> \<open>A\<^sup>(\<^sup>1\<^sup>)\<close> \\
   616   \<open>A\<^sup>(\<^sup>2\<^sup>)\<close> & \<open>=\<close> & \<open>A\<^sup>(\<^sup>3\<^sup>)\<close> \<^verbatim>\<open>*\<close> \<open>A\<^sup>(\<^sup>2\<^sup>)\<close> \\
   582   \<open>A\<^sup>(\<^sup>2\<^sup>)\<close> & \<open>=\<close> & \<open>A\<^sup>(\<^sup>3\<^sup>)\<close> \<^verbatim>\<open>*\<close> \<open>A\<^sup>(\<^sup>2\<^sup>)\<close> \\
   617   \<open>A\<^sup>(\<^sup>3\<^sup>)\<close> & \<open>=\<close> & \<^verbatim>\<open>-\<close> \<open>A\<^sup>(\<^sup>3\<^sup>)\<close> \\
   583   \<open>A\<^sup>(\<^sup>3\<^sup>)\<close> & \<open>=\<close> & \<^verbatim>\<open>-\<close> \<open>A\<^sup>(\<^sup>3\<^sup>)\<close> \\
   618   \end{tabular}
   584   \end{tabular}
   619   \end{center}
   585   \end{center}
   620   The choice of priorities determines that \<^verbatim>\<open>-\<close> binds
   586   The choice of priorities determines that \<^verbatim>\<open>-\<close> binds tighter than \<^verbatim>\<open>*\<close>, which
   621   tighter than \<^verbatim>\<open>*\<close>, which binds tighter than \<^verbatim>\<open>+\<close>.
   587   binds tighter than \<^verbatim>\<open>+\<close>. Furthermore \<^verbatim>\<open>+\<close> associates to the left and \<^verbatim>\<open>*\<close> to
   622   Furthermore \<^verbatim>\<open>+\<close> associates to the left and
   588   the right.
   623   \<^verbatim>\<open>*\<close> to the right.
       
   624 
   589 
   625   \<^medskip>
   590   \<^medskip>
   626   For clarity, grammars obey these conventions:
   591   For clarity, grammars obey these conventions:
   627 
   592 
   628   \<^item> All priorities must lie between 0 and 1000.
   593     \<^item> All priorities must lie between 0 and 1000.
   629 
   594 
   630   \<^item> Priority 0 on the right-hand side and priority 1000 on the
   595     \<^item> Priority 0 on the right-hand side and priority 1000 on the left-hand
   631   left-hand side may be omitted.
   596     side may be omitted.
   632 
   597 
   633   \<^item> The production \<open>A\<^sup>(\<^sup>p\<^sup>) = \<alpha>\<close> is written as \<open>A = \<alpha>
   598     \<^item> The production \<open>A\<^sup>(\<^sup>p\<^sup>) = \<alpha>\<close> is written as \<open>A = \<alpha> (p)\<close>, i.e.\ the
   634   (p)\<close>, i.e.\ the priority of the left-hand side actually appears in
   599     priority of the left-hand side actually appears in a column on the far
   635   a column on the far right.
   600     right.
   636 
   601 
   637   \<^item> Alternatives are separated by \<open>|\<close>.
   602     \<^item> Alternatives are separated by \<open>|\<close>.
   638 
   603 
   639   \<^item> Repetition is indicated by dots \<open>(\<dots>)\<close> in an informal
   604     \<^item> Repetition is indicated by dots \<open>(\<dots>)\<close> in an informal but obvious way.
   640   but obvious way.
       
   641 
       
   642 
   605 
   643   Using these conventions, the example grammar specification above
   606   Using these conventions, the example grammar specification above
   644   takes the form:
   607   takes the form:
   645   \begin{center}
   608   \begin{center}
   646   \begin{tabular}{rclc}
   609   \begin{tabular}{rclc}
   654 \<close>
   617 \<close>
   655 
   618 
   656 
   619 
   657 subsection \<open>The Pure grammar \label{sec:pure-grammar}\<close>
   620 subsection \<open>The Pure grammar \label{sec:pure-grammar}\<close>
   658 
   621 
   659 text \<open>The priority grammar of the \<open>Pure\<close> theory is defined
   622 text \<open>
   660   approximately like this:
   623   The priority grammar of the \<open>Pure\<close> theory is defined approximately like
       
   624   this:
   661 
   625 
   662   \begin{center}
   626   \begin{center}
   663   \begin{supertabular}{rclr}
   627   \begin{supertabular}{rclr}
   664 
   628 
   665   @{syntax_def (inner) any} & = & \<open>prop  |  logic\<close> \\\\
   629   @{syntax_def (inner) any} & = & \<open>prop  |  logic\<close> \\\\
   727   @{syntax_def (inner) class_name} & = & \<open>id  |  longid\<close> \\
   691   @{syntax_def (inner) class_name} & = & \<open>id  |  longid\<close> \\
   728   \end{supertabular}
   692   \end{supertabular}
   729   \end{center}
   693   \end{center}
   730 
   694 
   731   \<^medskip>
   695   \<^medskip>
   732   Here literal terminals are printed \<^verbatim>\<open>verbatim\<close>;
   696   Here literal terminals are printed \<^verbatim>\<open>verbatim\<close>; see also
   733   see also \secref{sec:inner-lex} for further token categories of the
   697   \secref{sec:inner-lex} for further token categories of the inner syntax. The
   734   inner syntax.  The meaning of the nonterminals defined by the above
   698   meaning of the nonterminals defined by the above grammar is as follows:
   735   grammar is as follows:
       
   736 
   699 
   737   \<^descr> @{syntax_ref (inner) any} denotes any term.
   700   \<^descr> @{syntax_ref (inner) any} denotes any term.
   738 
   701 
   739   \<^descr> @{syntax_ref (inner) prop} denotes meta-level propositions,
   702   \<^descr> @{syntax_ref (inner) prop} denotes meta-level propositions, which are
   740   which are terms of type @{typ prop}.  The syntax of such formulae of
   703   terms of type @{typ prop}. The syntax of such formulae of the meta-logic is
   741   the meta-logic is carefully distinguished from usual conventions for
   704   carefully distinguished from usual conventions for object-logics. In
   742   object-logics.  In particular, plain \<open>\<lambda>\<close>-term notation is
   705   particular, plain \<open>\<lambda>\<close>-term notation is \<^emph>\<open>not\<close> recognized as @{syntax (inner)
   743   \<^emph>\<open>not\<close> recognized as @{syntax (inner) prop}.
   706   prop}.
   744 
   707 
   745   \<^descr> @{syntax_ref (inner) aprop} denotes atomic propositions, which
   708   \<^descr> @{syntax_ref (inner) aprop} denotes atomic propositions, which are
   746   are embedded into regular @{syntax (inner) prop} by means of an
   709   embedded into regular @{syntax (inner) prop} by means of an explicit \<^verbatim>\<open>PROP\<close>
   747   explicit \<^verbatim>\<open>PROP\<close> token.
   710   token.
   748 
   711 
   749   Terms of type @{typ prop} with non-constant head, e.g.\ a plain
   712   Terms of type @{typ prop} with non-constant head, e.g.\ a plain variable,
   750   variable, are printed in this form.  Constants that yield type @{typ
   713   are printed in this form. Constants that yield type @{typ prop} are expected
   751   prop} are expected to provide their own concrete syntax; otherwise
   714   to provide their own concrete syntax; otherwise the printed version will
   752   the printed version will appear like @{syntax (inner) logic} and
   715   appear like @{syntax (inner) logic} and cannot be parsed again as @{syntax
   753   cannot be parsed again as @{syntax (inner) prop}.
   716   (inner) prop}.
   754 
   717 
   755   \<^descr> @{syntax_ref (inner) logic} denotes arbitrary terms of a
   718   \<^descr> @{syntax_ref (inner) logic} denotes arbitrary terms of a logical type,
   756   logical type, excluding type @{typ prop}.  This is the main
   719   excluding type @{typ prop}. This is the main syntactic category of
   757   syntactic category of object-logic entities, covering plain \<open>\<lambda>\<close>-term notation (variables, abstraction, application), plus
   720   object-logic entities, covering plain \<open>\<lambda>\<close>-term notation (variables,
   758   anything defined by the user.
   721   abstraction, application), plus anything defined by the user.
   759 
   722 
   760   When specifying notation for logical entities, all logical types
   723   When specifying notation for logical entities, all logical types (excluding
   761   (excluding @{typ prop}) are \<^emph>\<open>collapsed\<close> to this single category
   724   @{typ prop}) are \<^emph>\<open>collapsed\<close> to this single category of @{syntax (inner)
   762   of @{syntax (inner) logic}.
   725   logic}.
   763 
   726 
   764   \<^descr> @{syntax_ref (inner) index} denotes an optional index term for
   727   \<^descr> @{syntax_ref (inner) index} denotes an optional index term for indexed
   765   indexed syntax.  If omitted, it refers to the first @{keyword_ref
   728   syntax. If omitted, it refers to the first @{keyword_ref "structure"}
   766   "structure"} variable in the context.  The special dummy ``\<open>\<index>\<close>'' serves as pattern variable in mixfix annotations that
   729   variable in the context. The special dummy ``\<open>\<index>\<close>'' serves as pattern
   767   introduce indexed notation.
   730   variable in mixfix annotations that introduce indexed notation.
   768 
   731 
   769   \<^descr> @{syntax_ref (inner) idt} denotes identifiers, possibly
   732   \<^descr> @{syntax_ref (inner) idt} denotes identifiers, possibly constrained by
   770   constrained by types.
   733   types.
   771 
   734 
   772   \<^descr> @{syntax_ref (inner) idts} denotes a sequence of @{syntax_ref
   735   \<^descr> @{syntax_ref (inner) idts} denotes a sequence of @{syntax_ref (inner)
   773   (inner) idt}.  This is the most basic category for variables in
   736   idt}. This is the most basic category for variables in iterated binders,
   774   iterated binders, such as \<open>\<lambda>\<close> or \<open>\<And>\<close>.
   737   such as \<open>\<lambda>\<close> or \<open>\<And>\<close>.
   775 
   738 
   776   \<^descr> @{syntax_ref (inner) pttrn} and @{syntax_ref (inner) pttrns}
   739   \<^descr> @{syntax_ref (inner) pttrn} and @{syntax_ref (inner) pttrns} denote
   777   denote patterns for abstraction, cases bindings etc.  In Pure, these
   740   patterns for abstraction, cases bindings etc. In Pure, these categories
   778   categories start as a merely copy of @{syntax (inner) idt} and
   741   start as a merely copy of @{syntax (inner) idt} and @{syntax (inner) idts},
   779   @{syntax (inner) idts}, respectively.  Object-logics may add
   742   respectively. Object-logics may add additional productions for binding
   780   additional productions for binding forms.
   743   forms.
   781 
   744 
   782   \<^descr> @{syntax_ref (inner) type} denotes types of the meta-logic.
   745   \<^descr> @{syntax_ref (inner) type} denotes types of the meta-logic.
   783 
   746 
   784   \<^descr> @{syntax_ref (inner) sort} denotes meta-level sorts.
   747   \<^descr> @{syntax_ref (inner) sort} denotes meta-level sorts.
   785 
   748 
   786 
   749 
   787   Here are some further explanations of certain syntax features.
   750   Here are some further explanations of certain syntax features.
   788 
   751 
   789   \<^item> In @{syntax (inner) idts}, note that \<open>x :: nat y\<close> is
   752   \<^item> In @{syntax (inner) idts}, note that \<open>x :: nat y\<close> is parsed as \<open>x :: (nat
   790   parsed as \<open>x :: (nat y)\<close>, treating \<open>y\<close> like a type
   753   y)\<close>, treating \<open>y\<close> like a type constructor applied to \<open>nat\<close>. To avoid this
   791   constructor applied to \<open>nat\<close>.  To avoid this interpretation,
   754   interpretation, write \<open>(x :: nat) y\<close> with explicit parentheses.
   792   write \<open>(x :: nat) y\<close> with explicit parentheses.
   755 
   793 
   756   \<^item> Similarly, \<open>x :: nat y :: nat\<close> is parsed as \<open>x :: (nat y :: nat)\<close>. The
   794   \<^item> Similarly, \<open>x :: nat y :: nat\<close> is parsed as \<open>x ::
   757   correct form is \<open>(x :: nat) (y :: nat)\<close>, or \<open>(x :: nat) y :: nat\<close> if \<open>y\<close> is
   795   (nat y :: nat)\<close>.  The correct form is \<open>(x :: nat) (y ::
   758   last in the sequence of identifiers.
   796   nat)\<close>, or \<open>(x :: nat) y :: nat\<close> if \<open>y\<close> is last in the
   759 
   797   sequence of identifiers.
   760   \<^item> Type constraints for terms bind very weakly. For example, \<open>x < y :: nat\<close>
   798 
   761   is normally parsed as \<open>(x < y) :: nat\<close>, unless \<open><\<close> has a very low priority,
   799   \<^item> Type constraints for terms bind very weakly.  For example,
   762   in which case the input is likely to be ambiguous. The correct form is \<open>x <
   800   \<open>x < y :: nat\<close> is normally parsed as \<open>(x < y) ::
   763   (y :: nat)\<close>.
   801   nat\<close>, unless \<open><\<close> has a very low priority, in which case the
       
   802   input is likely to be ambiguous.  The correct form is \<open>x < (y
       
   803   :: nat)\<close>.
       
   804 
   764 
   805   \<^item> Dummy variables (written as underscore) may occur in different
   765   \<^item> Dummy variables (written as underscore) may occur in different
   806   roles.
   766   roles.
   807 
   767 
   808     \<^descr> A type ``\<open>_\<close>'' or ``\<open>_ :: sort\<close>'' acts like an
   768     \<^descr> A type ``\<open>_\<close>'' or ``\<open>_ :: sort\<close>'' acts like an anonymous inference
   809     anonymous inference parameter, which is filled-in according to the
   769     parameter, which is filled-in according to the most general type produced
   810     most general type produced by the type-checking phase.
   770     by the type-checking phase.
   811 
   771 
   812     \<^descr> A bound ``\<open>_\<close>'' refers to a vacuous abstraction, where
   772     \<^descr> A bound ``\<open>_\<close>'' refers to a vacuous abstraction, where the body does not
   813     the body does not refer to the binding introduced here.  As in the
   773     refer to the binding introduced here. As in the term @{term "\<lambda>x _. x"},
   814     term @{term "\<lambda>x _. x"}, which is \<open>\<alpha>\<close>-equivalent to \<open>\<lambda>x y. x\<close>.
   774     which is \<open>\<alpha>\<close>-equivalent to \<open>\<lambda>x y. x\<close>.
   815 
   775 
   816     \<^descr> A free ``\<open>_\<close>'' refers to an implicit outer binding.
   776     \<^descr> A free ``\<open>_\<close>'' refers to an implicit outer binding. Higher definitional
   817     Higher definitional packages usually allow forms like \<open>f x _
   777     packages usually allow forms like \<open>f x _ = x\<close>.
   818     = x\<close>.
   778 
   819 
   779     \<^descr> A schematic ``\<open>_\<close>'' (within a term pattern, see \secref{sec:term-decls})
   820     \<^descr> A schematic ``\<open>_\<close>'' (within a term pattern, see
   780     refers to an anonymous variable that is implicitly abstracted over its
   821     \secref{sec:term-decls}) refers to an anonymous variable that is
   781     context of locally bound variables. For example, this allows pattern
   822     implicitly abstracted over its context of locally bound variables.
   782     matching of \<open>{x. f x = g x}\<close> against \<open>{x. _ = _}\<close>, or even \<open>{_. _ = _}\<close> by
   823     For example, this allows pattern matching of \<open>{x. f x = g
       
   824     x}\<close> against \<open>{x. _ = _}\<close>, or even \<open>{_. _ = _}\<close> by
       
   825     using both bound and schematic dummies.
   783     using both bound and schematic dummies.
   826 
   784 
   827   \<^descr> The three literal dots ``\<^verbatim>\<open>...\<close>'' may be also
   785   \<^descr> The three literal dots ``\<^verbatim>\<open>...\<close>'' may be also written as ellipsis symbol
   828   written as ellipsis symbol \<^verbatim>\<open>\<dots>\<close>.  In both cases this
   786   \<^verbatim>\<open>\<dots>\<close>. In both cases this refers to a special schematic variable, which is
   829   refers to a special schematic variable, which is bound in the
   787   bound in the context. This special term abbreviation works nicely with
   830   context.  This special term abbreviation works nicely with
       
   831   calculational reasoning (\secref{sec:calculation}).
   788   calculational reasoning (\secref{sec:calculation}).
   832 
   789 
   833   \<^descr> \<^verbatim>\<open>CONST\<close> ensures that the given identifier is treated
   790   \<^descr> \<^verbatim>\<open>CONST\<close> ensures that the given identifier is treated as constant term,
   834   as constant term, and passed through the parse tree in fully
   791   and passed through the parse tree in fully internalized form. This is
   835   internalized form.  This is particularly relevant for translation
   792   particularly relevant for translation rules (\secref{sec:syn-trans}),
   836   rules (\secref{sec:syn-trans}), notably on the RHS.
   793   notably on the RHS.
   837 
   794 
   838   \<^descr> \<^verbatim>\<open>XCONST\<close> is similar to \<^verbatim>\<open>CONST\<close>, but
   795   \<^descr> \<^verbatim>\<open>XCONST\<close> is similar to \<^verbatim>\<open>CONST\<close>, but retains the constant name as given.
   839   retains the constant name as given.  This is only relevant to
   796   This is only relevant to translation rules (\secref{sec:syn-trans}), notably
   840   translation rules (\secref{sec:syn-trans}), notably on the LHS.
   797   on the LHS.
   841 \<close>
   798 \<close>
   842 
   799 
   843 
   800 
   844 subsection \<open>Inspecting the syntax\<close>
   801 subsection \<open>Inspecting the syntax\<close>
   845 
   802 
   846 text \<open>
   803 text \<open>
   847   \begin{matharray}{rcl}
   804   \begin{matharray}{rcl}
   848     @{command_def "print_syntax"}\<open>\<^sup>*\<close> & : & \<open>context \<rightarrow>\<close> \\
   805     @{command_def "print_syntax"}\<open>\<^sup>*\<close> & : & \<open>context \<rightarrow>\<close> \\
   849   \end{matharray}
   806   \end{matharray}
   850 
   807 
   851   \<^descr> @{command "print_syntax"} prints the inner syntax of the
   808   \<^descr> @{command "print_syntax"} prints the inner syntax of the current context.
   852   current context.  The output can be quite large; the most important
   809   The output can be quite large; the most important sections are explained
   853   sections are explained below.
   810   below.
   854 
   811 
   855     \<^descr> \<open>lexicon\<close> lists the delimiters of the inner token
   812     \<^descr> \<open>lexicon\<close> lists the delimiters of the inner token language; see
   856     language; see \secref{sec:inner-lex}.
   813     \secref{sec:inner-lex}.
   857 
   814 
   858     \<^descr> \<open>prods\<close> lists the productions of the underlying
   815     \<^descr> \<open>prods\<close> lists the productions of the underlying priority grammar; see
   859     priority grammar; see \secref{sec:priority-grammar}.
   816     \secref{sec:priority-grammar}.
   860 
   817 
   861     The nonterminal \<open>A\<^sup>(\<^sup>p\<^sup>)\<close> is rendered in plain text as \<open>A[p]\<close>; delimiters are quoted.  Many productions have an extra
   818     The nonterminal \<open>A\<^sup>(\<^sup>p\<^sup>)\<close> is rendered in plain text as \<open>A[p]\<close>; delimiters
   862     \<open>\<dots> => name\<close>.  These names later become the heads of parse
   819     are quoted. Many productions have an extra \<open>\<dots> => name\<close>. These names later
   863     trees; they also guide the pretty printer.
   820     become the heads of parse trees; they also guide the pretty printer.
   864 
   821 
   865     Productions without such parse tree names are called \<^emph>\<open>copy
   822     Productions without such parse tree names are called \<^emph>\<open>copy productions\<close>.
   866     productions\<close>.  Their right-hand side must have exactly one
   823     Their right-hand side must have exactly one nonterminal symbol (or named
   867     nonterminal symbol (or named token).  The parser does not create a
   824     token). The parser does not create a new parse tree node for copy
   868     new parse tree node for copy productions, but simply returns the
   825     productions, but simply returns the parse tree of the right-hand symbol.
   869     parse tree of the right-hand symbol.
       
   870 
   826 
   871     If the right-hand side of a copy production consists of a single
   827     If the right-hand side of a copy production consists of a single
   872     nonterminal without any delimiters, then it is called a \<^emph>\<open>chain
   828     nonterminal without any delimiters, then it is called a \<^emph>\<open>chain
   873     production\<close>.  Chain productions act as abbreviations: conceptually,
   829     production\<close>. Chain productions act as abbreviations: conceptually, they
   874     they are removed from the grammar by adding new productions.
   830     are removed from the grammar by adding new productions. Priority
   875     Priority information attached to chain productions is ignored; only
   831     information attached to chain productions is ignored; only the dummy value
   876     the dummy value \<open>-1\<close> is displayed.
   832     \<open>-1\<close> is displayed.
   877 
   833 
   878     \<^descr> \<open>print modes\<close> lists the alternative print modes
   834     \<^descr> \<open>print modes\<close> lists the alternative print modes provided by this
   879     provided by this grammar; see \secref{sec:print-modes}.
   835     grammar; see \secref{sec:print-modes}.
   880 
   836 
   881     \<^descr> \<open>parse_rules\<close> and \<open>print_rules\<close> relate to
   837     \<^descr> \<open>parse_rules\<close> and \<open>print_rules\<close> relate to syntax translations (macros);
   882     syntax translations (macros); see \secref{sec:syn-trans}.
   838     see \secref{sec:syn-trans}.
   883 
   839 
   884     \<^descr> \<open>parse_ast_translation\<close> and \<open>print_ast_translation\<close> list sets of constants that invoke
   840     \<^descr> \<open>parse_ast_translation\<close> and \<open>print_ast_translation\<close> list sets of
   885     translation functions for abstract syntax trees, which are only
   841     constants that invoke translation functions for abstract syntax trees,
   886     required in very special situations; see \secref{sec:tr-funs}.
   842     which are only required in very special situations; see
   887 
   843     \secref{sec:tr-funs}.
   888     \<^descr> \<open>parse_translation\<close> and \<open>print_translation\<close>
   844 
   889     list the sets of constants that invoke regular translation
   845     \<^descr> \<open>parse_translation\<close> and \<open>print_translation\<close> list the sets of constants
   890     functions; see \secref{sec:tr-funs}.
   846     that invoke regular translation functions; see \secref{sec:tr-funs}.
   891 \<close>
   847 \<close>
   892 
   848 
   893 
   849 
   894 subsection \<open>Ambiguity of parsed expressions\<close>
   850 subsection \<open>Ambiguity of parsed expressions\<close>
   895 
   851 
   897   \begin{tabular}{rcll}
   853   \begin{tabular}{rcll}
   898     @{attribute_def syntax_ambiguity_warning} & : & \<open>attribute\<close> & default \<open>true\<close> \\
   854     @{attribute_def syntax_ambiguity_warning} & : & \<open>attribute\<close> & default \<open>true\<close> \\
   899     @{attribute_def syntax_ambiguity_limit} & : & \<open>attribute\<close> & default \<open>10\<close> \\
   855     @{attribute_def syntax_ambiguity_limit} & : & \<open>attribute\<close> & default \<open>10\<close> \\
   900   \end{tabular}
   856   \end{tabular}
   901 
   857 
   902   Depending on the grammar and the given input, parsing may be
   858   Depending on the grammar and the given input, parsing may be ambiguous.
   903   ambiguous.  Isabelle lets the Earley parser enumerate all possible
   859   Isabelle lets the Earley parser enumerate all possible parse trees, and then
   904   parse trees, and then tries to make the best out of the situation.
   860   tries to make the best out of the situation. Terms that cannot be
   905   Terms that cannot be type-checked are filtered out, which often
   861   type-checked are filtered out, which often leads to a unique result in the
   906   leads to a unique result in the end.  Unlike regular type
   862   end. Unlike regular type reconstruction, which is applied to the whole
   907   reconstruction, which is applied to the whole collection of input
   863   collection of input terms simultaneously, the filtering stage only treats
   908   terms simultaneously, the filtering stage only treats each given
   864   each given term in isolation. Filtering is also not attempted for individual
   909   term in isolation.  Filtering is also not attempted for individual
       
   910   types or raw ASTs (as required for @{command translations}).
   865   types or raw ASTs (as required for @{command translations}).
   911 
   866 
   912   Certain warning or error messages are printed, depending on the
   867   Certain warning or error messages are printed, depending on the situation
   913   situation and the given configuration options.  Parsing ultimately
   868   and the given configuration options. Parsing ultimately fails, if multiple
   914   fails, if multiple results remain after the filtering phase.
   869   results remain after the filtering phase.
   915 
   870 
   916   \<^descr> @{attribute syntax_ambiguity_warning} controls output of
   871   \<^descr> @{attribute syntax_ambiguity_warning} controls output of explicit warning
   917   explicit warning messages about syntax ambiguity.
   872   messages about syntax ambiguity.
   918 
   873 
   919   \<^descr> @{attribute syntax_ambiguity_limit} determines the number of
   874   \<^descr> @{attribute syntax_ambiguity_limit} determines the number of resulting
   920   resulting parse trees that are shown as part of the printed message
   875   parse trees that are shown as part of the printed message in case of an
   921   in case of an ambiguity.
   876   ambiguity.
   922 \<close>
   877 \<close>
   923 
   878 
   924 
   879 
   925 section \<open>Syntax transformations \label{sec:syntax-transformations}\<close>
   880 section \<open>Syntax transformations \label{sec:syntax-transformations}\<close>
   926 
   881 
   927 text \<open>The inner syntax engine of Isabelle provides separate
   882 text \<open>
   928   mechanisms to transform parse trees either via rewrite systems on
   883   The inner syntax engine of Isabelle provides separate mechanisms to
   929   first-order ASTs (\secref{sec:syn-trans}), or ML functions on ASTs
   884   transform parse trees either via rewrite systems on first-order ASTs
   930   or syntactic \<open>\<lambda>\<close>-terms (\secref{sec:tr-funs}).  This works
   885   (\secref{sec:syn-trans}), or ML functions on ASTs or syntactic \<open>\<lambda>\<close>-terms
   931   both for parsing and printing, as outlined in
   886   (\secref{sec:tr-funs}). This works both for parsing and printing, as
   932   \figref{fig:parse-print}.
   887   outlined in \figref{fig:parse-print}.
   933 
   888 
   934   \begin{figure}[htbp]
   889   \begin{figure}[htbp]
   935   \begin{center}
   890   \begin{center}
   936   \begin{tabular}{cl}
   891   \begin{tabular}{cl}
   937   string          & \\
   892   string          & \\
   952   \end{tabular}
   907   \end{tabular}
   953   \end{center}
   908   \end{center}
   954   \caption{Parsing and printing with translations}\label{fig:parse-print}
   909   \caption{Parsing and printing with translations}\label{fig:parse-print}
   955   \end{figure}
   910   \end{figure}
   956 
   911 
   957   These intermediate syntax tree formats eventually lead to a pre-term
   912   These intermediate syntax tree formats eventually lead to a pre-term with
   958   with all names and binding scopes resolved, but most type
   913   all names and binding scopes resolved, but most type information still
   959   information still missing.  Explicit type constraints might be given by
   914   missing. Explicit type constraints might be given by the user, or implicit
   960   the user, or implicit position information by the system --- both
   915   position information by the system --- both need to be passed-through
   961   need to be passed-through carefully by syntax transformations.
   916   carefully by syntax transformations.
   962 
   917 
   963   Pre-terms are further processed by the so-called \<^emph>\<open>check\<close> and
   918   Pre-terms are further processed by the so-called \<^emph>\<open>check\<close> and \<^emph>\<open>uncheck\<close>
   964   \<^emph>\<open>uncheck\<close> phases that are intertwined with type-inference (see
   919   phases that are intertwined with type-inference (see also @{cite
   965   also @{cite "isabelle-implementation"}).  The latter allows to operate
   920   "isabelle-implementation"}). The latter allows to operate on higher-order
   966   on higher-order abstract syntax with proper binding and type
   921   abstract syntax with proper binding and type information already available.
   967   information already available.
   922 
   968 
   923   As a rule of thumb, anything that manipulates bindings of variables or
   969   As a rule of thumb, anything that manipulates bindings of variables
   924   constants needs to be implemented as syntax transformation (see below).
   970   or constants needs to be implemented as syntax transformation (see
   925   Anything else is better done via check/uncheck: a prominent example
   971   below).  Anything else is better done via check/uncheck: a prominent
   926   application is the @{command abbreviation} concept of Isabelle/Pure.
   972   example application is the @{command abbreviation} concept of
   927 \<close>
   973   Isabelle/Pure.\<close>
       
   974 
   928 
   975 
   929 
   976 subsection \<open>Abstract syntax trees \label{sec:ast}\<close>
   930 subsection \<open>Abstract syntax trees \label{sec:ast}\<close>
   977 
   931 
   978 text \<open>The ML datatype @{ML_type Ast.ast} explicitly represents the
   932 text \<open>
   979   intermediate AST format that is used for syntax rewriting
   933   The ML datatype @{ML_type Ast.ast} explicitly represents the intermediate
   980   (\secref{sec:syn-trans}).  It is defined in ML as follows:
   934   AST format that is used for syntax rewriting (\secref{sec:syn-trans}). It is
       
   935   defined in ML as follows:
   981   @{verbatim [display]
   936   @{verbatim [display]
   982 \<open>datatype ast =
   937 \<open>datatype ast =
   983   Constant of string |
   938   Constant of string |
   984   Variable of string |
   939   Variable of string |
   985   Appl of ast list\<close>}
   940   Appl of ast list\<close>}
   986 
   941 
   987   An AST is either an atom (constant or variable) or a list of (at
   942   An AST is either an atom (constant or variable) or a list of (at least two)
   988   least two) subtrees.  Occasional diagnostic output of ASTs uses
   943   subtrees. Occasional diagnostic output of ASTs uses notation that resembles
   989   notation that resembles S-expression of LISP.  Constant atoms are
   944   S-expression of LISP. Constant atoms are shown as quoted strings, variable
   990   shown as quoted strings, variable atoms as non-quoted strings and
   945   atoms as non-quoted strings and applications as a parenthesized list of
   991   applications as a parenthesized list of subtrees.  For example, the
   946   subtrees. For example, the AST
   992   AST
       
   993   @{ML [display] \<open>Ast.Appl [Ast.Constant "_abs", Ast.Variable "x", Ast.Variable "t"]\<close>}
   947   @{ML [display] \<open>Ast.Appl [Ast.Constant "_abs", Ast.Variable "x", Ast.Variable "t"]\<close>}
   994   is pretty-printed as \<^verbatim>\<open>("_abs" x t)\<close>.  Note that
   948   is pretty-printed as \<^verbatim>\<open>("_abs" x t)\<close>. Note that \<^verbatim>\<open>()\<close> and \<^verbatim>\<open>(x)\<close> are
   995   \<^verbatim>\<open>()\<close> and \<^verbatim>\<open>(x)\<close> are excluded as ASTs, because
   949   excluded as ASTs, because they have too few subtrees.
   996   they have too few subtrees.
   950 
   997 
   951   \<^medskip>
   998   \<^medskip>
   952   AST application is merely a pro-forma mechanism to indicate certain
   999   AST application is merely a pro-forma mechanism to indicate
   953   syntactic structures. Thus \<^verbatim>\<open>(c a b)\<close> could mean either term application or
  1000   certain syntactic structures.  Thus \<^verbatim>\<open>(c a b)\<close> could mean
   954   type application, depending on the syntactic context.
  1001   either term application or type application, depending on the
   955 
  1002   syntactic context.
   956   Nested application like \<^verbatim>\<open>(("_abs" x t) u)\<close> is also possible, but ASTs are
  1003 
   957   definitely first-order: the syntax constant \<^verbatim>\<open>"_abs"\<close> does not bind the \<^verbatim>\<open>x\<close>
  1004   Nested application like \<^verbatim>\<open>(("_abs" x t) u)\<close> is also
   958   in any way. Proper bindings are introduced in later stages of the term
  1005   possible, but ASTs are definitely first-order: the syntax constant
   959   syntax, where \<^verbatim>\<open>("_abs" x t)\<close> becomes an @{ML Abs} node and occurrences of
  1006   \<^verbatim>\<open>"_abs"\<close> does not bind the \<^verbatim>\<open>x\<close> in any way.
   960   \<^verbatim>\<open>x\<close> in \<^verbatim>\<open>t\<close> are replaced by bound variables (represented as de-Bruijn
  1007   Proper bindings are introduced in later stages of the term syntax,
   961   indices).
  1008   where \<^verbatim>\<open>("_abs" x t)\<close> becomes an @{ML Abs} node and
       
  1009   occurrences of \<^verbatim>\<open>x\<close> in \<^verbatim>\<open>t\<close> are replaced by bound
       
  1010   variables (represented as de-Bruijn indices).
       
  1011 \<close>
   962 \<close>
  1012 
   963 
  1013 
   964 
  1014 subsubsection \<open>AST constants versus variables\<close>
   965 subsubsection \<open>AST constants versus variables\<close>
  1015 
   966 
  1016 text \<open>Depending on the situation --- input syntax, output syntax,
   967 text \<open>
  1017   translation patterns --- the distinction of atomic ASTs as @{ML
   968   Depending on the situation --- input syntax, output syntax, translation
  1018   Ast.Constant} versus @{ML Ast.Variable} serves slightly different
   969   patterns --- the distinction of atomic ASTs as @{ML Ast.Constant} versus
  1019   purposes.
   970   @{ML Ast.Variable} serves slightly different purposes.
  1020 
   971 
  1021   Input syntax of a term such as \<open>f a b = c\<close> does not yet
   972   Input syntax of a term such as \<open>f a b = c\<close> does not yet indicate the scopes
  1022   indicate the scopes of atomic entities \<open>f, a, b, c\<close>: they
   973   of atomic entities \<open>f, a, b, c\<close>: they could be global constants or local
  1023   could be global constants or local variables, even bound ones
   974   variables, even bound ones depending on the context of the term. @{ML
  1024   depending on the context of the term.  @{ML Ast.Variable} leaves
   975   Ast.Variable} leaves this choice still open: later syntax layers (or
  1025   this choice still open: later syntax layers (or translation
   976   translation functions) may capture such a variable to determine its role
  1026   functions) may capture such a variable to determine its role
   977   specifically, to make it a constant, bound variable, free variable etc. In
  1027   specifically, to make it a constant, bound variable, free variable
   978   contrast, syntax translations that introduce already known constants would
  1028   etc.  In contrast, syntax translations that introduce already known
   979   rather do it via @{ML Ast.Constant} to prevent accidental re-interpretation
  1029   constants would rather do it via @{ML Ast.Constant} to prevent
   980   later on.
  1030   accidental re-interpretation later on.
   981 
  1031 
   982   Output syntax turns term constants into @{ML Ast.Constant} and variables
  1032   Output syntax turns term constants into @{ML Ast.Constant} and
   983   (free or schematic) into @{ML Ast.Variable}. This information is precise
  1033   variables (free or schematic) into @{ML Ast.Variable}.  This
   984   when printing fully formal \<open>\<lambda>\<close>-terms.
  1034   information is precise when printing fully formal \<open>\<lambda>\<close>-terms.
   985 
  1035 
   986   \<^medskip>
  1036   \<^medskip>
   987   AST translation patterns (\secref{sec:syn-trans}) that represent terms
  1037   AST translation patterns (\secref{sec:syn-trans}) that
   988   cannot distinguish constants and variables syntactically. Explicit
  1038   represent terms cannot distinguish constants and variables
   989   indication of \<open>CONST c\<close> inside the term language is required, unless \<open>c\<close> is
  1039   syntactically.  Explicit indication of \<open>CONST c\<close> inside the
   990   known as special \<^emph>\<open>syntax constant\<close> (see also @{command syntax}). It is also
  1040   term language is required, unless \<open>c\<close> is known as special
   991   possible to use @{command syntax} declarations (without mixfix annotation)
  1041   \<^emph>\<open>syntax constant\<close> (see also @{command syntax}).  It is also
   992   to enforce that certain unqualified names are always treated as constant
  1042   possible to use @{command syntax} declarations (without mixfix
   993   within the syntax machinery.
  1043   annotation) to enforce that certain unqualified names are always
   994 
  1044   treated as constant within the syntax machinery.
   995   The situation is simpler for ASTs that represent types or sorts, since the
  1045 
   996   concrete syntax already distinguishes type variables from type constants
  1046   The situation is simpler for ASTs that represent types or sorts,
   997   (constructors). So \<open>('a, 'b) foo\<close> corresponds to an AST application of some
  1047   since the concrete syntax already distinguishes type variables from
   998   constant for \<open>foo\<close> and variable arguments for \<open>'a\<close> and \<open>'b\<close>. Note that the
  1048   type constants (constructors).  So \<open>('a, 'b) foo\<close>
   999   postfix application is merely a feature of the concrete syntax, while in the
  1049   corresponds to an AST application of some constant for \<open>foo\<close>
  1000   AST the constructor occurs in head position.
  1050   and variable arguments for \<open>'a\<close> and \<open>'b\<close>.  Note that
  1001 \<close>
  1051   the postfix application is merely a feature of the concrete syntax,
       
  1052   while in the AST the constructor occurs in head position.\<close>
       
  1053 
  1002 
  1054 
  1003 
  1055 subsubsection \<open>Authentic syntax names\<close>
  1004 subsubsection \<open>Authentic syntax names\<close>
  1056 
  1005 
  1057 text \<open>Naming constant entities within ASTs is another delicate
  1006 text \<open>
  1058   issue.  Unqualified names are resolved in the name space tables in
  1007   Naming constant entities within ASTs is another delicate issue. Unqualified
  1059   the last stage of parsing, after all translations have been applied.
  1008   names are resolved in the name space tables in the last stage of parsing,
  1060   Since syntax transformations do not know about this later name
  1009   after all translations have been applied. Since syntax transformations do
  1061   resolution, there can be surprises in boundary cases.
  1010   not know about this later name resolution, there can be surprises in
  1062 
  1011   boundary cases.
  1063   \<^emph>\<open>Authentic syntax names\<close> for @{ML Ast.Constant} avoid this
  1012 
  1064   problem: the fully-qualified constant name with a special prefix for
  1013   \<^emph>\<open>Authentic syntax names\<close> for @{ML Ast.Constant} avoid this problem: the
  1065   its formal category (\<open>class\<close>, \<open>type\<close>, \<open>const\<close>, \<open>fixed\<close>) represents the information faithfully
  1014   fully-qualified constant name with a special prefix for its formal category
  1066   within the untyped AST format.  Accidental overlap with free or
  1015   (\<open>class\<close>, \<open>type\<close>, \<open>const\<close>, \<open>fixed\<close>) represents the information faithfully
  1067   bound variables is excluded as well.  Authentic syntax names work
  1016   within the untyped AST format. Accidental overlap with free or bound
  1068   implicitly in the following situations:
  1017   variables is excluded as well. Authentic syntax names work implicitly in the
  1069 
  1018   following situations:
  1070   \<^item> Input of term constants (or fixed variables) that are
  1019 
  1071   introduced by concrete syntax via @{command notation}: the
  1020     \<^item> Input of term constants (or fixed variables) that are introduced by
  1072   correspondence of a particular grammar production to some known term
  1021     concrete syntax via @{command notation}: the correspondence of a
  1073   entity is preserved.
  1022     particular grammar production to some known term entity is preserved.
  1074 
  1023 
  1075   \<^item> Input of type constants (constructors) and type classes ---
  1024     \<^item> Input of type constants (constructors) and type classes --- thanks to
  1076   thanks to explicit syntactic distinction independently on the
  1025     explicit syntactic distinction independently on the context.
  1077   context.
  1026 
  1078 
  1027     \<^item> Output of term constants, type constants, type classes --- this
  1079   \<^item> Output of term constants, type constants, type classes ---
  1028     information is already available from the internal term to be printed.
  1080   this information is already available from the internal term to be
  1029 
  1081   printed.
  1030   In other words, syntax transformations that operate on input terms written
  1082 
  1031   as prefix applications are difficult to make robust. Luckily, this case
  1083 
  1032   rarely occurs in practice, because syntax forms to be translated usually
  1084   In other words, syntax transformations that operate on input terms
  1033   correspond to some concrete notation.
  1085   written as prefix applications are difficult to make robust.
  1034 \<close>
  1086   Luckily, this case rarely occurs in practice, because syntax forms
       
  1087   to be translated usually correspond to some concrete notation.\<close>
       
  1088 
  1035 
  1089 
  1036 
  1090 subsection \<open>Raw syntax and translations \label{sec:syn-trans}\<close>
  1037 subsection \<open>Raw syntax and translations \label{sec:syn-trans}\<close>
  1091 
  1038 
  1092 text \<open>
  1039 text \<open>
  1099     @{attribute_def syntax_ast_trace} & : & \<open>attribute\<close> & default \<open>false\<close> \\
  1046     @{attribute_def syntax_ast_trace} & : & \<open>attribute\<close> & default \<open>false\<close> \\
  1100     @{attribute_def syntax_ast_stats} & : & \<open>attribute\<close> & default \<open>false\<close> \\
  1047     @{attribute_def syntax_ast_stats} & : & \<open>attribute\<close> & default \<open>false\<close> \\
  1101   \end{tabular}
  1048   \end{tabular}
  1102   \<^medskip>
  1049   \<^medskip>
  1103 
  1050 
  1104   Unlike mixfix notation for existing formal entities
  1051   Unlike mixfix notation for existing formal entities (\secref{sec:notation}),
  1105   (\secref{sec:notation}), raw syntax declarations provide full access
  1052   raw syntax declarations provide full access to the priority grammar of the
  1106   to the priority grammar of the inner syntax, without any sanity
  1053   inner syntax, without any sanity checks. This includes additional syntactic
  1107   checks.  This includes additional syntactic categories (via
  1054   categories (via @{command nonterminal}) and free-form grammar productions
  1108   @{command nonterminal}) and free-form grammar productions (via
  1055   (via @{command syntax}). Additional syntax translations (or macros, via
  1109   @{command syntax}).  Additional syntax translations (or macros, via
  1056   @{command translations}) are required to turn resulting parse trees into
  1110   @{command translations}) are required to turn resulting parse trees
  1057   proper representations of formal entities again.
  1111   into proper representations of formal entities again.
       
  1112 
  1058 
  1113   @{rail \<open>
  1059   @{rail \<open>
  1114     @@{command nonterminal} (@{syntax name} + @'and')
  1060     @@{command nonterminal} (@{syntax name} + @'and')
  1115     ;
  1061     ;
  1116     (@@{command syntax} | @@{command no_syntax}) @{syntax mode}? (constdecl +)
  1062     (@@{command syntax} | @@{command no_syntax}) @{syntax mode}? (constdecl +)
  1124     mode: ('(' ( @{syntax name} | @'output' | @{syntax name} @'output' ) ')')
  1070     mode: ('(' ( @{syntax name} | @'output' | @{syntax name} @'output' ) ')')
  1125     ;
  1071     ;
  1126     transpat: ('(' @{syntax nameref} ')')? @{syntax string}
  1072     transpat: ('(' @{syntax nameref} ')')? @{syntax string}
  1127   \<close>}
  1073   \<close>}
  1128 
  1074 
  1129   \<^descr> @{command "nonterminal"}~\<open>c\<close> declares a type
  1075   \<^descr> @{command "nonterminal"}~\<open>c\<close> declares a type constructor \<open>c\<close> (without
  1130   constructor \<open>c\<close> (without arguments) to act as purely syntactic
  1076   arguments) to act as purely syntactic type: a nonterminal symbol of the
  1131   type: a nonterminal symbol of the inner syntax.
  1077   inner syntax.
  1132 
  1078 
  1133   \<^descr> @{command "syntax"}~\<open>(mode) c :: \<sigma> (mx)\<close> augments the
  1079   \<^descr> @{command "syntax"}~\<open>(mode) c :: \<sigma> (mx)\<close> augments the priority grammar and
  1134   priority grammar and the pretty printer table for the given print
  1080   the pretty printer table for the given print mode (default \<^verbatim>\<open>""\<close>). An
  1135   mode (default \<^verbatim>\<open>""\<close>). An optional keyword @{keyword_ref
  1081   optional keyword @{keyword_ref "output"} means that only the pretty printer
  1136   "output"} means that only the pretty printer table is affected.
  1082   table is affected.
  1137 
  1083 
  1138   Following \secref{sec:mixfix}, the mixfix annotation \<open>mx =
  1084   Following \secref{sec:mixfix}, the mixfix annotation \<open>mx = template ps q\<close>
  1139   template ps q\<close> together with type \<open>\<sigma> = \<tau>\<^sub>1 \<Rightarrow> \<dots> \<tau>\<^sub>n \<Rightarrow> \<tau>\<close> and
  1085   together with type \<open>\<sigma> = \<tau>\<^sub>1 \<Rightarrow> \<dots> \<tau>\<^sub>n \<Rightarrow> \<tau>\<close> and specify a grammar production.
  1140   specify a grammar production.  The \<open>template\<close> contains
  1086   The \<open>template\<close> contains delimiter tokens that surround \<open>n\<close> argument
  1141   delimiter tokens that surround \<open>n\<close> argument positions
  1087   positions (\<^verbatim>\<open>_\<close>). The latter correspond to nonterminal symbols \<open>A\<^sub>i\<close> derived
  1142   (\<^verbatim>\<open>_\<close>).  The latter correspond to nonterminal symbols
  1088   from the argument types \<open>\<tau>\<^sub>i\<close> as follows:
  1143   \<open>A\<^sub>i\<close> derived from the argument types \<open>\<tau>\<^sub>i\<close> as
       
  1144   follows:
       
  1145 
  1089 
  1146     \<^item> \<open>prop\<close> if \<open>\<tau>\<^sub>i = prop\<close>
  1090     \<^item> \<open>prop\<close> if \<open>\<tau>\<^sub>i = prop\<close>
  1147 
  1091 
  1148     \<^item> \<open>logic\<close> if \<open>\<tau>\<^sub>i = (\<dots>)\<kappa>\<close> for logical type
  1092     \<^item> \<open>logic\<close> if \<open>\<tau>\<^sub>i = (\<dots>)\<kappa>\<close> for logical type constructor \<open>\<kappa> \<noteq> prop\<close>
  1149     constructor \<open>\<kappa> \<noteq> prop\<close>
       
  1150 
  1093 
  1151     \<^item> \<open>any\<close> if \<open>\<tau>\<^sub>i = \<alpha>\<close> for type variables
  1094     \<^item> \<open>any\<close> if \<open>\<tau>\<^sub>i = \<alpha>\<close> for type variables
  1152 
  1095 
  1153     \<^item> \<open>\<kappa>\<close> if \<open>\<tau>\<^sub>i = \<kappa>\<close> for nonterminal \<open>\<kappa>\<close>
  1096     \<^item> \<open>\<kappa>\<close> if \<open>\<tau>\<^sub>i = \<kappa>\<close> for nonterminal \<open>\<kappa>\<close> (syntactic type constructor)
  1154     (syntactic type constructor)
  1097 
  1155 
  1098   Each \<open>A\<^sub>i\<close> is decorated by priority \<open>p\<^sub>i\<close> from the given list \<open>ps\<close>; missing
  1156   Each \<open>A\<^sub>i\<close> is decorated by priority \<open>p\<^sub>i\<close> from the
  1099   priorities default to 0.
  1157   given list \<open>ps\<close>; missing priorities default to 0.
  1100 
  1158 
  1101   The resulting nonterminal of the production is determined similarly from
  1159   The resulting nonterminal of the production is determined similarly
  1102   type \<open>\<tau>\<close>, with priority \<open>q\<close> and default 1000.
  1160   from type \<open>\<tau>\<close>, with priority \<open>q\<close> and default 1000.
  1103 
  1161 
  1104   \<^medskip>
  1162   \<^medskip>
  1105   Parsing via this production produces parse trees \<open>t\<^sub>1, \<dots>, t\<^sub>n\<close> for the
  1163   Parsing via this production produces parse trees \<open>t\<^sub>1, \<dots>, t\<^sub>n\<close> for the argument slots.  The resulting parse tree is
  1106   argument slots. The resulting parse tree is composed as \<open>c t\<^sub>1 \<dots> t\<^sub>n\<close>, by
  1164   composed as \<open>c t\<^sub>1 \<dots> t\<^sub>n\<close>, by using the syntax constant \<open>c\<close> of the syntax declaration.
  1107   using the syntax constant \<open>c\<close> of the syntax declaration.
  1165 
  1108 
  1166   Such syntactic constants are invented on the spot, without formal
  1109   Such syntactic constants are invented on the spot, without formal check
  1167   check wrt.\ existing declarations.  It is conventional to use plain
  1110   wrt.\ existing declarations. It is conventional to use plain identifiers
  1168   identifiers prefixed by a single underscore (e.g.\ \<open>_foobar\<close>).  Names should be chosen with care, to avoid clashes
  1111   prefixed by a single underscore (e.g.\ \<open>_foobar\<close>). Names should be chosen
  1169   with other syntax declarations.
  1112   with care, to avoid clashes with other syntax declarations.
  1170 
  1113 
  1171   \<^medskip>
  1114   \<^medskip>
  1172   The special case of copy production is specified by \<open>c =\<close>~\<^verbatim>\<open>""\<close> (empty string).
  1115   The special case of copy production is specified by \<open>c =\<close>~\<^verbatim>\<open>""\<close> (empty
  1173   It means that the resulting parse tree \<open>t\<close> is copied directly, without any
  1116   string). It means that the resulting parse tree \<open>t\<close> is copied directly,
  1174   further decoration.
  1117   without any further decoration.
  1175 
  1118 
  1176   \<^descr> @{command "no_syntax"}~\<open>(mode) decls\<close> removes grammar
  1119   \<^descr> @{command "no_syntax"}~\<open>(mode) decls\<close> removes grammar declarations (and
  1177   declarations (and translations) resulting from \<open>decls\<close>, which
  1120   translations) resulting from \<open>decls\<close>, which are interpreted in the same
  1178   are interpreted in the same manner as for @{command "syntax"} above.
  1121   manner as for @{command "syntax"} above.
  1179 
  1122 
  1180   \<^descr> @{command "translations"}~\<open>rules\<close> specifies syntactic
  1123   \<^descr> @{command "translations"}~\<open>rules\<close> specifies syntactic translation rules
  1181   translation rules (i.e.\ macros) as first-order rewrite rules on
  1124   (i.e.\ macros) as first-order rewrite rules on ASTs (\secref{sec:ast}). The
  1182   ASTs (\secref{sec:ast}).  The theory context maintains two
  1125   theory context maintains two independent lists translation rules: parse
  1183   independent lists translation rules: parse rules (\<^verbatim>\<open>=>\<close>
  1126   rules (\<^verbatim>\<open>=>\<close> or \<open>\<rightharpoonup>\<close>) and print rules (\<^verbatim>\<open><=\<close> or \<open>\<leftharpoondown>\<close>). For convenience, both
  1184   or \<open>\<rightharpoonup>\<close>) and print rules (\<^verbatim>\<open><=\<close> or \<open>\<leftharpoondown>\<close>).
  1127   can be specified simultaneously as parse~/ print rules (\<^verbatim>\<open>==\<close> or \<open>\<rightleftharpoons>\<close>).
  1185   For convenience, both can be specified simultaneously as parse~/
  1128 
  1186   print rules (\<^verbatim>\<open>==\<close> or \<open>\<rightleftharpoons>\<close>).
  1129   Translation patterns may be prefixed by the syntactic category to be used
  1187 
  1130   for parsing; the default is \<open>logic\<close> which means that regular term syntax is
  1188   Translation patterns may be prefixed by the syntactic category to be
  1131   used. Both sides of the syntax translation rule undergo parsing and parse
  1189   used for parsing; the default is \<open>logic\<close> which means that
  1132   AST translations \secref{sec:tr-funs}, in order to perform some fundamental
  1190   regular term syntax is used.  Both sides of the syntax translation
  1133   normalization like \<open>\<lambda>x y. b \<leadsto> \<lambda>x. \<lambda>y. b\<close>, but other AST translation rules
  1191   rule undergo parsing and parse AST translations
  1134   are \<^emph>\<open>not\<close> applied recursively here.
  1192   \secref{sec:tr-funs}, in order to perform some fundamental
  1135 
  1193   normalization like \<open>\<lambda>x y. b \<leadsto> \<lambda>x. \<lambda>y. b\<close>, but other AST
  1136   When processing AST patterns, the inner syntax lexer runs in a different
  1194   translation rules are \<^emph>\<open>not\<close> applied recursively here.
  1137   mode that allows identifiers to start with underscore. This accommodates the
  1195 
  1138   usual naming convention for auxiliary syntax constants --- those that do not
  1196   When processing AST patterns, the inner syntax lexer runs in a
  1139   have a logical counter part --- by allowing to specify arbitrary AST
  1197   different mode that allows identifiers to start with underscore.
  1140   applications within the term syntax, independently of the corresponding
  1198   This accommodates the usual naming convention for auxiliary syntax
  1141   concrete syntax.
  1199   constants --- those that do not have a logical counter part --- by
       
  1200   allowing to specify arbitrary AST applications within the term
       
  1201   syntax, independently of the corresponding concrete syntax.
       
  1202 
  1142 
  1203   Atomic ASTs are distinguished as @{ML Ast.Constant} versus @{ML
  1143   Atomic ASTs are distinguished as @{ML Ast.Constant} versus @{ML
  1204   Ast.Variable} as follows: a qualified name or syntax constant
  1144   Ast.Variable} as follows: a qualified name or syntax constant declared via
  1205   declared via @{command syntax}, or parse tree head of concrete
  1145   @{command syntax}, or parse tree head of concrete notation becomes @{ML
  1206   notation becomes @{ML Ast.Constant}, anything else @{ML
  1146   Ast.Constant}, anything else @{ML Ast.Variable}. Note that \<open>CONST\<close> and
  1207   Ast.Variable}.  Note that \<open>CONST\<close> and \<open>XCONST\<close> within
  1147   \<open>XCONST\<close> within the term language (\secref{sec:pure-grammar}) allow to
  1208   the term language (\secref{sec:pure-grammar}) allow to enforce
  1148   enforce treatment as constants.
  1209   treatment as constants.
  1149 
  1210 
  1150   AST rewrite rules \<open>(lhs, rhs)\<close> need to obey the following side-conditions:
  1211   AST rewrite rules \<open>(lhs, rhs)\<close> need to obey the following
  1151 
  1212   side-conditions:
  1152     \<^item> Rules must be left linear: \<open>lhs\<close> must not contain repeated
  1213 
  1153     variables.\<^footnote>\<open>The deeper reason for this is that AST equality is not
  1214     \<^item> Rules must be left linear: \<open>lhs\<close> must not contain
  1154     well-defined: different occurrences of the ``same'' AST could be decorated
  1215     repeated variables.\<^footnote>\<open>The deeper reason for this is that AST
  1155     differently by accidental type-constraints or source position information,
  1216     equality is not well-defined: different occurrences of the ``same''
  1156     for example.\<close>
  1217     AST could be decorated differently by accidental type-constraints or
       
  1218     source position information, for example.\<close>
       
  1219 
  1157 
  1220     \<^item> Every variable in \<open>rhs\<close> must also occur in \<open>lhs\<close>.
  1158     \<^item> Every variable in \<open>rhs\<close> must also occur in \<open>lhs\<close>.
  1221 
  1159 
  1222   \<^descr> @{command "no_translations"}~\<open>rules\<close> removes syntactic
  1160   \<^descr> @{command "no_translations"}~\<open>rules\<close> removes syntactic translation rules,
  1223   translation rules, which are interpreted in the same manner as for
  1161   which are interpreted in the same manner as for @{command "translations"}
  1224   @{command "translations"} above.
  1162   above.
  1225 
  1163 
  1226   \<^descr> @{attribute syntax_ast_trace} and @{attribute
  1164   \<^descr> @{attribute syntax_ast_trace} and @{attribute syntax_ast_stats} control
  1227   syntax_ast_stats} control diagnostic output in the AST normalization
  1165   diagnostic output in the AST normalization process, when translation rules
  1228   process, when translation rules are applied to concrete input or
  1166   are applied to concrete input or output.
  1229   output.
  1167 
  1230 
  1168 
  1231 
  1169   Raw syntax and translations provides a slightly more low-level access to the
  1232   Raw syntax and translations provides a slightly more low-level
  1170   grammar and the form of resulting parse trees. It is often possible to avoid
  1233   access to the grammar and the form of resulting parse trees.  It is
  1171   this untyped macro mechanism, and use type-safe @{command abbreviation} or
  1234   often possible to avoid this untyped macro mechanism, and use
  1172   @{command notation} instead. Some important situations where @{command
  1235   type-safe @{command abbreviation} or @{command notation} instead.
  1173   syntax} and @{command translations} are really need are as follows:
  1236   Some important situations where @{command syntax} and @{command
  1174 
  1237   translations} are really need are as follows:
  1175   \<^item> Iterated replacement via recursive @{command translations}. For example,
  1238 
  1176   consider list enumeration @{term "[a, b, c, d]"} as defined in theory
  1239   \<^item> Iterated replacement via recursive @{command translations}.
  1177   @{theory List} in Isabelle/HOL.
  1240   For example, consider list enumeration @{term "[a, b, c, d]"} as
  1178 
       
  1179   \<^item> Change of binding status of variables: anything beyond the built-in
       
  1180   @{keyword "binder"} mixfix annotation requires explicit syntax translations.
       
  1181   For example, consider list filter comprehension @{term "[x \<leftarrow> xs . P]"} as
  1241   defined in theory @{theory List} in Isabelle/HOL.
  1182   defined in theory @{theory List} in Isabelle/HOL.
  1242 
       
  1243   \<^item> Change of binding status of variables: anything beyond the
       
  1244   built-in @{keyword "binder"} mixfix annotation requires explicit
       
  1245   syntax translations.  For example, consider list filter
       
  1246   comprehension @{term "[x \<leftarrow> xs . P]"} as defined in theory @{theory
       
  1247   List} in Isabelle/HOL.
       
  1248 \<close>
  1183 \<close>
  1249 
  1184 
  1250 
  1185 
  1251 subsubsection \<open>Applying translation rules\<close>
  1186 subsubsection \<open>Applying translation rules\<close>
  1252 
  1187 
  1253 text \<open>As a term is being parsed or printed, an AST is generated as
  1188 text \<open>
  1254   an intermediate form according to \figref{fig:parse-print}.  The AST
  1189   As a term is being parsed or printed, an AST is generated as an intermediate
  1255   is normalized by applying translation rules in the manner of a
  1190   form according to \figref{fig:parse-print}. The AST is normalized by
  1256   first-order term rewriting system.  We first examine how a single
  1191   applying translation rules in the manner of a first-order term rewriting
  1257   rule is applied.
  1192   system. We first examine how a single rule is applied.
  1258 
  1193 
  1259   Let \<open>t\<close> be the abstract syntax tree to be normalized and
  1194   Let \<open>t\<close> be the abstract syntax tree to be normalized and \<open>(lhs, rhs)\<close> some
  1260   \<open>(lhs, rhs)\<close> some translation rule.  A subtree \<open>u\<close>
  1195   translation rule. A subtree \<open>u\<close> of \<open>t\<close> is called \<^emph>\<open>redex\<close> if it is an
  1261   of \<open>t\<close> is called \<^emph>\<open>redex\<close> if it is an instance of \<open>lhs\<close>; in this case the pattern \<open>lhs\<close> is said to match the
  1196   instance of \<open>lhs\<close>; in this case the pattern \<open>lhs\<close> is said to match the
  1262   object \<open>u\<close>.  A redex matched by \<open>lhs\<close> may be
  1197   object \<open>u\<close>. A redex matched by \<open>lhs\<close> may be replaced by the corresponding
  1263   replaced by the corresponding instance of \<open>rhs\<close>, thus
  1198   instance of \<open>rhs\<close>, thus \<^emph>\<open>rewriting\<close> the AST \<open>t\<close>. Matching requires some
  1264   \<^emph>\<open>rewriting\<close> the AST \<open>t\<close>.  Matching requires some notion
  1199   notion of \<^emph>\<open>place-holders\<close> in rule patterns: @{ML Ast.Variable} serves this
  1265   of \<^emph>\<open>place-holders\<close> in rule patterns: @{ML Ast.Variable} serves
  1200   purpose.
  1266   this purpose.
  1201 
  1267 
  1202   More precisely, the matching of the object \<open>u\<close> against the pattern \<open>lhs\<close> is
  1268   More precisely, the matching of the object \<open>u\<close> against the
  1203   performed as follows:
  1269   pattern \<open>lhs\<close> is performed as follows:
  1204 
  1270 
  1205     \<^item> Objects of the form @{ML Ast.Variable}~\<open>x\<close> or @{ML Ast.Constant}~\<open>x\<close> are
  1271   \<^item> Objects of the form @{ML Ast.Variable}~\<open>x\<close> or @{ML
  1206     matched by pattern @{ML Ast.Constant}~\<open>x\<close>. Thus all atomic ASTs in the
  1272   Ast.Constant}~\<open>x\<close> are matched by pattern @{ML
  1207     object are treated as (potential) constants, and a successful match makes
  1273   Ast.Constant}~\<open>x\<close>.  Thus all atomic ASTs in the object are
  1208     them actual constants even before name space resolution (see also
  1274   treated as (potential) constants, and a successful match makes them
  1209     \secref{sec:ast}).
  1275   actual constants even before name space resolution (see also
  1210 
  1276   \secref{sec:ast}).
  1211     \<^item> Object \<open>u\<close> is matched by pattern @{ML Ast.Variable}~\<open>x\<close>, binding \<open>x\<close> to
  1277 
  1212     \<open>u\<close>.
  1278   \<^item> Object \<open>u\<close> is matched by pattern @{ML
  1213 
  1279   Ast.Variable}~\<open>x\<close>, binding \<open>x\<close> to \<open>u\<close>.
  1214     \<^item> Object @{ML Ast.Appl}~\<open>us\<close> is matched by @{ML Ast.Appl}~\<open>ts\<close> if \<open>us\<close> and
  1280 
  1215     \<open>ts\<close> have the same length and each corresponding subtree matches.
  1281   \<^item> Object @{ML Ast.Appl}~\<open>us\<close> is matched by @{ML
  1216 
  1282   Ast.Appl}~\<open>ts\<close> if \<open>us\<close> and \<open>ts\<close> have the
  1217     \<^item> In every other case, matching fails.
  1283   same length and each corresponding subtree matches.
  1218 
  1284 
  1219   A successful match yields a substitution that is applied to \<open>rhs\<close>,
  1285   \<^item> In every other case, matching fails.
  1220   generating the instance that replaces \<open>u\<close>.
  1286 
  1221 
  1287 
  1222   Normalizing an AST involves repeatedly applying translation rules until none
  1288   A successful match yields a substitution that is applied to \<open>rhs\<close>, generating the instance that replaces \<open>u\<close>.
  1223   are applicable. This works yoyo-like: top-down, bottom-up, top-down, etc. At
  1289 
  1224   each subtree position, rules are chosen in order of appearance in the theory
  1290   Normalizing an AST involves repeatedly applying translation rules
  1225   definitions.
  1291   until none are applicable.  This works yoyo-like: top-down,
  1226 
  1292   bottom-up, top-down, etc.  At each subtree position, rules are
  1227   The configuration options @{attribute syntax_ast_trace} and @{attribute
  1293   chosen in order of appearance in the theory definitions.
  1228   syntax_ast_stats} might help to understand this process and diagnose
  1294 
  1229   problems.
  1295   The configuration options @{attribute syntax_ast_trace} and
       
  1296   @{attribute syntax_ast_stats} might help to understand this process
       
  1297   and diagnose problems.
       
  1298 
  1230 
  1299   \begin{warn}
  1231   \begin{warn}
  1300   If syntax translation rules work incorrectly, the output of
  1232   If syntax translation rules work incorrectly, the output of @{command_ref
  1301   @{command_ref print_syntax} with its \<^emph>\<open>rules\<close> sections reveals the
  1233   print_syntax} with its \<^emph>\<open>rules\<close> sections reveals the actual internal forms
  1302   actual internal forms of AST pattern, without potentially confusing
  1234   of AST pattern, without potentially confusing concrete syntax. Recall that
  1303   concrete syntax.  Recall that AST constants appear as quoted strings
  1235   AST constants appear as quoted strings and variables without quotes.
  1304   and variables without quotes.
       
  1305   \end{warn}
  1236   \end{warn}
  1306 
  1237 
  1307   \begin{warn}
  1238   \begin{warn}
  1308   If @{attribute_ref eta_contract} is set to \<open>true\<close>, terms
  1239   If @{attribute_ref eta_contract} is set to \<open>true\<close>, terms will be
  1309   will be \<open>\<eta>\<close>-contracted \<^emph>\<open>before\<close> the AST rewriter sees
  1240   \<open>\<eta>\<close>-contracted \<^emph>\<open>before\<close> the AST rewriter sees them. Thus some abstraction
  1310   them.  Thus some abstraction nodes needed for print rules to match
  1241   nodes needed for print rules to match may vanish. For example, \<open>Ball A (\<lambda>x.
  1311   may vanish.  For example, \<open>Ball A (\<lambda>x. P x)\<close> would contract
  1242   P x)\<close> would contract to \<open>Ball A P\<close> and the standard print rule would fail to
  1312   to \<open>Ball A P\<close> and the standard print rule would fail to
  1243   apply. This problem can be avoided by hand-written ML translation functions
  1313   apply.  This problem can be avoided by hand-written ML translation
  1244   (see also \secref{sec:tr-funs}), which is in fact the same mechanism used in
  1314   functions (see also \secref{sec:tr-funs}), which is in fact the same
  1245   built-in @{keyword "binder"} declarations.
  1315   mechanism used in built-in @{keyword "binder"} declarations.
       
  1316   \end{warn}
  1246   \end{warn}
  1317 \<close>
  1247 \<close>
  1318 
  1248 
  1319 
  1249 
  1320 subsection \<open>Syntax translation functions \label{sec:tr-funs}\<close>
  1250 subsection \<open>Syntax translation functions \label{sec:tr-funs}\<close>
  1345    @@{ML_antiquotation type_syntax} |
  1275    @@{ML_antiquotation type_syntax} |
  1346    @@{ML_antiquotation const_syntax} |
  1276    @@{ML_antiquotation const_syntax} |
  1347    @@{ML_antiquotation syntax_const}) name
  1277    @@{ML_antiquotation syntax_const}) name
  1348   \<close>}
  1278   \<close>}
  1349 
  1279 
  1350   \<^descr> @{command parse_translation} etc. declare syntax translation
  1280   \<^descr> @{command parse_translation} etc. declare syntax translation functions to
  1351   functions to the theory.  Any of these commands have a single
  1281   the theory. Any of these commands have a single @{syntax text} argument that
  1352   @{syntax text} argument that refers to an ML expression of
  1282   refers to an ML expression of appropriate type as follows:
  1353   appropriate type as follows:
       
  1354 
  1283 
  1355   \<^medskip>
  1284   \<^medskip>
  1356   {\footnotesize
  1285   {\footnotesize
  1357   \begin{tabular}{l}
  1286   \begin{tabular}{l}
  1358   @{command parse_ast_translation} : \\
  1287   @{command parse_ast_translation} : \\
  1366   @{command print_ast_translation} : \\
  1295   @{command print_ast_translation} : \\
  1367   \quad @{ML_type "(string * (Proof.context -> Ast.ast list -> Ast.ast)) list"} \\
  1296   \quad @{ML_type "(string * (Proof.context -> Ast.ast list -> Ast.ast)) list"} \\
  1368   \end{tabular}}
  1297   \end{tabular}}
  1369   \<^medskip>
  1298   \<^medskip>
  1370 
  1299 
  1371   The argument list consists of \<open>(c, tr)\<close> pairs, where \<open>c\<close> is the syntax name of the formal entity involved, and \<open>tr\<close> a function that translates a syntax form \<open>c args\<close> into
  1300   The argument list consists of \<open>(c, tr)\<close> pairs, where \<open>c\<close> is the syntax name
  1372   \<open>tr ctxt args\<close> (depending on the context).  The Isabelle/ML
  1301   of the formal entity involved, and \<open>tr\<close> a function that translates a syntax
  1373   naming convention for parse translations is \<open>c_tr\<close> and for
  1302   form \<open>c args\<close> into \<open>tr ctxt args\<close> (depending on the context). The
  1374   print translations \<open>c_tr'\<close>.
  1303   Isabelle/ML naming convention for parse translations is \<open>c_tr\<close> and for print
       
  1304   translations \<open>c_tr'\<close>.
  1375 
  1305 
  1376   The @{command_ref print_syntax} command displays the sets of names
  1306   The @{command_ref print_syntax} command displays the sets of names
  1377   associated with the translation functions of a theory under \<open>parse_ast_translation\<close> etc.
  1307   associated with the translation functions of a theory under
  1378 
  1308   \<open>parse_ast_translation\<close> etc.
  1379   \<^descr> \<open>@{class_syntax c}\<close>, \<open>@{type_syntax c}\<close>,
  1309 
  1380   \<open>@{const_syntax c}\<close> inline the authentic syntax name of the
  1310   \<^descr> \<open>@{class_syntax c}\<close>, \<open>@{type_syntax c}\<close>, \<open>@{const_syntax c}\<close> inline the
  1381   given formal entities into the ML source.  This is the
  1311   authentic syntax name of the given formal entities into the ML source. This
  1382   fully-qualified logical name prefixed by a special marker to
  1312   is the fully-qualified logical name prefixed by a special marker to indicate
  1383   indicate its kind: thus different logical name spaces are properly
  1313   its kind: thus different logical name spaces are properly distinguished
  1384   distinguished within parse trees.
  1314   within parse trees.
  1385 
  1315 
  1386   \<^descr> \<open>@{const_syntax c}\<close> inlines the name \<open>c\<close> of
  1316   \<^descr> \<open>@{const_syntax c}\<close> inlines the name \<open>c\<close> of the given syntax constant,
  1387   the given syntax constant, having checked that it has been declared
  1317   having checked that it has been declared via some @{command syntax} commands
  1388   via some @{command syntax} commands within the theory context.  Note
  1318   within the theory context. Note that the usual naming convention makes
  1389   that the usual naming convention makes syntax constants start with
  1319   syntax constants start with underscore, to reduce the chance of accidental
  1390   underscore, to reduce the chance of accidental clashes with other
  1320   clashes with other names occurring in parse trees (unqualified constants
  1391   names occurring in parse trees (unqualified constants etc.).
  1321   etc.).
  1392 \<close>
  1322 \<close>
  1393 
  1323 
  1394 
  1324 
  1395 subsubsection \<open>The translation strategy\<close>
  1325 subsubsection \<open>The translation strategy\<close>
  1396 
  1326 
  1397 text \<open>The different kinds of translation functions are invoked during
  1327 text \<open>
  1398   the transformations between parse trees, ASTs and syntactic terms
  1328   The different kinds of translation functions are invoked during the
  1399   (cf.\ \figref{fig:parse-print}).  Whenever a combination of the form
  1329   transformations between parse trees, ASTs and syntactic terms (cf.\
  1400   \<open>c x\<^sub>1 \<dots> x\<^sub>n\<close> is encountered, and a translation function
  1330   \figref{fig:parse-print}). Whenever a combination of the form \<open>c x\<^sub>1 \<dots> x\<^sub>n\<close>
  1401   \<open>f\<close> of appropriate kind is declared for \<open>c\<close>, the
  1331   is encountered, and a translation function \<open>f\<close> of appropriate kind is
  1402   result is produced by evaluation of \<open>f [x\<^sub>1, \<dots>, x\<^sub>n]\<close> in ML.
  1332   declared for \<open>c\<close>, the result is produced by evaluation of \<open>f [x\<^sub>1, \<dots>, x\<^sub>n]\<close>
  1403 
  1333   in ML.
  1404   For AST translations, the arguments \<open>x\<^sub>1, \<dots>, x\<^sub>n\<close> are ASTs.  A
  1334 
  1405   combination has the form @{ML "Ast.Constant"}~\<open>c\<close> or @{ML
  1335   For AST translations, the arguments \<open>x\<^sub>1, \<dots>, x\<^sub>n\<close> are ASTs. A combination
  1406   "Ast.Appl"}~\<open>[\<close>@{ML Ast.Constant}~\<open>c, x\<^sub>1, \<dots>, x\<^sub>n]\<close>.
  1336   has the form @{ML "Ast.Constant"}~\<open>c\<close> or @{ML "Ast.Appl"}~\<open>[\<close>@{ML
  1407   For term translations, the arguments are terms and a combination has
  1337   Ast.Constant}~\<open>c, x\<^sub>1, \<dots>, x\<^sub>n]\<close>. For term translations, the arguments are
  1408   the form @{ML Const}~\<open>(c, \<tau>)\<close> or @{ML Const}~\<open>(c, \<tau>)
  1338   terms and a combination has the form @{ML Const}~\<open>(c, \<tau>)\<close> or @{ML
  1409   $ x\<^sub>1 $ \<dots> $ x\<^sub>n\<close>.  Terms allow more sophisticated transformations
  1339   Const}~\<open>(c, \<tau>) $ x\<^sub>1 $ \<dots> $ x\<^sub>n\<close>. Terms allow more sophisticated
  1410   than ASTs do, typically involving abstractions and bound
  1340   transformations than ASTs do, typically involving abstractions and bound
  1411   variables. \<^emph>\<open>Typed\<close> print translations may even peek at the type
  1341   variables. \<^emph>\<open>Typed\<close> print translations may even peek at the type \<open>\<tau>\<close> of the
  1412   \<open>\<tau>\<close> of the constant they are invoked on, although some
  1342   constant they are invoked on, although some information might have been
  1413   information might have been suppressed for term output already.
  1343   suppressed for term output already.
  1414 
  1344 
  1415   Regardless of whether they act on ASTs or terms, translation
  1345   Regardless of whether they act on ASTs or terms, translation functions
  1416   functions called during the parsing process differ from those for
  1346   called during the parsing process differ from those for printing in their
  1417   printing in their overall behaviour:
  1347   overall behaviour:
  1418 
  1348 
  1419   \<^descr>[Parse translations] are applied bottom-up.  The arguments are
  1349     \<^descr>[Parse translations] are applied bottom-up. The arguments are already in
  1420   already in translated form.  The translations must not fail;
  1350     translated form. The translations must not fail; exceptions trigger an
  1421   exceptions trigger an error message.  There may be at most one
  1351     error message. There may be at most one function associated with any
  1422   function associated with any syntactic name.
  1352     syntactic name.
  1423 
  1353 
  1424   \<^descr>[Print translations] are applied top-down.  They are supplied
  1354     \<^descr>[Print translations] are applied top-down. They are supplied with
  1425   with arguments that are partly still in internal form.  The result
  1355     arguments that are partly still in internal form. The result again
  1426   again undergoes translation; therefore a print translation should
  1356     undergoes translation; therefore a print translation should not introduce
  1427   not introduce as head the very constant that invoked it.  The
  1357     as head the very constant that invoked it. The function may raise
  1428   function may raise exception @{ML Match} to indicate failure; in
  1358     exception @{ML Match} to indicate failure; in this event it has no effect.
  1429   this event it has no effect.  Multiple functions associated with
  1359     Multiple functions associated with some syntactic name are tried in the
  1430   some syntactic name are tried in the order of declaration in the
  1360     order of declaration in the theory.
  1431   theory.
  1361 
  1432 
  1362   Only constant atoms --- constructor @{ML Ast.Constant} for ASTs and @{ML
  1433 
  1363   Const} for terms --- can invoke translation functions. This means that parse
  1434   Only constant atoms --- constructor @{ML Ast.Constant} for ASTs and
  1364   translations can only be associated with parse tree heads of concrete
  1435   @{ML Const} for terms --- can invoke translation functions.  This
  1365   syntax, or syntactic constants introduced via other translations. For plain
  1436   means that parse translations can only be associated with parse tree
  1366   identifiers within the term language, the status of constant versus variable
  1437   heads of concrete syntax, or syntactic constants introduced via
  1367   is not yet know during parsing. This is in contrast to print translations,
  1438   other translations.  For plain identifiers within the term language,
  1368   where constants are explicitly known from the given term in its fully
  1439   the status of constant versus variable is not yet know during
  1369   internal form.
  1440   parsing.  This is in contrast to print translations, where constants
       
  1441   are explicitly known from the given term in its fully internal form.
       
  1442 \<close>
  1370 \<close>
  1443 
  1371 
  1444 
  1372 
  1445 subsection \<open>Built-in syntax transformations\<close>
  1373 subsection \<open>Built-in syntax transformations\<close>
  1446 
  1374 
  1447 text \<open>
  1375 text \<open>
  1448   Here are some further details of the main syntax transformation
  1376   Here are some further details of the main syntax transformation phases of
  1449   phases of \figref{fig:parse-print}.
  1377   \figref{fig:parse-print}.
  1450 \<close>
  1378 \<close>
  1451 
  1379 
  1452 
  1380 
  1453 subsubsection \<open>Transforming parse trees to ASTs\<close>
  1381 subsubsection \<open>Transforming parse trees to ASTs\<close>
  1454 
  1382 
  1455 text \<open>The parse tree is the raw output of the parser.  It is
  1383 text \<open>
  1456   transformed into an AST according to some basic scheme that may be
  1384   The parse tree is the raw output of the parser. It is transformed into an
  1457   augmented by AST translation functions as explained in
  1385   AST according to some basic scheme that may be augmented by AST translation
  1458   \secref{sec:tr-funs}.
  1386   functions as explained in \secref{sec:tr-funs}.
  1459 
  1387 
  1460   The parse tree is constructed by nesting the right-hand sides of the
  1388   The parse tree is constructed by nesting the right-hand sides of the
  1461   productions used to recognize the input.  Such parse trees are
  1389   productions used to recognize the input. Such parse trees are simply lists
  1462   simply lists of tokens and constituent parse trees, the latter
  1390   of tokens and constituent parse trees, the latter representing the
  1463   representing the nonterminals of the productions.  Ignoring AST
  1391   nonterminals of the productions. Ignoring AST translation functions, parse
  1464   translation functions, parse trees are transformed to ASTs by
  1392   trees are transformed to ASTs by stripping out delimiters and copy
  1465   stripping out delimiters and copy productions, while retaining some
  1393   productions, while retaining some source position information from input
  1466   source position information from input tokens.
  1394   tokens.
  1467 
  1395 
  1468   The Pure syntax provides predefined AST translations to make the
  1396   The Pure syntax provides predefined AST translations to make the basic
  1469   basic \<open>\<lambda>\<close>-term structure more apparent within the
  1397   \<open>\<lambda>\<close>-term structure more apparent within the (first-order) AST
  1470   (first-order) AST representation, and thus facilitate the use of
  1398   representation, and thus facilitate the use of @{command translations} (see
  1471   @{command translations} (see also \secref{sec:syn-trans}).  This
  1399   also \secref{sec:syn-trans}). This covers ordinary term application, type
  1472   covers ordinary term application, type application, nested
  1400   application, nested abstraction, iterated meta implications and function
  1473   abstraction, iterated meta implications and function types.  The
  1401   types. The effect is illustrated on some representative input strings is as
  1474   effect is illustrated on some representative input strings is as
       
  1475   follows:
  1402   follows:
  1476 
  1403 
  1477   \begin{center}
  1404   \begin{center}
  1478   \begin{tabular}{ll}
  1405   \begin{tabular}{ll}
  1479   input source & AST \\
  1406   input source & AST \\
  1487    \<open>['a, 'b, 'c] \<Rightarrow> 'd\<close> & \<^verbatim>\<open>("fun" 'a ("fun" 'b ("fun" 'c 'd)))\<close> \\
  1414    \<open>['a, 'b, 'c] \<Rightarrow> 'd\<close> & \<^verbatim>\<open>("fun" 'a ("fun" 'b ("fun" 'c 'd)))\<close> \\
  1488   \end{tabular}
  1415   \end{tabular}
  1489   \end{center}
  1416   \end{center}
  1490 
  1417 
  1491   Note that type and sort constraints may occur in further places ---
  1418   Note that type and sort constraints may occur in further places ---
  1492   translations need to be ready to cope with them.  The built-in
  1419   translations need to be ready to cope with them. The built-in syntax
  1493   syntax transformation from parse trees to ASTs insert additional
  1420   transformation from parse trees to ASTs insert additional constraints that
  1494   constraints that represent source positions.
  1421   represent source positions.
  1495 \<close>
  1422 \<close>
  1496 
  1423 
  1497 
  1424 
  1498 subsubsection \<open>Transforming ASTs to terms\<close>
  1425 subsubsection \<open>Transforming ASTs to terms\<close>
  1499 
  1426 
  1500 text \<open>After application of macros (\secref{sec:syn-trans}), the AST
  1427 text \<open>
  1501   is transformed into a term.  This term still lacks proper type
  1428   After application of macros (\secref{sec:syn-trans}), the AST is transformed
  1502   information, but it might contain some constraints consisting of
  1429   into a term. This term still lacks proper type information, but it might
  1503   applications with head \<^verbatim>\<open>_constrain\<close>, where the second
  1430   contain some constraints consisting of applications with head \<^verbatim>\<open>_constrain\<close>,
  1504   argument is a type encoded as a pre-term within the syntax.  Type
  1431   where the second argument is a type encoded as a pre-term within the syntax.
  1505   inference later introduces correct types, or indicates type errors
  1432   Type inference later introduces correct types, or indicates type errors in
  1506   in the input.
  1433   the input.
  1507 
  1434 
  1508   Ignoring parse translations, ASTs are transformed to terms by
  1435   Ignoring parse translations, ASTs are transformed to terms by mapping AST
  1509   mapping AST constants to term constants, AST variables to term
  1436   constants to term constants, AST variables to term variables or constants
  1510   variables or constants (according to the name space), and AST
  1437   (according to the name space), and AST applications to iterated term
  1511   applications to iterated term applications.
  1438   applications.
  1512 
  1439 
  1513   The outcome is still a first-order term.  Proper abstractions and
  1440   The outcome is still a first-order term. Proper abstractions and bound
  1514   bound variables are introduced by parse translations associated with
  1441   variables are introduced by parse translations associated with certain
  1515   certain syntax constants.  Thus \<^verbatim>\<open>("_abs" x x)\<close> eventually
  1442   syntax constants. Thus \<^verbatim>\<open>("_abs" x x)\<close> eventually becomes a de-Bruijn term
  1516   becomes a de-Bruijn term \<^verbatim>\<open>Abs ("x", _, Bound 0)\<close>.
  1443   \<^verbatim>\<open>Abs ("x", _, Bound 0)\<close>.
  1517 \<close>
  1444 \<close>
  1518 
  1445 
  1519 
  1446 
  1520 subsubsection \<open>Printing of terms\<close>
  1447 subsubsection \<open>Printing of terms\<close>
  1521 
  1448 
  1522 text \<open>The output phase is essentially the inverse of the input
  1449 text \<open>
  1523   phase.  Terms are translated via abstract syntax trees into
  1450   The output phase is essentially the inverse of the input phase. Terms are
  1524   pretty-printed text.
  1451   translated via abstract syntax trees into pretty-printed text.
  1525 
  1452 
  1526   Ignoring print translations, the transformation maps term constants,
  1453   Ignoring print translations, the transformation maps term constants,
  1527   variables and applications to the corresponding constructs on ASTs.
  1454   variables and applications to the corresponding constructs on ASTs.
  1528   Abstractions are mapped to applications of the special constant
  1455   Abstractions are mapped to applications of the special constant \<^verbatim>\<open>_abs\<close> as
  1529   \<^verbatim>\<open>_abs\<close> as seen before.  Type constraints are represented
  1456   seen before. Type constraints are represented via special \<^verbatim>\<open>_constrain\<close>
  1530   via special \<^verbatim>\<open>_constrain\<close> forms, according to various
  1457   forms, according to various policies of type annotation determined
  1531   policies of type annotation determined elsewhere.  Sort constraints
  1458   elsewhere. Sort constraints of type variables are handled in a similar
  1532   of type variables are handled in a similar fashion.
  1459   fashion.
  1533 
  1460 
  1534   After application of macros (\secref{sec:syn-trans}), the AST is
  1461   After application of macros (\secref{sec:syn-trans}), the AST is finally
  1535   finally pretty-printed.  The built-in print AST translations reverse
  1462   pretty-printed. The built-in print AST translations reverse the
  1536   the corresponding parse AST translations.
  1463   corresponding parse AST translations.
  1537 
  1464 
  1538   \<^medskip>
  1465   \<^medskip>
  1539   For the actual printing process, the priority grammar
  1466   For the actual printing process, the priority grammar
  1540   (\secref{sec:priority-grammar}) plays a vital role: productions are
  1467   (\secref{sec:priority-grammar}) plays a vital role: productions are used as
  1541   used as templates for pretty printing, with argument slots stemming
  1468   templates for pretty printing, with argument slots stemming from
  1542   from nonterminals, and syntactic sugar stemming from literal tokens.
  1469   nonterminals, and syntactic sugar stemming from literal tokens.
  1543 
  1470 
  1544   Each AST application with constant head \<open>c\<close> and arguments
  1471   Each AST application with constant head \<open>c\<close> and arguments \<open>t\<^sub>1\<close>, \dots,
  1545   \<open>t\<^sub>1\<close>, \dots, \<open>t\<^sub>n\<close> (for \<open>n = 0\<close> the AST is
  1472   \<open>t\<^sub>n\<close> (for \<open>n = 0\<close> the AST is just the constant \<open>c\<close> itself) is printed
  1546   just the constant \<open>c\<close> itself) is printed according to the
  1473   according to the first grammar production of result name \<open>c\<close>. The required
  1547   first grammar production of result name \<open>c\<close>.  The required
  1474   syntax priority of the argument slot is given by its nonterminal \<open>A\<^sup>(\<^sup>p\<^sup>)\<close>.
  1548   syntax priority of the argument slot is given by its nonterminal
  1475   The argument \<open>t\<^sub>i\<close> that corresponds to the position of \<open>A\<^sup>(\<^sup>p\<^sup>)\<close> is printed
  1549   \<open>A\<^sup>(\<^sup>p\<^sup>)\<close>.  The argument \<open>t\<^sub>i\<close> that corresponds to the
  1476   recursively, and then put in parentheses \<^emph>\<open>if\<close> its priority \<open>p\<close> requires
  1550   position of \<open>A\<^sup>(\<^sup>p\<^sup>)\<close> is printed recursively, and then put in
  1477   this. The resulting output is concatenated with the syntactic sugar
  1551   parentheses \<^emph>\<open>if\<close> its priority \<open>p\<close> requires this.  The
  1478   according to the grammar production.
  1552   resulting output is concatenated with the syntactic sugar according
  1479 
  1553   to the grammar production.
  1480   If an AST application \<open>(c x\<^sub>1 \<dots> x\<^sub>m)\<close> has more arguments than the
  1554 
  1481   corresponding production, it is first split into \<open>((c x\<^sub>1 \<dots> x\<^sub>n) x\<^sub>n\<^sub>+\<^sub>1 \<dots>
  1555   If an AST application \<open>(c x\<^sub>1 \<dots> x\<^sub>m)\<close> has more arguments than
  1482   x\<^sub>m)\<close> and then printed recursively as above.
  1556   the corresponding production, it is first split into \<open>((c x\<^sub>1
  1483 
  1557   \<dots> x\<^sub>n) x\<^sub>n\<^sub>+\<^sub>1 \<dots> x\<^sub>m)\<close> and then printed recursively as above.
  1484   Applications with too few arguments or with non-constant head or without a
  1558 
  1485   corresponding production are printed in prefix-form like \<open>f t\<^sub>1 \<dots> t\<^sub>n\<close> for
  1559   Applications with too few arguments or with non-constant head or
  1486   terms.
  1560   without a corresponding production are printed in prefix-form like
  1487 
  1561   \<open>f t\<^sub>1 \<dots> t\<^sub>n\<close> for terms.
  1488   Multiple productions associated with some name \<open>c\<close> are tried in order of
  1562 
  1489   appearance within the grammar. An occurrence of some AST variable \<open>x\<close> is
  1563   Multiple productions associated with some name \<open>c\<close> are tried
  1490   printed as \<open>x\<close> outright.
  1564   in order of appearance within the grammar.  An occurrence of some
  1491 
  1565   AST variable \<open>x\<close> is printed as \<open>x\<close> outright.
  1492   \<^medskip>
  1566 
  1493   White space is \<^emph>\<open>not\<close> inserted automatically. If blanks (or breaks) are
  1567   \<^medskip>
  1494   required to separate tokens, they need to be specified in the mixfix
  1568   White space is \<^emph>\<open>not\<close> inserted automatically.  If
  1495   declaration (\secref{sec:mixfix}).
  1569   blanks (or breaks) are required to separate tokens, they need to be
       
  1570   specified in the mixfix declaration (\secref{sec:mixfix}).
       
  1571 \<close>
  1496 \<close>
  1572 
  1497 
  1573 end
  1498 end