src/Doc/Isar_Ref/Inner_Syntax.thy
 author wenzelm Tue Jan 16 15:42:21 2018 +0100 (19 months ago) changeset 67448 dbb1f02e667d parent 67398 5eb932e604a2 child 67513 731b1ad6759a permissions -rw-r--r--
more documentation;
     1 (*:maxLineLen=78:*)

     2

     3 theory Inner_Syntax

     4   imports Main Base

     5 begin

     6

     7 chapter \<open>Inner syntax --- the term language \label{ch:inner-syntax}\<close>

     8

     9 text \<open>

    10   The inner syntax of Isabelle provides concrete notation for the main

    11   entities of the logical framework, notably \<open>\<lambda>\<close>-terms with types and type

    12   classes. Applications may either extend existing syntactic categories by

    13   additional notation, or define new sub-languages that are linked to the

    14   standard term language via some explicit markers. For example \<^verbatim>\<open>FOO\<close>~\<open>foo\<close>

    15   could embed the syntax corresponding for some user-defined nonterminal \<open>foo\<close>

    16   --- within the bounds of the given lexical syntax of Isabelle/Pure.

    17

    18   The most basic way to specify concrete syntax for logical entities works via

    19   mixfix annotations (\secref{sec:mixfix}), which may be usually given as part

    20   of the original declaration or via explicit notation commands later on

    21   (\secref{sec:notation}). This already covers many needs of concrete syntax

    22   without having to understand the full complexity of inner syntax layers.

    23

    24   Further details of the syntax engine involves the classical distinction of

    25   lexical language versus context-free grammar (see \secref{sec:pure-syntax}),

    26   and various mechanisms for \<^emph>\<open>syntax transformations\<close> (see

    27   \secref{sec:syntax-transformations}).

    28 \<close>

    29

    30

    31 section \<open>Printing logical entities\<close>

    32

    33 subsection \<open>Diagnostic commands \label{sec:print-diag}\<close>

    34

    35 text \<open>

    36   \begin{matharray}{rcl}

    37     @{command_def "typ"}\<open>\<^sup>*\<close> & : & \<open>context \<rightarrow>\<close> \\

    38     @{command_def "term"}\<open>\<^sup>*\<close> & : & \<open>context \<rightarrow>\<close> \\

    39     @{command_def "prop"}\<open>\<^sup>*\<close> & : & \<open>context \<rightarrow>\<close> \\

    40     @{command_def "thm"}\<open>\<^sup>*\<close> & : & \<open>context \<rightarrow>\<close> \\

    41     @{command_def "prf"}\<open>\<^sup>*\<close> & : & \<open>context \<rightarrow>\<close> \\

    42     @{command_def "full_prf"}\<open>\<^sup>*\<close> & : & \<open>context \<rightarrow>\<close> \\

    43     @{command_def "print_state"}\<open>\<^sup>*\<close> & : & \<open>any \<rightarrow>\<close> \\

    44   \end{matharray}

    45

    46   These diagnostic commands assist interactive development by printing

    47   internal logical entities in a human-readable fashion.

    48

    49   @{rail \<open>

    50     @@{command typ} @{syntax modes}? @{syntax type} ('::' @{syntax sort})?

    51     ;

    52     @@{command term} @{syntax modes}? @{syntax term}

    53     ;

    54     @@{command prop} @{syntax modes}? @{syntax prop}

    55     ;

    56     @@{command thm} @{syntax modes}? @{syntax thms}

    57     ;

    58     ( @@{command prf} | @@{command full_prf} ) @{syntax modes}? @{syntax thms}?

    59     ;

    60     @@{command print_state} @{syntax modes}?

    61     ;

    62     @{syntax_def modes}: '(' (@{syntax name} + ) ')'

    63   \<close>}

    64

    65   \<^descr> @{command "typ"}~\<open>\<tau>\<close> reads and prints a type expression according to the

    66   current context.

    67

    68   \<^descr> @{command "typ"}~\<open>\<tau> :: s\<close> uses type-inference to determine the most

    69   general way to make \<open>\<tau>\<close> conform to sort \<open>s\<close>. For concrete \<open>\<tau>\<close> this checks if

    70   the type belongs to that sort. Dummy type parameters \<open>_\<close>'' (underscore)

    71   are assigned to fresh type variables with most general sorts, according the

    72   the principles of type-inference.

    73

    74     \<^descr> @{command "term"}~\<open>t\<close> and @{command "prop"}~\<open>\<phi>\<close> read, type-check and

    75     print terms or propositions according to the current theory or proof

    76     context; the inferred type of \<open>t\<close> is output as well. Note that these

    77     commands are also useful in inspecting the current environment of term

    78     abbreviations.

    79

    80     \<^descr> @{command "thm"}~\<open>a\<^sub>1 \<dots> a\<^sub>n\<close> retrieves theorems from the current theory

    81     or proof context. Note that any attributes included in the theorem

    82     specifications are applied to a temporary context derived from the current

    83     theory or proof; the result is discarded, i.e.\ attributes involved in

    84     \<open>a\<^sub>1, \<dots>, a\<^sub>n\<close> do not have any permanent effect.

    85

    86     \<^descr> @{command "prf"} displays the (compact) proof term of the current proof

    87     state (if present), or of the given theorems. Note that this requires an

    88     underlying logic image with proof terms enabled, e.g. \<open>HOL-Proofs\<close>.

    89

    90     \<^descr> @{command "full_prf"} is like @{command "prf"}, but displays the full

    91     proof term, i.e.\ also displays information omitted in the compact proof

    92     term, which is denoted by \<open>_\<close>'' placeholders there.

    93

    94     \<^descr> @{command "print_state"} prints the current proof state (if present),

    95     including current facts and goals.

    96

    97   All of the diagnostic commands above admit a list of \<open>modes\<close> to be

    98   specified, which is appended to the current print mode; see also

    99   \secref{sec:print-modes}. Thus the output behavior may be modified according

   100   particular print mode features. For example, @{command

   101   "print_state"}~\<open>(latex)\<close> prints the current proof state with mathematical

   102   symbols and special characters represented in {\LaTeX} source, according to

   103   the Isabelle style @{cite "isabelle-system"}.

   104

   105   Note that antiquotations (cf.\ \secref{sec:antiq}) provide a more systematic

   106   way to include formal items into the printed text document.

   107 \<close>

   108

   109

   110 subsection \<open>Details of printed content\<close>

   111

   112 text \<open>

   113   \begin{tabular}{rcll}

   114     @{attribute_def show_markup} & : & \<open>attribute\<close> \\

   115     @{attribute_def show_types} & : & \<open>attribute\<close> & default \<open>false\<close> \\

   116     @{attribute_def show_sorts} & : & \<open>attribute\<close> & default \<open>false\<close> \\

   117     @{attribute_def show_consts} & : & \<open>attribute\<close> & default \<open>false\<close> \\

   118     @{attribute_def show_abbrevs} & : & \<open>attribute\<close> & default \<open>true\<close> \\

   119     @{attribute_def show_brackets} & : & \<open>attribute\<close> & default \<open>false\<close> \\

   120     @{attribute_def names_long} & : & \<open>attribute\<close> & default \<open>false\<close> \\

   121     @{attribute_def names_short} & : & \<open>attribute\<close> & default \<open>false\<close> \\

   122     @{attribute_def names_unique} & : & \<open>attribute\<close> & default \<open>true\<close> \\

   123     @{attribute_def eta_contract} & : & \<open>attribute\<close> & default \<open>true\<close> \\

   124     @{attribute_def goals_limit} & : & \<open>attribute\<close> & default \<open>10\<close> \\

   125     @{attribute_def show_main_goal} & : & \<open>attribute\<close> & default \<open>false\<close> \\

   126     @{attribute_def show_hyps} & : & \<open>attribute\<close> & default \<open>false\<close> \\

   127     @{attribute_def show_tags} & : & \<open>attribute\<close> & default \<open>false\<close> \\

   128     @{attribute_def show_question_marks} & : & \<open>attribute\<close> & default \<open>true\<close> \\

   129   \end{tabular}

   130   \<^medskip>

   131

   132   These configuration options control the detail of information that is

   133   displayed for types, terms, theorems, goals etc. See also

   134   \secref{sec:config}.

   135

   136   \<^descr> @{attribute show_markup} controls direct inlining of markup into the

   137   printed representation of formal entities --- notably type and sort

   138   constraints. This enables Prover IDE users to retrieve that information via

   139   tooltips or popups while hovering with the mouse over the output window, for

   140   example. Consequently, this option is enabled by default for Isabelle/jEdit.

   141

   142   \<^descr> @{attribute show_types} and @{attribute show_sorts} control printing of

   143   type constraints for term variables, and sort constraints for type

   144   variables. By default, neither of these are shown in output. If @{attribute

   145   show_sorts} is enabled, types are always shown as well. In Isabelle/jEdit,

   146   manual setting of these options is normally not required thanks to

   147   @{attribute show_markup} above.

   148

   149   Note that displaying types and sorts may explain why a polymorphic inference

   150   rule fails to resolve with some goal, or why a rewrite rule does not apply

   151   as expected.

   152

   153   \<^descr> @{attribute show_consts} controls printing of types of constants when

   154   displaying a goal state.

   155

   156   Note that the output can be enormous, because polymorphic constants often

   157   occur at several different type instances.

   158

   159   \<^descr> @{attribute show_abbrevs} controls folding of constant abbreviations.

   160

   161   \<^descr> @{attribute show_brackets} controls bracketing in pretty printed output.

   162   If enabled, all sub-expressions of the pretty printing tree will be

   163   parenthesized, even if this produces malformed term syntax! This crude way

   164   of showing the internal structure of pretty printed entities may

   165   occasionally help to diagnose problems with operator priorities, for

   166   example.

   167

   168   \<^descr> @{attribute names_long}, @{attribute names_short}, and @{attribute

   169   names_unique} control the way of printing fully qualified internal names in

   170   external form. See also \secref{sec:antiq} for the document antiquotation

   171   options of the same names.

   172

   173   \<^descr> @{attribute eta_contract} controls \<open>\<eta>\<close>-contracted printing of terms.

   174

   175   The \<open>\<eta>\<close>-contraction law asserts @{prop "(\<lambda>x. f x) \<equiv> f"}, provided \<open>x\<close> is not

   176   free in \<open>f\<close>. It asserts \<^emph>\<open>extensionality\<close> of functions: @{prop "f \<equiv> g"} if

   177   @{prop "f x \<equiv> g x"} for all \<open>x\<close>. Higher-order unification frequently puts

   178   terms into a fully \<open>\<eta>\<close>-expanded form. For example, if \<open>F\<close> has type \<open>(\<tau> \<Rightarrow> \<tau>)

   179   \<Rightarrow> \<tau>\<close> then its expanded form is @{term "\<lambda>h. F (\<lambda>x. h x)"}.

   180

   181   Enabling @{attribute eta_contract} makes Isabelle perform \<open>\<eta>\<close>-contractions

   182   before printing, so that @{term "\<lambda>h. F (\<lambda>x. h x)"} appears simply as \<open>F\<close>.

   183

   184   Note that the distinction between a term and its \<open>\<eta>\<close>-expanded form

   185   occasionally matters. While higher-order resolution and rewriting operate

   186   modulo \<open>\<alpha>\<beta>\<eta>\<close>-conversion, some other tools might look at terms more

   187   discretely.

   188

   189   \<^descr> @{attribute goals_limit} controls the maximum number of subgoals to be

   190   printed.

   191

   192   \<^descr> @{attribute show_main_goal} controls whether the main result to be proven

   193   should be displayed. This information might be relevant for schematic goals,

   194   to inspect the current claim that has been synthesized so far.

   195

   196   \<^descr> @{attribute show_hyps} controls printing of implicit hypotheses of local

   197   facts. Normally, only those hypotheses are displayed that are \<^emph>\<open>not\<close> covered

   198   by the assumptions of the current context: this situation indicates a fault

   199   in some tool being used.

   200

   201   By enabling @{attribute show_hyps}, output of \<^emph>\<open>all\<close> hypotheses can be

   202   enforced, which is occasionally useful for diagnostic purposes.

   203

   204   \<^descr> @{attribute show_tags} controls printing of extra annotations within

   205   theorems, such as internal position information, or the case names being

   206   attached by the attribute @{attribute case_names}.

   207

   208   Note that the @{attribute tagged} and @{attribute untagged} attributes

   209   provide low-level access to the collection of tags associated with a

   210   theorem.

   211

   212   \<^descr> @{attribute show_question_marks} controls printing of question marks for

   213   schematic variables, such as \<open>?x\<close>. Only the leading question mark is

   214   affected, the remaining text is unchanged (including proper markup for

   215   schematic variables that might be relevant for user interfaces).

   216 \<close>

   217

   218

   219 subsection \<open>Alternative print modes \label{sec:print-modes}\<close>

   220

   221 text \<open>

   222   \begin{mldecls}

   223     @{index_ML print_mode_value: "unit -> string list"} \\

   224     @{index_ML Print_Mode.with_modes: "string list -> ('a -> 'b) -> 'a -> 'b"} \\

   225   \end{mldecls}

   226

   227   The \<^emph>\<open>print mode\<close> facility allows to modify various operations for printing.

   228   Commands like @{command typ}, @{command term}, @{command thm} (see

   229   \secref{sec:print-diag}) take additional print modes as optional argument.

   230   The underlying ML operations are as follows.

   231

   232     \<^descr> @{ML "print_mode_value ()"} yields the list of currently active print

   233     mode names. This should be understood as symbolic representation of

   234     certain individual features for printing (with precedence from left to

   235     right).

   236

   237     \<^descr> @{ML Print_Mode.with_modes}~\<open>modes f x\<close> evaluates \<open>f x\<close> in an execution

   238     context where the print mode is prepended by the given \<open>modes\<close>. This

   239     provides a thread-safe way to augment print modes. It is also monotonic in

   240     the set of mode names: it retains the default print mode that certain

   241     user-interfaces might have installed for their proper functioning!

   242

   243   \<^medskip>

   244   The pretty printer for inner syntax maintains alternative mixfix productions

   245   for any print mode name invented by the user, say in commands like @{command

   246   notation} or @{command abbreviation}. Mode names can be arbitrary, but the

   247   following ones have a specific meaning by convention:

   248

   249     \<^item> \<^verbatim>\<open>""\<close> (the empty string): default mode; implicitly active as last

   250     element in the list of modes.

   251

   252     \<^item> \<^verbatim>\<open>input\<close>: dummy print mode that is never active; may be used to specify

   253     notation that is only available for input.

   254

   255     \<^item> \<^verbatim>\<open>internal\<close> dummy print mode that is never active; used internally in

   256     Isabelle/Pure.

   257

   258     \<^item> \<^verbatim>\<open>ASCII\<close>: prefer ASCII art over mathematical symbols.

   259

   260     \<^item> \<^verbatim>\<open>latex\<close>: additional mode that is active in {\LaTeX} document

   261     preparation of Isabelle theory sources; allows to provide alternative

   262     output notation.

   263 \<close>

   264

   265

   266 section \<open>Mixfix annotations \label{sec:mixfix}\<close>

   267

   268 text \<open>

   269   Mixfix annotations specify concrete \<^emph>\<open>inner syntax\<close> of Isabelle types and

   270   terms. Locally fixed parameters in toplevel theorem statements, locale and

   271   class specifications also admit mixfix annotations in a fairly uniform

   272   manner. A mixfix annotation describes the concrete syntax, the translation

   273   to abstract syntax, and the pretty printing. Special case annotations

   274   provide a simple means of specifying infix operators and binders.

   275

   276   Isabelle mixfix syntax is inspired by {\OBJ} @{cite OBJ}. It allows to

   277   specify any context-free priority grammar, which is more general than the

   278   fixity declarations of ML and Prolog.

   279

   280   @{rail \<open>

   281     @{syntax_def mixfix}: '('

   282       (@{syntax template} prios? @{syntax nat}? |

   283         (@'infix' | @'infixl' | @'infixr') @{syntax template} @{syntax nat} |

   284         @'binder' @{syntax template} prios? @{syntax nat} |

   285         @'structure') ')'

   286     ;

   287     @{syntax template}: string

   288     ;

   289     prios: '[' (@{syntax nat} + ',') ']'

   290   \<close>}

   291

   292   The string given as \<open>template\<close> may include literal text, spacing, blocks,

   293   and arguments (denoted by \<open>_\<close>''); the special symbol \<^verbatim>\<open>\<index>\<close>'' (printed as

   294   \<open>\<index>\<close>'') represents an index argument that specifies an implicit @{keyword

   295   "structure"} reference (see also \secref{sec:locale}). Only locally fixed

   296   variables may be declared as @{keyword "structure"}.

   297

   298   Infix and binder declarations provide common abbreviations for particular

   299   mixfix declarations. So in practice, mixfix templates mostly degenerate to

   300   literal text for concrete syntax, such as \<^verbatim>\<open>++\<close>'' for an infix symbol.

   301 \<close>

   302

   303

   304 subsection \<open>The general mixfix form\<close>

   305

   306 text \<open>

   307   In full generality, mixfix declarations work as follows. Suppose a constant

   308   \<open>c :: \<tau>\<^sub>1 \<Rightarrow> \<dots> \<tau>\<^sub>n \<Rightarrow> \<tau>\<close> is annotated by \<open>(mixfix [p\<^sub>1, \<dots>, p\<^sub>n] p)\<close>, where

   309   \<open>mixfix\<close> is a string \<open>d\<^sub>0 _ d\<^sub>1 _ \<dots> _ d\<^sub>n\<close> consisting of delimiters that

   310   surround argument positions as indicated by underscores.

   311

   312   Altogether this determines a production for a context-free priority grammar,

   313   where for each argument \<open>i\<close> the syntactic category is determined by \<open>\<tau>\<^sub>i\<close>

   314   (with priority \<open>p\<^sub>i\<close>), and the result category is determined from \<open>\<tau>\<close> (with

   315   priority \<open>p\<close>). Priority specifications are optional, with default 0 for

   316   arguments and 1000 for the result.\<^footnote>\<open>Omitting priorities is prone to

   317   syntactic ambiguities unless the delimiter tokens determine fully bracketed

   318   notation, as in \<open>if _ then _ else _ fi\<close>.\<close>

   319

   320   Since \<open>\<tau>\<close> may be again a function type, the constant type scheme may have

   321   more argument positions than the mixfix pattern. Printing a nested

   322   application \<open>c t\<^sub>1 \<dots> t\<^sub>m\<close> for \<open>m > n\<close> works by attaching concrete notation

   323   only to the innermost part, essentially by printing \<open>(c t\<^sub>1 \<dots> t\<^sub>n) \<dots> t\<^sub>m\<close>

   324   instead. If a term has fewer arguments than specified in the mixfix

   325   template, the concrete syntax is ignored.

   326

   327   \<^medskip>

   328   A mixfix template may also contain additional directives for pretty

   329   printing, notably spaces, blocks, and breaks. The general template format is

   330   a sequence over any of the following entities.

   331

   332   \<^descr> \<open>d\<close> is a delimiter, namely a non-empty sequence delimiter items of the

   333   following form:

   334     \<^enum> a control symbol followed by a cartouche

   335     \<^enum> a single symbol, excluding the following special characters:

   336       \<^medskip>

   337       \begin{tabular}{ll}

   338         \<^verbatim>\<open>'\<close> & single quote \\

   339         \<^verbatim>\<open>_\<close> & underscore \\

   340         \<open>\<index>\<close> & index symbol \\

   341         \<^verbatim>\<open>(\<close> & open parenthesis \\

   342         \<^verbatim>\<open>)\<close> & close parenthesis \\

   343         \<^verbatim>\<open>/\<close> & slash \\

   344         \<open>\<open> \<close>\<close> & cartouche delimiters \\

   345       \end{tabular}

   346       \<^medskip>

   347

   348   \<^descr> \<^verbatim>\<open>'\<close> escapes the special meaning of these meta-characters, producing a

   349   literal version of the following character, unless that is a blank.

   350

   351   A single quote followed by a blank separates delimiters, without affecting

   352   printing, but input tokens may have additional white space here.

   353

   354   \<^descr> \<^verbatim>\<open>_\<close> is an argument position, which stands for a certain syntactic

   355   category in the underlying grammar.

   356

   357   \<^descr> \<open>\<index>\<close> is an indexed argument position; this is the place where implicit

   358   structure arguments can be attached.

   359

   360   \<^descr> \<open>s\<close> is a non-empty sequence of spaces for printing. This and the following

   361   specifications do not affect parsing at all.

   362

   363   \<^descr> \<^verbatim>\<open>(\<close>\<open>n\<close> opens a pretty printing block. The optional natural number

   364   specifies the block indentation, i.e. how much spaces to add when a line

   365   break occurs within the block. The default indentation is 0.

   366

   367   \<^descr> \<^verbatim>\<open>(\<close>\<open>\<open>properties\<close>\<close> opens a pretty printing block, with properties

   368   specified within the given text cartouche. The syntax and semantics of

   369   the category @{syntax_ref mixfix_properties} is described below.

   370

   371   \<^descr> \<^verbatim>\<open>)\<close> closes a pretty printing block.

   372

   373   \<^descr> \<^verbatim>\<open>//\<close> forces a line break.

   374

   375   \<^descr> \<^verbatim>\<open>/\<close>\<open>s\<close> allows a line break. Here \<open>s\<close> stands for the string of spaces

   376   (zero or more) right after the slash. These spaces are printed if the break

   377   is \<^emph>\<open>not\<close> taken.

   378

   379

   380   \<^medskip>

   381   Block properties allow more control over the details of pretty-printed

   382   output. The concrete syntax is defined as follows.

   383

   384   @{rail \<open>

   385     @{syntax_def "mixfix_properties"}: (entry *)

   386     ;

   387     entry: atom ('=' atom)?

   388     ;

   389     atom: @{syntax short_ident} | @{syntax int} | @{syntax float} | @{syntax cartouche}

   390   \<close>}

   391

   392   Each @{syntax entry} is a name-value pair: if the value is omitted, if

   393   defaults to \<^verbatim>\<open>true\<close> (intended for Boolean properties). The following

   394   standard block properties are supported:

   395

   396     \<^item> \<open>indent\<close> (natural number): the block indentation --- the same as for the

   397     simple syntax without block properties.

   398

   399     \<^item> \<open>consistent\<close> (Boolean): this block has consistent breaks (if one break

   400     is taken, all breaks are taken).

   401

   402     \<^item> \<open>unbreakable\<close> (Boolean): all possible breaks of the block are disabled

   403     (turned into spaces).

   404

   405     \<^item> \<open>markup\<close> (string): the optional name of the markup node. If this is

   406     provided, all remaining properties are turned into its XML attributes.

   407     This allows to specify free-form PIDE markup, e.g.\ for specialized

   408     output.

   409

   410   \<^medskip>

   411   Note that the general idea of pretty printing with blocks and breaks is

   412   described in @{cite "paulson-ml2"}; it goes back to @{cite "Oppen:1980"}.

   413 \<close>

   414

   415

   416 subsection \<open>Infixes\<close>

   417

   418 text \<open>

   419   Infix operators are specified by convenient short forms that abbreviate

   420   general mixfix annotations as follows:

   421

   422   \begin{center}

   423   \begin{tabular}{lll}

   424

   425   \<^verbatim>\<open>(\<close>@{keyword_def "infix"}~\<^verbatim>\<open>"\<close>\<open>sy\<close>\<^verbatim>\<open>"\<close> \<open>p\<close>\<^verbatim>\<open>)\<close>

   426   & \<open>\<mapsto>\<close> &

   427   \<^verbatim>\<open>("(_\<close>~\<open>sy\<close>\<^verbatim>\<open>/ _)" [\<close>\<open>p + 1\<close>\<^verbatim>\<open>,\<close>~\<open>p + 1\<close>\<^verbatim>\<open>]\<close>~\<open>p\<close>\<^verbatim>\<open>)\<close> \\

   428   \<^verbatim>\<open>(\<close>@{keyword_def "infixl"}~\<^verbatim>\<open>"\<close>\<open>sy\<close>\<^verbatim>\<open>"\<close> \<open>p\<close>\<^verbatim>\<open>)\<close>

   429   & \<open>\<mapsto>\<close> &

   430   \<^verbatim>\<open>("(_\<close>~\<open>sy\<close>\<^verbatim>\<open>/ _)" [\<close>\<open>p\<close>\<^verbatim>\<open>,\<close>~\<open>p + 1\<close>\<^verbatim>\<open>]\<close>~\<open>p\<close>\<^verbatim>\<open>)\<close> \\

   431   \<^verbatim>\<open>(\<close>@{keyword_def "infixr"}~\<^verbatim>\<open>"\<close>\<open>sy\<close>\<^verbatim>\<open>"\<close>~\<open>p\<close>\<^verbatim>\<open>)\<close>

   432   & \<open>\<mapsto>\<close> &

   433   \<^verbatim>\<open>("(_\<close>~\<open>sy\<close>\<^verbatim>\<open>/ _)" [\<close>\<open>p + 1\<close>\<^verbatim>\<open>,\<close>~\<open>p\<close>\<^verbatim>\<open>]\<close>~\<open>p\<close>\<^verbatim>\<open>)\<close> \\

   434

   435   \end{tabular}

   436   \end{center}

   437

   438   The mixfix template \<^verbatim>\<open>"(_\<close>~\<open>sy\<close>\<^verbatim>\<open>/ _)"\<close> specifies two argument positions;

   439   the delimiter is preceded by a space and followed by a space or line break;

   440   the entire phrase is a pretty printing block.

   441

   442   The alternative notation \<^verbatim>\<open>(\<close>\<open>sy\<close>\<^verbatim>\<open>)\<close> is introduced in addition. Thus any

   443   infix operator may be written in prefix form (as in Haskell), independently of

   444   the number of arguments in the term. To avoid conflict with the comment brackets

   445   \<^verbatim>\<open>(*\<close> and \<^verbatim>\<open>*)\<close>, infix operators that begin or end with a \<^verbatim>\<open>*\<close> require

   446   extra spaces, e.g. \<^verbatim>\<open>( * )\<close>.

   447 \<close>

   448

   449

   450 subsection \<open>Binders\<close>

   451

   452 text \<open>

   453   A \<^emph>\<open>binder\<close> is a variable-binding construct such as a quantifier. The idea

   454   to formalize \<open>\<forall>x. b\<close> as \<open>All (\<lambda>x. b)\<close> for \<open>All :: ('a \<Rightarrow> bool) \<Rightarrow> bool\<close>

   455   already goes back to @{cite church40}. Isabelle declarations of certain

   456   higher-order operators may be annotated with @{keyword_def "binder"}

   457   annotations as follows:

   458

   459   \begin{center}

   460   \<open>c ::\<close>~\<^verbatim>\<open>"\<close>\<open>(\<tau>\<^sub>1 \<Rightarrow> \<tau>\<^sub>2) \<Rightarrow> \<tau>\<^sub>3\<close>\<^verbatim>\<open>"  (\<close>@{keyword "binder"}~\<^verbatim>\<open>"\<close>\<open>sy\<close>\<^verbatim>\<open>" [\<close>\<open>p\<close>\<^verbatim>\<open>]\<close>~\<open>q\<close>\<^verbatim>\<open>)\<close>

   461   \end{center}

   462

   463   This introduces concrete binder syntax \<open>sy x. b\<close>, where \<open>x\<close> is a bound

   464   variable of type \<open>\<tau>\<^sub>1\<close>, the body \<open>b\<close> has type \<open>\<tau>\<^sub>2\<close> and the whole term has

   465   type \<open>\<tau>\<^sub>3\<close>. The optional integer \<open>p\<close> specifies the syntactic priority of the

   466   body; the default is \<open>q\<close>, which is also the priority of the whole construct.

   467

   468   Internally, the binder syntax is expanded to something like this:

   469   \begin{center}

   470   \<open>c_binder ::\<close>~\<^verbatim>\<open>"\<close>\<open>idts \<Rightarrow> \<tau>\<^sub>2 \<Rightarrow> \<tau>\<^sub>3\<close>\<^verbatim>\<open>"  ("(3\<close>\<open>sy\<close>\<^verbatim>\<open>_./ _)" [0,\<close>~\<open>p\<close>\<^verbatim>\<open>]\<close>~\<open>q\<close>\<^verbatim>\<open>)\<close>

   471   \end{center}

   472

   473   Here @{syntax (inner) idts} is the nonterminal symbol for a list of

   474   identifiers with optional type constraints (see also

   475   \secref{sec:pure-grammar}). The mixfix template \<^verbatim>\<open>"(3\<close>\<open>sy\<close>\<^verbatim>\<open>_./ _)"\<close> defines

   476   argument positions for the bound identifiers and the body, separated by a

   477   dot with optional line break; the entire phrase is a pretty printing block

   478   of indentation level 3. Note that there is no extra space after \<open>sy\<close>, so it

   479   needs to be included user specification if the binder syntax ends with a

   480   token that may be continued by an identifier token at the start of @{syntax

   481   (inner) idts}.

   482

   483   Furthermore, a syntax translation to transforms \<open>c_binder x\<^sub>1 \<dots> x\<^sub>n b\<close> into

   484   iterated application \<open>c (\<lambda>x\<^sub>1. \<dots> c (\<lambda>x\<^sub>n. b)\<dots>)\<close>. This works in both

   485   directions, for parsing and printing.

   486 \<close>

   487

   488

   489 section \<open>Explicit notation \label{sec:notation}\<close>

   490

   491 text \<open>

   492   \begin{matharray}{rcll}

   493     @{command_def "type_notation"} & : & \<open>local_theory \<rightarrow> local_theory\<close> \\

   494     @{command_def "no_type_notation"} & : & \<open>local_theory \<rightarrow> local_theory\<close> \\

   495     @{command_def "notation"} & : & \<open>local_theory \<rightarrow> local_theory\<close> \\

   496     @{command_def "no_notation"} & : & \<open>local_theory \<rightarrow> local_theory\<close> \\

   497     @{command_def "write"} & : & \<open>proof(state) \<rightarrow> proof(state)\<close> \\

   498   \end{matharray}

   499

   500   Commands that introduce new logical entities (terms or types) usually allow

   501   to provide mixfix annotations on the spot, which is convenient for default

   502   notation. Nonetheless, the syntax may be modified later on by declarations

   503   for explicit notation. This allows to add or delete mixfix annotations for

   504   of existing logical entities within the current context.

   505

   506   @{rail \<open>

   507     (@@{command type_notation} | @@{command no_type_notation}) @{syntax mode}? \<newline>

   508       (@{syntax name} @{syntax mixfix} + @'and')

   509     ;

   510     (@@{command notation} | @@{command no_notation}) @{syntax mode}? \<newline>

   511       (@{syntax name} @{syntax mixfix} + @'and')

   512     ;

   513     @@{command write} @{syntax mode}? (@{syntax name} @{syntax mixfix} + @'and')

   514   \<close>}

   515

   516   \<^descr> @{command "type_notation"}~\<open>c (mx)\<close> associates mixfix syntax with an

   517   existing type constructor. The arity of the constructor is retrieved from

   518   the context.

   519

   520   \<^descr> @{command "no_type_notation"} is similar to @{command "type_notation"},

   521   but removes the specified syntax annotation from the present context.

   522

   523   \<^descr> @{command "notation"}~\<open>c (mx)\<close> associates mixfix syntax with an existing

   524   constant or fixed variable. The type declaration of the given entity is

   525   retrieved from the context.

   526

   527   \<^descr> @{command "no_notation"} is similar to @{command "notation"}, but removes

   528   the specified syntax annotation from the present context.

   529

   530   \<^descr> @{command "write"} is similar to @{command "notation"}, but works within

   531   an Isar proof body.

   532 \<close>

   533

   534

   535 section \<open>The Pure syntax \label{sec:pure-syntax}\<close>

   536

   537 subsection \<open>Lexical matters \label{sec:inner-lex}\<close>

   538

   539 text \<open>

   540   The inner lexical syntax vaguely resembles the outer one

   541   (\secref{sec:outer-lex}), but some details are different. There are two main

   542   categories of inner syntax tokens:

   543

   544   \<^enum> \<^emph>\<open>delimiters\<close> --- the literal tokens occurring in productions of the given

   545   priority grammar (cf.\ \secref{sec:priority-grammar});

   546

   547   \<^enum> \<^emph>\<open>named tokens\<close> --- various categories of identifiers etc.

   548

   549

   550   Delimiters override named tokens and may thus render certain identifiers

   551   inaccessible. Sometimes the logical context admits alternative ways to refer

   552   to the same entity, potentially via qualified names.

   553

   554   \<^medskip>

   555   The categories for named tokens are defined once and for all as follows,

   556   reusing some categories of the outer token syntax (\secref{sec:outer-lex}).

   557

   558   \begin{center}

   559   \begin{supertabular}{rcl}

   560     @{syntax_def (inner) id} & = & @{syntax_ref short_ident} \\

   561     @{syntax_def (inner) longid} & = & @{syntax_ref long_ident} \\

   562     @{syntax_def (inner) var} & = & @{syntax_ref var} \\

   563     @{syntax_def (inner) tid} & = & @{syntax_ref type_ident} \\

   564     @{syntax_def (inner) tvar} & = & @{syntax_ref type_var} \\

   565     @{syntax_def (inner) num_token} & = & @{syntax_ref nat} \\

   566     @{syntax_def (inner) float_token} & = & @{syntax_ref nat}\<^verbatim>\<open>.\<close>@{syntax_ref nat} \\

   567     @{syntax_def (inner) str_token} & = & \<^verbatim>\<open>''\<close> \<open>\<dots>\<close> \<^verbatim>\<open>''\<close> \\

   568     @{syntax_def (inner) string_token} & = & \<^verbatim>\<open>"\<close> \<open>\<dots>\<close> \<^verbatim>\<open>"\<close> \\

   569     @{syntax_def (inner) cartouche} & = & @{verbatim "\<open>"} \<open>\<dots>\<close> @{verbatim "\<close>"} \\

   570   \end{supertabular}

   571   \end{center}

   572

   573   The token categories @{syntax (inner) num_token}, @{syntax (inner)

   574   float_token}, @{syntax (inner) str_token}, @{syntax (inner) string_token},

   575   and @{syntax (inner) cartouche} are not used in Pure. Object-logics may

   576   implement numerals and string literals by adding appropriate syntax

   577   declarations, together with some translation functions (e.g.\ see

   578   \<^file>\<open>~~/src/HOL/Tools/string_syntax.ML\<close>).

   579

   580   The derived categories @{syntax_def (inner) num_const}, and @{syntax_def

   581   (inner) float_const}, provide robust access to the respective tokens: the

   582   syntax tree holds a syntactic constant instead of a free variable.

   583

   584   Formal document comments (\secref{sec:comments}) may be also used within the

   585   inner syntax.

   586 \<close>

   587

   588

   589 subsection \<open>Priority grammars \label{sec:priority-grammar}\<close>

   590

   591 text \<open>

   592   A context-free grammar consists of a set of \<^emph>\<open>terminal symbols\<close>, a set of

   593   \<^emph>\<open>nonterminal symbols\<close> and a set of \<^emph>\<open>productions\<close>. Productions have the

   594   form \<open>A = \<gamma>\<close>, where \<open>A\<close> is a nonterminal and \<open>\<gamma>\<close> is a string of terminals

   595   and nonterminals. One designated nonterminal is called the \<^emph>\<open>root symbol\<close>.

   596   The language defined by the grammar consists of all strings of terminals

   597   that can be derived from the root symbol by applying productions as rewrite

   598   rules.

   599

   600   The standard Isabelle parser for inner syntax uses a \<^emph>\<open>priority grammar\<close>.

   601   Each nonterminal is decorated by an integer priority: \<open>A\<^sup>(\<^sup>p\<^sup>)\<close>. In a

   602   derivation, \<open>A\<^sup>(\<^sup>p\<^sup>)\<close> may be rewritten using a production \<open>A\<^sup>(\<^sup>q\<^sup>) = \<gamma>\<close> only

   603   if \<open>p \<le> q\<close>. Any priority grammar can be translated into a normal

   604   context-free grammar by introducing new nonterminals and productions.

   605

   606   \<^medskip>

   607   Formally, a set of context free productions \<open>G\<close> induces a derivation

   608   relation \<open>\<longrightarrow>\<^sub>G\<close> as follows. Let \<open>\<alpha>\<close> and \<open>\<beta>\<close> denote strings of terminal or

   609   nonterminal symbols. Then \<open>\<alpha> A\<^sup>(\<^sup>p\<^sup>) \<beta> \<longrightarrow>\<^sub>G \<alpha> \<gamma> \<beta>\<close> holds if and only if \<open>G\<close>

   610   contains some production \<open>A\<^sup>(\<^sup>q\<^sup>) = \<gamma>\<close> for \<open>p \<le> q\<close>.

   611

   612   \<^medskip>

   613   The following grammar for arithmetic expressions demonstrates how binding

   614   power and associativity of operators can be enforced by priorities.

   615

   616   \begin{center}

   617   \begin{tabular}{rclr}

   618   \<open>A\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>)\<close> & \<open>=\<close> & \<^verbatim>\<open>(\<close> \<open>A\<^sup>(\<^sup>0\<^sup>)\<close> \<^verbatim>\<open>)\<close> \\

   619   \<open>A\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>)\<close> & \<open>=\<close> & \<^verbatim>\<open>0\<close> \\

   620   \<open>A\<^sup>(\<^sup>0\<^sup>)\<close> & \<open>=\<close> & \<open>A\<^sup>(\<^sup>0\<^sup>)\<close> \<^verbatim>\<open>+\<close> \<open>A\<^sup>(\<^sup>1\<^sup>)\<close> \\

   621   \<open>A\<^sup>(\<^sup>2\<^sup>)\<close> & \<open>=\<close> & \<open>A\<^sup>(\<^sup>3\<^sup>)\<close> \<^verbatim>\<open>*\<close> \<open>A\<^sup>(\<^sup>2\<^sup>)\<close> \\

   622   \<open>A\<^sup>(\<^sup>3\<^sup>)\<close> & \<open>=\<close> & \<^verbatim>\<open>-\<close> \<open>A\<^sup>(\<^sup>3\<^sup>)\<close> \\

   623   \end{tabular}

   624   \end{center}

   625   The choice of priorities determines that \<^verbatim>\<open>-\<close> binds tighter than \<^verbatim>\<open>*\<close>, which

   626   binds tighter than \<^verbatim>\<open>+\<close>. Furthermore \<^verbatim>\<open>+\<close> associates to the left and \<^verbatim>\<open>*\<close> to

   627   the right.

   628

   629   \<^medskip>

   630   For clarity, grammars obey these conventions:

   631

   632     \<^item> All priorities must lie between 0 and 1000.

   633

   634     \<^item> Priority 0 on the right-hand side and priority 1000 on the left-hand

   635     side may be omitted.

   636

   637     \<^item> The production \<open>A\<^sup>(\<^sup>p\<^sup>) = \<alpha>\<close> is written as \<open>A = \<alpha> (p)\<close>, i.e.\ the

   638     priority of the left-hand side actually appears in a column on the far

   639     right.

   640

   641     \<^item> Alternatives are separated by \<open>|\<close>.

   642

   643     \<^item> Repetition is indicated by dots \<open>(\<dots>)\<close> in an informal but obvious way.

   644

   645   Using these conventions, the example grammar specification above

   646   takes the form:

   647   \begin{center}

   648   \begin{tabular}{rclc}

   649     \<open>A\<close> & \<open>=\<close> & \<^verbatim>\<open>(\<close> \<open>A\<close> \<^verbatim>\<open>)\<close> \\

   650               & \<open>|\<close> & \<^verbatim>\<open>0\<close> & \qquad\qquad \\

   651               & \<open>|\<close> & \<open>A\<close> \<^verbatim>\<open>+\<close> \<open>A\<^sup>(\<^sup>1\<^sup>)\<close> & \<open>(0)\<close> \\

   652               & \<open>|\<close> & \<open>A\<^sup>(\<^sup>3\<^sup>)\<close> \<^verbatim>\<open>*\<close> \<open>A\<^sup>(\<^sup>2\<^sup>)\<close> & \<open>(2)\<close> \\

   653               & \<open>|\<close> & \<^verbatim>\<open>-\<close> \<open>A\<^sup>(\<^sup>3\<^sup>)\<close> & \<open>(3)\<close> \\

   654   \end{tabular}

   655   \end{center}

   656 \<close>

   657

   658

   659 subsection \<open>The Pure grammar \label{sec:pure-grammar}\<close>

   660

   661 text \<open>

   662   The priority grammar of the \<open>Pure\<close> theory is defined approximately like

   663   this:

   664

   665   \begin{center}

   666   \begin{supertabular}{rclr}

   667

   668   @{syntax_def (inner) any} & = & \<open>prop  |  logic\<close> \\\\

   669

   670   @{syntax_def (inner) prop} & = & \<^verbatim>\<open>(\<close> \<open>prop\<close> \<^verbatim>\<open>)\<close> \\

   671     & \<open>|\<close> & \<open>prop\<^sup>(\<^sup>4\<^sup>)\<close> \<^verbatim>\<open>::\<close> \<open>type\<close> & \<open>(3)\<close> \\

   672     & \<open>|\<close> & \<open>any\<^sup>(\<^sup>3\<^sup>)\<close> \<^verbatim>\<open>==\<close> \<open>any\<^sup>(\<^sup>3\<^sup>)\<close> & \<open>(2)\<close> \\

   673     & \<open>|\<close> & \<open>any\<^sup>(\<^sup>3\<^sup>)\<close> \<open>\<equiv>\<close> \<open>any\<^sup>(\<^sup>3\<^sup>)\<close> & \<open>(2)\<close> \\

   674     & \<open>|\<close> & \<open>prop\<^sup>(\<^sup>3\<^sup>)\<close> \<^verbatim>\<open>&&&\<close> \<open>prop\<^sup>(\<^sup>2\<^sup>)\<close> & \<open>(2)\<close> \\

   675     & \<open>|\<close> & \<open>prop\<^sup>(\<^sup>2\<^sup>)\<close> \<^verbatim>\<open>==>\<close> \<open>prop\<^sup>(\<^sup>1\<^sup>)\<close> & \<open>(1)\<close> \\

   676     & \<open>|\<close> & \<open>prop\<^sup>(\<^sup>2\<^sup>)\<close> \<open>\<Longrightarrow>\<close> \<open>prop\<^sup>(\<^sup>1\<^sup>)\<close> & \<open>(1)\<close> \\

   677     & \<open>|\<close> & \<^verbatim>\<open>[|\<close> \<open>prop\<close> \<^verbatim>\<open>;\<close> \<open>\<dots>\<close> \<^verbatim>\<open>;\<close> \<open>prop\<close> \<^verbatim>\<open>|]\<close> \<^verbatim>\<open>==>\<close> \<open>prop\<^sup>(\<^sup>1\<^sup>)\<close> & \<open>(1)\<close> \\

   678     & \<open>|\<close> & \<open>\<lbrakk>\<close> \<open>prop\<close> \<^verbatim>\<open>;\<close> \<open>\<dots>\<close> \<^verbatim>\<open>;\<close> \<open>prop\<close> \<open>\<rbrakk>\<close> \<open>\<Longrightarrow>\<close> \<open>prop\<^sup>(\<^sup>1\<^sup>)\<close> & \<open>(1)\<close> \\

   679     & \<open>|\<close> & \<^verbatim>\<open>!!\<close> \<open>idts\<close> \<^verbatim>\<open>.\<close> \<open>prop\<close> & \<open>(0)\<close> \\

   680     & \<open>|\<close> & \<open>\<And>\<close> \<open>idts\<close> \<^verbatim>\<open>.\<close> \<open>prop\<close> & \<open>(0)\<close> \\

   681     & \<open>|\<close> & \<^verbatim>\<open>OFCLASS\<close> \<^verbatim>\<open>(\<close> \<open>type\<close> \<^verbatim>\<open>,\<close> \<open>logic\<close> \<^verbatim>\<open>)\<close> \\

   682     & \<open>|\<close> & \<^verbatim>\<open>SORT_CONSTRAINT\<close> \<^verbatim>\<open>(\<close> \<open>type\<close> \<^verbatim>\<open>)\<close> \\

   683     & \<open>|\<close> & \<^verbatim>\<open>TERM\<close> \<open>logic\<close> \\

   684     & \<open>|\<close> & \<^verbatim>\<open>PROP\<close> \<open>aprop\<close> \\\\

   685

   686   @{syntax_def (inner) aprop} & = & \<^verbatim>\<open>(\<close> \<open>aprop\<close> \<^verbatim>\<open>)\<close> \\

   687     & \<open>|\<close> & \<open>id  |  longid  |  var  |\<close>~~\<^verbatim>\<open>_\<close>~~\<open>|\<close>~~\<^verbatim>\<open>...\<close> \\

   688     & \<open>|\<close> & \<^verbatim>\<open>CONST\<close> \<open>id  |\<close>~~\<^verbatim>\<open>CONST\<close> \<open>longid\<close> \\

   689     & \<open>|\<close> & \<^verbatim>\<open>XCONST\<close> \<open>id  |\<close>~~\<^verbatim>\<open>XCONST\<close> \<open>longid\<close> \\

   690     & \<open>|\<close> & \<open>logic\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>)  any\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>) \<dots> any\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>)\<close> & \<open>(999)\<close> \\\\

   691

   692   @{syntax_def (inner) logic} & = & \<^verbatim>\<open>(\<close> \<open>logic\<close> \<^verbatim>\<open>)\<close> \\

   693     & \<open>|\<close> & \<open>logic\<^sup>(\<^sup>4\<^sup>)\<close> \<^verbatim>\<open>::\<close> \<open>type\<close> & \<open>(3)\<close> \\

   694     & \<open>|\<close> & \<open>id  |  longid  |  var  |\<close>~~\<^verbatim>\<open>_\<close>~~\<open>|\<close>~~\<^verbatim>\<open>...\<close> \\

   695     & \<open>|\<close> & \<^verbatim>\<open>CONST\<close> \<open>id  |\<close>~~\<^verbatim>\<open>CONST\<close> \<open>longid\<close> \\

   696     & \<open>|\<close> & \<^verbatim>\<open>XCONST\<close> \<open>id  |\<close>~~\<^verbatim>\<open>XCONST\<close> \<open>longid\<close> \\

   697     & \<open>|\<close> & \<open>logic\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>)  any\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>) \<dots> any\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>)\<close> & \<open>(999)\<close> \\

   698     & \<open>|\<close> & \<^verbatim>\<open>%\<close> \<open>pttrns\<close> \<^verbatim>\<open>.\<close> \<open>any\<^sup>(\<^sup>3\<^sup>)\<close> & \<open>(3)\<close> \\

   699     & \<open>|\<close> & \<open>\<lambda>\<close> \<open>pttrns\<close> \<^verbatim>\<open>.\<close> \<open>any\<^sup>(\<^sup>3\<^sup>)\<close> & \<open>(3)\<close> \\

   700     & \<open>|\<close> & \<^verbatim>\<open>(==)\<close>~~\<open>|\<close>~~\<^verbatim>\<open>(\<close>\<open>\<equiv>\<close>\<^verbatim>\<open>)\<close>~~\<open>|\<close>~~\<^verbatim>\<open>(&&&)\<close> \\

   701     & \<open>|\<close> & \<^verbatim>\<open>(==>)\<close>~~\<open>|\<close>~~\<^verbatim>\<open>(\<close>\<open>\<Longrightarrow>\<close>\<^verbatim>\<open>)\<close> \\

   702     & \<open>|\<close> & \<^verbatim>\<open>TYPE\<close> \<^verbatim>\<open>(\<close> \<open>type\<close> \<^verbatim>\<open>)\<close> \\\\

   703

   704   @{syntax_def (inner) idt} & = & \<^verbatim>\<open>(\<close> \<open>idt\<close> \<^verbatim>\<open>)\<close>~~\<open>|  id  |\<close>~~\<^verbatim>\<open>_\<close> \\

   705     & \<open>|\<close> & \<open>id\<close> \<^verbatim>\<open>::\<close> \<open>type\<close> & \<open>(0)\<close> \\

   706     & \<open>|\<close> & \<^verbatim>\<open>_\<close> \<^verbatim>\<open>::\<close> \<open>type\<close> & \<open>(0)\<close> \\\\

   707

   708   @{syntax_def (inner) index} & = & \<^verbatim>\<open>\<^bsub>\<close> \<open>logic\<^sup>(\<^sup>0\<^sup>)\<close> \<^verbatim>\<open>\<^esub>\<close>~~\<open>|  |  \<index>\<close> \\\\

   709

   710   @{syntax_def (inner) idts} & = & \<open>idt  |  idt\<^sup>(\<^sup>1\<^sup>) idts\<close> & \<open>(0)\<close> \\\\

   711

   712   @{syntax_def (inner) pttrn} & = & \<open>idt\<close> \\\\

   713

   714   @{syntax_def (inner) pttrns} & = & \<open>pttrn  |  pttrn\<^sup>(\<^sup>1\<^sup>) pttrns\<close> & \<open>(0)\<close> \\\\

   715

   716   @{syntax_def (inner) type} & = & \<^verbatim>\<open>(\<close> \<open>type\<close> \<^verbatim>\<open>)\<close> \\

   717     & \<open>|\<close> & \<open>tid  |  tvar  |\<close>~~\<^verbatim>\<open>_\<close> \\

   718     & \<open>|\<close> & \<open>tid\<close> \<^verbatim>\<open>::\<close> \<open>sort  |  tvar\<close>~~\<^verbatim>\<open>::\<close> \<open>sort  |\<close>~~\<^verbatim>\<open>_\<close> \<^verbatim>\<open>::\<close> \<open>sort\<close> \\

   719     & \<open>|\<close> & \<open>type_name  |  type\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>) type_name\<close> \\

   720     & \<open>|\<close> & \<^verbatim>\<open>(\<close> \<open>type\<close> \<^verbatim>\<open>,\<close> \<open>\<dots>\<close> \<^verbatim>\<open>,\<close> \<open>type\<close> \<^verbatim>\<open>)\<close> \<open>type_name\<close> \\

   721     & \<open>|\<close> & \<open>type\<^sup>(\<^sup>1\<^sup>)\<close> \<^verbatim>\<open>=>\<close> \<open>type\<close> & \<open>(0)\<close> \\

   722     & \<open>|\<close> & \<open>type\<^sup>(\<^sup>1\<^sup>)\<close> \<open>\<Rightarrow>\<close> \<open>type\<close> & \<open>(0)\<close> \\

   723     & \<open>|\<close> & \<^verbatim>\<open>[\<close> \<open>type\<close> \<^verbatim>\<open>,\<close> \<open>\<dots>\<close> \<^verbatim>\<open>,\<close> \<open>type\<close> \<^verbatim>\<open>]\<close> \<^verbatim>\<open>=>\<close> \<open>type\<close> & \<open>(0)\<close> \\

   724     & \<open>|\<close> & \<^verbatim>\<open>[\<close> \<open>type\<close> \<^verbatim>\<open>,\<close> \<open>\<dots>\<close> \<^verbatim>\<open>,\<close> \<open>type\<close> \<^verbatim>\<open>]\<close> \<open>\<Rightarrow>\<close> \<open>type\<close> & \<open>(0)\<close> \\

   725   @{syntax_def (inner) type_name} & = & \<open>id  |  longid\<close> \\\\

   726

   727   @{syntax_def (inner) sort} & = & @{syntax class_name}~~\<open>|\<close>~~\<^verbatim>\<open>{}\<close> \\

   728     & \<open>|\<close> & \<^verbatim>\<open>{\<close> @{syntax class_name} \<^verbatim>\<open>,\<close> \<open>\<dots>\<close> \<^verbatim>\<open>,\<close> @{syntax class_name} \<^verbatim>\<open>}\<close> \\

   729   @{syntax_def (inner) class_name} & = & \<open>id  |  longid\<close> \\

   730   \end{supertabular}

   731   \end{center}

   732

   733   \<^medskip>

   734   Here literal terminals are printed \<^verbatim>\<open>verbatim\<close>; see also

   735   \secref{sec:inner-lex} for further token categories of the inner syntax. The

   736   meaning of the nonterminals defined by the above grammar is as follows:

   737

   738   \<^descr> @{syntax_ref (inner) any} denotes any term.

   739

   740   \<^descr> @{syntax_ref (inner) prop} denotes meta-level propositions, which are

   741   terms of type @{typ prop}. The syntax of such formulae of the meta-logic is

   742   carefully distinguished from usual conventions for object-logics. In

   743   particular, plain \<open>\<lambda>\<close>-term notation is \<^emph>\<open>not\<close> recognized as @{syntax (inner)

   744   prop}.

   745

   746   \<^descr> @{syntax_ref (inner) aprop} denotes atomic propositions, which are

   747   embedded into regular @{syntax (inner) prop} by means of an explicit \<^verbatim>\<open>PROP\<close>

   748   token.

   749

   750   Terms of type @{typ prop} with non-constant head, e.g.\ a plain variable,

   751   are printed in this form. Constants that yield type @{typ prop} are expected

   752   to provide their own concrete syntax; otherwise the printed version will

   753   appear like @{syntax (inner) logic} and cannot be parsed again as @{syntax

   754   (inner) prop}.

   755

   756   \<^descr> @{syntax_ref (inner) logic} denotes arbitrary terms of a logical type,

   757   excluding type @{typ prop}. This is the main syntactic category of

   758   object-logic entities, covering plain \<open>\<lambda>\<close>-term notation (variables,

   759   abstraction, application), plus anything defined by the user.

   760

   761   When specifying notation for logical entities, all logical types (excluding

   762   @{typ prop}) are \<^emph>\<open>collapsed\<close> to this single category of @{syntax (inner)

   763   logic}.

   764

   765   \<^descr> @{syntax_ref (inner) index} denotes an optional index term for indexed

   766   syntax. If omitted, it refers to the first @{keyword_ref "structure"}

   767   variable in the context. The special dummy \<open>\<index>\<close>'' serves as pattern

   768   variable in mixfix annotations that introduce indexed notation.

   769

   770   \<^descr> @{syntax_ref (inner) idt} denotes identifiers, possibly constrained by

   771   types.

   772

   773   \<^descr> @{syntax_ref (inner) idts} denotes a sequence of @{syntax_ref (inner)

   774   idt}. This is the most basic category for variables in iterated binders,

   775   such as \<open>\<lambda>\<close> or \<open>\<And>\<close>.

   776

   777   \<^descr> @{syntax_ref (inner) pttrn} and @{syntax_ref (inner) pttrns} denote

   778   patterns for abstraction, cases bindings etc. In Pure, these categories

   779   start as a merely copy of @{syntax (inner) idt} and @{syntax (inner) idts},

   780   respectively. Object-logics may add additional productions for binding

   781   forms.

   782

   783   \<^descr> @{syntax_ref (inner) type} denotes types of the meta-logic.

   784

   785   \<^descr> @{syntax_ref (inner) sort} denotes meta-level sorts.

   786

   787

   788   Here are some further explanations of certain syntax features.

   789

   790   \<^item> In @{syntax (inner) idts}, note that \<open>x :: nat y\<close> is parsed as \<open>x :: (nat

   791   y)\<close>, treating \<open>y\<close> like a type constructor applied to \<open>nat\<close>. To avoid this

   792   interpretation, write \<open>(x :: nat) y\<close> with explicit parentheses.

   793

   794   \<^item> Similarly, \<open>x :: nat y :: nat\<close> is parsed as \<open>x :: (nat y :: nat)\<close>. The

   795   correct form is \<open>(x :: nat) (y :: nat)\<close>, or \<open>(x :: nat) y :: nat\<close> if \<open>y\<close> is

   796   last in the sequence of identifiers.

   797

   798   \<^item> Type constraints for terms bind very weakly. For example, \<open>x < y :: nat\<close>

   799   is normally parsed as \<open>(x < y) :: nat\<close>, unless \<open><\<close> has a very low priority,

   800   in which case the input is likely to be ambiguous. The correct form is \<open>x <

   801   (y :: nat)\<close>.

   802

   803   \<^item> Dummy variables (written as underscore) may occur in different

   804   roles.

   805

   806     \<^descr> A type \<open>_\<close>'' or \<open>_ :: sort\<close>'' acts like an anonymous inference

   807     parameter, which is filled-in according to the most general type produced

   808     by the type-checking phase.

   809

   810     \<^descr> A bound \<open>_\<close>'' refers to a vacuous abstraction, where the body does not

   811     refer to the binding introduced here. As in the term @{term "\<lambda>x _. x"},

   812     which is \<open>\<alpha>\<close>-equivalent to \<open>\<lambda>x y. x\<close>.

   813

   814     \<^descr> A free \<open>_\<close>'' refers to an implicit outer binding. Higher definitional

   815     packages usually allow forms like \<open>f x _ = x\<close>.

   816

   817     \<^descr> A schematic \<open>_\<close>'' (within a term pattern, see \secref{sec:term-decls})

   818     refers to an anonymous variable that is implicitly abstracted over its

   819     context of locally bound variables. For example, this allows pattern

   820     matching of \<open>{x. f x = g x}\<close> against \<open>{x. _ = _}\<close>, or even \<open>{_. _ = _}\<close> by

   821     using both bound and schematic dummies.

   822

   823   \<^descr> The three literal dots \<^verbatim>\<open>...\<close>'' may be also written as ellipsis symbol

   824   \<^verbatim>\<open>\<dots>\<close>. In both cases this refers to a special schematic variable, which is

   825   bound in the context. This special term abbreviation works nicely with

   826   calculational reasoning (\secref{sec:calculation}).

   827

   828   \<^descr> \<^verbatim>\<open>CONST\<close> ensures that the given identifier is treated as constant term,

   829   and passed through the parse tree in fully internalized form. This is

   830   particularly relevant for translation rules (\secref{sec:syn-trans}),

   831   notably on the RHS.

   832

   833   \<^descr> \<^verbatim>\<open>XCONST\<close> is similar to \<^verbatim>\<open>CONST\<close>, but retains the constant name as given.

   834   This is only relevant to translation rules (\secref{sec:syn-trans}), notably

   835   on the LHS.

   836 \<close>

   837

   838

   839 subsection \<open>Inspecting the syntax\<close>

   840

   841 text \<open>

   842   \begin{matharray}{rcl}

   843     @{command_def "print_syntax"}\<open>\<^sup>*\<close> & : & \<open>context \<rightarrow>\<close> \\

   844   \end{matharray}

   845

   846   \<^descr> @{command "print_syntax"} prints the inner syntax of the current context.

   847   The output can be quite large; the most important sections are explained

   848   below.

   849

   850     \<^descr> \<open>lexicon\<close> lists the delimiters of the inner token language; see

   851     \secref{sec:inner-lex}.

   852

   853     \<^descr> \<open>prods\<close> lists the productions of the underlying priority grammar; see

   854     \secref{sec:priority-grammar}.

   855

   856     The nonterminal \<open>A\<^sup>(\<^sup>p\<^sup>)\<close> is rendered in plain text as \<open>A[p]\<close>; delimiters

   857     are quoted. Many productions have an extra \<open>\<dots> => name\<close>. These names later

   858     become the heads of parse trees; they also guide the pretty printer.

   859

   860     Productions without such parse tree names are called \<^emph>\<open>copy productions\<close>.

   861     Their right-hand side must have exactly one nonterminal symbol (or named

   862     token). The parser does not create a new parse tree node for copy

   863     productions, but simply returns the parse tree of the right-hand symbol.

   864

   865     If the right-hand side of a copy production consists of a single

   866     nonterminal without any delimiters, then it is called a \<^emph>\<open>chain

   867     production\<close>. Chain productions act as abbreviations: conceptually, they

   868     are removed from the grammar by adding new productions. Priority

   869     information attached to chain productions is ignored; only the dummy value

   870     \<open>-1\<close> is displayed.

   871

   872     \<^descr> \<open>print modes\<close> lists the alternative print modes provided by this

   873     grammar; see \secref{sec:print-modes}.

   874

   875     \<^descr> \<open>parse_rules\<close> and \<open>print_rules\<close> relate to syntax translations (macros);

   876     see \secref{sec:syn-trans}.

   877

   878     \<^descr> \<open>parse_ast_translation\<close> and \<open>print_ast_translation\<close> list sets of

   879     constants that invoke translation functions for abstract syntax trees,

   880     which are only required in very special situations; see

   881     \secref{sec:tr-funs}.

   882

   883     \<^descr> \<open>parse_translation\<close> and \<open>print_translation\<close> list the sets of constants

   884     that invoke regular translation functions; see \secref{sec:tr-funs}.

   885 \<close>

   886

   887

   888 subsection \<open>Ambiguity of parsed expressions\<close>

   889

   890 text \<open>

   891   \begin{tabular}{rcll}

   892     @{attribute_def syntax_ambiguity_warning} & : & \<open>attribute\<close> & default \<open>true\<close> \\

   893     @{attribute_def syntax_ambiguity_limit} & : & \<open>attribute\<close> & default \<open>10\<close> \\

   894   \end{tabular}

   895

   896   Depending on the grammar and the given input, parsing may be ambiguous.

   897   Isabelle lets the Earley parser enumerate all possible parse trees, and then

   898   tries to make the best out of the situation. Terms that cannot be

   899   type-checked are filtered out, which often leads to a unique result in the

   900   end. Unlike regular type reconstruction, which is applied to the whole

   901   collection of input terms simultaneously, the filtering stage only treats

   902   each given term in isolation. Filtering is also not attempted for individual

   903   types or raw ASTs (as required for @{command translations}).

   904

   905   Certain warning or error messages are printed, depending on the situation

   906   and the given configuration options. Parsing ultimately fails, if multiple

   907   results remain after the filtering phase.

   908

   909   \<^descr> @{attribute syntax_ambiguity_warning} controls output of explicit warning

   910   messages about syntax ambiguity.

   911

   912   \<^descr> @{attribute syntax_ambiguity_limit} determines the number of resulting

   913   parse trees that are shown as part of the printed message in case of an

   914   ambiguity.

   915 \<close>

   916

   917

   918 section \<open>Syntax transformations \label{sec:syntax-transformations}\<close>

   919

   920 text \<open>

   921   The inner syntax engine of Isabelle provides separate mechanisms to

   922   transform parse trees either via rewrite systems on first-order ASTs

   923   (\secref{sec:syn-trans}), or ML functions on ASTs or syntactic \<open>\<lambda>\<close>-terms

   924   (\secref{sec:tr-funs}). This works both for parsing and printing, as

   925   outlined in \figref{fig:parse-print}.

   926

   927   \begin{figure}[htbp]

   928   \begin{center}

   929   \begin{tabular}{cl}

   930   string          & \\

   931   \<open>\<down>\<close>     & lexer + parser \\

   932   parse tree      & \\

   933   \<open>\<down>\<close>     & parse AST translation \\

   934   AST             & \\

   935   \<open>\<down>\<close>     & AST rewriting (macros) \\

   936   AST             & \\

   937   \<open>\<down>\<close>     & parse translation \\

   938   --- pre-term ---    & \\

   939   \<open>\<down>\<close>     & print translation \\

   940   AST             & \\

   941   \<open>\<down>\<close>     & AST rewriting (macros) \\

   942   AST             & \\

   943   \<open>\<down>\<close>     & print AST translation \\

   944   string          &

   945   \end{tabular}

   946   \end{center}

   947   \caption{Parsing and printing with translations}\label{fig:parse-print}

   948   \end{figure}

   949

   950   These intermediate syntax tree formats eventually lead to a pre-term with

   951   all names and binding scopes resolved, but most type information still

   952   missing. Explicit type constraints might be given by the user, or implicit

   953   position information by the system --- both need to be passed-through

   954   carefully by syntax transformations.

   955

   956   Pre-terms are further processed by the so-called \<^emph>\<open>check\<close> and \<^emph>\<open>uncheck\<close>

   957   phases that are intertwined with type-inference (see also @{cite

   958   "isabelle-implementation"}). The latter allows to operate on higher-order

   959   abstract syntax with proper binding and type information already available.

   960

   961   As a rule of thumb, anything that manipulates bindings of variables or

   962   constants needs to be implemented as syntax transformation (see below).

   963   Anything else is better done via check/uncheck: a prominent example

   964   application is the @{command abbreviation} concept of Isabelle/Pure.

   965 \<close>

   966

   967

   968 subsection \<open>Abstract syntax trees \label{sec:ast}\<close>

   969

   970 text \<open>

   971   The ML datatype @{ML_type Ast.ast} explicitly represents the intermediate

   972   AST format that is used for syntax rewriting (\secref{sec:syn-trans}). It is

   973   defined in ML as follows:

   974   @{verbatim [display]

   975 \<open>datatype ast =

   976   Constant of string |

   977   Variable of string |

   978   Appl of ast list\<close>}

   979

   980   An AST is either an atom (constant or variable) or a list of (at least two)

   981   subtrees. Occasional diagnostic output of ASTs uses notation that resembles

   982   S-expression of LISP. Constant atoms are shown as quoted strings, variable

   983   atoms as non-quoted strings and applications as a parenthesized list of

   984   subtrees. For example, the AST

   985   @{ML [display] \<open>Ast.Appl [Ast.Constant "_abs", Ast.Variable "x", Ast.Variable "t"]\<close>}

   986   is pretty-printed as \<^verbatim>\<open>("_abs" x t)\<close>. Note that \<^verbatim>\<open>()\<close> and \<^verbatim>\<open>(x)\<close> are

   987   excluded as ASTs, because they have too few subtrees.

   988

   989   \<^medskip>

   990   AST application is merely a pro-forma mechanism to indicate certain

   991   syntactic structures. Thus \<^verbatim>\<open>(c a b)\<close> could mean either term application or

   992   type application, depending on the syntactic context.

   993

   994   Nested application like \<^verbatim>\<open>(("_abs" x t) u)\<close> is also possible, but ASTs are

   995   definitely first-order: the syntax constant \<^verbatim>\<open>"_abs"\<close> does not bind the \<^verbatim>\<open>x\<close>

   996   in any way. Proper bindings are introduced in later stages of the term

   997   syntax, where \<^verbatim>\<open>("_abs" x t)\<close> becomes an @{ML Abs} node and occurrences of

   998   \<^verbatim>\<open>x\<close> in \<^verbatim>\<open>t\<close> are replaced by bound variables (represented as de-Bruijn

   999   indices).

  1000 \<close>

  1001

  1002

  1003 subsubsection \<open>AST constants versus variables\<close>

  1004

  1005 text \<open>

  1006   Depending on the situation --- input syntax, output syntax, translation

  1007   patterns --- the distinction of atomic ASTs as @{ML Ast.Constant} versus

  1008   @{ML Ast.Variable} serves slightly different purposes.

  1009

  1010   Input syntax of a term such as \<open>f a b = c\<close> does not yet indicate the scopes

  1011   of atomic entities \<open>f, a, b, c\<close>: they could be global constants or local

  1012   variables, even bound ones depending on the context of the term. @{ML

  1013   Ast.Variable} leaves this choice still open: later syntax layers (or

  1014   translation functions) may capture such a variable to determine its role

  1015   specifically, to make it a constant, bound variable, free variable etc. In

  1016   contrast, syntax translations that introduce already known constants would

  1017   rather do it via @{ML Ast.Constant} to prevent accidental re-interpretation

  1018   later on.

  1019

  1020   Output syntax turns term constants into @{ML Ast.Constant} and variables

  1021   (free or schematic) into @{ML Ast.Variable}. This information is precise

  1022   when printing fully formal \<open>\<lambda>\<close>-terms.

  1023

  1024   \<^medskip>

  1025   AST translation patterns (\secref{sec:syn-trans}) that represent terms

  1026   cannot distinguish constants and variables syntactically. Explicit

  1027   indication of \<open>CONST c\<close> inside the term language is required, unless \<open>c\<close> is

  1028   known as special \<^emph>\<open>syntax constant\<close> (see also @{command syntax}). It is also

  1029   possible to use @{command syntax} declarations (without mixfix annotation)

  1030   to enforce that certain unqualified names are always treated as constant

  1031   within the syntax machinery.

  1032

  1033   The situation is simpler for ASTs that represent types or sorts, since the

  1034   concrete syntax already distinguishes type variables from type constants

  1035   (constructors). So \<open>('a, 'b) foo\<close> corresponds to an AST application of some

  1036   constant for \<open>foo\<close> and variable arguments for \<open>'a\<close> and \<open>'b\<close>. Note that the

  1037   postfix application is merely a feature of the concrete syntax, while in the

  1038   AST the constructor occurs in head position.

  1039 \<close>

  1040

  1041

  1042 subsubsection \<open>Authentic syntax names\<close>

  1043

  1044 text \<open>

  1045   Naming constant entities within ASTs is another delicate issue. Unqualified

  1046   names are resolved in the name space tables in the last stage of parsing,

  1047   after all translations have been applied. Since syntax transformations do

  1048   not know about this later name resolution, there can be surprises in

  1049   boundary cases.

  1050

  1051   \<^emph>\<open>Authentic syntax names\<close> for @{ML Ast.Constant} avoid this problem: the

  1052   fully-qualified constant name with a special prefix for its formal category

  1053   (\<open>class\<close>, \<open>type\<close>, \<open>const\<close>, \<open>fixed\<close>) represents the information faithfully

  1054   within the untyped AST format. Accidental overlap with free or bound

  1055   variables is excluded as well. Authentic syntax names work implicitly in the

  1056   following situations:

  1057

  1058     \<^item> Input of term constants (or fixed variables) that are introduced by

  1059     concrete syntax via @{command notation}: the correspondence of a

  1060     particular grammar production to some known term entity is preserved.

  1061

  1062     \<^item> Input of type constants (constructors) and type classes --- thanks to

  1063     explicit syntactic distinction independently on the context.

  1064

  1065     \<^item> Output of term constants, type constants, type classes --- this

  1066     information is already available from the internal term to be printed.

  1067

  1068   In other words, syntax transformations that operate on input terms written

  1069   as prefix applications are difficult to make robust. Luckily, this case

  1070   rarely occurs in practice, because syntax forms to be translated usually

  1071   correspond to some concrete notation.

  1072 \<close>

  1073

  1074

  1075 subsection \<open>Raw syntax and translations \label{sec:syn-trans}\<close>

  1076

  1077 text \<open>

  1078   \begin{tabular}{rcll}

  1079     @{command_def "nonterminal"} & : & \<open>theory \<rightarrow> theory\<close> \\

  1080     @{command_def "syntax"} & : & \<open>theory \<rightarrow> theory\<close> \\

  1081     @{command_def "no_syntax"} & : & \<open>theory \<rightarrow> theory\<close> \\

  1082     @{command_def "translations"} & : & \<open>theory \<rightarrow> theory\<close> \\

  1083     @{command_def "no_translations"} & : & \<open>theory \<rightarrow> theory\<close> \\

  1084     @{attribute_def syntax_ast_trace} & : & \<open>attribute\<close> & default \<open>false\<close> \\

  1085     @{attribute_def syntax_ast_stats} & : & \<open>attribute\<close> & default \<open>false\<close> \\

  1086   \end{tabular}

  1087   \<^medskip>

  1088

  1089   Unlike mixfix notation for existing formal entities (\secref{sec:notation}),

  1090   raw syntax declarations provide full access to the priority grammar of the

  1091   inner syntax, without any sanity checks. This includes additional syntactic

  1092   categories (via @{command nonterminal}) and free-form grammar productions

  1093   (via @{command syntax}). Additional syntax translations (or macros, via

  1094   @{command translations}) are required to turn resulting parse trees into

  1095   proper representations of formal entities again.

  1096

  1097   @{rail \<open>

  1098     @@{command nonterminal} (@{syntax name} + @'and')

  1099     ;

  1100     (@@{command syntax} | @@{command no_syntax}) @{syntax mode}? (constdecl +)

  1101     ;

  1102     (@@{command translations} | @@{command no_translations})

  1103       (transpat ('==' | '=>' | '<=' | '\<rightleftharpoons>' | '\<rightharpoonup>' | '\<leftharpoondown>') transpat +)

  1104     ;

  1105

  1106     constdecl: @{syntax name} '::' @{syntax type} @{syntax mixfix}?

  1107     ;

  1108     mode: ('(' ( @{syntax name} | @'output' | @{syntax name} @'output' ) ')')

  1109     ;

  1110     transpat: ('(' @{syntax name} ')')? @{syntax string}

  1111   \<close>}

  1112

  1113   \<^descr> @{command "nonterminal"}~\<open>c\<close> declares a type constructor \<open>c\<close> (without

  1114   arguments) to act as purely syntactic type: a nonterminal symbol of the

  1115   inner syntax.

  1116

  1117   \<^descr> @{command "syntax"}~\<open>(mode) c :: \<sigma> (mx)\<close> augments the priority grammar and

  1118   the pretty printer table for the given print mode (default \<^verbatim>\<open>""\<close>). An

  1119   optional keyword @{keyword_ref "output"} means that only the pretty printer

  1120   table is affected.

  1121

  1122   Following \secref{sec:mixfix}, the mixfix annotation \<open>mx = template ps q\<close>

  1123   together with type \<open>\<sigma> = \<tau>\<^sub>1 \<Rightarrow> \<dots> \<tau>\<^sub>n \<Rightarrow> \<tau>\<close> and specify a grammar production.

  1124   The \<open>template\<close> contains delimiter tokens that surround \<open>n\<close> argument

  1125   positions (\<^verbatim>\<open>_\<close>). The latter correspond to nonterminal symbols \<open>A\<^sub>i\<close> derived

  1126   from the argument types \<open>\<tau>\<^sub>i\<close> as follows:

  1127

  1128     \<^item> \<open>prop\<close> if \<open>\<tau>\<^sub>i = prop\<close>

  1129

  1130     \<^item> \<open>logic\<close> if \<open>\<tau>\<^sub>i = (\<dots>)\<kappa>\<close> for logical type constructor \<open>\<kappa> \<noteq> prop\<close>

  1131

  1132     \<^item> \<open>any\<close> if \<open>\<tau>\<^sub>i = \<alpha>\<close> for type variables

  1133

  1134     \<^item> \<open>\<kappa>\<close> if \<open>\<tau>\<^sub>i = \<kappa>\<close> for nonterminal \<open>\<kappa>\<close> (syntactic type constructor)

  1135

  1136   Each \<open>A\<^sub>i\<close> is decorated by priority \<open>p\<^sub>i\<close> from the given list \<open>ps\<close>; missing

  1137   priorities default to 0.

  1138

  1139   The resulting nonterminal of the production is determined similarly from

  1140   type \<open>\<tau>\<close>, with priority \<open>q\<close> and default 1000.

  1141

  1142   \<^medskip>

  1143   Parsing via this production produces parse trees \<open>t\<^sub>1, \<dots>, t\<^sub>n\<close> for the

  1144   argument slots. The resulting parse tree is composed as \<open>c t\<^sub>1 \<dots> t\<^sub>n\<close>, by

  1145   using the syntax constant \<open>c\<close> of the syntax declaration.

  1146

  1147   Such syntactic constants are invented on the spot, without formal check

  1148   wrt.\ existing declarations. It is conventional to use plain identifiers

  1149   prefixed by a single underscore (e.g.\ \<open>_foobar\<close>). Names should be chosen

  1150   with care, to avoid clashes with other syntax declarations.

  1151

  1152   \<^medskip>

  1153   The special case of copy production is specified by \<open>c =\<close>~\<^verbatim>\<open>""\<close> (empty

  1154   string). It means that the resulting parse tree \<open>t\<close> is copied directly,

  1155   without any further decoration.

  1156

  1157   \<^descr> @{command "no_syntax"}~\<open>(mode) decls\<close> removes grammar declarations (and

  1158   translations) resulting from \<open>decls\<close>, which are interpreted in the same

  1159   manner as for @{command "syntax"} above.

  1160

  1161   \<^descr> @{command "translations"}~\<open>rules\<close> specifies syntactic translation rules

  1162   (i.e.\ macros) as first-order rewrite rules on ASTs (\secref{sec:ast}). The

  1163   theory context maintains two independent lists translation rules: parse

  1164   rules (\<^verbatim>\<open>=>\<close> or \<open>\<rightharpoonup>\<close>) and print rules (\<^verbatim>\<open><=\<close> or \<open>\<leftharpoondown>\<close>). For convenience, both

  1165   can be specified simultaneously as parse~/ print rules (\<^verbatim>\<open>==\<close> or \<open>\<rightleftharpoons>\<close>).

  1166

  1167   Translation patterns may be prefixed by the syntactic category to be used

  1168   for parsing; the default is \<open>logic\<close> which means that regular term syntax is

  1169   used. Both sides of the syntax translation rule undergo parsing and parse

  1170   AST translations \secref{sec:tr-funs}, in order to perform some fundamental

  1171   normalization like \<open>\<lambda>x y. b \<leadsto> \<lambda>x. \<lambda>y. b\<close>, but other AST translation rules

  1172   are \<^emph>\<open>not\<close> applied recursively here.

  1173

  1174   When processing AST patterns, the inner syntax lexer runs in a different

  1175   mode that allows identifiers to start with underscore. This accommodates the

  1176   usual naming convention for auxiliary syntax constants --- those that do not

  1177   have a logical counter part --- by allowing to specify arbitrary AST

  1178   applications within the term syntax, independently of the corresponding

  1179   concrete syntax.

  1180

  1181   Atomic ASTs are distinguished as @{ML Ast.Constant} versus @{ML

  1182   Ast.Variable} as follows: a qualified name or syntax constant declared via

  1183   @{command syntax}, or parse tree head of concrete notation becomes @{ML

  1184   Ast.Constant}, anything else @{ML Ast.Variable}. Note that \<open>CONST\<close> and

  1185   \<open>XCONST\<close> within the term language (\secref{sec:pure-grammar}) allow to

  1186   enforce treatment as constants.

  1187

  1188   AST rewrite rules \<open>(lhs, rhs)\<close> need to obey the following side-conditions:

  1189

  1190     \<^item> Rules must be left linear: \<open>lhs\<close> must not contain repeated

  1191     variables.\<^footnote>\<open>The deeper reason for this is that AST equality is not

  1192     well-defined: different occurrences of the same'' AST could be decorated

  1193     differently by accidental type-constraints or source position information,

  1194     for example.\<close>

  1195

  1196     \<^item> Every variable in \<open>rhs\<close> must also occur in \<open>lhs\<close>.

  1197

  1198   \<^descr> @{command "no_translations"}~\<open>rules\<close> removes syntactic translation rules,

  1199   which are interpreted in the same manner as for @{command "translations"}

  1200   above.

  1201

  1202   \<^descr> @{attribute syntax_ast_trace} and @{attribute syntax_ast_stats} control

  1203   diagnostic output in the AST normalization process, when translation rules

  1204   are applied to concrete input or output.

  1205

  1206

  1207   Raw syntax and translations provides a slightly more low-level access to the

  1208   grammar and the form of resulting parse trees. It is often possible to avoid

  1209   this untyped macro mechanism, and use type-safe @{command abbreviation} or

  1210   @{command notation} instead. Some important situations where @{command

  1211   syntax} and @{command translations} are really need are as follows:

  1212

  1213   \<^item> Iterated replacement via recursive @{command translations}. For example,

  1214   consider list enumeration @{term "[a, b, c, d]"} as defined in theory

  1215   @{theory List} in Isabelle/HOL.

  1216

  1217   \<^item> Change of binding status of variables: anything beyond the built-in

  1218   @{keyword "binder"} mixfix annotation requires explicit syntax translations.

  1219   For example, consider list filter comprehension @{term "[x \<leftarrow> xs . P]"} as

  1220   defined in theory @{theory List} in Isabelle/HOL.

  1221 \<close>

  1222

  1223

  1224 subsubsection \<open>Applying translation rules\<close>

  1225

  1226 text \<open>

  1227   As a term is being parsed or printed, an AST is generated as an intermediate

  1228   form according to \figref{fig:parse-print}. The AST is normalized by

  1229   applying translation rules in the manner of a first-order term rewriting

  1230   system. We first examine how a single rule is applied.

  1231

  1232   Let \<open>t\<close> be the abstract syntax tree to be normalized and \<open>(lhs, rhs)\<close> some

  1233   translation rule. A subtree \<open>u\<close> of \<open>t\<close> is called \<^emph>\<open>redex\<close> if it is an

  1234   instance of \<open>lhs\<close>; in this case the pattern \<open>lhs\<close> is said to match the

  1235   object \<open>u\<close>. A redex matched by \<open>lhs\<close> may be replaced by the corresponding

  1236   instance of \<open>rhs\<close>, thus \<^emph>\<open>rewriting\<close> the AST \<open>t\<close>. Matching requires some

  1237   notion of \<^emph>\<open>place-holders\<close> in rule patterns: @{ML Ast.Variable} serves this

  1238   purpose.

  1239

  1240   More precisely, the matching of the object \<open>u\<close> against the pattern \<open>lhs\<close> is

  1241   performed as follows:

  1242

  1243     \<^item> Objects of the form @{ML Ast.Variable}~\<open>x\<close> or @{ML Ast.Constant}~\<open>x\<close> are

  1244     matched by pattern @{ML Ast.Constant}~\<open>x\<close>. Thus all atomic ASTs in the

  1245     object are treated as (potential) constants, and a successful match makes

  1246     them actual constants even before name space resolution (see also

  1247     \secref{sec:ast}).

  1248

  1249     \<^item> Object \<open>u\<close> is matched by pattern @{ML Ast.Variable}~\<open>x\<close>, binding \<open>x\<close> to

  1250     \<open>u\<close>.

  1251

  1252     \<^item> Object @{ML Ast.Appl}~\<open>us\<close> is matched by @{ML Ast.Appl}~\<open>ts\<close> if \<open>us\<close> and

  1253     \<open>ts\<close> have the same length and each corresponding subtree matches.

  1254

  1255     \<^item> In every other case, matching fails.

  1256

  1257   A successful match yields a substitution that is applied to \<open>rhs\<close>,

  1258   generating the instance that replaces \<open>u\<close>.

  1259

  1260   Normalizing an AST involves repeatedly applying translation rules until none

  1261   are applicable. This works yoyo-like: top-down, bottom-up, top-down, etc. At

  1262   each subtree position, rules are chosen in order of appearance in the theory

  1263   definitions.

  1264

  1265   The configuration options @{attribute syntax_ast_trace} and @{attribute

  1266   syntax_ast_stats} might help to understand this process and diagnose

  1267   problems.

  1268

  1269   \begin{warn}

  1270   If syntax translation rules work incorrectly, the output of @{command_ref

  1271   print_syntax} with its \<^emph>\<open>rules\<close> sections reveals the actual internal forms

  1272   of AST pattern, without potentially confusing concrete syntax. Recall that

  1273   AST constants appear as quoted strings and variables without quotes.

  1274   \end{warn}

  1275

  1276   \begin{warn}

  1277   If @{attribute_ref eta_contract} is set to \<open>true\<close>, terms will be

  1278   \<open>\<eta>\<close>-contracted \<^emph>\<open>before\<close> the AST rewriter sees them. Thus some abstraction

  1279   nodes needed for print rules to match may vanish. For example, \<open>Ball A (\<lambda>x.

  1280   P x)\<close> would contract to \<open>Ball A P\<close> and the standard print rule would fail to

  1281   apply. This problem can be avoided by hand-written ML translation functions

  1282   (see also \secref{sec:tr-funs}), which is in fact the same mechanism used in

  1283   built-in @{keyword "binder"} declarations.

  1284   \end{warn}

  1285 \<close>

  1286

  1287

  1288 subsection \<open>Syntax translation functions \label{sec:tr-funs}\<close>

  1289

  1290 text \<open>

  1291   \begin{matharray}{rcl}

  1292     @{command_def "parse_ast_translation"} & : & \<open>theory \<rightarrow> theory\<close> \\

  1293     @{command_def "parse_translation"} & : & \<open>theory \<rightarrow> theory\<close> \\

  1294     @{command_def "print_translation"} & : & \<open>theory \<rightarrow> theory\<close> \\

  1295     @{command_def "typed_print_translation"} & : & \<open>theory \<rightarrow> theory\<close> \\

  1296     @{command_def "print_ast_translation"} & : & \<open>theory \<rightarrow> theory\<close> \\

  1297     @{ML_antiquotation_def "class_syntax"} & : & \<open>ML antiquotation\<close> \\

  1298     @{ML_antiquotation_def "type_syntax"} & : & \<open>ML antiquotation\<close> \\

  1299     @{ML_antiquotation_def "const_syntax"} & : & \<open>ML antiquotation\<close> \\

  1300     @{ML_antiquotation_def "syntax_const"} & : & \<open>ML antiquotation\<close> \\

  1301   \end{matharray}

  1302

  1303   Syntax translation functions written in ML admit almost arbitrary

  1304   manipulations of inner syntax, at the expense of some complexity and

  1305   obscurity in the implementation.

  1306

  1307   @{rail \<open>

  1308   ( @@{command parse_ast_translation} | @@{command parse_translation} |

  1309     @@{command print_translation} | @@{command typed_print_translation} |

  1310     @@{command print_ast_translation}) @{syntax text}

  1311   ;

  1312   (@@{ML_antiquotation class_syntax} |

  1313    @@{ML_antiquotation type_syntax} |

  1314    @@{ML_antiquotation const_syntax} |

  1315    @@{ML_antiquotation syntax_const}) embedded

  1316   \<close>}

  1317

  1318   \<^descr> @{command parse_translation} etc. declare syntax translation functions to

  1319   the theory. Any of these commands have a single @{syntax text} argument that

  1320   refers to an ML expression of appropriate type as follows:

  1321

  1322   \<^medskip>

  1323   {\footnotesize

  1324   \begin{tabular}{l}

  1325   @{command parse_ast_translation} : \\

  1326   \quad @{ML_type "(string * (Proof.context -> Ast.ast list -> Ast.ast)) list"} \\

  1327   @{command parse_translation} : \\

  1328   \quad @{ML_type "(string * (Proof.context -> term list -> term)) list"} \\

  1329   @{command print_translation} : \\

  1330   \quad @{ML_type "(string * (Proof.context -> term list -> term)) list"} \\

  1331   @{command typed_print_translation} : \\

  1332   \quad @{ML_type "(string * (Proof.context -> typ -> term list -> term)) list"} \\

  1333   @{command print_ast_translation} : \\

  1334   \quad @{ML_type "(string * (Proof.context -> Ast.ast list -> Ast.ast)) list"} \\

  1335   \end{tabular}}

  1336   \<^medskip>

  1337

  1338   The argument list consists of \<open>(c, tr)\<close> pairs, where \<open>c\<close> is the syntax name

  1339   of the formal entity involved, and \<open>tr\<close> a function that translates a syntax

  1340   form \<open>c args\<close> into \<open>tr ctxt args\<close> (depending on the context). The

  1341   Isabelle/ML naming convention for parse translations is \<open>c_tr\<close> and for print

  1342   translations \<open>c_tr'\<close>.

  1343

  1344   The @{command_ref print_syntax} command displays the sets of names

  1345   associated with the translation functions of a theory under

  1346   \<open>parse_ast_translation\<close> etc.

  1347

  1348   \<^descr> \<open>@{class_syntax c}\<close>, \<open>@{type_syntax c}\<close>, \<open>@{const_syntax c}\<close> inline the

  1349   authentic syntax name of the given formal entities into the ML source. This

  1350   is the fully-qualified logical name prefixed by a special marker to indicate

  1351   its kind: thus different logical name spaces are properly distinguished

  1352   within parse trees.

  1353

  1354   \<^descr> \<open>@{const_syntax c}\<close> inlines the name \<open>c\<close> of the given syntax constant,

  1355   having checked that it has been declared via some @{command syntax} commands

  1356   within the theory context. Note that the usual naming convention makes

  1357   syntax constants start with underscore, to reduce the chance of accidental

  1358   clashes with other names occurring in parse trees (unqualified constants

  1359   etc.).

  1360 \<close>

  1361

  1362

  1363 subsubsection \<open>The translation strategy\<close>

  1364

  1365 text \<open>

  1366   The different kinds of translation functions are invoked during the

  1367   transformations between parse trees, ASTs and syntactic terms (cf.\

  1368   \figref{fig:parse-print}). Whenever a combination of the form \<open>c x\<^sub>1 \<dots> x\<^sub>n\<close>

  1369   is encountered, and a translation function \<open>f\<close> of appropriate kind is

  1370   declared for \<open>c\<close>, the result is produced by evaluation of \<open>f [x\<^sub>1, \<dots>, x\<^sub>n]\<close>

  1371   in ML.

  1372

  1373   For AST translations, the arguments \<open>x\<^sub>1, \<dots>, x\<^sub>n\<close> are ASTs. A combination

  1374   has the form @{ML "Ast.Constant"}~\<open>c\<close> or @{ML "Ast.Appl"}~\<open>[\<close>@{ML

  1375   Ast.Constant}~\<open>c, x\<^sub>1, \<dots>, x\<^sub>n]\<close>. For term translations, the arguments are

  1376   terms and a combination has the form @{ML Const}~\<open>(c, \<tau>)\<close> or @{ML

  1377   Const}~\<open>(c, \<tau>) $x\<^sub>1$ \<dots> \$ x\<^sub>n\<close>. Terms allow more sophisticated

  1378   transformations than ASTs do, typically involving abstractions and bound

  1379   variables. \<^emph>\<open>Typed\<close> print translations may even peek at the type \<open>\<tau>\<close> of the

  1380   constant they are invoked on, although some information might have been

  1381   suppressed for term output already.

  1382

  1383   Regardless of whether they act on ASTs or terms, translation functions

  1384   called during the parsing process differ from those for printing in their

  1385   overall behaviour:

  1386

  1387     \<^descr>[Parse translations] are applied bottom-up. The arguments are already in

  1388     translated form. The translations must not fail; exceptions trigger an

  1389     error message. There may be at most one function associated with any

  1390     syntactic name.

  1391

  1392     \<^descr>[Print translations] are applied top-down. They are supplied with

  1393     arguments that are partly still in internal form. The result again

  1394     undergoes translation; therefore a print translation should not introduce

  1395     as head the very constant that invoked it. The function may raise

  1396     exception @{ML Match} to indicate failure; in this event it has no effect.

  1397     Multiple functions associated with some syntactic name are tried in the

  1398     order of declaration in the theory.

  1399

  1400   Only constant atoms --- constructor @{ML Ast.Constant} for ASTs and @{ML

  1401   Const} for terms --- can invoke translation functions. This means that parse

  1402   translations can only be associated with parse tree heads of concrete

  1403   syntax, or syntactic constants introduced via other translations. For plain

  1404   identifiers within the term language, the status of constant versus variable

  1405   is not yet know during parsing. This is in contrast to print translations,

  1406   where constants are explicitly known from the given term in its fully

  1407   internal form.

  1408 \<close>

  1409

  1410

  1411 subsection \<open>Built-in syntax transformations\<close>

  1412

  1413 text \<open>

  1414   Here are some further details of the main syntax transformation phases of

  1415   \figref{fig:parse-print}.

  1416 \<close>

  1417

  1418

  1419 subsubsection \<open>Transforming parse trees to ASTs\<close>

  1420

  1421 text \<open>

  1422   The parse tree is the raw output of the parser. It is transformed into an

  1423   AST according to some basic scheme that may be augmented by AST translation

  1424   functions as explained in \secref{sec:tr-funs}.

  1425

  1426   The parse tree is constructed by nesting the right-hand sides of the

  1427   productions used to recognize the input. Such parse trees are simply lists

  1428   of tokens and constituent parse trees, the latter representing the

  1429   nonterminals of the productions. Ignoring AST translation functions, parse

  1430   trees are transformed to ASTs by stripping out delimiters and copy

  1431   productions, while retaining some source position information from input

  1432   tokens.

  1433

  1434   The Pure syntax provides predefined AST translations to make the basic

  1435   \<open>\<lambda>\<close>-term structure more apparent within the (first-order) AST

  1436   representation, and thus facilitate the use of @{command translations} (see

  1437   also \secref{sec:syn-trans}). This covers ordinary term application, type

  1438   application, nested abstraction, iterated meta implications and function

  1439   types. The effect is illustrated on some representative input strings is as

  1440   follows:

  1441

  1442   \begin{center}

  1443   \begin{tabular}{ll}

  1444   input source & AST \\

  1445   \hline

  1446   \<open>f x y z\<close> & \<^verbatim>\<open>(f x y z)\<close> \\

  1447   \<open>'a ty\<close> & \<^verbatim>\<open>(ty 'a)\<close> \\

  1448   \<open>('a, 'b)ty\<close> & \<^verbatim>\<open>(ty 'a 'b)\<close> \\

  1449   \<open>\<lambda>x y z. t\<close> & \<^verbatim>\<open>("_abs" x ("_abs" y ("_abs" z t)))\<close> \\

  1450   \<open>\<lambda>x :: 'a. t\<close> & \<^verbatim>\<open>("_abs" ("_constrain" x 'a) t)\<close> \\

  1451   \<open>\<lbrakk>P; Q; R\<rbrakk> \<Longrightarrow> S\<close> & \<^verbatim>\<open>("Pure.imp" P ("Pure.imp" Q ("Pure.imp" R S)))\<close> \\

  1452    \<open>['a, 'b, 'c] \<Rightarrow> 'd\<close> & \<^verbatim>\<open>("fun" 'a ("fun" 'b ("fun" 'c 'd)))\<close> \\

  1453   \end{tabular}

  1454   \end{center}

  1455

  1456   Note that type and sort constraints may occur in further places ---

  1457   translations need to be ready to cope with them. The built-in syntax

  1458   transformation from parse trees to ASTs insert additional constraints that

  1459   represent source positions.

  1460 \<close>

  1461

  1462

  1463 subsubsection \<open>Transforming ASTs to terms\<close>

  1464

  1465 text \<open>

  1466   After application of macros (\secref{sec:syn-trans}), the AST is transformed

  1467   into a term. This term still lacks proper type information, but it might

  1468   contain some constraints consisting of applications with head \<^verbatim>\<open>_constrain\<close>,

  1469   where the second argument is a type encoded as a pre-term within the syntax.

  1470   Type inference later introduces correct types, or indicates type errors in

  1471   the input.

  1472

  1473   Ignoring parse translations, ASTs are transformed to terms by mapping AST

  1474   constants to term constants, AST variables to term variables or constants

  1475   (according to the name space), and AST applications to iterated term

  1476   applications.

  1477

  1478   The outcome is still a first-order term. Proper abstractions and bound

  1479   variables are introduced by parse translations associated with certain

  1480   syntax constants. Thus \<^verbatim>\<open>("_abs" x x)\<close> eventually becomes a de-Bruijn term

  1481   \<^verbatim>\<open>Abs ("x", _, Bound 0)\<close>.

  1482 \<close>

  1483

  1484

  1485 subsubsection \<open>Printing of terms\<close>

  1486

  1487 text \<open>

  1488   The output phase is essentially the inverse of the input phase. Terms are

  1489   translated via abstract syntax trees into pretty-printed text.

  1490

  1491   Ignoring print translations, the transformation maps term constants,

  1492   variables and applications to the corresponding constructs on ASTs.

  1493   Abstractions are mapped to applications of the special constant \<^verbatim>\<open>_abs\<close> as

  1494   seen before. Type constraints are represented via special \<^verbatim>\<open>_constrain\<close>

  1495   forms, according to various policies of type annotation determined

  1496   elsewhere. Sort constraints of type variables are handled in a similar

  1497   fashion.

  1498

  1499   After application of macros (\secref{sec:syn-trans}), the AST is finally

  1500   pretty-printed. The built-in print AST translations reverse the

  1501   corresponding parse AST translations.

  1502

  1503   \<^medskip>

  1504   For the actual printing process, the priority grammar

  1505   (\secref{sec:priority-grammar}) plays a vital role: productions are used as

  1506   templates for pretty printing, with argument slots stemming from

  1507   nonterminals, and syntactic sugar stemming from literal tokens.

  1508

  1509   Each AST application with constant head \<open>c\<close> and arguments \<open>t\<^sub>1\<close>, \dots,

  1510   \<open>t\<^sub>n\<close> (for \<open>n = 0\<close> the AST is just the constant \<open>c\<close> itself) is printed

  1511   according to the first grammar production of result name \<open>c\<close>. The required

  1512   syntax priority of the argument slot is given by its nonterminal \<open>A\<^sup>(\<^sup>p\<^sup>)\<close>.

  1513   The argument \<open>t\<^sub>i\<close> that corresponds to the position of \<open>A\<^sup>(\<^sup>p\<^sup>)\<close> is printed

  1514   recursively, and then put in parentheses \<^emph>\<open>if\<close> its priority \<open>p\<close> requires

  1515   this. The resulting output is concatenated with the syntactic sugar

  1516   according to the grammar production.

  1517

  1518   If an AST application \<open>(c x\<^sub>1 \<dots> x\<^sub>m)\<close> has more arguments than the

  1519   corresponding production, it is first split into \<open>((c x\<^sub>1 \<dots> x\<^sub>n) x\<^sub>n\<^sub>+\<^sub>1 \<dots>

  1520   x\<^sub>m)\<close> and then printed recursively as above.

  1521

  1522   Applications with too few arguments or with non-constant head or without a

  1523   corresponding production are printed in prefix-form like \<open>f t\<^sub>1 \<dots> t\<^sub>n\<close> for

  1524   terms.

  1525

  1526   Multiple productions associated with some name \<open>c\<close> are tried in order of

  1527   appearance within the grammar. An occurrence of some AST variable \<open>x\<close> is

  1528   printed as \<open>x\<close> outright.

  1529

  1530   \<^medskip>

  1531   White space is \<^emph>\<open>not\<close> inserted automatically. If blanks (or breaks) are

  1532   required to separate tokens, they need to be specified in the mixfix

  1533   declaration (\secref{sec:mixfix}).

  1534 \<close>

  1535

  1536 end