src/Doc/Isar_Ref/Inner_Syntax.thy
author wenzelm
Mon Oct 09 21:12:22 2017 +0200 (23 months ago)
changeset 66822 4642cf4a7ebb
parent 63933 e149e3e320a3
child 67146 909dcdec2122
permissions -rw-r--r--
tuned signature;
wenzelm@61656
     1
(*:maxLineLen=78:*)
wenzelm@61656
     2
wenzelm@28762
     3
theory Inner_Syntax
wenzelm@63531
     4
  imports Main Base
wenzelm@28762
     5
begin
wenzelm@28762
     6
wenzelm@58618
     7
chapter \<open>Inner syntax --- the term language \label{ch:inner-syntax}\<close>
wenzelm@28762
     8
wenzelm@62106
     9
text \<open>
wenzelm@62106
    10
  The inner syntax of Isabelle provides concrete notation for the main
wenzelm@62106
    11
  entities of the logical framework, notably \<open>\<lambda>\<close>-terms with types and type
wenzelm@62106
    12
  classes. Applications may either extend existing syntactic categories by
wenzelm@62106
    13
  additional notation, or define new sub-languages that are linked to the
wenzelm@62106
    14
  standard term language via some explicit markers. For example \<^verbatim>\<open>FOO\<close>~\<open>foo\<close>
wenzelm@62106
    15
  could embed the syntax corresponding for some user-defined nonterminal \<open>foo\<close>
wenzelm@62106
    16
  --- within the bounds of the given lexical syntax of Isabelle/Pure.
wenzelm@46282
    17
wenzelm@62106
    18
  The most basic way to specify concrete syntax for logical entities works via
wenzelm@62106
    19
  mixfix annotations (\secref{sec:mixfix}), which may be usually given as part
wenzelm@62106
    20
  of the original declaration or via explicit notation commands later on
wenzelm@62106
    21
  (\secref{sec:notation}). This already covers many needs of concrete syntax
wenzelm@62106
    22
  without having to understand the full complexity of inner syntax layers.
wenzelm@46282
    23
wenzelm@62106
    24
  Further details of the syntax engine involves the classical distinction of
wenzelm@62106
    25
  lexical language versus context-free grammar (see \secref{sec:pure-syntax}),
wenzelm@62106
    26
  and various mechanisms for \<^emph>\<open>syntax transformations\<close> (see
wenzelm@62106
    27
  \secref{sec:syntax-transformations}).
wenzelm@58618
    28
\<close>
wenzelm@46282
    29
wenzelm@46282
    30
wenzelm@58618
    31
section \<open>Printing logical entities\<close>
wenzelm@28762
    32
wenzelm@58618
    33
subsection \<open>Diagnostic commands \label{sec:print-diag}\<close>
wenzelm@28762
    34
wenzelm@58618
    35
text \<open>
wenzelm@28762
    36
  \begin{matharray}{rcl}
wenzelm@61493
    37
    @{command_def "typ"}\<open>\<^sup>*\<close> & : & \<open>context \<rightarrow>\<close> \\
wenzelm@61493
    38
    @{command_def "term"}\<open>\<^sup>*\<close> & : & \<open>context \<rightarrow>\<close> \\
wenzelm@61493
    39
    @{command_def "prop"}\<open>\<^sup>*\<close> & : & \<open>context \<rightarrow>\<close> \\
wenzelm@61493
    40
    @{command_def "thm"}\<open>\<^sup>*\<close> & : & \<open>context \<rightarrow>\<close> \\
wenzelm@61493
    41
    @{command_def "prf"}\<open>\<^sup>*\<close> & : & \<open>context \<rightarrow>\<close> \\
wenzelm@61493
    42
    @{command_def "full_prf"}\<open>\<^sup>*\<close> & : & \<open>context \<rightarrow>\<close> \\
wenzelm@61493
    43
    @{command_def "print_state"}\<open>\<^sup>*\<close> & : & \<open>any \<rightarrow>\<close> \\
wenzelm@28762
    44
  \end{matharray}
wenzelm@28762
    45
wenzelm@28762
    46
  These diagnostic commands assist interactive development by printing
wenzelm@28762
    47
  internal logical entities in a human-readable fashion.
wenzelm@28762
    48
wenzelm@55112
    49
  @{rail \<open>
wenzelm@48792
    50
    @@{command typ} @{syntax modes}? @{syntax type} ('::' @{syntax sort})?
wenzelm@28762
    51
    ;
wenzelm@42596
    52
    @@{command term} @{syntax modes}? @{syntax term}
wenzelm@28762
    53
    ;
wenzelm@42596
    54
    @@{command prop} @{syntax modes}? @{syntax prop}
wenzelm@28762
    55
    ;
wenzelm@62969
    56
    @@{command thm} @{syntax modes}? @{syntax thms}
wenzelm@28762
    57
    ;
wenzelm@62969
    58
    ( @@{command prf} | @@{command full_prf} ) @{syntax modes}? @{syntax thms}?
wenzelm@28762
    59
    ;
wenzelm@52430
    60
    @@{command print_state} @{syntax modes}?
wenzelm@28762
    61
    ;
wenzelm@42596
    62
    @{syntax_def modes}: '(' (@{syntax name} + ) ')'
wenzelm@55112
    63
  \<close>}
wenzelm@28762
    64
wenzelm@62106
    65
  \<^descr> @{command "typ"}~\<open>\<tau>\<close> reads and prints a type expression according to the
wenzelm@62106
    66
  current context.
wenzelm@48792
    67
wenzelm@62106
    68
  \<^descr> @{command "typ"}~\<open>\<tau> :: s\<close> uses type-inference to determine the most
wenzelm@62106
    69
  general way to make \<open>\<tau>\<close> conform to sort \<open>s\<close>. For concrete \<open>\<tau>\<close> this checks if
wenzelm@62106
    70
  the type belongs to that sort. Dummy type parameters ``\<open>_\<close>'' (underscore)
wenzelm@62106
    71
  are assigned to fresh type variables with most general sorts, according the
wenzelm@62106
    72
  the principles of type-inference.
wenzelm@28766
    73
wenzelm@62106
    74
    \<^descr> @{command "term"}~\<open>t\<close> and @{command "prop"}~\<open>\<phi>\<close> read, type-check and
wenzelm@62106
    75
    print terms or propositions according to the current theory or proof
wenzelm@62106
    76
    context; the inferred type of \<open>t\<close> is output as well. Note that these
wenzelm@62106
    77
    commands are also useful in inspecting the current environment of term
wenzelm@62106
    78
    abbreviations.
wenzelm@28762
    79
wenzelm@62106
    80
    \<^descr> @{command "thm"}~\<open>a\<^sub>1 \<dots> a\<^sub>n\<close> retrieves theorems from the current theory
wenzelm@62106
    81
    or proof context. Note that any attributes included in the theorem
wenzelm@62106
    82
    specifications are applied to a temporary context derived from the current
wenzelm@62106
    83
    theory or proof; the result is discarded, i.e.\ attributes involved in
wenzelm@62106
    84
    \<open>a\<^sub>1, \<dots>, a\<^sub>n\<close> do not have any permanent effect.
wenzelm@28762
    85
wenzelm@62106
    86
    \<^descr> @{command "prf"} displays the (compact) proof term of the current proof
wenzelm@63624
    87
    state (if present), or of the given theorems. Note that this requires an
wenzelm@63624
    88
    underlying logic image with proof terms enabled, e.g. \<open>HOL-Proofs\<close>.
wenzelm@28762
    89
wenzelm@62106
    90
    \<^descr> @{command "full_prf"} is like @{command "prf"}, but displays the full
wenzelm@62106
    91
    proof term, i.e.\ also displays information omitted in the compact proof
wenzelm@62106
    92
    term, which is denoted by ``\<open>_\<close>'' placeholders there.
wenzelm@28762
    93
wenzelm@62106
    94
    \<^descr> @{command "print_state"} prints the current proof state (if present),
wenzelm@62106
    95
    including current facts and goals.
wenzelm@28762
    96
wenzelm@61997
    97
  All of the diagnostic commands above admit a list of \<open>modes\<close> to be
wenzelm@61997
    98
  specified, which is appended to the current print mode; see also
wenzelm@61997
    99
  \secref{sec:print-modes}. Thus the output behavior may be modified according
wenzelm@61997
   100
  particular print mode features. For example, @{command
wenzelm@61997
   101
  "print_state"}~\<open>(latex)\<close> prints the current proof state with mathematical
wenzelm@61997
   102
  symbols and special characters represented in {\LaTeX} source, according to
wenzelm@61997
   103
  the Isabelle style @{cite "isabelle-system"}.
wenzelm@28762
   104
wenzelm@62106
   105
  Note that antiquotations (cf.\ \secref{sec:antiq}) provide a more systematic
wenzelm@62106
   106
  way to include formal items into the printed text document.
wenzelm@58618
   107
\<close>
wenzelm@28762
   108
wenzelm@28762
   109
wenzelm@58618
   110
subsection \<open>Details of printed content\<close>
wenzelm@28763
   111
wenzelm@58618
   112
text \<open>
wenzelm@42655
   113
  \begin{tabular}{rcll}
wenzelm@61493
   114
    @{attribute_def show_markup} & : & \<open>attribute\<close> \\
wenzelm@61493
   115
    @{attribute_def show_types} & : & \<open>attribute\<close> & default \<open>false\<close> \\
wenzelm@61493
   116
    @{attribute_def show_sorts} & : & \<open>attribute\<close> & default \<open>false\<close> \\
wenzelm@61493
   117
    @{attribute_def show_consts} & : & \<open>attribute\<close> & default \<open>false\<close> \\
wenzelm@61493
   118
    @{attribute_def show_abbrevs} & : & \<open>attribute\<close> & default \<open>true\<close> \\
wenzelm@61493
   119
    @{attribute_def show_brackets} & : & \<open>attribute\<close> & default \<open>false\<close> \\
wenzelm@61493
   120
    @{attribute_def names_long} & : & \<open>attribute\<close> & default \<open>false\<close> \\
wenzelm@61493
   121
    @{attribute_def names_short} & : & \<open>attribute\<close> & default \<open>false\<close> \\
wenzelm@61493
   122
    @{attribute_def names_unique} & : & \<open>attribute\<close> & default \<open>true\<close> \\
wenzelm@61493
   123
    @{attribute_def eta_contract} & : & \<open>attribute\<close> & default \<open>true\<close> \\
wenzelm@61493
   124
    @{attribute_def goals_limit} & : & \<open>attribute\<close> & default \<open>10\<close> \\
wenzelm@61493
   125
    @{attribute_def show_main_goal} & : & \<open>attribute\<close> & default \<open>false\<close> \\
wenzelm@61493
   126
    @{attribute_def show_hyps} & : & \<open>attribute\<close> & default \<open>false\<close> \\
wenzelm@61493
   127
    @{attribute_def show_tags} & : & \<open>attribute\<close> & default \<open>false\<close> \\
wenzelm@61493
   128
    @{attribute_def show_question_marks} & : & \<open>attribute\<close> & default \<open>true\<close> \\
wenzelm@42655
   129
  \end{tabular}
wenzelm@61421
   130
  \<^medskip>
wenzelm@28763
   131
wenzelm@62106
   132
  These configuration options control the detail of information that is
wenzelm@62106
   133
  displayed for types, terms, theorems, goals etc. See also
wenzelm@42655
   134
  \secref{sec:config}.
wenzelm@28765
   135
wenzelm@62106
   136
  \<^descr> @{attribute show_markup} controls direct inlining of markup into the
wenzelm@62106
   137
  printed representation of formal entities --- notably type and sort
wenzelm@62106
   138
  constraints. This enables Prover IDE users to retrieve that information via
wenzelm@62106
   139
  tooltips or popups while hovering with the mouse over the output window, for
wenzelm@62106
   140
  example. Consequently, this option is enabled by default for Isabelle/jEdit.
wenzelm@49699
   141
wenzelm@62106
   142
  \<^descr> @{attribute show_types} and @{attribute show_sorts} control printing of
wenzelm@62106
   143
  type constraints for term variables, and sort constraints for type
wenzelm@62106
   144
  variables. By default, neither of these are shown in output. If @{attribute
wenzelm@62106
   145
  show_sorts} is enabled, types are always shown as well. In Isabelle/jEdit,
wenzelm@62106
   146
  manual setting of these options is normally not required thanks to
wenzelm@62106
   147
  @{attribute show_markup} above.
wenzelm@28763
   148
wenzelm@62106
   149
  Note that displaying types and sorts may explain why a polymorphic inference
wenzelm@62106
   150
  rule fails to resolve with some goal, or why a rewrite rule does not apply
wenzelm@62106
   151
  as expected.
wenzelm@28765
   152
wenzelm@62106
   153
  \<^descr> @{attribute show_consts} controls printing of types of constants when
wenzelm@62106
   154
  displaying a goal state.
wenzelm@28763
   155
wenzelm@62106
   156
  Note that the output can be enormous, because polymorphic constants often
wenzelm@62106
   157
  occur at several different type instances.
wenzelm@62106
   158
wenzelm@62106
   159
  \<^descr> @{attribute show_abbrevs} controls folding of constant abbreviations.
wenzelm@40879
   160
wenzelm@62106
   161
  \<^descr> @{attribute show_brackets} controls bracketing in pretty printed output.
wenzelm@62106
   162
  If enabled, all sub-expressions of the pretty printing tree will be
wenzelm@62106
   163
  parenthesized, even if this produces malformed term syntax! This crude way
wenzelm@62106
   164
  of showing the internal structure of pretty printed entities may
wenzelm@62106
   165
  occasionally help to diagnose problems with operator priorities, for
wenzelm@62106
   166
  example.
wenzelm@28763
   167
wenzelm@62106
   168
  \<^descr> @{attribute names_long}, @{attribute names_short}, and @{attribute
wenzelm@62106
   169
  names_unique} control the way of printing fully qualified internal names in
wenzelm@62106
   170
  external form. See also \secref{sec:antiq} for the document antiquotation
wenzelm@62106
   171
  options of the same names.
wenzelm@42358
   172
wenzelm@62106
   173
  \<^descr> @{attribute eta_contract} controls \<open>\<eta>\<close>-contracted printing of terms.
wenzelm@28763
   174
wenzelm@62106
   175
  The \<open>\<eta>\<close>-contraction law asserts @{prop "(\<lambda>x. f x) \<equiv> f"}, provided \<open>x\<close> is not
wenzelm@62106
   176
  free in \<open>f\<close>. It asserts \<^emph>\<open>extensionality\<close> of functions: @{prop "f \<equiv> g"} if
wenzelm@62106
   177
  @{prop "f x \<equiv> g x"} for all \<open>x\<close>. Higher-order unification frequently puts
wenzelm@62106
   178
  terms into a fully \<open>\<eta>\<close>-expanded form. For example, if \<open>F\<close> has type \<open>(\<tau> \<Rightarrow> \<tau>)
wenzelm@62106
   179
  \<Rightarrow> \<tau>\<close> then its expanded form is @{term "\<lambda>h. F (\<lambda>x. h x)"}.
wenzelm@28763
   180
wenzelm@62106
   181
  Enabling @{attribute eta_contract} makes Isabelle perform \<open>\<eta>\<close>-contractions
wenzelm@62106
   182
  before printing, so that @{term "\<lambda>h. F (\<lambda>x. h x)"} appears simply as \<open>F\<close>.
wenzelm@28763
   183
wenzelm@62106
   184
  Note that the distinction between a term and its \<open>\<eta>\<close>-expanded form
wenzelm@62106
   185
  occasionally matters. While higher-order resolution and rewriting operate
wenzelm@62106
   186
  modulo \<open>\<alpha>\<beta>\<eta>\<close>-conversion, some other tools might look at terms more
wenzelm@62106
   187
  discretely.
wenzelm@28763
   188
wenzelm@62106
   189
  \<^descr> @{attribute goals_limit} controls the maximum number of subgoals to be
wenzelm@62106
   190
  printed.
wenzelm@28763
   191
wenzelm@62106
   192
  \<^descr> @{attribute show_main_goal} controls whether the main result to be proven
wenzelm@62106
   193
  should be displayed. This information might be relevant for schematic goals,
wenzelm@62106
   194
  to inspect the current claim that has been synthesized so far.
wenzelm@28763
   195
wenzelm@62106
   196
  \<^descr> @{attribute show_hyps} controls printing of implicit hypotheses of local
wenzelm@62106
   197
  facts. Normally, only those hypotheses are displayed that are \<^emph>\<open>not\<close> covered
wenzelm@62106
   198
  by the assumptions of the current context: this situation indicates a fault
wenzelm@62106
   199
  in some tool being used.
wenzelm@28763
   200
wenzelm@62106
   201
  By enabling @{attribute show_hyps}, output of \<^emph>\<open>all\<close> hypotheses can be
wenzelm@62106
   202
  enforced, which is occasionally useful for diagnostic purposes.
wenzelm@28763
   203
wenzelm@62106
   204
  \<^descr> @{attribute show_tags} controls printing of extra annotations within
wenzelm@62106
   205
  theorems, such as internal position information, or the case names being
wenzelm@62106
   206
  attached by the attribute @{attribute case_names}.
wenzelm@28765
   207
wenzelm@62106
   208
  Note that the @{attribute tagged} and @{attribute untagged} attributes
wenzelm@62106
   209
  provide low-level access to the collection of tags associated with a
wenzelm@62106
   210
  theorem.
wenzelm@28765
   211
wenzelm@62106
   212
  \<^descr> @{attribute show_question_marks} controls printing of question marks for
wenzelm@62106
   213
  schematic variables, such as \<open>?x\<close>. Only the leading question mark is
wenzelm@62106
   214
  affected, the remaining text is unchanged (including proper markup for
wenzelm@62106
   215
  schematic variables that might be relevant for user interfaces).
wenzelm@58618
   216
\<close>
wenzelm@28765
   217
wenzelm@28765
   218
wenzelm@58618
   219
subsection \<open>Alternative print modes \label{sec:print-modes}\<close>
wenzelm@46284
   220
wenzelm@58618
   221
text \<open>
wenzelm@46284
   222
  \begin{mldecls}
wenzelm@46284
   223
    @{index_ML print_mode_value: "unit -> string list"} \\
wenzelm@46284
   224
    @{index_ML Print_Mode.with_modes: "string list -> ('a -> 'b) -> 'a -> 'b"} \\
wenzelm@46284
   225
  \end{mldecls}
wenzelm@46284
   226
wenzelm@62106
   227
  The \<^emph>\<open>print mode\<close> facility allows to modify various operations for printing.
wenzelm@62106
   228
  Commands like @{command typ}, @{command term}, @{command thm} (see
wenzelm@62106
   229
  \secref{sec:print-diag}) take additional print modes as optional argument.
wenzelm@62106
   230
  The underlying ML operations are as follows.
wenzelm@46284
   231
wenzelm@62106
   232
    \<^descr> @{ML "print_mode_value ()"} yields the list of currently active print
wenzelm@62106
   233
    mode names. This should be understood as symbolic representation of
wenzelm@62106
   234
    certain individual features for printing (with precedence from left to
wenzelm@62106
   235
    right).
wenzelm@46284
   236
wenzelm@62106
   237
    \<^descr> @{ML Print_Mode.with_modes}~\<open>modes f x\<close> evaluates \<open>f x\<close> in an execution
wenzelm@62106
   238
    context where the print mode is prepended by the given \<open>modes\<close>. This
wenzelm@62106
   239
    provides a thread-safe way to augment print modes. It is also monotonic in
wenzelm@62106
   240
    the set of mode names: it retains the default print mode that certain
wenzelm@62106
   241
    user-interfaces might have installed for their proper functioning!
wenzelm@46284
   242
wenzelm@61421
   243
  \<^medskip>
wenzelm@62106
   244
  The pretty printer for inner syntax maintains alternative mixfix productions
wenzelm@62106
   245
  for any print mode name invented by the user, say in commands like @{command
wenzelm@62106
   246
  notation} or @{command abbreviation}. Mode names can be arbitrary, but the
wenzelm@62106
   247
  following ones have a specific meaning by convention:
wenzelm@46284
   248
wenzelm@62106
   249
    \<^item> \<^verbatim>\<open>""\<close> (the empty string): default mode; implicitly active as last
wenzelm@62106
   250
    element in the list of modes.
wenzelm@46284
   251
wenzelm@62106
   252
    \<^item> \<^verbatim>\<open>input\<close>: dummy print mode that is never active; may be used to specify
wenzelm@62106
   253
    notation that is only available for input.
wenzelm@46284
   254
wenzelm@62106
   255
    \<^item> \<^verbatim>\<open>internal\<close> dummy print mode that is never active; used internally in
wenzelm@62106
   256
    Isabelle/Pure.
wenzelm@46284
   257
wenzelm@62106
   258
    \<^item> \<^verbatim>\<open>ASCII\<close>: prefer ASCII art over mathematical symbols.
wenzelm@46284
   259
wenzelm@62106
   260
    \<^item> \<^verbatim>\<open>latex\<close>: additional mode that is active in {\LaTeX} document
wenzelm@62106
   261
    preparation of Isabelle theory sources; allows to provide alternative
wenzelm@62106
   262
    output notation.
wenzelm@58618
   263
\<close>
wenzelm@46284
   264
wenzelm@46284
   265
wenzelm@58618
   266
section \<open>Mixfix annotations \label{sec:mixfix}\<close>
wenzelm@28762
   267
wenzelm@62106
   268
text \<open>
wenzelm@62106
   269
  Mixfix annotations specify concrete \<^emph>\<open>inner syntax\<close> of Isabelle types and
wenzelm@62106
   270
  terms. Locally fixed parameters in toplevel theorem statements, locale and
wenzelm@62106
   271
  class specifications also admit mixfix annotations in a fairly uniform
wenzelm@62106
   272
  manner. A mixfix annotation describes the concrete syntax, the translation
wenzelm@62106
   273
  to abstract syntax, and the pretty printing. Special case annotations
wenzelm@62106
   274
  provide a simple means of specifying infix operators and binders.
wenzelm@46290
   275
wenzelm@62106
   276
  Isabelle mixfix syntax is inspired by {\OBJ} @{cite OBJ}. It allows to
wenzelm@62106
   277
  specify any context-free priority grammar, which is more general than the
wenzelm@62106
   278
  fixity declarations of ML and Prolog.
wenzelm@28762
   279
wenzelm@55112
   280
  @{rail \<open>
wenzelm@51654
   281
    @{syntax_def mixfix}: '('
wenzelm@58761
   282
      (@{syntax template} prios? @{syntax nat}? |
wenzelm@58761
   283
        (@'infix' | @'infixl' | @'infixr') @{syntax template} @{syntax nat} |
wenzelm@58761
   284
        @'binder' @{syntax template} prios? @{syntax nat} |
wenzelm@58761
   285
        @'structure') ')'
wenzelm@46290
   286
    ;
wenzelm@62807
   287
    @{syntax template}: string
wenzelm@46289
   288
    ;
wenzelm@42596
   289
    prios: '[' (@{syntax nat} + ',') ']'
wenzelm@55112
   290
  \<close>}
wenzelm@28762
   291
wenzelm@62106
   292
  The string given as \<open>template\<close> may include literal text, spacing, blocks,
wenzelm@62106
   293
  and arguments (denoted by ``\<open>_\<close>''); the special symbol ``\<^verbatim>\<open>\<index>\<close>'' (printed as
wenzelm@62106
   294
  ``\<open>\<index>\<close>'') represents an index argument that specifies an implicit @{keyword
wenzelm@62106
   295
  "structure"} reference (see also \secref{sec:locale}). Only locally fixed
wenzelm@62106
   296
  variables may be declared as @{keyword "structure"}.
wenzelm@51657
   297
wenzelm@62106
   298
  Infix and binder declarations provide common abbreviations for particular
wenzelm@62106
   299
  mixfix declarations. So in practice, mixfix templates mostly degenerate to
wenzelm@62106
   300
  literal text for concrete syntax, such as ``\<^verbatim>\<open>++\<close>'' for an infix symbol.
wenzelm@61503
   301
\<close>
wenzelm@28762
   302
wenzelm@46290
   303
wenzelm@58618
   304
subsection \<open>The general mixfix form\<close>
wenzelm@46290
   305
wenzelm@62106
   306
text \<open>
wenzelm@62106
   307
  In full generality, mixfix declarations work as follows. Suppose a constant
wenzelm@62106
   308
  \<open>c :: \<tau>\<^sub>1 \<Rightarrow> \<dots> \<tau>\<^sub>n \<Rightarrow> \<tau>\<close> is annotated by \<open>(mixfix [p\<^sub>1, \<dots>, p\<^sub>n] p)\<close>, where
wenzelm@62106
   309
  \<open>mixfix\<close> is a string \<open>d\<^sub>0 _ d\<^sub>1 _ \<dots> _ d\<^sub>n\<close> consisting of delimiters that
wenzelm@62106
   310
  surround argument positions as indicated by underscores.
wenzelm@28762
   311
wenzelm@62106
   312
  Altogether this determines a production for a context-free priority grammar,
wenzelm@62106
   313
  where for each argument \<open>i\<close> the syntactic category is determined by \<open>\<tau>\<^sub>i\<close>
wenzelm@62106
   314
  (with priority \<open>p\<^sub>i\<close>), and the result category is determined from \<open>\<tau>\<close> (with
wenzelm@62106
   315
  priority \<open>p\<close>). Priority specifications are optional, with default 0 for
wenzelm@62106
   316
  arguments and 1000 for the result.\<^footnote>\<open>Omitting priorities is prone to
wenzelm@62106
   317
  syntactic ambiguities unless the delimiter tokens determine fully bracketed
wenzelm@62106
   318
  notation, as in \<open>if _ then _ else _ fi\<close>.\<close>
wenzelm@28762
   319
wenzelm@62106
   320
  Since \<open>\<tau>\<close> may be again a function type, the constant type scheme may have
wenzelm@62106
   321
  more argument positions than the mixfix pattern. Printing a nested
wenzelm@62106
   322
  application \<open>c t\<^sub>1 \<dots> t\<^sub>m\<close> for \<open>m > n\<close> works by attaching concrete notation
wenzelm@62106
   323
  only to the innermost part, essentially by printing \<open>(c t\<^sub>1 \<dots> t\<^sub>n) \<dots> t\<^sub>m\<close>
wenzelm@62106
   324
  instead. If a term has fewer arguments than specified in the mixfix
wenzelm@28762
   325
  template, the concrete syntax is ignored.
wenzelm@28762
   326
wenzelm@61421
   327
  \<^medskip>
wenzelm@62106
   328
  A mixfix template may also contain additional directives for pretty
wenzelm@62106
   329
  printing, notably spaces, blocks, and breaks. The general template format is
wenzelm@62106
   330
  a sequence over any of the following entities.
wenzelm@28762
   331
wenzelm@63933
   332
  \<^descr> \<open>d\<close> is a delimiter, namely a non-empty sequence delimiter items of the
wenzelm@63933
   333
  following form:
wenzelm@63933
   334
    \<^enum> a control symbol followed by a cartouche
wenzelm@63933
   335
    \<^enum> a single symbol, excluding the following special characters:
wenzelm@63933
   336
      \<^medskip>
wenzelm@63933
   337
      \begin{tabular}{ll}
wenzelm@63933
   338
        \<^verbatim>\<open>'\<close> & single quote \\
wenzelm@63933
   339
        \<^verbatim>\<open>_\<close> & underscore \\
wenzelm@63933
   340
        \<open>\<index>\<close> & index symbol \\
wenzelm@63933
   341
        \<^verbatim>\<open>(\<close> & open parenthesis \\
wenzelm@63933
   342
        \<^verbatim>\<open>)\<close> & close parenthesis \\
wenzelm@63933
   343
        \<^verbatim>\<open>/\<close> & slash \\
wenzelm@63933
   344
        \<open>\<open> \<close>\<close> & cartouche delimiters \\
wenzelm@63933
   345
      \end{tabular}
wenzelm@63933
   346
      \<^medskip>
wenzelm@28762
   347
wenzelm@62106
   348
  \<^descr> \<^verbatim>\<open>'\<close> escapes the special meaning of these meta-characters, producing a
wenzelm@62106
   349
  literal version of the following character, unless that is a blank.
wenzelm@28771
   350
wenzelm@62106
   351
  A single quote followed by a blank separates delimiters, without affecting
wenzelm@62106
   352
  printing, but input tokens may have additional white space here.
wenzelm@28771
   353
wenzelm@62106
   354
  \<^descr> \<^verbatim>\<open>_\<close> is an argument position, which stands for a certain syntactic
wenzelm@62106
   355
  category in the underlying grammar.
wenzelm@28762
   356
wenzelm@62106
   357
  \<^descr> \<open>\<index>\<close> is an indexed argument position; this is the place where implicit
wenzelm@62106
   358
  structure arguments can be attached.
wenzelm@28762
   359
wenzelm@62106
   360
  \<^descr> \<open>s\<close> is a non-empty sequence of spaces for printing. This and the following
wenzelm@62106
   361
  specifications do not affect parsing at all.
wenzelm@62106
   362
wenzelm@62807
   363
  \<^descr> \<^verbatim>\<open>(\<close>\<open>n\<close> opens a pretty printing block. The optional natural number
wenzelm@62807
   364
  specifies the block indentation, i.e. how much spaces to add when a line
wenzelm@62807
   365
  break occurs within the block. The default indentation is 0.
wenzelm@62807
   366
wenzelm@62807
   367
  \<^descr> \<^verbatim>\<open>(\<close>\<open>\<open>properties\<close>\<close> opens a pretty printing block, with properties
wenzelm@62807
   368
  specified within the given text cartouche. The syntax and semantics of
wenzelm@62807
   369
  the category @{syntax_ref mixfix_properties} is described below.
wenzelm@28762
   370
wenzelm@61503
   371
  \<^descr> \<^verbatim>\<open>)\<close> closes a pretty printing block.
wenzelm@28762
   372
wenzelm@61503
   373
  \<^descr> \<^verbatim>\<open>//\<close> forces a line break.
wenzelm@28762
   374
wenzelm@62106
   375
  \<^descr> \<^verbatim>\<open>/\<close>\<open>s\<close> allows a line break. Here \<open>s\<close> stands for the string of spaces
wenzelm@62106
   376
  (zero or more) right after the slash. These spaces are printed if the break
wenzelm@62106
   377
  is \<^emph>\<open>not\<close> taken.
wenzelm@28762
   378
wenzelm@28762
   379
wenzelm@62807
   380
  \<^medskip>
wenzelm@62807
   381
  Block properties allow more control over the details of pretty-printed
wenzelm@62807
   382
  output. The concrete syntax is defined as follows.
wenzelm@62807
   383
wenzelm@62807
   384
  @{rail \<open>
wenzelm@62807
   385
    @{syntax_def "mixfix_properties"}: (entry *)
wenzelm@62807
   386
    ;
wenzelm@62807
   387
    entry: atom ('=' atom)?
wenzelm@62807
   388
    ;
wenzelm@63138
   389
    atom: @{syntax short_ident} | @{syntax int} | @{syntax float} | @{syntax cartouche}
wenzelm@62807
   390
  \<close>}
wenzelm@62807
   391
wenzelm@62807
   392
  Each @{syntax entry} is a name-value pair: if the value is omitted, if
wenzelm@62807
   393
  defaults to \<^verbatim>\<open>true\<close> (intended for Boolean properties). The following
wenzelm@62807
   394
  standard block properties are supported:
wenzelm@62807
   395
wenzelm@62807
   396
    \<^item> \<open>indent\<close> (natural number): the block indentation --- the same as for the
wenzelm@62807
   397
    simple syntax without block properties.
wenzelm@62807
   398
wenzelm@62807
   399
    \<^item> \<open>consistent\<close> (Boolean): this block has consistent breaks (if one break
wenzelm@62807
   400
    is taken, all breaks are taken).
wenzelm@62807
   401
wenzelm@62807
   402
    \<^item> \<open>unbreakable\<close> (Boolean): all possible breaks of the block are disabled
wenzelm@62807
   403
    (turned into spaces).
wenzelm@62807
   404
wenzelm@62807
   405
    \<^item> \<open>markup\<close> (string): the optional name of the markup node. If this is
wenzelm@62807
   406
    provided, all remaining properties are turned into its XML attributes.
wenzelm@62807
   407
    This allows to specify free-form PIDE markup, e.g.\ for specialized
wenzelm@62807
   408
    output.
wenzelm@62807
   409
wenzelm@62807
   410
  \<^medskip>
wenzelm@62807
   411
  Note that the general idea of pretty printing with blocks and breaks is
wenzelm@62807
   412
  described in @{cite "paulson-ml2"}; it goes back to @{cite "Oppen:1980"}.
wenzelm@58618
   413
\<close>
wenzelm@28762
   414
wenzelm@28762
   415
wenzelm@58618
   416
subsection \<open>Infixes\<close>
wenzelm@46290
   417
wenzelm@62106
   418
text \<open>
wenzelm@62106
   419
  Infix operators are specified by convenient short forms that abbreviate
wenzelm@62106
   420
  general mixfix annotations as follows:
wenzelm@46290
   421
wenzelm@46290
   422
  \begin{center}
wenzelm@46290
   423
  \begin{tabular}{lll}
wenzelm@46290
   424
wenzelm@61503
   425
  \<^verbatim>\<open>(\<close>@{keyword_def "infix"}~\<^verbatim>\<open>"\<close>\<open>sy\<close>\<^verbatim>\<open>"\<close> \<open>p\<close>\<^verbatim>\<open>)\<close>
wenzelm@61493
   426
  & \<open>\<mapsto>\<close> &
wenzelm@61503
   427
  \<^verbatim>\<open>("(_\<close>~\<open>sy\<close>\<^verbatim>\<open>/ _)" [\<close>\<open>p + 1\<close>\<^verbatim>\<open>,\<close>~\<open>p + 1\<close>\<^verbatim>\<open>]\<close>~\<open>p\<close>\<^verbatim>\<open>)\<close> \\
wenzelm@61503
   428
  \<^verbatim>\<open>(\<close>@{keyword_def "infixl"}~\<^verbatim>\<open>"\<close>\<open>sy\<close>\<^verbatim>\<open>"\<close> \<open>p\<close>\<^verbatim>\<open>)\<close>
wenzelm@61493
   429
  & \<open>\<mapsto>\<close> &
wenzelm@61503
   430
  \<^verbatim>\<open>("(_\<close>~\<open>sy\<close>\<^verbatim>\<open>/ _)" [\<close>\<open>p\<close>\<^verbatim>\<open>,\<close>~\<open>p + 1\<close>\<^verbatim>\<open>]\<close>~\<open>p\<close>\<^verbatim>\<open>)\<close> \\
wenzelm@61503
   431
  \<^verbatim>\<open>(\<close>@{keyword_def "infixr"}~\<^verbatim>\<open>"\<close>\<open>sy\<close>\<^verbatim>\<open>"\<close>~\<open>p\<close>\<^verbatim>\<open>)\<close>
wenzelm@61493
   432
  & \<open>\<mapsto>\<close> &
wenzelm@61503
   433
  \<^verbatim>\<open>("(_\<close>~\<open>sy\<close>\<^verbatim>\<open>/ _)" [\<close>\<open>p + 1\<close>\<^verbatim>\<open>,\<close>~\<open>p\<close>\<^verbatim>\<open>]\<close>~\<open>p\<close>\<^verbatim>\<open>)\<close> \\
wenzelm@46290
   434
wenzelm@46290
   435
  \end{tabular}
wenzelm@46290
   436
  \end{center}
wenzelm@46290
   437
wenzelm@62106
   438
  The mixfix template \<^verbatim>\<open>"(_\<close>~\<open>sy\<close>\<^verbatim>\<open>/ _)"\<close> specifies two argument positions;
wenzelm@62106
   439
  the delimiter is preceded by a space and followed by a space or line break;
wenzelm@62106
   440
  the entire phrase is a pretty printing block.
wenzelm@46290
   441
wenzelm@62106
   442
  The alternative notation \<^verbatim>\<open>op\<close>~\<open>sy\<close> is introduced in addition. Thus any
wenzelm@62106
   443
  infix operator may be written in prefix form (as in ML), independently of
wenzelm@62106
   444
  the number of arguments in the term.
wenzelm@58618
   445
\<close>
wenzelm@46290
   446
wenzelm@46290
   447
wenzelm@58618
   448
subsection \<open>Binders\<close>
wenzelm@46290
   449
wenzelm@62106
   450
text \<open>
wenzelm@62106
   451
  A \<^emph>\<open>binder\<close> is a variable-binding construct such as a quantifier. The idea
wenzelm@62106
   452
  to formalize \<open>\<forall>x. b\<close> as \<open>All (\<lambda>x. b)\<close> for \<open>All :: ('a \<Rightarrow> bool) \<Rightarrow> bool\<close>
wenzelm@62106
   453
  already goes back to @{cite church40}. Isabelle declarations of certain
wenzelm@62106
   454
  higher-order operators may be annotated with @{keyword_def "binder"}
wenzelm@62106
   455
  annotations as follows:
wenzelm@46290
   456
wenzelm@46290
   457
  \begin{center}
wenzelm@61503
   458
  \<open>c ::\<close>~\<^verbatim>\<open>"\<close>\<open>(\<tau>\<^sub>1 \<Rightarrow> \<tau>\<^sub>2) \<Rightarrow> \<tau>\<^sub>3\<close>\<^verbatim>\<open>"  (\<close>@{keyword "binder"}~\<^verbatim>\<open>"\<close>\<open>sy\<close>\<^verbatim>\<open>" [\<close>\<open>p\<close>\<^verbatim>\<open>]\<close>~\<open>q\<close>\<^verbatim>\<open>)\<close>
wenzelm@46290
   459
  \end{center}
wenzelm@46290
   460
wenzelm@62106
   461
  This introduces concrete binder syntax \<open>sy x. b\<close>, where \<open>x\<close> is a bound
wenzelm@62106
   462
  variable of type \<open>\<tau>\<^sub>1\<close>, the body \<open>b\<close> has type \<open>\<tau>\<^sub>2\<close> and the whole term has
wenzelm@62106
   463
  type \<open>\<tau>\<^sub>3\<close>. The optional integer \<open>p\<close> specifies the syntactic priority of the
wenzelm@62106
   464
  body; the default is \<open>q\<close>, which is also the priority of the whole construct.
wenzelm@46290
   465
wenzelm@46290
   466
  Internally, the binder syntax is expanded to something like this:
wenzelm@46290
   467
  \begin{center}
wenzelm@61503
   468
  \<open>c_binder ::\<close>~\<^verbatim>\<open>"\<close>\<open>idts \<Rightarrow> \<tau>\<^sub>2 \<Rightarrow> \<tau>\<^sub>3\<close>\<^verbatim>\<open>"  ("(3\<close>\<open>sy\<close>\<^verbatim>\<open>_./ _)" [0,\<close>~\<open>p\<close>\<^verbatim>\<open>]\<close>~\<open>q\<close>\<^verbatim>\<open>)\<close>
wenzelm@46290
   469
  \end{center}
wenzelm@46290
   470
wenzelm@46290
   471
  Here @{syntax (inner) idts} is the nonterminal symbol for a list of
wenzelm@46290
   472
  identifiers with optional type constraints (see also
wenzelm@62106
   473
  \secref{sec:pure-grammar}). The mixfix template \<^verbatim>\<open>"(3\<close>\<open>sy\<close>\<^verbatim>\<open>_./ _)"\<close> defines
wenzelm@62106
   474
  argument positions for the bound identifiers and the body, separated by a
wenzelm@62106
   475
  dot with optional line break; the entire phrase is a pretty printing block
wenzelm@62106
   476
  of indentation level 3. Note that there is no extra space after \<open>sy\<close>, so it
wenzelm@62106
   477
  needs to be included user specification if the binder syntax ends with a
wenzelm@62106
   478
  token that may be continued by an identifier token at the start of @{syntax
wenzelm@62106
   479
  (inner) idts}.
wenzelm@46290
   480
wenzelm@62106
   481
  Furthermore, a syntax translation to transforms \<open>c_binder x\<^sub>1 \<dots> x\<^sub>n b\<close> into
wenzelm@62106
   482
  iterated application \<open>c (\<lambda>x\<^sub>1. \<dots> c (\<lambda>x\<^sub>n. b)\<dots>)\<close>. This works in both
wenzelm@62106
   483
  directions, for parsing and printing.
wenzelm@62106
   484
\<close>
wenzelm@46290
   485
wenzelm@46290
   486
wenzelm@58618
   487
section \<open>Explicit notation \label{sec:notation}\<close>
wenzelm@28762
   488
wenzelm@58618
   489
text \<open>
wenzelm@28762
   490
  \begin{matharray}{rcll}
wenzelm@61493
   491
    @{command_def "type_notation"} & : & \<open>local_theory \<rightarrow> local_theory\<close> \\
wenzelm@61493
   492
    @{command_def "no_type_notation"} & : & \<open>local_theory \<rightarrow> local_theory\<close> \\
wenzelm@61493
   493
    @{command_def "notation"} & : & \<open>local_theory \<rightarrow> local_theory\<close> \\
wenzelm@61493
   494
    @{command_def "no_notation"} & : & \<open>local_theory \<rightarrow> local_theory\<close> \\
wenzelm@61493
   495
    @{command_def "write"} & : & \<open>proof(state) \<rightarrow> proof(state)\<close> \\
wenzelm@28762
   496
  \end{matharray}
wenzelm@28762
   497
wenzelm@62106
   498
  Commands that introduce new logical entities (terms or types) usually allow
wenzelm@62106
   499
  to provide mixfix annotations on the spot, which is convenient for default
wenzelm@62106
   500
  notation. Nonetheless, the syntax may be modified later on by declarations
wenzelm@62106
   501
  for explicit notation. This allows to add or delete mixfix annotations for
wenzelm@62106
   502
  of existing logical entities within the current context.
wenzelm@46288
   503
wenzelm@55112
   504
  @{rail \<open>
wenzelm@59783
   505
    (@@{command type_notation} | @@{command no_type_notation}) @{syntax mode}? \<newline>
wenzelm@62969
   506
      (@{syntax name} @{syntax mixfix} + @'and')
wenzelm@35413
   507
    ;
wenzelm@59783
   508
    (@@{command notation} | @@{command no_notation}) @{syntax mode}? \<newline>
wenzelm@62969
   509
      (@{syntax name} @{syntax mixfix} + @'and')
wenzelm@28762
   510
    ;
wenzelm@62969
   511
    @@{command write} @{syntax mode}? (@{syntax name} @{syntax mixfix} + @'and')
wenzelm@55112
   512
  \<close>}
wenzelm@28762
   513
wenzelm@62106
   514
  \<^descr> @{command "type_notation"}~\<open>c (mx)\<close> associates mixfix syntax with an
wenzelm@62106
   515
  existing type constructor. The arity of the constructor is retrieved from
wenzelm@62106
   516
  the context.
wenzelm@46282
   517
wenzelm@62106
   518
  \<^descr> @{command "no_type_notation"} is similar to @{command "type_notation"},
wenzelm@62106
   519
  but removes the specified syntax annotation from the present context.
wenzelm@35413
   520
wenzelm@62106
   521
  \<^descr> @{command "notation"}~\<open>c (mx)\<close> associates mixfix syntax with an existing
wenzelm@62106
   522
  constant or fixed variable. The type declaration of the given entity is
wenzelm@62106
   523
  retrieved from the context.
wenzelm@46282
   524
wenzelm@62106
   525
  \<^descr> @{command "no_notation"} is similar to @{command "notation"}, but removes
wenzelm@62106
   526
  the specified syntax annotation from the present context.
wenzelm@28762
   527
wenzelm@62106
   528
  \<^descr> @{command "write"} is similar to @{command "notation"}, but works within
wenzelm@62106
   529
  an Isar proof body.
wenzelm@58618
   530
\<close>
wenzelm@28762
   531
wenzelm@28778
   532
wenzelm@58618
   533
section \<open>The Pure syntax \label{sec:pure-syntax}\<close>
wenzelm@28769
   534
wenzelm@58618
   535
subsection \<open>Lexical matters \label{sec:inner-lex}\<close>
wenzelm@46282
   536
wenzelm@62106
   537
text \<open>
wenzelm@62106
   538
  The inner lexical syntax vaguely resembles the outer one
wenzelm@62106
   539
  (\secref{sec:outer-lex}), but some details are different. There are two main
wenzelm@62106
   540
  categories of inner syntax tokens:
wenzelm@46282
   541
wenzelm@62106
   542
  \<^enum> \<^emph>\<open>delimiters\<close> --- the literal tokens occurring in productions of the given
wenzelm@62106
   543
  priority grammar (cf.\ \secref{sec:priority-grammar});
wenzelm@46282
   544
wenzelm@61477
   545
  \<^enum> \<^emph>\<open>named tokens\<close> --- various categories of identifiers etc.
wenzelm@46282
   546
wenzelm@46282
   547
wenzelm@62106
   548
  Delimiters override named tokens and may thus render certain identifiers
wenzelm@62106
   549
  inaccessible. Sometimes the logical context admits alternative ways to refer
wenzelm@62106
   550
  to the same entity, potentially via qualified names.
wenzelm@46282
   551
wenzelm@61421
   552
  \<^medskip>
wenzelm@62106
   553
  The categories for named tokens are defined once and for all as follows,
wenzelm@62106
   554
  reusing some categories of the outer token syntax (\secref{sec:outer-lex}).
wenzelm@46282
   555
wenzelm@46282
   556
  \begin{center}
wenzelm@46282
   557
  \begin{supertabular}{rcl}
wenzelm@63138
   558
    @{syntax_def (inner) id} & = & @{syntax_ref short_ident} \\
wenzelm@63138
   559
    @{syntax_def (inner) longid} & = & @{syntax_ref long_ident} \\
wenzelm@46282
   560
    @{syntax_def (inner) var} & = & @{syntax_ref var} \\
wenzelm@63138
   561
    @{syntax_def (inner) tid} & = & @{syntax_ref type_ident} \\
wenzelm@63138
   562
    @{syntax_def (inner) tvar} & = & @{syntax_ref type_var} \\
haftmann@58410
   563
    @{syntax_def (inner) num_token} & = & @{syntax_ref nat} \\
wenzelm@61503
   564
    @{syntax_def (inner) float_token} & = & @{syntax_ref nat}\<^verbatim>\<open>.\<close>@{syntax_ref nat} \\
wenzelm@61503
   565
    @{syntax_def (inner) str_token} & = & \<^verbatim>\<open>''\<close> \<open>\<dots>\<close> \<^verbatim>\<open>''\<close> \\
wenzelm@61503
   566
    @{syntax_def (inner) string_token} & = & \<^verbatim>\<open>"\<close> \<open>\<dots>\<close> \<^verbatim>\<open>"\<close> \\
wenzelm@61493
   567
    @{syntax_def (inner) cartouche} & = & @{verbatim "\<open>"} \<open>\<dots>\<close> @{verbatim "\<close>"} \\
wenzelm@46282
   568
  \end{supertabular}
wenzelm@46282
   569
  \end{center}
wenzelm@46282
   570
wenzelm@46282
   571
  The token categories @{syntax (inner) num_token}, @{syntax (inner)
wenzelm@58421
   572
  float_token}, @{syntax (inner) str_token}, @{syntax (inner) string_token},
wenzelm@58421
   573
  and @{syntax (inner) cartouche} are not used in Pure. Object-logics may
wenzelm@58421
   574
  implement numerals and string literals by adding appropriate syntax
wenzelm@63680
   575
  declarations, together with some translation functions (e.g.\ see
wenzelm@63680
   576
  \<^file>\<open>~~/src/HOL/Tools/string_syntax.ML\<close>).
wenzelm@46282
   577
wenzelm@58421
   578
  The derived categories @{syntax_def (inner) num_const}, and @{syntax_def
wenzelm@58421
   579
  (inner) float_const}, provide robust access to the respective tokens: the
wenzelm@58421
   580
  syntax tree holds a syntactic constant instead of a free variable.
wenzelm@58618
   581
\<close>
wenzelm@46282
   582
wenzelm@46282
   583
wenzelm@58618
   584
subsection \<open>Priority grammars \label{sec:priority-grammar}\<close>
wenzelm@28769
   585
wenzelm@62106
   586
text \<open>
wenzelm@62106
   587
  A context-free grammar consists of a set of \<^emph>\<open>terminal symbols\<close>, a set of
wenzelm@62106
   588
  \<^emph>\<open>nonterminal symbols\<close> and a set of \<^emph>\<open>productions\<close>. Productions have the
wenzelm@62106
   589
  form \<open>A = \<gamma>\<close>, where \<open>A\<close> is a nonterminal and \<open>\<gamma>\<close> is a string of terminals
wenzelm@62106
   590
  and nonterminals. One designated nonterminal is called the \<^emph>\<open>root symbol\<close>.
wenzelm@62106
   591
  The language defined by the grammar consists of all strings of terminals
wenzelm@62106
   592
  that can be derived from the root symbol by applying productions as rewrite
wenzelm@62106
   593
  rules.
wenzelm@28769
   594
wenzelm@62106
   595
  The standard Isabelle parser for inner syntax uses a \<^emph>\<open>priority grammar\<close>.
wenzelm@62106
   596
  Each nonterminal is decorated by an integer priority: \<open>A\<^sup>(\<^sup>p\<^sup>)\<close>. In a
wenzelm@62106
   597
  derivation, \<open>A\<^sup>(\<^sup>p\<^sup>)\<close> may be rewritten using a production \<open>A\<^sup>(\<^sup>q\<^sup>) = \<gamma>\<close> only
wenzelm@62106
   598
  if \<open>p \<le> q\<close>. Any priority grammar can be translated into a normal
wenzelm@62106
   599
  context-free grammar by introducing new nonterminals and productions.
wenzelm@28769
   600
wenzelm@61421
   601
  \<^medskip>
wenzelm@62106
   602
  Formally, a set of context free productions \<open>G\<close> induces a derivation
wenzelm@62106
   603
  relation \<open>\<longrightarrow>\<^sub>G\<close> as follows. Let \<open>\<alpha>\<close> and \<open>\<beta>\<close> denote strings of terminal or
wenzelm@62106
   604
  nonterminal symbols. Then \<open>\<alpha> A\<^sup>(\<^sup>p\<^sup>) \<beta> \<longrightarrow>\<^sub>G \<alpha> \<gamma> \<beta>\<close> holds if and only if \<open>G\<close>
wenzelm@61493
   605
  contains some production \<open>A\<^sup>(\<^sup>q\<^sup>) = \<gamma>\<close> for \<open>p \<le> q\<close>.
wenzelm@28769
   606
wenzelm@61421
   607
  \<^medskip>
wenzelm@62106
   608
  The following grammar for arithmetic expressions demonstrates how binding
wenzelm@62106
   609
  power and associativity of operators can be enforced by priorities.
wenzelm@28769
   610
wenzelm@28769
   611
  \begin{center}
wenzelm@28769
   612
  \begin{tabular}{rclr}
wenzelm@61503
   613
  \<open>A\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>)\<close> & \<open>=\<close> & \<^verbatim>\<open>(\<close> \<open>A\<^sup>(\<^sup>0\<^sup>)\<close> \<^verbatim>\<open>)\<close> \\
wenzelm@61503
   614
  \<open>A\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>)\<close> & \<open>=\<close> & \<^verbatim>\<open>0\<close> \\
wenzelm@61503
   615
  \<open>A\<^sup>(\<^sup>0\<^sup>)\<close> & \<open>=\<close> & \<open>A\<^sup>(\<^sup>0\<^sup>)\<close> \<^verbatim>\<open>+\<close> \<open>A\<^sup>(\<^sup>1\<^sup>)\<close> \\
wenzelm@61503
   616
  \<open>A\<^sup>(\<^sup>2\<^sup>)\<close> & \<open>=\<close> & \<open>A\<^sup>(\<^sup>3\<^sup>)\<close> \<^verbatim>\<open>*\<close> \<open>A\<^sup>(\<^sup>2\<^sup>)\<close> \\
wenzelm@61503
   617
  \<open>A\<^sup>(\<^sup>3\<^sup>)\<close> & \<open>=\<close> & \<^verbatim>\<open>-\<close> \<open>A\<^sup>(\<^sup>3\<^sup>)\<close> \\
wenzelm@28769
   618
  \end{tabular}
wenzelm@28769
   619
  \end{center}
wenzelm@62106
   620
  The choice of priorities determines that \<^verbatim>\<open>-\<close> binds tighter than \<^verbatim>\<open>*\<close>, which
wenzelm@62106
   621
  binds tighter than \<^verbatim>\<open>+\<close>. Furthermore \<^verbatim>\<open>+\<close> associates to the left and \<^verbatim>\<open>*\<close> to
wenzelm@62106
   622
  the right.
wenzelm@28769
   623
wenzelm@61421
   624
  \<^medskip>
wenzelm@61421
   625
  For clarity, grammars obey these conventions:
wenzelm@28769
   626
wenzelm@62106
   627
    \<^item> All priorities must lie between 0 and 1000.
wenzelm@28769
   628
wenzelm@62106
   629
    \<^item> Priority 0 on the right-hand side and priority 1000 on the left-hand
wenzelm@62106
   630
    side may be omitted.
wenzelm@28769
   631
wenzelm@62106
   632
    \<^item> The production \<open>A\<^sup>(\<^sup>p\<^sup>) = \<alpha>\<close> is written as \<open>A = \<alpha> (p)\<close>, i.e.\ the
wenzelm@62106
   633
    priority of the left-hand side actually appears in a column on the far
wenzelm@62106
   634
    right.
wenzelm@28769
   635
wenzelm@62106
   636
    \<^item> Alternatives are separated by \<open>|\<close>.
wenzelm@28769
   637
wenzelm@62106
   638
    \<^item> Repetition is indicated by dots \<open>(\<dots>)\<close> in an informal but obvious way.
wenzelm@28769
   639
wenzelm@28769
   640
  Using these conventions, the example grammar specification above
wenzelm@28769
   641
  takes the form:
wenzelm@28769
   642
  \begin{center}
wenzelm@28769
   643
  \begin{tabular}{rclc}
wenzelm@61503
   644
    \<open>A\<close> & \<open>=\<close> & \<^verbatim>\<open>(\<close> \<open>A\<close> \<^verbatim>\<open>)\<close> \\
wenzelm@61503
   645
              & \<open>|\<close> & \<^verbatim>\<open>0\<close> & \qquad\qquad \\
wenzelm@61503
   646
              & \<open>|\<close> & \<open>A\<close> \<^verbatim>\<open>+\<close> \<open>A\<^sup>(\<^sup>1\<^sup>)\<close> & \<open>(0)\<close> \\
wenzelm@61503
   647
              & \<open>|\<close> & \<open>A\<^sup>(\<^sup>3\<^sup>)\<close> \<^verbatim>\<open>*\<close> \<open>A\<^sup>(\<^sup>2\<^sup>)\<close> & \<open>(2)\<close> \\
wenzelm@61503
   648
              & \<open>|\<close> & \<^verbatim>\<open>-\<close> \<open>A\<^sup>(\<^sup>3\<^sup>)\<close> & \<open>(3)\<close> \\
wenzelm@28769
   649
  \end{tabular}
wenzelm@28769
   650
  \end{center}
wenzelm@58618
   651
\<close>
wenzelm@28769
   652
wenzelm@28769
   653
wenzelm@58618
   654
subsection \<open>The Pure grammar \label{sec:pure-grammar}\<close>
wenzelm@28770
   655
wenzelm@62106
   656
text \<open>
wenzelm@62106
   657
  The priority grammar of the \<open>Pure\<close> theory is defined approximately like
wenzelm@62106
   658
  this:
wenzelm@28774
   659
wenzelm@28770
   660
  \begin{center}
wenzelm@28773
   661
  \begin{supertabular}{rclr}
wenzelm@28770
   662
wenzelm@61493
   663
  @{syntax_def (inner) any} & = & \<open>prop  |  logic\<close> \\\\
wenzelm@28772
   664
wenzelm@61503
   665
  @{syntax_def (inner) prop} & = & \<^verbatim>\<open>(\<close> \<open>prop\<close> \<^verbatim>\<open>)\<close> \\
wenzelm@61503
   666
    & \<open>|\<close> & \<open>prop\<^sup>(\<^sup>4\<^sup>)\<close> \<^verbatim>\<open>::\<close> \<open>type\<close> & \<open>(3)\<close> \\
wenzelm@61503
   667
    & \<open>|\<close> & \<open>any\<^sup>(\<^sup>3\<^sup>)\<close> \<^verbatim>\<open>==\<close> \<open>any\<^sup>(\<^sup>3\<^sup>)\<close> & \<open>(2)\<close> \\
wenzelm@61493
   668
    & \<open>|\<close> & \<open>any\<^sup>(\<^sup>3\<^sup>)\<close> \<open>\<equiv>\<close> \<open>any\<^sup>(\<^sup>3\<^sup>)\<close> & \<open>(2)\<close> \\
wenzelm@61503
   669
    & \<open>|\<close> & \<open>prop\<^sup>(\<^sup>3\<^sup>)\<close> \<^verbatim>\<open>&&&\<close> \<open>prop\<^sup>(\<^sup>2\<^sup>)\<close> & \<open>(2)\<close> \\
wenzelm@61503
   670
    & \<open>|\<close> & \<open>prop\<^sup>(\<^sup>2\<^sup>)\<close> \<^verbatim>\<open>==>\<close> \<open>prop\<^sup>(\<^sup>1\<^sup>)\<close> & \<open>(1)\<close> \\
wenzelm@61493
   671
    & \<open>|\<close> & \<open>prop\<^sup>(\<^sup>2\<^sup>)\<close> \<open>\<Longrightarrow>\<close> \<open>prop\<^sup>(\<^sup>1\<^sup>)\<close> & \<open>(1)\<close> \\
wenzelm@61503
   672
    & \<open>|\<close> & \<^verbatim>\<open>[|\<close> \<open>prop\<close> \<^verbatim>\<open>;\<close> \<open>\<dots>\<close> \<^verbatim>\<open>;\<close> \<open>prop\<close> \<^verbatim>\<open>|]\<close> \<^verbatim>\<open>==>\<close> \<open>prop\<^sup>(\<^sup>1\<^sup>)\<close> & \<open>(1)\<close> \\
wenzelm@61503
   673
    & \<open>|\<close> & \<open>\<lbrakk>\<close> \<open>prop\<close> \<^verbatim>\<open>;\<close> \<open>\<dots>\<close> \<^verbatim>\<open>;\<close> \<open>prop\<close> \<open>\<rbrakk>\<close> \<open>\<Longrightarrow>\<close> \<open>prop\<^sup>(\<^sup>1\<^sup>)\<close> & \<open>(1)\<close> \\
wenzelm@61503
   674
    & \<open>|\<close> & \<^verbatim>\<open>!!\<close> \<open>idts\<close> \<^verbatim>\<open>.\<close> \<open>prop\<close> & \<open>(0)\<close> \\
wenzelm@61503
   675
    & \<open>|\<close> & \<open>\<And>\<close> \<open>idts\<close> \<^verbatim>\<open>.\<close> \<open>prop\<close> & \<open>(0)\<close> \\
wenzelm@61503
   676
    & \<open>|\<close> & \<^verbatim>\<open>OFCLASS\<close> \<^verbatim>\<open>(\<close> \<open>type\<close> \<^verbatim>\<open>,\<close> \<open>logic\<close> \<^verbatim>\<open>)\<close> \\
wenzelm@61503
   677
    & \<open>|\<close> & \<^verbatim>\<open>SORT_CONSTRAINT\<close> \<^verbatim>\<open>(\<close> \<open>type\<close> \<^verbatim>\<open>)\<close> \\
wenzelm@61503
   678
    & \<open>|\<close> & \<^verbatim>\<open>TERM\<close> \<open>logic\<close> \\
wenzelm@61503
   679
    & \<open>|\<close> & \<^verbatim>\<open>PROP\<close> \<open>aprop\<close> \\\\
wenzelm@28772
   680
wenzelm@61503
   681
  @{syntax_def (inner) aprop} & = & \<^verbatim>\<open>(\<close> \<open>aprop\<close> \<^verbatim>\<open>)\<close> \\
wenzelm@61503
   682
    & \<open>|\<close> & \<open>id  |  longid  |  var  |\<close>~~\<^verbatim>\<open>_\<close>~~\<open>|\<close>~~\<^verbatim>\<open>...\<close> \\
wenzelm@61503
   683
    & \<open>|\<close> & \<^verbatim>\<open>CONST\<close> \<open>id  |\<close>~~\<^verbatim>\<open>CONST\<close> \<open>longid\<close> \\
wenzelm@61503
   684
    & \<open>|\<close> & \<^verbatim>\<open>XCONST\<close> \<open>id  |\<close>~~\<^verbatim>\<open>XCONST\<close> \<open>longid\<close> \\
wenzelm@61493
   685
    & \<open>|\<close> & \<open>logic\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>)  any\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>) \<dots> any\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>)\<close> & \<open>(999)\<close> \\\\
wenzelm@28770
   686
wenzelm@61503
   687
  @{syntax_def (inner) logic} & = & \<^verbatim>\<open>(\<close> \<open>logic\<close> \<^verbatim>\<open>)\<close> \\
wenzelm@61503
   688
    & \<open>|\<close> & \<open>logic\<^sup>(\<^sup>4\<^sup>)\<close> \<^verbatim>\<open>::\<close> \<open>type\<close> & \<open>(3)\<close> \\
wenzelm@61503
   689
    & \<open>|\<close> & \<open>id  |  longid  |  var  |\<close>~~\<^verbatim>\<open>_\<close>~~\<open>|\<close>~~\<^verbatim>\<open>...\<close> \\
wenzelm@61503
   690
    & \<open>|\<close> & \<^verbatim>\<open>CONST\<close> \<open>id  |\<close>~~\<^verbatim>\<open>CONST\<close> \<open>longid\<close> \\
wenzelm@61503
   691
    & \<open>|\<close> & \<^verbatim>\<open>XCONST\<close> \<open>id  |\<close>~~\<^verbatim>\<open>XCONST\<close> \<open>longid\<close> \\
wenzelm@61493
   692
    & \<open>|\<close> & \<open>logic\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>)  any\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>) \<dots> any\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>)\<close> & \<open>(999)\<close> \\
wenzelm@61503
   693
    & \<open>|\<close> & \<^verbatim>\<open>%\<close> \<open>pttrns\<close> \<^verbatim>\<open>.\<close> \<open>any\<^sup>(\<^sup>3\<^sup>)\<close> & \<open>(3)\<close> \\
wenzelm@61503
   694
    & \<open>|\<close> & \<open>\<lambda>\<close> \<open>pttrns\<close> \<^verbatim>\<open>.\<close> \<open>any\<^sup>(\<^sup>3\<^sup>)\<close> & \<open>(3)\<close> \\
wenzelm@61503
   695
    & \<open>|\<close> & \<^verbatim>\<open>op\<close> \<^verbatim>\<open>==\<close>~~\<open>|\<close>~~\<^verbatim>\<open>op\<close> \<open>\<equiv>\<close>~~\<open>|\<close>~~\<^verbatim>\<open>op\<close> \<^verbatim>\<open>&&&\<close> \\
wenzelm@61503
   696
    & \<open>|\<close> & \<^verbatim>\<open>op\<close> \<^verbatim>\<open>==>\<close>~~\<open>|\<close>~~\<^verbatim>\<open>op\<close> \<open>\<Longrightarrow>\<close> \\
wenzelm@61503
   697
    & \<open>|\<close> & \<^verbatim>\<open>TYPE\<close> \<^verbatim>\<open>(\<close> \<open>type\<close> \<^verbatim>\<open>)\<close> \\\\
wenzelm@28772
   698
wenzelm@61503
   699
  @{syntax_def (inner) idt} & = & \<^verbatim>\<open>(\<close> \<open>idt\<close> \<^verbatim>\<open>)\<close>~~\<open>|  id  |\<close>~~\<^verbatim>\<open>_\<close> \\
wenzelm@61503
   700
    & \<open>|\<close> & \<open>id\<close> \<^verbatim>\<open>::\<close> \<open>type\<close> & \<open>(0)\<close> \\
wenzelm@61503
   701
    & \<open>|\<close> & \<^verbatim>\<open>_\<close> \<^verbatim>\<open>::\<close> \<open>type\<close> & \<open>(0)\<close> \\\\
wenzelm@28772
   702
wenzelm@61503
   703
  @{syntax_def (inner) index} & = & \<^verbatim>\<open>\<^bsub>\<close> \<open>logic\<^sup>(\<^sup>0\<^sup>)\<close> \<^verbatim>\<open>\<^esub>\<close>~~\<open>|  |  \<index>\<close> \\\\
wenzelm@46287
   704
wenzelm@61493
   705
  @{syntax_def (inner) idts} & = & \<open>idt  |  idt\<^sup>(\<^sup>1\<^sup>) idts\<close> & \<open>(0)\<close> \\\\
wenzelm@28772
   706
wenzelm@61493
   707
  @{syntax_def (inner) pttrn} & = & \<open>idt\<close> \\\\
wenzelm@28772
   708
wenzelm@61493
   709
  @{syntax_def (inner) pttrns} & = & \<open>pttrn  |  pttrn\<^sup>(\<^sup>1\<^sup>) pttrns\<close> & \<open>(0)\<close> \\\\
wenzelm@28774
   710
wenzelm@61503
   711
  @{syntax_def (inner) type} & = & \<^verbatim>\<open>(\<close> \<open>type\<close> \<^verbatim>\<open>)\<close> \\
wenzelm@61503
   712
    & \<open>|\<close> & \<open>tid  |  tvar  |\<close>~~\<^verbatim>\<open>_\<close> \\
wenzelm@61503
   713
    & \<open>|\<close> & \<open>tid\<close> \<^verbatim>\<open>::\<close> \<open>sort  |  tvar\<close>~~\<^verbatim>\<open>::\<close> \<open>sort  |\<close>~~\<^verbatim>\<open>_\<close> \<^verbatim>\<open>::\<close> \<open>sort\<close> \\
wenzelm@61493
   714
    & \<open>|\<close> & \<open>type_name  |  type\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>) type_name\<close> \\
wenzelm@61503
   715
    & \<open>|\<close> & \<^verbatim>\<open>(\<close> \<open>type\<close> \<^verbatim>\<open>,\<close> \<open>\<dots>\<close> \<^verbatim>\<open>,\<close> \<open>type\<close> \<^verbatim>\<open>)\<close> \<open>type_name\<close> \\
wenzelm@61503
   716
    & \<open>|\<close> & \<open>type\<^sup>(\<^sup>1\<^sup>)\<close> \<^verbatim>\<open>=>\<close> \<open>type\<close> & \<open>(0)\<close> \\
wenzelm@61493
   717
    & \<open>|\<close> & \<open>type\<^sup>(\<^sup>1\<^sup>)\<close> \<open>\<Rightarrow>\<close> \<open>type\<close> & \<open>(0)\<close> \\
wenzelm@61503
   718
    & \<open>|\<close> & \<^verbatim>\<open>[\<close> \<open>type\<close> \<^verbatim>\<open>,\<close> \<open>\<dots>\<close> \<^verbatim>\<open>,\<close> \<open>type\<close> \<^verbatim>\<open>]\<close> \<^verbatim>\<open>=>\<close> \<open>type\<close> & \<open>(0)\<close> \\
wenzelm@61503
   719
    & \<open>|\<close> & \<^verbatim>\<open>[\<close> \<open>type\<close> \<^verbatim>\<open>,\<close> \<open>\<dots>\<close> \<^verbatim>\<open>,\<close> \<open>type\<close> \<^verbatim>\<open>]\<close> \<open>\<Rightarrow>\<close> \<open>type\<close> & \<open>(0)\<close> \\
wenzelm@61493
   720
  @{syntax_def (inner) type_name} & = & \<open>id  |  longid\<close> \\\\
wenzelm@28772
   721
wenzelm@61503
   722
  @{syntax_def (inner) sort} & = & @{syntax class_name}~~\<open>|\<close>~~\<^verbatim>\<open>{}\<close> \\
wenzelm@61503
   723
    & \<open>|\<close> & \<^verbatim>\<open>{\<close> @{syntax class_name} \<^verbatim>\<open>,\<close> \<open>\<dots>\<close> \<^verbatim>\<open>,\<close> @{syntax class_name} \<^verbatim>\<open>}\<close> \\
wenzelm@61493
   724
  @{syntax_def (inner) class_name} & = & \<open>id  |  longid\<close> \\
wenzelm@28773
   725
  \end{supertabular}
wenzelm@28770
   726
  \end{center}
wenzelm@28770
   727
wenzelm@61421
   728
  \<^medskip>
wenzelm@62106
   729
  Here literal terminals are printed \<^verbatim>\<open>verbatim\<close>; see also
wenzelm@62106
   730
  \secref{sec:inner-lex} for further token categories of the inner syntax. The
wenzelm@62106
   731
  meaning of the nonterminals defined by the above grammar is as follows:
wenzelm@28770
   732
wenzelm@61439
   733
  \<^descr> @{syntax_ref (inner) any} denotes any term.
wenzelm@28770
   734
wenzelm@62106
   735
  \<^descr> @{syntax_ref (inner) prop} denotes meta-level propositions, which are
wenzelm@62106
   736
  terms of type @{typ prop}. The syntax of such formulae of the meta-logic is
wenzelm@62106
   737
  carefully distinguished from usual conventions for object-logics. In
wenzelm@62106
   738
  particular, plain \<open>\<lambda>\<close>-term notation is \<^emph>\<open>not\<close> recognized as @{syntax (inner)
wenzelm@62106
   739
  prop}.
wenzelm@28770
   740
wenzelm@62106
   741
  \<^descr> @{syntax_ref (inner) aprop} denotes atomic propositions, which are
wenzelm@62106
   742
  embedded into regular @{syntax (inner) prop} by means of an explicit \<^verbatim>\<open>PROP\<close>
wenzelm@62106
   743
  token.
wenzelm@28770
   744
wenzelm@62106
   745
  Terms of type @{typ prop} with non-constant head, e.g.\ a plain variable,
wenzelm@62106
   746
  are printed in this form. Constants that yield type @{typ prop} are expected
wenzelm@62106
   747
  to provide their own concrete syntax; otherwise the printed version will
wenzelm@62106
   748
  appear like @{syntax (inner) logic} and cannot be parsed again as @{syntax
wenzelm@62106
   749
  (inner) prop}.
wenzelm@28770
   750
wenzelm@62106
   751
  \<^descr> @{syntax_ref (inner) logic} denotes arbitrary terms of a logical type,
wenzelm@62106
   752
  excluding type @{typ prop}. This is the main syntactic category of
wenzelm@62106
   753
  object-logic entities, covering plain \<open>\<lambda>\<close>-term notation (variables,
wenzelm@62106
   754
  abstraction, application), plus anything defined by the user.
wenzelm@28770
   755
wenzelm@62106
   756
  When specifying notation for logical entities, all logical types (excluding
wenzelm@62106
   757
  @{typ prop}) are \<^emph>\<open>collapsed\<close> to this single category of @{syntax (inner)
wenzelm@62106
   758
  logic}.
wenzelm@28770
   759
wenzelm@62106
   760
  \<^descr> @{syntax_ref (inner) index} denotes an optional index term for indexed
wenzelm@62106
   761
  syntax. If omitted, it refers to the first @{keyword_ref "structure"}
wenzelm@62106
   762
  variable in the context. The special dummy ``\<open>\<index>\<close>'' serves as pattern
wenzelm@62106
   763
  variable in mixfix annotations that introduce indexed notation.
wenzelm@46287
   764
wenzelm@62106
   765
  \<^descr> @{syntax_ref (inner) idt} denotes identifiers, possibly constrained by
wenzelm@62106
   766
  types.
wenzelm@28770
   767
wenzelm@62106
   768
  \<^descr> @{syntax_ref (inner) idts} denotes a sequence of @{syntax_ref (inner)
wenzelm@62106
   769
  idt}. This is the most basic category for variables in iterated binders,
wenzelm@62106
   770
  such as \<open>\<lambda>\<close> or \<open>\<And>\<close>.
wenzelm@28770
   771
wenzelm@62106
   772
  \<^descr> @{syntax_ref (inner) pttrn} and @{syntax_ref (inner) pttrns} denote
wenzelm@62106
   773
  patterns for abstraction, cases bindings etc. In Pure, these categories
wenzelm@62106
   774
  start as a merely copy of @{syntax (inner) idt} and @{syntax (inner) idts},
wenzelm@62106
   775
  respectively. Object-logics may add additional productions for binding
wenzelm@62106
   776
  forms.
wenzelm@28770
   777
wenzelm@61439
   778
  \<^descr> @{syntax_ref (inner) type} denotes types of the meta-logic.
wenzelm@28770
   779
wenzelm@61439
   780
  \<^descr> @{syntax_ref (inner) sort} denotes meta-level sorts.
wenzelm@28770
   781
wenzelm@28770
   782
wenzelm@28774
   783
  Here are some further explanations of certain syntax features.
wenzelm@28773
   784
wenzelm@62106
   785
  \<^item> In @{syntax (inner) idts}, note that \<open>x :: nat y\<close> is parsed as \<open>x :: (nat
wenzelm@62106
   786
  y)\<close>, treating \<open>y\<close> like a type constructor applied to \<open>nat\<close>. To avoid this
wenzelm@62106
   787
  interpretation, write \<open>(x :: nat) y\<close> with explicit parentheses.
wenzelm@28773
   788
wenzelm@62106
   789
  \<^item> Similarly, \<open>x :: nat y :: nat\<close> is parsed as \<open>x :: (nat y :: nat)\<close>. The
wenzelm@62106
   790
  correct form is \<open>(x :: nat) (y :: nat)\<close>, or \<open>(x :: nat) y :: nat\<close> if \<open>y\<close> is
wenzelm@62106
   791
  last in the sequence of identifiers.
wenzelm@28773
   792
wenzelm@62106
   793
  \<^item> Type constraints for terms bind very weakly. For example, \<open>x < y :: nat\<close>
wenzelm@62106
   794
  is normally parsed as \<open>(x < y) :: nat\<close>, unless \<open><\<close> has a very low priority,
wenzelm@62106
   795
  in which case the input is likely to be ambiguous. The correct form is \<open>x <
wenzelm@62106
   796
  (y :: nat)\<close>.
wenzelm@28773
   797
wenzelm@61421
   798
  \<^item> Dummy variables (written as underscore) may occur in different
wenzelm@28774
   799
  roles.
wenzelm@28773
   800
wenzelm@62106
   801
    \<^descr> A type ``\<open>_\<close>'' or ``\<open>_ :: sort\<close>'' acts like an anonymous inference
wenzelm@62106
   802
    parameter, which is filled-in according to the most general type produced
wenzelm@62106
   803
    by the type-checking phase.
wenzelm@28770
   804
wenzelm@62106
   805
    \<^descr> A bound ``\<open>_\<close>'' refers to a vacuous abstraction, where the body does not
wenzelm@62106
   806
    refer to the binding introduced here. As in the term @{term "\<lambda>x _. x"},
wenzelm@62106
   807
    which is \<open>\<alpha>\<close>-equivalent to \<open>\<lambda>x y. x\<close>.
wenzelm@28773
   808
wenzelm@62106
   809
    \<^descr> A free ``\<open>_\<close>'' refers to an implicit outer binding. Higher definitional
wenzelm@62106
   810
    packages usually allow forms like \<open>f x _ = x\<close>.
wenzelm@28773
   811
wenzelm@62106
   812
    \<^descr> A schematic ``\<open>_\<close>'' (within a term pattern, see \secref{sec:term-decls})
wenzelm@62106
   813
    refers to an anonymous variable that is implicitly abstracted over its
wenzelm@62106
   814
    context of locally bound variables. For example, this allows pattern
wenzelm@62106
   815
    matching of \<open>{x. f x = g x}\<close> against \<open>{x. _ = _}\<close>, or even \<open>{_. _ = _}\<close> by
wenzelm@61458
   816
    using both bound and schematic dummies.
wenzelm@28773
   817
wenzelm@62106
   818
  \<^descr> The three literal dots ``\<^verbatim>\<open>...\<close>'' may be also written as ellipsis symbol
wenzelm@62106
   819
  \<^verbatim>\<open>\<dots>\<close>. In both cases this refers to a special schematic variable, which is
wenzelm@62106
   820
  bound in the context. This special term abbreviation works nicely with
wenzelm@28774
   821
  calculational reasoning (\secref{sec:calculation}).
wenzelm@28774
   822
wenzelm@62106
   823
  \<^descr> \<^verbatim>\<open>CONST\<close> ensures that the given identifier is treated as constant term,
wenzelm@62106
   824
  and passed through the parse tree in fully internalized form. This is
wenzelm@62106
   825
  particularly relevant for translation rules (\secref{sec:syn-trans}),
wenzelm@62106
   826
  notably on the RHS.
wenzelm@46287
   827
wenzelm@62106
   828
  \<^descr> \<^verbatim>\<open>XCONST\<close> is similar to \<^verbatim>\<open>CONST\<close>, but retains the constant name as given.
wenzelm@62106
   829
  This is only relevant to translation rules (\secref{sec:syn-trans}), notably
wenzelm@62106
   830
  on the LHS.
wenzelm@58618
   831
\<close>
wenzelm@28770
   832
wenzelm@28777
   833
wenzelm@58618
   834
subsection \<open>Inspecting the syntax\<close>
wenzelm@28777
   835
wenzelm@58618
   836
text \<open>
wenzelm@46282
   837
  \begin{matharray}{rcl}
wenzelm@61493
   838
    @{command_def "print_syntax"}\<open>\<^sup>*\<close> & : & \<open>context \<rightarrow>\<close> \\
wenzelm@46282
   839
  \end{matharray}
wenzelm@28777
   840
wenzelm@62106
   841
  \<^descr> @{command "print_syntax"} prints the inner syntax of the current context.
wenzelm@62106
   842
  The output can be quite large; the most important sections are explained
wenzelm@62106
   843
  below.
wenzelm@28777
   844
wenzelm@62106
   845
    \<^descr> \<open>lexicon\<close> lists the delimiters of the inner token language; see
wenzelm@62106
   846
    \secref{sec:inner-lex}.
wenzelm@28777
   847
wenzelm@62106
   848
    \<^descr> \<open>prods\<close> lists the productions of the underlying priority grammar; see
wenzelm@62106
   849
    \secref{sec:priority-grammar}.
wenzelm@28777
   850
wenzelm@62106
   851
    The nonterminal \<open>A\<^sup>(\<^sup>p\<^sup>)\<close> is rendered in plain text as \<open>A[p]\<close>; delimiters
wenzelm@62106
   852
    are quoted. Many productions have an extra \<open>\<dots> => name\<close>. These names later
wenzelm@62106
   853
    become the heads of parse trees; they also guide the pretty printer.
wenzelm@62106
   854
wenzelm@62106
   855
    Productions without such parse tree names are called \<^emph>\<open>copy productions\<close>.
wenzelm@62106
   856
    Their right-hand side must have exactly one nonterminal symbol (or named
wenzelm@62106
   857
    token). The parser does not create a new parse tree node for copy
wenzelm@62106
   858
    productions, but simply returns the parse tree of the right-hand symbol.
wenzelm@46282
   859
wenzelm@61458
   860
    If the right-hand side of a copy production consists of a single
wenzelm@61477
   861
    nonterminal without any delimiters, then it is called a \<^emph>\<open>chain
wenzelm@62106
   862
    production\<close>. Chain productions act as abbreviations: conceptually, they
wenzelm@62106
   863
    are removed from the grammar by adding new productions. Priority
wenzelm@62106
   864
    information attached to chain productions is ignored; only the dummy value
wenzelm@62106
   865
    \<open>-1\<close> is displayed.
wenzelm@46282
   866
wenzelm@62106
   867
    \<^descr> \<open>print modes\<close> lists the alternative print modes provided by this
wenzelm@62106
   868
    grammar; see \secref{sec:print-modes}.
wenzelm@28777
   869
wenzelm@62106
   870
    \<^descr> \<open>parse_rules\<close> and \<open>print_rules\<close> relate to syntax translations (macros);
wenzelm@62106
   871
    see \secref{sec:syn-trans}.
wenzelm@46282
   872
wenzelm@62106
   873
    \<^descr> \<open>parse_ast_translation\<close> and \<open>print_ast_translation\<close> list sets of
wenzelm@62106
   874
    constants that invoke translation functions for abstract syntax trees,
wenzelm@62106
   875
    which are only required in very special situations; see
wenzelm@62106
   876
    \secref{sec:tr-funs}.
wenzelm@28777
   877
wenzelm@62106
   878
    \<^descr> \<open>parse_translation\<close> and \<open>print_translation\<close> list the sets of constants
wenzelm@62106
   879
    that invoke regular translation functions; see \secref{sec:tr-funs}.
wenzelm@58618
   880
\<close>
wenzelm@28774
   881
wenzelm@28770
   882
wenzelm@58618
   883
subsection \<open>Ambiguity of parsed expressions\<close>
wenzelm@46291
   884
wenzelm@58618
   885
text \<open>
wenzelm@46291
   886
  \begin{tabular}{rcll}
wenzelm@61493
   887
    @{attribute_def syntax_ambiguity_warning} & : & \<open>attribute\<close> & default \<open>true\<close> \\
wenzelm@61493
   888
    @{attribute_def syntax_ambiguity_limit} & : & \<open>attribute\<close> & default \<open>10\<close> \\
wenzelm@46291
   889
  \end{tabular}
wenzelm@46291
   890
wenzelm@62106
   891
  Depending on the grammar and the given input, parsing may be ambiguous.
wenzelm@62106
   892
  Isabelle lets the Earley parser enumerate all possible parse trees, and then
wenzelm@62106
   893
  tries to make the best out of the situation. Terms that cannot be
wenzelm@62106
   894
  type-checked are filtered out, which often leads to a unique result in the
wenzelm@62106
   895
  end. Unlike regular type reconstruction, which is applied to the whole
wenzelm@62106
   896
  collection of input terms simultaneously, the filtering stage only treats
wenzelm@62106
   897
  each given term in isolation. Filtering is also not attempted for individual
wenzelm@46291
   898
  types or raw ASTs (as required for @{command translations}).
wenzelm@46291
   899
wenzelm@62106
   900
  Certain warning or error messages are printed, depending on the situation
wenzelm@62106
   901
  and the given configuration options. Parsing ultimately fails, if multiple
wenzelm@62106
   902
  results remain after the filtering phase.
wenzelm@46291
   903
wenzelm@62106
   904
  \<^descr> @{attribute syntax_ambiguity_warning} controls output of explicit warning
wenzelm@62106
   905
  messages about syntax ambiguity.
wenzelm@46291
   906
wenzelm@62106
   907
  \<^descr> @{attribute syntax_ambiguity_limit} determines the number of resulting
wenzelm@62106
   908
  parse trees that are shown as part of the printed message in case of an
wenzelm@62106
   909
  ambiguity.
wenzelm@58618
   910
\<close>
wenzelm@46291
   911
wenzelm@46291
   912
wenzelm@58618
   913
section \<open>Syntax transformations \label{sec:syntax-transformations}\<close>
wenzelm@48113
   914
wenzelm@62106
   915
text \<open>
wenzelm@62106
   916
  The inner syntax engine of Isabelle provides separate mechanisms to
wenzelm@62106
   917
  transform parse trees either via rewrite systems on first-order ASTs
wenzelm@62106
   918
  (\secref{sec:syn-trans}), or ML functions on ASTs or syntactic \<open>\<lambda>\<close>-terms
wenzelm@62106
   919
  (\secref{sec:tr-funs}). This works both for parsing and printing, as
wenzelm@62106
   920
  outlined in \figref{fig:parse-print}.
wenzelm@48113
   921
wenzelm@48113
   922
  \begin{figure}[htbp]
wenzelm@48113
   923
  \begin{center}
wenzelm@48113
   924
  \begin{tabular}{cl}
wenzelm@48113
   925
  string          & \\
wenzelm@61493
   926
  \<open>\<down>\<close>     & lexer + parser \\
wenzelm@48113
   927
  parse tree      & \\
wenzelm@61493
   928
  \<open>\<down>\<close>     & parse AST translation \\
wenzelm@48113
   929
  AST             & \\
wenzelm@61493
   930
  \<open>\<down>\<close>     & AST rewriting (macros) \\
wenzelm@48113
   931
  AST             & \\
wenzelm@61493
   932
  \<open>\<down>\<close>     & parse translation \\
wenzelm@48113
   933
  --- pre-term ---    & \\
wenzelm@61493
   934
  \<open>\<down>\<close>     & print translation \\
wenzelm@48113
   935
  AST             & \\
wenzelm@61493
   936
  \<open>\<down>\<close>     & AST rewriting (macros) \\
wenzelm@48113
   937
  AST             & \\
wenzelm@61493
   938
  \<open>\<down>\<close>     & print AST translation \\
wenzelm@48113
   939
  string          &
wenzelm@48113
   940
  \end{tabular}
wenzelm@48113
   941
  \end{center}
wenzelm@48113
   942
  \caption{Parsing and printing with translations}\label{fig:parse-print}
wenzelm@48113
   943
  \end{figure}
wenzelm@48113
   944
wenzelm@62106
   945
  These intermediate syntax tree formats eventually lead to a pre-term with
wenzelm@62106
   946
  all names and binding scopes resolved, but most type information still
wenzelm@62106
   947
  missing. Explicit type constraints might be given by the user, or implicit
wenzelm@62106
   948
  position information by the system --- both need to be passed-through
wenzelm@62106
   949
  carefully by syntax transformations.
wenzelm@48113
   950
wenzelm@62106
   951
  Pre-terms are further processed by the so-called \<^emph>\<open>check\<close> and \<^emph>\<open>uncheck\<close>
wenzelm@62106
   952
  phases that are intertwined with type-inference (see also @{cite
wenzelm@62106
   953
  "isabelle-implementation"}). The latter allows to operate on higher-order
wenzelm@62106
   954
  abstract syntax with proper binding and type information already available.
wenzelm@48113
   955
wenzelm@62106
   956
  As a rule of thumb, anything that manipulates bindings of variables or
wenzelm@62106
   957
  constants needs to be implemented as syntax transformation (see below).
wenzelm@62106
   958
  Anything else is better done via check/uncheck: a prominent example
wenzelm@62106
   959
  application is the @{command abbreviation} concept of Isabelle/Pure.
wenzelm@62106
   960
\<close>
wenzelm@48113
   961
wenzelm@48113
   962
wenzelm@58618
   963
subsection \<open>Abstract syntax trees \label{sec:ast}\<close>
wenzelm@48113
   964
wenzelm@62106
   965
text \<open>
wenzelm@62106
   966
  The ML datatype @{ML_type Ast.ast} explicitly represents the intermediate
wenzelm@62106
   967
  AST format that is used for syntax rewriting (\secref{sec:syn-trans}). It is
wenzelm@62106
   968
  defined in ML as follows:
wenzelm@61408
   969
  @{verbatim [display]
wenzelm@61408
   970
\<open>datatype ast =
wenzelm@61408
   971
  Constant of string |
wenzelm@61408
   972
  Variable of string |
wenzelm@61408
   973
  Appl of ast list\<close>}
wenzelm@48114
   974
wenzelm@62106
   975
  An AST is either an atom (constant or variable) or a list of (at least two)
wenzelm@62106
   976
  subtrees. Occasional diagnostic output of ASTs uses notation that resembles
wenzelm@62106
   977
  S-expression of LISP. Constant atoms are shown as quoted strings, variable
wenzelm@62106
   978
  atoms as non-quoted strings and applications as a parenthesized list of
wenzelm@62106
   979
  subtrees. For example, the AST
wenzelm@58724
   980
  @{ML [display] \<open>Ast.Appl [Ast.Constant "_abs", Ast.Variable "x", Ast.Variable "t"]\<close>}
wenzelm@62106
   981
  is pretty-printed as \<^verbatim>\<open>("_abs" x t)\<close>. Note that \<^verbatim>\<open>()\<close> and \<^verbatim>\<open>(x)\<close> are
wenzelm@62106
   982
  excluded as ASTs, because they have too few subtrees.
wenzelm@48114
   983
wenzelm@61421
   984
  \<^medskip>
wenzelm@62106
   985
  AST application is merely a pro-forma mechanism to indicate certain
wenzelm@62106
   986
  syntactic structures. Thus \<^verbatim>\<open>(c a b)\<close> could mean either term application or
wenzelm@62106
   987
  type application, depending on the syntactic context.
wenzelm@48114
   988
wenzelm@62106
   989
  Nested application like \<^verbatim>\<open>(("_abs" x t) u)\<close> is also possible, but ASTs are
wenzelm@62106
   990
  definitely first-order: the syntax constant \<^verbatim>\<open>"_abs"\<close> does not bind the \<^verbatim>\<open>x\<close>
wenzelm@62106
   991
  in any way. Proper bindings are introduced in later stages of the term
wenzelm@62106
   992
  syntax, where \<^verbatim>\<open>("_abs" x t)\<close> becomes an @{ML Abs} node and occurrences of
wenzelm@62106
   993
  \<^verbatim>\<open>x\<close> in \<^verbatim>\<open>t\<close> are replaced by bound variables (represented as de-Bruijn
wenzelm@62106
   994
  indices).
wenzelm@58618
   995
\<close>
wenzelm@48113
   996
wenzelm@48113
   997
wenzelm@58618
   998
subsubsection \<open>AST constants versus variables\<close>
wenzelm@48114
   999
wenzelm@62106
  1000
text \<open>
wenzelm@62106
  1001
  Depending on the situation --- input syntax, output syntax, translation
wenzelm@62106
  1002
  patterns --- the distinction of atomic ASTs as @{ML Ast.Constant} versus
wenzelm@62106
  1003
  @{ML Ast.Variable} serves slightly different purposes.
wenzelm@48114
  1004
wenzelm@62106
  1005
  Input syntax of a term such as \<open>f a b = c\<close> does not yet indicate the scopes
wenzelm@62106
  1006
  of atomic entities \<open>f, a, b, c\<close>: they could be global constants or local
wenzelm@62106
  1007
  variables, even bound ones depending on the context of the term. @{ML
wenzelm@62106
  1008
  Ast.Variable} leaves this choice still open: later syntax layers (or
wenzelm@62106
  1009
  translation functions) may capture such a variable to determine its role
wenzelm@62106
  1010
  specifically, to make it a constant, bound variable, free variable etc. In
wenzelm@62106
  1011
  contrast, syntax translations that introduce already known constants would
wenzelm@62106
  1012
  rather do it via @{ML Ast.Constant} to prevent accidental re-interpretation
wenzelm@62106
  1013
  later on.
wenzelm@48114
  1014
wenzelm@62106
  1015
  Output syntax turns term constants into @{ML Ast.Constant} and variables
wenzelm@62106
  1016
  (free or schematic) into @{ML Ast.Variable}. This information is precise
wenzelm@62106
  1017
  when printing fully formal \<open>\<lambda>\<close>-terms.
wenzelm@48114
  1018
wenzelm@61421
  1019
  \<^medskip>
wenzelm@62106
  1020
  AST translation patterns (\secref{sec:syn-trans}) that represent terms
wenzelm@62106
  1021
  cannot distinguish constants and variables syntactically. Explicit
wenzelm@62106
  1022
  indication of \<open>CONST c\<close> inside the term language is required, unless \<open>c\<close> is
wenzelm@62106
  1023
  known as special \<^emph>\<open>syntax constant\<close> (see also @{command syntax}). It is also
wenzelm@62106
  1024
  possible to use @{command syntax} declarations (without mixfix annotation)
wenzelm@62106
  1025
  to enforce that certain unqualified names are always treated as constant
wenzelm@62106
  1026
  within the syntax machinery.
wenzelm@48114
  1027
wenzelm@62106
  1028
  The situation is simpler for ASTs that represent types or sorts, since the
wenzelm@62106
  1029
  concrete syntax already distinguishes type variables from type constants
wenzelm@62106
  1030
  (constructors). So \<open>('a, 'b) foo\<close> corresponds to an AST application of some
wenzelm@62106
  1031
  constant for \<open>foo\<close> and variable arguments for \<open>'a\<close> and \<open>'b\<close>. Note that the
wenzelm@62106
  1032
  postfix application is merely a feature of the concrete syntax, while in the
wenzelm@62106
  1033
  AST the constructor occurs in head position.
wenzelm@62106
  1034
\<close>
wenzelm@48114
  1035
wenzelm@48114
  1036
wenzelm@58618
  1037
subsubsection \<open>Authentic syntax names\<close>
wenzelm@48114
  1038
wenzelm@62106
  1039
text \<open>
wenzelm@62106
  1040
  Naming constant entities within ASTs is another delicate issue. Unqualified
wenzelm@62106
  1041
  names are resolved in the name space tables in the last stage of parsing,
wenzelm@62106
  1042
  after all translations have been applied. Since syntax transformations do
wenzelm@62106
  1043
  not know about this later name resolution, there can be surprises in
wenzelm@62106
  1044
  boundary cases.
wenzelm@48114
  1045
wenzelm@62106
  1046
  \<^emph>\<open>Authentic syntax names\<close> for @{ML Ast.Constant} avoid this problem: the
wenzelm@62106
  1047
  fully-qualified constant name with a special prefix for its formal category
wenzelm@62106
  1048
  (\<open>class\<close>, \<open>type\<close>, \<open>const\<close>, \<open>fixed\<close>) represents the information faithfully
wenzelm@62106
  1049
  within the untyped AST format. Accidental overlap with free or bound
wenzelm@62106
  1050
  variables is excluded as well. Authentic syntax names work implicitly in the
wenzelm@62106
  1051
  following situations:
wenzelm@48114
  1052
wenzelm@62106
  1053
    \<^item> Input of term constants (or fixed variables) that are introduced by
wenzelm@62106
  1054
    concrete syntax via @{command notation}: the correspondence of a
wenzelm@62106
  1055
    particular grammar production to some known term entity is preserved.
wenzelm@48114
  1056
wenzelm@62106
  1057
    \<^item> Input of type constants (constructors) and type classes --- thanks to
wenzelm@62106
  1058
    explicit syntactic distinction independently on the context.
wenzelm@48114
  1059
wenzelm@62106
  1060
    \<^item> Output of term constants, type constants, type classes --- this
wenzelm@62106
  1061
    information is already available from the internal term to be printed.
wenzelm@48114
  1062
wenzelm@62106
  1063
  In other words, syntax transformations that operate on input terms written
wenzelm@62106
  1064
  as prefix applications are difficult to make robust. Luckily, this case
wenzelm@62106
  1065
  rarely occurs in practice, because syntax forms to be translated usually
wenzelm@62106
  1066
  correspond to some concrete notation.
wenzelm@62106
  1067
\<close>
wenzelm@48114
  1068
wenzelm@48114
  1069
wenzelm@58618
  1070
subsection \<open>Raw syntax and translations \label{sec:syn-trans}\<close>
wenzelm@28762
  1071
wenzelm@58618
  1072
text \<open>
wenzelm@48117
  1073
  \begin{tabular}{rcll}
wenzelm@61493
  1074
    @{command_def "nonterminal"} & : & \<open>theory \<rightarrow> theory\<close> \\
wenzelm@61493
  1075
    @{command_def "syntax"} & : & \<open>theory \<rightarrow> theory\<close> \\
wenzelm@61493
  1076
    @{command_def "no_syntax"} & : & \<open>theory \<rightarrow> theory\<close> \\
wenzelm@61493
  1077
    @{command_def "translations"} & : & \<open>theory \<rightarrow> theory\<close> \\
wenzelm@61493
  1078
    @{command_def "no_translations"} & : & \<open>theory \<rightarrow> theory\<close> \\
wenzelm@61493
  1079
    @{attribute_def syntax_ast_trace} & : & \<open>attribute\<close> & default \<open>false\<close> \\
wenzelm@61493
  1080
    @{attribute_def syntax_ast_stats} & : & \<open>attribute\<close> & default \<open>false\<close> \\
wenzelm@48117
  1081
  \end{tabular}
wenzelm@61421
  1082
  \<^medskip>
wenzelm@59783
  1083
wenzelm@62106
  1084
  Unlike mixfix notation for existing formal entities (\secref{sec:notation}),
wenzelm@62106
  1085
  raw syntax declarations provide full access to the priority grammar of the
wenzelm@62106
  1086
  inner syntax, without any sanity checks. This includes additional syntactic
wenzelm@62106
  1087
  categories (via @{command nonterminal}) and free-form grammar productions
wenzelm@62106
  1088
  (via @{command syntax}). Additional syntax translations (or macros, via
wenzelm@62106
  1089
  @{command translations}) are required to turn resulting parse trees into
wenzelm@62106
  1090
  proper representations of formal entities again.
wenzelm@46292
  1091
wenzelm@55112
  1092
  @{rail \<open>
wenzelm@42596
  1093
    @@{command nonterminal} (@{syntax name} + @'and')
wenzelm@28762
  1094
    ;
wenzelm@46494
  1095
    (@@{command syntax} | @@{command no_syntax}) @{syntax mode}? (constdecl +)
wenzelm@28762
  1096
    ;
wenzelm@42596
  1097
    (@@{command translations} | @@{command no_translations})
wenzelm@42596
  1098
      (transpat ('==' | '=>' | '<=' | '\<rightleftharpoons>' | '\<rightharpoonup>' | '\<leftharpoondown>') transpat +)
wenzelm@28762
  1099
    ;
wenzelm@28762
  1100
wenzelm@46494
  1101
    constdecl: @{syntax name} '::' @{syntax type} @{syntax mixfix}?
wenzelm@46494
  1102
    ;
wenzelm@42596
  1103
    mode: ('(' ( @{syntax name} | @'output' | @{syntax name} @'output' ) ')')
wenzelm@28762
  1104
    ;
wenzelm@62969
  1105
    transpat: ('(' @{syntax name} ')')? @{syntax string}
wenzelm@55112
  1106
  \<close>}
wenzelm@28762
  1107
wenzelm@62106
  1108
  \<^descr> @{command "nonterminal"}~\<open>c\<close> declares a type constructor \<open>c\<close> (without
wenzelm@62106
  1109
  arguments) to act as purely syntactic type: a nonterminal symbol of the
wenzelm@62106
  1110
  inner syntax.
wenzelm@28762
  1111
wenzelm@62106
  1112
  \<^descr> @{command "syntax"}~\<open>(mode) c :: \<sigma> (mx)\<close> augments the priority grammar and
wenzelm@62106
  1113
  the pretty printer table for the given print mode (default \<^verbatim>\<open>""\<close>). An
wenzelm@62106
  1114
  optional keyword @{keyword_ref "output"} means that only the pretty printer
wenzelm@62106
  1115
  table is affected.
wenzelm@46292
  1116
wenzelm@62106
  1117
  Following \secref{sec:mixfix}, the mixfix annotation \<open>mx = template ps q\<close>
wenzelm@62106
  1118
  together with type \<open>\<sigma> = \<tau>\<^sub>1 \<Rightarrow> \<dots> \<tau>\<^sub>n \<Rightarrow> \<tau>\<close> and specify a grammar production.
wenzelm@62106
  1119
  The \<open>template\<close> contains delimiter tokens that surround \<open>n\<close> argument
wenzelm@62106
  1120
  positions (\<^verbatim>\<open>_\<close>). The latter correspond to nonterminal symbols \<open>A\<^sub>i\<close> derived
wenzelm@62106
  1121
  from the argument types \<open>\<tau>\<^sub>i\<close> as follows:
wenzelm@46292
  1122
wenzelm@61493
  1123
    \<^item> \<open>prop\<close> if \<open>\<tau>\<^sub>i = prop\<close>
wenzelm@46292
  1124
wenzelm@62106
  1125
    \<^item> \<open>logic\<close> if \<open>\<tau>\<^sub>i = (\<dots>)\<kappa>\<close> for logical type constructor \<open>\<kappa> \<noteq> prop\<close>
wenzelm@46292
  1126
wenzelm@61493
  1127
    \<^item> \<open>any\<close> if \<open>\<tau>\<^sub>i = \<alpha>\<close> for type variables
wenzelm@46292
  1128
wenzelm@62106
  1129
    \<^item> \<open>\<kappa>\<close> if \<open>\<tau>\<^sub>i = \<kappa>\<close> for nonterminal \<open>\<kappa>\<close> (syntactic type constructor)
wenzelm@46292
  1130
wenzelm@62106
  1131
  Each \<open>A\<^sub>i\<close> is decorated by priority \<open>p\<^sub>i\<close> from the given list \<open>ps\<close>; missing
wenzelm@62106
  1132
  priorities default to 0.
wenzelm@46292
  1133
wenzelm@62106
  1134
  The resulting nonterminal of the production is determined similarly from
wenzelm@62106
  1135
  type \<open>\<tau>\<close>, with priority \<open>q\<close> and default 1000.
wenzelm@46292
  1136
wenzelm@61421
  1137
  \<^medskip>
wenzelm@62106
  1138
  Parsing via this production produces parse trees \<open>t\<^sub>1, \<dots>, t\<^sub>n\<close> for the
wenzelm@62106
  1139
  argument slots. The resulting parse tree is composed as \<open>c t\<^sub>1 \<dots> t\<^sub>n\<close>, by
wenzelm@62106
  1140
  using the syntax constant \<open>c\<close> of the syntax declaration.
wenzelm@46292
  1141
wenzelm@62106
  1142
  Such syntactic constants are invented on the spot, without formal check
wenzelm@62106
  1143
  wrt.\ existing declarations. It is conventional to use plain identifiers
wenzelm@62106
  1144
  prefixed by a single underscore (e.g.\ \<open>_foobar\<close>). Names should be chosen
wenzelm@62106
  1145
  with care, to avoid clashes with other syntax declarations.
wenzelm@46292
  1146
wenzelm@61421
  1147
  \<^medskip>
wenzelm@62106
  1148
  The special case of copy production is specified by \<open>c =\<close>~\<^verbatim>\<open>""\<close> (empty
wenzelm@62106
  1149
  string). It means that the resulting parse tree \<open>t\<close> is copied directly,
wenzelm@62106
  1150
  without any further decoration.
wenzelm@46282
  1151
wenzelm@62106
  1152
  \<^descr> @{command "no_syntax"}~\<open>(mode) decls\<close> removes grammar declarations (and
wenzelm@62106
  1153
  translations) resulting from \<open>decls\<close>, which are interpreted in the same
wenzelm@62106
  1154
  manner as for @{command "syntax"} above.
wenzelm@62106
  1155
wenzelm@62106
  1156
  \<^descr> @{command "translations"}~\<open>rules\<close> specifies syntactic translation rules
wenzelm@62106
  1157
  (i.e.\ macros) as first-order rewrite rules on ASTs (\secref{sec:ast}). The
wenzelm@62106
  1158
  theory context maintains two independent lists translation rules: parse
wenzelm@62106
  1159
  rules (\<^verbatim>\<open>=>\<close> or \<open>\<rightharpoonup>\<close>) and print rules (\<^verbatim>\<open><=\<close> or \<open>\<leftharpoondown>\<close>). For convenience, both
wenzelm@62106
  1160
  can be specified simultaneously as parse~/ print rules (\<^verbatim>\<open>==\<close> or \<open>\<rightleftharpoons>\<close>).
wenzelm@46282
  1161
wenzelm@62106
  1162
  Translation patterns may be prefixed by the syntactic category to be used
wenzelm@62106
  1163
  for parsing; the default is \<open>logic\<close> which means that regular term syntax is
wenzelm@62106
  1164
  used. Both sides of the syntax translation rule undergo parsing and parse
wenzelm@62106
  1165
  AST translations \secref{sec:tr-funs}, in order to perform some fundamental
wenzelm@62106
  1166
  normalization like \<open>\<lambda>x y. b \<leadsto> \<lambda>x. \<lambda>y. b\<close>, but other AST translation rules
wenzelm@62106
  1167
  are \<^emph>\<open>not\<close> applied recursively here.
wenzelm@48115
  1168
wenzelm@62106
  1169
  When processing AST patterns, the inner syntax lexer runs in a different
wenzelm@62106
  1170
  mode that allows identifiers to start with underscore. This accommodates the
wenzelm@62106
  1171
  usual naming convention for auxiliary syntax constants --- those that do not
wenzelm@62106
  1172
  have a logical counter part --- by allowing to specify arbitrary AST
wenzelm@62106
  1173
  applications within the term syntax, independently of the corresponding
wenzelm@62106
  1174
  concrete syntax.
wenzelm@48115
  1175
wenzelm@48115
  1176
  Atomic ASTs are distinguished as @{ML Ast.Constant} versus @{ML
wenzelm@62106
  1177
  Ast.Variable} as follows: a qualified name or syntax constant declared via
wenzelm@62106
  1178
  @{command syntax}, or parse tree head of concrete notation becomes @{ML
wenzelm@62106
  1179
  Ast.Constant}, anything else @{ML Ast.Variable}. Note that \<open>CONST\<close> and
wenzelm@62106
  1180
  \<open>XCONST\<close> within the term language (\secref{sec:pure-grammar}) allow to
wenzelm@62106
  1181
  enforce treatment as constants.
wenzelm@48115
  1182
wenzelm@62106
  1183
  AST rewrite rules \<open>(lhs, rhs)\<close> need to obey the following side-conditions:
wenzelm@48115
  1184
wenzelm@62106
  1185
    \<^item> Rules must be left linear: \<open>lhs\<close> must not contain repeated
wenzelm@62106
  1186
    variables.\<^footnote>\<open>The deeper reason for this is that AST equality is not
wenzelm@62106
  1187
    well-defined: different occurrences of the ``same'' AST could be decorated
wenzelm@62106
  1188
    differently by accidental type-constraints or source position information,
wenzelm@62106
  1189
    for example.\<close>
wenzelm@48115
  1190
wenzelm@61493
  1191
    \<^item> Every variable in \<open>rhs\<close> must also occur in \<open>lhs\<close>.
wenzelm@48115
  1192
wenzelm@62106
  1193
  \<^descr> @{command "no_translations"}~\<open>rules\<close> removes syntactic translation rules,
wenzelm@62106
  1194
  which are interpreted in the same manner as for @{command "translations"}
wenzelm@62106
  1195
  above.
wenzelm@28762
  1196
wenzelm@62106
  1197
  \<^descr> @{attribute syntax_ast_trace} and @{attribute syntax_ast_stats} control
wenzelm@62106
  1198
  diagnostic output in the AST normalization process, when translation rules
wenzelm@62106
  1199
  are applied to concrete input or output.
wenzelm@48117
  1200
wenzelm@46293
  1201
wenzelm@62106
  1202
  Raw syntax and translations provides a slightly more low-level access to the
wenzelm@62106
  1203
  grammar and the form of resulting parse trees. It is often possible to avoid
wenzelm@62106
  1204
  this untyped macro mechanism, and use type-safe @{command abbreviation} or
wenzelm@62106
  1205
  @{command notation} instead. Some important situations where @{command
wenzelm@62106
  1206
  syntax} and @{command translations} are really need are as follows:
wenzelm@46293
  1207
wenzelm@62106
  1208
  \<^item> Iterated replacement via recursive @{command translations}. For example,
wenzelm@62106
  1209
  consider list enumeration @{term "[a, b, c, d]"} as defined in theory
wenzelm@62106
  1210
  @{theory List} in Isabelle/HOL.
wenzelm@46293
  1211
wenzelm@62106
  1212
  \<^item> Change of binding status of variables: anything beyond the built-in
wenzelm@62106
  1213
  @{keyword "binder"} mixfix annotation requires explicit syntax translations.
wenzelm@62106
  1214
  For example, consider list filter comprehension @{term "[x \<leftarrow> xs . P]"} as
wenzelm@62106
  1215
  defined in theory @{theory List} in Isabelle/HOL.
wenzelm@61458
  1216
\<close>
wenzelm@46293
  1217
wenzelm@28762
  1218
wenzelm@58618
  1219
subsubsection \<open>Applying translation rules\<close>
wenzelm@48117
  1220
wenzelm@62106
  1221
text \<open>
wenzelm@62106
  1222
  As a term is being parsed or printed, an AST is generated as an intermediate
wenzelm@62106
  1223
  form according to \figref{fig:parse-print}. The AST is normalized by
wenzelm@62106
  1224
  applying translation rules in the manner of a first-order term rewriting
wenzelm@62106
  1225
  system. We first examine how a single rule is applied.
wenzelm@48117
  1226
wenzelm@62106
  1227
  Let \<open>t\<close> be the abstract syntax tree to be normalized and \<open>(lhs, rhs)\<close> some
wenzelm@62106
  1228
  translation rule. A subtree \<open>u\<close> of \<open>t\<close> is called \<^emph>\<open>redex\<close> if it is an
wenzelm@62106
  1229
  instance of \<open>lhs\<close>; in this case the pattern \<open>lhs\<close> is said to match the
wenzelm@62106
  1230
  object \<open>u\<close>. A redex matched by \<open>lhs\<close> may be replaced by the corresponding
wenzelm@62106
  1231
  instance of \<open>rhs\<close>, thus \<^emph>\<open>rewriting\<close> the AST \<open>t\<close>. Matching requires some
wenzelm@62106
  1232
  notion of \<^emph>\<open>place-holders\<close> in rule patterns: @{ML Ast.Variable} serves this
wenzelm@62106
  1233
  purpose.
wenzelm@48117
  1234
wenzelm@62106
  1235
  More precisely, the matching of the object \<open>u\<close> against the pattern \<open>lhs\<close> is
wenzelm@62106
  1236
  performed as follows:
wenzelm@48117
  1237
wenzelm@62106
  1238
    \<^item> Objects of the form @{ML Ast.Variable}~\<open>x\<close> or @{ML Ast.Constant}~\<open>x\<close> are
wenzelm@62106
  1239
    matched by pattern @{ML Ast.Constant}~\<open>x\<close>. Thus all atomic ASTs in the
wenzelm@62106
  1240
    object are treated as (potential) constants, and a successful match makes
wenzelm@62106
  1241
    them actual constants even before name space resolution (see also
wenzelm@62106
  1242
    \secref{sec:ast}).
wenzelm@48117
  1243
wenzelm@62106
  1244
    \<^item> Object \<open>u\<close> is matched by pattern @{ML Ast.Variable}~\<open>x\<close>, binding \<open>x\<close> to
wenzelm@62106
  1245
    \<open>u\<close>.
wenzelm@48117
  1246
wenzelm@62106
  1247
    \<^item> Object @{ML Ast.Appl}~\<open>us\<close> is matched by @{ML Ast.Appl}~\<open>ts\<close> if \<open>us\<close> and
wenzelm@62106
  1248
    \<open>ts\<close> have the same length and each corresponding subtree matches.
wenzelm@48117
  1249
wenzelm@62106
  1250
    \<^item> In every other case, matching fails.
wenzelm@48117
  1251
wenzelm@62106
  1252
  A successful match yields a substitution that is applied to \<open>rhs\<close>,
wenzelm@62106
  1253
  generating the instance that replaces \<open>u\<close>.
wenzelm@48117
  1254
wenzelm@62106
  1255
  Normalizing an AST involves repeatedly applying translation rules until none
wenzelm@62106
  1256
  are applicable. This works yoyo-like: top-down, bottom-up, top-down, etc. At
wenzelm@62106
  1257
  each subtree position, rules are chosen in order of appearance in the theory
wenzelm@62106
  1258
  definitions.
wenzelm@48117
  1259
wenzelm@62106
  1260
  The configuration options @{attribute syntax_ast_trace} and @{attribute
wenzelm@62106
  1261
  syntax_ast_stats} might help to understand this process and diagnose
wenzelm@62106
  1262
  problems.
wenzelm@48117
  1263
wenzelm@48117
  1264
  \begin{warn}
wenzelm@62106
  1265
  If syntax translation rules work incorrectly, the output of @{command_ref
wenzelm@62106
  1266
  print_syntax} with its \<^emph>\<open>rules\<close> sections reveals the actual internal forms
wenzelm@62106
  1267
  of AST pattern, without potentially confusing concrete syntax. Recall that
wenzelm@62106
  1268
  AST constants appear as quoted strings and variables without quotes.
wenzelm@48117
  1269
  \end{warn}
wenzelm@48117
  1270
wenzelm@48117
  1271
  \begin{warn}
wenzelm@62106
  1272
  If @{attribute_ref eta_contract} is set to \<open>true\<close>, terms will be
wenzelm@62106
  1273
  \<open>\<eta>\<close>-contracted \<^emph>\<open>before\<close> the AST rewriter sees them. Thus some abstraction
wenzelm@62106
  1274
  nodes needed for print rules to match may vanish. For example, \<open>Ball A (\<lambda>x.
wenzelm@62106
  1275
  P x)\<close> would contract to \<open>Ball A P\<close> and the standard print rule would fail to
wenzelm@62106
  1276
  apply. This problem can be avoided by hand-written ML translation functions
wenzelm@62106
  1277
  (see also \secref{sec:tr-funs}), which is in fact the same mechanism used in
wenzelm@62106
  1278
  built-in @{keyword "binder"} declarations.
wenzelm@48117
  1279
  \end{warn}
wenzelm@58618
  1280
\<close>
wenzelm@48117
  1281
wenzelm@28762
  1282
wenzelm@58618
  1283
subsection \<open>Syntax translation functions \label{sec:tr-funs}\<close>
wenzelm@28762
  1284
wenzelm@58618
  1285
text \<open>
wenzelm@28762
  1286
  \begin{matharray}{rcl}
wenzelm@61493
  1287
    @{command_def "parse_ast_translation"} & : & \<open>theory \<rightarrow> theory\<close> \\
wenzelm@61493
  1288
    @{command_def "parse_translation"} & : & \<open>theory \<rightarrow> theory\<close> \\
wenzelm@61493
  1289
    @{command_def "print_translation"} & : & \<open>theory \<rightarrow> theory\<close> \\
wenzelm@61493
  1290
    @{command_def "typed_print_translation"} & : & \<open>theory \<rightarrow> theory\<close> \\
wenzelm@61493
  1291
    @{command_def "print_ast_translation"} & : & \<open>theory \<rightarrow> theory\<close> \\
wenzelm@61493
  1292
    @{ML_antiquotation_def "class_syntax"} & : & \<open>ML antiquotation\<close> \\
wenzelm@61493
  1293
    @{ML_antiquotation_def "type_syntax"} & : & \<open>ML antiquotation\<close> \\
wenzelm@61493
  1294
    @{ML_antiquotation_def "const_syntax"} & : & \<open>ML antiquotation\<close> \\
wenzelm@61493
  1295
    @{ML_antiquotation_def "syntax_const"} & : & \<open>ML antiquotation\<close> \\
wenzelm@28762
  1296
  \end{matharray}
wenzelm@28762
  1297
wenzelm@48118
  1298
  Syntax translation functions written in ML admit almost arbitrary
wenzelm@48118
  1299
  manipulations of inner syntax, at the expense of some complexity and
wenzelm@48118
  1300
  obscurity in the implementation.
wenzelm@48118
  1301
wenzelm@55112
  1302
  @{rail \<open>
wenzelm@42596
  1303
  ( @@{command parse_ast_translation} | @@{command parse_translation} |
wenzelm@42596
  1304
    @@{command print_translation} | @@{command typed_print_translation} |
wenzelm@52143
  1305
    @@{command print_ast_translation}) @{syntax text}
wenzelm@48119
  1306
  ;
wenzelm@48119
  1307
  (@@{ML_antiquotation class_syntax} |
wenzelm@48119
  1308
   @@{ML_antiquotation type_syntax} |
wenzelm@48119
  1309
   @@{ML_antiquotation const_syntax} |
wenzelm@48119
  1310
   @@{ML_antiquotation syntax_const}) name
wenzelm@55112
  1311
  \<close>}
wenzelm@28762
  1312
wenzelm@62106
  1313
  \<^descr> @{command parse_translation} etc. declare syntax translation functions to
wenzelm@62106
  1314
  the theory. Any of these commands have a single @{syntax text} argument that
wenzelm@62106
  1315
  refers to an ML expression of appropriate type as follows:
wenzelm@48118
  1316
wenzelm@61421
  1317
  \<^medskip>
wenzelm@48119
  1318
  {\footnotesize
wenzelm@52143
  1319
  \begin{tabular}{l}
wenzelm@52143
  1320
  @{command parse_ast_translation} : \\
wenzelm@52143
  1321
  \quad @{ML_type "(string * (Proof.context -> Ast.ast list -> Ast.ast)) list"} \\
wenzelm@52143
  1322
  @{command parse_translation} : \\
wenzelm@52143
  1323
  \quad @{ML_type "(string * (Proof.context -> term list -> term)) list"} \\
wenzelm@52143
  1324
  @{command print_translation} : \\
wenzelm@52143
  1325
  \quad @{ML_type "(string * (Proof.context -> term list -> term)) list"} \\
wenzelm@52143
  1326
  @{command typed_print_translation} : \\
wenzelm@52143
  1327
  \quad @{ML_type "(string * (Proof.context -> typ -> term list -> term)) list"} \\
wenzelm@52143
  1328
  @{command print_ast_translation} : \\
wenzelm@52143
  1329
  \quad @{ML_type "(string * (Proof.context -> Ast.ast list -> Ast.ast)) list"} \\
wenzelm@48118
  1330
  \end{tabular}}
wenzelm@61421
  1331
  \<^medskip>
wenzelm@28762
  1332
wenzelm@62106
  1333
  The argument list consists of \<open>(c, tr)\<close> pairs, where \<open>c\<close> is the syntax name
wenzelm@62106
  1334
  of the formal entity involved, and \<open>tr\<close> a function that translates a syntax
wenzelm@62106
  1335
  form \<open>c args\<close> into \<open>tr ctxt args\<close> (depending on the context). The
wenzelm@62106
  1336
  Isabelle/ML naming convention for parse translations is \<open>c_tr\<close> and for print
wenzelm@62106
  1337
  translations \<open>c_tr'\<close>.
wenzelm@48118
  1338
wenzelm@48118
  1339
  The @{command_ref print_syntax} command displays the sets of names
wenzelm@62106
  1340
  associated with the translation functions of a theory under
wenzelm@62106
  1341
  \<open>parse_ast_translation\<close> etc.
wenzelm@48118
  1342
wenzelm@62106
  1343
  \<^descr> \<open>@{class_syntax c}\<close>, \<open>@{type_syntax c}\<close>, \<open>@{const_syntax c}\<close> inline the
wenzelm@62106
  1344
  authentic syntax name of the given formal entities into the ML source. This
wenzelm@62106
  1345
  is the fully-qualified logical name prefixed by a special marker to indicate
wenzelm@62106
  1346
  its kind: thus different logical name spaces are properly distinguished
wenzelm@62106
  1347
  within parse trees.
wenzelm@48119
  1348
wenzelm@62106
  1349
  \<^descr> \<open>@{const_syntax c}\<close> inlines the name \<open>c\<close> of the given syntax constant,
wenzelm@62106
  1350
  having checked that it has been declared via some @{command syntax} commands
wenzelm@62106
  1351
  within the theory context. Note that the usual naming convention makes
wenzelm@62106
  1352
  syntax constants start with underscore, to reduce the chance of accidental
wenzelm@62106
  1353
  clashes with other names occurring in parse trees (unqualified constants
wenzelm@62106
  1354
  etc.).
wenzelm@58618
  1355
\<close>
wenzelm@48118
  1356
wenzelm@48119
  1357
wenzelm@58618
  1358
subsubsection \<open>The translation strategy\<close>
wenzelm@28762
  1359
wenzelm@62106
  1360
text \<open>
wenzelm@62106
  1361
  The different kinds of translation functions are invoked during the
wenzelm@62106
  1362
  transformations between parse trees, ASTs and syntactic terms (cf.\
wenzelm@62106
  1363
  \figref{fig:parse-print}). Whenever a combination of the form \<open>c x\<^sub>1 \<dots> x\<^sub>n\<close>
wenzelm@62106
  1364
  is encountered, and a translation function \<open>f\<close> of appropriate kind is
wenzelm@62106
  1365
  declared for \<open>c\<close>, the result is produced by evaluation of \<open>f [x\<^sub>1, \<dots>, x\<^sub>n]\<close>
wenzelm@62106
  1366
  in ML.
wenzelm@48118
  1367
wenzelm@62106
  1368
  For AST translations, the arguments \<open>x\<^sub>1, \<dots>, x\<^sub>n\<close> are ASTs. A combination
wenzelm@62106
  1369
  has the form @{ML "Ast.Constant"}~\<open>c\<close> or @{ML "Ast.Appl"}~\<open>[\<close>@{ML
wenzelm@62106
  1370
  Ast.Constant}~\<open>c, x\<^sub>1, \<dots>, x\<^sub>n]\<close>. For term translations, the arguments are
wenzelm@62106
  1371
  terms and a combination has the form @{ML Const}~\<open>(c, \<tau>)\<close> or @{ML
wenzelm@62106
  1372
  Const}~\<open>(c, \<tau>) $ x\<^sub>1 $ \<dots> $ x\<^sub>n\<close>. Terms allow more sophisticated
wenzelm@62106
  1373
  transformations than ASTs do, typically involving abstractions and bound
wenzelm@62106
  1374
  variables. \<^emph>\<open>Typed\<close> print translations may even peek at the type \<open>\<tau>\<close> of the
wenzelm@62106
  1375
  constant they are invoked on, although some information might have been
wenzelm@62106
  1376
  suppressed for term output already.
wenzelm@48118
  1377
wenzelm@62106
  1378
  Regardless of whether they act on ASTs or terms, translation functions
wenzelm@62106
  1379
  called during the parsing process differ from those for printing in their
wenzelm@62106
  1380
  overall behaviour:
wenzelm@48118
  1381
wenzelm@62106
  1382
    \<^descr>[Parse translations] are applied bottom-up. The arguments are already in
wenzelm@62106
  1383
    translated form. The translations must not fail; exceptions trigger an
wenzelm@62106
  1384
    error message. There may be at most one function associated with any
wenzelm@62106
  1385
    syntactic name.
wenzelm@46294
  1386
wenzelm@62106
  1387
    \<^descr>[Print translations] are applied top-down. They are supplied with
wenzelm@62106
  1388
    arguments that are partly still in internal form. The result again
wenzelm@62106
  1389
    undergoes translation; therefore a print translation should not introduce
wenzelm@62106
  1390
    as head the very constant that invoked it. The function may raise
wenzelm@62106
  1391
    exception @{ML Match} to indicate failure; in this event it has no effect.
wenzelm@62106
  1392
    Multiple functions associated with some syntactic name are tried in the
wenzelm@62106
  1393
    order of declaration in the theory.
wenzelm@48118
  1394
wenzelm@62106
  1395
  Only constant atoms --- constructor @{ML Ast.Constant} for ASTs and @{ML
wenzelm@62106
  1396
  Const} for terms --- can invoke translation functions. This means that parse
wenzelm@62106
  1397
  translations can only be associated with parse tree heads of concrete
wenzelm@62106
  1398
  syntax, or syntactic constants introduced via other translations. For plain
wenzelm@62106
  1399
  identifiers within the term language, the status of constant versus variable
wenzelm@62106
  1400
  is not yet know during parsing. This is in contrast to print translations,
wenzelm@62106
  1401
  where constants are explicitly known from the given term in its fully
wenzelm@62106
  1402
  internal form.
wenzelm@58618
  1403
\<close>
wenzelm@28762
  1404
wenzelm@52414
  1405
wenzelm@58618
  1406
subsection \<open>Built-in syntax transformations\<close>
wenzelm@52414
  1407
wenzelm@58618
  1408
text \<open>
wenzelm@62106
  1409
  Here are some further details of the main syntax transformation phases of
wenzelm@62106
  1410
  \figref{fig:parse-print}.
wenzelm@58618
  1411
\<close>
wenzelm@52414
  1412
wenzelm@52414
  1413
wenzelm@58618
  1414
subsubsection \<open>Transforming parse trees to ASTs\<close>
wenzelm@52414
  1415
wenzelm@62106
  1416
text \<open>
wenzelm@62106
  1417
  The parse tree is the raw output of the parser. It is transformed into an
wenzelm@62106
  1418
  AST according to some basic scheme that may be augmented by AST translation
wenzelm@62106
  1419
  functions as explained in \secref{sec:tr-funs}.
wenzelm@52414
  1420
wenzelm@52414
  1421
  The parse tree is constructed by nesting the right-hand sides of the
wenzelm@62106
  1422
  productions used to recognize the input. Such parse trees are simply lists
wenzelm@62106
  1423
  of tokens and constituent parse trees, the latter representing the
wenzelm@62106
  1424
  nonterminals of the productions. Ignoring AST translation functions, parse
wenzelm@62106
  1425
  trees are transformed to ASTs by stripping out delimiters and copy
wenzelm@62106
  1426
  productions, while retaining some source position information from input
wenzelm@62106
  1427
  tokens.
wenzelm@52414
  1428
wenzelm@62106
  1429
  The Pure syntax provides predefined AST translations to make the basic
wenzelm@62106
  1430
  \<open>\<lambda>\<close>-term structure more apparent within the (first-order) AST
wenzelm@62106
  1431
  representation, and thus facilitate the use of @{command translations} (see
wenzelm@62106
  1432
  also \secref{sec:syn-trans}). This covers ordinary term application, type
wenzelm@62106
  1433
  application, nested abstraction, iterated meta implications and function
wenzelm@62106
  1434
  types. The effect is illustrated on some representative input strings is as
wenzelm@52414
  1435
  follows:
wenzelm@52414
  1436
wenzelm@52414
  1437
  \begin{center}
wenzelm@52414
  1438
  \begin{tabular}{ll}
wenzelm@52414
  1439
  input source & AST \\
wenzelm@52414
  1440
  \hline
wenzelm@61503
  1441
  \<open>f x y z\<close> & \<^verbatim>\<open>(f x y z)\<close> \\
wenzelm@61503
  1442
  \<open>'a ty\<close> & \<^verbatim>\<open>(ty 'a)\<close> \\
wenzelm@61503
  1443
  \<open>('a, 'b)ty\<close> & \<^verbatim>\<open>(ty 'a 'b)\<close> \\
wenzelm@61503
  1444
  \<open>\<lambda>x y z. t\<close> & \<^verbatim>\<open>("_abs" x ("_abs" y ("_abs" z t)))\<close> \\
wenzelm@61503
  1445
  \<open>\<lambda>x :: 'a. t\<close> & \<^verbatim>\<open>("_abs" ("_constrain" x 'a) t)\<close> \\
wenzelm@61503
  1446
  \<open>\<lbrakk>P; Q; R\<rbrakk> \<Longrightarrow> S\<close> & \<^verbatim>\<open>("Pure.imp" P ("Pure.imp" Q ("Pure.imp" R S)))\<close> \\
wenzelm@61503
  1447
   \<open>['a, 'b, 'c] \<Rightarrow> 'd\<close> & \<^verbatim>\<open>("fun" 'a ("fun" 'b ("fun" 'c 'd)))\<close> \\
wenzelm@52414
  1448
  \end{tabular}
wenzelm@52414
  1449
  \end{center}
wenzelm@52414
  1450
wenzelm@52414
  1451
  Note that type and sort constraints may occur in further places ---
wenzelm@62106
  1452
  translations need to be ready to cope with them. The built-in syntax
wenzelm@62106
  1453
  transformation from parse trees to ASTs insert additional constraints that
wenzelm@62106
  1454
  represent source positions.
wenzelm@58618
  1455
\<close>
wenzelm@52414
  1456
wenzelm@52414
  1457
wenzelm@58618
  1458
subsubsection \<open>Transforming ASTs to terms\<close>
wenzelm@52414
  1459
wenzelm@62106
  1460
text \<open>
wenzelm@62106
  1461
  After application of macros (\secref{sec:syn-trans}), the AST is transformed
wenzelm@62106
  1462
  into a term. This term still lacks proper type information, but it might
wenzelm@62106
  1463
  contain some constraints consisting of applications with head \<^verbatim>\<open>_constrain\<close>,
wenzelm@62106
  1464
  where the second argument is a type encoded as a pre-term within the syntax.
wenzelm@62106
  1465
  Type inference later introduces correct types, or indicates type errors in
wenzelm@62106
  1466
  the input.
wenzelm@52414
  1467
wenzelm@62106
  1468
  Ignoring parse translations, ASTs are transformed to terms by mapping AST
wenzelm@62106
  1469
  constants to term constants, AST variables to term variables or constants
wenzelm@62106
  1470
  (according to the name space), and AST applications to iterated term
wenzelm@62106
  1471
  applications.
wenzelm@52414
  1472
wenzelm@62106
  1473
  The outcome is still a first-order term. Proper abstractions and bound
wenzelm@62106
  1474
  variables are introduced by parse translations associated with certain
wenzelm@62106
  1475
  syntax constants. Thus \<^verbatim>\<open>("_abs" x x)\<close> eventually becomes a de-Bruijn term
wenzelm@62106
  1476
  \<^verbatim>\<open>Abs ("x", _, Bound 0)\<close>.
wenzelm@58618
  1477
\<close>
wenzelm@52414
  1478
wenzelm@52414
  1479
wenzelm@58618
  1480
subsubsection \<open>Printing of terms\<close>
wenzelm@52414
  1481
wenzelm@62106
  1482
text \<open>
wenzelm@62106
  1483
  The output phase is essentially the inverse of the input phase. Terms are
wenzelm@62106
  1484
  translated via abstract syntax trees into pretty-printed text.
wenzelm@52414
  1485
wenzelm@52414
  1486
  Ignoring print translations, the transformation maps term constants,
wenzelm@52414
  1487
  variables and applications to the corresponding constructs on ASTs.
wenzelm@62106
  1488
  Abstractions are mapped to applications of the special constant \<^verbatim>\<open>_abs\<close> as
wenzelm@62106
  1489
  seen before. Type constraints are represented via special \<^verbatim>\<open>_constrain\<close>
wenzelm@62106
  1490
  forms, according to various policies of type annotation determined
wenzelm@62106
  1491
  elsewhere. Sort constraints of type variables are handled in a similar
wenzelm@62106
  1492
  fashion.
wenzelm@52414
  1493
wenzelm@62106
  1494
  After application of macros (\secref{sec:syn-trans}), the AST is finally
wenzelm@62106
  1495
  pretty-printed. The built-in print AST translations reverse the
wenzelm@62106
  1496
  corresponding parse AST translations.
wenzelm@52414
  1497
wenzelm@61421
  1498
  \<^medskip>
wenzelm@61421
  1499
  For the actual printing process, the priority grammar
wenzelm@62106
  1500
  (\secref{sec:priority-grammar}) plays a vital role: productions are used as
wenzelm@62106
  1501
  templates for pretty printing, with argument slots stemming from
wenzelm@62106
  1502
  nonterminals, and syntactic sugar stemming from literal tokens.
wenzelm@52414
  1503
wenzelm@62106
  1504
  Each AST application with constant head \<open>c\<close> and arguments \<open>t\<^sub>1\<close>, \dots,
wenzelm@62106
  1505
  \<open>t\<^sub>n\<close> (for \<open>n = 0\<close> the AST is just the constant \<open>c\<close> itself) is printed
wenzelm@62106
  1506
  according to the first grammar production of result name \<open>c\<close>. The required
wenzelm@62106
  1507
  syntax priority of the argument slot is given by its nonterminal \<open>A\<^sup>(\<^sup>p\<^sup>)\<close>.
wenzelm@62106
  1508
  The argument \<open>t\<^sub>i\<close> that corresponds to the position of \<open>A\<^sup>(\<^sup>p\<^sup>)\<close> is printed
wenzelm@62106
  1509
  recursively, and then put in parentheses \<^emph>\<open>if\<close> its priority \<open>p\<close> requires
wenzelm@62106
  1510
  this. The resulting output is concatenated with the syntactic sugar
wenzelm@62106
  1511
  according to the grammar production.
wenzelm@52414
  1512
wenzelm@62106
  1513
  If an AST application \<open>(c x\<^sub>1 \<dots> x\<^sub>m)\<close> has more arguments than the
wenzelm@62106
  1514
  corresponding production, it is first split into \<open>((c x\<^sub>1 \<dots> x\<^sub>n) x\<^sub>n\<^sub>+\<^sub>1 \<dots>
wenzelm@62106
  1515
  x\<^sub>m)\<close> and then printed recursively as above.
wenzelm@52414
  1516
wenzelm@62106
  1517
  Applications with too few arguments or with non-constant head or without a
wenzelm@62106
  1518
  corresponding production are printed in prefix-form like \<open>f t\<^sub>1 \<dots> t\<^sub>n\<close> for
wenzelm@62106
  1519
  terms.
wenzelm@52414
  1520
wenzelm@62106
  1521
  Multiple productions associated with some name \<open>c\<close> are tried in order of
wenzelm@62106
  1522
  appearance within the grammar. An occurrence of some AST variable \<open>x\<close> is
wenzelm@62106
  1523
  printed as \<open>x\<close> outright.
wenzelm@52414
  1524
wenzelm@61421
  1525
  \<^medskip>
wenzelm@62106
  1526
  White space is \<^emph>\<open>not\<close> inserted automatically. If blanks (or breaks) are
wenzelm@62106
  1527
  required to separate tokens, they need to be specified in the mixfix
wenzelm@62106
  1528
  declaration (\secref{sec:mixfix}).
wenzelm@58618
  1529
\<close>
wenzelm@52414
  1530
wenzelm@28762
  1531
end