| author | wenzelm | 
| Tue, 18 Apr 2017 19:17:46 +0200 | |
| changeset 65507 | decdb95bd007 | 
| parent 63933 | e149e3e320a3 | 
| child 67146 | 909dcdec2122 | 
| permissions | -rw-r--r-- | 
| 61656 | 1 | (*:maxLineLen=78:*) | 
| 2 | ||
| 28762 | 3 | theory Inner_Syntax | 
| 63531 | 4 | imports Main Base | 
| 28762 | 5 | begin | 
| 6 | ||
| 58618 | 7 | chapter \<open>Inner syntax --- the term language \label{ch:inner-syntax}\<close>
 | 
| 28762 | 8 | |
| 62106 | 9 | text \<open> | 
| 10 | The inner syntax of Isabelle provides concrete notation for the main | |
| 11 | entities of the logical framework, notably \<open>\<lambda>\<close>-terms with types and type | |
| 12 | classes. Applications may either extend existing syntactic categories by | |
| 13 | additional notation, or define new sub-languages that are linked to the | |
| 14 | standard term language via some explicit markers. For example \<^verbatim>\<open>FOO\<close>~\<open>foo\<close> | |
| 15 | could embed the syntax corresponding for some user-defined nonterminal \<open>foo\<close> | |
| 16 | --- within the bounds of the given lexical syntax of Isabelle/Pure. | |
| 46282 | 17 | |
| 62106 | 18 | The most basic way to specify concrete syntax for logical entities works via | 
| 19 |   mixfix annotations (\secref{sec:mixfix}), which may be usually given as part
 | |
| 20 | of the original declaration or via explicit notation commands later on | |
| 21 |   (\secref{sec:notation}). This already covers many needs of concrete syntax
 | |
| 22 | without having to understand the full complexity of inner syntax layers. | |
| 46282 | 23 | |
| 62106 | 24 | Further details of the syntax engine involves the classical distinction of | 
| 25 |   lexical language versus context-free grammar (see \secref{sec:pure-syntax}),
 | |
| 26 | and various mechanisms for \<^emph>\<open>syntax transformations\<close> (see | |
| 27 |   \secref{sec:syntax-transformations}).
 | |
| 58618 | 28 | \<close> | 
| 46282 | 29 | |
| 30 | ||
| 58618 | 31 | section \<open>Printing logical entities\<close> | 
| 28762 | 32 | |
| 58618 | 33 | subsection \<open>Diagnostic commands \label{sec:print-diag}\<close>
 | 
| 28762 | 34 | |
| 58618 | 35 | text \<open> | 
| 28762 | 36 |   \begin{matharray}{rcl}
 | 
| 61493 | 37 |     @{command_def "typ"}\<open>\<^sup>*\<close> & : & \<open>context \<rightarrow>\<close> \\
 | 
| 38 |     @{command_def "term"}\<open>\<^sup>*\<close> & : & \<open>context \<rightarrow>\<close> \\
 | |
| 39 |     @{command_def "prop"}\<open>\<^sup>*\<close> & : & \<open>context \<rightarrow>\<close> \\
 | |
| 40 |     @{command_def "thm"}\<open>\<^sup>*\<close> & : & \<open>context \<rightarrow>\<close> \\
 | |
| 41 |     @{command_def "prf"}\<open>\<^sup>*\<close> & : & \<open>context \<rightarrow>\<close> \\
 | |
| 42 |     @{command_def "full_prf"}\<open>\<^sup>*\<close> & : & \<open>context \<rightarrow>\<close> \\
 | |
| 43 |     @{command_def "print_state"}\<open>\<^sup>*\<close> & : & \<open>any \<rightarrow>\<close> \\
 | |
| 28762 | 44 |   \end{matharray}
 | 
| 45 | ||
| 46 | These diagnostic commands assist interactive development by printing | |
| 47 | internal logical entities in a human-readable fashion. | |
| 48 | ||
| 55112 
b1a5d603fd12
prefer rail cartouche -- avoid back-slashed quotes;
 wenzelm parents: 
55108diff
changeset | 49 |   @{rail \<open>
 | 
| 48792 | 50 |     @@{command typ} @{syntax modes}? @{syntax type} ('::' @{syntax sort})?
 | 
| 28762 | 51 | ; | 
| 42596 
6c621a9d612a
modernized rail diagrams using @{rail} antiquotation;
 wenzelm parents: 
42358diff
changeset | 52 |     @@{command term} @{syntax modes}? @{syntax term}
 | 
| 28762 | 53 | ; | 
| 42596 
6c621a9d612a
modernized rail diagrams using @{rail} antiquotation;
 wenzelm parents: 
42358diff
changeset | 54 |     @@{command prop} @{syntax modes}? @{syntax prop}
 | 
| 28762 | 55 | ; | 
| 62969 | 56 |     @@{command thm} @{syntax modes}? @{syntax thms}
 | 
| 28762 | 57 | ; | 
| 62969 | 58 |     ( @@{command prf} | @@{command full_prf} ) @{syntax modes}? @{syntax thms}?
 | 
| 28762 | 59 | ; | 
| 52430 | 60 |     @@{command print_state} @{syntax modes}?
 | 
| 28762 | 61 | ; | 
| 42596 
6c621a9d612a
modernized rail diagrams using @{rail} antiquotation;
 wenzelm parents: 
42358diff
changeset | 62 |     @{syntax_def modes}: '(' (@{syntax name} + ) ')'
 | 
| 55112 
b1a5d603fd12
prefer rail cartouche -- avoid back-slashed quotes;
 wenzelm parents: 
55108diff
changeset | 63 | \<close>} | 
| 28762 | 64 | |
| 62106 | 65 |   \<^descr> @{command "typ"}~\<open>\<tau>\<close> reads and prints a type expression according to the
 | 
| 66 | current context. | |
| 48792 | 67 | |
| 62106 | 68 |   \<^descr> @{command "typ"}~\<open>\<tau> :: s\<close> uses type-inference to determine the most
 | 
| 69 | general way to make \<open>\<tau>\<close> conform to sort \<open>s\<close>. For concrete \<open>\<tau>\<close> this checks if | |
| 70 | the type belongs to that sort. Dummy type parameters ``\<open>_\<close>'' (underscore) | |
| 71 | are assigned to fresh type variables with most general sorts, according the | |
| 72 | the principles of type-inference. | |
| 28766 
accab7594b8e
misc tuning and rearrangement of section "Printing logical entities";
 wenzelm parents: 
28765diff
changeset | 73 | |
| 62106 | 74 |     \<^descr> @{command "term"}~\<open>t\<close> and @{command "prop"}~\<open>\<phi>\<close> read, type-check and
 | 
| 75 | print terms or propositions according to the current theory or proof | |
| 76 | context; the inferred type of \<open>t\<close> is output as well. Note that these | |
| 77 | commands are also useful in inspecting the current environment of term | |
| 78 | abbreviations. | |
| 28762 | 79 | |
| 62106 | 80 |     \<^descr> @{command "thm"}~\<open>a\<^sub>1 \<dots> a\<^sub>n\<close> retrieves theorems from the current theory
 | 
| 81 | or proof context. Note that any attributes included in the theorem | |
| 82 | specifications are applied to a temporary context derived from the current | |
| 83 | theory or proof; the result is discarded, i.e.\ attributes involved in | |
| 84 | \<open>a\<^sub>1, \<dots>, a\<^sub>n\<close> do not have any permanent effect. | |
| 28762 | 85 | |
| 62106 | 86 |     \<^descr> @{command "prf"} displays the (compact) proof term of the current proof
 | 
| 63624 
994d1a1105ef
more informative 'prf' and 'full_prf', based on HOL/Proofs/ex/XML_Data.thy;
 wenzelm parents: 
63531diff
changeset | 87 | state (if present), or of the given theorems. Note that this requires an | 
| 
994d1a1105ef
more informative 'prf' and 'full_prf', based on HOL/Proofs/ex/XML_Data.thy;
 wenzelm parents: 
63531diff
changeset | 88 | underlying logic image with proof terms enabled, e.g. \<open>HOL-Proofs\<close>. | 
| 28762 | 89 | |
| 62106 | 90 |     \<^descr> @{command "full_prf"} is like @{command "prf"}, but displays the full
 | 
| 91 | proof term, i.e.\ also displays information omitted in the compact proof | |
| 92 | term, which is denoted by ``\<open>_\<close>'' placeholders there. | |
| 28762 | 93 | |
| 62106 | 94 |     \<^descr> @{command "print_state"} prints the current proof state (if present),
 | 
| 95 | including current facts and goals. | |
| 28762 | 96 | |
| 61997 | 97 | All of the diagnostic commands above admit a list of \<open>modes\<close> to be | 
| 98 | specified, which is appended to the current print mode; see also | |
| 99 |   \secref{sec:print-modes}. Thus the output behavior may be modified according
 | |
| 100 |   particular print mode features. For example, @{command
 | |
| 101 | "print_state"}~\<open>(latex)\<close> prints the current proof state with mathematical | |
| 102 |   symbols and special characters represented in {\LaTeX} source, according to
 | |
| 103 |   the Isabelle style @{cite "isabelle-system"}.
 | |
| 28762 | 104 | |
| 62106 | 105 |   Note that antiquotations (cf.\ \secref{sec:antiq}) provide a more systematic
 | 
| 106 | way to include formal items into the printed text document. | |
| 58618 | 107 | \<close> | 
| 28762 | 108 | |
| 109 | ||
| 58618 | 110 | subsection \<open>Details of printed content\<close> | 
| 28763 
b5e6122ff575
added pretty printing options (from old ref manual);
 wenzelm parents: 
28762diff
changeset | 111 | |
| 58618 | 112 | text \<open> | 
| 42655 | 113 |   \begin{tabular}{rcll}
 | 
| 61493 | 114 |     @{attribute_def show_markup} & : & \<open>attribute\<close> \\
 | 
| 115 |     @{attribute_def show_types} & : & \<open>attribute\<close> & default \<open>false\<close> \\
 | |
| 116 |     @{attribute_def show_sorts} & : & \<open>attribute\<close> & default \<open>false\<close> \\
 | |
| 117 |     @{attribute_def show_consts} & : & \<open>attribute\<close> & default \<open>false\<close> \\
 | |
| 118 |     @{attribute_def show_abbrevs} & : & \<open>attribute\<close> & default \<open>true\<close> \\
 | |
| 119 |     @{attribute_def show_brackets} & : & \<open>attribute\<close> & default \<open>false\<close> \\
 | |
| 120 |     @{attribute_def names_long} & : & \<open>attribute\<close> & default \<open>false\<close> \\
 | |
| 121 |     @{attribute_def names_short} & : & \<open>attribute\<close> & default \<open>false\<close> \\
 | |
| 122 |     @{attribute_def names_unique} & : & \<open>attribute\<close> & default \<open>true\<close> \\
 | |
| 123 |     @{attribute_def eta_contract} & : & \<open>attribute\<close> & default \<open>true\<close> \\
 | |
| 124 |     @{attribute_def goals_limit} & : & \<open>attribute\<close> & default \<open>10\<close> \\
 | |
| 125 |     @{attribute_def show_main_goal} & : & \<open>attribute\<close> & default \<open>false\<close> \\
 | |
| 126 |     @{attribute_def show_hyps} & : & \<open>attribute\<close> & default \<open>false\<close> \\
 | |
| 127 |     @{attribute_def show_tags} & : & \<open>attribute\<close> & default \<open>false\<close> \\
 | |
| 128 |     @{attribute_def show_question_marks} & : & \<open>attribute\<close> & default \<open>true\<close> \\
 | |
| 42655 | 129 |   \end{tabular}
 | 
| 61421 | 130 | \<^medskip> | 
| 28763 
b5e6122ff575
added pretty printing options (from old ref manual);
 wenzelm parents: 
28762diff
changeset | 131 | |
| 62106 | 132 | These configuration options control the detail of information that is | 
| 133 | displayed for types, terms, theorems, goals etc. See also | |
| 42655 | 134 |   \secref{sec:config}.
 | 
| 28765 
da8f6f4a74be
misc tuning and rearrangement of section "Printing logical entities";
 wenzelm parents: 
28763diff
changeset | 135 | |
| 62106 | 136 |   \<^descr> @{attribute show_markup} controls direct inlining of markup into the
 | 
| 137 | printed representation of formal entities --- notably type and sort | |
| 138 | constraints. This enables Prover IDE users to retrieve that information via | |
| 139 | tooltips or popups while hovering with the mouse over the output window, for | |
| 140 | example. Consequently, this option is enabled by default for Isabelle/jEdit. | |
| 49699 | 141 | |
| 62106 | 142 |   \<^descr> @{attribute show_types} and @{attribute show_sorts} control printing of
 | 
| 143 | type constraints for term variables, and sort constraints for type | |
| 144 |   variables. By default, neither of these are shown in output. If @{attribute
 | |
| 145 | show_sorts} is enabled, types are always shown as well. In Isabelle/jEdit, | |
| 146 | manual setting of these options is normally not required thanks to | |
| 147 |   @{attribute show_markup} above.
 | |
| 28763 
b5e6122ff575
added pretty printing options (from old ref manual);
 wenzelm parents: 
28762diff
changeset | 148 | |
| 62106 | 149 | Note that displaying types and sorts may explain why a polymorphic inference | 
| 150 | rule fails to resolve with some goal, or why a rewrite rule does not apply | |
| 151 | as expected. | |
| 28765 
da8f6f4a74be
misc tuning and rearrangement of section "Printing logical entities";
 wenzelm parents: 
28763diff
changeset | 152 | |
| 62106 | 153 |   \<^descr> @{attribute show_consts} controls printing of types of constants when
 | 
| 154 | displaying a goal state. | |
| 28763 
b5e6122ff575
added pretty printing options (from old ref manual);
 wenzelm parents: 
28762diff
changeset | 155 | |
| 62106 | 156 | Note that the output can be enormous, because polymorphic constants often | 
| 157 | occur at several different type instances. | |
| 158 | ||
| 159 |   \<^descr> @{attribute show_abbrevs} controls folding of constant abbreviations.
 | |
| 40879 
ca132ef44944
configuration option "show_abbrevs" supersedes print mode "no_abbrevs", with inverted meaning;
 wenzelm parents: 
40255diff
changeset | 160 | |
| 62106 | 161 |   \<^descr> @{attribute show_brackets} controls bracketing in pretty printed output.
 | 
| 162 | If enabled, all sub-expressions of the pretty printing tree will be | |
| 163 | parenthesized, even if this produces malformed term syntax! This crude way | |
| 164 | of showing the internal structure of pretty printed entities may | |
| 165 | occasionally help to diagnose problems with operator priorities, for | |
| 166 | example. | |
| 28763 
b5e6122ff575
added pretty printing options (from old ref manual);
 wenzelm parents: 
28762diff
changeset | 167 | |
| 62106 | 168 |   \<^descr> @{attribute names_long}, @{attribute names_short}, and @{attribute
 | 
| 169 | names_unique} control the way of printing fully qualified internal names in | |
| 170 |   external form. See also \secref{sec:antiq} for the document antiquotation
 | |
| 171 | options of the same names. | |
| 42358 
b47d41d9f4b5
Name_Space: proper configuration options long_names, short_names, unique_names instead of former unsynchronized references;
 wenzelm parents: 
42279diff
changeset | 172 | |
| 62106 | 173 |   \<^descr> @{attribute eta_contract} controls \<open>\<eta>\<close>-contracted printing of terms.
 | 
| 28763 
b5e6122ff575
added pretty printing options (from old ref manual);
 wenzelm parents: 
28762diff
changeset | 174 | |
| 62106 | 175 |   The \<open>\<eta>\<close>-contraction law asserts @{prop "(\<lambda>x. f x) \<equiv> f"}, provided \<open>x\<close> is not
 | 
| 176 |   free in \<open>f\<close>. It asserts \<^emph>\<open>extensionality\<close> of functions: @{prop "f \<equiv> g"} if
 | |
| 177 |   @{prop "f x \<equiv> g x"} for all \<open>x\<close>. Higher-order unification frequently puts
 | |
| 178 | terms into a fully \<open>\<eta>\<close>-expanded form. For example, if \<open>F\<close> has type \<open>(\<tau> \<Rightarrow> \<tau>) | |
| 179 |   \<Rightarrow> \<tau>\<close> then its expanded form is @{term "\<lambda>h. F (\<lambda>x. h x)"}.
 | |
| 28763 
b5e6122ff575
added pretty printing options (from old ref manual);
 wenzelm parents: 
28762diff
changeset | 180 | |
| 62106 | 181 |   Enabling @{attribute eta_contract} makes Isabelle perform \<open>\<eta>\<close>-contractions
 | 
| 182 |   before printing, so that @{term "\<lambda>h. F (\<lambda>x. h x)"} appears simply as \<open>F\<close>.
 | |
| 28763 
b5e6122ff575
added pretty printing options (from old ref manual);
 wenzelm parents: 
28762diff
changeset | 183 | |
| 62106 | 184 | Note that the distinction between a term and its \<open>\<eta>\<close>-expanded form | 
| 185 | occasionally matters. While higher-order resolution and rewriting operate | |
| 186 | modulo \<open>\<alpha>\<beta>\<eta>\<close>-conversion, some other tools might look at terms more | |
| 187 | discretely. | |
| 28763 
b5e6122ff575
added pretty printing options (from old ref manual);
 wenzelm parents: 
28762diff
changeset | 188 | |
| 62106 | 189 |   \<^descr> @{attribute goals_limit} controls the maximum number of subgoals to be
 | 
| 190 | printed. | |
| 28763 
b5e6122ff575
added pretty printing options (from old ref manual);
 wenzelm parents: 
28762diff
changeset | 191 | |
| 62106 | 192 |   \<^descr> @{attribute show_main_goal} controls whether the main result to be proven
 | 
| 193 | should be displayed. This information might be relevant for schematic goals, | |
| 194 | to inspect the current claim that has been synthesized so far. | |
| 28763 
b5e6122ff575
added pretty printing options (from old ref manual);
 wenzelm parents: 
28762diff
changeset | 195 | |
| 62106 | 196 |   \<^descr> @{attribute show_hyps} controls printing of implicit hypotheses of local
 | 
| 197 | facts. Normally, only those hypotheses are displayed that are \<^emph>\<open>not\<close> covered | |
| 198 | by the assumptions of the current context: this situation indicates a fault | |
| 199 | in some tool being used. | |
| 28763 
b5e6122ff575
added pretty printing options (from old ref manual);
 wenzelm parents: 
28762diff
changeset | 200 | |
| 62106 | 201 |   By enabling @{attribute show_hyps}, output of \<^emph>\<open>all\<close> hypotheses can be
 | 
| 202 | enforced, which is occasionally useful for diagnostic purposes. | |
| 28763 
b5e6122ff575
added pretty printing options (from old ref manual);
 wenzelm parents: 
28762diff
changeset | 203 | |
| 62106 | 204 |   \<^descr> @{attribute show_tags} controls printing of extra annotations within
 | 
| 205 | theorems, such as internal position information, or the case names being | |
| 206 |   attached by the attribute @{attribute case_names}.
 | |
| 28765 
da8f6f4a74be
misc tuning and rearrangement of section "Printing logical entities";
 wenzelm parents: 
28763diff
changeset | 207 | |
| 62106 | 208 |   Note that the @{attribute tagged} and @{attribute untagged} attributes
 | 
| 209 | provide low-level access to the collection of tags associated with a | |
| 210 | theorem. | |
| 28765 
da8f6f4a74be
misc tuning and rearrangement of section "Printing logical entities";
 wenzelm parents: 
28763diff
changeset | 211 | |
| 62106 | 212 |   \<^descr> @{attribute show_question_marks} controls printing of question marks for
 | 
| 213 | schematic variables, such as \<open>?x\<close>. Only the leading question mark is | |
| 214 | affected, the remaining text is unchanged (including proper markup for | |
| 215 | schematic variables that might be relevant for user interfaces). | |
| 58618 | 216 | \<close> | 
| 28765 
da8f6f4a74be
misc tuning and rearrangement of section "Printing logical entities";
 wenzelm parents: 
28763diff
changeset | 217 | |
| 
da8f6f4a74be
misc tuning and rearrangement of section "Printing logical entities";
 wenzelm parents: 
28763diff
changeset | 218 | |
| 58618 | 219 | subsection \<open>Alternative print modes \label{sec:print-modes}\<close>
 | 
| 46284 | 220 | |
| 58618 | 221 | text \<open> | 
| 46284 | 222 |   \begin{mldecls}
 | 
| 223 |     @{index_ML print_mode_value: "unit -> string list"} \\
 | |
| 224 |     @{index_ML Print_Mode.with_modes: "string list -> ('a -> 'b) -> 'a -> 'b"} \\
 | |
| 225 |   \end{mldecls}
 | |
| 226 | ||
| 62106 | 227 | The \<^emph>\<open>print mode\<close> facility allows to modify various operations for printing. | 
| 228 |   Commands like @{command typ}, @{command term}, @{command thm} (see
 | |
| 229 |   \secref{sec:print-diag}) take additional print modes as optional argument.
 | |
| 230 | The underlying ML operations are as follows. | |
| 46284 | 231 | |
| 62106 | 232 |     \<^descr> @{ML "print_mode_value ()"} yields the list of currently active print
 | 
| 233 | mode names. This should be understood as symbolic representation of | |
| 234 | certain individual features for printing (with precedence from left to | |
| 235 | right). | |
| 46284 | 236 | |
| 62106 | 237 |     \<^descr> @{ML Print_Mode.with_modes}~\<open>modes f x\<close> evaluates \<open>f x\<close> in an execution
 | 
| 238 | context where the print mode is prepended by the given \<open>modes\<close>. This | |
| 239 | provides a thread-safe way to augment print modes. It is also monotonic in | |
| 240 | the set of mode names: it retains the default print mode that certain | |
| 241 | user-interfaces might have installed for their proper functioning! | |
| 46284 | 242 | |
| 61421 | 243 | \<^medskip> | 
| 62106 | 244 | The pretty printer for inner syntax maintains alternative mixfix productions | 
| 245 |   for any print mode name invented by the user, say in commands like @{command
 | |
| 246 |   notation} or @{command abbreviation}. Mode names can be arbitrary, but the
 | |
| 247 | following ones have a specific meaning by convention: | |
| 46284 | 248 | |
| 62106 | 249 | \<^item> \<^verbatim>\<open>""\<close> (the empty string): default mode; implicitly active as last | 
| 250 | element in the list of modes. | |
| 46284 | 251 | |
| 62106 | 252 | \<^item> \<^verbatim>\<open>input\<close>: dummy print mode that is never active; may be used to specify | 
| 253 | notation that is only available for input. | |
| 46284 | 254 | |
| 62106 | 255 | \<^item> \<^verbatim>\<open>internal\<close> dummy print mode that is never active; used internally in | 
| 256 | Isabelle/Pure. | |
| 46284 | 257 | |
| 62106 | 258 | \<^item> \<^verbatim>\<open>ASCII\<close>: prefer ASCII art over mathematical symbols. | 
| 46284 | 259 | |
| 62106 | 260 |     \<^item> \<^verbatim>\<open>latex\<close>: additional mode that is active in {\LaTeX} document
 | 
| 261 | preparation of Isabelle theory sources; allows to provide alternative | |
| 262 | output notation. | |
| 58618 | 263 | \<close> | 
| 46284 | 264 | |
| 265 | ||
| 58618 | 266 | section \<open>Mixfix annotations \label{sec:mixfix}\<close>
 | 
| 28762 | 267 | |
| 62106 | 268 | text \<open> | 
| 269 | Mixfix annotations specify concrete \<^emph>\<open>inner syntax\<close> of Isabelle types and | |
| 270 | terms. Locally fixed parameters in toplevel theorem statements, locale and | |
| 271 | class specifications also admit mixfix annotations in a fairly uniform | |
| 272 | manner. A mixfix annotation describes the concrete syntax, the translation | |
| 273 | to abstract syntax, and the pretty printing. Special case annotations | |
| 274 | provide a simple means of specifying infix operators and binders. | |
| 46290 | 275 | |
| 62106 | 276 |   Isabelle mixfix syntax is inspired by {\OBJ} @{cite OBJ}. It allows to
 | 
| 277 | specify any context-free priority grammar, which is more general than the | |
| 278 | fixity declarations of ML and Prolog. | |
| 28762 | 279 | |
| 55112 
b1a5d603fd12
prefer rail cartouche -- avoid back-slashed quotes;
 wenzelm parents: 
55108diff
changeset | 280 |   @{rail \<open>
 | 
| 51654 
8450b944e58a
just one syntax category "mixfix" -- check structure annotation semantically;
 wenzelm parents: 
50636diff
changeset | 281 |     @{syntax_def mixfix}: '('
 | 
| 58761 | 282 |       (@{syntax template} prios? @{syntax nat}? |
 | 
| 283 |         (@'infix' | @'infixl' | @'infixr') @{syntax template} @{syntax nat} |
 | |
| 284 |         @'binder' @{syntax template} prios? @{syntax nat} |
 | |
| 285 | @'structure') ')' | |
| 46290 | 286 | ; | 
| 62807 | 287 |     @{syntax template}: string
 | 
| 46289 | 288 | ; | 
| 42596 
6c621a9d612a
modernized rail diagrams using @{rail} antiquotation;
 wenzelm parents: 
42358diff
changeset | 289 |     prios: '[' (@{syntax nat} + ',') ']'
 | 
| 55112 
b1a5d603fd12
prefer rail cartouche -- avoid back-slashed quotes;
 wenzelm parents: 
55108diff
changeset | 290 | \<close>} | 
| 28762 | 291 | |
| 62106 | 292 | The string given as \<open>template\<close> may include literal text, spacing, blocks, | 
| 293 | and arguments (denoted by ``\<open>_\<close>''); the special symbol ``\<^verbatim>\<open>\<index>\<close>'' (printed as | |
| 294 |   ``\<open>\<index>\<close>'') represents an index argument that specifies an implicit @{keyword
 | |
| 295 |   "structure"} reference (see also \secref{sec:locale}). Only locally fixed
 | |
| 296 |   variables may be declared as @{keyword "structure"}.
 | |
| 51657 
3db1bbc82d8d
more accurate documentation of "(structure)" mixfix;
 wenzelm parents: 
51654diff
changeset | 297 | |
| 62106 | 298 | Infix and binder declarations provide common abbreviations for particular | 
| 299 | mixfix declarations. So in practice, mixfix templates mostly degenerate to | |
| 300 | literal text for concrete syntax, such as ``\<^verbatim>\<open>++\<close>'' for an infix symbol. | |
| 61503 | 301 | \<close> | 
| 28762 | 302 | |
| 46290 | 303 | |
| 58618 | 304 | subsection \<open>The general mixfix form\<close> | 
| 46290 | 305 | |
| 62106 | 306 | text \<open> | 
| 307 | In full generality, mixfix declarations work as follows. Suppose a constant | |
| 308 | \<open>c :: \<tau>\<^sub>1 \<Rightarrow> \<dots> \<tau>\<^sub>n \<Rightarrow> \<tau>\<close> is annotated by \<open>(mixfix [p\<^sub>1, \<dots>, p\<^sub>n] p)\<close>, where | |
| 309 | \<open>mixfix\<close> is a string \<open>d\<^sub>0 _ d\<^sub>1 _ \<dots> _ d\<^sub>n\<close> consisting of delimiters that | |
| 310 | surround argument positions as indicated by underscores. | |
| 28762 | 311 | |
| 62106 | 312 | Altogether this determines a production for a context-free priority grammar, | 
| 313 | where for each argument \<open>i\<close> the syntactic category is determined by \<open>\<tau>\<^sub>i\<close> | |
| 314 | (with priority \<open>p\<^sub>i\<close>), and the result category is determined from \<open>\<tau>\<close> (with | |
| 315 | priority \<open>p\<close>). Priority specifications are optional, with default 0 for | |
| 316 | arguments and 1000 for the result.\<^footnote>\<open>Omitting priorities is prone to | |
| 317 | syntactic ambiguities unless the delimiter tokens determine fully bracketed | |
| 318 | notation, as in \<open>if _ then _ else _ fi\<close>.\<close> | |
| 28762 | 319 | |
| 62106 | 320 | Since \<open>\<tau>\<close> may be again a function type, the constant type scheme may have | 
| 321 | more argument positions than the mixfix pattern. Printing a nested | |
| 322 | application \<open>c t\<^sub>1 \<dots> t\<^sub>m\<close> for \<open>m > n\<close> works by attaching concrete notation | |
| 323 | only to the innermost part, essentially by printing \<open>(c t\<^sub>1 \<dots> t\<^sub>n) \<dots> t\<^sub>m\<close> | |
| 324 | instead. If a term has fewer arguments than specified in the mixfix | |
| 28762 | 325 | template, the concrete syntax is ignored. | 
| 326 | ||
| 61421 | 327 | \<^medskip> | 
| 62106 | 328 | A mixfix template may also contain additional directives for pretty | 
| 329 | printing, notably spaces, blocks, and breaks. The general template format is | |
| 330 | a sequence over any of the following entities. | |
| 28762 | 331 | |
| 63933 | 332 | \<^descr> \<open>d\<close> is a delimiter, namely a non-empty sequence delimiter items of the | 
| 333 | following form: | |
| 334 | \<^enum> a control symbol followed by a cartouche | |
| 335 | \<^enum> a single symbol, excluding the following special characters: | |
| 336 | \<^medskip> | |
| 337 |       \begin{tabular}{ll}
 | |
| 338 | \<^verbatim>\<open>'\<close> & single quote \\ | |
| 339 | \<^verbatim>\<open>_\<close> & underscore \\ | |
| 340 | \<open>\<index>\<close> & index symbol \\ | |
| 341 | \<^verbatim>\<open>(\<close> & open parenthesis \\ | |
| 342 | \<^verbatim>\<open>)\<close> & close parenthesis \\ | |
| 343 | \<^verbatim>\<open>/\<close> & slash \\ | |
| 344 | \<open>\<open> \<close>\<close> & cartouche delimiters \\ | |
| 345 |       \end{tabular}
 | |
| 346 | \<^medskip> | |
| 28762 | 347 | |
| 62106 | 348 | \<^descr> \<^verbatim>\<open>'\<close> escapes the special meaning of these meta-characters, producing a | 
| 349 | literal version of the following character, unless that is a blank. | |
| 28771 
4510201c6aaf
mixfix annotations: verbatim for special symbols;
 wenzelm parents: 
28770diff
changeset | 350 | |
| 62106 | 351 | A single quote followed by a blank separates delimiters, without affecting | 
| 352 | printing, but input tokens may have additional white space here. | |
| 28771 
4510201c6aaf
mixfix annotations: verbatim for special symbols;
 wenzelm parents: 
28770diff
changeset | 353 | |
| 62106 | 354 | \<^descr> \<^verbatim>\<open>_\<close> is an argument position, which stands for a certain syntactic | 
| 355 | category in the underlying grammar. | |
| 28762 | 356 | |
| 62106 | 357 | \<^descr> \<open>\<index>\<close> is an indexed argument position; this is the place where implicit | 
| 358 | structure arguments can be attached. | |
| 28762 | 359 | |
| 62106 | 360 | \<^descr> \<open>s\<close> is a non-empty sequence of spaces for printing. This and the following | 
| 361 | specifications do not affect parsing at all. | |
| 362 | ||
| 62807 | 363 | \<^descr> \<^verbatim>\<open>(\<close>\<open>n\<close> opens a pretty printing block. The optional natural number | 
| 364 | specifies the block indentation, i.e. how much spaces to add when a line | |
| 365 | break occurs within the block. The default indentation is 0. | |
| 366 | ||
| 367 | \<^descr> \<^verbatim>\<open>(\<close>\<open>\<open>properties\<close>\<close> opens a pretty printing block, with properties | |
| 368 | specified within the given text cartouche. The syntax and semantics of | |
| 369 |   the category @{syntax_ref mixfix_properties} is described below.
 | |
| 28762 | 370 | |
| 61503 | 371 | \<^descr> \<^verbatim>\<open>)\<close> closes a pretty printing block. | 
| 28762 | 372 | |
| 61503 | 373 | \<^descr> \<^verbatim>\<open>//\<close> forces a line break. | 
| 28762 | 374 | |
| 62106 | 375 | \<^descr> \<^verbatim>\<open>/\<close>\<open>s\<close> allows a line break. Here \<open>s\<close> stands for the string of spaces | 
| 376 | (zero or more) right after the slash. These spaces are printed if the break | |
| 377 | is \<^emph>\<open>not\<close> taken. | |
| 28762 | 378 | |
| 379 | ||
| 62807 | 380 | \<^medskip> | 
| 381 | Block properties allow more control over the details of pretty-printed | |
| 382 | output. The concrete syntax is defined as follows. | |
| 383 | ||
| 384 |   @{rail \<open>
 | |
| 385 |     @{syntax_def "mixfix_properties"}: (entry *)
 | |
| 386 | ; | |
| 387 |     entry: atom ('=' atom)?
 | |
| 388 | ; | |
| 63138 
70f4d67235a0
clarified syntax category names according to Isabelle/ML/Scala;
 wenzelm parents: 
62969diff
changeset | 389 |     atom: @{syntax short_ident} | @{syntax int} | @{syntax float} | @{syntax cartouche}
 | 
| 62807 | 390 | \<close>} | 
| 391 | ||
| 392 |   Each @{syntax entry} is a name-value pair: if the value is omitted, if
 | |
| 393 | defaults to \<^verbatim>\<open>true\<close> (intended for Boolean properties). The following | |
| 394 | standard block properties are supported: | |
| 395 | ||
| 396 | \<^item> \<open>indent\<close> (natural number): the block indentation --- the same as for the | |
| 397 | simple syntax without block properties. | |
| 398 | ||
| 399 | \<^item> \<open>consistent\<close> (Boolean): this block has consistent breaks (if one break | |
| 400 | is taken, all breaks are taken). | |
| 401 | ||
| 402 | \<^item> \<open>unbreakable\<close> (Boolean): all possible breaks of the block are disabled | |
| 403 | (turned into spaces). | |
| 404 | ||
| 405 | \<^item> \<open>markup\<close> (string): the optional name of the markup node. If this is | |
| 406 | provided, all remaining properties are turned into its XML attributes. | |
| 407 | This allows to specify free-form PIDE markup, e.g.\ for specialized | |
| 408 | output. | |
| 409 | ||
| 410 | \<^medskip> | |
| 411 | Note that the general idea of pretty printing with blocks and breaks is | |
| 412 |   described in @{cite "paulson-ml2"}; it goes back to @{cite "Oppen:1980"}.
 | |
| 58618 | 413 | \<close> | 
| 28762 | 414 | |
| 415 | ||
| 58618 | 416 | subsection \<open>Infixes\<close> | 
| 46290 | 417 | |
| 62106 | 418 | text \<open> | 
| 419 | Infix operators are specified by convenient short forms that abbreviate | |
| 420 | general mixfix annotations as follows: | |
| 46290 | 421 | |
| 422 |   \begin{center}
 | |
| 423 |   \begin{tabular}{lll}
 | |
| 424 | ||
| 61503 | 425 |   \<^verbatim>\<open>(\<close>@{keyword_def "infix"}~\<^verbatim>\<open>"\<close>\<open>sy\<close>\<^verbatim>\<open>"\<close> \<open>p\<close>\<^verbatim>\<open>)\<close>
 | 
| 61493 | 426 | & \<open>\<mapsto>\<close> & | 
| 61503 | 427 |   \<^verbatim>\<open>("(_\<close>~\<open>sy\<close>\<^verbatim>\<open>/ _)" [\<close>\<open>p + 1\<close>\<^verbatim>\<open>,\<close>~\<open>p + 1\<close>\<^verbatim>\<open>]\<close>~\<open>p\<close>\<^verbatim>\<open>)\<close> \\
 | 
| 428 |   \<^verbatim>\<open>(\<close>@{keyword_def "infixl"}~\<^verbatim>\<open>"\<close>\<open>sy\<close>\<^verbatim>\<open>"\<close> \<open>p\<close>\<^verbatim>\<open>)\<close>
 | |
| 61493 | 429 | & \<open>\<mapsto>\<close> & | 
| 61503 | 430 |   \<^verbatim>\<open>("(_\<close>~\<open>sy\<close>\<^verbatim>\<open>/ _)" [\<close>\<open>p\<close>\<^verbatim>\<open>,\<close>~\<open>p + 1\<close>\<^verbatim>\<open>]\<close>~\<open>p\<close>\<^verbatim>\<open>)\<close> \\
 | 
| 431 |   \<^verbatim>\<open>(\<close>@{keyword_def "infixr"}~\<^verbatim>\<open>"\<close>\<open>sy\<close>\<^verbatim>\<open>"\<close>~\<open>p\<close>\<^verbatim>\<open>)\<close>
 | |
| 61493 | 432 | & \<open>\<mapsto>\<close> & | 
| 61503 | 433 |   \<^verbatim>\<open>("(_\<close>~\<open>sy\<close>\<^verbatim>\<open>/ _)" [\<close>\<open>p + 1\<close>\<^verbatim>\<open>,\<close>~\<open>p\<close>\<^verbatim>\<open>]\<close>~\<open>p\<close>\<^verbatim>\<open>)\<close> \\
 | 
| 46290 | 434 | |
| 435 |   \end{tabular}
 | |
| 436 |   \end{center}
 | |
| 437 | ||
| 62106 | 438 | The mixfix template \<^verbatim>\<open>"(_\<close>~\<open>sy\<close>\<^verbatim>\<open>/ _)"\<close> specifies two argument positions; | 
| 439 | the delimiter is preceded by a space and followed by a space or line break; | |
| 440 | the entire phrase is a pretty printing block. | |
| 46290 | 441 | |
| 62106 | 442 | The alternative notation \<^verbatim>\<open>op\<close>~\<open>sy\<close> is introduced in addition. Thus any | 
| 443 | infix operator may be written in prefix form (as in ML), independently of | |
| 444 | the number of arguments in the term. | |
| 58618 | 445 | \<close> | 
| 46290 | 446 | |
| 447 | ||
| 58618 | 448 | subsection \<open>Binders\<close> | 
| 46290 | 449 | |
| 62106 | 450 | text \<open> | 
| 451 | A \<^emph>\<open>binder\<close> is a variable-binding construct such as a quantifier. The idea | |
| 452 |   to formalize \<open>\<forall>x. b\<close> as \<open>All (\<lambda>x. b)\<close> for \<open>All :: ('a \<Rightarrow> bool) \<Rightarrow> bool\<close>
 | |
| 453 |   already goes back to @{cite church40}. Isabelle declarations of certain
 | |
| 454 |   higher-order operators may be annotated with @{keyword_def "binder"}
 | |
| 455 | annotations as follows: | |
| 46290 | 456 | |
| 457 |   \begin{center}
 | |
| 61503 | 458 |   \<open>c ::\<close>~\<^verbatim>\<open>"\<close>\<open>(\<tau>\<^sub>1 \<Rightarrow> \<tau>\<^sub>2) \<Rightarrow> \<tau>\<^sub>3\<close>\<^verbatim>\<open>"  (\<close>@{keyword "binder"}~\<^verbatim>\<open>"\<close>\<open>sy\<close>\<^verbatim>\<open>" [\<close>\<open>p\<close>\<^verbatim>\<open>]\<close>~\<open>q\<close>\<^verbatim>\<open>)\<close>
 | 
| 46290 | 459 |   \end{center}
 | 
| 460 | ||
| 62106 | 461 | This introduces concrete binder syntax \<open>sy x. b\<close>, where \<open>x\<close> is a bound | 
| 462 | variable of type \<open>\<tau>\<^sub>1\<close>, the body \<open>b\<close> has type \<open>\<tau>\<^sub>2\<close> and the whole term has | |
| 463 | type \<open>\<tau>\<^sub>3\<close>. The optional integer \<open>p\<close> specifies the syntactic priority of the | |
| 464 | body; the default is \<open>q\<close>, which is also the priority of the whole construct. | |
| 46290 | 465 | |
| 466 | Internally, the binder syntax is expanded to something like this: | |
| 467 |   \begin{center}
 | |
| 61503 | 468 |   \<open>c_binder ::\<close>~\<^verbatim>\<open>"\<close>\<open>idts \<Rightarrow> \<tau>\<^sub>2 \<Rightarrow> \<tau>\<^sub>3\<close>\<^verbatim>\<open>"  ("(3\<close>\<open>sy\<close>\<^verbatim>\<open>_./ _)" [0,\<close>~\<open>p\<close>\<^verbatim>\<open>]\<close>~\<open>q\<close>\<^verbatim>\<open>)\<close>
 | 
| 46290 | 469 |   \end{center}
 | 
| 470 | ||
| 471 |   Here @{syntax (inner) idts} is the nonterminal symbol for a list of
 | |
| 472 | identifiers with optional type constraints (see also | |
| 62106 | 473 |   \secref{sec:pure-grammar}). The mixfix template \<^verbatim>\<open>"(3\<close>\<open>sy\<close>\<^verbatim>\<open>_./ _)"\<close> defines
 | 
| 474 | argument positions for the bound identifiers and the body, separated by a | |
| 475 | dot with optional line break; the entire phrase is a pretty printing block | |
| 476 | of indentation level 3. Note that there is no extra space after \<open>sy\<close>, so it | |
| 477 | needs to be included user specification if the binder syntax ends with a | |
| 478 |   token that may be continued by an identifier token at the start of @{syntax
 | |
| 479 | (inner) idts}. | |
| 46290 | 480 | |
| 62106 | 481 | Furthermore, a syntax translation to transforms \<open>c_binder x\<^sub>1 \<dots> x\<^sub>n b\<close> into | 
| 482 | iterated application \<open>c (\<lambda>x\<^sub>1. \<dots> c (\<lambda>x\<^sub>n. b)\<dots>)\<close>. This works in both | |
| 483 | directions, for parsing and printing. | |
| 484 | \<close> | |
| 46290 | 485 | |
| 486 | ||
| 58618 | 487 | section \<open>Explicit notation \label{sec:notation}\<close>
 | 
| 28762 | 488 | |
| 58618 | 489 | text \<open> | 
| 28762 | 490 |   \begin{matharray}{rcll}
 | 
| 61493 | 491 |     @{command_def "type_notation"} & : & \<open>local_theory \<rightarrow> local_theory\<close> \\
 | 
| 492 |     @{command_def "no_type_notation"} & : & \<open>local_theory \<rightarrow> local_theory\<close> \\
 | |
| 493 |     @{command_def "notation"} & : & \<open>local_theory \<rightarrow> local_theory\<close> \\
 | |
| 494 |     @{command_def "no_notation"} & : & \<open>local_theory \<rightarrow> local_theory\<close> \\
 | |
| 495 |     @{command_def "write"} & : & \<open>proof(state) \<rightarrow> proof(state)\<close> \\
 | |
| 28762 | 496 |   \end{matharray}
 | 
| 497 | ||
| 62106 | 498 | Commands that introduce new logical entities (terms or types) usually allow | 
| 499 | to provide mixfix annotations on the spot, which is convenient for default | |
| 500 | notation. Nonetheless, the syntax may be modified later on by declarations | |
| 501 | for explicit notation. This allows to add or delete mixfix annotations for | |
| 502 | of existing logical entities within the current context. | |
| 46288 | 503 | |
| 55112 
b1a5d603fd12
prefer rail cartouche -- avoid back-slashed quotes;
 wenzelm parents: 
55108diff
changeset | 504 |   @{rail \<open>
 | 
| 59783 
00b62aa9f430
tuned syntax diagrams -- no duplication of "target";
 wenzelm parents: 
58842diff
changeset | 505 |     (@@{command type_notation} | @@{command no_type_notation}) @{syntax mode}? \<newline>
 | 
| 62969 | 506 |       (@{syntax name} @{syntax mixfix} + @'and')
 | 
| 35413 | 507 | ; | 
| 59783 
00b62aa9f430
tuned syntax diagrams -- no duplication of "target";
 wenzelm parents: 
58842diff
changeset | 508 |     (@@{command notation} | @@{command no_notation}) @{syntax mode}? \<newline>
 | 
| 62969 | 509 |       (@{syntax name} @{syntax mixfix} + @'and')
 | 
| 28762 | 510 | ; | 
| 62969 | 511 |     @@{command write} @{syntax mode}? (@{syntax name} @{syntax mixfix} + @'and')
 | 
| 55112 
b1a5d603fd12
prefer rail cartouche -- avoid back-slashed quotes;
 wenzelm parents: 
55108diff
changeset | 512 | \<close>} | 
| 28762 | 513 | |
| 62106 | 514 |   \<^descr> @{command "type_notation"}~\<open>c (mx)\<close> associates mixfix syntax with an
 | 
| 515 | existing type constructor. The arity of the constructor is retrieved from | |
| 516 | the context. | |
| 46282 | 517 | |
| 62106 | 518 |   \<^descr> @{command "no_type_notation"} is similar to @{command "type_notation"},
 | 
| 519 | but removes the specified syntax annotation from the present context. | |
| 35413 | 520 | |
| 62106 | 521 |   \<^descr> @{command "notation"}~\<open>c (mx)\<close> associates mixfix syntax with an existing
 | 
| 522 | constant or fixed variable. The type declaration of the given entity is | |
| 523 | retrieved from the context. | |
| 46282 | 524 | |
| 62106 | 525 |   \<^descr> @{command "no_notation"} is similar to @{command "notation"}, but removes
 | 
| 526 | the specified syntax annotation from the present context. | |
| 28762 | 527 | |
| 62106 | 528 |   \<^descr> @{command "write"} is similar to @{command "notation"}, but works within
 | 
| 529 | an Isar proof body. | |
| 58618 | 530 | \<close> | 
| 28762 | 531 | |
| 28778 | 532 | |
| 58618 | 533 | section \<open>The Pure syntax \label{sec:pure-syntax}\<close>
 | 
| 28769 
8fc228f21861
added section "Priority grammars" (variant from old ref manual);
 wenzelm parents: 
28767diff
changeset | 534 | |
| 58618 | 535 | subsection \<open>Lexical matters \label{sec:inner-lex}\<close>
 | 
| 46282 | 536 | |
| 62106 | 537 | text \<open> | 
| 538 | The inner lexical syntax vaguely resembles the outer one | |
| 539 |   (\secref{sec:outer-lex}), but some details are different. There are two main
 | |
| 540 | categories of inner syntax tokens: | |
| 46282 | 541 | |
| 62106 | 542 | \<^enum> \<^emph>\<open>delimiters\<close> --- the literal tokens occurring in productions of the given | 
| 543 |   priority grammar (cf.\ \secref{sec:priority-grammar});
 | |
| 46282 | 544 | |
| 61477 | 545 | \<^enum> \<^emph>\<open>named tokens\<close> --- various categories of identifiers etc. | 
| 46282 | 546 | |
| 547 | ||
| 62106 | 548 | Delimiters override named tokens and may thus render certain identifiers | 
| 549 | inaccessible. Sometimes the logical context admits alternative ways to refer | |
| 550 | to the same entity, potentially via qualified names. | |
| 46282 | 551 | |
| 61421 | 552 | \<^medskip> | 
| 62106 | 553 | The categories for named tokens are defined once and for all as follows, | 
| 554 |   reusing some categories of the outer token syntax (\secref{sec:outer-lex}).
 | |
| 46282 | 555 | |
| 556 |   \begin{center}
 | |
| 557 |   \begin{supertabular}{rcl}
 | |
| 63138 
70f4d67235a0
clarified syntax category names according to Isabelle/ML/Scala;
 wenzelm parents: 
62969diff
changeset | 558 |     @{syntax_def (inner) id} & = & @{syntax_ref short_ident} \\
 | 
| 
70f4d67235a0
clarified syntax category names according to Isabelle/ML/Scala;
 wenzelm parents: 
62969diff
changeset | 559 |     @{syntax_def (inner) longid} & = & @{syntax_ref long_ident} \\
 | 
| 46282 | 560 |     @{syntax_def (inner) var} & = & @{syntax_ref var} \\
 | 
| 63138 
70f4d67235a0
clarified syntax category names according to Isabelle/ML/Scala;
 wenzelm parents: 
62969diff
changeset | 561 |     @{syntax_def (inner) tid} & = & @{syntax_ref type_ident} \\
 | 
| 
70f4d67235a0
clarified syntax category names according to Isabelle/ML/Scala;
 wenzelm parents: 
62969diff
changeset | 562 |     @{syntax_def (inner) tvar} & = & @{syntax_ref type_var} \\
 | 
| 58410 
6d46ad54a2ab
explicit separation of signed and unsigned numerals using existing lexical categories num and xnum
 haftmann parents: 
58409diff
changeset | 563 |     @{syntax_def (inner) num_token} & = & @{syntax_ref nat} \\
 | 
| 61503 | 564 |     @{syntax_def (inner) float_token} & = & @{syntax_ref nat}\<^verbatim>\<open>.\<close>@{syntax_ref nat} \\
 | 
| 565 |     @{syntax_def (inner) str_token} & = & \<^verbatim>\<open>''\<close> \<open>\<dots>\<close> \<^verbatim>\<open>''\<close> \\
 | |
| 566 |     @{syntax_def (inner) string_token} & = & \<^verbatim>\<open>"\<close> \<open>\<dots>\<close> \<^verbatim>\<open>"\<close> \\
 | |
| 61493 | 567 |     @{syntax_def (inner) cartouche} & = & @{verbatim "\<open>"} \<open>\<dots>\<close> @{verbatim "\<close>"} \\
 | 
| 46282 | 568 |   \end{supertabular}
 | 
| 569 |   \end{center}
 | |
| 570 | ||
| 571 |   The token categories @{syntax (inner) num_token}, @{syntax (inner)
 | |
| 58421 | 572 |   float_token}, @{syntax (inner) str_token}, @{syntax (inner) string_token},
 | 
| 573 |   and @{syntax (inner) cartouche} are not used in Pure. Object-logics may
 | |
| 574 | implement numerals and string literals by adding appropriate syntax | |
| 63680 | 575 | declarations, together with some translation functions (e.g.\ see | 
| 576 | \<^file>\<open>~~/src/HOL/Tools/string_syntax.ML\<close>). | |
| 46282 | 577 | |
| 58421 | 578 |   The derived categories @{syntax_def (inner) num_const}, and @{syntax_def
 | 
| 579 | (inner) float_const}, provide robust access to the respective tokens: the | |
| 580 | syntax tree holds a syntactic constant instead of a free variable. | |
| 58618 | 581 | \<close> | 
| 46282 | 582 | |
| 583 | ||
| 58618 | 584 | subsection \<open>Priority grammars \label{sec:priority-grammar}\<close>
 | 
| 28769 
8fc228f21861
added section "Priority grammars" (variant from old ref manual);
 wenzelm parents: 
28767diff
changeset | 585 | |
| 62106 | 586 | text \<open> | 
| 587 | A context-free grammar consists of a set of \<^emph>\<open>terminal symbols\<close>, a set of | |
| 588 | \<^emph>\<open>nonterminal symbols\<close> and a set of \<^emph>\<open>productions\<close>. Productions have the | |
| 589 | form \<open>A = \<gamma>\<close>, where \<open>A\<close> is a nonterminal and \<open>\<gamma>\<close> is a string of terminals | |
| 590 | and nonterminals. One designated nonterminal is called the \<^emph>\<open>root symbol\<close>. | |
| 591 | The language defined by the grammar consists of all strings of terminals | |
| 592 | that can be derived from the root symbol by applying productions as rewrite | |
| 593 | rules. | |
| 28769 
8fc228f21861
added section "Priority grammars" (variant from old ref manual);
 wenzelm parents: 
28767diff
changeset | 594 | |
| 62106 | 595 | The standard Isabelle parser for inner syntax uses a \<^emph>\<open>priority grammar\<close>. | 
| 596 | Each nonterminal is decorated by an integer priority: \<open>A\<^sup>(\<^sup>p\<^sup>)\<close>. In a | |
| 597 | derivation, \<open>A\<^sup>(\<^sup>p\<^sup>)\<close> may be rewritten using a production \<open>A\<^sup>(\<^sup>q\<^sup>) = \<gamma>\<close> only | |
| 598 | if \<open>p \<le> q\<close>. Any priority grammar can be translated into a normal | |
| 599 | context-free grammar by introducing new nonterminals and productions. | |
| 28769 
8fc228f21861
added section "Priority grammars" (variant from old ref manual);
 wenzelm parents: 
28767diff
changeset | 600 | |
| 61421 | 601 | \<^medskip> | 
| 62106 | 602 | Formally, a set of context free productions \<open>G\<close> induces a derivation | 
| 603 | relation \<open>\<longrightarrow>\<^sub>G\<close> as follows. Let \<open>\<alpha>\<close> and \<open>\<beta>\<close> denote strings of terminal or | |
| 604 | nonterminal symbols. Then \<open>\<alpha> A\<^sup>(\<^sup>p\<^sup>) \<beta> \<longrightarrow>\<^sub>G \<alpha> \<gamma> \<beta>\<close> holds if and only if \<open>G\<close> | |
| 61493 | 605 | contains some production \<open>A\<^sup>(\<^sup>q\<^sup>) = \<gamma>\<close> for \<open>p \<le> q\<close>. | 
| 28769 
8fc228f21861
added section "Priority grammars" (variant from old ref manual);
 wenzelm parents: 
28767diff
changeset | 606 | |
| 61421 | 607 | \<^medskip> | 
| 62106 | 608 | The following grammar for arithmetic expressions demonstrates how binding | 
| 609 | power and associativity of operators can be enforced by priorities. | |
| 28769 
8fc228f21861
added section "Priority grammars" (variant from old ref manual);
 wenzelm parents: 
28767diff
changeset | 610 | |
| 
8fc228f21861
added section "Priority grammars" (variant from old ref manual);
 wenzelm parents: 
28767diff
changeset | 611 |   \begin{center}
 | 
| 
8fc228f21861
added section "Priority grammars" (variant from old ref manual);
 wenzelm parents: 
28767diff
changeset | 612 |   \begin{tabular}{rclr}
 | 
| 61503 | 613 | \<open>A\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>)\<close> & \<open>=\<close> & \<^verbatim>\<open>(\<close> \<open>A\<^sup>(\<^sup>0\<^sup>)\<close> \<^verbatim>\<open>)\<close> \\ | 
| 614 | \<open>A\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>)\<close> & \<open>=\<close> & \<^verbatim>\<open>0\<close> \\ | |
| 615 | \<open>A\<^sup>(\<^sup>0\<^sup>)\<close> & \<open>=\<close> & \<open>A\<^sup>(\<^sup>0\<^sup>)\<close> \<^verbatim>\<open>+\<close> \<open>A\<^sup>(\<^sup>1\<^sup>)\<close> \\ | |
| 616 | \<open>A\<^sup>(\<^sup>2\<^sup>)\<close> & \<open>=\<close> & \<open>A\<^sup>(\<^sup>3\<^sup>)\<close> \<^verbatim>\<open>*\<close> \<open>A\<^sup>(\<^sup>2\<^sup>)\<close> \\ | |
| 617 | \<open>A\<^sup>(\<^sup>3\<^sup>)\<close> & \<open>=\<close> & \<^verbatim>\<open>-\<close> \<open>A\<^sup>(\<^sup>3\<^sup>)\<close> \\ | |
| 28769 
8fc228f21861
added section "Priority grammars" (variant from old ref manual);
 wenzelm parents: 
28767diff
changeset | 618 |   \end{tabular}
 | 
| 
8fc228f21861
added section "Priority grammars" (variant from old ref manual);
 wenzelm parents: 
28767diff
changeset | 619 |   \end{center}
 | 
| 62106 | 620 | The choice of priorities determines that \<^verbatim>\<open>-\<close> binds tighter than \<^verbatim>\<open>*\<close>, which | 
| 621 | binds tighter than \<^verbatim>\<open>+\<close>. Furthermore \<^verbatim>\<open>+\<close> associates to the left and \<^verbatim>\<open>*\<close> to | |
| 622 | the right. | |
| 28769 
8fc228f21861
added section "Priority grammars" (variant from old ref manual);
 wenzelm parents: 
28767diff
changeset | 623 | |
| 61421 | 624 | \<^medskip> | 
| 625 | For clarity, grammars obey these conventions: | |
| 28769 
8fc228f21861
added section "Priority grammars" (variant from old ref manual);
 wenzelm parents: 
28767diff
changeset | 626 | |
| 62106 | 627 | \<^item> All priorities must lie between 0 and 1000. | 
| 28769 
8fc228f21861
added section "Priority grammars" (variant from old ref manual);
 wenzelm parents: 
28767diff
changeset | 628 | |
| 62106 | 629 | \<^item> Priority 0 on the right-hand side and priority 1000 on the left-hand | 
| 630 | side may be omitted. | |
| 28769 
8fc228f21861
added section "Priority grammars" (variant from old ref manual);
 wenzelm parents: 
28767diff
changeset | 631 | |
| 62106 | 632 | \<^item> The production \<open>A\<^sup>(\<^sup>p\<^sup>) = \<alpha>\<close> is written as \<open>A = \<alpha> (p)\<close>, i.e.\ the | 
| 633 | priority of the left-hand side actually appears in a column on the far | |
| 634 | right. | |
| 28769 
8fc228f21861
added section "Priority grammars" (variant from old ref manual);
 wenzelm parents: 
28767diff
changeset | 635 | |
| 62106 | 636 | \<^item> Alternatives are separated by \<open>|\<close>. | 
| 28769 
8fc228f21861
added section "Priority grammars" (variant from old ref manual);
 wenzelm parents: 
28767diff
changeset | 637 | |
| 62106 | 638 | \<^item> Repetition is indicated by dots \<open>(\<dots>)\<close> in an informal but obvious way. | 
| 28769 
8fc228f21861
added section "Priority grammars" (variant from old ref manual);
 wenzelm parents: 
28767diff
changeset | 639 | |
| 
8fc228f21861
added section "Priority grammars" (variant from old ref manual);
 wenzelm parents: 
28767diff
changeset | 640 | Using these conventions, the example grammar specification above | 
| 
8fc228f21861
added section "Priority grammars" (variant from old ref manual);
 wenzelm parents: 
28767diff
changeset | 641 | takes the form: | 
| 
8fc228f21861
added section "Priority grammars" (variant from old ref manual);
 wenzelm parents: 
28767diff
changeset | 642 |   \begin{center}
 | 
| 
8fc228f21861
added section "Priority grammars" (variant from old ref manual);
 wenzelm parents: 
28767diff
changeset | 643 |   \begin{tabular}{rclc}
 | 
| 61503 | 644 | \<open>A\<close> & \<open>=\<close> & \<^verbatim>\<open>(\<close> \<open>A\<close> \<^verbatim>\<open>)\<close> \\ | 
| 645 | & \<open>|\<close> & \<^verbatim>\<open>0\<close> & \qquad\qquad \\ | |
| 646 | & \<open>|\<close> & \<open>A\<close> \<^verbatim>\<open>+\<close> \<open>A\<^sup>(\<^sup>1\<^sup>)\<close> & \<open>(0)\<close> \\ | |
| 647 | & \<open>|\<close> & \<open>A\<^sup>(\<^sup>3\<^sup>)\<close> \<^verbatim>\<open>*\<close> \<open>A\<^sup>(\<^sup>2\<^sup>)\<close> & \<open>(2)\<close> \\ | |
| 648 | & \<open>|\<close> & \<^verbatim>\<open>-\<close> \<open>A\<^sup>(\<^sup>3\<^sup>)\<close> & \<open>(3)\<close> \\ | |
| 28769 
8fc228f21861
added section "Priority grammars" (variant from old ref manual);
 wenzelm parents: 
28767diff
changeset | 649 |   \end{tabular}
 | 
| 
8fc228f21861
added section "Priority grammars" (variant from old ref manual);
 wenzelm parents: 
28767diff
changeset | 650 |   \end{center}
 | 
| 58618 | 651 | \<close> | 
| 28769 
8fc228f21861
added section "Priority grammars" (variant from old ref manual);
 wenzelm parents: 
28767diff
changeset | 652 | |
| 
8fc228f21861
added section "Priority grammars" (variant from old ref manual);
 wenzelm parents: 
28767diff
changeset | 653 | |
| 58618 | 654 | subsection \<open>The Pure grammar \label{sec:pure-grammar}\<close>
 | 
| 28770 
93a372e2dc7a
added section "The Pure grammar" (incomplete version, based on old ref manual);
 wenzelm parents: 
28769diff
changeset | 655 | |
| 62106 | 656 | text \<open> | 
| 657 | The priority grammar of the \<open>Pure\<close> theory is defined approximately like | |
| 658 | this: | |
| 28774 | 659 | |
| 28770 
93a372e2dc7a
added section "The Pure grammar" (incomplete version, based on old ref manual);
 wenzelm parents: 
28769diff
changeset | 660 |   \begin{center}
 | 
| 28773 | 661 |   \begin{supertabular}{rclr}
 | 
| 28770 
93a372e2dc7a
added section "The Pure grammar" (incomplete version, based on old ref manual);
 wenzelm parents: 
28769diff
changeset | 662 | |
| 61493 | 663 |   @{syntax_def (inner) any} & = & \<open>prop  |  logic\<close> \\\\
 | 
| 28772 | 664 | |
| 61503 | 665 |   @{syntax_def (inner) prop} & = & \<^verbatim>\<open>(\<close> \<open>prop\<close> \<^verbatim>\<open>)\<close> \\
 | 
| 666 | & \<open>|\<close> & \<open>prop\<^sup>(\<^sup>4\<^sup>)\<close> \<^verbatim>\<open>::\<close> \<open>type\<close> & \<open>(3)\<close> \\ | |
| 667 | & \<open>|\<close> & \<open>any\<^sup>(\<^sup>3\<^sup>)\<close> \<^verbatim>\<open>==\<close> \<open>any\<^sup>(\<^sup>3\<^sup>)\<close> & \<open>(2)\<close> \\ | |
| 61493 | 668 | & \<open>|\<close> & \<open>any\<^sup>(\<^sup>3\<^sup>)\<close> \<open>\<equiv>\<close> \<open>any\<^sup>(\<^sup>3\<^sup>)\<close> & \<open>(2)\<close> \\ | 
| 61503 | 669 | & \<open>|\<close> & \<open>prop\<^sup>(\<^sup>3\<^sup>)\<close> \<^verbatim>\<open>&&&\<close> \<open>prop\<^sup>(\<^sup>2\<^sup>)\<close> & \<open>(2)\<close> \\ | 
| 670 | & \<open>|\<close> & \<open>prop\<^sup>(\<^sup>2\<^sup>)\<close> \<^verbatim>\<open>==>\<close> \<open>prop\<^sup>(\<^sup>1\<^sup>)\<close> & \<open>(1)\<close> \\ | |
| 61493 | 671 | & \<open>|\<close> & \<open>prop\<^sup>(\<^sup>2\<^sup>)\<close> \<open>\<Longrightarrow>\<close> \<open>prop\<^sup>(\<^sup>1\<^sup>)\<close> & \<open>(1)\<close> \\ | 
| 61503 | 672 | & \<open>|\<close> & \<^verbatim>\<open>[|\<close> \<open>prop\<close> \<^verbatim>\<open>;\<close> \<open>\<dots>\<close> \<^verbatim>\<open>;\<close> \<open>prop\<close> \<^verbatim>\<open>|]\<close> \<^verbatim>\<open>==>\<close> \<open>prop\<^sup>(\<^sup>1\<^sup>)\<close> & \<open>(1)\<close> \\ | 
| 673 | & \<open>|\<close> & \<open>\<lbrakk>\<close> \<open>prop\<close> \<^verbatim>\<open>;\<close> \<open>\<dots>\<close> \<^verbatim>\<open>;\<close> \<open>prop\<close> \<open>\<rbrakk>\<close> \<open>\<Longrightarrow>\<close> \<open>prop\<^sup>(\<^sup>1\<^sup>)\<close> & \<open>(1)\<close> \\ | |
| 674 | & \<open>|\<close> & \<^verbatim>\<open>!!\<close> \<open>idts\<close> \<^verbatim>\<open>.\<close> \<open>prop\<close> & \<open>(0)\<close> \\ | |
| 675 | & \<open>|\<close> & \<open>\<And>\<close> \<open>idts\<close> \<^verbatim>\<open>.\<close> \<open>prop\<close> & \<open>(0)\<close> \\ | |
| 676 | & \<open>|\<close> & \<^verbatim>\<open>OFCLASS\<close> \<^verbatim>\<open>(\<close> \<open>type\<close> \<^verbatim>\<open>,\<close> \<open>logic\<close> \<^verbatim>\<open>)\<close> \\ | |
| 677 | & \<open>|\<close> & \<^verbatim>\<open>SORT_CONSTRAINT\<close> \<^verbatim>\<open>(\<close> \<open>type\<close> \<^verbatim>\<open>)\<close> \\ | |
| 678 | & \<open>|\<close> & \<^verbatim>\<open>TERM\<close> \<open>logic\<close> \\ | |
| 679 | & \<open>|\<close> & \<^verbatim>\<open>PROP\<close> \<open>aprop\<close> \\\\ | |
| 28772 | 680 | |
| 61503 | 681 |   @{syntax_def (inner) aprop} & = & \<^verbatim>\<open>(\<close> \<open>aprop\<close> \<^verbatim>\<open>)\<close> \\
 | 
| 682 | & \<open>|\<close> & \<open>id | longid | var |\<close>~~\<^verbatim>\<open>_\<close>~~\<open>|\<close>~~\<^verbatim>\<open>...\<close> \\ | |
| 683 | & \<open>|\<close> & \<^verbatim>\<open>CONST\<close> \<open>id |\<close>~~\<^verbatim>\<open>CONST\<close> \<open>longid\<close> \\ | |
| 684 | & \<open>|\<close> & \<^verbatim>\<open>XCONST\<close> \<open>id |\<close>~~\<^verbatim>\<open>XCONST\<close> \<open>longid\<close> \\ | |
| 61493 | 685 | & \<open>|\<close> & \<open>logic\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>) any\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>) \<dots> any\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>)\<close> & \<open>(999)\<close> \\\\ | 
| 28770 
93a372e2dc7a
added section "The Pure grammar" (incomplete version, based on old ref manual);
 wenzelm parents: 
28769diff
changeset | 686 | |
| 61503 | 687 |   @{syntax_def (inner) logic} & = & \<^verbatim>\<open>(\<close> \<open>logic\<close> \<^verbatim>\<open>)\<close> \\
 | 
| 688 | & \<open>|\<close> & \<open>logic\<^sup>(\<^sup>4\<^sup>)\<close> \<^verbatim>\<open>::\<close> \<open>type\<close> & \<open>(3)\<close> \\ | |
| 689 | & \<open>|\<close> & \<open>id | longid | var |\<close>~~\<^verbatim>\<open>_\<close>~~\<open>|\<close>~~\<^verbatim>\<open>...\<close> \\ | |
| 690 | & \<open>|\<close> & \<^verbatim>\<open>CONST\<close> \<open>id |\<close>~~\<^verbatim>\<open>CONST\<close> \<open>longid\<close> \\ | |
| 691 | & \<open>|\<close> & \<^verbatim>\<open>XCONST\<close> \<open>id |\<close>~~\<^verbatim>\<open>XCONST\<close> \<open>longid\<close> \\ | |
| 61493 | 692 | & \<open>|\<close> & \<open>logic\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>) any\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>) \<dots> any\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>)\<close> & \<open>(999)\<close> \\ | 
| 61503 | 693 | & \<open>|\<close> & \<^verbatim>\<open>%\<close> \<open>pttrns\<close> \<^verbatim>\<open>.\<close> \<open>any\<^sup>(\<^sup>3\<^sup>)\<close> & \<open>(3)\<close> \\ | 
| 694 | & \<open>|\<close> & \<open>\<lambda>\<close> \<open>pttrns\<close> \<^verbatim>\<open>.\<close> \<open>any\<^sup>(\<^sup>3\<^sup>)\<close> & \<open>(3)\<close> \\ | |
| 695 | & \<open>|\<close> & \<^verbatim>\<open>op\<close> \<^verbatim>\<open>==\<close>~~\<open>|\<close>~~\<^verbatim>\<open>op\<close> \<open>\<equiv>\<close>~~\<open>|\<close>~~\<^verbatim>\<open>op\<close> \<^verbatim>\<open>&&&\<close> \\ | |
| 696 | & \<open>|\<close> & \<^verbatim>\<open>op\<close> \<^verbatim>\<open>==>\<close>~~\<open>|\<close>~~\<^verbatim>\<open>op\<close> \<open>\<Longrightarrow>\<close> \\ | |
| 697 | & \<open>|\<close> & \<^verbatim>\<open>TYPE\<close> \<^verbatim>\<open>(\<close> \<open>type\<close> \<^verbatim>\<open>)\<close> \\\\ | |
| 28772 | 698 | |
| 61503 | 699 |   @{syntax_def (inner) idt} & = & \<^verbatim>\<open>(\<close> \<open>idt\<close> \<^verbatim>\<open>)\<close>~~\<open>|  id  |\<close>~~\<^verbatim>\<open>_\<close> \\
 | 
| 700 | & \<open>|\<close> & \<open>id\<close> \<^verbatim>\<open>::\<close> \<open>type\<close> & \<open>(0)\<close> \\ | |
| 701 | & \<open>|\<close> & \<^verbatim>\<open>_\<close> \<^verbatim>\<open>::\<close> \<open>type\<close> & \<open>(0)\<close> \\\\ | |
| 28772 | 702 | |
| 61503 | 703 |   @{syntax_def (inner) index} & = & \<^verbatim>\<open>\<^bsub>\<close> \<open>logic\<^sup>(\<^sup>0\<^sup>)\<close> \<^verbatim>\<open>\<^esub>\<close>~~\<open>|  |  \<index>\<close> \\\\
 | 
| 46287 | 704 | |
| 61493 | 705 |   @{syntax_def (inner) idts} & = & \<open>idt  |  idt\<^sup>(\<^sup>1\<^sup>) idts\<close> & \<open>(0)\<close> \\\\
 | 
| 28772 | 706 | |
| 61493 | 707 |   @{syntax_def (inner) pttrn} & = & \<open>idt\<close> \\\\
 | 
| 28772 | 708 | |
| 61493 | 709 |   @{syntax_def (inner) pttrns} & = & \<open>pttrn  |  pttrn\<^sup>(\<^sup>1\<^sup>) pttrns\<close> & \<open>(0)\<close> \\\\
 | 
| 28774 | 710 | |
| 61503 | 711 |   @{syntax_def (inner) type} & = & \<^verbatim>\<open>(\<close> \<open>type\<close> \<^verbatim>\<open>)\<close> \\
 | 
| 712 | & \<open>|\<close> & \<open>tid | tvar |\<close>~~\<^verbatim>\<open>_\<close> \\ | |
| 713 | & \<open>|\<close> & \<open>tid\<close> \<^verbatim>\<open>::\<close> \<open>sort | tvar\<close>~~\<^verbatim>\<open>::\<close> \<open>sort |\<close>~~\<^verbatim>\<open>_\<close> \<^verbatim>\<open>::\<close> \<open>sort\<close> \\ | |
| 61493 | 714 | & \<open>|\<close> & \<open>type_name | type\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>) type_name\<close> \\ | 
| 61503 | 715 | & \<open>|\<close> & \<^verbatim>\<open>(\<close> \<open>type\<close> \<^verbatim>\<open>,\<close> \<open>\<dots>\<close> \<^verbatim>\<open>,\<close> \<open>type\<close> \<^verbatim>\<open>)\<close> \<open>type_name\<close> \\ | 
| 716 | & \<open>|\<close> & \<open>type\<^sup>(\<^sup>1\<^sup>)\<close> \<^verbatim>\<open>=>\<close> \<open>type\<close> & \<open>(0)\<close> \\ | |
| 61493 | 717 | & \<open>|\<close> & \<open>type\<^sup>(\<^sup>1\<^sup>)\<close> \<open>\<Rightarrow>\<close> \<open>type\<close> & \<open>(0)\<close> \\ | 
| 61503 | 718 | & \<open>|\<close> & \<^verbatim>\<open>[\<close> \<open>type\<close> \<^verbatim>\<open>,\<close> \<open>\<dots>\<close> \<^verbatim>\<open>,\<close> \<open>type\<close> \<^verbatim>\<open>]\<close> \<^verbatim>\<open>=>\<close> \<open>type\<close> & \<open>(0)\<close> \\ | 
| 719 | & \<open>|\<close> & \<^verbatim>\<open>[\<close> \<open>type\<close> \<^verbatim>\<open>,\<close> \<open>\<dots>\<close> \<^verbatim>\<open>,\<close> \<open>type\<close> \<^verbatim>\<open>]\<close> \<open>\<Rightarrow>\<close> \<open>type\<close> & \<open>(0)\<close> \\ | |
| 61493 | 720 |   @{syntax_def (inner) type_name} & = & \<open>id  |  longid\<close> \\\\
 | 
| 28772 | 721 | |
| 61503 | 722 |   @{syntax_def (inner) sort} & = & @{syntax class_name}~~\<open>|\<close>~~\<^verbatim>\<open>{}\<close> \\
 | 
| 723 |     & \<open>|\<close> & \<^verbatim>\<open>{\<close> @{syntax class_name} \<^verbatim>\<open>,\<close> \<open>\<dots>\<close> \<^verbatim>\<open>,\<close> @{syntax class_name} \<^verbatim>\<open>}\<close> \\
 | |
| 61493 | 724 |   @{syntax_def (inner) class_name} & = & \<open>id  |  longid\<close> \\
 | 
| 28773 | 725 |   \end{supertabular}
 | 
| 28770 
93a372e2dc7a
added section "The Pure grammar" (incomplete version, based on old ref manual);
 wenzelm parents: 
28769diff
changeset | 726 |   \end{center}
 | 
| 
93a372e2dc7a
added section "The Pure grammar" (incomplete version, based on old ref manual);
 wenzelm parents: 
28769diff
changeset | 727 | |
| 61421 | 728 | \<^medskip> | 
| 62106 | 729 | Here literal terminals are printed \<^verbatim>\<open>verbatim\<close>; see also | 
| 730 |   \secref{sec:inner-lex} for further token categories of the inner syntax. The
 | |
| 731 | meaning of the nonterminals defined by the above grammar is as follows: | |
| 28770 
93a372e2dc7a
added section "The Pure grammar" (incomplete version, based on old ref manual);
 wenzelm parents: 
28769diff
changeset | 732 | |
| 61439 | 733 |   \<^descr> @{syntax_ref (inner) any} denotes any term.
 | 
| 28770 
93a372e2dc7a
added section "The Pure grammar" (incomplete version, based on old ref manual);
 wenzelm parents: 
28769diff
changeset | 734 | |
| 62106 | 735 |   \<^descr> @{syntax_ref (inner) prop} denotes meta-level propositions, which are
 | 
| 736 |   terms of type @{typ prop}. The syntax of such formulae of the meta-logic is
 | |
| 737 | carefully distinguished from usual conventions for object-logics. In | |
| 738 |   particular, plain \<open>\<lambda>\<close>-term notation is \<^emph>\<open>not\<close> recognized as @{syntax (inner)
 | |
| 739 | prop}. | |
| 28770 
93a372e2dc7a
added section "The Pure grammar" (incomplete version, based on old ref manual);
 wenzelm parents: 
28769diff
changeset | 740 | |
| 62106 | 741 |   \<^descr> @{syntax_ref (inner) aprop} denotes atomic propositions, which are
 | 
| 742 |   embedded into regular @{syntax (inner) prop} by means of an explicit \<^verbatim>\<open>PROP\<close>
 | |
| 743 | token. | |
| 28770 
93a372e2dc7a
added section "The Pure grammar" (incomplete version, based on old ref manual);
 wenzelm parents: 
28769diff
changeset | 744 | |
| 62106 | 745 |   Terms of type @{typ prop} with non-constant head, e.g.\ a plain variable,
 | 
| 746 |   are printed in this form. Constants that yield type @{typ prop} are expected
 | |
| 747 | to provide their own concrete syntax; otherwise the printed version will | |
| 748 |   appear like @{syntax (inner) logic} and cannot be parsed again as @{syntax
 | |
| 749 | (inner) prop}. | |
| 28770 
93a372e2dc7a
added section "The Pure grammar" (incomplete version, based on old ref manual);
 wenzelm parents: 
28769diff
changeset | 750 | |
| 62106 | 751 |   \<^descr> @{syntax_ref (inner) logic} denotes arbitrary terms of a logical type,
 | 
| 752 |   excluding type @{typ prop}. This is the main syntactic category of
 | |
| 753 | object-logic entities, covering plain \<open>\<lambda>\<close>-term notation (variables, | |
| 754 | abstraction, application), plus anything defined by the user. | |
| 28770 
93a372e2dc7a
added section "The Pure grammar" (incomplete version, based on old ref manual);
 wenzelm parents: 
28769diff
changeset | 755 | |
| 62106 | 756 | When specifying notation for logical entities, all logical types (excluding | 
| 757 |   @{typ prop}) are \<^emph>\<open>collapsed\<close> to this single category of @{syntax (inner)
 | |
| 758 | logic}. | |
| 28770 
93a372e2dc7a
added section "The Pure grammar" (incomplete version, based on old ref manual);
 wenzelm parents: 
28769diff
changeset | 759 | |
| 62106 | 760 |   \<^descr> @{syntax_ref (inner) index} denotes an optional index term for indexed
 | 
| 761 |   syntax. If omitted, it refers to the first @{keyword_ref "structure"}
 | |
| 762 | variable in the context. The special dummy ``\<open>\<index>\<close>'' serves as pattern | |
| 763 | variable in mixfix annotations that introduce indexed notation. | |
| 46287 | 764 | |
| 62106 | 765 |   \<^descr> @{syntax_ref (inner) idt} denotes identifiers, possibly constrained by
 | 
| 766 | types. | |
| 28770 
93a372e2dc7a
added section "The Pure grammar" (incomplete version, based on old ref manual);
 wenzelm parents: 
28769diff
changeset | 767 | |
| 62106 | 768 |   \<^descr> @{syntax_ref (inner) idts} denotes a sequence of @{syntax_ref (inner)
 | 
| 769 | idt}. This is the most basic category for variables in iterated binders, | |
| 770 | such as \<open>\<lambda>\<close> or \<open>\<And>\<close>. | |
| 28770 
93a372e2dc7a
added section "The Pure grammar" (incomplete version, based on old ref manual);
 wenzelm parents: 
28769diff
changeset | 771 | |
| 62106 | 772 |   \<^descr> @{syntax_ref (inner) pttrn} and @{syntax_ref (inner) pttrns} denote
 | 
| 773 | patterns for abstraction, cases bindings etc. In Pure, these categories | |
| 774 |   start as a merely copy of @{syntax (inner) idt} and @{syntax (inner) idts},
 | |
| 775 | respectively. Object-logics may add additional productions for binding | |
| 776 | forms. | |
| 28770 
93a372e2dc7a
added section "The Pure grammar" (incomplete version, based on old ref manual);
 wenzelm parents: 
28769diff
changeset | 777 | |
| 61439 | 778 |   \<^descr> @{syntax_ref (inner) type} denotes types of the meta-logic.
 | 
| 28770 
93a372e2dc7a
added section "The Pure grammar" (incomplete version, based on old ref manual);
 wenzelm parents: 
28769diff
changeset | 779 | |
| 61439 | 780 |   \<^descr> @{syntax_ref (inner) sort} denotes meta-level sorts.
 | 
| 28770 
93a372e2dc7a
added section "The Pure grammar" (incomplete version, based on old ref manual);
 wenzelm parents: 
28769diff
changeset | 781 | |
| 
93a372e2dc7a
added section "The Pure grammar" (incomplete version, based on old ref manual);
 wenzelm parents: 
28769diff
changeset | 782 | |
| 28774 | 783 | Here are some further explanations of certain syntax features. | 
| 28773 | 784 | |
| 62106 | 785 |   \<^item> In @{syntax (inner) idts}, note that \<open>x :: nat y\<close> is parsed as \<open>x :: (nat
 | 
| 786 | y)\<close>, treating \<open>y\<close> like a type constructor applied to \<open>nat\<close>. To avoid this | |
| 787 | interpretation, write \<open>(x :: nat) y\<close> with explicit parentheses. | |
| 28773 | 788 | |
| 62106 | 789 | \<^item> Similarly, \<open>x :: nat y :: nat\<close> is parsed as \<open>x :: (nat y :: nat)\<close>. The | 
| 790 | correct form is \<open>(x :: nat) (y :: nat)\<close>, or \<open>(x :: nat) y :: nat\<close> if \<open>y\<close> is | |
| 791 | last in the sequence of identifiers. | |
| 28773 | 792 | |
| 62106 | 793 | \<^item> Type constraints for terms bind very weakly. For example, \<open>x < y :: nat\<close> | 
| 794 | is normally parsed as \<open>(x < y) :: nat\<close>, unless \<open><\<close> has a very low priority, | |
| 795 | in which case the input is likely to be ambiguous. The correct form is \<open>x < | |
| 796 | (y :: nat)\<close>. | |
| 28773 | 797 | |
| 61421 | 798 | \<^item> Dummy variables (written as underscore) may occur in different | 
| 28774 | 799 | roles. | 
| 28773 | 800 | |
| 62106 | 801 | \<^descr> A type ``\<open>_\<close>'' or ``\<open>_ :: sort\<close>'' acts like an anonymous inference | 
| 802 | parameter, which is filled-in according to the most general type produced | |
| 803 | by the type-checking phase. | |
| 28770 
93a372e2dc7a
added section "The Pure grammar" (incomplete version, based on old ref manual);
 wenzelm parents: 
28769diff
changeset | 804 | |
| 62106 | 805 | \<^descr> A bound ``\<open>_\<close>'' refers to a vacuous abstraction, where the body does not | 
| 806 |     refer to the binding introduced here. As in the term @{term "\<lambda>x _. x"},
 | |
| 807 | which is \<open>\<alpha>\<close>-equivalent to \<open>\<lambda>x y. x\<close>. | |
| 28773 | 808 | |
| 62106 | 809 | \<^descr> A free ``\<open>_\<close>'' refers to an implicit outer binding. Higher definitional | 
| 810 | packages usually allow forms like \<open>f x _ = x\<close>. | |
| 28773 | 811 | |
| 62106 | 812 |     \<^descr> A schematic ``\<open>_\<close>'' (within a term pattern, see \secref{sec:term-decls})
 | 
| 813 | refers to an anonymous variable that is implicitly abstracted over its | |
| 814 | context of locally bound variables. For example, this allows pattern | |
| 815 |     matching of \<open>{x. f x = g x}\<close> against \<open>{x. _ = _}\<close>, or even \<open>{_. _ = _}\<close> by
 | |
| 61458 | 816 | using both bound and schematic dummies. | 
| 28773 | 817 | |
| 62106 | 818 | \<^descr> The three literal dots ``\<^verbatim>\<open>...\<close>'' may be also written as ellipsis symbol | 
| 819 | \<^verbatim>\<open>\<dots>\<close>. In both cases this refers to a special schematic variable, which is | |
| 820 | bound in the context. This special term abbreviation works nicely with | |
| 28774 | 821 |   calculational reasoning (\secref{sec:calculation}).
 | 
| 822 | ||
| 62106 | 823 | \<^descr> \<^verbatim>\<open>CONST\<close> ensures that the given identifier is treated as constant term, | 
| 824 | and passed through the parse tree in fully internalized form. This is | |
| 825 |   particularly relevant for translation rules (\secref{sec:syn-trans}),
 | |
| 826 | notably on the RHS. | |
| 46287 | 827 | |
| 62106 | 828 | \<^descr> \<^verbatim>\<open>XCONST\<close> is similar to \<^verbatim>\<open>CONST\<close>, but retains the constant name as given. | 
| 829 |   This is only relevant to translation rules (\secref{sec:syn-trans}), notably
 | |
| 830 | on the LHS. | |
| 58618 | 831 | \<close> | 
| 28770 
93a372e2dc7a
added section "The Pure grammar" (incomplete version, based on old ref manual);
 wenzelm parents: 
28769diff
changeset | 832 | |
| 28777 | 833 | |
| 58618 | 834 | subsection \<open>Inspecting the syntax\<close> | 
| 28777 | 835 | |
| 58618 | 836 | text \<open> | 
| 46282 | 837 |   \begin{matharray}{rcl}
 | 
| 61493 | 838 |     @{command_def "print_syntax"}\<open>\<^sup>*\<close> & : & \<open>context \<rightarrow>\<close> \\
 | 
| 46282 | 839 |   \end{matharray}
 | 
| 28777 | 840 | |
| 62106 | 841 |   \<^descr> @{command "print_syntax"} prints the inner syntax of the current context.
 | 
| 842 | The output can be quite large; the most important sections are explained | |
| 843 | below. | |
| 28777 | 844 | |
| 62106 | 845 | \<^descr> \<open>lexicon\<close> lists the delimiters of the inner token language; see | 
| 846 |     \secref{sec:inner-lex}.
 | |
| 28777 | 847 | |
| 62106 | 848 | \<^descr> \<open>prods\<close> lists the productions of the underlying priority grammar; see | 
| 849 |     \secref{sec:priority-grammar}.
 | |
| 28777 | 850 | |
| 62106 | 851 | The nonterminal \<open>A\<^sup>(\<^sup>p\<^sup>)\<close> is rendered in plain text as \<open>A[p]\<close>; delimiters | 
| 852 | are quoted. Many productions have an extra \<open>\<dots> => name\<close>. These names later | |
| 853 | become the heads of parse trees; they also guide the pretty printer. | |
| 854 | ||
| 855 | Productions without such parse tree names are called \<^emph>\<open>copy productions\<close>. | |
| 856 | Their right-hand side must have exactly one nonterminal symbol (or named | |
| 857 | token). The parser does not create a new parse tree node for copy | |
| 858 | productions, but simply returns the parse tree of the right-hand symbol. | |
| 46282 | 859 | |
| 61458 | 860 | If the right-hand side of a copy production consists of a single | 
| 61477 | 861 | nonterminal without any delimiters, then it is called a \<^emph>\<open>chain | 
| 62106 | 862 | production\<close>. Chain productions act as abbreviations: conceptually, they | 
| 863 | are removed from the grammar by adding new productions. Priority | |
| 864 | information attached to chain productions is ignored; only the dummy value | |
| 865 | \<open>-1\<close> is displayed. | |
| 46282 | 866 | |
| 62106 | 867 | \<^descr> \<open>print modes\<close> lists the alternative print modes provided by this | 
| 868 |     grammar; see \secref{sec:print-modes}.
 | |
| 28777 | 869 | |
| 62106 | 870 | \<^descr> \<open>parse_rules\<close> and \<open>print_rules\<close> relate to syntax translations (macros); | 
| 871 |     see \secref{sec:syn-trans}.
 | |
| 46282 | 872 | |
| 62106 | 873 | \<^descr> \<open>parse_ast_translation\<close> and \<open>print_ast_translation\<close> list sets of | 
| 874 | constants that invoke translation functions for abstract syntax trees, | |
| 875 | which are only required in very special situations; see | |
| 876 |     \secref{sec:tr-funs}.
 | |
| 28777 | 877 | |
| 62106 | 878 | \<^descr> \<open>parse_translation\<close> and \<open>print_translation\<close> list the sets of constants | 
| 879 |     that invoke regular translation functions; see \secref{sec:tr-funs}.
 | |
| 58618 | 880 | \<close> | 
| 28774 | 881 | |
| 28770 
93a372e2dc7a
added section "The Pure grammar" (incomplete version, based on old ref manual);
 wenzelm parents: 
28769diff
changeset | 882 | |
| 58618 | 883 | subsection \<open>Ambiguity of parsed expressions\<close> | 
| 46291 | 884 | |
| 58618 | 885 | text \<open> | 
| 46291 | 886 |   \begin{tabular}{rcll}
 | 
| 61493 | 887 |     @{attribute_def syntax_ambiguity_warning} & : & \<open>attribute\<close> & default \<open>true\<close> \\
 | 
| 888 |     @{attribute_def syntax_ambiguity_limit} & : & \<open>attribute\<close> & default \<open>10\<close> \\
 | |
| 46291 | 889 |   \end{tabular}
 | 
| 890 | ||
| 62106 | 891 | Depending on the grammar and the given input, parsing may be ambiguous. | 
| 892 | Isabelle lets the Earley parser enumerate all possible parse trees, and then | |
| 893 | tries to make the best out of the situation. Terms that cannot be | |
| 894 | type-checked are filtered out, which often leads to a unique result in the | |
| 895 | end. Unlike regular type reconstruction, which is applied to the whole | |
| 896 | collection of input terms simultaneously, the filtering stage only treats | |
| 897 | each given term in isolation. Filtering is also not attempted for individual | |
| 46291 | 898 |   types or raw ASTs (as required for @{command translations}).
 | 
| 899 | ||
| 62106 | 900 | Certain warning or error messages are printed, depending on the situation | 
| 901 | and the given configuration options. Parsing ultimately fails, if multiple | |
| 902 | results remain after the filtering phase. | |
| 46291 | 903 | |
| 62106 | 904 |   \<^descr> @{attribute syntax_ambiguity_warning} controls output of explicit warning
 | 
| 905 | messages about syntax ambiguity. | |
| 46291 | 906 | |
| 62106 | 907 |   \<^descr> @{attribute syntax_ambiguity_limit} determines the number of resulting
 | 
| 908 | parse trees that are shown as part of the printed message in case of an | |
| 909 | ambiguity. | |
| 58618 | 910 | \<close> | 
| 46291 | 911 | |
| 912 | ||
| 58618 | 913 | section \<open>Syntax transformations \label{sec:syntax-transformations}\<close>
 | 
| 48113 | 914 | |
| 62106 | 915 | text \<open> | 
| 916 | The inner syntax engine of Isabelle provides separate mechanisms to | |
| 917 | transform parse trees either via rewrite systems on first-order ASTs | |
| 918 |   (\secref{sec:syn-trans}), or ML functions on ASTs or syntactic \<open>\<lambda>\<close>-terms
 | |
| 919 |   (\secref{sec:tr-funs}). This works both for parsing and printing, as
 | |
| 920 |   outlined in \figref{fig:parse-print}.
 | |
| 48113 | 921 | |
| 922 |   \begin{figure}[htbp]
 | |
| 923 |   \begin{center}
 | |
| 924 |   \begin{tabular}{cl}
 | |
| 925 | string & \\ | |
| 61493 | 926 | \<open>\<down>\<close> & lexer + parser \\ | 
| 48113 | 927 | parse tree & \\ | 
| 61493 | 928 | \<open>\<down>\<close> & parse AST translation \\ | 
| 48113 | 929 | AST & \\ | 
| 61493 | 930 | \<open>\<down>\<close> & AST rewriting (macros) \\ | 
| 48113 | 931 | AST & \\ | 
| 61493 | 932 | \<open>\<down>\<close> & parse translation \\ | 
| 48113 | 933 | --- pre-term --- & \\ | 
| 61493 | 934 | \<open>\<down>\<close> & print translation \\ | 
| 48113 | 935 | AST & \\ | 
| 61493 | 936 | \<open>\<down>\<close> & AST rewriting (macros) \\ | 
| 48113 | 937 | AST & \\ | 
| 61493 | 938 | \<open>\<down>\<close> & print AST translation \\ | 
| 48113 | 939 | string & | 
| 940 |   \end{tabular}
 | |
| 941 |   \end{center}
 | |
| 942 |   \caption{Parsing and printing with translations}\label{fig:parse-print}
 | |
| 943 |   \end{figure}
 | |
| 944 | ||
| 62106 | 945 | These intermediate syntax tree formats eventually lead to a pre-term with | 
| 946 | all names and binding scopes resolved, but most type information still | |
| 947 | missing. Explicit type constraints might be given by the user, or implicit | |
| 948 | position information by the system --- both need to be passed-through | |
| 949 | carefully by syntax transformations. | |
| 48113 | 950 | |
| 62106 | 951 | Pre-terms are further processed by the so-called \<^emph>\<open>check\<close> and \<^emph>\<open>uncheck\<close> | 
| 952 |   phases that are intertwined with type-inference (see also @{cite
 | |
| 953 | "isabelle-implementation"}). The latter allows to operate on higher-order | |
| 954 | abstract syntax with proper binding and type information already available. | |
| 48113 | 955 | |
| 62106 | 956 | As a rule of thumb, anything that manipulates bindings of variables or | 
| 957 | constants needs to be implemented as syntax transformation (see below). | |
| 958 | Anything else is better done via check/uncheck: a prominent example | |
| 959 |   application is the @{command abbreviation} concept of Isabelle/Pure.
 | |
| 960 | \<close> | |
| 48113 | 961 | |
| 962 | ||
| 58618 | 963 | subsection \<open>Abstract syntax trees \label{sec:ast}\<close>
 | 
| 48113 | 964 | |
| 62106 | 965 | text \<open> | 
| 966 |   The ML datatype @{ML_type Ast.ast} explicitly represents the intermediate
 | |
| 967 |   AST format that is used for syntax rewriting (\secref{sec:syn-trans}). It is
 | |
| 968 | defined in ML as follows: | |
| 61408 
9020a3ba6c9a
@{verbatim [display]} supersedes old alltt/ttbox;
 wenzelm parents: 
61143diff
changeset | 969 |   @{verbatim [display]
 | 
| 
9020a3ba6c9a
@{verbatim [display]} supersedes old alltt/ttbox;
 wenzelm parents: 
61143diff
changeset | 970 | \<open>datatype ast = | 
| 
9020a3ba6c9a
@{verbatim [display]} supersedes old alltt/ttbox;
 wenzelm parents: 
61143diff
changeset | 971 | Constant of string | | 
| 
9020a3ba6c9a
@{verbatim [display]} supersedes old alltt/ttbox;
 wenzelm parents: 
61143diff
changeset | 972 | Variable of string | | 
| 
9020a3ba6c9a
@{verbatim [display]} supersedes old alltt/ttbox;
 wenzelm parents: 
61143diff
changeset | 973 | Appl of ast list\<close>} | 
| 48114 | 974 | |
| 62106 | 975 | An AST is either an atom (constant or variable) or a list of (at least two) | 
| 976 | subtrees. Occasional diagnostic output of ASTs uses notation that resembles | |
| 977 | S-expression of LISP. Constant atoms are shown as quoted strings, variable | |
| 978 | atoms as non-quoted strings and applications as a parenthesized list of | |
| 979 | subtrees. For example, the AST | |
| 58724 | 980 |   @{ML [display] \<open>Ast.Appl [Ast.Constant "_abs", Ast.Variable "x", Ast.Variable "t"]\<close>}
 | 
| 62106 | 981 |   is pretty-printed as \<^verbatim>\<open>("_abs" x t)\<close>. Note that \<^verbatim>\<open>()\<close> and \<^verbatim>\<open>(x)\<close> are
 | 
| 982 | excluded as ASTs, because they have too few subtrees. | |
| 48114 | 983 | |
| 61421 | 984 | \<^medskip> | 
| 62106 | 985 | AST application is merely a pro-forma mechanism to indicate certain | 
| 986 | syntactic structures. Thus \<^verbatim>\<open>(c a b)\<close> could mean either term application or | |
| 987 | type application, depending on the syntactic context. | |
| 48114 | 988 | |
| 62106 | 989 |   Nested application like \<^verbatim>\<open>(("_abs" x t) u)\<close> is also possible, but ASTs are
 | 
| 990 | definitely first-order: the syntax constant \<^verbatim>\<open>"_abs"\<close> does not bind the \<^verbatim>\<open>x\<close> | |
| 991 | in any way. Proper bindings are introduced in later stages of the term | |
| 992 |   syntax, where \<^verbatim>\<open>("_abs" x t)\<close> becomes an @{ML Abs} node and occurrences of
 | |
| 993 | \<^verbatim>\<open>x\<close> in \<^verbatim>\<open>t\<close> are replaced by bound variables (represented as de-Bruijn | |
| 994 | indices). | |
| 58618 | 995 | \<close> | 
| 48113 | 996 | |
| 997 | ||
| 58618 | 998 | subsubsection \<open>AST constants versus variables\<close> | 
| 48114 | 999 | |
| 62106 | 1000 | text \<open> | 
| 1001 | Depending on the situation --- input syntax, output syntax, translation | |
| 1002 |   patterns --- the distinction of atomic ASTs as @{ML Ast.Constant} versus
 | |
| 1003 |   @{ML Ast.Variable} serves slightly different purposes.
 | |
| 48114 | 1004 | |
| 62106 | 1005 | Input syntax of a term such as \<open>f a b = c\<close> does not yet indicate the scopes | 
| 1006 | of atomic entities \<open>f, a, b, c\<close>: they could be global constants or local | |
| 1007 |   variables, even bound ones depending on the context of the term. @{ML
 | |
| 1008 | Ast.Variable} leaves this choice still open: later syntax layers (or | |
| 1009 | translation functions) may capture such a variable to determine its role | |
| 1010 | specifically, to make it a constant, bound variable, free variable etc. In | |
| 1011 | contrast, syntax translations that introduce already known constants would | |
| 1012 |   rather do it via @{ML Ast.Constant} to prevent accidental re-interpretation
 | |
| 1013 | later on. | |
| 48114 | 1014 | |
| 62106 | 1015 |   Output syntax turns term constants into @{ML Ast.Constant} and variables
 | 
| 1016 |   (free or schematic) into @{ML Ast.Variable}. This information is precise
 | |
| 1017 | when printing fully formal \<open>\<lambda>\<close>-terms. | |
| 48114 | 1018 | |
| 61421 | 1019 | \<^medskip> | 
| 62106 | 1020 |   AST translation patterns (\secref{sec:syn-trans}) that represent terms
 | 
| 1021 | cannot distinguish constants and variables syntactically. Explicit | |
| 1022 | indication of \<open>CONST c\<close> inside the term language is required, unless \<open>c\<close> is | |
| 1023 |   known as special \<^emph>\<open>syntax constant\<close> (see also @{command syntax}). It is also
 | |
| 1024 |   possible to use @{command syntax} declarations (without mixfix annotation)
 | |
| 1025 | to enforce that certain unqualified names are always treated as constant | |
| 1026 | within the syntax machinery. | |
| 48114 | 1027 | |
| 62106 | 1028 | The situation is simpler for ASTs that represent types or sorts, since the | 
| 1029 | concrete syntax already distinguishes type variables from type constants | |
| 1030 |   (constructors). So \<open>('a, 'b) foo\<close> corresponds to an AST application of some
 | |
| 1031 | constant for \<open>foo\<close> and variable arguments for \<open>'a\<close> and \<open>'b\<close>. Note that the | |
| 1032 | postfix application is merely a feature of the concrete syntax, while in the | |
| 1033 | AST the constructor occurs in head position. | |
| 1034 | \<close> | |
| 48114 | 1035 | |
| 1036 | ||
| 58618 | 1037 | subsubsection \<open>Authentic syntax names\<close> | 
| 48114 | 1038 | |
| 62106 | 1039 | text \<open> | 
| 1040 | Naming constant entities within ASTs is another delicate issue. Unqualified | |
| 1041 | names are resolved in the name space tables in the last stage of parsing, | |
| 1042 | after all translations have been applied. Since syntax transformations do | |
| 1043 | not know about this later name resolution, there can be surprises in | |
| 1044 | boundary cases. | |
| 48114 | 1045 | |
| 62106 | 1046 |   \<^emph>\<open>Authentic syntax names\<close> for @{ML Ast.Constant} avoid this problem: the
 | 
| 1047 | fully-qualified constant name with a special prefix for its formal category | |
| 1048 | (\<open>class\<close>, \<open>type\<close>, \<open>const\<close>, \<open>fixed\<close>) represents the information faithfully | |
| 1049 | within the untyped AST format. Accidental overlap with free or bound | |
| 1050 | variables is excluded as well. Authentic syntax names work implicitly in the | |
| 1051 | following situations: | |
| 48114 | 1052 | |
| 62106 | 1053 | \<^item> Input of term constants (or fixed variables) that are introduced by | 
| 1054 |     concrete syntax via @{command notation}: the correspondence of a
 | |
| 1055 | particular grammar production to some known term entity is preserved. | |
| 48114 | 1056 | |
| 62106 | 1057 | \<^item> Input of type constants (constructors) and type classes --- thanks to | 
| 1058 | explicit syntactic distinction independently on the context. | |
| 48114 | 1059 | |
| 62106 | 1060 | \<^item> Output of term constants, type constants, type classes --- this | 
| 1061 | information is already available from the internal term to be printed. | |
| 48114 | 1062 | |
| 62106 | 1063 | In other words, syntax transformations that operate on input terms written | 
| 1064 | as prefix applications are difficult to make robust. Luckily, this case | |
| 1065 | rarely occurs in practice, because syntax forms to be translated usually | |
| 1066 | correspond to some concrete notation. | |
| 1067 | \<close> | |
| 48114 | 1068 | |
| 1069 | ||
| 58618 | 1070 | subsection \<open>Raw syntax and translations \label{sec:syn-trans}\<close>
 | 
| 28762 | 1071 | |
| 58618 | 1072 | text \<open> | 
| 48117 | 1073 |   \begin{tabular}{rcll}
 | 
| 61493 | 1074 |     @{command_def "nonterminal"} & : & \<open>theory \<rightarrow> theory\<close> \\
 | 
| 1075 |     @{command_def "syntax"} & : & \<open>theory \<rightarrow> theory\<close> \\
 | |
| 1076 |     @{command_def "no_syntax"} & : & \<open>theory \<rightarrow> theory\<close> \\
 | |
| 1077 |     @{command_def "translations"} & : & \<open>theory \<rightarrow> theory\<close> \\
 | |
| 1078 |     @{command_def "no_translations"} & : & \<open>theory \<rightarrow> theory\<close> \\
 | |
| 1079 |     @{attribute_def syntax_ast_trace} & : & \<open>attribute\<close> & default \<open>false\<close> \\
 | |
| 1080 |     @{attribute_def syntax_ast_stats} & : & \<open>attribute\<close> & default \<open>false\<close> \\
 | |
| 48117 | 1081 |   \end{tabular}
 | 
| 61421 | 1082 | \<^medskip> | 
| 59783 
00b62aa9f430
tuned syntax diagrams -- no duplication of "target";
 wenzelm parents: 
58842diff
changeset | 1083 | |
| 62106 | 1084 |   Unlike mixfix notation for existing formal entities (\secref{sec:notation}),
 | 
| 1085 | raw syntax declarations provide full access to the priority grammar of the | |
| 1086 | inner syntax, without any sanity checks. This includes additional syntactic | |
| 1087 |   categories (via @{command nonterminal}) and free-form grammar productions
 | |
| 1088 |   (via @{command syntax}). Additional syntax translations (or macros, via
 | |
| 1089 |   @{command translations}) are required to turn resulting parse trees into
 | |
| 1090 | proper representations of formal entities again. | |
| 46292 | 1091 | |
| 55112 
b1a5d603fd12
prefer rail cartouche -- avoid back-slashed quotes;
 wenzelm parents: 
55108diff
changeset | 1092 |   @{rail \<open>
 | 
| 42596 
6c621a9d612a
modernized rail diagrams using @{rail} antiquotation;
 wenzelm parents: 
42358diff
changeset | 1093 |     @@{command nonterminal} (@{syntax name} + @'and')
 | 
| 28762 | 1094 | ; | 
| 46494 
ea2ae63336f3
clarified outer syntax "constdecl", which is only local to some rail diagrams;
 wenzelm parents: 
46483diff
changeset | 1095 |     (@@{command syntax} | @@{command no_syntax}) @{syntax mode}? (constdecl +)
 | 
| 28762 | 1096 | ; | 
| 42596 
6c621a9d612a
modernized rail diagrams using @{rail} antiquotation;
 wenzelm parents: 
42358diff
changeset | 1097 |     (@@{command translations} | @@{command no_translations})
 | 
| 
6c621a9d612a
modernized rail diagrams using @{rail} antiquotation;
 wenzelm parents: 
42358diff
changeset | 1098 |       (transpat ('==' | '=>' | '<=' | '\<rightleftharpoons>' | '\<rightharpoonup>' | '\<leftharpoondown>') transpat +)
 | 
| 28762 | 1099 | ; | 
| 1100 | ||
| 46494 
ea2ae63336f3
clarified outer syntax "constdecl", which is only local to some rail diagrams;
 wenzelm parents: 
46483diff
changeset | 1101 |     constdecl: @{syntax name} '::' @{syntax type} @{syntax mixfix}?
 | 
| 
ea2ae63336f3
clarified outer syntax "constdecl", which is only local to some rail diagrams;
 wenzelm parents: 
46483diff
changeset | 1102 | ; | 
| 42596 
6c621a9d612a
modernized rail diagrams using @{rail} antiquotation;
 wenzelm parents: 
42358diff
changeset | 1103 |     mode: ('(' ( @{syntax name} | @'output' | @{syntax name} @'output' ) ')')
 | 
| 28762 | 1104 | ; | 
| 62969 | 1105 |     transpat: ('(' @{syntax name} ')')? @{syntax string}
 | 
| 55112 
b1a5d603fd12
prefer rail cartouche -- avoid back-slashed quotes;
 wenzelm parents: 
55108diff
changeset | 1106 | \<close>} | 
| 28762 | 1107 | |
| 62106 | 1108 |   \<^descr> @{command "nonterminal"}~\<open>c\<close> declares a type constructor \<open>c\<close> (without
 | 
| 1109 | arguments) to act as purely syntactic type: a nonterminal symbol of the | |
| 1110 | inner syntax. | |
| 28762 | 1111 | |
| 62106 | 1112 |   \<^descr> @{command "syntax"}~\<open>(mode) c :: \<sigma> (mx)\<close> augments the priority grammar and
 | 
| 1113 | the pretty printer table for the given print mode (default \<^verbatim>\<open>""\<close>). An | |
| 1114 |   optional keyword @{keyword_ref "output"} means that only the pretty printer
 | |
| 1115 | table is affected. | |
| 46292 | 1116 | |
| 62106 | 1117 |   Following \secref{sec:mixfix}, the mixfix annotation \<open>mx = template ps q\<close>
 | 
| 1118 | together with type \<open>\<sigma> = \<tau>\<^sub>1 \<Rightarrow> \<dots> \<tau>\<^sub>n \<Rightarrow> \<tau>\<close> and specify a grammar production. | |
| 1119 | The \<open>template\<close> contains delimiter tokens that surround \<open>n\<close> argument | |
| 1120 | positions (\<^verbatim>\<open>_\<close>). The latter correspond to nonterminal symbols \<open>A\<^sub>i\<close> derived | |
| 1121 | from the argument types \<open>\<tau>\<^sub>i\<close> as follows: | |
| 46292 | 1122 | |
| 61493 | 1123 | \<^item> \<open>prop\<close> if \<open>\<tau>\<^sub>i = prop\<close> | 
| 46292 | 1124 | |
| 62106 | 1125 | \<^item> \<open>logic\<close> if \<open>\<tau>\<^sub>i = (\<dots>)\<kappa>\<close> for logical type constructor \<open>\<kappa> \<noteq> prop\<close> | 
| 46292 | 1126 | |
| 61493 | 1127 | \<^item> \<open>any\<close> if \<open>\<tau>\<^sub>i = \<alpha>\<close> for type variables | 
| 46292 | 1128 | |
| 62106 | 1129 | \<^item> \<open>\<kappa>\<close> if \<open>\<tau>\<^sub>i = \<kappa>\<close> for nonterminal \<open>\<kappa>\<close> (syntactic type constructor) | 
| 46292 | 1130 | |
| 62106 | 1131 | Each \<open>A\<^sub>i\<close> is decorated by priority \<open>p\<^sub>i\<close> from the given list \<open>ps\<close>; missing | 
| 1132 | priorities default to 0. | |
| 46292 | 1133 | |
| 62106 | 1134 | The resulting nonterminal of the production is determined similarly from | 
| 1135 | type \<open>\<tau>\<close>, with priority \<open>q\<close> and default 1000. | |
| 46292 | 1136 | |
| 61421 | 1137 | \<^medskip> | 
| 62106 | 1138 | Parsing via this production produces parse trees \<open>t\<^sub>1, \<dots>, t\<^sub>n\<close> for the | 
| 1139 | argument slots. The resulting parse tree is composed as \<open>c t\<^sub>1 \<dots> t\<^sub>n\<close>, by | |
| 1140 | using the syntax constant \<open>c\<close> of the syntax declaration. | |
| 46292 | 1141 | |
| 62106 | 1142 | Such syntactic constants are invented on the spot, without formal check | 
| 1143 | wrt.\ existing declarations. It is conventional to use plain identifiers | |
| 1144 | prefixed by a single underscore (e.g.\ \<open>_foobar\<close>). Names should be chosen | |
| 1145 | with care, to avoid clashes with other syntax declarations. | |
| 46292 | 1146 | |
| 61421 | 1147 | \<^medskip> | 
| 62106 | 1148 | The special case of copy production is specified by \<open>c =\<close>~\<^verbatim>\<open>""\<close> (empty | 
| 1149 | string). It means that the resulting parse tree \<open>t\<close> is copied directly, | |
| 1150 | without any further decoration. | |
| 46282 | 1151 | |
| 62106 | 1152 |   \<^descr> @{command "no_syntax"}~\<open>(mode) decls\<close> removes grammar declarations (and
 | 
| 1153 | translations) resulting from \<open>decls\<close>, which are interpreted in the same | |
| 1154 |   manner as for @{command "syntax"} above.
 | |
| 1155 | ||
| 1156 |   \<^descr> @{command "translations"}~\<open>rules\<close> specifies syntactic translation rules
 | |
| 1157 |   (i.e.\ macros) as first-order rewrite rules on ASTs (\secref{sec:ast}). The
 | |
| 1158 | theory context maintains two independent lists translation rules: parse | |
| 1159 | rules (\<^verbatim>\<open>=>\<close> or \<open>\<rightharpoonup>\<close>) and print rules (\<^verbatim>\<open><=\<close> or \<open>\<leftharpoondown>\<close>). For convenience, both | |
| 1160 | can be specified simultaneously as parse~/ print rules (\<^verbatim>\<open>==\<close> or \<open>\<rightleftharpoons>\<close>). | |
| 46282 | 1161 | |
| 62106 | 1162 | Translation patterns may be prefixed by the syntactic category to be used | 
| 1163 | for parsing; the default is \<open>logic\<close> which means that regular term syntax is | |
| 1164 | used. Both sides of the syntax translation rule undergo parsing and parse | |
| 1165 |   AST translations \secref{sec:tr-funs}, in order to perform some fundamental
 | |
| 1166 | normalization like \<open>\<lambda>x y. b \<leadsto> \<lambda>x. \<lambda>y. b\<close>, but other AST translation rules | |
| 1167 | are \<^emph>\<open>not\<close> applied recursively here. | |
| 48115 | 1168 | |
| 62106 | 1169 | When processing AST patterns, the inner syntax lexer runs in a different | 
| 1170 | mode that allows identifiers to start with underscore. This accommodates the | |
| 1171 | usual naming convention for auxiliary syntax constants --- those that do not | |
| 1172 | have a logical counter part --- by allowing to specify arbitrary AST | |
| 1173 | applications within the term syntax, independently of the corresponding | |
| 1174 | concrete syntax. | |
| 48115 | 1175 | |
| 1176 |   Atomic ASTs are distinguished as @{ML Ast.Constant} versus @{ML
 | |
| 62106 | 1177 | Ast.Variable} as follows: a qualified name or syntax constant declared via | 
| 1178 |   @{command syntax}, or parse tree head of concrete notation becomes @{ML
 | |
| 1179 |   Ast.Constant}, anything else @{ML Ast.Variable}. Note that \<open>CONST\<close> and
 | |
| 1180 |   \<open>XCONST\<close> within the term language (\secref{sec:pure-grammar}) allow to
 | |
| 1181 | enforce treatment as constants. | |
| 48115 | 1182 | |
| 62106 | 1183 | AST rewrite rules \<open>(lhs, rhs)\<close> need to obey the following side-conditions: | 
| 48115 | 1184 | |
| 62106 | 1185 | \<^item> Rules must be left linear: \<open>lhs\<close> must not contain repeated | 
| 1186 | variables.\<^footnote>\<open>The deeper reason for this is that AST equality is not | |
| 1187 | well-defined: different occurrences of the ``same'' AST could be decorated | |
| 1188 | differently by accidental type-constraints or source position information, | |
| 1189 | for example.\<close> | |
| 48115 | 1190 | |
| 61493 | 1191 | \<^item> Every variable in \<open>rhs\<close> must also occur in \<open>lhs\<close>. | 
| 48115 | 1192 | |
| 62106 | 1193 |   \<^descr> @{command "no_translations"}~\<open>rules\<close> removes syntactic translation rules,
 | 
| 1194 |   which are interpreted in the same manner as for @{command "translations"}
 | |
| 1195 | above. | |
| 28762 | 1196 | |
| 62106 | 1197 |   \<^descr> @{attribute syntax_ast_trace} and @{attribute syntax_ast_stats} control
 | 
| 1198 | diagnostic output in the AST normalization process, when translation rules | |
| 1199 | are applied to concrete input or output. | |
| 48117 | 1200 | |
| 46293 | 1201 | |
| 62106 | 1202 | Raw syntax and translations provides a slightly more low-level access to the | 
| 1203 | grammar and the form of resulting parse trees. It is often possible to avoid | |
| 1204 |   this untyped macro mechanism, and use type-safe @{command abbreviation} or
 | |
| 1205 |   @{command notation} instead. Some important situations where @{command
 | |
| 1206 |   syntax} and @{command translations} are really need are as follows:
 | |
| 46293 | 1207 | |
| 62106 | 1208 |   \<^item> Iterated replacement via recursive @{command translations}. For example,
 | 
| 1209 |   consider list enumeration @{term "[a, b, c, d]"} as defined in theory
 | |
| 1210 |   @{theory List} in Isabelle/HOL.
 | |
| 46293 | 1211 | |
| 62106 | 1212 | \<^item> Change of binding status of variables: anything beyond the built-in | 
| 1213 |   @{keyword "binder"} mixfix annotation requires explicit syntax translations.
 | |
| 1214 |   For example, consider list filter comprehension @{term "[x \<leftarrow> xs . P]"} as
 | |
| 1215 |   defined in theory @{theory List} in Isabelle/HOL.
 | |
| 61458 | 1216 | \<close> | 
| 46293 | 1217 | |
| 28762 | 1218 | |
| 58618 | 1219 | subsubsection \<open>Applying translation rules\<close> | 
| 48117 | 1220 | |
| 62106 | 1221 | text \<open> | 
| 1222 | As a term is being parsed or printed, an AST is generated as an intermediate | |
| 1223 |   form according to \figref{fig:parse-print}. The AST is normalized by
 | |
| 1224 | applying translation rules in the manner of a first-order term rewriting | |
| 1225 | system. We first examine how a single rule is applied. | |
| 48117 | 1226 | |
| 62106 | 1227 | Let \<open>t\<close> be the abstract syntax tree to be normalized and \<open>(lhs, rhs)\<close> some | 
| 1228 | translation rule. A subtree \<open>u\<close> of \<open>t\<close> is called \<^emph>\<open>redex\<close> if it is an | |
| 1229 | instance of \<open>lhs\<close>; in this case the pattern \<open>lhs\<close> is said to match the | |
| 1230 | object \<open>u\<close>. A redex matched by \<open>lhs\<close> may be replaced by the corresponding | |
| 1231 | instance of \<open>rhs\<close>, thus \<^emph>\<open>rewriting\<close> the AST \<open>t\<close>. Matching requires some | |
| 1232 |   notion of \<^emph>\<open>place-holders\<close> in rule patterns: @{ML Ast.Variable} serves this
 | |
| 1233 | purpose. | |
| 48117 | 1234 | |
| 62106 | 1235 | More precisely, the matching of the object \<open>u\<close> against the pattern \<open>lhs\<close> is | 
| 1236 | performed as follows: | |
| 48117 | 1237 | |
| 62106 | 1238 |     \<^item> Objects of the form @{ML Ast.Variable}~\<open>x\<close> or @{ML Ast.Constant}~\<open>x\<close> are
 | 
| 1239 |     matched by pattern @{ML Ast.Constant}~\<open>x\<close>. Thus all atomic ASTs in the
 | |
| 1240 | object are treated as (potential) constants, and a successful match makes | |
| 1241 | them actual constants even before name space resolution (see also | |
| 1242 |     \secref{sec:ast}).
 | |
| 48117 | 1243 | |
| 62106 | 1244 |     \<^item> Object \<open>u\<close> is matched by pattern @{ML Ast.Variable}~\<open>x\<close>, binding \<open>x\<close> to
 | 
| 1245 | \<open>u\<close>. | |
| 48117 | 1246 | |
| 62106 | 1247 |     \<^item> Object @{ML Ast.Appl}~\<open>us\<close> is matched by @{ML Ast.Appl}~\<open>ts\<close> if \<open>us\<close> and
 | 
| 1248 | \<open>ts\<close> have the same length and each corresponding subtree matches. | |
| 48117 | 1249 | |
| 62106 | 1250 | \<^item> In every other case, matching fails. | 
| 48117 | 1251 | |
| 62106 | 1252 | A successful match yields a substitution that is applied to \<open>rhs\<close>, | 
| 1253 | generating the instance that replaces \<open>u\<close>. | |
| 48117 | 1254 | |
| 62106 | 1255 | Normalizing an AST involves repeatedly applying translation rules until none | 
| 1256 | are applicable. This works yoyo-like: top-down, bottom-up, top-down, etc. At | |
| 1257 | each subtree position, rules are chosen in order of appearance in the theory | |
| 1258 | definitions. | |
| 48117 | 1259 | |
| 62106 | 1260 |   The configuration options @{attribute syntax_ast_trace} and @{attribute
 | 
| 1261 | syntax_ast_stats} might help to understand this process and diagnose | |
| 1262 | problems. | |
| 48117 | 1263 | |
| 1264 |   \begin{warn}
 | |
| 62106 | 1265 |   If syntax translation rules work incorrectly, the output of @{command_ref
 | 
| 1266 | print_syntax} with its \<^emph>\<open>rules\<close> sections reveals the actual internal forms | |
| 1267 | of AST pattern, without potentially confusing concrete syntax. Recall that | |
| 1268 | AST constants appear as quoted strings and variables without quotes. | |
| 48117 | 1269 |   \end{warn}
 | 
| 1270 | ||
| 1271 |   \begin{warn}
 | |
| 62106 | 1272 |   If @{attribute_ref eta_contract} is set to \<open>true\<close>, terms will be
 | 
| 1273 | \<open>\<eta>\<close>-contracted \<^emph>\<open>before\<close> the AST rewriter sees them. Thus some abstraction | |
| 1274 | nodes needed for print rules to match may vanish. For example, \<open>Ball A (\<lambda>x. | |
| 1275 | P x)\<close> would contract to \<open>Ball A P\<close> and the standard print rule would fail to | |
| 1276 | apply. This problem can be avoided by hand-written ML translation functions | |
| 1277 |   (see also \secref{sec:tr-funs}), which is in fact the same mechanism used in
 | |
| 1278 |   built-in @{keyword "binder"} declarations.
 | |
| 48117 | 1279 |   \end{warn}
 | 
| 58618 | 1280 | \<close> | 
| 48117 | 1281 | |
| 28762 | 1282 | |
| 58618 | 1283 | subsection \<open>Syntax translation functions \label{sec:tr-funs}\<close>
 | 
| 28762 | 1284 | |
| 58618 | 1285 | text \<open> | 
| 28762 | 1286 |   \begin{matharray}{rcl}
 | 
| 61493 | 1287 |     @{command_def "parse_ast_translation"} & : & \<open>theory \<rightarrow> theory\<close> \\
 | 
| 1288 |     @{command_def "parse_translation"} & : & \<open>theory \<rightarrow> theory\<close> \\
 | |
| 1289 |     @{command_def "print_translation"} & : & \<open>theory \<rightarrow> theory\<close> \\
 | |
| 1290 |     @{command_def "typed_print_translation"} & : & \<open>theory \<rightarrow> theory\<close> \\
 | |
| 1291 |     @{command_def "print_ast_translation"} & : & \<open>theory \<rightarrow> theory\<close> \\
 | |
| 1292 |     @{ML_antiquotation_def "class_syntax"} & : & \<open>ML antiquotation\<close> \\
 | |
| 1293 |     @{ML_antiquotation_def "type_syntax"} & : & \<open>ML antiquotation\<close> \\
 | |
| 1294 |     @{ML_antiquotation_def "const_syntax"} & : & \<open>ML antiquotation\<close> \\
 | |
| 1295 |     @{ML_antiquotation_def "syntax_const"} & : & \<open>ML antiquotation\<close> \\
 | |
| 28762 | 1296 |   \end{matharray}
 | 
| 1297 | ||
| 48118 | 1298 | Syntax translation functions written in ML admit almost arbitrary | 
| 1299 | manipulations of inner syntax, at the expense of some complexity and | |
| 1300 | obscurity in the implementation. | |
| 1301 | ||
| 55112 
b1a5d603fd12
prefer rail cartouche -- avoid back-slashed quotes;
 wenzelm parents: 
55108diff
changeset | 1302 |   @{rail \<open>
 | 
| 42596 
6c621a9d612a
modernized rail diagrams using @{rail} antiquotation;
 wenzelm parents: 
42358diff
changeset | 1303 |   ( @@{command parse_ast_translation} | @@{command parse_translation} |
 | 
| 
6c621a9d612a
modernized rail diagrams using @{rail} antiquotation;
 wenzelm parents: 
42358diff
changeset | 1304 |     @@{command print_translation} | @@{command typed_print_translation} |
 | 
| 52143 | 1305 |     @@{command print_ast_translation}) @{syntax text}
 | 
| 48119 
55c305e29f4b
cover @{class_syntax}, @{type_syntax}, @{const_syntax}, @{syntax_const} in isar-ref, in contrast to other ML antiquotations in implementation manual;
 wenzelm parents: 
48118diff
changeset | 1306 | ; | 
| 
55c305e29f4b
cover @{class_syntax}, @{type_syntax}, @{const_syntax}, @{syntax_const} in isar-ref, in contrast to other ML antiquotations in implementation manual;
 wenzelm parents: 
48118diff
changeset | 1307 |   (@@{ML_antiquotation class_syntax} |
 | 
| 
55c305e29f4b
cover @{class_syntax}, @{type_syntax}, @{const_syntax}, @{syntax_const} in isar-ref, in contrast to other ML antiquotations in implementation manual;
 wenzelm parents: 
48118diff
changeset | 1308 |    @@{ML_antiquotation type_syntax} |
 | 
| 
55c305e29f4b
cover @{class_syntax}, @{type_syntax}, @{const_syntax}, @{syntax_const} in isar-ref, in contrast to other ML antiquotations in implementation manual;
 wenzelm parents: 
48118diff
changeset | 1309 |    @@{ML_antiquotation const_syntax} |
 | 
| 
55c305e29f4b
cover @{class_syntax}, @{type_syntax}, @{const_syntax}, @{syntax_const} in isar-ref, in contrast to other ML antiquotations in implementation manual;
 wenzelm parents: 
48118diff
changeset | 1310 |    @@{ML_antiquotation syntax_const}) name
 | 
| 55112 
b1a5d603fd12
prefer rail cartouche -- avoid back-slashed quotes;
 wenzelm parents: 
55108diff
changeset | 1311 | \<close>} | 
| 28762 | 1312 | |
| 62106 | 1313 |   \<^descr> @{command parse_translation} etc. declare syntax translation functions to
 | 
| 1314 |   the theory. Any of these commands have a single @{syntax text} argument that
 | |
| 1315 | refers to an ML expression of appropriate type as follows: | |
| 48118 | 1316 | |
| 61421 | 1317 | \<^medskip> | 
| 48119 
55c305e29f4b
cover @{class_syntax}, @{type_syntax}, @{const_syntax}, @{syntax_const} in isar-ref, in contrast to other ML antiquotations in implementation manual;
 wenzelm parents: 
48118diff
changeset | 1318 |   {\footnotesize
 | 
| 52143 | 1319 |   \begin{tabular}{l}
 | 
| 1320 |   @{command parse_ast_translation} : \\
 | |
| 1321 |   \quad @{ML_type "(string * (Proof.context -> Ast.ast list -> Ast.ast)) list"} \\
 | |
| 1322 |   @{command parse_translation} : \\
 | |
| 1323 |   \quad @{ML_type "(string * (Proof.context -> term list -> term)) list"} \\
 | |
| 1324 |   @{command print_translation} : \\
 | |
| 1325 |   \quad @{ML_type "(string * (Proof.context -> term list -> term)) list"} \\
 | |
| 1326 |   @{command typed_print_translation} : \\
 | |
| 1327 |   \quad @{ML_type "(string * (Proof.context -> typ -> term list -> term)) list"} \\
 | |
| 1328 |   @{command print_ast_translation} : \\
 | |
| 1329 |   \quad @{ML_type "(string * (Proof.context -> Ast.ast list -> Ast.ast)) list"} \\
 | |
| 48118 | 1330 |   \end{tabular}}
 | 
| 61421 | 1331 | \<^medskip> | 
| 28762 | 1332 | |
| 62106 | 1333 | The argument list consists of \<open>(c, tr)\<close> pairs, where \<open>c\<close> is the syntax name | 
| 1334 | of the formal entity involved, and \<open>tr\<close> a function that translates a syntax | |
| 1335 | form \<open>c args\<close> into \<open>tr ctxt args\<close> (depending on the context). The | |
| 1336 | Isabelle/ML naming convention for parse translations is \<open>c_tr\<close> and for print | |
| 1337 | translations \<open>c_tr'\<close>. | |
| 48118 | 1338 | |
| 1339 |   The @{command_ref print_syntax} command displays the sets of names
 | |
| 62106 | 1340 | associated with the translation functions of a theory under | 
| 1341 | \<open>parse_ast_translation\<close> etc. | |
| 48118 | 1342 | |
| 62106 | 1343 |   \<^descr> \<open>@{class_syntax c}\<close>, \<open>@{type_syntax c}\<close>, \<open>@{const_syntax c}\<close> inline the
 | 
| 1344 | authentic syntax name of the given formal entities into the ML source. This | |
| 1345 | is the fully-qualified logical name prefixed by a special marker to indicate | |
| 1346 | its kind: thus different logical name spaces are properly distinguished | |
| 1347 | within parse trees. | |
| 48119 
55c305e29f4b
cover @{class_syntax}, @{type_syntax}, @{const_syntax}, @{syntax_const} in isar-ref, in contrast to other ML antiquotations in implementation manual;
 wenzelm parents: 
48118diff
changeset | 1348 | |
| 62106 | 1349 |   \<^descr> \<open>@{const_syntax c}\<close> inlines the name \<open>c\<close> of the given syntax constant,
 | 
| 1350 |   having checked that it has been declared via some @{command syntax} commands
 | |
| 1351 | within the theory context. Note that the usual naming convention makes | |
| 1352 | syntax constants start with underscore, to reduce the chance of accidental | |
| 1353 | clashes with other names occurring in parse trees (unqualified constants | |
| 1354 | etc.). | |
| 58618 | 1355 | \<close> | 
| 48118 | 1356 | |
| 48119 
55c305e29f4b
cover @{class_syntax}, @{type_syntax}, @{const_syntax}, @{syntax_const} in isar-ref, in contrast to other ML antiquotations in implementation manual;
 wenzelm parents: 
48118diff
changeset | 1357 | |
| 58618 | 1358 | subsubsection \<open>The translation strategy\<close> | 
| 28762 | 1359 | |
| 62106 | 1360 | text \<open> | 
| 1361 | The different kinds of translation functions are invoked during the | |
| 1362 | transformations between parse trees, ASTs and syntactic terms (cf.\ | |
| 1363 |   \figref{fig:parse-print}). Whenever a combination of the form \<open>c x\<^sub>1 \<dots> x\<^sub>n\<close>
 | |
| 1364 | is encountered, and a translation function \<open>f\<close> of appropriate kind is | |
| 1365 | declared for \<open>c\<close>, the result is produced by evaluation of \<open>f [x\<^sub>1, \<dots>, x\<^sub>n]\<close> | |
| 1366 | in ML. | |
| 48118 | 1367 | |
| 62106 | 1368 | For AST translations, the arguments \<open>x\<^sub>1, \<dots>, x\<^sub>n\<close> are ASTs. A combination | 
| 1369 |   has the form @{ML "Ast.Constant"}~\<open>c\<close> or @{ML "Ast.Appl"}~\<open>[\<close>@{ML
 | |
| 1370 | Ast.Constant}~\<open>c, x\<^sub>1, \<dots>, x\<^sub>n]\<close>. For term translations, the arguments are | |
| 1371 |   terms and a combination has the form @{ML Const}~\<open>(c, \<tau>)\<close> or @{ML
 | |
| 1372 | Const}~\<open>(c, \<tau>) $ x\<^sub>1 $ \<dots> $ x\<^sub>n\<close>. Terms allow more sophisticated | |
| 1373 | transformations than ASTs do, typically involving abstractions and bound | |
| 1374 | variables. \<^emph>\<open>Typed\<close> print translations may even peek at the type \<open>\<tau>\<close> of the | |
| 1375 | constant they are invoked on, although some information might have been | |
| 1376 | suppressed for term output already. | |
| 48118 | 1377 | |
| 62106 | 1378 | Regardless of whether they act on ASTs or terms, translation functions | 
| 1379 | called during the parsing process differ from those for printing in their | |
| 1380 | overall behaviour: | |
| 48118 | 1381 | |
| 62106 | 1382 | \<^descr>[Parse translations] are applied bottom-up. The arguments are already in | 
| 1383 | translated form. The translations must not fail; exceptions trigger an | |
| 1384 | error message. There may be at most one function associated with any | |
| 1385 | syntactic name. | |
| 46294 | 1386 | |
| 62106 | 1387 | \<^descr>[Print translations] are applied top-down. They are supplied with | 
| 1388 | arguments that are partly still in internal form. The result again | |
| 1389 | undergoes translation; therefore a print translation should not introduce | |
| 1390 | as head the very constant that invoked it. The function may raise | |
| 1391 |     exception @{ML Match} to indicate failure; in this event it has no effect.
 | |
| 1392 | Multiple functions associated with some syntactic name are tried in the | |
| 1393 | order of declaration in the theory. | |
| 48118 | 1394 | |
| 62106 | 1395 |   Only constant atoms --- constructor @{ML Ast.Constant} for ASTs and @{ML
 | 
| 1396 | Const} for terms --- can invoke translation functions. This means that parse | |
| 1397 | translations can only be associated with parse tree heads of concrete | |
| 1398 | syntax, or syntactic constants introduced via other translations. For plain | |
| 1399 | identifiers within the term language, the status of constant versus variable | |
| 1400 | is not yet know during parsing. This is in contrast to print translations, | |
| 1401 | where constants are explicitly known from the given term in its fully | |
| 1402 | internal form. | |
| 58618 | 1403 | \<close> | 
| 28762 | 1404 | |
| 52414 
8429123bc58a
more on built-in syntax transformations, based on reduced version of old material;
 wenzelm parents: 
52413diff
changeset | 1405 | |
| 58618 | 1406 | subsection \<open>Built-in syntax transformations\<close> | 
| 52414 
8429123bc58a
more on built-in syntax transformations, based on reduced version of old material;
 wenzelm parents: 
52413diff
changeset | 1407 | |
| 58618 | 1408 | text \<open> | 
| 62106 | 1409 | Here are some further details of the main syntax transformation phases of | 
| 1410 |   \figref{fig:parse-print}.
 | |
| 58618 | 1411 | \<close> | 
| 52414 
8429123bc58a
more on built-in syntax transformations, based on reduced version of old material;
 wenzelm parents: 
52413diff
changeset | 1412 | |
| 
8429123bc58a
more on built-in syntax transformations, based on reduced version of old material;
 wenzelm parents: 
52413diff
changeset | 1413 | |
| 58618 | 1414 | subsubsection \<open>Transforming parse trees to ASTs\<close> | 
| 52414 
8429123bc58a
more on built-in syntax transformations, based on reduced version of old material;
 wenzelm parents: 
52413diff
changeset | 1415 | |
| 62106 | 1416 | text \<open> | 
| 1417 | The parse tree is the raw output of the parser. It is transformed into an | |
| 1418 | AST according to some basic scheme that may be augmented by AST translation | |
| 1419 |   functions as explained in \secref{sec:tr-funs}.
 | |
| 52414 
8429123bc58a
more on built-in syntax transformations, based on reduced version of old material;
 wenzelm parents: 
52413diff
changeset | 1420 | |
| 
8429123bc58a
more on built-in syntax transformations, based on reduced version of old material;
 wenzelm parents: 
52413diff
changeset | 1421 | The parse tree is constructed by nesting the right-hand sides of the | 
| 62106 | 1422 | productions used to recognize the input. Such parse trees are simply lists | 
| 1423 | of tokens and constituent parse trees, the latter representing the | |
| 1424 | nonterminals of the productions. Ignoring AST translation functions, parse | |
| 1425 | trees are transformed to ASTs by stripping out delimiters and copy | |
| 1426 | productions, while retaining some source position information from input | |
| 1427 | tokens. | |
| 52414 
8429123bc58a
more on built-in syntax transformations, based on reduced version of old material;
 wenzelm parents: 
52413diff
changeset | 1428 | |
| 62106 | 1429 | The Pure syntax provides predefined AST translations to make the basic | 
| 1430 | \<open>\<lambda>\<close>-term structure more apparent within the (first-order) AST | |
| 1431 |   representation, and thus facilitate the use of @{command translations} (see
 | |
| 1432 |   also \secref{sec:syn-trans}). This covers ordinary term application, type
 | |
| 1433 | application, nested abstraction, iterated meta implications and function | |
| 1434 | types. The effect is illustrated on some representative input strings is as | |
| 52414 
8429123bc58a
more on built-in syntax transformations, based on reduced version of old material;
 wenzelm parents: 
52413diff
changeset | 1435 | follows: | 
| 
8429123bc58a
more on built-in syntax transformations, based on reduced version of old material;
 wenzelm parents: 
52413diff
changeset | 1436 | |
| 
8429123bc58a
more on built-in syntax transformations, based on reduced version of old material;
 wenzelm parents: 
52413diff
changeset | 1437 |   \begin{center}
 | 
| 
8429123bc58a
more on built-in syntax transformations, based on reduced version of old material;
 wenzelm parents: 
52413diff
changeset | 1438 |   \begin{tabular}{ll}
 | 
| 
8429123bc58a
more on built-in syntax transformations, based on reduced version of old material;
 wenzelm parents: 
52413diff
changeset | 1439 | input source & AST \\ | 
| 
8429123bc58a
more on built-in syntax transformations, based on reduced version of old material;
 wenzelm parents: 
52413diff
changeset | 1440 | \hline | 
| 61503 | 1441 | \<open>f x y z\<close> & \<^verbatim>\<open>(f x y z)\<close> \\ | 
| 1442 | \<open>'a ty\<close> & \<^verbatim>\<open>(ty 'a)\<close> \\ | |
| 1443 |   \<open>('a, 'b)ty\<close> & \<^verbatim>\<open>(ty 'a 'b)\<close> \\
 | |
| 1444 |   \<open>\<lambda>x y z. t\<close> & \<^verbatim>\<open>("_abs" x ("_abs" y ("_abs" z t)))\<close> \\
 | |
| 1445 |   \<open>\<lambda>x :: 'a. t\<close> & \<^verbatim>\<open>("_abs" ("_constrain" x 'a) t)\<close> \\
 | |
| 1446 |   \<open>\<lbrakk>P; Q; R\<rbrakk> \<Longrightarrow> S\<close> & \<^verbatim>\<open>("Pure.imp" P ("Pure.imp" Q ("Pure.imp" R S)))\<close> \\
 | |
| 1447 |    \<open>['a, 'b, 'c] \<Rightarrow> 'd\<close> & \<^verbatim>\<open>("fun" 'a ("fun" 'b ("fun" 'c 'd)))\<close> \\
 | |
| 52414 
8429123bc58a
more on built-in syntax transformations, based on reduced version of old material;
 wenzelm parents: 
52413diff
changeset | 1448 |   \end{tabular}
 | 
| 
8429123bc58a
more on built-in syntax transformations, based on reduced version of old material;
 wenzelm parents: 
52413diff
changeset | 1449 |   \end{center}
 | 
| 
8429123bc58a
more on built-in syntax transformations, based on reduced version of old material;
 wenzelm parents: 
52413diff
changeset | 1450 | |
| 
8429123bc58a
more on built-in syntax transformations, based on reduced version of old material;
 wenzelm parents: 
52413diff
changeset | 1451 | Note that type and sort constraints may occur in further places --- | 
| 62106 | 1452 | translations need to be ready to cope with them. The built-in syntax | 
| 1453 | transformation from parse trees to ASTs insert additional constraints that | |
| 1454 | represent source positions. | |
| 58618 | 1455 | \<close> | 
| 52414 
8429123bc58a
more on built-in syntax transformations, based on reduced version of old material;
 wenzelm parents: 
52413diff
changeset | 1456 | |
| 
8429123bc58a
more on built-in syntax transformations, based on reduced version of old material;
 wenzelm parents: 
52413diff
changeset | 1457 | |
| 58618 | 1458 | subsubsection \<open>Transforming ASTs to terms\<close> | 
| 52414 
8429123bc58a
more on built-in syntax transformations, based on reduced version of old material;
 wenzelm parents: 
52413diff
changeset | 1459 | |
| 62106 | 1460 | text \<open> | 
| 1461 |   After application of macros (\secref{sec:syn-trans}), the AST is transformed
 | |
| 1462 | into a term. This term still lacks proper type information, but it might | |
| 1463 | contain some constraints consisting of applications with head \<^verbatim>\<open>_constrain\<close>, | |
| 1464 | where the second argument is a type encoded as a pre-term within the syntax. | |
| 1465 | Type inference later introduces correct types, or indicates type errors in | |
| 1466 | the input. | |
| 52414 
8429123bc58a
more on built-in syntax transformations, based on reduced version of old material;
 wenzelm parents: 
52413diff
changeset | 1467 | |
| 62106 | 1468 | Ignoring parse translations, ASTs are transformed to terms by mapping AST | 
| 1469 | constants to term constants, AST variables to term variables or constants | |
| 1470 | (according to the name space), and AST applications to iterated term | |
| 1471 | applications. | |
| 52414 
8429123bc58a
more on built-in syntax transformations, based on reduced version of old material;
 wenzelm parents: 
52413diff
changeset | 1472 | |
| 62106 | 1473 | The outcome is still a first-order term. Proper abstractions and bound | 
| 1474 | variables are introduced by parse translations associated with certain | |
| 1475 |   syntax constants. Thus \<^verbatim>\<open>("_abs" x x)\<close> eventually becomes a de-Bruijn term
 | |
| 1476 |   \<^verbatim>\<open>Abs ("x", _, Bound 0)\<close>.
 | |
| 58618 | 1477 | \<close> | 
| 52414 
8429123bc58a
more on built-in syntax transformations, based on reduced version of old material;
 wenzelm parents: 
52413diff
changeset | 1478 | |
| 
8429123bc58a
more on built-in syntax transformations, based on reduced version of old material;
 wenzelm parents: 
52413diff
changeset | 1479 | |
| 58618 | 1480 | subsubsection \<open>Printing of terms\<close> | 
| 52414 
8429123bc58a
more on built-in syntax transformations, based on reduced version of old material;
 wenzelm parents: 
52413diff
changeset | 1481 | |
| 62106 | 1482 | text \<open> | 
| 1483 | The output phase is essentially the inverse of the input phase. Terms are | |
| 1484 | translated via abstract syntax trees into pretty-printed text. | |
| 52414 
8429123bc58a
more on built-in syntax transformations, based on reduced version of old material;
 wenzelm parents: 
52413diff
changeset | 1485 | |
| 
8429123bc58a
more on built-in syntax transformations, based on reduced version of old material;
 wenzelm parents: 
52413diff
changeset | 1486 | Ignoring print translations, the transformation maps term constants, | 
| 
8429123bc58a
more on built-in syntax transformations, based on reduced version of old material;
 wenzelm parents: 
52413diff
changeset | 1487 | variables and applications to the corresponding constructs on ASTs. | 
| 62106 | 1488 | Abstractions are mapped to applications of the special constant \<^verbatim>\<open>_abs\<close> as | 
| 1489 | seen before. Type constraints are represented via special \<^verbatim>\<open>_constrain\<close> | |
| 1490 | forms, according to various policies of type annotation determined | |
| 1491 | elsewhere. Sort constraints of type variables are handled in a similar | |
| 1492 | fashion. | |
| 52414 
8429123bc58a
more on built-in syntax transformations, based on reduced version of old material;
 wenzelm parents: 
52413diff
changeset | 1493 | |
| 62106 | 1494 |   After application of macros (\secref{sec:syn-trans}), the AST is finally
 | 
| 1495 | pretty-printed. The built-in print AST translations reverse the | |
| 1496 | corresponding parse AST translations. | |
| 52414 
8429123bc58a
more on built-in syntax transformations, based on reduced version of old material;
 wenzelm parents: 
52413diff
changeset | 1497 | |
| 61421 | 1498 | \<^medskip> | 
| 1499 | For the actual printing process, the priority grammar | |
| 62106 | 1500 |   (\secref{sec:priority-grammar}) plays a vital role: productions are used as
 | 
| 1501 | templates for pretty printing, with argument slots stemming from | |
| 1502 | nonterminals, and syntactic sugar stemming from literal tokens. | |
| 52414 
8429123bc58a
more on built-in syntax transformations, based on reduced version of old material;
 wenzelm parents: 
52413diff
changeset | 1503 | |
| 62106 | 1504 | Each AST application with constant head \<open>c\<close> and arguments \<open>t\<^sub>1\<close>, \dots, | 
| 1505 | \<open>t\<^sub>n\<close> (for \<open>n = 0\<close> the AST is just the constant \<open>c\<close> itself) is printed | |
| 1506 | according to the first grammar production of result name \<open>c\<close>. The required | |
| 1507 | syntax priority of the argument slot is given by its nonterminal \<open>A\<^sup>(\<^sup>p\<^sup>)\<close>. | |
| 1508 | The argument \<open>t\<^sub>i\<close> that corresponds to the position of \<open>A\<^sup>(\<^sup>p\<^sup>)\<close> is printed | |
| 1509 | recursively, and then put in parentheses \<^emph>\<open>if\<close> its priority \<open>p\<close> requires | |
| 1510 | this. The resulting output is concatenated with the syntactic sugar | |
| 1511 | according to the grammar production. | |
| 52414 
8429123bc58a
more on built-in syntax transformations, based on reduced version of old material;
 wenzelm parents: 
52413diff
changeset | 1512 | |
| 62106 | 1513 | If an AST application \<open>(c x\<^sub>1 \<dots> x\<^sub>m)\<close> has more arguments than the | 
| 1514 | corresponding production, it is first split into \<open>((c x\<^sub>1 \<dots> x\<^sub>n) x\<^sub>n\<^sub>+\<^sub>1 \<dots> | |
| 1515 | x\<^sub>m)\<close> and then printed recursively as above. | |
| 52414 
8429123bc58a
more on built-in syntax transformations, based on reduced version of old material;
 wenzelm parents: 
52413diff
changeset | 1516 | |
| 62106 | 1517 | Applications with too few arguments or with non-constant head or without a | 
| 1518 | corresponding production are printed in prefix-form like \<open>f t\<^sub>1 \<dots> t\<^sub>n\<close> for | |
| 1519 | terms. | |
| 52414 
8429123bc58a
more on built-in syntax transformations, based on reduced version of old material;
 wenzelm parents: 
52413diff
changeset | 1520 | |
| 62106 | 1521 | Multiple productions associated with some name \<open>c\<close> are tried in order of | 
| 1522 | appearance within the grammar. An occurrence of some AST variable \<open>x\<close> is | |
| 1523 | printed as \<open>x\<close> outright. | |
| 52414 
8429123bc58a
more on built-in syntax transformations, based on reduced version of old material;
 wenzelm parents: 
52413diff
changeset | 1524 | |
| 61421 | 1525 | \<^medskip> | 
| 62106 | 1526 | White space is \<^emph>\<open>not\<close> inserted automatically. If blanks (or breaks) are | 
| 1527 | required to separate tokens, they need to be specified in the mixfix | |
| 1528 |   declaration (\secref{sec:mixfix}).
 | |
| 58618 | 1529 | \<close> | 
| 52414 
8429123bc58a
more on built-in syntax transformations, based on reduced version of old material;
 wenzelm parents: 
52413diff
changeset | 1530 | |
| 28762 | 1531 | end |