|
1 theory Inner_Syntax |
|
2 imports Base Main |
|
3 begin |
|
4 |
|
5 chapter {* Inner syntax --- the term language \label{ch:inner-syntax} *} |
|
6 |
|
7 text {* The inner syntax of Isabelle provides concrete notation for |
|
8 the main entities of the logical framework, notably @{text |
|
9 "\<lambda>"}-terms with types and type classes. Applications may either |
|
10 extend existing syntactic categories by additional notation, or |
|
11 define new sub-languages that are linked to the standard term |
|
12 language via some explicit markers. For example @{verbatim |
|
13 FOO}~@{text "foo"} could embed the syntax corresponding for some |
|
14 user-defined nonterminal @{text "foo"} --- within the bounds of the |
|
15 given lexical syntax of Isabelle/Pure. |
|
16 |
|
17 The most basic way to specify concrete syntax for logical entities |
|
18 works via mixfix annotations (\secref{sec:mixfix}), which may be |
|
19 usually given as part of the original declaration or via explicit |
|
20 notation commands later on (\secref{sec:notation}). This already |
|
21 covers many needs of concrete syntax without having to understand |
|
22 the full complexity of inner syntax layers. |
|
23 |
|
24 Further details of the syntax engine involves the classical |
|
25 distinction of lexical language versus context-free grammar (see |
|
26 \secref{sec:pure-syntax}), and various mechanisms for \emph{syntax |
|
27 transformations} (see \secref{sec:syntax-transformations}). |
|
28 *} |
|
29 |
|
30 |
|
31 section {* Printing logical entities *} |
|
32 |
|
33 subsection {* Diagnostic commands \label{sec:print-diag} *} |
|
34 |
|
35 text {* |
|
36 \begin{matharray}{rcl} |
|
37 @{command_def "typ"}@{text "\<^sup>*"} & : & @{text "context \<rightarrow>"} \\ |
|
38 @{command_def "term"}@{text "\<^sup>*"} & : & @{text "context \<rightarrow>"} \\ |
|
39 @{command_def "prop"}@{text "\<^sup>*"} & : & @{text "context \<rightarrow>"} \\ |
|
40 @{command_def "thm"}@{text "\<^sup>*"} & : & @{text "context \<rightarrow>"} \\ |
|
41 @{command_def "prf"}@{text "\<^sup>*"} & : & @{text "context \<rightarrow>"} \\ |
|
42 @{command_def "full_prf"}@{text "\<^sup>*"} & : & @{text "context \<rightarrow>"} \\ |
|
43 @{command_def "pr"}@{text "\<^sup>*"} & : & @{text "any \<rightarrow>"} \\ |
|
44 \end{matharray} |
|
45 |
|
46 These diagnostic commands assist interactive development by printing |
|
47 internal logical entities in a human-readable fashion. |
|
48 |
|
49 @{rail " |
|
50 @@{command typ} @{syntax modes}? @{syntax type} ('::' @{syntax sort})? |
|
51 ; |
|
52 @@{command term} @{syntax modes}? @{syntax term} |
|
53 ; |
|
54 @@{command prop} @{syntax modes}? @{syntax prop} |
|
55 ; |
|
56 @@{command thm} @{syntax modes}? @{syntax thmrefs} |
|
57 ; |
|
58 ( @@{command prf} | @@{command full_prf} ) @{syntax modes}? @{syntax thmrefs}? |
|
59 ; |
|
60 @@{command pr} @{syntax modes}? @{syntax nat}? |
|
61 ; |
|
62 |
|
63 @{syntax_def modes}: '(' (@{syntax name} + ) ')' |
|
64 "} |
|
65 |
|
66 \begin{description} |
|
67 |
|
68 \item @{command "typ"}~@{text \<tau>} reads and prints a type expression |
|
69 according to the current context. |
|
70 |
|
71 \item @{command "typ"}~@{text "\<tau> :: s"} uses type-inference to |
|
72 determine the most general way to make @{text "\<tau>"} conform to sort |
|
73 @{text "s"}. For concrete @{text "\<tau>"} this checks if the type |
|
74 belongs to that sort. Dummy type parameters ``@{text "_"}'' |
|
75 (underscore) are assigned to fresh type variables with most general |
|
76 sorts, according the the principles of type-inference. |
|
77 |
|
78 \item @{command "term"}~@{text t} and @{command "prop"}~@{text \<phi>} |
|
79 read, type-check and print terms or propositions according to the |
|
80 current theory or proof context; the inferred type of @{text t} is |
|
81 output as well. Note that these commands are also useful in |
|
82 inspecting the current environment of term abbreviations. |
|
83 |
|
84 \item @{command "thm"}~@{text "a\<^sub>1 \<dots> a\<^sub>n"} retrieves |
|
85 theorems from the current theory or proof context. Note that any |
|
86 attributes included in the theorem specifications are applied to a |
|
87 temporary context derived from the current theory or proof; the |
|
88 result is discarded, i.e.\ attributes involved in @{text "a\<^sub>1, |
|
89 \<dots>, a\<^sub>n"} do not have any permanent effect. |
|
90 |
|
91 \item @{command "prf"} displays the (compact) proof term of the |
|
92 current proof state (if present), or of the given theorems. Note |
|
93 that this requires proof terms to be switched on for the current |
|
94 object logic (see the ``Proof terms'' section of the Isabelle |
|
95 reference manual for information on how to do this). |
|
96 |
|
97 \item @{command "full_prf"} is like @{command "prf"}, but displays |
|
98 the full proof term, i.e.\ also displays information omitted in the |
|
99 compact proof term, which is denoted by ``@{text _}'' placeholders |
|
100 there. |
|
101 |
|
102 \item @{command "pr"}~@{text "goals"} prints the current proof state |
|
103 (if present), including current facts and goals. The optional limit |
|
104 arguments affect the number of goals to be displayed, which is |
|
105 initially 10. Omitting limit value leaves the current setting |
|
106 unchanged. |
|
107 |
|
108 \end{description} |
|
109 |
|
110 All of the diagnostic commands above admit a list of @{text modes} |
|
111 to be specified, which is appended to the current print mode; see |
|
112 also \secref{sec:print-modes}. Thus the output behavior may be |
|
113 modified according particular print mode features. For example, |
|
114 @{command "pr"}~@{text "(latex xsymbols)"} would print the current |
|
115 proof state with mathematical symbols and special characters |
|
116 represented in {\LaTeX} source, according to the Isabelle style |
|
117 \cite{isabelle-sys}. |
|
118 |
|
119 Note that antiquotations (cf.\ \secref{sec:antiq}) provide a more |
|
120 systematic way to include formal items into the printed text |
|
121 document. |
|
122 *} |
|
123 |
|
124 |
|
125 subsection {* Details of printed content *} |
|
126 |
|
127 text {* |
|
128 \begin{tabular}{rcll} |
|
129 @{attribute_def show_types} & : & @{text attribute} & default @{text false} \\ |
|
130 @{attribute_def show_sorts} & : & @{text attribute} & default @{text false} \\ |
|
131 @{attribute_def show_consts} & : & @{text attribute} & default @{text false} \\ |
|
132 @{attribute_def show_abbrevs} & : & @{text attribute} & default @{text true} \\ |
|
133 @{attribute_def show_brackets} & : & @{text attribute} & default @{text false} \\ |
|
134 @{attribute_def names_long} & : & @{text attribute} & default @{text false} \\ |
|
135 @{attribute_def names_short} & : & @{text attribute} & default @{text false} \\ |
|
136 @{attribute_def names_unique} & : & @{text attribute} & default @{text true} \\ |
|
137 @{attribute_def eta_contract} & : & @{text attribute} & default @{text true} \\ |
|
138 @{attribute_def goals_limit} & : & @{text attribute} & default @{text 10} \\ |
|
139 @{attribute_def show_main_goal} & : & @{text attribute} & default @{text false} \\ |
|
140 @{attribute_def show_hyps} & : & @{text attribute} & default @{text false} \\ |
|
141 @{attribute_def show_tags} & : & @{text attribute} & default @{text false} \\ |
|
142 @{attribute_def show_question_marks} & : & @{text attribute} & default @{text true} \\ |
|
143 \end{tabular} |
|
144 \medskip |
|
145 |
|
146 These configuration options control the detail of information that |
|
147 is displayed for types, terms, theorems, goals etc. See also |
|
148 \secref{sec:config}. |
|
149 |
|
150 \begin{description} |
|
151 |
|
152 \item @{attribute show_types} and @{attribute show_sorts} control |
|
153 printing of type constraints for term variables, and sort |
|
154 constraints for type variables. By default, neither of these are |
|
155 shown in output. If @{attribute show_sorts} is enabled, types are |
|
156 always shown as well. |
|
157 |
|
158 Note that displaying types and sorts may explain why a polymorphic |
|
159 inference rule fails to resolve with some goal, or why a rewrite |
|
160 rule does not apply as expected. |
|
161 |
|
162 \item @{attribute show_consts} controls printing of types of |
|
163 constants when displaying a goal state. |
|
164 |
|
165 Note that the output can be enormous, because polymorphic constants |
|
166 often occur at several different type instances. |
|
167 |
|
168 \item @{attribute show_abbrevs} controls folding of constant |
|
169 abbreviations. |
|
170 |
|
171 \item @{attribute show_brackets} controls bracketing in pretty |
|
172 printed output. If enabled, all sub-expressions of the pretty |
|
173 printing tree will be parenthesized, even if this produces malformed |
|
174 term syntax! This crude way of showing the internal structure of |
|
175 pretty printed entities may occasionally help to diagnose problems |
|
176 with operator priorities, for example. |
|
177 |
|
178 \item @{attribute names_long}, @{attribute names_short}, and |
|
179 @{attribute names_unique} control the way of printing fully |
|
180 qualified internal names in external form. See also |
|
181 \secref{sec:antiq} for the document antiquotation options of the |
|
182 same names. |
|
183 |
|
184 \item @{attribute eta_contract} controls @{text "\<eta>"}-contracted |
|
185 printing of terms. |
|
186 |
|
187 The @{text \<eta>}-contraction law asserts @{prop "(\<lambda>x. f x) \<equiv> f"}, |
|
188 provided @{text x} is not free in @{text f}. It asserts |
|
189 \emph{extensionality} of functions: @{prop "f \<equiv> g"} if @{prop "f x \<equiv> |
|
190 g x"} for all @{text x}. Higher-order unification frequently puts |
|
191 terms into a fully @{text \<eta>}-expanded form. For example, if @{text |
|
192 F} has type @{text "(\<tau> \<Rightarrow> \<tau>) \<Rightarrow> \<tau>"} then its expanded form is @{term |
|
193 "\<lambda>h. F (\<lambda>x. h x)"}. |
|
194 |
|
195 Enabling @{attribute eta_contract} makes Isabelle perform @{text |
|
196 \<eta>}-contractions before printing, so that @{term "\<lambda>h. F (\<lambda>x. h x)"} |
|
197 appears simply as @{text F}. |
|
198 |
|
199 Note that the distinction between a term and its @{text \<eta>}-expanded |
|
200 form occasionally matters. While higher-order resolution and |
|
201 rewriting operate modulo @{text "\<alpha>\<beta>\<eta>"}-conversion, some other tools |
|
202 might look at terms more discretely. |
|
203 |
|
204 \item @{attribute goals_limit} controls the maximum number of |
|
205 subgoals to be shown in goal output. |
|
206 |
|
207 \item @{attribute show_main_goal} controls whether the main result |
|
208 to be proven should be displayed. This information might be |
|
209 relevant for schematic goals, to inspect the current claim that has |
|
210 been synthesized so far. |
|
211 |
|
212 \item @{attribute show_hyps} controls printing of implicit |
|
213 hypotheses of local facts. Normally, only those hypotheses are |
|
214 displayed that are \emph{not} covered by the assumptions of the |
|
215 current context: this situation indicates a fault in some tool being |
|
216 used. |
|
217 |
|
218 By enabling @{attribute show_hyps}, output of \emph{all} hypotheses |
|
219 can be enforced, which is occasionally useful for diagnostic |
|
220 purposes. |
|
221 |
|
222 \item @{attribute show_tags} controls printing of extra annotations |
|
223 within theorems, such as internal position information, or the case |
|
224 names being attached by the attribute @{attribute case_names}. |
|
225 |
|
226 Note that the @{attribute tagged} and @{attribute untagged} |
|
227 attributes provide low-level access to the collection of tags |
|
228 associated with a theorem. |
|
229 |
|
230 \item @{attribute show_question_marks} controls printing of question |
|
231 marks for schematic variables, such as @{text ?x}. Only the leading |
|
232 question mark is affected, the remaining text is unchanged |
|
233 (including proper markup for schematic variables that might be |
|
234 relevant for user interfaces). |
|
235 |
|
236 \end{description} |
|
237 *} |
|
238 |
|
239 |
|
240 subsection {* Alternative print modes \label{sec:print-modes} *} |
|
241 |
|
242 text {* |
|
243 \begin{mldecls} |
|
244 @{index_ML print_mode_value: "unit -> string list"} \\ |
|
245 @{index_ML Print_Mode.with_modes: "string list -> ('a -> 'b) -> 'a -> 'b"} \\ |
|
246 \end{mldecls} |
|
247 |
|
248 The \emph{print mode} facility allows to modify various operations |
|
249 for printing. Commands like @{command typ}, @{command term}, |
|
250 @{command thm} (see \secref{sec:print-diag}) take additional print |
|
251 modes as optional argument. The underlying ML operations are as |
|
252 follows. |
|
253 |
|
254 \begin{description} |
|
255 |
|
256 \item @{ML "print_mode_value ()"} yields the list of currently |
|
257 active print mode names. This should be understood as symbolic |
|
258 representation of certain individual features for printing (with |
|
259 precedence from left to right). |
|
260 |
|
261 \item @{ML Print_Mode.with_modes}~@{text "modes f x"} evaluates |
|
262 @{text "f x"} in an execution context where the print mode is |
|
263 prepended by the given @{text "modes"}. This provides a thread-safe |
|
264 way to augment print modes. It is also monotonic in the set of mode |
|
265 names: it retains the default print mode that certain |
|
266 user-interfaces might have installed for their proper functioning! |
|
267 |
|
268 \end{description} |
|
269 |
|
270 \begin{warn} |
|
271 The old global reference @{ML print_mode} should never be used |
|
272 directly in applications. Its main reason for being publicly |
|
273 accessible is to support historic versions of Proof~General. |
|
274 \end{warn} |
|
275 |
|
276 \medskip The pretty printer for inner syntax maintains alternative |
|
277 mixfix productions for any print mode name invented by the user, say |
|
278 in commands like @{command notation} or @{command abbreviation}. |
|
279 Mode names can be arbitrary, but the following ones have a specific |
|
280 meaning by convention: |
|
281 |
|
282 \begin{itemize} |
|
283 |
|
284 \item @{verbatim "\"\""} (the empty string): default mode; |
|
285 implicitly active as last element in the list of modes. |
|
286 |
|
287 \item @{verbatim input}: dummy print mode that is never active; may |
|
288 be used to specify notation that is only available for input. |
|
289 |
|
290 \item @{verbatim internal} dummy print mode that is never active; |
|
291 used internally in Isabelle/Pure. |
|
292 |
|
293 \item @{verbatim xsymbols}: enable proper mathematical symbols |
|
294 instead of ASCII art.\footnote{This traditional mode name stems from |
|
295 the ``X-Symbol'' package for old versions Proof~General with XEmacs, |
|
296 although that package has been superseded by Unicode in recent |
|
297 years.} |
|
298 |
|
299 \item @{verbatim HTML}: additional mode that is active in HTML |
|
300 presentation of Isabelle theory sources; allows to provide |
|
301 alternative output notation. |
|
302 |
|
303 \item @{verbatim latex}: additional mode that is active in {\LaTeX} |
|
304 document preparation of Isabelle theory sources; allows to provide |
|
305 alternative output notation. |
|
306 |
|
307 \end{itemize} |
|
308 *} |
|
309 |
|
310 |
|
311 subsection {* Printing limits *} |
|
312 |
|
313 text {* |
|
314 \begin{mldecls} |
|
315 @{index_ML Pretty.margin_default: "int Unsynchronized.ref"} \\ |
|
316 @{index_ML print_depth: "int -> unit"} \\ |
|
317 \end{mldecls} |
|
318 |
|
319 These ML functions set limits for pretty printed text. |
|
320 |
|
321 \begin{description} |
|
322 |
|
323 \item @{ML Pretty.margin_default} indicates the global default for |
|
324 the right margin of the built-in pretty printer, with initial value |
|
325 76. Note that user-interfaces typically control margins |
|
326 automatically when resizing windows, or even bypass the formatting |
|
327 engine of Isabelle/ML altogether and do it within the front end via |
|
328 Isabelle/Scala. |
|
329 |
|
330 \item @{ML print_depth}~@{text n} limits the printing depth of the |
|
331 ML toplevel pretty printer; the precise effect depends on the ML |
|
332 compiler and run-time system. Typically @{text n} should be less |
|
333 than 10. Bigger values such as 100--1000 are useful for debugging. |
|
334 |
|
335 \end{description} |
|
336 *} |
|
337 |
|
338 |
|
339 section {* Mixfix annotations \label{sec:mixfix} *} |
|
340 |
|
341 text {* Mixfix annotations specify concrete \emph{inner syntax} of |
|
342 Isabelle types and terms. Locally fixed parameters in toplevel |
|
343 theorem statements, locale and class specifications also admit |
|
344 mixfix annotations in a fairly uniform manner. A mixfix annotation |
|
345 describes describes the concrete syntax, the translation to abstract |
|
346 syntax, and the pretty printing. Special case annotations provide a |
|
347 simple means of specifying infix operators and binders. |
|
348 |
|
349 Isabelle mixfix syntax is inspired by {\OBJ} \cite{OBJ}. It allows |
|
350 to specify any context-free priority grammar, which is more general |
|
351 than the fixity declarations of ML and Prolog. |
|
352 |
|
353 @{rail " |
|
354 @{syntax_def mixfix}: '(' mfix ')' |
|
355 ; |
|
356 @{syntax_def struct_mixfix}: '(' ( mfix | @'structure' ) ')' |
|
357 ; |
|
358 |
|
359 mfix: @{syntax template} prios? @{syntax nat}? | |
|
360 (@'infix' | @'infixl' | @'infixr') @{syntax template} @{syntax nat} | |
|
361 @'binder' @{syntax template} prios? @{syntax nat} |
|
362 ; |
|
363 template: string |
|
364 ; |
|
365 prios: '[' (@{syntax nat} + ',') ']' |
|
366 "} |
|
367 |
|
368 The string given as @{text template} may include literal text, |
|
369 spacing, blocks, and arguments (denoted by ``@{text _}''); the |
|
370 special symbol ``@{verbatim "\<index>"}'' (printed as ``@{text "\<index>"}'') |
|
371 represents an index argument that specifies an implicit structure |
|
372 reference (see also \secref{sec:locale}). Infix and binder |
|
373 declarations provide common abbreviations for particular mixfix |
|
374 declarations. So in practice, mixfix templates mostly degenerate to |
|
375 literal text for concrete syntax, such as ``@{verbatim "++"}'' for |
|
376 an infix symbol. |
|
377 *} |
|
378 |
|
379 |
|
380 subsection {* The general mixfix form *} |
|
381 |
|
382 text {* In full generality, mixfix declarations work as follows. |
|
383 Suppose a constant @{text "c :: \<tau>\<^sub>1 \<Rightarrow> \<dots> \<tau>\<^sub>n \<Rightarrow> \<tau>"} is annotated by |
|
384 @{text "(mixfix [p\<^sub>1, \<dots>, p\<^sub>n] p)"}, where @{text "mixfix"} is a string |
|
385 @{text "d\<^sub>0 _ d\<^sub>1 _ \<dots> _ d\<^sub>n"} consisting of delimiters that surround |
|
386 argument positions as indicated by underscores. |
|
387 |
|
388 Altogether this determines a production for a context-free priority |
|
389 grammar, where for each argument @{text "i"} the syntactic category |
|
390 is determined by @{text "\<tau>\<^sub>i"} (with priority @{text "p\<^sub>i"}), and the |
|
391 result category is determined from @{text "\<tau>"} (with priority @{text |
|
392 "p"}). Priority specifications are optional, with default 0 for |
|
393 arguments and 1000 for the result.\footnote{Omitting priorities is |
|
394 prone to syntactic ambiguities unless the delimiter tokens determine |
|
395 fully bracketed notation, as in @{text "if _ then _ else _ fi"}.} |
|
396 |
|
397 Since @{text "\<tau>"} may be again a function type, the constant |
|
398 type scheme may have more argument positions than the mixfix |
|
399 pattern. Printing a nested application @{text "c t\<^sub>1 \<dots> t\<^sub>m"} for |
|
400 @{text "m > n"} works by attaching concrete notation only to the |
|
401 innermost part, essentially by printing @{text "(c t\<^sub>1 \<dots> t\<^sub>n) \<dots> t\<^sub>m"} |
|
402 instead. If a term has fewer arguments than specified in the mixfix |
|
403 template, the concrete syntax is ignored. |
|
404 |
|
405 \medskip A mixfix template may also contain additional directives |
|
406 for pretty printing, notably spaces, blocks, and breaks. The |
|
407 general template format is a sequence over any of the following |
|
408 entities. |
|
409 |
|
410 \begin{description} |
|
411 |
|
412 \item @{text "d"} is a delimiter, namely a non-empty sequence of |
|
413 characters other than the following special characters: |
|
414 |
|
415 \smallskip |
|
416 \begin{tabular}{ll} |
|
417 @{verbatim "'"} & single quote \\ |
|
418 @{verbatim "_"} & underscore \\ |
|
419 @{text "\<index>"} & index symbol \\ |
|
420 @{verbatim "("} & open parenthesis \\ |
|
421 @{verbatim ")"} & close parenthesis \\ |
|
422 @{verbatim "/"} & slash \\ |
|
423 \end{tabular} |
|
424 \medskip |
|
425 |
|
426 \item @{verbatim "'"} escapes the special meaning of these |
|
427 meta-characters, producing a literal version of the following |
|
428 character, unless that is a blank. |
|
429 |
|
430 A single quote followed by a blank separates delimiters, without |
|
431 affecting printing, but input tokens may have additional white space |
|
432 here. |
|
433 |
|
434 \item @{verbatim "_"} is an argument position, which stands for a |
|
435 certain syntactic category in the underlying grammar. |
|
436 |
|
437 \item @{text "\<index>"} is an indexed argument position; this is the place |
|
438 where implicit structure arguments can be attached. |
|
439 |
|
440 \item @{text "s"} is a non-empty sequence of spaces for printing. |
|
441 This and the following specifications do not affect parsing at all. |
|
442 |
|
443 \item @{verbatim "("}@{text n} opens a pretty printing block. The |
|
444 optional number specifies how much indentation to add when a line |
|
445 break occurs within the block. If the parenthesis is not followed |
|
446 by digits, the indentation defaults to 0. A block specified via |
|
447 @{verbatim "(00"} is unbreakable. |
|
448 |
|
449 \item @{verbatim ")"} closes a pretty printing block. |
|
450 |
|
451 \item @{verbatim "//"} forces a line break. |
|
452 |
|
453 \item @{verbatim "/"}@{text s} allows a line break. Here @{text s} |
|
454 stands for the string of spaces (zero or more) right after the |
|
455 slash. These spaces are printed if the break is \emph{not} taken. |
|
456 |
|
457 \end{description} |
|
458 |
|
459 The general idea of pretty printing with blocks and breaks is also |
|
460 described in \cite{paulson-ml2}; it goes back to \cite{Oppen:1980}. |
|
461 *} |
|
462 |
|
463 |
|
464 subsection {* Infixes *} |
|
465 |
|
466 text {* Infix operators are specified by convenient short forms that |
|
467 abbreviate general mixfix annotations as follows: |
|
468 |
|
469 \begin{center} |
|
470 \begin{tabular}{lll} |
|
471 |
|
472 @{verbatim "("}@{keyword_def "infix"}~@{verbatim "\""}@{text sy}@{verbatim "\""} @{text "p"}@{verbatim ")"} |
|
473 & @{text "\<mapsto>"} & |
|
474 @{verbatim "(\"(_ "}@{text sy}@{verbatim "/ _)\" ["}@{text "p + 1"}@{verbatim ", "}@{text "p + 1"}@{verbatim "]"}@{text " p"}@{verbatim ")"} \\ |
|
475 @{verbatim "("}@{keyword_def "infixl"}~@{verbatim "\""}@{text sy}@{verbatim "\""} @{text "p"}@{verbatim ")"} |
|
476 & @{text "\<mapsto>"} & |
|
477 @{verbatim "(\"(_ "}@{text sy}@{verbatim "/ _)\" ["}@{text "p"}@{verbatim ", "}@{text "p + 1"}@{verbatim "]"}@{text " p"}@{verbatim ")"} \\ |
|
478 @{verbatim "("}@{keyword_def "infixr"}~@{verbatim "\""}@{text sy}@{verbatim "\""} @{text "p"}@{verbatim ")"} |
|
479 & @{text "\<mapsto>"} & |
|
480 @{verbatim "(\"(_ "}@{text sy}@{verbatim "/ _)\" ["}@{text "p + 1"}@{verbatim ", "}@{text "p"}@{verbatim "]"}@{text " p"}@{verbatim ")"} \\ |
|
481 |
|
482 \end{tabular} |
|
483 \end{center} |
|
484 |
|
485 The mixfix template @{verbatim "\"(_ "}@{text sy}@{verbatim "/ _)\""} |
|
486 specifies two argument positions; the delimiter is preceded by a |
|
487 space and followed by a space or line break; the entire phrase is a |
|
488 pretty printing block. |
|
489 |
|
490 The alternative notation @{verbatim "op"}~@{text sy} is introduced |
|
491 in addition. Thus any infix operator may be written in prefix form |
|
492 (as in ML), independently of the number of arguments in the term. |
|
493 *} |
|
494 |
|
495 |
|
496 subsection {* Binders *} |
|
497 |
|
498 text {* A \emph{binder} is a variable-binding construct such as a |
|
499 quantifier. The idea to formalize @{text "\<forall>x. b"} as @{text "All |
|
500 (\<lambda>x. b)"} for @{text "All :: ('a \<Rightarrow> bool) \<Rightarrow> bool"} already goes back |
|
501 to \cite{church40}. Isabelle declarations of certain higher-order |
|
502 operators may be annotated with @{keyword_def "binder"} annotations |
|
503 as follows: |
|
504 |
|
505 \begin{center} |
|
506 @{text "c :: "}@{verbatim "\""}@{text "(\<tau>\<^sub>1 \<Rightarrow> \<tau>\<^sub>2) \<Rightarrow> \<tau>\<^sub>3"}@{verbatim "\" ("}@{keyword "binder"}@{verbatim " \""}@{text "sy"}@{verbatim "\" ["}@{text "p"}@{verbatim "] "}@{text "q"}@{verbatim ")"} |
|
507 \end{center} |
|
508 |
|
509 This introduces concrete binder syntax @{text "sy x. b"}, where |
|
510 @{text x} is a bound variable of type @{text "\<tau>\<^sub>1"}, the body @{text |
|
511 b} has type @{text "\<tau>\<^sub>2"} and the whole term has type @{text "\<tau>\<^sub>3"}. |
|
512 The optional integer @{text p} specifies the syntactic priority of |
|
513 the body; the default is @{text "q"}, which is also the priority of |
|
514 the whole construct. |
|
515 |
|
516 Internally, the binder syntax is expanded to something like this: |
|
517 \begin{center} |
|
518 @{text "c_binder :: "}@{verbatim "\""}@{text "idts \<Rightarrow> \<tau>\<^sub>2 \<Rightarrow> \<tau>\<^sub>3"}@{verbatim "\" (\"(3"}@{text sy}@{verbatim "_./ _)\" [0, "}@{text "p"}@{verbatim "] "}@{text "q"}@{verbatim ")"} |
|
519 \end{center} |
|
520 |
|
521 Here @{syntax (inner) idts} is the nonterminal symbol for a list of |
|
522 identifiers with optional type constraints (see also |
|
523 \secref{sec:pure-grammar}). The mixfix template @{verbatim |
|
524 "\"(3"}@{text sy}@{verbatim "_./ _)\""} defines argument positions |
|
525 for the bound identifiers and the body, separated by a dot with |
|
526 optional line break; the entire phrase is a pretty printing block of |
|
527 indentation level 3. Note that there is no extra space after @{text |
|
528 "sy"}, so it needs to be included user specification if the binder |
|
529 syntax ends with a token that may be continued by an identifier |
|
530 token at the start of @{syntax (inner) idts}. |
|
531 |
|
532 Furthermore, a syntax translation to transforms @{text "c_binder x\<^sub>1 |
|
533 \<dots> x\<^sub>n b"} into iterated application @{text "c (\<lambda>x\<^sub>1. \<dots> c (\<lambda>x\<^sub>n. b)\<dots>)"}. |
|
534 This works in both directions, for parsing and printing. *} |
|
535 |
|
536 |
|
537 section {* Explicit notation \label{sec:notation} *} |
|
538 |
|
539 text {* |
|
540 \begin{matharray}{rcll} |
|
541 @{command_def "type_notation"} & : & @{text "local_theory \<rightarrow> local_theory"} \\ |
|
542 @{command_def "no_type_notation"} & : & @{text "local_theory \<rightarrow> local_theory"} \\ |
|
543 @{command_def "notation"} & : & @{text "local_theory \<rightarrow> local_theory"} \\ |
|
544 @{command_def "no_notation"} & : & @{text "local_theory \<rightarrow> local_theory"} \\ |
|
545 @{command_def "write"} & : & @{text "proof(state) \<rightarrow> proof(state)"} \\ |
|
546 \end{matharray} |
|
547 |
|
548 Commands that introduce new logical entities (terms or types) |
|
549 usually allow to provide mixfix annotations on the spot, which is |
|
550 convenient for default notation. Nonetheless, the syntax may be |
|
551 modified later on by declarations for explicit notation. This |
|
552 allows to add or delete mixfix annotations for of existing logical |
|
553 entities within the current context. |
|
554 |
|
555 @{rail " |
|
556 (@@{command type_notation} | @@{command no_type_notation}) @{syntax target}? |
|
557 @{syntax mode}? \\ (@{syntax nameref} @{syntax mixfix} + @'and') |
|
558 ; |
|
559 (@@{command notation} | @@{command no_notation}) @{syntax target}? @{syntax mode}? \\ |
|
560 (@{syntax nameref} @{syntax struct_mixfix} + @'and') |
|
561 ; |
|
562 @@{command write} @{syntax mode}? (@{syntax nameref} @{syntax struct_mixfix} + @'and') |
|
563 "} |
|
564 |
|
565 \begin{description} |
|
566 |
|
567 \item @{command "type_notation"}~@{text "c (mx)"} associates mixfix |
|
568 syntax with an existing type constructor. The arity of the |
|
569 constructor is retrieved from the context. |
|
570 |
|
571 \item @{command "no_type_notation"} is similar to @{command |
|
572 "type_notation"}, but removes the specified syntax annotation from |
|
573 the present context. |
|
574 |
|
575 \item @{command "notation"}~@{text "c (mx)"} associates mixfix |
|
576 syntax with an existing constant or fixed variable. The type |
|
577 declaration of the given entity is retrieved from the context. |
|
578 |
|
579 \item @{command "no_notation"} is similar to @{command "notation"}, |
|
580 but removes the specified syntax annotation from the present |
|
581 context. |
|
582 |
|
583 \item @{command "write"} is similar to @{command "notation"}, but |
|
584 works within an Isar proof body. |
|
585 |
|
586 \end{description} |
|
587 *} |
|
588 |
|
589 |
|
590 section {* The Pure syntax \label{sec:pure-syntax} *} |
|
591 |
|
592 subsection {* Lexical matters \label{sec:inner-lex} *} |
|
593 |
|
594 text {* The inner lexical syntax vaguely resembles the outer one |
|
595 (\secref{sec:outer-lex}), but some details are different. There are |
|
596 two main categories of inner syntax tokens: |
|
597 |
|
598 \begin{enumerate} |
|
599 |
|
600 \item \emph{delimiters} --- the literal tokens occurring in |
|
601 productions of the given priority grammar (cf.\ |
|
602 \secref{sec:priority-grammar}); |
|
603 |
|
604 \item \emph{named tokens} --- various categories of identifiers etc. |
|
605 |
|
606 \end{enumerate} |
|
607 |
|
608 Delimiters override named tokens and may thus render certain |
|
609 identifiers inaccessible. Sometimes the logical context admits |
|
610 alternative ways to refer to the same entity, potentially via |
|
611 qualified names. |
|
612 |
|
613 \medskip The categories for named tokens are defined once and for |
|
614 all as follows, reusing some categories of the outer token syntax |
|
615 (\secref{sec:outer-lex}). |
|
616 |
|
617 \begin{center} |
|
618 \begin{supertabular}{rcl} |
|
619 @{syntax_def (inner) id} & = & @{syntax_ref ident} \\ |
|
620 @{syntax_def (inner) longid} & = & @{syntax_ref longident} \\ |
|
621 @{syntax_def (inner) var} & = & @{syntax_ref var} \\ |
|
622 @{syntax_def (inner) tid} & = & @{syntax_ref typefree} \\ |
|
623 @{syntax_def (inner) tvar} & = & @{syntax_ref typevar} \\ |
|
624 @{syntax_def (inner) num_token} & = & @{syntax_ref nat}@{text " | "}@{verbatim "-"}@{syntax_ref nat} \\ |
|
625 @{syntax_def (inner) float_token} & = & @{syntax_ref nat}@{verbatim "."}@{syntax_ref nat}@{text " | "}@{verbatim "-"}@{syntax_ref nat}@{verbatim "."}@{syntax_ref nat} \\ |
|
626 @{syntax_def (inner) xnum_token} & = & @{verbatim "#"}@{syntax_ref nat}@{text " | "}@{verbatim "#-"}@{syntax_ref nat} \\ |
|
627 |
|
628 @{syntax_def (inner) str_token} & = & @{verbatim "''"} @{text "\<dots>"} @{verbatim "''"} \\ |
|
629 \end{supertabular} |
|
630 \end{center} |
|
631 |
|
632 The token categories @{syntax (inner) num_token}, @{syntax (inner) |
|
633 float_token}, @{syntax (inner) xnum_token}, and @{syntax (inner) |
|
634 str_token} are not used in Pure. Object-logics may implement numerals |
|
635 and string constants by adding appropriate syntax declarations, |
|
636 together with some translation functions (e.g.\ see Isabelle/HOL). |
|
637 |
|
638 The derived categories @{syntax_def (inner) num_const}, @{syntax_def |
|
639 (inner) float_const}, and @{syntax_def (inner) num_const} provide |
|
640 robust access to the respective tokens: the syntax tree holds a |
|
641 syntactic constant instead of a free variable. |
|
642 *} |
|
643 |
|
644 |
|
645 subsection {* Priority grammars \label{sec:priority-grammar} *} |
|
646 |
|
647 text {* A context-free grammar consists of a set of \emph{terminal |
|
648 symbols}, a set of \emph{nonterminal symbols} and a set of |
|
649 \emph{productions}. Productions have the form @{text "A = \<gamma>"}, |
|
650 where @{text A} is a nonterminal and @{text \<gamma>} is a string of |
|
651 terminals and nonterminals. One designated nonterminal is called |
|
652 the \emph{root symbol}. The language defined by the grammar |
|
653 consists of all strings of terminals that can be derived from the |
|
654 root symbol by applying productions as rewrite rules. |
|
655 |
|
656 The standard Isabelle parser for inner syntax uses a \emph{priority |
|
657 grammar}. Each nonterminal is decorated by an integer priority: |
|
658 @{text "A\<^sup>(\<^sup>p\<^sup>)"}. In a derivation, @{text "A\<^sup>(\<^sup>p\<^sup>)"} may be rewritten |
|
659 using a production @{text "A\<^sup>(\<^sup>q\<^sup>) = \<gamma>"} only if @{text "p \<le> q"}. Any |
|
660 priority grammar can be translated into a normal context-free |
|
661 grammar by introducing new nonterminals and productions. |
|
662 |
|
663 \medskip Formally, a set of context free productions @{text G} |
|
664 induces a derivation relation @{text "\<longrightarrow>\<^sub>G"} as follows. Let @{text |
|
665 \<alpha>} and @{text \<beta>} denote strings of terminal or nonterminal symbols. |
|
666 Then @{text "\<alpha> A\<^sup>(\<^sup>p\<^sup>) \<beta> \<longrightarrow>\<^sub>G \<alpha> \<gamma> \<beta>"} holds if and only if @{text G} |
|
667 contains some production @{text "A\<^sup>(\<^sup>q\<^sup>) = \<gamma>"} for @{text "p \<le> q"}. |
|
668 |
|
669 \medskip The following grammar for arithmetic expressions |
|
670 demonstrates how binding power and associativity of operators can be |
|
671 enforced by priorities. |
|
672 |
|
673 \begin{center} |
|
674 \begin{tabular}{rclr} |
|
675 @{text "A\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>)"} & @{text "="} & @{verbatim "("} @{text "A\<^sup>(\<^sup>0\<^sup>)"} @{verbatim ")"} \\ |
|
676 @{text "A\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>)"} & @{text "="} & @{verbatim 0} \\ |
|
677 @{text "A\<^sup>(\<^sup>0\<^sup>)"} & @{text "="} & @{text "A\<^sup>(\<^sup>0\<^sup>)"} @{verbatim "+"} @{text "A\<^sup>(\<^sup>1\<^sup>)"} \\ |
|
678 @{text "A\<^sup>(\<^sup>2\<^sup>)"} & @{text "="} & @{text "A\<^sup>(\<^sup>3\<^sup>)"} @{verbatim "*"} @{text "A\<^sup>(\<^sup>2\<^sup>)"} \\ |
|
679 @{text "A\<^sup>(\<^sup>3\<^sup>)"} & @{text "="} & @{verbatim "-"} @{text "A\<^sup>(\<^sup>3\<^sup>)"} \\ |
|
680 \end{tabular} |
|
681 \end{center} |
|
682 The choice of priorities determines that @{verbatim "-"} binds |
|
683 tighter than @{verbatim "*"}, which binds tighter than @{verbatim |
|
684 "+"}. Furthermore @{verbatim "+"} associates to the left and |
|
685 @{verbatim "*"} to the right. |
|
686 |
|
687 \medskip For clarity, grammars obey these conventions: |
|
688 \begin{itemize} |
|
689 |
|
690 \item All priorities must lie between 0 and 1000. |
|
691 |
|
692 \item Priority 0 on the right-hand side and priority 1000 on the |
|
693 left-hand side may be omitted. |
|
694 |
|
695 \item The production @{text "A\<^sup>(\<^sup>p\<^sup>) = \<alpha>"} is written as @{text "A = \<alpha> |
|
696 (p)"}, i.e.\ the priority of the left-hand side actually appears in |
|
697 a column on the far right. |
|
698 |
|
699 \item Alternatives are separated by @{text "|"}. |
|
700 |
|
701 \item Repetition is indicated by dots @{text "(\<dots>)"} in an informal |
|
702 but obvious way. |
|
703 |
|
704 \end{itemize} |
|
705 |
|
706 Using these conventions, the example grammar specification above |
|
707 takes the form: |
|
708 \begin{center} |
|
709 \begin{tabular}{rclc} |
|
710 @{text A} & @{text "="} & @{verbatim "("} @{text A} @{verbatim ")"} \\ |
|
711 & @{text "|"} & @{verbatim 0} & \qquad\qquad \\ |
|
712 & @{text "|"} & @{text A} @{verbatim "+"} @{text "A\<^sup>(\<^sup>1\<^sup>)"} & @{text "(0)"} \\ |
|
713 & @{text "|"} & @{text "A\<^sup>(\<^sup>3\<^sup>)"} @{verbatim "*"} @{text "A\<^sup>(\<^sup>2\<^sup>)"} & @{text "(2)"} \\ |
|
714 & @{text "|"} & @{verbatim "-"} @{text "A\<^sup>(\<^sup>3\<^sup>)"} & @{text "(3)"} \\ |
|
715 \end{tabular} |
|
716 \end{center} |
|
717 *} |
|
718 |
|
719 |
|
720 subsection {* The Pure grammar \label{sec:pure-grammar} *} |
|
721 |
|
722 text {* The priority grammar of the @{text "Pure"} theory is defined |
|
723 approximately like this: |
|
724 |
|
725 \begin{center} |
|
726 \begin{supertabular}{rclr} |
|
727 |
|
728 @{syntax_def (inner) any} & = & @{text "prop | logic"} \\\\ |
|
729 |
|
730 @{syntax_def (inner) prop} & = & @{verbatim "("} @{text prop} @{verbatim ")"} \\ |
|
731 & @{text "|"} & @{text "prop\<^sup>(\<^sup>4\<^sup>)"} @{verbatim "::"} @{text type} & @{text "(3)"} \\ |
|
732 & @{text "|"} & @{text "any\<^sup>(\<^sup>3\<^sup>)"} @{verbatim "=="} @{text "any\<^sup>(\<^sup>2\<^sup>)"} & @{text "(2)"} \\ |
|
733 & @{text "|"} & @{text "any\<^sup>(\<^sup>3\<^sup>)"} @{text "\<equiv>"} @{text "any\<^sup>(\<^sup>2\<^sup>)"} & @{text "(2)"} \\ |
|
734 & @{text "|"} & @{text "prop\<^sup>(\<^sup>3\<^sup>)"} @{verbatim "&&&"} @{text "prop\<^sup>(\<^sup>2\<^sup>)"} & @{text "(2)"} \\ |
|
735 & @{text "|"} & @{text "prop\<^sup>(\<^sup>2\<^sup>)"} @{verbatim "==>"} @{text "prop\<^sup>(\<^sup>1\<^sup>)"} & @{text "(1)"} \\ |
|
736 & @{text "|"} & @{text "prop\<^sup>(\<^sup>2\<^sup>)"} @{text "\<Longrightarrow>"} @{text "prop\<^sup>(\<^sup>1\<^sup>)"} & @{text "(1)"} \\ |
|
737 & @{text "|"} & @{verbatim "[|"} @{text prop} @{verbatim ";"} @{text "\<dots>"} @{verbatim ";"} @{text prop} @{verbatim "|]"} @{verbatim "==>"} @{text "prop\<^sup>(\<^sup>1\<^sup>)"} & @{text "(1)"} \\ |
|
738 & @{text "|"} & @{text "\<lbrakk>"} @{text prop} @{verbatim ";"} @{text "\<dots>"} @{verbatim ";"} @{text prop} @{text "\<rbrakk>"} @{text "\<Longrightarrow>"} @{text "prop\<^sup>(\<^sup>1\<^sup>)"} & @{text "(1)"} \\ |
|
739 & @{text "|"} & @{verbatim "!!"} @{text idts} @{verbatim "."} @{text prop} & @{text "(0)"} \\ |
|
740 & @{text "|"} & @{text "\<And>"} @{text idts} @{verbatim "."} @{text prop} & @{text "(0)"} \\ |
|
741 & @{text "|"} & @{verbatim OFCLASS} @{verbatim "("} @{text type} @{verbatim ","} @{text logic} @{verbatim ")"} \\ |
|
742 & @{text "|"} & @{verbatim SORT_CONSTRAINT} @{verbatim "("} @{text type} @{verbatim ")"} \\ |
|
743 & @{text "|"} & @{verbatim TERM} @{text logic} \\ |
|
744 & @{text "|"} & @{verbatim PROP} @{text aprop} \\\\ |
|
745 |
|
746 @{syntax_def (inner) aprop} & = & @{verbatim "("} @{text aprop} @{verbatim ")"} \\ |
|
747 & @{text "|"} & @{text "id | longid | var | "}@{verbatim "_"}@{text " | "}@{verbatim "..."} \\ |
|
748 & @{text "|"} & @{verbatim CONST} @{text "id | "}@{verbatim CONST} @{text "longid"} \\ |
|
749 & @{text "|"} & @{verbatim XCONST} @{text "id | "}@{verbatim XCONST} @{text "longid"} \\ |
|
750 & @{text "|"} & @{text "logic\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>) any\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>) \<dots> any\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>)"} & @{text "(999)"} \\\\ |
|
751 |
|
752 @{syntax_def (inner) logic} & = & @{verbatim "("} @{text logic} @{verbatim ")"} \\ |
|
753 & @{text "|"} & @{text "logic\<^sup>(\<^sup>4\<^sup>)"} @{verbatim "::"} @{text type} & @{text "(3)"} \\ |
|
754 & @{text "|"} & @{text "id | longid | var | "}@{verbatim "_"}@{text " | "}@{verbatim "..."} \\ |
|
755 & @{text "|"} & @{verbatim CONST} @{text "id | "}@{verbatim CONST} @{text "longid"} \\ |
|
756 & @{text "|"} & @{verbatim XCONST} @{text "id | "}@{verbatim XCONST} @{text "longid"} \\ |
|
757 & @{text "|"} & @{text "logic\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>) any\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>) \<dots> any\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>)"} & @{text "(999)"} \\ |
|
758 & @{text "|"} & @{text "\<struct> index\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>)"} \\ |
|
759 & @{text "|"} & @{verbatim "%"} @{text pttrns} @{verbatim "."} @{text "any\<^sup>(\<^sup>3\<^sup>)"} & @{text "(3)"} \\ |
|
760 & @{text "|"} & @{text \<lambda>} @{text pttrns} @{verbatim "."} @{text "any\<^sup>(\<^sup>3\<^sup>)"} & @{text "(3)"} \\ |
|
761 & @{text "|"} & @{verbatim op} @{verbatim "=="}@{text " | "}@{verbatim op} @{text "\<equiv>"}@{text " | "}@{verbatim op} @{verbatim "&&&"} \\ |
|
762 & @{text "|"} & @{verbatim op} @{verbatim "==>"}@{text " | "}@{verbatim op} @{text "\<Longrightarrow>"} \\ |
|
763 & @{text "|"} & @{verbatim TYPE} @{verbatim "("} @{text type} @{verbatim ")"} \\\\ |
|
764 |
|
765 @{syntax_def (inner) idt} & = & @{verbatim "("} @{text idt} @{verbatim ")"}@{text " | id | "}@{verbatim "_"} \\ |
|
766 & @{text "|"} & @{text id} @{verbatim "::"} @{text type} & @{text "(0)"} \\ |
|
767 & @{text "|"} & @{verbatim "_"} @{verbatim "::"} @{text type} & @{text "(0)"} \\\\ |
|
768 |
|
769 @{syntax_def (inner) index} & = & @{verbatim "\<^bsub>"} @{text "logic\<^sup>(\<^sup>0\<^sup>)"} @{verbatim "\<^esub>"}@{text " | | \<index>"} \\\\ |
|
770 |
|
771 @{syntax_def (inner) idts} & = & @{text "idt | idt\<^sup>(\<^sup>1\<^sup>) idts"} & @{text "(0)"} \\\\ |
|
772 |
|
773 @{syntax_def (inner) pttrn} & = & @{text idt} \\\\ |
|
774 |
|
775 @{syntax_def (inner) pttrns} & = & @{text "pttrn | pttrn\<^sup>(\<^sup>1\<^sup>) pttrns"} & @{text "(0)"} \\\\ |
|
776 |
|
777 @{syntax_def (inner) type} & = & @{verbatim "("} @{text type} @{verbatim ")"} \\ |
|
778 & @{text "|"} & @{text "tid | tvar | "}@{verbatim "_"} \\ |
|
779 & @{text "|"} & @{text "tid"} @{verbatim "::"} @{text "sort | tvar "}@{verbatim "::"} @{text "sort | "}@{verbatim "_"} @{verbatim "::"} @{text "sort"} \\ |
|
780 & @{text "|"} & @{text "type_name | type\<^sup>(\<^sup>1\<^sup>0\<^sup>0\<^sup>0\<^sup>) type_name"} \\ |
|
781 & @{text "|"} & @{verbatim "("} @{text type} @{verbatim ","} @{text "\<dots>"} @{verbatim ","} @{text type} @{verbatim ")"} @{text type_name} \\ |
|
782 & @{text "|"} & @{text "type\<^sup>(\<^sup>1\<^sup>)"} @{verbatim "=>"} @{text type} & @{text "(0)"} \\ |
|
783 & @{text "|"} & @{text "type\<^sup>(\<^sup>1\<^sup>)"} @{text "\<Rightarrow>"} @{text type} & @{text "(0)"} \\ |
|
784 & @{text "|"} & @{verbatim "["} @{text type} @{verbatim ","} @{text "\<dots>"} @{verbatim ","} @{text type} @{verbatim "]"} @{verbatim "=>"} @{text type} & @{text "(0)"} \\ |
|
785 & @{text "|"} & @{verbatim "["} @{text type} @{verbatim ","} @{text "\<dots>"} @{verbatim ","} @{text type} @{verbatim "]"} @{text "\<Rightarrow>"} @{text type} & @{text "(0)"} \\ |
|
786 @{syntax_def (inner) type_name} & = & @{text "id | longid"} \\\\ |
|
787 |
|
788 @{syntax_def (inner) sort} & = & @{syntax class_name}~@{text " | "}@{verbatim "{}"} \\ |
|
789 & @{text "|"} & @{verbatim "{"} @{syntax class_name} @{verbatim ","} @{text "\<dots>"} @{verbatim ","} @{syntax class_name} @{verbatim "}"} \\ |
|
790 @{syntax_def (inner) class_name} & = & @{text "id | longid"} \\ |
|
791 \end{supertabular} |
|
792 \end{center} |
|
793 |
|
794 \medskip Here literal terminals are printed @{verbatim "verbatim"}; |
|
795 see also \secref{sec:inner-lex} for further token categories of the |
|
796 inner syntax. The meaning of the nonterminals defined by the above |
|
797 grammar is as follows: |
|
798 |
|
799 \begin{description} |
|
800 |
|
801 \item @{syntax_ref (inner) any} denotes any term. |
|
802 |
|
803 \item @{syntax_ref (inner) prop} denotes meta-level propositions, |
|
804 which are terms of type @{typ prop}. The syntax of such formulae of |
|
805 the meta-logic is carefully distinguished from usual conventions for |
|
806 object-logics. In particular, plain @{text "\<lambda>"}-term notation is |
|
807 \emph{not} recognized as @{syntax (inner) prop}. |
|
808 |
|
809 \item @{syntax_ref (inner) aprop} denotes atomic propositions, which |
|
810 are embedded into regular @{syntax (inner) prop} by means of an |
|
811 explicit @{verbatim PROP} token. |
|
812 |
|
813 Terms of type @{typ prop} with non-constant head, e.g.\ a plain |
|
814 variable, are printed in this form. Constants that yield type @{typ |
|
815 prop} are expected to provide their own concrete syntax; otherwise |
|
816 the printed version will appear like @{syntax (inner) logic} and |
|
817 cannot be parsed again as @{syntax (inner) prop}. |
|
818 |
|
819 \item @{syntax_ref (inner) logic} denotes arbitrary terms of a |
|
820 logical type, excluding type @{typ prop}. This is the main |
|
821 syntactic category of object-logic entities, covering plain @{text |
|
822 \<lambda>}-term notation (variables, abstraction, application), plus |
|
823 anything defined by the user. |
|
824 |
|
825 When specifying notation for logical entities, all logical types |
|
826 (excluding @{typ prop}) are \emph{collapsed} to this single category |
|
827 of @{syntax (inner) logic}. |
|
828 |
|
829 \item @{syntax_ref (inner) index} denotes an optional index term for |
|
830 indexed syntax. If omitted, it refers to the first @{keyword |
|
831 "structure"} variable in the context. The special dummy ``@{text |
|
832 "\<index>"}'' serves as pattern variable in mixfix annotations that |
|
833 introduce indexed notation. |
|
834 |
|
835 \item @{syntax_ref (inner) idt} denotes identifiers, possibly |
|
836 constrained by types. |
|
837 |
|
838 \item @{syntax_ref (inner) idts} denotes a sequence of @{syntax_ref |
|
839 (inner) idt}. This is the most basic category for variables in |
|
840 iterated binders, such as @{text "\<lambda>"} or @{text "\<And>"}. |
|
841 |
|
842 \item @{syntax_ref (inner) pttrn} and @{syntax_ref (inner) pttrns} |
|
843 denote patterns for abstraction, cases bindings etc. In Pure, these |
|
844 categories start as a merely copy of @{syntax (inner) idt} and |
|
845 @{syntax (inner) idts}, respectively. Object-logics may add |
|
846 additional productions for binding forms. |
|
847 |
|
848 \item @{syntax_ref (inner) type} denotes types of the meta-logic. |
|
849 |
|
850 \item @{syntax_ref (inner) sort} denotes meta-level sorts. |
|
851 |
|
852 \end{description} |
|
853 |
|
854 Here are some further explanations of certain syntax features. |
|
855 |
|
856 \begin{itemize} |
|
857 |
|
858 \item In @{syntax (inner) idts}, note that @{text "x :: nat y"} is |
|
859 parsed as @{text "x :: (nat y)"}, treating @{text y} like a type |
|
860 constructor applied to @{text nat}. To avoid this interpretation, |
|
861 write @{text "(x :: nat) y"} with explicit parentheses. |
|
862 |
|
863 \item Similarly, @{text "x :: nat y :: nat"} is parsed as @{text "x :: |
|
864 (nat y :: nat)"}. The correct form is @{text "(x :: nat) (y :: |
|
865 nat)"}, or @{text "(x :: nat) y :: nat"} if @{text y} is last in the |
|
866 sequence of identifiers. |
|
867 |
|
868 \item Type constraints for terms bind very weakly. For example, |
|
869 @{text "x < y :: nat"} is normally parsed as @{text "(x < y) :: |
|
870 nat"}, unless @{text "<"} has a very low priority, in which case the |
|
871 input is likely to be ambiguous. The correct form is @{text "x < (y |
|
872 :: nat)"}. |
|
873 |
|
874 \item Constraints may be either written with two literal colons |
|
875 ``@{verbatim "::"}'' or the double-colon symbol @{verbatim "\<Colon>"}, |
|
876 which actually looks exactly the same in some {\LaTeX} styles. |
|
877 |
|
878 \item Dummy variables (written as underscore) may occur in different |
|
879 roles. |
|
880 |
|
881 \begin{description} |
|
882 |
|
883 \item A type ``@{text "_"}'' or ``@{text "_ :: sort"}'' acts like an |
|
884 anonymous inference parameter, which is filled-in according to the |
|
885 most general type produced by the type-checking phase. |
|
886 |
|
887 \item A bound ``@{text "_"}'' refers to a vacuous abstraction, where |
|
888 the body does not refer to the binding introduced here. As in the |
|
889 term @{term "\<lambda>x _. x"}, which is @{text "\<alpha>"}-equivalent to @{text |
|
890 "\<lambda>x y. x"}. |
|
891 |
|
892 \item A free ``@{text "_"}'' refers to an implicit outer binding. |
|
893 Higher definitional packages usually allow forms like @{text "f x _ |
|
894 = x"}. |
|
895 |
|
896 \item A schematic ``@{text "_"}'' (within a term pattern, see |
|
897 \secref{sec:term-decls}) refers to an anonymous variable that is |
|
898 implicitly abstracted over its context of locally bound variables. |
|
899 For example, this allows pattern matching of @{text "{x. f x = g |
|
900 x}"} against @{text "{x. _ = _}"}, or even @{text "{_. _ = _}"} by |
|
901 using both bound and schematic dummies. |
|
902 |
|
903 \end{description} |
|
904 |
|
905 \item The three literal dots ``@{verbatim "..."}'' may be also |
|
906 written as ellipsis symbol @{verbatim "\<dots>"}. In both cases this |
|
907 refers to a special schematic variable, which is bound in the |
|
908 context. This special term abbreviation works nicely with |
|
909 calculational reasoning (\secref{sec:calculation}). |
|
910 |
|
911 \item @{verbatim CONST} ensures that the given identifier is treated |
|
912 as constant term, and passed through the parse tree in fully |
|
913 internalized form. This is particularly relevant for translation |
|
914 rules (\secref{sec:syn-trans}), notably on the RHS. |
|
915 |
|
916 \item @{verbatim XCONST} is similar to @{verbatim CONST}, but |
|
917 retains the constant name as given. This is only relevant to |
|
918 translation rules (\secref{sec:syn-trans}), notably on the LHS. |
|
919 |
|
920 \end{itemize} |
|
921 *} |
|
922 |
|
923 |
|
924 subsection {* Inspecting the syntax *} |
|
925 |
|
926 text {* |
|
927 \begin{matharray}{rcl} |
|
928 @{command_def "print_syntax"}@{text "\<^sup>*"} & : & @{text "context \<rightarrow>"} \\ |
|
929 \end{matharray} |
|
930 |
|
931 \begin{description} |
|
932 |
|
933 \item @{command "print_syntax"} prints the inner syntax of the |
|
934 current context. The output can be quite large; the most important |
|
935 sections are explained below. |
|
936 |
|
937 \begin{description} |
|
938 |
|
939 \item @{text "lexicon"} lists the delimiters of the inner token |
|
940 language; see \secref{sec:inner-lex}. |
|
941 |
|
942 \item @{text "prods"} lists the productions of the underlying |
|
943 priority grammar; see \secref{sec:priority-grammar}. |
|
944 |
|
945 The nonterminal @{text "A\<^sup>(\<^sup>p\<^sup>)"} is rendered in plain text as @{text |
|
946 "A[p]"}; delimiters are quoted. Many productions have an extra |
|
947 @{text "\<dots> => name"}. These names later become the heads of parse |
|
948 trees; they also guide the pretty printer. |
|
949 |
|
950 Productions without such parse tree names are called \emph{copy |
|
951 productions}. Their right-hand side must have exactly one |
|
952 nonterminal symbol (or named token). The parser does not create a |
|
953 new parse tree node for copy productions, but simply returns the |
|
954 parse tree of the right-hand symbol. |
|
955 |
|
956 If the right-hand side of a copy production consists of a single |
|
957 nonterminal without any delimiters, then it is called a \emph{chain |
|
958 production}. Chain productions act as abbreviations: conceptually, |
|
959 they are removed from the grammar by adding new productions. |
|
960 Priority information attached to chain productions is ignored; only |
|
961 the dummy value @{text "-1"} is displayed. |
|
962 |
|
963 \item @{text "print modes"} lists the alternative print modes |
|
964 provided by this grammar; see \secref{sec:print-modes}. |
|
965 |
|
966 \item @{text "parse_rules"} and @{text "print_rules"} relate to |
|
967 syntax translations (macros); see \secref{sec:syn-trans}. |
|
968 |
|
969 \item @{text "parse_ast_translation"} and @{text |
|
970 "print_ast_translation"} list sets of constants that invoke |
|
971 translation functions for abstract syntax trees, which are only |
|
972 required in very special situations; see \secref{sec:tr-funs}. |
|
973 |
|
974 \item @{text "parse_translation"} and @{text "print_translation"} |
|
975 list the sets of constants that invoke regular translation |
|
976 functions; see \secref{sec:tr-funs}. |
|
977 |
|
978 \end{description} |
|
979 |
|
980 \end{description} |
|
981 *} |
|
982 |
|
983 |
|
984 subsection {* Ambiguity of parsed expressions *} |
|
985 |
|
986 text {* |
|
987 \begin{tabular}{rcll} |
|
988 @{attribute_def syntax_ambiguity_warning} & : & @{text attribute} & default @{text true} \\ |
|
989 @{attribute_def syntax_ambiguity_limit} & : & @{text attribute} & default @{text 10} \\ |
|
990 \end{tabular} |
|
991 |
|
992 Depending on the grammar and the given input, parsing may be |
|
993 ambiguous. Isabelle lets the Earley parser enumerate all possible |
|
994 parse trees, and then tries to make the best out of the situation. |
|
995 Terms that cannot be type-checked are filtered out, which often |
|
996 leads to a unique result in the end. Unlike regular type |
|
997 reconstruction, which is applied to the whole collection of input |
|
998 terms simultaneously, the filtering stage only treats each given |
|
999 term in isolation. Filtering is also not attempted for individual |
|
1000 types or raw ASTs (as required for @{command translations}). |
|
1001 |
|
1002 Certain warning or error messages are printed, depending on the |
|
1003 situation and the given configuration options. Parsing ultimately |
|
1004 fails, if multiple results remain after the filtering phase. |
|
1005 |
|
1006 \begin{description} |
|
1007 |
|
1008 \item @{attribute syntax_ambiguity_warning} controls output of |
|
1009 explicit warning messages about syntax ambiguity. |
|
1010 |
|
1011 \item @{attribute syntax_ambiguity_limit} determines the number of |
|
1012 resulting parse trees that are shown as part of the printed message |
|
1013 in case of an ambiguity. |
|
1014 |
|
1015 \end{description} |
|
1016 *} |
|
1017 |
|
1018 |
|
1019 section {* Syntax transformations \label{sec:syntax-transformations} *} |
|
1020 |
|
1021 text {* The inner syntax engine of Isabelle provides separate |
|
1022 mechanisms to transform parse trees either as rewrite systems on |
|
1023 first-order ASTs (\secref{sec:syn-trans}), or ML functions on ASTs |
|
1024 or syntactic @{text "\<lambda>"}-terms (\secref{sec:tr-funs}). This works |
|
1025 both for parsing and printing, as outlined in |
|
1026 \figref{fig:parse-print}. |
|
1027 |
|
1028 \begin{figure}[htbp] |
|
1029 \begin{center} |
|
1030 \begin{tabular}{cl} |
|
1031 string & \\ |
|
1032 @{text "\<down>"} & lexer + parser \\ |
|
1033 parse tree & \\ |
|
1034 @{text "\<down>"} & parse AST translation \\ |
|
1035 AST & \\ |
|
1036 @{text "\<down>"} & AST rewriting (macros) \\ |
|
1037 AST & \\ |
|
1038 @{text "\<down>"} & parse translation \\ |
|
1039 --- pre-term --- & \\ |
|
1040 @{text "\<down>"} & print translation \\ |
|
1041 AST & \\ |
|
1042 @{text "\<down>"} & AST rewriting (macros) \\ |
|
1043 AST & \\ |
|
1044 @{text "\<down>"} & print AST translation \\ |
|
1045 string & |
|
1046 \end{tabular} |
|
1047 \end{center} |
|
1048 \caption{Parsing and printing with translations}\label{fig:parse-print} |
|
1049 \end{figure} |
|
1050 |
|
1051 These intermediate syntax tree formats eventually lead to a pre-term |
|
1052 with all names and binding scopes resolved, but most type |
|
1053 information still missing. Explicit type constraints might be given by |
|
1054 the user, or implicit position information by the system --- both |
|
1055 need to be passed-through carefully by syntax transformations. |
|
1056 |
|
1057 Pre-terms are further processed by the so-called \emph{check} and |
|
1058 \emph{unckeck} phases that are intertwined with type-inference (see |
|
1059 also \cite{isabelle-implementation}). The latter allows to operate |
|
1060 on higher-order abstract syntax with proper binding and type |
|
1061 information already available. |
|
1062 |
|
1063 As a rule of thumb, anything that manipulates bindings of variables |
|
1064 or constants needs to be implemented as syntax transformation (see |
|
1065 below). Anything else is better done via check/uncheck: a prominent |
|
1066 example application is the @{command abbreviation} concept of |
|
1067 Isabelle/Pure. *} |
|
1068 |
|
1069 |
|
1070 subsection {* Abstract syntax trees \label{sec:ast} *} |
|
1071 |
|
1072 text {* The ML datatype @{ML_type Ast.ast} explicitly represents the |
|
1073 intermediate AST format that is used for syntax rewriting |
|
1074 (\secref{sec:syn-trans}). It is defined in ML as follows: |
|
1075 \begin{ttbox} |
|
1076 datatype ast = |
|
1077 Constant of string | |
|
1078 Variable of string | |
|
1079 Appl of ast list |
|
1080 \end{ttbox} |
|
1081 |
|
1082 An AST is either an atom (constant or variable) or a list of (at |
|
1083 least two) subtrees. Occasional diagnostic output of ASTs uses |
|
1084 notation that resembles S-expression of LISP. Constant atoms are |
|
1085 shown as quoted strings, variable atoms as non-quoted strings and |
|
1086 applications as a parenthesized list of subtrees. For example, the |
|
1087 AST |
|
1088 @{ML [display] "Ast.Appl |
|
1089 [Ast.Constant \"_abs\", Ast.Variable \"x\", Ast.Variable \"t\"]"} |
|
1090 is pretty-printed as @{verbatim "(\"_abs\" x t)"}. Note that |
|
1091 @{verbatim "()"} and @{verbatim "(x)"} are excluded as ASTs, because |
|
1092 they have too few subtrees. |
|
1093 |
|
1094 \medskip AST application is merely a pro-forma mechanism to indicate |
|
1095 certain syntactic structures. Thus @{verbatim "(c a b)"} could mean |
|
1096 either term application or type application, depending on the |
|
1097 syntactic context. |
|
1098 |
|
1099 Nested application like @{verbatim "((\"_abs\" x t) u)"} is also |
|
1100 possible, but ASTs are definitely first-order: the syntax constant |
|
1101 @{verbatim "\"_abs\""} does not bind the @{verbatim x} in any way. |
|
1102 Proper bindings are introduced in later stages of the term syntax, |
|
1103 where @{verbatim "(\"_abs\" x t)"} becomes an @{ML Abs} node and |
|
1104 occurrences of @{verbatim x} in @{verbatim t} are replaced by bound |
|
1105 variables (represented as de-Bruijn indices). |
|
1106 *} |
|
1107 |
|
1108 |
|
1109 subsubsection {* AST constants versus variables *} |
|
1110 |
|
1111 text {* Depending on the situation --- input syntax, output syntax, |
|
1112 translation patterns --- the distinction of atomic asts as @{ML |
|
1113 Ast.Constant} versus @{ML Ast.Variable} serves slightly different |
|
1114 purposes. |
|
1115 |
|
1116 Input syntax of a term such as @{text "f a b = c"} does not yet |
|
1117 indicate the scopes of atomic entities @{text "f, a, b, c"}: they |
|
1118 could be global constants or local variables, even bound ones |
|
1119 depending on the context of the term. @{ML Ast.Variable} leaves |
|
1120 this choice still open: later syntax layers (or translation |
|
1121 functions) may capture such a variable to determine its role |
|
1122 specifically, to make it a constant, bound variable, free variable |
|
1123 etc. In contrast, syntax translations that introduce already known |
|
1124 constants would rather do it via @{ML Ast.Constant} to prevent |
|
1125 accidental re-interpretation later on. |
|
1126 |
|
1127 Output syntax turns term constants into @{ML Ast.Constant} and |
|
1128 variables (free or schematic) into @{ML Ast.Variable}. This |
|
1129 information is precise when printing fully formal @{text "\<lambda>"}-terms. |
|
1130 |
|
1131 In AST translation patterns (\secref{sec:syn-trans}) the system |
|
1132 guesses from the current theory context which atoms should be |
|
1133 treated as constant versus variable for the matching process. |
|
1134 Sometimes this needs to be indicated more explicitly using @{text |
|
1135 "CONST c"} inside the term language. It is also possible to use |
|
1136 @{command syntax} declarations (without mixfix annotation) to |
|
1137 enforce that certain unqualified names are always treated as |
|
1138 constant within the syntax machinery. |
|
1139 |
|
1140 \medskip For ASTs that represent the language of types or sorts, the |
|
1141 situation is much simpler, since the concrete syntax already |
|
1142 distinguishes type variables from type constants (constructors). So |
|
1143 @{text "('a, 'b) foo"} corresponds to an AST application of some |
|
1144 constant for @{text foo} and variable arguments for @{text "'a"} and |
|
1145 @{text "'b"}. Note that the postfix application is merely a feature |
|
1146 of the concrete syntax, while in the AST the constructor occurs in |
|
1147 head position. *} |
|
1148 |
|
1149 |
|
1150 subsubsection {* Authentic syntax names *} |
|
1151 |
|
1152 text {* Naming constant entities within ASTs is another delicate |
|
1153 issue. Unqualified names are looked up in the name space tables in |
|
1154 the last stage of parsing, after all translations have been applied. |
|
1155 Since syntax transformations do not know about this later name |
|
1156 resolution yet, there can be surprises in boundary cases. |
|
1157 |
|
1158 \emph{Authentic syntax names} for @{ML Ast.Constant} avoid this |
|
1159 problem: the fully-qualified constant name with a special prefix for |
|
1160 its formal category (@{text "class"}, @{text "type"}, @{text |
|
1161 "const"}, @{text "fixed"}) represents the information faithfully |
|
1162 within the untyped AST format. Accidental overlap with free or |
|
1163 bound variables is excluded as well. Authentic syntax names work |
|
1164 implicitly in the following situations: |
|
1165 |
|
1166 \begin{itemize} |
|
1167 |
|
1168 \item Input of term constants (or fixed variables) that are |
|
1169 introduced by concrete syntax via @{command notation}: the |
|
1170 correspondence of a particular grammar production to some known term |
|
1171 entity is preserved. |
|
1172 |
|
1173 \item Input of type constants (constructors) and type classes --- |
|
1174 thanks to explicit syntactic distinction independently on the |
|
1175 context. |
|
1176 |
|
1177 \item Output of term constants, type constants, type classes --- |
|
1178 this information is already available from the internal term to be |
|
1179 printed. |
|
1180 |
|
1181 \end{itemize} |
|
1182 |
|
1183 In other words, syntax transformations that operate on input terms |
|
1184 written as prefix applications are difficult to make robust. |
|
1185 Luckily, this case rarely occurs in practice, because syntax forms |
|
1186 to be translated usually correspond to some bits of concrete |
|
1187 notation. *} |
|
1188 |
|
1189 |
|
1190 subsection {* Raw syntax and translations \label{sec:syn-trans} *} |
|
1191 |
|
1192 text {* |
|
1193 \begin{tabular}{rcll} |
|
1194 @{command_def "nonterminal"} & : & @{text "theory \<rightarrow> theory"} \\ |
|
1195 @{command_def "syntax"} & : & @{text "theory \<rightarrow> theory"} \\ |
|
1196 @{command_def "no_syntax"} & : & @{text "theory \<rightarrow> theory"} \\ |
|
1197 @{command_def "translations"} & : & @{text "theory \<rightarrow> theory"} \\ |
|
1198 @{command_def "no_translations"} & : & @{text "theory \<rightarrow> theory"} \\ |
|
1199 @{attribute_def syntax_ast_trace} & : & @{text attribute} & default @{text false} \\ |
|
1200 @{attribute_def syntax_ast_stats} & : & @{text attribute} & default @{text false} \\ |
|
1201 \end{tabular} |
|
1202 |
|
1203 Unlike mixfix notation for existing formal entities |
|
1204 (\secref{sec:notation}), raw syntax declarations provide full access |
|
1205 to the priority grammar of the inner syntax, without any sanity |
|
1206 checks. This includes additional syntactic categories (via |
|
1207 @{command nonterminal}) and free-form grammar productions (via |
|
1208 @{command syntax}). Additional syntax translations (or macros, via |
|
1209 @{command translations}) are required to turn resulting parse trees |
|
1210 into proper representations of formal entities again. |
|
1211 |
|
1212 @{rail " |
|
1213 @@{command nonterminal} (@{syntax name} + @'and') |
|
1214 ; |
|
1215 (@@{command syntax} | @@{command no_syntax}) @{syntax mode}? (constdecl +) |
|
1216 ; |
|
1217 (@@{command translations} | @@{command no_translations}) |
|
1218 (transpat ('==' | '=>' | '<=' | '\<rightleftharpoons>' | '\<rightharpoonup>' | '\<leftharpoondown>') transpat +) |
|
1219 ; |
|
1220 |
|
1221 constdecl: @{syntax name} '::' @{syntax type} @{syntax mixfix}? |
|
1222 ; |
|
1223 mode: ('(' ( @{syntax name} | @'output' | @{syntax name} @'output' ) ')') |
|
1224 ; |
|
1225 transpat: ('(' @{syntax nameref} ')')? @{syntax string} |
|
1226 "} |
|
1227 |
|
1228 \begin{description} |
|
1229 |
|
1230 \item @{command "nonterminal"}~@{text c} declares a type |
|
1231 constructor @{text c} (without arguments) to act as purely syntactic |
|
1232 type: a nonterminal symbol of the inner syntax. |
|
1233 |
|
1234 \item @{command "syntax"}~@{text "(mode) c :: \<sigma> (mx)"} augments the |
|
1235 priority grammar and the pretty printer table for the given print |
|
1236 mode (default @{verbatim "\"\""}). An optional keyword @{keyword_ref |
|
1237 "output"} means that only the pretty printer table is affected. |
|
1238 |
|
1239 Following \secref{sec:mixfix}, the mixfix annotation @{text "mx = |
|
1240 template ps q"} together with type @{text "\<sigma> = \<tau>\<^sub>1 \<Rightarrow> \<dots> \<tau>\<^sub>n \<Rightarrow> \<tau>"} and |
|
1241 specify a grammar production. The @{text template} contains |
|
1242 delimiter tokens that surround @{text "n"} argument positions |
|
1243 (@{verbatim "_"}). The latter correspond to nonterminal symbols |
|
1244 @{text "A\<^sub>i"} derived from the argument types @{text "\<tau>\<^sub>i"} as |
|
1245 follows: |
|
1246 \begin{itemize} |
|
1247 |
|
1248 \item @{text "prop"} if @{text "\<tau>\<^sub>i = prop"} |
|
1249 |
|
1250 \item @{text "logic"} if @{text "\<tau>\<^sub>i = (\<dots>)\<kappa>"} for logical type |
|
1251 constructor @{text "\<kappa> \<noteq> prop"} |
|
1252 |
|
1253 \item @{text any} if @{text "\<tau>\<^sub>i = \<alpha>"} for type variables |
|
1254 |
|
1255 \item @{text "\<kappa>"} if @{text "\<tau>\<^sub>i = \<kappa>"} for nonterminal @{text "\<kappa>"} |
|
1256 (syntactic type constructor) |
|
1257 |
|
1258 \end{itemize} |
|
1259 |
|
1260 Each @{text "A\<^sub>i"} is decorated by priority @{text "p\<^sub>i"} from the |
|
1261 given list @{text "ps"}; misssing priorities default to 0. |
|
1262 |
|
1263 The resulting nonterminal of the production is determined similarly |
|
1264 from type @{text "\<tau>"}, with priority @{text "q"} and default 1000. |
|
1265 |
|
1266 \medskip Parsing via this production produces parse trees @{text |
|
1267 "t\<^sub>1, \<dots>, t\<^sub>n"} for the argument slots. The resulting parse tree is |
|
1268 composed as @{text "c t\<^sub>1 \<dots> t\<^sub>n"}, by using the syntax constant @{text |
|
1269 "c"} of the syntax declaration. |
|
1270 |
|
1271 Such syntactic constants are invented on the spot, without formal |
|
1272 check wrt.\ existing declarations. It is conventional to use plain |
|
1273 identifiers prefixed by a single underscore (e.g.\ @{text |
|
1274 "_foobar"}). Names should be chosen with care, to avoid clashes |
|
1275 with other syntax declarations. |
|
1276 |
|
1277 \medskip The special case of copy production is specified by @{text |
|
1278 "c = "}@{verbatim "\"\""} (empty string). It means that the |
|
1279 resulting parse tree @{text "t"} is copied directly, without any |
|
1280 further decoration. |
|
1281 |
|
1282 \item @{command "no_syntax"}~@{text "(mode) decls"} removes grammar |
|
1283 declarations (and translations) resulting from @{text decls}, which |
|
1284 are interpreted in the same manner as for @{command "syntax"} above. |
|
1285 |
|
1286 \item @{command "translations"}~@{text rules} specifies syntactic |
|
1287 translation rules (i.e.\ macros) as first-order rewrite rules on |
|
1288 ASTs (\secref{sec:ast}). The theory context maintains two |
|
1289 independent lists translation rules: parse rules (@{verbatim "=>"} |
|
1290 or @{text "\<rightharpoonup>"}) and print rules (@{verbatim "<="} or @{text "\<leftharpoondown>"}). |
|
1291 For convenience, both can be specified simultaneously as parse~/ |
|
1292 print rules (@{verbatim "=="} or @{text "\<rightleftharpoons>"}). |
|
1293 |
|
1294 Translation patterns may be prefixed by the syntactic category to be |
|
1295 used for parsing; the default is @{text logic} which means that |
|
1296 regular term syntax is used. Both sides of the syntax translation |
|
1297 rule undergo parsing and parse AST translations |
|
1298 \secref{sec:tr-funs}, in order to perform some fundamental |
|
1299 normalization like @{text "\<lambda>x y. b \<leadsto> \<lambda>x. \<lambda>y. b"}, but other AST |
|
1300 translation rules are \emph{not} applied recursively here. |
|
1301 |
|
1302 When processing AST patterns, the inner syntax lexer runs in a |
|
1303 different mode that allows identifiers to start with underscore. |
|
1304 This accommodates the usual naming convention for auxiliary syntax |
|
1305 constants --- those that do not have a logical counter part --- by |
|
1306 allowing to specify arbitrary AST applications within the term |
|
1307 syntax, independently of the corresponding concrete syntax. |
|
1308 |
|
1309 Atomic ASTs are distinguished as @{ML Ast.Constant} versus @{ML |
|
1310 Ast.Variable} as follows: a qualified name or syntax constant |
|
1311 declared via @{command syntax}, or parse tree head of concrete |
|
1312 notation becomes @{ML Ast.Constant}, anything else @{ML |
|
1313 Ast.Variable}. Note that @{text CONST} and @{text XCONST} within |
|
1314 the term language (\secref{sec:pure-grammar}) allow to enforce |
|
1315 treatment as constants. |
|
1316 |
|
1317 AST rewrite rules @{text "(lhs, rhs)"} need to obey the following |
|
1318 side-conditions: |
|
1319 |
|
1320 \begin{itemize} |
|
1321 |
|
1322 \item Rules must be left linear: @{text "lhs"} must not contain |
|
1323 repeated variables.\footnote{The deeper reason for this is that AST |
|
1324 equality is not well-defined: different occurrences of the ``same'' |
|
1325 AST could be decorated differently by accidental type-constraints or |
|
1326 source position information, for example.} |
|
1327 |
|
1328 \item Every variable in @{text "rhs"} must also occur in @{text |
|
1329 "lhs"}. |
|
1330 |
|
1331 \end{itemize} |
|
1332 |
|
1333 \item @{command "no_translations"}~@{text rules} removes syntactic |
|
1334 translation rules, which are interpreted in the same manner as for |
|
1335 @{command "translations"} above. |
|
1336 |
|
1337 \item @{attribute syntax_ast_trace} and @{attribute |
|
1338 syntax_ast_stats} control diagnostic output in the AST normalization |
|
1339 process, when translation rules are applied to concrete input or |
|
1340 output. |
|
1341 |
|
1342 \end{description} |
|
1343 |
|
1344 Raw syntax and translations provides a slightly more low-level |
|
1345 access to the grammar and the form of resulting parse trees. It is |
|
1346 often possible to avoid this untyped macro mechanism, and use |
|
1347 type-safe @{command abbreviation} or @{command notation} instead. |
|
1348 Some important situations where @{command syntax} and @{command |
|
1349 translations} are really need are as follows: |
|
1350 |
|
1351 \begin{itemize} |
|
1352 |
|
1353 \item Iterated replacement via recursive @{command translations}. |
|
1354 For example, consider list enumeration @{term "[a, b, c, d]"} as |
|
1355 defined in theory @{theory List} in Isabelle/HOL. |
|
1356 |
|
1357 \item Change of binding status of variables: anything beyond the |
|
1358 built-in @{keyword "binder"} mixfix annotation requires explicit |
|
1359 syntax translations. For example, consider list filter |
|
1360 comprehension @{term "[x \<leftarrow> xs . P]"} as defined in theory @{theory |
|
1361 List} in Isabelle/HOL. |
|
1362 |
|
1363 \end{itemize} |
|
1364 *} |
|
1365 |
|
1366 subsubsection {* Applying translation rules *} |
|
1367 |
|
1368 text {* As a term is being parsed or printed, an AST is generated as |
|
1369 an intermediate form according to \figref{fig:parse-print}. The AST |
|
1370 is normalized by applying translation rules in the manner of a |
|
1371 first-order term rewriting system. We first examine how a single |
|
1372 rule is applied. |
|
1373 |
|
1374 Let @{text "t"} be the abstract syntax tree to be normalized and |
|
1375 @{text "(lhs, rhs)"} some translation rule. A subtree @{text "u"} |
|
1376 of @{text "t"} is called \emph{redex} if it is an instance of @{text |
|
1377 "lhs"}; in this case the pattern @{text "lhs"} is said to match the |
|
1378 object @{text "u"}. A redex matched by @{text "lhs"} may be |
|
1379 replaced by the corresponding instance of @{text "rhs"}, thus |
|
1380 \emph{rewriting} the AST @{text "t"}. Matching requires some notion |
|
1381 of \emph{place-holders} in rule patterns: @{ML Ast.Variable} serves |
|
1382 this purpose. |
|
1383 |
|
1384 More precisely, the matching of the object @{text "u"} against the |
|
1385 pattern @{text "lhs"} is performed as follows: |
|
1386 |
|
1387 \begin{itemize} |
|
1388 |
|
1389 \item Objects of the form @{ML Ast.Variable}~@{text "x"} or @{ML |
|
1390 Ast.Constant}~@{text "x"} are matched by pattern @{ML |
|
1391 Ast.Constant}~@{text "x"}. Thus all atomic ASTs in the object are |
|
1392 treated as (potential) constants, and a successful match makes them |
|
1393 actual constants even before name space resolution (see also |
|
1394 \secref{sec:ast}). |
|
1395 |
|
1396 \item Object @{text "u"} is matched by pattern @{ML |
|
1397 Ast.Variable}~@{text "x"}, binding @{text "x"} to @{text "u"}. |
|
1398 |
|
1399 \item Object @{ML Ast.Appl}~@{text "us"} is matched by @{ML |
|
1400 Ast.Appl}~@{text "ts"} if @{text "us"} and @{text "ts"} have the |
|
1401 same length and each corresponding subtree matches. |
|
1402 |
|
1403 \item In every other case, matching fails. |
|
1404 |
|
1405 \end{itemize} |
|
1406 |
|
1407 A successful match yields a substitution that is applied to @{text |
|
1408 "rhs"}, generating the instance that replaces @{text "u"}. |
|
1409 |
|
1410 Normalizing an AST involves repeatedly applying translation rules |
|
1411 until none are applicable. This works yoyo-like: top-down, |
|
1412 bottom-up, top-down, etc. At each subtree position, rules are |
|
1413 chosen in order of appearance in the theory definitions. |
|
1414 |
|
1415 The configuration options @{attribute syntax_ast_trace} and |
|
1416 @{attribute syntax_ast_stats} might help to understand this process |
|
1417 and diagnose problems. |
|
1418 |
|
1419 \begin{warn} |
|
1420 If syntax translation rules work incorrectly, the output of |
|
1421 @{command_ref print_syntax} with its \emph{rules} sections reveals the |
|
1422 actual internal forms of AST pattern, without potentially confusing |
|
1423 concrete syntax. Recall that AST constants appear as quoted strings |
|
1424 and variables without quotes. |
|
1425 \end{warn} |
|
1426 |
|
1427 \begin{warn} |
|
1428 If @{attribute_ref eta_contract} is set to @{text "true"}, terms |
|
1429 will be @{text "\<eta>"}-contracted \emph{before} the AST rewriter sees |
|
1430 them. Thus some abstraction nodes needed for print rules to match |
|
1431 may vanish. For example, @{text "Ball A (\<lambda>x. P x)"} would contract |
|
1432 to @{text "Ball A P"} and the standard print rule would fail to |
|
1433 apply. This problem can be avoided by hand-written ML translation |
|
1434 functions (see also \secref{sec:tr-funs}), which is in fact the same |
|
1435 mechanism used in built-in @{keyword "binder"} declarations. |
|
1436 \end{warn} |
|
1437 *} |
|
1438 |
|
1439 |
|
1440 subsection {* Syntax translation functions \label{sec:tr-funs} *} |
|
1441 |
|
1442 text {* |
|
1443 \begin{matharray}{rcl} |
|
1444 @{command_def "parse_ast_translation"} & : & @{text "theory \<rightarrow> theory"} \\ |
|
1445 @{command_def "parse_translation"} & : & @{text "theory \<rightarrow> theory"} \\ |
|
1446 @{command_def "print_translation"} & : & @{text "theory \<rightarrow> theory"} \\ |
|
1447 @{command_def "typed_print_translation"} & : & @{text "theory \<rightarrow> theory"} \\ |
|
1448 @{command_def "print_ast_translation"} & : & @{text "theory \<rightarrow> theory"} \\ |
|
1449 @{ML_antiquotation_def "class_syntax"} & : & @{text ML_antiquotation} \\ |
|
1450 @{ML_antiquotation_def "type_syntax"} & : & @{text ML_antiquotation} \\ |
|
1451 @{ML_antiquotation_def "const_syntax"} & : & @{text ML_antiquotation} \\ |
|
1452 @{ML_antiquotation_def "syntax_const"} & : & @{text ML_antiquotation} \\ |
|
1453 \end{matharray} |
|
1454 |
|
1455 Syntax translation functions written in ML admit almost arbitrary |
|
1456 manipulations of inner syntax, at the expense of some complexity and |
|
1457 obscurity in the implementation. |
|
1458 |
|
1459 @{rail " |
|
1460 ( @@{command parse_ast_translation} | @@{command parse_translation} | |
|
1461 @@{command print_translation} | @@{command typed_print_translation} | |
|
1462 @@{command print_ast_translation}) ('(' @'advanced' ')')? @{syntax text} |
|
1463 ; |
|
1464 (@@{ML_antiquotation class_syntax} | |
|
1465 @@{ML_antiquotation type_syntax} | |
|
1466 @@{ML_antiquotation const_syntax} | |
|
1467 @@{ML_antiquotation syntax_const}) name |
|
1468 "} |
|
1469 |
|
1470 \begin{description} |
|
1471 |
|
1472 \item @{command parse_translation} etc. declare syntax translation |
|
1473 functions to the theory. Any of these commands have a single |
|
1474 @{syntax text} argument that refers to an ML expression of |
|
1475 appropriate type, which are as follows by default: |
|
1476 |
|
1477 \medskip |
|
1478 {\footnotesize |
|
1479 \begin{tabular}{ll} |
|
1480 @{command parse_ast_translation} & : @{ML_type "(string * (Ast.ast list -> Ast.ast)) list"} \\ |
|
1481 @{command parse_translation} & : @{ML_type "(string * (term list -> term)) list"} \\ |
|
1482 @{command print_translation} & : @{ML_type "(string * (term list -> term)) list"} \\ |
|
1483 @{command typed_print_translation} & : @{ML_type "(string * (typ -> term list -> term)) list"} \\ |
|
1484 @{command print_ast_translation} & : @{ML_type "(string * (Ast.ast list -> Ast.ast)) list"} \\ |
|
1485 \end{tabular}} |
|
1486 \medskip |
|
1487 |
|
1488 The argument list consists of @{text "(c, tr)"} pairs, where @{text |
|
1489 "c"} is the syntax name of the formal entity involved, and @{text |
|
1490 "tr"} a function that translates a syntax form @{text "c args"} into |
|
1491 @{text "tr args"}. The ML naming convention for parse translations |
|
1492 is @{text "c_tr"} and for print translations @{text "c_tr'"}. |
|
1493 |
|
1494 The @{command_ref print_syntax} command displays the sets of names |
|
1495 associated with the translation functions of a theory under @{text |
|
1496 "parse_ast_translation"} etc. |
|
1497 |
|
1498 If the @{verbatim "("}@{keyword "advanced"}@{verbatim ")"} option is |
|
1499 given, the corresponding translation functions depend on the current |
|
1500 theory or proof context as additional argument. This allows to |
|
1501 implement advanced syntax mechanisms, as translations functions may |
|
1502 refer to specific theory declarations or auxiliary proof data. |
|
1503 |
|
1504 \item @{text "@{class_syntax c}"}, @{text "@{type_syntax c}"}, |
|
1505 @{text "@{const_syntax c}"} inline the authentic syntax name of the |
|
1506 given formal entities into the ML source. This is the |
|
1507 fully-qualified logical name prefixed by a special marker to |
|
1508 indicate its kind: thus different logical name spaces are properly |
|
1509 distinguished within parse trees. |
|
1510 |
|
1511 \item @{text "@{const_syntax c}"} inlines the name @{text "c"} of |
|
1512 the given syntax constant, having checked that it has been declared |
|
1513 via some @{command syntax} commands within the theory context. Note |
|
1514 that the usual naming convention makes syntax constants start with |
|
1515 underscore, to reduce the chance of accidental clashes with other |
|
1516 names occurring in parse trees (unqualified constants etc.). |
|
1517 |
|
1518 \end{description} |
|
1519 *} |
|
1520 |
|
1521 |
|
1522 subsubsection {* The translation strategy *} |
|
1523 |
|
1524 text {* The different kinds of translation functions are invoked during |
|
1525 the transformations between parse trees, ASTs and syntactic terms |
|
1526 (cf.\ \figref{fig:parse-print}). Whenever a combination of the form |
|
1527 @{text "c x\<^sub>1 \<dots> x\<^sub>n"} is encountered, and a translation function |
|
1528 @{text "f"} of appropriate kind is declared for @{text "c"}, the |
|
1529 result is produced by evaluation of @{text "f [x\<^sub>1, \<dots>, x\<^sub>n]"} in ML. |
|
1530 |
|
1531 For AST translations, the arguments @{text "x\<^sub>1, \<dots>, x\<^sub>n"} are ASTs. A |
|
1532 combination has the form @{ML "Ast.Constant"}~@{text "c"} or @{ML |
|
1533 "Ast.Appl"}~@{text "["}@{ML Ast.Constant}~@{text "c, x\<^sub>1, \<dots>, x\<^sub>n]"}. |
|
1534 For term translations, the arguments are terms and a combination has |
|
1535 the form @{ML Const}~@{text "(c, \<tau>)"} or @{ML Const}~@{text "(c, \<tau>) |
|
1536 $ x\<^sub>1 $ \<dots> $ x\<^sub>n"}. Terms allow more sophisticated transformations |
|
1537 than ASTs do, typically involving abstractions and bound |
|
1538 variables. \emph{Typed} print translations may even peek at the type |
|
1539 @{text "\<tau>"} of the constant they are invoked on, although that information |
|
1540 may be inaccurate. |
|
1541 |
|
1542 Regardless of whether they act on ASTs or terms, translation |
|
1543 functions called during the parsing process differ from those for |
|
1544 printing in their overall behaviour: |
|
1545 |
|
1546 \begin{description} |
|
1547 |
|
1548 \item [Parse translations] are applied bottom-up. The arguments are |
|
1549 already in translated form. The translations must not fail; |
|
1550 exceptions trigger an error message. There may be at most one |
|
1551 function associated with any syntactic name. |
|
1552 |
|
1553 \item [Print translations] are applied top-down. They are supplied |
|
1554 with arguments that are partly still in internal form. The result |
|
1555 again undergoes translation; therefore a print translation should |
|
1556 not introduce as head the very constant that invoked it. The |
|
1557 function may raise exception @{ML Match} to indicate failure; in |
|
1558 this event it has no effect. Multiple functions associated with |
|
1559 some syntactic name are tried in the order of declaration in the |
|
1560 theory. |
|
1561 |
|
1562 \end{description} |
|
1563 |
|
1564 Only constant atoms --- constructor @{ML Ast.Constant} for ASTs and |
|
1565 @{ML Const} for terms --- can invoke translation functions. This |
|
1566 means that parse translations can only be associated with parse tree |
|
1567 heads of concrete syntax, or syntactic constants introduced via |
|
1568 other translations. For plain identifiers within the term language, |
|
1569 the status of constant versus variable is not yet know during |
|
1570 parsing. This is in contrast to print translations, where constants |
|
1571 are explicitly known from the given term in its fully internal form. |
|
1572 *} |
|
1573 |
|
1574 end |