# HG changeset patch # User wenzelm # Date 1452280008 -3600 # Node ID d6af554512d74d783034d5074e5a7beb12a23db6 # Parent 686681f69d5e5ce89e785fa9789d143aad9ae3e6 tuned whitespace; diff -r 686681f69d5e -r d6af554512d7 src/Doc/Isar_Ref/Inner_Syntax.thy --- a/src/Doc/Isar_Ref/Inner_Syntax.thy Fri Jan 08 19:46:30 2016 +0100 +++ b/src/Doc/Isar_Ref/Inner_Syntax.thy Fri Jan 08 20:06:48 2016 +0100 @@ -6,26 +6,25 @@ chapter \Inner syntax --- the term language \label{ch:inner-syntax}\ -text \The inner syntax of Isabelle provides concrete notation for - the main entities of the logical framework, notably \\\-terms with types and type classes. Applications may either - extend existing syntactic categories by additional notation, or - define new sub-languages that are linked to the standard term - language via some explicit markers. For example \<^verbatim>\FOO\~\foo\ could - embed the syntax corresponding for some - user-defined nonterminal \foo\ --- within the bounds of the - given lexical syntax of Isabelle/Pure. +text \ + The inner syntax of Isabelle provides concrete notation for the main + entities of the logical framework, notably \\\-terms with types and type + classes. Applications may either extend existing syntactic categories by + additional notation, or define new sub-languages that are linked to the + standard term language via some explicit markers. For example \<^verbatim>\FOO\~\foo\ + could embed the syntax corresponding for some user-defined nonterminal \foo\ + --- within the bounds of the given lexical syntax of Isabelle/Pure. - The most basic way to specify concrete syntax for logical entities - works via mixfix annotations (\secref{sec:mixfix}), which may be - usually given as part of the original declaration or via explicit - notation commands later on (\secref{sec:notation}). This already - covers many needs of concrete syntax without having to understand - the full complexity of inner syntax layers. + The most basic way to specify concrete syntax for logical entities works via + mixfix annotations (\secref{sec:mixfix}), which may be usually given as part + of the original declaration or via explicit notation commands later on + (\secref{sec:notation}). This already covers many needs of concrete syntax + without having to understand the full complexity of inner syntax layers. - Further details of the syntax engine involves the classical - distinction of lexical language versus context-free grammar (see - \secref{sec:pure-syntax}), and various mechanisms for \<^emph>\syntax - transformations\ (see \secref{sec:syntax-transformations}). + Further details of the syntax engine involves the classical distinction of + lexical language versus context-free grammar (see \secref{sec:pure-syntax}), + and various mechanisms for \<^emph>\syntax transformations\ (see + \secref{sec:syntax-transformations}). \ @@ -63,43 +62,39 @@ @{syntax_def modes}: '(' (@{syntax name} + ) ')' \} - \<^descr> @{command "typ"}~\\\ reads and prints a type expression - according to the current context. + \<^descr> @{command "typ"}~\\\ reads and prints a type expression according to the + current context. - \<^descr> @{command "typ"}~\\ :: s\ uses type-inference to - determine the most general way to make \\\ conform to sort - \s\. For concrete \\\ this checks if the type - belongs to that sort. Dummy type parameters ``\_\'' - (underscore) are assigned to fresh type variables with most general - sorts, according the the principles of type-inference. + \<^descr> @{command "typ"}~\\ :: s\ uses type-inference to determine the most + general way to make \\\ conform to sort \s\. For concrete \\\ this checks if + the type belongs to that sort. Dummy type parameters ``\_\'' (underscore) + are assigned to fresh type variables with most general sorts, according the + the principles of type-inference. - \<^descr> @{command "term"}~\t\ and @{command "prop"}~\\\ - read, type-check and print terms or propositions according to the - current theory or proof context; the inferred type of \t\ is - output as well. Note that these commands are also useful in - inspecting the current environment of term abbreviations. + \<^descr> @{command "term"}~\t\ and @{command "prop"}~\\\ read, type-check and + print terms or propositions according to the current theory or proof + context; the inferred type of \t\ is output as well. Note that these + commands are also useful in inspecting the current environment of term + abbreviations. - \<^descr> @{command "thm"}~\a\<^sub>1 \ a\<^sub>n\ retrieves - theorems from the current theory or proof context. Note that any - attributes included in the theorem specifications are applied to a - temporary context derived from the current theory or proof; the - result is discarded, i.e.\ attributes involved in \a\<^sub>1, - \, a\<^sub>n\ do not have any permanent effect. + \<^descr> @{command "thm"}~\a\<^sub>1 \ a\<^sub>n\ retrieves theorems from the current theory + or proof context. Note that any attributes included in the theorem + specifications are applied to a temporary context derived from the current + theory or proof; the result is discarded, i.e.\ attributes involved in + \a\<^sub>1, \, a\<^sub>n\ do not have any permanent effect. - \<^descr> @{command "prf"} displays the (compact) proof term of the - current proof state (if present), or of the given theorems. Note - that this requires proof terms to be switched on for the current - object logic (see the ``Proof terms'' section of the Isabelle - reference manual for information on how to do this). + \<^descr> @{command "prf"} displays the (compact) proof term of the current proof + state (if present), or of the given theorems. Note that this requires + proof terms to be switched on for the current object logic (see the + ``Proof terms'' section of the Isabelle reference manual for information + on how to do this). - \<^descr> @{command "full_prf"} is like @{command "prf"}, but displays - the full proof term, i.e.\ also displays information omitted in the - compact proof term, which is denoted by ``\_\'' placeholders - there. + \<^descr> @{command "full_prf"} is like @{command "prf"}, but displays the full + proof term, i.e.\ also displays information omitted in the compact proof + term, which is denoted by ``\_\'' placeholders there. - \<^descr> @{command "print_state"} prints the current proof state (if - present), including current facts and goals. - + \<^descr> @{command "print_state"} prints the current proof state (if present), + including current facts and goals. All of the diagnostic commands above admit a list of \modes\ to be specified, which is appended to the current print mode; see also @@ -109,9 +104,8 @@ symbols and special characters represented in {\LaTeX} source, according to the Isabelle style @{cite "isabelle-system"}. - Note that antiquotations (cf.\ \secref{sec:antiq}) provide a more - systematic way to include formal items into the printed text - document. + Note that antiquotations (cf.\ \secref{sec:antiq}) provide a more systematic + way to include formal items into the printed text document. \ @@ -137,100 +131,90 @@ \end{tabular} \<^medskip> - These configuration options control the detail of information that - is displayed for types, terms, theorems, goals etc. See also + These configuration options control the detail of information that is + displayed for types, terms, theorems, goals etc. See also \secref{sec:config}. - \<^descr> @{attribute show_markup} controls direct inlining of markup - into the printed representation of formal entities --- notably type - and sort constraints. This enables Prover IDE users to retrieve - that information via tooltips or popups while hovering with the - mouse over the output window, for example. Consequently, this - option is enabled by default for Isabelle/jEdit. + \<^descr> @{attribute show_markup} controls direct inlining of markup into the + printed representation of formal entities --- notably type and sort + constraints. This enables Prover IDE users to retrieve that information via + tooltips or popups while hovering with the mouse over the output window, for + example. Consequently, this option is enabled by default for Isabelle/jEdit. - \<^descr> @{attribute show_types} and @{attribute show_sorts} control - printing of type constraints for term variables, and sort - constraints for type variables. By default, neither of these are - shown in output. If @{attribute show_sorts} is enabled, types are - always shown as well. In Isabelle/jEdit, manual setting of these - options is normally not required thanks to @{attribute show_markup} - above. + \<^descr> @{attribute show_types} and @{attribute show_sorts} control printing of + type constraints for term variables, and sort constraints for type + variables. By default, neither of these are shown in output. If @{attribute + show_sorts} is enabled, types are always shown as well. In Isabelle/jEdit, + manual setting of these options is normally not required thanks to + @{attribute show_markup} above. - Note that displaying types and sorts may explain why a polymorphic - inference rule fails to resolve with some goal, or why a rewrite - rule does not apply as expected. - - \<^descr> @{attribute show_consts} controls printing of types of - constants when displaying a goal state. + Note that displaying types and sorts may explain why a polymorphic inference + rule fails to resolve with some goal, or why a rewrite rule does not apply + as expected. - Note that the output can be enormous, because polymorphic constants - often occur at several different type instances. + \<^descr> @{attribute show_consts} controls printing of types of constants when + displaying a goal state. - \<^descr> @{attribute show_abbrevs} controls folding of constant - abbreviations. + Note that the output can be enormous, because polymorphic constants often + occur at several different type instances. + + \<^descr> @{attribute show_abbrevs} controls folding of constant abbreviations. - \<^descr> @{attribute show_brackets} controls bracketing in pretty - printed output. If enabled, all sub-expressions of the pretty - printing tree will be parenthesized, even if this produces malformed - term syntax! This crude way of showing the internal structure of - pretty printed entities may occasionally help to diagnose problems - with operator priorities, for example. + \<^descr> @{attribute show_brackets} controls bracketing in pretty printed output. + If enabled, all sub-expressions of the pretty printing tree will be + parenthesized, even if this produces malformed term syntax! This crude way + of showing the internal structure of pretty printed entities may + occasionally help to diagnose problems with operator priorities, for + example. - \<^descr> @{attribute names_long}, @{attribute names_short}, and - @{attribute names_unique} control the way of printing fully - qualified internal names in external form. See also - \secref{sec:antiq} for the document antiquotation options of the - same names. + \<^descr> @{attribute names_long}, @{attribute names_short}, and @{attribute + names_unique} control the way of printing fully qualified internal names in + external form. See also \secref{sec:antiq} for the document antiquotation + options of the same names. - \<^descr> @{attribute eta_contract} controls \\\-contracted - printing of terms. + \<^descr> @{attribute eta_contract} controls \\\-contracted printing of terms. - The \\\-contraction law asserts @{prop "(\x. f x) \ f"}, - provided \x\ is not free in \f\. It asserts - \<^emph>\extensionality\ of functions: @{prop "f \ g"} if @{prop "f x \ - g x"} for all \x\. Higher-order unification frequently puts - terms into a fully \\\-expanded form. For example, if \F\ has type \(\ \ \) \ \\ then its expanded form is @{term - "\h. F (\x. h x)"}. + The \\\-contraction law asserts @{prop "(\x. f x) \ f"}, provided \x\ is not + free in \f\. It asserts \<^emph>\extensionality\ of functions: @{prop "f \ g"} if + @{prop "f x \ g x"} for all \x\. Higher-order unification frequently puts + terms into a fully \\\-expanded form. For example, if \F\ has type \(\ \ \) + \ \\ then its expanded form is @{term "\h. F (\x. h x)"}. - Enabling @{attribute eta_contract} makes Isabelle perform \\\-contractions before printing, so that @{term "\h. F (\x. h x)"} - appears simply as \F\. + Enabling @{attribute eta_contract} makes Isabelle perform \\\-contractions + before printing, so that @{term "\h. F (\x. h x)"} appears simply as \F\. - Note that the distinction between a term and its \\\-expanded - form occasionally matters. While higher-order resolution and - rewriting operate modulo \\\\\-conversion, some other tools - might look at terms more discretely. + Note that the distinction between a term and its \\\-expanded form + occasionally matters. While higher-order resolution and rewriting operate + modulo \\\\\-conversion, some other tools might look at terms more + discretely. - \<^descr> @{attribute goals_limit} controls the maximum number of - subgoals to be printed. + \<^descr> @{attribute goals_limit} controls the maximum number of subgoals to be + printed. - \<^descr> @{attribute show_main_goal} controls whether the main result - to be proven should be displayed. This information might be - relevant for schematic goals, to inspect the current claim that has - been synthesized so far. + \<^descr> @{attribute show_main_goal} controls whether the main result to be proven + should be displayed. This information might be relevant for schematic goals, + to inspect the current claim that has been synthesized so far. - \<^descr> @{attribute show_hyps} controls printing of implicit - hypotheses of local facts. Normally, only those hypotheses are - displayed that are \<^emph>\not\ covered by the assumptions of the - current context: this situation indicates a fault in some tool being - used. + \<^descr> @{attribute show_hyps} controls printing of implicit hypotheses of local + facts. Normally, only those hypotheses are displayed that are \<^emph>\not\ covered + by the assumptions of the current context: this situation indicates a fault + in some tool being used. - By enabling @{attribute show_hyps}, output of \<^emph>\all\ hypotheses - can be enforced, which is occasionally useful for diagnostic - purposes. + By enabling @{attribute show_hyps}, output of \<^emph>\all\ hypotheses can be + enforced, which is occasionally useful for diagnostic purposes. - \<^descr> @{attribute show_tags} controls printing of extra annotations - within theorems, such as internal position information, or the case - names being attached by the attribute @{attribute case_names}. + \<^descr> @{attribute show_tags} controls printing of extra annotations within + theorems, such as internal position information, or the case names being + attached by the attribute @{attribute case_names}. - Note that the @{attribute tagged} and @{attribute untagged} - attributes provide low-level access to the collection of tags - associated with a theorem. + Note that the @{attribute tagged} and @{attribute untagged} attributes + provide low-level access to the collection of tags associated with a + theorem. - \<^descr> @{attribute show_question_marks} controls printing of question - marks for schematic variables, such as \?x\. Only the leading - question mark is affected, the remaining text is unchanged - (including proper markup for schematic variables that might be - relevant for user interfaces). + \<^descr> @{attribute show_question_marks} controls printing of question marks for + schematic variables, such as \?x\. Only the leading question mark is + affected, the remaining text is unchanged (including proper markup for + schematic variables that might be relevant for user interfaces). \ @@ -242,62 +226,58 @@ @{index_ML Print_Mode.with_modes: "string list -> ('a -> 'b) -> 'a -> 'b"} \\ \end{mldecls} - The \<^emph>\print mode\ facility allows to modify various operations - for printing. Commands like @{command typ}, @{command term}, - @{command thm} (see \secref{sec:print-diag}) take additional print - modes as optional argument. The underlying ML operations are as - follows. + The \<^emph>\print mode\ facility allows to modify various operations for printing. + Commands like @{command typ}, @{command term}, @{command thm} (see + \secref{sec:print-diag}) take additional print modes as optional argument. + The underlying ML operations are as follows. - \<^descr> @{ML "print_mode_value ()"} yields the list of currently - active print mode names. This should be understood as symbolic - representation of certain individual features for printing (with - precedence from left to right). + \<^descr> @{ML "print_mode_value ()"} yields the list of currently active print + mode names. This should be understood as symbolic representation of + certain individual features for printing (with precedence from left to + right). - \<^descr> @{ML Print_Mode.with_modes}~\modes f x\ evaluates - \f x\ in an execution context where the print mode is - prepended by the given \modes\. This provides a thread-safe - way to augment print modes. It is also monotonic in the set of mode - names: it retains the default print mode that certain - user-interfaces might have installed for their proper functioning! - + \<^descr> @{ML Print_Mode.with_modes}~\modes f x\ evaluates \f x\ in an execution + context where the print mode is prepended by the given \modes\. This + provides a thread-safe way to augment print modes. It is also monotonic in + the set of mode names: it retains the default print mode that certain + user-interfaces might have installed for their proper functioning! \<^medskip> - The pretty printer for inner syntax maintains alternative - mixfix productions for any print mode name invented by the user, say - in commands like @{command notation} or @{command abbreviation}. - Mode names can be arbitrary, but the following ones have a specific - meaning by convention: + The pretty printer for inner syntax maintains alternative mixfix productions + for any print mode name invented by the user, say in commands like @{command + notation} or @{command abbreviation}. Mode names can be arbitrary, but the + following ones have a specific meaning by convention: - \<^item> \<^verbatim>\""\ (the empty string): default mode; - implicitly active as last element in the list of modes. + \<^item> \<^verbatim>\""\ (the empty string): default mode; implicitly active as last + element in the list of modes. - \<^item> \<^verbatim>\input\: dummy print mode that is never active; may - be used to specify notation that is only available for input. + \<^item> \<^verbatim>\input\: dummy print mode that is never active; may be used to specify + notation that is only available for input. - \<^item> \<^verbatim>\internal\ dummy print mode that is never active; - used internally in Isabelle/Pure. + \<^item> \<^verbatim>\internal\ dummy print mode that is never active; used internally in + Isabelle/Pure. - \<^item> \<^verbatim>\ASCII\: prefer ASCII art over mathematical symbols. + \<^item> \<^verbatim>\ASCII\: prefer ASCII art over mathematical symbols. - \<^item> \<^verbatim>\latex\: additional mode that is active in {\LaTeX} - document preparation of Isabelle theory sources; allows to provide - alternative output notation. + \<^item> \<^verbatim>\latex\: additional mode that is active in {\LaTeX} document + preparation of Isabelle theory sources; allows to provide alternative + output notation. \ section \Mixfix annotations \label{sec:mixfix}\ -text \Mixfix annotations specify concrete \<^emph>\inner syntax\ of - Isabelle types and terms. Locally fixed parameters in toplevel - theorem statements, locale and class specifications also admit - mixfix annotations in a fairly uniform manner. A mixfix annotation - describes the concrete syntax, the translation to abstract - syntax, and the pretty printing. Special case annotations provide a - simple means of specifying infix operators and binders. +text \ + Mixfix annotations specify concrete \<^emph>\inner syntax\ of Isabelle types and + terms. Locally fixed parameters in toplevel theorem statements, locale and + class specifications also admit mixfix annotations in a fairly uniform + manner. A mixfix annotation describes the concrete syntax, the translation + to abstract syntax, and the pretty printing. Special case annotations + provide a simple means of specifying infix operators and binders. - Isabelle mixfix syntax is inspired by {\OBJ} @{cite OBJ}. It allows - to specify any context-free priority grammar, which is more general - than the fixity declarations of ML and Prolog. + Isabelle mixfix syntax is inspired by {\OBJ} @{cite OBJ}. It allows to + specify any context-free priority grammar, which is more general than the + fixity declarations of ML and Prolog. @{rail \ @{syntax_def mixfix}: '(' @@ -311,52 +291,48 @@ prios: '[' (@{syntax nat} + ',') ']' \} - The string given as \template\ may include literal text, - spacing, blocks, and arguments (denoted by ``\_\''); the - special symbol ``\<^verbatim>\\\'' (printed as ``\\\'') - represents an index argument that specifies an implicit @{keyword - "structure"} reference (see also \secref{sec:locale}). Only locally - fixed variables may be declared as @{keyword "structure"}. + The string given as \template\ may include literal text, spacing, blocks, + and arguments (denoted by ``\_\''); the special symbol ``\<^verbatim>\\\'' (printed as + ``\\\'') represents an index argument that specifies an implicit @{keyword + "structure"} reference (see also \secref{sec:locale}). Only locally fixed + variables may be declared as @{keyword "structure"}. - Infix and binder declarations provide common abbreviations for - particular mixfix declarations. So in practice, mixfix templates - mostly degenerate to literal text for concrete syntax, such as - ``\<^verbatim>\++\'' for an infix symbol. + Infix and binder declarations provide common abbreviations for particular + mixfix declarations. So in practice, mixfix templates mostly degenerate to + literal text for concrete syntax, such as ``\<^verbatim>\++\'' for an infix symbol. \ subsection \The general mixfix form\ -text \In full generality, mixfix declarations work as follows. - Suppose a constant \c :: \\<^sub>1 \ \ \\<^sub>n \ \\ is annotated by - \(mixfix [p\<^sub>1, \, p\<^sub>n] p)\, where \mixfix\ is a string - \d\<^sub>0 _ d\<^sub>1 _ \ _ d\<^sub>n\ consisting of delimiters that surround - argument positions as indicated by underscores. +text \ + In full generality, mixfix declarations work as follows. Suppose a constant + \c :: \\<^sub>1 \ \ \\<^sub>n \ \\ is annotated by \(mixfix [p\<^sub>1, \, p\<^sub>n] p)\, where + \mixfix\ is a string \d\<^sub>0 _ d\<^sub>1 _ \ _ d\<^sub>n\ consisting of delimiters that + surround argument positions as indicated by underscores. - Altogether this determines a production for a context-free priority - grammar, where for each argument \i\ the syntactic category - is determined by \\\<^sub>i\ (with priority \p\<^sub>i\), and the - result category is determined from \\\ (with priority \p\). Priority specifications are optional, with default 0 for - arguments and 1000 for the result.\<^footnote>\Omitting priorities is - prone to syntactic ambiguities unless the delimiter tokens determine - fully bracketed notation, as in \if _ then _ else _ fi\.\ + Altogether this determines a production for a context-free priority grammar, + where for each argument \i\ the syntactic category is determined by \\\<^sub>i\ + (with priority \p\<^sub>i\), and the result category is determined from \\\ (with + priority \p\). Priority specifications are optional, with default 0 for + arguments and 1000 for the result.\<^footnote>\Omitting priorities is prone to + syntactic ambiguities unless the delimiter tokens determine fully bracketed + notation, as in \if _ then _ else _ fi\.\ - Since \\\ may be again a function type, the constant - type scheme may have more argument positions than the mixfix - pattern. Printing a nested application \c t\<^sub>1 \ t\<^sub>m\ for - \m > n\ works by attaching concrete notation only to the - innermost part, essentially by printing \(c t\<^sub>1 \ t\<^sub>n) \ t\<^sub>m\ - instead. If a term has fewer arguments than specified in the mixfix + Since \\\ may be again a function type, the constant type scheme may have + more argument positions than the mixfix pattern. Printing a nested + application \c t\<^sub>1 \ t\<^sub>m\ for \m > n\ works by attaching concrete notation + only to the innermost part, essentially by printing \(c t\<^sub>1 \ t\<^sub>n) \ t\<^sub>m\ + instead. If a term has fewer arguments than specified in the mixfix template, the concrete syntax is ignored. \<^medskip> - A mixfix template may also contain additional directives - for pretty printing, notably spaces, blocks, and breaks. The - general template format is a sequence over any of the following - entities. + A mixfix template may also contain additional directives for pretty + printing, notably spaces, blocks, and breaks. The general template format is + a sequence over any of the following entities. - \<^descr> \d\ is a delimiter, namely a non-empty sequence of - characters other than the following special characters: + \<^descr> \d\ is a delimiter, namely a non-empty sequence of characters other than + the following special characters: \<^medskip> \begin{tabular}{ll} @@ -369,47 +345,45 @@ \end{tabular} \<^medskip> - \<^descr> \<^verbatim>\'\ escapes the special meaning of these - meta-characters, producing a literal version of the following - character, unless that is a blank. + \<^descr> \<^verbatim>\'\ escapes the special meaning of these meta-characters, producing a + literal version of the following character, unless that is a blank. - A single quote followed by a blank separates delimiters, without - affecting printing, but input tokens may have additional white space - here. + A single quote followed by a blank separates delimiters, without affecting + printing, but input tokens may have additional white space here. - \<^descr> \<^verbatim>\_\ is an argument position, which stands for a - certain syntactic category in the underlying grammar. + \<^descr> \<^verbatim>\_\ is an argument position, which stands for a certain syntactic + category in the underlying grammar. - \<^descr> \\\ is an indexed argument position; this is the place - where implicit structure arguments can be attached. - - \<^descr> \s\ is a non-empty sequence of spaces for printing. - This and the following specifications do not affect parsing at all. + \<^descr> \\\ is an indexed argument position; this is the place where implicit + structure arguments can be attached. - \<^descr> \<^verbatim>\(\\n\ opens a pretty printing block. The - optional number specifies how much indentation to add when a line - break occurs within the block. If the parenthesis is not followed - by digits, the indentation defaults to 0. A block specified via - \<^verbatim>\(00\ is unbreakable. + \<^descr> \s\ is a non-empty sequence of spaces for printing. This and the following + specifications do not affect parsing at all. + + \<^descr> \<^verbatim>\(\\n\ opens a pretty printing block. The optional number specifies how + much indentation to add when a line break occurs within the block. If the + parenthesis is not followed by digits, the indentation defaults to 0. A + block specified via \<^verbatim>\(00\ is unbreakable. \<^descr> \<^verbatim>\)\ closes a pretty printing block. \<^descr> \<^verbatim>\//\ forces a line break. - \<^descr> \<^verbatim>\/\\s\ allows a line break. Here \s\ - stands for the string of spaces (zero or more) right after the - slash. These spaces are printed if the break is \<^emph>\not\ taken. + \<^descr> \<^verbatim>\/\\s\ allows a line break. Here \s\ stands for the string of spaces + (zero or more) right after the slash. These spaces are printed if the break + is \<^emph>\not\ taken. - The general idea of pretty printing with blocks and breaks is also - described in @{cite "paulson-ml2"}; it goes back to @{cite "Oppen:1980"}. + The general idea of pretty printing with blocks and breaks is also described + in @{cite "paulson-ml2"}; it goes back to @{cite "Oppen:1980"}. \ subsection \Infixes\ -text \Infix operators are specified by convenient short forms that - abbreviate general mixfix annotations as follows: +text \ + Infix operators are specified by convenient short forms that abbreviate + general mixfix annotations as follows: \begin{center} \begin{tabular}{lll} @@ -427,35 +401,33 @@ \end{tabular} \end{center} - The mixfix template \<^verbatim>\"(_\~\sy\\<^verbatim>\/ _)"\ - specifies two argument positions; the delimiter is preceded by a - space and followed by a space or line break; the entire phrase is a - pretty printing block. + The mixfix template \<^verbatim>\"(_\~\sy\\<^verbatim>\/ _)"\ specifies two argument positions; + the delimiter is preceded by a space and followed by a space or line break; + the entire phrase is a pretty printing block. - The alternative notation \<^verbatim>\op\~\sy\ is introduced - in addition. Thus any infix operator may be written in prefix form - (as in ML), independently of the number of arguments in the term. + The alternative notation \<^verbatim>\op\~\sy\ is introduced in addition. Thus any + infix operator may be written in prefix form (as in ML), independently of + the number of arguments in the term. \ subsection \Binders\ -text \A \<^emph>\binder\ is a variable-binding construct such as a - quantifier. The idea to formalize \\x. b\ as \All - (\x. b)\ for \All :: ('a \ bool) \ bool\ already goes back - to @{cite church40}. Isabelle declarations of certain higher-order - operators may be annotated with @{keyword_def "binder"} annotations - as follows: +text \ + A \<^emph>\binder\ is a variable-binding construct such as a quantifier. The idea + to formalize \\x. b\ as \All (\x. b)\ for \All :: ('a \ bool) \ bool\ + already goes back to @{cite church40}. Isabelle declarations of certain + higher-order operators may be annotated with @{keyword_def "binder"} + annotations as follows: \begin{center} \c ::\~\<^verbatim>\"\\(\\<^sub>1 \ \\<^sub>2) \ \\<^sub>3\\<^verbatim>\" (\@{keyword "binder"}~\<^verbatim>\"\\sy\\<^verbatim>\" [\\p\\<^verbatim>\]\~\q\\<^verbatim>\)\ \end{center} - This introduces concrete binder syntax \sy x. b\, where - \x\ is a bound variable of type \\\<^sub>1\, the body \b\ has type \\\<^sub>2\ and the whole term has type \\\<^sub>3\. - The optional integer \p\ specifies the syntactic priority of - the body; the default is \q\, which is also the priority of - the whole construct. + This introduces concrete binder syntax \sy x. b\, where \x\ is a bound + variable of type \\\<^sub>1\, the body \b\ has type \\\<^sub>2\ and the whole term has + type \\\<^sub>3\. The optional integer \p\ specifies the syntactic priority of the + body; the default is \q\, which is also the priority of the whole construct. Internally, the binder syntax is expanded to something like this: \begin{center} @@ -464,17 +436,18 @@ Here @{syntax (inner) idts} is the nonterminal symbol for a list of identifiers with optional type constraints (see also - \secref{sec:pure-grammar}). The mixfix template \<^verbatim>\"(3\\sy\\<^verbatim>\_./ _)"\ - defines argument positions - for the bound identifiers and the body, separated by a dot with - optional line break; the entire phrase is a pretty printing block of - indentation level 3. Note that there is no extra space after \sy\, so it needs to be included user specification if the binder - syntax ends with a token that may be continued by an identifier - token at the start of @{syntax (inner) idts}. + \secref{sec:pure-grammar}). The mixfix template \<^verbatim>\"(3\\sy\\<^verbatim>\_./ _)"\ defines + argument positions for the bound identifiers and the body, separated by a + dot with optional line break; the entire phrase is a pretty printing block + of indentation level 3. Note that there is no extra space after \sy\, so it + needs to be included user specification if the binder syntax ends with a + token that may be continued by an identifier token at the start of @{syntax + (inner) idts}. - Furthermore, a syntax translation to transforms \c_binder x\<^sub>1 - \ x\<^sub>n b\ into iterated application \c (\x\<^sub>1. \ c (\x\<^sub>n. b)\)\. - This works in both directions, for parsing and printing.\ + Furthermore, a syntax translation to transforms \c_binder x\<^sub>1 \ x\<^sub>n b\ into + iterated application \c (\x\<^sub>1. \ c (\x\<^sub>n. b)\)\. This works in both + directions, for parsing and printing. +\ section \Explicit notation \label{sec:notation}\ @@ -488,12 +461,11 @@ @{command_def "write"} & : & \proof(state) \ proof(state)\ \\ \end{matharray} - Commands that introduce new logical entities (terms or types) - usually allow to provide mixfix annotations on the spot, which is - convenient for default notation. Nonetheless, the syntax may be - modified later on by declarations for explicit notation. This - allows to add or delete mixfix annotations for of existing logical - entities within the current context. + Commands that introduce new logical entities (terms or types) usually allow + to provide mixfix annotations on the spot, which is convenient for default + notation. Nonetheless, the syntax may be modified later on by declarations + for explicit notation. This allows to add or delete mixfix annotations for + of existing logical entities within the current context. @{rail \ (@@{command type_notation} | @@{command no_type_notation}) @{syntax mode}? \ @@ -505,24 +477,22 @@ @@{command write} @{syntax mode}? (@{syntax nameref} @{syntax mixfix} + @'and') \} - \<^descr> @{command "type_notation"}~\c (mx)\ associates mixfix - syntax with an existing type constructor. The arity of the - constructor is retrieved from the context. + \<^descr> @{command "type_notation"}~\c (mx)\ associates mixfix syntax with an + existing type constructor. The arity of the constructor is retrieved from + the context. - \<^descr> @{command "no_type_notation"} is similar to @{command - "type_notation"}, but removes the specified syntax annotation from - the present context. + \<^descr> @{command "no_type_notation"} is similar to @{command "type_notation"}, + but removes the specified syntax annotation from the present context. - \<^descr> @{command "notation"}~\c (mx)\ associates mixfix - syntax with an existing constant or fixed variable. The type - declaration of the given entity is retrieved from the context. + \<^descr> @{command "notation"}~\c (mx)\ associates mixfix syntax with an existing + constant or fixed variable. The type declaration of the given entity is + retrieved from the context. - \<^descr> @{command "no_notation"} is similar to @{command "notation"}, - but removes the specified syntax annotation from the present - context. + \<^descr> @{command "no_notation"} is similar to @{command "notation"}, but removes + the specified syntax annotation from the present context. - \<^descr> @{command "write"} is similar to @{command "notation"}, but - works within an Isar proof body. + \<^descr> @{command "write"} is similar to @{command "notation"}, but works within + an Isar proof body. \ @@ -530,26 +500,24 @@ subsection \Lexical matters \label{sec:inner-lex}\ -text \The inner lexical syntax vaguely resembles the outer one - (\secref{sec:outer-lex}), but some details are different. There are - two main categories of inner syntax tokens: +text \ + The inner lexical syntax vaguely resembles the outer one + (\secref{sec:outer-lex}), but some details are different. There are two main + categories of inner syntax tokens: - \<^enum> \<^emph>\delimiters\ --- the literal tokens occurring in - productions of the given priority grammar (cf.\ - \secref{sec:priority-grammar}); + \<^enum> \<^emph>\delimiters\ --- the literal tokens occurring in productions of the given + priority grammar (cf.\ \secref{sec:priority-grammar}); \<^enum> \<^emph>\named tokens\ --- various categories of identifiers etc. - Delimiters override named tokens and may thus render certain - identifiers inaccessible. Sometimes the logical context admits - alternative ways to refer to the same entity, potentially via - qualified names. + Delimiters override named tokens and may thus render certain identifiers + inaccessible. Sometimes the logical context admits alternative ways to refer + to the same entity, potentially via qualified names. \<^medskip> - The categories for named tokens are defined once and for - all as follows, reusing some categories of the outer token syntax - (\secref{sec:outer-lex}). + The categories for named tokens are defined once and for all as follows, + reusing some categories of the outer token syntax (\secref{sec:outer-lex}). \begin{center} \begin{supertabular}{rcl} @@ -581,32 +549,30 @@ subsection \Priority grammars \label{sec:priority-grammar}\ -text \A context-free grammar consists of a set of \<^emph>\terminal - symbols\, a set of \<^emph>\nonterminal symbols\ and a set of - \<^emph>\productions\. Productions have the form \A = \\, - where \A\ is a nonterminal and \\\ is a string of - terminals and nonterminals. One designated nonterminal is called - the \<^emph>\root symbol\. The language defined by the grammar - consists of all strings of terminals that can be derived from the - root symbol by applying productions as rewrite rules. +text \ + A context-free grammar consists of a set of \<^emph>\terminal symbols\, a set of + \<^emph>\nonterminal symbols\ and a set of \<^emph>\productions\. Productions have the + form \A = \\, where \A\ is a nonterminal and \\\ is a string of terminals + and nonterminals. One designated nonterminal is called the \<^emph>\root symbol\. + The language defined by the grammar consists of all strings of terminals + that can be derived from the root symbol by applying productions as rewrite + rules. - The standard Isabelle parser for inner syntax uses a \<^emph>\priority - grammar\. Each nonterminal is decorated by an integer priority: - \A\<^sup>(\<^sup>p\<^sup>)\. In a derivation, \A\<^sup>(\<^sup>p\<^sup>)\ may be rewritten - using a production \A\<^sup>(\<^sup>q\<^sup>) = \\ only if \p \ q\. Any - priority grammar can be translated into a normal context-free - grammar by introducing new nonterminals and productions. + The standard Isabelle parser for inner syntax uses a \<^emph>\priority grammar\. + Each nonterminal is decorated by an integer priority: \A\<^sup>(\<^sup>p\<^sup>)\. In a + derivation, \A\<^sup>(\<^sup>p\<^sup>)\ may be rewritten using a production \A\<^sup>(\<^sup>q\<^sup>) = \\ only + if \p \ q\. Any priority grammar can be translated into a normal + context-free grammar by introducing new nonterminals and productions. \<^medskip> - Formally, a set of context free productions \G\ - induces a derivation relation \\\<^sub>G\ as follows. Let \\\ and \\\ denote strings of terminal or nonterminal symbols. - Then \\ A\<^sup>(\<^sup>p\<^sup>) \ \\<^sub>G \ \ \\ holds if and only if \G\ + Formally, a set of context free productions \G\ induces a derivation + relation \\\<^sub>G\ as follows. Let \\\ and \\\ denote strings of terminal or + nonterminal symbols. Then \\ A\<^sup>(\<^sup>p\<^sup>) \ \\<^sub>G \ \ \\ holds if and only if \G\ contains some production \A\<^sup>(\<^sup>q\<^sup>) = \\ for \p \ q\. \<^medskip> - The following grammar for arithmetic expressions - demonstrates how binding power and associativity of operators can be - enforced by priorities. + The following grammar for arithmetic expressions demonstrates how binding + power and associativity of operators can be enforced by priorities. \begin{center} \begin{tabular}{rclr} @@ -617,28 +583,25 @@ \A\<^sup>(\<^sup>3\<^sup>)\ & \=\ & \<^verbatim>\-\ \A\<^sup>(\<^sup>3\<^sup>)\ \\ \end{tabular} \end{center} - The choice of priorities determines that \<^verbatim>\-\ binds - tighter than \<^verbatim>\*\, which binds tighter than \<^verbatim>\+\. - Furthermore \<^verbatim>\+\ associates to the left and - \<^verbatim>\*\ to the right. + The choice of priorities determines that \<^verbatim>\-\ binds tighter than \<^verbatim>\*\, which + binds tighter than \<^verbatim>\+\. Furthermore \<^verbatim>\+\ associates to the left and \<^verbatim>\*\ to + the right. \<^medskip> For clarity, grammars obey these conventions: - \<^item> All priorities must lie between 0 and 1000. + \<^item> All priorities must lie between 0 and 1000. - \<^item> Priority 0 on the right-hand side and priority 1000 on the - left-hand side may be omitted. + \<^item> Priority 0 on the right-hand side and priority 1000 on the left-hand + side may be omitted. - \<^item> The production \A\<^sup>(\<^sup>p\<^sup>) = \\ is written as \A = \ - (p)\, i.e.\ the priority of the left-hand side actually appears in - a column on the far right. + \<^item> The production \A\<^sup>(\<^sup>p\<^sup>) = \\ is written as \A = \ (p)\, i.e.\ the + priority of the left-hand side actually appears in a column on the far + right. - \<^item> Alternatives are separated by \|\. + \<^item> Alternatives are separated by \|\. - \<^item> Repetition is indicated by dots \(\)\ in an informal - but obvious way. - + \<^item> Repetition is indicated by dots \(\)\ in an informal but obvious way. Using these conventions, the example grammar specification above takes the form: @@ -656,8 +619,9 @@ subsection \The Pure grammar \label{sec:pure-grammar}\ -text \The priority grammar of the \Pure\ theory is defined - approximately like this: +text \ + The priority grammar of the \Pure\ theory is defined approximately like + this: \begin{center} \begin{supertabular}{rclr} @@ -729,55 +693,54 @@ \end{center} \<^medskip> - Here literal terminals are printed \<^verbatim>\verbatim\; - see also \secref{sec:inner-lex} for further token categories of the - inner syntax. The meaning of the nonterminals defined by the above - grammar is as follows: + Here literal terminals are printed \<^verbatim>\verbatim\; see also + \secref{sec:inner-lex} for further token categories of the inner syntax. The + meaning of the nonterminals defined by the above grammar is as follows: \<^descr> @{syntax_ref (inner) any} denotes any term. - \<^descr> @{syntax_ref (inner) prop} denotes meta-level propositions, - which are terms of type @{typ prop}. The syntax of such formulae of - the meta-logic is carefully distinguished from usual conventions for - object-logics. In particular, plain \\\-term notation is - \<^emph>\not\ recognized as @{syntax (inner) prop}. + \<^descr> @{syntax_ref (inner) prop} denotes meta-level propositions, which are + terms of type @{typ prop}. The syntax of such formulae of the meta-logic is + carefully distinguished from usual conventions for object-logics. In + particular, plain \\\-term notation is \<^emph>\not\ recognized as @{syntax (inner) + prop}. - \<^descr> @{syntax_ref (inner) aprop} denotes atomic propositions, which - are embedded into regular @{syntax (inner) prop} by means of an - explicit \<^verbatim>\PROP\ token. + \<^descr> @{syntax_ref (inner) aprop} denotes atomic propositions, which are + embedded into regular @{syntax (inner) prop} by means of an explicit \<^verbatim>\PROP\ + token. - Terms of type @{typ prop} with non-constant head, e.g.\ a plain - variable, are printed in this form. Constants that yield type @{typ - prop} are expected to provide their own concrete syntax; otherwise - the printed version will appear like @{syntax (inner) logic} and - cannot be parsed again as @{syntax (inner) prop}. + Terms of type @{typ prop} with non-constant head, e.g.\ a plain variable, + are printed in this form. Constants that yield type @{typ prop} are expected + to provide their own concrete syntax; otherwise the printed version will + appear like @{syntax (inner) logic} and cannot be parsed again as @{syntax + (inner) prop}. - \<^descr> @{syntax_ref (inner) logic} denotes arbitrary terms of a - logical type, excluding type @{typ prop}. This is the main - syntactic category of object-logic entities, covering plain \\\-term notation (variables, abstraction, application), plus - anything defined by the user. + \<^descr> @{syntax_ref (inner) logic} denotes arbitrary terms of a logical type, + excluding type @{typ prop}. This is the main syntactic category of + object-logic entities, covering plain \\\-term notation (variables, + abstraction, application), plus anything defined by the user. - When specifying notation for logical entities, all logical types - (excluding @{typ prop}) are \<^emph>\collapsed\ to this single category - of @{syntax (inner) logic}. + When specifying notation for logical entities, all logical types (excluding + @{typ prop}) are \<^emph>\collapsed\ to this single category of @{syntax (inner) + logic}. - \<^descr> @{syntax_ref (inner) index} denotes an optional index term for - indexed syntax. If omitted, it refers to the first @{keyword_ref - "structure"} variable in the context. The special dummy ``\\\'' serves as pattern variable in mixfix annotations that - introduce indexed notation. + \<^descr> @{syntax_ref (inner) index} denotes an optional index term for indexed + syntax. If omitted, it refers to the first @{keyword_ref "structure"} + variable in the context. The special dummy ``\\\'' serves as pattern + variable in mixfix annotations that introduce indexed notation. - \<^descr> @{syntax_ref (inner) idt} denotes identifiers, possibly - constrained by types. + \<^descr> @{syntax_ref (inner) idt} denotes identifiers, possibly constrained by + types. - \<^descr> @{syntax_ref (inner) idts} denotes a sequence of @{syntax_ref - (inner) idt}. This is the most basic category for variables in - iterated binders, such as \\\ or \\\. + \<^descr> @{syntax_ref (inner) idts} denotes a sequence of @{syntax_ref (inner) + idt}. This is the most basic category for variables in iterated binders, + such as \\\ or \\\. - \<^descr> @{syntax_ref (inner) pttrn} and @{syntax_ref (inner) pttrns} - denote patterns for abstraction, cases bindings etc. In Pure, these - categories start as a merely copy of @{syntax (inner) idt} and - @{syntax (inner) idts}, respectively. Object-logics may add - additional productions for binding forms. + \<^descr> @{syntax_ref (inner) pttrn} and @{syntax_ref (inner) pttrns} denote + patterns for abstraction, cases bindings etc. In Pure, these categories + start as a merely copy of @{syntax (inner) idt} and @{syntax (inner) idts}, + respectively. Object-logics may add additional productions for binding + forms. \<^descr> @{syntax_ref (inner) type} denotes types of the meta-logic. @@ -786,58 +749,52 @@ Here are some further explanations of certain syntax features. - \<^item> In @{syntax (inner) idts}, note that \x :: nat y\ is - parsed as \x :: (nat y)\, treating \y\ like a type - constructor applied to \nat\. To avoid this interpretation, - write \(x :: nat) y\ with explicit parentheses. + \<^item> In @{syntax (inner) idts}, note that \x :: nat y\ is parsed as \x :: (nat + y)\, treating \y\ like a type constructor applied to \nat\. To avoid this + interpretation, write \(x :: nat) y\ with explicit parentheses. - \<^item> Similarly, \x :: nat y :: nat\ is parsed as \x :: - (nat y :: nat)\. The correct form is \(x :: nat) (y :: - nat)\, or \(x :: nat) y :: nat\ if \y\ is last in the - sequence of identifiers. + \<^item> Similarly, \x :: nat y :: nat\ is parsed as \x :: (nat y :: nat)\. The + correct form is \(x :: nat) (y :: nat)\, or \(x :: nat) y :: nat\ if \y\ is + last in the sequence of identifiers. - \<^item> Type constraints for terms bind very weakly. For example, - \x < y :: nat\ is normally parsed as \(x < y) :: - nat\, unless \<\ has a very low priority, in which case the - input is likely to be ambiguous. The correct form is \x < (y - :: nat)\. + \<^item> Type constraints for terms bind very weakly. For example, \x < y :: nat\ + is normally parsed as \(x < y) :: nat\, unless \<\ has a very low priority, + in which case the input is likely to be ambiguous. The correct form is \x < + (y :: nat)\. \<^item> Dummy variables (written as underscore) may occur in different roles. - \<^descr> A type ``\_\'' or ``\_ :: sort\'' acts like an - anonymous inference parameter, which is filled-in according to the - most general type produced by the type-checking phase. + \<^descr> A type ``\_\'' or ``\_ :: sort\'' acts like an anonymous inference + parameter, which is filled-in according to the most general type produced + by the type-checking phase. - \<^descr> A bound ``\_\'' refers to a vacuous abstraction, where - the body does not refer to the binding introduced here. As in the - term @{term "\x _. x"}, which is \\\-equivalent to \\x y. x\. + \<^descr> A bound ``\_\'' refers to a vacuous abstraction, where the body does not + refer to the binding introduced here. As in the term @{term "\x _. x"}, + which is \\\-equivalent to \\x y. x\. - \<^descr> A free ``\_\'' refers to an implicit outer binding. - Higher definitional packages usually allow forms like \f x _ - = x\. + \<^descr> A free ``\_\'' refers to an implicit outer binding. Higher definitional + packages usually allow forms like \f x _ = x\. - \<^descr> A schematic ``\_\'' (within a term pattern, see - \secref{sec:term-decls}) refers to an anonymous variable that is - implicitly abstracted over its context of locally bound variables. - For example, this allows pattern matching of \{x. f x = g - x}\ against \{x. _ = _}\, or even \{_. _ = _}\ by + \<^descr> A schematic ``\_\'' (within a term pattern, see \secref{sec:term-decls}) + refers to an anonymous variable that is implicitly abstracted over its + context of locally bound variables. For example, this allows pattern + matching of \{x. f x = g x}\ against \{x. _ = _}\, or even \{_. _ = _}\ by using both bound and schematic dummies. - \<^descr> The three literal dots ``\<^verbatim>\...\'' may be also - written as ellipsis symbol \<^verbatim>\\\. In both cases this - refers to a special schematic variable, which is bound in the - context. This special term abbreviation works nicely with + \<^descr> The three literal dots ``\<^verbatim>\...\'' may be also written as ellipsis symbol + \<^verbatim>\\\. In both cases this refers to a special schematic variable, which is + bound in the context. This special term abbreviation works nicely with calculational reasoning (\secref{sec:calculation}). - \<^descr> \<^verbatim>\CONST\ ensures that the given identifier is treated - as constant term, and passed through the parse tree in fully - internalized form. This is particularly relevant for translation - rules (\secref{sec:syn-trans}), notably on the RHS. + \<^descr> \<^verbatim>\CONST\ ensures that the given identifier is treated as constant term, + and passed through the parse tree in fully internalized form. This is + particularly relevant for translation rules (\secref{sec:syn-trans}), + notably on the RHS. - \<^descr> \<^verbatim>\XCONST\ is similar to \<^verbatim>\CONST\, but - retains the constant name as given. This is only relevant to - translation rules (\secref{sec:syn-trans}), notably on the LHS. + \<^descr> \<^verbatim>\XCONST\ is similar to \<^verbatim>\CONST\, but retains the constant name as given. + This is only relevant to translation rules (\secref{sec:syn-trans}), notably + on the LHS. \ @@ -848,46 +805,45 @@ @{command_def "print_syntax"}\\<^sup>*\ & : & \context \\ \\ \end{matharray} - \<^descr> @{command "print_syntax"} prints the inner syntax of the - current context. The output can be quite large; the most important - sections are explained below. + \<^descr> @{command "print_syntax"} prints the inner syntax of the current context. + The output can be quite large; the most important sections are explained + below. - \<^descr> \lexicon\ lists the delimiters of the inner token - language; see \secref{sec:inner-lex}. - - \<^descr> \prods\ lists the productions of the underlying - priority grammar; see \secref{sec:priority-grammar}. + \<^descr> \lexicon\ lists the delimiters of the inner token language; see + \secref{sec:inner-lex}. - The nonterminal \A\<^sup>(\<^sup>p\<^sup>)\ is rendered in plain text as \A[p]\; delimiters are quoted. Many productions have an extra - \\ => name\. These names later become the heads of parse - trees; they also guide the pretty printer. + \<^descr> \prods\ lists the productions of the underlying priority grammar; see + \secref{sec:priority-grammar}. - Productions without such parse tree names are called \<^emph>\copy - productions\. Their right-hand side must have exactly one - nonterminal symbol (or named token). The parser does not create a - new parse tree node for copy productions, but simply returns the - parse tree of the right-hand symbol. + The nonterminal \A\<^sup>(\<^sup>p\<^sup>)\ is rendered in plain text as \A[p]\; delimiters + are quoted. Many productions have an extra \\ => name\. These names later + become the heads of parse trees; they also guide the pretty printer. + + Productions without such parse tree names are called \<^emph>\copy productions\. + Their right-hand side must have exactly one nonterminal symbol (or named + token). The parser does not create a new parse tree node for copy + productions, but simply returns the parse tree of the right-hand symbol. If the right-hand side of a copy production consists of a single nonterminal without any delimiters, then it is called a \<^emph>\chain - production\. Chain productions act as abbreviations: conceptually, - they are removed from the grammar by adding new productions. - Priority information attached to chain productions is ignored; only - the dummy value \-1\ is displayed. + production\. Chain productions act as abbreviations: conceptually, they + are removed from the grammar by adding new productions. Priority + information attached to chain productions is ignored; only the dummy value + \-1\ is displayed. - \<^descr> \print modes\ lists the alternative print modes - provided by this grammar; see \secref{sec:print-modes}. + \<^descr> \print modes\ lists the alternative print modes provided by this + grammar; see \secref{sec:print-modes}. - \<^descr> \parse_rules\ and \print_rules\ relate to - syntax translations (macros); see \secref{sec:syn-trans}. + \<^descr> \parse_rules\ and \print_rules\ relate to syntax translations (macros); + see \secref{sec:syn-trans}. - \<^descr> \parse_ast_translation\ and \print_ast_translation\ list sets of constants that invoke - translation functions for abstract syntax trees, which are only - required in very special situations; see \secref{sec:tr-funs}. + \<^descr> \parse_ast_translation\ and \print_ast_translation\ list sets of + constants that invoke translation functions for abstract syntax trees, + which are only required in very special situations; see + \secref{sec:tr-funs}. - \<^descr> \parse_translation\ and \print_translation\ - list the sets of constants that invoke regular translation - functions; see \secref{sec:tr-funs}. + \<^descr> \parse_translation\ and \print_translation\ list the sets of constants + that invoke regular translation functions; see \secref{sec:tr-funs}. \ @@ -899,37 +855,36 @@ @{attribute_def syntax_ambiguity_limit} & : & \attribute\ & default \10\ \\ \end{tabular} - Depending on the grammar and the given input, parsing may be - ambiguous. Isabelle lets the Earley parser enumerate all possible - parse trees, and then tries to make the best out of the situation. - Terms that cannot be type-checked are filtered out, which often - leads to a unique result in the end. Unlike regular type - reconstruction, which is applied to the whole collection of input - terms simultaneously, the filtering stage only treats each given - term in isolation. Filtering is also not attempted for individual + Depending on the grammar and the given input, parsing may be ambiguous. + Isabelle lets the Earley parser enumerate all possible parse trees, and then + tries to make the best out of the situation. Terms that cannot be + type-checked are filtered out, which often leads to a unique result in the + end. Unlike regular type reconstruction, which is applied to the whole + collection of input terms simultaneously, the filtering stage only treats + each given term in isolation. Filtering is also not attempted for individual types or raw ASTs (as required for @{command translations}). - Certain warning or error messages are printed, depending on the - situation and the given configuration options. Parsing ultimately - fails, if multiple results remain after the filtering phase. + Certain warning or error messages are printed, depending on the situation + and the given configuration options. Parsing ultimately fails, if multiple + results remain after the filtering phase. - \<^descr> @{attribute syntax_ambiguity_warning} controls output of - explicit warning messages about syntax ambiguity. + \<^descr> @{attribute syntax_ambiguity_warning} controls output of explicit warning + messages about syntax ambiguity. - \<^descr> @{attribute syntax_ambiguity_limit} determines the number of - resulting parse trees that are shown as part of the printed message - in case of an ambiguity. + \<^descr> @{attribute syntax_ambiguity_limit} determines the number of resulting + parse trees that are shown as part of the printed message in case of an + ambiguity. \ section \Syntax transformations \label{sec:syntax-transformations}\ -text \The inner syntax engine of Isabelle provides separate - mechanisms to transform parse trees either via rewrite systems on - first-order ASTs (\secref{sec:syn-trans}), or ML functions on ASTs - or syntactic \\\-terms (\secref{sec:tr-funs}). This works - both for parsing and printing, as outlined in - \figref{fig:parse-print}. +text \ + The inner syntax engine of Isabelle provides separate mechanisms to + transform parse trees either via rewrite systems on first-order ASTs + (\secref{sec:syn-trans}), or ML functions on ASTs or syntactic \\\-terms + (\secref{sec:tr-funs}). This works both for parsing and printing, as + outlined in \figref{fig:parse-print}. \begin{figure}[htbp] \begin{center} @@ -954,137 +909,129 @@ \caption{Parsing and printing with translations}\label{fig:parse-print} \end{figure} - These intermediate syntax tree formats eventually lead to a pre-term - with all names and binding scopes resolved, but most type - information still missing. Explicit type constraints might be given by - the user, or implicit position information by the system --- both - need to be passed-through carefully by syntax transformations. + These intermediate syntax tree formats eventually lead to a pre-term with + all names and binding scopes resolved, but most type information still + missing. Explicit type constraints might be given by the user, or implicit + position information by the system --- both need to be passed-through + carefully by syntax transformations. - Pre-terms are further processed by the so-called \<^emph>\check\ and - \<^emph>\uncheck\ phases that are intertwined with type-inference (see - also @{cite "isabelle-implementation"}). The latter allows to operate - on higher-order abstract syntax with proper binding and type - information already available. + Pre-terms are further processed by the so-called \<^emph>\check\ and \<^emph>\uncheck\ + phases that are intertwined with type-inference (see also @{cite + "isabelle-implementation"}). The latter allows to operate on higher-order + abstract syntax with proper binding and type information already available. - As a rule of thumb, anything that manipulates bindings of variables - or constants needs to be implemented as syntax transformation (see - below). Anything else is better done via check/uncheck: a prominent - example application is the @{command abbreviation} concept of - Isabelle/Pure.\ + As a rule of thumb, anything that manipulates bindings of variables or + constants needs to be implemented as syntax transformation (see below). + Anything else is better done via check/uncheck: a prominent example + application is the @{command abbreviation} concept of Isabelle/Pure. +\ subsection \Abstract syntax trees \label{sec:ast}\ -text \The ML datatype @{ML_type Ast.ast} explicitly represents the - intermediate AST format that is used for syntax rewriting - (\secref{sec:syn-trans}). It is defined in ML as follows: +text \ + The ML datatype @{ML_type Ast.ast} explicitly represents the intermediate + AST format that is used for syntax rewriting (\secref{sec:syn-trans}). It is + defined in ML as follows: @{verbatim [display] \datatype ast = Constant of string | Variable of string | Appl of ast list\} - An AST is either an atom (constant or variable) or a list of (at - least two) subtrees. Occasional diagnostic output of ASTs uses - notation that resembles S-expression of LISP. Constant atoms are - shown as quoted strings, variable atoms as non-quoted strings and - applications as a parenthesized list of subtrees. For example, the - AST + An AST is either an atom (constant or variable) or a list of (at least two) + subtrees. Occasional diagnostic output of ASTs uses notation that resembles + S-expression of LISP. Constant atoms are shown as quoted strings, variable + atoms as non-quoted strings and applications as a parenthesized list of + subtrees. For example, the AST @{ML [display] \Ast.Appl [Ast.Constant "_abs", Ast.Variable "x", Ast.Variable "t"]\} - is pretty-printed as \<^verbatim>\("_abs" x t)\. Note that - \<^verbatim>\()\ and \<^verbatim>\(x)\ are excluded as ASTs, because - they have too few subtrees. + is pretty-printed as \<^verbatim>\("_abs" x t)\. Note that \<^verbatim>\()\ and \<^verbatim>\(x)\ are + excluded as ASTs, because they have too few subtrees. \<^medskip> - AST application is merely a pro-forma mechanism to indicate - certain syntactic structures. Thus \<^verbatim>\(c a b)\ could mean - either term application or type application, depending on the - syntactic context. + AST application is merely a pro-forma mechanism to indicate certain + syntactic structures. Thus \<^verbatim>\(c a b)\ could mean either term application or + type application, depending on the syntactic context. - Nested application like \<^verbatim>\(("_abs" x t) u)\ is also - possible, but ASTs are definitely first-order: the syntax constant - \<^verbatim>\"_abs"\ does not bind the \<^verbatim>\x\ in any way. - Proper bindings are introduced in later stages of the term syntax, - where \<^verbatim>\("_abs" x t)\ becomes an @{ML Abs} node and - occurrences of \<^verbatim>\x\ in \<^verbatim>\t\ are replaced by bound - variables (represented as de-Bruijn indices). + Nested application like \<^verbatim>\(("_abs" x t) u)\ is also possible, but ASTs are + definitely first-order: the syntax constant \<^verbatim>\"_abs"\ does not bind the \<^verbatim>\x\ + in any way. Proper bindings are introduced in later stages of the term + syntax, where \<^verbatim>\("_abs" x t)\ becomes an @{ML Abs} node and occurrences of + \<^verbatim>\x\ in \<^verbatim>\t\ are replaced by bound variables (represented as de-Bruijn + indices). \ subsubsection \AST constants versus variables\ -text \Depending on the situation --- input syntax, output syntax, - translation patterns --- the distinction of atomic ASTs as @{ML - Ast.Constant} versus @{ML Ast.Variable} serves slightly different - purposes. +text \ + Depending on the situation --- input syntax, output syntax, translation + patterns --- the distinction of atomic ASTs as @{ML Ast.Constant} versus + @{ML Ast.Variable} serves slightly different purposes. - Input syntax of a term such as \f a b = c\ does not yet - indicate the scopes of atomic entities \f, a, b, c\: they - could be global constants or local variables, even bound ones - depending on the context of the term. @{ML Ast.Variable} leaves - this choice still open: later syntax layers (or translation - functions) may capture such a variable to determine its role - specifically, to make it a constant, bound variable, free variable - etc. In contrast, syntax translations that introduce already known - constants would rather do it via @{ML Ast.Constant} to prevent - accidental re-interpretation later on. + Input syntax of a term such as \f a b = c\ does not yet indicate the scopes + of atomic entities \f, a, b, c\: they could be global constants or local + variables, even bound ones depending on the context of the term. @{ML + Ast.Variable} leaves this choice still open: later syntax layers (or + translation functions) may capture such a variable to determine its role + specifically, to make it a constant, bound variable, free variable etc. In + contrast, syntax translations that introduce already known constants would + rather do it via @{ML Ast.Constant} to prevent accidental re-interpretation + later on. - Output syntax turns term constants into @{ML Ast.Constant} and - variables (free or schematic) into @{ML Ast.Variable}. This - information is precise when printing fully formal \\\-terms. + Output syntax turns term constants into @{ML Ast.Constant} and variables + (free or schematic) into @{ML Ast.Variable}. This information is precise + when printing fully formal \\\-terms. \<^medskip> - AST translation patterns (\secref{sec:syn-trans}) that - represent terms cannot distinguish constants and variables - syntactically. Explicit indication of \CONST c\ inside the - term language is required, unless \c\ is known as special - \<^emph>\syntax constant\ (see also @{command syntax}). It is also - possible to use @{command syntax} declarations (without mixfix - annotation) to enforce that certain unqualified names are always - treated as constant within the syntax machinery. + AST translation patterns (\secref{sec:syn-trans}) that represent terms + cannot distinguish constants and variables syntactically. Explicit + indication of \CONST c\ inside the term language is required, unless \c\ is + known as special \<^emph>\syntax constant\ (see also @{command syntax}). It is also + possible to use @{command syntax} declarations (without mixfix annotation) + to enforce that certain unqualified names are always treated as constant + within the syntax machinery. - The situation is simpler for ASTs that represent types or sorts, - since the concrete syntax already distinguishes type variables from - type constants (constructors). So \('a, 'b) foo\ - corresponds to an AST application of some constant for \foo\ - and variable arguments for \'a\ and \'b\. Note that - the postfix application is merely a feature of the concrete syntax, - while in the AST the constructor occurs in head position.\ + The situation is simpler for ASTs that represent types or sorts, since the + concrete syntax already distinguishes type variables from type constants + (constructors). So \('a, 'b) foo\ corresponds to an AST application of some + constant for \foo\ and variable arguments for \'a\ and \'b\. Note that the + postfix application is merely a feature of the concrete syntax, while in the + AST the constructor occurs in head position. +\ subsubsection \Authentic syntax names\ -text \Naming constant entities within ASTs is another delicate - issue. Unqualified names are resolved in the name space tables in - the last stage of parsing, after all translations have been applied. - Since syntax transformations do not know about this later name - resolution, there can be surprises in boundary cases. +text \ + Naming constant entities within ASTs is another delicate issue. Unqualified + names are resolved in the name space tables in the last stage of parsing, + after all translations have been applied. Since syntax transformations do + not know about this later name resolution, there can be surprises in + boundary cases. - \<^emph>\Authentic syntax names\ for @{ML Ast.Constant} avoid this - problem: the fully-qualified constant name with a special prefix for - its formal category (\class\, \type\, \const\, \fixed\) represents the information faithfully - within the untyped AST format. Accidental overlap with free or - bound variables is excluded as well. Authentic syntax names work - implicitly in the following situations: + \<^emph>\Authentic syntax names\ for @{ML Ast.Constant} avoid this problem: the + fully-qualified constant name with a special prefix for its formal category + (\class\, \type\, \const\, \fixed\) represents the information faithfully + within the untyped AST format. Accidental overlap with free or bound + variables is excluded as well. Authentic syntax names work implicitly in the + following situations: - \<^item> Input of term constants (or fixed variables) that are - introduced by concrete syntax via @{command notation}: the - correspondence of a particular grammar production to some known term - entity is preserved. + \<^item> Input of term constants (or fixed variables) that are introduced by + concrete syntax via @{command notation}: the correspondence of a + particular grammar production to some known term entity is preserved. - \<^item> Input of type constants (constructors) and type classes --- - thanks to explicit syntactic distinction independently on the - context. + \<^item> Input of type constants (constructors) and type classes --- thanks to + explicit syntactic distinction independently on the context. - \<^item> Output of term constants, type constants, type classes --- - this information is already available from the internal term to be - printed. - + \<^item> Output of term constants, type constants, type classes --- this + information is already available from the internal term to be printed. - In other words, syntax transformations that operate on input terms - written as prefix applications are difficult to make robust. - Luckily, this case rarely occurs in practice, because syntax forms - to be translated usually correspond to some concrete notation.\ + In other words, syntax transformations that operate on input terms written + as prefix applications are difficult to make robust. Luckily, this case + rarely occurs in practice, because syntax forms to be translated usually + correspond to some concrete notation. +\ subsection \Raw syntax and translations \label{sec:syn-trans}\ @@ -1101,14 +1048,13 @@ \end{tabular} \<^medskip> - Unlike mixfix notation for existing formal entities - (\secref{sec:notation}), raw syntax declarations provide full access - to the priority grammar of the inner syntax, without any sanity - checks. This includes additional syntactic categories (via - @{command nonterminal}) and free-form grammar productions (via - @{command syntax}). Additional syntax translations (or macros, via - @{command translations}) are required to turn resulting parse trees - into proper representations of formal entities again. + Unlike mixfix notation for existing formal entities (\secref{sec:notation}), + raw syntax declarations provide full access to the priority grammar of the + inner syntax, without any sanity checks. This includes additional syntactic + categories (via @{command nonterminal}) and free-form grammar productions + (via @{command syntax}). Additional syntax translations (or macros, via + @{command translations}) are required to turn resulting parse trees into + proper representations of formal entities again. @{rail \ @@{command nonterminal} (@{syntax name} + @'and') @@ -1126,193 +1072,177 @@ transpat: ('(' @{syntax nameref} ')')? @{syntax string} \} - \<^descr> @{command "nonterminal"}~\c\ declares a type - constructor \c\ (without arguments) to act as purely syntactic - type: a nonterminal symbol of the inner syntax. + \<^descr> @{command "nonterminal"}~\c\ declares a type constructor \c\ (without + arguments) to act as purely syntactic type: a nonterminal symbol of the + inner syntax. - \<^descr> @{command "syntax"}~\(mode) c :: \ (mx)\ augments the - priority grammar and the pretty printer table for the given print - mode (default \<^verbatim>\""\). An optional keyword @{keyword_ref - "output"} means that only the pretty printer table is affected. + \<^descr> @{command "syntax"}~\(mode) c :: \ (mx)\ augments the priority grammar and + the pretty printer table for the given print mode (default \<^verbatim>\""\). An + optional keyword @{keyword_ref "output"} means that only the pretty printer + table is affected. - Following \secref{sec:mixfix}, the mixfix annotation \mx = - template ps q\ together with type \\ = \\<^sub>1 \ \ \\<^sub>n \ \\ and - specify a grammar production. The \template\ contains - delimiter tokens that surround \n\ argument positions - (\<^verbatim>\_\). The latter correspond to nonterminal symbols - \A\<^sub>i\ derived from the argument types \\\<^sub>i\ as - follows: + Following \secref{sec:mixfix}, the mixfix annotation \mx = template ps q\ + together with type \\ = \\<^sub>1 \ \ \\<^sub>n \ \\ and specify a grammar production. + The \template\ contains delimiter tokens that surround \n\ argument + positions (\<^verbatim>\_\). The latter correspond to nonterminal symbols \A\<^sub>i\ derived + from the argument types \\\<^sub>i\ as follows: \<^item> \prop\ if \\\<^sub>i = prop\ - \<^item> \logic\ if \\\<^sub>i = (\)\\ for logical type - constructor \\ \ prop\ + \<^item> \logic\ if \\\<^sub>i = (\)\\ for logical type constructor \\ \ prop\ \<^item> \any\ if \\\<^sub>i = \\ for type variables - \<^item> \\\ if \\\<^sub>i = \\ for nonterminal \\\ - (syntactic type constructor) + \<^item> \\\ if \\\<^sub>i = \\ for nonterminal \\\ (syntactic type constructor) - Each \A\<^sub>i\ is decorated by priority \p\<^sub>i\ from the - given list \ps\; missing priorities default to 0. + Each \A\<^sub>i\ is decorated by priority \p\<^sub>i\ from the given list \ps\; missing + priorities default to 0. - The resulting nonterminal of the production is determined similarly - from type \\\, with priority \q\ and default 1000. + The resulting nonterminal of the production is determined similarly from + type \\\, with priority \q\ and default 1000. \<^medskip> - Parsing via this production produces parse trees \t\<^sub>1, \, t\<^sub>n\ for the argument slots. The resulting parse tree is - composed as \c t\<^sub>1 \ t\<^sub>n\, by using the syntax constant \c\ of the syntax declaration. + Parsing via this production produces parse trees \t\<^sub>1, \, t\<^sub>n\ for the + argument slots. The resulting parse tree is composed as \c t\<^sub>1 \ t\<^sub>n\, by + using the syntax constant \c\ of the syntax declaration. - Such syntactic constants are invented on the spot, without formal - check wrt.\ existing declarations. It is conventional to use plain - identifiers prefixed by a single underscore (e.g.\ \_foobar\). Names should be chosen with care, to avoid clashes - with other syntax declarations. + Such syntactic constants are invented on the spot, without formal check + wrt.\ existing declarations. It is conventional to use plain identifiers + prefixed by a single underscore (e.g.\ \_foobar\). Names should be chosen + with care, to avoid clashes with other syntax declarations. \<^medskip> - The special case of copy production is specified by \c =\~\<^verbatim>\""\ (empty string). - It means that the resulting parse tree \t\ is copied directly, without any - further decoration. + The special case of copy production is specified by \c =\~\<^verbatim>\""\ (empty + string). It means that the resulting parse tree \t\ is copied directly, + without any further decoration. - \<^descr> @{command "no_syntax"}~\(mode) decls\ removes grammar - declarations (and translations) resulting from \decls\, which - are interpreted in the same manner as for @{command "syntax"} above. + \<^descr> @{command "no_syntax"}~\(mode) decls\ removes grammar declarations (and + translations) resulting from \decls\, which are interpreted in the same + manner as for @{command "syntax"} above. + + \<^descr> @{command "translations"}~\rules\ specifies syntactic translation rules + (i.e.\ macros) as first-order rewrite rules on ASTs (\secref{sec:ast}). The + theory context maintains two independent lists translation rules: parse + rules (\<^verbatim>\=>\ or \\\) and print rules (\<^verbatim>\<=\ or \\\). For convenience, both + can be specified simultaneously as parse~/ print rules (\<^verbatim>\==\ or \\\). - \<^descr> @{command "translations"}~\rules\ specifies syntactic - translation rules (i.e.\ macros) as first-order rewrite rules on - ASTs (\secref{sec:ast}). The theory context maintains two - independent lists translation rules: parse rules (\<^verbatim>\=>\ - or \\\) and print rules (\<^verbatim>\<=\ or \\\). - For convenience, both can be specified simultaneously as parse~/ - print rules (\<^verbatim>\==\ or \\\). + Translation patterns may be prefixed by the syntactic category to be used + for parsing; the default is \logic\ which means that regular term syntax is + used. Both sides of the syntax translation rule undergo parsing and parse + AST translations \secref{sec:tr-funs}, in order to perform some fundamental + normalization like \\x y. b \ \x. \y. b\, but other AST translation rules + are \<^emph>\not\ applied recursively here. - Translation patterns may be prefixed by the syntactic category to be - used for parsing; the default is \logic\ which means that - regular term syntax is used. Both sides of the syntax translation - rule undergo parsing and parse AST translations - \secref{sec:tr-funs}, in order to perform some fundamental - normalization like \\x y. b \ \x. \y. b\, but other AST - translation rules are \<^emph>\not\ applied recursively here. - - When processing AST patterns, the inner syntax lexer runs in a - different mode that allows identifiers to start with underscore. - This accommodates the usual naming convention for auxiliary syntax - constants --- those that do not have a logical counter part --- by - allowing to specify arbitrary AST applications within the term - syntax, independently of the corresponding concrete syntax. + When processing AST patterns, the inner syntax lexer runs in a different + mode that allows identifiers to start with underscore. This accommodates the + usual naming convention for auxiliary syntax constants --- those that do not + have a logical counter part --- by allowing to specify arbitrary AST + applications within the term syntax, independently of the corresponding + concrete syntax. Atomic ASTs are distinguished as @{ML Ast.Constant} versus @{ML - Ast.Variable} as follows: a qualified name or syntax constant - declared via @{command syntax}, or parse tree head of concrete - notation becomes @{ML Ast.Constant}, anything else @{ML - Ast.Variable}. Note that \CONST\ and \XCONST\ within - the term language (\secref{sec:pure-grammar}) allow to enforce - treatment as constants. + Ast.Variable} as follows: a qualified name or syntax constant declared via + @{command syntax}, or parse tree head of concrete notation becomes @{ML + Ast.Constant}, anything else @{ML Ast.Variable}. Note that \CONST\ and + \XCONST\ within the term language (\secref{sec:pure-grammar}) allow to + enforce treatment as constants. - AST rewrite rules \(lhs, rhs)\ need to obey the following - side-conditions: + AST rewrite rules \(lhs, rhs)\ need to obey the following side-conditions: - \<^item> Rules must be left linear: \lhs\ must not contain - repeated variables.\<^footnote>\The deeper reason for this is that AST - equality is not well-defined: different occurrences of the ``same'' - AST could be decorated differently by accidental type-constraints or - source position information, for example.\ + \<^item> Rules must be left linear: \lhs\ must not contain repeated + variables.\<^footnote>\The deeper reason for this is that AST equality is not + well-defined: different occurrences of the ``same'' AST could be decorated + differently by accidental type-constraints or source position information, + for example.\ \<^item> Every variable in \rhs\ must also occur in \lhs\. - \<^descr> @{command "no_translations"}~\rules\ removes syntactic - translation rules, which are interpreted in the same manner as for - @{command "translations"} above. + \<^descr> @{command "no_translations"}~\rules\ removes syntactic translation rules, + which are interpreted in the same manner as for @{command "translations"} + above. - \<^descr> @{attribute syntax_ast_trace} and @{attribute - syntax_ast_stats} control diagnostic output in the AST normalization - process, when translation rules are applied to concrete input or - output. + \<^descr> @{attribute syntax_ast_trace} and @{attribute syntax_ast_stats} control + diagnostic output in the AST normalization process, when translation rules + are applied to concrete input or output. - Raw syntax and translations provides a slightly more low-level - access to the grammar and the form of resulting parse trees. It is - often possible to avoid this untyped macro mechanism, and use - type-safe @{command abbreviation} or @{command notation} instead. - Some important situations where @{command syntax} and @{command - translations} are really need are as follows: + Raw syntax and translations provides a slightly more low-level access to the + grammar and the form of resulting parse trees. It is often possible to avoid + this untyped macro mechanism, and use type-safe @{command abbreviation} or + @{command notation} instead. Some important situations where @{command + syntax} and @{command translations} are really need are as follows: - \<^item> Iterated replacement via recursive @{command translations}. - For example, consider list enumeration @{term "[a, b, c, d]"} as - defined in theory @{theory List} in Isabelle/HOL. + \<^item> Iterated replacement via recursive @{command translations}. For example, + consider list enumeration @{term "[a, b, c, d]"} as defined in theory + @{theory List} in Isabelle/HOL. - \<^item> Change of binding status of variables: anything beyond the - built-in @{keyword "binder"} mixfix annotation requires explicit - syntax translations. For example, consider list filter - comprehension @{term "[x \ xs . P]"} as defined in theory @{theory - List} in Isabelle/HOL. + \<^item> Change of binding status of variables: anything beyond the built-in + @{keyword "binder"} mixfix annotation requires explicit syntax translations. + For example, consider list filter comprehension @{term "[x \ xs . P]"} as + defined in theory @{theory List} in Isabelle/HOL. \ subsubsection \Applying translation rules\ -text \As a term is being parsed or printed, an AST is generated as - an intermediate form according to \figref{fig:parse-print}. The AST - is normalized by applying translation rules in the manner of a - first-order term rewriting system. We first examine how a single - rule is applied. +text \ + As a term is being parsed or printed, an AST is generated as an intermediate + form according to \figref{fig:parse-print}. The AST is normalized by + applying translation rules in the manner of a first-order term rewriting + system. We first examine how a single rule is applied. - Let \t\ be the abstract syntax tree to be normalized and - \(lhs, rhs)\ some translation rule. A subtree \u\ - of \t\ is called \<^emph>\redex\ if it is an instance of \lhs\; in this case the pattern \lhs\ is said to match the - object \u\. A redex matched by \lhs\ may be - replaced by the corresponding instance of \rhs\, thus - \<^emph>\rewriting\ the AST \t\. Matching requires some notion - of \<^emph>\place-holders\ in rule patterns: @{ML Ast.Variable} serves - this purpose. + Let \t\ be the abstract syntax tree to be normalized and \(lhs, rhs)\ some + translation rule. A subtree \u\ of \t\ is called \<^emph>\redex\ if it is an + instance of \lhs\; in this case the pattern \lhs\ is said to match the + object \u\. A redex matched by \lhs\ may be replaced by the corresponding + instance of \rhs\, thus \<^emph>\rewriting\ the AST \t\. Matching requires some + notion of \<^emph>\place-holders\ in rule patterns: @{ML Ast.Variable} serves this + purpose. - More precisely, the matching of the object \u\ against the - pattern \lhs\ is performed as follows: + More precisely, the matching of the object \u\ against the pattern \lhs\ is + performed as follows: - \<^item> Objects of the form @{ML Ast.Variable}~\x\ or @{ML - Ast.Constant}~\x\ are matched by pattern @{ML - Ast.Constant}~\x\. Thus all atomic ASTs in the object are - treated as (potential) constants, and a successful match makes them - actual constants even before name space resolution (see also - \secref{sec:ast}). + \<^item> Objects of the form @{ML Ast.Variable}~\x\ or @{ML Ast.Constant}~\x\ are + matched by pattern @{ML Ast.Constant}~\x\. Thus all atomic ASTs in the + object are treated as (potential) constants, and a successful match makes + them actual constants even before name space resolution (see also + \secref{sec:ast}). - \<^item> Object \u\ is matched by pattern @{ML - Ast.Variable}~\x\, binding \x\ to \u\. + \<^item> Object \u\ is matched by pattern @{ML Ast.Variable}~\x\, binding \x\ to + \u\. - \<^item> Object @{ML Ast.Appl}~\us\ is matched by @{ML - Ast.Appl}~\ts\ if \us\ and \ts\ have the - same length and each corresponding subtree matches. + \<^item> Object @{ML Ast.Appl}~\us\ is matched by @{ML Ast.Appl}~\ts\ if \us\ and + \ts\ have the same length and each corresponding subtree matches. - \<^item> In every other case, matching fails. + \<^item> In every other case, matching fails. - - A successful match yields a substitution that is applied to \rhs\, generating the instance that replaces \u\. + A successful match yields a substitution that is applied to \rhs\, + generating the instance that replaces \u\. - Normalizing an AST involves repeatedly applying translation rules - until none are applicable. This works yoyo-like: top-down, - bottom-up, top-down, etc. At each subtree position, rules are - chosen in order of appearance in the theory definitions. + Normalizing an AST involves repeatedly applying translation rules until none + are applicable. This works yoyo-like: top-down, bottom-up, top-down, etc. At + each subtree position, rules are chosen in order of appearance in the theory + definitions. - The configuration options @{attribute syntax_ast_trace} and - @{attribute syntax_ast_stats} might help to understand this process - and diagnose problems. + The configuration options @{attribute syntax_ast_trace} and @{attribute + syntax_ast_stats} might help to understand this process and diagnose + problems. \begin{warn} - If syntax translation rules work incorrectly, the output of - @{command_ref print_syntax} with its \<^emph>\rules\ sections reveals the - actual internal forms of AST pattern, without potentially confusing - concrete syntax. Recall that AST constants appear as quoted strings - and variables without quotes. + If syntax translation rules work incorrectly, the output of @{command_ref + print_syntax} with its \<^emph>\rules\ sections reveals the actual internal forms + of AST pattern, without potentially confusing concrete syntax. Recall that + AST constants appear as quoted strings and variables without quotes. \end{warn} \begin{warn} - If @{attribute_ref eta_contract} is set to \true\, terms - will be \\\-contracted \<^emph>\before\ the AST rewriter sees - them. Thus some abstraction nodes needed for print rules to match - may vanish. For example, \Ball A (\x. P x)\ would contract - to \Ball A P\ and the standard print rule would fail to - apply. This problem can be avoided by hand-written ML translation - functions (see also \secref{sec:tr-funs}), which is in fact the same - mechanism used in built-in @{keyword "binder"} declarations. + If @{attribute_ref eta_contract} is set to \true\, terms will be + \\\-contracted \<^emph>\before\ the AST rewriter sees them. Thus some abstraction + nodes needed for print rules to match may vanish. For example, \Ball A (\x. + P x)\ would contract to \Ball A P\ and the standard print rule would fail to + apply. This problem can be avoided by hand-written ML translation functions + (see also \secref{sec:tr-funs}), which is in fact the same mechanism used in + built-in @{keyword "binder"} declarations. \end{warn} \ @@ -1347,10 +1277,9 @@ @@{ML_antiquotation syntax_const}) name \} - \<^descr> @{command parse_translation} etc. declare syntax translation - functions to the theory. Any of these commands have a single - @{syntax text} argument that refers to an ML expression of - appropriate type as follows: + \<^descr> @{command parse_translation} etc. declare syntax translation functions to + the theory. Any of these commands have a single @{syntax text} argument that + refers to an ML expression of appropriate type as follows: \<^medskip> {\footnotesize @@ -1368,110 +1297,108 @@ \end{tabular}} \<^medskip> - The argument list consists of \(c, tr)\ pairs, where \c\ is the syntax name of the formal entity involved, and \tr\ a function that translates a syntax form \c args\ into - \tr ctxt args\ (depending on the context). The Isabelle/ML - naming convention for parse translations is \c_tr\ and for - print translations \c_tr'\. + The argument list consists of \(c, tr)\ pairs, where \c\ is the syntax name + of the formal entity involved, and \tr\ a function that translates a syntax + form \c args\ into \tr ctxt args\ (depending on the context). The + Isabelle/ML naming convention for parse translations is \c_tr\ and for print + translations \c_tr'\. The @{command_ref print_syntax} command displays the sets of names - associated with the translation functions of a theory under \parse_ast_translation\ etc. + associated with the translation functions of a theory under + \parse_ast_translation\ etc. - \<^descr> \@{class_syntax c}\, \@{type_syntax c}\, - \@{const_syntax c}\ inline the authentic syntax name of the - given formal entities into the ML source. This is the - fully-qualified logical name prefixed by a special marker to - indicate its kind: thus different logical name spaces are properly - distinguished within parse trees. + \<^descr> \@{class_syntax c}\, \@{type_syntax c}\, \@{const_syntax c}\ inline the + authentic syntax name of the given formal entities into the ML source. This + is the fully-qualified logical name prefixed by a special marker to indicate + its kind: thus different logical name spaces are properly distinguished + within parse trees. - \<^descr> \@{const_syntax c}\ inlines the name \c\ of - the given syntax constant, having checked that it has been declared - via some @{command syntax} commands within the theory context. Note - that the usual naming convention makes syntax constants start with - underscore, to reduce the chance of accidental clashes with other - names occurring in parse trees (unqualified constants etc.). + \<^descr> \@{const_syntax c}\ inlines the name \c\ of the given syntax constant, + having checked that it has been declared via some @{command syntax} commands + within the theory context. Note that the usual naming convention makes + syntax constants start with underscore, to reduce the chance of accidental + clashes with other names occurring in parse trees (unqualified constants + etc.). \ subsubsection \The translation strategy\ -text \The different kinds of translation functions are invoked during - the transformations between parse trees, ASTs and syntactic terms - (cf.\ \figref{fig:parse-print}). Whenever a combination of the form - \c x\<^sub>1 \ x\<^sub>n\ is encountered, and a translation function - \f\ of appropriate kind is declared for \c\, the - result is produced by evaluation of \f [x\<^sub>1, \, x\<^sub>n]\ in ML. +text \ + The different kinds of translation functions are invoked during the + transformations between parse trees, ASTs and syntactic terms (cf.\ + \figref{fig:parse-print}). Whenever a combination of the form \c x\<^sub>1 \ x\<^sub>n\ + is encountered, and a translation function \f\ of appropriate kind is + declared for \c\, the result is produced by evaluation of \f [x\<^sub>1, \, x\<^sub>n]\ + in ML. - For AST translations, the arguments \x\<^sub>1, \, x\<^sub>n\ are ASTs. A - combination has the form @{ML "Ast.Constant"}~\c\ or @{ML - "Ast.Appl"}~\[\@{ML Ast.Constant}~\c, x\<^sub>1, \, x\<^sub>n]\. - For term translations, the arguments are terms and a combination has - the form @{ML Const}~\(c, \)\ or @{ML Const}~\(c, \) - $ x\<^sub>1 $ \ $ x\<^sub>n\. Terms allow more sophisticated transformations - than ASTs do, typically involving abstractions and bound - variables. \<^emph>\Typed\ print translations may even peek at the type - \\\ of the constant they are invoked on, although some - information might have been suppressed for term output already. + For AST translations, the arguments \x\<^sub>1, \, x\<^sub>n\ are ASTs. A combination + has the form @{ML "Ast.Constant"}~\c\ or @{ML "Ast.Appl"}~\[\@{ML + Ast.Constant}~\c, x\<^sub>1, \, x\<^sub>n]\. For term translations, the arguments are + terms and a combination has the form @{ML Const}~\(c, \)\ or @{ML + Const}~\(c, \) $ x\<^sub>1 $ \ $ x\<^sub>n\. Terms allow more sophisticated + transformations than ASTs do, typically involving abstractions and bound + variables. \<^emph>\Typed\ print translations may even peek at the type \\\ of the + constant they are invoked on, although some information might have been + suppressed for term output already. - Regardless of whether they act on ASTs or terms, translation - functions called during the parsing process differ from those for - printing in their overall behaviour: + Regardless of whether they act on ASTs or terms, translation functions + called during the parsing process differ from those for printing in their + overall behaviour: - \<^descr>[Parse translations] are applied bottom-up. The arguments are - already in translated form. The translations must not fail; - exceptions trigger an error message. There may be at most one - function associated with any syntactic name. + \<^descr>[Parse translations] are applied bottom-up. The arguments are already in + translated form. The translations must not fail; exceptions trigger an + error message. There may be at most one function associated with any + syntactic name. - \<^descr>[Print translations] are applied top-down. They are supplied - with arguments that are partly still in internal form. The result - again undergoes translation; therefore a print translation should - not introduce as head the very constant that invoked it. The - function may raise exception @{ML Match} to indicate failure; in - this event it has no effect. Multiple functions associated with - some syntactic name are tried in the order of declaration in the - theory. + \<^descr>[Print translations] are applied top-down. They are supplied with + arguments that are partly still in internal form. The result again + undergoes translation; therefore a print translation should not introduce + as head the very constant that invoked it. The function may raise + exception @{ML Match} to indicate failure; in this event it has no effect. + Multiple functions associated with some syntactic name are tried in the + order of declaration in the theory. - - Only constant atoms --- constructor @{ML Ast.Constant} for ASTs and - @{ML Const} for terms --- can invoke translation functions. This - means that parse translations can only be associated with parse tree - heads of concrete syntax, or syntactic constants introduced via - other translations. For plain identifiers within the term language, - the status of constant versus variable is not yet know during - parsing. This is in contrast to print translations, where constants - are explicitly known from the given term in its fully internal form. + Only constant atoms --- constructor @{ML Ast.Constant} for ASTs and @{ML + Const} for terms --- can invoke translation functions. This means that parse + translations can only be associated with parse tree heads of concrete + syntax, or syntactic constants introduced via other translations. For plain + identifiers within the term language, the status of constant versus variable + is not yet know during parsing. This is in contrast to print translations, + where constants are explicitly known from the given term in its fully + internal form. \ subsection \Built-in syntax transformations\ text \ - Here are some further details of the main syntax transformation - phases of \figref{fig:parse-print}. + Here are some further details of the main syntax transformation phases of + \figref{fig:parse-print}. \ subsubsection \Transforming parse trees to ASTs\ -text \The parse tree is the raw output of the parser. It is - transformed into an AST according to some basic scheme that may be - augmented by AST translation functions as explained in - \secref{sec:tr-funs}. +text \ + The parse tree is the raw output of the parser. It is transformed into an + AST according to some basic scheme that may be augmented by AST translation + functions as explained in \secref{sec:tr-funs}. The parse tree is constructed by nesting the right-hand sides of the - productions used to recognize the input. Such parse trees are - simply lists of tokens and constituent parse trees, the latter - representing the nonterminals of the productions. Ignoring AST - translation functions, parse trees are transformed to ASTs by - stripping out delimiters and copy productions, while retaining some - source position information from input tokens. + productions used to recognize the input. Such parse trees are simply lists + of tokens and constituent parse trees, the latter representing the + nonterminals of the productions. Ignoring AST translation functions, parse + trees are transformed to ASTs by stripping out delimiters and copy + productions, while retaining some source position information from input + tokens. - The Pure syntax provides predefined AST translations to make the - basic \\\-term structure more apparent within the - (first-order) AST representation, and thus facilitate the use of - @{command translations} (see also \secref{sec:syn-trans}). This - covers ordinary term application, type application, nested - abstraction, iterated meta implications and function types. The - effect is illustrated on some representative input strings is as + The Pure syntax provides predefined AST translations to make the basic + \\\-term structure more apparent within the (first-order) AST + representation, and thus facilitate the use of @{command translations} (see + also \secref{sec:syn-trans}). This covers ordinary term application, type + application, nested abstraction, iterated meta implications and function + types. The effect is illustrated on some representative input strings is as follows: \begin{center} @@ -1489,85 +1416,83 @@ \end{center} Note that type and sort constraints may occur in further places --- - translations need to be ready to cope with them. The built-in - syntax transformation from parse trees to ASTs insert additional - constraints that represent source positions. + translations need to be ready to cope with them. The built-in syntax + transformation from parse trees to ASTs insert additional constraints that + represent source positions. \ subsubsection \Transforming ASTs to terms\ -text \After application of macros (\secref{sec:syn-trans}), the AST - is transformed into a term. This term still lacks proper type - information, but it might contain some constraints consisting of - applications with head \<^verbatim>\_constrain\, where the second - argument is a type encoded as a pre-term within the syntax. Type - inference later introduces correct types, or indicates type errors - in the input. +text \ + After application of macros (\secref{sec:syn-trans}), the AST is transformed + into a term. This term still lacks proper type information, but it might + contain some constraints consisting of applications with head \<^verbatim>\_constrain\, + where the second argument is a type encoded as a pre-term within the syntax. + Type inference later introduces correct types, or indicates type errors in + the input. - Ignoring parse translations, ASTs are transformed to terms by - mapping AST constants to term constants, AST variables to term - variables or constants (according to the name space), and AST - applications to iterated term applications. + Ignoring parse translations, ASTs are transformed to terms by mapping AST + constants to term constants, AST variables to term variables or constants + (according to the name space), and AST applications to iterated term + applications. - The outcome is still a first-order term. Proper abstractions and - bound variables are introduced by parse translations associated with - certain syntax constants. Thus \<^verbatim>\("_abs" x x)\ eventually - becomes a de-Bruijn term \<^verbatim>\Abs ("x", _, Bound 0)\. + The outcome is still a first-order term. Proper abstractions and bound + variables are introduced by parse translations associated with certain + syntax constants. Thus \<^verbatim>\("_abs" x x)\ eventually becomes a de-Bruijn term + \<^verbatim>\Abs ("x", _, Bound 0)\. \ subsubsection \Printing of terms\ -text \The output phase is essentially the inverse of the input - phase. Terms are translated via abstract syntax trees into - pretty-printed text. +text \ + The output phase is essentially the inverse of the input phase. Terms are + translated via abstract syntax trees into pretty-printed text. Ignoring print translations, the transformation maps term constants, variables and applications to the corresponding constructs on ASTs. - Abstractions are mapped to applications of the special constant - \<^verbatim>\_abs\ as seen before. Type constraints are represented - via special \<^verbatim>\_constrain\ forms, according to various - policies of type annotation determined elsewhere. Sort constraints - of type variables are handled in a similar fashion. + Abstractions are mapped to applications of the special constant \<^verbatim>\_abs\ as + seen before. Type constraints are represented via special \<^verbatim>\_constrain\ + forms, according to various policies of type annotation determined + elsewhere. Sort constraints of type variables are handled in a similar + fashion. - After application of macros (\secref{sec:syn-trans}), the AST is - finally pretty-printed. The built-in print AST translations reverse - the corresponding parse AST translations. + After application of macros (\secref{sec:syn-trans}), the AST is finally + pretty-printed. The built-in print AST translations reverse the + corresponding parse AST translations. \<^medskip> For the actual printing process, the priority grammar - (\secref{sec:priority-grammar}) plays a vital role: productions are - used as templates for pretty printing, with argument slots stemming - from nonterminals, and syntactic sugar stemming from literal tokens. + (\secref{sec:priority-grammar}) plays a vital role: productions are used as + templates for pretty printing, with argument slots stemming from + nonterminals, and syntactic sugar stemming from literal tokens. - Each AST application with constant head \c\ and arguments - \t\<^sub>1\, \dots, \t\<^sub>n\ (for \n = 0\ the AST is - just the constant \c\ itself) is printed according to the - first grammar production of result name \c\. The required - syntax priority of the argument slot is given by its nonterminal - \A\<^sup>(\<^sup>p\<^sup>)\. The argument \t\<^sub>i\ that corresponds to the - position of \A\<^sup>(\<^sup>p\<^sup>)\ is printed recursively, and then put in - parentheses \<^emph>\if\ its priority \p\ requires this. The - resulting output is concatenated with the syntactic sugar according - to the grammar production. + Each AST application with constant head \c\ and arguments \t\<^sub>1\, \dots, + \t\<^sub>n\ (for \n = 0\ the AST is just the constant \c\ itself) is printed + according to the first grammar production of result name \c\. The required + syntax priority of the argument slot is given by its nonterminal \A\<^sup>(\<^sup>p\<^sup>)\. + The argument \t\<^sub>i\ that corresponds to the position of \A\<^sup>(\<^sup>p\<^sup>)\ is printed + recursively, and then put in parentheses \<^emph>\if\ its priority \p\ requires + this. The resulting output is concatenated with the syntactic sugar + according to the grammar production. - If an AST application \(c x\<^sub>1 \ x\<^sub>m)\ has more arguments than - the corresponding production, it is first split into \((c x\<^sub>1 - \ x\<^sub>n) x\<^sub>n\<^sub>+\<^sub>1 \ x\<^sub>m)\ and then printed recursively as above. + If an AST application \(c x\<^sub>1 \ x\<^sub>m)\ has more arguments than the + corresponding production, it is first split into \((c x\<^sub>1 \ x\<^sub>n) x\<^sub>n\<^sub>+\<^sub>1 \ + x\<^sub>m)\ and then printed recursively as above. - Applications with too few arguments or with non-constant head or - without a corresponding production are printed in prefix-form like - \f t\<^sub>1 \ t\<^sub>n\ for terms. + Applications with too few arguments or with non-constant head or without a + corresponding production are printed in prefix-form like \f t\<^sub>1 \ t\<^sub>n\ for + terms. - Multiple productions associated with some name \c\ are tried - in order of appearance within the grammar. An occurrence of some - AST variable \x\ is printed as \x\ outright. + Multiple productions associated with some name \c\ are tried in order of + appearance within the grammar. An occurrence of some AST variable \x\ is + printed as \x\ outright. \<^medskip> - White space is \<^emph>\not\ inserted automatically. If - blanks (or breaks) are required to separate tokens, they need to be - specified in the mixfix declaration (\secref{sec:mixfix}). + White space is \<^emph>\not\ inserted automatically. If blanks (or breaks) are + required to separate tokens, they need to be specified in the mixfix + declaration (\secref{sec:mixfix}). \ end