theory Prelim imports Base begin chapter {* Preliminaries *} section {* Contexts \label{sec:context} *} text {* A logical context represents the background that is required for formulating statements and composing proofs. It acts as a medium to produce formal content, depending on earlier material (declarations, results etc.). For example, derivations within the Isabelle/Pure logic can be described as a judgment @{text "\<Gamma> \<turnstile>\<^sub>\<Theta> \<phi>"}, which means that a proposition @{text "\<phi>"} is derivable from hypotheses @{text "\<Gamma>"} within the theory @{text "\<Theta>"}. There are logical reasons for keeping @{text "\<Theta>"} and @{text "\<Gamma>"} separate: theories can be liberal about supporting type constructors and schematic polymorphism of constants and axioms, while the inner calculus of @{text "\<Gamma> \<turnstile> \<phi>"} is strictly limited to Simple Type Theory (with fixed type variables in the assumptions). \medskip Contexts and derivations are linked by the following key principles: \begin{itemize} \item Transfer: monotonicity of derivations admits results to be transferred into a \emph{larger} context, i.e.\ @{text "\<Gamma> \<turnstile>\<^sub>\<Theta> \<phi>"} implies @{text "\<Gamma>' \<turnstile>\<^sub>\<Theta>\<^sub>' \<phi>"} for contexts @{text "\<Theta>' \<supseteq> \<Theta>"} and @{text "\<Gamma>' \<supseteq> \<Gamma>"}. \item Export: discharge of hypotheses admits results to be exported into a \emph{smaller} context, i.e.\ @{text "\<Gamma>' \<turnstile>\<^sub>\<Theta> \<phi>"} implies @{text "\<Gamma> \<turnstile>\<^sub>\<Theta> \<Delta> \<Longrightarrow> \<phi>"} where @{text "\<Gamma>' \<supseteq> \<Gamma>"} and @{text "\<Delta> = \<Gamma>' - \<Gamma>"}. Note that @{text "\<Theta>"} remains unchanged here, only the @{text "\<Gamma>"} part is affected. \end{itemize} \medskip By modeling the main characteristics of the primitive @{text "\<Theta>"} and @{text "\<Gamma>"} above, and abstracting over any particular logical content, we arrive at the fundamental notions of \emph{theory context} and \emph{proof context} in Isabelle/Isar. These implement a certain policy to manage arbitrary \emph{context data}. There is a strongly-typed mechanism to declare new kinds of data at compile time. The internal bootstrap process of Isabelle/Pure eventually reaches a stage where certain data slots provide the logical content of @{text "\<Theta>"} and @{text "\<Gamma>"} sketched above, but this does not stop there! Various additional data slots support all kinds of mechanisms that are not necessarily part of the core logic. For example, there would be data for canonical introduction and elimination rules for arbitrary operators (depending on the object-logic and application), which enables users to perform standard proof steps implicitly (cf.\ the @{text "rule"} method \cite{isabelle-isar-ref}). \medskip Thus Isabelle/Isar is able to bring forth more and more concepts successively. In particular, an object-logic like Isabelle/HOL continues the Isabelle/Pure setup by adding specific components for automated reasoning (classical reasoner, tableau prover, structured induction etc.) and derived specification mechanisms (inductive predicates, recursive functions etc.). All of this is ultimately based on the generic data management by theory and proof contexts introduced here. *} subsection {* Theory context \label{sec:context-theory} *} text {* A \emph{theory} is a data container with explicit name and unique identifier. 
Theories are related by a (nominal) sub-theory relation, which corresponds to the dependency graph of the original construction; each theory is derived from a certain sub-graph of ancestor theories. To this end, the system maintains a set of symbolic ``identification stamps'' within each theory. In order to avoid the full-scale overhead of explicit sub-theory identification of arbitrary intermediate stages, a theory is switched into @{text "draft"} mode under certain circumstances. A draft theory acts like a linear type, where updates invalidate earlier versions. An invalidated draft is called \emph{stale}. The @{text "checkpoint"} operation produces a safe stepping stone that will survive the next update without becoming stale: both the old and the new theory remain valid and are related by the sub-theory relation. Checkpointing essentially recovers purely functional theory values, at the expense of some extra internal bookkeeping. The @{text "copy"} operation produces an auxiliary version that has the same data content, but is unrelated to the original: updates of the copy do not affect the original, neither does the sub-theory relation hold. The @{text "merge"} operation produces the least upper bound of two theories, which actually degenerates into absorption of one theory into the other (according to the nominal sub-theory relation). The @{text "begin"} operation starts a new theory by importing several parent theories and entering a special mode of nameless incremental updates, until the final @{text "end"} operation is performed. \medskip The example in \figref{fig:ex-theory} below shows a theory graph derived from @{text "Pure"}, with theory @{text "Length"} importing @{text "Nat"} and @{text "List"}. The body of @{text "Length"} consists of a sequence of updates, working mostly on drafts internally, while transaction boundaries of Isar top-level commands (\secref{sec:isar-toplevel}) are guaranteed to be safe checkpoints. \begin{figure}[htb] \begin{center} \begin{tabular}{rcccl} & & @{text "Pure"} \\ & & @{text "\<down>"} \\ & & @{text "FOL"} \\ & $\swarrow$ & & $\searrow$ & \\ @{text "Nat"} & & & & @{text "List"} \\ & $\searrow$ & & $\swarrow$ \\ & & @{text "Length"} \\ & & \multicolumn{3}{l}{~~@{keyword "imports"}} \\ & & \multicolumn{3}{l}{~~@{keyword "begin"}} \\ & & $\vdots$~~ \\ & & @{text "\<bullet>"}~~ \\ & & $\vdots$~~ \\ & & @{text "\<bullet>"}~~ \\ & & $\vdots$~~ \\ & & \multicolumn{3}{l}{~~@{command "end"}} \\ \end{tabular} \caption{A theory definition depending on ancestors}\label{fig:ex-theory} \end{center} \end{figure} \medskip There is a separate notion of \emph{theory reference} for maintaining a live link to an evolving theory context: updates on drafts are propagated automatically. Dynamic updating stops when the next @{text "checkpoint"} is reached. Derived entities may store a theory reference in order to indicate the formal context from which they are derived. This implicitly assumes monotonic reasoning, because the referenced context may become larger without further notice. 
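\medskip The following snippet is a minimal sketch of some of these
  operations in ML, using the antiquotations @{text "@{theory}"} and
  @{text "@{theory Pure}"} (described below) for the current background
  theory and its ancestor @{text "Pure"}: *}

ML_val {*
  val thy = @{theory};  (*current background theory*)
  @{assert} (Theory.eq_thy (thy, thy));  (*strict identity*)
  @{assert} (Theory.subthy (@{theory Pure}, thy));  (*Pure is an ancestor*)
  val parents = Theory.parents_of thy;  (*direct ancestors only*)
*}

text {* Note that @{ML "Theory.eq_thy"} and @{ML "Theory.subthy"} merely
  consult the internal identification stamps; no data content is ever
  compared.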
*}

text %mlref {*
  \begin{mldecls}
  @{index_ML_type theory} \\
  @{index_ML Theory.eq_thy: "theory * theory -> bool"} \\
  @{index_ML Theory.subthy: "theory * theory -> bool"} \\
  @{index_ML Theory.checkpoint: "theory -> theory"} \\
  @{index_ML Theory.copy: "theory -> theory"} \\
  @{index_ML Theory.merge: "theory * theory -> theory"} \\
  @{index_ML Theory.begin_theory: "string -> theory list -> theory"} \\
  @{index_ML Theory.parents_of: "theory -> theory list"} \\
  @{index_ML Theory.ancestors_of: "theory -> theory list"} \\
  \end{mldecls}
  \begin{mldecls}
  @{index_ML_type theory_ref} \\
  @{index_ML Theory.deref: "theory_ref -> theory"} \\
  @{index_ML Theory.check_thy: "theory -> theory_ref"} \\
  \end{mldecls}

  \begin{description}

  \item Type @{ML_type theory} represents theory contexts.  This is
  essentially a linear type, with explicit runtime checking.  Primitive
  theory operations destroy the original version, which then becomes
  ``stale''.  This can be prevented by explicit checkpointing, which the
  system does at least at the boundary of toplevel command transactions
  (\secref{sec:isar-toplevel}).

  \item @{ML "Theory.eq_thy"}~@{text "(thy\<^sub>1, thy\<^sub>2)"} checks strict
  identity of two theories.

  \item @{ML "Theory.subthy"}~@{text "(thy\<^sub>1, thy\<^sub>2)"} compares theories
  according to the intrinsic graph structure of the construction.  This
  sub-theory relation is a nominal approximation of inclusion
  (@{text "\<subseteq>"}) of the corresponding content (according to the
  semantics of the ML modules that implement the data).

  \item @{ML "Theory.checkpoint"}~@{text "thy"} produces a safe stepping
  stone in the linear development of @{text "thy"}.  This changes the old
  theory, but the next update will result in two related, valid theories.

  \item @{ML "Theory.copy"}~@{text "thy"} produces a variant of
  @{text "thy"} with the same data.  The copy is not related to the
  original, but the original is unchanged.

  \item @{ML "Theory.merge"}~@{text "(thy\<^sub>1, thy\<^sub>2)"} absorbs one theory
  into the other, without changing @{text "thy\<^sub>1"} or @{text "thy\<^sub>2"}.
  This version of ad-hoc theory merge fails for unrelated theories!

  \item @{ML "Theory.begin_theory"}~@{text "name parents"} constructs a
  new theory based on the given parents.  This ML function is normally
  not invoked directly.

  \item @{ML "Theory.parents_of"}~@{text "thy"} returns the direct
  ancestors of @{text thy}.

  \item @{ML "Theory.ancestors_of"}~@{text "thy"} returns all ancestors
  of @{text thy} (not including @{text thy} itself).

  \item Type @{ML_type theory_ref} represents a sliding reference to an
  always valid theory; updates on the original are propagated
  automatically.

  \item @{ML "Theory.deref"}~@{text "thy_ref"} turns a
  @{ML_type "theory_ref"} into an @{ML_type "theory"} value.  As the
  referenced theory evolves monotonically over time, later invocations of
  @{ML "Theory.deref"} may refer to a larger context.

  \item @{ML "Theory.check_thy"}~@{text "thy"} produces a
  @{ML_type "theory_ref"} from a valid @{ML_type "theory"} value.

  \end{description}
*}

text %mlantiq {*
  \begin{matharray}{rcl}
  @{ML_antiquotation_def "theory"} & : & @{text ML_antiquotation} \\
  \end{matharray}

  @{rail "
  @@{ML_antiquotation theory} nameref?
  "}

  \begin{description}

  \item @{text "@{theory}"} refers to the background theory of the
  current context --- as abstract value.

  \item @{text "@{theory A}"} refers to an explicitly named ancestor
  theory @{text "A"} of the background theory of the current context ---
  as abstract value.
\end{description} *} subsection {* Proof context \label{sec:context-proof} *} text {* A proof context is a container for pure data with a back-reference to the theory from which it is derived. The @{text "init"} operation creates a proof context from a given theory. Modifications to draft theories are propagated to the proof context as usual, but there is also an explicit @{text "transfer"} operation to force resynchronization with more substantial updates to the underlying theory. Entities derived in a proof context need to record logical requirements explicitly, since there is no separate context identification or symbolic inclusion as for theories. For example, hypotheses used in primitive derivations (cf.\ \secref{sec:thms}) are recorded separately within the sequent @{text "\<Gamma> \<turnstile> \<phi>"}, just to make double sure. Results could still leak into an alien proof context due to programming errors, but Isabelle/Isar includes some extra validity checks in critical positions, notably at the end of a sub-proof. Proof contexts may be manipulated arbitrarily, although the common discipline is to follow block structure as a mental model: a given context is extended consecutively, and results are exported back into the original context. Note that an Isar proof state models block-structured reasoning explicitly, using a stack of proof contexts internally. For various technical reasons, the background theory of an Isar proof state must not be changed while the proof is still under construction! *} text %mlref {* \begin{mldecls} @{index_ML_type Proof.context} \\ @{index_ML Proof_Context.init_global: "theory -> Proof.context"} \\ @{index_ML Proof_Context.theory_of: "Proof.context -> theory"} \\ @{index_ML Proof_Context.transfer: "theory -> Proof.context -> Proof.context"} \\ \end{mldecls} \begin{description} \item Type @{ML_type Proof.context} represents proof contexts. Elements of this type are essentially pure values, with a sliding reference to the background theory. \item @{ML Proof_Context.init_global}~@{text "thy"} produces a proof context derived from @{text "thy"}, initializing all data. \item @{ML Proof_Context.theory_of}~@{text "ctxt"} selects the background theory from @{text "ctxt"}, dereferencing its internal @{ML_type theory_ref}. \item @{ML Proof_Context.transfer}~@{text "thy ctxt"} promotes the background theory of @{text "ctxt"} to the super theory @{text "thy"}. \end{description} *} text %mlantiq {* \begin{matharray}{rcl} @{ML_antiquotation_def "context"} & : & @{text ML_antiquotation} \\ \end{matharray} \begin{description} \item @{text "@{context}"} refers to \emph{the} context at compile-time --- as abstract value. Independently of (local) theory or proof mode, this always produces a meaningful result. This is probably the most common antiquotation in interactive experimentation with ML inside Isar. \end{description} *} subsection {* Generic contexts \label{sec:generic-context} *} text {* A generic context is the disjoint sum of either a theory or proof context. Occasionally, this enables uniform treatment of generic context data, typically extra-logical information. Operations on generic contexts include the usual injections, partial selections, and combinators for lifting operations on either component of the disjoint sum. 
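For example, a proof context may be injected into the sum via the
  datatype constructor @{ML "Context.Proof"}, and a background theory
  recovered from the result (a minimal sketch, using operations described
  below): *}

ML_val {*
  val generic = Context.Proof @{context};  (*inject proof context*)
  val thy = Context.theory_of generic;  (*total selection of a theory*)
  @{assert} (Theory.subthy (thy, @{theory}));
*}

text {* Here the theory is obtained from the proof context behind the
  scenes, via its back-reference to the background theory.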
Moreover, there are total operations @{text "theory_of"} and
  @{text "proof_of"} to convert a generic context into either kind: a
  theory can always be selected from the sum, while a proof context might
  have to be constructed by an ad-hoc @{text "init"} operation, which
  incurs a small runtime overhead.
*}

text %mlref {*
  \begin{mldecls}
  @{index_ML_type Context.generic} \\
  @{index_ML Context.theory_of: "Context.generic -> theory"} \\
  @{index_ML Context.proof_of: "Context.generic -> Proof.context"} \\
  \end{mldecls}

  \begin{description}

  \item Type @{ML_type Context.generic} is the direct sum of
  @{ML_type "theory"} and @{ML_type "Proof.context"}, with the datatype
  constructors @{ML "Context.Theory"} and @{ML "Context.Proof"}.

  \item @{ML Context.theory_of}~@{text "context"} always produces a
  theory from the generic @{text "context"}, using
  @{ML "Proof_Context.theory_of"} as required.

  \item @{ML Context.proof_of}~@{text "context"} always produces a proof
  context from the generic @{text "context"}, using
  @{ML "Proof_Context.init_global"} as required (note that this
  re-initializes the context data with each invocation).

  \end{description}
*}

subsection {* Context data \label{sec:context-data} *}

text {*
  The main purpose of theory and proof contexts is to manage arbitrary
  (pure) data.  New data types can be declared incrementally at compile
  time.  There are separate declaration mechanisms for any of the three
  kinds of contexts: theory, proof, generic.

  \paragraph{Theory data} declarations need to implement the following
  SML signature:

  \medskip
  \begin{tabular}{ll}
  @{text "\<type> T"} & representing type \\
  @{text "\<val> empty: T"} & empty default value \\
  @{text "\<val> extend: T \<rightarrow> T"} & re-initialize on import \\
  @{text "\<val> merge: T \<times> T \<rightarrow> T"} & join on import \\
  \end{tabular}
  \medskip

  The @{text "empty"} value acts as initial default for \emph{any} theory
  that does not declare actual data content; @{text "extend"} acts like a
  unitary version of @{text "merge"}.

  Implementing @{text "merge"} can be tricky.  The general idea is that
  @{text "merge (data\<^sub>1, data\<^sub>2)"} inserts those parts of @{text "data\<^sub>2"}
  into @{text "data\<^sub>1"} that are not yet present, while keeping the
  general order of things.  The @{ML Library.merge} function on plain
  lists may serve as a canonical template.  Particularly note that shared
  parts of the data must not be duplicated by naive concatenation, or a
  theory graph that is like a chain of diamonds would cause an
  exponential blowup!

  \paragraph{Proof context data} declarations need to implement the
  following SML signature:

  \medskip
  \begin{tabular}{ll}
  @{text "\<type> T"} & representing type \\
  @{text "\<val> init: theory \<rightarrow> T"} & produce initial value \\
  \end{tabular}
  \medskip

  The @{text "init"} operation is supposed to produce a pure value from
  the given background theory and should be somehow ``immediate''.
  Whenever a proof context is initialized, which happens frequently, the
  system invokes the @{text "init"} operation of \emph{all} proof data
  slots ever declared.  This also means that one needs to be economic
  about the total number of proof data declarations in the system, i.e.\
  each ML module should declare at most one, sometimes two data slots for
  its internal use.  Repeated data declarations to simulate a record type
  should be avoided!

  \paragraph{Generic data} provides a hybrid interface for both theory
  and proof data.
The @{text "init"} operation for proof contexts is predefined to select the current data value from the background theory. \bigskip Any of the above data declarations over type @{text "T"} result in an ML structure with the following signature: \medskip \begin{tabular}{ll} @{text "get: context \<rightarrow> T"} \\ @{text "put: T \<rightarrow> context \<rightarrow> context"} \\ @{text "map: (T \<rightarrow> T) \<rightarrow> context \<rightarrow> context"} \\ \end{tabular} \medskip These other operations provide exclusive access for the particular kind of context (theory, proof, or generic context). This interface observes the ML discipline for types and scopes: there is no other way to access the corresponding data slot of a context. By keeping these operations private, an Isabelle/ML module may maintain abstract values authentically. *} text %mlref {* \begin{mldecls} @{index_ML_functor Theory_Data} \\ @{index_ML_functor Proof_Data} \\ @{index_ML_functor Generic_Data} \\ \end{mldecls} \begin{description} \item @{ML_functor Theory_Data}@{text "(spec)"} declares data for type @{ML_type theory} according to the specification provided as argument structure. The resulting structure provides data init and access operations as described above. \item @{ML_functor Proof_Data}@{text "(spec)"} is analogous to @{ML_functor Theory_Data} for type @{ML_type Proof.context}. \item @{ML_functor Generic_Data}@{text "(spec)"} is analogous to @{ML_functor Theory_Data} for type @{ML_type Context.generic}. \end{description} *} text %mlex {* The following artificial example demonstrates theory data: we maintain a set of terms that are supposed to be wellformed wrt.\ the enclosing theory. The public interface is as follows: *} ML {* signature WELLFORMED_TERMS = sig val get: theory -> term list val add: term -> theory -> theory end; *} text {* The implementation uses private theory data internally, and only exposes an operation that involves explicit argument checking wrt.\ the given theory. *} ML {* structure Wellformed_Terms: WELLFORMED_TERMS = struct structure Terms = Theory_Data ( type T = term Ord_List.T; val empty = []; val extend = I; fun merge (ts1, ts2) = Ord_List.union Term_Ord.fast_term_ord ts1 ts2; ); val get = Terms.get; fun add raw_t thy = let val t = Sign.cert_term thy raw_t; in Terms.map (Ord_List.insert Term_Ord.fast_term_ord t) thy end; end; *} text {* Type @{ML_type "term Ord_List.T"} is used for reasonably efficient representation of a set of terms: all operations are linear in the number of stored elements. Here we assume that users of this module do not care about the declaration order, since that data structure forces its own arrangement of elements. Observe how the @{ML_text merge} operation joins the data slots of the two constituents: @{ML Ord_List.union} prevents duplication of common data from different branches, thus avoiding the danger of exponential blowup. Plain list append etc.\ must never be used for theory data merges! \medskip Our intended invariant is achieved as follows: \begin{enumerate} \item @{ML Wellformed_Terms.add} only admits terms that have passed the @{ML Sign.cert_term} check of the given theory at that point. \item Wellformedness in the sense of @{ML Sign.cert_term} is monotonic wrt.\ the sub-theory relation. So our data can move upwards in the hierarchy (via extension or merges), and maintain wellformedness without further checks. 
\end{enumerate} Note that all basic operations of the inference kernel (which includes @{ML Sign.cert_term}) observe this monotonicity principle, but other user-space tools don't. For example, fully-featured type-inference via @{ML Syntax.check_term} (cf.\ \secref{sec:term-check}) is not necessarily monotonic wrt.\ the background theory, since constraints of term constants can be modified by later declarations, for example. In most cases, user-space context data does not have to take such invariants too seriously. The situation is different in the implementation of the inference kernel itself, which uses the very same data mechanisms for types, constants, axioms etc. *} subsection {* Configuration options \label{sec:config-options} *} text {* A \emph{configuration option} is a named optional value of some basic type (Boolean, integer, string) that is stored in the context. It is a simple application of general context data (\secref{sec:context-data}) that is sufficiently common to justify customized setup, which includes some concrete declarations for end-users using existing notation for attributes (cf.\ \secref{sec:attributes}). For example, the predefined configuration option @{attribute show_types} controls output of explicit type constraints for variables in printed terms (cf.\ \secref{sec:read-print}). Its value can be modified within Isar text like this: *} declare [[show_types = false]] -- {* declaration within (local) theory context *} notepad begin note [[show_types = true]] -- {* declaration within proof (forward mode) *} term x have "x = x" using [[show_types = false]] -- {* declaration within proof (backward mode) *} .. end text {* Configuration options that are not set explicitly hold a default value that can depend on the application context. This allows to retrieve the value from another slot within the context, or fall back on a global preference mechanism, for example. The operations to declare configuration options and get/map their values are modeled as direct replacements for historic global references, only that the context is made explicit. This allows easy configuration of tools, without relying on the execution order as required for old-style mutable references. *} text %mlref {* \begin{mldecls} @{index_ML Config.get: "Proof.context -> 'a Config.T -> 'a"} \\ @{index_ML Config.map: "'a Config.T -> ('a -> 'a) -> Proof.context -> Proof.context"} \\ @{index_ML Attrib.setup_config_bool: "binding -> (Context.generic -> bool) -> bool Config.T"} \\ @{index_ML Attrib.setup_config_int: "binding -> (Context.generic -> int) -> int Config.T"} \\ @{index_ML Attrib.setup_config_real: "binding -> (Context.generic -> real) -> real Config.T"} \\ @{index_ML Attrib.setup_config_string: "binding -> (Context.generic -> string) -> string Config.T"} \\ \end{mldecls} \begin{description} \item @{ML Config.get}~@{text "ctxt config"} gets the value of @{text "config"} in the given context. \item @{ML Config.map}~@{text "config f ctxt"} updates the context by updating the value of @{text "config"}. \item @{text "config ="}~@{ML Attrib.setup_config_bool}~@{text "name default"} creates a named configuration option of type @{ML_type bool}, with the given @{text "default"} depending on the application context. The resulting @{text "config"} can be used to get/map its value in a given context. There is an implicit update of the background theory that registers the option as attribute with some concrete syntax. 
\item @{ML Attrib.setup_config_int}, @{ML Attrib.setup_config_real}, and
  @{ML Attrib.setup_config_string} work like
  @{ML Attrib.setup_config_bool}, but for types @{ML_type int},
  @{ML_type real}, and @{ML_type string}, respectively.

  \end{description}
*}

text %mlex {* The following example shows how to declare and use a
  Boolean configuration option called @{text "my_flag"} with constant
  default value @{ML false}. *}

ML {*
  val my_flag =
    Attrib.setup_config_bool @{binding my_flag} (K false)
*}

text {* Now the user can refer to @{attribute my_flag} in declarations,
  while ML tools can retrieve the current value from the context via
  @{ML Config.get}. *}

ML_val {* @{assert} (Config.get @{context} my_flag = false) *}

declare [[my_flag = true]]

ML_val {* @{assert} (Config.get @{context} my_flag = true) *}

notepad
begin
  {
    note [[my_flag = false]]
    ML_val {* @{assert} (Config.get @{context} my_flag = false) *}
  }
  ML_val {* @{assert} (Config.get @{context} my_flag = true) *}
end

text {* Here is another example involving ML type @{ML_type real}
  (floating-point numbers). *}

ML {*
  val airspeed_velocity =
    Attrib.setup_config_real @{binding airspeed_velocity} (K 0.0)
*}

declare [[airspeed_velocity = 10]]
declare [[airspeed_velocity = 9.9]]


section {* Names \label{sec:names} *}

text {* In principle, a name is just a string, but there are various
  conventions for representing additional structure.  For example,
  ``@{text "Foo.bar.baz"}'' is considered as a long name consisting of
  qualifier @{text "Foo.bar"} and base name @{text "baz"}.  The
  individual constituents of a name may have further substructure, e.g.\
  the string ``\verb,\,\verb,<alpha>,'' encodes as a single symbol.

  \medskip Subsequently, we shall introduce specific categories of names.
  Roughly speaking these correspond to logical entities as follows:
  \begin{itemize}

  \item Basic names (\secref{sec:basic-name}): free and bound variables.

  \item Indexed names (\secref{sec:indexname}): schematic variables.

  \item Long names (\secref{sec:long-name}): constants of any kind (type
  constructors, term constants, other concepts defined in user space).
  Such entities are typically managed via name spaces
  (\secref{sec:name-space}).

  \end{itemize}
*}


subsection {* Strings of symbols \label{sec:symbols} *}

text {* A \emph{symbol} constitutes the smallest textual unit in Isabelle
  --- raw ML characters are normally not encountered at all!  Isabelle
  strings consist of a sequence of symbols, represented as a packed
  string or an exploded list of strings.  Each symbol is in itself a
  small string, which has one of the following forms:

  \begin{enumerate}

  \item a single ASCII character ``@{text "c"}'', for example
  ``\verb,a,'',

  \item a codepoint according to UTF-8 (non-ASCII byte sequence),

  \item a regular symbol ``\verb,\,\verb,<,@{text "ident"}\verb,>,'',
  for example ``\verb,\,\verb,<alpha>,'',

  \item a control symbol ``\verb,\,\verb,<^,@{text "ident"}\verb,>,'',
  for example ``\verb,\,\verb,<^bold>,'',

  \item a raw symbol ``\verb,\,\verb,<^raw:,@{text text}\verb,>,'' where
  @{text text} consists of printable characters excluding ``\verb,.,''
  and ``\verb,>,'', for example
  ``\verb,\,\verb,<^raw:$\sum_{i = 1}^n$>,'',

  \item a numbered raw control symbol
  ``\verb,\,\verb,<^raw,@{text n}\verb,>,'' where @{text n} consists of
  digits, for example ``\verb,\,\verb,<^raw42>,''.

  \end{enumerate}

  The @{text "ident"} syntax for symbol names is
  @{text "letter (letter | digit)\<^sup>*"}, where
  @{text "letter = A..Za..z"} and @{text "digit = 0..9"}.
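\medskip As a minimal illustration (using operations that are described
  in detail below), a packed string of plain ASCII characters explodes
  into singleton symbols: *}

ML_val {*
  val syms = Symbol.explode "abc";  (*one symbol per ASCII character*)
  @{assert} (syms = ["a", "b", "c"]);
  @{assert} (Symbol.is_letter "a" andalso Symbol.is_digit "1");
*}

text {* A regular symbol like ``\verb,\,\verb,<alpha>,'' would likewise
  count as a single element of the exploded list, not as a sequence of
  raw characters.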
There are infinitely many regular symbols and control symbols, but a
  fixed collection of standard symbols is treated specifically.  For
  example, ``\verb,\,\verb,<alpha>,'' is classified as a letter, which
  means it may occur within regular Isabelle identifiers.

  The character set underlying Isabelle symbols is 7-bit ASCII, but 8-bit
  character sequences are passed through unchanged.  Unicode/UCS data in
  UTF-8 encoding is processed in a non-strict fashion, such that
  well-formed code sequences are recognized
  accordingly.\footnote{Note that ISO-Latin-1 differs from UTF-8 only in
  some special punctuation characters that even have replacements within
  the standard collection of Isabelle symbols.  Text consisting of ASCII
  plus accented letters can be processed in either encoding.}  Unicode
  provides its own collection of mathematical symbols, but within the
  core Isabelle/ML world there is no link to the standard collection of
  Isabelle regular symbols.

  \medskip Output of Isabelle symbols depends on the print mode
  (\cite{isabelle-isar-ref}).  For example, the standard {\LaTeX} setup
  of the Isabelle document preparation system would present
  ``\verb,\,\verb,<alpha>,'' as @{text "\<alpha>"}, and
  ``\verb,\,\verb,<^bold>,\verb,\,\verb,<alpha>,'' as @{text "\<^bold>\<alpha>"}.
  On-screen rendering usually works by mapping a finite subset of
  Isabelle symbols to suitable Unicode characters.
*}

text %mlref {*
  \begin{mldecls}
  @{index_ML_type "Symbol.symbol": string} \\
  @{index_ML Symbol.explode: "string -> Symbol.symbol list"} \\
  @{index_ML Symbol.is_letter: "Symbol.symbol -> bool"} \\
  @{index_ML Symbol.is_digit: "Symbol.symbol -> bool"} \\
  @{index_ML Symbol.is_quasi: "Symbol.symbol -> bool"} \\
  @{index_ML Symbol.is_blank: "Symbol.symbol -> bool"} \\
  \end{mldecls}
  \begin{mldecls}
  @{index_ML_type "Symbol.sym"} \\
  @{index_ML Symbol.decode: "Symbol.symbol -> Symbol.sym"} \\
  \end{mldecls}

  \begin{description}

  \item Type @{ML_type "Symbol.symbol"} represents individual Isabelle
  symbols.

  \item @{ML "Symbol.explode"}~@{text "str"} produces a symbol list from
  the packed form.  This function supersedes @{ML "String.explode"} for
  virtually all purposes of manipulating text in Isabelle!\footnote{The
  runtime overhead for exploded strings is mainly that of the list
  structure: individual symbols that happen to be a singleton string do
  not require extra memory in Poly/ML.}

  \item @{ML "Symbol.is_letter"}, @{ML "Symbol.is_digit"},
  @{ML "Symbol.is_quasi"}, @{ML "Symbol.is_blank"} classify standard
  symbols according to fixed syntactic conventions of Isabelle, cf.\
  \cite{isabelle-isar-ref}.

  \item Type @{ML_type "Symbol.sym"} is a concrete datatype that
  represents the different kinds of symbols explicitly, with constructors
  @{ML "Symbol.Char"}, @{ML "Symbol.Sym"}, @{ML "Symbol.UTF8"},
  @{ML "Symbol.Ctrl"}, @{ML "Symbol.Raw"}.

  \item @{ML "Symbol.decode"} converts the string representation of a
  symbol into the datatype version.

  \end{description}

  \paragraph{Historical note.} In the original SML90 standard the
  primitive ML type @{ML_type char} did not exist, and the
  @{ML_text "explode: string -> string list"} operation would produce a
  list of singleton strings as does
  @{ML "raw_explode: string -> string list"} in Isabelle/ML today.  When
  SML97 came out, Isabelle did not adopt its slightly anachronistic 8-bit
  characters, but the idea of exploding a string into a list of small
  strings was extended to ``symbols'' as explained above.
Thus Isabelle sources can refer to an infinite store of user-defined
  symbols, without having to worry about the multitude of Unicode
  encodings.
*}


subsection {* Basic names \label{sec:basic-name} *}

text {*
  A \emph{basic name} essentially consists of a single Isabelle
  identifier.  There are conventions to mark separate classes of basic
  names, by attaching a suffix of underscores: one underscore means
  \emph{internal name}, two underscores means \emph{Skolem name}, three
  underscores means \emph{internal Skolem name}.

  For example, the basic name @{text "foo"} has the internal version
  @{text "foo_"}, with Skolem versions @{text "foo__"} and
  @{text "foo___"}, respectively.

  These special versions provide copies of the basic name space, apart
  from anything that normally appears in the user text.  For example,
  system-generated variables in Isar proof contexts are usually marked as
  internal, which prevents mysterious names like @{text "xaa"} from
  appearing in human-readable text.

  \medskip Manipulating binding scopes often requires on-the-fly
  renamings.  A \emph{name context} contains a collection of already used
  names.  The @{text "declare"} operation adds names to the context.

  The @{text "invent"} operation derives a number of fresh names from a
  given starting point.  For example, the first three names derived from
  @{text "a"} are @{text "a"}, @{text "b"}, @{text "c"}.

  The @{text "variant"} operation produces fresh names by incrementing
  tentative names as base-26 numbers (with digits @{text "a..z"}) until
  all clashes are resolved.  For example, name @{text "foo"} results in
  variants @{text "fooa"}, @{text "foob"}, @{text "fooc"}, \dots,
  @{text "fooaa"}, @{text "fooab"} etc.; each renaming step picks the
  next unused variant from this sequence.
*}

text %mlref {*
  \begin{mldecls}
  @{index_ML Name.internal: "string -> string"} \\
  @{index_ML Name.skolem: "string -> string"} \\
  \end{mldecls}
  \begin{mldecls}
  @{index_ML_type Name.context} \\
  @{index_ML Name.context: Name.context} \\
  @{index_ML Name.declare: "string -> Name.context -> Name.context"} \\
  @{index_ML Name.invent: "Name.context -> string -> int -> string list"} \\
  @{index_ML Name.variant: "string -> Name.context -> string * Name.context"} \\
  \end{mldecls}
  \begin{mldecls}
  @{index_ML Variable.names_of: "Proof.context -> Name.context"} \\
  \end{mldecls}

  \begin{description}

  \item @{ML Name.internal}~@{text "name"} produces an internal name by
  adding one underscore.

  \item @{ML Name.skolem}~@{text "name"} produces a Skolem name by adding
  two underscores.

  \item Type @{ML_type Name.context} represents the context of already
  used names; the initial value is @{ML "Name.context"}.

  \item @{ML Name.declare}~@{text "name"} enters a used name into the
  context.

  \item @{ML Name.invent}~@{text "context name n"} produces @{text "n"}
  fresh names derived from @{text "name"}.

  \item @{ML Name.variant}~@{text "name context"} produces a fresh
  variant of @{text "name"}; the result is declared to the context.

  \item @{ML Variable.names_of}~@{text "ctxt"} retrieves the context of
  declared type and term variable names.  Projecting a proof context down
  to a primitive name context is occasionally useful when invoking
  lower-level operations.  Regular management of ``fresh variables'' is
  done by suitable operations of structure @{ML_struct Variable}, which
  is also able to provide an official status of ``locally fixed
  variable'' within the logical environment (cf.\
  \secref{sec:variables}).
\end{description}
*}

text %mlex {* The following simple examples demonstrate how to produce
  fresh names from the initial @{ML Name.context}. *}

ML {*
  val list1 = Name.invent Name.context "a" 5;
  @{assert} (list1 = ["a", "b", "c", "d", "e"]);

  val list2 =
    #1 (fold_map Name.variant ["x", "x", "a", "a", "'a", "'a"] Name.context);
  @{assert} (list2 = ["x", "xa", "a", "aa", "'a", "'aa"]);
*}

text {* \medskip The same works relative to the formal context as
  follows. *}

locale ex = fixes a b c :: 'a
begin

ML {*
  val names = Variable.names_of @{context};

  val list1 = Name.invent names "a" 5;
  @{assert} (list1 = ["d", "e", "f", "g", "h"]);

  val list2 =
    #1 (fold_map Name.variant ["x", "x", "a", "a", "'a", "'a"] names);
  @{assert} (list2 = ["x", "xa", "aa", "ab", "'aa", "'ab"]);
*}

end


subsection {* Indexed names \label{sec:indexname} *}

text {*
  An \emph{indexed name} (or @{text "indexname"}) is a pair of a basic
  name and a natural number.  This representation allows efficient
  renaming by incrementing the second component only.  The canonical way
  to rename two collections of indexnames apart from each other is this:
  determine the maximum index @{text "maxidx"} of the first collection,
  then increment all indexes of the second collection by
  @{text "maxidx + 1"}; the maximum index of an empty collection is
  @{text "-1"}.

  Occasionally, basic names are injected into the same pair type of
  indexed names: then @{text "(x, -1)"} is used to encode the basic name
  @{text "x"}.

  \medskip Isabelle syntax observes the following rules for representing
  an indexname @{text "(x, i)"} as a packed string:

  \begin{itemize}

  \item @{text "?x"} if @{text "x"} does not end with a digit and
  @{text "i = 0"},

  \item @{text "?xi"} if @{text "x"} does not end with a digit,

  \item @{text "?x.i"} otherwise.

  \end{itemize}

  Indexnames may acquire large index numbers after several maxidx shifts
  have been applied.  Results are usually normalized towards @{text "0"}
  at certain checkpoints, notably at the end of a proof.  This works by
  producing variants of the corresponding basic name components.  For
  example, the collection @{text "?x1, ?x7, ?x42"} becomes
  @{text "?x, ?xa, ?xb"}.
*}

text %mlref {*
  \begin{mldecls}
  @{index_ML_type indexname: "string * int"} \\
  \end{mldecls}

  \begin{description}

  \item Type @{ML_type indexname} represents indexed names.  This is an
  abbreviation for @{ML_type "string * int"}.  The second component is
  usually non-negative, except for situations where @{text "(x, -1)"} is
  used to inject basic names into this type.  Other negative indexes
  should not be used.

  \end{description}
*}


subsection {* Long names \label{sec:long-name} *}

text {* A \emph{long name} consists of a sequence of non-empty name
  components.  The packed representation uses a dot as separator, as in
  ``@{text "A.b.c"}''.  The last component is called \emph{base name},
  the remaining prefix is called \emph{qualifier} (which may be empty).
  The qualifier can be understood as the access path to the named entity
  while passing through some nested block-structure, although our
  free-form long names do not really enforce any strict discipline.

  For example, an item named ``@{text "A.b.c"}'' may be understood as a
  local entity @{text "c"}, within a local structure @{text "b"}, within
  a global structure @{text "A"}.  In practice, long names usually
  represent 1--3 levels of qualification.  User ML code should not make
  any assumptions about the particular structure of long names!
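\medskip The basic operations on long names (described in detail below)
  are plain string manipulations; here is a minimal sketch: *}

ML_val {*
  @{assert} (Long_Name.base_name "A.b.c" = "c");
  @{assert} (Long_Name.qualifier "A.b.c" = "A.b");
  @{assert} (Long_Name.explode "A.b.c" = ["A", "b", "c"]);
  @{assert} (Long_Name.append "A.b" "c" = "A.b.c");
*}

text {* No name space policies are involved at this stage; such aspects
  are treated separately (\secref{sec:name-space}).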
The empty name is commonly used as an indication of unnamed entities, or entities that are not entered into the corresponding name space, whenever this makes any sense. The basic operations on long names map empty names again to empty names. *} text %mlref {* \begin{mldecls} @{index_ML Long_Name.base_name: "string -> string"} \\ @{index_ML Long_Name.qualifier: "string -> string"} \\ @{index_ML Long_Name.append: "string -> string -> string"} \\ @{index_ML Long_Name.implode: "string list -> string"} \\ @{index_ML Long_Name.explode: "string -> string list"} \\ \end{mldecls} \begin{description} \item @{ML Long_Name.base_name}~@{text "name"} returns the base name of a long name. \item @{ML Long_Name.qualifier}~@{text "name"} returns the qualifier of a long name. \item @{ML Long_Name.append}~@{text "name\<^isub>1 name\<^isub>2"} appends two long names. \item @{ML Long_Name.implode}~@{text "names"} and @{ML Long_Name.explode}~@{text "name"} convert between the packed string representation and the explicit list form of long names. \end{description} *} subsection {* Name spaces \label{sec:name-space} *} text {* A @{text "name space"} manages a collection of long names, together with a mapping between partially qualified external names and fully qualified internal names (in both directions). Note that the corresponding @{text "intern"} and @{text "extern"} operations are mostly used for parsing and printing only! The @{text "declare"} operation augments a name space according to the accesses determined by a given binding, and a naming policy from the context. \medskip A @{text "binding"} specifies details about the prospective long name of a newly introduced formal entity. It consists of a base name, prefixes for qualification (separate ones for system infrastructure and user-space mechanisms), a slot for the original source position, and some additional flags. \medskip A @{text "naming"} provides some additional details for producing a long name from a binding. Normally, the naming is implicit in the theory or proof context. The @{text "full"} operation (and its variants for different context types) produces a fully qualified internal name to be entered into a name space. The main equation of this ``chemical reaction'' when binding new entities in a context is as follows: \medskip \begin{tabular}{l} @{text "binding + naming \<longrightarrow> long name + name space accesses"} \end{tabular} \bigskip As a general principle, there is a separate name space for each kind of formal entity, e.g.\ fact, logical constant, type constructor, type class. It is usually clear from the occurrence in concrete syntax (or from the scope) which kind of entity a name refers to. For example, the very same name @{text "c"} may be used uniformly for a constant, type constructor, and type class. There are common schemes to name derived entities systematically according to the name of the main logical entity involved, e.g.\ fact @{text "c.intro"} for a canonical introduction rule related to constant @{text "c"}. This technique of mapping names from one space into another requires some care in order to avoid conflicts. In particular, theorem names derived from a type constructor or type class should get an additional suffix in addition to the usual qualification. 
This leads to the following conventions for derived names: \medskip \begin{tabular}{ll} logical entity & fact name \\\hline constant @{text "c"} & @{text "c.intro"} \\ type @{text "c"} & @{text "c_type.intro"} \\ class @{text "c"} & @{text "c_class.intro"} \\ \end{tabular} *} text %mlref {* \begin{mldecls} @{index_ML_type binding} \\ @{index_ML Binding.empty: binding} \\ @{index_ML Binding.name: "string -> binding"} \\ @{index_ML Binding.qualify: "bool -> string -> binding -> binding"} \\ @{index_ML Binding.prefix: "bool -> string -> binding -> binding"} \\ @{index_ML Binding.conceal: "binding -> binding"} \\ @{index_ML Binding.print: "binding -> string"} \\ \end{mldecls} \begin{mldecls} @{index_ML_type Name_Space.naming} \\ @{index_ML Name_Space.default_naming: Name_Space.naming} \\ @{index_ML Name_Space.add_path: "string -> Name_Space.naming -> Name_Space.naming"} \\ @{index_ML Name_Space.full_name: "Name_Space.naming -> binding -> string"} \\ \end{mldecls} \begin{mldecls} @{index_ML_type Name_Space.T} \\ @{index_ML Name_Space.empty: "string -> Name_Space.T"} \\ @{index_ML Name_Space.merge: "Name_Space.T * Name_Space.T -> Name_Space.T"} \\ @{index_ML Name_Space.declare: "Proof.context -> bool -> Name_Space.naming -> binding -> Name_Space.T -> string * Name_Space.T"} \\ @{index_ML Name_Space.intern: "Name_Space.T -> string -> string"} \\ @{index_ML Name_Space.extern: "Proof.context -> Name_Space.T -> string -> string"} \\ @{index_ML Name_Space.is_concealed: "Name_Space.T -> string -> bool"} \end{mldecls} \begin{description} \item Type @{ML_type binding} represents the abstract concept of name bindings. \item @{ML Binding.empty} is the empty binding. \item @{ML Binding.name}~@{text "name"} produces a binding with base name @{text "name"}. Note that this lacks proper source position information; see also the ML antiquotation @{ML_antiquotation binding}. \item @{ML Binding.qualify}~@{text "mandatory name binding"} prefixes qualifier @{text "name"} to @{text "binding"}. The @{text "mandatory"} flag tells if this name component always needs to be given in name space accesses --- this is mostly @{text "false"} in practice. Note that this part of qualification is typically used in derived specification mechanisms. \item @{ML Binding.prefix} is similar to @{ML Binding.qualify}, but affects the system prefix. This part of extra qualification is typically used in the infrastructure for modular specifications, notably ``local theory targets'' (see also \chref{ch:local-theory}). \item @{ML Binding.conceal}~@{text "binding"} indicates that the binding shall refer to an entity that serves foundational purposes only. This flag helps to mark implementation details of specification mechanism etc. Other tools should not depend on the particulars of concealed entities (cf.\ @{ML Name_Space.is_concealed}). \item @{ML Binding.print}~@{text "binding"} produces a string representation for human-readable output, together with some formal markup that might get used in GUI front-ends, for example. \item Type @{ML_type Name_Space.naming} represents the abstract concept of a naming policy. \item @{ML Name_Space.default_naming} is the default naming policy. In a theory context, this is usually augmented by a path prefix consisting of the theory name. \item @{ML Name_Space.add_path}~@{text "path naming"} augments the naming policy by extending its path component. 
\item @{ML Name_Space.full_name}~@{text "naming binding"} turns a name binding (usually a basic name) into the fully qualified internal name, according to the given naming policy. \item Type @{ML_type Name_Space.T} represents name spaces. \item @{ML Name_Space.empty}~@{text "kind"} and @{ML Name_Space.merge}~@{text "(space\<^isub>1, space\<^isub>2)"} are the canonical operations for maintaining name spaces according to theory data management (\secref{sec:context-data}); @{text "kind"} is a formal comment to characterize the purpose of a name space. \item @{ML Name_Space.declare}~@{text "ctxt strict naming bindings space"} enters a name binding as fully qualified internal name into the name space, with external accesses determined by the naming policy. \item @{ML Name_Space.intern}~@{text "space name"} internalizes a (partially qualified) external name. This operation is mostly for parsing! Note that fully qualified names stemming from declarations are produced via @{ML "Name_Space.full_name"} and @{ML "Name_Space.declare"} (or their derivatives for @{ML_type theory} and @{ML_type Proof.context}). \item @{ML Name_Space.extern}~@{text "ctxt space name"} externalizes a (fully qualified) internal name. This operation is mostly for printing! User code should not rely on the precise result too much. \item @{ML Name_Space.is_concealed}~@{text "space name"} indicates whether @{text "name"} refers to a strictly private entity that other tools are supposed to ignore! \end{description} *} text %mlantiq {* \begin{matharray}{rcl} @{ML_antiquotation_def "binding"} & : & @{text ML_antiquotation} \\ \end{matharray} @{rail " @@{ML_antiquotation binding} name "} \begin{description} \item @{text "@{binding name}"} produces a binding with base name @{text "name"} and the source position taken from the concrete syntax of this antiquotation. In many situations this is more appropriate than the more basic @{ML Binding.name} function. \end{description} *} text %mlex {* The following example yields the source position of some concrete binding inlined into the text: *} ML {* Binding.pos_of @{binding here} *} text {* \medskip That position can be also printed in a message as follows: *} ML_command {* writeln ("Look here" ^ Position.str_of (Binding.pos_of @{binding here})) *} text {* This illustrates a key virtue of formalized bindings as opposed to raw specifications of base names: the system can use this additional information for feedback given to the user (error messages etc.). *} end