# HG changeset patch # User wenzelm # Date 1157467343 -7200 # Node ID 6d3f144cc1bdd9276aa98b9cf201c2ce577f4da3 # Parent a04bf731ceb64f132a18f2d2932b46f161993802 more on names; diff -r a04bf731ceb6 -r 6d3f144cc1bd doc-src/IsarImplementation/Thy/prelim.thy --- a/doc-src/IsarImplementation/Thy/prelim.thy Mon Sep 04 20:07:55 2006 +0200 +++ b/doc-src/IsarImplementation/Thy/prelim.thy Tue Sep 05 16:42:23 2006 +0200 @@ -301,7 +301,8 @@ \end{description} *} -subsection {* Context data *} + +subsection {* Context data \label{sec:context-data} *} text {* The main purpose of theory and proof contexts is to manage arbitrary @@ -403,100 +404,75 @@ *} -section {* Name spaces *} - -text %FIXME {* - FIXME tune - - By general convention, each kind of formal entities (logical - constant, type, type class, theorem, method etc.) lives in a - separate name space. It is usually clear from the syntactic context - of a name, which kind of entity it refers to. For example, proof - method @{text "foo"} vs.\ theorem @{text "foo"} vs.\ logical - constant @{text "foo"} are easily distinguished thanks to the design - of the concrete outer syntax. A notable exception are logical - identifiers within a term (\secref{sec:terms}): constants, fixed - variables, and bound variables all share the same identifier syntax, - but are distinguished by their scope. +section {* Names *} - Name spaces are organized uniformly, as a collection of qualified - names consisting of a sequence of basic name components separated by - dots: @{text "Bar.bar.foo"}, @{text "Bar.foo"}, and @{text "foo"} - are examples for qualified names. +text {* + In principle, a name is just a string, but there are various + convention for encoding additional structure. - Despite the independence of names of different kinds, certain naming - conventions may relate them to each other. For example, a constant - @{text "foo"} could be accompanied with theorems @{text - "foo.intro"}, @{text "foo.elim"}, @{text "foo.simps"} etc. The same - could happen for a type @{text "foo"}, but this is apt to cause - clashes in the theorem name space! To avoid this, there is an - additional convention to add a suffix that determines the original - kind. For example, constant @{text "foo"} could associated with - theorem @{text "foo.intro"}, type @{text "foo"} with theorem @{text - "foo_type.intro"}, and type class @{text "foo"} with @{text - "foo_class.intro"}. - - \medskip Name components are subdivided into \emph{symbols}, which - constitute the smallest textual unit in Isabelle --- raw characters - are normally not encountered. + For example, the string ``@{text "Foo.bar.baz"}'' is considered as a + qualified name. The most basic constituents of names may have their + own structure, e.g.\ the string ``\verb,\,\verb,,'' is + considered as a single symbol (printed as ``@{text "\"}''). *} subsection {* Strings of symbols *} -text %FIXME {* - FIXME tune +text {* + \glossary{Symbol}{The smallest unit of text in Isabelle, subsumes + plain ASCII characters as well as an infinite collection of named + symbols (for greek, math etc.).} - Isabelle strings consist of a sequence of - symbols\glossary{Symbol}{The smallest unit of text in Isabelle, - subsumes plain ASCII characters as well as an infinite collection of - named symbols (for greek, math etc.).}, which are either packed as - an actual @{text "string"}, or represented as a list. Each symbol - is in itself a small string of the following form: + A \emph{symbol} constitutes the smallest textual unit in Isabelle + --- raw characters are normally not encountered. Isabelle strings + consist of a sequence of symbols, represented as a packed string or + a list of symbols. Each symbol is in itself a small string, which + is of one of the following forms: \begin{enumerate} - \item either a singleton ASCII character ``@{text "c"}'' (with - character code 0--127), for example ``\verb,a,'', + \item singleton ASCII character ``@{text "c"}'' (character code + 0--127), for example ``\verb,a,'', - \item or a regular symbol ``\verb,\,\verb,<,@{text - "ident"}\verb,>,'', for example ``\verb,\,\verb,,'', + \item regular symbol ``\verb,\,\verb,<,@{text "ident"}\verb,>,'', + for example ``\verb,\,\verb,,'', - \item or a control symbol ``\verb,\,\verb,<^,@{text - "ident"}\verb,>,'', for example ``\verb,\,\verb,<^bold>,'', + \item control symbol ``\verb,\,\verb,<^,@{text "ident"}\verb,>,'', + for example ``\verb,\,\verb,<^bold>,'', - \item or a raw control symbol ``\verb,\,\verb,<^raw:,@{text - "\"}\verb,>,'' where ``@{text "\"}'' refers to any printable ASCII - character (excluding ``\verb,.,'' and ``\verb,>,'') or non-ASCII - character, for example ``\verb,\,\verb,<^raw:$\sum_{i = 1}^n$>,'', + \item raw symbol ``\verb,\,\verb,<^raw:,@{text text}\verb,>,'' where + @{text text} is constists of printable characters excluding + ``\verb,.,'' and ``\verb,>,'', for example + ``\verb,\,\verb,<^raw:$\sum_{i = 1}^n$>,'', - \item or a numbered raw control symbol ``\verb,\,\verb,<^raw,@{text - "nnn"}\verb,>, where @{text "nnn"} are digits, for example + \item numbered raw control symbol ``\verb,\,\verb,<^raw,@{text + n}\verb,>, where @{text n} consists of digits, for example ``\verb,\,\verb,<^raw42>,''. \end{enumerate} - The @{text "ident"} syntax for symbol names is @{text "letter - (letter | digit)\<^sup>*"}, where @{text "letter = A..Za..z"} and - @{text "digit = 0..9"}. There are infinitely many regular symbols - and control symbols available, but a certain collection of standard - symbols is treated specifically. For example, + \noindent The @{text "ident"} syntax for symbol names is @{text + "letter (letter | digit)\<^sup>*"}, where @{text "letter = + A..Za..z"} and @{text "digit = 0..9"}. There are infinitely many + regular symbols and control symbols, but a fixed collection of + standard symbols is treated specifically. For example, ``\verb,\,\verb,,'' is classified as a (non-ASCII) letter, which means it may occur within regular Isabelle identifier syntax. - Output of symbols depends on the print mode - (\secref{sec:print-mode}). For example, the standard {\LaTeX} setup - of the Isabelle document preparation system would present + Note that the character set underlying Isabelle symbols is plain + 7-bit ASCII. Since 8-bit characters are passed through + transparently, Isabelle may process Unicode/UCS data (in UTF-8 + encoding) as well. Unicode provides its own collection of + mathematical symbols, but there is no built-in link to the ones of + Isabelle. + + \medskip Output of Isabelle symbols depends on the print mode + (\secref{FIXME}). For example, the standard {\LaTeX} setup of the + Isabelle document preparation system would present ``\verb,\,\verb,,'' as @{text "\"}, and ``\verb,\,\verb,<^bold>,\verb,\,\verb,,'' as @{text "\<^bold>\"}. - - \medskip It is important to note that the character set underlying - Isabelle symbols is plain 7-bit ASCII. Since 8-bit characters are - passed through transparently, Isabelle may easily process - Unicode/UCS data as well (using UTF-8 encoding). Unicode provides - its own collection of mathematical symbols, but there is no built-in - link to the ones of Isabelle. *} text %mlref {* @@ -517,16 +493,15 @@ type is an alias for @{ML_type "string"}, but emphasizes the specific format encountered here. - \item @{ML "Symbol.explode"}~@{text "s"} produces a symbol list from - the packed form that is encountered in most practical situations. - This function supercedes @{ML "String.explode"} for virtually all - purposes of manipulating text in Isabelle! Plain @{ML "implode"} - may still be used for the reverse operation. + \item @{ML "Symbol.explode"}~@{text "str"} produces a symbol list + from the packed form that. This function supercedes @{ML + "String.explode"} for virtually all purposes of manipulating text in + Isabelle! \item @{ML "Symbol.is_letter"}, @{ML "Symbol.is_digit"}, @{ML - "Symbol.is_quasi"}, @{ML "Symbol.is_blank"} classify certain symbols - (both ASCII and several named ones) according to fixed syntactic - conventions of Isabelle, cf.\ \cite{isabelle-isar-ref}. + "Symbol.is_quasi"}, @{ML "Symbol.is_blank"} classify standard + symbols according to fixed syntactic conventions of Isabelle, cf.\ + \cite{isabelle-isar-ref}. \item @{ML_type "Symbol.sym"} is a concrete datatype that represents the different kinds of symbols explicitly with constructors @{ML @@ -540,46 +515,264 @@ *} -subsection {* Qualified names *} +subsection {* Basic names \label{sec:basic-names} *} + +text {* + A \emph{basic name} essentially consists of a single Isabelle + identifier. There are conventions to mark separate classes of basic + names, by attaching a suffix of underscores (@{text "_"}): one + underscore means \emph{internal name}, two underscores means + \emph{Skolem name}, three underscores means \emph{internal Skolem + name}. + + For example, the basic name @{text "foo"} has the internal version + @{text "foo_"}, with Skolem versions @{text "foo__"} and @{text + "foo___"}, respectively. + + Such special versions are required for bookkeeping of names that are + apart from anything that may appear in the text given by the user. + In particular, system generated variables in high-level Isar proof + contexts are usually marked as internal, which prevents mysterious + name references such as @{text "xaa"} in the text. + + \medskip Basic manipulations of binding scopes requires names to be + modified. A \emph{name context} contains a collection of already + used names, which is maintained by the @{text "declare"} operation. + + The @{text "invents"} operation derives a number of fresh names + derived from a given starting point. For example, three names + derived from @{text "a"} are @{text "a"}, @{text "b"}, @{text "c"}, + provided there are no clashes with already used names. + + The @{text "variants"} operation produces fresh names by + incrementing given names as to base-26 numbers (with digits @{text + "a..z"}). For example, name @{text "foo"} results in variants + @{text "fooa"}, @{text "foob"}, @{text "fooc"}, \dots, @{text + "fooaa"}, @{text "fooab"}, \dots; each renaming step picks the next + unused variant from this list. +*} + +text %mlref {* + \begin{mldecls} + @{index_ML Name.internal: "string -> string"} \\ + @{index_ML Name.skolem: "string -> string"} \\[1ex] + @{index_ML_type Name.context} \\ + @{index_ML Name.context: Name.context} \\ + @{index_ML Name.declare: "string -> Name.context -> Name.context"} \\ + @{index_ML Name.invents: "Name.context -> string -> int -> string list"} \\ + @{index_ML Name.variants: "string list -> Name.context -> string list * Name.context"} \\ + \end{mldecls} + + \begin{description} + + \item @{ML Name.internal}~@{text "name"} produces an internal name + by adding one underscore. + + \item @{ML Name.skolem}~@{text "name"} produces a Skolem name by + adding two underscores. + + \item @{ML_type Name.context} represents the context of already used + names; the initial value is @{ML "Name.context"}. + + \item @{ML Name.declare}~@{text "name"} declares @{text "name"} as + being used. -text %FIXME {* - FIXME tune + \item @{ML Name.invents}~@{text "context base n"} produces @{text + "n"} fresh names derived from @{text "base"}. + + \end{description} +*} + + +subsection {* Indexed names *} + +text {* + An \emph{indexed name} (or @{text "indexname"}) is a pair of a basic + name with a natural number. This representation allows efficient + renaming by incrementing the second component only. To rename two + collections of indexnames apart from each other, first determine the + maximum index @{text "maxidx"} of the first collection, then + increment all indexes of the second collection by @{text "maxidx + + 1"}. Note that the maximum index of an empty collection is @{text + "-1"}. + + Isabelle syntax observes the following rules for representing an + indexname @{text "(x, i)"} as a packed string: + + \begin{itemize} + + \item @{text "?x"} if @{text "x"} does not end with a digit and @{text "i = 0"}. + + \item @{text "?xi"} if @{text "x"} does not end with a digit, + + \item @{text "?x.i"} else. + + \end{itemize} - A \emph{qualified name} essentially consists of a non-empty list of - basic name components. The packad notation uses a dot as separator, - as in @{text "A.b"}, for example. The very last component is called + Occasionally, basic names and indexed names are injected into the + same pair type: the (improper) indexname @{text "(x, -1)"} is used + to encode basic names. + + \medskip Indexnames may acquire arbitrary large index numbers over + time. Results are usually normalized towards @{text "0"} at certain + checkpoints, such that the very end of a proof. This works by + producing variants of the corresponding basic names + (\secref{sec:basic-names}). For example, the collection @{text + "?x.1, ?x.7, ?x.42"} then becomes @{text "?x, ?xa, ?xb"}. +*} + +text %mlref {* + \begin{mldecls} + @{index_ML_type indexname} \\ + \end{mldecls} + + \begin{description} + + \item @{ML_type indexname} represents indexed names. This is an + abbreviation for @{ML_type "string * int"}. The second component is + usually non-negative, except for situations where @{text "(x, -1)"} + is used to embed plain names. + + \end{description} +*} + + +subsection {* Qualified names and name spaces *} + +text {* + A \emph{qualified name} consists of a non-empty sequence of basic + name components. The packed representation a dot as separator, for + example in ``@{text "A.b.c"}''. The last component is called \emph{base} name, the remaining prefix \emph{qualifier} (which may be empty). - A @{text "naming"} policy tells how to produce fully qualified names - from a given specification. The @{text "full"} operation applies - performs naming of a name; the policy is usually taken from the - context. For example, a common policy is to attach an implicit - prefix. + The empty name is commonly used as an indication of unnamed + entities, if this makes any sense. The operations on qualified + names are smart enough to pass through such improper names + unchanged. + + The basic idea of qualified names is to encode a hierarchically + structured name spaces by recording the access path as part of the + name. For example, @{text "A.b.c"} may be understood as a local + entity @{text "c"} within a local structure @{text "b"} of the + enclosing structure @{text "A"}. Typically, name space hierarchies + consist of 1--3 levels, but this need not be always so. + + \medskip A @{text "naming"} policy tells how to turn a name + specification into a fully qualified internal name (by the @{text + "full"} operation), and how to fully qualified names may be accessed + externally. + + For example, the default naming prefixes an implicit path from the + context: @{text "x"} is becomes @{text "path.x"} internally; the + standard accesses include @{text "x"}, @{text "path.x"}, and further + partial @{text "path"} specifications. + + Normally, the naming is implicit in the theory or proof context. + There are separate versions of the corresponding operations for these + context types. - A @{text "name space"} manages declarations of fully qualified - names. There are separate operations to @{text "declare"}, @{text - "intern"}, and @{text "extern"} names. + \medskip A @{text "name space"} manages a collection of fully + internalized names, together with a mapping between external names + and internal names (in both directions). The corresponding @{text + "intern"} and @{text "extern"} operations are mostly used for + parsing and printing only! The @{text "declare"} operation augments + a name space according to a given naming policy. + + By general convention, there are separate name spaces for each kind + of formal entity, such as logical constant, type, type class, + theorem etc. It is usually clear from the occurrence in concrete + syntax (or from the scope) which kind of entity a name refers to. - FIXME + For example, the very same name @{text "c"} may be used uniformly + for a constant, type, type class, which are from separate syntactic + categories. There is a common convention to name theorems + systematically, according to the name of the main logical entity + being involved, such as @{text "c.intro"} and @{text "c.elim"} for + theorems related to constant @{text "c"}. + + This technique of mapping names from one space into another requires + some care in order to avoid conflicts. In particular, theorem names + derived from type or class names are better suffixed in addition to + the usual qualification, e.g.\ @{text "c_type.intro"} and @{text + "c_class.intro"} for theorems related to type @{text "c"} and class + @{text "c"}, respectively. *} -text %mlref FIXME - +text %mlref {* + \begin{mldecls} + @{index_ML NameSpace.base: "string -> string"} \\ + @{index_ML NameSpace.drop_base: "string -> string"} \\ + @{index_ML NameSpace.append: "string -> string -> string"} \\ + @{index_ML NameSpace.pack: "string list -> string"} \\ + @{index_ML NameSpace.unpack: "string -> string list"} \\[1ex] + @{index_ML_type NameSpace.naming} \\ + @{index_ML NameSpace.default_naming: NameSpace.naming} \\ + @{index_ML NameSpace.add_path: "string -> NameSpace.naming -> NameSpace.naming"} \\ + @{index_ML NameSpace.full: "NameSpace.naming -> string -> string"} \\[1ex] + @{index_ML_type NameSpace.T} \\ + @{index_ML NameSpace.empty: NameSpace.T} \\ + @{index_ML NameSpace.merge: "NameSpace.T * NameSpace.T -> NameSpace.T"} \\ + @{index_ML NameSpace.declare: "NameSpace.naming -> string -> NameSpace.T -> NameSpace.T"} \\ + @{index_ML NameSpace.intern: "NameSpace.T -> string -> string"} \\ + @{index_ML NameSpace.extern: "NameSpace.T -> string -> string"} \\ + \end{mldecls} -section {* Structured output *} + \begin{description} + + \item @{ML NameSpace.base}~@{text "name"} returns the base name of a + qualified name. + + \item @{ML NameSpace.drop_base}~@{text "name"} returns the qualifier + of a qualified name. -subsection {* Pretty printing *} + \item @{ML NameSpace.append}~@{text "name\<^isub>1 name\<^isub>2"} + appends two qualified names. -text FIXME + \item @{ML NameSpace.pack}~@{text "name"} and @{text + "NameSpace.unpack"}~@{text "names"} convert between the packed + string representation and explicit list form of qualified names. + + \item @{ML_type NameSpace.naming} represents the abstract concept of + a naming policy. -subsection {* Output channels *} + \item @{ML NameSpace.default_naming} is the default naming policy. + In a theory context, this is usually augmented by a path prefix + consisting of the theory name. + + \item @{ML NameSpace.add_path}~@{text "path naming"} augments the + naming policy by extending its path. -text FIXME + \item @{ML NameSpace.full}@{text "naming name"} turns a name + specification (usually a basic name) into the fully qualified + internal version, according to the given naming policy. + + \item @{ML_type NameSpace.T} represents name spaces. + + \item @{ML NameSpace.empty} and @{ML NameSpace.merge}~@{text + "(space\<^isub>1, space\<^isub>2)"} provide basic operations for + building name spaces in accordance to the usual theory data + management (\secref{sec:context-data}). -subsection {* Print modes \label{sec:print-mode} *} + \item @{ML NameSpace.declare}~@{text "naming name space"} enters a + fully qualified name into the name space, with partial accesses + being derived from the given policy. + + \item @{ML NameSpace.intern}~@{text "space name"} internalizes a + (partially qualified) external name. -text FIXME + This operation is mostly for parsing. Note that fully qualified + names stemming from declarations are produced via @{ML + "NameSpace.full"} (or derivatives for @{ML_type theory} or @{ML_type + Proof.context}). + \item @{ML NameSpace.extern}~@{text "space name"} externalizes a + (fully qualified) internal name. + + This operation is mostly for printing. Note unqualified names are + produced via @{ML NameSpace.base}. + + \end{description} +*} end