# HG changeset patch
# User wenzelm
# Date 1157467343 -7200
# Node ID 6d3f144cc1bdd9276aa98b9cf201c2ce577f4da3
# Parent  a04bf731ceb64f132a18f2d2932b46f161993802
more on names;

diff -r a04bf731ceb6 -r 6d3f144cc1bd doc-src/IsarImplementation/Thy/prelim.thy
--- a/doc-src/IsarImplementation/Thy/prelim.thy	Mon Sep 04 20:07:55 2006 +0200
+++ b/doc-src/IsarImplementation/Thy/prelim.thy	Tue Sep 05 16:42:23 2006 +0200
@@ -301,7 +301,8 @@
   \end{description}
 *}
 
-subsection {* Context data *}
+
+subsection {* Context data \label{sec:context-data} *}
 
 text {*
   The main purpose of theory and proof contexts is to manage arbitrary
@@ -403,100 +404,75 @@
 *}
 
 
-section {* Name spaces *}
-
-text %FIXME {*
-  FIXME tune
-
-  By general convention, each kind of formal entities (logical
-  constant, type, type class, theorem, method etc.) lives in a
-  separate name space.  It is usually clear from the syntactic context
-  of a name, which kind of entity it refers to.  For example, proof
-  method @{text "foo"} vs.\ theorem @{text "foo"} vs.\ logical
-  constant @{text "foo"} are easily distinguished thanks to the design
-  of the concrete outer syntax.  A notable exception are logical
-  identifiers within a term (\secref{sec:terms}): constants, fixed
-  variables, and bound variables all share the same identifier syntax,
-  but are distinguished by their scope.
+section {* Names *}
 
-  Name spaces are organized uniformly, as a collection of qualified
-  names consisting of a sequence of basic name components separated by
-  dots: @{text "Bar.bar.foo"}, @{text "Bar.foo"}, and @{text "foo"}
-  are examples for qualified names.
+text {*
+  In principle, a name is just a string, but there are various
+  convention for encoding additional structure.
 
-  Despite the independence of names of different kinds, certain naming
-  conventions may relate them to each other.  For example, a constant
-  @{text "foo"} could be accompanied with theorems @{text
-  "foo.intro"}, @{text "foo.elim"}, @{text "foo.simps"} etc.  The same
-  could happen for a type @{text "foo"}, but this is apt to cause
-  clashes in the theorem name space!  To avoid this, there is an
-  additional convention to add a suffix that determines the original
-  kind.  For example, constant @{text "foo"} could associated with
-  theorem @{text "foo.intro"}, type @{text "foo"} with theorem @{text
-  "foo_type.intro"}, and type class @{text "foo"} with @{text
-  "foo_class.intro"}.
-
-  \medskip Name components are subdivided into \emph{symbols}, which
-  constitute the smallest textual unit in Isabelle --- raw characters
-  are normally not encountered.
+  For example, the string ``@{text "Foo.bar.baz"}'' is considered as a
+  qualified name.  The most basic constituents of names may have their
+  own structure, e.g.\ the string ``\verb,\,\verb,<alpha>,'' is
+  considered as a single symbol (printed as ``@{text "\<alpha>"}'').
 *}
 
 
 subsection {* Strings of symbols *}
 
-text %FIXME {*
-  FIXME tune
+text {*
+  \glossary{Symbol}{The smallest unit of text in Isabelle, subsumes
+  plain ASCII characters as well as an infinite collection of named
+  symbols (for greek, math etc.).}
 
-  Isabelle strings consist of a sequence of
-  symbols\glossary{Symbol}{The smallest unit of text in Isabelle,
-  subsumes plain ASCII characters as well as an infinite collection of
-  named symbols (for greek, math etc.).}, which are either packed as
-  an actual @{text "string"}, or represented as a list.  Each symbol
-  is in itself a small string of the following form:
+  A \emph{symbol} constitutes the smallest textual unit in Isabelle
+  --- raw characters are normally not encountered.  Isabelle strings
+  consist of a sequence of symbols, represented as a packed string or
+  a list of symbols.  Each symbol is in itself a small string, which
+  is of one of the following forms:
 
   \begin{enumerate}
 
-  \item either a singleton ASCII character ``@{text "c"}'' (with
-  character code 0--127), for example ``\verb,a,'',
+  \item singleton ASCII character ``@{text "c"}'' (character code
+  0--127), for example ``\verb,a,'',
 
-  \item or a regular symbol ``\verb,\,\verb,<,@{text
-  "ident"}\verb,>,'', for example ``\verb,\,\verb,<alpha>,'',
+  \item regular symbol ``\verb,\,\verb,<,@{text "ident"}\verb,>,'',
+  for example ``\verb,\,\verb,<alpha>,'',
 
-  \item or a control symbol ``\verb,\,\verb,<^,@{text
-  "ident"}\verb,>,'', for example ``\verb,\,\verb,<^bold>,'',
+  \item control symbol ``\verb,\,\verb,<^,@{text "ident"}\verb,>,'',
+  for example ``\verb,\,\verb,<^bold>,'',
 
-  \item or a raw control symbol ``\verb,\,\verb,<^raw:,@{text
-  "\<dots>"}\verb,>,'' where ``@{text "\<dots>"}'' refers to any printable ASCII
-  character (excluding ``\verb,.,'' and ``\verb,>,'') or non-ASCII
-  character, for example ``\verb,\,\verb,<^raw:$\sum_{i = 1}^n$>,'',
+  \item raw symbol ``\verb,\,\verb,<^raw:,@{text text}\verb,>,'' where
+  @{text text} is constists of printable characters excluding
+  ``\verb,.,'' and ``\verb,>,'', for example
+  ``\verb,\,\verb,<^raw:$\sum_{i = 1}^n$>,'',
 
-  \item or a numbered raw control symbol ``\verb,\,\verb,<^raw,@{text
-  "nnn"}\verb,>, where @{text "nnn"} are digits, for example
+  \item numbered raw control symbol ``\verb,\,\verb,<^raw,@{text
+  n}\verb,>, where @{text n} consists of digits, for example
   ``\verb,\,\verb,<^raw42>,''.
 
   \end{enumerate}
 
-  The @{text "ident"} syntax for symbol names is @{text "letter
-  (letter | digit)\<^sup>*"}, where @{text "letter = A..Za..z"} and
-  @{text "digit = 0..9"}.  There are infinitely many regular symbols
-  and control symbols available, but a certain collection of standard
-  symbols is treated specifically.  For example,
+  \noindent The @{text "ident"} syntax for symbol names is @{text
+  "letter (letter | digit)\<^sup>*"}, where @{text "letter =
+  A..Za..z"} and @{text "digit = 0..9"}.  There are infinitely many
+  regular symbols and control symbols, but a fixed collection of
+  standard symbols is treated specifically.  For example,
   ``\verb,\,\verb,<alpha>,'' is classified as a (non-ASCII) letter,
   which means it may occur within regular Isabelle identifier syntax.
 
-  Output of symbols depends on the print mode
-  (\secref{sec:print-mode}).  For example, the standard {\LaTeX} setup
-  of the Isabelle document preparation system would present
+  Note that the character set underlying Isabelle symbols is plain
+  7-bit ASCII.  Since 8-bit characters are passed through
+  transparently, Isabelle may process Unicode/UCS data (in UTF-8
+  encoding) as well.  Unicode provides its own collection of
+  mathematical symbols, but there is no built-in link to the ones of
+  Isabelle.
+
+  \medskip Output of Isabelle symbols depends on the print mode
+  (\secref{FIXME}).  For example, the standard {\LaTeX} setup of the
+  Isabelle document preparation system would present
   ``\verb,\,\verb,<alpha>,'' as @{text "\<alpha>"}, and
   ``\verb,\,\verb,<^bold>,\verb,\,\verb,<alpha>,'' as @{text
   "\<^bold>\<alpha>"}.
-
-  \medskip It is important to note that the character set underlying
-  Isabelle symbols is plain 7-bit ASCII.  Since 8-bit characters are
-  passed through transparently, Isabelle may easily process
-  Unicode/UCS data as well (using UTF-8 encoding).  Unicode provides
-  its own collection of mathematical symbols, but there is no built-in
-  link to the ones of Isabelle.
 *}
 
 text %mlref {*
@@ -517,16 +493,15 @@
   type is an alias for @{ML_type "string"}, but emphasizes the
   specific format encountered here.
 
-  \item @{ML "Symbol.explode"}~@{text "s"} produces a symbol list from
-  the packed form that is encountered in most practical situations.
-  This function supercedes @{ML "String.explode"} for virtually all
-  purposes of manipulating text in Isabelle!  Plain @{ML "implode"}
-  may still be used for the reverse operation.
+  \item @{ML "Symbol.explode"}~@{text "str"} produces a symbol list
+  from the packed form that.  This function supercedes @{ML
+  "String.explode"} for virtually all purposes of manipulating text in
+  Isabelle!
 
   \item @{ML "Symbol.is_letter"}, @{ML "Symbol.is_digit"}, @{ML
-  "Symbol.is_quasi"}, @{ML "Symbol.is_blank"} classify certain symbols
-  (both ASCII and several named ones) according to fixed syntactic
-  conventions of Isabelle, cf.\ \cite{isabelle-isar-ref}.
+  "Symbol.is_quasi"}, @{ML "Symbol.is_blank"} classify standard
+  symbols according to fixed syntactic conventions of Isabelle, cf.\
+  \cite{isabelle-isar-ref}.
 
   \item @{ML_type "Symbol.sym"} is a concrete datatype that represents
   the different kinds of symbols explicitly with constructors @{ML
@@ -540,46 +515,264 @@
 *}
 
 
-subsection {* Qualified names *}
+subsection {* Basic names \label{sec:basic-names} *}
+
+text {*
+  A \emph{basic name} essentially consists of a single Isabelle
+  identifier.  There are conventions to mark separate classes of basic
+  names, by attaching a suffix of underscores (@{text "_"}): one
+  underscore means \emph{internal name}, two underscores means
+  \emph{Skolem name}, three underscores means \emph{internal Skolem
+  name}.
+
+  For example, the basic name @{text "foo"} has the internal version
+  @{text "foo_"}, with Skolem versions @{text "foo__"} and @{text
+  "foo___"}, respectively.
+
+  Such special versions are required for bookkeeping of names that are
+  apart from anything that may appear in the text given by the user.
+  In particular, system generated variables in high-level Isar proof
+  contexts are usually marked as internal, which prevents mysterious
+  name references such as @{text "xaa"} in the text.
+
+  \medskip Basic manipulations of binding scopes requires names to be
+  modified.  A \emph{name context} contains a collection of already
+  used names, which is maintained by the @{text "declare"} operation.
+
+  The @{text "invents"} operation derives a number of fresh names
+  derived from a given starting point.  For example, three names
+  derived from @{text "a"} are @{text "a"}, @{text "b"}, @{text "c"},
+  provided there are no clashes with already used names.
+
+  The @{text "variants"} operation produces fresh names by
+  incrementing given names as to base-26 numbers (with digits @{text
+  "a..z"}).  For example, name @{text "foo"} results in variants
+  @{text "fooa"}, @{text "foob"}, @{text "fooc"}, \dots, @{text
+  "fooaa"}, @{text "fooab"}, \dots; each renaming step picks the next
+  unused variant from this list.
+*}
+
+text %mlref {*
+  \begin{mldecls}
+  @{index_ML Name.internal: "string -> string"} \\
+  @{index_ML Name.skolem: "string -> string"} \\[1ex]
+  @{index_ML_type Name.context} \\
+  @{index_ML Name.context: Name.context} \\
+  @{index_ML Name.declare: "string -> Name.context -> Name.context"} \\
+  @{index_ML Name.invents: "Name.context -> string -> int -> string list"} \\
+  @{index_ML Name.variants: "string list -> Name.context -> string list * Name.context"} \\
+  \end{mldecls}
+
+  \begin{description}
+
+  \item @{ML Name.internal}~@{text "name"} produces an internal name
+  by adding one underscore.
+
+  \item @{ML Name.skolem}~@{text "name"} produces a Skolem name by
+  adding two underscores.
+
+  \item @{ML_type Name.context} represents the context of already used
+  names; the initial value is @{ML "Name.context"}.
+
+  \item @{ML Name.declare}~@{text "name"} declares @{text "name"} as
+  being used.
 
-text %FIXME {*
-  FIXME tune
+  \item @{ML Name.invents}~@{text "context base n"} produces @{text
+  "n"} fresh names derived from @{text "base"}.
+
+  \end{description}
+*}
+
+
+subsection {* Indexed names *}
+
+text {*
+  An \emph{indexed name} (or @{text "indexname"}) is a pair of a basic
+  name with a natural number.  This representation allows efficient
+  renaming by incrementing the second component only.  To rename two
+  collections of indexnames apart from each other, first determine the
+  maximum index @{text "maxidx"} of the first collection, then
+  increment all indexes of the second collection by @{text "maxidx +
+  1"}.  Note that the maximum index of an empty collection is @{text
+  "-1"}.
+
+  Isabelle syntax observes the following rules for representing an
+  indexname @{text "(x, i)"} as a packed string:
+
+  \begin{itemize}
+
+  \item @{text "?x"} if @{text "x"} does not end with a digit and @{text "i = 0"}.
+
+  \item @{text "?xi"} if @{text "x"} does not end with a digit,
+
+  \item @{text "?x.i"} else.
+
+  \end{itemize}
 
-  A \emph{qualified name} essentially consists of a non-empty list of
-  basic name components.  The packad notation uses a dot as separator,
-  as in @{text "A.b"}, for example.  The very last component is called
+  Occasionally, basic names and indexed names are injected into the
+  same pair type: the (improper) indexname @{text "(x, -1)"} is used
+  to encode basic names.
+
+  \medskip Indexnames may acquire arbitrary large index numbers over
+  time.  Results are usually normalized towards @{text "0"} at certain
+  checkpoints, such that the very end of a proof.  This works by
+  producing variants of the corresponding basic names
+  (\secref{sec:basic-names}).  For example, the collection @{text
+  "?x.1, ?x.7, ?x.42"} then becomes @{text "?x, ?xa, ?xb"}.
+*}
+
+text %mlref {*
+  \begin{mldecls}
+  @{index_ML_type indexname} \\
+  \end{mldecls}
+
+  \begin{description}
+
+  \item @{ML_type indexname} represents indexed names.  This is an
+  abbreviation for @{ML_type "string * int"}.  The second component is
+  usually non-negative, except for situations where @{text "(x, -1)"}
+  is used to embed plain names.
+
+  \end{description}
+*}
+
+
+subsection {* Qualified names and name spaces *}
+
+text {*
+  A \emph{qualified name} consists of a non-empty sequence of basic
+  name components.  The packed representation a dot as separator, for
+  example in ``@{text "A.b.c"}''.  The last component is called
   \emph{base} name, the remaining prefix \emph{qualifier} (which may
   be empty).
 
-  A @{text "naming"} policy tells how to produce fully qualified names
-  from a given specification.  The @{text "full"} operation applies
-  performs naming of a name; the policy is usually taken from the
-  context.  For example, a common policy is to attach an implicit
-  prefix.
+  The empty name is commonly used as an indication of unnamed
+  entities, if this makes any sense.  The operations on qualified
+  names are smart enough to pass through such improper names
+  unchanged.
+
+  The basic idea of qualified names is to encode a hierarchically
+  structured name spaces by recording the access path as part of the
+  name.  For example, @{text "A.b.c"} may be understood as a local
+  entity @{text "c"} within a local structure @{text "b"} of the
+  enclosing structure @{text "A"}.  Typically, name space hierarchies
+  consist of 1--3 levels, but this need not be always so.
+
+  \medskip A @{text "naming"} policy tells how to turn a name
+  specification into a fully qualified internal name (by the @{text
+  "full"} operation), and how to fully qualified names may be accessed
+  externally.
+
+  For example, the default naming prefixes an implicit path from the
+  context: @{text "x"} is becomes @{text "path.x"} internally; the
+  standard accesses include @{text "x"}, @{text "path.x"}, and further
+  partial @{text "path"} specifications.
+
+  Normally, the naming is implicit in the theory or proof context.
+  There are separate versions of the corresponding operations for these
+  context types.
 
-  A @{text "name space"} manages declarations of fully qualified
-  names.  There are separate operations to @{text "declare"}, @{text
-  "intern"}, and @{text "extern"} names.
+  \medskip A @{text "name space"} manages a collection of fully
+  internalized names, together with a mapping between external names
+  and internal names (in both directions).  The corresponding @{text
+  "intern"} and @{text "extern"} operations are mostly used for
+  parsing and printing only!  The @{text "declare"} operation augments
+  a name space according to a given naming policy.
+
+  By general convention, there are separate name spaces for each kind
+  of formal entity, such as logical constant, type, type class,
+  theorem etc.  It is usually clear from the occurrence in concrete
+  syntax (or from the scope) which kind of entity a name refers to.
 
-  FIXME
+  For example, the very same name @{text "c"} may be used uniformly
+  for a constant, type, type class, which are from separate syntactic
+  categories.  There is a common convention to name theorems
+  systematically, according to the name of the main logical entity
+  being involved, such as @{text "c.intro"} and @{text "c.elim"} for
+  theorems related to constant @{text "c"}.
+
+  This technique of mapping names from one space into another requires
+  some care in order to avoid conflicts.  In particular, theorem names
+  derived from type or class names are better suffixed in addition to
+  the usual qualification, e.g.\ @{text "c_type.intro"} and @{text
+  "c_class.intro"} for theorems related to type @{text "c"} and class
+  @{text "c"}, respectively.
 *}
 
-text %mlref FIXME
-
+text %mlref {*
+  \begin{mldecls}
+  @{index_ML NameSpace.base: "string -> string"} \\
+  @{index_ML NameSpace.drop_base: "string -> string"} \\
+  @{index_ML NameSpace.append: "string -> string -> string"} \\
+  @{index_ML NameSpace.pack: "string list -> string"} \\
+  @{index_ML NameSpace.unpack: "string -> string list"} \\[1ex]
+  @{index_ML_type NameSpace.naming} \\
+  @{index_ML NameSpace.default_naming: NameSpace.naming} \\
+  @{index_ML NameSpace.add_path: "string -> NameSpace.naming -> NameSpace.naming"} \\
+  @{index_ML NameSpace.full: "NameSpace.naming -> string -> string"} \\[1ex]
+  @{index_ML_type NameSpace.T} \\
+  @{index_ML NameSpace.empty: NameSpace.T} \\
+  @{index_ML NameSpace.merge: "NameSpace.T * NameSpace.T -> NameSpace.T"} \\
+  @{index_ML NameSpace.declare: "NameSpace.naming -> string -> NameSpace.T -> NameSpace.T"} \\
+  @{index_ML NameSpace.intern: "NameSpace.T -> string -> string"} \\
+  @{index_ML NameSpace.extern: "NameSpace.T -> string -> string"} \\
+  \end{mldecls}
 
-section {* Structured output *}
+  \begin{description}
+
+  \item @{ML NameSpace.base}~@{text "name"} returns the base name of a
+  qualified name.
+
+  \item @{ML NameSpace.drop_base}~@{text "name"} returns the qualifier
+  of a qualified name.
 
-subsection {* Pretty printing *}
+  \item @{ML NameSpace.append}~@{text "name\<^isub>1 name\<^isub>2"}
+  appends two qualified names.
 
-text FIXME
+  \item @{ML NameSpace.pack}~@{text "name"} and @{text
+  "NameSpace.unpack"}~@{text "names"} convert between the packed
+  string representation and explicit list form of qualified names.
+
+  \item @{ML_type NameSpace.naming} represents the abstract concept of
+  a naming policy.
 
-subsection {* Output channels *}
+  \item @{ML NameSpace.default_naming} is the default naming policy.
+  In a theory context, this is usually augmented by a path prefix
+  consisting of the theory name.
+
+  \item @{ML NameSpace.add_path}~@{text "path naming"} augments the
+  naming policy by extending its path.
 
-text FIXME
+  \item @{ML NameSpace.full}@{text "naming name"} turns a name
+  specification (usually a basic name) into the fully qualified
+  internal version, according to the given naming policy.
+
+  \item @{ML_type NameSpace.T} represents name spaces.
+
+  \item @{ML NameSpace.empty} and @{ML NameSpace.merge}~@{text
+  "(space\<^isub>1, space\<^isub>2)"} provide basic operations for
+  building name spaces in accordance to the usual theory data
+  management (\secref{sec:context-data}).
 
-subsection {* Print modes \label{sec:print-mode} *}
+  \item @{ML NameSpace.declare}~@{text "naming name space"} enters a
+  fully qualified name into the name space, with partial accesses
+  being derived from the given policy.
+
+  \item @{ML NameSpace.intern}~@{text "space name"} internalizes a
+  (partially qualified) external name.
 
-text FIXME
+  This operation is mostly for parsing.  Note that fully qualified
+  names stemming from declarations are produced via @{ML
+  "NameSpace.full"} (or derivatives for @{ML_type theory} or @{ML_type
+  Proof.context}).
 
+  \item @{ML NameSpace.extern}~@{text "space name"} externalizes a
+  (fully qualified) internal name.
+
+  This operation is mostly for printing.  Note unqualified names are
+  produced via @{ML NameSpace.base}.
+
+  \end{description}
+*}
 
 end