--- a/doc-src/IsarImplementation/Thy/document/Prelim.tex	Fri Feb 05 11:51:52 2010 +0100
+++ b/doc-src/IsarImplementation/Thy/document/Prelim.tex	Fri Feb 05 14:39:02 2010 +0100
@@ -93,39 +93,46 @@
 \isamarkuptrue%
 %
 \begin{isamarkuptext}%
-A \emph{theory} is a data container with explicit name and unique
-  identifier.  Theories are related by a (nominal) sub-theory
+A \emph{theory} is a data container with explicit name and
+  unique identifier.  Theories are related by a (nominal) sub-theory
   relation, which corresponds to the dependency graph of the original
   construction; each theory is derived from a certain sub-graph of
-  ancestor theories.
-
-  The \isa{merge} operation produces the least upper bound of two
-  theories, which actually degenerates into absorption of one theory
-  into the other (due to the nominal sub-theory relation).
+  ancestor theories.  To this end, the system maintains a set of
+  symbolic ``identification stamps'' within each theory.
 
-  The \isa{begin} operation starts a new theory by importing
-  several parent theories and entering a special \isa{draft} mode,
-  which is sustained until the final \isa{end} operation.  A draft
-  theory acts like a linear type, where updates invalidate earlier
-  versions.  An invalidated draft is called ``stale''.
+  In order to avoid the full-scale overhead of explicit sub-theory
+  identification of arbitrary intermediate stages, a theory is
+  switched into \isa{draft} mode under certain circumstances.  A
+  draft theory acts like a linear type, where updates invalidate
+  earlier versions.  An invalidated draft is called \emph{stale}.
 
-  The \isa{checkpoint} operation produces an intermediate stepping
-  stone that will survive the next update: both the original and the
-  changed theory remain valid and are related by the sub-theory
-  relation.  Checkpointing essentially recovers purely functional
-  theory values, at the expense of some extra internal bookkeeping.
+  The \isa{checkpoint} operation produces a safe stepping stone
+  that will survive the next update without becoming stale: both the
+  old and the new theory remain valid and are related by the
+  sub-theory relation.  Checkpointing essentially recovers purely
+  functional theory values, at the expense of some extra internal
+  bookkeeping.
 
   The \isa{copy} operation produces an auxiliary version that has
   the same data content, but is unrelated to the original: updates of
   the copy do not affect the original, neither does the sub-theory
   relation hold.
 
+  The \isa{merge} operation produces the least upper bound of two
+  theories, which actually degenerates into absorption of one theory
+  into the other (according to the nominal sub-theory relation).
+
+  The \isa{begin} operation starts a new theory by importing
+  several parent theories and entering a special mode of nameless
+  incremental updates, until the final \isa{end} operation is
+  performed.
+
   \medskip The example in \figref{fig:ex-theory} below shows a theory
   graph derived from \isa{Pure}, with theory \isa{Length}
   importing \isa{Nat} and \isa{List}.  The body of \isa{Length} consists of a sequence of updates, working mostly on
-  drafts.  Intermediate checkpoints may occur as well, due to the
-  history mechanism provided by the Isar top-level, cf.\
-  \secref{sec:isar-toplevel}.
+  drafts internally, while transaction boundaries of Isar top-level
+  commands (\secref{sec:isar-toplevel}) are guaranteed to be safe
+  checkpoints.
 
   \begin{figure}[htb]
   \begin{center}
@@ -172,9 +179,10 @@
 \begin{mldecls}
   \indexdef{}{ML type}{theory}\verb|type theory| \\
   \indexdef{}{ML}{Theory.subthy}\verb|Theory.subthy: theory * theory -> bool| \\
-  \indexdef{}{ML}{Theory.merge}\verb|Theory.merge: theory * theory -> theory| \\
   \indexdef{}{ML}{Theory.checkpoint}\verb|Theory.checkpoint: theory -> theory| \\
   \indexdef{}{ML}{Theory.copy}\verb|Theory.copy: theory -> theory| \\
+  \indexdef{}{ML}{Theory.merge}\verb|Theory.merge: theory * theory -> theory| \\
+  \indexdef{}{ML}{Theory.begin\_theory}\verb|Theory.begin_theory: string -> theory list -> theory| \\
   \end{mldecls}
   \begin{mldecls}
   \indexdef{}{ML type}{theory\_ref}\verb|type theory_ref| \\
@@ -185,24 +193,31 @@
   \begin{description}
 
   \item \verb|theory| represents theory contexts.  This is
-  essentially a linear type!  Most operations destroy the original
-  version, which then becomes ``stale''.
+  essentially a linear type, with explicit runtime checking!  Most
+  internal theory operations destroy the original version, which then
+  becomes ``stale''.
 
-  \item \verb|Theory.subthy|~\isa{{\isacharparenleft}thy\isactrlsub {\isadigit{1}}{\isacharcomma}\ thy\isactrlsub {\isadigit{2}}{\isacharparenright}}
-  compares theories according to the inherent graph structure of the
-  construction.  This sub-theory relation is a nominal approximation
-  of inclusion (\isa{{\isasymsubseteq}}) of the corresponding content.
-
-  \item \verb|Theory.merge|~\isa{{\isacharparenleft}thy\isactrlsub {\isadigit{1}}{\isacharcomma}\ thy\isactrlsub {\isadigit{2}}{\isacharparenright}}
-  absorbs one theory into the other.  This fails for unrelated
-  theories!
+  \item \verb|Theory.subthy|~\isa{{\isacharparenleft}thy\isactrlsub {\isadigit{1}}{\isacharcomma}\ thy\isactrlsub {\isadigit{2}}{\isacharparenright}} compares theories
+  according to the intrinsic graph structure of the construction.
+  This sub-theory relation is a nominal approximation of inclusion
+  (\isa{{\isasymsubseteq}}) of the corresponding content (according to the
+  semantics of the ML modules that implement the data).
 
   \item \verb|Theory.checkpoint|~\isa{thy} produces a safe
-  stepping stone in the linear development of \isa{thy}.  The next
-  update will result in two related, valid theories.
+  stepping stone in the linear development of \isa{thy}.  This
+  changes the old theory, but the next update will result in two
+  related, valid theories.
+
+  \item \verb|Theory.copy|~\isa{thy} produces a variant of \isa{thy} with the same data.  The copy is not related to the original,
+  but the original is unchanged.
 
-  \item \verb|Theory.copy|~\isa{thy} produces a variant of \isa{thy} with the same data.  The result is not related to the
-  original; the original is unchanged.
+  \item \verb|Theory.merge|~\isa{{\isacharparenleft}thy\isactrlsub {\isadigit{1}}{\isacharcomma}\ thy\isactrlsub {\isadigit{2}}{\isacharparenright}} absorbs one theory
+  into the other, without changing \isa{thy\isactrlsub {\isadigit{1}}} or \isa{thy\isactrlsub {\isadigit{2}}}.
+  This version of ad-hoc theory merge fails for unrelated theories!
+
+  \item \verb|Theory.begin_theory|~\isa{name\ parents} constructs
+  a new theory based on the given parents.  This {\ML} function is
+  normally not invoked directly.
 
   \item \verb|theory_ref| represents a sliding reference to an
   always valid theory; updates on the original are propagated
@@ -229,30 +244,32 @@
 \isamarkuptrue%
 %
 \begin{isamarkuptext}%
-A proof context is a container for pure data with a back-reference
-  to the theory it belongs to.  The \isa{init} operation creates a
-  proof context from a given theory.  Modifications to draft theories
-  are propagated to the proof context as usual, but there is also an
-  explicit \isa{transfer} operation to force resynchronization
-  with more substantial updates to the underlying theory.  The actual
-  context data does not require any special bookkeeping, thanks to the
-  lack of destructive features.
+A proof context is a container for pure data with a
+  back-reference to the theory it belongs to.  The \isa{init}
+  operation creates a proof context from a given theory.
+  Modifications to draft theories are propagated to the proof context
+  as usual, but there is also an explicit \isa{transfer} operation
+  to force resynchronization with more substantial updates to the
+  underlying theory.
 
-  Entities derived in a proof context need to record inherent logical
+  Entities derived in a proof context need to record logical
   requirements explicitly, since there is no separate context
-  identification as for theories.  For example, hypotheses used in
-  primitive derivations (cf.\ \secref{sec:thms}) are recorded
-  separately within the sequent \isa{{\isasymGamma}\ {\isasymturnstile}\ {\isasymphi}}, just to make double
-  sure.  Results could still leak into an alien proof context due to
-  programming errors, but Isabelle/Isar includes some extra validity
-  checks in critical positions, notably at the end of a sub-proof.
+  identification or symbolic inclusion as for theories.  For example,
+  hypotheses used in primitive derivations (cf.\ \secref{sec:thms})
+  are recorded separately within the sequent \isa{{\isasymGamma}\ {\isasymturnstile}\ {\isasymphi}}, just to
+  make double sure.  Results could still leak into an alien proof
+  context due to programming errors, but Isabelle/Isar includes some
+  extra validity checks in critical positions, notably at the end of a
+  sub-proof.
 
   Proof contexts may be manipulated arbitrarily, although the common
   discipline is to follow block structure as a mental model: a given
   context is extended consecutively, and results are exported back
-  into the original context.  Note that the Isar proof states model
+  into the original context.  Note that an Isar proof state models
   block-structured reasoning explicitly, using a stack of proof
-  contexts internally.%
+  contexts internally.  For various technical reasons, the background
+  theory of an Isar proof state must not be changed while the proof is
+  still under construction!%
 \end{isamarkuptext}%
 \isamarkuptrue%
 %
@@ -311,7 +328,8 @@
 
   Moreover, there are total operations \isa{theory{\isacharunderscore}of} and \isa{proof{\isacharunderscore}of} to convert a generic context into either kind: a theory
   can always be selected from the sum, while a proof context might
-  have to be constructed by an ad-hoc \isa{init} operation.%
+  have to be constructed by an ad-hoc \isa{init} operation, which
+  incurs a small runtime overhead.%
 \end{isamarkuptext}%
 \isamarkuptrue%
 %
@@ -376,6 +394,15 @@
   \noindent The \isa{empty} value acts as initial default for
   \emph{any} theory that does not declare actual data content; \isa{extend} is acts like a unitary version of \isa{merge}.
 
+  Implementing \isa{merge} can be tricky.  The general idea is
+  that \isa{merge\ {\isacharparenleft}data\isactrlsub {\isadigit{1}}{\isacharcomma}\ data\isactrlsub {\isadigit{2}}{\isacharparenright}} inserts those parts of \isa{data\isactrlsub {\isadigit{2}}} into \isa{data\isactrlsub {\isadigit{1}}} that are not yet present, while
+  keeping the general order of things.  The \verb|Library.merge|
+  function on plain lists may serve as a canonical template.
+
+  Particularly note that shared parts of the data must not be
+  duplicated by naive concatenation, or a theory graph that is like a
+  chain of diamonds would cause an exponential blowup!
+
   \paragraph{Proof context data} declarations need to implement the
   following SML signature:
 
@@ -387,15 +414,18 @@
   \medskip
 
   \noindent The \isa{init} operation is supposed to produce a pure
-  value from the given background theory.
+  value from the given background theory and should be
+  ``immediate'', i.e.\ cheap to evaluate.  Whenever a proof context is initialized, which
+  happens frequently, the system invokes the \isa{init}
+  operation of \emph{all} theory data slots ever declared.
 
   \paragraph{Generic data} provides a hybrid interface for both theory
   and proof data.  The \isa{init} operation for proof contexts is
   predefined to select the current data value from the background
   theory.
 
-  \bigskip A data declaration of type \isa{T} results in the
-  following interface:
+  \bigskip Any of these data declarations over type \isa{T} results
+  in an ML structure with the following signature:
 
   \medskip
   \begin{tabular}{ll}
@@ -405,12 +435,12 @@
   \end{tabular}
   \medskip
 
-  \noindent These other operations provide access for the particular
-  kind of context (theory, proof, or generic context).  Note that this
-  is a safe interface: there is no other way to access the
-  corresponding data slot of a context.  By keeping these operations
-  private, a component may maintain abstract values authentically,
-  without other components interfering.%
+  \noindent These other operations provide exclusive access for the
+  particular kind of context (theory, proof, or generic context).
+  This interface fully observes the ML discipline for types and
+  scopes: there is no other way to access the corresponding data slot
+  of a context.  By keeping these operations private, an Isabelle/ML
+  module may maintain abstract values authentically.%
 \end{isamarkuptext}%
 \isamarkuptrue%
 %
@@ -451,16 +481,157 @@
 %
 \endisadelimmlref
 %
+\isadelimmlex
+%
+\endisadelimmlex
+%
+\isatagmlex
+%
+\begin{isamarkuptext}%
+The following artificial example demonstrates theory
+  data: we maintain a set of terms that are supposed to be wellformed
+  wrt.\ the enclosing theory.  The public interface is as follows:%
+\end{isamarkuptext}%
+\isamarkuptrue%
+%
+\endisatagmlex
+{\isafoldmlex}%
+%
+\isadelimmlex
+%
+\endisadelimmlex
+%
+\isadelimML
+%
+\endisadelimML
+%
+\isatagML
+\isacommand{ML}\isamarkupfalse%
+\ {\isacharverbatimopen}\isanewline
+\ \ signature\ WELLFORMED{\isacharunderscore}TERMS\ {\isacharequal}\isanewline
+\ \ sig\isanewline
+\ \ \ \ val\ get{\isacharcolon}\ theory\ {\isacharminus}{\isachargreater}\ term\ list\isanewline
+\ \ \ \ val\ add{\isacharcolon}\ term\ {\isacharminus}{\isachargreater}\ theory\ {\isacharminus}{\isachargreater}\ theory\isanewline
+\ \ end{\isacharsemicolon}\isanewline
+{\isacharverbatimclose}%
+\endisatagML
+{\isafoldML}%
+%
+\isadelimML
+%
+\endisadelimML
+%
+\begin{isamarkuptext}%
+\noindent The implementation uses private theory data
+  internally, and only exposes an operation that involves explicit
+  argument checking wrt.\ the given theory.%
+\end{isamarkuptext}%
+\isamarkuptrue%
+%
+\isadelimML
+%
+\endisadelimML
+%
+\isatagML
+\isacommand{ML}\isamarkupfalse%
+\ {\isacharverbatimopen}\isanewline
+\ \ structure\ Wellformed{\isacharunderscore}Terms{\isacharcolon}\ WELLFORMED{\isacharunderscore}TERMS\ {\isacharequal}\isanewline
+\ \ struct\isanewline
+\isanewline
+\ \ structure\ Terms\ {\isacharequal}\ Theory{\isacharunderscore}Data\isanewline
+\ \ {\isacharparenleft}\isanewline
+\ \ \ \ type\ T\ {\isacharequal}\ term\ OrdList{\isachardot}T{\isacharsemicolon}\isanewline
+\ \ \ \ val\ empty\ {\isacharequal}\ {\isacharbrackleft}{\isacharbrackright}{\isacharsemicolon}\isanewline
+\ \ \ \ val\ extend\ {\isacharequal}\ I{\isacharsemicolon}\isanewline
+\ \ \ \ fun\ merge\ {\isacharparenleft}ts{\isadigit{1}}{\isacharcomma}\ ts{\isadigit{2}}{\isacharparenright}\ {\isacharequal}\isanewline
+\ \ \ \ \ \ OrdList{\isachardot}union\ TermOrd{\isachardot}fast{\isacharunderscore}term{\isacharunderscore}ord\ ts{\isadigit{1}}\ ts{\isadigit{2}}{\isacharsemicolon}\isanewline
+\ \ {\isacharparenright}\isanewline
+\isanewline
+\ \ val\ get\ {\isacharequal}\ Terms{\isachardot}get{\isacharsemicolon}\isanewline
+\isanewline
+\ \ fun\ add\ raw{\isacharunderscore}t\ thy\ {\isacharequal}\isanewline
+\ \ \ \ let\ val\ t\ {\isacharequal}\ Sign{\isachardot}cert{\isacharunderscore}term\ thy\ raw{\isacharunderscore}t\isanewline
+\ \ \ \ in\ Terms{\isachardot}map\ {\isacharparenleft}OrdList{\isachardot}insert\ TermOrd{\isachardot}fast{\isacharunderscore}term{\isacharunderscore}ord\ t{\isacharparenright}\ thy\ end{\isacharsemicolon}\isanewline
+\isanewline
+\ \ end{\isacharsemicolon}\isanewline
+{\isacharverbatimclose}%
+\endisatagML
+{\isafoldML}%
+%
+\isadelimML
+%
+\endisadelimML
+%
+\begin{isamarkuptext}%
+We use \verb|term OrdList.T| for reasonably efficient
+  representation of a set of terms: all operations are linear in the
+  number of stored elements.  Here we assume that our users do not
+  care about the declaration order, since that data structure forces
+  its own arrangement of elements.
+
+  Observe how the \verb|merge| operation joins the data slots of
+  the two constituents: \verb|OrdList.union| prevents duplication of
+  common data from different branches, thus avoiding the danger of
+  exponential blowup.  (Plain list append etc.\ must never be used for
+  theory data merges.)
+
+  \medskip Our intended invariant is achieved as follows:
+  \begin{enumerate}
+
+  \item \verb|Wellformed_Terms.add| only admits terms that have passed
+  the \verb|Sign.cert_term| check of the given theory at that point.
+
+  \item Wellformedness in the sense of \verb|Sign.cert_term| is
+  monotonic wrt.\ the sub-theory relation.  So our data can move
+  upwards in the hierarchy (via extension or merges), and maintain
+  wellformedness without further checks.
+
+  \end{enumerate}
+
+  Note that all basic operations of the inference kernel (which
+  includes \verb|Sign.cert_term|) observe this monotonicity principle,
+  but other user-space tools don't.  For example, fully-featured
+  type-inference via \verb|Syntax.check_term| (cf.\
+  \secref{sec:term-check}) is not necessarily monotonic wrt.\ the
+  background theory, since constraints of term constants can be
+  strengthened by later declarations.
+
+  In most cases, user-space context data does not have to take such
+  invariants too seriously.  The situation is different in the
+  implementation of the inference kernel itself, which uses the very
+  same data mechanisms for types, constants, axioms etc.%
+\end{isamarkuptext}%
+\isamarkuptrue%
+%
 \isamarkupsection{Names \label{sec:names}%
 }
 \isamarkuptrue%
 %
 \begin{isamarkuptext}%
 In principle, a name is just a string, but there are various
-  convention for encoding additional structure.  For example, ``\isa{Foo{\isachardot}bar{\isachardot}baz}'' is considered as a qualified name consisting of
-  three basic name components.  The individual constituents of a name
-  may have further substructure, e.g.\ the string
-  ``\verb,\,\verb,<alpha>,'' encodes as a single symbol.%
+  conventions for representing additional structure.  For example,
+  ``\isa{Foo{\isachardot}bar{\isachardot}baz}'' is considered as a long name consisting of
+  qualifier \isa{Foo{\isachardot}bar} and base name \isa{baz}.  The
+  individual constituents of a name may have further substructure,
+  e.g.\ the string ``\verb,\,\verb,<alpha>,'' encodes as a single
+  symbol.
+
+  \medskip Subsequently, we shall introduce specific categories of
+  names.  Roughly speaking these correspond to logical entities as
+  follows:
+  \begin{itemize}
+
+  \item Basic names (\secref{sec:basic-name}): free and bound
+  variables.
+
+  \item Indexed names (\secref{sec:indexname}): schematic variables.
+
+  \item Long names (\secref{sec:long-name}): constants of any kind
+  (type constructors, term constants, other concepts defined in user
+  space).  Such entities are typically managed via name spaces
+  (\secref{sec:name-space}).
+
+  \end{itemize}%
 \end{isamarkuptext}%
 \isamarkuptrue%
 %
@@ -469,16 +640,16 @@
 \isamarkuptrue%
 %
 \begin{isamarkuptext}%
-A \emph{symbol} constitutes the smallest textual unit in Isabelle
-  --- raw characters are normally not encountered at all.  Isabelle
-  strings consist of a sequence of symbols, represented as a packed
-  string or a list of strings.  Each symbol is in itself a small
-  string, which has either one of the following forms:
+A \emph{symbol} constitutes the smallest textual unit in
+  Isabelle --- raw ML characters are normally not encountered at all!
+  Isabelle strings consist of a sequence of symbols, represented as a
+  packed string or an exploded list of strings.  Each symbol is in
+  itself a small string, which has one of the following forms:
 
   \begin{enumerate}
 
-  \item a single ASCII character ``\isa{c}'', for example
-  ``\verb,a,'',
+  \item a single ASCII character ``\isa{c}'' or raw byte in the
+  range of 128\dots 255, for example ``\verb,a,'',
 
   \item a regular symbol ``\verb,\,\verb,<,\isa{ident}\verb,>,'',
   for example ``\verb,\,\verb,<alpha>,'',
@@ -487,7 +658,7 @@
   for example ``\verb,\,\verb,<^bold>,'',
 
   \item a raw symbol ``\verb,\,\verb,<^raw:,\isa{text}\verb,>,''
-  where \isa{text} constists of printable characters excluding
+  where \isa{text} consists of printable characters excluding
   ``\verb,.,'' and ``\verb,>,'', for example
   ``\verb,\,\verb,<^raw:$\sum_{i = 1}^n$>,'',
 
@@ -503,16 +674,25 @@
   may occur within regular Isabelle identifiers.
 
   Since the character set underlying Isabelle symbols is 7-bit ASCII
-  and 8-bit characters are passed through transparently, Isabelle may
-  also process Unicode/UCS data in UTF-8 encoding.  Unicode provides
-  its own collection of mathematical symbols, but there is no built-in
-  link to the standard collection of Isabelle.
+  and 8-bit characters are passed through transparently, Isabelle can
+  also process Unicode/UCS data in UTF-8 encoding.\footnote{When
+  counting precise source positions internally, bytes in the range of
+  128\dots 191 are ignored.  In UTF-8 encoding, this interval covers
+  the additional trailer bytes, so Isabelle happens to count Unicode
+  characters here, not bytes in memory.  In ISO-Latin encoding, the
+  ignored range merely includes some extra punctuation characters that
+  even have replacements within the standard collection of Isabelle
+  symbols; the accented letters range is counted properly.} Unicode
+  provides its own collection of mathematical symbols, but within the
+  core Isabelle/ML world there is no link to the standard collection
+  of Isabelle regular symbols.
 
   \medskip Output of Isabelle symbols depends on the print mode
   (\secref{print-mode}).  For example, the standard {\LaTeX} setup of
   the Isabelle document preparation system would present
   ``\verb,\,\verb,<alpha>,'' as \isa{{\isasymalpha}}, and
-  ``\verb,\,\verb,<^bold>,\verb,\,\verb,<alpha>,'' as \isa{\isactrlbold {\isasymalpha}}.%
+  ``\verb,\,\verb,<^bold>,\verb,\,\verb,<alpha>,'' as \isa{\isactrlbold {\isasymalpha}}.  On-screen rendering usually works by mapping a finite
+  subset of Isabelle symbols to suitable Unicode characters.%
 \end{isamarkuptext}%
 \isamarkuptrue%
 %
@@ -524,7 +704,7 @@
 %
 \begin{isamarkuptext}%
 \begin{mldecls}
-  \indexdef{}{ML type}{Symbol.symbol}\verb|type Symbol.symbol| \\
+  \indexdef{}{ML type}{Symbol.symbol}\verb|type Symbol.symbol = string| \\
   \indexdef{}{ML}{Symbol.explode}\verb|Symbol.explode: string -> Symbol.symbol list| \\
   \indexdef{}{ML}{Symbol.is\_letter}\verb|Symbol.is_letter: Symbol.symbol -> bool| \\
   \indexdef{}{ML}{Symbol.is\_digit}\verb|Symbol.is_digit: Symbol.symbol -> bool| \\
@@ -539,11 +719,14 @@
   \begin{description}
 
   \item \verb|Symbol.symbol| represents individual Isabelle
-  symbols; this is an alias for \verb|string|.
+  symbols.
 
   \item \verb|Symbol.explode|~\isa{str} produces a symbol list
   from the packed form.  This function supercedes \verb|String.explode| for virtually all purposes of manipulating text in
-  Isabelle!
+  Isabelle!\footnote{The runtime overhead for exploded strings is
+  mainly that of the list structure: individual symbols that happen to
+  be a singleton string --- which is the most common case --- do not
+  require extra memory in Poly/ML.}
 
   \item \verb|Symbol.is_letter|, \verb|Symbol.is_digit|, \verb|Symbol.is_quasi|, \verb|Symbol.is_blank| classify standard
   symbols according to fixed syntactic conventions of Isabelle, cf.\
@@ -555,7 +738,16 @@
   \item \verb|Symbol.decode| converts the string representation of a
   symbol into the datatype version.
 
-  \end{description}%
+  \end{description}
+
+  \paragraph{Historical note.} In the original SML90 standard the
+  primitive ML type \verb|char| did not exist, and the basic \verb|explode: string -> string list| operation would produce a list of
+  singleton strings as in Isabelle/ML today.  When SML97 came out,
+  Isabelle did not adopt its slightly anachronistic 8-bit characters,
+  but the idea of exploding a string into a list of small strings was
+  extended to ``symbols'' as explained above.  Thus Isabelle sources
+  can refer to an infinite store of user-defined symbols, without
+  having to worry about the multitude of Unicode encodings.%
 \end{isamarkuptext}%
 \isamarkuptrue%
 %
@@ -566,7 +758,7 @@
 %
 \endisadelimmlref
 %
-\isamarkupsubsection{Basic names \label{sec:basic-names}%
+\isamarkupsubsection{Basic names \label{sec:basic-name}%
 }
 \isamarkuptrue%
 %
@@ -583,7 +775,8 @@
   These special versions provide copies of the basic name space, apart
   from anything that normally appears in the user text.  For example,
   system generated variables in Isar proof contexts are usually marked
-  as internal, which prevents mysterious name references like \isa{xaa} to appear in the text.
+  as internal, which prevents mysterious names like \isa{xaa} from
+  appearing in human-readable text.
 
   \medskip Manipulating binding scopes often requires on-the-fly
   renamings.  A \emph{name context} contains a collection of already
@@ -618,6 +811,9 @@
   \indexdef{}{ML}{Name.invents}\verb|Name.invents: Name.context -> string -> int -> string list| \\
   \indexdef{}{ML}{Name.variants}\verb|Name.variants: string list -> Name.context -> string list * Name.context| \\
   \end{mldecls}
+  \begin{mldecls}
+  \indexdef{}{ML}{Variable.names\_of}\verb|Variable.names_of: Proof.context -> Name.context| \\
+  \end{mldecls}
 
   \begin{description}
 
@@ -638,6 +834,14 @@
   \item \verb|Name.variants|~\isa{names\ context} produces fresh
   variants of \isa{names}; the result is entered into the context.
 
+  \item \verb|Variable.names_of|~\isa{ctxt} retrieves the context
+  of declared type and term variable names.  Projecting a proof
+  context down to a primitive name context is occasionally useful when
+  invoking lower-level operations.  Regular management of ``fresh
+  variables'' is done by suitable operations of structure \verb|Variable|, which is also able to provide an official status of
+  ``locally fixed variable'' within the logical environment (cf.\
+  \secref{sec:variables}).
+
   \end{description}%
 \end{isamarkuptext}%
 \isamarkuptrue%
@@ -649,7 +853,7 @@
 %
 \endisadelimmlref
 %
-\isamarkupsubsection{Indexed names%
+\isamarkupsubsection{Indexed names \label{sec:indexname}%
 }
 \isamarkuptrue%
 %
@@ -663,9 +867,9 @@
   \isa{maxidx\ {\isacharplus}\ {\isadigit{1}}}; the maximum index of an empty collection is
   \isa{{\isacharminus}{\isadigit{1}}}.
 
-  Occasionally, basic names and indexed names are injected into the
-  same pair type: the (improper) indexname \isa{{\isacharparenleft}x{\isacharcomma}\ {\isacharminus}{\isadigit{1}}{\isacharparenright}} is used
-  to encode basic names.
+  Occasionally, basic names are injected into the same pair type of
+  indexed names: then \isa{{\isacharparenleft}x{\isacharcomma}\ {\isacharminus}{\isadigit{1}}{\isacharparenright}} is used to encode the basic
+  name \isa{x}.
 
   \medskip Isabelle syntax observes the following rules for
   representing an indexname \isa{{\isacharparenleft}x{\isacharcomma}\ i{\isacharparenright}} as a packed string:
@@ -680,11 +884,12 @@
 
   \end{itemize}
 
-  Indexnames may acquire large index numbers over time.  Results are
-  normalized towards \isa{{\isadigit{0}}} at certain checkpoints, notably at
-  the end of a proof.  This works by producing variants of the
-  corresponding basic name components.  For example, the collection
-  \isa{{\isacharquery}x{\isadigit{1}}{\isacharcomma}\ {\isacharquery}x{\isadigit{7}}{\isacharcomma}\ {\isacharquery}x{\isadigit{4}}{\isadigit{2}}} becomes \isa{{\isacharquery}x{\isacharcomma}\ {\isacharquery}xa{\isacharcomma}\ {\isacharquery}xb}.%
+  Indexnames may acquire large index numbers after several maxidx
+  shifts have been applied.  Results are usually normalized towards
+  \isa{{\isadigit{0}}} at certain checkpoints, notably at the end of a proof.
+  This works by producing variants of the corresponding basic name
+  components.  For example, the collection \isa{{\isacharquery}x{\isadigit{1}}{\isacharcomma}\ {\isacharquery}x{\isadigit{7}}{\isacharcomma}\ {\isacharquery}x{\isadigit{4}}{\isadigit{2}}}
+  becomes \isa{{\isacharquery}x{\isacharcomma}\ {\isacharquery}xa{\isacharcomma}\ {\isacharquery}xb}.%
 \end{isamarkuptext}%
 \isamarkuptrue%
 %
@@ -704,7 +909,8 @@
   \item \verb|indexname| represents indexed names.  This is an
   abbreviation for \verb|string * int|.  The second component is
   usually non-negative, except for situations where \isa{{\isacharparenleft}x{\isacharcomma}\ {\isacharminus}{\isadigit{1}}{\isacharparenright}}
-  is used to embed basic names into this type.
+  is used to inject basic names into this type.  Other negative
+  indexes should not be used.
 
   \end{description}%
 \end{isamarkuptext}%
@@ -717,56 +923,31 @@
 %
 \endisadelimmlref
 %
-\isamarkupsubsection{Qualified names and name spaces%
+\isamarkupsubsection{Long names \label{sec:long-name}%
 }
 \isamarkuptrue%
 %
 \begin{isamarkuptext}%
-A \emph{qualified name} consists of a non-empty sequence of basic
-  name components.  The packed representation uses a dot as separator,
-  as in ``\isa{A{\isachardot}b{\isachardot}c}''.  The last component is called \emph{base}
-  name, the remaining prefix \emph{qualifier} (which may be empty).
-  The idea of qualified names is to encode nested structures by
-  recording the access paths as qualifiers.  For example, an item
-  named ``\isa{A{\isachardot}b{\isachardot}c}'' may be understood as a local entity \isa{c}, within a local structure \isa{b}, within a global
-  structure \isa{A}.  Typically, name space hierarchies consist of
-  1--2 levels of qualification, but this need not be always so.
+A \emph{long name} consists of a sequence of non-empty name
+  components.  The packed representation uses a dot as separator, as
+  in ``\isa{A{\isachardot}b{\isachardot}c}''.  The last component is called \emph{base
+  name}, the remaining prefix is called \emph{qualifier} (which may be
+  empty).  The qualifier can be understood as the access path to the
+  named entity while passing through some nested block-structure,
+  although our free-form long names do not really enforce any strict
+  discipline.
+
+  For example, an item named ``\isa{A{\isachardot}b{\isachardot}c}'' may be understood as
+  a local entity \isa{c}, within a local structure \isa{b},
+  within a global structure \isa{A}.  In practice, long names
+  usually represent 1--3 levels of qualification.  User ML code should
+  not make any assumptions about the particular structure of long
+  names!
 
   The empty name is commonly used as an indication of unnamed
-  entities, whenever this makes any sense.  The basic operations on
-  qualified names are smart enough to pass through such improper names
-  unchanged.
-
-  \medskip A \isa{naming} policy tells how to turn a name
-  specification into a fully qualified internal name (by the \isa{full} operation), and how fully qualified names may be accessed
-  externally.  For example, the default naming policy is to prefix an
-  implicit path: \isa{full\ x} produces \isa{path{\isachardot}x}, and the
-  standard accesses for \isa{path{\isachardot}x} include both \isa{x} and
-  \isa{path{\isachardot}x}.  Normally, the naming is implicit in the theory or
-  proof context; there are separate versions of the corresponding.
-
-  \medskip A \isa{name\ space} manages a collection of fully
-  internalized names, together with a mapping between external names
-  and internal names (in both directions).  The corresponding \isa{intern} and \isa{extern} operations are mostly used for
-  parsing and printing only!  The \isa{declare} operation augments
-  a name space according to the accesses determined by the naming
-  policy.
-
-  \medskip As a general principle, there is a separate name space for
-  each kind of formal entity, e.g.\ logical constant, type
-  constructor, type class, theorem.  It is usually clear from the
-  occurrence in concrete syntax (or from the scope) which kind of
-  entity a name refers to.  For example, the very same name \isa{c} may be used uniformly for a constant, type constructor, and
-  type class.
-
-  There are common schemes to name theorems systematically, according
-  to the name of the main logical entity involved, e.g.\ \isa{c{\isachardot}intro} for a canonical theorem related to constant \isa{c}.
-  This technique of mapping names from one space into another requires
-  some care in order to avoid conflicts.  In particular, theorem names
-  derived from a type constructor or type class are better suffixed in
-  addition to the usual qualification, e.g.\ \isa{c{\isacharunderscore}type{\isachardot}intro}
-  and \isa{c{\isacharunderscore}class{\isachardot}intro} for theorems related to type \isa{c}
-  and class \isa{c}, respectively.%
+  entities, or entities that are not entered into the corresponding
+  name space, whenever this makes any sense.  The basic operations on
+  long names map empty names again to empty names.%
 \end{isamarkuptext}%
 \isamarkuptrue%
 %
@@ -784,6 +965,100 @@
   \indexdef{}{ML}{Long\_Name.implode}\verb|Long_Name.implode: string list -> string| \\
   \indexdef{}{ML}{Long\_Name.explode}\verb|Long_Name.explode: string -> string list| \\
   \end{mldecls}
+
+  \begin{description}
+
+  \item \verb|Long_Name.base_name|~\isa{name} returns the base name
+  of a long name.
+
+  \item \verb|Long_Name.qualifier|~\isa{name} returns the qualifier
+  of a long name.
+
+  \item \verb|Long_Name.append|~\isa{name\isactrlisub {\isadigit{1}}\ name\isactrlisub {\isadigit{2}}} appends two long
+  names.
+
+  \item \verb|Long_Name.implode|~\isa{names} and \verb|Long_Name.explode|~\isa{name} convert between the packed string
+  representation and the explicit list form of long names.
+
+  \end{description}%
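+
+  \medskip The following minimal sketch illustrates these operations
+  on the example name ``\isa{A{\isachardot}b{\isachardot}c}''.  It assumes evaluation
+  inside Isabelle/ML (e.g.\ within an \verb|ML| command); the value
+  names are hypothetical, and the results given in comments merely
+  restate the descriptions above.
+
+\begin{verbatim}
+  val name = "A.b.c";
+
+  (*last component*)
+  val base = Long_Name.base_name name;         (* "c" *)
+
+  (*remaining prefix*)
+  val qual = Long_Name.qualifier name;         (* "A.b" *)
+
+  (*re-assemble the long name from its two parts*)
+  val name' = Long_Name.append qual base;      (* "A.b.c" *)
+
+  (*packed string versus explicit list form*)
+  val comps = Long_Name.explode name;          (* ["A", "b", "c"] *)
+  val packed = Long_Name.implode comps;        (* "A.b.c" *)
+\end{verbatim}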
+\end{isamarkuptext}%
+\isamarkuptrue%
+%
+\endisatagmlref
+{\isafoldmlref}%
+%
+\isadelimmlref
+%
+\endisadelimmlref
+%
+\isamarkupsubsection{Name spaces \label{sec:name-space}%
+}
+\isamarkuptrue%
+%
+\begin{isamarkuptext}%
+A \isa{name\ space} manages a collection of long names,
+  together with a mapping between partially qualified external names
+  and fully qualified internal names (in both directions).  Note that
+  the corresponding \isa{intern} and \isa{extern} operations
+  are mostly used for parsing and printing only!  The \isa{declare} operation augments a name space according to the accesses
+  determined by a given binding, and a naming policy from the context.
+
+  \medskip A \isa{binding} specifies details about the prospective
+  long name of a newly introduced formal entity.  It consists of a
+  base name, prefixes for qualification (separate ones for system
+  infrastructure and user-space mechanisms), a slot for the original
+  source position, and some additional flags.
+
+  \medskip A \isa{naming} provides some additional details for
+  producing a long name from a binding.  Normally, the naming is
+  implicit in the theory or proof context.  The \isa{full}
+  operation (and its variants for different context types) produces a
+  fully qualified internal name to be entered into a name space.  The
+  main equation of this ``chemical reaction'' when binding new
+  entities in a context is as follows:
+
+  \smallskip
+  \begin{tabular}{l}
+  \isa{binding\ {\isacharplus}\ naming\ {\isasymlongrightarrow}\ long\ name\ {\isacharplus}\ name\ space\ accesses}
+  \end{tabular}
+  \smallskip
+
+  \medskip As a general principle, there is a separate name space for
+  each kind of formal entity, e.g.\ fact, logical constant, type
+  constructor, type class.  It is usually clear from the occurrence in
+  concrete syntax (or from the scope) which kind of entity a name
+  refers to.  For example, the very same name \isa{c} may be used
+  uniformly for a constant, type constructor, and type class.
+
+  There are common schemes to name derived entities systematically
+  according to the name of the main logical entity involved, e.g.\
+  fact \isa{c{\isachardot}intro} for a canonical introduction rule related to
+  constant \isa{c}.  This technique of mapping names from one
+  space into another requires some care in order to avoid conflicts.
+  In particular, theorem names derived from a type constructor or type
+  class are better suffixed in addition to the usual qualification,
+  e.g.\ \isa{c{\isacharunderscore}type{\isachardot}intro} and \isa{c{\isacharunderscore}class{\isachardot}intro} for
+  theorems related to type \isa{c} and class \isa{c},
+  respectively.%
+\end{isamarkuptext}%
+\isamarkuptrue%
+%
+\isadelimmlref
+%
+\endisadelimmlref
+%
+\isatagmlref
+%
+\begin{isamarkuptext}%
+\begin{mldecls}
+  \indexdef{}{ML type}{binding}\verb|type binding| \\
+  \indexdef{}{ML}{Binding.empty}\verb|Binding.empty: binding| \\
+  \indexdef{}{ML}{Binding.name}\verb|Binding.name: string -> binding| \\
+  \indexdef{}{ML}{Binding.qualify}\verb|Binding.qualify: bool -> string -> binding -> binding| \\
+  \indexdef{}{ML}{Binding.prefix}\verb|Binding.prefix: bool -> string -> binding -> binding| \\
+  \indexdef{}{ML}{Binding.conceal}\verb|Binding.conceal: binding -> binding| \\
+  \indexdef{}{ML}{Binding.str\_of}\verb|Binding.str_of: binding -> string| \\
+  \end{mldecls}
   \begin{mldecls}
   \indexdef{}{ML type}{Name\_Space.naming}\verb|type Name_Space.naming| \\
   \indexdef{}{ML}{Name\_Space.default\_naming}\verb|Name_Space.default_naming: Name_Space.naming| \\
@@ -798,21 +1073,39 @@
 \verb|  string * Name_Space.T| \\
   \indexdef{}{ML}{Name\_Space.intern}\verb|Name_Space.intern: Name_Space.T -> string -> string| \\
   \indexdef{}{ML}{Name\_Space.extern}\verb|Name_Space.extern: Name_Space.T -> string -> string| \\
+  \indexdef{}{ML}{Name\_Space.is\_concealed}\verb|Name_Space.is_concealed: Name_Space.T -> string -> bool|
   \end{mldecls}
 
   \begin{description}
 
-  \item \verb|Long_Name.base_name|~\isa{name} returns the base name of a
-  qualified name.
+  \item \verb|binding| represents the abstract concept of name
+  bindings.
+
+  \item \verb|Binding.empty| is the empty binding.
 
-  \item \verb|Long_Name.qualifier|~\isa{name} returns the qualifier
-  of a qualified name.
+  \item \verb|Binding.name|~\isa{name} produces a binding with base
+  name \isa{name}.
+
+  \item \verb|Binding.qualify|~\isa{mandatory\ name\ binding}
+  prefixes qualifier \isa{name} to \isa{binding}.  The \isa{mandatory} flag tells whether this name component always needs to be
+  given in name space accesses; this is mostly \isa{false} in
+  practice.  Note that this part of qualification is typically used in
+  derived specification mechanisms.
 
-  \item \verb|Long_Name.append|~\isa{name\isactrlisub {\isadigit{1}}\ name\isactrlisub {\isadigit{2}}}
-  appends two qualified names.
+  \item \verb|Binding.prefix| is similar to \verb|Binding.qualify|, but
+  affects the system prefix.  This part of extra qualification is
+  typically used in the infrastructure for modular specifications,
+  notably ``local theory targets'' (see also \chref{ch:local-theory}).
 
-  \item \verb|Long_Name.implode|~\isa{names} and \verb|Long_Name.explode|~\isa{name} convert between the packed string
-  representation and the explicit list form of qualified names.
+  \item \verb|Binding.conceal|~\isa{binding} indicates that the
+  binding shall refer to an entity that serves foundational purposes
+  only.  This flag helps to mark implementation details of
+  specification mechanisms etc.  Other tools should not depend on the
+  particulars of concealed entities (cf.\ \verb|Name_Space.is_concealed|).
+
+  \item \verb|Binding.str_of|~\isa{binding} produces a string
+  representation for human-readable output, together with some formal
+  markup that might get used in GUI front-ends, for example.
 
   \item \verb|Name_Space.naming| represents the abstract concept of
   a naming policy.
@@ -853,6 +1146,10 @@
   This operation is mostly for printing!  User code should not rely on
   the precise result too much.
 
+  \item \verb|Name_Space.is_concealed|~\isa{space\ name} indicates
+  whether \isa{name} refers to a strictly private entity that
+  other tools are supposed to ignore!
+
   \end{description}%
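+
+  \medskip As a rough illustration of the binding operations, the
+  following sketch builds a qualified and a concealed binding and
+  turns one of them into a long name.  It assumes evaluation inside
+  Isabelle/ML; the names \verb|foo| and \verb|bar| are hypothetical.
+  The use of \verb|Name_Space.full_name| with type
+  \verb|Name_Space.naming -> binding -> string| is an assumption
+  taken from the Isabelle sources, not from the descriptions above.
+  Results are deliberately not shown, since user code should not
+  depend on the particular structure of long names.
+
+\begin{verbatim}
+  (*plain binding with base name "foo"*)
+  val b = Binding.name "foo";
+
+  (*qualifier "bar" that is not mandatory in name space accesses*)
+  val b' = Binding.qualify false "bar" b;
+
+  (*mark the entity as a foundational implementation detail*)
+  val b'' = Binding.conceal b';
+
+  (*binding + naming --> long name (assumed operation, see above)*)
+  val full = Name_Space.full_name Name_Space.default_naming b';
+
+  (*human-readable output, possibly with formal markup*)
+  val _ = writeln (Binding.str_of b'');
+\end{verbatim}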
 \end{isamarkuptext}%
 \isamarkuptrue%
changeset 35001	31f8d9eaceff
parent 33524	a08e6c1cbc04
child 35414	cc8e4276d093