--- a/doc-src/IsarImplementation/Thy/document/Prelim.tex Thu Jun 24 16:27:40 2010 +0100
+++ b/doc-src/IsarImplementation/Thy/document/Prelim.tex Thu Jun 24 21:57:18 2010 +0200
@@ -648,8 +648,10 @@
\begin{enumerate}
- \item a single ASCII character ``\isa{c}'' or raw byte in the
- range of 128\dots 255, for example ``\verb,a,'',
+ \item a single ASCII character ``\isa{c}'', for example
+ ``\verb,a,'',
+
+ \item a codepoint according to UTF8 (non-ASCII byte sequence),
\item a regular symbol ``\verb,\,\verb,<,\isa{ident}\verb,>,'',
for example ``\verb,\,\verb,<alpha>,'',
@@ -673,19 +675,17 @@
``\verb,\,\verb,<alpha>,'' is classified as a letter, which means it
may occur within regular Isabelle identifiers.
- Since the character set underlying Isabelle symbols is 7-bit ASCII
- and 8-bit characters are passed through transparently, Isabelle can
- also process Unicode/UCS data in UTF-8 encoding.\footnote{When
- counting precise source positions internally, bytes in the range of
- 128\dots 191 are ignored. In UTF-8 encoding, this interval covers
- the additional trailer bytes, so Isabelle happens to count Unicode
- characters here, not bytes in memory. In ISO-Latin encoding, the
- ignored range merely includes some extra punctuation characters that
- even have replacements within the standard collection of Isabelle
- symbols; the accented letters range is counted properly.} Unicode
- provides its own collection of mathematical symbols, but within the
- core Isabelle/ML world there is no link to the standard collection
- of Isabelle regular symbols.
+ The character set underlying Isabelle symbols is 7-bit ASCII, but
+ 8-bit character sequences are passed-through unchanged. Unicode/UCS
+ data in UTF-8 encoding is processed in a non-strict fashion, such
+ that well-formed code sequences are recognized
+ accordingly.\footnote{Note that ISO-Latin-1 differs from UTF-8 only
+ in some special punctuation characters that even have replacements
+ within the standard collection of Isabelle symbols. Text consisting
+ of ASCII plus accented letters can be processed in either encoding.}
+ Unicode provides its own collection of mathematical symbols, but
+ within the core Isabelle/ML world there is no link to the standard
+ collection of Isabelle regular symbols.
\medskip Output of Isabelle symbols depends on the print mode
(\secref{print-mode}). For example, the standard {\LaTeX} setup of
@@ -733,7 +733,7 @@
\cite{isabelle-isar-ref}.
\item \verb|Symbol.sym| is a concrete datatype that represents
- the different kinds of symbols explicitly, with constructors \verb|Symbol.Char|, \verb|Symbol.Sym|, \verb|Symbol.Ctrl|, \verb|Symbol.Raw|.
+ the different kinds of symbols explicitly, with constructors \verb|Symbol.Char|, \verb|Symbol.Sym|, \verb|Symbol.UTF8|, \verb|Symbol.Ctrl|, \verb|Symbol.Raw|.
\item \verb|Symbol.decode| converts the string representation of a
symbol into the datatype version.