     1 theory "ML"
     2 imports Base
     3 begin
     4 
     5 chapter {* Isabelle/ML *}
     6 
     7 text {* Isabelle/ML is best understood as a certain culture based on
     8   Standard ML.  Thus it is not a new programming language, but a
     9   certain way to use SML at an advanced level within the Isabelle
    10   environment.  This covers a variety of aspects that are geared
    11   towards an efficient and robust platform for applications of formal
    12   logic with fully foundational proof construction --- according to
    13   the well-known \emph{LCF principle}.  There is specific
    14   infrastructure with library modules to address the needs of this
    15   difficult task.  For example, the raw parallel programming model of
  Poly/ML is presented as a considerably more abstract concept of
    17   \emph{future values}, which is then used to augment the inference
    18   kernel, proof interpreter, and theory loader accordingly.
    19 
    20   The main aspects of Isabelle/ML are introduced below.  These
    21   first-hand explanations should help to understand how proper
    22   Isabelle/ML is to be read and written, and to get access to the
    23   wealth of experience that is expressed in the source text and its
    24   history of changes.\footnote{See
    25   @{url "http://isabelle.in.tum.de/repos/isabelle"} for the full
    26   Mercurial history.  There are symbolic tags to refer to official
    27   Isabelle releases, as opposed to arbitrary \emph{tip} versions that
    28   merely reflect snapshots that are never really up-to-date.}  *}
    29 
    30 
    31 section {* Style and orthography *}
    32 
    33 text {* The sources of Isabelle/Isar are optimized for
    34   \emph{readability} and \emph{maintainability}.  The main purpose is
    35   to tell an informed reader what is really going on and how things
    36   really work.  This is a non-trivial aim, but it is supported by a
    37   certain style of writing Isabelle/ML that has emerged from long
    38   years of system development.\footnote{See also the interesting style
    39   guide for OCaml
    40   @{url "http://caml.inria.fr/resources/doc/guides/guidelines.en.html"}
    41   which shares many of our means and ends.}
    42 
    43   The main principle behind any coding style is \emph{consistency}.
    44   For a single author of a small program this merely means ``choose
    45   your style and stick to it''.  A complex project like Isabelle, with
    46   long years of development and different contributors, requires more
    47   standardization.  A coding style that is changed every few years or
    48   with every new contributor is no style at all, because consistency
    49   is quickly lost.  Global consistency is hard to achieve, though.
    50   Nonetheless, one should always strive at least for local consistency
  of modules and sub-systems, without deviating from some general
  principles of how to write Isabelle/ML.
    53 
    54   In a sense, good coding style is like an \emph{orthography} for the
    55   sources: it helps to read quickly over the text and see through the
    56   main points, without getting distracted by accidental presentation
    57   of free-style code.
    58 *}
    59 
    60 
    61 subsection {* Header and sectioning *}
    62 
    63 text {* Isabelle source files have a certain standardized header
    64   format (with precise spacing) that follows ancient traditions
    65   reaching back to the earliest versions of the system by Larry
    66   Paulson.  See @{file "~~/src/Pure/thm.ML"}, for example.
    67 
    68   The header includes at least @{verbatim Title} and @{verbatim
    69   Author} entries, followed by a prose description of the purpose of
    70   the module.  The latter can range from a single line to several
    71   paragraphs of explanations.
    72 
    73   The rest of the file is divided into sections, subsections,
    74   subsubsections, paragraphs etc.\ using a simple layout via ML
    75   comments as follows.
    76 
    77 \begin{verbatim}
    78 (*** section ***)
    79 
    80 (** subsection **)
    81 
    82 (* subsubsection *)
    83 
    84 (*short paragraph*)
    85 
    86 (*
    87   long paragraph,
    88   with more text
    89 *)
    90 \end{verbatim}
    91 
    92   As in regular typography, there is some extra space \emph{before}
    93   section headings that are adjacent to plain text (not other headings
    94   as in the example above).
    95 
    96   \medskip The precise wording of the prose text given in these
    97   headings is chosen carefully to introduce the main theme of the
    98   subsequent formal ML text.
    99 *}
   100 
   101 
   102 subsection {* Naming conventions *}
   103 
   104 text {* Since ML is the primary medium to express the meaning of the
   105   source text, naming of ML entities requires special care.
   106 
   107   \paragraph{Notation.}  A name consists of 1--3 \emph{words} (rarely
   108   4, but not more) that are separated by underscore.  There are three
   109   variants concerning upper or lower case letters, which are used for
   110   certain ML categories as follows:
   111 
   112   \medskip
   113   \begin{tabular}{lll}
   114   variant & example & ML categories \\\hline
   115   lower-case & @{ML_text foo_bar} & values, types, record fields \\
   116   capitalized & @{ML_text Foo_Bar} & datatype constructors, structures, functors \\
   117   upper-case & @{ML_text FOO_BAR} & special values, exception constructors, signatures \\
   118   \end{tabular}
   119   \medskip
   120 
   121   For historical reasons, many capitalized names omit underscores,
   122   e.g.\ old-style @{ML_text FooBar} instead of @{ML_text Foo_Bar}.
   123   Genuine mixed-case names are \emph{not} used, because clear division
   124   of words is essential for readability.\footnote{Camel-case was
  invented to work around the lack of underscore in some early
   126   non-ASCII character sets.  Later it became habitual in some language
   127   communities that are now strong in numbers.}
   128 
   129   A single (capital) character does not count as ``word'' in this
   130   respect: some Isabelle/ML names are suffixed by extra markers like
   131   this: @{ML_text foo_barT}.
   132 
   133   Name variants are produced by adding 1--3 primes, e.g.\ @{ML_text
   134   foo'}, @{ML_text foo''}, or @{ML_text foo'''}, but not @{ML_text
   135   foo''''} or more.  Decimal digits scale better to larger numbers,
   136   e.g.\ @{ML_text foo0}, @{ML_text foo1}, @{ML_text foo42}.
   137 
   138   \paragraph{Scopes.}  Apart from very basic library modules, ML
   139   structures are not ``opened'', but names are referenced with
   140   explicit qualification, as in @{ML Syntax.string_of_term} for
   141   example.  When devising names for structures and their components it
  is important to aim at eye-catching compositions of both parts, because
   143   this is how they are seen in the sources and documentation.  For the
   144   same reasons, aliases of well-known library functions should be
   145   avoided.
   146 
   147   Local names of function abstraction or case/let bindings are
   148   typically shorter, sometimes using only rudiments of ``words'',
   149   while still avoiding cryptic shorthands.  An auxiliary function
   150   called @{ML_text helper}, @{ML_text aux}, or @{ML_text f} is
   151   considered bad style.
   152 
   153   Example:
   154 
   155   \begin{verbatim}
   156   (* RIGHT *)
   157 
   158   fun print_foo ctxt foo =
   159     let
   160       fun print t = ... Syntax.string_of_term ctxt t ...
   161     in ... end;
   162 
   163 
   164   (* RIGHT *)
   165 
   166   fun print_foo ctxt foo =
   167     let
   168       val string_of_term = Syntax.string_of_term ctxt;
   169       fun print t = ... string_of_term t ...
   170     in ... end;
   171 
   172 
   173   (* WRONG *)
   174 
   175   val string_of_term = Syntax.string_of_term;
   176 
   177   fun print_foo ctxt foo =
   178     let
   179       fun aux t = ... string_of_term ctxt t ...
   180     in ... end;
   181 
   182   \end{verbatim}
   183 
   184 
   185   \paragraph{Specific conventions.} Here are some specific name forms
   186   that occur frequently in the sources.
   187 
   188   \begin{itemize}
   189 
   190   \item A function that maps @{ML_text foo} to @{ML_text bar} is
   191   called @{ML_text foo_to_bar} or @{ML_text bar_of_foo} (never
   192   @{ML_text foo2bar}, @{ML_text bar_from_foo}, @{ML_text
   193   bar_for_foo}, or @{ML_text bar4foo}).
   194 
   195   \item The name component @{ML_text legacy} means that the operation
   196   is about to be discontinued soon.
   197 
   198   \item The name component @{ML_text old} means that this is historic
   199   material that might disappear at some later stage.
   200 
   201   \item The name component @{ML_text global} means that this works
   202   with the background theory instead of the regular local context
   203   (\secref{sec:context}), sometimes for historical reasons, sometimes
  due to a genuine lack of locality of the concept involved, sometimes as
   205   a fall-back for the lack of a proper context in the application
   206   code.  Whenever there is a non-global variant available, the
   207   application should be migrated to use it with a proper local
   208   context.
   209 
   210   \item Variables of the main context types of the Isabelle/Isar
   211   framework (\secref{sec:context} and \chref{ch:local-theory}) have
   212   firm naming conventions as follows:
   213 
   214   \begin{itemize}
   215 
   216   \item theories are called @{ML_text thy}, rarely @{ML_text theory}
   217   (never @{ML_text thry})
   218 
   219   \item proof contexts are called @{ML_text ctxt}, rarely @{ML_text
   220   context} (never @{ML_text ctx})
   221 
   222   \item generic contexts are called @{ML_text context}, rarely
   223   @{ML_text ctxt}
   224 
   225   \item local theories are called @{ML_text lthy}, except for local
   226   theories that are treated as proof context (which is a semantic
   227   super-type)
   228 
   229   \end{itemize}
   230 
  Variations with primed or decimal numbers are always possible, as
  well as semantic prefixes like @{ML_text foo_thy} or @{ML_text
  bar_ctxt}, but the base conventions above need to be preserved.
  This makes it possible to visualize their data flow via plain
  regular expressions in the editor.
  \item The main logical entities (\secref{ch:logic}) have established
  naming conventions as follows:
   239 
   240   \begin{itemize}
   241 
   242   \item sorts are called @{ML_text S}
   243 
   244   \item types are called @{ML_text T}, @{ML_text U}, or @{ML_text
   245   ty} (never @{ML_text t})
   246 
   247   \item terms are called @{ML_text t}, @{ML_text u}, or @{ML_text
   248   tm} (never @{ML_text trm})
   249 
   250   \item certified types are called @{ML_text cT}, rarely @{ML_text
   251   T}, with variants as for types
   252 
   253   \item certified terms are called @{ML_text ct}, rarely @{ML_text
   254   t}, with variants as for terms (never @{ML_text ctrm})
   255 
   256   \item theorems are called @{ML_text th}, or @{ML_text thm}
   257 
   258   \end{itemize}
   259 
   260   Proper semantic names override these conventions completely.  For
   261   example, the left-hand side of an equation (as a term) can be called
   262   @{ML_text lhs} (not @{ML_text lhs_tm}).  Or a term that is known
   263   to be a variable can be called @{ML_text v} or @{ML_text x}.
   264 
   265   \item Tactics (\secref{sec:tactics}) are sufficiently important to
   266   have specific naming conventions.  The name of a basic tactic
   267   definition always has a @{ML_text "_tac"} suffix, the subgoal index
   268   (if applicable) is always called @{ML_text i}, and the goal state
   269   (if made explicit) is usually called @{ML_text st} instead of the
   270   somewhat misleading @{ML_text thm}.  Any other arguments are given
   271   before the latter two, and the general context is given first.
   272   Example:
   273 
   274   \begin{verbatim}
   275   fun my_tac ctxt arg1 arg2 i st = ...
   276   \end{verbatim}
   277 
   278   Note that the goal state @{ML_text st} above is rarely made
   279   explicit, if tactic combinators (tacticals) are used as usual.
   280 
   281   \end{itemize}
   282 *}
   283 
   284 
   285 subsection {* General source layout *}
   286 
   287 text {* The general Isabelle/ML source layout imitates regular
   288   type-setting to some extent, augmented by the requirements for
   289   deeply nested expressions that are commonplace in functional
   290   programming.
   291 
   292   \paragraph{Line length} is 80 characters according to ancient
   293   standards, but we allow as much as 100 characters (not
  more).\footnote{Readability requires keeping the beginning of a line
  in view while watching its end.  Modern wide-screen displays do not
  change the way the human brain works.  Sources also need to be
   297   printable on plain paper with reasonable font-size.} The extra 20
   298   characters acknowledge the space requirements due to qualified
   299   library references in Isabelle/ML.
   300 
   301   \paragraph{White-space} is used to emphasize the structure of
   302   expressions, following mostly standard conventions for mathematical
   303   typesetting, as can be seen in plain {\TeX} or {\LaTeX}.  This
   304   defines positioning of spaces for parentheses, punctuation, and
   305   infixes as illustrated here:
   306 
   307   \begin{verbatim}
   308   val x = y + z * (a + b);
   309   val pair = (a, b);
   310   val record = {foo = 1, bar = 2};
   311   \end{verbatim}
   312 
   313   Lines are normally broken \emph{after} an infix operator or
   314   punctuation character.  For example:
   315 
   316   \begin{verbatim}
   317   val x =
   318     a +
   319     b +
   320     c;
   321 
   322   val tuple =
   323    (a,
   324     b,
   325     c);
   326   \end{verbatim}
   327 
   328   Some special infixes (e.g.\ @{ML_text "|>"}) work better at the
   329   start of the line, but punctuation is always at the end.
   330 
   331   Function application follows the tradition of @{text "\<lambda>"}-calculus,
   332   not informal mathematics.  For example: @{ML_text "f a b"} for a
   333   curried function, or @{ML_text "g (a, b)"} for a tupled function.
   334   Note that the space between @{ML_text g} and the pair @{ML_text
   335   "(a, b)"} follows the important principle of
   336   \emph{compositionality}: the layout of @{ML_text "g p"} does not
   337   change when @{ML_text "p"} is refined to the concrete pair
   338   @{ML_text "(a, b)"}.
   339 
   340   \paragraph{Indentation} uses plain spaces, never hard
   341   tabulators.\footnote{Tabulators were invented to move the carriage
   342   of a type-writer to certain predefined positions.  In software they
   343   could be used as a primitive run-length compression of consecutive
   344   spaces, but the precise result would depend on non-standardized
   345   editor configuration.}
   346 
   347   Each level of nesting is indented by 2 spaces, sometimes 1, very
   348   rarely 4, never 8 or any other odd number.
   349 
   350   Indentation follows a simple logical format that only depends on the
   351   nesting depth, not the accidental length of the text that initiates
   352   a level of nesting.  Example:
   353 
   354   \begin{verbatim}
   355   (* RIGHT *)
   356 
   357   if b then
   358     expr1_part1
   359     expr1_part2
   360   else
   361     expr2_part1
   362     expr2_part2
   363 
   364 
   365   (* WRONG *)
   366 
   367   if b then expr1_part1
   368             expr1_part2
   369   else expr2_part1
   370        expr2_part2
   371   \end{verbatim}
   372 
   373   The second form has many problems: it assumes a fixed-width font
   374   when viewing the sources, it uses more space on the line and thus
   375   makes it hard to observe its strict length limit (working against
   376   \emph{readability}), it requires extra editing to adapt the layout
   377   to changes of the initial text (working against
   378   \emph{maintainability}) etc.
   379 
   380   \medskip For similar reasons, any kind of two-dimensional or tabular
   381   layouts, ASCII-art with lines or boxes of asterisks etc.\ should be
   382   avoided.
   383 
   384   \paragraph{Complex expressions} that consist of multi-clausal
   385   function definitions, @{ML_text handle}, @{ML_text case},
   386   @{ML_text let} (and combinations) require special attention.  The
   387   syntax of Standard ML is quite ambitious and admits a lot of
   388   variance that can distort the meaning of the text.
   389 
   390   Clauses of @{ML_text fun}, @{ML_text fn}, @{ML_text handle},
   391   @{ML_text case} get extra indentation to indicate the nesting
   392   clearly.  Example:
   393 
   394   \begin{verbatim}
   395   (* RIGHT *)
   396 
   397   fun foo p1 =
   398         expr1
   399     | foo p2 =
   400         expr2
   401 
   402 
   403   (* WRONG *)
   404 
   405   fun foo p1 =
   406     expr1
   407     | foo p2 =
   408     expr2
   409   \end{verbatim}
   410 
   411   Body expressions consisting of @{ML_text case} or @{ML_text let}
   412   require care to maintain compositionality, to prevent loss of
   413   logical indentation where it is especially important to see the
   414   structure of the text.  Example:
   415 
   416   \begin{verbatim}
   417   (* RIGHT *)
   418 
   419   fun foo p1 =
   420         (case e of
   421           q1 => ...
   422         | q2 => ...)
   423     | foo p2 =
   424         let
   425           ...
   426         in
   427           ...
   428         end
   429 
   430 
   431   (* WRONG *)
   432 
   433   fun foo p1 = case e of
   434       q1 => ...
   435     | q2 => ...
   436     | foo p2 =
   437     let
   438       ...
   439     in
   440       ...
   441     end
   442   \end{verbatim}
   443 
   444   Extra parentheses around @{ML_text case} expressions are optional,
   445   but help to analyse the nesting based on character matching in the
   446   editor.
   447 
   448   \medskip There are two main exceptions to the overall principle of
   449   compositionality in the layout of complex expressions.
   450 
   451   \begin{enumerate}
   452 
  \item @{ML_text "if"} expressions are iterated as if SML had
  a multi-branch conditional, e.g.
   455 
   456   \begin{verbatim}
   457   (* RIGHT *)
   458 
   459   if b1 then e1
   460   else if b2 then e2
   461   else e3
   462   \end{verbatim}
   463 
  \item @{ML_text fn} abstractions are often laid out as if they
  lacked any structure by themselves.  This traditional form is
  motivated by the possibility to shift function arguments back and
  forth wrt.\ additional combinators.  Example:
   468 
   469   \begin{verbatim}
   470   (* RIGHT *)
   471 
   472   fun foo x y = fold (fn z =>
   473     expr)
   474   \end{verbatim}
   475 
   476   Here the visual appearance is that of three arguments @{ML_text x},
   477   @{ML_text y}, @{ML_text z}.
   478 
   479   \end{enumerate}
   480 
  Such weakly structured layout should be used with great care.  Here
   482   are some counter-examples involving @{ML_text let} expressions:
   483 
   484   \begin{verbatim}
   485   (* WRONG *)
   486 
   487   fun foo x = let
   488       val y = ...
   489     in ... end
   490 
   491 
   492   (* WRONG *)
   493 
   494   fun foo x = let
   495     val y = ...
   496   in ... end
   497 
   498 
   499   (* WRONG *)
   500 
   501   fun foo x =
   502   let
   503     val y = ...
   504   in ... end
   505   \end{verbatim}
   506 
   507   \medskip In general the source layout is meant to emphasize the
   508   structure of complex language expressions, not to pretend that SML
   509   had a completely different syntax (say that of Haskell or Java).
   510 *}
   511 
   512 
   513 section {* SML embedded into Isabelle/Isar *}
   514 
   515 text {* ML and Isar are intertwined via an open-ended bootstrap
   516   process that provides more and more programming facilities and
   517   logical content in an alternating manner.  Bootstrapping starts from
   518   the raw environment of existing implementations of Standard ML
   519   (mainly Poly/ML, but also SML/NJ).
   520 
   521   Isabelle/Pure marks the point where the original ML toplevel is
   522   superseded by the Isar toplevel that maintains a uniform context for
   523   arbitrary ML values (see also \secref{sec:context}).  This formal
   524   environment holds ML compiler bindings, logical entities, and many
   525   other things.  Raw SML is never encountered again after the initial
   526   bootstrap of Isabelle/Pure.
   527 
   528   Object-logics like Isabelle/HOL are built within the
   529   Isabelle/ML/Isar environment by introducing suitable theories with
   530   associated ML modules, either inlined or as separate files.  Thus
   531   Isabelle/HOL is defined as a regular user-space application within
   532   the Isabelle framework.  Further add-on tools can be implemented in
   533   ML within the Isar context in the same manner: ML is part of the
   534   standard repertoire of Isabelle, and there is no distinction between
   535   ``user'' and ``developer'' in this respect.
   536 *}
   537 
   538 
   539 subsection {* Isar ML commands *}
   540 
   541 text {* The primary Isar source language provides facilities to ``open
   542   a window'' to the underlying ML compiler.  Especially see the Isar
   543   commands @{command_ref "ML_file"} and @{command_ref "ML"}: both work the
   544   same way, only the source text is provided via a file vs.\ inlined,
   545   respectively.  Apart from embedding ML into the main theory
   546   definition like that, there are many more commands that refer to ML
   547   source, such as @{command_ref setup} or @{command_ref declaration}.
   548   Even more fine-grained embedding of ML into Isar is encountered in
   549   the proof method @{method_ref tactic}, which refines the pending
   550   goal state via a given expression of type @{ML_type tactic}.
   551 *}
   552 
   553 text %mlex {* The following artificial example demonstrates some ML
   554   toplevel declarations within the implicit Isar theory context.  This
   555   is regular functional programming without referring to logical
   556   entities yet.
   557 *}
   558 
   559 ML {*
   560   fun factorial 0 = 1
   561     | factorial n = n * factorial (n - 1)
   562 *}
   563 
   564 text {* Here the ML environment is already managed by Isabelle, i.e.\
   565   the @{ML factorial} function is not yet accessible in the preceding
   566   paragraph, nor in a different theory that is independent from the
   567   current one in the import hierarchy.
   568 
   569   Removing the above ML declaration from the source text will remove
   570   any trace of this definition as expected.  The Isabelle/ML toplevel
   571   environment is managed in a \emph{stateless} way: unlike the raw ML
   572   toplevel there are no global side-effects involved
   573   here.\footnote{Such a stateless compilation environment is also a
   574   prerequisite for robust parallel compilation within independent
   575   nodes of the implicit theory development graph.}
   576 
  \medskip The next example shows how to embed ML into Isar proofs,
  using @{command_ref "ML_prf"} instead of @{command_ref "ML"}.
   579   As illustrated below, the effect on the ML environment is local to
   580   the whole proof body, ignoring the block structure.
   581 *}
   582 
   583 notepad
   584 begin
   585   ML_prf %"ML" {* val a = 1 *}
   586   {
   587     ML_prf %"ML" {* val b = a + 1 *}
   588   } -- {* Isar block structure ignored by ML environment *}
   589   ML_prf %"ML" {* val c = b + 1 *}
   590 end
   591 
   592 text {* By side-stepping the normal scoping rules for Isar proof
   593   blocks, embedded ML code can refer to the different contexts and
   594   manipulate corresponding entities, e.g.\ export a fact from a block
   595   context.
   596 
   597   \medskip Two further ML commands are useful in certain situations:
   598   @{command_ref ML_val} and @{command_ref ML_command} are
   599   \emph{diagnostic} in the sense that there is no effect on the
  underlying environment, and can thus be used anywhere (even outside a
   601   theory).  The examples below produce long strings of digits by
   602   invoking @{ML factorial}: @{command ML_val} already takes care of
   603   printing the ML toplevel result, but @{command ML_command} is silent
   604   so we produce an explicit output message.  *}
   605 
   606 ML_val {* factorial 100 *}
   607 ML_command {* writeln (string_of_int (factorial 100)) *}
   608 
   609 notepad
   610 begin
   611   ML_val {* factorial 100 *}
   612   ML_command {* writeln (string_of_int (factorial 100)) *}
   613 end
   614 
   615 
   616 subsection {* Compile-time context *}
   617 
   618 text {* Whenever the ML compiler is invoked within Isabelle/Isar, the
   619   formal context is passed as a thread-local reference variable.  Thus
   620   ML code may access the theory context during compilation, by reading
   621   or writing the (local) theory under construction.  Note that such
   622   direct access to the compile-time context is rare.  In practice it
   623   is typically done via some derived ML functions instead.
   624 *}
   625 
   626 text %mlref {*
   627   \begin{mldecls}
   628   @{index_ML ML_Context.the_generic_context: "unit -> Context.generic"} \\
   629   @{index_ML "Context.>>": "(Context.generic -> Context.generic) -> unit"} \\
   630   @{index_ML ML_Thms.bind_thms: "string * thm list -> unit"} \\
   631   @{index_ML ML_Thms.bind_thm: "string * thm -> unit"} \\
   632   \end{mldecls}
   633 
   634   \begin{description}
   635 
   636   \item @{ML "ML_Context.the_generic_context ()"} refers to the theory
   637   context of the ML toplevel --- at compile time.  ML code needs to
   638   take care to refer to @{ML "ML_Context.the_generic_context ()"}
   639   correctly.  Recall that evaluation of a function body is delayed
   640   until actual run-time.
   641 
   642   \item @{ML "Context.>>"}~@{text f} applies context transformation
   643   @{text f} to the implicit context of the ML toplevel.
   644 
   645   \item @{ML ML_Thms.bind_thms}~@{text "(name, thms)"} stores a list of
   646   theorems produced in ML both in the (global) theory context and the
   647   ML toplevel, associating it with the provided name.  Theorems are
   648   put into a global ``standard'' format before being stored.
   649 
   650   \item @{ML ML_Thms.bind_thm} is similar to @{ML ML_Thms.bind_thms} but refers to a
   651   singleton fact.
   652 
   653   \end{description}
   654 
   655   It is important to note that the above functions are really
   656   restricted to the compile time, even though the ML compiler is
   657   invoked at run-time.  The majority of ML code either uses static
   658   antiquotations (\secref{sec:ML-antiq}) or refers to the theory or
   659   proof context at run-time, by explicit functional abstraction.
   660 *}
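
text %mlex {* The following contrived example gives a rough idea of
  compile-time context access: the theory reported below is the one
  that is present while this ML block is being compiled.  It is only a
  sketch, assuming the usual operations @{ML Context.theory_of} and
  @{ML Context.theory_name} of the context infrastructure. *}

ML {*
  (*the generic context as seen by the ML compiler right now*)
  val generic = ML_Context.the_generic_context ();
  val thy = Context.theory_of generic;

  writeln ("compiled within theory: " ^ Context.theory_name thy);
*}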
   661 
   662 
   663 subsection {* Antiquotations \label{sec:ML-antiq} *}
   664 
   665 text {* A very important consequence of embedding SML into Isar is the
   666   concept of \emph{ML antiquotation}.  The standard token language of
   667   ML is augmented by special syntactic entities of the following form:
   668 
   669   @{rail \<open>
   670   @{syntax_def antiquote}: '@{' nameref args '}'
   671   \<close>}
   672 
   673   Here @{syntax nameref} and @{syntax args} are regular outer syntax
   674   categories \cite{isabelle-isar-ref}.  Attributes and proof methods
   675   use similar syntax.
   676 
   677   \medskip A regular antiquotation @{text "@{name args}"} processes
   678   its arguments by the usual means of the Isar source language, and
   679   produces corresponding ML source text, either as literal
   680   \emph{inline} text (e.g. @{text "@{term t}"}) or abstract
   681   \emph{value} (e.g. @{text "@{thm th}"}).  This pre-compilation
  scheme allows one to refer to formal entities in a robust manner, with
   683   proper static scoping and with some degree of logical checking of
   684   small portions of the code.
   685 *}
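
text %mlex {* The following artificial example indicates the general
  idea: antiquotations such as @{text "@{context}"} and
  @{text "@{term \<dots>}"} are processed at compile time and turn into
  proper ML values.  This is only a sketch, assuming the standard
  antiquotations and the operation @{ML Syntax.string_of_term}. *}

ML {*
  val ctxt = @{context};   (*the formal context of this ML block*)
  val t = @{term "f x y"};   (*a checked term value, not a string*)

  writeln (Syntax.string_of_term ctxt t);
*}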
   686 
   687 
   688 subsection {* Printing ML values *}
   689 
   690 text {* The ML compiler knows about the structure of values according
   691   to their static type, and can print them in the manner of the
   692   toplevel loop, although the details are non-portable.  The
   693   antiquotation @{ML_antiquotation_def "make_string"} provides a
   694   quasi-portable way to refer to this potential capability of the
   695   underlying ML system in generic Isabelle/ML sources.  *}
   696 
   697 text %mlantiq {*
   698   \begin{matharray}{rcl}
   699   @{ML_antiquotation_def "make_string"} & : & @{text ML_antiquotation} \\
   700   \end{matharray}
   701 
   702   @{rail \<open>
   703   @@{ML_antiquotation make_string}
   704   \<close>}
   705 
   706   \begin{description}
   707 
  \item @{text "@{make_string}"} inlines a function to print arbitrary
  values similar to the ML toplevel.  The result is compiler-dependent
  and may fall back on "?" in certain situations.
   711 
   712   \end{description}
   713 *}
   714 
   715 text %mlex {* The following artificial example shows how to produce
   716   ad-hoc output of ML values for debugging purposes.  This should not
   717   be done in production code, and not persist in regular Isabelle/ML
   718   sources.  *}
   719 
   720 ML {*
   721   val x = 42;
   722   val y = true;
   723 
   724   writeln (@{make_string} {x = x, y = y});
   725 *}
   726 
   727 
   728 section {* Canonical argument order \label{sec:canonical-argument-order} *}
   729 
   730 text {* Standard ML is a language in the tradition of @{text
   731   "\<lambda>"}-calculus and \emph{higher-order functional programming},
   732   similar to OCaml, Haskell, or Isabelle/Pure and HOL as logical
   733   languages.  Getting acquainted with the native style of representing
   734   functions in that setting can save a lot of extra boiler-plate of
   735   redundant shuffling of arguments, auxiliary abstractions etc.
   736 
   737   Functions are usually \emph{curried}: the idea of turning arguments
  of type @{text "\<tau>\<^sub>i"} (for @{text "i \<in> {1, \<dots>, n}"}) into a result of
   739   type @{text "\<tau>"} is represented by the iterated function space
   740   @{text "\<tau>\<^sub>1 \<rightarrow> \<dots> \<rightarrow> \<tau>\<^sub>n \<rightarrow> \<tau>"}.  This is isomorphic to the well-known
   741   encoding via tuples @{text "\<tau>\<^sub>1 \<times> \<dots> \<times> \<tau>\<^sub>n \<rightarrow> \<tau>"}, but the curried
   742   version fits more smoothly into the basic calculus.\footnote{The
   743   difference is even more significant in higher-order logic, because
   744   the redundant tuple structure needs to be accommodated by formal
   745   reasoning.}
   746 
  Currying gives some flexibility due to \emph{partial application}.  A
   748   function @{text "f: \<tau>\<^sub>1 \<rightarrow> \<tau>\<^sub>2 \<rightarrow> \<tau>"} can be applied to @{text "x: \<tau>\<^sub>1"}
   749   and the remaining @{text "(f x): \<tau>\<^sub>2 \<rightarrow> \<tau>"} passed to another function
   750   etc.  How well this works in practice depends on the order of
   751   arguments.  In the worst case, arguments are arranged erratically,
   752   and using a function in a certain situation always requires some
  glue code.  Thus we would get exponentially many opportunities to
   754   decorate the code with meaningless permutations of arguments.
   755 
   756   This can be avoided by \emph{canonical argument order}, which
  observes certain standard patterns and minimizes ad-hoc permutations
   758   in their application.  In Isabelle/ML, large portions of text can be
   759   written without auxiliary operations like @{text "swap: \<alpha> \<times> \<beta> \<rightarrow> \<beta> \<times>
   760   \<alpha>"} or @{text "C: (\<alpha> \<rightarrow> \<beta> \<rightarrow> \<gamma>) \<rightarrow> (\<beta> \<rightarrow> \<alpha> \<rightarrow> \<gamma>)"} (the latter not
   761   present in the Isabelle/ML library).
   762 
   763   \medskip The basic idea is that arguments that vary less are moved
   764   further to the left than those that vary more.  Two particularly
   765   important categories of functions are \emph{selectors} and
   766   \emph{updates}.
   767 
   768   The subsequent scheme is based on a hypothetical set-like container
   769   of type @{text "\<beta>"} that manages elements of type @{text "\<alpha>"}.  Both
   770   the names and types of the associated operations are canonical for
   771   Isabelle/ML.
   772 
   773   \begin{center}
   774   \begin{tabular}{ll}
   775   kind & canonical name and type \\\hline
   776   selector & @{text "member: \<beta> \<rightarrow> \<alpha> \<rightarrow> bool"} \\
   777   update & @{text "insert: \<alpha> \<rightarrow> \<beta> \<rightarrow> \<beta>"} \\
   778   \end{tabular}
   779   \end{center}
   780 
   781   Given a container @{text "B: \<beta>"}, the partially applied @{text
   782   "member B"} is a predicate over elements @{text "\<alpha> \<rightarrow> bool"}, and
   783   thus represents the intended denotation directly.  It is customary
   784   to pass the abstract predicate to further operations, not the
   785   concrete container.  The argument order makes it easy to use other
   786   combinators: @{text "forall (member B) list"} will check a list of
   787   elements for membership in @{text "B"} etc. Often the explicit
   788   @{text "list"} is pointless and can be contracted to @{text "forall
   789   (member B)"} to get directly a predicate again.
   790 
   791   In contrast, an update operation varies the container, so it moves
   792   to the right: @{text "insert a"} is a function @{text "\<beta> \<rightarrow> \<beta>"} to
   793   insert a value @{text "a"}.  These can be composed naturally as
   794   @{text "insert c \<circ> insert b \<circ> insert a"}.  The slightly awkward
   795   inversion of the composition order is due to conventional
   796   mathematical notation, which can be easily amended as explained
   797   below.
   798 *}
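
text %mlex {* The following example is merely a sketch of the above
  scheme, using the list operations @{ML member} and @{ML insert} of
  the Isabelle library; unlike the hypothetical container above, these
  take an explicit equality relation as additional first argument. *}

ML {*
  (*build a container by composing partially applied updates*)
  val B = (insert (op =) 3 o insert (op =) 2 o insert (op =) 1) [];

  (*the partially applied selector is a plain predicate*)
  @{assert} (forall (member (op =) B) [1, 2, 3]);
  @{assert} (not (member (op =) B 42));
*}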
   799 
   800 
   801 subsection {* Forward application and composition *}
   802 
   803 text {* Regular function application and infix notation works best for
   804   relatively deeply structured expressions, e.g.\ @{text "h (f x y + g
   805   z)"}.  The important special case of \emph{linear transformation}
   806   applies a cascade of functions @{text "f\<^sub>n (\<dots> (f\<^sub>1 x))"}.  This
   807   becomes hard to read and maintain if the functions are themselves
   808   given as complex expressions.  The notation can be significantly
   809   improved by introducing \emph{forward} versions of application and
   810   composition as follows:
   811 
   812   \medskip
   813   \begin{tabular}{lll}
   814   @{text "x |> f"} & @{text "\<equiv>"} & @{text "f x"} \\
   815   @{text "(f #> g) x"} & @{text "\<equiv>"} & @{text "x |> f |> g"} \\
   816   \end{tabular}
   817   \medskip
   818 
  This makes it convenient to write @{text "x |> f\<^sub>1 |> \<dots> |> f\<^sub>n"} or
   820   @{text "f\<^sub>1 #> \<dots> #> f\<^sub>n"} for its functional abstraction over @{text
   821   "x"}.
   822 
   823   \medskip There is an additional set of combinators to accommodate
   824   multiple results (via pairs) that are passed on as multiple
   825   arguments (via currying).
   826 
   827   \medskip
   828   \begin{tabular}{lll}
   829   @{text "(x, y) |-> f"} & @{text "\<equiv>"} & @{text "f x y"} \\
   830   @{text "(f #-> g) x"} & @{text "\<equiv>"} & @{text "x |> f |-> g"} \\
   831   \end{tabular}
   832   \medskip
   833 *}
   834 
   835 text %mlref {*
   836   \begin{mldecls}
   837   @{index_ML_op "|> ": "'a * ('a -> 'b) -> 'b"} \\
   838   @{index_ML_op "|-> ": "('c * 'a) * ('c -> 'a -> 'b) -> 'b"} \\
   839   @{index_ML_op "#> ": "('a -> 'b) * ('b -> 'c) -> 'a -> 'c"} \\
   840   @{index_ML_op "#-> ": "('a -> 'c * 'b) * ('c -> 'b -> 'd) -> 'a -> 'd"} \\
   841   \end{mldecls}
   842 *}
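
text %mlex {* The following contrived example illustrates these
  combinators; the functions @{ML_text double} and @{ML_text split}
  are made up for this purpose. *}

ML {*
  fun double x = 2 * x;
  fun split x = (x, x + 1);

  @{assert} ((5 |> double |> double) = 20);
  @{assert} ((double #> double) 5 = 20);
  @{assert} (((7, 3) |-> (fn x => fn y => x - y)) = 4);
  @{assert} ((split #-> (fn a => fn b => a + b)) 5 = 11);
*}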
   843 
   844 
   845 subsection {* Canonical iteration *}
   846 
   847 text {* As explained above, a function @{text "f: \<alpha> \<rightarrow> \<beta> \<rightarrow> \<beta>"} can be
   848   understood as update on a configuration of type @{text "\<beta>"},
   849   parametrized by arguments of type @{text "\<alpha>"}.  Given @{text "a: \<alpha>"}
   850   the partial application @{text "(f a): \<beta> \<rightarrow> \<beta>"} operates
   851   homogeneously on @{text "\<beta>"}.  This can be iterated naturally over a
   852   list of parameters @{text "[a\<^sub>1, \<dots>, a\<^sub>n]"} as @{text "f a\<^sub>1 #> \<dots> #> f a\<^sub>n"}.
   853   The latter expression is again a function @{text "\<beta> \<rightarrow> \<beta>"}.
   854   It can be applied to an initial configuration @{text "b: \<beta>"} to
   855   start the iteration over the given list of arguments: each @{text
   856   "a"} in @{text "a\<^sub>1, \<dots>, a\<^sub>n"} is applied consecutively by updating a
   857   cumulative configuration.
   858 
   859   The @{text fold} combinator in Isabelle/ML lifts a function @{text
   860   "f"} as above to its iterated version over a list of arguments.
   861   Lifting can be repeated, e.g.\ @{text "(fold \<circ> fold) f"} iterates
   862   over a list of lists as expected.
   863 
   864   The variant @{text "fold_rev"} works inside-out over the list of
   865   arguments, such that @{text "fold_rev f \<equiv> fold f \<circ> rev"} holds.
   866 
   867   The @{text "fold_map"} combinator essentially performs @{text
   868   "fold"} and @{text "map"} simultaneously: each application of @{text
   869   "f"} produces an updated configuration together with a side-result;
   870   the iteration collects all such side-results as a separate list.
   871 *}
   872 
   873 text %mlref {*
   874   \begin{mldecls}
   875   @{index_ML fold: "('a -> 'b -> 'b) -> 'a list -> 'b -> 'b"} \\
   876   @{index_ML fold_rev: "('a -> 'b -> 'b) -> 'a list -> 'b -> 'b"} \\
   877   @{index_ML fold_map: "('a -> 'b -> 'c * 'b) -> 'a list -> 'b -> 'c list * 'b"} \\
   878   \end{mldecls}
   879 
   880   \begin{description}
   881 
   882   \item @{ML fold}~@{text f} lifts the parametrized update function
   883   @{text "f"} to a list of parameters.
   884 
   885   \item @{ML fold_rev}~@{text "f"} is similar to @{ML fold}~@{text
   886   "f"}, but works inside-out.
   887 
   888   \item @{ML fold_map}~@{text "f"} lifts the parametrized update
   889   function @{text "f"} (with side-result) to a list of parameters and
   890   cumulative side-results.
   891 
   892   \end{description}
   893 
   894   \begin{warn}
   895   The literature on functional programming provides a multitude of
   896   combinators called @{text "foldl"}, @{text "foldr"} etc.  SML97
   897   provides its own variations as @{ML List.foldl} and @{ML
   898   List.foldr}, while the classic Isabelle library also has the
   899   historic @{ML Library.foldl} and @{ML Library.foldr}.  To avoid
   900   unnecessary complication and confusion, all these historical
   901   versions should be ignored, and @{ML fold} (or @{ML fold_rev}) used
   902   exclusively.
   903   \end{warn}
   904 *}
   905 
   906 text %mlex {* The following example shows how to fill a text buffer
   907   incrementally by adding strings, either individually or from a given
   908   list.
   909 *}
   910 
   911 ML {*
   912   val s =
   913     Buffer.empty
   914     |> Buffer.add "digits: "
   915     |> fold (Buffer.add o string_of_int) (0 upto 9)
   916     |> Buffer.content;
   917 
   918   @{assert} (s = "digits: 0123456789");
   919 *}
   920 
   921 text {* Note how @{ML "fold (Buffer.add o string_of_int)"} above saves
   922   an extra @{ML "map"} over the given list.  This kind of peephole
   923   optimization reduces both the code size and the tree structures in
   924   memory (``deforestation''), but it requires some practice to read
   925   and write fluently.
   926 
   927   \medskip The next example elaborates the idea of canonical
   928   iteration, demonstrating fast accumulation of tree content using a
   929   text buffer.
   930 *}
   931 
   932 ML {*
   933   datatype tree = Text of string | Elem of string * tree list;
   934 
   935   fun slow_content (Text txt) = txt
   936     | slow_content (Elem (name, ts)) =
   937         "<" ^ name ^ ">" ^
   938         implode (map slow_content ts) ^
   939         "</" ^ name ^ ">"
   940 
   941   fun add_content (Text txt) = Buffer.add txt
   942     | add_content (Elem (name, ts)) =
   943         Buffer.add ("<" ^ name ^ ">") #>
   944         fold add_content ts #>
   945         Buffer.add ("</" ^ name ^ ">");
   946 
   947   fun fast_content tree =
   948     Buffer.empty |> add_content tree |> Buffer.content;
   949 *}
   950 
   951 text {* The slow part of @{ML slow_content} is the @{ML implode} of
   952   the recursive results, because it copies previously produced strings
   953   again.
   954 
   955   The incremental @{ML add_content} avoids this by operating on a
   956   buffer that is passed through in a linear fashion.  Using @{ML_text
   957   "#>"} and contraction over the actual buffer argument saves some
   958   additional boiler-plate.  Of course, the two @{ML "Buffer.add"}
   959   invocations with concatenated strings could have been split into
   960   smaller parts, but this would have obfuscated the source without
   961   making a big difference in allocations.  Here we have done some
   962   peephole-optimization for the sake of readability.
   963 
   964   Another benefit of @{ML add_content} is its ``open'' form as a
   965   function on buffers that can be continued in further linear
   966   transformations, folding etc.  Thus it is more compositional than
  the naive @{ML slow_content}.  As a realistic example, compare the
   968   old-style @{ML "Term.maxidx_of_term: term -> int"} with the newer
   969   @{ML "Term.maxidx_term: term -> int -> int"} in Isabelle/Pure.
   970 
  Note that @{ML fast_content} above is only defined as an example.  In
   972   many practical situations, it is customary to provide the
   973   incremental @{ML add_content} only and leave the initialization and
   974   termination to the concrete application by the user.
   975 *}
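
text %mlex {* The following contrived example contrasts @{ML fold}
  with @{ML fold_rev}, and uses @{ML fold_map} to number the elements
  of a list while threading a counter through the traversal.  The
  auxiliary function @{ML_text number_all} is made up for this
  illustration. *}

ML {*
  @{assert} (fold cons [1, 2, 3] [] = [3, 2, 1]);
  @{assert} (fold_rev cons [1, 2, 3] [] = [1, 2, 3]);

  (*pair each element with its position, starting at 1*)
  fun number_all xs = #1 (fold_map (fn x => fn i => ((i, x), i + 1)) xs 1);

  @{assert} (number_all ["a", "b", "c"] = [(1, "a"), (2, "b"), (3, "c")]);
*}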
   976 
   977 
   978 section {* Message output channels \label{sec:message-channels} *}
   979 
   980 text {* Isabelle provides output channels for different kinds of
   981   messages: regular output, high-volume tracing information, warnings,
   982   and errors.
   983 
   984   Depending on the user interface involved, these messages may appear
   985   in different text styles or colours.  The standard output for
   986   terminal sessions prefixes each line of warnings by @{verbatim
   987   "###"} and errors by @{verbatim "***"}, but leaves anything else
   988   unchanged.
   989 
   990   Messages are associated with the transaction context of the running
   991   Isar command.  This enables the front-end to manage commands and
   992   resulting messages together.  For example, after deleting a command
   993   from a given theory document version, the corresponding message
   994   output can be retracted from the display.
   995 *}
   996 
   997 text %mlref {*
   998   \begin{mldecls}
   999   @{index_ML writeln: "string -> unit"} \\
  1000   @{index_ML tracing: "string -> unit"} \\
  1001   @{index_ML warning: "string -> unit"} \\
  1002   @{index_ML error: "string -> 'a"} \\
  1003   \end{mldecls}
  1004 
  1005   \begin{description}
  1006 
  1007   \item @{ML writeln}~@{text "text"} outputs @{text "text"} as regular
  1008   message.  This is the primary message output operation of Isabelle
  1009   and should be used by default.
  1010 
  1011   \item @{ML tracing}~@{text "text"} outputs @{text "text"} as special
  1012   tracing message, indicating potential high-volume output to the
  1013   front-end (hundreds or thousands of messages issued by a single
  1014   command).  The idea is to allow the user-interface to downgrade the
  1015   quality of message display to achieve higher throughput.
  1016 
  1017   Note that the user might have to take special actions to see tracing
  1018   output, e.g.\ switch to a different output window.  So this channel
  1019   should not be used for regular output.
  1020 
  1021   \item @{ML warning}~@{text "text"} outputs @{text "text"} as
  1022   warning, which typically means some extra emphasis on the front-end
  1023   side (color highlighting, icons, etc.).
  1024 
  1025   \item @{ML error}~@{text "text"} raises exception @{ML ERROR}~@{text
  1026   "text"} and thus lets the Isar toplevel print @{text "text"} on the
  1027   error channel, which typically means some extra emphasis on the
  1028   front-end side (color highlighting, icons, etc.).
  1029 
  1030   This assumes that the exception is not handled before the command
  1031   terminates.  Handling exception @{ML ERROR}~@{text "text"} is a
  1032   perfectly legal alternative: it means that the error is absorbed
  1033   without any message output.
  1034 
  1035   \begin{warn}
  1036   The actual error channel is accessed via @{ML Output.error_message}, but
  1037   the old interaction protocol of Proof~General \emph{crashes} if that
  1038   function is used in regular ML code: error output and toplevel
  1039   command failure always need to coincide in classic TTY interaction.
  1040   \end{warn}
  1041 
  1042   \end{description}
  1043 
  1044   \begin{warn}
  1045   Regular Isabelle/ML code should output messages exclusively by the
  1046   official channels.  Using raw I/O on \emph{stdout} or \emph{stderr}
  1047   instead (e.g.\ via @{ML TextIO.output}) is apt to cause problems in
  1048   the presence of parallel and asynchronous processing of Isabelle
  1049   theories.  Such raw output might be displayed by the front-end in
  1050   some system console log, with a low chance that the user will ever
  1051   see it.  Moreover, as a genuine side-effect on global process
  1052   channels, there is no proper way to retract output when Isar command
  1053   transactions are reset by the system.
  1054   \end{warn}
  1055 
  1056   \begin{warn}
  1057   The message channels should be used in a message-oriented manner.
  1058   This means that multi-line output that logically belongs together is
  1059   issued by a \emph{single} invocation of @{ML writeln} etc.\ with the
  1060   functional concatenation of all message constituents.
  1061   \end{warn}
  1062 *}
  1063 
  1064 text %mlex {* The following example demonstrates a multi-line
  1065   warning.  Note that in some situations the user sees only the first
  1066   line, so the most important point should be made first.
  1067 *}
  1068 
  1069 ML_command {*
  1070   warning (cat_lines
  1071    ["Beware the Jabberwock, my son!",
  1072     "The jaws that bite, the claws that catch!",
  1073     "Beware the Jubjub Bird, and shun",
  1074     "The frumious Bandersnatch!"]);
  1075 *}
  1076 
  1077 
  1078 section {* Exceptions \label{sec:exceptions} *}
  1079 
  1080 text {* The Standard ML semantics of strict functional evaluation
  1081   together with exceptions is rather well defined, but some delicate
  points need to be observed to prevent ML programs from going wrong
  1083   despite static type-checking.  Exceptions in Isabelle/ML are
  1084   subsequently categorized as follows.
  1085 
  1086   \paragraph{Regular user errors.}  These are meant to provide
  1087   informative feedback about malformed input etc.
  1088 
  1089   The \emph{error} function raises the corresponding \emph{ERROR}
  1090   exception, with a plain text message as argument.  \emph{ERROR}
  1091   exceptions can be handled internally, in order to be ignored, turned
  1092   into other exceptions, or cascaded by appending messages.  If the
  1093   corresponding Isabelle/Isar command terminates with an \emph{ERROR}
  1094   exception state, the toplevel will print the result on the error
  1095   channel (see \secref{sec:message-channels}).
  1096 
  1097   It is considered bad style to refer to internal function names or
  1098   values in ML source notation in user error messages.
  1099 
  1100   Grammatical correctness of error messages can be improved by
  1101   \emph{omitting} final punctuation: messages are often concatenated
  1102   or put into a larger context (e.g.\ augmented with source position).
  By not insisting on the final word at the origin of the error, the
  1104   system can perform its administrative tasks more easily and
  1105   robustly.
  1106 
  1107   \paragraph{Program failures.}  There is a handful of standard
  1108   exceptions that indicate general failure situations, or failures of
  1109   core operations on logical entities (types, terms, theorems,
  1110   theories, see \chref{ch:logic}).
  1111 
  1112   These exceptions indicate a genuine breakdown of the program, so the
  1113   main purpose is to determine quickly what has happened where.
  1114   Traditionally, the (short) exception message would include the name
  1115   of an ML function, although this is no longer necessary, because the
  1116   ML runtime system prints a detailed source position of the
  1117   corresponding @{ML_text raise} keyword.
  1118 
  1119   \medskip User modules can always introduce their own custom
  1120   exceptions locally, e.g.\ to organize internal failures robustly
  1121   without overlapping with existing exceptions.  Exceptions that are
  1122   exposed in module signatures require extra care, though, and should
  1123   \emph{not} be introduced by default.  Surprise by users of a module
  can often be minimized by using plain user errors instead.
  1125 
  1126   \paragraph{Interrupts.}  These indicate arbitrary system events:
  1127   both the ML runtime system and the Isabelle/ML infrastructure signal
  1128   various exceptional situations by raising the special
  1129   \emph{Interrupt} exception in user code.
  1130 
  This is the one and only way that physical events can intrude on an
  1132   Isabelle/ML program.  Such an interrupt can mean out-of-memory,
  1133   stack overflow, timeout, internal signaling of threads, or the user
  1134   producing a console interrupt manually etc.  An Isabelle/ML program
  1135   that intercepts interrupts becomes dependent on physical effects of
  1136   the environment.  Even worse, exception handling patterns that are
  too general by accident, e.g.\ by misspelled exception constructors,
  1138   will cover interrupts unintentionally and thus render the program
  1139   semantics ill-defined.
  1140 
  1141   Note that the Interrupt exception dates back to the original SML90
  1142   language definition.  It was excluded from the SML97 version to
  1143   avoid its malign impact on ML program semantics, but without
  1144   providing a viable alternative.  Isabelle/ML recovers physical
  1145   interruptibility (which is an indispensable tool to implement
  1146   managed evaluation of command transactions), but requires user code
  1147   to be strictly transparent wrt.\ interrupts.
  1148 
  1149   \begin{warn}
  1150   Isabelle/ML user code needs to terminate promptly on interruption,
  1151   without guessing at its meaning to the system infrastructure.
  1152   Temporary handling of interrupts for cleanup of global resources
  1153   etc.\ needs to be followed immediately by re-raising of the original
  1154   exception.
  1155   \end{warn}
  1156 *}
  1157 
  1158 text %mlref {*
  1159   \begin{mldecls}
  1160   @{index_ML try: "('a -> 'b) -> 'a -> 'b option"} \\
  1161   @{index_ML can: "('a -> 'b) -> 'a -> bool"} \\
  1162   @{index_ML_exception ERROR: string} \\
  1163   @{index_ML_exception Fail: string} \\
  1164   @{index_ML Exn.is_interrupt: "exn -> bool"} \\
  1165   @{index_ML reraise: "exn -> 'a"} \\
  1166   @{index_ML ML_Compiler.exn_trace: "(unit -> 'a) -> 'a"} \\
  1167   \end{mldecls}
  1168 
  1169   \begin{description}
  1170 
  1171   \item @{ML try}~@{text "f x"} makes the partiality of evaluating
  1172   @{text "f x"} explicit via the option datatype.  Interrupts are
  1173   \emph{not} handled here, i.e.\ this form serves as safe replacement
  1174   for the \emph{unsafe} version @{ML_text "(SOME"}~@{text "f
  1175   x"}~@{ML_text "handle _ => NONE)"} that is occasionally seen in
  1176   books about SML97, not Isabelle/ML.
  1177 
  1178   \item @{ML can} is similar to @{ML try} with more abstract result.
  1179 
  1180   \item @{ML ERROR}~@{text "msg"} represents user errors; this
  1181   exception is normally raised indirectly via the @{ML error} function
  1182   (see \secref{sec:message-channels}).
  1183 
  1184   \item @{ML Fail}~@{text "msg"} represents general program failures.
  1185 
  1186   \item @{ML Exn.is_interrupt} identifies interrupts robustly, without
  1187   mentioning concrete exception constructors in user code.  Handled
  1188   interrupts need to be re-raised promptly!
  1189 
  1190   \item @{ML reraise}~@{text "exn"} raises exception @{text "exn"}
  1191   while preserving its implicit position information (if possible,
  1192   depending on the ML platform).
  1193 
  1194   \item @{ML ML_Compiler.exn_trace}~@{ML_text "(fn () =>"}~@{text
  1195   "e"}@{ML_text ")"} evaluates expression @{text "e"} while printing
  1196   a full trace of its stack of nested exceptions (if possible,
  1197   depending on the ML platform).
  1198 
  1199   Inserting @{ML ML_Compiler.exn_trace} into ML code temporarily is
  1200   useful for debugging, but not suitable for production code.
  1201 
  1202   \end{description}
  1203 *}
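
text %mlex {* The following artificial example sketches @{ML try} and
  @{ML can}, and the handling of @{ML ERROR} exceptions raised via
  @{ML error}; the concrete messages and values are made up. *}

ML {*
  @{assert} (try hd [] = NONE);
  @{assert} (try hd [1, 2, 3] = SOME 1);
  @{assert} (can hd [1, 2, 3]);
  @{assert} (not (can hd []));

  (*ERROR may be handled internally; here the message is merely recovered*)
  val msg = (error "Bad input" handle ERROR msg => msg);
  @{assert} (msg = "Bad input");
*}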
  1204 
  1205 text %mlantiq {*
  1206   \begin{matharray}{rcl}
  1207   @{ML_antiquotation_def "assert"} & : & @{text ML_antiquotation} \\
  1208   \end{matharray}
  1209 
  1210   \begin{description}
  1211 
  1212   \item @{text "@{assert}"} inlines a function
  1213   @{ML_type "bool -> unit"} that raises @{ML Fail} if the argument is
  1214   @{ML false}.  Due to inlining the source position of failed
  1215   assertions is included in the error output.
  1216 
  1217   \end{description}
  1218 *}
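
text %mlex {* A trivial example: evaluating the following ML text has
  no visible effect, but a failing assertion would be reported together
  with the source position of the antiquotation. *}

ML {*
  @{assert} (rev [1, 2, 3] = [3, 2, 1]);
*}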
  1219 
  1220 
  1221 section {* Strings of symbols \label{sec:symbols} *}
  1222 
  1223 text {* A \emph{symbol} constitutes the smallest textual unit in
  1224   Isabelle/ML --- raw ML characters are normally not encountered at
  1225   all!  Isabelle strings consist of a sequence of symbols, represented
  1226   as a packed string or an exploded list of strings.  Each symbol is
  in itself a small string, which has one of the following
  1228   forms:
  1229 
  1230   \begin{enumerate}
  1231 
  1232   \item a single ASCII character ``@{text "c"}'', for example
  1233   ``\verb,a,'',
  1234 
  \item a codepoint according to UTF-8 (non-ASCII byte sequence),
  1236 
  1237   \item a regular symbol ``\verb,\,\verb,<,@{text "ident"}\verb,>,'',
  1238   for example ``\verb,\,\verb,<alpha>,'',
  1239 
  1240   \item a control symbol ``\verb,\,\verb,<^,@{text "ident"}\verb,>,'',
  1241   for example ``\verb,\,\verb,<^bold>,'',
  1242 
  1243   \item a raw symbol ``\verb,\,\verb,<^raw:,@{text text}\verb,>,''
  1244   where @{text text} consists of printable characters excluding
  1245   ``\verb,.,'' and ``\verb,>,'', for example
  1246   ``\verb,\,\verb,<^raw:$\sum_{i = 1}^n$>,'',
  1247 
  1248   \item a numbered raw control symbol ``\verb,\,\verb,<^raw,@{text
  n}\verb,>,'' where @{text n} consists of digits, for example
  1250   ``\verb,\,\verb,<^raw42>,''.
  1251 
  1252   \end{enumerate}
  1253 
  1254   The @{text "ident"} syntax for symbol names is @{text "letter
  1255   (letter | digit)\<^sup>*"}, where @{text "letter = A..Za..z"} and @{text
  1256   "digit = 0..9"}.  There are infinitely many regular symbols and
  1257   control symbols, but a fixed collection of standard symbols is
  1258   treated specifically.  For example, ``\verb,\,\verb,<alpha>,'' is
  1259   classified as a letter, which means it may occur within regular
  1260   Isabelle identifiers.
  1261 
  1262   The character set underlying Isabelle symbols is 7-bit ASCII, but
  1263   8-bit character sequences are passed through unchanged.  Unicode/UCS
  1264   data in UTF-8 encoding is processed in a non-strict fashion, such
  1265   that well-formed code sequences are recognized
  1266   accordingly.\footnote{Note that ISO-Latin-1 differs from UTF-8 only
  1267   in some special punctuation characters that even have replacements
  1268   within the standard collection of Isabelle symbols.  Text consisting
  1269   of ASCII plus accented letters can be processed in either encoding.}
  1270   Unicode provides its own collection of mathematical symbols, but
  1271   within the core Isabelle/ML world there is no link to the standard
  1272   collection of Isabelle regular symbols.
  1273 
  1274   \medskip Output of Isabelle symbols depends on the print mode
  1275   \cite{isabelle-isar-ref}.  For example, the standard {\LaTeX}
  1276   setup of the Isabelle document preparation system would present
  1277   ``\verb,\,\verb,<alpha>,'' as @{text "\<alpha>"}, and
  1278   ``\verb,\,\verb,<^bold>,\verb,\,\verb,<alpha>,'' as @{text "\<^bold>\<alpha>"}.
  1279   On-screen rendering usually works by mapping a finite subset of
  1280   Isabelle symbols to suitable Unicode characters.
  1281 *}
  1282 
  1283 text %mlref {*
  1284   \begin{mldecls}
  1285   @{index_ML_type "Symbol.symbol": string} \\
  1286   @{index_ML Symbol.explode: "string -> Symbol.symbol list"} \\
  1287   @{index_ML Symbol.is_letter: "Symbol.symbol -> bool"} \\
  1288   @{index_ML Symbol.is_digit: "Symbol.symbol -> bool"} \\
  1289   @{index_ML Symbol.is_quasi: "Symbol.symbol -> bool"} \\
  1290   @{index_ML Symbol.is_blank: "Symbol.symbol -> bool"} \\
  1291   \end{mldecls}
  1292   \begin{mldecls}
  1293   @{index_ML_type "Symbol.sym"} \\
  1294   @{index_ML Symbol.decode: "Symbol.symbol -> Symbol.sym"} \\
  1295   \end{mldecls}
  1296 
  1297   \begin{description}
  1298 
  1299   \item Type @{ML_type "Symbol.symbol"} represents individual Isabelle
  1300   symbols.
  1301 
  1302   \item @{ML "Symbol.explode"}~@{text "str"} produces a symbol list
  1303   from the packed form.  This function supersedes @{ML
  1304   "String.explode"} for virtually all purposes of manipulating text in
  1305   Isabelle!\footnote{The runtime overhead for exploded strings is
  1306   mainly that of the list structure: individual symbols that happen to
  1307   be a singleton string do not require extra memory in Poly/ML.}
  1308 
  1309   \item @{ML "Symbol.is_letter"}, @{ML "Symbol.is_digit"}, @{ML
  1310   "Symbol.is_quasi"}, @{ML "Symbol.is_blank"} classify standard
  1311   symbols according to fixed syntactic conventions of Isabelle, cf.\
  1312   \cite{isabelle-isar-ref}.
  1313 
  1314   \item Type @{ML_type "Symbol.sym"} is a concrete datatype that
  1315   represents the different kinds of symbols explicitly, with
  1316   constructors @{ML "Symbol.Char"}, @{ML "Symbol.Sym"}, @{ML
  1317   "Symbol.UTF8"}, @{ML "Symbol.Ctrl"}, @{ML "Symbol.Raw"}.
  1318 
  1319   \item @{ML "Symbol.decode"} converts the string representation of a
  1320   symbol into the datatype version.
  1321 
  1322   \end{description}
  1323 
  1324   \paragraph{Historical note.} In the original SML90 standard the
  1325   primitive ML type @{ML_type char} did not exist, and @{ML_text
  1326   "explode: string -> string list"} produced a list of singleton
  1327   strings like @{ML "raw_explode: string -> string list"} in
  1328   Isabelle/ML today.  When SML97 came out, Isabelle did not adopt its
  1329   somewhat anachronistic 8-bit or 16-bit characters, but the idea of
  1330   exploding a string into a list of small strings was extended to
  1331   ``symbols'' as explained above.  Thus Isabelle sources can refer to
  1332   an infinite store of user-defined symbols, without having to worry
  1333   about the multitude of Unicode encodings that have emerged over the
  1334   years.  *}
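
       text %mlex {* A small sketch of symbol explosion and classification; the
         input is an arbitrary example that mixes an ASCII letter, a regular
         symbol, a control symbol, and a digit. *}

       ML_val {*
         val syms = Symbol.explode "a\<alpha>\<^bold>1";

         @{assert} (length syms = 4);
         @{assert} (Symbol.is_letter "a" andalso Symbol.is_letter "\<alpha>");
         @{assert} (Symbol.is_digit "1");
       *}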
  1335 
  1336 
  1337 section {* Basic data types *}
  1338 
  1339 text {* The basis library proposal of SML97 needs to be treated with
  1340   caution.  Many of its operations simply do not fit with important
  1341   Isabelle/ML conventions (like ``canonical argument order'', see
  1342   \secref{sec:canonical-argument-order}), others cause problems with
  1343   the parallel evaluation model of Isabelle/ML (such as @{ML
  1344   TextIO.print} or @{ML OS.Process.system}).
  1345 
  1346   Subsequently we give a brief overview of important operations on
  1347   basic ML data types.
  1348 *}
  1349 
  1350 
  1351 subsection {* Characters *}
  1352 
  1353 text %mlref {*
  1354   \begin{mldecls}
  1355   @{index_ML_type char} \\
  1356   \end{mldecls}
  1357 
  1358   \begin{description}
  1359 
  1360   \item Type @{ML_type char} is \emph{not} used.  The smallest textual
  1361   unit in Isabelle is represented as a ``symbol'' (see
  1362   \secref{sec:symbols}).
  1363 
  1364   \end{description}
  1365 *}
  1366 
  1367 
  1368 subsection {* Strings *}
  1369 
  1370 text %mlref {*
  1371   \begin{mldecls}
  1372   @{index_ML_type string} \\
  1373   \end{mldecls}
  1374 
  1375   \begin{description}
  1376 
  1377   \item Type @{ML_type string} represents immutable vectors of 8-bit
  1378   characters.  There are operations in SML to convert back and forth
  1379   to actual byte vectors, which are seldom used.
  1380 
  1381   This historically important raw text representation is used for
  1382   Isabelle-specific purposes with the following implicit substructures
  1383   packed into the string content:
  1384 
  1385   \begin{enumerate}
  1386 
  1387   \item sequence of Isabelle symbols (see also \secref{sec:symbols}),
  1388   with @{ML Symbol.explode} as key operation;
  1389 
  1390   \item XML tree structure via YXML (see also \cite{isabelle-sys}),
  1391   with @{ML YXML.parse_body} as key operation.
  1392 
  1393   \end{enumerate}
  1394 
  1395   Note that Isabelle/ML string literals may refer to Isabelle symbols
  1396   like ``\verb,\,\verb,<alpha>,'' natively, \emph{without} escaping
  1397   the backslash.  This is a consequence of Isabelle treating all
  1398   source text as strings of symbols, instead of raw characters.
  1399 
  1400   \end{description}
  1401 *}
  1402 
  1403 text %mlex {* The subsequent example illustrates the difference between
  1404   physical addressing of bytes and logical addressing of symbols in
  1405   Isabelle strings.
  1406 *}
  1407 
  1408 ML_val {*
  1409   val s = "\<A>";
  1410 
  1411   @{assert} (length (Symbol.explode s) = 1);
  1412   @{assert} (size s = 4);
  1413 *}
  1414 
  1415 text {* Note that in Unicode renderings of the symbol @{text "\<A>"},
  1416   variations of encodings like UTF-8 or UTF-16 pose delicate questions
  1417   about the multi-byte representations of its codepoint, which lies
  1418   outside the 16-bit address space of the original Unicode standard
  1419   from the 1990s.  In Isabelle/ML it is just ``\verb,\,\verb,<A>,''
  1420   literally, using plain ASCII characters beyond any doubt. *}
  1421 
  1422 
  1423 subsection {* Integers *}
  1424 
  1425 text %mlref {*
  1426   \begin{mldecls}
  1427   @{index_ML_type int} \\
  1428   \end{mldecls}
  1429 
  1430   \begin{description}
  1431 
  1432   \item Type @{ML_type int} represents regular mathematical integers,
  1433   which are \emph{unbounded}.  Overflow never happens in
  1434   practice.\footnote{The size limit for integer bit patterns in memory
  1435   is 64\,MB for 32-bit Poly/ML, and much higher for 64-bit systems.}
  1436   This works uniformly for all supported ML platforms (Poly/ML and
  1437   SML/NJ).
  1438 
  1439   Literal integers in ML text are forced to be of this one true
  1440   integer type --- adhoc overloading of SML97 is disabled.
  1441 
  1442   Structure @{ML_structure IntInf} of SML97 is obsolete and superseded by
  1443   @{ML_structure Int}.  Structure @{ML_structure Integer} in @{file
  1444   "~~/src/Pure/General/integer.ML"} provides some additional
  1445   operations.
  1446 
  1447   \end{description}
  1448 *}
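
       text %mlex {* A minimal illustration of unbounded integer arithmetic:
         @{text "2\<^sup>100"} is computed exactly, far beyond any machine word
         size.  Only the standard combinators of Isabelle/ML are used here. *}

       ML_val {*
         val pow2_100 = fold (fn _ => fn n => 2 * n) (1 upto 100) 1;

         @{assert} (pow2_100 = 1267650600228229401496703205376);
         @{assert} (size (string_of_int pow2_100) = 31);
       *}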
  1449 
  1450 
  1451 subsection {* Time *}
  1452 
  1453 text %mlref {*
  1454   \begin{mldecls}
  1455   @{index_ML_type Time.time} \\
  1456   @{index_ML seconds: "real -> Time.time"} \\
  1457   \end{mldecls}
  1458 
  1459   \begin{description}
  1460 
  1461   \item Type @{ML_type Time.time} represents time abstractly according
  1462   to the SML97 basis library definition.  This is adequate for
  1463   internal ML operations, but awkward in concrete time specifications.
  1464 
  1465   \item @{ML seconds}~@{text "s"} turns the concrete scalar @{text
  1466   "s"} (measured in seconds) into an abstract time value.  Floating
  1467   point numbers are easy to use as configuration options in the
  1468   context (see \secref{sec:config-options}) or system preferences that
  1469   are maintained externally.
  1470 
  1471   \end{description}
  1472 *}
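
       text %mlex {* A minimal sketch of concrete time specifications: the
         scalar given to @{ML seconds} is converted back via the SML97 basis
         operation @{ML Time.toMilliseconds} (assuming exact conversion for
         this particular value). *}

       ML_val {*
         val timeout = seconds 2.5;
         @{assert} (Time.toMilliseconds timeout = 2500);
       *}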
  1473 
  1474 
  1475 subsection {* Options *}
  1476 
  1477 text %mlref {*
  1478   \begin{mldecls}
  1479   @{index_ML Option.map: "('a -> 'b) -> 'a option -> 'b option"} \\
  1480   @{index_ML is_some: "'a option -> bool"} \\
  1481   @{index_ML is_none: "'a option -> bool"} \\
  1482   @{index_ML the: "'a option -> 'a"} \\
  1483   @{index_ML these: "'a list option -> 'a list"} \\
  1484   @{index_ML the_list: "'a option -> 'a list"} \\
  1485   @{index_ML the_default: "'a -> 'a option -> 'a"} \\
  1486   \end{mldecls}
  1487 *}
  1488 
  1489 text {* Apart from @{ML Option.map}, most other operations defined in
  1490   structure @{ML_structure Option} are alien to Isabelle/ML and never
  1491   used.  The operations shown above are defined in @{file
  1492   "~~/src/Pure/General/basics.ML"}.  *}
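
       text %mlex {* Some trivial examples that sketch the intended behavior of
         these combinators: *}

       ML_val {*
         @{assert} (Option.map (fn i => i + 1) (SOME 1) = SOME 2);
         @{assert} (is_none (NONE: int option));
         @{assert} (the_default 0 NONE = 0);
         @{assert} (the_list (SOME "a") = ["a"]);
         @{assert} (these (NONE: int list option) = []);
       *}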
  1493 
  1494 
  1495 subsection {* Lists *}
  1496 
  1497 text {* Lists are ubiquitous in ML as simple and light-weight
  1498   ``collections'' for many everyday programming tasks.  Isabelle/ML
  1499   provides important additions and improvements over operations that
  1500   are predefined in the SML97 library. *}
  1501 
  1502 text %mlref {*
  1503   \begin{mldecls}
  1504   @{index_ML cons: "'a -> 'a list -> 'a list"} \\
  1505   @{index_ML member: "('b * 'a -> bool) -> 'a list -> 'b -> bool"} \\
  1506   @{index_ML insert: "('a * 'a -> bool) -> 'a -> 'a list -> 'a list"} \\
  1507   @{index_ML remove: "('b * 'a -> bool) -> 'b -> 'a list -> 'a list"} \\
  1508   @{index_ML update: "('a * 'a -> bool) -> 'a -> 'a list -> 'a list"} \\
  1509   \end{mldecls}
  1510 
  1511   \begin{description}
  1512 
  1513   \item @{ML cons}~@{text "x xs"} evaluates to @{text "x :: xs"}.
  1514 
  1515   Tupled infix operators are a historical accident in Standard ML.
  1516   The curried @{ML cons} amends this, but it should only be used when
  1517   partial application is required.
  1518 
  1519   \item @{ML member}, @{ML insert}, @{ML remove}, @{ML update} treat
  1520   lists as a set-like container that maintains the order of elements.
  1521   See @{file "~~/src/Pure/library.ML"} for the full specifications
  1522   (written in ML).  There are some further derived operations like
  1523   @{ML union} or @{ML inter}.
  1524 
  1525   Note that @{ML insert} is conservative about elements that are
  1526   already a @{ML member} of the list, while @{ML update} ensures that
  1527   the latest entry is always put in front.  The latter discipline is
  1528   often more appropriate in declarations of context data
  1529   (\secref{sec:context-data}) that are issued by the user in Isar
  1530   source: later declarations take precedence over earlier ones.
  1531 
  1532   \end{description}
  1533 *}
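
       text %mlex {* A small sketch of the set-like list operations, using
         structural equality @{ML "op ="} on integers; the concrete results
         follow the specifications in @{file "~~/src/Pure/library.ML"}. *}

       ML_val {*
         val xs = [1, 2, 3];

         @{assert} (member (op =) xs 2);
         @{assert} (insert (op =) 2 xs = xs);  (*already present: unchanged*)
         @{assert} (insert (op =) 4 xs = [4, 1, 2, 3]);  (*new elements are prepended*)
         @{assert} (remove (op =) 2 xs = [1, 3]);
         @{assert} (update (op =) 2 xs = [2, 1, 3]);  (*latest entry is put in front*)
       *}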
  1534 
  1535 text %mlex {* Using canonical @{ML fold} together with @{ML cons} (or
  1536   similar standard operations) alternates the orientation of data.
  1537   This is quite natural and should not be altered forcibly by inserting
  1538   extra applications of @{ML rev}.  The alternative @{ML fold_rev} can
  1539   be used in the few situations where alternation should be
  1540   prevented.
  1541 *}
  1542 
  1543 ML {*
  1544   val items = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
  1545 
  1546   val list1 = fold cons items [];
  1547   @{assert} (list1 = rev items);
  1548 
  1549   val list2 = fold_rev cons items [];
  1550   @{assert} (list2 = items);
  1551 *}
  1552 
  1553 text {* The subsequent example demonstrates how to \emph{merge} two
  1554   lists in a natural way. *}
  1555 
  1556 ML {*
  1557   fun merge_lists eq (xs, ys) = fold_rev (insert eq) ys xs;
  1558 *}
  1559 
  1560 text {* Here the first list is treated conservatively: only the new
  1561   elements from the second list are inserted.  The inside-out order of
  1562   insertion via @{ML fold_rev} attempts to preserve the order of
  1563   elements in the result.
  1564 
  1565   This way of merging lists is typical for context data
  1566   (\secref{sec:context-data}).  See also @{ML merge} as defined in
  1567   @{file "~~/src/Pure/library.ML"}.
  1568 *}
  1569 
  1570 
  1571 subsection {* Association lists *}
  1572 
  1573 text {* The operations for association lists interpret a concrete list
  1574   of pairs as a finite function from keys to values.  Redundant
  1575   representations with multiple occurrences of the same key are
  1576   implicitly normalized: lookup and update only take the first
  1577   occurrence into account.
  1578 *}
  1579 
  1580 text {*
  1581   \begin{mldecls}
  1582   @{index_ML AList.lookup: "('a * 'b -> bool) -> ('b * 'c) list -> 'a -> 'c option"} \\
  1583   @{index_ML AList.defined: "('a * 'b -> bool) -> ('b * 'c) list -> 'a -> bool"} \\
  1584   @{index_ML AList.update: "('a * 'a -> bool) -> 'a * 'b -> ('a * 'b) list -> ('a * 'b) list"} \\
  1585   \end{mldecls}
  1586 
  1587   \begin{description}
  1588 
  1589   \item @{ML AList.lookup}, @{ML AList.defined}, @{ML AList.update}
  1590   implement the main ``framework operations'' for mappings in
  1591   Isabelle/ML, following standard conventions for their names and
  1592   types.
  1593 
  1594   Note that a function called @{text lookup} is obliged to express its
  1595   partiality via an explicit option element.  It must not raise an
  1596   exception, unless the name is changed to something like
  1597   @{text "the_element"} or @{text "get"}.
  1598 
  1599   The @{text "defined"} operation is essentially a contraction of @{ML
  1600   is_some} and @{text "lookup"}, but this is sufficiently frequent to
  1601   justify its independent existence.  This also gives the
  1602   implementation some opportunity for peep-hole optimization.
  1603 
  1604   \end{description}
  1605 
  1606   Association lists are adequate as a simple and light-weight
  1607   implementation of finite mappings in many practical situations.  A
  1608   more heavy-duty table structure is defined in @{file
  1609   "~~/src/Pure/General/table.ML"}; that version scales easily to
  1610   thousands or millions of elements.
  1611 *}
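
       text %mlex {* A minimal example of association list operations, with
         strings as keys and structural equality for key comparison: *}

       ML_val {*
         val tab = [("a", 1), ("b", 2)];

         @{assert} (AList.lookup (op =) tab "b" = SOME 2);
         @{assert} (AList.lookup (op =) tab "c" = NONE);
         @{assert} (AList.defined (op =) tab "a");

         val tab' = AList.update (op =) ("b", 20) tab;
         @{assert} (AList.lookup (op =) tab' "b" = SOME 20);
       *}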
  1612 
  1613 
  1614 subsection {* Unsynchronized references *}
  1615 
  1616 text %mlref {*
  1617   \begin{mldecls}
  1618   @{index_ML_type "'a Unsynchronized.ref"} \\
  1619   @{index_ML Unsynchronized.ref: "'a -> 'a Unsynchronized.ref"} \\
  1620   @{index_ML "!": "'a Unsynchronized.ref -> 'a"} \\
  1621   @{index_ML_op ":=": "'a Unsynchronized.ref * 'a -> unit"} \\
  1622   \end{mldecls}
  1623 *}
  1624 
  1625 text {* Due to ubiquitous parallelism in Isabelle/ML (see also
  1626   \secref{sec:multi-threading}), the mutable reference cells of
  1627   Standard ML are notorious for causing problems.  In a highly
  1628   parallel system, both correctness \emph{and} performance are easily
  1629   degraded when using mutable data.
  1630 
  1631   The unwieldy name of @{ML Unsynchronized.ref} for the constructor
  1632   for references in Isabelle/ML emphasizes the inconveniences caused by
  1633   mutability.  Existing operations @{ML "!"}  and @{ML_op ":="} are
  1634   unchanged, but should be used with special precautions, say in a
  1635   strictly local situation that is guaranteed to be restricted to
  1636   sequential evaluation --- now and in the future.
  1637 
  1638   \begin{warn}
  1639   Never @{ML_text "open Unsynchronized"}, not even in a local scope!
  1640   Pretending that mutable state is no problem is a very bad idea.
  1641   \end{warn}
  1642 *}
  1643 
  1644 
  1645 section {* Thread-safe programming \label{sec:multi-threading} *}
  1646 
  1647 text {* Multi-threaded execution has become an everyday reality in
  1648   Isabelle since Poly/ML 5.2.1 and Isabelle2008.  Isabelle/ML provides
  1649   implicit and explicit parallelism by default, and there is no way
  1650   for user-space tools to ``opt out''.  ML programs that are purely
  1651   functional, output messages only via the official channels
  1652   (\secref{sec:message-channels}), and do not intercept interrupts
  1653   (\secref{sec:exceptions}) can participate in the multi-threaded
  1654   environment immediately without further ado.
  1655 
  1656   More ambitious tools with more fine-grained interaction with the
  1657   environment need to observe the principles explained below.
  1658 *}
  1659 
  1660 
  1661 subsection {* Multi-threading with shared memory *}
  1662 
  1663 text {* Multiple threads help to organize advanced operations of the
  1664   system, such as real-time conditions on command transactions,
  1665   sub-components with explicit communication, general asynchronous
  1666   interaction etc.  Moreover, parallel evaluation is a prerequisite to
  1667   make adequate use of the CPU resources that are available on
  1668   multi-core systems.\footnote{Multi-core computing does not mean that
  1669   there are ``spare cycles'' to be wasted.  It means that the
  1670   continued exponential speedup of CPU performance due to ``Moore's
  1671   Law'' follows different rules: clock frequency has reached its peak
  1672   around 2005, and applications need to be parallelized in order to
  1673   avoid a perceived loss of performance.  See also
  1674   \cite{Sutter:2005}.}
  1675 
  1676   Isabelle/Isar exploits the inherent structure of theories and proofs
  1677   to support \emph{implicit parallelism} to a large extent.  LCF-style
  1678   theorem proving provides almost ideal conditions for that, see also
  1679   \cite{Wenzel:2009}.  This means that significant parts of theory and
  1680   proof checking are parallelized by default.  In Isabelle2013, a
  1681   maximum speedup-factor of 3.5 on 4 cores and 6.5 on 8 cores can be
  1682   expected.
  1683 
  1684   \medskip ML threads lack the memory protection of separate
  1685   processes, and operate concurrently on shared heap memory.  This has
  1686   the advantage that results of independent computations are directly
  1687   available to other threads: abstract values can be passed without
  1688   copying or awkward serialization that is typically required for
  1689   separate processes.
  1690 
  1691   To make shared-memory multi-threading work robustly and efficiently,
  1692   some programming guidelines need to be observed.  While the ML
  1693   system is responsible for maintaining the basic integrity of the
  1694   representation of ML values in memory, the application programmer
  1695   needs to ensure that multi-threaded execution does not break the
  1696   intended semantics.
  1697 
  1698   \begin{warn}
  1699   To participate in implicit parallelism, tools need to be
  1700   thread-safe.  A single ill-behaved tool can affect the stability and
  1701   performance of the whole system.
  1702   \end{warn}
  1703 
  1704   Apart from observing the principles of thread-safeness passively,
  1705   advanced tools may also exploit parallelism actively, e.g.\ by using
  1706   ``future values'' (\secref{sec:futures}) or the more basic library
  1707   functions for parallel list operations (\secref{sec:parlist}).
  1708 
  1709   \begin{warn}
  1710   Parallel computing resources are managed centrally by the
  1711   Isabelle/ML infrastructure.  User programs must not fork their own
  1712   ML threads to perform computations.
  1713   \end{warn}
  1714 *}
  1715 
  1716 
  1717 subsection {* Critical shared resources *}
  1718 
  1719 text {* Thread-safeness is mainly concerned with concurrent
  1720   read/write access to shared resources, which are outside the purely
  1721   functional world of ML.  This covers the following in particular.
  1722 
  1723   \begin{itemize}
  1724 
  1725   \item Global references (or arrays), i.e.\ mutable memory cells that
  1726   persist over several invocations of associated
  1727   operations.\footnote{This is independent of the visibility of such
  1728   mutable values in the toplevel scope.}
  1729 
  1730   \item Global state of the running Isabelle/ML process, i.e.\ raw I/O
  1731   channels, environment variables, current working directory.
  1732 
  1733   \item Writable resources in the file-system that are shared among
  1734   different threads or external processes.
  1735 
  1736   \end{itemize}
  1737 
  1738   Isabelle/ML provides various mechanisms to avoid critical shared
  1739   resources in most situations.  As last resort there are some
  1740   mechanisms for explicit synchronization.  The following guidelines
  1741   help to make Isabelle/ML programs work smoothly in a concurrent
  1742   environment.
  1743 
  1744   \begin{itemize}
  1745 
  1746   \item Avoid global references altogether.  Isabelle/Isar maintains a
  1747   uniform context that incorporates arbitrary data declared by user
  1748   programs (\secref{sec:context-data}).  This context is passed as
  1749   plain value and user tools can get/map their own data in a purely
  1750   functional manner.  Configuration options within the context
  1751   (\secref{sec:config-options}) provide simple drop-in replacements
  1752   for historic reference variables.
  1753 
  1754   \item Keep components with local state information re-entrant.
  1755   Instead of poking initial values into (private) global references, a
  1756   new state record can be created on each invocation, and passed
  1757   through any auxiliary functions of the component.  The state record
  1758   may well contain mutable references, without requiring any special
  1759   synchronizations, as long as each invocation gets its own copy.
  1760 
  1761   \item Avoid raw output on @{text "stdout"} or @{text "stderr"}.  The
  1762   Poly/ML library is thread-safe for each individual output operation,
  1763   but the ordering of parallel invocations is arbitrary.  This means
  1764   raw output will appear on some system console with unpredictable
  1765   interleaving of atomic chunks.
  1766 
  1767   Note that this does not affect regular message output channels
  1768   (\secref{sec:message-channels}).  An official message is associated
  1769   with the command transaction from where it originates, independently
  1770   of other transactions.  This means each running Isar command has
  1771   effectively its own set of message channels, and interleaving can
  1772   only happen when commands use parallelism internally (and only at
  1773   message boundaries).
  1774 
  1775   \item Treat environment variables and the current working directory
  1776   of the running process as strictly read-only.
  1777 
  1778   \item Restrict writing to the file-system to unique temporary files.
  1779   Isabelle already provides a temporary directory that is unique for
  1780   the running process, and there is a centralized source of unique
  1781   serial numbers in Isabelle/ML.  Thus temporary files that are passed
  1782   to some external process will always be disjoint, and thus
  1783   thread-safe.
  1784 
  1785   \end{itemize}
  1786 *}
  1787 
  1788 text %mlref {*
  1789   \begin{mldecls}
  1790   @{index_ML File.tmp_path: "Path.T -> Path.T"} \\
  1791   @{index_ML serial_string: "unit -> string"} \\
  1792   \end{mldecls}
  1793 
  1794   \begin{description}
  1795 
  1796   \item @{ML File.tmp_path}~@{text "path"} relocates the base
  1797   component of @{text "path"} into the unique temporary directory of
  1798   the running Isabelle/ML process.
  1799 
  1800   \item @{ML serial_string}~@{text "()"} creates a new serial number
  1801   that is unique over the runtime of the Isabelle/ML process.
  1802 
  1803   \end{description}
  1804 *}
  1805 
  1806 text %mlex {* The following example shows how to create unique
  1807   temporary file names.
  1808 *}
  1809 
  1810 ML {*
  1811   val tmp1 = File.tmp_path (Path.basic ("foo" ^ serial_string ()));
  1812   val tmp2 = File.tmp_path (Path.basic ("foo" ^ serial_string ()));
  1813   @{assert} (tmp1 <> tmp2);
  1814 *}
  1815 
  1816 
  1817 subsection {* Explicit synchronization *}
  1818 
  1819 text {* Isabelle/ML also provides some explicit synchronization
  1820   mechanisms, for the rare situations where mutable shared resources
  1821   are really required.  These are based on the synchronization
  1822   primitives of Poly/ML, which have been adapted to the specific
  1823   assumptions of the concurrent Isabelle/ML environment.  User code
  1824   must not use the Poly/ML primitives directly!
  1825 
  1826   \medskip The most basic synchronization concept is a single
  1827   \emph{critical section} (also called ``monitor'' in the literature).
  1828   A thread that enters the critical section prevents all other threads
  1829   from doing the same.  A thread that is already within the critical
  1830   section may re-enter it in an idempotent manner.
  1831 
  1832   Such centralized locking is convenient, because it prevents
  1833   deadlocks by construction.
  1834 
  1835   \medskip More fine-grained locking works via \emph{synchronized
  1836   variables}.  An explicit state component is associated with
  1837   mechanisms for locking and signaling.  There are operations to
  1838   await a condition, change the state, and signal the change to all
  1839   other waiting threads.
  1840 
  1841   Here the synchronized access to the state variable is \emph{not}
  1842   re-entrant: direct or indirect nesting within the same thread will
  1843   cause a deadlock!
  1844 *}
  1845 
  1846 text %mlref {*
  1847   \begin{mldecls}
  1848   @{index_ML NAMED_CRITICAL: "string -> (unit -> 'a) -> 'a"} \\
  1849   @{index_ML CRITICAL: "(unit -> 'a) -> 'a"} \\
  1850   \end{mldecls}
  1851   \begin{mldecls}
  1852   @{index_ML_type "'a Synchronized.var"} \\
  1853   @{index_ML Synchronized.var: "string -> 'a -> 'a Synchronized.var"} \\
  1854   @{index_ML Synchronized.guarded_access: "'a Synchronized.var ->
  1855   ('a -> ('b * 'a) option) -> 'b"} \\
  1856   \end{mldecls}
  1857 
  1858   \begin{description}
  1859 
  1860   \item @{ML NAMED_CRITICAL}~@{text "name e"} evaluates @{text "e ()"}
  1861   within the central critical section of Isabelle/ML.  No other thread
  1862   may do so at the same time, but non-critical parallel execution will
  1863   continue.  The @{text "name"} argument is used for tracing and might
  1864   help to spot sources of congestion.
  1865 
  1866   Entering the critical section without contention is very fast.  Each
  1867   thread should stay within the critical section only very briefly,
  1868   otherwise parallel performance may degrade.
  1869 
  1870   \item @{ML CRITICAL} is the same as @{ML NAMED_CRITICAL} with empty
  1871   name argument.
  1872 
  1873   \item Type @{ML_type "'a Synchronized.var"} represents synchronized
  1874   variables with state of type @{ML_type 'a}.
  1875 
  1876   \item @{ML Synchronized.var}~@{text "name x"} creates a synchronized
  1877   variable that is initialized with value @{text "x"}.  The @{text
  1878   "name"} is used for tracing.
  1879 
  1880   \item @{ML Synchronized.guarded_access}~@{text "var f"} lets the
  1881   function @{text "f"} operate within a critical section on the state
  1882   @{text "x"} as follows: if @{text "f x"} produces @{ML NONE}, it
  1883   continues to wait on the internal condition variable, expecting that
  1884   some other thread will eventually change the content in a suitable
  1885   manner; if @{text "f x"} produces @{ML SOME}~@{text "(y, x')"} it is
  1886   satisfied and assigns the new state value @{text "x'"}, broadcasts a
  1887   signal to all waiting threads on the associated condition variable,
  1888   and returns the result @{text "y"}.
  1889 
  1890   \end{description}
  1891 
  1892   There are some further variants of the @{ML
  1893   Synchronized.guarded_access} combinator, see @{file
  1894   "~~/src/Pure/Concurrent/synchronized.ML"} for details.
  1895 *}
  1896 
  1897 text %mlex {* The following example implements a counter that produces
  1898   positive integers that are unique over the runtime of the Isabelle
  1899   process:
  1900 *}
  1901 
  1902 ML {*
  1903   local
  1904     val counter = Synchronized.var "counter" 0;
  1905   in
  1906     fun next () =
  1907       Synchronized.guarded_access counter
  1908         (fn i =>
  1909           let val j = i + 1
  1910           in SOME (j, j) end);
  1911   end;
  1912 *}
  1913 
  1914 ML {*
  1915   val a = next ();
  1916   val b = next ();
  1917   @{assert} (a <> b);
  1918 *}
  1919 
  1920 text {* \medskip See @{file "~~/src/Pure/Concurrent/mailbox.ML"} for how
  1921   to implement a mailbox as a synchronized variable over a purely
  1922   functional queue. *}
  1923 
  1924 
  1925 section {* Managed evaluation *}
  1926 
  1927 text {* Execution of Standard ML follows the model of strict
  1928   functional evaluation with optional exceptions.  Evaluation happens
  1929   whenever some function is applied to (sufficiently many)
  1930   arguments. The result is either an explicit value or an implicit
  1931   exception.
  1932 
  1933   \emph{Managed evaluation} in Isabelle/ML organizes expressions and
  1934   results to control certain physical side-conditions, to say more
  1935   specifically when and how evaluation happens.  For example, the
  1936   Isabelle/ML library supports lazy evaluation with memoing, parallel
  1937   evaluation via futures, asynchronous evaluation via promises,
  1938   evaluation with time limit etc.
  1939 
  1940   \medskip An \emph{unevaluated expression} is represented either as
  1941   unit abstraction @{verbatim "fn () => a"} of type
  1942   @{verbatim "unit -> 'a"} or as regular function
  1943   @{verbatim "fn a => b"} of type @{verbatim "'a -> 'b"}.  Both forms
  1944   occur routinely, and special care is required to tell them apart ---
  1945   the static type-system of SML is only of limited help here.
  1946 
  1947   The first form is more intuitive: some combinator @{text "(unit ->
  1948   'a) -> 'a"} applies the given function to @{text "()"} to initiate
  1949   the postponed evaluation process.  The second form is more flexible:
  1950   some combinator @{text "('a -> 'b) -> 'a -> 'b"} acts like a
  1951   modified form of function application; several such combinators may
  1952   be cascaded to modify a given function, before it is ultimately
  1953   applied to some argument.
  1954 
  1955   \medskip \emph{Reified results} make the disjoint sum of regular
  1956   values versus exceptional situations explicit as an ML datatype:
  1957   @{text "'a result = Res of 'a | Exn of exn"}.  This is typically
  1958   used for administrative purposes, to store the overall outcome of an
  1959   evaluation process.
  1960 
  1961   \emph{Parallel exceptions} aggregate reified results, such that
  1962   multiple exceptions are digested as a collection in canonical form
  1963   that identifies exceptions according to their original occurrence.
  1964   This is particularly important for parallel evaluation via futures
  1965   (\secref{sec:futures}), which are organized as an acyclic graph of
  1966   evaluations that depend on other evaluations: exceptions stemming
  1967   from shared sub-graphs are exposed exactly once and in the order of
  1968   their original occurrence (e.g.\ when printed at the toplevel).
  1969   Interrupt counts as a neutral element here: it is treated as minimal
  1970   information about some canceled evaluation process, and is absorbed
  1971   by the presence of regular program exceptions.  *}
  1972 
  1973 text %mlref {*
  1974   \begin{mldecls}
  1975   @{index_ML_type "'a Exn.result"} \\
  1976   @{index_ML Exn.capture: "('a -> 'b) -> 'a -> 'b Exn.result"} \\
  1977   @{index_ML Exn.interruptible_capture: "('a -> 'b) -> 'a -> 'b Exn.result"} \\
  1978   @{index_ML Exn.release: "'a Exn.result -> 'a"} \\
  1979   @{index_ML Par_Exn.release_all: "'a Exn.result list -> 'a list"} \\
  1980   @{index_ML Par_Exn.release_first: "'a Exn.result list -> 'a list"} \\
  1981   \end{mldecls}
  1982 
  1983   \begin{description}
  1984 
  1985   \item Type @{ML_type "'a Exn.result"} represents the disjoint sum of
  1986   ML results explicitly, with constructor @{ML Exn.Res} for regular
  1987   values and @{ML "Exn.Exn"} for exceptions.
  1988 
  1989   \item @{ML Exn.capture}~@{text "f x"} manages the evaluation of
  1990   @{text "f x"} such that exceptions are made explicit as @{ML
  1991   "Exn.Exn"}.  Note that this includes physical interrupts (see also
  1992   \secref{sec:exceptions}), so the same precautions apply to user
  1993   code: interrupts must not be absorbed accidentally!
  1994 
  1995   \item @{ML Exn.interruptible_capture} is similar to @{ML
  1996   Exn.capture}, but interrupts are immediately re-raised as required
  1997   for user code.
  1998 
  1999   \item @{ML Exn.release}~@{text "result"} releases the original
  2000   runtime result, exposing its regular value or raising the reified
  2001   exception.
  2002 
  2003   \item @{ML Par_Exn.release_all}~@{text "results"} combines results
  2004   that were produced independently (e.g.\ by parallel evaluation).  If
  2005   all results are regular values, that list is returned.  Otherwise,
  2006   the collection of all exceptions is raised, wrapped-up as collective
  2007   parallel exception.  Note that the latter prevents access to
  2008   individual exceptions by conventional @{verbatim "handle"} of SML.
  2009 
  2010   \item @{ML Par_Exn.release_first} is similar to @{ML
  2011   Par_Exn.release_all}, but only the first exception that has occurred
  2012   in the original evaluation process is raised again, the others are
  2013   ignored.  That single exception may get handled by conventional
  2014   means in SML.
  2015 
  2016   \end{description}
  2017 *}
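
       text %mlex {* A small sketch of reified results, using integer division
         as an example of a partial operation: *}

       ML_val {*
         val results = map (Exn.capture (fn i => 10 div i)) [2, 0, 5];

         @{assert} (Exn.release (nth results 0) = 5);
         @{assert} (is_none (try Exn.release (nth results 1)));  (*reified Div exception*)
         @{assert} (Exn.release (nth results 2) = 2);
       *}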
  2018 
  2019 
  2020 subsection {* Parallel skeletons \label{sec:parlist} *}
  2021 
  2022 text {*
  2023   Algorithmic skeletons are combinators that operate on lists in
  2024   parallel, in the manner of well-known @{text map}, @{text exists},
  2025   @{text forall} etc.  Management of futures (\secref{sec:futures})
  2026   and their results as reified exceptions is wrapped up into simple
  2027   programming interfaces that resemble the sequential versions.
  2028 
  2029   What remains is the application-specific problem of presenting
  2030   expressions with suitable \emph{granularity}: each list element
  2031   corresponds to one evaluation task.  If the granularity is too
  2032   coarse, the available CPUs are not saturated.  If it is too
  2033   fine-grained, CPU cycles are wasted due to the overhead of
  2034   organizing parallel processing.  In the worst case, parallel
  2035   performance will be less than the sequential counterpart!
  2036 *}
  2037 
  2038 text %mlref {*
  2039   \begin{mldecls}
  2040   @{index_ML Par_List.map: "('a -> 'b) -> 'a list -> 'b list"} \\
  2041   @{index_ML Par_List.get_some: "('a -> 'b option) -> 'a list -> 'b option"} \\
  2042   \end{mldecls}
  2043 
  2044   \begin{description}
  2045 
  2046   \item @{ML Par_List.map}~@{text "f [x\<^sub>1, \<dots>, x\<^sub>n]"} is like @{ML
  2047   "map"}~@{text "f [x\<^sub>1, \<dots>, x\<^sub>n]"}, but the evaluation of @{text "f x\<^sub>i"}
  2048   for @{text "i = 1, \<dots>, n"} is performed in parallel.
  2049 
  2050   An exception in any @{text "f x\<^sub>i"} cancels the overall evaluation
  2051   process.  The final result is produced via @{ML
  2052   Par_Exn.release_first} as explained above, which means the first
  2053   program exception that happened to occur in the parallel evaluation
  2054   is propagated, and all other failures are ignored.
  2055 
  2056   \item @{ML Par_List.get_some}~@{text "f [x\<^sub>1, \<dots>, x\<^sub>n]"} produces some
  2057   @{text "f x\<^sub>i"} that is of the form @{text "SOME y\<^sub>i"}, if that
  2058   exists, otherwise @{text "NONE"}.  Thus it is similar to @{ML
  2059   Library.get_first}, but subject to a non-deterministic parallel
  2060   choice process.  The first successful result cancels the overall
  2061   evaluation process; other exceptions are propagated as for @{ML
  2062   Par_List.map}.
  2063 
  2064   This generic parallel choice combinator is the basis for derived
  2065   forms, such as @{ML Par_List.find_some}, @{ML Par_List.exists}, @{ML
  2066   Par_List.forall}.
  2067 
  2068   \end{description}
  2069 *}
  2070 
  2071 text %mlex {* Subsequently, the Ackermann function is evaluated in
  2072   parallel for some ranges of arguments. *}
  2073 
  2074 ML_val {*
  2075   fun ackermann 0 n = n + 1
  2076     | ackermann m 0 = ackermann (m - 1) 1
  2077     | ackermann m n = ackermann (m - 1) (ackermann m (n - 1));
  2078 
  2079   Par_List.map (ackermann 2) (500 upto 1000);
  2080   Par_List.map (ackermann 3) (5 upto 10);
  2081 *}
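
       text %mlex {* A further sketch demonstrates parallel choice: the first
         successful result cancels the remaining tasks, so only the existence
         and the characteristic property of some result can be asserted. *}

       ML_val {*
         val result =
           Par_List.get_some
             (fn n => if n mod 17 = 0 then SOME n else NONE) (1 upto 1000);

         @{assert} (is_some result);
         @{assert} (the result mod 17 = 0);
       *}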
  2082 
  2083 
  2084 subsection {* Lazy evaluation *}
  2085 
  2086 text {*
  2087   %FIXME
  2088 
  2089   See also @{file "~~/src/Pure/Concurrent/lazy.ML"}.
  2090 *}
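
       text %mlex {* A minimal sketch of lazy evaluation with memoing, assuming
         the basic operations @{ML Lazy.lazy} and @{ML Lazy.force}.  The
         unsynchronized reference is used only locally and sequentially here,
         to observe how often the body is evaluated. *}

       ML_val {*
         val count = Unsynchronized.ref 0;
         val x = Lazy.lazy (fn () => (count := ! count + 1; 42));

         @{assert} (Lazy.force x = 42);
         @{assert} (Lazy.force x = 42);
         @{assert} (! count = 1);  (*the body was evaluated only once*)
       *}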
  2091 
  2092 
  2093 subsection {* Future values \label{sec:futures} *}
  2094 
  2095 text {*
  2096   %FIXME
  2097 
  2098   See also @{file "~~/src/Pure/Concurrent/future.ML"}.
  2099 *}
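
       text %mlex {* A minimal sketch of future values, assuming the basic
         operations @{ML Future.fork} and @{ML Future.join}: two independent
         evaluations are forked and their results are joined eventually. *}

       ML_val {*
         val x = Future.fork (fn () => 6 * 7);
         val y = Future.fork (fn () => 5 * 8);

         @{assert} (Future.join x + Future.join y = 82);
       *}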
  2100 
  2101 
  2102 end