src/Doc/Implementation/ML.thy
author wenzelm
Wed Mar 25 11:39:52 2015 +0100 (2015-03-25)
changeset 59809 87641097d0f3
parent 59624 6c0e70b01111
child 59902 6afbe5a99139
permissions -rw-r--r--
tuned signature;
     1 (*:wrap=hard:maxLineLen=78:*)
     2 
     3 theory "ML"
     4 imports Base
     5 begin
     6 
     7 chapter \<open>Isabelle/ML\<close>
     8 
     9 text \<open>Isabelle/ML is best understood as a certain culture based on
    10   Standard ML.  Thus it is not a new programming language, but a
    11   certain way to use SML at an advanced level within the Isabelle
    12   environment.  This covers a variety of aspects that are geared
    13   towards an efficient and robust platform for applications of formal
    14   logic with fully foundational proof construction --- according to
    15   the well-known \emph{LCF principle}.  There is specific
    16   infrastructure with library modules to address the needs of this
    17   difficult task.  For example, the raw parallel programming model of
    18   Poly/ML is presented as considerably more abstract concept of
    19   \emph{futures}, which is then used to augment the inference
    20   kernel, Isar theory and proof interpreter, and PIDE document management.
    21 
    22   The main aspects of Isabelle/ML are introduced below.  These
    23   first-hand explanations should help to understand how proper
    24   Isabelle/ML is to be read and written, and to get access to the
    25   wealth of experience that is expressed in the source text and its
    26   history of changes.\footnote{See
    27   @{url "http://isabelle.in.tum.de/repos/isabelle"} for the full
    28   Mercurial history.  There are symbolic tags to refer to official
    29   Isabelle releases, as opposed to arbitrary \emph{tip} versions that
    30   merely reflect snapshots that are never really up-to-date.}\<close>
    31 
    32 
    33 section \<open>Style and orthography\<close>
    34 
    35 text \<open>The sources of Isabelle/Isar are optimized for
    36   \emph{readability} and \emph{maintainability}.  The main purpose is
    37   to tell an informed reader what is really going on and how things
    38   really work.  This is a non-trivial aim, but it is supported by a
    39   certain style of writing Isabelle/ML that has emerged from long
    40   years of system development.\footnote{See also the interesting style
    41   guide for OCaml
    42   @{url "http://caml.inria.fr/resources/doc/guides/guidelines.en.html"}
    43   which shares many of our means and ends.}
    44 
    45   The main principle behind any coding style is \emph{consistency}.
    46   For a single author of a small program this merely means ``choose
    47   your style and stick to it''.  A complex project like Isabelle, with
    48   long years of development and different contributors, requires more
    49   standardization.  A coding style that is changed every few years or
    50   with every new contributor is no style at all, because consistency
    51   is quickly lost.  Global consistency is hard to achieve, though.
    52   Nonetheless, one should always strive at least for local consistency
    53   of modules and sub-systems, without deviating from some general
    54   principles how to write Isabelle/ML.
    55 
    56   In a sense, good coding style is like an \emph{orthography} for the
    57   sources: it helps to read quickly over the text and see through the
    58   main points, without getting distracted by accidental presentation
    59   of free-style code.
    60 \<close>
    61 
    62 
    63 subsection \<open>Header and sectioning\<close>
    64 
    65 text \<open>Isabelle source files have a certain standardized header
    66   format (with precise spacing) that follows ancient traditions
    67   reaching back to the earliest versions of the system by Larry
    68   Paulson.  See @{file "~~/src/Pure/thm.ML"}, for example.
    69 
    70   The header includes at least @{verbatim Title} and @{verbatim
    71   Author} entries, followed by a prose description of the purpose of
    72   the module.  The latter can range from a single line to several
    73   paragraphs of explanations.
    74 
    75   The rest of the file is divided into sections, subsections,
    76   subsubsections, paragraphs etc.\ using a simple layout via ML
    77   comments as follows.
    78 
    79   \begin{verbatim}
    80   (*** section ***)
    81 
    82   (** subsection **)
    83 
    84   (* subsubsection *)
    85 
    86   (*short paragraph*)
    87 
    88   (*
    89     long paragraph,
    90     with more text
    91   *)
    92   \end{verbatim}
    93 
    94   As in regular typography, there is some extra space \emph{before}
    95   section headings that are adjacent to plain text, but not other headings
    96   as in the example above.
    97 
    98   \medskip The precise wording of the prose text given in these
    99   headings is chosen carefully to introduce the main theme of the
   100   subsequent formal ML text.
   101 \<close>
   102 
   103 
   104 subsection \<open>Naming conventions\<close>
   105 
   106 text \<open>Since ML is the primary medium to express the meaning of the
   107   source text, naming of ML entities requires special care.
   108 
   109   \paragraph{Notation.}  A name consists of 1--3 \emph{words} (rarely
   110   4, but not more) that are separated by underscore.  There are three
   111   variants concerning upper or lower case letters, which are used for
   112   certain ML categories as follows:
   113 
   114   \medskip
   115   \begin{tabular}{lll}
   116   variant & example & ML categories \\\hline
   117   lower-case & @{ML_text foo_bar} & values, types, record fields \\
   118   capitalized & @{ML_text Foo_Bar} & datatype constructors, structures, functors \\
   119   upper-case & @{ML_text FOO_BAR} & special values, exception constructors, signatures \\
   120   \end{tabular}
   121   \medskip
   122 
   123   For historical reasons, many capitalized names omit underscores,
   124   e.g.\ old-style @{ML_text FooBar} instead of @{ML_text Foo_Bar}.
   125   Genuine mixed-case names are \emph{not} used, because clear division
   126   of words is essential for readability.\footnote{Camel-case was
   127   invented to workaround the lack of underscore in some early
   128   non-ASCII character sets.  Later it became habitual in some language
   129   communities that are now strong in numbers.}
   130 
   131   A single (capital) character does not count as ``word'' in this
   132   respect: some Isabelle/ML names are suffixed by extra markers like
   133   this: @{ML_text foo_barT}.
   134 
   135   Name variants are produced by adding 1--3 primes, e.g.\ @{ML_text
   136   foo'}, @{ML_text foo''}, or @{ML_text foo'''}, but not @{ML_text
   137   foo''''} or more.  Decimal digits scale better to larger numbers,
   138   e.g.\ @{ML_text foo0}, @{ML_text foo1}, @{ML_text foo42}.
   139 
   140   \paragraph{Scopes.}  Apart from very basic library modules, ML
   141   structures are not ``opened'', but names are referenced with
   142   explicit qualification, as in @{ML Syntax.string_of_term} for
   143   example.  When devising names for structures and their components it
   144   is important to aim at eye-catching compositions of both parts, because
   145   this is how they are seen in the sources and documentation.  For the
   146   same reasons, aliases of well-known library functions should be
   147   avoided.
   148 
   149   Local names of function abstraction or case/let bindings are
   150   typically shorter, sometimes using only rudiments of ``words'',
   151   while still avoiding cryptic shorthands.  An auxiliary function
   152   called @{ML_text helper}, @{ML_text aux}, or @{ML_text f} is
   153   considered bad style.
   154 
   155   Example:
   156 
   157   \begin{verbatim}
   158   (* RIGHT *)
   159 
   160   fun print_foo ctxt foo =
   161     let
   162       fun print t = ... Syntax.string_of_term ctxt t ...
   163     in ... end;
   164 
   165 
   166   (* RIGHT *)
   167 
   168   fun print_foo ctxt foo =
   169     let
   170       val string_of_term = Syntax.string_of_term ctxt;
   171       fun print t = ... string_of_term t ...
   172     in ... end;
   173 
   174 
   175   (* WRONG *)
   176 
   177   val string_of_term = Syntax.string_of_term;
   178 
   179   fun print_foo ctxt foo =
   180     let
   181       fun aux t = ... string_of_term ctxt t ...
   182     in ... end;
   183   \end{verbatim}
   184 
   185 
   186   \paragraph{Specific conventions.} Here are some specific name forms
   187   that occur frequently in the sources.
   188 
   189   \begin{itemize}
   190 
   191   \item A function that maps @{ML_text foo} to @{ML_text bar} is
   192   called @{ML_text foo_to_bar} or @{ML_text bar_of_foo} (never
   193   @{ML_text foo2bar}, nor @{ML_text bar_from_foo}, nor @{ML_text
   194   bar_for_foo}, nor @{ML_text bar4foo}).
   195 
   196   \item The name component @{ML_text legacy} means that the operation
   197   is about to be discontinued soon.
   198 
   199   \item The name component @{ML_text global} means that this works
   200   with the background theory instead of the regular local context
   201   (\secref{sec:context}), sometimes for historical reasons, sometimes
   202   due a genuine lack of locality of the concept involved, sometimes as
   203   a fall-back for the lack of a proper context in the application
   204   code.  Whenever there is a non-global variant available, the
   205   application should be migrated to use it with a proper local
   206   context.
   207 
   208   \item Variables of the main context types of the Isabelle/Isar
   209   framework (\secref{sec:context} and \chref{ch:local-theory}) have
   210   firm naming conventions as follows:
   211 
   212   \begin{itemize}
   213 
   214   \item theories are called @{ML_text thy}, rarely @{ML_text theory}
   215   (never @{ML_text thry})
   216 
   217   \item proof contexts are called @{ML_text ctxt}, rarely @{ML_text
   218   context} (never @{ML_text ctx})
   219 
   220   \item generic contexts are called @{ML_text context}
   221 
   222   \item local theories are called @{ML_text lthy}, except for local
   223   theories that are treated as proof context (which is a semantic
   224   super-type)
   225 
   226   \end{itemize}
   227 
   228   Variations with primed or decimal numbers are always possible, as
   229   well as semantic prefixes like @{ML_text foo_thy} or @{ML_text
   230   bar_ctxt}, but the base conventions above need to be preserved.
   231   This allows to emphasize their data flow via plain regular
   232   expressions in the text editor.
   233 
   234   \item The main logical entities (\secref{ch:logic}) have established
   235   naming convention as follows:
   236 
   237   \begin{itemize}
   238 
   239   \item sorts are called @{ML_text S}
   240 
   241   \item types are called @{ML_text T}, @{ML_text U}, or @{ML_text
   242   ty} (never @{ML_text t})
   243 
   244   \item terms are called @{ML_text t}, @{ML_text u}, or @{ML_text
   245   tm} (never @{ML_text trm})
   246 
   247   \item certified types are called @{ML_text cT}, rarely @{ML_text
   248   T}, with variants as for types
   249 
   250   \item certified terms are called @{ML_text ct}, rarely @{ML_text
   251   t}, with variants as for terms (never @{ML_text ctrm})
   252 
   253   \item theorems are called @{ML_text th}, or @{ML_text thm}
   254 
   255   \end{itemize}
   256 
   257   Proper semantic names override these conventions completely.  For
   258   example, the left-hand side of an equation (as a term) can be called
   259   @{ML_text lhs} (not @{ML_text lhs_tm}).  Or a term that is known
   260   to be a variable can be called @{ML_text v} or @{ML_text x}.
   261 
   262   \item Tactics (\secref{sec:tactics}) are sufficiently important to
   263   have specific naming conventions.  The name of a basic tactic
   264   definition always has a @{ML_text "_tac"} suffix, the subgoal index
   265   (if applicable) is always called @{ML_text i}, and the goal state
   266   (if made explicit) is usually called @{ML_text st} instead of the
   267   somewhat misleading @{ML_text thm}.  Any other arguments are given
   268   before the latter two, and the general context is given first.
   269   Example:
   270 
   271   \begin{verbatim}
   272   fun my_tac ctxt arg1 arg2 i st = ...
   273   \end{verbatim}
   274 
   275   Note that the goal state @{ML_text st} above is rarely made
   276   explicit, if tactic combinators (tacticals) are used as usual.
   277 
   278   A tactic that requires a proof context needs to make that explicit as seen
   279   in the @{verbatim ctxt} argument above. Do not refer to the background
   280   theory of @{verbatim st} -- it is not a proper context, but merely a formal
   281   certificate.
   282 
   283   \end{itemize}
   284 \<close>
   285 
   286 
   287 subsection \<open>General source layout\<close>
   288 
   289 text \<open>
   290   The general Isabelle/ML source layout imitates regular type-setting
   291   conventions, augmented by the requirements for deeply nested expressions
   292   that are commonplace in functional programming.
   293 
   294   \paragraph{Line length} is limited to 80 characters according to ancient
   295   standards, but we allow as much as 100 characters (not
   296   more).\footnote{Readability requires to keep the beginning of a line
   297   in view while watching its end.  Modern wide-screen displays do not
   298   change the way how the human brain works.  Sources also need to be
   299   printable on plain paper with reasonable font-size.} The extra 20
   300   characters acknowledge the space requirements due to qualified
   301   library references in Isabelle/ML.
   302 
   303   \paragraph{White-space} is used to emphasize the structure of
   304   expressions, following mostly standard conventions for mathematical
   305   typesetting, as can be seen in plain {\TeX} or {\LaTeX}.  This
   306   defines positioning of spaces for parentheses, punctuation, and
   307   infixes as illustrated here:
   308 
   309   \begin{verbatim}
   310   val x = y + z * (a + b);
   311   val pair = (a, b);
   312   val record = {foo = 1, bar = 2};
   313   \end{verbatim}
   314 
   315   Lines are normally broken \emph{after} an infix operator or
   316   punctuation character.  For example:
   317 
   318   \begin{verbatim}
   319   val x =
   320     a +
   321     b +
   322     c;
   323 
   324   val tuple =
   325    (a,
   326     b,
   327     c);
   328   \end{verbatim}
   329 
   330   Some special infixes (e.g.\ @{ML_text "|>"}) work better at the
   331   start of the line, but punctuation is always at the end.
   332 
   333   Function application follows the tradition of @{text "\<lambda>"}-calculus,
   334   not informal mathematics.  For example: @{ML_text "f a b"} for a
   335   curried function, or @{ML_text "g (a, b)"} for a tupled function.
   336   Note that the space between @{ML_text g} and the pair @{ML_text
   337   "(a, b)"} follows the important principle of
   338   \emph{compositionality}: the layout of @{ML_text "g p"} does not
   339   change when @{ML_text "p"} is refined to the concrete pair
   340   @{ML_text "(a, b)"}.
   341 
   342   \paragraph{Indentation} uses plain spaces, never hard
   343   tabulators.\footnote{Tabulators were invented to move the carriage
   344   of a type-writer to certain predefined positions.  In software they
   345   could be used as a primitive run-length compression of consecutive
   346   spaces, but the precise result would depend on non-standardized
   347   text editor configuration.}
   348 
   349   Each level of nesting is indented by 2 spaces, sometimes 1, very
   350   rarely 4, never 8 or any other odd number.
   351 
   352   Indentation follows a simple logical format that only depends on the
   353   nesting depth, not the accidental length of the text that initiates
   354   a level of nesting.  Example:
   355 
   356   \begin{verbatim}
   357   (* RIGHT *)
   358 
   359   if b then
   360     expr1_part1
   361     expr1_part2
   362   else
   363     expr2_part1
   364     expr2_part2
   365 
   366 
   367   (* WRONG *)
   368 
   369   if b then expr1_part1
   370             expr1_part2
   371   else expr2_part1
   372        expr2_part2
   373   \end{verbatim}
   374 
   375   The second form has many problems: it assumes a fixed-width font
   376   when viewing the sources, it uses more space on the line and thus
   377   makes it hard to observe its strict length limit (working against
   378   \emph{readability}), it requires extra editing to adapt the layout
   379   to changes of the initial text (working against
   380   \emph{maintainability}) etc.
   381 
   382   \medskip For similar reasons, any kind of two-dimensional or tabular
   383   layouts, ASCII-art with lines or boxes of asterisks etc.\ should be
   384   avoided.
   385 
   386   \paragraph{Complex expressions} that consist of multi-clausal
   387   function definitions, @{ML_text handle}, @{ML_text case},
   388   @{ML_text let} (and combinations) require special attention.  The
   389   syntax of Standard ML is quite ambitious and admits a lot of
   390   variance that can distort the meaning of the text.
   391 
   392   Multiple clauses of @{ML_text fun}, @{ML_text fn}, @{ML_text handle},
   393   @{ML_text case} get extra indentation to indicate the nesting
   394   clearly.  Example:
   395 
   396   \begin{verbatim}
   397   (* RIGHT *)
   398 
   399   fun foo p1 =
   400         expr1
   401     | foo p2 =
   402         expr2
   403 
   404 
   405   (* WRONG *)
   406 
   407   fun foo p1 =
   408     expr1
   409     | foo p2 =
   410     expr2
   411   \end{verbatim}
   412 
   413   Body expressions consisting of @{ML_text case} or @{ML_text let}
   414   require care to maintain compositionality, to prevent loss of
   415   logical indentation where it is especially important to see the
   416   structure of the text.  Example:
   417 
   418   \begin{verbatim}
   419   (* RIGHT *)
   420 
   421   fun foo p1 =
   422         (case e of
   423           q1 => ...
   424         | q2 => ...)
   425     | foo p2 =
   426         let
   427           ...
   428         in
   429           ...
   430         end
   431 
   432 
   433   (* WRONG *)
   434 
   435   fun foo p1 = case e of
   436       q1 => ...
   437     | q2 => ...
   438     | foo p2 =
   439     let
   440       ...
   441     in
   442       ...
   443     end
   444   \end{verbatim}
   445 
   446   Extra parentheses around @{ML_text case} expressions are optional,
   447   but help to analyse the nesting based on character matching in the
   448   text editor.
   449 
   450   \medskip There are two main exceptions to the overall principle of
   451   compositionality in the layout of complex expressions.
   452 
   453   \begin{enumerate}
   454 
   455   \item @{ML_text "if"} expressions are iterated as if ML had multi-branch
   456   conditionals, e.g.
   457 
   458   \begin{verbatim}
   459   (* RIGHT *)
   460 
   461   if b1 then e1
   462   else if b2 then e2
   463   else e3
   464   \end{verbatim}
   465 
   466   \item @{ML_text fn} abstractions are often layed-out as if they
   467   would lack any structure by themselves.  This traditional form is
   468   motivated by the possibility to shift function arguments back and
   469   forth wrt.\ additional combinators.  Example:
   470 
   471   \begin{verbatim}
   472   (* RIGHT *)
   473 
   474   fun foo x y = fold (fn z =>
   475     expr)
   476   \end{verbatim}
   477 
   478   Here the visual appearance is that of three arguments @{ML_text x},
   479   @{ML_text y}, @{ML_text z} in a row.
   480 
   481   \end{enumerate}
   482 
   483   Such weakly structured layout should be use with great care.  Here
   484   are some counter-examples involving @{ML_text let} expressions:
   485 
   486   \begin{verbatim}
   487   (* WRONG *)
   488 
   489   fun foo x = let
   490       val y = ...
   491     in ... end
   492 
   493 
   494   (* WRONG *)
   495 
   496   fun foo x = let
   497     val y = ...
   498   in ... end
   499 
   500 
   501   (* WRONG *)
   502 
   503   fun foo x =
   504   let
   505     val y = ...
   506   in ... end
   507 
   508 
   509   (* WRONG *)
   510 
   511   fun foo x =
   512     let
   513       val y = ...
   514     in
   515       ... end
   516   \end{verbatim}
   517 
   518   \medskip In general the source layout is meant to emphasize the
   519   structure of complex language expressions, not to pretend that SML
   520   had a completely different syntax (say that of Haskell, Scala, Java).
   521 \<close>
   522 
   523 
   524 section \<open>ML embedded into Isabelle/Isar\<close>
   525 
   526 text \<open>ML and Isar are intertwined via an open-ended bootstrap
   527   process that provides more and more programming facilities and
   528   logical content in an alternating manner.  Bootstrapping starts from
   529   the raw environment of existing implementations of Standard ML
   530   (mainly Poly/ML, but also SML/NJ).
   531 
   532   Isabelle/Pure marks the point where the raw ML toplevel is superseded by
   533   Isabelle/ML within the Isar theory and proof language, with a uniform
   534   context for arbitrary ML values (see also \secref{sec:context}). This formal
   535   environment holds ML compiler bindings, logical entities, and many other
   536   things.
   537 
   538   Object-logics like Isabelle/HOL are built within the Isabelle/ML/Isar
   539   environment by introducing suitable theories with associated ML modules,
   540   either inlined within @{verbatim ".thy"} files, or as separate @{verbatim
   541   ".ML"} files that are loading from some theory. Thus Isabelle/HOL is defined
   542   as a regular user-space application within the Isabelle framework. Further
   543   add-on tools can be implemented in ML within the Isar context in the same
   544   manner: ML is part of the standard repertoire of Isabelle, and there is no
   545   distinction between ``users'' and ``developers'' in this respect.
   546 \<close>
   547 
   548 
   549 subsection \<open>Isar ML commands\<close>
   550 
   551 text \<open>
   552   The primary Isar source language provides facilities to ``open a window'' to
   553   the underlying ML compiler. Especially see the Isar commands @{command_ref
   554   "ML_file"} and @{command_ref "ML"}: both work the same way, but the source
   555   text is provided differently, via a file vs.\ inlined, respectively. Apart
   556   from embedding ML into the main theory definition like that, there are many
   557   more commands that refer to ML source, such as @{command_ref setup} or
   558   @{command_ref declaration}. Even more fine-grained embedding of ML into Isar
   559   is encountered in the proof method @{method_ref tactic}, which refines the
   560   pending goal state via a given expression of type @{ML_type tactic}.
   561 \<close>
   562 
   563 text %mlex \<open>The following artificial example demonstrates some ML
   564   toplevel declarations within the implicit Isar theory context.  This
   565   is regular functional programming without referring to logical
   566   entities yet.
   567 \<close>
   568 
   569 ML \<open>
   570   fun factorial 0 = 1
   571     | factorial n = n * factorial (n - 1)
   572 \<close>
   573 
   574 text \<open>Here the ML environment is already managed by Isabelle, i.e.\
   575   the @{ML factorial} function is not yet accessible in the preceding
   576   paragraph, nor in a different theory that is independent from the
   577   current one in the import hierarchy.
   578 
   579   Removing the above ML declaration from the source text will remove any trace
   580   of this definition, as expected. The Isabelle/ML toplevel environment is
   581   managed in a \emph{stateless} way: in contrast to the raw ML toplevel, there
   582   are no global side-effects involved here.\footnote{Such a stateless
   583   compilation environment is also a prerequisite for robust parallel
   584   compilation within independent nodes of the implicit theory development
   585   graph.}
   586 
   587   \medskip The next example shows how to embed ML into Isar proofs, using
   588   @{command_ref "ML_prf"} instead of @{command_ref "ML"}. As illustrated
   589   below, the effect on the ML environment is local to the whole proof body,
   590   but ignoring the block structure. \<close>
   591 
   592 notepad
   593 begin
   594   ML_prf %"ML" \<open>val a = 1\<close>
   595   {
   596     ML_prf %"ML" \<open>val b = a + 1\<close>
   597   } -- \<open>Isar block structure ignored by ML environment\<close>
   598   ML_prf %"ML" \<open>val c = b + 1\<close>
   599 end
   600 
   601 text \<open>By side-stepping the normal scoping rules for Isar proof
   602   blocks, embedded ML code can refer to the different contexts and
   603   manipulate corresponding entities, e.g.\ export a fact from a block
   604   context.
   605 
   606   \medskip Two further ML commands are useful in certain situations:
   607   @{command_ref ML_val} and @{command_ref ML_command} are \emph{diagnostic} in
   608   the sense that there is no effect on the underlying environment, and can
   609   thus be used anywhere. The examples below produce long strings of digits by
   610   invoking @{ML factorial}: @{command ML_val} takes care of printing the ML
   611   toplevel result, but @{command ML_command} is silent so we produce an
   612   explicit output message.
   613 \<close>
   614 
   615 ML_val \<open>factorial 100\<close>
   616 ML_command \<open>writeln (string_of_int (factorial 100))\<close>
   617 
   618 notepad
   619 begin
   620   ML_val \<open>factorial 100\<close>
   621   ML_command \<open>writeln (string_of_int (factorial 100))\<close>
   622 end
   623 
   624 
   625 subsection \<open>Compile-time context\<close>
   626 
   627 text \<open>Whenever the ML compiler is invoked within Isabelle/Isar, the
   628   formal context is passed as a thread-local reference variable.  Thus
   629   ML code may access the theory context during compilation, by reading
   630   or writing the (local) theory under construction.  Note that such
   631   direct access to the compile-time context is rare.  In practice it
   632   is typically done via some derived ML functions instead.
   633 \<close>
   634 
   635 text %mlref \<open>
   636   \begin{mldecls}
   637   @{index_ML ML_Context.the_generic_context: "unit -> Context.generic"} \\
   638   @{index_ML "Context.>>": "(Context.generic -> Context.generic) -> unit"} \\
   639   @{index_ML ML_Thms.bind_thms: "string * thm list -> unit"} \\
   640   @{index_ML ML_Thms.bind_thm: "string * thm -> unit"} \\
   641   \end{mldecls}
   642 
   643   \begin{description}
   644 
   645   \item @{ML "ML_Context.the_generic_context ()"} refers to the theory
   646   context of the ML toplevel --- at compile time.  ML code needs to
   647   take care to refer to @{ML "ML_Context.the_generic_context ()"}
   648   correctly.  Recall that evaluation of a function body is delayed
   649   until actual run-time.
   650 
   651   \item @{ML "Context.>>"}~@{text f} applies context transformation
   652   @{text f} to the implicit context of the ML toplevel.
   653 
   654   \item @{ML ML_Thms.bind_thms}~@{text "(name, thms)"} stores a list of
   655   theorems produced in ML both in the (global) theory context and the
   656   ML toplevel, associating it with the provided name.
   657 
   658   \item @{ML ML_Thms.bind_thm} is similar to @{ML ML_Thms.bind_thms} but
   659   refers to a singleton fact.
   660 
   661   \end{description}
   662 
   663   It is important to note that the above functions are really
   664   restricted to the compile time, even though the ML compiler is
   665   invoked at run-time.  The majority of ML code either uses static
   666   antiquotations (\secref{sec:ML-antiq}) or refers to the theory or
   667   proof context at run-time, by explicit functional abstraction.
   668 \<close>
   669 
   670 
   671 subsection \<open>Antiquotations \label{sec:ML-antiq}\<close>
   672 
   673 text \<open>A very important consequence of embedding ML into Isar is the
   674   concept of \emph{ML antiquotation}.  The standard token language of
   675   ML is augmented by special syntactic entities of the following form:
   676 
   677   @{rail \<open>
   678   @{syntax_def antiquote}: '@{' nameref args '}'
   679   \<close>}
   680 
   681   Here @{syntax nameref} and @{syntax args} are outer syntax categories, as
   682   defined in @{cite "isabelle-isar-ref"}.
   683 
   684   \medskip A regular antiquotation @{text "@{name args}"} processes
   685   its arguments by the usual means of the Isar source language, and
   686   produces corresponding ML source text, either as literal
   687   \emph{inline} text (e.g.\ @{text "@{term t}"}) or abstract
   688   \emph{value} (e.g. @{text "@{thm th}"}).  This pre-compilation
   689   scheme allows to refer to formal entities in a robust manner, with
   690   proper static scoping and with some degree of logical checking of
   691   small portions of the code.
   692 \<close>
   693 
   694 
   695 subsection \<open>Printing ML values\<close>
   696 
   697 text \<open>The ML compiler knows about the structure of values according
   698   to their static type, and can print them in the manner of its
   699   toplevel, although the details are non-portable.  The
   700   antiquotations @{ML_antiquotation_def "make_string"} and
   701   @{ML_antiquotation_def "print"} provide a quasi-portable way to
   702   refer to this potential capability of the underlying ML system in
   703   generic Isabelle/ML sources.
   704 
   705   This is occasionally useful for diagnostic or demonstration
   706   purposes.  Note that production-quality tools require proper
   707   user-level error messages, avoiding raw ML values in the output.\<close>
   708 
   709 text %mlantiq \<open>
   710   \begin{matharray}{rcl}
   711   @{ML_antiquotation_def "make_string"} & : & @{text ML_antiquotation} \\
   712   @{ML_antiquotation_def "print"} & : & @{text ML_antiquotation} \\
   713   \end{matharray}
   714 
   715   @{rail \<open>
   716   @@{ML_antiquotation make_string}
   717   ;
   718   @@{ML_antiquotation print} @{syntax name}?
   719   \<close>}
   720 
   721   \begin{description}
   722 
   723   \item @{text "@{make_string}"} inlines a function to print arbitrary values
   724   similar to the ML toplevel. The result is compiler dependent and may fall
   725   back on "?" in certain situations. The value of configuration option
   726   @{attribute_ref ML_print_depth} determines further details of output.
   727 
   728   \item @{text "@{print f}"} uses the ML function @{text "f: string ->
   729   unit"} to output the result of @{text "@{make_string}"} above,
   730   together with the source position of the antiquotation.  The default
   731   output function is @{ML writeln}.
   732 
   733   \end{description}
   734 \<close>
   735 
   736 text %mlex \<open>The following artificial examples show how to produce
   737   adhoc output of ML values for debugging purposes.\<close>
   738 
   739 ML \<open>
   740   val x = 42;
   741   val y = true;
   742 
   743   writeln (@{make_string} {x = x, y = y});
   744 
   745   @{print} {x = x, y = y};
   746   @{print tracing} {x = x, y = y};
   747 \<close>
   748 
   749 
   750 section \<open>Canonical argument order \label{sec:canonical-argument-order}\<close>
   751 
   752 text \<open>Standard ML is a language in the tradition of @{text
   753   "\<lambda>"}-calculus and \emph{higher-order functional programming},
   754   similar to OCaml, Haskell, or Isabelle/Pure and HOL as logical
   755   languages.  Getting acquainted with the native style of representing
   756   functions in that setting can save a lot of extra boiler-plate of
   757   redundant shuffling of arguments, auxiliary abstractions etc.
   758 
   759   Functions are usually \emph{curried}: the idea of turning arguments
   760   of type @{text "\<tau>\<^sub>i"} (for @{text "i \<in> {1, \<dots> n}"}) into a result of
   761   type @{text "\<tau>"} is represented by the iterated function space
   762   @{text "\<tau>\<^sub>1 \<rightarrow> \<dots> \<rightarrow> \<tau>\<^sub>n \<rightarrow> \<tau>"}.  This is isomorphic to the well-known
   763   encoding via tuples @{text "\<tau>\<^sub>1 \<times> \<dots> \<times> \<tau>\<^sub>n \<rightarrow> \<tau>"}, but the curried
   764   version fits more smoothly into the basic calculus.\footnote{The
   765   difference is even more significant in HOL, because the redundant
   766   tuple structure needs to be accommodated extraneous proof steps.}
   767 
   768   Currying gives some flexibility due to \emph{partial application}.  A
   769   function @{text "f: \<tau>\<^sub>1 \<rightarrow> \<tau>\<^sub>2 \<rightarrow> \<tau>"} can be applied to @{text "x: \<tau>\<^sub>1"}
   770   and the remaining @{text "(f x): \<tau>\<^sub>2 \<rightarrow> \<tau>"} passed to another function
   771   etc.  How well this works in practice depends on the order of
   772   arguments.  In the worst case, arguments are arranged erratically,
   773   and using a function in a certain situation always requires some
   774   glue code.  Thus we would get exponentially many opportunities to
   775   decorate the code with meaningless permutations of arguments.
   776 
   777   This can be avoided by \emph{canonical argument order}, which
   778   observes certain standard patterns and minimizes adhoc permutations
   779   in their application.  In Isabelle/ML, large portions of text can be
   780   written without auxiliary operations like @{text "swap: \<alpha> \<times> \<beta> \<rightarrow> \<beta> \<times>
   781   \<alpha>"} or @{text "C: (\<alpha> \<rightarrow> \<beta> \<rightarrow> \<gamma>) \<rightarrow> (\<beta> \<rightarrow> \<alpha> \<rightarrow> \<gamma>)"} (the latter is not
   782   present in the Isabelle/ML library).
   783 
   784   \medskip The main idea is that arguments that vary less are moved
   785   further to the left than those that vary more.  Two particularly
   786   important categories of functions are \emph{selectors} and
   787   \emph{updates}.
   788 
   789   The subsequent scheme is based on a hypothetical set-like container
   790   of type @{text "\<beta>"} that manages elements of type @{text "\<alpha>"}.  Both
   791   the names and types of the associated operations are canonical for
   792   Isabelle/ML.
   793 
   794   \begin{center}
   795   \begin{tabular}{ll}
   796   kind & canonical name and type \\\hline
   797   selector & @{text "member: \<beta> \<rightarrow> \<alpha> \<rightarrow> bool"} \\
   798   update & @{text "insert: \<alpha> \<rightarrow> \<beta> \<rightarrow> \<beta>"} \\
   799   \end{tabular}
   800   \end{center}
   801 
   802   Given a container @{text "B: \<beta>"}, the partially applied @{text
   803   "member B"} is a predicate over elements @{text "\<alpha> \<rightarrow> bool"}, and
   804   thus represents the intended denotation directly.  It is customary
   805   to pass the abstract predicate to further operations, not the
   806   concrete container.  The argument order makes it easy to use other
   807   combinators: @{text "forall (member B) list"} will check a list of
   808   elements for membership in @{text "B"} etc. Often the explicit
   809   @{text "list"} is pointless and can be contracted to @{text "forall
   810   (member B)"} to get directly a predicate again.
   811 
   812   In contrast, an update operation varies the container, so it moves
   813   to the right: @{text "insert a"} is a function @{text "\<beta> \<rightarrow> \<beta>"} to
   814   insert a value @{text "a"}.  These can be composed naturally as
   815   @{text "insert c \<circ> insert b \<circ> insert a"}.  The slightly awkward
   816   inversion of the composition order is due to conventional
   817   mathematical notation, which can be easily amended as explained
   818   below.
   819 \<close>
   820 
   821 
   822 subsection \<open>Forward application and composition\<close>
   823 
   824 text \<open>Regular function application and infix notation works best for
   825   relatively deeply structured expressions, e.g.\ @{text "h (f x y + g
   826   z)"}.  The important special case of \emph{linear transformation}
   827   applies a cascade of functions @{text "f\<^sub>n (\<dots> (f\<^sub>1 x))"}.  This
   828   becomes hard to read and maintain if the functions are themselves
   829   given as complex expressions.  The notation can be significantly
   830   improved by introducing \emph{forward} versions of application and
   831   composition as follows:
   832 
   833   \medskip
   834   \begin{tabular}{lll}
   835   @{text "x |> f"} & @{text "\<equiv>"} & @{text "f x"} \\
   836   @{text "(f #> g) x"} & @{text "\<equiv>"} & @{text "x |> f |> g"} \\
   837   \end{tabular}
   838   \medskip
   839 
   840   This enables to write conveniently @{text "x |> f\<^sub>1 |> \<dots> |> f\<^sub>n"} or
   841   @{text "f\<^sub>1 #> \<dots> #> f\<^sub>n"} for its functional abstraction over @{text
   842   "x"}.
   843 
   844   \medskip There is an additional set of combinators to accommodate
   845   multiple results (via pairs) that are passed on as multiple
   846   arguments (via currying).
   847 
   848   \medskip
   849   \begin{tabular}{lll}
   850   @{text "(x, y) |-> f"} & @{text "\<equiv>"} & @{text "f x y"} \\
   851   @{text "(f #-> g) x"} & @{text "\<equiv>"} & @{text "x |> f |-> g"} \\
   852   \end{tabular}
   853   \medskip
   854 \<close>
   855 
   856 text %mlref \<open>
   857   \begin{mldecls}
   858   @{index_ML_op "|> ": "'a * ('a -> 'b) -> 'b"} \\
   859   @{index_ML_op "|-> ": "('c * 'a) * ('c -> 'a -> 'b) -> 'b"} \\
   860   @{index_ML_op "#> ": "('a -> 'b) * ('b -> 'c) -> 'a -> 'c"} \\
   861   @{index_ML_op "#-> ": "('a -> 'c * 'b) * ('c -> 'b -> 'd) -> 'a -> 'd"} \\
   862   \end{mldecls}
   863 \<close>
   864 
   865 
   866 subsection \<open>Canonical iteration\<close>
   867 
   868 text \<open>As explained above, a function @{text "f: \<alpha> \<rightarrow> \<beta> \<rightarrow> \<beta>"} can be
   869   understood as update on a configuration of type @{text "\<beta>"},
   870   parameterized by an argument of type @{text "\<alpha>"}.  Given @{text "a: \<alpha>"}
   871   the partial application @{text "(f a): \<beta> \<rightarrow> \<beta>"} operates
   872   homogeneously on @{text "\<beta>"}.  This can be iterated naturally over a
   873   list of parameters @{text "[a\<^sub>1, \<dots>, a\<^sub>n]"} as @{text "f a\<^sub>1 #> \<dots> #> f a\<^sub>n"}.
   874   The latter expression is again a function @{text "\<beta> \<rightarrow> \<beta>"}.
   875   It can be applied to an initial configuration @{text "b: \<beta>"} to
   876   start the iteration over the given list of arguments: each @{text
   877   "a"} in @{text "a\<^sub>1, \<dots>, a\<^sub>n"} is applied consecutively by updating a
   878   cumulative configuration.
   879 
   880   The @{text fold} combinator in Isabelle/ML lifts a function @{text
   881   "f"} as above to its iterated version over a list of arguments.
   882   Lifting can be repeated, e.g.\ @{text "(fold \<circ> fold) f"} iterates
   883   over a list of lists as expected.
   884 
   885   The variant @{text "fold_rev"} works inside-out over the list of
   886   arguments, such that @{text "fold_rev f \<equiv> fold f \<circ> rev"} holds.
   887 
   888   The @{text "fold_map"} combinator essentially performs @{text
   889   "fold"} and @{text "map"} simultaneously: each application of @{text
   890   "f"} produces an updated configuration together with a side-result;
   891   the iteration collects all such side-results as a separate list.
   892 \<close>
   893 
   894 text %mlref \<open>
   895   \begin{mldecls}
   896   @{index_ML fold: "('a -> 'b -> 'b) -> 'a list -> 'b -> 'b"} \\
   897   @{index_ML fold_rev: "('a -> 'b -> 'b) -> 'a list -> 'b -> 'b"} \\
   898   @{index_ML fold_map: "('a -> 'b -> 'c * 'b) -> 'a list -> 'b -> 'c list * 'b"} \\
   899   \end{mldecls}
   900 
   901   \begin{description}
   902 
   903   \item @{ML fold}~@{text f} lifts the parametrized update function
   904   @{text "f"} to a list of parameters.
   905 
   906   \item @{ML fold_rev}~@{text "f"} is similar to @{ML fold}~@{text
   907   "f"}, but works inside-out, as if the list would be reversed.
   908 
   909   \item @{ML fold_map}~@{text "f"} lifts the parametrized update
   910   function @{text "f"} (with side-result) to a list of parameters and
   911   cumulative side-results.
   912 
   913   \end{description}
   914 
   915   \begin{warn}
   916   The literature on functional programming provides a confusing multitude of
   917   combinators called @{text "foldl"}, @{text "foldr"} etc. SML97 provides its
   918   own variations as @{ML List.foldl} and @{ML List.foldr}, while the classic
   919   Isabelle library also has the historic @{ML Library.foldl} and @{ML
   920   Library.foldr}. To avoid unnecessary complication, all these historical
   921   versions should be ignored, and the canonical @{ML fold} (or @{ML fold_rev})
   922   used exclusively.
   923   \end{warn}
   924 \<close>
   925 
   926 text %mlex \<open>The following example shows how to fill a text buffer
   927   incrementally by adding strings, either individually or from a given
   928   list.
   929 \<close>
   930 
   931 ML \<open>
   932   val s =
   933     Buffer.empty
   934     |> Buffer.add "digits: "
   935     |> fold (Buffer.add o string_of_int) (0 upto 9)
   936     |> Buffer.content;
   937 
   938   @{assert} (s = "digits: 0123456789");
   939 \<close>
   940 
   941 text \<open>Note how @{ML "fold (Buffer.add o string_of_int)"} above saves
   942   an extra @{ML "map"} over the given list.  This kind of peephole
   943   optimization reduces both the code size and the tree structures in
   944   memory (``deforestation''), but it requires some practice to read
   945   and write fluently.
   946 
   947   \medskip The next example elaborates the idea of canonical
   948   iteration, demonstrating fast accumulation of tree content using a
   949   text buffer.
   950 \<close>
   951 
   952 ML \<open>
   953   datatype tree = Text of string | Elem of string * tree list;
   954 
   955   fun slow_content (Text txt) = txt
   956     | slow_content (Elem (name, ts)) =
   957         "<" ^ name ^ ">" ^
   958         implode (map slow_content ts) ^
   959         "</" ^ name ^ ">"
   960 
   961   fun add_content (Text txt) = Buffer.add txt
   962     | add_content (Elem (name, ts)) =
   963         Buffer.add ("<" ^ name ^ ">") #>
   964         fold add_content ts #>
   965         Buffer.add ("</" ^ name ^ ">");
   966 
   967   fun fast_content tree =
   968     Buffer.empty |> add_content tree |> Buffer.content;
   969 \<close>
   970 
   971 text \<open>The slowness of @{ML slow_content} is due to the @{ML implode} of
   972   the recursive results, because it copies previously produced strings
   973   again and again.
   974 
   975   The incremental @{ML add_content} avoids this by operating on a
   976   buffer that is passed through in a linear fashion.  Using @{ML_text
   977   "#>"} and contraction over the actual buffer argument saves some
   978   additional boiler-plate.  Of course, the two @{ML "Buffer.add"}
   979   invocations with concatenated strings could have been split into
   980   smaller parts, but this would have obfuscated the source without
   981   making a big difference in performance.  Here we have done some
   982   peephole-optimization for the sake of readability.
   983 
   984   Another benefit of @{ML add_content} is its ``open'' form as a
   985   function on buffers that can be continued in further linear
   986   transformations, folding etc.  Thus it is more compositional than
   987   the naive @{ML slow_content}.  As realistic example, compare the
   988   old-style @{ML "Term.maxidx_of_term: term -> int"} with the newer
   989   @{ML "Term.maxidx_term: term -> int -> int"} in Isabelle/Pure.
   990 
   991   Note that @{ML fast_content} above is only defined as example.  In
   992   many practical situations, it is customary to provide the
   993   incremental @{ML add_content} only and leave the initialization and
   994   termination to the concrete application to the user.
   995 \<close>
   996 
   997 
   998 section \<open>Message output channels \label{sec:message-channels}\<close>
   999 
  1000 text \<open>Isabelle provides output channels for different kinds of
  1001   messages: regular output, high-volume tracing information, warnings,
  1002   and errors.
  1003 
  1004   Depending on the user interface involved, these messages may appear
  1005   in different text styles or colours.  The standard output for
  1006   batch sessions prefixes each line of warnings by @{verbatim
  1007   "###"} and errors by @{verbatim "***"}, but leaves anything else
  1008   unchanged.  The message body may contain further markup and formatting,
  1009   which is routinely used in the Prover IDE @{cite "isabelle-jedit"}.
  1010 
  1011   Messages are associated with the transaction context of the running
  1012   Isar command.  This enables the front-end to manage commands and
  1013   resulting messages together.  For example, after deleting a command
  1014   from a given theory document version, the corresponding message
  1015   output can be retracted from the display.
  1016 \<close>
  1017 
  1018 text %mlref \<open>
  1019   \begin{mldecls}
  1020   @{index_ML writeln: "string -> unit"} \\
  1021   @{index_ML tracing: "string -> unit"} \\
  1022   @{index_ML warning: "string -> unit"} \\
  1023   @{index_ML error: "string -> 'a"} % FIXME Output.error_message (!?) \\
  1024   \end{mldecls}
  1025 
  1026   \begin{description}
  1027 
  1028   \item @{ML writeln}~@{text "text"} outputs @{text "text"} as regular
  1029   message.  This is the primary message output operation of Isabelle
  1030   and should be used by default.
  1031 
  1032   \item @{ML tracing}~@{text "text"} outputs @{text "text"} as special
  1033   tracing message, indicating potential high-volume output to the
  1034   front-end (hundreds or thousands of messages issued by a single
  1035   command).  The idea is to allow the user-interface to downgrade the
  1036   quality of message display to achieve higher throughput.
  1037 
  1038   Note that the user might have to take special actions to see tracing
  1039   output, e.g.\ switch to a different output window.  So this channel
  1040   should not be used for regular output.
  1041 
  1042   \item @{ML warning}~@{text "text"} outputs @{text "text"} as
  1043   warning, which typically means some extra emphasis on the front-end
  1044   side (color highlighting, icons, etc.).
  1045 
  1046   \item @{ML error}~@{text "text"} raises exception @{ML ERROR}~@{text
  1047   "text"} and thus lets the Isar toplevel print @{text "text"} on the
  1048   error channel, which typically means some extra emphasis on the
  1049   front-end side (color highlighting, icons, etc.).
  1050 
  1051   This assumes that the exception is not handled before the command
  1052   terminates.  Handling exception @{ML ERROR}~@{text "text"} is a
  1053   perfectly legal alternative: it means that the error is absorbed
  1054   without any message output.
  1055 
  1056   \begin{warn}
  1057   The actual error channel is accessed via @{ML Output.error_message}, but
  1058   this is normally not used directly in user code.
  1059   \end{warn}
  1060 
  1061   \end{description}
  1062 
  1063   \begin{warn}
  1064   Regular Isabelle/ML code should output messages exclusively by the
  1065   official channels.  Using raw I/O on \emph{stdout} or \emph{stderr}
  1066   instead (e.g.\ via @{ML TextIO.output}) is apt to cause problems in
  1067   the presence of parallel and asynchronous processing of Isabelle
  1068   theories.  Such raw output might be displayed by the front-end in
  1069   some system console log, with a low chance that the user will ever
  1070   see it.  Moreover, as a genuine side-effect on global process
  1071   channels, there is no proper way to retract output when Isar command
  1072   transactions are reset by the system.
  1073   \end{warn}
  1074 
  1075   \begin{warn}
  1076   The message channels should be used in a message-oriented manner.
  1077   This means that multi-line output that logically belongs together is
  1078   issued by a single invocation of @{ML writeln} etc.\ with the
  1079   functional concatenation of all message constituents.
  1080   \end{warn}
  1081 \<close>
  1082 
  1083 text %mlex \<open>The following example demonstrates a multi-line
  1084   warning.  Note that in some situations the user sees only the first
  1085   line, so the most important point should be made first.
  1086 \<close>
  1087 
  1088 ML_command \<open>
  1089   warning (cat_lines
  1090    ["Beware the Jabberwock, my son!",
  1091     "The jaws that bite, the claws that catch!",
  1092     "Beware the Jubjub Bird, and shun",
  1093     "The frumious Bandersnatch!"]);
  1094 \<close>
  1095 
  1096 
  1097 section \<open>Exceptions \label{sec:exceptions}\<close>
  1098 
  1099 text \<open>The Standard ML semantics of strict functional evaluation
  1100   together with exceptions is rather well defined, but some delicate
  1101   points need to be observed to avoid that ML programs go wrong
  1102   despite static type-checking.  Exceptions in Isabelle/ML are
  1103   subsequently categorized as follows.
  1104 
  1105   \paragraph{Regular user errors.}  These are meant to provide
  1106   informative feedback about malformed input etc.
  1107 
  1108   The \emph{error} function raises the corresponding @{ML ERROR}
  1109   exception, with a plain text message as argument.  @{ML ERROR}
  1110   exceptions can be handled internally, in order to be ignored, turned
  1111   into other exceptions, or cascaded by appending messages.  If the
  1112   corresponding Isabelle/Isar command terminates with an @{ML ERROR}
  1113   exception state, the system will print the result on the error
  1114   channel (see \secref{sec:message-channels}).
  1115 
  1116   It is considered bad style to refer to internal function names or
  1117   values in ML source notation in user error messages.  Do not use
  1118   @{text "@{make_string}"} nor @{text "@{here}"}!
  1119 
  1120   Grammatical correctness of error messages can be improved by
  1121   \emph{omitting} final punctuation: messages are often concatenated
  1122   or put into a larger context (e.g.\ augmented with source position).
  1123   Note that punctuation after formal entities (types, terms, theorems) is
  1124   particularly prone to user confusion.
  1125 
  1126   \paragraph{Program failures.}  There is a handful of standard
  1127   exceptions that indicate general failure situations, or failures of
  1128   core operations on logical entities (types, terms, theorems,
  1129   theories, see \chref{ch:logic}).
  1130 
  1131   These exceptions indicate a genuine breakdown of the program, so the
  1132   main purpose is to determine quickly what has happened where.
  1133   Traditionally, the (short) exception message would include the name
  1134   of an ML function, although this is no longer necessary, because the
  1135   ML runtime system attaches detailed source position stemming from the
  1136   corresponding @{ML_text raise} keyword.
  1137 
  1138   \medskip User modules can always introduce their own custom
  1139   exceptions locally, e.g.\ to organize internal failures robustly
  1140   without overlapping with existing exceptions.  Exceptions that are
  1141   exposed in module signatures require extra care, though, and should
  1142   \emph{not} be introduced by default.  Surprise by users of a module
  1143   can be often minimized by using plain user errors instead.
  1144 
  1145   \paragraph{Interrupts.}  These indicate arbitrary system events:
  1146   both the ML runtime system and the Isabelle/ML infrastructure signal
  1147   various exceptional situations by raising the special
  1148   @{ML Exn.Interrupt} exception in user code.
  1149 
  1150   This is the one and only way that physical events can intrude an Isabelle/ML
  1151   program. Such an interrupt can mean out-of-memory, stack overflow, timeout,
  1152   internal signaling of threads, or a POSIX process signal. An Isabelle/ML
  1153   program that intercepts interrupts becomes dependent on physical effects of
  1154   the environment. Even worse, exception handling patterns that are too
  1155   general by accident, e.g.\ by misspelled exception constructors, will cover
  1156   interrupts unintentionally and thus render the program semantics
  1157   ill-defined.
  1158 
  1159   Note that the Interrupt exception dates back to the original SML90
  1160   language definition.  It was excluded from the SML97 version to
  1161   avoid its malign impact on ML program semantics, but without
  1162   providing a viable alternative.  Isabelle/ML recovers physical
  1163   interruptibility (which is an indispensable tool to implement
  1164   managed evaluation of command transactions), but requires user code
  1165   to be strictly transparent wrt.\ interrupts.
  1166 
  1167   \begin{warn}
  1168   Isabelle/ML user code needs to terminate promptly on interruption,
  1169   without guessing at its meaning to the system infrastructure.
  1170   Temporary handling of interrupts for cleanup of global resources
  1171   etc.\ needs to be followed immediately by re-raising of the original
  1172   exception.
  1173   \end{warn}
  1174 \<close>
  1175 
  1176 text %mlref \<open>
  1177   \begin{mldecls}
  1178   @{index_ML try: "('a -> 'b) -> 'a -> 'b option"} \\
  1179   @{index_ML can: "('a -> 'b) -> 'a -> bool"} \\
  1180   @{index_ML_exception ERROR: string} \\
  1181   @{index_ML_exception Fail: string} \\
  1182   @{index_ML Exn.is_interrupt: "exn -> bool"} \\
  1183   @{index_ML reraise: "exn -> 'a"} \\
  1184   @{index_ML Runtime.exn_trace: "(unit -> 'a) -> 'a"} \\
  1185   \end{mldecls}
  1186 
  1187   \begin{description}
  1188 
  1189   \item @{ML try}~@{text "f x"} makes the partiality of evaluating
  1190   @{text "f x"} explicit via the option datatype.  Interrupts are
  1191   \emph{not} handled here, i.e.\ this form serves as safe replacement
  1192   for the \emph{unsafe} version @{ML_text "(SOME"}~@{text "f
  1193   x"}~@{ML_text "handle _ => NONE)"} that is occasionally seen in
  1194   books about SML97, but not in Isabelle/ML.
  1195 
  1196   \item @{ML can} is similar to @{ML try} with more abstract result.
  1197 
  1198   \item @{ML ERROR}~@{text "msg"} represents user errors; this
  1199   exception is normally raised indirectly via the @{ML error} function
  1200   (see \secref{sec:message-channels}).
  1201 
  1202   \item @{ML Fail}~@{text "msg"} represents general program failures.
  1203 
  1204   \item @{ML Exn.is_interrupt} identifies interrupts robustly, without
  1205   mentioning concrete exception constructors in user code.  Handled
  1206   interrupts need to be re-raised promptly!
  1207 
  1208   \item @{ML reraise}~@{text "exn"} raises exception @{text "exn"}
  1209   while preserving its implicit position information (if possible,
  1210   depending on the ML platform).
  1211 
  1212   \item @{ML Runtime.exn_trace}~@{ML_text "(fn () =>"}~@{text
  1213   "e"}@{ML_text ")"} evaluates expression @{text "e"} while printing
  1214   a full trace of its stack of nested exceptions (if possible,
  1215   depending on the ML platform).
  1216 
  1217   Inserting @{ML Runtime.exn_trace} into ML code temporarily is
  1218   useful for debugging, but not suitable for production code.
  1219 
  1220   \end{description}
  1221 \<close>
  1222 
  1223 text %mlantiq \<open>
  1224   \begin{matharray}{rcl}
  1225   @{ML_antiquotation_def "assert"} & : & @{text ML_antiquotation} \\
  1226   \end{matharray}
  1227 
  1228   \begin{description}
  1229 
  1230   \item @{text "@{assert}"} inlines a function
  1231   @{ML_type "bool -> unit"} that raises @{ML Fail} if the argument is
  1232   @{ML false}.  Due to inlining the source position of failed
  1233   assertions is included in the error output.
  1234 
  1235   \end{description}
  1236 \<close>
  1237 
  1238 
  1239 section \<open>Strings of symbols \label{sec:symbols}\<close>
  1240 
  1241 text \<open>A \emph{symbol} constitutes the smallest textual unit in
  1242   Isabelle/ML --- raw ML characters are normally not encountered at
  1243   all.  Isabelle strings consist of a sequence of symbols, represented
  1244   as a packed string or an exploded list of strings.  Each symbol is
  1245   in itself a small string, which has either one of the following
  1246   forms:
  1247 
  1248   \begin{enumerate}
  1249 
  1250   \item a single ASCII character ``@{text "c"}'', for example
  1251   ``@{verbatim a}'',
  1252 
  1253   \item a codepoint according to UTF-8 (non-ASCII byte sequence),
  1254 
  1255   \item a regular symbol ``@{verbatim \<open>\\<close>}@{verbatim "<"}@{text
  1256   "ident"}@{verbatim ">"}'', for example ``@{verbatim "\<alpha>"}'',
  1257 
  1258   \item a control symbol ``@{verbatim \<open>\\<close>}@{verbatim "<^"}@{text
  1259   "ident"}@{verbatim ">"}'', for example ``@{verbatim "\<^bold>"}'',
  1260 
  1261   \item a raw symbol ``@{verbatim \<open>\\<close>}@{verbatim "<^raw:"}@{text
  1262   text}@{verbatim ">"}'' where @{text text} consists of printable characters
  1263   excluding ``@{verbatim "."}'' and ``@{verbatim ">"}'', for example
  1264   ``@{verbatim "\<^raw:$\sum_{i = 1}^n$>"}'',
  1265 
  1266   \item a numbered raw control symbol ``@{verbatim \<open>\\<close>}@{verbatim
  1267   "<^raw"}@{text n}@{verbatim ">"}, where @{text n} consists of digits, for
  1268   example ``@{verbatim "\<^raw42>"}''.
  1269 
  1270   \end{enumerate}
  1271 
  1272   The @{text "ident"} syntax for symbol names is @{text "letter
  1273   (letter | digit)\<^sup>*"}, where @{text "letter = A..Za..z"} and @{text
  1274   "digit = 0..9"}.  There are infinitely many regular symbols and
  1275   control symbols, but a fixed collection of standard symbols is
  1276   treated specifically.  For example, ``@{verbatim "\<alpha>"}'' is
  1277   classified as a letter, which means it may occur within regular
  1278   Isabelle identifiers.
  1279 
  1280   The character set underlying Isabelle symbols is 7-bit ASCII, but 8-bit
  1281   character sequences are passed-through unchanged. Unicode/UCS data in UTF-8
  1282   encoding is processed in a non-strict fashion, such that well-formed code
  1283   sequences are recognized accordingly. Unicode provides its own collection of
  1284   mathematical symbols, but within the core Isabelle/ML world there is no link
  1285   to the standard collection of Isabelle regular symbols.
  1286 
  1287   \medskip Output of Isabelle symbols depends on the print mode. For example,
  1288   the standard {\LaTeX} setup of the Isabelle document preparation system
  1289   would present ``@{verbatim "\<alpha>"}'' as @{text "\<alpha>"}, and ``@{verbatim
  1290   "\<^bold>\<alpha>"}'' as @{text "\<^bold>\<alpha>"}. On-screen rendering usually works by mapping a
  1291   finite subset of Isabelle symbols to suitable Unicode characters.
  1292 \<close>
  1293 
  1294 text %mlref \<open>
  1295   \begin{mldecls}
  1296   @{index_ML_type "Symbol.symbol": string} \\
  1297   @{index_ML Symbol.explode: "string -> Symbol.symbol list"} \\
  1298   @{index_ML Symbol.is_letter: "Symbol.symbol -> bool"} \\
  1299   @{index_ML Symbol.is_digit: "Symbol.symbol -> bool"} \\
  1300   @{index_ML Symbol.is_quasi: "Symbol.symbol -> bool"} \\
  1301   @{index_ML Symbol.is_blank: "Symbol.symbol -> bool"} \\
  1302   \end{mldecls}
  1303   \begin{mldecls}
  1304   @{index_ML_type "Symbol.sym"} \\
  1305   @{index_ML Symbol.decode: "Symbol.symbol -> Symbol.sym"} \\
  1306   \end{mldecls}
  1307 
  1308   \begin{description}
  1309 
  1310   \item Type @{ML_type "Symbol.symbol"} represents individual Isabelle
  1311   symbols.
  1312 
  1313   \item @{ML "Symbol.explode"}~@{text "str"} produces a symbol list
  1314   from the packed form.  This function supersedes @{ML
  1315   "String.explode"} for virtually all purposes of manipulating text in
  1316   Isabelle!\footnote{The runtime overhead for exploded strings is
  1317   mainly that of the list structure: individual symbols that happen to
  1318   be a singleton string do not require extra memory in Poly/ML.}
  1319 
  1320   \item @{ML "Symbol.is_letter"}, @{ML "Symbol.is_digit"}, @{ML
  1321   "Symbol.is_quasi"}, @{ML "Symbol.is_blank"} classify standard
  1322   symbols according to fixed syntactic conventions of Isabelle, cf.\
  1323   @{cite "isabelle-isar-ref"}.
  1324 
  1325   \item Type @{ML_type "Symbol.sym"} is a concrete datatype that
  1326   represents the different kinds of symbols explicitly, with
  1327   constructors @{ML "Symbol.Char"}, @{ML "Symbol.UTF8"},
  1328   @{ML "Symbol.Sym"}, @{ML "Symbol.Ctrl"}, @{ML "Symbol.Raw"},
  1329   @{ML "Symbol.Malformed"}.
  1330 
  1331   \item @{ML "Symbol.decode"} converts the string representation of a
  1332   symbol into the datatype version.
  1333 
  1334   \end{description}
  1335 
  1336   \paragraph{Historical note.} In the original SML90 standard the
  1337   primitive ML type @{ML_type char} did not exists, and @{ML_text
  1338   "explode: string -> string list"} produced a list of singleton
  1339   strings like @{ML "raw_explode: string -> string list"} in
  1340   Isabelle/ML today.  When SML97 came out, Isabelle did not adopt its
  1341   somewhat anachronistic 8-bit or 16-bit characters, but the idea of
  1342   exploding a string into a list of small strings was extended to
  1343   ``symbols'' as explained above.  Thus Isabelle sources can refer to
  1344   an infinite store of user-defined symbols, without having to worry
  1345   about the multitude of Unicode encodings that have emerged over the
  1346   years.\<close>
  1347 
  1348 
  1349 section \<open>Basic data types\<close>
  1350 
  1351 text \<open>The basis library proposal of SML97 needs to be treated with
  1352   caution.  Many of its operations simply do not fit with important
  1353   Isabelle/ML conventions (like ``canonical argument order'', see
  1354   \secref{sec:canonical-argument-order}), others cause problems with
  1355   the parallel evaluation model of Isabelle/ML (such as @{ML
  1356   TextIO.print} or @{ML OS.Process.system}).
  1357 
  1358   Subsequently we give a brief overview of important operations on
  1359   basic ML data types.
  1360 \<close>
  1361 
  1362 
  1363 subsection \<open>Characters\<close>
  1364 
  1365 text %mlref \<open>
  1366   \begin{mldecls}
  1367   @{index_ML_type char} \\
  1368   \end{mldecls}
  1369 
  1370   \begin{description}
  1371 
  1372   \item Type @{ML_type char} is \emph{not} used.  The smallest textual
  1373   unit in Isabelle is represented as a ``symbol'' (see
  1374   \secref{sec:symbols}).
  1375 
  1376   \end{description}
  1377 \<close>
  1378 
  1379 
  1380 subsection \<open>Strings\<close>
  1381 
  1382 text %mlref \<open>
  1383   \begin{mldecls}
  1384   @{index_ML_type string} \\
  1385   \end{mldecls}
  1386 
  1387   \begin{description}
  1388 
  1389   \item Type @{ML_type string} represents immutable vectors of 8-bit
  1390   characters.  There are operations in SML to convert back and forth
  1391   to actual byte vectors, which are seldom used.
  1392 
  1393   This historically important raw text representation is used for
  1394   Isabelle-specific purposes with the following implicit substructures
  1395   packed into the string content:
  1396 
  1397   \begin{enumerate}
  1398 
  1399   \item sequence of Isabelle symbols (see also \secref{sec:symbols}),
  1400   with @{ML Symbol.explode} as key operation;
  1401 
  1402   \item XML tree structure via YXML (see also @{cite "isabelle-sys"}),
  1403   with @{ML YXML.parse_body} as key operation.
  1404 
  1405   \end{enumerate}
  1406 
  1407   Note that Isabelle/ML string literals may refer Isabelle symbols like
  1408   ``@{verbatim \<alpha>}'' natively, \emph{without} escaping the backslash. This is a
  1409   consequence of Isabelle treating all source text as strings of symbols,
  1410   instead of raw characters.
  1411 
  1412   \end{description}
  1413 \<close>
  1414 
  1415 text %mlex \<open>The subsequent example illustrates the difference of
  1416   physical addressing of bytes versus logical addressing of symbols in
  1417   Isabelle strings.
  1418 \<close>
  1419 
  1420 ML_val \<open>
  1421   val s = "\<A>";
  1422 
  1423   @{assert} (length (Symbol.explode s) = 1);
  1424   @{assert} (size s = 4);
  1425 \<close>
  1426 
  1427 text \<open>Note that in Unicode renderings of the symbol @{text "\<A>"},
  1428   variations of encodings like UTF-8 or UTF-16 pose delicate questions
  1429   about the multi-byte representations of its codepoint, which is outside
  1430   of the 16-bit address space of the original Unicode standard from
  1431   the 1990-ies.  In Isabelle/ML it is just ``@{verbatim \<A>}''
  1432   literally, using plain ASCII characters beyond any doubts.\<close>
  1433 
  1434 
  1435 subsection \<open>Integers\<close>
  1436 
  1437 text %mlref \<open>
  1438   \begin{mldecls}
  1439   @{index_ML_type int} \\
  1440   \end{mldecls}
  1441 
  1442   \begin{description}
  1443 
  1444   \item Type @{ML_type int} represents regular mathematical integers, which
  1445   are \emph{unbounded}. Overflow is treated properly, but should never happen
  1446   in practice.\footnote{The size limit for integer bit patterns in memory is
  1447   64\,MB for 32-bit Poly/ML, and much higher for 64-bit systems.} This works
  1448   uniformly for all supported ML platforms (Poly/ML and SML/NJ).
  1449 
  1450   Literal integers in ML text are forced to be of this one true
  1451   integer type --- adhoc overloading of SML97 is disabled.
  1452 
  1453   Structure @{ML_structure IntInf} of SML97 is obsolete and superseded by
  1454   @{ML_structure Int}.  Structure @{ML_structure Integer} in @{file
  1455   "~~/src/Pure/General/integer.ML"} provides some additional
  1456   operations.
  1457 
  1458   \end{description}
  1459 \<close>
  1460 
  1461 
  1462 subsection \<open>Time\<close>
  1463 
  1464 text %mlref \<open>
  1465   \begin{mldecls}
  1466   @{index_ML_type Time.time} \\
  1467   @{index_ML seconds: "real -> Time.time"} \\
  1468   \end{mldecls}
  1469 
  1470   \begin{description}
  1471 
  1472   \item Type @{ML_type Time.time} represents time abstractly according
  1473   to the SML97 basis library definition.  This is adequate for
  1474   internal ML operations, but awkward in concrete time specifications.
  1475 
  1476   \item @{ML seconds}~@{text "s"} turns the concrete scalar @{text
  1477   "s"} (measured in seconds) into an abstract time value.  Floating
  1478   point numbers are easy to use as configuration options in the
  1479   context (see \secref{sec:config-options}) or system options that
  1480   are maintained externally.
  1481 
  1482   \end{description}
  1483 \<close>
  1484 
  1485 
  1486 subsection \<open>Options\<close>
  1487 
  1488 text %mlref \<open>
  1489   \begin{mldecls}
  1490   @{index_ML Option.map: "('a -> 'b) -> 'a option -> 'b option"} \\
  1491   @{index_ML is_some: "'a option -> bool"} \\
  1492   @{index_ML is_none: "'a option -> bool"} \\
  1493   @{index_ML the: "'a option -> 'a"} \\
  1494   @{index_ML these: "'a list option -> 'a list"} \\
  1495   @{index_ML the_list: "'a option -> 'a list"} \\
  1496   @{index_ML the_default: "'a -> 'a option -> 'a"} \\
  1497   \end{mldecls}
  1498 \<close>
  1499 
  1500 text \<open>Apart from @{ML Option.map} most other operations defined in
  1501   structure @{ML_structure Option} are alien to Isabelle/ML and never
  1502   used.  The operations shown above are defined in @{file
  1503   "~~/src/Pure/General/basics.ML"}.\<close>
  1504 
  1505 
  1506 subsection \<open>Lists\<close>
  1507 
  1508 text \<open>Lists are ubiquitous in ML as simple and light-weight
  1509   ``collections'' for many everyday programming tasks.  Isabelle/ML
  1510   provides important additions and improvements over operations that
  1511   are predefined in the SML97 library.\<close>
  1512 
  1513 text %mlref \<open>
  1514   \begin{mldecls}
  1515   @{index_ML cons: "'a -> 'a list -> 'a list"} \\
  1516   @{index_ML member: "('b * 'a -> bool) -> 'a list -> 'b -> bool"} \\
  1517   @{index_ML insert: "('a * 'a -> bool) -> 'a -> 'a list -> 'a list"} \\
  1518   @{index_ML remove: "('b * 'a -> bool) -> 'b -> 'a list -> 'a list"} \\
  1519   @{index_ML update: "('a * 'a -> bool) -> 'a -> 'a list -> 'a list"} \\
  1520   \end{mldecls}
  1521 
  1522   \begin{description}
  1523 
  1524   \item @{ML cons}~@{text "x xs"} evaluates to @{text "x :: xs"}.
  1525 
  1526   Tupled infix operators are a historical accident in Standard ML.
  1527   The curried @{ML cons} amends this, but it should be only used when
  1528   partial application is required.
  1529 
  1530   \item @{ML member}, @{ML insert}, @{ML remove}, @{ML update} treat
  1531   lists as a set-like container that maintains the order of elements.
  1532   See @{file "~~/src/Pure/library.ML"} for the full specifications
  1533   (written in ML).  There are some further derived operations like
  1534   @{ML union} or @{ML inter}.
  1535 
  1536   Note that @{ML insert} is conservative about elements that are
  1537   already a @{ML member} of the list, while @{ML update} ensures that
  1538   the latest entry is always put in front.  The latter discipline is
  1539   often more appropriate in declarations of context data
  1540   (\secref{sec:context-data}) that are issued by the user in Isar
  1541   source: later declarations take precedence over earlier ones.
  1542 
  1543   \end{description}
  1544 \<close>
  1545 
  1546 text %mlex \<open>Using canonical @{ML fold} together with @{ML cons} (or
  1547   similar standard operations) alternates the orientation of data.
  1548   The is quite natural and should not be altered forcible by inserting
  1549   extra applications of @{ML rev}.  The alternative @{ML fold_rev} can
  1550   be used in the few situations, where alternation should be
  1551   prevented.
  1552 \<close>
  1553 
  1554 ML \<open>
  1555   val items = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
  1556 
  1557   val list1 = fold cons items [];
  1558   @{assert} (list1 = rev items);
  1559 
  1560   val list2 = fold_rev cons items [];
  1561   @{assert} (list2 = items);
  1562 \<close>
  1563 
  1564 text \<open>The subsequent example demonstrates how to \emph{merge} two
  1565   lists in a natural way.\<close>
  1566 
  1567 ML \<open>
  1568   fun merge_lists eq (xs, ys) = fold_rev (insert eq) ys xs;
  1569 \<close>
  1570 
  1571 text \<open>Here the first list is treated conservatively: only the new
  1572   elements from the second list are inserted.  The inside-out order of
  1573   insertion via @{ML fold_rev} attempts to preserve the order of
  1574   elements in the result.
  1575 
  1576   This way of merging lists is typical for context data
  1577   (\secref{sec:context-data}).  See also @{ML merge} as defined in
  1578   @{file "~~/src/Pure/library.ML"}.
  1579 \<close>
  1580 
  1581 
  1582 subsection \<open>Association lists\<close>
  1583 
  1584 text \<open>The operations for association lists interpret a concrete list
  1585   of pairs as a finite function from keys to values.  Redundant
  1586   representations with multiple occurrences of the same key are
  1587   implicitly normalized: lookup and update only take the first
  1588   occurrence into account.
  1589 \<close>
  1590 
  1591 text \<open>
  1592   \begin{mldecls}
  1593   @{index_ML AList.lookup: "('a * 'b -> bool) -> ('b * 'c) list -> 'a -> 'c option"} \\
  1594   @{index_ML AList.defined: "('a * 'b -> bool) -> ('b * 'c) list -> 'a -> bool"} \\
  1595   @{index_ML AList.update: "('a * 'a -> bool) -> 'a * 'b -> ('a * 'b) list -> ('a * 'b) list"} \\
  1596   \end{mldecls}
  1597 
  1598   \begin{description}
  1599 
  1600   \item @{ML AList.lookup}, @{ML AList.defined}, @{ML AList.update}
  1601   implement the main ``framework operations'' for mappings in
  1602   Isabelle/ML, following standard conventions for their names and
  1603   types.
  1604 
  1605   Note that a function called @{verbatim lookup} is obliged to express its
  1606   partiality via an explicit option element.  There is no choice to
  1607   raise an exception, without changing the name to something like
  1608   @{text "the_element"} or @{text "get"}.
  1609 
  1610   The @{text "defined"} operation is essentially a contraction of @{ML
  1611   is_some} and @{verbatim "lookup"}, but this is sufficiently frequent to
  1612   justify its independent existence.  This also gives the
  1613   implementation some opportunity for peep-hole optimization.
  1614 
  1615   \end{description}
  1616 
  1617   Association lists are adequate as simple implementation of finite mappings
  1618   in many practical situations. A more advanced table structure is defined in
  1619   @{file "~~/src/Pure/General/table.ML"}; that version scales easily to
  1620   thousands or millions of elements.
  1621 \<close>
  1622 
  1623 
  1624 subsection \<open>Unsynchronized references\<close>
  1625 
  1626 text %mlref \<open>
  1627   \begin{mldecls}
  1628   @{index_ML_type "'a Unsynchronized.ref"} \\
  1629   @{index_ML Unsynchronized.ref: "'a -> 'a Unsynchronized.ref"} \\
  1630   @{index_ML "!": "'a Unsynchronized.ref -> 'a"} \\
  1631   @{index_ML_op ":=": "'a Unsynchronized.ref * 'a -> unit"} \\
  1632   \end{mldecls}
  1633 \<close>
  1634 
  1635 text \<open>Due to ubiquitous parallelism in Isabelle/ML (see also
  1636   \secref{sec:multi-threading}), the mutable reference cells of
  1637   Standard ML are notorious for causing problems.  In a highly
  1638   parallel system, both correctness \emph{and} performance are easily
  1639   degraded when using mutable data.
  1640 
  1641   The unwieldy name of @{ML Unsynchronized.ref} for the constructor
  1642   for references in Isabelle/ML emphasizes the inconveniences caused by
  1643   mutability.  Existing operations @{ML "!"}  and @{ML_op ":="} are
  1644   unchanged, but should be used with special precautions, say in a
  1645   strictly local situation that is guaranteed to be restricted to
  1646   sequential evaluation --- now and in the future.
  1647 
  1648   \begin{warn}
  1649   Never @{ML_text "open Unsynchronized"}, not even in a local scope!
  1650   Pretending that mutable state is no problem is a very bad idea.
  1651   \end{warn}
  1652 \<close>
  1653 
  1654 
  1655 section \<open>Thread-safe programming \label{sec:multi-threading}\<close>
  1656 
  1657 text \<open>Multi-threaded execution has become an everyday reality in
  1658   Isabelle since Poly/ML 5.2.1 and Isabelle2008.  Isabelle/ML provides
  1659   implicit and explicit parallelism by default, and there is no way
  1660   for user-space tools to ``opt out''.  ML programs that are purely
  1661   functional, output messages only via the official channels
  1662   (\secref{sec:message-channels}), and do not intercept interrupts
  1663   (\secref{sec:exceptions}) can participate in the multi-threaded
  1664   environment immediately without further ado.
  1665 
  1666   More ambitious tools with more fine-grained interaction with the
  1667   environment need to observe the principles explained below.
  1668 \<close>
  1669 
  1670 
  1671 subsection \<open>Multi-threading with shared memory\<close>
  1672 
  1673 text \<open>Multiple threads help to organize advanced operations of the
  1674   system, such as real-time conditions on command transactions,
  1675   sub-components with explicit communication, general asynchronous
  1676   interaction etc.  Moreover, parallel evaluation is a prerequisite to
  1677   make adequate use of the CPU resources that are available on
  1678   multi-core systems.\footnote{Multi-core computing does not mean that
  1679   there are ``spare cycles'' to be wasted.  It means that the
  1680   continued exponential speedup of CPU performance due to ``Moore's
  1681   Law'' follows different rules: clock frequency has reached its peak
  1682   around 2005, and applications need to be parallelized in order to
  1683   avoid a perceived loss of performance.  See also
  1684   @{cite "Sutter:2005"}.}
  1685 
  1686   Isabelle/Isar exploits the inherent structure of theories and proofs to
  1687   support \emph{implicit parallelism} to a large extent. LCF-style theorem
  1688   proving provides almost ideal conditions for that, see also
  1689   @{cite "Wenzel:2009"}. This means, significant parts of theory and proof
  1690   checking is parallelized by default. In Isabelle2013, a maximum
  1691   speedup-factor of 3.5 on 4 cores and 6.5 on 8 cores can be expected
  1692   @{cite "Wenzel:2013:ITP"}.
  1693 
  1694   \medskip ML threads lack the memory protection of separate
  1695   processes, and operate concurrently on shared heap memory.  This has
  1696   the advantage that results of independent computations are directly
  1697   available to other threads: abstract values can be passed without
  1698   copying or awkward serialization that is typically required for
  1699   separate processes.
  1700 
  1701   To make shared-memory multi-threading work robustly and efficiently,
  1702   some programming guidelines need to be observed.  While the ML
  1703   system is responsible to maintain basic integrity of the
  1704   representation of ML values in memory, the application programmer
  1705   needs to ensure that multi-threaded execution does not break the
  1706   intended semantics.
  1707 
  1708   \begin{warn}
  1709   To participate in implicit parallelism, tools need to be
  1710   thread-safe.  A single ill-behaved tool can affect the stability and
  1711   performance of the whole system.
  1712   \end{warn}
  1713 
  1714   Apart from observing the principles of thread-safeness passively, advanced
  1715   tools may also exploit parallelism actively, e.g.\ by using library
  1716   functions for parallel list operations (\secref{sec:parlist}).
  1717 
  1718   \begin{warn}
  1719   Parallel computing resources are managed centrally by the
  1720   Isabelle/ML infrastructure.  User programs should not fork their own
  1721   ML threads to perform heavy computations.
  1722   \end{warn}
  1723 \<close>
  1724 
  1725 
  1726 subsection \<open>Critical shared resources\<close>
  1727 
  1728 text \<open>Thread-safeness is mainly concerned about concurrent
  1729   read/write access to shared resources, which are outside the purely
  1730   functional world of ML.  This covers the following in particular.
  1731 
  1732   \begin{itemize}
  1733 
  1734   \item Global references (or arrays), i.e.\ mutable memory cells that
  1735   persist over several invocations of associated
  1736   operations.\footnote{This is independent of the visibility of such
  1737   mutable values in the toplevel scope.}
  1738 
  1739   \item Global state of the running Isabelle/ML process, i.e.\ raw I/O
  1740   channels, environment variables, current working directory.
  1741 
  1742   \item Writable resources in the file-system that are shared among
  1743   different threads or external processes.
  1744 
  1745   \end{itemize}
  1746 
  1747   Isabelle/ML provides various mechanisms to avoid critical shared
  1748   resources in most situations.  As last resort there are some
  1749   mechanisms for explicit synchronization.  The following guidelines
  1750   help to make Isabelle/ML programs work smoothly in a concurrent
  1751   environment.
  1752 
  1753   \begin{itemize}
  1754 
  1755   \item Avoid global references altogether.  Isabelle/Isar maintains a
  1756   uniform context that incorporates arbitrary data declared by user
  1757   programs (\secref{sec:context-data}).  This context is passed as
  1758   plain value and user tools can get/map their own data in a purely
  1759   functional manner.  Configuration options within the context
  1760   (\secref{sec:config-options}) provide simple drop-in replacements
  1761   for historic reference variables.
  1762 
  1763   \item Keep components with local state information re-entrant.
  1764   Instead of poking initial values into (private) global references, a
  1765   new state record can be created on each invocation, and passed
  1766   through any auxiliary functions of the component.  The state record
  1767   contain mutable references in special situations, without requiring any
  1768   synchronization, as long as each invocation gets its own copy and the
  1769   tool itself is single-threaded.
  1770 
  1771   \item Avoid raw output on @{text "stdout"} or @{text "stderr"}.  The
  1772   Poly/ML library is thread-safe for each individual output operation,
  1773   but the ordering of parallel invocations is arbitrary.  This means
  1774   raw output will appear on some system console with unpredictable
  1775   interleaving of atomic chunks.
  1776 
  1777   Note that this does not affect regular message output channels
  1778   (\secref{sec:message-channels}).  An official message id is associated
  1779   with the command transaction from where it originates, independently
  1780   of other transactions.  This means each running Isar command has
  1781   effectively its own set of message channels, and interleaving can
  1782   only happen when commands use parallelism internally (and only at
  1783   message boundaries).
  1784 
  1785   \item Treat environment variables and the current working directory
  1786   of the running process as read-only.
  1787 
  1788   \item Restrict writing to the file-system to unique temporary files.
  1789   Isabelle already provides a temporary directory that is unique for
  1790   the running process, and there is a centralized source of unique
  1791   serial numbers in Isabelle/ML.  Thus temporary files that are passed
  1792   to to some external process will be always disjoint, and thus
  1793   thread-safe.
  1794 
  1795   \end{itemize}
  1796 \<close>
  1797 
  1798 text %mlref \<open>
  1799   \begin{mldecls}
  1800   @{index_ML File.tmp_path: "Path.T -> Path.T"} \\
  1801   @{index_ML serial_string: "unit -> string"} \\
  1802   \end{mldecls}
  1803 
  1804   \begin{description}
  1805 
  1806   \item @{ML File.tmp_path}~@{text "path"} relocates the base
  1807   component of @{text "path"} into the unique temporary directory of
  1808   the running Isabelle/ML process.
  1809 
  1810   \item @{ML serial_string}~@{text "()"} creates a new serial number
  1811   that is unique over the runtime of the Isabelle/ML process.
  1812 
  1813   \end{description}
  1814 \<close>
  1815 
  1816 text %mlex \<open>The following example shows how to create unique
  1817   temporary file names.
  1818 \<close>
  1819 
  1820 ML \<open>
  1821   val tmp1 = File.tmp_path (Path.basic ("foo" ^ serial_string ()));
  1822   val tmp2 = File.tmp_path (Path.basic ("foo" ^ serial_string ()));
  1823   @{assert} (tmp1 <> tmp2);
  1824 \<close>
  1825 
  1826 
  1827 subsection \<open>Explicit synchronization\<close>
  1828 
  1829 text \<open>Isabelle/ML provides explicit synchronization for mutable variables over
  1830   immutable data, which may be updated atomically and exclusively. This
  1831   addresses the rare situations where mutable shared resources are really
  1832   required. Synchronization in Isabelle/ML is based on primitives of Poly/ML,
  1833   which have been adapted to the specific assumptions of the concurrent
  1834   Isabelle environment. User code should not break this abstraction, but stay
  1835   within the confines of concurrent Isabelle/ML.
  1836 
  1837   A \emph{synchronized variable} is an explicit state component associated
  1838   with mechanisms for locking and signaling. There are operations to await a
  1839   condition, change the state, and signal the change to all other waiting
  1840   threads. Synchronized access to the state variable is \emph{not} re-entrant:
  1841   direct or indirect nesting within the same thread will cause a deadlock!\<close>
  1842 
  1843 text %mlref \<open>
  1844   \begin{mldecls}
  1845   @{index_ML_type "'a Synchronized.var"} \\
  1846   @{index_ML Synchronized.var: "string -> 'a -> 'a Synchronized.var"} \\
  1847   @{index_ML Synchronized.guarded_access: "'a Synchronized.var ->
  1848   ('a -> ('b * 'a) option) -> 'b"} \\
  1849   \end{mldecls}
  1850 
  1851   \begin{description}
  1852 
  1853   \item Type @{ML_type "'a Synchronized.var"} represents synchronized
  1854   variables with state of type @{ML_type 'a}.
  1855 
  1856   \item @{ML Synchronized.var}~@{text "name x"} creates a synchronized
  1857   variable that is initialized with value @{text "x"}.  The @{text
  1858   "name"} is used for tracing.
  1859 
  1860   \item @{ML Synchronized.guarded_access}~@{text "var f"} lets the
  1861   function @{text "f"} operate within a critical section on the state
  1862   @{text "x"} as follows: if @{text "f x"} produces @{ML NONE}, it
  1863   continues to wait on the internal condition variable, expecting that
  1864   some other thread will eventually change the content in a suitable
  1865   manner; if @{text "f x"} produces @{ML SOME}~@{text "(y, x')"} it is
  1866   satisfied and assigns the new state value @{text "x'"}, broadcasts a
  1867   signal to all waiting threads on the associated condition variable,
  1868   and returns the result @{text "y"}.
  1869 
  1870   \end{description}
  1871 
  1872   There are some further variants of the @{ML
  1873   Synchronized.guarded_access} combinator, see @{file
  1874   "~~/src/Pure/Concurrent/synchronized.ML"} for details.
  1875 \<close>
  1876 
  1877 text %mlex \<open>The following example implements a counter that produces
  1878   positive integers that are unique over the runtime of the Isabelle
  1879   process:\<close>
  1880 
  1881 ML \<open>
  1882   local
  1883     val counter = Synchronized.var "counter" 0;
  1884   in
  1885     fun next () =
  1886       Synchronized.guarded_access counter
  1887         (fn i =>
  1888           let val j = i + 1
  1889           in SOME (j, j) end);
  1890   end;
  1891 \<close>
  1892 
  1893 ML \<open>
  1894   val a = next ();
  1895   val b = next ();
  1896   @{assert} (a <> b);
  1897 \<close>
  1898 
  1899 text \<open>\medskip See @{file "~~/src/Pure/Concurrent/mailbox.ML"} how
  1900   to implement a mailbox as synchronized variable over a purely
  1901   functional list.\<close>
  1902 
  1903 
  1904 section \<open>Managed evaluation\<close>
  1905 
  1906 text \<open>Execution of Standard ML follows the model of strict
  1907   functional evaluation with optional exceptions.  Evaluation happens
  1908   whenever some function is applied to (sufficiently many)
  1909   arguments. The result is either an explicit value or an implicit
  1910   exception.
  1911 
  1912   \emph{Managed evaluation} in Isabelle/ML organizes expressions and
  1913   results to control certain physical side-conditions, to say more
  1914   specifically when and how evaluation happens.  For example, the
  1915   Isabelle/ML library supports lazy evaluation with memoing, parallel
  1916   evaluation via futures, asynchronous evaluation via promises,
  1917   evaluation with time limit etc.
  1918 
  1919   \medskip An \emph{unevaluated expression} is represented either as
  1920   unit abstraction @{verbatim "fn () => a"} of type
  1921   @{verbatim "unit -> 'a"} or as regular function
  1922   @{verbatim "fn a => b"} of type @{verbatim "'a -> 'b"}.  Both forms
  1923   occur routinely, and special care is required to tell them apart ---
  1924   the static type-system of SML is only of limited help here.
  1925 
  1926   The first form is more intuitive: some combinator @{text "(unit ->
  1927   'a) -> 'a"} applies the given function to @{text "()"} to initiate
  1928   the postponed evaluation process.  The second form is more flexible:
  1929   some combinator @{text "('a -> 'b) -> 'a -> 'b"} acts like a
  1930   modified form of function application; several such combinators may
  1931   be cascaded to modify a given function, before it is ultimately
  1932   applied to some argument.
  1933 
  1934   \medskip \emph{Reified results} make the disjoint sum of regular
  1935   values versions exceptional situations explicit as ML datatype:
  1936   @{text "'a result = Res of 'a | Exn of exn"}.  This is typically
  1937   used for administrative purposes, to store the overall outcome of an
  1938   evaluation process.
  1939 
  1940   \emph{Parallel exceptions} aggregate reified results, such that
  1941   multiple exceptions are digested as a collection in canonical form
  1942   that identifies exceptions according to their original occurrence.
  1943   This is particular important for parallel evaluation via futures
  1944   \secref{sec:futures}, which are organized as acyclic graph of
  1945   evaluations that depend on other evaluations: exceptions stemming
  1946   from shared sub-graphs are exposed exactly once and in the order of
  1947   their original occurrence (e.g.\ when printed at the toplevel).
  1948   Interrupt counts as neutral element here: it is treated as minimal
  1949   information about some canceled evaluation process, and is absorbed
  1950   by the presence of regular program exceptions.\<close>
  1951 
  1952 text %mlref \<open>
  1953   \begin{mldecls}
  1954   @{index_ML_type "'a Exn.result"} \\
  1955   @{index_ML Exn.capture: "('a -> 'b) -> 'a -> 'b Exn.result"} \\
  1956   @{index_ML Exn.interruptible_capture: "('a -> 'b) -> 'a -> 'b Exn.result"} \\
  1957   @{index_ML Exn.release: "'a Exn.result -> 'a"} \\
  1958   @{index_ML Par_Exn.release_all: "'a Exn.result list -> 'a list"} \\
  1959   @{index_ML Par_Exn.release_first: "'a Exn.result list -> 'a list"} \\
  1960   \end{mldecls}
  1961 
  1962   \begin{description}
  1963 
  1964   \item Type @{ML_type "'a Exn.result"} represents the disjoint sum of
  1965   ML results explicitly, with constructor @{ML Exn.Res} for regular
  1966   values and @{ML "Exn.Exn"} for exceptions.
  1967 
  1968   \item @{ML Exn.capture}~@{text "f x"} manages the evaluation of
  1969   @{text "f x"} such that exceptions are made explicit as @{ML
  1970   "Exn.Exn"}.  Note that this includes physical interrupts (see also
  1971   \secref{sec:exceptions}), so the same precautions apply to user
  1972   code: interrupts must not be absorbed accidentally!
  1973 
  1974   \item @{ML Exn.interruptible_capture} is similar to @{ML
  1975   Exn.capture}, but interrupts are immediately re-raised as required
  1976   for user code.
  1977 
  1978   \item @{ML Exn.release}~@{text "result"} releases the original
  1979   runtime result, exposing its regular value or raising the reified
  1980   exception.
  1981 
  1982   \item @{ML Par_Exn.release_all}~@{text "results"} combines results
  1983   that were produced independently (e.g.\ by parallel evaluation).  If
  1984   all results are regular values, that list is returned.  Otherwise,
  1985   the collection of all exceptions is raised, wrapped-up as collective
  1986   parallel exception.  Note that the latter prevents access to
  1987   individual exceptions by conventional @{verbatim "handle"} of ML.
  1988 
  1989   \item @{ML Par_Exn.release_first} is similar to @{ML
  1990   Par_Exn.release_all}, but only the first (meaningful) exception that has
  1991   occurred in the original evaluation process is raised again, the others are
  1992   ignored.  That single exception may get handled by conventional
  1993   means in ML.
  1994 
  1995   \end{description}
  1996 \<close>
  1997 
  1998 
  1999 subsection \<open>Parallel skeletons \label{sec:parlist}\<close>
  2000 
  2001 text \<open>
  2002   Algorithmic skeletons are combinators that operate on lists in
  2003   parallel, in the manner of well-known @{text map}, @{text exists},
  2004   @{text forall} etc.  Management of futures (\secref{sec:futures})
  2005   and their results as reified exceptions is wrapped up into simple
  2006   programming interfaces that resemble the sequential versions.
  2007 
  2008   What remains is the application-specific problem to present
  2009   expressions with suitable \emph{granularity}: each list element
  2010   corresponds to one evaluation task.  If the granularity is too
  2011   coarse, the available CPUs are not saturated.  If it is too
  2012   fine-grained, CPU cycles are wasted due to the overhead of
  2013   organizing parallel processing.  In the worst case, parallel
  2014   performance will be less than the sequential counterpart!
  2015 \<close>
  2016 
  2017 text %mlref \<open>
  2018   \begin{mldecls}
  2019   @{index_ML Par_List.map: "('a -> 'b) -> 'a list -> 'b list"} \\
  2020   @{index_ML Par_List.get_some: "('a -> 'b option) -> 'a list -> 'b option"} \\
  2021   \end{mldecls}
  2022 
  2023   \begin{description}
  2024 
  2025   \item @{ML Par_List.map}~@{text "f [x\<^sub>1, \<dots>, x\<^sub>n]"} is like @{ML
  2026   "map"}~@{text "f [x\<^sub>1, \<dots>, x\<^sub>n]"}, but the evaluation of @{text "f x\<^sub>i"}
  2027   for @{text "i = 1, \<dots>, n"} is performed in parallel.
  2028 
  2029   An exception in any @{text "f x\<^sub>i"} cancels the overall evaluation
  2030   process.  The final result is produced via @{ML
  2031   Par_Exn.release_first} as explained above, which means the first
  2032   program exception that happened to occur in the parallel evaluation
  2033   is propagated, and all other failures are ignored.
  2034 
  2035   \item @{ML Par_List.get_some}~@{text "f [x\<^sub>1, \<dots>, x\<^sub>n]"} produces some
  2036   @{text "f x\<^sub>i"} that is of the form @{text "SOME y\<^sub>i"}, if that
  2037   exists, otherwise @{text "NONE"}.  Thus it is similar to @{ML
  2038   Library.get_first}, but subject to a non-deterministic parallel
  2039   choice process.  The first successful result cancels the overall
  2040   evaluation process; other exceptions are propagated as for @{ML
  2041   Par_List.map}.
  2042 
  2043   This generic parallel choice combinator is the basis for derived
  2044   forms, such as @{ML Par_List.find_some}, @{ML Par_List.exists}, @{ML
  2045   Par_List.forall}.
  2046 
  2047   \end{description}
  2048 \<close>
  2049 
  2050 text %mlex \<open>Subsequently, the Ackermann function is evaluated in
  2051   parallel for some ranges of arguments.\<close>
  2052 
  2053 ML_val \<open>
  2054   fun ackermann 0 n = n + 1
  2055     | ackermann m 0 = ackermann (m - 1) 1
  2056     | ackermann m n = ackermann (m - 1) (ackermann m (n - 1));
  2057 
  2058   Par_List.map (ackermann 2) (500 upto 1000);
  2059   Par_List.map (ackermann 3) (5 upto 10);
  2060 \<close>
  2061 
  2062 
  2063 subsection \<open>Lazy evaluation\<close>
  2064 
  2065 text \<open>
  2066   Classic lazy evaluation works via the @{text lazy}~/ @{text force} pair of
  2067   operations: @{text lazy} to wrap an unevaluated expression, and @{text
  2068   force} to evaluate it once and store its result persistently. Later
  2069   invocations of @{text force} retrieve the stored result without another
  2070   evaluation. Isabelle/ML refines this idea to accommodate the aspects of
  2071   multi-threading, synchronous program exceptions and asynchronous interrupts.
  2072 
  2073   The first thread that invokes @{text force} on an unfinished lazy value
  2074   changes its state into a \emph{promise} of the eventual result and starts
  2075   evaluating it. Any other threads that @{text force} the same lazy value in
  2076   the meantime need to wait for it to finish, by producing a regular result or
  2077   program exception. If the evaluation attempt is interrupted, this event is
  2078   propagated to all waiting threads and the lazy value is reset to its
  2079   original state.
  2080 
  2081   This means a lazy value is completely evaluated at most once, in a
  2082   thread-safe manner. There might be multiple interrupted evaluation attempts,
  2083   and multiple receivers of intermediate interrupt events. Interrupts are
  2084   \emph{not} made persistent: later evaluation attempts start again from the
  2085   original expression.
  2086 \<close>
  2087 
  2088 text %mlref \<open>
  2089   \begin{mldecls}
  2090   @{index_ML_type "'a lazy"} \\
  2091   @{index_ML Lazy.lazy: "(unit -> 'a) -> 'a lazy"} \\
  2092   @{index_ML Lazy.value: "'a -> 'a lazy"} \\
  2093   @{index_ML Lazy.force: "'a lazy -> 'a"} \\
  2094   \end{mldecls}
  2095 
  2096   \begin{description}
  2097 
  2098   \item Type @{ML_type "'a lazy"} represents lazy values over type @{verbatim
  2099   "'a"}.
  2100 
  2101   \item @{ML Lazy.lazy}~@{text "(fn () => e)"} wraps the unevaluated
  2102   expression @{text e} as unfinished lazy value.
  2103 
  2104   \item @{ML Lazy.value}~@{text a} wraps the value @{text a} as finished lazy
  2105   value.  When forced, it returns @{text a} without any further evaluation.
  2106 
  2107   There is very low overhead for this proforma wrapping of strict values as
  2108   lazy values.
  2109 
  2110   \item @{ML Lazy.force}~@{text x} produces the result of the lazy value in a
  2111   thread-safe manner as explained above. Thus it may cause the current thread
  2112   to wait on a pending evaluation attempt by another thread.
  2113 
  2114   \end{description}
  2115 \<close>
  2116 
  2117 
  2118 subsection \<open>Futures \label{sec:futures}\<close>
  2119 
  2120 text \<open>
  2121   Futures help to organize parallel execution in a value-oriented manner, with
  2122   @{text fork}~/ @{text join} as the main pair of operations, and some further
  2123   variants; see also @{cite "Wenzel:2009" and "Wenzel:2013:ITP"}. Unlike lazy
  2124   values, futures are evaluated strictly and spontaneously on separate worker
  2125   threads. Futures may be canceled, which leads to interrupts on running
  2126   evaluation attempts, and forces structurally related futures to fail for all
  2127   time; already finished futures remain unchanged. Exceptions between related
  2128   futures are propagated as well, and turned into parallel exceptions (see
  2129   above).
  2130 
  2131   Technically, a future is a single-assignment variable together with a
  2132   \emph{task} that serves administrative purposes, notably within the
  2133   \emph{task queue} where new futures are registered for eventual evaluation
  2134   and the worker threads retrieve their work.
  2135 
  2136   The pool of worker threads is limited, in correlation with the number of
  2137   physical cores on the machine. Note that allocation of runtime resources may
  2138   be distorted either if workers yield CPU time (e.g.\ via system sleep or
  2139   wait operations), or if non-worker threads contend for significant runtime
  2140   resources independently. There is a limited number of replacement worker
  2141   threads that get activated in certain explicit wait conditions, after a
  2142   timeout.
  2143 
  2144   \medskip Each future task belongs to some \emph{task group}, which
  2145   represents the hierarchic structure of related tasks, together with the
  2146   exception status a that point. By default, the task group of a newly created
  2147   future is a new sub-group of the presently running one, but it is also
  2148   possible to indicate different group layouts under program control.
  2149 
  2150   Cancellation of futures actually refers to the corresponding task group and
  2151   all its sub-groups. Thus interrupts are propagated down the group hierarchy.
  2152   Regular program exceptions are treated likewise: failure of the evaluation
  2153   of some future task affects its own group and all sub-groups. Given a
  2154   particular task group, its \emph{group status} cumulates all relevant
  2155   exceptions according to its position within the group hierarchy. Interrupted
  2156   tasks that lack regular result information, will pick up parallel exceptions
  2157   from the cumulative group status.
  2158 
  2159   \medskip A \emph{passive future} or \emph{promise} is a future with slightly
  2160   different evaluation policies: there is only a single-assignment variable
  2161   and some expression to evaluate for the \emph{failed} case (e.g.\ to clean
  2162   up resources when canceled). A regular result is produced by external means,
  2163   using a separate \emph{fulfill} operation.
  2164 
  2165   Promises are managed in the same task queue, so regular futures may depend
  2166   on them. This allows a form of reactive programming, where some promises are
  2167   used as minimal elements (or guards) within the future dependency graph:
  2168   when these promises are fulfilled the evaluation of subsequent futures
  2169   starts spontaneously, according to their own inter-dependencies.
  2170 \<close>
  2171 
  2172 text %mlref \<open>
  2173   \begin{mldecls}
  2174   @{index_ML_type "'a future"} \\
  2175   @{index_ML Future.fork: "(unit -> 'a) -> 'a future"} \\
  2176   @{index_ML Future.forks: "Future.params -> (unit -> 'a) list -> 'a future list"} \\
  2177   @{index_ML Future.join: "'a future -> 'a"} \\
  2178   @{index_ML Future.joins: "'a future list -> 'a list"} \\
  2179   @{index_ML Future.value: "'a -> 'a future"} \\
  2180   @{index_ML Future.map: "('a -> 'b) -> 'a future -> 'b future"} \\
  2181   @{index_ML Future.cancel: "'a future -> unit"} \\
  2182   @{index_ML Future.cancel_group: "Future.group -> unit"} \\[0.5ex]
  2183   @{index_ML Future.promise: "(unit -> unit) -> 'a future"} \\
  2184   @{index_ML Future.fulfill: "'a future -> 'a -> unit"} \\
  2185   \end{mldecls}
  2186 
  2187   \begin{description}
  2188 
  2189   \item Type @{ML_type "'a future"} represents future values over type
  2190   @{verbatim "'a"}.
  2191 
  2192   \item @{ML Future.fork}~@{text "(fn () => e)"} registers the unevaluated
  2193   expression @{text e} as unfinished future value, to be evaluated eventually
  2194   on the parallel worker-thread farm. This is a shorthand for @{ML
  2195   Future.forks} below, with default parameters and a single expression.
  2196 
  2197   \item @{ML Future.forks}~@{text "params exprs"} is the general interface to
  2198   fork several futures simultaneously. The @{text params} consist of the
  2199   following fields:
  2200 
  2201   \begin{itemize}
  2202 
  2203   \item @{text "name : string"} (default @{ML "\"\""}) specifies a common name
  2204   for the tasks of the forked futures, which serves diagnostic purposes.
  2205 
  2206   \item @{text "group : Future.group option"} (default @{ML NONE}) specifies
  2207   an optional task group for the forked futures. @{ML NONE} means that a new
  2208   sub-group of the current worker-thread task context is created. If this is
  2209   not a worker thread, the group will be a new root in the group hierarchy.
  2210 
  2211   \item @{text "deps : Future.task list"} (default @{ML "[]"}) specifies
  2212   dependencies on other future tasks, i.e.\ the adjacency relation in the
  2213   global task queue. Dependencies on already finished tasks are ignored.
  2214 
  2215   \item @{text "pri : int"} (default @{ML 0}) specifies a priority within the
  2216   task queue.
  2217 
  2218   Typically there is only little deviation from the default priority @{ML 0}.
  2219   As a rule of thumb, @{ML "~1"} means ``low priority" and @{ML 1} means
  2220   ``high priority''.
  2221 
  2222   Note that the task priority only affects the position in the queue, not the
  2223   thread priority. When a worker thread picks up a task for processing, it
  2224   runs with the normal thread priority to the end (or until canceled). Higher
  2225   priority tasks that are queued later need to wait until this (or another)
  2226   worker thread becomes free again.
  2227 
  2228   \item @{text "interrupts : bool"} (default @{ML true}) tells whether the
  2229   worker thread that processes the corresponding task is initially put into
  2230   interruptible state. This state may change again while running, by modifying
  2231   the thread attributes.
  2232 
  2233   With interrupts disabled, a running future task cannot be canceled.  It is
  2234   the responsibility of the programmer that this special state is retained
  2235   only briefly.
  2236 
  2237   \end{itemize}
  2238 
  2239   \item @{ML Future.join}~@{text x} retrieves the value of an already finished
  2240   future, which may lead to an exception, according to the result of its
  2241   previous evaluation.
  2242 
  2243   For an unfinished future there are several cases depending on the role of
  2244   the current thread and the status of the future. A non-worker thread waits
  2245   passively until the future is eventually evaluated. A worker thread
  2246   temporarily changes its task context and takes over the responsibility to
  2247   evaluate the future expression on the spot. The latter is done in a
  2248   thread-safe manner: other threads that intend to join the same future need
  2249   to wait until the ongoing evaluation is finished.
  2250 
  2251   Note that excessive use of dynamic dependencies of futures by adhoc joining
  2252   may lead to bad utilization of CPU cores, due to threads waiting on other
  2253   threads to finish required futures. The future task farm has a limited
  2254   amount of replacement threads that continue working on unrelated tasks after
  2255   some timeout.
  2256 
  2257   Whenever possible, static dependencies of futures should be specified
  2258   explicitly when forked (see @{text deps} above). Thus the evaluation can
  2259   work from the bottom up, without join conflicts and wait states.
  2260 
  2261   \item @{ML Future.joins}~@{text xs} joins the given list of futures
  2262   simultaneously, which is more efficient than @{ML "map Future.join"}~@{text
  2263   xs}.
  2264 
  2265   Based on the dependency graph of tasks, the current thread takes over the
  2266   responsibility to evaluate future expressions that are required for the main
  2267   result, working from the bottom up. Waiting on future results that are
  2268   presently evaluated on other threads only happens as last resort, when no
  2269   other unfinished futures are left over.
  2270 
  2271   \item @{ML Future.value}~@{text a} wraps the value @{text a} as finished
  2272   future value, bypassing the worker-thread farm. When joined, it returns
  2273   @{text a} without any further evaluation.
  2274 
  2275   There is very low overhead for this proforma wrapping of strict values as
  2276   futures.
  2277 
  2278   \item @{ML Future.map}~@{text "f x"} is a fast-path implementation of @{ML
  2279   Future.fork}~@{text "(fn () => f ("}@{ML Future.join}~@{text "x))"}, which
  2280   avoids the full overhead of the task queue and worker-thread farm as far as
  2281   possible. The function @{text f} is supposed to be some trivial
  2282   post-processing or projection of the future result.
  2283 
  2284   \item @{ML Future.cancel}~@{text "x"} cancels the task group of the given
  2285   future, using @{ML Future.cancel_group} below.
  2286 
  2287   \item @{ML Future.cancel_group}~@{text "group"} cancels all tasks of the
  2288   given task group for all time. Threads that are presently processing a task
  2289   of the given group are interrupted: it may take some time until they are
  2290   actually terminated. Tasks that are queued but not yet processed are
  2291   dequeued and forced into interrupted state. Since the task group is itself
  2292   invalidated, any further attempt to fork a future that belongs to it will
  2293   yield a canceled result as well.
  2294 
  2295   \item @{ML Future.promise}~@{text abort} registers a passive future with the
  2296   given @{text abort} operation: it is invoked when the future task group is
  2297   canceled.
  2298 
  2299   \item @{ML Future.fulfill}~@{text "x a"} finishes the passive future @{text
  2300   x} by the given value @{text a}. If the promise has already been canceled,
  2301   the attempt to fulfill it causes an exception.
  2302 
  2303   \end{description}
  2304 \<close>
  2305 
  2306 end