src/Doc/Implementation/ML.thy
author wenzelm, Wed Mar 25 11:39:52 2015 +0100, changeset 59809 87641097d0f3 (tuned signature)
(*:wrap=hard:maxLineLen=78:*)

theory "ML"
imports Base
begin

chapter \<open>Isabelle/ML\<close>

text \<open>Isabelle/ML is best understood as a certain culture based on
  Standard ML.  Thus it is not a new programming language, but a
  certain way to use SML at an advanced level within the Isabelle
  environment.  This covers a variety of aspects that are geared
  towards an efficient and robust platform for applications of formal
  logic with fully foundational proof construction --- according to
  the well-known \emph{LCF principle}.  There is specific
  infrastructure with library modules to address the needs of this
  difficult task.  For example, the raw parallel programming model of
  Poly/ML is presented as a considerably more abstract concept of
  \emph{futures}, which is then used to augment the inference
  kernel, Isar theory and proof interpreter, and PIDE document management.

  The main aspects of Isabelle/ML are introduced below.  These
  first-hand explanations should help to understand how proper
  Isabelle/ML is to be read and written, and to get access to the
  wealth of experience that is expressed in the source text and its
  history of changes.\footnote{See
  @{url "http://isabelle.in.tum.de/repos/isabelle"} for the full
  Mercurial history.  There are symbolic tags to refer to official
  Isabelle releases, as opposed to arbitrary \emph{tip} versions that
  merely reflect snapshots that are never really up-to-date.}\<close>



section \<open>Style and orthography\<close>

text \<open>The sources of Isabelle/Isar are optimized for
  \emph{readability} and \emph{maintainability}.  The main purpose is
  to tell an informed reader what is really going on and how things
  really work.  This is a non-trivial aim, but it is supported by a
  certain style of writing Isabelle/ML that has emerged from long
  years of system development.\footnote{See also the interesting style
  guide for OCaml
  @{url "http://caml.inria.fr/resources/doc/guides/guidelines.en.html"}
  which shares many of our means and ends.}

  The main principle behind any coding style is \emph{consistency}.
  For a single author of a small program this merely means ``choose
  your style and stick to it''.  A complex project like Isabelle, with
  long years of development and different contributors, requires more
  standardization.  A coding style that is changed every few years or
  with every new contributor is no style at all, because consistency
  is quickly lost.  Global consistency is hard to achieve, though.
  Nonetheless, one should always strive at least for local consistency
  of modules and sub-systems, without deviating from some general
  principles of how to write Isabelle/ML.

  In a sense, good coding style is like an \emph{orthography} for the
  sources: it helps to read quickly over the text and see through the
  main points, without getting distracted by accidental presentation
  of free-style code.
\<close>



subsection \<open>Header and sectioning\<close>

text \<open>Isabelle source files have a certain standardized header
  format (with precise spacing) that follows ancient traditions
  reaching back to the earliest versions of the system by Larry
  Paulson.  See @{file "~~/src/Pure/thm.ML"}, for example.

  The header includes at least @{verbatim Title} and @{verbatim
  Author} entries, followed by a prose description of the purpose of
  the module.  The latter can range from a single line to several
  paragraphs of explanations.

  The rest of the file is divided into sections, subsections,
  subsubsections, paragraphs etc.\ using a simple layout via ML
  comments as follows.

  \begin{verbatim}
  (*** section ***)

  (** subsection **)

  (* subsubsection *)

  (*short paragraph*)

  (*
    long paragraph,
    with more text
  *)
  \end{verbatim}

  As in regular typography, there is some extra space \emph{before}
  section headings that are adjacent to plain text, but not other headings
  as in the example above.

  \medskip The precise wording of the prose text given in these
  headings is chosen carefully to introduce the main theme of the
  subsequent formal ML text.
\<close>



subsection \<open>Naming conventions\<close>

text \<open>Since ML is the primary medium to express the meaning of the
  source text, naming of ML entities requires special care.

  \paragraph{Notation.}  A name consists of 1--3 \emph{words} (rarely
  4, but not more) that are separated by underscore.  There are three
  variants concerning upper or lower case letters, which are used for
  certain ML categories as follows:

  \medskip
  \begin{tabular}{lll}
  variant & example & ML categories \\\hline
  lower-case & @{ML_text foo_bar} & values, types, record fields \\
  capitalized & @{ML_text Foo_Bar} & datatype constructors, structures, functors \\
  upper-case & @{ML_text FOO_BAR} & special values, exception constructors, signatures \\
  \end{tabular}
  \medskip

  For historical reasons, many capitalized names omit underscores,
  e.g.\ old-style @{ML_text FooBar} instead of @{ML_text Foo_Bar}.
  Genuine mixed-case names are \emph{not} used, because clear division
  of words is essential for readability.\footnote{Camel-case was
  invented to work around the lack of underscore in some early
  non-ASCII character sets.  Later it became habitual in some language
  communities that are now strong in numbers.}

  A single (capital) character does not count as ``word'' in this
  respect: some Isabelle/ML names are suffixed by extra markers like
  this: @{ML_text foo_barT}.

  Name variants are produced by adding 1--3 primes, e.g.\ @{ML_text
  foo'}, @{ML_text foo''}, or @{ML_text foo'''}, but not @{ML_text
  foo''''} or more.  Decimal digits scale better to larger numbers,
  e.g.\ @{ML_text foo0}, @{ML_text foo1}, @{ML_text foo42}.

  \paragraph{Scopes.}  Apart from very basic library modules, ML
  structures are not ``opened'', but names are referenced with
  explicit qualification, as in @{ML Syntax.string_of_term} for
  example.  When devising names for structures and their components it
  is important to aim at eye-catching compositions of both parts, because
  this is how they are seen in the sources and documentation.  For the
  same reasons, aliases of well-known library functions should be
  avoided.

  Local names of function abstraction or case/let bindings are
  typically shorter, sometimes using only rudiments of ``words'',
  while still avoiding cryptic shorthands.  An auxiliary function
  called @{ML_text helper}, @{ML_text aux}, or @{ML_text f} is
  considered bad style.


  Example:

  \begin{verbatim}
  (* RIGHT *)

  fun print_foo ctxt foo =
    let
      fun print t = ... Syntax.string_of_term ctxt t ...
    in ... end;


  (* RIGHT *)

  fun print_foo ctxt foo =
    let
      val string_of_term = Syntax.string_of_term ctxt;
      fun print t = ... string_of_term t ...
    in ... end;


  (* WRONG *)

  val string_of_term = Syntax.string_of_term;

  fun print_foo ctxt foo =
    let
      fun aux t = ... string_of_term ctxt t ...
    in ... end;
  \end{verbatim}

   186   \paragraph{Specific conventions.} Here are some specific name forms

   187   that occur frequently in the sources.

   188

   189   \begin{itemize}

   190

   191   \item A function that maps @{ML_text foo} to @{ML_text bar} is

   192   called @{ML_text foo_to_bar} or @{ML_text bar_of_foo} (never

   193   @{ML_text foo2bar}, nor @{ML_text bar_from_foo}, nor @{ML_text

   194   bar_for_foo}, nor @{ML_text bar4foo}).

   195

   196   \item The name component @{ML_text legacy} means that the operation

   197   is about to be discontinued soon.

   198

   199   \item The name component @{ML_text global} means that this works

   200   with the background theory instead of the regular local context

   201   (\secref{sec:context}), sometimes for historical reasons, sometimes

   202   due a genuine lack of locality of the concept involved, sometimes as

   203   a fall-back for the lack of a proper context in the application

   204   code.  Whenever there is a non-global variant available, the

   205   application should be migrated to use it with a proper local

   206   context.

   207

   208   \item Variables of the main context types of the Isabelle/Isar

   209   framework (\secref{sec:context} and \chref{ch:local-theory}) have

   210   firm naming conventions as follows:

   211

   212   \begin{itemize}

   213

   214   \item theories are called @{ML_text thy}, rarely @{ML_text theory}

   215   (never @{ML_text thry})

   216

   217   \item proof contexts are called @{ML_text ctxt}, rarely @{ML_text

   218   context} (never @{ML_text ctx})

   219

   220   \item generic contexts are called @{ML_text context}

   221

   222   \item local theories are called @{ML_text lthy}, except for local

   223   theories that are treated as proof context (which is a semantic

   224   super-type)

   225

   226   \end{itemize}

   227

   228   Variations with primed or decimal numbers are always possible, as

   229   well as semantic prefixes like @{ML_text foo_thy} or @{ML_text

   230   bar_ctxt}, but the base conventions above need to be preserved.

   231   This allows to emphasize their data flow via plain regular

   232   expressions in the text editor.

   233

   234   \item The main logical entities (\secref{ch:logic}) have established

   235   naming convention as follows:

   236

   237   \begin{itemize}

   238

   239   \item sorts are called @{ML_text S}

   240

   241   \item types are called @{ML_text T}, @{ML_text U}, or @{ML_text

   242   ty} (never @{ML_text t})

   243

   244   \item terms are called @{ML_text t}, @{ML_text u}, or @{ML_text

   245   tm} (never @{ML_text trm})

   246

   247   \item certified types are called @{ML_text cT}, rarely @{ML_text

   248   T}, with variants as for types

   249

   250   \item certified terms are called @{ML_text ct}, rarely @{ML_text

   251   t}, with variants as for terms (never @{ML_text ctrm})

   252

   253   \item theorems are called @{ML_text th}, or @{ML_text thm}

   254

   255   \end{itemize}

   256

   257   Proper semantic names override these conventions completely.  For

   258   example, the left-hand side of an equation (as a term) can be called

   259   @{ML_text lhs} (not @{ML_text lhs_tm}).  Or a term that is known

   260   to be a variable can be called @{ML_text v} or @{ML_text x}.

   261

   262   \item Tactics (\secref{sec:tactics}) are sufficiently important to

   263   have specific naming conventions.  The name of a basic tactic

   264   definition always has a @{ML_text "_tac"} suffix, the subgoal index

   265   (if applicable) is always called @{ML_text i}, and the goal state

   266   (if made explicit) is usually called @{ML_text st} instead of the

   267   somewhat misleading @{ML_text thm}.  Any other arguments are given

   268   before the latter two, and the general context is given first.

   269   Example:

   270

   271   \begin{verbatim}

   272   fun my_tac ctxt arg1 arg2 i st = ...

   273   \end{verbatim}

   274

   275   Note that the goal state @{ML_text st} above is rarely made

   276   explicit, if tactic combinators (tacticals) are used as usual.

   277

   278   A tactic that requires a proof context needs to make that explicit as seen

   279   in the @{verbatim ctxt} argument above. Do not refer to the background

   280   theory of @{verbatim st} -- it is not a proper context, but merely a formal

   281   certificate.

   282

   283   \end{itemize}

   284 \<close>
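
text \<open>The following hypothetical signature illustrates several of these
  conventions in combination (an invented sketch for illustration, not
  actual Isabelle library code):

  \begin{verbatim}
  (*print theorem th relative to proof context ctxt*)
  val print_thm: Proof.context -> thm -> unit

  (*certify term t relative to background theory thy*)
  val global_cterm_of: theory -> term -> cterm
  \end{verbatim}
\<close>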



subsection \<open>General source layout\<close>

text \<open>
  The general Isabelle/ML source layout imitates regular type-setting
  conventions, augmented by the requirements for deeply nested expressions
  that are commonplace in functional programming.

  \paragraph{Line length} is limited to 80 characters according to ancient
  standards, but we allow as much as 100 characters (not
  more).\footnote{Readability requires keeping the beginning of a line
  in view while watching its end.  Modern wide-screen displays do not
  change the way the human brain works.  Sources also need to be
  printable on plain paper with reasonable font-size.} The extra 20
  characters acknowledge the space requirements due to qualified
  library references in Isabelle/ML.

  \paragraph{White-space} is used to emphasize the structure of
  expressions, following mostly standard conventions for mathematical
  typesetting, as can be seen in plain {\TeX} or {\LaTeX}.  This
  defines positioning of spaces for parentheses, punctuation, and
  infixes as illustrated here:

  \begin{verbatim}
  val x = y + z * (a + b);
  val pair = (a, b);
  val record = {foo = 1, bar = 2};
  \end{verbatim}

  Lines are normally broken \emph{after} an infix operator or
  punctuation character.  For example:

  \begin{verbatim}
  val x =
    a +
    b +
    c;

  val tuple =
   (a,
    b,
    c);
  \end{verbatim}

  Some special infixes (e.g.\ @{ML_text "|>"}) work better at the
  start of the line, but punctuation is always at the end.
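
  For example, a chain of such infixes may be broken like this (an
  illustrative sketch with invented function names):

  \begin{verbatim}
  val result =
    input
    |> parse
    |> transform
    |> print;
  \end{verbatim}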


  Function application follows the tradition of @{text "\<lambda>"}-calculus,
  not informal mathematics.  For example: @{ML_text "f a b"} for a
  curried function, or @{ML_text "g (a, b)"} for a tupled function.
  Note that the space between @{ML_text g} and the pair @{ML_text
  "(a, b)"} follows the important principle of
  \emph{compositionality}: the layout of @{ML_text "g p"} does not
  change when @{ML_text "p"} is refined to the concrete pair
  @{ML_text "(a, b)"}.

  \paragraph{Indentation} uses plain spaces, never hard
  tabulators.\footnote{Tabulators were invented to move the carriage
  of a type-writer to certain predefined positions.  In software they
  could be used as a primitive run-length compression of consecutive
  spaces, but the precise result would depend on non-standardized
  text editor configuration.}

  Each level of nesting is indented by 2 spaces, sometimes 1, very
  rarely 4, never 8 or any other odd number.

   351

  Indentation follows a simple logical format that only depends on the
  nesting depth, not the accidental length of the text that initiates
  a level of nesting.  Example:

  \begin{verbatim}
  (* RIGHT *)

  if b then
    expr1_part1
    expr1_part2
  else
    expr2_part1
    expr2_part2


  (* WRONG *)

  if b then expr1_part1
            expr1_part2
  else expr2_part1
       expr2_part2
  \end{verbatim}

  The second form has many problems: it assumes a fixed-width font
  when viewing the sources, it uses more space on the line and thus
  makes it hard to observe its strict length limit (working against
  \emph{readability}), it requires extra editing to adapt the layout
  to changes of the initial text (working against
  \emph{maintainability}) etc.

  \medskip For similar reasons, any kind of two-dimensional or tabular
  layouts, ASCII-art with lines or boxes of asterisks etc.\ should be
  avoided.

  \paragraph{Complex expressions} that consist of multi-clausal
  function definitions, @{ML_text handle}, @{ML_text case},
  @{ML_text let} (and combinations) require special attention.  The
  syntax of Standard ML is quite ambitious and admits a lot of
  variance that can distort the meaning of the text.

  Multiple clauses of @{ML_text fun}, @{ML_text fn}, @{ML_text handle},
  @{ML_text case} get extra indentation to indicate the nesting
  clearly.  Example:

  \begin{verbatim}
  (* RIGHT *)

  fun foo p1 =
        expr1
    | foo p2 =
        expr2


  (* WRONG *)

  fun foo p1 =
    expr1
    | foo p2 =
    expr2
  \end{verbatim}

  Body expressions consisting of @{ML_text case} or @{ML_text let}
  require care to maintain compositionality, to prevent loss of
  logical indentation where it is especially important to see the
  structure of the text.  Example:

  \begin{verbatim}
  (* RIGHT *)

  fun foo p1 =
        (case e of
          q1 => ...
        | q2 => ...)
    | foo p2 =
        let
          ...
        in
          ...
        end


  (* WRONG *)

  fun foo p1 = case e of
      q1 => ...
    | q2 => ...
    | foo p2 =
    let
      ...
    in
      ...
    end
  \end{verbatim}

  Extra parentheses around @{ML_text case} expressions are optional,
  but help to analyse the nesting based on character matching in the
  text editor.

  \medskip There are two main exceptions to the overall principle of
  compositionality in the layout of complex expressions.

  \begin{enumerate}

  \item @{ML_text "if"} expressions are iterated as if ML had multi-branch
  conditionals, e.g.

  \begin{verbatim}
  (* RIGHT *)

  if b1 then e1
  else if b2 then e2
  else e3
  \end{verbatim}

  \item @{ML_text fn} abstractions are often laid out as if they
  would lack any structure by themselves.  This traditional form is
  motivated by the possibility to shift function arguments back and
  forth wrt.\ additional combinators.  Example:

  \begin{verbatim}
  (* RIGHT *)

  fun foo x y = fold (fn z =>
    expr)
  \end{verbatim}

  Here the visual appearance is that of three arguments @{ML_text x},
  @{ML_text y}, @{ML_text z} in a row.

  \end{enumerate}


  Such weakly structured layout should be used with great care.  Here
  are some counter-examples involving @{ML_text let} expressions:

  \begin{verbatim}
  (* WRONG *)

  fun foo x = let
      val y = ...
    in ... end


  (* WRONG *)

  fun foo x = let
    val y = ...
  in ... end


  (* WRONG *)

  fun foo x =
  let
    val y = ...
  in ... end


  (* WRONG *)

  fun foo x =
    let
      val y = ...
    in
      ... end
  \end{verbatim}

  \medskip In general the source layout is meant to emphasize the
  structure of complex language expressions, not to pretend that SML
  had a completely different syntax (say that of Haskell, Scala, Java).
\<close>



section \<open>ML embedded into Isabelle/Isar\<close>

text \<open>ML and Isar are intertwined via an open-ended bootstrap
  process that provides more and more programming facilities and
  logical content in an alternating manner.  Bootstrapping starts from
  the raw environment of existing implementations of Standard ML
  (mainly Poly/ML, but also SML/NJ).

  Isabelle/Pure marks the point where the raw ML toplevel is superseded by
  Isabelle/ML within the Isar theory and proof language, with a uniform
  context for arbitrary ML values (see also \secref{sec:context}). This formal
  environment holds ML compiler bindings, logical entities, and many other
  things.

  Object-logics like Isabelle/HOL are built within the Isabelle/ML/Isar
  environment by introducing suitable theories with associated ML modules,
  either inlined within @{verbatim ".thy"} files, or as separate @{verbatim
  ".ML"} files that are loaded from some theory. Thus Isabelle/HOL is defined
  as a regular user-space application within the Isabelle framework. Further
  add-on tools can be implemented in ML within the Isar context in the same
  manner: ML is part of the standard repertoire of Isabelle, and there is no
  distinction between ``users'' and ``developers'' in this respect.
\<close>



subsection \<open>Isar ML commands\<close>

text \<open>
  The primary Isar source language provides facilities to open a ``window''
  to the underlying ML compiler. Especially see the Isar commands @{command_ref
  "ML_file"} and @{command_ref "ML"}: both work the same way, but the source
  text is provided differently, via a file vs.\ inlined, respectively. Apart
  from embedding ML into the main theory definition like that, there are many
  more commands that refer to ML source, such as @{command_ref setup} or
  @{command_ref declaration}. Even more fine-grained embedding of ML into Isar
  is encountered in the proof method @{method_ref tactic}, which refines the
  pending goal state via a given expression of type @{ML_type tactic}.
\<close>


text %mlex \<open>The following artificial example demonstrates some ML
  toplevel declarations within the implicit Isar theory context.  This
  is regular functional programming without referring to logical
  entities yet.
\<close>

ML \<open>
  fun factorial 0 = 1
    | factorial n = n * factorial (n - 1)
\<close>

text \<open>Here the ML environment is already managed by Isabelle, i.e.\
  the @{ML factorial} function is not yet accessible in the preceding
  paragraph, nor in a different theory that is independent from the
  current one in the import hierarchy.

  Removing the above ML declaration from the source text will remove any trace
  of this definition, as expected. The Isabelle/ML toplevel environment is
  managed in a \emph{stateless} way: in contrast to the raw ML toplevel, there
  are no global side-effects involved here.\footnote{Such a stateless
  compilation environment is also a prerequisite for robust parallel
  compilation within independent nodes of the implicit theory development
  graph.}

  \medskip The next example shows how to embed ML into Isar proofs, using
  @{command_ref "ML_prf"} instead of @{command_ref "ML"}. As illustrated
  below, the effect on the ML environment is local to the whole proof body,
  but ignoring the block structure. \<close>

notepad
begin
  ML_prf %"ML" \<open>val a = 1\<close>
  {
    ML_prf %"ML" \<open>val b = a + 1\<close>
  } -- \<open>Isar block structure ignored by ML environment\<close>
  ML_prf %"ML" \<open>val c = b + 1\<close>
end


text \<open>By side-stepping the normal scoping rules for Isar proof
  blocks, embedded ML code can refer to the different contexts and
  manipulate corresponding entities, e.g.\ export a fact from a block
  context.

  \medskip Two further ML commands are useful in certain situations:
  @{command_ref ML_val} and @{command_ref ML_command} are \emph{diagnostic} in
  the sense that there is no effect on the underlying environment, and can
  thus be used anywhere. The examples below produce long strings of digits by
  invoking @{ML factorial}: @{command ML_val} takes care of printing the ML
  toplevel result, but @{command ML_command} is silent so we produce an
  explicit output message.
\<close>

ML_val \<open>factorial 100\<close>
ML_command \<open>writeln (string_of_int (factorial 100))\<close>

notepad
begin
  ML_val \<open>factorial 100\<close>
  ML_command \<open>writeln (string_of_int (factorial 100))\<close>
end



subsection \<open>Compile-time context\<close>

text \<open>Whenever the ML compiler is invoked within Isabelle/Isar, the
  formal context is passed as a thread-local reference variable.  Thus
  ML code may access the theory context during compilation, by reading
  or writing the (local) theory under construction.  Note that such
  direct access to the compile-time context is rare.  In practice it
  is typically done via some derived ML functions instead.
\<close>

text %mlref \<open>
  \begin{mldecls}
  @{index_ML ML_Context.the_generic_context: "unit -> Context.generic"} \\
  @{index_ML "Context.>>": "(Context.generic -> Context.generic) -> unit"} \\
  @{index_ML ML_Thms.bind_thms: "string * thm list -> unit"} \\
  @{index_ML ML_Thms.bind_thm: "string * thm -> unit"} \\
  \end{mldecls}

  \begin{description}

  \item @{ML "ML_Context.the_generic_context ()"} refers to the theory
  context of the ML toplevel --- at compile time.  ML code needs to
  take care to refer to @{ML "ML_Context.the_generic_context ()"}
  correctly.  Recall that evaluation of a function body is delayed
  until actual run-time.

  \item @{ML "Context.>>"}~@{text f} applies context transformation
  @{text f} to the implicit context of the ML toplevel.

  \item @{ML ML_Thms.bind_thms}~@{text "(name, thms)"} stores a list of
  theorems produced in ML both in the (global) theory context and the
  ML toplevel, associating it with the provided name.

  \item @{ML ML_Thms.bind_thm} is similar to @{ML ML_Thms.bind_thms} but
  refers to a singleton fact.

  \end{description}

  It is important to note that the above functions are really
  restricted to the compile time, even though the ML compiler is
  invoked at run-time.  The majority of ML code either uses static
  antiquotations (\secref{sec:ML-antiq}) or refers to the theory or
  proof context at run-time, by explicit functional abstraction.
\<close>
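
text %mlex \<open>For example, @{ML ML_Thms.bind_thm} can be used at compile
  time as follows (a minimal sketch; the name @{text "my_reflexive"} is
  invented for this illustration):\<close>

ML \<open>
  (*store the Pure reflexivity rule both in the theory context and
    the ML toplevel environment, under an ad-hoc name*)
  ML_Thms.bind_thm ("my_reflexive", @{thm reflexive});
\<close>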



subsection \<open>Antiquotations \label{sec:ML-antiq}\<close>

text \<open>A very important consequence of embedding ML into Isar is the
  concept of \emph{ML antiquotation}.  The standard token language of
  ML is augmented by special syntactic entities of the following form:

  @{rail \<open>
  @{syntax_def antiquote}: '@{' nameref args '}'
  \<close>}

  Here @{syntax nameref} and @{syntax args} are outer syntax categories, as
  defined in @{cite "isabelle-isar-ref"}.

  \medskip A regular antiquotation @{text "@{name args}"} processes
  its arguments by the usual means of the Isar source language, and
  produces corresponding ML source text, either as literal
  \emph{inline} text (e.g.\ @{text "@{term t}"}) or abstract
  \emph{value} (e.g.\ @{text "@{thm th}"}).  This pre-compilation
  scheme makes it possible to refer to formal entities in a robust
  manner, with proper static scoping and with some degree of logical
  checking of small portions of the code.
\<close>
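
text %mlex \<open>For instance, a term antiquotation is parsed and checked
  statically and turned into a corresponding ML value (a minimal
  illustration with arbitrary free variables @{text f} and @{text x}):\<close>

ML \<open>
  (*statically-checked values of ML type term and typ*)
  val t = @{term "f x"};
  val T = @{typ prop};
\<close>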



subsection \<open>Printing ML values\<close>

text \<open>The ML compiler knows about the structure of values according
  to their static type, and can print them in the manner of its
  toplevel, although the details are non-portable.  The
  antiquotations @{ML_antiquotation_def "make_string"} and
  @{ML_antiquotation_def "print"} provide a quasi-portable way to
  refer to this potential capability of the underlying ML system in
  generic Isabelle/ML sources.

  This is occasionally useful for diagnostic or demonstration
  purposes.  Note that production-quality tools require proper
  user-level error messages, avoiding raw ML values in the output.\<close>


text %mlantiq \<open>
  \begin{matharray}{rcl}
  @{ML_antiquotation_def "make_string"} & : & @{text ML_antiquotation} \\
  @{ML_antiquotation_def "print"} & : & @{text ML_antiquotation} \\
  \end{matharray}

  @{rail \<open>
  @@{ML_antiquotation make_string}
  ;
  @@{ML_antiquotation print} @{syntax name}?
  \<close>}

  \begin{description}

  \item @{text "@{make_string}"} inlines a function to print arbitrary values
  similar to the ML toplevel. The result is compiler dependent and may fall
  back on ``?'' in certain situations. The value of configuration option
  @{attribute_ref ML_print_depth} determines further details of output.

  \item @{text "@{print f}"} uses the ML function @{text "f: string ->
  unit"} to output the result of @{text "@{make_string}"} above,
  together with the source position of the antiquotation.  The default
  output function is @{ML writeln}.

  \end{description}
\<close>


text %mlex \<open>The following artificial examples show how to produce
  adhoc output of ML values for debugging purposes.\<close>

ML \<open>
  val x = 42;
  val y = true;

  writeln (@{make_string} {x = x, y = y});

  @{print} {x = x, y = y};
  @{print tracing} {x = x, y = y};
\<close>

   748

   749

   750 section \<open>Canonical argument order \label{sec:canonical-argument-order}\<close>

   751

   752 text \<open>Standard ML is a language in the tradition of @{text

   753   "\<lambda>"}-calculus and \emph{higher-order functional programming},

   754   similar to OCaml, Haskell, or Isabelle/Pure and HOL as logical

   755   languages.  Getting acquainted with the native style of representing

   756   functions in that setting can save a lot of extra boiler-plate:

   757   redundant shuffling of arguments, auxiliary abstractions etc.

   758

   759   Functions are usually \emph{curried}: the idea of turning arguments

   760   of type @{text "\<tau>\<^sub>i"} (for @{text "i \<in> {1, \<dots>, n}"}) into a result of

   761   type @{text "\<tau>"} is represented by the iterated function space

   762   @{text "\<tau>\<^sub>1 \<rightarrow> \<dots> \<rightarrow> \<tau>\<^sub>n \<rightarrow> \<tau>"}.  This is isomorphic to the well-known

   763   encoding via tuples @{text "\<tau>\<^sub>1 \<times> \<dots> \<times> \<tau>\<^sub>n \<rightarrow> \<tau>"}, but the curried

   764   version fits more smoothly into the basic calculus.\footnote{The

   765   difference is even more significant in HOL, because the redundant

   766   tuple structure needs to be accommodated by extraneous proof steps.}

   767

   768   Currying gives some flexibility due to \emph{partial application}.  A

   769   function @{text "f: \<tau>\<^sub>1 \<rightarrow> \<tau>\<^sub>2 \<rightarrow> \<tau>"} can be applied to @{text "x: \<tau>\<^sub>1"}

   770   and the remaining @{text "(f x): \<tau>\<^sub>2 \<rightarrow> \<tau>"} passed to another function

   771   etc.  How well this works in practice depends on the order of

   772   arguments.  In the worst case, arguments are arranged erratically,

   773   and using a function in a certain situation always requires some

   774   glue code.  Thus we would get exponentially many opportunities to

   775   decorate the code with meaningless permutations of arguments.

   776

   777   This can be avoided by \emph{canonical argument order}, which

   778   observes certain standard patterns and minimizes adhoc permutations

   779   in their application.  In Isabelle/ML, large portions of text can be

   780   written without auxiliary operations like @{text "swap: \<alpha> \<times> \<beta> \<rightarrow> \<beta> \<times>

   781   \<alpha>"} or @{text "C: (\<alpha> \<rightarrow> \<beta> \<rightarrow> \<gamma>) \<rightarrow> (\<beta> \<rightarrow> \<alpha> \<rightarrow> \<gamma>)"} (the latter is not

   782   present in the Isabelle/ML library).

   783

   784   \medskip The main idea is that arguments that vary less are moved

   785   further to the left than those that vary more.  Two particularly

   786   important categories of functions are \emph{selectors} and

   787   \emph{updates}.

   788

   789   The subsequent scheme is based on a hypothetical set-like container

   790   of type @{text "\<beta>"} that manages elements of type @{text "\<alpha>"}.  Both

   791   the names and types of the associated operations are canonical for

   792   Isabelle/ML.

   793

   794   \begin{center}

   795   \begin{tabular}{ll}

   796   kind & canonical name and type \\\hline

   797   selector & @{text "member: \<beta> \<rightarrow> \<alpha> \<rightarrow> bool"} \\

   798   update & @{text "insert: \<alpha> \<rightarrow> \<beta> \<rightarrow> \<beta>"} \\

   799   \end{tabular}

   800   \end{center}

   801

   802   Given a container @{text "B: \<beta>"}, the partially applied @{text

   803   "member B"} is a predicate over elements @{text "\<alpha> \<rightarrow> bool"}, and

   804   thus represents the intended denotation directly.  It is customary

   805   to pass the abstract predicate to further operations, not the

   806   concrete container.  The argument order makes it easy to use other

   807   combinators: @{text "forall (member B) list"} will check a list of

   808   elements for membership in @{text "B"} etc. Often the explicit

   809   @{text "list"} is pointless and can be contracted to @{text "forall

   810   (member B)"} to directly obtain a predicate again.

   811

   812   In contrast, an update operation varies the container, so it moves

   813   to the right: @{text "insert a"} is a function @{text "\<beta> \<rightarrow> \<beta>"} to

   814   insert a value @{text "a"}.  These can be composed naturally as

   815   @{text "insert c \<circ> insert b \<circ> insert a"}.  The slightly awkward

   816   inversion of the composition order is due to conventional

   817   mathematical notation, which can be easily amended as explained

   818   below.

   819 \<close>
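
text \<open>The following contrived example (not from the Isabelle sources)
  illustrates this scheme with plain lists as set-like containers, using the
  canonical @{ML member} and @{ML insert} operations of the Isabelle library
  with explicit equality @{ML "op ="}.\<close>

ML \<open>
  val B = [1, 2, 3];

  (*the partially applied "member B" is passed directly to a combinator*)
  @{assert} (forall (member (op =) B) [2, 3]);

  (*updates compose; inserting an existing element is a no-op*)
  val B' = (insert (op =) 0 o insert (op =) 2) B;
  @{assert} (length B' = 4);
\<close>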

   820

   821

   822 subsection \<open>Forward application and composition\<close>

   823

   824 text \<open>Regular function application and infix notation works best for

   825   relatively deeply structured expressions, e.g.\ @{text "h (f x y + g

   826   z)"}.  The important special case of \emph{linear transformation}

   827   applies a cascade of functions @{text "f\<^sub>n (\<dots> (f\<^sub>1 x))"}.  This

   828   becomes hard to read and maintain if the functions are themselves

   829   given as complex expressions.  The notation can be significantly

   830   improved by introducing \emph{forward} versions of application and

   831   composition as follows:

   832

   833   \medskip

   834   \begin{tabular}{lll}

   835   @{text "x |> f"} & @{text "\<equiv>"} & @{text "f x"} \\

   836   @{text "(f #> g) x"} & @{text "\<equiv>"} & @{text "x |> f |> g"} \\

   837   \end{tabular}

   838   \medskip

   839

   840   This makes it convenient to write @{text "x |> f\<^sub>1 |> \<dots> |> f\<^sub>n"} or

   841   @{text "f\<^sub>1 #> \<dots> #> f\<^sub>n"} for its functional abstraction over @{text

   842   "x"}.

   843

   844   \medskip There is an additional set of combinators to accommodate

   845   multiple results (via pairs) that are passed on as multiple

   846   arguments (via currying).

   847

   848   \medskip

   849   \begin{tabular}{lll}

   850   @{text "(x, y) |-> f"} & @{text "\<equiv>"} & @{text "f x y"} \\

   851   @{text "(f #-> g) x"} & @{text "\<equiv>"} & @{text "x |> f |-> g"} \\

   852   \end{tabular}

   853   \medskip

   854 \<close>

   855

   856 text %mlref \<open>

   857   \begin{mldecls}

   858   @{index_ML_op "|> ": "'a * ('a -> 'b) -> 'b"} \\

   859   @{index_ML_op "|-> ": "('c * 'a) * ('c -> 'a -> 'b) -> 'b"} \\

   860   @{index_ML_op "#> ": "('a -> 'b) * ('b -> 'c) -> 'a -> 'c"} \\

   861   @{index_ML_op "#-> ": "('a -> 'c * 'b) * ('c -> 'b -> 'd) -> 'a -> 'd"} \\

   862   \end{mldecls}

   863 \<close>
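
text \<open>These combinators are illustrated by the following artificial
  examples (not from the Isabelle sources):\<close>

ML \<open>
  (*forward application: add 2, then multiply by 3*)
  @{assert} ((1 |> (fn x => x + 2) |> (fn x => x * 3)) = 9);

  (*a pair is passed on as two curried arguments*)
  @{assert} (((1, 2) |-> (fn x => fn y => x + y)) = 3);

  (*(f #-> g) x is x |> f |-> g*)
  val f = (fn x => (x, x + 1)) #-> (fn x => fn y => x * y);
  @{assert} (f 4 = 20);
\<close>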

   864

   865

   866 subsection \<open>Canonical iteration\<close>

   867

   868 text \<open>As explained above, a function @{text "f: \<alpha> \<rightarrow> \<beta> \<rightarrow> \<beta>"} can be

   869   understood as update on a configuration of type @{text "\<beta>"},

   870   parameterized by an argument of type @{text "\<alpha>"}.  Given @{text "a: \<alpha>"}

   871   the partial application @{text "(f a): \<beta> \<rightarrow> \<beta>"} operates

   872   homogeneously on @{text "\<beta>"}.  This can be iterated naturally over a

   873   list of parameters @{text "[a\<^sub>1, \<dots>, a\<^sub>n]"} as @{text "f a\<^sub>1 #> \<dots> #> f a\<^sub>n"}.

   874   The latter expression is again a function @{text "\<beta> \<rightarrow> \<beta>"}.

   875   It can be applied to an initial configuration @{text "b: \<beta>"} to

   876   start the iteration over the given list of arguments: each @{text

   877   "a"} in @{text "a\<^sub>1, \<dots>, a\<^sub>n"} is applied consecutively by updating a

   878   cumulative configuration.

   879

   880   The @{text fold} combinator in Isabelle/ML lifts a function @{text

   881   "f"} as above to its iterated version over a list of arguments.

   882   Lifting can be repeated, e.g.\ @{text "(fold \<circ> fold) f"} iterates

   883   over a list of lists as expected.

   884

   885   The variant @{text "fold_rev"} works inside-out over the list of

   886   arguments, such that @{text "fold_rev f \<equiv> fold f \<circ> rev"} holds.

   887

   888   The @{text "fold_map"} combinator essentially performs @{text

   889   "fold"} and @{text "map"} simultaneously: each application of @{text

   890   "f"} produces an updated configuration together with a side-result;

   891   the iteration collects all such side-results as a separate list.

   892 \<close>

   893

   894 text %mlref \<open>

   895   \begin{mldecls}

   896   @{index_ML fold: "('a -> 'b -> 'b) -> 'a list -> 'b -> 'b"} \\

   897   @{index_ML fold_rev: "('a -> 'b -> 'b) -> 'a list -> 'b -> 'b"} \\

   898   @{index_ML fold_map: "('a -> 'b -> 'c * 'b) -> 'a list -> 'b -> 'c list * 'b"} \\

   899   \end{mldecls}

   900

   901   \begin{description}

   902

   903   \item @{ML fold}~@{text f} lifts the parametrized update function

   904   @{text "f"} to a list of parameters.

   905

   906   \item @{ML fold_rev}~@{text "f"} is similar to @{ML fold}~@{text

   907   "f"}, but works inside-out, as if the list were reversed.

   908

   909   \item @{ML fold_map}~@{text "f"} lifts the parametrized update

   910   function @{text "f"} (with side-result) to a list of parameters and

   911   cumulative side-results.

   912

   913   \end{description}

   914

   915   \begin{warn}

   916   The literature on functional programming provides a confusing multitude of

   917   combinators called @{text "foldl"}, @{text "foldr"} etc. SML97 provides its

   918   own variations as @{ML List.foldl} and @{ML List.foldr}, while the classic

   919   Isabelle library also has the historic @{ML Library.foldl} and @{ML

   920   Library.foldr}. To avoid unnecessary complication, all these historical

   921   versions should be ignored, and the canonical @{ML fold} (or @{ML fold_rev})

   922   used exclusively.

   923   \end{warn}

   924 \<close>
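
text \<open>As a minimal illustration (again not from the Isabelle sources),
  @{ML fold} accumulates a sum over a list, while @{ML fold_map}
  additionally collects the side-results.\<close>

ML \<open>
  @{assert} (fold (fn a => fn b => a + b) [1, 2, 3] 0 = 6);

  (*side-results are the squares, the configuration is the running sum*)
  val (squares, sum) =
    fold_map (fn a => fn b => (a * a, a + b)) [1, 2, 3] 0;
  @{assert} (squares = [1, 4, 9] andalso sum = 6);
\<close>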

   925

   926 text %mlex \<open>The following example shows how to fill a text buffer

   927   incrementally by adding strings, either individually or from a given

   928   list.

   929 \<close>

   930

   931 ML \<open>

   932   val s =

   933     Buffer.empty

   934     |> Buffer.add "digits: "

   935     |> fold (Buffer.add o string_of_int) (0 upto 9)

   936     |> Buffer.content;

   937

   938   @{assert} (s = "digits: 0123456789");

   939 \<close>

   940

   941 text \<open>Note how @{ML "fold (Buffer.add o string_of_int)"} above saves

   942   an extra @{ML "map"} over the given list.  This kind of peephole

   943   optimization reduces both the code size and the tree structures in

   944   memory (``deforestation''), but it requires some practice to read

   945   and write fluently.

   946

   947   \medskip The next example elaborates the idea of canonical

   948   iteration, demonstrating fast accumulation of tree content using a

   949   text buffer.

   950 \<close>

   951

   952 ML \<open>

   953   datatype tree = Text of string | Elem of string * tree list;

   954

   955   fun slow_content (Text txt) = txt

   956     | slow_content (Elem (name, ts)) =

   957         "<" ^ name ^ ">" ^

   958         implode (map slow_content ts) ^

   959         "</" ^ name ^ ">"

   960

   961   fun add_content (Text txt) = Buffer.add txt

   962     | add_content (Elem (name, ts)) =

   963         Buffer.add ("<" ^ name ^ ">") #>

   964         fold add_content ts #>

   965         Buffer.add ("</" ^ name ^ ">");

   966

   967   fun fast_content tree =

   968     Buffer.empty |> add_content tree |> Buffer.content;

   969 \<close>

   970

   971 text \<open>The slowness of @{ML slow_content} is due to the @{ML implode} of

   972   the recursive results, because it copies previously produced strings

   973   again and again.

   974

   975   The incremental @{ML add_content} avoids this by operating on a

   976   buffer that is passed through in a linear fashion.  Using @{ML_text

   977   "#>"} and contraction over the actual buffer argument saves some

   978   additional boiler-plate.  Of course, the two @{ML "Buffer.add"}

   979   invocations with concatenated strings could have been split into

   980   smaller parts, but this would have obfuscated the source without

   981   making a big difference in performance.  Here we have done some

   982   peephole-optimization for the sake of readability.

   983

   984   Another benefit of @{ML add_content} is its ``open'' form as a

   985   function on buffers that can be continued in further linear

   986   transformations, folding etc.  Thus it is more compositional than

   987   the naive @{ML slow_content}.  As a realistic example, compare the

   988   old-style @{ML "Term.maxidx_of_term: term -> int"} with the newer

   989   @{ML "Term.maxidx_term: term -> int -> int"} in Isabelle/Pure.

   990

   991   Note that @{ML fast_content} above is only defined as example.  In

   992   many practical situations, it is customary to provide the

   993   incremental @{ML add_content} only and leave the initialization and

   994   termination to the concrete application by the user.

   995 \<close>

   996

   997

   998 section \<open>Message output channels \label{sec:message-channels}\<close>

   999

  1000 text \<open>Isabelle provides output channels for different kinds of

  1001   messages: regular output, high-volume tracing information, warnings,

  1002   and errors.

  1003

  1004   Depending on the user interface involved, these messages may appear

  1005   in different text styles or colours.  The standard output for

  1006   batch sessions prefixes each line of warnings by @{verbatim

  1007   "###"} and errors by @{verbatim "***"}, but leaves anything else

  1008   unchanged.  The message body may contain further markup and formatting,

  1009   which is routinely used in the Prover IDE @{cite "isabelle-jedit"}.

  1010

  1011   Messages are associated with the transaction context of the running

  1012   Isar command.  This enables the front-end to manage commands and

  1013   resulting messages together.  For example, after deleting a command

  1014   from a given theory document version, the corresponding message

  1015   output can be retracted from the display.

  1016 \<close>

  1017

  1018 text %mlref \<open>

  1019   \begin{mldecls}

  1020   @{index_ML writeln: "string -> unit"} \\

  1021   @{index_ML tracing: "string -> unit"} \\

  1022   @{index_ML warning: "string -> unit"} \\

  1023   @{index_ML error: "string -> 'a"} % FIXME Output.error_message (!?) \\

  1024   \end{mldecls}

  1025

  1026   \begin{description}

  1027

  1028   \item @{ML writeln}~@{text "text"} outputs @{text "text"} as regular

  1029   message.  This is the primary message output operation of Isabelle

  1030   and should be used by default.

  1031

  1032   \item @{ML tracing}~@{text "text"} outputs @{text "text"} as special

  1033   tracing message, indicating potential high-volume output to the

  1034   front-end (hundreds or thousands of messages issued by a single

  1035   command).  The idea is to allow the user-interface to downgrade the

  1036   quality of message display to achieve higher throughput.

  1037

  1038   Note that the user might have to take special actions to see tracing

  1039   output, e.g.\ switch to a different output window.  So this channel

  1040   should not be used for regular output.

  1041

  1042   \item @{ML warning}~@{text "text"} outputs @{text "text"} as

  1043   warning, which typically means some extra emphasis on the front-end

  1044   side (color highlighting, icons, etc.).

  1045

  1046   \item @{ML error}~@{text "text"} raises exception @{ML ERROR}~@{text

  1047   "text"} and thus lets the Isar toplevel print @{text "text"} on the

  1048   error channel, which typically means some extra emphasis on the

  1049   front-end side (color highlighting, icons, etc.).

  1050

  1051   This assumes that the exception is not handled before the command

  1052   terminates.  Handling exception @{ML ERROR}~@{text "text"} is a

  1053   perfectly legal alternative: it means that the error is absorbed

  1054   without any message output.

  1055

  1056   \begin{warn}

  1057   The actual error channel is accessed via @{ML Output.error_message}, but

  1058   this is normally not used directly in user code.

  1059   \end{warn}

  1060

  1061   \end{description}

  1062

  1063   \begin{warn}

  1064   Regular Isabelle/ML code should output messages exclusively by the

  1065   official channels.  Using raw I/O on \emph{stdout} or \emph{stderr}

  1066   instead (e.g.\ via @{ML TextIO.output}) is apt to cause problems in

  1067   the presence of parallel and asynchronous processing of Isabelle

  1068   theories.  Such raw output might be displayed by the front-end in

  1069   some system console log, with a low chance that the user will ever

  1070   see it.  Moreover, as a genuine side-effect on global process

  1071   channels, there is no proper way to retract output when Isar command

  1072   transactions are reset by the system.

  1073   \end{warn}

  1074

  1075   \begin{warn}

  1076   The message channels should be used in a message-oriented manner.

  1077   This means that multi-line output that logically belongs together is

  1078   issued by a single invocation of @{ML writeln} etc.\ with the

  1079   functional concatenation of all message constituents.

  1080   \end{warn}

  1081 \<close>

  1082

  1083 text %mlex \<open>The following example demonstrates a multi-line

  1084   warning.  Note that in some situations the user sees only the first

  1085   line, so the most important point should be made first.

  1086 \<close>

  1087

  1088 ML_command \<open>

  1089   warning (cat_lines

  1090    ["Beware the Jabberwock, my son!",

  1091     "The jaws that bite, the claws that catch!",

  1092     "Beware the Jubjub Bird, and shun",

  1093     "The frumious Bandersnatch!"]);

  1094 \<close>

  1095

  1096

  1097 section \<open>Exceptions \label{sec:exceptions}\<close>

  1098

  1099 text \<open>The Standard ML semantics of strict functional evaluation

  1100   together with exceptions is rather well defined, but some delicate

  1101   points need to be observed to prevent ML programs from going wrong

  1102   despite static type-checking.  Exceptions in Isabelle/ML are

  1103   subsequently categorized as follows.

  1104

  1105   \paragraph{Regular user errors.}  These are meant to provide

  1106   informative feedback about malformed input etc.

  1107

  1108   The \emph{error} function raises the corresponding @{ML ERROR}

  1109   exception, with a plain text message as argument.  @{ML ERROR}

  1110   exceptions can be handled internally, in order to be ignored, turned

  1111   into other exceptions, or cascaded by appending messages.  If the

  1112   corresponding Isabelle/Isar command terminates with an @{ML ERROR}

  1113   exception state, the system will print the result on the error

  1114   channel (see \secref{sec:message-channels}).

  1115

  1116   It is considered bad style to refer to internal function names or

  1117   values in ML source notation in user error messages.  Do not use

  1118   @{text "@{make_string}"} nor @{text "@{here}"}!

  1119

  1120   Grammatical correctness of error messages can be improved by

  1121   \emph{omitting} final punctuation: messages are often concatenated

  1122   or put into a larger context (e.g.\ augmented with source position).

  1123   Note that punctuation after formal entities (types, terms, theorems) is

  1124   particularly prone to user confusion.

  1125

  1126   \paragraph{Program failures.}  There is a handful of standard

  1127   exceptions that indicate general failure situations, or failures of

  1128   core operations on logical entities (types, terms, theorems,

  1129   theories, see \chref{ch:logic}).

  1130

  1131   These exceptions indicate a genuine breakdown of the program, so the

  1132   main purpose is to determine quickly what has happened where.

  1133   Traditionally, the (short) exception message would include the name

  1134   of an ML function, although this is no longer necessary, because the

  1135   ML runtime system attaches a detailed source position stemming from the

  1136   corresponding @{ML_text raise} keyword.

  1137

  1138   \medskip User modules can always introduce their own custom

  1139   exceptions locally, e.g.\ to organize internal failures robustly

  1140   without overlapping with existing exceptions.  Exceptions that are

  1141   exposed in module signatures require extra care, though, and should

  1142   \emph{not} be introduced by default.  Surprises for users of a module

  1143   can often be minimized by using plain user errors instead.

  1144

  1145   \paragraph{Interrupts.}  These indicate arbitrary system events:

  1146   both the ML runtime system and the Isabelle/ML infrastructure signal

  1147   various exceptional situations by raising the special

  1148   @{ML Exn.Interrupt} exception in user code.

  1149

  1150   This is the one and only way that physical events can intrude on an Isabelle/ML

  1151   program. Such an interrupt can mean out-of-memory, stack overflow, timeout,

  1152   internal signaling of threads, or a POSIX process signal. An Isabelle/ML

  1153   program that intercepts interrupts becomes dependent on physical effects of

  1154   the environment. Even worse, exception handling patterns that are too

  1155   general by accident, e.g.\ by misspelled exception constructors, will cover

  1156   interrupts unintentionally and thus render the program semantics

  1157   ill-defined.

  1158

  1159   Note that the Interrupt exception dates back to the original SML90

  1160   language definition.  It was excluded from the SML97 version to

  1161   avoid its malign impact on ML program semantics, but without

  1162   providing a viable alternative.  Isabelle/ML recovers physical

  1163   interruptibility (which is an indispensable tool to implement

  1164   managed evaluation of command transactions), but requires user code

  1165   to be strictly transparent wrt.\ interrupts.

  1166

  1167   \begin{warn}

  1168   Isabelle/ML user code needs to terminate promptly on interruption,

  1169   without guessing at its meaning to the system infrastructure.

  1170   Temporary handling of interrupts for cleanup of global resources

  1171   etc.\ needs to be followed immediately by re-raising of the original

  1172   exception.

  1173   \end{warn}

  1174 \<close>

  1175

  1176 text %mlref \<open>

  1177   \begin{mldecls}

  1178   @{index_ML try: "('a -> 'b) -> 'a -> 'b option"} \\

  1179   @{index_ML can: "('a -> 'b) -> 'a -> bool"} \\

  1180   @{index_ML_exception ERROR: string} \\

  1181   @{index_ML_exception Fail: string} \\

  1182   @{index_ML Exn.is_interrupt: "exn -> bool"} \\

  1183   @{index_ML reraise: "exn -> 'a"} \\

  1184   @{index_ML Runtime.exn_trace: "(unit -> 'a) -> 'a"} \\

  1185   \end{mldecls}

  1186

  1187   \begin{description}

  1188

  1189   \item @{ML try}~@{text "f x"} makes the partiality of evaluating

  1190   @{text "f x"} explicit via the option datatype.  Interrupts are

  1191   \emph{not} handled here, i.e.\ this form serves as safe replacement

  1192   for the \emph{unsafe} version @{ML_text "(SOME"}~@{text "f

  1193   x"}~@{ML_text "handle _ => NONE)"} that is occasionally seen in

  1194   books about SML97, but not in Isabelle/ML.

  1195

  1196   \item @{ML can} is similar to @{ML try} with more abstract result.

  1197

  1198   \item @{ML ERROR}~@{text "msg"} represents user errors; this

  1199   exception is normally raised indirectly via the @{ML error} function

  1200   (see \secref{sec:message-channels}).

  1201

  1202   \item @{ML Fail}~@{text "msg"} represents general program failures.

  1203

  1204   \item @{ML Exn.is_interrupt} identifies interrupts robustly, without

  1205   mentioning concrete exception constructors in user code.  Handled

  1206   interrupts need to be re-raised promptly!

  1207

  1208   \item @{ML reraise}~@{text "exn"} raises exception @{text "exn"}

  1209   while preserving its implicit position information (if possible,

  1210   depending on the ML platform).

  1211

  1212   \item @{ML Runtime.exn_trace}~@{ML_text "(fn () =>"}~@{text

  1213   "e"}@{ML_text ")"} evaluates expression @{text "e"} while printing

  1214   a full trace of its stack of nested exceptions (if possible,

  1215   depending on the ML platform).

  1216

  1217   Inserting @{ML Runtime.exn_trace} into ML code temporarily is

  1218   useful for debugging, but not suitable for production code.

  1219

  1220   \end{description}

  1221 \<close>
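
text \<open>For example, the partiality of @{ML hd} can be made explicit as
  follows (a contrived illustration, not from the Isabelle sources):\<close>

ML \<open>
  @{assert} (try hd ([]: int list) = NONE);
  @{assert} (try hd [1, 2] = SOME 1);

  @{assert} (can hd [1] andalso not (can hd ([]: int list)));
\<close>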

  1222

  1223 text %mlantiq \<open>

  1224   \begin{matharray}{rcl}

  1225   @{ML_antiquotation_def "assert"} & : & @{text ML_antiquotation} \\

  1226   \end{matharray}

  1227

  1228   \begin{description}

  1229

  1230   \item @{text "@{assert}"} inlines a function

  1231   @{ML_type "bool -> unit"} that raises @{ML Fail} if the argument is

  1232   @{ML false}.  Due to inlining the source position of failed

  1233   assertions is included in the error output.

  1234

  1235   \end{description}

  1236 \<close>

  1237

  1238

  1239 section \<open>Strings of symbols \label{sec:symbols}\<close>

  1240

  1241 text \<open>A \emph{symbol} constitutes the smallest textual unit in

  1242   Isabelle/ML --- raw ML characters are normally not encountered at

  1243   all.  Isabelle strings consist of a sequence of symbols, represented

  1244   as a packed string or an exploded list of strings.  Each symbol is

  1245   in itself a small string, which has either one of the following

  1246   forms:

  1247

  1248   \begin{enumerate}

  1249

  1250   \item a single ASCII character ``@{text "c"}'', for example

  1251   ``@{verbatim a}'',

  1252

  1253   \item a codepoint according to UTF-8 (non-ASCII byte sequence),

  1254

  1255   \item a regular symbol ``@{verbatim \<open>\\<close>}@{verbatim "<"}@{text

  1256   "ident"}@{verbatim ">"}'', for example ``@{verbatim "\<alpha>"}'',

  1257

  1258   \item a control symbol ``@{verbatim \<open>\\<close>}@{verbatim "<^"}@{text

  1259   "ident"}@{verbatim ">"}'', for example ``@{verbatim "\<^bold>"}'',

  1260

  1261   \item a raw symbol ``@{verbatim \<open>\\<close>}@{verbatim "<^raw:"}@{text

  1262   text}@{verbatim ">"}'' where @{text text} consists of printable characters

  1263   excluding ``@{verbatim "."}'' and ``@{verbatim ">"}'', for example

  1264   ``@{verbatim "\<^raw:$\sum_{i = 1}^n$>"}'',

  1265

  1266   \item a numbered raw control symbol ``@{verbatim \<open>\\<close>}@{verbatim

  1267   "<^raw"}@{text n}@{verbatim ">"}'', where @{text n} consists of digits, for

  1268   example ``@{verbatim "\<^raw42>"}''.

  1269

  1270   \end{enumerate}

  1271

  1272   The @{text "ident"} syntax for symbol names is @{text "letter

  1273   (letter | digit)\<^sup>*"}, where @{text "letter = A..Za..z"} and @{text

  1274   "digit = 0..9"}.  There are infinitely many regular symbols and

  1275   control symbols, but a fixed collection of standard symbols is

  1276   treated specifically.  For example, ``@{verbatim "\<alpha>"}'' is

  1277   classified as a letter, which means it may occur within regular

  1278   Isabelle identifiers.

  1279

  1280   The character set underlying Isabelle symbols is 7-bit ASCII, but 8-bit

  1281   character sequences are passed-through unchanged. Unicode/UCS data in UTF-8

  1282   encoding is processed in a non-strict fashion, such that well-formed code

  1283   sequences are recognized accordingly. Unicode provides its own collection of

  1284   mathematical symbols, but within the core Isabelle/ML world there is no link

  1285   to the standard collection of Isabelle regular symbols.

  1286

  1287   \medskip Output of Isabelle symbols depends on the print mode. For example,

  1288   the standard {\LaTeX} setup of the Isabelle document preparation system

  1289   would present ``@{verbatim "\<alpha>"}'' as @{text "\<alpha>"}, and ``@{verbatim

  1290   "\<^bold>\<alpha>"}'' as @{text "\<^bold>\<alpha>"}. On-screen rendering usually works by mapping a

  1291   finite subset of Isabelle symbols to suitable Unicode characters.

  1292 \<close>

  1293

  1294 text %mlref \<open>

  1295   \begin{mldecls}

  1296   @{index_ML_type "Symbol.symbol": string} \\

  1297   @{index_ML Symbol.explode: "string -> Symbol.symbol list"} \\

  1298   @{index_ML Symbol.is_letter: "Symbol.symbol -> bool"} \\

  1299   @{index_ML Symbol.is_digit: "Symbol.symbol -> bool"} \\

  1300   @{index_ML Symbol.is_quasi: "Symbol.symbol -> bool"} \\

  1301   @{index_ML Symbol.is_blank: "Symbol.symbol -> bool"} \\

  1302   \end{mldecls}

  1303   \begin{mldecls}

  1304   @{index_ML_type "Symbol.sym"} \\

  1305   @{index_ML Symbol.decode: "Symbol.symbol -> Symbol.sym"} \\

  1306   \end{mldecls}

  1307

  1308   \begin{description}

  1309

  1310   \item Type @{ML_type "Symbol.symbol"} represents individual Isabelle

  1311   symbols.

  1312

  1313   \item @{ML "Symbol.explode"}~@{text "str"} produces a symbol list

  1314   from the packed form.  This function supersedes @{ML

  1315   "String.explode"} for virtually all purposes of manipulating text in

  1316   Isabelle!\footnote{The runtime overhead for exploded strings is

  1317   mainly that of the list structure: individual symbols that happen to

  1318   be a singleton string do not require extra memory in Poly/ML.}

  1319

  1320   \item @{ML "Symbol.is_letter"}, @{ML "Symbol.is_digit"}, @{ML

  1321   "Symbol.is_quasi"}, @{ML "Symbol.is_blank"} classify standard

  1322   symbols according to fixed syntactic conventions of Isabelle, cf.\

  1323   @{cite "isabelle-isar-ref"}.

  1324

  1325   \item Type @{ML_type "Symbol.sym"} is a concrete datatype that

  1326   represents the different kinds of symbols explicitly, with

  1327   constructors @{ML "Symbol.Char"}, @{ML "Symbol.UTF8"},

  1328   @{ML "Symbol.Sym"}, @{ML "Symbol.Ctrl"}, @{ML "Symbol.Raw"},

  1329   @{ML "Symbol.Malformed"}.

  1330

  1331   \item @{ML "Symbol.decode"} converts the string representation of a

  1332   symbol into the datatype version.

  1333

  1334   \end{description}

  1335

  1336   \paragraph{Historical note.} In the original SML90 standard the

  1337   primitive ML type @{ML_type char} did not exist, and @{ML_text

  1338   "explode: string -> string list"} produced a list of singleton

  1339   strings like @{ML "raw_explode: string -> string list"} in

  1340   Isabelle/ML today.  When SML97 came out, Isabelle did not adopt its

  1341   somewhat anachronistic 8-bit or 16-bit characters, but the idea of

  1342   exploding a string into a list of small strings was extended to

  1343   ``symbols'' as explained above.  Thus Isabelle sources can refer to

  1344   an infinite store of user-defined symbols, without having to worry

  1345   about the multitude of Unicode encodings that have emerged over the

  1346   years.\<close>
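
text \<open>For example (assuming the standard symbol classification described
  above):\<close>

ML \<open>
  (*four symbols: "a", "\<alpha>", "\<^bold>", "b"*)
  @{assert} (length (Symbol.explode "a\<alpha>\<^bold>b") = 4);

  @{assert} (Symbol.is_letter "\<alpha>");
  @{assert} (not (Symbol.is_letter "\<^bold>"));
\<close>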

  1347

  1348

  1349 section \<open>Basic data types\<close>

  1350

  1351 text \<open>The basis library proposal of SML97 needs to be treated with

  1352   caution.  Many of its operations simply do not fit with important

  1353   Isabelle/ML conventions (like ``canonical argument order'', see

  1354   \secref{sec:canonical-argument-order}), others cause problems with

  1355   the parallel evaluation model of Isabelle/ML (such as @{ML

  1356   TextIO.print} or @{ML OS.Process.system}).

  1357

  1358   Subsequently we give a brief overview of important operations on

  1359   basic ML data types.

  1360 \<close>

  1361

  1362

  1363 subsection \<open>Characters\<close>

  1364

  1365 text %mlref \<open>

  1366   \begin{mldecls}

  1367   @{index_ML_type char} \\

  1368   \end{mldecls}

  1369

  1370   \begin{description}

  1371

  1372   \item Type @{ML_type char} is \emph{not} used.  The smallest textual

  1373   unit in Isabelle is represented as a ``symbol'' (see

  1374   \secref{sec:symbols}).

  1375

  1376   \end{description}

  1377 \<close>

  1378

  1379

  1380 subsection \<open>Strings\<close>

  1381

  1382 text %mlref \<open>

  1383   \begin{mldecls}

  1384   @{index_ML_type string} \\

  1385   \end{mldecls}

  1386

  1387   \begin{description}

  1388

  1389   \item Type @{ML_type string} represents immutable vectors of 8-bit

  1390   characters.  There are operations in SML to convert back and forth

  1391   to actual byte vectors, which are seldom used.

  1392

  1393   This historically important raw text representation is used for

  1394   Isabelle-specific purposes with the following implicit substructures

  1395   packed into the string content:

  1396

  1397   \begin{enumerate}

  1398

  1399   \item sequence of Isabelle symbols (see also \secref{sec:symbols}),

  1400   with @{ML Symbol.explode} as key operation;

  1401

  1402   \item XML tree structure via YXML (see also @{cite "isabelle-sys"}),

  1403   with @{ML YXML.parse_body} as key operation.

  1404

  1405   \end{enumerate}

  1406

  1407   Note that Isabelle/ML string literals may refer to Isabelle symbols like

  1408   ``@{verbatim \<alpha>}'' natively, \emph{without} escaping the backslash. This is a

  1409   consequence of Isabelle treating all source text as strings of symbols,

  1410   instead of raw characters.

  1411

  1412   \end{description}

  1413 \<close>

  1414

  1415 text %mlex \<open>The subsequent example illustrates the difference of

  1416   physical addressing of bytes versus logical addressing of symbols in

  1417   Isabelle strings.

  1418 \<close>

  1419

  1420 ML_val \<open>

  1421   val s = "\<A>";

  1422

  1423   @{assert} (length (Symbol.explode s) = 1);

  1424   @{assert} (size s = 4);

  1425 \<close>

  1426

  1427 text \<open>Note that in Unicode renderings of the symbol @{text "\<A>"},

  1428   variations of encodings like UTF-8 or UTF-16 pose delicate questions

  1429   about the multi-byte representations of its codepoint, which is outside

  1430   of the 16-bit address space of the original Unicode standard from

  1431   the 1990s.  In Isabelle/ML it is just ``@{verbatim \<A>}''

  1432   literally, using plain ASCII characters beyond any doubt.\<close>

  1433

  1434

  1435 subsection \<open>Integers\<close>

  1436

  1437 text %mlref \<open>

  1438   \begin{mldecls}

  1439   @{index_ML_type int} \\

  1440   \end{mldecls}

  1441

  1442   \begin{description}

  1443

  1444   \item Type @{ML_type int} represents regular mathematical integers, which

  1445   are \emph{unbounded}. Overflow is treated properly, but should never happen

  1446   in practice.\footnote{The size limit for integer bit patterns in memory is

  1447   64\,MB for 32-bit Poly/ML, and much higher for 64-bit systems.} This works

  1448   uniformly for all supported ML platforms (Poly/ML and SML/NJ).

  1449

  1450   Literal integers in ML text are forced to be of this one true

  1451   integer type --- ad-hoc overloading of SML97 is disabled.

  1452

  1453   Structure @{ML_structure IntInf} of SML97 is obsolete and superseded by

  1454   @{ML_structure Int}.  Structure @{ML_structure Integer} in @{file

  1455   "~~/src/Pure/General/integer.ML"} provides some additional

  1456   operations.

  1457

  1458   \end{description}

  1459 \<close>
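text %mlex \<open>The subsequent artificial example sketches unbounded integer
  arithmetic: the factorial of 40 exceeds any machine word size by far, yet
  it is computed without overflow.\<close>

ML_val \<open>
  (*product of 1 .. 40, i.e. 40! as a plain int*)
  val fact40 = fold (fn i => fn k => k * i) (1 upto 40) 1;

  @{assert} (fact40 > 0);
  @{assert} (fact40 mod 1000000 = 0);  (*40! has 9 trailing zeros*)
\<close>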

  1460

  1461

  1462 subsection \<open>Time\<close>

  1463

  1464 text %mlref \<open>

  1465   \begin{mldecls}

  1466   @{index_ML_type Time.time} \\

  1467   @{index_ML seconds: "real -> Time.time"} \\

  1468   \end{mldecls}

  1469

  1470   \begin{description}

  1471

  1472   \item Type @{ML_type Time.time} represents time abstractly according

  1473   to the SML97 basis library definition.  This is adequate for

  1474   internal ML operations, but awkward in concrete time specifications.

  1475

  1476   \item @{ML seconds}~@{text "s"} turns the concrete scalar @{text

  1477   "s"} (measured in seconds) into an abstract time value.  Floating

  1478   point numbers are easy to use as configuration options in the

  1479   context (see \secref{sec:config-options}) or system options that

  1480   are maintained externally.

  1481

  1482   \end{description}

  1483 \<close>
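text %mlex \<open>For example, a concrete timeout value may be specified via
  @{ML seconds} as follows; the conversion back to milliseconds shown here
  is part of the SML97 basis library, not specific to Isabelle/ML.\<close>

ML_val \<open>
  val timeout = seconds 1.5;
  @{assert} (Time.toMilliseconds timeout = 1500);
\<close>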

  1484

  1485

  1486 subsection \<open>Options\<close>

  1487

  1488 text %mlref \<open>

  1489   \begin{mldecls}

  1490   @{index_ML Option.map: "('a -> 'b) -> 'a option -> 'b option"} \\

  1491   @{index_ML is_some: "'a option -> bool"} \\

  1492   @{index_ML is_none: "'a option -> bool"} \\

  1493   @{index_ML the: "'a option -> 'a"} \\

  1494   @{index_ML these: "'a list option -> 'a list"} \\

  1495   @{index_ML the_list: "'a option -> 'a list"} \\

  1496   @{index_ML the_default: "'a -> 'a option -> 'a"} \\

  1497   \end{mldecls}

  1498 \<close>

  1499

  1500 text \<open>Apart from @{ML Option.map} most other operations defined in

  1501   structure @{ML_structure Option} are alien to Isabelle/ML and never

  1502   used.  The operations shown above are defined in @{file

  1503   "~~/src/Pure/General/basics.ML"}.\<close>
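text %mlex \<open>The subsequent checks merely illustrate the operations above on
  trivial values.\<close>

ML_val \<open>
  @{assert} (Option.map (fn i => i + 1) (SOME 1) = SOME 2);
  @{assert} (is_some (SOME ()) andalso is_none NONE);
  @{assert} (the_default "x" NONE = "x");
  @{assert} (the_list (SOME "a") = ["a"]);
  @{assert} (these NONE = [] andalso these (SOME [1, 2]) = [1, 2]);
\<close>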

  1504

  1505

  1506 subsection \<open>Lists\<close>

  1507

  1508 text \<open>Lists are ubiquitous in ML as simple and light-weight

  1509   ``collections'' for many everyday programming tasks.  Isabelle/ML

  1510   provides important additions and improvements over operations that

  1511   are predefined in the SML97 library.\<close>

  1512

  1513 text %mlref \<open>

  1514   \begin{mldecls}

  1515   @{index_ML cons: "'a -> 'a list -> 'a list"} \\

  1516   @{index_ML member: "('b * 'a -> bool) -> 'a list -> 'b -> bool"} \\

  1517   @{index_ML insert: "('a * 'a -> bool) -> 'a -> 'a list -> 'a list"} \\

  1518   @{index_ML remove: "('b * 'a -> bool) -> 'b -> 'a list -> 'a list"} \\

  1519   @{index_ML update: "('a * 'a -> bool) -> 'a -> 'a list -> 'a list"} \\

  1520   \end{mldecls}

  1521

  1522   \begin{description}

  1523

  1524   \item @{ML cons}~@{text "x xs"} evaluates to @{text "x :: xs"}.

  1525

  1526   Tupled infix operators are a historical accident in Standard ML.

  1527   The curried @{ML cons} amends this, but it should only be used when

  1528   partial application is required.

  1529

  1530   \item @{ML member}, @{ML insert}, @{ML remove}, @{ML update} treat

  1531   lists as a set-like container that maintains the order of elements.

  1532   See @{file "~~/src/Pure/library.ML"} for the full specifications

  1533   (written in ML).  There are some further derived operations like

  1534   @{ML union} or @{ML inter}.

  1535

  1536   Note that @{ML insert} is conservative about elements that are

  1537   already a @{ML member} of the list, while @{ML update} ensures that

  1538   the latest entry is always put in front.  The latter discipline is

  1539   often more appropriate in declarations of context data

  1540   (\secref{sec:context-data}) that are issued by the user in Isar

  1541   source: later declarations take precedence over earlier ones.

  1542

  1543   \end{description}

  1544 \<close>
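text %mlex \<open>The set-like operations can be sketched on small lists; the
  concrete results shown here follow the specifications in
  @{file "~~/src/Pure/library.ML"}.  Note how @{ML insert} is conservative,
  while @{ML update} moves the element to the front.\<close>

ML_val \<open>
  val xs = [1, 2, 3];

  @{assert} (member (op =) xs 2);
  @{assert} (insert (op =) 2 xs = xs);           (*already a member*)
  @{assert} (insert (op =) 4 xs = [4, 1, 2, 3]); (*new element in front*)
  @{assert} (update (op =) 2 xs = [2, 1, 3]);    (*moved to front*)
  @{assert} (remove (op =) 2 xs = [1, 3]);
\<close>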

  1545

  1546 text %mlex \<open>Using canonical @{ML fold} together with @{ML cons} (or

  1547   similar standard operations) alternates the orientation of data.

  1548   This is quite natural and should not be altered forcibly by inserting

  1549   extra applications of @{ML rev}.  The alternative @{ML fold_rev} can

  1550   be used in the few situations where alternation should be

  1551   prevented.

  1552 \<close>

  1553

  1554 ML \<open>

  1555   val items = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];

  1556

  1557   val list1 = fold cons items [];

  1558   @{assert} (list1 = rev items);

  1559

  1560   val list2 = fold_rev cons items [];

  1561   @{assert} (list2 = items);

  1562 \<close>

  1563

  1564 text \<open>The subsequent example demonstrates how to \emph{merge} two

  1565   lists in a natural way.\<close>

  1566

  1567 ML \<open>

  1568   fun merge_lists eq (xs, ys) = fold_rev (insert eq) ys xs;

  1569 \<close>

  1570

  1571 text \<open>Here the first list is treated conservatively: only the new

  1572   elements from the second list are inserted.  The inside-out order of

  1573   insertion via @{ML fold_rev} attempts to preserve the order of

  1574   elements in the result.

  1575

  1576   This way of merging lists is typical for context data

  1577   (\secref{sec:context-data}).  See also @{ML merge} as defined in

  1578   @{file "~~/src/Pure/library.ML"}.

  1579 \<close>
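text %mlex \<open>For example, merging two lists of integers keeps the first list
  intact and inserts only the new elements from the second list (the concrete
  result below is a worked instance of the definition above, not a library
  specification):\<close>

ML_val \<open>
  (*2 and 3 are already present; only 4 is inserted, in front*)
  @{assert} (merge_lists (op =) ([1, 2, 3], [3, 4, 2]) = [4, 1, 2, 3]);
\<close>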

  1580

  1581

  1582 subsection \<open>Association lists\<close>

  1583

  1584 text \<open>The operations for association lists interpret a concrete list

  1585   of pairs as a finite function from keys to values.  Redundant

  1586   representations with multiple occurrences of the same key are

  1587   implicitly normalized: lookup and update only take the first

  1588   occurrence into account.

  1589 \<close>

  1590

  1591 text \<open>

  1592   \begin{mldecls}

  1593   @{index_ML AList.lookup: "('a * 'b -> bool) -> ('b * 'c) list -> 'a -> 'c option"} \\

  1594   @{index_ML AList.defined: "('a * 'b -> bool) -> ('b * 'c) list -> 'a -> bool"} \\

  1595   @{index_ML AList.update: "('a * 'a -> bool) -> 'a * 'b -> ('a * 'b) list -> ('a * 'b) list"} \\

  1596   \end{mldecls}

  1597

  1598   \begin{description}

  1599

  1600   \item @{ML AList.lookup}, @{ML AList.defined}, @{ML AList.update}

  1601   implement the main ``framework operations'' for mappings in

  1602   Isabelle/ML, following standard conventions for their names and

  1603   types.

  1604

  1605   Note that a function called @{verbatim lookup} is obliged to express its

  1606   partiality via an explicit option element.  Raising an exception

  1607   instead is not an option, unless the name is changed to something like

  1608   @{text "the_element"} or @{text "get"}.

  1609

  1610   The @{text "defined"} operation is essentially a contraction of @{ML

  1611   is_some} and @{verbatim "lookup"}, but this is sufficiently frequent to

  1612   justify its independent existence.  This also gives the

  1613   implementation some opportunity for peephole optimization.

  1614

  1615   \end{description}

  1616

  1617   Association lists are adequate as simple implementation of finite mappings

  1618   in many practical situations. A more advanced table structure is defined in

  1619   @{file "~~/src/Pure/General/table.ML"}; that version scales easily to

  1620   thousands or millions of elements.

  1621 \<close>
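text %mlex \<open>A small sketch of the association list operations; the checks
  avoid any assumption about the concrete order of the updated list, which is
  specified in @{file "~~/src/Pure/General/alist.ML"}.\<close>

ML_val \<open>
  val al = [("a", 0), ("b", 2)];

  @{assert} (AList.lookup (op =) al "a" = SOME 0);
  @{assert} (not (AList.defined (op =) al "c"));

  (*update supersedes the previous entry for "a"*)
  val al' = AList.update (op =) ("a", 1) al;
  @{assert} (AList.lookup (op =) al' "a" = SOME 1);
\<close>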

  1622

  1623

  1624 subsection \<open>Unsynchronized references\<close>

  1625

  1626 text %mlref \<open>

  1627   \begin{mldecls}

  1628   @{index_ML_type "'a Unsynchronized.ref"} \\

  1629   @{index_ML Unsynchronized.ref: "'a -> 'a Unsynchronized.ref"} \\

  1630   @{index_ML "!": "'a Unsynchronized.ref -> 'a"} \\

  1631   @{index_ML_op ":=": "'a Unsynchronized.ref * 'a -> unit"} \\

  1632   \end{mldecls}

  1633 \<close>

  1634

  1635 text \<open>Due to ubiquitous parallelism in Isabelle/ML (see also

  1636   \secref{sec:multi-threading}), the mutable reference cells of

  1637   Standard ML are notorious for causing problems.  In a highly

  1638   parallel system, both correctness \emph{and} performance are easily

  1639   degraded when using mutable data.

  1640

  1641   The unwieldy name of @{ML Unsynchronized.ref} for the constructor

  1642   for references in Isabelle/ML emphasizes the inconveniences caused by

  1643   mutability.  Existing operations @{ML "!"} and @{ML_op ":="} are

  1644   unchanged, but should be used with special precautions, say in a

  1645   strictly local situation that is guaranteed to be restricted to

  1646   sequential evaluation --- now and in the future.

  1647

  1648   \begin{warn}

  1649   Never @{ML_text "open Unsynchronized"}, not even in a local scope!

  1650   Pretending that mutable state is no problem is a very bad idea.

  1651   \end{warn}

  1652 \<close>
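text %mlex \<open>The following artificial example uses a mutable reference in a
  strictly local and sequential manner, as tolerated by the above
  explanations; realistic Isabelle/ML programs would normally avoid this
  altogether in favour of purely functional state passing.\<close>

ML_val \<open>
  (*strictly local, sequential use only*)
  val i = Unsynchronized.ref 0;
  i := ! i + 1;
  i := ! i + 1;
  @{assert} (! i = 2);
\<close>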

  1653

  1654

  1655 section \<open>Thread-safe programming \label{sec:multi-threading}\<close>

  1656

  1657 text \<open>Multi-threaded execution has become an everyday reality in

  1658   Isabelle since Poly/ML 5.2.1 and Isabelle2008.  Isabelle/ML provides

  1659   implicit and explicit parallelism by default, and there is no way

  1660   for user-space tools to ``opt out''.  ML programs that are purely

  1661   functional, output messages only via the official channels

  1662   (\secref{sec:message-channels}), and do not intercept interrupts

  1663   (\secref{sec:exceptions}) can participate in the multi-threaded

  1664   environment immediately without further ado.

  1665

  1666   More ambitious tools with more fine-grained interaction with the

  1667   environment need to observe the principles explained below.

  1668 \<close>

  1669

  1670

  1671 subsection \<open>Multi-threading with shared memory\<close>

  1672

  1673 text \<open>Multiple threads help to organize advanced operations of the

  1674   system, such as real-time conditions on command transactions,

  1675   sub-components with explicit communication, general asynchronous

  1676   interaction etc.  Moreover, parallel evaluation is a prerequisite to

  1677   make adequate use of the CPU resources that are available on

  1678   multi-core systems.\footnote{Multi-core computing does not mean that

  1679   there are ``spare cycles'' to be wasted.  It means that the

  1680   continued exponential speedup of CPU performance due to ``Moore's

  1681   Law'' follows different rules: clock frequency has reached its peak

  1682   around 2005, and applications need to be parallelized in order to

  1683   avoid a perceived loss of performance.  See also

  1684   @{cite "Sutter:2005"}.}

  1685

  1686   Isabelle/Isar exploits the inherent structure of theories and proofs to

  1687   support \emph{implicit parallelism} to a large extent. LCF-style theorem

  1688   proving provides almost ideal conditions for that, see also

  1689   @{cite "Wenzel:2009"}. This means that significant parts of theory and proof

  1690   checking are parallelized by default. In Isabelle2013, a maximum

  1691   speedup-factor of 3.5 on 4 cores and 6.5 on 8 cores can be expected

  1692   @{cite "Wenzel:2013:ITP"}.

  1693

  1694   \medskip ML threads lack the memory protection of separate

  1695   processes, and operate concurrently on shared heap memory.  This has

  1696   the advantage that results of independent computations are directly

  1697   available to other threads: abstract values can be passed without

  1698   copying or awkward serialization that is typically required for

  1699   separate processes.

  1700

  1701   To make shared-memory multi-threading work robustly and efficiently,

  1702   some programming guidelines need to be observed.  While the ML

  1703   system is responsible for maintaining basic integrity of the

  1704   representation of ML values in memory, the application programmer

  1705   needs to ensure that multi-threaded execution does not break the

  1706   intended semantics.

  1707

  1708   \begin{warn}

  1709   To participate in implicit parallelism, tools need to be

  1710   thread-safe.  A single ill-behaved tool can affect the stability and

  1711   performance of the whole system.

  1712   \end{warn}

  1713

  1714   Apart from observing the principles of thread-safety passively, advanced

  1715   tools may also exploit parallelism actively, e.g.\ by using library

  1716   functions for parallel list operations (\secref{sec:parlist}).

  1717

  1718   \begin{warn}

  1719   Parallel computing resources are managed centrally by the

  1720   Isabelle/ML infrastructure.  User programs should not fork their own

  1721   ML threads to perform heavy computations.

  1722   \end{warn}

  1723 \<close>

  1724

  1725

  1726 subsection \<open>Critical shared resources\<close>

  1727

  1728 text \<open>Thread-safety is mainly concerned with concurrent

  1729   read/write access to shared resources, which are outside the purely

  1730   functional world of ML.  This covers the following in particular.

  1731

  1732   \begin{itemize}

  1733

  1734   \item Global references (or arrays), i.e.\ mutable memory cells that

  1735   persist over several invocations of associated

  1736   operations.\footnote{This is independent of the visibility of such

  1737   mutable values in the toplevel scope.}

  1738

  1739   \item Global state of the running Isabelle/ML process, i.e.\ raw I/O

  1740   channels, environment variables, current working directory.

  1741

  1742   \item Writable resources in the file-system that are shared among

  1743   different threads or external processes.

  1744

  1745   \end{itemize}

  1746

  1747   Isabelle/ML provides various mechanisms to avoid critical shared

  1748   resources in most situations.  As last resort there are some

  1749   mechanisms for explicit synchronization.  The following guidelines

  1750   help to make Isabelle/ML programs work smoothly in a concurrent

  1751   environment.

  1752

  1753   \begin{itemize}

  1754

  1755   \item Avoid global references altogether.  Isabelle/Isar maintains a

  1756   uniform context that incorporates arbitrary data declared by user

  1757   programs (\secref{sec:context-data}).  This context is passed as

  1758   plain value and user tools can get/map their own data in a purely

  1759   functional manner.  Configuration options within the context

  1760   (\secref{sec:config-options}) provide simple drop-in replacements

  1761   for historic reference variables.

  1762

  1763   \item Keep components with local state information re-entrant.

  1764   Instead of poking initial values into (private) global references, a

  1765   new state record can be created on each invocation, and passed

  1766   through any auxiliary functions of the component.  The state record

  1767   may contain mutable references in special situations, without requiring any

  1768   synchronization, as long as each invocation gets its own copy and the

  1769   tool itself is single-threaded.

  1770

  1771   \item Avoid raw output on @{text "stdout"} or @{text "stderr"}.  The

  1772   Poly/ML library is thread-safe for each individual output operation,

  1773   but the ordering of parallel invocations is arbitrary.  This means

  1774   raw output will appear on some system console with unpredictable

  1775   interleaving of atomic chunks.

  1776

  1777   Note that this does not affect regular message output channels

  1778   (\secref{sec:message-channels}).  An official message id is associated

  1779   with the command transaction from where it originates, independently

  1780   of other transactions.  This means each running Isar command has

  1781   effectively its own set of message channels, and interleaving can

  1782   only happen when commands use parallelism internally (and only at

  1783   message boundaries).

  1784

  1785   \item Treat environment variables and the current working directory

  1786   of the running process as read-only.

  1787

  1788   \item Restrict writing to the file-system to unique temporary files.

  1789   Isabelle already provides a temporary directory that is unique for

  1790   the running process, and there is a centralized source of unique

  1791   serial numbers in Isabelle/ML.  Thus temporary files that are passed

  1792   to some external process will always be disjoint, and thus

  1793   thread-safe.

  1794

  1795   \end{itemize}

  1796 \<close>

  1797

  1798 text %mlref \<open>

  1799   \begin{mldecls}

  1800   @{index_ML File.tmp_path: "Path.T -> Path.T"} \\

  1801   @{index_ML serial_string: "unit -> string"} \\

  1802   \end{mldecls}

  1803

  1804   \begin{description}

  1805

  1806   \item @{ML File.tmp_path}~@{text "path"} relocates the base

  1807   component of @{text "path"} into the unique temporary directory of

  1808   the running Isabelle/ML process.

  1809

  1810   \item @{ML serial_string}~@{text "()"} creates a new serial number

  1811   that is unique over the runtime of the Isabelle/ML process.

  1812

  1813   \end{description}

  1814 \<close>

  1815

  1816 text %mlex \<open>The following example shows how to create unique

  1817   temporary file names.

  1818 \<close>

  1819

  1820 ML \<open>

  1821   val tmp1 = File.tmp_path (Path.basic ("foo" ^ serial_string ()));

  1822   val tmp2 = File.tmp_path (Path.basic ("foo" ^ serial_string ()));

  1823   @{assert} (tmp1 <> tmp2);

  1824 \<close>

  1825

  1826

  1827 subsection \<open>Explicit synchronization\<close>

  1828

  1829 text \<open>Isabelle/ML provides explicit synchronization for mutable variables over

  1830   immutable data, which may be updated atomically and exclusively. This

  1831   addresses the rare situations where mutable shared resources are really

  1832   required. Synchronization in Isabelle/ML is based on primitives of Poly/ML,

  1833   which have been adapted to the specific assumptions of the concurrent

  1834   Isabelle environment. User code should not break this abstraction, but stay

  1835   within the confines of concurrent Isabelle/ML.

  1836

  1837   A \emph{synchronized variable} is an explicit state component associated

  1838   with mechanisms for locking and signaling. There are operations to await a

  1839   condition, change the state, and signal the change to all other waiting

  1840   threads. Synchronized access to the state variable is \emph{not} re-entrant:

  1841   direct or indirect nesting within the same thread will cause a deadlock!\<close>

  1842

  1843 text %mlref \<open>

  1844   \begin{mldecls}

  1845   @{index_ML_type "'a Synchronized.var"} \\

  1846   @{index_ML Synchronized.var: "string -> 'a -> 'a Synchronized.var"} \\

  1847   @{index_ML Synchronized.guarded_access: "'a Synchronized.var ->

  1848   ('a -> ('b * 'a) option) -> 'b"} \\

  1849   \end{mldecls}

  1850

  1851   \begin{description}

  1852

  1853   \item Type @{ML_type "'a Synchronized.var"} represents synchronized

  1854   variables with state of type @{ML_type 'a}.

  1855

  1856   \item @{ML Synchronized.var}~@{text "name x"} creates a synchronized

  1857   variable that is initialized with value @{text "x"}.  The @{text

  1858   "name"} is used for tracing.

  1859

  1860   \item @{ML Synchronized.guarded_access}~@{text "var f"} lets the

  1861   function @{text "f"} operate within a critical section on the state

  1862   @{text "x"} as follows: if @{text "f x"} produces @{ML NONE}, it

  1863   continues to wait on the internal condition variable, expecting that

  1864   some other thread will eventually change the content in a suitable

  1865   manner; if @{text "f x"} produces @{ML SOME}~@{text "(y, x')"} it is

  1866   satisfied and assigns the new state value @{text "x'"}, broadcasts a

  1867   signal to all waiting threads on the associated condition variable,

  1868   and returns the result @{text "y"}.

  1869

  1870   \end{description}

  1871

  1872   There are some further variants of the @{ML

  1873   Synchronized.guarded_access} combinator, see @{file

  1874   "~~/src/Pure/Concurrent/synchronized.ML"} for details.

  1875 \<close>

  1876

  1877 text %mlex \<open>The following example implements a counter that produces

  1878   positive integers that are unique over the runtime of the Isabelle

  1879   process:\<close>

  1880

  1881 ML \<open>

  1882   local

  1883     val counter = Synchronized.var "counter" 0;

  1884   in

  1885     fun next () =

  1886       Synchronized.guarded_access counter

  1887         (fn i =>

  1888           let val j = i + 1

  1889           in SOME (j, j) end);

  1890   end;

  1891 \<close>

  1892

  1893 ML \<open>

  1894   val a = next ();

  1895   val b = next ();

  1896   @{assert} (a <> b);

  1897 \<close>

  1898

  1899 text \<open>\medskip See @{file "~~/src/Pure/Concurrent/mailbox.ML"} for how

  1900   to implement a mailbox as a synchronized variable over a purely

  1901   functional list.\<close>

  1902

  1903

  1904 section \<open>Managed evaluation\<close>

  1905

  1906 text \<open>Execution of Standard ML follows the model of strict

  1907   functional evaluation with optional exceptions.  Evaluation happens

  1908   whenever some function is applied to (sufficiently many)

  1909   arguments. The result is either an explicit value or an implicit

  1910   exception.

  1911

  1912   \emph{Managed evaluation} in Isabelle/ML organizes expressions and

  1913   results to control certain physical side-conditions, to say more

  1914   specifically when and how evaluation happens.  For example, the

  1915   Isabelle/ML library supports lazy evaluation with memoing, parallel

  1916   evaluation via futures, asynchronous evaluation via promises,

  1917   evaluation with time limit etc.

  1918

  1919   \medskip An \emph{unevaluated expression} is represented either as

  1920   unit abstraction @{verbatim "fn () => a"} of type

  1921   @{verbatim "unit -> 'a"} or as regular function

  1922   @{verbatim "fn a => b"} of type @{verbatim "'a -> 'b"}.  Both forms

  1923   occur routinely, and special care is required to tell them apart ---

  1924   the static type-system of SML is only of limited help here.

  1925

  1926   The first form is more intuitive: some combinator @{text "(unit ->

  1927   'a) -> 'a"} applies the given function to @{text "()"} to initiate

  1928   the postponed evaluation process.  The second form is more flexible:

  1929   some combinator @{text "('a -> 'b) -> 'a -> 'b"} acts like a

  1930   modified form of function application; several such combinators may

  1931   be cascaded to modify a given function, before it is ultimately

  1932   applied to some argument.

  1933

  1934   \medskip \emph{Reified results} make the disjoint sum of regular

  1935   values versus exceptional situations explicit as ML datatype:

  1936   @{text "'a result = Res of 'a | Exn of exn"}.  This is typically

  1937   used for administrative purposes, to store the overall outcome of an

  1938   evaluation process.

  1939

  1940   \emph{Parallel exceptions} aggregate reified results, such that

  1941   multiple exceptions are digested as a collection in canonical form

  1942   that identifies exceptions according to their original occurrence.

  1943   This is particularly important for parallel evaluation via futures

  1944   (\secref{sec:futures}), which are organized as an acyclic graph of

  1945   evaluations that depend on other evaluations: exceptions stemming

  1946   from shared sub-graphs are exposed exactly once and in the order of

  1947   their original occurrence (e.g.\ when printed at the toplevel).

  1948   Interrupt counts as a neutral element here: it is treated as minimal

  1949   information about some canceled evaluation process, and is absorbed

  1950   by the presence of regular program exceptions.\<close>

  1951

  1952 text %mlref \<open>

  1953   \begin{mldecls}

  1954   @{index_ML_type "'a Exn.result"} \\

  1955   @{index_ML Exn.capture: "('a -> 'b) -> 'a -> 'b Exn.result"} \\

  1956   @{index_ML Exn.interruptible_capture: "('a -> 'b) -> 'a -> 'b Exn.result"} \\

  1957   @{index_ML Exn.release: "'a Exn.result -> 'a"} \\

  1958   @{index_ML Par_Exn.release_all: "'a Exn.result list -> 'a list"} \\

  1959   @{index_ML Par_Exn.release_first: "'a Exn.result list -> 'a list"} \\

  1960   \end{mldecls}

  1961

  1962   \begin{description}

  1963

  1964   \item Type @{ML_type "'a Exn.result"} represents the disjoint sum of

  1965   ML results explicitly, with constructor @{ML Exn.Res} for regular

  1966   values and @{ML "Exn.Exn"} for exceptions.

  1967

  1968   \item @{ML Exn.capture}~@{text "f x"} manages the evaluation of

  1969   @{text "f x"} such that exceptions are made explicit as @{ML

  1970   "Exn.Exn"}.  Note that this includes physical interrupts (see also

  1971   \secref{sec:exceptions}), so the same precautions apply to user

  1972   code: interrupts must not be absorbed accidentally!

  1973

  1974   \item @{ML Exn.interruptible_capture} is similar to @{ML

  1975   Exn.capture}, but interrupts are immediately re-raised as required

  1976   for user code.

  1977

  1978   \item @{ML Exn.release}~@{text "result"} releases the original

  1979   runtime result, exposing its regular value or raising the reified

  1980   exception.

  1981

  1982   \item @{ML Par_Exn.release_all}~@{text "results"} combines results

  1983   that were produced independently (e.g.\ by parallel evaluation).  If

  1984   all results are regular values, that list is returned.  Otherwise,

  1985   the collection of all exceptions is raised, wrapped-up as collective

  1986   parallel exception.  Note that the latter prevents access to

  1987   individual exceptions by conventional @{verbatim "handle"} of ML.

  1988

  1989   \item @{ML Par_Exn.release_first} is similar to @{ML

  1990   Par_Exn.release_all}, but only the first (meaningful) exception that has

  1991   occurred in the original evaluation process is raised again, the others are

  1992   ignored.  That single exception may get handled by conventional

  1993   means in ML.

  1994

  1995   \end{description}

  1996 \<close>
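text %mlex \<open>The subsequent example sketches how reified results make both
  outcomes of an evaluation explicit as values; the explicit type annotation
  merely avoids a polymorphic binding of the failed result.\<close>

ML_val \<open>
  val ok = Exn.capture (fn i => i + 1) 1;
  val err: unit Exn.result = Exn.capture (fn () => raise Fail "bad") ();

  (*regular value is released unchanged*)
  @{assert} (Exn.release ok = 2);

  (*the exception is stored, not raised*)
  @{assert} (case err of Exn.Exn (Fail msg) => msg = "bad" | _ => false);
\<close>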

  1997

  1998

  1999 subsection \<open>Parallel skeletons \label{sec:parlist}\<close>

  2000

  2001 text \<open>

  2002   Algorithmic skeletons are combinators that operate on lists in

  2003   parallel, in the manner of well-known @{text map}, @{text exists},

  2004   @{text forall} etc.  Management of futures (\secref{sec:futures})

  2005   and their results as reified exceptions is wrapped up into simple

  2006   programming interfaces that resemble the sequential versions.

  2007

  2008   What remains is the application-specific problem to present

  2009   expressions with suitable \emph{granularity}: each list element

  2010   corresponds to one evaluation task.  If the granularity is too

  2011   coarse, the available CPUs are not saturated.  If it is too

  2012   fine-grained, CPU cycles are wasted due to the overhead of

  2013   organizing parallel processing.  In the worst case, parallel

  2014   performance will be less than the sequential counterpart!

  2015 \<close>


text %mlref \<open>
  \begin{mldecls}
  @{index_ML Par_List.map: "('a -> 'b) -> 'a list -> 'b list"} \\
  @{index_ML Par_List.get_some: "('a -> 'b option) -> 'a list -> 'b option"} \\
  \end{mldecls}

  \begin{description}

  \item @{ML Par_List.map}~@{text "f [x\<^sub>1, \<dots>, x\<^sub>n]"} is like @{ML
  "map"}~@{text "f [x\<^sub>1, \<dots>, x\<^sub>n]"}, but the evaluation of @{text "f x\<^sub>i"}
  for @{text "i = 1, \<dots>, n"} is performed in parallel.

  An exception in any @{text "f x\<^sub>i"} cancels the overall evaluation
  process.  The final result is produced via @{ML
  Par_Exn.release_first} as explained above, which means the first
  program exception that happened to occur in the parallel evaluation
  is propagated, and all other failures are ignored.

  \item @{ML Par_List.get_some}~@{text "f [x\<^sub>1, \<dots>, x\<^sub>n]"} produces some
  @{text "f x\<^sub>i"} that is of the form @{text "SOME y\<^sub>i"}, if that
  exists, otherwise @{text "NONE"}.  Thus it is similar to @{ML
  Library.get_first}, but subject to a non-deterministic parallel
  choice process.  The first successful result cancels the overall
  evaluation process; other exceptions are propagated as for @{ML
  Par_List.map}.

  This generic parallel choice combinator is the basis for derived
  forms, such as @{ML Par_List.find_some}, @{ML Par_List.exists}, @{ML
  Par_List.forall}.

  \end{description}
\<close>


text %mlex \<open>Subsequently, the Ackermann function is evaluated in
  parallel for some ranges of arguments.\<close>

ML_val \<open>
  fun ackermann 0 n = n + 1
    | ackermann m 0 = ackermann (m - 1) 1
    | ackermann m n = ackermann (m - 1) (ackermann m (n - 1));

  Par_List.map (ackermann 2) (500 upto 1000);
  Par_List.map (ackermann 3) (5 upto 10);
\<close>
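
text \<open>The parallel choice combinator can be used in the same manner.
  The following sketch searches in parallel for some witness of a
  hypothetical predicate; which witness is produced is subject to
  non-deterministic choice.\<close>

ML_val \<open>
  (*hypothetical predicate: report multiples of 7 as witnesses*)
  fun check n = if n mod 7 = 0 then SOME n else NONE;

  Par_List.get_some check (1 upto 100);
\<close>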



subsection \<open>Lazy evaluation\<close>

text \<open>
  Classic lazy evaluation works via the @{text lazy}~/ @{text force} pair of
  operations: @{text lazy} to wrap an unevaluated expression, and @{text
  force} to evaluate it once and store its result persistently. Later
  invocations of @{text force} retrieve the stored result without another
  evaluation. Isabelle/ML refines this idea to accommodate the aspects of
  multi-threading, synchronous program exceptions and asynchronous interrupts.

  The first thread that invokes @{text force} on an unfinished lazy value
  changes its state into a \emph{promise} of the eventual result and starts
  evaluating it. Any other threads that @{text force} the same lazy value in
  the meantime need to wait for it to finish, by producing a regular result or
  program exception. If the evaluation attempt is interrupted, this event is
  propagated to all waiting threads and the lazy value is reset to its
  original state.

  This means a lazy value is completely evaluated at most once, in a
  thread-safe manner. There might be multiple interrupted evaluation attempts,
  and multiple receivers of intermediate interrupt events. Interrupts are
  \emph{not} made persistent: later evaluation attempts start again from the
  original expression.
\<close>


text %mlref \<open>
  \begin{mldecls}
  @{index_ML_type "'a lazy"} \\
  @{index_ML Lazy.lazy: "(unit -> 'a) -> 'a lazy"} \\
  @{index_ML Lazy.value: "'a -> 'a lazy"} \\
  @{index_ML Lazy.force: "'a lazy -> 'a"} \\
  \end{mldecls}

  \begin{description}

  \item Type @{ML_type "'a lazy"} represents lazy values over type @{verbatim
  "'a"}.

  \item @{ML Lazy.lazy}~@{text "(fn () => e)"} wraps the unevaluated
  expression @{text e} as unfinished lazy value.

  \item @{ML Lazy.value}~@{text a} wraps the value @{text a} as finished lazy
  value.  When forced, it returns @{text a} without any further evaluation.

  There is very low overhead for this proforma wrapping of strict values as
  lazy values.

  \item @{ML Lazy.force}~@{text x} produces the result of the lazy value in a
  thread-safe manner as explained above. Thus it may cause the current thread
  to wait on a pending evaluation attempt by another thread.

  \end{description}
\<close>
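
text \<open>The following sketch illustrates the basic @{ML Lazy.lazy}~/
  @{ML Lazy.force} protocol; the diagnostic message makes explicit
  that evaluation happens only once.\<close>

ML_val \<open>
  val x = Lazy.lazy (fn () => (writeln "evaluating"; 2 * 21));

  Lazy.force x;  (*evaluates the expression and stores the result*)
  Lazy.force x;  (*returns the stored result without re-evaluation*)
\<close>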



subsection \<open>Futures \label{sec:futures}\<close>

text \<open>
  Futures help to organize parallel execution in a value-oriented manner, with
  @{text fork}~/ @{text join} as the main pair of operations, and some further
  variants; see also @{cite "Wenzel:2009" and "Wenzel:2013:ITP"}. Unlike lazy
  values, futures are evaluated strictly and spontaneously on separate worker
  threads. Futures may be canceled, which leads to interrupts on running
  evaluation attempts, and forces structurally related futures to fail for all
  time; already finished futures remain unchanged. Exceptions between related
  futures are propagated as well, and turned into parallel exceptions (see
  above).

  Technically, a future is a single-assignment variable together with a
  \emph{task} that serves administrative purposes, notably within the
  \emph{task queue} where new futures are registered for eventual evaluation
  and the worker threads retrieve their work.

  The pool of worker threads is limited, in correlation with the number of
  physical cores on the machine. Note that allocation of runtime resources may
  be distorted either if workers yield CPU time (e.g.\ via system sleep or
  wait operations), or if non-worker threads contend for significant runtime
  resources independently. There is a limited number of replacement worker
  threads that get activated in certain explicit wait conditions, after a
  timeout.


  \medskip Each future task belongs to some \emph{task group}, which
  represents the hierarchic structure of related tasks, together with the
  exception status at that point. By default, the task group of a newly
  created future is a new sub-group of the presently running one, but it is
  also possible to indicate different group layouts under program control.

  Cancellation of futures actually refers to the corresponding task group and
  all its sub-groups. Thus interrupts are propagated down the group hierarchy.
  Regular program exceptions are treated likewise: failure of the evaluation
  of some future task affects its own group and all sub-groups. Given a
  particular task group, its \emph{group status} cumulates all relevant
  exceptions according to its position within the group hierarchy. Interrupted
  tasks that lack regular result information will pick up parallel exceptions
  from the cumulative group status.


  \medskip A \emph{passive future} or \emph{promise} is a future with slightly
  different evaluation policies: there is only a single-assignment variable
  and some expression to evaluate for the \emph{failed} case (e.g.\ to clean
  up resources when canceled). A regular result is produced by external means,
  using a separate \emph{fulfill} operation.

  Promises are managed in the same task queue, so regular futures may depend
  on them. This allows a form of reactive programming, where some promises are
  used as minimal elements (or guards) within the future dependency graph:
  when these promises are fulfilled the evaluation of subsequent futures
  starts spontaneously, according to their own inter-dependencies.
\<close>


text %mlref \<open>
  \begin{mldecls}
  @{index_ML_type "'a future"} \\
  @{index_ML Future.fork: "(unit -> 'a) -> 'a future"} \\
  @{index_ML Future.forks: "Future.params -> (unit -> 'a) list -> 'a future list"} \\
  @{index_ML Future.join: "'a future -> 'a"} \\
  @{index_ML Future.joins: "'a future list -> 'a list"} \\
  @{index_ML Future.value: "'a -> 'a future"} \\
  @{index_ML Future.map: "('a -> 'b) -> 'a future -> 'b future"} \\
  @{index_ML Future.cancel: "'a future -> unit"} \\
  @{index_ML Future.cancel_group: "Future.group -> unit"} \\[0.5ex]
  @{index_ML Future.promise: "(unit -> unit) -> 'a future"} \\
  @{index_ML Future.fulfill: "'a future -> 'a -> unit"} \\
  \end{mldecls}

  \begin{description}

  \item Type @{ML_type "'a future"} represents future values over type
  @{verbatim "'a"}.

  \item @{ML Future.fork}~@{text "(fn () => e)"} registers the unevaluated
  expression @{text e} as unfinished future value, to be evaluated eventually
  on the parallel worker-thread farm. This is a shorthand for @{ML
  Future.forks} below, with default parameters and a single expression.

  \item @{ML Future.forks}~@{text "params exprs"} is the general interface to
  fork several futures simultaneously. The @{text params} consist of the
  following fields:

  \begin{itemize}

  \item @{text "name : string"} (default @{ML "\"\""}) specifies a common name
  for the tasks of the forked futures, which serves diagnostic purposes.

  \item @{text "group : Future.group option"} (default @{ML NONE}) specifies
  an optional task group for the forked futures. @{ML NONE} means that a new
  sub-group of the current worker-thread task context is created. If this is
  not a worker thread, the group will be a new root in the group hierarchy.

  \item @{text "deps : Future.task list"} (default @{ML "[]"}) specifies
  dependencies on other future tasks, i.e.\ the adjacency relation in the
  global task queue. Dependencies on already finished tasks are ignored.

  \item @{text "pri : int"} (default @{ML 0}) specifies a priority within the
  task queue.


  Typically there is only little deviation from the default priority @{ML 0}.
  As a rule of thumb, @{ML "~1"} means ``low priority'' and @{ML 1} means
  ``high priority''.


  Note that the task priority only affects the position in the queue, not the
  thread priority. When a worker thread picks up a task for processing, it
  runs with the normal thread priority to the end (or until canceled). Higher
  priority tasks that are queued later need to wait until this (or another)
  worker thread becomes free again.

  \item @{text "interrupts : bool"} (default @{ML true}) tells whether the
  worker thread that processes the corresponding task is initially put into
  interruptible state. This state may change again while running, by modifying
  the thread attributes.

  With interrupts disabled, a running future task cannot be canceled.  It is
  the responsibility of the programmer that this special state is retained
  only briefly.

  \end{itemize}


  \item @{ML Future.join}~@{text x} retrieves the value of an already finished
  future, which may lead to an exception, according to the result of its
  previous evaluation.

  For an unfinished future there are several cases depending on the role of
  the current thread and the status of the future. A non-worker thread waits
  passively until the future is eventually evaluated. A worker thread
  temporarily changes its task context and takes over the responsibility to
  evaluate the future expression on the spot. The latter is done in a
  thread-safe manner: other threads that intend to join the same future need
  to wait until the ongoing evaluation is finished.

  Note that excessive use of dynamic dependencies of futures by adhoc joining
  may lead to bad utilization of CPU cores, due to threads waiting on other
  threads to finish required futures. The future task farm has a limited
  amount of replacement threads that continue working on unrelated tasks after
  some timeout.

  Whenever possible, static dependencies of futures should be specified
  explicitly when forked (see @{text deps} above). Thus the evaluation can
  work from the bottom up, without join conflicts and wait states.

  \item @{ML Future.joins}~@{text xs} joins the given list of futures
  simultaneously, which is more efficient than @{ML "map Future.join"}~@{text
  xs}.

  Based on the dependency graph of tasks, the current thread takes over the
  responsibility to evaluate future expressions that are required for the main
  result, working from the bottom up. Waiting on future results that are
  presently evaluated on other threads only happens as last resort, when no
  other unfinished futures are left over.


  \item @{ML Future.value}~@{text a} wraps the value @{text a} as finished
  future value, bypassing the worker-thread farm. When joined, it returns
  @{text a} without any further evaluation.

  There is very low overhead for this proforma wrapping of strict values as
  futures.

  \item @{ML Future.map}~@{text "f x"} is a fast-path implementation of @{ML
  Future.fork}~@{text "(fn () => f ("}@{ML Future.join}~@{text "x))"}, which
  avoids the full overhead of the task queue and worker-thread farm as far as
  possible. The function @{text f} is supposed to be some trivial
  post-processing or projection of the future result.

  \item @{ML Future.cancel}~@{text "x"} cancels the task group of the given
  future, using @{ML Future.cancel_group} below.

  \item @{ML Future.cancel_group}~@{text "group"} cancels all tasks of the
  given task group for all time. Threads that are presently processing a task
  of the given group are interrupted: it may take some time until they are
  actually terminated. Tasks that are queued but not yet processed are
  dequeued and forced into interrupted state. Since the task group is itself
  invalidated, any further attempt to fork a future that belongs to it will
  yield a canceled result as well.

  \item @{ML Future.promise}~@{text abort} registers a passive future with the
  given @{text abort} operation: it is invoked when the future task group is
  canceled.

  \item @{ML Future.fulfill}~@{text "x a"} finishes the passive future @{text
  x} by the given value @{text a}. If the promise has already been canceled,
  the attempt to fulfill it causes an exception.

  \end{description}
\<close>
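
text \<open>The following sketch forks some independent tasks and joins
  their results; the expressions are merely illustrative.\<close>

ML_val \<open>
  (*fork several independent tasks onto the worker-thread farm*)
  val xs = map (fn n => Future.fork (fn () => n * n)) (1 upto 8);

  (*fast-path post-processing of an already finished future*)
  val y = Future.map (fn m => m + 1) (Future.value 41);

  (*join everything; evaluation works from the bottom up*)
  Future.joins xs @ [Future.join y];
\<close>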


end