src/Doc/IsarImplementation/ML.thy
changeset 56420 b266e7a86485
parent 56419 f47de9e82b0f
child 56431 4eb88149c7b2
equal deleted inserted replaced
56419:f47de9e82b0f 56420:b266e7a86485
     1 theory "ML"
       
     2 imports Base
       
     3 begin
       
     4 
       
     5 chapter {* Isabelle/ML *}
       
     6 
       
     7 text {* Isabelle/ML is best understood as a certain culture based on
       
     8   Standard ML.  Thus it is not a new programming language, but a
       
     9   certain way to use SML at an advanced level within the Isabelle
       
    10   environment.  This covers a variety of aspects that are geared
       
    11   towards an efficient and robust platform for applications of formal
       
    12   logic with fully foundational proof construction --- according to
       
    13   the well-known \emph{LCF principle}.  There is specific
       
    14   infrastructure with library modules to address the needs of this
       
    15   difficult task.  For example, the raw parallel programming model of
       
    16   Poly/ML is presented as considerably more abstract concept of
       
    17   \emph{future values}, which is then used to augment the inference
       
    18   kernel, proof interpreter, and theory loader accordingly.
       
    19 
       
    20   The main aspects of Isabelle/ML are introduced below.  These
       
    21   first-hand explanations should help to understand how proper
       
    22   Isabelle/ML is to be read and written, and to get access to the
       
    23   wealth of experience that is expressed in the source text and its
       
    24   history of changes.\footnote{See
       
    25   @{url "http://isabelle.in.tum.de/repos/isabelle"} for the full
       
    26   Mercurial history.  There are symbolic tags to refer to official
       
    27   Isabelle releases, as opposed to arbitrary \emph{tip} versions that
       
    28   merely reflect snapshots that are never really up-to-date.}  *}
       
    29 
       
    30 
       
    31 section {* Style and orthography *}
       
    32 
       
    33 text {* The sources of Isabelle/Isar are optimized for
       
    34   \emph{readability} and \emph{maintainability}.  The main purpose is
       
    35   to tell an informed reader what is really going on and how things
       
    36   really work.  This is a non-trivial aim, but it is supported by a
       
    37   certain style of writing Isabelle/ML that has emerged from long
       
    38   years of system development.\footnote{See also the interesting style
       
    39   guide for OCaml
       
    40   @{url "http://caml.inria.fr/resources/doc/guides/guidelines.en.html"}
       
    41   which shares many of our means and ends.}
       
    42 
       
    43   The main principle behind any coding style is \emph{consistency}.
       
    44   For a single author of a small program this merely means ``choose
       
    45   your style and stick to it''.  A complex project like Isabelle, with
       
    46   long years of development and different contributors, requires more
       
    47   standardization.  A coding style that is changed every few years or
       
    48   with every new contributor is no style at all, because consistency
       
    49   is quickly lost.  Global consistency is hard to achieve, though.
       
    50   Nonetheless, one should always strive at least for local consistency
       
    51   of modules and sub-systems, without deviating from some general
       
    52   principles how to write Isabelle/ML.
       
    53 
       
    54   In a sense, good coding style is like an \emph{orthography} for the
       
    55   sources: it helps to read quickly over the text and see through the
       
    56   main points, without getting distracted by accidental presentation
       
    57   of free-style code.
       
    58 *}
       
    59 
       
    60 
       
    61 subsection {* Header and sectioning *}
       
    62 
       
    63 text {* Isabelle source files have a certain standardized header
       
    64   format (with precise spacing) that follows ancient traditions
       
    65   reaching back to the earliest versions of the system by Larry
       
    66   Paulson.  See @{file "~~/src/Pure/thm.ML"}, for example.
       
    67 
       
    68   The header includes at least @{verbatim Title} and @{verbatim
       
    69   Author} entries, followed by a prose description of the purpose of
       
    70   the module.  The latter can range from a single line to several
       
    71   paragraphs of explanations.
       
    72 
       
    73   The rest of the file is divided into sections, subsections,
       
    74   subsubsections, paragraphs etc.\ using a simple layout via ML
       
    75   comments as follows.
       
    76 
       
    77 \begin{verbatim}
       
    78 (*** section ***)
       
    79 
       
    80 (** subsection **)
       
    81 
       
    82 (* subsubsection *)
       
    83 
       
    84 (*short paragraph*)
       
    85 
       
    86 (*
       
    87   long paragraph,
       
    88   with more text
       
    89 *)
       
    90 \end{verbatim}
       
    91 
       
    92   As in regular typography, there is some extra space \emph{before}
       
    93   section headings that are adjacent to plain text (not other headings
       
    94   as in the example above).
       
    95 
       
    96   \medskip The precise wording of the prose text given in these
       
    97   headings is chosen carefully to introduce the main theme of the
       
    98   subsequent formal ML text.
       
    99 *}
       
   100 
       
   101 
       
   102 subsection {* Naming conventions *}
       
   103 
       
   104 text {* Since ML is the primary medium to express the meaning of the
       
   105   source text, naming of ML entities requires special care.
       
   106 
       
   107   \paragraph{Notation.}  A name consists of 1--3 \emph{words} (rarely
       
   108   4, but not more) that are separated by underscore.  There are three
       
   109   variants concerning upper or lower case letters, which are used for
       
   110   certain ML categories as follows:
       
   111 
       
   112   \medskip
       
   113   \begin{tabular}{lll}
       
   114   variant & example & ML categories \\\hline
       
   115   lower-case & @{ML_text foo_bar} & values, types, record fields \\
       
   116   capitalized & @{ML_text Foo_Bar} & datatype constructors, structures, functors \\
       
   117   upper-case & @{ML_text FOO_BAR} & special values, exception constructors, signatures \\
       
   118   \end{tabular}
       
   119   \medskip
       
   120 
       
   121   For historical reasons, many capitalized names omit underscores,
       
   122   e.g.\ old-style @{ML_text FooBar} instead of @{ML_text Foo_Bar}.
       
   123   Genuine mixed-case names are \emph{not} used, because clear division
       
   124   of words is essential for readability.\footnote{Camel-case was
       
   125   invented to workaround the lack of underscore in some early
       
   126   non-ASCII character sets.  Later it became habitual in some language
       
   127   communities that are now strong in numbers.}
       
   128 
       
   129   A single (capital) character does not count as ``word'' in this
       
   130   respect: some Isabelle/ML names are suffixed by extra markers like
       
   131   this: @{ML_text foo_barT}.
       
   132 
       
   133   Name variants are produced by adding 1--3 primes, e.g.\ @{ML_text
       
   134   foo'}, @{ML_text foo''}, or @{ML_text foo'''}, but not @{ML_text
       
   135   foo''''} or more.  Decimal digits scale better to larger numbers,
       
   136   e.g.\ @{ML_text foo0}, @{ML_text foo1}, @{ML_text foo42}.
       
   137 
       
   138   \paragraph{Scopes.}  Apart from very basic library modules, ML
       
   139   structures are not ``opened'', but names are referenced with
       
   140   explicit qualification, as in @{ML Syntax.string_of_term} for
       
   141   example.  When devising names for structures and their components it
       
   142   is important aim at eye-catching compositions of both parts, because
       
   143   this is how they are seen in the sources and documentation.  For the
       
   144   same reasons, aliases of well-known library functions should be
       
   145   avoided.
       
   146 
       
   147   Local names of function abstraction or case/let bindings are
       
   148   typically shorter, sometimes using only rudiments of ``words'',
       
   149   while still avoiding cryptic shorthands.  An auxiliary function
       
   150   called @{ML_text helper}, @{ML_text aux}, or @{ML_text f} is
       
   151   considered bad style.
       
   152 
       
   153   Example:
       
   154 
       
   155   \begin{verbatim}
       
   156   (* RIGHT *)
       
   157 
       
   158   fun print_foo ctxt foo =
       
   159     let
       
   160       fun print t = ... Syntax.string_of_term ctxt t ...
       
   161     in ... end;
       
   162 
       
   163 
       
   164   (* RIGHT *)
       
   165 
       
   166   fun print_foo ctxt foo =
       
   167     let
       
   168       val string_of_term = Syntax.string_of_term ctxt;
       
   169       fun print t = ... string_of_term t ...
       
   170     in ... end;
       
   171 
       
   172 
       
   173   (* WRONG *)
       
   174 
       
   175   val string_of_term = Syntax.string_of_term;
       
   176 
       
   177   fun print_foo ctxt foo =
       
   178     let
       
   179       fun aux t = ... string_of_term ctxt t ...
       
   180     in ... end;
       
   181 
       
   182   \end{verbatim}
       
   183 
       
   184 
       
   185   \paragraph{Specific conventions.} Here are some specific name forms
       
   186   that occur frequently in the sources.
       
   187 
       
   188   \begin{itemize}
       
   189 
       
   190   \item A function that maps @{ML_text foo} to @{ML_text bar} is
       
   191   called @{ML_text foo_to_bar} or @{ML_text bar_of_foo} (never
       
   192   @{ML_text foo2bar}, @{ML_text bar_from_foo}, @{ML_text
       
   193   bar_for_foo}, or @{ML_text bar4foo}).
       
   194 
       
   195   \item The name component @{ML_text legacy} means that the operation
       
   196   is about to be discontinued soon.
       
   197 
       
   198   \item The name component @{ML_text old} means that this is historic
       
   199   material that might disappear at some later stage.
       
   200 
       
   201   \item The name component @{ML_text global} means that this works
       
   202   with the background theory instead of the regular local context
       
   203   (\secref{sec:context}), sometimes for historical reasons, sometimes
       
   204   due a genuine lack of locality of the concept involved, sometimes as
       
   205   a fall-back for the lack of a proper context in the application
       
   206   code.  Whenever there is a non-global variant available, the
       
   207   application should be migrated to use it with a proper local
       
   208   context.
       
   209 
       
   210   \item Variables of the main context types of the Isabelle/Isar
       
   211   framework (\secref{sec:context} and \chref{ch:local-theory}) have
       
   212   firm naming conventions as follows:
       
   213 
       
   214   \begin{itemize}
       
   215 
       
   216   \item theories are called @{ML_text thy}, rarely @{ML_text theory}
       
   217   (never @{ML_text thry})
       
   218 
       
   219   \item proof contexts are called @{ML_text ctxt}, rarely @{ML_text
       
   220   context} (never @{ML_text ctx})
       
   221 
       
   222   \item generic contexts are called @{ML_text context}, rarely
       
   223   @{ML_text ctxt}
       
   224 
       
   225   \item local theories are called @{ML_text lthy}, except for local
       
   226   theories that are treated as proof context (which is a semantic
       
   227   super-type)
       
   228 
       
   229   \end{itemize}
       
   230 
       
   231   Variations with primed or decimal numbers are always possible, as
       
   232   well as sematic prefixes like @{ML_text foo_thy} or @{ML_text
       
   233   bar_ctxt}, but the base conventions above need to be preserved.
       
   234   This allows to visualize the their data flow via plain regular
       
   235   expressions in the editor.
       
   236 
       
   237   \item The main logical entities (\secref{ch:logic}) have established
       
   238   naming convention as follows:
       
   239 
       
   240   \begin{itemize}
       
   241 
       
   242   \item sorts are called @{ML_text S}
       
   243 
       
   244   \item types are called @{ML_text T}, @{ML_text U}, or @{ML_text
       
   245   ty} (never @{ML_text t})
       
   246 
       
   247   \item terms are called @{ML_text t}, @{ML_text u}, or @{ML_text
       
   248   tm} (never @{ML_text trm})
       
   249 
       
   250   \item certified types are called @{ML_text cT}, rarely @{ML_text
       
   251   T}, with variants as for types
       
   252 
       
   253   \item certified terms are called @{ML_text ct}, rarely @{ML_text
       
   254   t}, with variants as for terms (never @{ML_text ctrm})
       
   255 
       
   256   \item theorems are called @{ML_text th}, or @{ML_text thm}
       
   257 
       
   258   \end{itemize}
       
   259 
       
   260   Proper semantic names override these conventions completely.  For
       
   261   example, the left-hand side of an equation (as a term) can be called
       
   262   @{ML_text lhs} (not @{ML_text lhs_tm}).  Or a term that is known
       
   263   to be a variable can be called @{ML_text v} or @{ML_text x}.
       
   264 
       
   265   \item Tactics (\secref{sec:tactics}) are sufficiently important to
       
   266   have specific naming conventions.  The name of a basic tactic
       
   267   definition always has a @{ML_text "_tac"} suffix, the subgoal index
       
   268   (if applicable) is always called @{ML_text i}, and the goal state
       
   269   (if made explicit) is usually called @{ML_text st} instead of the
       
   270   somewhat misleading @{ML_text thm}.  Any other arguments are given
       
   271   before the latter two, and the general context is given first.
       
   272   Example:
       
   273 
       
   274   \begin{verbatim}
       
   275   fun my_tac ctxt arg1 arg2 i st = ...
       
   276   \end{verbatim}
       
   277 
       
   278   Note that the goal state @{ML_text st} above is rarely made
       
   279   explicit, if tactic combinators (tacticals) are used as usual.
       
   280 
       
   281   \end{itemize}
       
   282 *}
       
   283 
       
   284 
       
   285 subsection {* General source layout *}
       
   286 
       
   287 text {* The general Isabelle/ML source layout imitates regular
       
   288   type-setting to some extent, augmented by the requirements for
       
   289   deeply nested expressions that are commonplace in functional
       
   290   programming.
       
   291 
       
   292   \paragraph{Line length} is 80 characters according to ancient
       
   293   standards, but we allow as much as 100 characters (not
       
   294   more).\footnote{Readability requires to keep the beginning of a line
       
   295   in view while watching its end.  Modern wide-screen displays do not
       
   296   change the way how the human brain works.  Sources also need to be
       
   297   printable on plain paper with reasonable font-size.} The extra 20
       
   298   characters acknowledge the space requirements due to qualified
       
   299   library references in Isabelle/ML.
       
   300 
       
   301   \paragraph{White-space} is used to emphasize the structure of
       
   302   expressions, following mostly standard conventions for mathematical
       
   303   typesetting, as can be seen in plain {\TeX} or {\LaTeX}.  This
       
   304   defines positioning of spaces for parentheses, punctuation, and
       
   305   infixes as illustrated here:
       
   306 
       
   307   \begin{verbatim}
       
   308   val x = y + z * (a + b);
       
   309   val pair = (a, b);
       
   310   val record = {foo = 1, bar = 2};
       
   311   \end{verbatim}
       
   312 
       
   313   Lines are normally broken \emph{after} an infix operator or
       
   314   punctuation character.  For example:
       
   315 
       
   316   \begin{verbatim}
       
   317   val x =
       
   318     a +
       
   319     b +
       
   320     c;
       
   321 
       
   322   val tuple =
       
   323    (a,
       
   324     b,
       
   325     c);
       
   326   \end{verbatim}
       
   327 
       
   328   Some special infixes (e.g.\ @{ML_text "|>"}) work better at the
       
   329   start of the line, but punctuation is always at the end.
       
   330 
       
   331   Function application follows the tradition of @{text "\<lambda>"}-calculus,
       
   332   not informal mathematics.  For example: @{ML_text "f a b"} for a
       
   333   curried function, or @{ML_text "g (a, b)"} for a tupled function.
       
   334   Note that the space between @{ML_text g} and the pair @{ML_text
       
   335   "(a, b)"} follows the important principle of
       
   336   \emph{compositionality}: the layout of @{ML_text "g p"} does not
       
   337   change when @{ML_text "p"} is refined to the concrete pair
       
   338   @{ML_text "(a, b)"}.
       
   339 
       
   340   \paragraph{Indentation} uses plain spaces, never hard
       
   341   tabulators.\footnote{Tabulators were invented to move the carriage
       
   342   of a type-writer to certain predefined positions.  In software they
       
   343   could be used as a primitive run-length compression of consecutive
       
   344   spaces, but the precise result would depend on non-standardized
       
   345   editor configuration.}
       
   346 
       
   347   Each level of nesting is indented by 2 spaces, sometimes 1, very
       
   348   rarely 4, never 8 or any other odd number.
       
   349 
       
   350   Indentation follows a simple logical format that only depends on the
       
   351   nesting depth, not the accidental length of the text that initiates
       
   352   a level of nesting.  Example:
       
   353 
       
   354   \begin{verbatim}
       
   355   (* RIGHT *)
       
   356 
       
   357   if b then
       
   358     expr1_part1
       
   359     expr1_part2
       
   360   else
       
   361     expr2_part1
       
   362     expr2_part2
       
   363 
       
   364 
       
   365   (* WRONG *)
       
   366 
       
   367   if b then expr1_part1
       
   368             expr1_part2
       
   369   else expr2_part1
       
   370        expr2_part2
       
   371   \end{verbatim}
       
   372 
       
   373   The second form has many problems: it assumes a fixed-width font
       
   374   when viewing the sources, it uses more space on the line and thus
       
   375   makes it hard to observe its strict length limit (working against
       
   376   \emph{readability}), it requires extra editing to adapt the layout
       
   377   to changes of the initial text (working against
       
   378   \emph{maintainability}) etc.
       
   379 
       
   380   \medskip For similar reasons, any kind of two-dimensional or tabular
       
   381   layouts, ASCII-art with lines or boxes of asterisks etc.\ should be
       
   382   avoided.
       
   383 
       
   384   \paragraph{Complex expressions} that consist of multi-clausal
       
   385   function definitions, @{ML_text handle}, @{ML_text case},
       
   386   @{ML_text let} (and combinations) require special attention.  The
       
   387   syntax of Standard ML is quite ambitious and admits a lot of
       
   388   variance that can distort the meaning of the text.
       
   389 
       
   390   Clauses of @{ML_text fun}, @{ML_text fn}, @{ML_text handle},
       
   391   @{ML_text case} get extra indentation to indicate the nesting
       
   392   clearly.  Example:
       
   393 
       
   394   \begin{verbatim}
       
   395   (* RIGHT *)
       
   396 
       
   397   fun foo p1 =
       
   398         expr1
       
   399     | foo p2 =
       
   400         expr2
       
   401 
       
   402 
       
   403   (* WRONG *)
       
   404 
       
   405   fun foo p1 =
       
   406     expr1
       
   407     | foo p2 =
       
   408     expr2
       
   409   \end{verbatim}
       
   410 
       
   411   Body expressions consisting of @{ML_text case} or @{ML_text let}
       
   412   require care to maintain compositionality, to prevent loss of
       
   413   logical indentation where it is especially important to see the
       
   414   structure of the text.  Example:
       
   415 
       
   416   \begin{verbatim}
       
   417   (* RIGHT *)
       
   418 
       
   419   fun foo p1 =
       
   420         (case e of
       
   421           q1 => ...
       
   422         | q2 => ...)
       
   423     | foo p2 =
       
   424         let
       
   425           ...
       
   426         in
       
   427           ...
       
   428         end
       
   429 
       
   430 
       
   431   (* WRONG *)
       
   432 
       
   433   fun foo p1 = case e of
       
   434       q1 => ...
       
   435     | q2 => ...
       
   436     | foo p2 =
       
   437     let
       
   438       ...
       
   439     in
       
   440       ...
       
   441     end
       
   442   \end{verbatim}
       
   443 
       
   444   Extra parentheses around @{ML_text case} expressions are optional,
       
   445   but help to analyse the nesting based on character matching in the
       
   446   editor.
       
   447 
       
   448   \medskip There are two main exceptions to the overall principle of
       
   449   compositionality in the layout of complex expressions.
       
   450 
       
   451   \begin{enumerate}
       
   452 
       
   453   \item @{ML_text "if"} expressions are iterated as if there would be
       
   454   a multi-branch conditional in SML, e.g.
       
   455 
       
   456   \begin{verbatim}
       
   457   (* RIGHT *)
       
   458 
       
   459   if b1 then e1
       
   460   else if b2 then e2
       
   461   else e3
       
   462   \end{verbatim}
       
   463 
       
   464   \item @{ML_text fn} abstractions are often layed-out as if they
       
   465   would lack any structure by themselves.  This traditional form is
       
   466   motivated by the possibility to shift function arguments back and
       
   467   forth wrt.\ additional combinators.  Example:
       
   468 
       
   469   \begin{verbatim}
       
   470   (* RIGHT *)
       
   471 
       
   472   fun foo x y = fold (fn z =>
       
   473     expr)
       
   474   \end{verbatim}
       
   475 
       
   476   Here the visual appearance is that of three arguments @{ML_text x},
       
   477   @{ML_text y}, @{ML_text z}.
       
   478 
       
   479   \end{enumerate}
       
   480 
       
   481   Such weakly structured layout should be use with great care.  Here
       
   482   are some counter-examples involving @{ML_text let} expressions:
       
   483 
       
   484   \begin{verbatim}
       
   485   (* WRONG *)
       
   486 
       
   487   fun foo x = let
       
   488       val y = ...
       
   489     in ... end
       
   490 
       
   491 
       
   492   (* WRONG *)
       
   493 
       
   494   fun foo x = let
       
   495     val y = ...
       
   496   in ... end
       
   497 
       
   498 
       
   499   (* WRONG *)
       
   500 
       
   501   fun foo x =
       
   502   let
       
   503     val y = ...
       
   504   in ... end
       
   505   \end{verbatim}
       
   506 
       
   507   \medskip In general the source layout is meant to emphasize the
       
   508   structure of complex language expressions, not to pretend that SML
       
   509   had a completely different syntax (say that of Haskell or Java).
       
   510 *}
       
   511 
       
   512 
       
   513 section {* SML embedded into Isabelle/Isar *}
       
   514 
       
   515 text {* ML and Isar are intertwined via an open-ended bootstrap
       
   516   process that provides more and more programming facilities and
       
   517   logical content in an alternating manner.  Bootstrapping starts from
       
   518   the raw environment of existing implementations of Standard ML
       
   519   (mainly Poly/ML, but also SML/NJ).
       
   520 
       
   521   Isabelle/Pure marks the point where the original ML toplevel is
       
   522   superseded by the Isar toplevel that maintains a uniform context for
       
   523   arbitrary ML values (see also \secref{sec:context}).  This formal
       
   524   environment holds ML compiler bindings, logical entities, and many
       
   525   other things.  Raw SML is never encountered again after the initial
       
   526   bootstrap of Isabelle/Pure.
       
   527 
       
   528   Object-logics like Isabelle/HOL are built within the
       
   529   Isabelle/ML/Isar environment by introducing suitable theories with
       
   530   associated ML modules, either inlined or as separate files.  Thus
       
   531   Isabelle/HOL is defined as a regular user-space application within
       
   532   the Isabelle framework.  Further add-on tools can be implemented in
       
   533   ML within the Isar context in the same manner: ML is part of the
       
   534   standard repertoire of Isabelle, and there is no distinction between
       
   535   ``user'' and ``developer'' in this respect.
       
   536 *}
       
   537 
       
   538 
       
   539 subsection {* Isar ML commands *}
       
   540 
       
   541 text {* The primary Isar source language provides facilities to ``open
       
   542   a window'' to the underlying ML compiler.  Especially see the Isar
       
   543   commands @{command_ref "ML_file"} and @{command_ref "ML"}: both work the
       
   544   same way, only the source text is provided via a file vs.\ inlined,
       
   545   respectively.  Apart from embedding ML into the main theory
       
   546   definition like that, there are many more commands that refer to ML
       
   547   source, such as @{command_ref setup} or @{command_ref declaration}.
       
   548   Even more fine-grained embedding of ML into Isar is encountered in
       
   549   the proof method @{method_ref tactic}, which refines the pending
       
   550   goal state via a given expression of type @{ML_type tactic}.
       
   551 *}
       
   552 
       
   553 text %mlex {* The following artificial example demonstrates some ML
       
   554   toplevel declarations within the implicit Isar theory context.  This
       
   555   is regular functional programming without referring to logical
       
   556   entities yet.
       
   557 *}
       
   558 
       
   559 ML {*
       
   560   fun factorial 0 = 1
       
   561     | factorial n = n * factorial (n - 1)
       
   562 *}
       
   563 
       
   564 text {* Here the ML environment is already managed by Isabelle, i.e.\
       
   565   the @{ML factorial} function is not yet accessible in the preceding
       
   566   paragraph, nor in a different theory that is independent from the
       
   567   current one in the import hierarchy.
       
   568 
       
   569   Removing the above ML declaration from the source text will remove
       
   570   any trace of this definition as expected.  The Isabelle/ML toplevel
       
   571   environment is managed in a \emph{stateless} way: unlike the raw ML
       
   572   toplevel there are no global side-effects involved
       
   573   here.\footnote{Such a stateless compilation environment is also a
       
   574   prerequisite for robust parallel compilation within independent
       
   575   nodes of the implicit theory development graph.}
       
   576 
       
   577   \medskip The next example shows how to embed ML into Isar proofs, using
       
   578  @{command_ref "ML_prf"} instead of Instead of @{command_ref "ML"}.
       
   579   As illustrated below, the effect on the ML environment is local to
       
   580   the whole proof body, ignoring the block structure.
       
   581 *}
       
   582 
       
   583 notepad
       
   584 begin
       
   585   ML_prf %"ML" {* val a = 1 *}
       
   586   {
       
   587     ML_prf %"ML" {* val b = a + 1 *}
       
   588   } -- {* Isar block structure ignored by ML environment *}
       
   589   ML_prf %"ML" {* val c = b + 1 *}
       
   590 end
       
   591 
       
   592 text {* By side-stepping the normal scoping rules for Isar proof
       
   593   blocks, embedded ML code can refer to the different contexts and
       
   594   manipulate corresponding entities, e.g.\ export a fact from a block
       
   595   context.
       
   596 
       
   597   \medskip Two further ML commands are useful in certain situations:
       
   598   @{command_ref ML_val} and @{command_ref ML_command} are
       
   599   \emph{diagnostic} in the sense that there is no effect on the
       
   600   underlying environment, and can thus used anywhere (even outside a
       
   601   theory).  The examples below produce long strings of digits by
       
   602   invoking @{ML factorial}: @{command ML_val} already takes care of
       
   603   printing the ML toplevel result, but @{command ML_command} is silent
       
   604   so we produce an explicit output message.  *}
       
   605 
       
   606 ML_val {* factorial 100 *}
       
   607 ML_command {* writeln (string_of_int (factorial 100)) *}
       
   608 
       
   609 notepad
       
   610 begin
       
   611   ML_val {* factorial 100 *}
       
   612   ML_command {* writeln (string_of_int (factorial 100)) *}
       
   613 end
       
   614 
       
   615 
       
   616 subsection {* Compile-time context *}
       
   617 
       
   618 text {* Whenever the ML compiler is invoked within Isabelle/Isar, the
       
   619   formal context is passed as a thread-local reference variable.  Thus
       
   620   ML code may access the theory context during compilation, by reading
       
   621   or writing the (local) theory under construction.  Note that such
       
   622   direct access to the compile-time context is rare.  In practice it
       
   623   is typically done via some derived ML functions instead.
       
   624 *}
       
   625 
       
   626 text %mlref {*
       
   627   \begin{mldecls}
       
   628   @{index_ML ML_Context.the_generic_context: "unit -> Context.generic"} \\
       
   629   @{index_ML "Context.>>": "(Context.generic -> Context.generic) -> unit"} \\
       
   630   @{index_ML ML_Thms.bind_thms: "string * thm list -> unit"} \\
       
   631   @{index_ML ML_Thms.bind_thm: "string * thm -> unit"} \\
       
   632   \end{mldecls}
       
   633 
       
   634   \begin{description}
       
   635 
       
   636   \item @{ML "ML_Context.the_generic_context ()"} refers to the theory
       
   637   context of the ML toplevel --- at compile time.  ML code needs to
       
   638   take care to refer to @{ML "ML_Context.the_generic_context ()"}
       
   639   correctly.  Recall that evaluation of a function body is delayed
       
   640   until actual run-time.
       
   641 
       
   642   \item @{ML "Context.>>"}~@{text f} applies context transformation
       
   643   @{text f} to the implicit context of the ML toplevel.
       
   644 
       
   645   \item @{ML ML_Thms.bind_thms}~@{text "(name, thms)"} stores a list of
       
   646   theorems produced in ML both in the (global) theory context and the
       
   647   ML toplevel, associating it with the provided name.  Theorems are
       
   648   put into a global ``standard'' format before being stored.
       
   649 
       
   650   \item @{ML ML_Thms.bind_thm} is similar to @{ML ML_Thms.bind_thms} but refers to a
       
   651   singleton fact.
       
   652 
       
   653   \end{description}
       
   654 
       
   655   It is important to note that the above functions are really
       
   656   restricted to the compile time, even though the ML compiler is
       
   657   invoked at run-time.  The majority of ML code either uses static
       
   658   antiquotations (\secref{sec:ML-antiq}) or refers to the theory or
       
   659   proof context at run-time, by explicit functional abstraction.
       
   660 *}
       
   661 
       
   662 
       
   663 subsection {* Antiquotations \label{sec:ML-antiq} *}
       
   664 
       
   665 text {* A very important consequence of embedding SML into Isar is the
       
   666   concept of \emph{ML antiquotation}.  The standard token language of
       
   667   ML is augmented by special syntactic entities of the following form:
       
   668 
       
   669   @{rail \<open>
       
   670   @{syntax_def antiquote}: '@{' nameref args '}'
       
   671   \<close>}
       
   672 
       
   673   Here @{syntax nameref} and @{syntax args} are regular outer syntax
       
   674   categories \cite{isabelle-isar-ref}.  Attributes and proof methods
       
   675   use similar syntax.
       
   676 
       
   677   \medskip A regular antiquotation @{text "@{name args}"} processes
       
   678   its arguments by the usual means of the Isar source language, and
       
   679   produces corresponding ML source text, either as literal
       
   680   \emph{inline} text (e.g. @{text "@{term t}"}) or abstract
       
   681   \emph{value} (e.g. @{text "@{thm th}"}).  This pre-compilation
       
   682   scheme allows to refer to formal entities in a robust manner, with
       
   683   proper static scoping and with some degree of logical checking of
       
   684   small portions of the code.
       
   685 *}
       
   686 
       
   687 
       
   688 subsection {* Printing ML values *}
       
   689 
       
   690 text {* The ML compiler knows about the structure of values according
       
   691   to their static type, and can print them in the manner of the
       
   692   toplevel loop, although the details are non-portable.  The
       
   693   antiquotations @{ML_antiquotation_def "make_string"} and
       
   694   @{ML_antiquotation_def "print"} provide a quasi-portable way to
       
   695   refer to this potential capability of the underlying ML system in
       
   696   generic Isabelle/ML sources.
       
   697 
       
   698   This is occasionally useful for diagnostic or demonstration
       
   699   purposes.  Note that production-quality tools require proper
       
   700   user-level error messages. *}
       
   701 
       
   702 text %mlantiq {*
       
   703   \begin{matharray}{rcl}
       
   704   @{ML_antiquotation_def "make_string"} & : & @{text ML_antiquotation} \\
       
   705   @{ML_antiquotation_def "print"} & : & @{text ML_antiquotation} \\
       
   706   \end{matharray}
       
   707 
       
   708   @{rail \<open>
       
   709   @@{ML_antiquotation make_string}
       
   710   ;
       
   711   @@{ML_antiquotation print} @{syntax name}?
       
   712   \<close>}
       
   713 
       
   714   \begin{description}
       
   715 
       
   716   \item @{text "@{make_string}"} inlines a function to print arbitrary
       
   717   values similar to the ML toplevel.  The result is compiler dependent
       
   718   and may fall back on "?" in certain situations.
       
   719 
       
   720   \item @{text "@{print f}"} uses the ML function @{text "f: string ->
       
   721   unit"} to output the result of @{text "@{make_string}"} above,
       
   722   together with the source position of the antiquotation.  The default
       
   723   output function is @{ML writeln}.
       
   724 
       
   725   \end{description}
       
   726 *}
       
   727 
       
   728 text %mlex {* The following artificial examples show how to produce
       
   729   adhoc output of ML values for debugging purposes. *}
       
   730 
       
   731 ML {*
       
   732   val x = 42;
       
   733   val y = true;
       
   734 
       
   735   writeln (@{make_string} {x = x, y = y});
       
   736 
       
   737   @{print} {x = x, y = y};
       
   738   @{print tracing} {x = x, y = y};
       
   739 *}
       
   740 
       
   741 
       
   742 section {* Canonical argument order \label{sec:canonical-argument-order} *}
       
   743 
       
   744 text {* Standard ML is a language in the tradition of @{text
       
   745   "\<lambda>"}-calculus and \emph{higher-order functional programming},
       
   746   similar to OCaml, Haskell, or Isabelle/Pure and HOL as logical
       
   747   languages.  Getting acquainted with the native style of representing
       
   748   functions in that setting can save a lot of extra boiler-plate of
       
   749   redundant shuffling of arguments, auxiliary abstractions etc.
       
   750 
       
   751   Functions are usually \emph{curried}: the idea of turning arguments
       
   752   of type @{text "\<tau>\<^sub>i"} (for @{text "i \<in> {1, \<dots> n}"}) into a result of
       
   753   type @{text "\<tau>"} is represented by the iterated function space
       
   754   @{text "\<tau>\<^sub>1 \<rightarrow> \<dots> \<rightarrow> \<tau>\<^sub>n \<rightarrow> \<tau>"}.  This is isomorphic to the well-known
       
   755   encoding via tuples @{text "\<tau>\<^sub>1 \<times> \<dots> \<times> \<tau>\<^sub>n \<rightarrow> \<tau>"}, but the curried
       
   756   version fits more smoothly into the basic calculus.\footnote{The
       
   757   difference is even more significant in higher-order logic, because
       
   758   the redundant tuple structure needs to be accommodated by formal
       
   759   reasoning.}
       
   760 
       
   761   Currying gives some flexiblity due to \emph{partial application}.  A
       
   762   function @{text "f: \<tau>\<^sub>1 \<rightarrow> \<tau>\<^sub>2 \<rightarrow> \<tau>"} can be applied to @{text "x: \<tau>\<^sub>1"}
       
   763   and the remaining @{text "(f x): \<tau>\<^sub>2 \<rightarrow> \<tau>"} passed to another function
       
   764   etc.  How well this works in practice depends on the order of
       
   765   arguments.  In the worst case, arguments are arranged erratically,
       
   766   and using a function in a certain situation always requires some
       
   767   glue code.  Thus we would get exponentially many oppurtunities to
       
   768   decorate the code with meaningless permutations of arguments.
       
   769 
       
   770   This can be avoided by \emph{canonical argument order}, which
       
   771   observes certain standard patterns and minimizes adhoc permutations
       
   772   in their application.  In Isabelle/ML, large portions of text can be
       
   773   written without auxiliary operations like @{text "swap: \<alpha> \<times> \<beta> \<rightarrow> \<beta> \<times>
       
   774   \<alpha>"} or @{text "C: (\<alpha> \<rightarrow> \<beta> \<rightarrow> \<gamma>) \<rightarrow> (\<beta> \<rightarrow> \<alpha> \<rightarrow> \<gamma>)"} (the latter not
       
   775   present in the Isabelle/ML library).
       
   776 
       
   777   \medskip The basic idea is that arguments that vary less are moved
       
   778   further to the left than those that vary more.  Two particularly
       
   779   important categories of functions are \emph{selectors} and
       
   780   \emph{updates}.
       
   781 
       
   782   The subsequent scheme is based on a hypothetical set-like container
       
   783   of type @{text "\<beta>"} that manages elements of type @{text "\<alpha>"}.  Both
       
   784   the names and types of the associated operations are canonical for
       
   785   Isabelle/ML.
       
   786 
       
   787   \begin{center}
       
   788   \begin{tabular}{ll}
       
   789   kind & canonical name and type \\\hline
       
   790   selector & @{text "member: \<beta> \<rightarrow> \<alpha> \<rightarrow> bool"} \\
       
   791   update & @{text "insert: \<alpha> \<rightarrow> \<beta> \<rightarrow> \<beta>"} \\
       
   792   \end{tabular}
       
   793   \end{center}
       
   794 
       
   795   Given a container @{text "B: \<beta>"}, the partially applied @{text
       
   796   "member B"} is a predicate over elements @{text "\<alpha> \<rightarrow> bool"}, and
       
   797   thus represents the intended denotation directly.  It is customary
       
   798   to pass the abstract predicate to further operations, not the
       
   799   concrete container.  The argument order makes it easy to use other
       
   800   combinators: @{text "forall (member B) list"} will check a list of
       
   801   elements for membership in @{text "B"} etc. Often the explicit
       
   802   @{text "list"} is pointless and can be contracted to @{text "forall
       
   803   (member B)"} to get directly a predicate again.
       
   804 
       
   805   In contrast, an update operation varies the container, so it moves
       
   806   to the right: @{text "insert a"} is a function @{text "\<beta> \<rightarrow> \<beta>"} to
       
   807   insert a value @{text "a"}.  These can be composed naturally as
       
   808   @{text "insert c \<circ> insert b \<circ> insert a"}.  The slightly awkward
       
   809   inversion of the composition order is due to conventional
       
   810   mathematical notation, which can be easily amended as explained
       
   811   below.
       
   812 *}
       
   813 
       
   814 
       
   815 subsection {* Forward application and composition *}
       
   816 
       
   817 text {* Regular function application and infix notation works best for
       
   818   relatively deeply structured expressions, e.g.\ @{text "h (f x y + g
       
   819   z)"}.  The important special case of \emph{linear transformation}
       
   820   applies a cascade of functions @{text "f\<^sub>n (\<dots> (f\<^sub>1 x))"}.  This
       
   821   becomes hard to read and maintain if the functions are themselves
       
   822   given as complex expressions.  The notation can be significantly
       
   823   improved by introducing \emph{forward} versions of application and
       
   824   composition as follows:
       
   825 
       
   826   \medskip
       
   827   \begin{tabular}{lll}
       
   828   @{text "x |> f"} & @{text "\<equiv>"} & @{text "f x"} \\
       
   829   @{text "(f #> g) x"} & @{text "\<equiv>"} & @{text "x |> f |> g"} \\
       
   830   \end{tabular}
       
   831   \medskip
       
   832 
       
   833   This enables to write conveniently @{text "x |> f\<^sub>1 |> \<dots> |> f\<^sub>n"} or
       
   834   @{text "f\<^sub>1 #> \<dots> #> f\<^sub>n"} for its functional abstraction over @{text
       
   835   "x"}.
       
   836 
       
   837   \medskip There is an additional set of combinators to accommodate
       
   838   multiple results (via pairs) that are passed on as multiple
       
   839   arguments (via currying).
       
   840 
       
   841   \medskip
       
   842   \begin{tabular}{lll}
       
   843   @{text "(x, y) |-> f"} & @{text "\<equiv>"} & @{text "f x y"} \\
       
   844   @{text "(f #-> g) x"} & @{text "\<equiv>"} & @{text "x |> f |-> g"} \\
       
   845   \end{tabular}
       
   846   \medskip
       
   847 *}
       
   848 
       
   849 text %mlref {*
       
   850   \begin{mldecls}
       
   851   @{index_ML_op "|> ": "'a * ('a -> 'b) -> 'b"} \\
       
   852   @{index_ML_op "|-> ": "('c * 'a) * ('c -> 'a -> 'b) -> 'b"} \\
       
   853   @{index_ML_op "#> ": "('a -> 'b) * ('b -> 'c) -> 'a -> 'c"} \\
       
   854   @{index_ML_op "#-> ": "('a -> 'c * 'b) * ('c -> 'b -> 'd) -> 'a -> 'd"} \\
       
   855   \end{mldecls}
       
   856 *}
       
   857 
       
   858 
       
   859 subsection {* Canonical iteration *}
       
   860 
       
   861 text {* As explained above, a function @{text "f: \<alpha> \<rightarrow> \<beta> \<rightarrow> \<beta>"} can be
       
   862   understood as update on a configuration of type @{text "\<beta>"},
       
   863   parametrized by arguments of type @{text "\<alpha>"}.  Given @{text "a: \<alpha>"}
       
   864   the partial application @{text "(f a): \<beta> \<rightarrow> \<beta>"} operates
       
   865   homogeneously on @{text "\<beta>"}.  This can be iterated naturally over a
       
   866   list of parameters @{text "[a\<^sub>1, \<dots>, a\<^sub>n]"} as @{text "f a\<^sub>1 #> \<dots> #> f a\<^sub>n"}.
       
   867   The latter expression is again a function @{text "\<beta> \<rightarrow> \<beta>"}.
       
   868   It can be applied to an initial configuration @{text "b: \<beta>"} to
       
   869   start the iteration over the given list of arguments: each @{text
       
   870   "a"} in @{text "a\<^sub>1, \<dots>, a\<^sub>n"} is applied consecutively by updating a
       
   871   cumulative configuration.
       
   872 
       
   873   The @{text fold} combinator in Isabelle/ML lifts a function @{text
       
   874   "f"} as above to its iterated version over a list of arguments.
       
   875   Lifting can be repeated, e.g.\ @{text "(fold \<circ> fold) f"} iterates
       
   876   over a list of lists as expected.
       
   877 
       
   878   The variant @{text "fold_rev"} works inside-out over the list of
       
   879   arguments, such that @{text "fold_rev f \<equiv> fold f \<circ> rev"} holds.
       
   880 
       
   881   The @{text "fold_map"} combinator essentially performs @{text
       
   882   "fold"} and @{text "map"} simultaneously: each application of @{text
       
   883   "f"} produces an updated configuration together with a side-result;
       
   884   the iteration collects all such side-results as a separate list.
       
   885 *}
       
   886 
       
   887 text %mlref {*
       
   888   \begin{mldecls}
       
   889   @{index_ML fold: "('a -> 'b -> 'b) -> 'a list -> 'b -> 'b"} \\
       
   890   @{index_ML fold_rev: "('a -> 'b -> 'b) -> 'a list -> 'b -> 'b"} \\
       
   891   @{index_ML fold_map: "('a -> 'b -> 'c * 'b) -> 'a list -> 'b -> 'c list * 'b"} \\
       
   892   \end{mldecls}
       
   893 
       
   894   \begin{description}
       
   895 
       
   896   \item @{ML fold}~@{text f} lifts the parametrized update function
       
   897   @{text "f"} to a list of parameters.
       
   898 
       
   899   \item @{ML fold_rev}~@{text "f"} is similar to @{ML fold}~@{text
       
   900   "f"}, but works inside-out.
       
   901 
       
   902   \item @{ML fold_map}~@{text "f"} lifts the parametrized update
       
   903   function @{text "f"} (with side-result) to a list of parameters and
       
   904   cumulative side-results.
       
   905 
       
   906   \end{description}
       
   907 
       
   908   \begin{warn}
       
   909   The literature on functional programming provides a multitude of
       
   910   combinators called @{text "foldl"}, @{text "foldr"} etc.  SML97
       
   911   provides its own variations as @{ML List.foldl} and @{ML
       
   912   List.foldr}, while the classic Isabelle library also has the
       
   913   historic @{ML Library.foldl} and @{ML Library.foldr}.  To avoid
       
   914   unnecessary complication and confusion, all these historical
       
   915   versions should be ignored, and @{ML fold} (or @{ML fold_rev}) used
       
   916   exclusively.
       
   917   \end{warn}
       
   918 *}
       
   919 
       
   920 text %mlex {* The following example shows how to fill a text buffer
       
   921   incrementally by adding strings, either individually or from a given
       
   922   list.
       
   923 *}
       
   924 
       
   925 ML {*
       
   926   val s =
       
   927     Buffer.empty
       
   928     |> Buffer.add "digits: "
       
   929     |> fold (Buffer.add o string_of_int) (0 upto 9)
       
   930     |> Buffer.content;
       
   931 
       
   932   @{assert} (s = "digits: 0123456789");
       
   933 *}
       
   934 
       
   935 text {* Note how @{ML "fold (Buffer.add o string_of_int)"} above saves
       
   936   an extra @{ML "map"} over the given list.  This kind of peephole
       
   937   optimization reduces both the code size and the tree structures in
       
   938   memory (``deforestation''), but it requires some practice to read
       
   939   and write fluently.
       
   940 
       
   941   \medskip The next example elaborates the idea of canonical
       
   942   iteration, demonstrating fast accumulation of tree content using a
       
   943   text buffer.
       
   944 *}
       
   945 
       
   946 ML {*
       
   947   datatype tree = Text of string | Elem of string * tree list;
       
   948 
       
   949   fun slow_content (Text txt) = txt
       
   950     | slow_content (Elem (name, ts)) =
       
   951         "<" ^ name ^ ">" ^
       
   952         implode (map slow_content ts) ^
       
   953         "</" ^ name ^ ">"
       
   954 
       
   955   fun add_content (Text txt) = Buffer.add txt
       
   956     | add_content (Elem (name, ts)) =
       
   957         Buffer.add ("<" ^ name ^ ">") #>
       
   958         fold add_content ts #>
       
   959         Buffer.add ("</" ^ name ^ ">");
       
   960 
       
   961   fun fast_content tree =
       
   962     Buffer.empty |> add_content tree |> Buffer.content;
       
   963 *}
       
   964 
       
   965 text {* The slow part of @{ML slow_content} is the @{ML implode} of
       
   966   the recursive results, because it copies previously produced strings
       
   967   again.
       
   968 
       
   969   The incremental @{ML add_content} avoids this by operating on a
       
   970   buffer that is passed through in a linear fashion.  Using @{ML_text
       
   971   "#>"} and contraction over the actual buffer argument saves some
       
   972   additional boiler-plate.  Of course, the two @{ML "Buffer.add"}
       
   973   invocations with concatenated strings could have been split into
       
   974   smaller parts, but this would have obfuscated the source without
       
   975   making a big difference in allocations.  Here we have done some
       
   976   peephole-optimization for the sake of readability.
       
   977 
       
   978   Another benefit of @{ML add_content} is its ``open'' form as a
       
   979   function on buffers that can be continued in further linear
       
   980   transformations, folding etc.  Thus it is more compositional than
       
   981   the naive @{ML slow_content}.  As realistic example, compare the
       
   982   old-style @{ML "Term.maxidx_of_term: term -> int"} with the newer
       
   983   @{ML "Term.maxidx_term: term -> int -> int"} in Isabelle/Pure.
       
   984 
       
   985   Note that @{ML fast_content} above is only defined as example.  In
       
   986   many practical situations, it is customary to provide the
       
   987   incremental @{ML add_content} only and leave the initialization and
       
   988   termination to the concrete application by the user.
       
   989 *}
       
   990 
       
   991 
       
   992 section {* Message output channels \label{sec:message-channels} *}
       
   993 
       
   994 text {* Isabelle provides output channels for different kinds of
       
   995   messages: regular output, high-volume tracing information, warnings,
       
   996   and errors.
       
   997 
       
   998   Depending on the user interface involved, these messages may appear
       
   999   in different text styles or colours.  The standard output for
       
  1000   terminal sessions prefixes each line of warnings by @{verbatim
       
  1001   "###"} and errors by @{verbatim "***"}, but leaves anything else
       
  1002   unchanged.
       
  1003 
       
  1004   Messages are associated with the transaction context of the running
       
  1005   Isar command.  This enables the front-end to manage commands and
       
  1006   resulting messages together.  For example, after deleting a command
       
  1007   from a given theory document version, the corresponding message
       
  1008   output can be retracted from the display.
       
  1009 *}
       
  1010 
       
  1011 text %mlref {*
       
  1012   \begin{mldecls}
       
  1013   @{index_ML writeln: "string -> unit"} \\
       
  1014   @{index_ML tracing: "string -> unit"} \\
       
  1015   @{index_ML warning: "string -> unit"} \\
       
  1016   @{index_ML error: "string -> 'a"} \\
       
  1017   \end{mldecls}
       
  1018 
       
  1019   \begin{description}
       
  1020 
       
  1021   \item @{ML writeln}~@{text "text"} outputs @{text "text"} as regular
       
  1022   message.  This is the primary message output operation of Isabelle
       
  1023   and should be used by default.
       
  1024 
       
  1025   \item @{ML tracing}~@{text "text"} outputs @{text "text"} as special
       
  1026   tracing message, indicating potential high-volume output to the
       
  1027   front-end (hundreds or thousands of messages issued by a single
       
  1028   command).  The idea is to allow the user-interface to downgrade the
       
  1029   quality of message display to achieve higher throughput.
       
  1030 
       
  1031   Note that the user might have to take special actions to see tracing
       
  1032   output, e.g.\ switch to a different output window.  So this channel
       
  1033   should not be used for regular output.
       
  1034 
       
  1035   \item @{ML warning}~@{text "text"} outputs @{text "text"} as
       
  1036   warning, which typically means some extra emphasis on the front-end
       
  1037   side (color highlighting, icons, etc.).
       
  1038 
       
  1039   \item @{ML error}~@{text "text"} raises exception @{ML ERROR}~@{text
       
  1040   "text"} and thus lets the Isar toplevel print @{text "text"} on the
       
  1041   error channel, which typically means some extra emphasis on the
       
  1042   front-end side (color highlighting, icons, etc.).
       
  1043 
       
  1044   This assumes that the exception is not handled before the command
       
  1045   terminates.  Handling exception @{ML ERROR}~@{text "text"} is a
       
  1046   perfectly legal alternative: it means that the error is absorbed
       
  1047   without any message output.
       
  1048 
       
  1049   \begin{warn}
       
  1050   The actual error channel is accessed via @{ML Output.error_message}, but
       
  1051   the old interaction protocol of Proof~General \emph{crashes} if that
       
  1052   function is used in regular ML code: error output and toplevel
       
  1053   command failure always need to coincide in classic TTY interaction.
       
  1054   \end{warn}
       
  1055 
       
  1056   \end{description}
       
  1057 
       
  1058   \begin{warn}
       
  1059   Regular Isabelle/ML code should output messages exclusively by the
       
  1060   official channels.  Using raw I/O on \emph{stdout} or \emph{stderr}
       
  1061   instead (e.g.\ via @{ML TextIO.output}) is apt to cause problems in
       
  1062   the presence of parallel and asynchronous processing of Isabelle
       
  1063   theories.  Such raw output might be displayed by the front-end in
       
  1064   some system console log, with a low chance that the user will ever
       
  1065   see it.  Moreover, as a genuine side-effect on global process
       
  1066   channels, there is no proper way to retract output when Isar command
       
  1067   transactions are reset by the system.
       
  1068   \end{warn}
       
  1069 
       
  1070   \begin{warn}
       
  1071   The message channels should be used in a message-oriented manner.
       
  1072   This means that multi-line output that logically belongs together is
       
  1073   issued by a \emph{single} invocation of @{ML writeln} etc.\ with the
       
  1074   functional concatenation of all message constituents.
       
  1075   \end{warn}
       
  1076 *}
       
  1077 
       
  1078 text %mlex {* The following example demonstrates a multi-line
       
  1079   warning.  Note that in some situations the user sees only the first
       
  1080   line, so the most important point should be made first.
       
  1081 *}
       
  1082 
       
  1083 ML_command {*
       
  1084   warning (cat_lines
       
  1085    ["Beware the Jabberwock, my son!",
       
  1086     "The jaws that bite, the claws that catch!",
       
  1087     "Beware the Jubjub Bird, and shun",
       
  1088     "The frumious Bandersnatch!"]);
       
  1089 *}
       
  1090 
       
  1091 
       
  1092 section {* Exceptions \label{sec:exceptions} *}
       
  1093 
       
  1094 text {* The Standard ML semantics of strict functional evaluation
       
  1095   together with exceptions is rather well defined, but some delicate
       
  1096   points need to be observed to avoid that ML programs go wrong
       
  1097   despite static type-checking.  Exceptions in Isabelle/ML are
       
  1098   subsequently categorized as follows.
       
  1099 
       
  1100   \paragraph{Regular user errors.}  These are meant to provide
       
  1101   informative feedback about malformed input etc.
       
  1102 
       
  1103   The \emph{error} function raises the corresponding \emph{ERROR}
       
  1104   exception, with a plain text message as argument.  \emph{ERROR}
       
  1105   exceptions can be handled internally, in order to be ignored, turned
       
  1106   into other exceptions, or cascaded by appending messages.  If the
       
  1107   corresponding Isabelle/Isar command terminates with an \emph{ERROR}
       
  1108   exception state, the toplevel will print the result on the error
       
  1109   channel (see \secref{sec:message-channels}).
       
  1110 
       
  1111   It is considered bad style to refer to internal function names or
       
  1112   values in ML source notation in user error messages.
       
  1113 
       
  1114   Grammatical correctness of error messages can be improved by
       
  1115   \emph{omitting} final punctuation: messages are often concatenated
       
  1116   or put into a larger context (e.g.\ augmented with source position).
       
  1117   By not insisting in the final word at the origin of the error, the
       
  1118   system can perform its administrative tasks more easily and
       
  1119   robustly.
       
  1120 
       
  1121   \paragraph{Program failures.}  There is a handful of standard
       
  1122   exceptions that indicate general failure situations, or failures of
       
  1123   core operations on logical entities (types, terms, theorems,
       
  1124   theories, see \chref{ch:logic}).
       
  1125 
       
  1126   These exceptions indicate a genuine breakdown of the program, so the
       
  1127   main purpose is to determine quickly what has happened where.
       
  1128   Traditionally, the (short) exception message would include the name
       
  1129   of an ML function, although this is no longer necessary, because the
       
  1130   ML runtime system prints a detailed source position of the
       
  1131   corresponding @{ML_text raise} keyword.
       
  1132 
       
  1133   \medskip User modules can always introduce their own custom
       
  1134   exceptions locally, e.g.\ to organize internal failures robustly
       
  1135   without overlapping with existing exceptions.  Exceptions that are
       
  1136   exposed in module signatures require extra care, though, and should
       
  1137   \emph{not} be introduced by default.  Surprise by users of a module
       
  1138   can be often minimized by using plain user errors instead.
       
  1139 
       
  1140   \paragraph{Interrupts.}  These indicate arbitrary system events:
       
  1141   both the ML runtime system and the Isabelle/ML infrastructure signal
       
  1142   various exceptional situations by raising the special
       
  1143   \emph{Interrupt} exception in user code.
       
  1144 
       
  1145   This is the one and only way that physical events can intrude an
       
  1146   Isabelle/ML program.  Such an interrupt can mean out-of-memory,
       
  1147   stack overflow, timeout, internal signaling of threads, or the user
       
  1148   producing a console interrupt manually etc.  An Isabelle/ML program
       
  1149   that intercepts interrupts becomes dependent on physical effects of
       
  1150   the environment.  Even worse, exception handling patterns that are
       
  1151   too general by accident, e.g.\ by mispelled exception constructors,
       
  1152   will cover interrupts unintentionally and thus render the program
       
  1153   semantics ill-defined.
       
  1154 
       
  1155   Note that the Interrupt exception dates back to the original SML90
       
  1156   language definition.  It was excluded from the SML97 version to
       
  1157   avoid its malign impact on ML program semantics, but without
       
  1158   providing a viable alternative.  Isabelle/ML recovers physical
       
  1159   interruptibility (which is an indispensable tool to implement
       
  1160   managed evaluation of command transactions), but requires user code
       
  1161   to be strictly transparent wrt.\ interrupts.
       
  1162 
       
  1163   \begin{warn}
       
  1164   Isabelle/ML user code needs to terminate promptly on interruption,
       
  1165   without guessing at its meaning to the system infrastructure.
       
  1166   Temporary handling of interrupts for cleanup of global resources
       
  1167   etc.\ needs to be followed immediately by re-raising of the original
       
  1168   exception.
       
  1169   \end{warn}
       
  1170 *}
       
  1171 
       
  1172 text %mlref {*
       
  1173   \begin{mldecls}
       
  1174   @{index_ML try: "('a -> 'b) -> 'a -> 'b option"} \\
       
  1175   @{index_ML can: "('a -> 'b) -> 'a -> bool"} \\
       
  1176   @{index_ML_exception ERROR: string} \\
       
  1177   @{index_ML_exception Fail: string} \\
       
  1178   @{index_ML Exn.is_interrupt: "exn -> bool"} \\
       
  1179   @{index_ML reraise: "exn -> 'a"} \\
       
  1180   @{index_ML Runtime.exn_trace: "(unit -> 'a) -> 'a"} \\
       
  1181   \end{mldecls}
       
  1182 
       
  1183   \begin{description}
       
  1184 
       
  1185   \item @{ML try}~@{text "f x"} makes the partiality of evaluating
       
  1186   @{text "f x"} explicit via the option datatype.  Interrupts are
       
  1187   \emph{not} handled here, i.e.\ this form serves as safe replacement
       
  1188   for the \emph{unsafe} version @{ML_text "(SOME"}~@{text "f
       
  1189   x"}~@{ML_text "handle _ => NONE)"} that is occasionally seen in
       
  1190   books about SML97, not Isabelle/ML.
       
  1191 
       
  1192   \item @{ML can} is similar to @{ML try} with more abstract result.
       
  1193 
       
  1194   \item @{ML ERROR}~@{text "msg"} represents user errors; this
       
  1195   exception is normally raised indirectly via the @{ML error} function
       
  1196   (see \secref{sec:message-channels}).
       
  1197 
       
  1198   \item @{ML Fail}~@{text "msg"} represents general program failures.
       
  1199 
       
  1200   \item @{ML Exn.is_interrupt} identifies interrupts robustly, without
       
  1201   mentioning concrete exception constructors in user code.  Handled
       
  1202   interrupts need to be re-raised promptly!
       
  1203 
       
  1204   \item @{ML reraise}~@{text "exn"} raises exception @{text "exn"}
       
  1205   while preserving its implicit position information (if possible,
       
  1206   depending on the ML platform).
       
  1207 
       
  1208   \item @{ML Runtime.exn_trace}~@{ML_text "(fn () =>"}~@{text
       
  1209   "e"}@{ML_text ")"} evaluates expression @{text "e"} while printing
       
  1210   a full trace of its stack of nested exceptions (if possible,
       
  1211   depending on the ML platform).
       
  1212 
       
  1213   Inserting @{ML Runtime.exn_trace} into ML code temporarily is
       
  1214   useful for debugging, but not suitable for production code.
       
  1215 
       
  1216   \end{description}
       
  1217 *}
       
  1218 
       
  1219 text %mlantiq {*
       
  1220   \begin{matharray}{rcl}
       
  1221   @{ML_antiquotation_def "assert"} & : & @{text ML_antiquotation} \\
       
  1222   \end{matharray}
       
  1223 
       
  1224   \begin{description}
       
  1225 
       
  1226   \item @{text "@{assert}"} inlines a function
       
  1227   @{ML_type "bool -> unit"} that raises @{ML Fail} if the argument is
       
  1228   @{ML false}.  Due to inlining the source position of failed
       
  1229   assertions is included in the error output.
       
  1230 
       
  1231   \end{description}
       
  1232 *}
       
  1233 
       
  1234 
       
  1235 section {* Strings of symbols \label{sec:symbols} *}
       
  1236 
       
  1237 text {* A \emph{symbol} constitutes the smallest textual unit in
       
  1238   Isabelle/ML --- raw ML characters are normally not encountered at
       
  1239   all!  Isabelle strings consist of a sequence of symbols, represented
       
  1240   as a packed string or an exploded list of strings.  Each symbol is
       
  1241   in itself a small string, which has either one of the following
       
  1242   forms:
       
  1243 
       
  1244   \begin{enumerate}
       
  1245 
       
  1246   \item a single ASCII character ``@{text "c"}'', for example
       
  1247   ``\verb,a,'',
       
  1248 
       
  1249   \item a codepoint according to UTF8 (non-ASCII byte sequence),
       
  1250 
       
  1251   \item a regular symbol ``\verb,\,\verb,<,@{text "ident"}\verb,>,'',
       
  1252   for example ``\verb,\,\verb,<alpha>,'',
       
  1253 
       
  1254   \item a control symbol ``\verb,\,\verb,<^,@{text "ident"}\verb,>,'',
       
  1255   for example ``\verb,\,\verb,<^bold>,'',
       
  1256 
       
  1257   \item a raw symbol ``\verb,\,\verb,<^raw:,@{text text}\verb,>,''
       
  1258   where @{text text} consists of printable characters excluding
       
  1259   ``\verb,.,'' and ``\verb,>,'', for example
       
  1260   ``\verb,\,\verb,<^raw:$\sum_{i = 1}^n$>,'',
       
  1261 
       
  1262   \item a numbered raw control symbol ``\verb,\,\verb,<^raw,@{text
       
  1263   n}\verb,>, where @{text n} consists of digits, for example
       
  1264   ``\verb,\,\verb,<^raw42>,''.
       
  1265 
       
  1266   \end{enumerate}
       
  1267 
       
  1268   The @{text "ident"} syntax for symbol names is @{text "letter
       
  1269   (letter | digit)\<^sup>*"}, where @{text "letter = A..Za..z"} and @{text
       
  1270   "digit = 0..9"}.  There are infinitely many regular symbols and
       
  1271   control symbols, but a fixed collection of standard symbols is
       
  1272   treated specifically.  For example, ``\verb,\,\verb,<alpha>,'' is
       
  1273   classified as a letter, which means it may occur within regular
       
  1274   Isabelle identifiers.
       
  1275 
       
  1276   The character set underlying Isabelle symbols is 7-bit ASCII, but
       
  1277   8-bit character sequences are passed-through unchanged.  Unicode/UCS
       
  1278   data in UTF-8 encoding is processed in a non-strict fashion, such
       
  1279   that well-formed code sequences are recognized
       
  1280   accordingly.\footnote{Note that ISO-Latin-1 differs from UTF-8 only
       
  1281   in some special punctuation characters that even have replacements
       
  1282   within the standard collection of Isabelle symbols.  Text consisting
       
  1283   of ASCII plus accented letters can be processed in either encoding.}
       
  1284   Unicode provides its own collection of mathematical symbols, but
       
  1285   within the core Isabelle/ML world there is no link to the standard
       
  1286   collection of Isabelle regular symbols.
       
  1287 
       
  1288   \medskip Output of Isabelle symbols depends on the print mode
       
  1289   \cite{isabelle-isar-ref}.  For example, the standard {\LaTeX}
       
  1290   setup of the Isabelle document preparation system would present
       
  1291   ``\verb,\,\verb,<alpha>,'' as @{text "\<alpha>"}, and
       
  1292   ``\verb,\,\verb,<^bold>,\verb,\,\verb,<alpha>,'' as @{text "\<^bold>\<alpha>"}.
       
  1293   On-screen rendering usually works by mapping a finite subset of
       
  1294   Isabelle symbols to suitable Unicode characters.
       
  1295 *}
       
  1296 
       
  1297 text %mlref {*
       
  1298   \begin{mldecls}
       
  1299   @{index_ML_type "Symbol.symbol": string} \\
       
  1300   @{index_ML Symbol.explode: "string -> Symbol.symbol list"} \\
       
  1301   @{index_ML Symbol.is_letter: "Symbol.symbol -> bool"} \\
       
  1302   @{index_ML Symbol.is_digit: "Symbol.symbol -> bool"} \\
       
  1303   @{index_ML Symbol.is_quasi: "Symbol.symbol -> bool"} \\
       
  1304   @{index_ML Symbol.is_blank: "Symbol.symbol -> bool"} \\
       
  1305   \end{mldecls}
       
  1306   \begin{mldecls}
       
  1307   @{index_ML_type "Symbol.sym"} \\
       
  1308   @{index_ML Symbol.decode: "Symbol.symbol -> Symbol.sym"} \\
       
  1309   \end{mldecls}
       
  1310 
       
  1311   \begin{description}
       
  1312 
       
  1313   \item Type @{ML_type "Symbol.symbol"} represents individual Isabelle
       
  1314   symbols.
       
  1315 
       
  1316   \item @{ML "Symbol.explode"}~@{text "str"} produces a symbol list
       
  1317   from the packed form.  This function supersedes @{ML
       
  1318   "String.explode"} for virtually all purposes of manipulating text in
       
  1319   Isabelle!\footnote{The runtime overhead for exploded strings is
       
  1320   mainly that of the list structure: individual symbols that happen to
       
  1321   be a singleton string do not require extra memory in Poly/ML.}
       
  1322 
       
  1323   \item @{ML "Symbol.is_letter"}, @{ML "Symbol.is_digit"}, @{ML
       
  1324   "Symbol.is_quasi"}, @{ML "Symbol.is_blank"} classify standard
       
  1325   symbols according to fixed syntactic conventions of Isabelle, cf.\
       
  1326   \cite{isabelle-isar-ref}.
       
  1327 
       
  1328   \item Type @{ML_type "Symbol.sym"} is a concrete datatype that
       
  1329   represents the different kinds of symbols explicitly, with
       
  1330   constructors @{ML "Symbol.Char"}, @{ML "Symbol.Sym"}, @{ML
       
  1331   "Symbol.UTF8"}, @{ML "Symbol.Ctrl"}, @{ML "Symbol.Raw"}.
       
  1332 
       
  1333   \item @{ML "Symbol.decode"} converts the string representation of a
       
  1334   symbol into the datatype version.
       
  1335 
       
  1336   \end{description}
       
  1337 
       
  1338   \paragraph{Historical note.} In the original SML90 standard the
       
  1339   primitive ML type @{ML_type char} did not exists, and @{ML_text
       
  1340   "explode: string -> string list"} produced a list of singleton
       
  1341   strings like @{ML "raw_explode: string -> string list"} in
       
  1342   Isabelle/ML today.  When SML97 came out, Isabelle did not adopt its
       
  1343   somewhat anachronistic 8-bit or 16-bit characters, but the idea of
       
  1344   exploding a string into a list of small strings was extended to
       
  1345   ``symbols'' as explained above.  Thus Isabelle sources can refer to
       
  1346   an infinite store of user-defined symbols, without having to worry
       
  1347   about the multitude of Unicode encodings that have emerged over the
       
  1348   years.  *}
       
  1349 
       
  1350 
       
  1351 section {* Basic data types *}
       
  1352 
       
  1353 text {* The basis library proposal of SML97 needs to be treated with
       
  1354   caution.  Many of its operations simply do not fit with important
       
  1355   Isabelle/ML conventions (like ``canonical argument order'', see
       
  1356   \secref{sec:canonical-argument-order}), others cause problems with
       
  1357   the parallel evaluation model of Isabelle/ML (such as @{ML
       
  1358   TextIO.print} or @{ML OS.Process.system}).
       
  1359 
       
  1360   Subsequently we give a brief overview of important operations on
       
  1361   basic ML data types.
       
  1362 *}
       
  1363 
       
  1364 
       
  1365 subsection {* Characters *}
       
  1366 
       
  1367 text %mlref {*
       
  1368   \begin{mldecls}
       
  1369   @{index_ML_type char} \\
       
  1370   \end{mldecls}
       
  1371 
       
  1372   \begin{description}
       
  1373 
       
  1374   \item Type @{ML_type char} is \emph{not} used.  The smallest textual
       
  1375   unit in Isabelle is represented as a ``symbol'' (see
       
  1376   \secref{sec:symbols}).
       
  1377 
       
  1378   \end{description}
       
  1379 *}
       
  1380 
       
  1381 
       
  1382 subsection {* Strings *}
       
  1383 
       
  1384 text %mlref {*
       
  1385   \begin{mldecls}
       
  1386   @{index_ML_type string} \\
       
  1387   \end{mldecls}
       
  1388 
       
  1389   \begin{description}
       
  1390 
       
  1391   \item Type @{ML_type string} represents immutable vectors of 8-bit
       
  1392   characters.  There are operations in SML to convert back and forth
       
  1393   to actual byte vectors, which are seldom used.
       
  1394 
       
  1395   This historically important raw text representation is used for
       
  1396   Isabelle-specific purposes with the following implicit substructures
       
  1397   packed into the string content:
       
  1398 
       
  1399   \begin{enumerate}
       
  1400 
       
  1401   \item sequence of Isabelle symbols (see also \secref{sec:symbols}),
       
  1402   with @{ML Symbol.explode} as key operation;
       
  1403 
       
  1404   \item XML tree structure via YXML (see also \cite{isabelle-sys}),
       
  1405   with @{ML YXML.parse_body} as key operation.
       
  1406 
       
  1407   \end{enumerate}
       
  1408 
       
  1409   Note that Isabelle/ML string literals may refer Isabelle symbols
       
  1410   like ``\verb,\,\verb,<alpha>,'' natively, \emph{without} escaping
       
  1411   the backslash.  This is a consequence of Isabelle treating all
       
  1412   source text as strings of symbols, instead of raw characters.
       
  1413 
       
  1414   \end{description}
       
  1415 *}
       
  1416 
       
  1417 text %mlex {* The subsequent example illustrates the difference of
       
  1418   physical addressing of bytes versus logical addressing of symbols in
       
  1419   Isabelle strings.
       
  1420 *}
       
  1421 
       
  1422 ML_val {*
       
  1423   val s = "\<A>";
       
  1424 
       
  1425   @{assert} (length (Symbol.explode s) = 1);
       
  1426   @{assert} (size s = 4);
       
  1427 *}
       
  1428 
       
  1429 text {* Note that in Unicode renderings of the symbol @{text "\<A>"},
       
  1430   variations of encodings like UTF-8 or UTF-16 pose delicate questions
       
  1431   about the multi-byte representations its codepoint, which is outside
       
  1432   of the 16-bit address space of the original Unicode standard from
       
  1433   the 1990-ies.  In Isabelle/ML it is just ``\verb,\,\verb,<A>,''
       
  1434   literally, using plain ASCII characters beyond any doubts. *}
       
  1435 
       
  1436 
       
  1437 subsection {* Integers *}
       
  1438 
       
  1439 text %mlref {*
       
  1440   \begin{mldecls}
       
  1441   @{index_ML_type int} \\
       
  1442   \end{mldecls}
       
  1443 
       
  1444   \begin{description}
       
  1445 
       
  1446   \item Type @{ML_type int} represents regular mathematical integers,
       
  1447   which are \emph{unbounded}.  Overflow never happens in
       
  1448   practice.\footnote{The size limit for integer bit patterns in memory
       
  1449   is 64\,MB for 32-bit Poly/ML, and much higher for 64-bit systems.}
       
  1450   This works uniformly for all supported ML platforms (Poly/ML and
       
  1451   SML/NJ).
       
  1452 
       
  1453   Literal integers in ML text are forced to be of this one true
       
  1454   integer type --- adhoc overloading of SML97 is disabled.
       
  1455 
       
  1456   Structure @{ML_structure IntInf} of SML97 is obsolete and superseded by
       
  1457   @{ML_structure Int}.  Structure @{ML_structure Integer} in @{file
       
  1458   "~~/src/Pure/General/integer.ML"} provides some additional
       
  1459   operations.
       
  1460 
       
  1461   \end{description}
       
  1462 *}
       
  1463 
       
  1464 
       
  1465 subsection {* Time *}
       
  1466 
       
  1467 text %mlref {*
       
  1468   \begin{mldecls}
       
  1469   @{index_ML_type Time.time} \\
       
  1470   @{index_ML seconds: "real -> Time.time"} \\
       
  1471   \end{mldecls}
       
  1472 
       
  1473   \begin{description}
       
  1474 
       
  1475   \item Type @{ML_type Time.time} represents time abstractly according
       
  1476   to the SML97 basis library definition.  This is adequate for
       
  1477   internal ML operations, but awkward in concrete time specifications.
       
  1478 
       
  1479   \item @{ML seconds}~@{text "s"} turns the concrete scalar @{text
       
  1480   "s"} (measured in seconds) into an abstract time value.  Floating
       
  1481   point numbers are easy to use as configuration options in the
       
  1482   context (see \secref{sec:config-options}) or system preferences that
       
  1483   are maintained externally.
       
  1484 
       
  1485   \end{description}
       
  1486 *}
       
  1487 
       
  1488 
       
  1489 subsection {* Options *}
       
  1490 
       
  1491 text %mlref {*
       
  1492   \begin{mldecls}
       
  1493   @{index_ML Option.map: "('a -> 'b) -> 'a option -> 'b option"} \\
       
  1494   @{index_ML is_some: "'a option -> bool"} \\
       
  1495   @{index_ML is_none: "'a option -> bool"} \\
       
  1496   @{index_ML the: "'a option -> 'a"} \\
       
  1497   @{index_ML these: "'a list option -> 'a list"} \\
       
  1498   @{index_ML the_list: "'a option -> 'a list"} \\
       
  1499   @{index_ML the_default: "'a -> 'a option -> 'a"} \\
       
  1500   \end{mldecls}
       
  1501 *}
       
  1502 
       
  1503 text {* Apart from @{ML Option.map} most other operations defined in
       
  1504   structure @{ML_structure Option} are alien to Isabelle/ML an never
       
  1505   used.  The operations shown above are defined in @{file
       
  1506   "~~/src/Pure/General/basics.ML"}.  *}
       
  1507 
       
  1508 
       
  1509 subsection {* Lists *}
       
  1510 
       
  1511 text {* Lists are ubiquitous in ML as simple and light-weight
       
  1512   ``collections'' for many everyday programming tasks.  Isabelle/ML
       
  1513   provides important additions and improvements over operations that
       
  1514   are predefined in the SML97 library. *}
       
  1515 
       
  1516 text %mlref {*
       
  1517   \begin{mldecls}
       
  1518   @{index_ML cons: "'a -> 'a list -> 'a list"} \\
       
  1519   @{index_ML member: "('b * 'a -> bool) -> 'a list -> 'b -> bool"} \\
       
  1520   @{index_ML insert: "('a * 'a -> bool) -> 'a -> 'a list -> 'a list"} \\
       
  1521   @{index_ML remove: "('b * 'a -> bool) -> 'b -> 'a list -> 'a list"} \\
       
  1522   @{index_ML update: "('a * 'a -> bool) -> 'a -> 'a list -> 'a list"} \\
       
  1523   \end{mldecls}
       
  1524 
       
  1525   \begin{description}
       
  1526 
       
  1527   \item @{ML cons}~@{text "x xs"} evaluates to @{text "x :: xs"}.
       
  1528 
       
  1529   Tupled infix operators are a historical accident in Standard ML.
       
  1530   The curried @{ML cons} amends this, but it should be only used when
       
  1531   partial application is required.
       
  1532 
       
  1533   \item @{ML member}, @{ML insert}, @{ML remove}, @{ML update} treat
       
  1534   lists as a set-like container that maintains the order of elements.
       
  1535   See @{file "~~/src/Pure/library.ML"} for the full specifications
       
  1536   (written in ML).  There are some further derived operations like
       
  1537   @{ML union} or @{ML inter}.
       
  1538 
       
  1539   Note that @{ML insert} is conservative about elements that are
       
  1540   already a @{ML member} of the list, while @{ML update} ensures that
       
  1541   the latest entry is always put in front.  The latter discipline is
       
  1542   often more appropriate in declarations of context data
       
  1543   (\secref{sec:context-data}) that are issued by the user in Isar
       
  1544   source: later declarations take precedence over earlier ones.
       
  1545 
       
  1546   \end{description}
       
  1547 *}
       
  1548 
       
  1549 text %mlex {* Using canonical @{ML fold} together with @{ML cons} (or
       
  1550   similar standard operations) alternates the orientation of data.
       
  1551   The is quite natural and should not be altered forcible by inserting
       
  1552   extra applications of @{ML rev}.  The alternative @{ML fold_rev} can
       
  1553   be used in the few situations, where alternation should be
       
  1554   prevented.
       
  1555 *}
       
  1556 
       
  1557 ML {*
       
  1558   val items = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
       
  1559 
       
  1560   val list1 = fold cons items [];
       
  1561   @{assert} (list1 = rev items);
       
  1562 
       
  1563   val list2 = fold_rev cons items [];
       
  1564   @{assert} (list2 = items);
       
  1565 *}
       
  1566 
       
  1567 text {* The subsequent example demonstrates how to \emph{merge} two
       
  1568   lists in a natural way. *}
       
  1569 
       
  1570 ML {*
       
  1571   fun merge_lists eq (xs, ys) = fold_rev (insert eq) ys xs;
       
  1572 *}
       
  1573 
       
  1574 text {* Here the first list is treated conservatively: only the new
       
  1575   elements from the second list are inserted.  The inside-out order of
       
  1576   insertion via @{ML fold_rev} attempts to preserve the order of
       
  1577   elements in the result.
       
  1578 
       
  1579   This way of merging lists is typical for context data
       
  1580   (\secref{sec:context-data}).  See also @{ML merge} as defined in
       
  1581   @{file "~~/src/Pure/library.ML"}.
       
  1582 *}
       
  1583 
       
  1584 
       
  1585 subsection {* Association lists *}
       
  1586 
       
  1587 text {* The operations for association lists interpret a concrete list
       
  1588   of pairs as a finite function from keys to values.  Redundant
       
  1589   representations with multiple occurrences of the same key are
       
  1590   implicitly normalized: lookup and update only take the first
       
  1591   occurrence into account.
       
  1592 *}
       
  1593 
       
  1594 text {*
       
  1595   \begin{mldecls}
       
  1596   @{index_ML AList.lookup: "('a * 'b -> bool) -> ('b * 'c) list -> 'a -> 'c option"} \\
       
  1597   @{index_ML AList.defined: "('a * 'b -> bool) -> ('b * 'c) list -> 'a -> bool"} \\
       
  1598   @{index_ML AList.update: "('a * 'a -> bool) -> 'a * 'b -> ('a * 'b) list -> ('a * 'b) list"} \\
       
  1599   \end{mldecls}
       
  1600 
       
  1601   \begin{description}
       
  1602 
       
  1603   \item @{ML AList.lookup}, @{ML AList.defined}, @{ML AList.update}
       
  1604   implement the main ``framework operations'' for mappings in
       
  1605   Isabelle/ML, following standard conventions for their names and
       
  1606   types.
       
  1607 
       
  1608   Note that a function called @{text lookup} is obliged to express its
       
  1609   partiality via an explicit option element.  There is no choice to
       
  1610   raise an exception, without changing the name to something like
       
  1611   @{text "the_element"} or @{text "get"}.
       
  1612 
       
  1613   The @{text "defined"} operation is essentially a contraction of @{ML
       
  1614   is_some} and @{text "lookup"}, but this is sufficiently frequent to
       
  1615   justify its independent existence.  This also gives the
       
  1616   implementation some opportunity for peep-hole optimization.
       
  1617 
       
  1618   \end{description}
       
  1619 
       
  1620   Association lists are adequate as simple and light-weight
       
  1621   implementation of finite mappings in many practical situations.  A
       
  1622   more heavy-duty table structure is defined in @{file
       
  1623   "~~/src/Pure/General/table.ML"}; that version scales easily to
       
  1624   thousands or millions of elements.
       
  1625 *}
       
  1626 
       
  1627 
       
  1628 subsection {* Unsynchronized references *}
       
  1629 
       
  1630 text %mlref {*
       
  1631   \begin{mldecls}
       
  1632   @{index_ML_type "'a Unsynchronized.ref"} \\
       
  1633   @{index_ML Unsynchronized.ref: "'a -> 'a Unsynchronized.ref"} \\
       
  1634   @{index_ML "!": "'a Unsynchronized.ref -> 'a"} \\
       
  1635   @{index_ML_op ":=": "'a Unsynchronized.ref * 'a -> unit"} \\
       
  1636   \end{mldecls}
       
  1637 *}
       
  1638 
       
  1639 text {* Due to ubiquitous parallelism in Isabelle/ML (see also
       
  1640   \secref{sec:multi-threading}), the mutable reference cells of
       
  1641   Standard ML are notorious for causing problems.  In a highly
       
  1642   parallel system, both correctness \emph{and} performance are easily
       
  1643   degraded when using mutable data.
       
  1644 
       
  1645   The unwieldy name of @{ML Unsynchronized.ref} for the constructor
       
  1646   for references in Isabelle/ML emphasizes the inconveniences caused by
       
  1647   mutability.  Existing operations @{ML "!"}  and @{ML_op ":="} are
       
  1648   unchanged, but should be used with special precautions, say in a
       
  1649   strictly local situation that is guaranteed to be restricted to
       
  1650   sequential evaluation --- now and in the future.
       
  1651 
       
  1652   \begin{warn}
       
  1653   Never @{ML_text "open Unsynchronized"}, not even in a local scope!
       
  1654   Pretending that mutable state is no problem is a very bad idea.
       
  1655   \end{warn}
       
  1656 *}
       
  1657 
       
  1658 
       
  1659 section {* Thread-safe programming \label{sec:multi-threading} *}
       
  1660 
       
  1661 text {* Multi-threaded execution has become an everyday reality in
       
  1662   Isabelle since Poly/ML 5.2.1 and Isabelle2008.  Isabelle/ML provides
       
  1663   implicit and explicit parallelism by default, and there is no way
       
  1664   for user-space tools to ``opt out''.  ML programs that are purely
       
  1665   functional, output messages only via the official channels
       
  1666   (\secref{sec:message-channels}), and do not intercept interrupts
       
  1667   (\secref{sec:exceptions}) can participate in the multi-threaded
       
  1668   environment immediately without further ado.
       
  1669 
       
  1670   More ambitious tools with more fine-grained interaction with the
       
  1671   environment need to observe the principles explained below.
       
  1672 *}
       
  1673 
       
  1674 
       
  1675 subsection {* Multi-threading with shared memory *}
       
  1676 
       
  1677 text {* Multiple threads help to organize advanced operations of the
       
  1678   system, such as real-time conditions on command transactions,
       
  1679   sub-components with explicit communication, general asynchronous
       
  1680   interaction etc.  Moreover, parallel evaluation is a prerequisite to
       
  1681   make adequate use of the CPU resources that are available on
       
  1682   multi-core systems.\footnote{Multi-core computing does not mean that
       
  1683   there are ``spare cycles'' to be wasted.  It means that the
       
  1684   continued exponential speedup of CPU performance due to ``Moore's
       
  1685   Law'' follows different rules: clock frequency has reached its peak
       
  1686   around 2005, and applications need to be parallelized in order to
       
  1687   avoid a perceived loss of performance.  See also
       
  1688   \cite{Sutter:2005}.}
       
  1689 
       
  1690   Isabelle/Isar exploits the inherent structure of theories and proofs
       
  1691   to support \emph{implicit parallelism} to a large extent.  LCF-style
       
  1692   theorem provides almost ideal conditions for that, see also
       
  1693   \cite{Wenzel:2009}.  This means, significant parts of theory and
       
  1694   proof checking is parallelized by default.  In Isabelle2013, a
       
  1695   maximum speedup-factor of 3.5 on 4 cores and 6.5 on 8 cores can be
       
  1696   expected.
       
  1697 
       
  1698   \medskip ML threads lack the memory protection of separate
       
  1699   processes, and operate concurrently on shared heap memory.  This has
       
  1700   the advantage that results of independent computations are directly
       
  1701   available to other threads: abstract values can be passed without
       
  1702   copying or awkward serialization that is typically required for
       
  1703   separate processes.
       
  1704 
       
  1705   To make shared-memory multi-threading work robustly and efficiently,
       
  1706   some programming guidelines need to be observed.  While the ML
       
  1707   system is responsible to maintain basic integrity of the
       
  1708   representation of ML values in memory, the application programmer
       
  1709   needs to ensure that multi-threaded execution does not break the
       
  1710   intended semantics.
       
  1711 
       
  1712   \begin{warn}
       
  1713   To participate in implicit parallelism, tools need to be
       
  1714   thread-safe.  A single ill-behaved tool can affect the stability and
       
  1715   performance of the whole system.
       
  1716   \end{warn}
       
  1717 
       
  1718   Apart from observing the principles of thread-safeness passively,
       
  1719   advanced tools may also exploit parallelism actively, e.g.\ by using
       
  1720   ``future values'' (\secref{sec:futures}) or the more basic library
       
  1721   functions for parallel list operations (\secref{sec:parlist}).
       
  1722 
       
  1723   \begin{warn}
       
  1724   Parallel computing resources are managed centrally by the
       
  1725   Isabelle/ML infrastructure.  User programs must not fork their own
       
  1726   ML threads to perform computations.
       
  1727   \end{warn}
       
  1728 *}
       
  1729 
       
  1730 
       
  1731 subsection {* Critical shared resources *}
       
  1732 
       
  1733 text {* Thread-safeness is mainly concerned about concurrent
       
  1734   read/write access to shared resources, which are outside the purely
       
  1735   functional world of ML.  This covers the following in particular.
       
  1736 
       
  1737   \begin{itemize}
       
  1738 
       
  1739   \item Global references (or arrays), i.e.\ mutable memory cells that
       
  1740   persist over several invocations of associated
       
  1741   operations.\footnote{This is independent of the visibility of such
       
  1742   mutable values in the toplevel scope.}
       
  1743 
       
  1744   \item Global state of the running Isabelle/ML process, i.e.\ raw I/O
       
  1745   channels, environment variables, current working directory.
       
  1746 
       
  1747   \item Writable resources in the file-system that are shared among
       
  1748   different threads or external processes.
       
  1749 
       
  1750   \end{itemize}
       
  1751 
       
  1752   Isabelle/ML provides various mechanisms to avoid critical shared
       
  1753   resources in most situations.  As last resort there are some
       
  1754   mechanisms for explicit synchronization.  The following guidelines
       
  1755   help to make Isabelle/ML programs work smoothly in a concurrent
       
  1756   environment.
       
  1757 
       
  1758   \begin{itemize}
       
  1759 
       
  1760   \item Avoid global references altogether.  Isabelle/Isar maintains a
       
  1761   uniform context that incorporates arbitrary data declared by user
       
  1762   programs (\secref{sec:context-data}).  This context is passed as
       
  1763   plain value and user tools can get/map their own data in a purely
       
  1764   functional manner.  Configuration options within the context
       
  1765   (\secref{sec:config-options}) provide simple drop-in replacements
       
  1766   for historic reference variables.
       
  1767 
       
  1768   \item Keep components with local state information re-entrant.
       
  1769   Instead of poking initial values into (private) global references, a
       
  1770   new state record can be created on each invocation, and passed
       
  1771   through any auxiliary functions of the component.  The state record
       
  1772   may well contain mutable references, without requiring any special
       
  1773   synchronizations, as long as each invocation gets its own copy.
       
  1774 
       
  1775   \item Avoid raw output on @{text "stdout"} or @{text "stderr"}.  The
       
  1776   Poly/ML library is thread-safe for each individual output operation,
       
  1777   but the ordering of parallel invocations is arbitrary.  This means
       
  1778   raw output will appear on some system console with unpredictable
       
  1779   interleaving of atomic chunks.
       
  1780 
       
  1781   Note that this does not affect regular message output channels
       
  1782   (\secref{sec:message-channels}).  An official message is associated
       
  1783   with the command transaction from where it originates, independently
       
  1784   of other transactions.  This means each running Isar command has
       
  1785   effectively its own set of message channels, and interleaving can
       
  1786   only happen when commands use parallelism internally (and only at
       
  1787   message boundaries).
       
  1788 
       
  1789   \item Treat environment variables and the current working directory
       
  1790   of the running process as strictly read-only.
       
  1791 
       
  1792   \item Restrict writing to the file-system to unique temporary files.
       
  1793   Isabelle already provides a temporary directory that is unique for
       
  1794   the running process, and there is a centralized source of unique
       
  1795   serial numbers in Isabelle/ML.  Thus temporary files that are passed
       
  1796   to to some external process will be always disjoint, and thus
       
  1797   thread-safe.
       
  1798 
       
  1799   \end{itemize}
       
  1800 *}
       
  1801 
       
  1802 text %mlref {*
       
  1803   \begin{mldecls}
       
  1804   @{index_ML File.tmp_path: "Path.T -> Path.T"} \\
       
  1805   @{index_ML serial_string: "unit -> string"} \\
       
  1806   \end{mldecls}
       
  1807 
       
  1808   \begin{description}
       
  1809 
       
  1810   \item @{ML File.tmp_path}~@{text "path"} relocates the base
       
  1811   component of @{text "path"} into the unique temporary directory of
       
  1812   the running Isabelle/ML process.
       
  1813 
       
  1814   \item @{ML serial_string}~@{text "()"} creates a new serial number
       
  1815   that is unique over the runtime of the Isabelle/ML process.
       
  1816 
       
  1817   \end{description}
       
  1818 *}
       
  1819 
       
  1820 text %mlex {* The following example shows how to create unique
       
  1821   temporary file names.
       
  1822 *}
       
  1823 
       
  1824 ML {*
       
  1825   val tmp1 = File.tmp_path (Path.basic ("foo" ^ serial_string ()));
       
  1826   val tmp2 = File.tmp_path (Path.basic ("foo" ^ serial_string ()));
       
  1827   @{assert} (tmp1 <> tmp2);
       
  1828 *}
       
  1829 
       
  1830 
       
  1831 subsection {* Explicit synchronization *}
       
  1832 
       
  1833 text {* Isabelle/ML also provides some explicit synchronization
       
  1834   mechanisms, for the rare situations where mutable shared resources
       
  1835   are really required.  These are based on the synchronizations
       
  1836   primitives of Poly/ML, which have been adapted to the specific
       
  1837   assumptions of the concurrent Isabelle/ML environment.  User code
       
  1838   must not use the Poly/ML primitives directly!
       
  1839 
       
  1840   \medskip The most basic synchronization concept is a single
       
  1841   \emph{critical section} (also called ``monitor'' in the literature).
       
  1842   A thread that enters the critical section prevents all other threads
       
  1843   from doing the same.  A thread that is already within the critical
       
  1844   section may re-enter it in an idempotent manner.
       
  1845 
       
  1846   Such centralized locking is convenient, because it prevents
       
  1847   deadlocks by construction.
       
  1848 
       
  1849   \medskip More fine-grained locking works via \emph{synchronized
       
  1850   variables}.  An explicit state component is associated with
       
  1851   mechanisms for locking and signaling.  There are operations to
       
  1852   await a condition, change the state, and signal the change to all
       
  1853   other waiting threads.
       
  1854 
       
  1855   Here the synchronized access to the state variable is \emph{not}
       
  1856   re-entrant: direct or indirect nesting within the same thread will
       
  1857   cause a deadlock!
       
  1858 *}
       
  1859 
       
  1860 text %mlref {*
       
  1861   \begin{mldecls}
       
  1862   @{index_ML NAMED_CRITICAL: "string -> (unit -> 'a) -> 'a"} \\
       
  1863   @{index_ML CRITICAL: "(unit -> 'a) -> 'a"} \\
       
  1864   \end{mldecls}
       
  1865   \begin{mldecls}
       
  1866   @{index_ML_type "'a Synchronized.var"} \\
       
  1867   @{index_ML Synchronized.var: "string -> 'a -> 'a Synchronized.var"} \\
       
  1868   @{index_ML Synchronized.guarded_access: "'a Synchronized.var ->
       
  1869   ('a -> ('b * 'a) option) -> 'b"} \\
       
  1870   \end{mldecls}
       
  1871 
       
  1872   \begin{description}
       
  1873 
       
  1874   \item @{ML NAMED_CRITICAL}~@{text "name e"} evaluates @{text "e ()"}
       
  1875   within the central critical section of Isabelle/ML.  No other thread
       
  1876   may do so at the same time, but non-critical parallel execution will
       
  1877   continue.  The @{text "name"} argument is used for tracing and might
       
  1878   help to spot sources of congestion.
       
  1879 
       
  1880   Entering the critical section without contention is very fast.  Each
       
  1881   thread should stay within the critical section only very briefly,
       
  1882   otherwise parallel performance may degrade.
       
  1883 
       
  1884   \item @{ML CRITICAL} is the same as @{ML NAMED_CRITICAL} with empty
       
  1885   name argument.
       
  1886 
       
  1887   \item Type @{ML_type "'a Synchronized.var"} represents synchronized
       
  1888   variables with state of type @{ML_type 'a}.
       
  1889 
       
  1890   \item @{ML Synchronized.var}~@{text "name x"} creates a synchronized
       
  1891   variable that is initialized with value @{text "x"}.  The @{text
       
  1892   "name"} is used for tracing.
       
  1893 
       
  1894   \item @{ML Synchronized.guarded_access}~@{text "var f"} lets the
       
  1895   function @{text "f"} operate within a critical section on the state
       
  1896   @{text "x"} as follows: if @{text "f x"} produces @{ML NONE}, it
       
  1897   continues to wait on the internal condition variable, expecting that
       
  1898   some other thread will eventually change the content in a suitable
       
  1899   manner; if @{text "f x"} produces @{ML SOME}~@{text "(y, x')"} it is
       
  1900   satisfied and assigns the new state value @{text "x'"}, broadcasts a
       
  1901   signal to all waiting threads on the associated condition variable,
       
  1902   and returns the result @{text "y"}.
       
  1903 
       
  1904   \end{description}
       
  1905 
       
  1906   There are some further variants of the @{ML
       
  1907   Synchronized.guarded_access} combinator, see @{file
       
  1908   "~~/src/Pure/Concurrent/synchronized.ML"} for details.
       
  1909 *}
       
  1910 
       
  1911 text %mlex {* The following example implements a counter that produces
       
  1912   positive integers that are unique over the runtime of the Isabelle
       
  1913   process:
       
  1914 *}
       
  1915 
       
  1916 ML {*
       
  1917   local
       
  1918     val counter = Synchronized.var "counter" 0;
       
  1919   in
       
  1920     fun next () =
       
  1921       Synchronized.guarded_access counter
       
  1922         (fn i =>
       
  1923           let val j = i + 1
       
  1924           in SOME (j, j) end);
       
  1925   end;
       
  1926 *}
       
  1927 
       
  1928 ML {*
       
  1929   val a = next ();
       
  1930   val b = next ();
       
  1931   @{assert} (a <> b);
       
  1932 *}
       
  1933 
       
  1934 text {* \medskip See @{file "~~/src/Pure/Concurrent/mailbox.ML"} how
       
  1935   to implement a mailbox as synchronized variable over a purely
       
  1936   functional queue. *}
       
  1937 
       
  1938 
       
  1939 section {* Managed evaluation *}
       
  1940 
       
  1941 text {* Execution of Standard ML follows the model of strict
       
  1942   functional evaluation with optional exceptions.  Evaluation happens
       
  1943   whenever some function is applied to (sufficiently many)
       
  1944   arguments. The result is either an explicit value or an implicit
       
  1945   exception.
       
  1946 
       
  1947   \emph{Managed evaluation} in Isabelle/ML organizes expressions and
       
  1948   results to control certain physical side-conditions, to say more
       
  1949   specifically when and how evaluation happens.  For example, the
       
  1950   Isabelle/ML library supports lazy evaluation with memoing, parallel
       
  1951   evaluation via futures, asynchronous evaluation via promises,
       
  1952   evaluation with time limit etc.
       
  1953 
       
  1954   \medskip An \emph{unevaluated expression} is represented either as
       
  1955   unit abstraction @{verbatim "fn () => a"} of type
       
  1956   @{verbatim "unit -> 'a"} or as regular function
       
  1957   @{verbatim "fn a => b"} of type @{verbatim "'a -> 'b"}.  Both forms
       
  1958   occur routinely, and special care is required to tell them apart ---
       
  1959   the static type-system of SML is only of limited help here.
       
  1960 
       
  1961   The first form is more intuitive: some combinator @{text "(unit ->
       
  1962   'a) -> 'a"} applies the given function to @{text "()"} to initiate
       
  1963   the postponed evaluation process.  The second form is more flexible:
       
  1964   some combinator @{text "('a -> 'b) -> 'a -> 'b"} acts like a
       
  1965   modified form of function application; several such combinators may
       
  1966   be cascaded to modify a given function, before it is ultimately
       
  1967   applied to some argument.
       
  1968 
       
  1969   \medskip \emph{Reified results} make the disjoint sum of regular
       
  1970   values versions exceptional situations explicit as ML datatype:
       
  1971   @{text "'a result = Res of 'a | Exn of exn"}.  This is typically
       
  1972   used for administrative purposes, to store the overall outcome of an
       
  1973   evaluation process.
       
  1974 
       
  1975   \emph{Parallel exceptions} aggregate reified results, such that
       
  1976   multiple exceptions are digested as a collection in canonical form
       
  1977   that identifies exceptions according to their original occurrence.
       
  1978   This is particular important for parallel evaluation via futures
       
  1979   \secref{sec:futures}, which are organized as acyclic graph of
       
  1980   evaluations that depend on other evaluations: exceptions stemming
       
  1981   from shared sub-graphs are exposed exactly once and in the order of
       
  1982   their original occurrence (e.g.\ when printed at the toplevel).
       
  1983   Interrupt counts as neutral element here: it is treated as minimal
       
  1984   information about some canceled evaluation process, and is absorbed
       
  1985   by the presence of regular program exceptions.  *}
       
  1986 
       
  1987 text %mlref {*
       
  1988   \begin{mldecls}
       
  1989   @{index_ML_type "'a Exn.result"} \\
       
  1990   @{index_ML Exn.capture: "('a -> 'b) -> 'a -> 'b Exn.result"} \\
       
  1991   @{index_ML Exn.interruptible_capture: "('a -> 'b) -> 'a -> 'b Exn.result"} \\
       
  1992   @{index_ML Exn.release: "'a Exn.result -> 'a"} \\
       
  1993   @{index_ML Par_Exn.release_all: "'a Exn.result list -> 'a list"} \\
       
  1994   @{index_ML Par_Exn.release_first: "'a Exn.result list -> 'a list"} \\
       
  1995   \end{mldecls}
       
  1996 
       
  1997   \begin{description}
       
  1998 
       
  1999   \item Type @{ML_type "'a Exn.result"} represents the disjoint sum of
       
  2000   ML results explicitly, with constructor @{ML Exn.Res} for regular
       
  2001   values and @{ML "Exn.Exn"} for exceptions.
       
  2002 
       
  2003   \item @{ML Exn.capture}~@{text "f x"} manages the evaluation of
       
  2004   @{text "f x"} such that exceptions are made explicit as @{ML
       
  2005   "Exn.Exn"}.  Note that this includes physical interrupts (see also
       
  2006   \secref{sec:exceptions}), so the same precautions apply to user
       
  2007   code: interrupts must not be absorbed accidentally!
       
  2008 
       
  2009   \item @{ML Exn.interruptible_capture} is similar to @{ML
       
  2010   Exn.capture}, but interrupts are immediately re-raised as required
       
  2011   for user code.
       
  2012 
       
  2013   \item @{ML Exn.release}~@{text "result"} releases the original
       
  2014   runtime result, exposing its regular value or raising the reified
       
  2015   exception.
       
  2016 
       
  2017   \item @{ML Par_Exn.release_all}~@{text "results"} combines results
       
  2018   that were produced independently (e.g.\ by parallel evaluation).  If
       
  2019   all results are regular values, that list is returned.  Otherwise,
       
  2020   the collection of all exceptions is raised, wrapped-up as collective
       
  2021   parallel exception.  Note that the latter prevents access to
       
  2022   individual exceptions by conventional @{verbatim "handle"} of SML.
       
  2023 
       
  2024   \item @{ML Par_Exn.release_first} is similar to @{ML
       
  2025   Par_Exn.release_all}, but only the first exception that has occurred
       
  2026   in the original evaluation process is raised again, the others are
       
  2027   ignored.  That single exception may get handled by conventional
       
  2028   means in SML.
       
  2029 
       
  2030   \end{description}
       
  2031 *}
       
  2032 
       
  2033 
       
  2034 subsection {* Parallel skeletons \label{sec:parlist} *}
       
  2035 
       
  2036 text {*
       
  2037   Algorithmic skeletons are combinators that operate on lists in
       
  2038   parallel, in the manner of well-known @{text map}, @{text exists},
       
  2039   @{text forall} etc.  Management of futures (\secref{sec:futures})
       
  2040   and their results as reified exceptions is wrapped up into simple
       
  2041   programming interfaces that resemble the sequential versions.
       
  2042 
       
  2043   What remains is the application-specific problem to present
       
  2044   expressions with suitable \emph{granularity}: each list element
       
  2045   corresponds to one evaluation task.  If the granularity is too
       
  2046   coarse, the available CPUs are not saturated.  If it is too
       
  2047   fine-grained, CPU cycles are wasted due to the overhead of
       
  2048   organizing parallel processing.  In the worst case, parallel
       
  2049   performance will be less than the sequential counterpart!
       
  2050 *}
       
  2051 
       
  2052 text %mlref {*
       
  2053   \begin{mldecls}
       
  2054   @{index_ML Par_List.map: "('a -> 'b) -> 'a list -> 'b list"} \\
       
  2055   @{index_ML Par_List.get_some: "('a -> 'b option) -> 'a list -> 'b option"} \\
       
  2056   \end{mldecls}
       
  2057 
       
  2058   \begin{description}
       
  2059 
       
  2060   \item @{ML Par_List.map}~@{text "f [x\<^sub>1, \<dots>, x\<^sub>n]"} is like @{ML
       
  2061   "map"}~@{text "f [x\<^sub>1, \<dots>, x\<^sub>n]"}, but the evaluation of @{text "f x\<^sub>i"}
       
  2062   for @{text "i = 1, \<dots>, n"} is performed in parallel.
       
  2063 
       
  2064   An exception in any @{text "f x\<^sub>i"} cancels the overall evaluation
       
  2065   process.  The final result is produced via @{ML
       
  2066   Par_Exn.release_first} as explained above, which means the first
       
  2067   program exception that happened to occur in the parallel evaluation
       
  2068   is propagated, and all other failures are ignored.
       
  2069 
       
  2070   \item @{ML Par_List.get_some}~@{text "f [x\<^sub>1, \<dots>, x\<^sub>n]"} produces some
       
  2071   @{text "f x\<^sub>i"} that is of the form @{text "SOME y\<^sub>i"}, if that
       
  2072   exists, otherwise @{text "NONE"}.  Thus it is similar to @{ML
       
  2073   Library.get_first}, but subject to a non-deterministic parallel
       
  2074   choice process.  The first successful result cancels the overall
       
  2075   evaluation process; other exceptions are propagated as for @{ML
       
  2076   Par_List.map}.
       
  2077 
       
  2078   This generic parallel choice combinator is the basis for derived
       
  2079   forms, such as @{ML Par_List.find_some}, @{ML Par_List.exists}, @{ML
       
  2080   Par_List.forall}.
       
  2081 
       
  2082   \end{description}
       
  2083 *}
       
  2084 
       
  2085 text %mlex {* Subsequently, the Ackermann function is evaluated in
       
  2086   parallel for some ranges of arguments. *}
       
  2087 
       
  2088 ML_val {*
       
  2089   fun ackermann 0 n = n + 1
       
  2090     | ackermann m 0 = ackermann (m - 1) 1
       
  2091     | ackermann m n = ackermann (m - 1) (ackermann m (n - 1));
       
  2092 
       
  2093   Par_List.map (ackermann 2) (500 upto 1000);
       
  2094   Par_List.map (ackermann 3) (5 upto 10);
       
  2095 *}
       
  2096 
       
  2097 
       
  2098 subsection {* Lazy evaluation *}
       
  2099 
       
  2100 text {*
       
  2101   %FIXME
       
  2102 
       
  2103   See also @{file "~~/src/Pure/Concurrent/lazy.ML"}.
       
  2104 *}
       
  2105 
       
  2106 
       
  2107 subsection {* Future values \label{sec:futures} *}
       
  2108 
       
  2109 text {*
       
  2110   %FIXME
       
  2111 
       
  2112   See also @{file "~~/src/Pure/Concurrent/future.ML"}.
       
  2113 *}
       
  2114 
       
  2115 
       
  2116 end