src/Doc/IsarImplementation/ML.thy
 author wenzelm Tue Mar 18 11:07:47 2014 +0100 (2014-03-18) changeset 56199 8e8d28ed7529 parent 55838 e120a15b0ee6 child 56303 4cc3f4db3447 permissions -rw-r--r--
tuned signature -- rearranged modules;
     1 theory "ML"

     2 imports Base

     3 begin

     4

     5 chapter {* Isabelle/ML *}

     6

     7 text {* Isabelle/ML is best understood as a certain culture based on

     8   Standard ML.  Thus it is not a new programming language, but a

     9   certain way to use SML at an advanced level within the Isabelle

    10   environment.  This covers a variety of aspects that are geared

    11   towards an efficient and robust platform for applications of formal

    12   logic with fully foundational proof construction --- according to

    13   the well-known \emph{LCF principle}.  There is specific

    14   infrastructure with library modules to address the needs of this

    15   difficult task.  For example, the raw parallel programming model of

     16   Poly/ML is presented as the considerably more abstract concept of

    17   \emph{future values}, which is then used to augment the inference

    18   kernel, proof interpreter, and theory loader accordingly.

    19

    20   The main aspects of Isabelle/ML are introduced below.  These

    21   first-hand explanations should help to understand how proper

    22   Isabelle/ML is to be read and written, and to get access to the

    23   wealth of experience that is expressed in the source text and its

    24   history of changes.\footnote{See

    25   @{url "http://isabelle.in.tum.de/repos/isabelle"} for the full

    26   Mercurial history.  There are symbolic tags to refer to official

    27   Isabelle releases, as opposed to arbitrary \emph{tip} versions that

    28   merely reflect snapshots that are never really up-to-date.}  *}

    29

    30

    31 section {* Style and orthography *}

    32

    33 text {* The sources of Isabelle/Isar are optimized for

    34   \emph{readability} and \emph{maintainability}.  The main purpose is

    35   to tell an informed reader what is really going on and how things

    36   really work.  This is a non-trivial aim, but it is supported by a

    37   certain style of writing Isabelle/ML that has emerged from long

    38   years of system development.\footnote{See also the interesting style

    39   guide for OCaml

    40   @{url "http://caml.inria.fr/resources/doc/guides/guidelines.en.html"}

    41   which shares many of our means and ends.}

    42

    43   The main principle behind any coding style is \emph{consistency}.

     44   For a single author of a small program this merely means ``choose

     45   your style and stick to it''.  A complex project like Isabelle, with

    46   long years of development and different contributors, requires more

    47   standardization.  A coding style that is changed every few years or

    48   with every new contributor is no style at all, because consistency

    49   is quickly lost.  Global consistency is hard to achieve, though.

    50   Nonetheless, one should always strive at least for local consistency

    51   of modules and sub-systems, without deviating from some general

     52   principles of how to write Isabelle/ML.

    53

    54   In a sense, good coding style is like an \emph{orthography} for the

    55   sources: it helps to read quickly over the text and see through the

    56   main points, without getting distracted by accidental presentation

    57   of free-style code.

    58 *}

    59

    60

    61 subsection {* Header and sectioning *}

    62

    63 text {* Isabelle source files have a certain standardized header

    64   format (with precise spacing) that follows ancient traditions

    65   reaching back to the earliest versions of the system by Larry

    66   Paulson.  See @{file "~~/src/Pure/thm.ML"}, for example.

    67

    68   The header includes at least @{verbatim Title} and @{verbatim

    69   Author} entries, followed by a prose description of the purpose of

    70   the module.  The latter can range from a single line to several

    71   paragraphs of explanations.

    72
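  For example, a header in this standard format (the concrete title,
  author, and description here are hypothetical) looks like this:

  \begin{verbatim}
  (*  Title:      Pure/foo.ML
      Author:     Some Author, Some Institution

  Support for foo values, with a one-line summary of the module.
  *)
  \end{verbatim}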

    73   The rest of the file is divided into sections, subsections,

    74   subsubsections, paragraphs etc.\ using a simple layout via ML

    75   comments as follows.

    76

    77 \begin{verbatim}

    78 (*** section ***)

    79

    80 (** subsection **)

    81

    82 (* subsubsection *)

    83

    84 (*short paragraph*)

    85

    86 (*

    87   long paragraph,

    88   with more text

    89 *)

    90 \end{verbatim}

    91

    92   As in regular typography, there is some extra space \emph{before}

    93   section headings that are adjacent to plain text (not other headings

    94   as in the example above).

    95

    96   \medskip The precise wording of the prose text given in these

    97   headings is chosen carefully to introduce the main theme of the

    98   subsequent formal ML text.

    99 *}

   100

   101

   102 subsection {* Naming conventions *}

   103

   104 text {* Since ML is the primary medium to express the meaning of the

   105   source text, naming of ML entities requires special care.

   106

   107   \paragraph{Notation.}  A name consists of 1--3 \emph{words} (rarely

   108   4, but not more) that are separated by underscore.  There are three

   109   variants concerning upper or lower case letters, which are used for

   110   certain ML categories as follows:

   111

   112   \medskip

   113   \begin{tabular}{lll}

   114   variant & example & ML categories \\\hline

   115   lower-case & @{ML_text foo_bar} & values, types, record fields \\

   116   capitalized & @{ML_text Foo_Bar} & datatype constructors, structures, functors \\

   117   upper-case & @{ML_text FOO_BAR} & special values, exception constructors, signatures \\

   118   \end{tabular}

   119   \medskip

   120

   121   For historical reasons, many capitalized names omit underscores,

   122   e.g.\ old-style @{ML_text FooBar} instead of @{ML_text Foo_Bar}.

   123   Genuine mixed-case names are \emph{not} used, because clear division

   124   of words is essential for readability.\footnote{Camel-case was

    125   invented to work around the lack of underscore in some early

   126   non-ASCII character sets.  Later it became habitual in some language

   127   communities that are now strong in numbers.}

   128

    129   A single (capital) character does not count as ``word'' in this

   130   respect: some Isabelle/ML names are suffixed by extra markers like

   131   this: @{ML_text foo_barT}.

   132

   133   Name variants are produced by adding 1--3 primes, e.g.\ @{ML_text

   134   foo'}, @{ML_text foo''}, or @{ML_text foo'''}, but not @{ML_text

   135   foo''''} or more.  Decimal digits scale better to larger numbers,

   136   e.g.\ @{ML_text foo0}, @{ML_text foo1}, @{ML_text foo42}.

   137

   138   \paragraph{Scopes.}  Apart from very basic library modules, ML

    139   structures are not ``opened'', but names are referenced with

   140   explicit qualification, as in @{ML Syntax.string_of_term} for

   141   example.  When devising names for structures and their components it

    142   is important to aim at eye-catching compositions of both parts, because

   143   this is how they are seen in the sources and documentation.  For the

   144   same reasons, aliases of well-known library functions should be

   145   avoided.

   146

   147   Local names of function abstraction or case/let bindings are

    148   typically shorter, sometimes using only rudiments of ``words'',

   149   while still avoiding cryptic shorthands.  An auxiliary function

   150   called @{ML_text helper}, @{ML_text aux}, or @{ML_text f} is

   151   considered bad style.

   152

   153   Example:

   154

   155   \begin{verbatim}

   156   (* RIGHT *)

   157

   158   fun print_foo ctxt foo =

   159     let

   160       fun print t = ... Syntax.string_of_term ctxt t ...

   161     in ... end;

   162

   163

   164   (* RIGHT *)

   165

   166   fun print_foo ctxt foo =

   167     let

   168       val string_of_term = Syntax.string_of_term ctxt;

   169       fun print t = ... string_of_term t ...

   170     in ... end;

   171

   172

   173   (* WRONG *)

   174

   175   val string_of_term = Syntax.string_of_term;

   176

   177   fun print_foo ctxt foo =

   178     let

   179       fun aux t = ... string_of_term ctxt t ...

   180     in ... end;

   181

   182   \end{verbatim}

   183

   184

   185   \paragraph{Specific conventions.} Here are some specific name forms

   186   that occur frequently in the sources.

   187

   188   \begin{itemize}

   189

   190   \item A function that maps @{ML_text foo} to @{ML_text bar} is

   191   called @{ML_text foo_to_bar} or @{ML_text bar_of_foo} (never

   192   @{ML_text foo2bar}, @{ML_text bar_from_foo}, @{ML_text

   193   bar_for_foo}, or @{ML_text bar4foo}).

   194

   195   \item The name component @{ML_text legacy} means that the operation

   196   is about to be discontinued soon.

   197

   198   \item The name component @{ML_text old} means that this is historic

   199   material that might disappear at some later stage.

   200

   201   \item The name component @{ML_text global} means that this works

   202   with the background theory instead of the regular local context

   203   (\secref{sec:context}), sometimes for historical reasons, sometimes

    204   due to a genuine lack of locality of the concept involved, sometimes as

   205   a fall-back for the lack of a proper context in the application

   206   code.  Whenever there is a non-global variant available, the

   207   application should be migrated to use it with a proper local

   208   context.

   209

   210   \item Variables of the main context types of the Isabelle/Isar

   211   framework (\secref{sec:context} and \chref{ch:local-theory}) have

   212   firm naming conventions as follows:

   213

   214   \begin{itemize}

   215

   216   \item theories are called @{ML_text thy}, rarely @{ML_text theory}

   217   (never @{ML_text thry})

   218

   219   \item proof contexts are called @{ML_text ctxt}, rarely @{ML_text

   220   context} (never @{ML_text ctx})

   221

   222   \item generic contexts are called @{ML_text context}, rarely

   223   @{ML_text ctxt}

   224

   225   \item local theories are called @{ML_text lthy}, except for local

   226   theories that are treated as proof context (which is a semantic

   227   super-type)

   228

   229   \end{itemize}

   230

   231   Variations with primed or decimal numbers are always possible, as

    232   well as semantic prefixes like @{ML_text foo_thy} or @{ML_text

    233   bar_ctxt}, but the base conventions above need to be preserved.

    234   This makes it possible to visualize their data flow via plain regular

   235   expressions in the editor.

   236

   237   \item The main logical entities (\secref{ch:logic}) have established

    238   naming conventions as follows:

   239

   240   \begin{itemize}

   241

   242   \item sorts are called @{ML_text S}

   243

   244   \item types are called @{ML_text T}, @{ML_text U}, or @{ML_text

   245   ty} (never @{ML_text t})

   246

   247   \item terms are called @{ML_text t}, @{ML_text u}, or @{ML_text

   248   tm} (never @{ML_text trm})

   249

   250   \item certified types are called @{ML_text cT}, rarely @{ML_text

   251   T}, with variants as for types

   252

   253   \item certified terms are called @{ML_text ct}, rarely @{ML_text

   254   t}, with variants as for terms (never @{ML_text ctrm})

   255

   256   \item theorems are called @{ML_text th}, or @{ML_text thm}

   257

   258   \end{itemize}

   259

   260   Proper semantic names override these conventions completely.  For

   261   example, the left-hand side of an equation (as a term) can be called

   262   @{ML_text lhs} (not @{ML_text lhs_tm}).  Or a term that is known

   263   to be a variable can be called @{ML_text v} or @{ML_text x}.

   264

   265   \item Tactics (\secref{sec:tactics}) are sufficiently important to

   266   have specific naming conventions.  The name of a basic tactic

   267   definition always has a @{ML_text "_tac"} suffix, the subgoal index

   268   (if applicable) is always called @{ML_text i}, and the goal state

   269   (if made explicit) is usually called @{ML_text st} instead of the

   270   somewhat misleading @{ML_text thm}.  Any other arguments are given

   271   before the latter two, and the general context is given first.

   272   Example:

   273

   274   \begin{verbatim}

   275   fun my_tac ctxt arg1 arg2 i st = ...

   276   \end{verbatim}

   277

   278   Note that the goal state @{ML_text st} above is rarely made

   279   explicit, if tactic combinators (tacticals) are used as usual.

   280

   281   \end{itemize}

   282 *}
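text {* For instance, a small function following these conventions (the
  function itself is hypothetical, for illustration only):

  \begin{verbatim}
  fun string_of_eq ctxt (t, u) =
    Syntax.string_of_term ctxt t ^ " = " ^ Syntax.string_of_term ctxt u;
  \end{verbatim}

  Here @{ML_text ctxt} is a proof context and @{ML_text t}, @{ML_text u}
  are terms, in accordance with the firm naming conventions above.
*}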

   283

   284

   285 subsection {* General source layout *}

   286

   287 text {* The general Isabelle/ML source layout imitates regular

   288   type-setting to some extent, augmented by the requirements for

   289   deeply nested expressions that are commonplace in functional

   290   programming.

   291

   292   \paragraph{Line length} is 80 characters according to ancient

   293   standards, but we allow as much as 100 characters (not

    294   more).\footnote{Readability requires keeping the beginning of a line

    295   in view while watching its end.  Modern wide-screen displays do not

    296   change the way the human brain works.  Sources also need to be

   297   printable on plain paper with reasonable font-size.} The extra 20

   298   characters acknowledge the space requirements due to qualified

   299   library references in Isabelle/ML.

   300

   301   \paragraph{White-space} is used to emphasize the structure of

   302   expressions, following mostly standard conventions for mathematical

   303   typesetting, as can be seen in plain {\TeX} or {\LaTeX}.  This

   304   defines positioning of spaces for parentheses, punctuation, and

   305   infixes as illustrated here:

   306

   307   \begin{verbatim}

   308   val x = y + z * (a + b);

   309   val pair = (a, b);

   310   val record = {foo = 1, bar = 2};

   311   \end{verbatim}

   312

   313   Lines are normally broken \emph{after} an infix operator or

   314   punctuation character.  For example:

   315

   316   \begin{verbatim}

   317   val x =

   318     a +

   319     b +

   320     c;

   321

   322   val tuple =

   323    (a,

   324     b,

   325     c);

   326   \end{verbatim}

   327

   328   Some special infixes (e.g.\ @{ML_text "|>"}) work better at the

   329   start of the line, but punctuation is always at the end.
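  For example, a chain of @{ML_text "|>"} applications (the function
  names here are hypothetical) is arranged like this:

  \begin{verbatim}
  val thy' =
    thy
    |> init
    |> update arg
    |> finalize;
  \end{verbatim}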

   330

   331   Function application follows the tradition of @{text "\<lambda>"}-calculus,

   332   not informal mathematics.  For example: @{ML_text "f a b"} for a

   333   curried function, or @{ML_text "g (a, b)"} for a tupled function.

   334   Note that the space between @{ML_text g} and the pair @{ML_text

   335   "(a, b)"} follows the important principle of

   336   \emph{compositionality}: the layout of @{ML_text "g p"} does not

   337   change when @{ML_text "p"} is refined to the concrete pair

   338   @{ML_text "(a, b)"}.

   339

   340   \paragraph{Indentation} uses plain spaces, never hard

   341   tabulators.\footnote{Tabulators were invented to move the carriage

   342   of a type-writer to certain predefined positions.  In software they

   343   could be used as a primitive run-length compression of consecutive

   344   spaces, but the precise result would depend on non-standardized

   345   editor configuration.}

   346

   347   Each level of nesting is indented by 2 spaces, sometimes 1, very

   348   rarely 4, never 8 or any other odd number.

   349

   350   Indentation follows a simple logical format that only depends on the

   351   nesting depth, not the accidental length of the text that initiates

   352   a level of nesting.  Example:

   353

   354   \begin{verbatim}

   355   (* RIGHT *)

   356

   357   if b then

   358     expr1_part1

   359     expr1_part2

   360   else

   361     expr2_part1

   362     expr2_part2

   363

   364

   365   (* WRONG *)

   366

   367   if b then expr1_part1

   368             expr1_part2

   369   else expr2_part1

   370        expr2_part2

   371   \end{verbatim}

   372

   373   The second form has many problems: it assumes a fixed-width font

   374   when viewing the sources, it uses more space on the line and thus

   375   makes it hard to observe its strict length limit (working against

   376   \emph{readability}), it requires extra editing to adapt the layout

   377   to changes of the initial text (working against

   378   \emph{maintainability}) etc.

   379

   380   \medskip For similar reasons, any kind of two-dimensional or tabular

   381   layouts, ASCII-art with lines or boxes of asterisks etc.\ should be

   382   avoided.

   383

   384   \paragraph{Complex expressions} that consist of multi-clausal

   385   function definitions, @{ML_text handle}, @{ML_text case},

   386   @{ML_text let} (and combinations) require special attention.  The

   387   syntax of Standard ML is quite ambitious and admits a lot of

   388   variance that can distort the meaning of the text.

   389

   390   Clauses of @{ML_text fun}, @{ML_text fn}, @{ML_text handle},

   391   @{ML_text case} get extra indentation to indicate the nesting

   392   clearly.  Example:

   393

   394   \begin{verbatim}

   395   (* RIGHT *)

   396

   397   fun foo p1 =

   398         expr1

   399     | foo p2 =

   400         expr2

   401

   402

   403   (* WRONG *)

   404

   405   fun foo p1 =

   406     expr1

   407     | foo p2 =

   408     expr2

   409   \end{verbatim}

   410

   411   Body expressions consisting of @{ML_text case} or @{ML_text let}

   412   require care to maintain compositionality, to prevent loss of

   413   logical indentation where it is especially important to see the

   414   structure of the text.  Example:

   415

   416   \begin{verbatim}

   417   (* RIGHT *)

   418

   419   fun foo p1 =

   420         (case e of

   421           q1 => ...

   422         | q2 => ...)

   423     | foo p2 =

   424         let

   425           ...

   426         in

   427           ...

   428         end

   429

   430

   431   (* WRONG *)

   432

   433   fun foo p1 = case e of

   434       q1 => ...

   435     | q2 => ...

   436     | foo p2 =

   437     let

   438       ...

   439     in

   440       ...

   441     end

   442   \end{verbatim}

   443

   444   Extra parentheses around @{ML_text case} expressions are optional,

   445   but help to analyse the nesting based on character matching in the

   446   editor.

   447

   448   \medskip There are two main exceptions to the overall principle of

   449   compositionality in the layout of complex expressions.

   450

   451   \begin{enumerate}

   452

    453   \item @{ML_text "if"} expressions are iterated as if SML had a

    454   multi-branch conditional, e.g.

   455

   456   \begin{verbatim}

   457   (* RIGHT *)

   458

   459   if b1 then e1

   460   else if b2 then e2

   461   else e3

   462   \end{verbatim}

   463

    464   \item @{ML_text fn} abstractions are often laid out as if they

    465   lacked any structure by themselves.  This traditional form is

   466   motivated by the possibility to shift function arguments back and

   467   forth wrt.\ additional combinators.  Example:

   468

   469   \begin{verbatim}

   470   (* RIGHT *)

   471

   472   fun foo x y = fold (fn z =>

   473     expr)

   474   \end{verbatim}

   475

   476   Here the visual appearance is that of three arguments @{ML_text x},

   477   @{ML_text y}, @{ML_text z}.

   478

   479   \end{enumerate}

   480

    481   Such weakly structured layout should be used with great care.  Here

   482   are some counter-examples involving @{ML_text let} expressions:

   483

   484   \begin{verbatim}

   485   (* WRONG *)

   486

   487   fun foo x = let

   488       val y = ...

   489     in ... end

   490

   491

   492   (* WRONG *)

   493

   494   fun foo x = let

   495     val y = ...

   496   in ... end

   497

   498

   499   (* WRONG *)

   500

   501   fun foo x =

   502   let

   503     val y = ...

   504   in ... end

   505   \end{verbatim}

   506

   507   \medskip In general the source layout is meant to emphasize the

   508   structure of complex language expressions, not to pretend that SML

   509   had a completely different syntax (say that of Haskell or Java).

   510 *}

   511

   512

   513 section {* SML embedded into Isabelle/Isar *}

   514

   515 text {* ML and Isar are intertwined via an open-ended bootstrap

   516   process that provides more and more programming facilities and

   517   logical content in an alternating manner.  Bootstrapping starts from

   518   the raw environment of existing implementations of Standard ML

   519   (mainly Poly/ML, but also SML/NJ).

   520

   521   Isabelle/Pure marks the point where the original ML toplevel is

   522   superseded by the Isar toplevel that maintains a uniform context for

   523   arbitrary ML values (see also \secref{sec:context}).  This formal

   524   environment holds ML compiler bindings, logical entities, and many

   525   other things.  Raw SML is never encountered again after the initial

   526   bootstrap of Isabelle/Pure.

   527

   528   Object-logics like Isabelle/HOL are built within the

   529   Isabelle/ML/Isar environment by introducing suitable theories with

   530   associated ML modules, either inlined or as separate files.  Thus

   531   Isabelle/HOL is defined as a regular user-space application within

   532   the Isabelle framework.  Further add-on tools can be implemented in

   533   ML within the Isar context in the same manner: ML is part of the

   534   standard repertoire of Isabelle, and there is no distinction between

    535   ``user'' and ``developer'' in this respect.

   536 *}

   537

   538

   539 subsection {* Isar ML commands *}

   540

   541 text {* The primary Isar source language provides facilities to open

    542   a ``window'' to the underlying ML compiler.  Especially see the Isar

   543   commands @{command_ref "ML_file"} and @{command_ref "ML"}: both work the

   544   same way, only the source text is provided via a file vs.\ inlined,

   545   respectively.  Apart from embedding ML into the main theory

   546   definition like that, there are many more commands that refer to ML

   547   source, such as @{command_ref setup} or @{command_ref declaration}.

   548   Even more fine-grained embedding of ML into Isar is encountered in

   549   the proof method @{method_ref tactic}, which refines the pending

   550   goal state via a given expression of type @{ML_type tactic}.

   551 *}

   552

   553 text %mlex {* The following artificial example demonstrates some ML

   554   toplevel declarations within the implicit Isar theory context.  This

   555   is regular functional programming without referring to logical

   556   entities yet.

   557 *}

   558

   559 ML {*

   560   fun factorial 0 = 1

   561     | factorial n = n * factorial (n - 1)

   562 *}

   563

   564 text {* Here the ML environment is already managed by Isabelle, i.e.\

   565   the @{ML factorial} function is not yet accessible in the preceding

   566   paragraph, nor in a different theory that is independent from the

   567   current one in the import hierarchy.

   568

   569   Removing the above ML declaration from the source text will remove

   570   any trace of this definition as expected.  The Isabelle/ML toplevel

   571   environment is managed in a \emph{stateless} way: unlike the raw ML

   572   toplevel there are no global side-effects involved

   573   here.\footnote{Such a stateless compilation environment is also a

   574   prerequisite for robust parallel compilation within independent

   575   nodes of the implicit theory development graph.}

   576

   577   \medskip The next example shows how to embed ML into Isar proofs, using

    578   @{command_ref "ML_prf"} instead of @{command_ref "ML"}.

   579   As illustrated below, the effect on the ML environment is local to

   580   the whole proof body, ignoring the block structure.

   581 *}

   582

   583 notepad

   584 begin

   585   ML_prf %"ML" {* val a = 1 *}

   586   {

   587     ML_prf %"ML" {* val b = a + 1 *}

   588   } -- {* Isar block structure ignored by ML environment *}

   589   ML_prf %"ML" {* val c = b + 1 *}

   590 end

   591

   592 text {* By side-stepping the normal scoping rules for Isar proof

   593   blocks, embedded ML code can refer to the different contexts and

   594   manipulate corresponding entities, e.g.\ export a fact from a block

   595   context.

   596

   597   \medskip Two further ML commands are useful in certain situations:

   598   @{command_ref ML_val} and @{command_ref ML_command} are

   599   \emph{diagnostic} in the sense that there is no effect on the

    600   underlying environment, and can thus be used anywhere (even outside a

   601   theory).  The examples below produce long strings of digits by

   602   invoking @{ML factorial}: @{command ML_val} already takes care of

   603   printing the ML toplevel result, but @{command ML_command} is silent

   604   so we produce an explicit output message.  *}

   605

   606 ML_val {* factorial 100 *}

   607 ML_command {* writeln (string_of_int (factorial 100)) *}

   608

   609 notepad

   610 begin

   611   ML_val {* factorial 100 *}

   612   ML_command {* writeln (string_of_int (factorial 100)) *}

   613 end

   614

   615

   616 subsection {* Compile-time context *}

   617

   618 text {* Whenever the ML compiler is invoked within Isabelle/Isar, the

   619   formal context is passed as a thread-local reference variable.  Thus

   620   ML code may access the theory context during compilation, by reading

   621   or writing the (local) theory under construction.  Note that such

   622   direct access to the compile-time context is rare.  In practice it

   623   is typically done via some derived ML functions instead.

   624 *}

   625

   626 text %mlref {*

   627   \begin{mldecls}

   628   @{index_ML ML_Context.the_generic_context: "unit -> Context.generic"} \\

   629   @{index_ML "Context.>>": "(Context.generic -> Context.generic) -> unit"} \\

   630   @{index_ML ML_Thms.bind_thms: "string * thm list -> unit"} \\

   631   @{index_ML ML_Thms.bind_thm: "string * thm -> unit"} \\

   632   \end{mldecls}

   633

   634   \begin{description}

   635

   636   \item @{ML "ML_Context.the_generic_context ()"} refers to the theory

   637   context of the ML toplevel --- at compile time.  ML code needs to

   638   take care to refer to @{ML "ML_Context.the_generic_context ()"}

   639   correctly.  Recall that evaluation of a function body is delayed

   640   until actual run-time.

   641

   642   \item @{ML "Context.>>"}~@{text f} applies context transformation

   643   @{text f} to the implicit context of the ML toplevel.

   644

   645   \item @{ML ML_Thms.bind_thms}~@{text "(name, thms)"} stores a list of

   646   theorems produced in ML both in the (global) theory context and the

   647   ML toplevel, associating it with the provided name.  Theorems are

    648   put into a global ``standard'' format before being stored.

   649

   650   \item @{ML ML_Thms.bind_thm} is similar to @{ML ML_Thms.bind_thms} but refers to a

   651   singleton fact.

   652

   653   \end{description}

   654

   655   It is important to note that the above functions are really

   656   restricted to the compile time, even though the ML compiler is

   657   invoked at run-time.  The majority of ML code either uses static

   658   antiquotations (\secref{sec:ML-antiq}) or refers to the theory or

   659   proof context at run-time, by explicit functional abstraction.

   660 *}

   661

   662

   663 subsection {* Antiquotations \label{sec:ML-antiq} *}

   664

   665 text {* A very important consequence of embedding SML into Isar is the

   666   concept of \emph{ML antiquotation}.  The standard token language of

   667   ML is augmented by special syntactic entities of the following form:

   668

   669   @{rail \<open>

   670   @{syntax_def antiquote}: '@{' nameref args '}'

   671   \<close>}

   672

   673   Here @{syntax nameref} and @{syntax args} are regular outer syntax

   674   categories \cite{isabelle-isar-ref}.  Attributes and proof methods

   675   use similar syntax.

   676

   677   \medskip A regular antiquotation @{text "@{name args}"} processes

   678   its arguments by the usual means of the Isar source language, and

   679   produces corresponding ML source text, either as literal

   680   \emph{inline} text (e.g. @{text "@{term t}"}) or abstract

   681   \emph{value} (e.g. @{text "@{thm th}"}).  This pre-compilation

    682   scheme makes it possible to refer to formal entities in a robust manner, with

   683   proper static scoping and with some degree of logical checking of

   684   small portions of the code.

   685 *}
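text {* For example, the following sketch (assuming an object-logic
  like Isabelle/HOL, where the fact @{text refl} is available) inlines
  formal entities as ML values:
*}

ML {*
  val t = @{term "f x"};
  val th = @{thm refl};
*}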

   686

   687

   688 subsection {* Printing ML values *}

   689

   690 text {* The ML compiler knows about the structure of values according

   691   to their static type, and can print them in the manner of the

   692   toplevel loop, although the details are non-portable.  The

   693   antiquotation @{ML_antiquotation_def "make_string"} provides a

   694   quasi-portable way to refer to this potential capability of the

   695   underlying ML system in generic Isabelle/ML sources.  *}

   696

   697 text %mlantiq {*

   698   \begin{matharray}{rcl}

   699   @{ML_antiquotation_def "make_string"} & : & @{text ML_antiquotation} \\

   700   \end{matharray}

   701

   702   @{rail \<open>

   703   @@{ML_antiquotation make_string}

   704   \<close>}

   705

   706   \begin{description}

   707

   708   \item @{text "@{make_string}"} inlines a function to print arbitrary

    709   values similar to the ML toplevel.  The result is compiler-dependent

    710   and may fall back on "?" in certain situations.

   711

   712   \end{description}

   713 *}

   714

   715 text %mlex {* The following artificial example shows how to produce

   716   ad-hoc output of ML values for debugging purposes.  This should not

   717   be done in production code, and not persist in regular Isabelle/ML

   718   sources.  *}

   719

   720 ML {*

   721   val x = 42;

   722   val y = true;

   723

   724   writeln (@{make_string} {x = x, y = y});

   725 *}

   726

   727

   728 section {* Canonical argument order \label{sec:canonical-argument-order} *}

   729

   730 text {* Standard ML is a language in the tradition of @{text

   731   "\<lambda>"}-calculus and \emph{higher-order functional programming},

   732   similar to OCaml, Haskell, or Isabelle/Pure and HOL as logical

   733   languages.  Getting acquainted with the native style of representing

   734   functions in that setting can save a lot of extra boiler-plate of

   735   redundant shuffling of arguments, auxiliary abstractions etc.

   736

   737   Functions are usually \emph{curried}: the idea of turning arguments

    738   of type @{text "\<tau>\<^sub>i"} (for @{text "i \<in> {1, \<dots>, n}"}) into a result of

   739   type @{text "\<tau>"} is represented by the iterated function space

   740   @{text "\<tau>\<^sub>1 \<rightarrow> \<dots> \<rightarrow> \<tau>\<^sub>n \<rightarrow> \<tau>"}.  This is isomorphic to the well-known

   741   encoding via tuples @{text "\<tau>\<^sub>1 \<times> \<dots> \<times> \<tau>\<^sub>n \<rightarrow> \<tau>"}, but the curried

   742   version fits more smoothly into the basic calculus.\footnote{The

   743   difference is even more significant in higher-order logic, because

   744   the redundant tuple structure needs to be accommodated by formal

   745   reasoning.}

   746

   747   Currying gives some flexiblity due to \emph{partial application}.  A

   748   function @{text "f: \<tau>\<^sub>1 \<rightarrow> \<tau>\<^sub>2 \<rightarrow> \<tau>"} can be applied to @{text "x: \<tau>\<^sub>1"}

   749   and the remaining @{text "(f x): \<tau>\<^sub>2 \<rightarrow> \<tau>"} passed to another function

   750   etc.  How well this works in practice depends on the order of

   751   arguments.  In the worst case, arguments are arranged erratically,

   752   and using a function in a certain situation always requires some

   753   glue code.  Thus we would get exponentially many oppurtunities to

   754   decorate the code with meaningless permutations of arguments.

   755

   756   This can be avoided by \emph{canonical argument order}, which

   757   observes certain standard patterns and minimizes adhoc permutations

   758   in their application.  In Isabelle/ML, large portions of text can be

   759   written without auxiliary operations like @{text "swap: \<alpha> \<times> \<beta> \<rightarrow> \<beta> \<times>

   760   \<alpha>"} or @{text "C: (\<alpha> \<rightarrow> \<beta> \<rightarrow> \<gamma>) \<rightarrow> (\<beta> \<rightarrow> \<alpha> \<rightarrow> \<gamma>)"} (the latter not

   761   present in the Isabelle/ML library).

   762

   763   \medskip The basic idea is that arguments that vary less are moved

   764   further to the left than those that vary more.  Two particularly

   765   important categories of functions are \emph{selectors} and

   766   \emph{updates}.

   767

   768   The subsequent scheme is based on a hypothetical set-like container

   769   of type @{text "\<beta>"} that manages elements of type @{text "\<alpha>"}.  Both

   770   the names and types of the associated operations are canonical for

   771   Isabelle/ML.

   772

   773   \begin{center}

   774   \begin{tabular}{ll}

   775   kind & canonical name and type \\\hline

   776   selector & @{text "member: \<beta> \<rightarrow> \<alpha> \<rightarrow> bool"} \\

   777   update & @{text "insert: \<alpha> \<rightarrow> \<beta> \<rightarrow> \<beta>"} \\

   778   \end{tabular}

   779   \end{center}

   780

   781   Given a container @{text "B: \<beta>"}, the partially applied @{text

   782   "member B"} is a predicate over elements @{text "\<alpha> \<rightarrow> bool"}, and

   783   thus represents the intended denotation directly.  It is customary

   784   to pass the abstract predicate to further operations, not the

   785   concrete container.  The argument order makes it easy to use other

   786   combinators: @{text "forall (member B) list"} will check a list of

   787   elements for membership in @{text "B"} etc. Often the explicit

   788   @{text "list"} is pointless and can be contracted to @{text "forall

   789   (member B)"} to get directly a predicate again.

   790

   791   In contrast, an update operation varies the container, so it moves

   792   to the right: @{text "insert a"} is a function @{text "\<beta> \<rightarrow> \<beta>"} to

   793   insert a value @{text "a"}.  These can be composed naturally as

   794   @{text "insert c \<circ> insert b \<circ> insert a"}.  The slightly awkward

   795   inversion of the composition order is due to conventional

   796   mathematical notation, which can be easily amended as explained

   797   below.

   798 *}
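
text %mlex {* The scheme above can be tried out on plain lists, which
  serve as a simple set-like container in the Isabelle/ML library.
  Note that the actual library operations @{ML member}, @{ML insert},
  and @{ML forall} take an explicit equality predicate as additional
  argument, so their types are slightly more general than the
  idealized scheme above.  *}

ML {*
  val B = [1, 2, 3];

  (*partially applied selector: a predicate on elements*)
  @{assert} (forall (member (op =) B) [1, 3]);

  (*updates vary the container, so the container stays on the right*)
  val B' = insert (op =) 0 (insert (op =) 4 B);
  @{assert} (length B' = 5);
*}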



subsection {* Forward application and composition *}

text {* Regular function application and infix notation works best for
  relatively deeply structured expressions, e.g.\ @{text "h (f x y + g
  z)"}.  The important special case of \emph{linear transformation}
  applies a cascade of functions @{text "f\<^sub>n (\<dots> (f\<^sub>1 x))"}.  This
  becomes hard to read and maintain if the functions are themselves
  given as complex expressions.  The notation can be significantly
  improved by introducing \emph{forward} versions of application and
  composition as follows:

  \medskip
  \begin{tabular}{lll}
  @{text "x |> f"} & @{text "\<equiv>"} & @{text "f x"} \\
  @{text "(f #> g) x"} & @{text "\<equiv>"} & @{text "x |> f |> g"} \\
  \end{tabular}
  \medskip

  This makes it convenient to write @{text "x |> f\<^sub>1 |> \<dots> |> f\<^sub>n"} or
  @{text "f\<^sub>1 #> \<dots> #> f\<^sub>n"} for its functional abstraction over @{text
  "x"}.

  \medskip There is an additional set of combinators to accommodate
  multiple results (via pairs) that are passed on as multiple
  arguments (via currying).

  \medskip
  \begin{tabular}{lll}
  @{text "(x, y) |-> f"} & @{text "\<equiv>"} & @{text "f x y"} \\
  @{text "(f #-> g) x"} & @{text "\<equiv>"} & @{text "x |> f |-> g"} \\
  \end{tabular}
  \medskip
*}

text %mlref {*
  \begin{mldecls}
  @{index_ML_op "|> ": "'a * ('a -> 'b) -> 'b"} \\
  @{index_ML_op "|-> ": "('c * 'a) * ('c -> 'a -> 'b) -> 'b"} \\
  @{index_ML_op "#> ": "('a -> 'b) * ('b -> 'c) -> 'a -> 'c"} \\
  @{index_ML_op "#-> ": "('a -> 'c * 'b) * ('c -> 'b -> 'd) -> 'a -> 'd"} \\
  \end{mldecls}
*}
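
text %mlex {* As a trivial illustration, consider the following
  pipeline of basic arithmetic functions, defined here locally to keep
  the example self-contained.  *}

ML {*
  fun add x y : int = x + y;
  fun mult x y : int = x * y;

  (*forward application of a cascade of functions*)
  @{assert} ((1 |> add 2 |> mult 4) = 12);

  (*the same cascade as a composed function, abstracted over the value*)
  val f = add 2 #> mult 4;
  @{assert} (f 1 = 12);
*}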



subsection {* Canonical iteration *}

text {* As explained above, a function @{text "f: \<alpha> \<rightarrow> \<beta> \<rightarrow> \<beta>"} can be
  understood as an update on a configuration of type @{text "\<beta>"},
  parametrized by arguments of type @{text "\<alpha>"}.  Given @{text "a: \<alpha>"}
  the partial application @{text "(f a): \<beta> \<rightarrow> \<beta>"} operates
  homogeneously on @{text "\<beta>"}.  This can be iterated naturally over a
  list of parameters @{text "[a\<^sub>1, \<dots>, a\<^sub>n]"} as @{text "f a\<^sub>1 #> \<dots> #> f a\<^sub>n"}.
  The latter expression is again a function @{text "\<beta> \<rightarrow> \<beta>"}.
  It can be applied to an initial configuration @{text "b: \<beta>"} to
  start the iteration over the given list of arguments: each @{text
  "a"} in @{text "a\<^sub>1, \<dots>, a\<^sub>n"} is applied consecutively by updating a
  cumulative configuration.

  The @{text fold} combinator in Isabelle/ML lifts a function @{text
  "f"} as above to its iterated version over a list of arguments.
  Lifting can be repeated, e.g.\ @{text "(fold \<circ> fold) f"} iterates
  over a list of lists as expected.

  The variant @{text "fold_rev"} works inside-out over the list of
  arguments, such that @{text "fold_rev f \<equiv> fold f \<circ> rev"} holds.

  The @{text "fold_map"} combinator essentially performs @{text
  "fold"} and @{text "map"} simultaneously: each application of @{text
  "f"} produces an updated configuration together with a side-result;
  the iteration collects all such side-results as a separate list.
*}

text %mlref {*
  \begin{mldecls}
  @{index_ML fold: "('a -> 'b -> 'b) -> 'a list -> 'b -> 'b"} \\
  @{index_ML fold_rev: "('a -> 'b -> 'b) -> 'a list -> 'b -> 'b"} \\
  @{index_ML fold_map: "('a -> 'b -> 'c * 'b) -> 'a list -> 'b -> 'c list * 'b"} \\
  \end{mldecls}

  \begin{description}

  \item @{ML fold}~@{text f} lifts the parametrized update function
  @{text "f"} to a list of parameters.

  \item @{ML fold_rev}~@{text "f"} is similar to @{ML fold}~@{text
  "f"}, but works inside-out.

  \item @{ML fold_map}~@{text "f"} lifts the parametrized update
  function @{text "f"} (with side-result) to a list of parameters and
  cumulative side-results.

  \end{description}

  \begin{warn}
  The literature on functional programming provides a multitude of
  combinators called @{text "foldl"}, @{text "foldr"} etc.  SML97
  provides its own variations as @{ML List.foldl} and @{ML
  List.foldr}, while the classic Isabelle library also has the
  historic @{ML Library.foldl} and @{ML Library.foldr}.  To avoid
  unnecessary complication and confusion, all these historical
  versions should be ignored, and @{ML fold} (or @{ML fold_rev}) used
  exclusively.
  \end{warn}
*}
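
text %mlex {* The following artificial example illustrates the
  combinators on plain integer configurations: summation as iterated
  update, and numbering of elements by threading a counter through the
  iteration.  *}

ML {*
  (*summation as iterated update of an integer configuration*)
  @{assert} (fold (fn a => fn b => a + b) [1, 2, 3, 4] 0 = 10);

  (*fold_map: each step yields a side-result and an updated counter*)
  val (xs, n) =
    fold_map (fn a => fn i => ((i, a), i + 1)) ["a", "b"] 0;
  @{assert} (xs = [(0, "a"), (1, "b")] andalso n = 2);
*}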


text %mlex {* The following example shows how to fill a text buffer
  incrementally by adding strings, either individually or from a given
  list.
*}

ML {*
  val s =
    Buffer.empty
    |> Buffer.add "digits: "
    |> fold (Buffer.add o string_of_int) (0 upto 9)
    |> Buffer.content;

  @{assert} (s = "digits: 0123456789");
*}

text {* Note how @{ML "fold (Buffer.add o string_of_int)"} above saves
  an extra @{ML "map"} over the given list.  This kind of peephole
  optimization reduces both the code size and the tree structures in
  memory (``deforestation''), but it requires some practice to read
  and write fluently.

  \medskip The next example elaborates the idea of canonical
  iteration, demonstrating fast accumulation of tree content using a
  text buffer.
*}

ML {*
  datatype tree = Text of string | Elem of string * tree list;

  fun slow_content (Text txt) = txt
    | slow_content (Elem (name, ts)) =
        "<" ^ name ^ ">" ^
        implode (map slow_content ts) ^
        "</" ^ name ^ ">"

  fun add_content (Text txt) = Buffer.add txt
    | add_content (Elem (name, ts)) =
        Buffer.add ("<" ^ name ^ ">") #>
        fold add_content ts #>
        Buffer.add ("</" ^ name ^ ">");

  fun fast_content tree =
    Buffer.empty |> add_content tree |> Buffer.content;
*}

text {* The slow part of @{ML slow_content} is the @{ML implode} of
  the recursive results, because it copies previously produced strings
  again.

  The incremental @{ML add_content} avoids this by operating on a
  buffer that is passed through in a linear fashion.  Using @{ML_text
  "#>"} and contraction over the actual buffer argument saves some
  additional boiler-plate.  Of course, the two @{ML "Buffer.add"}
  invocations with concatenated strings could have been split into
  smaller parts, but this would have obfuscated the source without
  making a big difference in allocations.  Here we have done some
  peephole-optimization for the sake of readability.

  Another benefit of @{ML add_content} is its ``open'' form as a
  function on buffers that can be continued in further linear
  transformations, folding etc.  Thus it is more compositional than
  the naive @{ML slow_content}.  As a realistic example, compare the
  old-style @{ML "Term.maxidx_of_term: term -> int"} with the newer
  @{ML "Term.maxidx_term: term -> int -> int"} in Isabelle/Pure.

  Note that @{ML fast_content} above is only defined as an example.
  In many practical situations, it is customary to provide the
  incremental @{ML add_content} only and leave the initialization and
  termination to the concrete application by the user.
*}



section {* Message output channels \label{sec:message-channels} *}

text {* Isabelle provides output channels for different kinds of
  messages: regular output, high-volume tracing information, warnings,
  and errors.

  Depending on the user interface involved, these messages may appear
  in different text styles or colours.  The standard output for
  terminal sessions prefixes each line of warnings by @{verbatim
  "###"} and errors by @{verbatim "***"}, but leaves anything else
  unchanged.

  Messages are associated with the transaction context of the running
  Isar command.  This enables the front-end to manage commands and
  resulting messages together.  For example, after deleting a command
  from a given theory document version, the corresponding message
  output can be retracted from the display.
*}

text %mlref {*
  \begin{mldecls}
  @{index_ML writeln: "string -> unit"} \\
  @{index_ML tracing: "string -> unit"} \\
  @{index_ML warning: "string -> unit"} \\
  @{index_ML error: "string -> 'a"} \\
  \end{mldecls}

  \begin{description}

  \item @{ML writeln}~@{text "text"} outputs @{text "text"} as a
  regular message.  This is the primary message output operation of
  Isabelle and should be used by default.

  \item @{ML tracing}~@{text "text"} outputs @{text "text"} as a
  special tracing message, indicating potential high-volume output to
  the front-end (hundreds or thousands of messages issued by a single
  command).  The idea is to allow the user-interface to downgrade the
  quality of message display to achieve higher throughput.

  Note that the user might have to take special actions to see tracing
  output, e.g.\ switch to a different output window.  So this channel
  should not be used for regular output.

  \item @{ML warning}~@{text "text"} outputs @{text "text"} as a
  warning, which typically means some extra emphasis on the front-end
  side (color highlighting, icons, etc.).

  \item @{ML error}~@{text "text"} raises exception @{ML ERROR}~@{text
  "text"} and thus lets the Isar toplevel print @{text "text"} on the
  error channel, which typically means some extra emphasis on the
  front-end side (color highlighting, icons, etc.).

  This assumes that the exception is not handled before the command
  terminates.  Handling exception @{ML ERROR}~@{text "text"} is a
  perfectly legal alternative: it means that the error is absorbed
  without any message output.

  \begin{warn}
  The actual error channel is accessed via @{ML Output.error_message}, but
  the old interaction protocol of Proof~General \emph{crashes} if that
  function is used in regular ML code: error output and toplevel
  command failure always need to coincide in classic TTY interaction.
  \end{warn}

  \end{description}

  \begin{warn}
  Regular Isabelle/ML code should output messages exclusively by the
  official channels.  Using raw I/O on \emph{stdout} or \emph{stderr}
  instead (e.g.\ via @{ML TextIO.output}) is apt to cause problems in
  the presence of parallel and asynchronous processing of Isabelle
  theories.  Such raw output might be displayed by the front-end in
  some system console log, with a low chance that the user will ever
  see it.  Moreover, as a genuine side-effect on global process
  channels, there is no proper way to retract output when Isar command
  transactions are reset by the system.
  \end{warn}

  \begin{warn}
  The message channels should be used in a message-oriented manner.
  This means that multi-line output that logically belongs together is
  issued by a \emph{single} invocation of @{ML writeln} etc.\ with the
  functional concatenation of all message constituents.
  \end{warn}
*}


text %mlex {* The following example demonstrates a multi-line
  warning.  Note that in some situations the user sees only the first
  line, so the most important point should be made first.
*}

ML_command {*
  warning (cat_lines
   ["Beware the Jabberwock, my son!",
    "The jaws that bite, the claws that catch!",
    "Beware the Jubjub Bird, and shun",
    "The frumious Bandersnatch!"]);
*}



section {* Exceptions \label{sec:exceptions} *}

text {* The Standard ML semantics of strict functional evaluation
  together with exceptions is rather well defined, but some delicate
  points need to be observed to prevent ML programs from going wrong
  despite static type-checking.  Exceptions in Isabelle/ML are
  subsequently categorized as follows.

  \paragraph{Regular user errors.}  These are meant to provide
  informative feedback about malformed input etc.

  The \emph{error} function raises the corresponding \emph{ERROR}
  exception, with a plain text message as argument.  \emph{ERROR}
  exceptions can be handled internally, in order to be ignored, turned
  into other exceptions, or cascaded by appending messages.  If the
  corresponding Isabelle/Isar command terminates with an \emph{ERROR}
  exception state, the toplevel will print the result on the error
  channel (see \secref{sec:message-channels}).

  It is considered bad style to refer to internal function names or
  values in ML source notation in user error messages.

  Grammatical correctness of error messages can be improved by
  \emph{omitting} final punctuation: messages are often concatenated
  or put into a larger context (e.g.\ augmented with source position).
  By not insisting on the final word at the origin of the error, the
  system can perform its administrative tasks more easily and
  robustly.

  \paragraph{Program failures.}  There is a handful of standard
  exceptions that indicate general failure situations, or failures of
  core operations on logical entities (types, terms, theorems,
  theories, see \chref{ch:logic}).

  These exceptions indicate a genuine breakdown of the program, so the
  main purpose is to determine quickly what has happened where.
  Traditionally, the (short) exception message would include the name
  of an ML function, although this is no longer necessary, because the
  ML runtime system prints a detailed source position of the
  corresponding @{ML_text raise} keyword.

  \medskip User modules can always introduce their own custom
  exceptions locally, e.g.\ to organize internal failures robustly
  without overlapping with existing exceptions.  Exceptions that are
  exposed in module signatures require extra care, though, and should
  \emph{not} be introduced by default.  Surprise by users of a module
  can often be minimized by using plain user errors instead.

  \paragraph{Interrupts.}  These indicate arbitrary system events:
  both the ML runtime system and the Isabelle/ML infrastructure signal
  various exceptional situations by raising the special
  \emph{Interrupt} exception in user code.

  This is the one and only way that physical events can intrude on an
  Isabelle/ML program.  Such an interrupt can mean out-of-memory,
  stack overflow, timeout, internal signaling of threads, or the user
  producing a console interrupt manually etc.  An Isabelle/ML program
  that intercepts interrupts becomes dependent on physical effects of
  the environment.  Even worse, exception handling patterns that are
  too general by accident, e.g.\ due to misspelled exception
  constructors, will cover interrupts unintentionally and thus render
  the program semantics ill-defined.

  Note that the Interrupt exception dates back to the original SML90
  language definition.  It was excluded from the SML97 version to
  avoid its malign impact on ML program semantics, but without
  providing a viable alternative.  Isabelle/ML recovers physical
  interruptibility (which is an indispensable tool to implement
  managed evaluation of command transactions), but requires user code
  to be strictly transparent wrt.\ interrupts.

  \begin{warn}
  Isabelle/ML user code needs to terminate promptly on interruption,
  without guessing at its meaning to the system infrastructure.
  Temporary handling of interrupts for cleanup of global resources
  etc.\ needs to be followed immediately by re-raising of the original
  exception.
  \end{warn}
*}


text %mlref {*
  \begin{mldecls}
  @{index_ML try: "('a -> 'b) -> 'a -> 'b option"} \\
  @{index_ML can: "('a -> 'b) -> 'a -> bool"} \\
  @{index_ML_exception ERROR: string} \\
  @{index_ML_exception Fail: string} \\
  @{index_ML Exn.is_interrupt: "exn -> bool"} \\
  @{index_ML reraise: "exn -> 'a"} \\
  @{index_ML ML_Compiler.exn_trace: "(unit -> 'a) -> 'a"} \\
  \end{mldecls}

  \begin{description}

  \item @{ML try}~@{text "f x"} makes the partiality of evaluating
  @{text "f x"} explicit via the option datatype.  Interrupts are
  \emph{not} handled here, i.e.\ this form serves as safe replacement
  for the \emph{unsafe} version @{ML_text "(SOME"}~@{text "f
  x"}~@{ML_text "handle _ => NONE)"} that is occasionally seen in
  books about SML97, but not in Isabelle/ML.

  \item @{ML can} is similar to @{ML try}, but with a more abstract
  result.

  \item @{ML ERROR}~@{text "msg"} represents user errors; this
  exception is normally raised indirectly via the @{ML error} function
  (see \secref{sec:message-channels}).

  \item @{ML Fail}~@{text "msg"} represents general program failures.

  \item @{ML Exn.is_interrupt} identifies interrupts robustly, without
  mentioning concrete exception constructors in user code.  Handled
  interrupts need to be re-raised promptly!

  \item @{ML reraise}~@{text "exn"} raises exception @{text "exn"}
  while preserving its implicit position information (if possible,
  depending on the ML platform).

  \item @{ML ML_Compiler.exn_trace}~@{ML_text "(fn () =>"}~@{text
  "e"}@{ML_text ")"} evaluates expression @{text "e"} while printing
  a full trace of its stack of nested exceptions (if possible,
  depending on the ML platform).

  Inserting @{ML ML_Compiler.exn_trace} into ML code temporarily is
  useful for debugging, but not suitable for production code.

  \end{description}
*}
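
text %mlex {* The following example turns a partial function, which
  signals malformed input via @{ML error}, into total variants using
  @{ML try} and @{ML can}.  The function @{ML_text the_digit} is an
  ad-hoc definition for illustration only; @{ML Int.fromString} is
  from the SML basis library.  *}

ML {*
  fun the_digit s =
    (case Int.fromString s of
      SOME n =>
        if 0 <= n andalso n <= 9 then n
        else error ("not a digit: " ^ s)
    | NONE => error ("not a number: " ^ s));

  @{assert} (try the_digit "5" = SOME 5);
  @{assert} (try the_digit "42" = NONE);
  @{assert} (can the_digit "7" andalso not (can the_digit "x"));
*}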


text %mlantiq {*
  \begin{matharray}{rcl}
  @{ML_antiquotation_def "assert"} & : & @{text ML_antiquotation} \\
  \end{matharray}

  \begin{description}

  \item @{text "@{assert}"} inlines a function
  @{ML_type "bool -> unit"} that raises @{ML Fail} if the argument is
  @{ML false}.  Due to inlining the source position of failed
  assertions is included in the error output.

  \end{description}
*}



section {* Strings of symbols \label{sec:symbols} *}

text {* A \emph{symbol} constitutes the smallest textual unit in
  Isabelle/ML --- raw ML characters are normally not encountered at
  all!  Isabelle strings consist of a sequence of symbols, represented
  as a packed string or an exploded list of strings.  Each symbol is
  in itself a small string, which has one of the following forms:

  \begin{enumerate}

  \item a single ASCII character ``@{text "c"}'', for example
  ``\verb,a,'',

  \item a codepoint according to UTF8 (non-ASCII byte sequence),

  \item a regular symbol ``\verb,\,\verb,<,@{text "ident"}\verb,>,'',
  for example ``\verb,\,\verb,<alpha>,'',

  \item a control symbol ``\verb,\,\verb,<^,@{text "ident"}\verb,>,'',
  for example ``\verb,\,\verb,<^bold>,'',

  \item a raw symbol ``\verb,\,\verb,<^raw:,@{text text}\verb,>,''
  where @{text text} consists of printable characters excluding
  ``\verb,.,'' and ``\verb,>,'', for example
  ``\verb,\,\verb,<^raw:$\sum_{i = 1}^n$>,'',

  \item a numbered raw control symbol ``\verb,\,\verb,<^raw,@{text
  n}\verb,>,'' where @{text n} consists of digits, for example
  ``\verb,\,\verb,<^raw42>,''.

  \end{enumerate}

  The @{text "ident"} syntax for symbol names is @{text "letter
  (letter | digit)\<^sup>*"}, where @{text "letter = A..Za..z"} and @{text
  "digit = 0..9"}.  There are infinitely many regular symbols and
  control symbols, but a fixed collection of standard symbols is
  treated specifically.  For example, ``\verb,\,\verb,<alpha>,'' is
  classified as a letter, which means it may occur within regular
  Isabelle identifiers.

  The character set underlying Isabelle symbols is 7-bit ASCII, but
  8-bit character sequences are passed-through unchanged.  Unicode/UCS
  data in UTF-8 encoding is processed in a non-strict fashion, such
  that well-formed code sequences are recognized
  accordingly.\footnote{Note that ISO-Latin-1 differs from UTF-8 only
  in some special punctuation characters that even have replacements
  within the standard collection of Isabelle symbols.  Text consisting
  of ASCII plus accented letters can be processed in either encoding.}
  Unicode provides its own collection of mathematical symbols, but
  within the core Isabelle/ML world there is no link to the standard
  collection of Isabelle regular symbols.

  \medskip Output of Isabelle symbols depends on the print mode
  \cite{isabelle-isar-ref}.  For example, the standard {\LaTeX}
  setup of the Isabelle document preparation system would present
  ``\verb,\,\verb,<alpha>,'' as @{text "\<alpha>"}, and
  ``\verb,\,\verb,<^bold>,\verb,\,\verb,<alpha>,'' as @{text "\<^bold>\<alpha>"}.
  On-screen rendering usually works by mapping a finite subset of
  Isabelle symbols to suitable Unicode characters.
*}

text %mlref {*
  \begin{mldecls}
  @{index_ML_type "Symbol.symbol": string} \\
  @{index_ML Symbol.explode: "string -> Symbol.symbol list"} \\
  @{index_ML Symbol.is_letter: "Symbol.symbol -> bool"} \\
  @{index_ML Symbol.is_digit: "Symbol.symbol -> bool"} \\
  @{index_ML Symbol.is_quasi: "Symbol.symbol -> bool"} \\
  @{index_ML Symbol.is_blank: "Symbol.symbol -> bool"} \\
  \end{mldecls}
  \begin{mldecls}
  @{index_ML_type "Symbol.sym"} \\
  @{index_ML Symbol.decode: "Symbol.symbol -> Symbol.sym"} \\
  \end{mldecls}

  \begin{description}

  \item Type @{ML_type "Symbol.symbol"} represents individual Isabelle
  symbols.

  \item @{ML "Symbol.explode"}~@{text "str"} produces a symbol list
  from the packed form.  This function supersedes @{ML
  "String.explode"} for virtually all purposes of manipulating text in
  Isabelle!\footnote{The runtime overhead for exploded strings is
  mainly that of the list structure: individual symbols that happen to
  be a singleton string do not require extra memory in Poly/ML.}

  \item @{ML "Symbol.is_letter"}, @{ML "Symbol.is_digit"}, @{ML
  "Symbol.is_quasi"}, @{ML "Symbol.is_blank"} classify standard
  symbols according to fixed syntactic conventions of Isabelle, cf.\
  \cite{isabelle-isar-ref}.

  \item Type @{ML_type "Symbol.sym"} is a concrete datatype that
  represents the different kinds of symbols explicitly, with
  constructors @{ML "Symbol.Char"}, @{ML "Symbol.Sym"}, @{ML
  "Symbol.UTF8"}, @{ML "Symbol.Ctrl"}, @{ML "Symbol.Raw"}.

  \item @{ML "Symbol.decode"} converts the string representation of a
  symbol into the datatype version.

  \end{description}

  \paragraph{Historical note.} In the original SML90 standard the
  primitive ML type @{ML_type char} did not exist, and @{ML_text
  "explode: string -> string list"} produced a list of singleton
  strings like @{ML "raw_explode: string -> string list"} in
  Isabelle/ML today.  When SML97 came out, Isabelle did not adopt its
  somewhat anachronistic 8-bit or 16-bit characters, but the idea of
  exploding a string into a list of small strings was extended to
  ``symbols'' as explained above.  Thus Isabelle sources can refer to
  an infinite store of user-defined symbols, without having to worry
  about the multitude of Unicode encodings that have emerged over the
  years.  *}
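
text %mlex {* The following example explodes a string that mixes an
  ASCII letter with regular Isabelle symbols; all three symbols are
  classified as letters according to the conventions above.  *}

ML {*
  val syms = Symbol.explode "f\<alpha>\<beta>";
  @{assert} (length syms = 3);
  @{assert} (forall Symbol.is_letter syms);
*}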

  1335

  1336

  1337 section {* Basic data types *}

  1338

  1339 text {* The basis library proposal of SML97 needs to be treated with

  1340   caution.  Many of its operations simply do not fit with important

  1341   Isabelle/ML conventions (like canonical argument order'', see

  1342   \secref{sec:canonical-argument-order}), others cause problems with

  1343   the parallel evaluation model of Isabelle/ML (such as @{ML

  1344   TextIO.print} or @{ML OS.Process.system}).

  1345

  1346   Subsequently we give a brief overview of important operations on

  1347   basic ML data types.

  1348 *}

  1349

  1350

  1351 subsection {* Characters *}

  1352

  1353 text %mlref {*

  1354   \begin{mldecls}

  1355   @{index_ML_type char} \\

  1356   \end{mldecls}

  1357

  1358   \begin{description}

  1359

  1360   \item Type @{ML_type char} is \emph{not} used.  The smallest textual

  1361   unit in Isabelle is represented as a symbol'' (see

  1362   \secref{sec:symbols}).

  1363

  1364   \end{description}

  1365 *}

  1366

  1367

  1368 subsection {* Strings *}

  1369

  1370 text %mlref {*

  1371   \begin{mldecls}

  1372   @{index_ML_type string} \\

  1373   \end{mldecls}

  1374

  1375   \begin{description}

  1376

  1377   \item Type @{ML_type string} represents immutable vectors of 8-bit

  1378   characters.  There are operations in SML to convert back and forth

  1379   to actual byte vectors, which are seldom used.

  1380

  1381   This historically important raw text representation is used for

  1382   Isabelle-specific purposes with the following implicit substructures

  1383   packed into the string content:

  1384

  1385   \begin{enumerate}

  1386

  1387   \item sequence of Isabelle symbols (see also \secref{sec:symbols}),

  1388   with @{ML Symbol.explode} as key operation;

  1389

  1390   \item XML tree structure via YXML (see also \cite{isabelle-sys}),

  1391   with @{ML YXML.parse_body} as key operation.

  1392

  1393   \end{enumerate}

  1394

  1395   Note that Isabelle/ML string literals may refer to Isabelle symbols

  1396   like ``\verb,\,\verb,<alpha>,'' natively, \emph{without} escaping

  1397   the backslash.  This is a consequence of Isabelle treating all

  1398   source text as strings of symbols, instead of raw characters.

  1399

  1400   \end{description}

  1401 *}

  1402

  1403 text %mlex {* The subsequent example illustrates the difference of

  1404   physical addressing of bytes versus logical addressing of symbols in

  1405   Isabelle strings.

  1406 *}

  1407

  1408 ML_val {*

  1409   val s = "\<A>";

  1410

  1411   @{assert} (length (Symbol.explode s) = 1);

  1412   @{assert} (size s = 4);

  1413 *}

  1414

  1415 text {* Note that in Unicode renderings of the symbol @{text "\<A>"},

  1416   variations of encodings like UTF-8 or UTF-16 pose delicate questions

  1417   about the multi-byte representation of its codepoint, which is outside

  1418   the 16-bit address space of the original Unicode standard from

  1419   the 1990s.  In Isabelle/ML it is just ``\verb,\,\verb,<A>,''

  1420   literally, using plain ASCII characters beyond any doubt. *}

  1421

  1422

  1423 subsection {* Integers *}

  1424

  1425 text %mlref {*

  1426   \begin{mldecls}

  1427   @{index_ML_type int} \\

  1428   \end{mldecls}

  1429

  1430   \begin{description}

  1431

  1432   \item Type @{ML_type int} represents regular mathematical integers,

  1433   which are \emph{unbounded}.  Overflow never happens in

  1434   practice.\footnote{The size limit for integer bit patterns in memory

  1435   is 64\,MB for 32-bit Poly/ML, and much higher for 64-bit systems.}

  1436   This works uniformly for all supported ML platforms (Poly/ML and

  1437   SML/NJ).

  1438

  1439   Literal integers in ML text are forced to be of this one true

  1440   integer type --- adhoc overloading of SML97 is disabled.

  1441

  1442   Structure @{ML_structure IntInf} of SML97 is obsolete and superseded by

  1443   @{ML_structure Int}.  Structure @{ML_structure Integer} in @{file

  1444   "~~/src/Pure/General/integer.ML"} provides some additional

  1445   operations.

  1446

  1447   \end{description}

  1448 *}
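
text %mlex {* The following example (a small sketch using only standard
  library combinators) illustrates unbounded integer arithmetic: the
  factorial of 40 is far beyond any machine word size, but is computed
  without overflow. *}

ML_val {*
  fun factorial n = fold (fn i => fn k => i * k) (1 upto n) 1;

  @{assert} (factorial 10 = 3628800);
  @{assert} (size (string_of_int (factorial 40)) > 40);
*}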

  1449

  1450

  1451 subsection {* Time *}

  1452

  1453 text %mlref {*

  1454   \begin{mldecls}

  1455   @{index_ML_type Time.time} \\

  1456   @{index_ML seconds: "real -> Time.time"} \\

  1457   \end{mldecls}

  1458

  1459   \begin{description}

  1460

  1461   \item Type @{ML_type Time.time} represents time abstractly according

  1462   to the SML97 basis library definition.  This is adequate for

  1463   internal ML operations, but awkward in concrete time specifications.

  1464

  1465   \item @{ML seconds}~@{text "s"} turns the concrete scalar @{text

  1466   "s"} (measured in seconds) into an abstract time value.  Floating

  1467   point numbers are easy to use as configuration options in the

  1468   context (see \secref{sec:config-options}) or system preferences that

  1469   are maintained externally.

  1470

  1471   \end{description}

  1472 *}
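
text %mlex {* For example, a time specification of 2.5 seconds may be
  written as follows, with the concrete scalar given as a plain real
  number. *}

ML_val {*
  val timeout = seconds 2.5;

  @{assert} (Time.toMilliseconds timeout = 2500);
*}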

  1473

  1474

  1475 subsection {* Options *}

  1476

  1477 text %mlref {*

  1478   \begin{mldecls}

  1479   @{index_ML Option.map: "('a -> 'b) -> 'a option -> 'b option"} \\

  1480   @{index_ML is_some: "'a option -> bool"} \\

  1481   @{index_ML is_none: "'a option -> bool"} \\

  1482   @{index_ML the: "'a option -> 'a"} \\

  1483   @{index_ML these: "'a list option -> 'a list"} \\

  1484   @{index_ML the_list: "'a option -> 'a list"} \\

  1485   @{index_ML the_default: "'a -> 'a option -> 'a"} \\

  1486   \end{mldecls}

  1487 *}

  1488

  1489 text {* Apart from @{ML Option.map} most other operations defined in

  1490   structure @{ML_structure Option} are alien to Isabelle/ML and never

  1491   used.  The operations shown above are defined in @{file

  1492   "~~/src/Pure/General/basics.ML"}.  *}

  1493

  1494

  1495 subsection {* Lists *}

  1496

  1497 text {* Lists are ubiquitous in ML as simple and light-weight

  1498   ``collections'' for many everyday programming tasks.  Isabelle/ML

  1499   provides important additions and improvements over operations that

  1500   are predefined in the SML97 library. *}

  1501

  1502 text %mlref {*

  1503   \begin{mldecls}

  1504   @{index_ML cons: "'a -> 'a list -> 'a list"} \\

  1505   @{index_ML member: "('b * 'a -> bool) -> 'a list -> 'b -> bool"} \\

  1506   @{index_ML insert: "('a * 'a -> bool) -> 'a -> 'a list -> 'a list"} \\

  1507   @{index_ML remove: "('b * 'a -> bool) -> 'b -> 'a list -> 'a list"} \\

  1508   @{index_ML update: "('a * 'a -> bool) -> 'a -> 'a list -> 'a list"} \\

  1509   \end{mldecls}

  1510

  1511   \begin{description}

  1512

  1513   \item @{ML cons}~@{text "x xs"} evaluates to @{text "x :: xs"}.

  1514

  1515   Tupled infix operators are a historical accident in Standard ML.

  1516   The curried @{ML cons} amends this, but it should only be used when

  1517   partial application is required.

  1518

  1519   \item @{ML member}, @{ML insert}, @{ML remove}, @{ML update} treat

  1520   lists as a set-like container that maintains the order of elements.

  1521   See @{file "~~/src/Pure/library.ML"} for the full specifications

  1522   (written in ML).  There are some further derived operations like

  1523   @{ML union} or @{ML inter}.

  1524

  1525   Note that @{ML insert} is conservative about elements that are

  1526   already a @{ML member} of the list, while @{ML update} ensures that

  1527   the latest entry is always put in front.  The latter discipline is

  1528   often more appropriate in declarations of context data

  1529   (\secref{sec:context-data}) that are issued by the user in Isar

  1530   source: later declarations take precedence over earlier ones.

  1531

  1532   \end{description}

  1533 *}
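
text %mlex {* The following example illustrates the set-like operations
  with plain equality on integers.  Note that @{ML update} moves an
  existing element to the front, while @{ML insert} leaves the list
  unchanged. *}

ML_val {*
  @{assert} (member (op =) [1, 2, 3] 2);
  @{assert} (insert (op =) 2 [1, 2, 3] = [1, 2, 3]);
  @{assert} (remove (op =) 2 [1, 2, 3] = [1, 3]);
  @{assert} (update (op =) 2 [1, 2, 3] = [2, 1, 3]);
  @{assert} (update (op =) 4 [1, 2, 3] = [4, 1, 2, 3]);
*}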

  1534

  1535 text %mlex {* Using canonical @{ML fold} together with @{ML cons} (or

  1536   similar standard operations) alternates the orientation of data.

  1537   This is quite natural and should not be altered forcibly by inserting

  1538   extra applications of @{ML rev}.  The alternative @{ML fold_rev} can

  1539   be used in the few situations where alternation should be

  1540   prevented.

  1541 *}

  1542

  1543 ML {*

  1544   val items = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];

  1545

  1546   val list1 = fold cons items [];

  1547   @{assert} (list1 = rev items);

  1548

  1549   val list2 = fold_rev cons items [];

  1550   @{assert} (list2 = items);

  1551 *}

  1552

  1553 text {* The subsequent example demonstrates how to \emph{merge} two

  1554   lists in a natural way. *}

  1555

  1556 ML {*

  1557   fun merge_lists eq (xs, ys) = fold_rev (insert eq) ys xs;

  1558 *}

  1559

  1560 text {* Here the first list is treated conservatively: only the new

  1561   elements from the second list are inserted.  The inside-out order of

  1562   insertion via @{ML fold_rev} attempts to preserve the order of

  1563   elements in the result.

  1564

  1565   This way of merging lists is typical for context data

  1566   (\secref{sec:context-data}).  See also @{ML merge} as defined in

  1567   @{file "~~/src/Pure/library.ML"}.

  1568 *}
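
text %mlex {* For example, merging with plain equality on integers keeps
  the first list intact and inserts only the new elements of the second
  list in front of it: *}

ML_val {*
  @{assert} (merge_lists (op =) ([1, 2, 3], [3, 4, 2, 5]) = [4, 5, 1, 2, 3]);
*}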

  1569

  1570

  1571 subsection {* Association lists *}

  1572

  1573 text {* The operations for association lists interpret a concrete list

  1574   of pairs as a finite function from keys to values.  Redundant

  1575   representations with multiple occurrences of the same key are

  1576   implicitly normalized: lookup and update only take the first

  1577   occurrence into account.

  1578 *}

  1579

  1580 text {*

  1581   \begin{mldecls}

  1582   @{index_ML AList.lookup: "('a * 'b -> bool) -> ('b * 'c) list -> 'a -> 'c option"} \\

  1583   @{index_ML AList.defined: "('a * 'b -> bool) -> ('b * 'c) list -> 'a -> bool"} \\

  1584   @{index_ML AList.update: "('a * 'a -> bool) -> 'a * 'b -> ('a * 'b) list -> ('a * 'b) list"} \\

  1585   \end{mldecls}

  1586

  1587   \begin{description}

  1588

  1589   \item @{ML AList.lookup}, @{ML AList.defined}, @{ML AList.update}

  1590   implement the main ``framework operations'' for mappings in

  1591   Isabelle/ML, following standard conventions for their names and

  1592   types.

  1593

  1594   Note that a function called @{text lookup} is obliged to express its

  1595   partiality via an explicit option element.  It never raises an

  1596   exception; an operation that does would need a different name, such as

  1597   @{text "the_element"} or @{text "get"}.

  1598

  1599   The @{text "defined"} operation is essentially a contraction of @{ML

  1600   is_some} and @{text "lookup"}, but this is sufficiently frequent to

  1601   justify its independent existence.  This also gives the

  1602   implementation some opportunity for peephole optimization.

  1603

  1604   \end{description}

  1605

  1606   Association lists are adequate as a simple and light-weight

  1607   implementation of finite mappings in many practical situations.  A

  1608   more heavy-duty table structure is defined in @{file

  1609   "~~/src/Pure/General/table.ML"}; that version scales easily to

  1610   thousands or millions of elements.

  1611 *}
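
text %mlex {* The following example maintains a small mapping from
  string keys to integers as association list, with plain equality on
  keys. *}

ML_val {*
  val al = AList.update (op =) ("b", 2) (AList.update (op =) ("a", 1) []);

  @{assert} (AList.lookup (op =) al "a" = SOME 1);
  @{assert} (AList.defined (op =) al "b");
  @{assert} (AList.lookup (op =) al "c" = NONE);
*}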

  1612

  1613

  1614 subsection {* Unsynchronized references *}

  1615

  1616 text %mlref {*

  1617   \begin{mldecls}

  1618   @{index_ML_type "'a Unsynchronized.ref"} \\

  1619   @{index_ML Unsynchronized.ref: "'a -> 'a Unsynchronized.ref"} \\

  1620   @{index_ML "!": "'a Unsynchronized.ref -> 'a"} \\

  1621   @{index_ML_op ":=": "'a Unsynchronized.ref * 'a -> unit"} \\

  1622   \end{mldecls}

  1623 *}

  1624

  1625 text {* Due to ubiquitous parallelism in Isabelle/ML (see also

  1626   \secref{sec:multi-threading}), the mutable reference cells of

  1627   Standard ML are notorious for causing problems.  In a highly

  1628   parallel system, both correctness \emph{and} performance are easily

  1629   degraded when using mutable data.

  1630

  1631   The unwieldy name of @{ML Unsynchronized.ref} for the constructor

  1632   for references in Isabelle/ML emphasizes the inconveniences caused by

  1633   mutability.  Existing operations @{ML "!"}  and @{ML_op ":="} are

  1634   unchanged, but should be used with special precautions, say in a

  1635   strictly local situation that is guaranteed to be restricted to

  1636   sequential evaluation --- now and in the future.

  1637

  1638   \begin{warn}

  1639   Never @{ML_text "open Unsynchronized"}, not even in a local scope!

  1640   Pretending that mutable state is no problem is a very bad idea.

  1641   \end{warn}

  1642 *}
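
text %mlex {* The following example uses a reference in a strictly local
  manner: the cell is created and discarded within a single evaluation,
  so no other thread can ever observe it. *}

ML_val {*
  fun sum_list xs =
    let val r = Unsynchronized.ref 0
    in List.app (fn x => r := ! r + x) xs; ! r end;

  @{assert} (sum_list [1, 2, 3, 4] = 10);
*}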

  1643

  1644

  1645 section {* Thread-safe programming \label{sec:multi-threading} *}

  1646

  1647 text {* Multi-threaded execution has become an everyday reality in

  1648   Isabelle since Poly/ML 5.2.1 and Isabelle2008.  Isabelle/ML provides

  1649   implicit and explicit parallelism by default, and there is no way

  1650   for user-space tools to ``opt out''.  ML programs that are purely

  1651   functional, output messages only via the official channels

  1652   (\secref{sec:message-channels}), and do not intercept interrupts

  1653   (\secref{sec:exceptions}) can participate in the multi-threaded

  1654   environment immediately without further ado.

  1655

  1656   More ambitious tools with more fine-grained interaction with the

  1657   environment need to observe the principles explained below.

  1658 *}

  1659

  1660

  1661 subsection {* Multi-threading with shared memory *}

  1662

  1663 text {* Multiple threads help to organize advanced operations of the

  1664   system, such as real-time conditions on command transactions,

  1665   sub-components with explicit communication, general asynchronous

  1666   interaction etc.  Moreover, parallel evaluation is a prerequisite to

  1667   make adequate use of the CPU resources that are available on

  1668   multi-core systems.\footnote{Multi-core computing does not mean that

  1669   there are ``spare cycles'' to be wasted.  It means that the

  1670   continued exponential speedup of CPU performance due to ``Moore's

  1671   Law'' follows different rules: clock frequency has reached its peak

  1672   around 2005, and applications need to be parallelized in order to

  1673   avoid a perceived loss of performance.  See also

  1674   \cite{Sutter:2005}.}

  1675

  1676   Isabelle/Isar exploits the inherent structure of theories and proofs

  1677   to support \emph{implicit parallelism} to a large extent.  LCF-style

  1678   theorem proving provides almost ideal conditions for that, see also

  1679   \cite{Wenzel:2009}.  This means that significant parts of theory and

  1680   proof checking are parallelized by default.  In Isabelle2013, a

  1681   maximum speedup-factor of 3.5 on 4 cores and 6.5 on 8 cores can be

  1682   expected.

  1683

  1684   \medskip ML threads lack the memory protection of separate

  1685   processes, and operate concurrently on shared heap memory.  This has

  1686   the advantage that results of independent computations are directly

  1687   available to other threads: abstract values can be passed without

  1688   copying or awkward serialization that is typically required for

  1689   separate processes.

  1690

  1691   To make shared-memory multi-threading work robustly and efficiently,

  1692   some programming guidelines need to be observed.  While the ML

  1693   system is responsible for maintaining basic integrity of the

  1694   representation of ML values in memory, the application programmer

  1695   needs to ensure that multi-threaded execution does not break the

  1696   intended semantics.

  1697

  1698   \begin{warn}

  1699   To participate in implicit parallelism, tools need to be

  1700   thread-safe.  A single ill-behaved tool can affect the stability and

  1701   performance of the whole system.

  1702   \end{warn}

  1703

  1704   Apart from observing the principles of thread-safeness passively,

  1705   advanced tools may also exploit parallelism actively, e.g.\ by using

  1706   ``future values'' (\secref{sec:futures}) or the more basic library

  1707   functions for parallel list operations (\secref{sec:parlist}).

  1708

  1709   \begin{warn}

  1710   Parallel computing resources are managed centrally by the

  1711   Isabelle/ML infrastructure.  User programs must not fork their own

  1712   ML threads to perform computations.

  1713   \end{warn}

  1714 *}

  1715

  1716

  1717 subsection {* Critical shared resources *}

  1718

  1719 text {* Thread-safeness is mainly concerned with concurrent

  1720   read/write access to shared resources, which are outside the purely

  1721   functional world of ML.  This covers the following in particular.

  1722

  1723   \begin{itemize}

  1724

  1725   \item Global references (or arrays), i.e.\ mutable memory cells that

  1726   persist over several invocations of associated

  1727   operations.\footnote{This is independent of the visibility of such

  1728   mutable values in the toplevel scope.}

  1729

  1730   \item Global state of the running Isabelle/ML process, i.e.\ raw I/O

  1731   channels, environment variables, current working directory.

  1732

  1733   \item Writable resources in the file-system that are shared among

  1734   different threads or external processes.

  1735

  1736   \end{itemize}

  1737

  1738   Isabelle/ML provides various mechanisms to avoid critical shared

  1739   resources in most situations.  As last resort there are some

  1740   mechanisms for explicit synchronization.  The following guidelines

  1741   help to make Isabelle/ML programs work smoothly in a concurrent

  1742   environment.

  1743

  1744   \begin{itemize}

  1745

  1746   \item Avoid global references altogether.  Isabelle/Isar maintains a

  1747   uniform context that incorporates arbitrary data declared by user

  1748   programs (\secref{sec:context-data}).  This context is passed as

  1749   plain value and user tools can get/map their own data in a purely

  1750   functional manner.  Configuration options within the context

  1751   (\secref{sec:config-options}) provide simple drop-in replacements

  1752   for historic reference variables.

  1753

  1754   \item Keep components with local state information re-entrant.

  1755   Instead of poking initial values into (private) global references, a

  1756   new state record can be created on each invocation, and passed

  1757   through any auxiliary functions of the component.  The state record

  1758   may well contain mutable references, without requiring any special

  1759   synchronizations, as long as each invocation gets its own copy.

  1760

  1761   \item Avoid raw output on @{text "stdout"} or @{text "stderr"}.  The

  1762   Poly/ML library is thread-safe for each individual output operation,

  1763   but the ordering of parallel invocations is arbitrary.  This means

  1764   raw output will appear on some system console with unpredictable

  1765   interleaving of atomic chunks.

  1766

  1767   Note that this does not affect regular message output channels

  1768   (\secref{sec:message-channels}).  An official message is associated

  1769   with the command transaction from where it originates, independently

  1770   of other transactions.  This means each running Isar command has

  1771   effectively its own set of message channels, and interleaving can

  1772   only happen when commands use parallelism internally (and only at

  1773   message boundaries).

  1774

  1775   \item Treat environment variables and the current working directory

  1776   of the running process as strictly read-only.

  1777

  1778   \item Restrict writing to the file-system to unique temporary files.

  1779   Isabelle already provides a temporary directory that is unique for

  1780   the running process, and there is a centralized source of unique

  1781   serial numbers in Isabelle/ML.  Thus temporary files that are passed

  1782   to some external process will always be disjoint, and thus

  1783   thread-safe.

  1784

  1785   \end{itemize}

  1786 *}

  1787

  1788 text %mlref {*

  1789   \begin{mldecls}

  1790   @{index_ML File.tmp_path: "Path.T -> Path.T"} \\

  1791   @{index_ML serial_string: "unit -> string"} \\

  1792   \end{mldecls}

  1793

  1794   \begin{description}

  1795

  1796   \item @{ML File.tmp_path}~@{text "path"} relocates the base

  1797   component of @{text "path"} into the unique temporary directory of

  1798   the running Isabelle/ML process.

  1799

  1800   \item @{ML serial_string}~@{text "()"} creates a new serial number

  1801   that is unique over the runtime of the Isabelle/ML process.

  1802

  1803   \end{description}

  1804 *}

  1805

  1806 text %mlex {* The following example shows how to create unique

  1807   temporary file names.

  1808 *}

  1809

  1810 ML {*

  1811   val tmp1 = File.tmp_path (Path.basic ("foo" ^ serial_string ()));

  1812   val tmp2 = File.tmp_path (Path.basic ("foo" ^ serial_string ()));

  1813   @{assert} (tmp1 <> tmp2);

  1814 *}

  1815

  1816

  1817 subsection {* Explicit synchronization *}

  1818

  1819 text {* Isabelle/ML also provides some explicit synchronization

  1820   mechanisms, for the rare situations where mutable shared resources

  1821   are really required.  These are based on the synchronization

  1822   primitives of Poly/ML, which have been adapted to the specific

  1823   assumptions of the concurrent Isabelle/ML environment.  User code

  1824   must not use the Poly/ML primitives directly!

  1825

  1826   \medskip The most basic synchronization concept is a single

  1827   \emph{critical section} (also called ``monitor'' in the literature).

  1828   A thread that enters the critical section prevents all other threads

  1829   from doing the same.  A thread that is already within the critical

  1830   section may re-enter it in an idempotent manner.

  1831

  1832   Such centralized locking is convenient, because it prevents

  1833   deadlocks by construction.

  1834

  1835   \medskip More fine-grained locking works via \emph{synchronized

  1836   variables}.  An explicit state component is associated with

  1837   mechanisms for locking and signaling.  There are operations to

  1838   await a condition, change the state, and signal the change to all

  1839   other waiting threads.

  1840

  1841   Here the synchronized access to the state variable is \emph{not}

  1842   re-entrant: direct or indirect nesting within the same thread will

  1843   cause a deadlock!

  1844 *}

  1845

  1846 text %mlref {*

  1847   \begin{mldecls}

  1848   @{index_ML NAMED_CRITICAL: "string -> (unit -> 'a) -> 'a"} \\

  1849   @{index_ML CRITICAL: "(unit -> 'a) -> 'a"} \\

  1850   \end{mldecls}

  1851   \begin{mldecls}

  1852   @{index_ML_type "'a Synchronized.var"} \\

  1853   @{index_ML Synchronized.var: "string -> 'a -> 'a Synchronized.var"} \\

  1854   @{index_ML Synchronized.guarded_access: "'a Synchronized.var ->

  1855   ('a -> ('b * 'a) option) -> 'b"} \\

  1856   \end{mldecls}

  1857

  1858   \begin{description}

  1859

  1860   \item @{ML NAMED_CRITICAL}~@{text "name e"} evaluates @{text "e ()"}

  1861   within the central critical section of Isabelle/ML.  No other thread

  1862   may do so at the same time, but non-critical parallel execution will

  1863   continue.  The @{text "name"} argument is used for tracing and might

  1864   help to spot sources of congestion.

  1865

  1866   Entering the critical section without contention is very fast.  Each

  1867   thread should stay within the critical section only very briefly,

  1868   otherwise parallel performance may degrade.

  1869

  1870   \item @{ML CRITICAL} is the same as @{ML NAMED_CRITICAL} with empty

  1871   name argument.

  1872

  1873   \item Type @{ML_type "'a Synchronized.var"} represents synchronized

  1874   variables with state of type @{ML_type 'a}.

  1875

  1876   \item @{ML Synchronized.var}~@{text "name x"} creates a synchronized

  1877   variable that is initialized with value @{text "x"}.  The @{text

  1878   "name"} is used for tracing.

  1879

  1880   \item @{ML Synchronized.guarded_access}~@{text "var f"} lets the

  1881   function @{text "f"} operate within a critical section on the state

  1882   @{text "x"} as follows: if @{text "f x"} produces @{ML NONE}, it

  1883   continues to wait on the internal condition variable, expecting that

  1884   some other thread will eventually change the content in a suitable

  1885   manner; if @{text "f x"} produces @{ML SOME}~@{text "(y, x')"} it is

  1886   satisfied and assigns the new state value @{text "x'"}, broadcasts a

  1887   signal to all waiting threads on the associated condition variable,

  1888   and returns the result @{text "y"}.

  1889

  1890   \end{description}

  1891

  1892   There are some further variants of the @{ML

  1893   Synchronized.guarded_access} combinator, see @{file

  1894   "~~/src/Pure/Concurrent/synchronized.ML"} for details.

  1895 *}

  1896

  1897 text %mlex {* The following example implements a counter that produces

  1898   positive integers that are unique over the runtime of the Isabelle

  1899   process:

  1900 *}

  1901

  1902 ML {*

  1903   local

  1904     val counter = Synchronized.var "counter" 0;

  1905   in

  1906     fun next () =

  1907       Synchronized.guarded_access counter

  1908         (fn i =>

  1909           let val j = i + 1

  1910           in SOME (j, j) end);

  1911   end;

  1912 *}

  1913

  1914 ML {*

  1915   val a = next ();

  1916   val b = next ();

  1917   @{assert} (a <> b);

  1918 *}

  1919

  1920 text {* \medskip See @{file "~~/src/Pure/Concurrent/mailbox.ML"} for how

  1921   to implement a mailbox as synchronized variable over a purely

  1922   functional queue. *}

  1923

  1924

  1925 section {* Managed evaluation *}

  1926

  1927 text {* Execution of Standard ML follows the model of strict

  1928   functional evaluation with optional exceptions.  Evaluation happens

  1929   whenever some function is applied to (sufficiently many)

  1930   arguments. The result is either an explicit value or an implicit

  1931   exception.

  1932

  1933   \emph{Managed evaluation} in Isabelle/ML organizes expressions and

  1934   results to control certain physical side-conditions, to say more

  1935   specifically when and how evaluation happens.  For example, the

  1936   Isabelle/ML library supports lazy evaluation with memoing, parallel

  1937   evaluation via futures, asynchronous evaluation via promises,

  1938   evaluation with time limit etc.

  1939

  1940   \medskip An \emph{unevaluated expression} is represented either as

  1941   unit abstraction @{verbatim "fn () => a"} of type

  1942   @{verbatim "unit -> 'a"} or as regular function

  1943   @{verbatim "fn a => b"} of type @{verbatim "'a -> 'b"}.  Both forms

  1944   occur routinely, and special care is required to tell them apart ---

  1945   the static type-system of SML is only of limited help here.

  1946

  1947   The first form is more intuitive: some combinator @{text "(unit ->

  1948   'a) -> 'a"} applies the given function to @{text "()"} to initiate

  1949   the postponed evaluation process.  The second form is more flexible:

  1950   some combinator @{text "('a -> 'b) -> 'a -> 'b"} acts like a

  1951   modified form of function application; several such combinators may

  1952   be cascaded to modify a given function, before it is ultimately

  1953   applied to some argument.

  1954

  1955   \medskip \emph{Reified results} make the disjoint sum of regular

  1956   values versus exceptional situations explicit as ML datatype:

  1957   @{text "'a result = Res of 'a | Exn of exn"}.  This is typically

  1958   used for administrative purposes, to store the overall outcome of an

  1959   evaluation process.

  1960

  1961   \emph{Parallel exceptions} aggregate reified results, such that

  1962   multiple exceptions are digested as a collection in canonical form

  1963   that identifies exceptions according to their original occurrence.

  1964   This is particularly important for parallel evaluation via futures

  1965   (\secref{sec:futures}), which are organized as an acyclic graph of

  1966   evaluations that depend on other evaluations: exceptions stemming

  1967   from shared sub-graphs are exposed exactly once and in the order of

  1968   their original occurrence (e.g.\ when printed at the toplevel).

  1969   Interrupt counts as neutral element here: it is treated as minimal

  1970   information about some canceled evaluation process, and is absorbed

  1971   by the presence of regular program exceptions.  *}

  1972

  1973 text %mlref {*

  1974   \begin{mldecls}

  1975   @{index_ML_type "'a Exn.result"} \\

  1976   @{index_ML Exn.capture: "('a -> 'b) -> 'a -> 'b Exn.result"} \\

  1977   @{index_ML Exn.interruptible_capture: "('a -> 'b) -> 'a -> 'b Exn.result"} \\

  1978   @{index_ML Exn.release: "'a Exn.result -> 'a"} \\

  1979   @{index_ML Par_Exn.release_all: "'a Exn.result list -> 'a list"} \\

  1980   @{index_ML Par_Exn.release_first: "'a Exn.result list -> 'a list"} \\

  1981   \end{mldecls}

  1982

  1983   \begin{description}

  1984

  1985   \item Type @{ML_type "'a Exn.result"} represents the disjoint sum of

  1986   ML results explicitly, with constructor @{ML Exn.Res} for regular

  1987   values and @{ML "Exn.Exn"} for exceptions.

  1988

  1989   \item @{ML Exn.capture}~@{text "f x"} manages the evaluation of

  1990   @{text "f x"} such that exceptions are made explicit as @{ML

  1991   "Exn.Exn"}.  Note that this includes physical interrupts (see also

  1992   \secref{sec:exceptions}), so the same precautions apply to user

  1993   code: interrupts must not be absorbed accidentally!

  1994

  1995   \item @{ML Exn.interruptible_capture} is similar to @{ML

  1996   Exn.capture}, but interrupts are immediately re-raised as required

  1997   for user code.

  1998

  1999   \item @{ML Exn.release}~@{text "result"} releases the original

  2000   runtime result, exposing its regular value or raising the reified

  2001   exception.

  2002

  2003   \item @{ML Par_Exn.release_all}~@{text "results"} combines results

  2004   that were produced independently (e.g.\ by parallel evaluation).  If

  2005   all results are regular values, that list is returned.  Otherwise,

  2006   the collection of all exceptions is raised, wrapped up as a collective

  2007   parallel exception.  Note that the latter prevents access to

  2008   individual exceptions by conventional @{verbatim "handle"} of SML.

  2009

  2010   \item @{ML Par_Exn.release_first} is similar to @{ML

  2011   Par_Exn.release_all}, but only the first exception that has occurred

  2012   in the original evaluation process is raised again, the others are

  2013   ignored.  That single exception may get handled by conventional

  2014   means in SML.

  2015

  2016   \end{description}

  2017 *}
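
text %mlex {* The following example captures a regular result and an
  exceptional one, and releases them again: the reified exception is
  re-raised only at that point. *}

ML_val {*
  val ok = Exn.capture (fn x => x + 1) 1;
  val err = Exn.capture (fn () => raise Fail "error") ();

  @{assert} (Exn.release ok = 2);
  @{assert} ((Exn.release err; false) handle Fail _ => true);
*}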

  2018

  2019

  2020 subsection {* Parallel skeletons \label{sec:parlist} *}

  2021

  2022 text {*

  2023   Algorithmic skeletons are combinators that operate on lists in

  2024   parallel, in the manner of well-known @{text map}, @{text exists},

  2025   @{text forall} etc.  Management of futures (\secref{sec:futures})

  2026   and their results as reified exceptions is wrapped up into simple

  2027   programming interfaces that resemble the sequential versions.

  2028

  2029   What remains is the application-specific problem to present

  2030   expressions with suitable \emph{granularity}: each list element

  2031   corresponds to one evaluation task.  If the granularity is too

  2032   coarse, the available CPUs are not saturated.  If it is too

  2033   fine-grained, CPU cycles are wasted due to the overhead of

  2034   organizing parallel processing.  In the worst case, parallel

  2035   performance will be less than the sequential counterpart!

  2036 *}

  2037

  2038 text %mlref {*

  2039   \begin{mldecls}

  2040   @{index_ML Par_List.map: "('a -> 'b) -> 'a list -> 'b list"} \\

  2041   @{index_ML Par_List.get_some: "('a -> 'b option) -> 'a list -> 'b option"} \\

  2042   \end{mldecls}

  2043

  2044   \begin{description}

  2045

  2046   \item @{ML Par_List.map}~@{text "f [x\<^sub>1, \<dots>, x\<^sub>n]"} is like @{ML

  2047   "map"}~@{text "f [x\<^sub>1, \<dots>, x\<^sub>n]"}, but the evaluation of @{text "f x\<^sub>i"}

  2048   for @{text "i = 1, \<dots>, n"} is performed in parallel.

  2049

  2050   An exception in any @{text "f x\<^sub>i"} cancels the overall evaluation

  2051   process.  The final result is produced via @{ML

  2052   Par_Exn.release_first} as explained above, which means the first

  2053   program exception that happened to occur in the parallel evaluation

  2054   is propagated, and all other failures are ignored.

  2055

  2056   \item @{ML Par_List.get_some}~@{text "f [x\<^sub>1, \<dots>, x\<^sub>n]"} produces some

  2057   @{text "f x\<^sub>i"} that is of the form @{text "SOME y\<^sub>i"}, if that

  2058   exists, otherwise @{text "NONE"}.  Thus it is similar to @{ML

  2059   Library.get_first}, but subject to a non-deterministic parallel

  2060   choice process.  The first successful result cancels the overall

  2061   evaluation process; other exceptions are propagated as for @{ML

  2062   Par_List.map}.

  2063

  2064   This generic parallel choice combinator is the basis for derived

  2065   forms, such as @{ML Par_List.find_some}, @{ML Par_List.exists}, @{ML

  2066   Par_List.forall}.

  2067

  2068   \end{description}

  2069 *}

  2070

  2071 text %mlex {* Subsequently, the Ackermann function is evaluated in

  2072   parallel for some ranges of arguments. *}

  2073

  2074 ML_val {*

  2075   fun ackermann 0 n = n + 1

  2076     | ackermann m 0 = ackermann (m - 1) 1

  2077     | ackermann m n = ackermann (m - 1) (ackermann m (n - 1));

  2078

  2079   Par_List.map (ackermann 2) (500 upto 1000);

  2080   Par_List.map (ackermann 3) (5 upto 10);

  2081 *}

  2082

  2083

  2084 subsection {* Lazy evaluation *}

  2085

  2086 text {*

  2087   %FIXME

  2088

  2089   See also @{file "~~/src/Pure/Concurrent/lazy.ML"}.

  2090 *}

  2091

  2092

  2093 subsection {* Future values \label{sec:futures} *}

  2094

  2095 text {*

  2096   %FIXME

  2097

  2098   See also @{file "~~/src/Pure/Concurrent/future.ML"}.

  2099 *}

  2100

  2101

  2102 end