doc-src/IsarImplementation/ML.thy
     1 theory "ML"
       
     2 imports Base
       
     3 begin
       
     4 
       
     5 chapter {* Isabelle/ML *}
       
     6 
       
     7 text {* Isabelle/ML is best understood as a certain culture based on
       
     8   Standard ML.  Thus it is not a new programming language, but a
       
     9   certain way to use SML at an advanced level within the Isabelle
       
    10   environment.  This covers a variety of aspects that are geared
       
    11   towards an efficient and robust platform for applications of formal
       
    12   logic with fully foundational proof construction --- according to
       
    13   the well-known \emph{LCF principle}.  There is specific
       
    14   infrastructure with library modules to address the needs of this
       
    15   difficult task.  For example, the raw parallel programming model of
       
    16   Poly/ML is presented as the considerably more abstract concept of
       
    17   \emph{future values}, which is then used to augment the inference
       
    18   kernel, proof interpreter, and theory loader accordingly.
       
    19 
       
    20   The main aspects of Isabelle/ML are introduced below.  These
       
    21   first-hand explanations should help to understand how proper
       
    22   Isabelle/ML is to be read and written, and to get access to the
       
    23   wealth of experience that is expressed in the source text and its
       
    24   history of changes.\footnote{See
       
    25   \url{http://isabelle.in.tum.de/repos/isabelle} for the full
       
    26   Mercurial history.  There are symbolic tags to refer to official
       
    27   Isabelle releases, as opposed to arbitrary \emph{tip} versions that
       
    28   merely reflect snapshots that are never really up-to-date.}  *}
       
    29 
       
    30 
       
    31 section {* Style and orthography *}
       
    32 
       
    33 text {* The sources of Isabelle/Isar are optimized for
       
    34   \emph{readability} and \emph{maintainability}.  The main purpose is
       
    35   to tell an informed reader what is really going on and how things
       
    36   really work.  This is a non-trivial aim, but it is supported by a
       
    37   certain style of writing Isabelle/ML that has emerged from long
       
    38   years of system development.\footnote{See also the interesting style
       
    39   guide for OCaml
       
    40   \url{http://caml.inria.fr/resources/doc/guides/guidelines.en.html}
       
    41   which shares many of our means and ends.}
       
    42 
       
    43   The main principle behind any coding style is \emph{consistency}.
       
    44   For a single author of a small program this merely means ``choose
       
    45   your style and stick to it''.  A complex project like Isabelle, with
       
    46   long years of development and different contributors, requires more
       
    47   standardization.  A coding style that is changed every few years or
       
    48   with every new contributor is no style at all, because consistency
       
    49   is quickly lost.  Global consistency is hard to achieve, though.
       
    50   Nonetheless, one should always strive at least for local consistency
       
    51   of modules and sub-systems, without deviating from some general
       
    52   principles of how to write Isabelle/ML.
       
    53 
       
    54   In a sense, good coding style is like an \emph{orthography} for the
       
    55   sources: it helps to read quickly over the text and see through the
       
    56   main points, without getting distracted by accidental presentation
       
    57   of free-style code.
       
    58 *}
       
    59 
       
    60 
       
    61 subsection {* Header and sectioning *}
       
    62 
       
    63 text {* Isabelle source files have a certain standardized header
       
    64   format (with precise spacing) that follows ancient traditions
       
    65   reaching back to the earliest versions of the system by Larry
       
    66   Paulson.  See @{file "~~/src/Pure/thm.ML"}, for example.
       
    67 
       
    68   The header includes at least @{verbatim Title} and @{verbatim
       
    69   Author} entries, followed by a prose description of the purpose of
       
    70   the module.  The latter can range from a single line to several
       
    71   paragraphs of explanations.
       
    72 
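         For illustration, a header in this style looks roughly as follows
         (the file name, author, and description here are hypothetical):

         \begin{verbatim}
         (*  Title:      Pure/foo_bar.ML
             Author:     N. N.

         Hypothetical support for foo and bar concepts.
         *)
         \end{verbatim}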
       
    73   The rest of the file is divided into sections, subsections,
       
    74   subsubsections, paragraphs etc.\ using a simple layout via ML
       
    75   comments as follows.
       
    76 
       
    77 \begin{verbatim}
       
    78 (*** section ***)
       
    79 
       
    80 (** subsection **)
       
    81 
       
    82 (* subsubsection *)
       
    83 
       
    84 (*short paragraph*)
       
    85 
       
    86 (*
       
    87   long paragraph,
       
    88   with more text
       
    89 *)
       
    90 \end{verbatim}
       
    91 
       
    92   As in regular typography, there is some extra space \emph{before}
       
    93   section headings that are adjacent to plain text (not other headings
       
    94   as in the example above).
       
    95 
       
    96   \medskip The precise wording of the prose text given in these
       
    97   headings is chosen carefully to introduce the main theme of the
       
    98   subsequent formal ML text.
       
    99 *}
       
   100 
       
   101 
       
   102 subsection {* Naming conventions *}
       
   103 
       
   104 text {* Since ML is the primary medium to express the meaning of the
       
   105   source text, naming of ML entities requires special care.
       
   106 
       
   107   \paragraph{Notation.}  A name consists of 1--3 \emph{words} (rarely
       
   108   4, but not more) that are separated by underscores.  There are three
       
   109   variants concerning upper or lower case letters, which are used for
       
   110   certain ML categories as follows:
       
   111 
       
   112   \medskip
       
   113   \begin{tabular}{lll}
       
   114   variant & example & ML categories \\\hline
       
   115   lower-case & @{ML_text foo_bar} & values, types, record fields \\
       
   116   capitalized & @{ML_text Foo_Bar} & datatype constructors, structures, functors \\
       
   117   upper-case & @{ML_text FOO_BAR} & special values, exception constructors, signatures \\
       
   118   \end{tabular}
       
   119   \medskip
       
   120 
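         For example (hypothetical declarations in the style of the table):

         \begin{verbatim}
         val foo_bar = ...;
         structure Foo_Bar = struct ... end;
         exception FOO_BAR;
         \end{verbatim}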
       
   121   For historical reasons, many capitalized names omit underscores,
       
   122   e.g.\ old-style @{ML_text FooBar} instead of @{ML_text Foo_Bar}.
       
   123   Genuine mixed-case names are \emph{not} used, because clear division
       
   124   of words is essential for readability.\footnote{Camel-case was
       
   125   invented to work around the lack of underscore in some early
       
   126   non-ASCII character sets.  Later it became habitual in some language
       
   127   communities that are now strong in numbers.}
       
   128 
       
   129   A single (capital) character does not count as ``word'' in this
       
   130   respect: some Isabelle/ML names are suffixed by extra markers like
       
   131   this: @{ML_text foo_barT}.
       
   132 
       
   133   Name variants are produced by adding 1--3 primes, e.g.\ @{ML_text
       
   134   foo'}, @{ML_text foo''}, or @{ML_text foo'''}, but not @{ML_text
       
   135   foo''''} or more.  Decimal digits scale better to larger numbers,
       
   136   e.g.\ @{ML_text foo0}, @{ML_text foo1}, @{ML_text foo42}.
       
   137 
       
   138   \paragraph{Scopes.}  Apart from very basic library modules, ML
       
   139   structures are not ``opened'', but names are referenced with
       
   140   explicit qualification, as in @{ML Syntax.string_of_term} for
       
   141   example.  When devising names for structures and their components it
       
   142   is important to aim at eye-catching compositions of both parts, because
       
   143   this is how they are seen in the sources and documentation.  For the
       
   144   same reasons, aliases of well-known library functions should be
       
   145   avoided.
       
   146 
       
   147   Local names of function abstraction or case/let bindings are
       
   148   typically shorter, sometimes using only rudiments of ``words'',
       
   149   while still avoiding cryptic shorthands.  An auxiliary function
       
   150   called @{ML_text helper}, @{ML_text aux}, or @{ML_text f} is
       
   151   considered bad style.
       
   152 
       
   153   Example:
       
   154 
       
   155   \begin{verbatim}
       
   156   (* RIGHT *)
       
   157 
       
   158   fun print_foo ctxt foo =
       
   159     let
       
   160       fun print t = ... Syntax.string_of_term ctxt t ...
       
   161     in ... end;
       
   162 
       
   163 
       
   164   (* RIGHT *)
       
   165 
       
   166   fun print_foo ctxt foo =
       
   167     let
       
   168       val string_of_term = Syntax.string_of_term ctxt;
       
   169       fun print t = ... string_of_term t ...
       
   170     in ... end;
       
   171 
       
   172 
       
   173   (* WRONG *)
       
   174 
       
   175   val string_of_term = Syntax.string_of_term;
       
   176 
       
   177   fun print_foo ctxt foo =
       
   178     let
       
   179       fun aux t = ... string_of_term ctxt t ...
       
   180     in ... end;
       
   181 
       
   182   \end{verbatim}
       
   183 
       
   184 
       
   185   \paragraph{Specific conventions.} Here are some specific name forms
       
   186   that occur frequently in the sources.
       
   187 
       
   188   \begin{itemize}
       
   189 
       
   190   \item A function that maps @{ML_text foo} to @{ML_text bar} is
       
   191   called @{ML_text foo_to_bar} or @{ML_text bar_of_foo} (never
       
   192   @{ML_text foo2bar}, @{ML_text bar_from_foo}, @{ML_text
       
   193   bar_for_foo}, or @{ML_text bar4foo}).
       
   194 
       
   195   \item The name component @{ML_text legacy} means that the operation
       
   196   is about to be discontinued.
       
   197 
       
   198   \item The name component @{ML_text old} means that this is historic
       
   199   material that might disappear at some later stage.
       
   200 
       
   201   \item The name component @{ML_text global} means that this works
       
   202   with the background theory instead of the regular local context
       
   203   (\secref{sec:context}), sometimes for historical reasons, sometimes
       
   204   due to a genuine lack of locality of the concept involved, sometimes as
       
   205   a fall-back for the lack of a proper context in the application
       
   206   code.  Whenever there is a non-global variant available, the
       
   207   application should be migrated to use it with a proper local
       
   208   context.
       
   209 
       
   210   \item Variables of the main context types of the Isabelle/Isar
       
   211   framework (\secref{sec:context} and \chref{ch:local-theory}) have
       
   212   firm naming conventions as follows:
       
   213 
       
   214   \begin{itemize}
       
   215 
       
   216   \item theories are called @{ML_text thy}, rarely @{ML_text theory}
       
   217   (never @{ML_text thry})
       
   218 
       
   219   \item proof contexts are called @{ML_text ctxt}, rarely @{ML_text
       
   220   context} (never @{ML_text ctx})
       
   221 
       
   222   \item generic contexts are called @{ML_text context}, rarely
       
   223   @{ML_text ctxt}
       
   224 
       
   225   \item local theories are called @{ML_text lthy}, except for local
       
   226   theories that are treated as proof context (which is a semantic
       
   227   super-type)
       
   228 
       
   229   \end{itemize}
       
   230 
       
   231   Variations with primed or decimal numbers are always possible, as
       
   232   well as semantic prefixes like @{ML_text foo_thy} or @{ML_text
       
   233   bar_ctxt}, but the base conventions above need to be preserved.
       
   234   This makes it possible to visualize their data flow via plain regular
       
   235   expressions in the editor.
       
   236 
       
   237   \item The main logical entities (\secref{ch:logic}) have established
       
   238   naming conventions as follows:
       
   239 
       
   240   \begin{itemize}
       
   241 
       
   242   \item sorts are called @{ML_text S}
       
   243 
       
   244   \item types are called @{ML_text T}, @{ML_text U}, or @{ML_text
       
   245   ty} (never @{ML_text t})
       
   246 
       
   247   \item terms are called @{ML_text t}, @{ML_text u}, or @{ML_text
       
   248   tm} (never @{ML_text trm})
       
   249 
       
   250   \item certified types are called @{ML_text cT}, rarely @{ML_text
       
   251   T}, with variants as for types
       
   252 
       
   253   \item certified terms are called @{ML_text ct}, rarely @{ML_text
       
   254   t}, with variants as for terms
       
   255 
       
   256   \item theorems are called @{ML_text th}, or @{ML_text thm}
       
   257 
       
   258   \end{itemize}
       
   259 
       
   260   Proper semantic names override these conventions completely.  For
       
   261   example, the left-hand side of an equation (as a term) can be called
       
   262   @{ML_text lhs} (not @{ML_text lhs_tm}).  Or a term that is known
       
   263   to be a variable can be called @{ML_text v} or @{ML_text x}.
       
   264 
       
   265   \item Tactics (\secref{sec:tactics}) are sufficiently important to
       
   266   have specific naming conventions.  The name of a basic tactic
       
   267   definition always has a @{ML_text "_tac"} suffix, the subgoal index
       
   268   (if applicable) is always called @{ML_text i}, and the goal state
       
   269   (if made explicit) is usually called @{ML_text st} instead of the
       
   270   somewhat misleading @{ML_text thm}.  Any other arguments are given
       
   271   before the latter two, and the general context is given first.
       
   272   Example:
       
   273 
       
   274   \begin{verbatim}
       
   275   fun my_tac ctxt arg1 arg2 i st = ...
       
   276   \end{verbatim}
       
   277 
       
   278   Note that the goal state @{ML_text st} above is rarely made
       
   279   explicit, if tactic combinators (tacticals) are used as usual.
       
   280 
       
   281   \end{itemize}
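
         Taken together, a hypothetical function that follows these
         conventions could have the following shape:

         \begin{verbatim}
         (*hypothetical auxiliary proof function*)
         fun prove_foo ctxt t =
           let
             val thy = Proof_Context.theory_of ctxt;
             val T = fastype_of t;
             val th = ...;
           in th end;
         \end{verbatim}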
       
   282 *}
       
   283 
       
   284 
       
   285 subsection {* General source layout *}
       
   286 
       
   287 text {* The general Isabelle/ML source layout imitates regular
       
   288   type-setting to some extent, augmented by the requirements for
       
   289   deeply nested expressions that are commonplace in functional
       
   290   programming.
       
   291 
       
   292   \paragraph{Line length} is 80 characters according to ancient
       
   293   standards, but we allow as much as 100 characters (not
       
   294   more).\footnote{Readability requires keeping the beginning of a line
       
   295   in view while watching its end.  Modern wide-screen displays do not
       
   296   change the way the human brain works.  Sources also need to be
       
   297   printable on plain paper with reasonable font-size.} The extra 20
       
   298   characters acknowledge the space requirements due to qualified
       
   299   library references in Isabelle/ML.
       
   300 
       
   301   \paragraph{White-space} is used to emphasize the structure of
       
   302   expressions, following mostly standard conventions for mathematical
       
   303   typesetting, as can be seen in plain {\TeX} or {\LaTeX}.  This
       
   304   defines positioning of spaces for parentheses, punctuation, and
       
   305   infixes as illustrated here:
       
   306 
       
   307   \begin{verbatim}
       
   308   val x = y + z * (a + b);
       
   309   val pair = (a, b);
       
   310   val record = {foo = 1, bar = 2};
       
   311   \end{verbatim}
       
   312 
       
   313   Lines are normally broken \emph{after} an infix operator or
       
   314   punctuation character.  For example:
       
   315 
       
   316   \begin{verbatim}
       
   317   val x =
       
   318     a +
       
   319     b +
       
   320     c;
       
   321 
       
   322   val tuple =
       
   323    (a,
       
   324     b,
       
   325     c);
       
   326   \end{verbatim}
       
   327 
       
   328   Some special infixes (e.g.\ @{ML_text "|>"}) work better at the
       
   329   start of the line, but punctuation is always at the end.
       
   330 
       
   331   Function application follows the tradition of @{text "\<lambda>"}-calculus,
       
   332   not informal mathematics.  For example: @{ML_text "f a b"} for a
       
   333   curried function, or @{ML_text "g (a, b)"} for a tupled function.
       
   334   Note that the space between @{ML_text g} and the pair @{ML_text
       
   335   "(a, b)"} follows the important principle of
       
   336   \emph{compositionality}: the layout of @{ML_text "g p"} does not
       
   337   change when @{ML_text "p"} is refined to the concrete pair
       
   338   @{ML_text "(a, b)"}.
       
   339 
       
   340   \paragraph{Indentation} uses plain spaces, never hard
       
   341   tabulators.\footnote{Tabulators were invented to move the carriage
       
   342   of a type-writer to certain predefined positions.  In software they
       
   343   could be used as a primitive run-length compression of consecutive
       
   344   spaces, but the precise result would depend on non-standardized
       
   345   editor configuration.}
       
   346 
       
   347   Each level of nesting is indented by 2 spaces, sometimes 1, very
       
   348   rarely 4, never 8 or any other odd number.
       
   349 
       
   350   Indentation follows a simple logical format that only depends on the
       
   351   nesting depth, not the accidental length of the text that initiates
       
   352   a level of nesting.  Example:
       
   353 
       
   354   \begin{verbatim}
       
   355   (* RIGHT *)
       
   356 
       
   357   if b then
       
   358     expr1_part1
       
   359     expr1_part2
       
   360   else
       
   361     expr2_part1
       
   362     expr2_part2
       
   363 
       
   364 
       
   365   (* WRONG *)
       
   366 
       
   367   if b then expr1_part1
       
   368             expr1_part2
       
   369   else expr2_part1
       
   370        expr2_part2
       
   371   \end{verbatim}
       
   372 
       
   373   The second form has many problems: it assumes a fixed-width font
       
   374   when viewing the sources, it uses more space on the line and thus
       
   375   makes it hard to observe its strict length limit (working against
       
   376   \emph{readability}), it requires extra editing to adapt the layout
       
   377   to changes of the initial text (working against
       
   378   \emph{maintainability}) etc.
       
   379 
       
   380   \medskip For similar reasons, any kind of two-dimensional or tabular
       
   381   layouts, ASCII-art with lines or boxes of asterisks etc.\ should be
       
   382   avoided.
       
   383 
       
   384   \paragraph{Complex expressions} that consist of multi-clausal
       
   385   function definitions, @{ML_text handle}, @{ML_text case},
       
   386   @{ML_text let} (and combinations) require special attention.  The
       
   387   syntax of Standard ML is quite ambitious and admits a lot of
       
   388   variance that can distort the meaning of the text.
       
   389 
       
   390   Clauses of @{ML_text fun}, @{ML_text fn}, @{ML_text handle},
       
   391   @{ML_text case} get extra indentation to indicate the nesting
       
   392   clearly.  Example:
       
   393 
       
   394   \begin{verbatim}
       
   395   (* RIGHT *)
       
   396 
       
   397   fun foo p1 =
       
   398         expr1
       
   399     | foo p2 =
       
   400         expr2
       
   401 
       
   402 
       
   403   (* WRONG *)
       
   404 
       
   405   fun foo p1 =
       
   406     expr1
       
   407     | foo p2 =
       
   408     expr2
       
   409   \end{verbatim}
       
   410 
       
   411   Body expressions consisting of @{ML_text case} or @{ML_text let}
       
   412   require care to maintain compositionality, to prevent loss of
       
   413   logical indentation where it is especially important to see the
       
   414   structure of the text.  Example:
       
   415 
       
   416   \begin{verbatim}
       
   417   (* RIGHT *)
       
   418 
       
   419   fun foo p1 =
       
   420         (case e of
       
   421           q1 => ...
       
   422         | q2 => ...)
       
   423     | foo p2 =
       
   424         let
       
   425           ...
       
   426         in
       
   427           ...
       
   428         end
       
   429 
       
   430 
       
   431   (* WRONG *)
       
   432 
       
   433   fun foo p1 = case e of
       
   434       q1 => ...
       
   435     | q2 => ...
       
   436     | foo p2 =
       
   437     let
       
   438       ...
       
   439     in
       
   440       ...
       
   441     end
       
   442   \end{verbatim}
       
   443 
       
   444   Extra parentheses around @{ML_text case} expressions are optional,
       
   445   but help to analyse the nesting based on character matching in the
       
   446   editor.
       
   447 
       
   448   \medskip There are two main exceptions to the overall principle of
       
   449   compositionality in the layout of complex expressions.
       
   450 
       
   451   \begin{enumerate}
       
   452 
       
   453   \item @{ML_text "if"} expressions are iterated as if there were
       
   454   a multi-branch conditional in SML, e.g.
       
   455 
       
   456   \begin{verbatim}
       
   457   (* RIGHT *)
       
   458 
       
   459   if b1 then e1
       
   460   else if b2 then e2
       
   461   else e3
       
   462   \end{verbatim}
       
   463 
       
   464   \item @{ML_text fn} abstractions are often laid out as if they
       
   465   lacked any structure by themselves.  This traditional form is
       
   466   motivated by the possibility to shift function arguments back and
       
   467   forth wrt.\ additional combinators.  Example:
       
   468 
       
   469   \begin{verbatim}
       
   470   (* RIGHT *)
       
   471 
       
   472   fun foo x y = fold (fn z =>
       
   473     expr)
       
   474   \end{verbatim}
       
   475 
       
   476   Here the visual appearance is that of three arguments @{ML_text x},
       
   477   @{ML_text y}, @{ML_text z}.
       
   478 
       
   479   \end{enumerate}
       
   480 
       
   481   Such weakly structured layout should be used with great care.  Here
       
   482   are some counter-examples involving @{ML_text let} expressions:
       
   483 
       
   484   \begin{verbatim}
       
   485   (* WRONG *)
       
   486 
       
   487   fun foo x = let
       
   488       val y = ...
       
   489     in ... end
       
   490 
       
   491 
       
   492   (* WRONG *)
       
   493 
       
   494   fun foo x = let
       
   495     val y = ...
       
   496   in ... end
       
   497 
       
   498 
       
   499   (* WRONG *)
       
   500 
       
   501   fun foo x =
       
   502   let
       
   503     val y = ...
       
   504   in ... end
       
   505   \end{verbatim}
       
   506 
       
   507   \medskip In general the source layout is meant to emphasize the
       
   508   structure of complex language expressions, not to pretend that SML
       
   509   had a completely different syntax (say that of Haskell or Java).
       
   510 *}
       
   511 
       
   512 
       
   513 section {* SML embedded into Isabelle/Isar *}
       
   514 
       
   515 text {* ML and Isar are intertwined via an open-ended bootstrap
       
   516   process that provides more and more programming facilities and
       
   517   logical content in an alternating manner.  Bootstrapping starts from
       
   518   the raw environment of existing implementations of Standard ML
       
   519   (mainly Poly/ML, but also SML/NJ).
       
   520 
       
   521   Isabelle/Pure marks the point where the original ML toplevel is
       
   522   superseded by the Isar toplevel that maintains a uniform context for
       
   523   arbitrary ML values (see also \secref{sec:context}).  This formal
       
   524   environment holds ML compiler bindings, logical entities, and many
       
   525   other things.  Raw SML is never encountered again after the initial
       
   526   bootstrap of Isabelle/Pure.
       
   527 
       
   528   Object-logics like Isabelle/HOL are built within the
       
   529   Isabelle/ML/Isar environment by introducing suitable theories with
       
   530   associated ML modules, either inlined or as separate files.  Thus
       
   531   Isabelle/HOL is defined as a regular user-space application within
       
   532   the Isabelle framework.  Further add-on tools can be implemented in
       
   533   ML within the Isar context in the same manner: ML is part of the
       
   534   standard repertoire of Isabelle, and there is no distinction between
       
   535   ``user'' and ``developer'' in this respect.
       
   536 *}
       
   537 
       
   538 
       
   539 subsection {* Isar ML commands *}
       
   540 
       
   541 text {* The primary Isar source language provides facilities to ``open
       
   542   a window'' to the underlying ML compiler.  Especially see the Isar
       
   543   commands @{command_ref "use"} and @{command_ref "ML"}: both work the
       
   544   same way, only the source text is provided via a file vs.\ inlined,
       
   545   respectively.  Apart from embedding ML into the main theory
       
   546   definition like that, there are many more commands that refer to ML
       
   547   source, such as @{command_ref setup} or @{command_ref declaration}.
       
   548   Even more fine-grained embedding of ML into Isar is encountered in
       
   549   the proof method @{method_ref tactic}, which refines the pending
       
   550   goal state via a given expression of type @{ML_type tactic}.
       
   551 *}
       
   552 
       
   553 text %mlex {* The following artificial example demonstrates some ML
       
   554   toplevel declarations within the implicit Isar theory context.  This
       
   555   is regular functional programming without referring to logical
       
   556   entities yet.
       
   557 *}
       
   558 
       
   559 ML {*
       
   560   fun factorial 0 = 1
       
   561     | factorial n = n * factorial (n - 1)
       
   562 *}
       
   563 
       
   564 text {* Here the ML environment is already managed by Isabelle, i.e.\
       
   565   the @{ML factorial} function is not yet accessible in the preceding
       
   566   paragraph, nor in a different theory that is independent from the
       
   567   current one in the import hierarchy.
       
   568 
       
   569   Removing the above ML declaration from the source text will remove
       
   570   any trace of this definition as expected.  The Isabelle/ML toplevel
       
   571   environment is managed in a \emph{stateless} way: unlike the raw ML
       
   572   toplevel there are no global side-effects involved
       
   573   here.\footnote{Such a stateless compilation environment is also a
       
   574   prerequisite for robust parallel compilation within independent
       
   575   nodes of the implicit theory development graph.}
       
   576 
       
   577   \medskip The next example shows how to embed ML into Isar proofs, using
       
   578   @{command_ref "ML_prf"} instead of @{command_ref "ML"}.
       
   579   As illustrated below, the effect on the ML environment is local to
       
   580   the whole proof body, ignoring the block structure.
       
   581 *}
       
   582 
       
   583 notepad
       
   584 begin
       
   585   ML_prf %"ML" {* val a = 1 *}
       
   586   {
       
   587     ML_prf %"ML" {* val b = a + 1 *}
       
   588   } -- {* Isar block structure ignored by ML environment *}
       
   589   ML_prf %"ML" {* val c = b + 1 *}
       
   590 end
       
   591 
       
   592 text {* By side-stepping the normal scoping rules for Isar proof
       
   593   blocks, embedded ML code can refer to the different contexts and
       
   594   manipulate corresponding entities, e.g.\ export a fact from a block
       
   595   context.
       
   596 
       
   597   \medskip Two further ML commands are useful in certain situations:
       
   598   @{command_ref ML_val} and @{command_ref ML_command} are
       
   599   \emph{diagnostic} in the sense that there is no effect on the
       
   600   underlying environment, and can thus be used anywhere (even outside a
       
   601   theory).  The examples below produce long strings of digits by
       
   602   invoking @{ML factorial}: @{command ML_val} already takes care of
       
   603   printing the ML toplevel result, but @{command ML_command} is silent
       
   604   so we produce an explicit output message.  *}
       
   605 
       
   606 ML_val {* factorial 100 *}
       
   607 ML_command {* writeln (string_of_int (factorial 100)) *}
       
   608 
       
   609 notepad
       
   610 begin
       
   611   ML_val {* factorial 100 *}  (* FIXME check/fix indentation *)
       
   612   ML_command {* writeln (string_of_int (factorial 100)) *}
       
   613 end
       
   614 
       
   615 
       
   616 subsection {* Compile-time context *}
       
   617 
       
   618 text {* Whenever the ML compiler is invoked within Isabelle/Isar, the
       
   619   formal context is passed as a thread-local reference variable.  Thus
       
   620   ML code may access the theory context during compilation, by reading
       
   621   or writing the (local) theory under construction.  Note that such
       
   622   direct access to the compile-time context is rare.  In practice it
       
   623   is typically done via some derived ML functions instead.
       
   624 *}
       
   625 
       
   626 text %mlref {*
       
   627   \begin{mldecls}
       
   628   @{index_ML ML_Context.the_generic_context: "unit -> Context.generic"} \\
       
   629   @{index_ML "Context.>>": "(Context.generic -> Context.generic) -> unit"} \\
       
   630   @{index_ML bind_thms: "string * thm list -> unit"} \\
       
   631   @{index_ML bind_thm: "string * thm -> unit"} \\
       
   632   \end{mldecls}
       
   633 
       
   634   \begin{description}
       
   635 
       
   636   \item @{ML "ML_Context.the_generic_context ()"} refers to the theory
       
   637   context of the ML toplevel --- at compile time.  ML code needs to
       
   638   take care to refer to @{ML "ML_Context.the_generic_context ()"}
       
   639   correctly.  Recall that evaluation of a function body is delayed
       
   640   until actual run-time.
       
   641 
       
   642   \item @{ML "Context.>>"}~@{text f} applies context transformation
       
   643   @{text f} to the implicit context of the ML toplevel.
       
   644 
       
   645   \item @{ML bind_thms}~@{text "(name, thms)"} stores a list of
       
   646   theorems produced in ML both in the (global) theory context and the
       
   647   ML toplevel, associating it with the provided name.  Theorems are
       
   648   put into a global ``standard'' format before being stored.
       
   649 
       
   650   \item @{ML bind_thm} is similar to @{ML bind_thms} but refers to a
       
   651   singleton fact.
       
   652 
       
   653   \end{description}
       
   654 
       
   655   It is important to note that the above functions are really
       
   656   restricted to the compile time, even though the ML compiler is
       
   657   invoked at run-time.  The majority of ML code either uses static
       
   658   antiquotations (\secref{sec:ML-antiq}) or refers to the theory or
       
   659   proof context at run-time, by explicit functional abstraction.
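
         For example, a minimal sketch of direct compile-time access (rarely
         needed in practice, as explained above):

         \begin{verbatim}
         val thy = Context.theory_of (ML_Context.the_generic_context ());
         \end{verbatim}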
       
   660 *}
       
   661 
       
   662 
       
   663 subsection {* Antiquotations \label{sec:ML-antiq} *}
       
   664 
       
   665 text {* A very important consequence of embedding SML into Isar is the
       
   666   concept of \emph{ML antiquotation}.  The standard token language of
       
   667   ML is augmented by special syntactic entities of the following form:
       
   668 
       
   669   @{rail "
       
   670   @{syntax_def antiquote}: '@{' nameref args '}' | '\<lbrace>' | '\<rbrace>'
       
   671   "}
       
   672 
       
   673   Here @{syntax nameref} and @{syntax args} are regular outer syntax
       
   674   categories \cite{isabelle-isar-ref}.  Attributes and proof methods
       
   675   use similar syntax.
       
   676 
       
   677   \medskip A regular antiquotation @{text "@{name args}"} processes
       
   678   its arguments by the usual means of the Isar source language, and
       
   679   produces corresponding ML source text, either as literal
       
   680   \emph{inline} text (e.g. @{text "@{term t}"}) or abstract
       
   681   \emph{value} (e.g. @{text "@{thm th}"}).  This pre-compilation
       
   682   scheme supports referring to formal entities in a robust manner, with
       
   683   proper static scoping and with some degree of logical checking of
       
   684   small portions of the code.
       
   685 
       
   686   Special antiquotations like @{text "@{let \<dots>}"} or @{text "@{note
       
   687   \<dots>}"} augment the compilation context without generating code.  The
       
   688   non-ASCII braces @{text "\<lbrace>"} and @{text "\<rbrace>"} allow delimiting the
       
   689   effect by introducing local blocks within the pre-compilation
       
   690   environment.
       
   691 
       
   692   \medskip See also \cite{Wenzel-Chaieb:2007b} for a broader
       
   693   perspective on Isabelle/ML antiquotations.  *}
       
   694 
       
   695 text %mlantiq {*
       
   696   \begin{matharray}{rcl}
       
   697   @{ML_antiquotation_def "let"} & : & @{text ML_antiquotation} \\
       
   698   @{ML_antiquotation_def "note"} & : & @{text ML_antiquotation} \\
       
   699   \end{matharray}
       
   700 
       
   701   @{rail "
       
   702   @@{ML_antiquotation let} ((term + @'and') '=' term + @'and')
       
   703   ;
       
   704   @@{ML_antiquotation note} (thmdef? thmrefs + @'and')
       
   705   "}
       
   706 
       
   707   \begin{description}
       
   708 
       
   709   \item @{text "@{let p = t}"} binds schematic variables in the
       
   710   pattern @{text "p"} by higher-order matching against the term @{text
       
   711   "t"}.  This is analogous to the regular @{command_ref let} command
       
   712   in the Isar proof language.  The pre-compilation environment is
       
   713   augmented by auxiliary term bindings, without emitting ML source.
       
   714 
       
   715   \item @{text "@{note a = b\<^sub>1 \<dots> b\<^sub>n}"} recalls existing facts @{text
       
   716   "b\<^sub>1, \<dots>, b\<^sub>n"}, binding the result as @{text a}.  This is analogous to
       
   717   the regular @{command_ref note} command in the Isar proof language.
       
   718   The pre-compilation environment is augmented by auxiliary fact
       
   719   bindings, without emitting ML source.
       
   720 
       
   721   \end{description}
       
   722 *}
       
   723 
       
   724 text %mlex {* The following artificial example gives some impression
       
   725   about the antiquotation elements introduced so far, together with
       
   726   the important @{text "@{thm}"} antiquotation defined later.
       
   727 *}
       
   728 
       
   729 ML {*
       
   730   \<lbrace>
       
   731     @{let ?t = my_term}
       
   732     @{note my_refl = reflexive [of ?t]}
       
   733     fun foo th = Thm.transitive th @{thm my_refl}
       
   734   \<rbrace>
       
   735 *}
       
   736 
       
   737 text {* The extra block delimiters do not affect the compiled code
       
   738   itself, i.e.\ function @{ML foo} is available in the present context
       
   739   of this paragraph.
       
   740 *}
       
   741 
       
   742 
       
   743 section {* Canonical argument order \label{sec:canonical-argument-order} *}
       
   744 
       
   745 text {* Standard ML is a language in the tradition of @{text
       
   746   "\<lambda>"}-calculus and \emph{higher-order functional programming},
       
   747   similar to OCaml, Haskell, or Isabelle/Pure and HOL as logical
       
   748   languages.  Getting acquainted with the native style of representing
       
   749   functions in that setting can save a lot of extra boiler-plate of
       
   750   redundant shuffling of arguments, auxiliary abstractions etc.
       
   751 
       
   752   Functions are usually \emph{curried}: the idea of turning arguments
       
   753   of type @{text "\<tau>\<^sub>i"} (for @{text "i \<in> {1, \<dots>, n}"}) into a result of
       
   754   type @{text "\<tau>"} is represented by the iterated function space
       
   755   @{text "\<tau>\<^sub>1 \<rightarrow> \<dots> \<rightarrow> \<tau>\<^sub>n \<rightarrow> \<tau>"}.  This is isomorphic to the well-known
       
   756   encoding via tuples @{text "\<tau>\<^sub>1 \<times> \<dots> \<times> \<tau>\<^sub>n \<rightarrow> \<tau>"}, but the curried
       
   757   version fits more smoothly into the basic calculus.\footnote{The
       
   758   difference is even more significant in higher-order logic, because
       
   759   the redundant tuple structure needs to be accommodated by formal
       
   760   reasoning.}
       
   761 
       
   762   Currying gives some flexibility due to \emph{partial application}.  A
       
   763   function @{text "f: \<tau>\<^sub>1 \<rightarrow> \<tau>\<^sub>2 \<rightarrow> \<tau>"} can be applied to @{text "x: \<tau>\<^sub>1"}
       
   764   and the remaining @{text "(f x): \<tau>\<^sub>2 \<rightarrow> \<tau>"} passed to another function
       
   765   etc.  How well this works in practice depends on the order of
       
   766   arguments.  In the worst case, arguments are arranged erratically,
       
   767   and using a function in a certain situation always requires some
       
   768   glue code.  Thus we would get exponentially many opportunities to
       
   769   decorate the code with meaningless permutations of arguments.
       
   770 
       
   771   This can be avoided by \emph{canonical argument order}, which
       
   772   observes certain standard patterns and minimizes adhoc permutations
       
   773   in their application.  In Isabelle/ML, large portions of text can be
       
   774   written without ever using @{text "swap: \<alpha> \<times> \<beta> \<rightarrow> \<beta> \<times> \<alpha>"}, or the
       
   775   combinator @{text "C: (\<alpha> \<rightarrow> \<beta> \<rightarrow> \<gamma>) \<rightarrow> (\<beta> \<rightarrow> \<alpha> \<rightarrow> \<gamma>)"} that is not even
       
   776   defined in our library.
       
   777 
       
   778   \medskip The basic idea is that arguments that vary less are moved
       
   779   further to the left than those that vary more.  Two particularly
       
   780   important categories of functions are \emph{selectors} and
       
   781   \emph{updates}.
       
   782 
       
   783   The subsequent scheme is based on a hypothetical set-like container
       
   784   of type @{text "\<beta>"} that manages elements of type @{text "\<alpha>"}.  Both
       
   785   the names and types of the associated operations are canonical for
       
   786   Isabelle/ML.
       
   787 
       
   788   \medskip
       
   789   \begin{tabular}{ll}
       
   790   kind & canonical name and type \\\hline
       
   791   selector & @{text "member: \<beta> \<rightarrow> \<alpha> \<rightarrow> bool"} \\
       
   792   update & @{text "insert: \<alpha> \<rightarrow> \<beta> \<rightarrow> \<beta>"} \\
       
   793   \end{tabular}
       
   794   \medskip
       
   795 
       
   796   Given a container @{text "B: \<beta>"}, the partially applied @{text
       
   797   "member B"} is a predicate over elements @{text "\<alpha> \<rightarrow> bool"}, and
       
   798   thus represents the intended denotation directly.  It is customary
       
   799   to pass the abstract predicate to further operations, not the
       
   800   concrete container.  The argument order makes it easy to use other
       
   801   combinators: @{text "forall (member B) list"} will check a list of
       
   802   elements for membership in @{text "B"} etc. Often the explicit
       
   803   @{text "list"} is pointless and can be contracted to @{text "forall
       
   804   (member B)"} to directly obtain a predicate again.
       
   805 
       
   806   In contrast, an update operation varies the container, so it moves
       
   807   to the right: @{text "insert a"} is a function @{text "\<beta> \<rightarrow> \<beta>"} to
       
   808   insert a value @{text "a"}.  These can be composed naturally as
       
   809   @{text "insert c \<circ> insert b \<circ> insert a"}.  The slightly awkward
       
   810   inversion of the composition order is due to conventional
       
   811   mathematical notation, which can be easily amended as explained
       
   812   below.
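
         For instance, with the hypothetical container operations above:

         \begin{verbatim}
         val B' = (insert c o insert b o insert a) B;
         val ok = forall (member B') [a, b, c];
         \end{verbatim}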
       
   813 *}
       
   814 
       
   815 
       
   816 subsection {* Forward application and composition *}
       
   817 
       
   818 text {* Regular function application and infix notation works best for
       
   819   relatively deeply structured expressions, e.g.\ @{text "h (f x y + g
       
   820   z)"}.  The important special case of \emph{linear transformation}
       
   821   applies a cascade of functions @{text "f\<^sub>n (\<dots> (f\<^sub>1 x))"}.  This
       
   822   becomes hard to read and maintain if the functions are themselves
       
   823   given as complex expressions.  The notation can be significantly
       
   824   improved by introducing \emph{forward} versions of application and
       
   825   composition as follows:
       
   826 
       
   827   \medskip
       
   828   \begin{tabular}{lll}
       
   829   @{text "x |> f"} & @{text "\<equiv>"} & @{text "f x"} \\
       
   830   @{text "(f #> g) x"} & @{text "\<equiv>"} & @{text "x |> f |> g"} \\
       
   831   \end{tabular}
       
   832   \medskip
       
   833 
       
   834   This makes it convenient to write @{text "x |> f\<^sub>1 |> \<dots> |> f\<^sub>n"} or
       
   835   @{text "f\<^sub>1 #> \<dots> #> f\<^sub>n"} for its functional abstraction over @{text
       
   836   "x"}.
       
   837 
       
   838   \medskip There is an additional set of combinators to accommodate
       
   839   multiple results (via pairs) that are passed on as multiple
       
   840   arguments (via currying).
       
   841 
       
   842   \medskip
       
   843   \begin{tabular}{lll}
       
   844   @{text "(x, y) |-> f"} & @{text "\<equiv>"} & @{text "f x y"} \\
       
   845   @{text "(f #-> g) x"} & @{text "\<equiv>"} & @{text "x |> f |-> g"} \\
       
   846   \end{tabular}
       
   847   \medskip
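
         For example, the following bindings are pairwise equivalent; the
         forward versions read in the order of application (a small sketch
         using basic list and arithmetic operations):

         \begin{verbatim}
         val n = length (rev [1, 2, 3]);
         val n = [1, 2, 3] |> rev |> length;

         val m = (fn x => fn y => x + y) 1 2;
         val m = (1, 2) |-> (fn x => fn y => x + y);
         \end{verbatim}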
       
   848 *}
       
   849 
       
   850 text %mlref {*
       
   851   \begin{mldecls}
       
   852   @{index_ML_op "|> ": "'a * ('a -> 'b) -> 'b"} \\
       
   853   @{index_ML_op "|-> ": "('c * 'a) * ('c -> 'a -> 'b) -> 'b"} \\
       
   854   @{index_ML_op "#> ": "('a -> 'b) * ('b -> 'c) -> 'a -> 'c"} \\
       
   855   @{index_ML_op "#-> ": "('a -> 'c * 'b) * ('c -> 'b -> 'd) -> 'a -> 'd"} \\
       
   856   \end{mldecls}
       
   857 
       
   858   %FIXME description!?
       
   859 *}
       
   860 
       
   861 
       
   862 subsection {* Canonical iteration *}
       
   863 
       
   864 text {* As explained above, a function @{text "f: \<alpha> \<rightarrow> \<beta> \<rightarrow> \<beta>"} can be
       
   865   understood as an update on a configuration of type @{text "\<beta>"},
       
   866   parametrized by arguments of type @{text "\<alpha>"}.  Given @{text "a: \<alpha>"}
       
   867   the partial application @{text "(f a): \<beta> \<rightarrow> \<beta>"} operates
       
   868   homogeneously on @{text "\<beta>"}.  This can be iterated naturally over a
       
   869   list of parameters @{text "[a\<^sub>1, \<dots>, a\<^sub>n]"} as @{text "f a\<^sub>1 #> \<dots> #> f
       
   870   a\<^sub>n"}.  The latter expression is again a function @{text "\<beta> \<rightarrow> \<beta>"}.
       
   871   It can be applied to an initial configuration @{text "b: \<beta>"} to
       
   872   start the iteration over the given list of arguments: each @{text
       
   873   "a"} in @{text "a\<^sub>1, \<dots>, a\<^sub>n"} is applied consecutively by updating a
       
   874   cumulative configuration.
       
   875 
       
   876   The @{text fold} combinator in Isabelle/ML lifts a function @{text
       
   877   "f"} as above to its iterated version over a list of arguments.
       
   878   Lifting can be repeated, e.g.\ @{text "(fold \<circ> fold) f"} iterates
       
   879   over a list of lists as expected.
       
   880 
       
   881   The variant @{text "fold_rev"} works inside-out over the list of
       
   882   arguments, such that @{text "fold_rev f \<equiv> fold f \<circ> rev"} holds.
       
   883 
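         For instance, using the library function @{ML cons}:

         \begin{verbatim}
         fold cons [1, 2, 3] [];      (*result [3, 2, 1]*)
         fold_rev cons [1, 2, 3] [];  (*result [1, 2, 3]*)
         \end{verbatim}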
       
   884   The @{text "fold_map"} combinator essentially performs @{text
       
   885   "fold"} and @{text "map"} simultaneously: each application of @{text
       
   886   "f"} produces an updated configuration together with a side-result;
       
   887   the iteration collects all such side-results as a separate list.
       
   888 *}
       
   889 
       
   890 text %mlref {*
       
   891   \begin{mldecls}
       
   892   @{index_ML fold: "('a -> 'b -> 'b) -> 'a list -> 'b -> 'b"} \\
       
   893   @{index_ML fold_rev: "('a -> 'b -> 'b) -> 'a list -> 'b -> 'b"} \\
       
   894   @{index_ML fold_map: "('a -> 'b -> 'c * 'b) -> 'a list -> 'b -> 'c list * 'b"} \\
       
   895   \end{mldecls}
       
   896 
       
   897   \begin{description}
       
   898 
       
   899   \item @{ML fold}~@{text f} lifts the parametrized update function
       
   900   @{text "f"} to a list of parameters.
       
   901 
       
   902   \item @{ML fold_rev}~@{text "f"} is similar to @{ML fold}~@{text
       
   903   "f"}, but works inside-out.
       
   904 
       
   905   \item @{ML fold_map}~@{text "f"} lifts the parametrized update
       
   906   function @{text "f"} (with side-result) to a list of parameters and
       
   907   cumulative side-results.
       
   908 
       
   909   \end{description}
       
   910 
       
   911   \begin{warn}
       
   912   The literature on functional programming provides a multitude of
       
   913   combinators called @{text "foldl"}, @{text "foldr"} etc.  SML97
       
   914   provides its own variations as @{ML List.foldl} and @{ML
       
   915   List.foldr}, while the classic Isabelle library also has the
       
   916   historic @{ML Library.foldl} and @{ML Library.foldr}.  To avoid
       
   917   further confusion, all of this should be ignored, and @{ML fold} (or
       
   918   @{ML fold_rev}) used exclusively.
       
   919   \end{warn}
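
         For example, a small sketch that numbers a list of strings while
         threading a counter through the iteration:

         \begin{verbatim}
         fold_map (fn s => fn i => (string_of_int i ^ s, i + 1)) ["a", "b"] 0;
         (*result (["0a", "1b"], 2)*)
         \end{verbatim}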
       
   920 *}
       
   921 
       
   922 text %mlex {* The following example shows how to fill a text buffer
       
   923   incrementally by adding strings, either individually or from a given
       
   924   list.
       
   925 *}
       
   926 
       
   927 ML {*
       
   928   val s =
       
   929     Buffer.empty
       
   930     |> Buffer.add "digits: "
       
   931     |> fold (Buffer.add o string_of_int) (0 upto 9)
       
   932     |> Buffer.content;
       
   933 
       
   934   @{assert} (s = "digits: 0123456789");
       
   935 *}
       
   936 
       
   937 text {* Note how @{ML "fold (Buffer.add o string_of_int)"} above saves
       
   938   an extra @{ML "map"} over the given list.  This kind of peephole
       
   939   optimization reduces both the code size and the tree structures in
       
   940   memory (``deforestation''), but requires some practice to read and
       
   941   write it fluently.
       
   942 
       
   943   \medskip The next example elaborates the idea of canonical
       
   944   iteration, demonstrating fast accumulation of tree content using a
       
   945   text buffer.
       
   946 *}
       
   947 
       
   948 ML {*
       
   949   datatype tree = Text of string | Elem of string * tree list;
       
   950 
       
   951   fun slow_content (Text txt) = txt
       
   952     | slow_content (Elem (name, ts)) =
       
   953         "<" ^ name ^ ">" ^
       
   954         implode (map slow_content ts) ^
       
   955         "</" ^ name ^ ">"
       
   956 
       
   957   fun add_content (Text txt) = Buffer.add txt
       
   958     | add_content (Elem (name, ts)) =
       
   959         Buffer.add ("<" ^ name ^ ">") #>
       
   960         fold add_content ts #>
       
   961         Buffer.add ("</" ^ name ^ ">");
       
   962 
       
   963   fun fast_content tree =
       
   964     Buffer.empty |> add_content tree |> Buffer.content;
       
   965 *}
       
   966 
       
   967 text {* The slow part of @{ML slow_content} is the @{ML implode} of
       
   968   the recursive results, because it copies previously produced strings
       
   969   again.
       
   970 
       
   971   The incremental @{ML add_content} avoids this by operating on a
       
   972   buffer that is passed through in a linear fashion.  Using @{ML_text
       
   973   "#>"} and contraction over the actual buffer argument saves some
       
   974   additional boiler-plate.  Of course, the two @{ML "Buffer.add"}
       
   975   invocations with concatenated strings could have been split into
       
   976   smaller parts, but this would have obfuscated the source without
       
   977   making a big difference in allocations.  Here we have done some
       
   978   peephole-optimization for the sake of readability.
       
   979 
       
   980   Another benefit of @{ML add_content} is its ``open'' form as a
       
   981   function on buffers that can be continued in further linear
       
   982   transformations, folding etc.  Thus it is more compositional than
       
   983   the naive @{ML slow_content}.  As a realistic example, compare the
       
   984   old-style @{ML "Term.maxidx_of_term: term -> int"} with the newer
       
   985   @{ML "Term.maxidx_term: term -> int -> int"} in Isabelle/Pure.
       
   986 
       
   987   Note that @{ML fast_content} above is only defined as example.  In
       
   988   many practical situations, it is customary to provide the
       
   989   incremental @{ML add_content} only and leave the initialization and
       
   990   termination to the concrete application by the user.
       
   991 *}
       
   992 
       
   993 
       
   994 section {* Message output channels \label{sec:message-channels} *}
       
   995 
       
   996 text {* Isabelle provides output channels for different kinds of
       
   997   messages: regular output, high-volume tracing information, warnings,
       
   998   and errors.
       
   999 
       
  1000   Depending on the user interface involved, these messages may appear
       
  1001   in different text styles or colours.  The standard output for
       
  1002   terminal sessions prefixes each line of warnings by @{verbatim
       
  1003   "###"} and errors by @{verbatim "***"}, but leaves anything else
       
  1004   unchanged.
       
  1005 
       
  1006   Messages are associated with the transaction context of the running
       
  1007   Isar command.  This enables the front-end to manage commands and
       
  1008   resulting messages together.  For example, after deleting a command
       
  1009   from a given theory document version, the corresponding message
       
  1010   output can be retracted from the display.
       
  1011 *}
       
  1012 
       
  1013 text %mlref {*
       
  1014   \begin{mldecls}
       
  1015   @{index_ML writeln: "string -> unit"} \\
       
  1016   @{index_ML tracing: "string -> unit"} \\
       
  1017   @{index_ML warning: "string -> unit"} \\
       
  1018   @{index_ML error: "string -> 'a"} \\
       
  1019   \end{mldecls}
       
  1020 
       
  1021   \begin{description}
       
  1022 
       
  1023   \item @{ML writeln}~@{text "text"} outputs @{text "text"} as regular
       
  1024   message.  This is the primary message output operation of Isabelle
       
  1025   and should be used by default.
       
  1026 
       
  1027   \item @{ML tracing}~@{text "text"} outputs @{text "text"} as special
       
  1028   tracing message, indicating potential high-volume output to the
       
  1029   front-end (hundreds or thousands of messages issued by a single
       
  1030   command).  The idea is to allow the user-interface to downgrade the
       
  1031   quality of message display to achieve higher throughput.
       
  1032 
       
  1033   Note that the user might have to take special actions to see tracing
       
  1034   output, e.g.\ switch to a different output window.  So this channel
       
  1035   should not be used for regular output.
       
  1036 
       
  1037   \item @{ML warning}~@{text "text"} outputs @{text "text"} as
       
  1038   warning, which typically means some extra emphasis on the front-end
       
  1039   side (color highlighting, icons, etc.).
       
  1040 
       
  1041   \item @{ML error}~@{text "text"} raises exception @{ML ERROR}~@{text
       
  1042   "text"} and thus lets the Isar toplevel print @{text "text"} on the
       
  1043   error channel, which typically means some extra emphasis on the
       
  1044   front-end side (color highlighting, icons, etc.).
       
  1045 
       
  1046   This assumes that the exception is not handled before the command
       
  1047   terminates.  Handling exception @{ML ERROR}~@{text "text"} is a
       
  1048   perfectly legal alternative: it means that the error is absorbed
       
  1049   without any message output.
       
  1050 
       
  1051   \begin{warn}
       
  1052   The actual error channel is accessed via @{ML Output.error_msg}, but
       
  1053   the interaction protocol of Proof~General \emph{crashes} if that
       
  1054   function is used in regular ML code: error output and toplevel
       
  1055   command failure always need to coincide.
       
  1056   \end{warn}
       
  1057 
       
  1058   \end{description}
       
  1059 
       
  1060   \begin{warn}
       
  1061   Regular Isabelle/ML code should output messages exclusively by the
       
  1062   official channels.  Using raw I/O on \emph{stdout} or \emph{stderr}
       
  1063   instead (e.g.\ via @{ML TextIO.output}) is apt to cause problems in
       
  1064   the presence of parallel and asynchronous processing of Isabelle
       
  1065   theories.  Such raw output might be displayed by the front-end in
       
  1066   some system console log, with a low chance that the user will ever
       
  1067   see it.  Moreover, as a genuine side-effect on global process
       
  1068   channels, there is no proper way to retract output when Isar command
       
  1069   transactions are reset by the system.
       
  1070   \end{warn}
       
  1071 
       
  1072   \begin{warn}
       
  1073   The message channels should be used in a message-oriented manner.
       
  1074   This means that multi-line output that logically belongs together is
       
  1075   issued by a \emph{single} invocation of @{ML writeln} etc.\ with the
       
  1076   functional concatenation of all message constituents.
       
  1077   \end{warn}
       
  1078 *}
       
  1079 
       
  1080 text %mlex {* The following example demonstrates a multi-line
       
  1081   warning.  Note that in some situations the user sees only the first
       
  1082   line, so the most important point should be made first.
       
  1083 *}
       
  1084 
       
  1085 ML_command {*
       
  1086   warning (cat_lines
       
  1087    ["Beware the Jabberwock, my son!",
       
  1088     "The jaws that bite, the claws that catch!",
       
  1089     "Beware the Jubjub Bird, and shun",
       
  1090     "The frumious Bandersnatch!"]);
       
  1091 *}
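
text %mlex {* The tracing channel can be exercised analogously.  The
  following sketch merely emits a few arbitrary messages; realistic
  applications may produce hundreds or thousands of them within a
  single command.
*}

ML_command {*
  List.app (fn i => tracing ("trace message " ^ string_of_int i)) (1 upto 3);
*}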
       
  1092 
       
  1093 
       
  1094 section {* Exceptions \label{sec:exceptions} *}
       
  1095 
       
  1096 text {* The Standard ML semantics of strict functional evaluation
       
  1097   together with exceptions is rather well defined, but some delicate
       
  points need to be observed to prevent ML programs from going wrong
       
  1099   despite static type-checking.  Exceptions in Isabelle/ML are
       
  1100   subsequently categorized as follows.
       
  1101 
       
  1102   \paragraph{Regular user errors.}  These are meant to provide
       
  1103   informative feedback about malformed input etc.
       
  1104 
       
  1105   The \emph{error} function raises the corresponding \emph{ERROR}
       
  1106   exception, with a plain text message as argument.  \emph{ERROR}
       
  1107   exceptions can be handled internally, in order to be ignored, turned
       
  1108   into other exceptions, or cascaded by appending messages.  If the
       
  1109   corresponding Isabelle/Isar command terminates with an \emph{ERROR}
       
  1110   exception state, the toplevel will print the result on the error
       
  1111   channel (see \secref{sec:message-channels}).
       
  1112 
       
  1113   It is considered bad style to refer to internal function names or
       
  1114   values in ML source notation in user error messages.
       
  1115 
       
  1116   Grammatical correctness of error messages can be improved by
       
  1117   \emph{omitting} final punctuation: messages are often concatenated
       
  1118   or put into a larger context (e.g.\ augmented with source position).
       
  By not insisting on the final word at the origin of the error, the
       
  1120   system can perform its administrative tasks more easily and
       
  1121   robustly.
       
  1122 
       
  1123   \paragraph{Program failures.}  There is a handful of standard
       
  1124   exceptions that indicate general failure situations, or failures of
       
  1125   core operations on logical entities (types, terms, theorems,
       
  1126   theories, see \chref{ch:logic}).
       
  1127 
       
  1128   These exceptions indicate a genuine breakdown of the program, so the
       
  1129   main purpose is to determine quickly what has happened where.
       
  1130   Traditionally, the (short) exception message would include the name
       
  1131   of an ML function, although this is no longer necessary, because the
       
  1132   ML runtime system prints a detailed source position of the
       
  1133   corresponding @{ML_text raise} keyword.
       
  1134 
       
  1135   \medskip User modules can always introduce their own custom
       
  1136   exceptions locally, e.g.\ to organize internal failures robustly
       
  1137   without overlapping with existing exceptions.  Exceptions that are
       
  1138   exposed in module signatures require extra care, though, and should
       
  1139   \emph{not} be introduced by default.  Surprise by users of a module
       
  can often be minimized by using plain user errors instead.
       
  1141 
       
  1142   \paragraph{Interrupts.}  These indicate arbitrary system events:
       
  1143   both the ML runtime system and the Isabelle/ML infrastructure signal
       
  1144   various exceptional situations by raising the special
       
  1145   \emph{Interrupt} exception in user code.
       
  1146 
       
  This is the one and only way that physical events can intrude on an
       
  1148   Isabelle/ML program.  Such an interrupt can mean out-of-memory,
       
  1149   stack overflow, timeout, internal signaling of threads, or the user
       
  1150   producing a console interrupt manually etc.  An Isabelle/ML program
       
  1151   that intercepts interrupts becomes dependent on physical effects of
       
  1152   the environment.  Even worse, exception handling patterns that are
       
  too general by accident, e.g.\ by misspelled exception constructors,
       
  1154   will cover interrupts unintentionally and thus render the program
       
  1155   semantics ill-defined.
       
  1156 
       
  1157   Note that the Interrupt exception dates back to the original SML90
       
  1158   language definition.  It was excluded from the SML97 version to
       
  1159   avoid its malign impact on ML program semantics, but without
       
  1160   providing a viable alternative.  Isabelle/ML recovers physical
       
  1161   interruptibility (which is an indispensable tool to implement
       
  1162   managed evaluation of command transactions), but requires user code
       
  1163   to be strictly transparent wrt.\ interrupts.
       
  1164 
       
  1165   \begin{warn}
       
  1166   Isabelle/ML user code needs to terminate promptly on interruption,
       
  1167   without guessing at its meaning to the system infrastructure.
       
  1168   Temporary handling of interrupts for cleanup of global resources
       
  1169   etc.\ needs to be followed immediately by re-raising of the original
       
  1170   exception.
       
  1171   \end{warn}
       
  1172 *}
       
  1173 
       
  1174 text %mlref {*
       
  1175   \begin{mldecls}
       
  1176   @{index_ML try: "('a -> 'b) -> 'a -> 'b option"} \\
       
  1177   @{index_ML can: "('a -> 'b) -> 'a -> bool"} \\
       
  1178   @{index_ML ERROR: "string -> exn"} \\
       
  1179   @{index_ML Fail: "string -> exn"} \\
       
  1180   @{index_ML Exn.is_interrupt: "exn -> bool"} \\
       
  1181   @{index_ML reraise: "exn -> 'a"} \\
       
  1182   @{index_ML exception_trace: "(unit -> 'a) -> 'a"} \\
       
  1183   \end{mldecls}
       
  1184 
       
  1185   \begin{description}
       
  1186 
       
  1187   \item @{ML try}~@{text "f x"} makes the partiality of evaluating
       
  1188   @{text "f x"} explicit via the option datatype.  Interrupts are
       
  1189   \emph{not} handled here, i.e.\ this form serves as safe replacement
       
  1190   for the \emph{unsafe} version @{ML_text "(SOME"}~@{text "f
       
  1191   x"}~@{ML_text "handle _ => NONE)"} that is occasionally seen in
       
  1192   books about SML.
       
  1193 
       
  \item @{ML can} is similar to @{ML try} with a more abstract result.
       
  1195 
       
  1196   \item @{ML ERROR}~@{text "msg"} represents user errors; this
       
  1197   exception is normally raised indirectly via the @{ML error} function
       
  1198   (see \secref{sec:message-channels}).
       
  1199 
       
  1200   \item @{ML Fail}~@{text "msg"} represents general program failures.
       
  1201 
       
  1202   \item @{ML Exn.is_interrupt} identifies interrupts robustly, without
       
  1203   mentioning concrete exception constructors in user code.  Handled
       
  1204   interrupts need to be re-raised promptly!
       
  1205 
       
  1206   \item @{ML reraise}~@{text "exn"} raises exception @{text "exn"}
       
  1207   while preserving its implicit position information (if possible,
       
  1208   depending on the ML platform).
       
  1209 
       
  1210   \item @{ML exception_trace}~@{ML_text "(fn () =>"}~@{text
       
  1211   "e"}@{ML_text ")"} evaluates expression @{text "e"} while printing
       
  1212   a full trace of its stack of nested exceptions (if possible,
       
  depending on the ML platform).\footnote{In some versions of Poly/ML
  the trace will appear on the raw stdout of the Isabelle process.}
       
  1215 
       
  1216   Inserting @{ML exception_trace} into ML code temporarily is useful
       
  1217   for debugging, but not suitable for production code.
       
  1218 
       
  1219   \end{description}
       
  1220 *}
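
text %mlex {* The following sketch illustrates @{ML try}, @{ML can},
  and handling of @{ML ERROR}; the function @{text "trouble"} is a
  hypothetical example that always fails with a user error.
*}

ML {*
  fun trouble () : int = error "Huh?";

  @{assert} (try trouble () = NONE);
  @{assert} (not (can trouble ()));
  @{assert} ((trouble () handle ERROR _ => 42) = 42);
*}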
       
  1221 
       
  1222 text %mlantiq {*
       
  1223   \begin{matharray}{rcl}
       
  1224   @{ML_antiquotation_def "assert"} & : & @{text ML_antiquotation} \\
       
  1225   \end{matharray}
       
  1226 
       
  1227   \begin{description}
       
  1228 
       
  1229   \item @{text "@{assert}"} inlines a function
       
  1230   @{ML_type "bool -> unit"} that raises @{ML Fail} if the argument is
       
  @{ML false}.  Due to inlining, the source position of failed
       
  1232   assertions is included in the error output.
       
  1233 
       
  1234   \end{description}
       
  1235 *}
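
text %mlex {* A trivial sketch: the following assertion holds, so the
  command succeeds silently; changing the right-hand side would make it
  fail with @{ML Fail}, pointing at the source position of the
  antiquotation.
*}

ML {*
  @{assert} (2 + 2 = 4);
*}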
       
  1236 
       
  1237 
       
  1238 section {* Basic data types *}
       
  1239 
       
  1240 text {* The basis library proposal of SML97 needs to be treated with
       
  1241   caution.  Many of its operations simply do not fit with important
       
  1242   Isabelle/ML conventions (like ``canonical argument order'', see
       
  \secref{sec:canonical-argument-order}), while others cause problems with
       
  1244   the parallel evaluation model of Isabelle/ML (such as @{ML
       
  1245   TextIO.print} or @{ML OS.Process.system}).
       
  1246 
       
  1247   Subsequently we give a brief overview of important operations on
       
  1248   basic ML data types.
       
  1249 *}
       
  1250 
       
  1251 
       
  1252 subsection {* Characters *}
       
  1253 
       
  1254 text %mlref {*
       
  1255   \begin{mldecls}
       
  1256   @{index_ML_type char} \\
       
  1257   \end{mldecls}
       
  1258 
       
  1259   \begin{description}
       
  1260 
       
  1261   \item Type @{ML_type char} is \emph{not} used.  The smallest textual
       
  1262   unit in Isabelle is represented as a ``symbol'' (see
       
  1263   \secref{sec:symbols}).
       
  1264 
       
  1265   \end{description}
       
  1266 *}
       
  1267 
       
  1268 
       
  1269 subsection {* Integers *}
       
  1270 
       
  1271 text %mlref {*
       
  1272   \begin{mldecls}
       
  1273   @{index_ML_type int} \\
       
  1274   \end{mldecls}
       
  1275 
       
  1276   \begin{description}
       
  1277 
       
  1278   \item Type @{ML_type int} represents regular mathematical integers,
       
  1279   which are \emph{unbounded}.  Overflow never happens in
       
  1280   practice.\footnote{The size limit for integer bit patterns in memory
       
  1281   is 64\,MB for 32-bit Poly/ML, and much higher for 64-bit systems.}
       
  1282   This works uniformly for all supported ML platforms (Poly/ML and
       
  1283   SML/NJ).
       
  1284 
       
  1285   Literal integers in ML text are forced to be of this one true
       
  1286   integer type --- overloading of SML97 is disabled.
       
  1287 
       
  1288   Structure @{ML_struct IntInf} of SML97 is obsolete and superseded by
       
  1289   @{ML_struct Int}.  Structure @{ML_struct Integer} in @{file
       
  1290   "~~/src/Pure/General/integer.ML"} provides some additional
       
  1291   operations.
       
  1292 
       
  1293   \end{description}
       
  1294 *}
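
text %mlex {* The following sketch exploits unbounded integer
  arithmetic: the arbitrary example computes @{text "2"} to the power
  of @{text "100"} by iterated multiplication, without any overflow.
*}

ML {*
  val big = fold (fn _ => fn n => 2 * n) (1 upto 100) 1;
  @{assert} (big = 1267650600228229401496703205376);
*}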
       
  1295 
       
  1296 
       
  1297 subsection {* Time *}
       
  1298 
       
  1299 text %mlref {*
       
  1300   \begin{mldecls}
       
  1301   @{index_ML_type Time.time} \\
       
  1302   @{index_ML seconds: "real -> Time.time"} \\
       
  1303   \end{mldecls}
       
  1304 
       
  1305   \begin{description}
       
  1306 
       
  1307   \item Type @{ML_type Time.time} represents time abstractly according
       
  1308   to the SML97 basis library definition.  This is adequate for
       
  1309   internal ML operations, but awkward in concrete time specifications.
       
  1310 
       
  1311   \item @{ML seconds}~@{text "s"} turns the concrete scalar @{text
       
  1312   "s"} (measured in seconds) into an abstract time value.  Floating
       
  1313   point numbers are easy to use as context parameters (e.g.\ via
       
  configuration options, see \secref{sec:config-options}) or as
  preferences that are maintained by external tools.
       
  1316 
       
  1317   \end{description}
       
  1318 *}
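
text %mlex {* A minimal sketch of a concrete time specification (the
  value of 2.5 seconds is arbitrary):
*}

ML {*
  val timeout = seconds 2.5;
  @{assert} (Time.toMilliseconds timeout = 2500);
*}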
       
  1319 
       
  1320 
       
  1321 subsection {* Options *}
       
  1322 
       
  1323 text %mlref {*
       
  1324   \begin{mldecls}
       
  1325   @{index_ML Option.map: "('a -> 'b) -> 'a option -> 'b option"} \\
       
  1326   @{index_ML is_some: "'a option -> bool"} \\
       
  1327   @{index_ML is_none: "'a option -> bool"} \\
       
  1328   @{index_ML the: "'a option -> 'a"} \\
       
  1329   @{index_ML these: "'a list option -> 'a list"} \\
       
  1330   @{index_ML the_list: "'a option -> 'a list"} \\
       
  1331   @{index_ML the_default: "'a -> 'a option -> 'a"} \\
       
  1332   \end{mldecls}
       
  1333 *}
       
  1334 
       
text {* Apart from @{ML Option.map}, most operations defined in
       
  1336   structure @{ML_struct Option} are alien to Isabelle/ML.  The
       
  1337   operations shown above are defined in @{file
       
  1338   "~~/src/Pure/General/basics.ML"}, among others.  *}
       
  1339 
       
  1340 
       
  1341 subsection {* Lists *}
       
  1342 
       
  1343 text {* Lists are ubiquitous in ML as simple and light-weight
       
  1344   ``collections'' for many everyday programming tasks.  Isabelle/ML
       
  1345   provides important additions and improvements over operations that
       
  1346   are predefined in the SML97 library. *}
       
  1347 
       
  1348 text %mlref {*
       
  1349   \begin{mldecls}
       
  1350   @{index_ML cons: "'a -> 'a list -> 'a list"} \\
       
  1351   @{index_ML member: "('b * 'a -> bool) -> 'a list -> 'b -> bool"} \\
       
  1352   @{index_ML insert: "('a * 'a -> bool) -> 'a -> 'a list -> 'a list"} \\
       
  1353   @{index_ML remove: "('b * 'a -> bool) -> 'b -> 'a list -> 'a list"} \\
       
  1354   @{index_ML update: "('a * 'a -> bool) -> 'a -> 'a list -> 'a list"} \\
       
  1355   \end{mldecls}
       
  1356 
       
  1357   \begin{description}
       
  1358 
       
  1359   \item @{ML cons}~@{text "x xs"} evaluates to @{text "x :: xs"}.
       
  1360 
       
  1361   Tupled infix operators are a historical accident in Standard ML.
       
  The curried @{ML cons} amends this, but it should only be used when
       
  1363   partial application is required.
       
  1364 
       
  1365   \item @{ML member}, @{ML insert}, @{ML remove}, @{ML update} treat
       
  1366   lists as a set-like container that maintains the order of elements.
       
  1367   See @{file "~~/src/Pure/library.ML"} for the full specifications
       
  1368   (written in ML).  There are some further derived operations like
       
  1369   @{ML union} or @{ML inter}.
       
  1370 
       
  1371   Note that @{ML insert} is conservative about elements that are
       
  1372   already a @{ML member} of the list, while @{ML update} ensures that
       
  1373   the latest entry is always put in front.  The latter discipline is
       
  1374   often more appropriate in declarations of context data
       
  1375   (\secref{sec:context-data}) that are issued by the user in Isar
       
  1376   source: more recent declarations normally take precedence over
       
  1377   earlier ones.
       
  1378 
       
  1379   \end{description}
       
  1380 *}
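
text %mlex {* A small sketch of these operations on an arbitrary
  concrete list; note how @{ML update} moves an existing element to the
  front, while @{ML insert} leaves it in place:
*}

ML {*
  val xs = [1, 2, 3];

  @{assert} (member (op =) xs 2);
  @{assert} (insert (op =) 2 xs = xs);
  @{assert} (insert (op =) 0 xs = [0, 1, 2, 3]);
  @{assert} (update (op =) 2 xs = [2, 1, 3]);
  @{assert} (remove (op =) 2 xs = [1, 3]);
*}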
       
  1381 
       
  1382 text %mlex {* Using canonical @{ML fold} together with @{ML cons}, or
       
  1383   similar standard operations, alternates the orientation of data.
       
  This is quite natural and should not be altered forcibly by inserting
  extra applications of @{ML rev}.  The alternative @{ML fold_rev} can
  be used in the few situations where alternation should be
       
  1387   prevented.
       
  1388 *}
       
  1389 
       
  1390 ML {*
       
  1391   val items = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
       
  1392 
       
  1393   val list1 = fold cons items [];
       
  1394   @{assert} (list1 = rev items);
       
  1395 
       
  1396   val list2 = fold_rev cons items [];
       
  1397   @{assert} (list2 = items);
       
  1398 *}
       
  1399 
       
  1400 text {* The subsequent example demonstrates how to \emph{merge} two
       
  1401   lists in a natural way. *}
       
  1402 
       
  1403 ML {*
       
  1404   fun merge_lists eq (xs, ys) = fold_rev (insert eq) ys xs;
       
  1405 *}
       
  1406 
       
  1407 text {* Here the first list is treated conservatively: only the new
       
  1408   elements from the second list are inserted.  The inside-out order of
       
  1409   insertion via @{ML fold_rev} attempts to preserve the order of
       
  1410   elements in the result.
       
  1411 
       
  1412   This way of merging lists is typical for context data
       
  1413   (\secref{sec:context-data}).  See also @{ML merge} as defined in
       
  1414   @{file "~~/src/Pure/library.ML"}.
       
  1415 *}
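
text {* For example, a small check of @{ML merge_lists} on arbitrary
  concrete lists:
*}

ML {*
  @{assert} (merge_lists (op =) ([0], [1, 2]) = [1, 2, 0]);
  @{assert} (merge_lists (op =) ([1, 2, 3], [2, 4]) = [4, 1, 2, 3]);
*}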
       
  1416 
       
  1417 
       
  1418 subsection {* Association lists *}
       
  1419 
       
  1420 text {* The operations for association lists interpret a concrete list
       
  1421   of pairs as a finite function from keys to values.  Redundant
       
  1422   representations with multiple occurrences of the same key are
       
  1423   implicitly normalized: lookup and update only take the first
       
  1424   occurrence into account.
       
  1425 *}
       
  1426 
       
text %mlref {*
       
  1428   \begin{mldecls}
       
  1429   @{index_ML AList.lookup: "('a * 'b -> bool) -> ('b * 'c) list -> 'a -> 'c option"} \\
       
  1430   @{index_ML AList.defined: "('a * 'b -> bool) -> ('b * 'c) list -> 'a -> bool"} \\
       
  1431   @{index_ML AList.update: "('a * 'a -> bool) -> 'a * 'b -> ('a * 'b) list -> ('a * 'b) list"} \\
       
  1432   \end{mldecls}
       
  1433 
       
  1434   \begin{description}
       
  1435 
       
  1436   \item @{ML AList.lookup}, @{ML AList.defined}, @{ML AList.update}
       
  1437   implement the main ``framework operations'' for mappings in
       
  1438   Isabelle/ML, following standard conventions for their names and
       
  1439   types.
       
  1440 
       
  1441   Note that a function called @{text lookup} is obliged to express its
       
  partiality via an explicit option element.  Raising an exception
  instead is not acceptable, unless the name is changed to something like
       
  1444   @{text "the_element"} or @{text "get"}.
       
  1445 
       
  1446   The @{text "defined"} operation is essentially a contraction of @{ML
       
  1447   is_some} and @{text "lookup"}, but this is sufficiently frequent to
       
  1448   justify its independent existence.  This also gives the
       
  1449   implementation some opportunity for peep-hole optimization.
       
  1450 
       
  1451   \end{description}
       
  1452 
       
  1453   Association lists are adequate as simple and light-weight
       
  1454   implementation of finite mappings in many practical situations.  A
       
  1455   more heavy-duty table structure is defined in @{file
       
  1456   "~~/src/Pure/General/table.ML"}; that version scales easily to
       
  1457   thousands or millions of elements.
       
  1458 *}
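
text %mlex {* A small sketch with an arbitrary concrete association
  list:
*}

ML {*
  val tab = [("a", 1), ("b", 2)];

  @{assert} (AList.lookup (op =) tab "b" = SOME 2);
  @{assert} (not (AList.defined (op =) tab "c"));
  @{assert} (AList.lookup (op =) (AList.update (op =) ("b", 3) tab) "b" = SOME 3);
*}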
       
  1459 
       
  1460 
       
  1461 subsection {* Unsynchronized references *}
       
  1462 
       
  1463 text %mlref {*
       
  1464   \begin{mldecls}
       
  1465   @{index_ML_type "'a Unsynchronized.ref"} \\
       
  1466   @{index_ML Unsynchronized.ref: "'a -> 'a Unsynchronized.ref"} \\
       
  1467   @{index_ML "!": "'a Unsynchronized.ref -> 'a"} \\
       
  1468   @{index_ML_op ":=": "'a Unsynchronized.ref * 'a -> unit"} \\
       
  1469   \end{mldecls}
       
  1470 *}
       
  1471 
       
  1472 text {* Due to ubiquitous parallelism in Isabelle/ML (see also
       
  1473   \secref{sec:multi-threading}), the mutable reference cells of
       
  1474   Standard ML are notorious for causing problems.  In a highly
       
  1475   parallel system, both correctness \emph{and} performance are easily
       
  1476   degraded when using mutable data.
       
  1477 
       
  1478   The unwieldy name of @{ML Unsynchronized.ref} for the constructor
       
  1479   for references in Isabelle/ML emphasizes the inconveniences caused by
       
  1480   mutability.  Existing operations @{ML "!"}  and @{ML_op ":="} are
       
  1481   unchanged, but should be used with special precautions, say in a
       
  1482   strictly local situation that is guaranteed to be restricted to
       
  1483   sequential evaluation --- now and in the future.
       
  1484 
       
  1485   \begin{warn}
       
  1486   Never @{ML_text "open Unsynchronized"}, not even in a local scope!
       
  1487   Pretending that mutable state is no problem is a very bad idea.
       
  1488   \end{warn}
       
  1489 *}
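
text %mlex {* The following sketch uses a mutable reference in a
  strictly local manner: the cell never escapes the function body, so
  each invocation operates on its own copy and no synchronization is
  required.  (The function @{text "sum_list"} is merely an
  illustration; a functional @{ML fold} would normally be preferred.)
*}

ML {*
  fun sum_list xs =
    let val r = Unsynchronized.ref 0
    in List.app (fn x => r := ! r + x) xs; ! r end;

  @{assert} (sum_list [1, 2, 3, 4] = 10);
*}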
       
  1490 
       
  1491 
       
  1492 section {* Thread-safe programming \label{sec:multi-threading} *}
       
  1493 
       
  1494 text {* Multi-threaded execution has become an everyday reality in
       
  1495   Isabelle since Poly/ML 5.2.1 and Isabelle2008.  Isabelle/ML provides
       
  1496   implicit and explicit parallelism by default, and there is no way
       
  1497   for user-space tools to ``opt out''.  ML programs that are purely
       
  1498   functional, output messages only via the official channels
       
  1499   (\secref{sec:message-channels}), and do not intercept interrupts
       
  1500   (\secref{sec:exceptions}) can participate in the multi-threaded
       
  1501   environment immediately without further ado.
       
  1502 
       
  1503   More ambitious tools with more fine-grained interaction with the
       
  1504   environment need to observe the principles explained below.
       
  1505 *}
       
  1506 
       
  1507 
       
  1508 subsection {* Multi-threading with shared memory *}
       
  1509 
       
  1510 text {* Multiple threads help to organize advanced operations of the
       
  1511   system, such as real-time conditions on command transactions,
       
  1512   sub-components with explicit communication, general asynchronous
       
  1513   interaction etc.  Moreover, parallel evaluation is a prerequisite to
       
  1514   make adequate use of the CPU resources that are available on
       
  1515   multi-core systems.\footnote{Multi-core computing does not mean that
       
  1516   there are ``spare cycles'' to be wasted.  It means that the
       
  1517   continued exponential speedup of CPU performance due to ``Moore's
       
  1518   Law'' follows different rules: clock frequency has reached its peak
       
  1519   around 2005, and applications need to be parallelized in order to
       
  1520   avoid a perceived loss of performance.  See also
       
  1521   \cite{Sutter:2005}.}
       
  1522 
       
  1523   Isabelle/Isar exploits the inherent structure of theories and proofs
       
  1524   to support \emph{implicit parallelism} to a large extent.  LCF-style
       
  theorem proving provides almost ideal conditions for that, see also
  \cite{Wenzel:2009}.  This means that significant parts of theory and
  proof checking are parallelized by default.  A maximum speedup-factor
       
  1528   of 3.0 on 4 cores and 5.0 on 8 cores can be
       
  1529   expected.\footnote{Further scalability is limited due to garbage
       
  1530   collection, which is still sequential in Poly/ML 5.2/5.3/5.4.  It
       
  1531   helps to provide initial heap space generously, using the
       
  \texttt{-H} option.  Initial heap size needs to be scaled up
  together with the number of CPU cores: approximately 1--2\,GB per
  core.}
       
  1535 
       
  1536   \medskip ML threads lack the memory protection of separate
       
  1537   processes, and operate concurrently on shared heap memory.  This has
       
  1538   the advantage that results of independent computations are directly
       
  1539   available to other threads: abstract values can be passed without
       
  1540   copying or awkward serialization that is typically required for
       
  1541   separate processes.
       
  1542 
       
  1543   To make shared-memory multi-threading work robustly and efficiently,
       
  1544   some programming guidelines need to be observed.  While the ML
       
  system is responsible for maintaining the basic integrity of the
       
  1546   representation of ML values in memory, the application programmer
       
  1547   needs to ensure that multi-threaded execution does not break the
       
  1548   intended semantics.
       
  1549 
       
  1550   \begin{warn}
       
  1551   To participate in implicit parallelism, tools need to be
       
  1552   thread-safe.  A single ill-behaved tool can affect the stability and
       
  1553   performance of the whole system.
       
  1554   \end{warn}
       
  1555 
       
  1556   Apart from observing the principles of thread-safeness passively,
       
  1557   advanced tools may also exploit parallelism actively, e.g.\ by using
       
  1558   ``future values'' (\secref{sec:futures}) or the more basic library
       
  1559   functions for parallel list operations (\secref{sec:parlist}).
       
  1560 
       
  1561   \begin{warn}
       
  1562   Parallel computing resources are managed centrally by the
       
  1563   Isabelle/ML infrastructure.  User programs must not fork their own
       
  1564   ML threads to perform computations.
       
  1565   \end{warn}
       
  1566 *}
       
  1567 
       
  1568 
       
  1569 subsection {* Critical shared resources *}
       
  1570 
       
text {* Thread-safeness is mainly concerned with concurrent
       
  1572   read/write access to shared resources, which are outside the purely
       
  1573   functional world of ML.  This covers the following in particular.
       
  1574 
       
  1575   \begin{itemize}
       
  1576 
       
  1577   \item Global references (or arrays), i.e.\ mutable memory cells that
       
  1578   persist over several invocations of associated
       
  1579   operations.\footnote{This is independent of the visibility of such
       
  1580   mutable values in the toplevel scope.}
       
  1581 
       
  1582   \item Global state of the running Isabelle/ML process, i.e.\ raw I/O
       
  1583   channels, environment variables, current working directory.
       
  1584 
       
  1585   \item Writable resources in the file-system that are shared among
       
  1586   different threads or external processes.
       
  1587 
       
  1588   \end{itemize}
       
  1589 
       
  1590   Isabelle/ML provides various mechanisms to avoid critical shared
       
  resources in most situations.  As a last resort there are some
       
  1592   mechanisms for explicit synchronization.  The following guidelines
       
  1593   help to make Isabelle/ML programs work smoothly in a concurrent
       
  1594   environment.
       
  1595 
       
  1596   \begin{itemize}
       
  1597 
       
  1598   \item Avoid global references altogether.  Isabelle/Isar maintains a
       
  1599   uniform context that incorporates arbitrary data declared by user
       
  1600   programs (\secref{sec:context-data}).  This context is passed as
       
  1601   plain value and user tools can get/map their own data in a purely
       
  1602   functional manner.  Configuration options within the context
       
  1603   (\secref{sec:config-options}) provide simple drop-in replacements
       
  1604   for historic reference variables.
       
  1605 
       
  1606   \item Keep components with local state information re-entrant.
       
  1607   Instead of poking initial values into (private) global references, a
       
  1608   new state record can be created on each invocation, and passed
       
  1609   through any auxiliary functions of the component.  The state record
       
  1610   may well contain mutable references, without requiring any special
       
  1611   synchronizations, as long as each invocation gets its own copy.
       
  1612 
       
  1613   \item Avoid raw output on @{text "stdout"} or @{text "stderr"}.  The
       
  1614   Poly/ML library is thread-safe for each individual output operation,
       
  1615   but the ordering of parallel invocations is arbitrary.  This means
       
  1616   raw output will appear on some system console with unpredictable
       
  1617   interleaving of atomic chunks.
       
  1618 
       
  1619   Note that this does not affect regular message output channels
       
  1620   (\secref{sec:message-channels}).  An official message is associated
       
  1621   with the command transaction from where it originates, independently
       
  1622   of other transactions.  This means each running Isar command has
       
  1623   effectively its own set of message channels, and interleaving can
       
  1624   only happen when commands use parallelism internally (and only at
       
  1625   message boundaries).
       
  1626 
       
  1627   \item Treat environment variables and the current working directory
       
  1628   of the running process as strictly read-only.
       
  1629 
       
  1630   \item Restrict writing to the file-system to unique temporary files.
       
  1631   Isabelle already provides a temporary directory that is unique for
       
  1632   the running process, and there is a centralized source of unique
       
  1633   serial numbers in Isabelle/ML.  Thus temporary files that are passed
       
  to some external process will always be disjoint, and thus
       
  1635   thread-safe.
       
  1636 
       
  1637   \end{itemize}
       
  1638 *}
       
  1639 
       
  1640 text %mlref {*
       
  1641   \begin{mldecls}
       
  1642   @{index_ML File.tmp_path: "Path.T -> Path.T"} \\
       
  1643   @{index_ML serial_string: "unit -> string"} \\
       
  1644   \end{mldecls}
       
  1645 
       
  1646   \begin{description}
       
  1647 
       
  1648   \item @{ML File.tmp_path}~@{text "path"} relocates the base
       
  1649   component of @{text "path"} into the unique temporary directory of
       
  1650   the running Isabelle/ML process.
       
  1651 
       
  1652   \item @{ML serial_string}~@{text "()"} creates a new serial number
       
  1653   that is unique over the runtime of the Isabelle/ML process.
       
  1654 
       
  1655   \end{description}
       
  1656 *}
       
  1657 
       
  1658 text %mlex {* The following example shows how to create unique
       
  1659   temporary file names.
       
  1660 *}
       
  1661 
       
  1662 ML {*
       
  1663   val tmp1 = File.tmp_path (Path.basic ("foo" ^ serial_string ()));
       
  1664   val tmp2 = File.tmp_path (Path.basic ("foo" ^ serial_string ()));
       
  1665   @{assert} (tmp1 <> tmp2);
       
  1666 *}
       
  1667 
       
  1668 
       
  1669 subsection {* Explicit synchronization *}
       
  1670 
       
  1671 text {* Isabelle/ML also provides some explicit synchronization
       
  1672   mechanisms, for the rare situations where mutable shared resources
       
  are really required.  These are based on the synchronization
       
  1674   primitives of Poly/ML, which have been adapted to the specific
       
  1675   assumptions of the concurrent Isabelle/ML environment.  User code
       
  1676   must not use the Poly/ML primitives directly!
       
  1677 
       
  1678   \medskip The most basic synchronization concept is a single
       
  1679   \emph{critical section} (also called ``monitor'' in the literature).
       
  1680   A thread that enters the critical section prevents all other threads
       
  1681   from doing the same.  A thread that is already within the critical
       
  1682   section may re-enter it in an idempotent manner.
       
  1683 
       
  1684   Such centralized locking is convenient, because it prevents
       
  1685   deadlocks by construction.
       
  1686 
       
  1687   \medskip More fine-grained locking works via \emph{synchronized
       
  1688   variables}.  An explicit state component is associated with
       
  1689   mechanisms for locking and signaling.  There are operations to
       
  1690   await a condition, change the state, and signal the change to all
       
  1691   other waiting threads.
       
  1692 
       
  1693   Here the synchronized access to the state variable is \emph{not}
       
  1694   re-entrant: direct or indirect nesting within the same thread will
       
  1695   cause a deadlock!
       
  1696 *}
       
  1697 
       
  1698 text %mlref {*
       
  1699   \begin{mldecls}
       
  1700   @{index_ML NAMED_CRITICAL: "string -> (unit -> 'a) -> 'a"} \\
       
  1701   @{index_ML CRITICAL: "(unit -> 'a) -> 'a"} \\
       
  1702   \end{mldecls}
       
  1703   \begin{mldecls}
       
  1704   @{index_ML_type "'a Synchronized.var"} \\
       
  1705   @{index_ML Synchronized.var: "string -> 'a -> 'a Synchronized.var"} \\
       
  1706   @{index_ML Synchronized.guarded_access: "'a Synchronized.var ->
       
  1707   ('a -> ('b * 'a) option) -> 'b"} \\
       
  1708   \end{mldecls}
       
  1709 
       
  1710   \begin{description}
       
  1711 
       
  1712   \item @{ML NAMED_CRITICAL}~@{text "name e"} evaluates @{text "e ()"}
       
  1713   within the central critical section of Isabelle/ML.  No other thread
       
  1714   may do so at the same time, but non-critical parallel execution will
       
  1715   continue.  The @{text "name"} argument is used for tracing and might
       
  1716   help to spot sources of congestion.
       
  1717 
       
  1718   Entering the critical section without contention is very fast, and
       
  1719   several basic system operations do so frequently.  Each thread
       
  should stay within the critical section only very briefly,
       
  1721   otherwise parallel performance may degrade.
       
  1722 
       
  1723   \item @{ML CRITICAL} is the same as @{ML NAMED_CRITICAL} with empty
       
  1724   name argument.
       
  1725 
       
  1726   \item Type @{ML_type "'a Synchronized.var"} represents synchronized
       
  1727   variables with state of type @{ML_type 'a}.
       
  1728 
       
  1729   \item @{ML Synchronized.var}~@{text "name x"} creates a synchronized
       
  1730   variable that is initialized with value @{text "x"}.  The @{text
       
  1731   "name"} is used for tracing.
       
  1732 
       
  1733   \item @{ML Synchronized.guarded_access}~@{text "var f"} lets the
       
  1734   function @{text "f"} operate within a critical section on the state
       
  1735   @{text "x"} as follows: if @{text "f x"} produces @{ML NONE}, it
       
  1736   continues to wait on the internal condition variable, expecting that
       
  1737   some other thread will eventually change the content in a suitable
       
  1738   manner; if @{text "f x"} produces @{ML SOME}~@{text "(y, x')"} it is
       
  1739   satisfied and assigns the new state value @{text "x'"}, broadcasts a
       
  1740   signal to all waiting threads on the associated condition variable,
       
  1741   and returns the result @{text "y"}.
       
  1742 
       
  1743   \end{description}
       
  1744 
       
  1745   There are some further variants of the @{ML
       
  1746   Synchronized.guarded_access} combinator, see @{file
       
  1747   "~~/src/Pure/Concurrent/synchronized.ML"} for details.
       
  1748 *}
       
  1749 
       
  1750 text %mlex {* The following example implements a counter that produces
       
  1751   positive integers that are unique over the runtime of the Isabelle
       
  1752   process:
       
  1753 *}
       
  1754 
       
  1755 ML {*
       
  1756   local
       
  1757     val counter = Synchronized.var "counter" 0;
       
  1758   in
       
  1759     fun next () =
       
  1760       Synchronized.guarded_access counter
       
  1761         (fn i =>
       
  1762           let val j = i + 1
       
  1763           in SOME (j, j) end);
       
  1764   end;
       
  1765 *}
       
  1766 
       
  1767 ML {*
       
  1768   val a = next ();
       
  1769   val b = next ();
       
  1770   @{assert} (a <> b);
       
  1771 *}
       
  1772 
       
text {* \medskip See @{file "~~/src/Pure/Concurrent/mailbox.ML"} for how
  to implement a mailbox as a synchronized variable over a purely
       
  1775   functional queue. *}
       
  1776 
       
  1777 end