(*:wrap=hard:maxLineLen=78:*)

theory "ML"
imports Base
begin

chapter \<open>Isabelle/ML\<close>

text \<open>Isabelle/ML is best understood as a certain culture based on
  Standard ML.  Thus it is not a new programming language, but a
  certain way to use SML at an advanced level within the Isabelle
  environment.  This covers a variety of aspects that are geared
  towards an efficient and robust platform for applications of formal
  logic with fully foundational proof construction --- according to
  the well-known \emph{LCF principle}.  There is specific
  infrastructure with library modules to address the needs of this
  difficult task.  For example, the raw parallel programming model of
  Poly/ML is presented as a considerably more abstract concept of
  \emph{futures}, which is then used to augment the inference
  kernel, Isar theory and proof interpreter, and PIDE document management.

  The main aspects of Isabelle/ML are introduced below.  These
  first-hand explanations should help to understand how proper
  Isabelle/ML is to be read and written, and to get access to the
  wealth of experience that is expressed in the source text and its
  history of changes.\footnote{See
  @{url "http://isabelle.in.tum.de/repos/isabelle"} for the full
  Mercurial history.  There are symbolic tags to refer to official
  Isabelle releases, as opposed to arbitrary \emph{tip} versions that
  merely reflect snapshots that are never really up-to-date.}\<close>


section \<open>Style and orthography\<close>

text \<open>The sources of Isabelle/Isar are optimized for
  \emph{readability} and \emph{maintainability}.  The main purpose is
  to tell an informed reader what is really going on and how things
  really work.  This is a non-trivial aim, but it is supported by a
  certain style of writing Isabelle/ML that has emerged from long
  years of system development.\footnote{See also the interesting style
  guide for OCaml
  @{url "http://caml.inria.fr/resources/doc/guides/guidelines.en.html"}
  which shares many of our means and ends.}

  The main principle behind any coding style is \emph{consistency}.
  For a single author of a small program this merely means ``choose
  your style and stick to it''.  A complex project like Isabelle, with
  long years of development and different contributors, requires more
  standardization.  A coding style that is changed every few years or
  with every new contributor is no style at all, because consistency
  is quickly lost.  Global consistency is hard to achieve, though.
  Nonetheless, one should always strive at least for local consistency
  of modules and sub-systems, without deviating from some general
  principles of how to write Isabelle/ML.

  In a sense, good coding style is like an \emph{orthography} for the
  sources: it helps to read quickly over the text and see through the
  main points, without getting distracted by accidental presentation
  of free-style code.
\<close>


subsection \<open>Header and sectioning\<close>

text \<open>Isabelle source files have a certain standardized header
  format (with precise spacing) that follows ancient traditions
  reaching back to the earliest versions of the system by Larry
  Paulson.  See @{file "~~/src/Pure/thm.ML"}, for example.

  The header includes at least @{verbatim Title} and @{verbatim
  Author} entries, followed by a prose description of the purpose of
  the module.  The latter can range from a single line to several
  paragraphs of explanations.

  The rest of the file is divided into sections, subsections,
  subsubsections, paragraphs etc.\ using a simple layout via ML
  comments as follows.

  \begin{verbatim}
  (*** section ***)

  (** subsection **)

  (* subsubsection *)

  (*short paragraph*)

  (*
    long paragraph,
    with more text
  *)
  \end{verbatim}

  As in regular typography, there is some extra space \emph{before}
  section headings that are adjacent to plain text, but not other headings
  as in the example above.

  \medskip The precise wording of the prose text given in these
  headings is chosen carefully to introduce the main theme of the
  subsequent formal ML text.
\<close>


subsection \<open>Naming conventions\<close>

text \<open>Since ML is the primary medium to express the meaning of the
  source text, naming of ML entities requires special care.

  \paragraph{Notation.}  A name consists of 1--3 \emph{words} (rarely
  4, but not more) that are separated by underscores.  There are three
  variants concerning upper or lower case letters, which are used for
  certain ML categories as follows:

  \medskip
  \begin{tabular}{lll}
  variant & example & ML categories \\\hline
  lower-case & @{ML_text foo_bar} & values, types, record fields \\
  capitalized & @{ML_text Foo_Bar} & datatype constructors, structures, functors \\
  upper-case & @{ML_text FOO_BAR} & special values, exception constructors, signatures \\
  \end{tabular}
  \medskip
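
  For illustration, the three variants might appear in declarations roughly
  as follows (a purely schematic sketch, re-using the invented names from
  the table above):

  \begin{verbatim}
  val foo_bar = ...            (*value*)
  structure Foo_Bar = ...      (*structure*)
  signature FOO_BAR = ...      (*signature*)
  exception FOO_BAR            (*exception constructor*)
  \end{verbatim}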

  For historical reasons, many capitalized names omit underscores,
  e.g.\ old-style @{ML_text FooBar} instead of @{ML_text Foo_Bar}.
  Genuine mixed-case names are \emph{not} used, because clear division
  of words is essential for readability.\footnote{Camel-case was
  invented to work around the lack of underscore in some early
  non-ASCII character sets.  Later it became habitual in some language
  communities that are now strong in numbers.}

  A single (capital) character does not count as ``word'' in this
  respect: some Isabelle/ML names are suffixed by extra markers like
  this: @{ML_text foo_barT}.

  Name variants are produced by adding 1--3 primes, e.g.\ @{ML_text
  foo'}, @{ML_text foo''}, or @{ML_text foo'''}, but not @{ML_text
  foo''''} or more.  Decimal digits scale better to larger numbers,
  e.g.\ @{ML_text foo0}, @{ML_text foo1}, @{ML_text foo42}.

  \paragraph{Scopes.}  Apart from very basic library modules, ML
  structures are not ``opened'', but names are referenced with
  explicit qualification, as in @{ML Syntax.string_of_term} for
  example.  When devising names for structures and their components it
  is important to aim at eye-catching compositions of both parts, because
  this is how they are seen in the sources and documentation.  For the
  same reasons, aliases of well-known library functions should be
  avoided.

  Local names of function abstraction or case/let bindings are
  typically shorter, sometimes using only rudiments of ``words'',
  while still avoiding cryptic shorthands.  An auxiliary function
  called @{ML_text helper}, @{ML_text aux}, or @{ML_text f} is
  considered bad style.

  Example:

  \begin{verbatim}
  (* RIGHT *)

  fun print_foo ctxt foo =
    let
      fun print t = ... Syntax.string_of_term ctxt t ...
    in ... end;


  (* RIGHT *)

  fun print_foo ctxt foo =
    let
      val string_of_term = Syntax.string_of_term ctxt;
      fun print t = ... string_of_term t ...
    in ... end;


  (* WRONG *)

  val string_of_term = Syntax.string_of_term;

  fun print_foo ctxt foo =
    let
      fun aux t = ... string_of_term ctxt t ...
    in ... end;
  \end{verbatim}


  \paragraph{Specific conventions.} Here are some specific name forms
  that occur frequently in the sources.

  \begin{itemize}

  \item A function that maps @{ML_text foo} to @{ML_text bar} is
  called @{ML_text foo_to_bar} or @{ML_text bar_of_foo} (never
  @{ML_text foo2bar}, nor @{ML_text bar_from_foo}, nor @{ML_text
  bar_for_foo}, nor @{ML_text bar4foo}).

  \item The name component @{ML_text legacy} means that the operation
  is about to be discontinued soon.

  \item The name component @{ML_text global} means that this works
  with the background theory instead of the regular local context
  (\secref{sec:context}), sometimes for historical reasons, sometimes
  due to a genuine lack of locality of the concept involved, sometimes as
  a fall-back for the lack of a proper context in the application
  code.  Whenever there is a non-global variant available, the
  application should be migrated to use it with a proper local
  context.

  \item Variables of the main context types of the Isabelle/Isar
  framework (\secref{sec:context} and \chref{ch:local-theory}) have
  firm naming conventions as follows:

  \begin{itemize}

  \item theories are called @{ML_text thy}, rarely @{ML_text theory}
  (never @{ML_text thry})

  \item proof contexts are called @{ML_text ctxt}, rarely @{ML_text
  context} (never @{ML_text ctx})

  \item generic contexts are called @{ML_text context}

  \item local theories are called @{ML_text lthy}, except for local
  theories that are treated as proof context (which is a semantic
  super-type)

  \end{itemize}

  Variations with primed or decimal numbers are always possible, as
  well as semantic prefixes like @{ML_text foo_thy} or @{ML_text
  bar_ctxt}, but the base conventions above need to be preserved.
  This allows their data flow to be emphasized via plain regular
  expressions in the text editor.

  \item The main logical entities (\secref{ch:logic}) have established
  naming conventions as follows:

  \begin{itemize}

  \item sorts are called @{ML_text S}

  \item types are called @{ML_text T}, @{ML_text U}, or @{ML_text
  ty} (never @{ML_text t})

  \item terms are called @{ML_text t}, @{ML_text u}, or @{ML_text
  tm} (never @{ML_text trm})

  \item certified types are called @{ML_text cT}, rarely @{ML_text
  T}, with variants as for types

  \item certified terms are called @{ML_text ct}, rarely @{ML_text
  t}, with variants as for terms (never @{ML_text ctrm})

  \item theorems are called @{ML_text th}, or @{ML_text thm}

  \end{itemize}

  Proper semantic names override these conventions completely.  For
  example, the left-hand side of an equation (as a term) can be called
  @{ML_text lhs} (not @{ML_text lhs_tm}).  Or a term that is known
  to be a variable can be called @{ML_text v} or @{ML_text x}.

  \item Tactics (\secref{sec:tactics}) are sufficiently important to
  have specific naming conventions.  The name of a basic tactic
  definition always has a @{ML_text "_tac"} suffix, the subgoal index
  (if applicable) is always called @{ML_text i}, and the goal state
  (if made explicit) is usually called @{ML_text st} instead of the
  somewhat misleading @{ML_text thm}.  Any other arguments are given
  before the latter two, and the general context is given first.
  Example:

  \begin{verbatim}
  fun my_tac ctxt arg1 arg2 i st = ...
  \end{verbatim}

  Note that the goal state @{ML_text st} above is rarely made
  explicit, if tactic combinators (tacticals) are used as usual.

  A tactic that requires a proof context needs to make that explicit as seen
  in the @{verbatim ctxt} argument above. Do not refer to the background
  theory of @{verbatim st} -- it is not a proper context, but merely a formal
  certificate.

  \end{itemize}
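
  For illustration, here is a small hypothetical function that follows the
  conventions above for proof contexts and terms (the function is invented
  for this example and not part of the Isabelle sources):

  \begin{verbatim}
  fun print_eq ctxt t u =
    writeln (Syntax.string_of_term ctxt t ^ " = " ^
      Syntax.string_of_term ctxt u);
  \end{verbatim}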
\<close>


subsection \<open>General source layout\<close>

text \<open>
  The general Isabelle/ML source layout imitates regular type-setting
  conventions, augmented by the requirements for deeply nested expressions
  that are commonplace in functional programming.

  \paragraph{Line length} is limited to 80 characters according to ancient
  standards, but we allow as much as 100 characters (not
  more).\footnote{Readability requires keeping the beginning of a line
  in view while watching its end.  Modern wide-screen displays do not
  change the way the human brain works.  Sources also need to be
  printable on plain paper with reasonable font-size.} The extra 20
  characters acknowledge the space requirements due to qualified
  library references in Isabelle/ML.

  \paragraph{White-space} is used to emphasize the structure of
  expressions, following mostly standard conventions for mathematical
  typesetting, as can be seen in plain {\TeX} or {\LaTeX}.  This
  defines positioning of spaces for parentheses, punctuation, and
  infixes as illustrated here:

  \begin{verbatim}
  val x = y + z * (a + b);
  val pair = (a, b);
  val record = {foo = 1, bar = 2};
  \end{verbatim}

  Lines are normally broken \emph{after} an infix operator or
  punctuation character.  For example:

  \begin{verbatim}
  val x =
    a +
    b +
    c;

  val tuple =
   (a,
    b,
    c);
  \end{verbatim}

  Some special infixes (e.g.\ @{ML_text "|>"}) work better at the
  start of the line, but punctuation is always at the end.
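
  For example (schematic, with placeholder names):

  \begin{verbatim}
  val result =
    init
    |> step1
    |> step2;
  \end{verbatim}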

  Function application follows the tradition of @{text "\<lambda>"}-calculus,
  not informal mathematics.  For example: @{ML_text "f a b"} for a
  curried function, or @{ML_text "g (a, b)"} for a tupled function.
  Note that the space between @{ML_text g} and the pair @{ML_text
  "(a, b)"} follows the important principle of
  \emph{compositionality}: the layout of @{ML_text "g p"} does not
  change when @{ML_text "p"} is refined to the concrete pair
  @{ML_text "(a, b)"}.

  \paragraph{Indentation} uses plain spaces, never hard
  tabulators.\footnote{Tabulators were invented to move the carriage
  of a type-writer to certain predefined positions.  In software they
  could be used as a primitive run-length compression of consecutive
  spaces, but the precise result would depend on non-standardized
  text editor configuration.}

  Each level of nesting is indented by 2 spaces, sometimes 1, very
  rarely 4, never 8 or any other odd number.

  Indentation follows a simple logical format that only depends on the
  nesting depth, not the accidental length of the text that initiates
  a level of nesting.  Example:

  \begin{verbatim}
  (* RIGHT *)

  if b then
    expr1_part1
    expr1_part2
  else
    expr2_part1
    expr2_part2


  (* WRONG *)

  if b then expr1_part1
            expr1_part2
  else expr2_part1
       expr2_part2
  \end{verbatim}

  The second form has many problems: it assumes a fixed-width font
  when viewing the sources, it uses more space on the line and thus
  makes it hard to observe its strict length limit (working against
  \emph{readability}), it requires extra editing to adapt the layout
  to changes of the initial text (working against
  \emph{maintainability}) etc.

  \medskip For similar reasons, any kind of two-dimensional or tabular
  layouts, ASCII-art with lines or boxes of asterisks etc.\ should be
  avoided.

  \paragraph{Complex expressions} that consist of multi-clausal
  function definitions, @{ML_text handle}, @{ML_text case},
  @{ML_text let} (and combinations) require special attention.  The
  syntax of Standard ML is quite ambitious and admits a lot of
  variance that can distort the meaning of the text.

  Multiple clauses of @{ML_text fun}, @{ML_text fn}, @{ML_text handle},
  @{ML_text case} get extra indentation to indicate the nesting
  clearly.  Example:

  \begin{verbatim}
  (* RIGHT *)

  fun foo p1 =
        expr1
    | foo p2 =
        expr2


  (* WRONG *)

  fun foo p1 =
    expr1
    | foo p2 =
    expr2
  \end{verbatim}

  Body expressions consisting of @{ML_text case} or @{ML_text let}
  require care to maintain compositionality, to prevent loss of
  logical indentation where it is especially important to see the
  structure of the text.  Example:

  \begin{verbatim}
  (* RIGHT *)

  fun foo p1 =
        (case e of
          q1 => ...
        | q2 => ...)
    | foo p2 =
        let
          ...
        in
          ...
        end


  (* WRONG *)

  fun foo p1 = case e of
      q1 => ...
    | q2 => ...
    | foo p2 =
    let
      ...
    in
      ...
    end
  \end{verbatim}

  Extra parentheses around @{ML_text case} expressions are optional,
  but help to analyse the nesting based on character matching in the
  text editor.

  \medskip There are two main exceptions to the overall principle of
  compositionality in the layout of complex expressions.

  \begin{enumerate}

  \item @{ML_text "if"} expressions are iterated as if ML had multi-branch
  conditionals, e.g.

  \begin{verbatim}
  (* RIGHT *)

  if b1 then e1
  else if b2 then e2
  else e3
  \end{verbatim}

  \item @{ML_text fn} abstractions are often laid out as if they
  lacked any structure by themselves.  This traditional form is
  motivated by the possibility of shifting function arguments back and
  forth wrt.\ additional combinators.  Example:

  \begin{verbatim}
  (* RIGHT *)

  fun foo x y = fold (fn z =>
    expr)
  \end{verbatim}

  Here the visual appearance is that of three arguments @{ML_text x},
  @{ML_text y}, @{ML_text z} in a row.

  \end{enumerate}

  Such weakly structured layout should be used with great care.  Here
  are some counter-examples involving @{ML_text let} expressions:

  \begin{verbatim}
  (* WRONG *)

  fun foo x = let
      val y = ...
    in ... end


  (* WRONG *)

  fun foo x = let
    val y = ...
  in ... end


  (* WRONG *)

  fun foo x =
  let
    val y = ...
  in ... end


  (* WRONG *)

  fun foo x =
    let
      val y = ...
    in
      ... end
  \end{verbatim}

  \medskip In general the source layout is meant to emphasize the
  structure of complex language expressions, not to pretend that SML
  had a completely different syntax (say that of Haskell, Scala, Java).
\<close>


section \<open>ML embedded into Isabelle/Isar\<close>

text \<open>ML and Isar are intertwined via an open-ended bootstrap
  process that provides more and more programming facilities and
  logical content in an alternating manner.  Bootstrapping starts from
  the raw environment of existing implementations of Standard ML
  (mainly Poly/ML, but also SML/NJ).

  Isabelle/Pure marks the point where the raw ML toplevel is superseded by
  Isabelle/ML within the Isar theory and proof language, with a uniform
  context for arbitrary ML values (see also \secref{sec:context}). This formal
  environment holds ML compiler bindings, logical entities, and many other
  things.

  Object-logics like Isabelle/HOL are built within the Isabelle/ML/Isar
  environment by introducing suitable theories with associated ML modules,
  either inlined within @{verbatim ".thy"} files, or as separate @{verbatim
  ".ML"} files that are loaded from some theory. Thus Isabelle/HOL is defined
  as a regular user-space application within the Isabelle framework. Further
  add-on tools can be implemented in ML within the Isar context in the same
  manner: ML is part of the standard repertoire of Isabelle, and there is no
  distinction between ``users'' and ``developers'' in this respect.
\<close>


subsection \<open>Isar ML commands\<close>

text \<open>
  The primary Isar source language provides facilities to ``open a window'' to
  the underlying ML compiler. Especially see the Isar commands @{command_ref
  "ML_file"} and @{command_ref "ML"}: both work the same way, but the source
  text is provided differently, via a file vs.\ inlined, respectively. Apart
  from embedding ML into the main theory definition like that, there are many
  more commands that refer to ML source, such as @{command_ref setup} or
  @{command_ref declaration}. Even more fine-grained embedding of ML into Isar
  is encountered in the proof method @{method_ref tactic}, which refines the
  pending goal state via a given expression of type @{ML_type tactic}.
\<close>

text %mlex \<open>The following artificial example demonstrates some ML
  toplevel declarations within the implicit Isar theory context.  This
  is regular functional programming without referring to logical
  entities yet.
\<close>

ML \<open>
  fun factorial 0 = 1
    | factorial n = n * factorial (n - 1)
\<close>

text \<open>Here the ML environment is already managed by Isabelle, i.e.\
  the @{ML factorial} function is not yet accessible in the preceding
  paragraph, nor in a different theory that is independent of the
  current one in the import hierarchy.

  Removing the above ML declaration from the source text will remove any trace
  of this definition, as expected. The Isabelle/ML toplevel environment is
  managed in a \emph{stateless} way: in contrast to the raw ML toplevel, there
  are no global side-effects involved here.\footnote{Such a stateless
  compilation environment is also a prerequisite for robust parallel
  compilation within independent nodes of the implicit theory development
  graph.}

  \medskip The next example shows how to embed ML into Isar proofs, using
  @{command_ref "ML_prf"} instead of @{command_ref "ML"}. As illustrated
  below, the effect on the ML environment is local to the whole proof body,
  but ignoring the block structure. \<close>

notepad
begin
  ML_prf %"ML" \<open>val a = 1\<close>
  {
    ML_prf %"ML" \<open>val b = a + 1\<close>
  } -- \<open>Isar block structure ignored by ML environment\<close>
  ML_prf %"ML" \<open>val c = b + 1\<close>
end

text \<open>By side-stepping the normal scoping rules for Isar proof
  blocks, embedded ML code can refer to the different contexts and
  manipulate corresponding entities, e.g.\ export a fact from a block
  context.

  \medskip Two further ML commands are useful in certain situations:
  @{command_ref ML_val} and @{command_ref ML_command} are \emph{diagnostic} in
  the sense that there is no effect on the underlying environment, and can
  thus be used anywhere. The examples below produce long strings of digits by
  invoking @{ML factorial}: @{command ML_val} takes care of printing the ML
  toplevel result, but @{command ML_command} is silent so we produce an
  explicit output message.
\<close>

ML_val \<open>factorial 100\<close>
ML_command \<open>writeln (string_of_int (factorial 100))\<close>

notepad
begin
  ML_val \<open>factorial 100\<close>
  ML_command \<open>writeln (string_of_int (factorial 100))\<close>
end


subsection \<open>Compile-time context\<close>

text \<open>Whenever the ML compiler is invoked within Isabelle/Isar, the
  formal context is passed as a thread-local reference variable.  Thus
  ML code may access the theory context during compilation, by reading
  or writing the (local) theory under construction.  Note that such
  direct access to the compile-time context is rare.  In practice it
  is typically done via some derived ML functions instead.
\<close>

text %mlref \<open>
  \begin{mldecls}
  @{index_ML ML_Context.the_generic_context: "unit -> Context.generic"} \\
  @{index_ML "Context.>>": "(Context.generic -> Context.generic) -> unit"} \\
  @{index_ML ML_Thms.bind_thms: "string * thm list -> unit"} \\
  @{index_ML ML_Thms.bind_thm: "string * thm -> unit"} \\
  \end{mldecls}

  \begin{description}

  \item @{ML "ML_Context.the_generic_context ()"} refers to the theory
  context of the ML toplevel --- at compile time.  ML code needs to
  take care to refer to @{ML "ML_Context.the_generic_context ()"}
  correctly.  Recall that evaluation of a function body is delayed
  until actual run-time.

  \item @{ML "Context.>>"}~@{text f} applies context transformation
  @{text f} to the implicit context of the ML toplevel.

  \item @{ML ML_Thms.bind_thms}~@{text "(name, thms)"} stores a list of
  theorems produced in ML both in the (global) theory context and the
  ML toplevel, associating it with the provided name.

  \item @{ML ML_Thms.bind_thm} is similar to @{ML ML_Thms.bind_thms} but
  refers to a singleton fact.

  \end{description}

  It is important to note that the above functions are really
  restricted to the compile time, even though the ML compiler is
  invoked at run-time.  The majority of ML code either uses static
  antiquotations (\secref{sec:ML-antiq}) or refers to the theory or
  proof context at run-time, by explicit functional abstraction.
\<close>
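
text %mlex \<open>The following artificial example merely illustrates the
  mechanism described above: it accesses the compile-time context directly
  and prints the name of the enclosing theory.\<close>

ML_val \<open>
  val thy = Context.theory_of (ML_Context.the_generic_context ());
  writeln ("ML compiled within theory: " ^ Context.theory_name thy)
\<close>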


subsection \<open>Antiquotations \label{sec:ML-antiq}\<close>

text \<open>A very important consequence of embedding ML into Isar is the
  concept of \emph{ML antiquotation}.  The standard token language of
  ML is augmented by special syntactic entities of the following form:

  @{rail \<open>
  @{syntax_def antiquote}: '@{' nameref args '}'
  \<close>}

  Here @{syntax nameref} and @{syntax args} are outer syntax categories, as
  defined in @{cite "isabelle-isar-ref"}.

  \medskip A regular antiquotation @{text "@{name args}"} processes
  its arguments by the usual means of the Isar source language, and
  produces corresponding ML source text, either as literal
  \emph{inline} text (e.g.\ @{text "@{term t}"}) or abstract
  \emph{value} (e.g.\ @{text "@{thm th}"}).  This pre-compilation
  scheme makes it possible to refer to formal entities in a robust manner,
  with proper static scoping and with some degree of logical checking of
  small portions of the code.
\<close>
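
text %mlex \<open>For example, the following artificial snippet refers to a term
  via an antiquotation; the term @{text "f x"} is arbitrary and merely serves
  as illustration.  It is resolved to a value of ML type @{ML_type term} at
  compile time.\<close>

ML_val \<open>@{term "f x"}\<close>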


subsection \<open>Printing ML values\<close>

text \<open>The ML compiler knows about the structure of values according
  to their static type, and can print them in the manner of its
  toplevel, although the details are non-portable.  The
  antiquotations @{ML_antiquotation_def "make_string"} and
  @{ML_antiquotation_def "print"} provide a quasi-portable way to
  refer to this potential capability of the underlying ML system in
  generic Isabelle/ML sources.

  This is occasionally useful for diagnostic or demonstration
  purposes.  Note that production-quality tools require proper
  user-level error messages, avoiding raw ML values in the output.\<close>

text %mlantiq \<open>
  \begin{matharray}{rcl}
  @{ML_antiquotation_def "make_string"} & : & @{text ML_antiquotation} \\
  @{ML_antiquotation_def "print"} & : & @{text ML_antiquotation} \\
  \end{matharray}

  @{rail \<open>
  @@{ML_antiquotation make_string}
  ;
  @@{ML_antiquotation print} @{syntax name}?
  \<close>}

  \begin{description}

  \item @{text "@{make_string}"} inlines a function to print arbitrary values
  similar to the ML toplevel. The result is compiler dependent and may fall
  back on @{verbatim "?"} in certain situations. The value of configuration
  option @{attribute_ref ML_print_depth} determines further details of output.

  \item @{text "@{print f}"} uses the ML function @{text "f: string ->
  unit"} to output the result of @{text "@{make_string}"} above,
  together with the source position of the antiquotation.  The default
  output function is @{ML writeln}.

  \end{description}
\<close>

text %mlex \<open>The following artificial examples show how to produce
  adhoc output of ML values for debugging purposes.\<close>

ML \<open>
  val x = 42;
  val y = true;

  writeln (@{make_string} {x = x, y = y});

  @{print} {x = x, y = y};
  @{print tracing} {x = x, y = y};
\<close>


section \<open>Canonical argument order \label{sec:canonical-argument-order}\<close>

text \<open>Standard ML is a language in the tradition of @{text
  "\<lambda>"}-calculus and \emph{higher-order functional programming},
  similar to OCaml, Haskell, or Isabelle/Pure and HOL as logical
  languages.  Getting acquainted with the native style of representing
  functions in that setting can save a lot of extra boiler-plate of
  redundant shuffling of arguments, auxiliary abstractions etc.

  Functions are usually \emph{curried}: the idea of turning arguments
  of type @{text "\<tau>\<^sub>i"} (for @{text "i \<in> {1, \<dots>, n}"}) into a result of
  type @{text "\<tau>"} is represented by the iterated function space
  @{text "\<tau>\<^sub>1 \<rightarrow> \<dots> \<rightarrow> \<tau>\<^sub>n \<rightarrow> \<tau>"}.  This is isomorphic to the well-known
  encoding via tuples @{text "\<tau>\<^sub>1 \<times> \<dots> \<times> \<tau>\<^sub>n \<rightarrow> \<tau>"}, but the curried
  version fits more smoothly into the basic calculus.\footnote{The
  difference is even more significant in HOL, because the redundant
  tuple structure needs to be accommodated by extraneous proof steps.}

  Currying gives some flexibility due to \emph{partial application}.  A
  function @{text "f: \<tau>\<^sub>1 \<rightarrow> \<tau>\<^sub>2 \<rightarrow> \<tau>"} can be applied to @{text "x: \<tau>\<^sub>1"}
  and the remaining @{text "(f x): \<tau>\<^sub>2 \<rightarrow> \<tau>"} passed to another function
  etc.  How well this works in practice depends on the order of
  arguments.  In the worst case, arguments are arranged erratically,
  and using a function in a certain situation always requires some
  glue code.  Thus we would get exponentially many opportunities to
  decorate the code with meaningless permutations of arguments.

  This can be avoided by \emph{canonical argument order}, which
  observes certain standard patterns and minimizes adhoc permutations
  in their application.  In Isabelle/ML, large portions of text can be
  written without auxiliary operations like @{text "swap: \<alpha> \<times> \<beta> \<rightarrow> \<beta> \<times>
  \<alpha>"} or @{text "C: (\<alpha> \<rightarrow> \<beta> \<rightarrow> \<gamma>) \<rightarrow> (\<beta> \<rightarrow> \<alpha> \<rightarrow> \<gamma>)"} (the latter is not
  present in the Isabelle/ML library).

  \medskip The main idea is that arguments that vary less are moved
  further to the left than those that vary more.  Two particularly
  important categories of functions are \emph{selectors} and
  \emph{updates}.

  The subsequent scheme is based on a hypothetical set-like container
  of type @{text "\<beta>"} that manages elements of type @{text "\<alpha>"}.  Both
  the names and types of the associated operations are canonical for
  Isabelle/ML.

  \begin{center}
  \begin{tabular}{ll}
  kind & canonical name and type \\\hline
  selector & @{text "member: \<beta> \<rightarrow> \<alpha> \<rightarrow> bool"} \\
  update & @{text "insert: \<alpha> \<rightarrow> \<beta> \<rightarrow> \<beta>"} \\
  \end{tabular}
  \end{center}

  Given a container @{text "B: \<beta>"}, the partially applied @{text
  "member B"} is a predicate over elements @{text "\<alpha> \<rightarrow> bool"}, and
  thus represents the intended denotation directly.  It is customary
  to pass the abstract predicate to further operations, not the
  concrete container.  The argument order makes it easy to use other
  combinators: @{text "forall (member B) list"} will check a list of
  elements for membership in @{text "B"} etc. Often the explicit
  @{text "list"} is pointless and can be contracted to @{text "forall
  (member B)"} to get directly a predicate again.

  In contrast, an update operation varies the container, so it moves
  to the right: @{text "insert a"} is a function @{text "\<beta> \<rightarrow> \<beta>"} to
  insert a value @{text "a"}.  These can be composed naturally as
  @{text "insert c \<circ> insert b \<circ> insert a"}.  The slightly awkward
  inversion of the composition order is due to conventional
  mathematical notation, which can be easily amended as explained
  below.
\<close>
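
text %mlex \<open>The following artificial example sketches the scheme above,
  with a plain list of integers playing the role of the hypothetical
  container; the functions are defined ad-hoc for this illustration (using
  @{command ML_val}, so they do not affect the ML environment).\<close>

ML_val \<open>
  fun member B x = List.exists (fn y => y = x) B;
  fun insert x B = if member B x then B else x :: B;

  val B = insert 2 (insert 2 (insert 1 []));
  @{assert} (List.all (member B) [1, 2]);
  @{assert} (not (member B 3))
\<close>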


subsection \<open>Forward application and composition\<close>

text \<open>Regular function application and infix notation work best for
  relatively deeply structured expressions, e.g.\ @{text "h (f x y + g
  z)"}.  The important special case of \emph{linear transformation}
  applies a cascade of functions @{text "f\<^sub>n (\<dots> (f\<^sub>1 x))"}.  This
  becomes hard to read and maintain if the functions are themselves
  given as complex expressions.  The notation can be significantly
  improved by introducing \emph{forward} versions of application and
  composition as follows:

  \medskip
  \begin{tabular}{lll}
  @{text "x |> f"} & @{text "\<equiv>"} & @{text "f x"} \\
  @{text "(f #> g) x"} & @{text "\<equiv>"} & @{text "x |> f |> g"} \\
  \end{tabular}
  \medskip

  This makes it convenient to write @{text "x |> f\<^sub>1 |> \<dots> |> f\<^sub>n"} or
  @{text "f\<^sub>1 #> \<dots> #> f\<^sub>n"} for its functional abstraction over @{text
  "x"}.

  \medskip There is an additional set of combinators to accommodate
  multiple results (via pairs) that are passed on as multiple
  arguments (via currying).

  \medskip
  \begin{tabular}{lll}
  @{text "(x, y) |-> f"} & @{text "\<equiv>"} & @{text "f x y"} \\
  @{text "(f #-> g) x"} & @{text "\<equiv>"} & @{text "x |> f |-> g"} \\
  \end{tabular}
  \medskip
\<close>

text %mlref \<open>
  \begin{mldecls}
  @{index_ML_op "|> ": "'a * ('a -> 'b) -> 'b"} \\
  @{index_ML_op "|-> ": "('c * 'a) * ('c -> 'a -> 'b) -> 'b"} \\
  @{index_ML_op "#> ": "('a -> 'b) * ('b -> 'c) -> 'a -> 'c"} \\
  @{index_ML_op "#-> ": "('a -> 'c * 'b) * ('c -> 'b -> 'd) -> 'a -> 'd"} \\
  \end{mldecls}
\<close>
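
text %mlex \<open>The following artificial example illustrates the four
  combinators above; the auxiliary functions are invented solely for this
  demonstration.\<close>

ML_val \<open>
  fun double x = 2 * x;
  fun increment x = x + 1;

  @{assert} ((5 |> double |> increment) = 11);
  @{assert} ((double #> increment) 5 = 11);
  @{assert} (((3, 4) |-> (fn x => fn y => x + y)) = 7);
  @{assert} (((fn x => (x, x + 1)) #-> (fn a => fn b => a * b)) 3 = 12)
\<close>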


subsection \<open>Canonical iteration\<close>

text \<open>As explained above, a function @{text "f: \<alpha> \<rightarrow> \<beta> \<rightarrow> \<beta>"} can be
  understood as an update on a configuration of type @{text "\<beta>"},
  parameterized by an argument of type @{text "\<alpha>"}.  Given @{text "a: \<alpha>"}
  the partial application @{text "(f a): \<beta> \<rightarrow> \<beta>"} operates
  homogeneously on @{text "\<beta>"}.  This can be iterated naturally over a
  list of parameters @{text "[a\<^sub>1, \<dots>, a\<^sub>n]"} as @{text "f a\<^sub>1 #> \<dots> #> f a\<^sub>n"}.
  The latter expression is again a function @{text "\<beta> \<rightarrow> \<beta>"}.
  It can be applied to an initial configuration @{text "b: \<beta>"} to
  start the iteration over the given list of arguments: each @{text
  "a"} in @{text "a\<^sub>1, \<dots>, a\<^sub>n"} is applied consecutively by updating a
  cumulative configuration.

  The @{text fold} combinator in Isabelle/ML lifts a function @{text
  "f"} as above to its iterated version over a list of arguments.
  Lifting can be repeated, e.g.\ @{text "(fold \<circ> fold) f"} iterates
  over a list of lists as expected.

  The variant @{text "fold_rev"} works inside-out over the list of
  arguments, such that @{text "fold_rev f \<equiv> fold f \<circ> rev"} holds.

  The @{text "fold_map"} combinator essentially performs @{text
  "fold"} and @{text "map"} simultaneously: each application of @{text
  "f"} produces an updated configuration together with a side-result;
  the iteration collects all such side-results as a separate list.
\<close>

text %mlref \<open>
  \begin{mldecls}
  @{index_ML fold: "('a -> 'b -> 'b) -> 'a list -> 'b -> 'b"} \\
  @{index_ML fold_rev: "('a -> 'b -> 'b) -> 'a list -> 'b -> 'b"} \\
  @{index_ML fold_map: "('a -> 'b -> 'c * 'b) -> 'a list -> 'b -> 'c list * 'b"} \\
  \end{mldecls}

  \begin{description}

  \item @{ML fold}~@{text f} lifts the parametrized update function
  @{text "f"} to a list of parameters.

  \item @{ML fold_rev}~@{text "f"} is similar to @{ML fold}~@{text
  "f"}, but works inside-out, as if the list were reversed.

  \item @{ML fold_map}~@{text "f"} lifts the parametrized update
  function @{text "f"} (with side-result) to a list of parameters and
  cumulative side-results.

  \end{description}

  \begin{warn}
  The literature on functional programming provides a confusing multitude of
  combinators called @{text "foldl"}, @{text "foldr"} etc. SML97 provides its
  own variations as @{ML List.foldl} and @{ML List.foldr}, while the classic
  Isabelle library also has the historic @{ML Library.foldl} and @{ML
  Library.foldr}. To avoid unnecessary complication, all these historical
  versions should be ignored, and the canonical @{ML fold} (or @{ML fold_rev})
  used exclusively.
  \end{warn}
\<close>
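
text %mlex \<open>The following artificial example illustrates @{ML fold_map}:
  a counter is threaded through a list of strings, pairing each element with
  its position; the names are invented for this demonstration.\<close>

ML_val \<open>
  val (numbered, count) =
    fold_map (fn x => fn i => ((i, x), i + 1)) ["a", "b", "c"] 0;

  @{assert} (numbered = [(0, "a"), (1, "b"), (2, "c")]);
  @{assert} (count = 3)
\<close>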

text %mlex \<open>The following example shows how to fill a text buffer
  incrementally by adding strings, either individually or from a given
  list.
\<close>

ML \<open>
  val s =
    Buffer.empty
    |> Buffer.add "digits: "
    |> fold (Buffer.add o string_of_int) (0 upto 9)
    |> Buffer.content;

  @{assert} (s = "digits: 0123456789");
\<close>

text \<open>Note how @{ML "fold (Buffer.add o string_of_int)"} above saves
  an extra @{ML "map"} over the given list.  This kind of peephole
  optimization reduces both the code size and the tree structures in
  memory (``deforestation''), but it requires some practice to read
  and write fluently.

  \medskip The next example elaborates the idea of canonical
  iteration, demonstrating fast accumulation of tree content using a
  text buffer.
\<close>

ML \<open>
  datatype tree = Text of string | Elem of string * tree list;

  fun slow_content (Text txt) = txt
    | slow_content (Elem (name, ts)) =
        "<" ^ name ^ ">" ^
        implode (map slow_content ts) ^
        "</" ^ name ^ ">"

  fun add_content (Text txt) = Buffer.add txt
    | add_content (Elem (name, ts)) =
        Buffer.add ("<" ^ name ^ ">") #>
        fold add_content ts #>
        Buffer.add ("</" ^ name ^ ">");

  fun fast_content tree =
    Buffer.empty |> add_content tree |> Buffer.content;
\<close>

text \<open>The slowness of @{ML slow_content} is due to the @{ML implode} of
  the recursive results, because it copies previously produced strings
  again and again.

  The incremental @{ML add_content} avoids this by operating on a
  buffer that is passed through in a linear fashion.  Using @{ML_text
  "#>"} and contraction over the actual buffer argument saves some
  additional boiler-plate.  Of course, the two @{ML "Buffer.add"}
  invocations with concatenated strings could have been split into
  smaller parts, but this would have obfuscated the source without
  making a big difference in performance.  Here we have done some
  peephole-optimization for the sake of readability.

  Another benefit of @{ML add_content} is its ``open'' form as a
  function on buffers that can be continued in further linear
  transformations, folding etc.  Thus it is more compositional than
  the naive @{ML slow_content}.  As a realistic example, compare the
  old-style @{ML "Term.maxidx_of_term: term -> int"} with the newer
  @{ML "Term.maxidx_term: term -> int -> int"} in Isabelle/Pure.

  Note that @{ML fast_content} above is only defined as an example.  In
  many practical situations, it is customary to provide the
  incremental @{ML add_content} only and leave the initialization and
  termination to the concrete application by the user.
\<close>


section \<open>Message output channels \label{sec:message-channels}\<close>

text \<open>Isabelle provides output channels for different kinds of
  messages: regular output, high-volume tracing information, warnings,
  and errors.

  Depending on the user interface involved, these messages may appear
  in different text styles or colours.  The standard output for
  batch sessions prefixes each line of warnings by @{verbatim
  "###"} and errors by @{verbatim "***"}, but leaves anything else
  unchanged.  The message body may contain further markup and formatting,
  which is routinely used in the Prover IDE @{cite "isabelle-jedit"}.

  Messages are associated with the transaction context of the running
  Isar command.  This enables the front-end to manage commands and
  resulting messages together.  For example, after deleting a command
  from a given theory document version, the corresponding message
  output can be retracted from the display.
\<close>

text %mlref \<open>
  \begin{mldecls}
  @{index_ML writeln: "string -> unit"} \\
  @{index_ML tracing: "string -> unit"} \\
  @{index_ML warning: "string -> unit"} \\
  @{index_ML error: "string -> 'a"} % FIXME Output.error_message (!?) \\
  \end{mldecls}

  \begin{description}

  \item @{ML writeln}~@{text "text"} outputs @{text "text"} as regular
  message.  This is the primary message output operation of Isabelle
  and should be used by default.

  \item @{ML tracing}~@{text "text"} outputs @{text "text"} as special
  tracing message, indicating potential high-volume output to the
  front-end (hundreds or thousands of messages issued by a single
  command).  The idea is to allow the user-interface to downgrade the
  quality of message display to achieve higher throughput.

  Note that the user might have to take special actions to see tracing
  output, e.g.\ switch to a different output window.  So this channel
  should not be used for regular output.

  \item @{ML warning}~@{text "text"} outputs @{text "text"} as
  warning, which typically means some extra emphasis on the front-end
  side (color highlighting, icons, etc.).

  \item @{ML error}~@{text "text"} raises exception @{ML ERROR}~@{text
  "text"} and thus lets the Isar toplevel print @{text "text"} on the
  error channel, which typically means some extra emphasis on the
  front-end side (color highlighting, icons, etc.).

  This assumes that the exception is not handled before the command
  terminates.  Handling exception @{ML ERROR}~@{text "text"} is a
  perfectly legal alternative: it means that the error is absorbed
  without any message output.

  \begin{warn}
  The actual error channel is accessed via @{ML Output.error_message}, but
  this is normally not used directly in user code.
  \end{warn}

  \end{description}

  \begin{warn}
  Regular Isabelle/ML code should output messages exclusively by the
  official channels.  Using raw I/O on \emph{stdout} or \emph{stderr}
  instead (e.g.\ via @{ML TextIO.output}) is apt to cause problems in
  the presence of parallel and asynchronous processing of Isabelle
  theories.  Such raw output might be displayed by the front-end in
  some system console log, with a low chance that the user will ever
  see it.  Moreover, as a genuine side-effect on global process
  channels, there is no proper way to retract output when Isar command
  transactions are reset by the system.
  \end{warn}

  \begin{warn}
  The message channels should be used in a message-oriented manner.
  This means that multi-line output that logically belongs together is
  issued by a single invocation of @{ML writeln} etc.\ with the
  functional concatenation of all message constituents.
  \end{warn}
\<close>

text %mlex \<open>The following example demonstrates a multi-line
  warning.  Note that in some situations the user sees only the first
  line, so the most important point should be made first.
\<close>

ML_command \<open>
  warning (cat_lines
   ["Beware the Jabberwock, my son!",
    "The jaws that bite, the claws that catch!",
    "Beware the Jubjub Bird, and shun",
    "The frumious Bandersnatch!"]);
\<close>


section \<open>Exceptions \label{sec:exceptions}\<close>

text \<open>The Standard ML semantics of strict functional evaluation
  together with exceptions is rather well defined, but some delicate
  points need to be observed to keep ML programs from going wrong
  despite static type-checking.  Exceptions in Isabelle/ML are
  subsequently categorized as follows.

  \paragraph{Regular user errors.}  These are meant to provide
  informative feedback about malformed input etc.

  The \emph{error} function raises the corresponding @{ML ERROR}
  exception, with a plain text message as argument.  @{ML ERROR}
  exceptions can be handled internally, in order to be ignored, turned
  into other exceptions, or cascaded by appending messages.  If the
  corresponding Isabelle/Isar command terminates with an @{ML ERROR}
  exception state, the system will print the result on the error
  channel (see \secref{sec:message-channels}).

  It is considered bad style to refer to internal function names or
  values in ML source notation in user error messages.  Do not use
  @{text "@{make_string}"} or @{text "@{here}"}!

  Grammatical correctness of error messages can be improved by
  \emph{omitting} final punctuation: messages are often concatenated
  or put into a larger context (e.g.\ augmented with source position).
  Note that punctuation after formal entities (types, terms, theorems) is
  particularly prone to user confusion.

  \paragraph{Program failures.}  There is a handful of standard
  exceptions that indicate general failure situations, or failures of
  core operations on logical entities (types, terms, theorems,
  theories, see \chref{ch:logic}).

  These exceptions indicate a genuine breakdown of the program, so the
  main purpose is to determine quickly what has happened where.
  Traditionally, the (short) exception message would include the name
  of an ML function, although this is no longer necessary, because the
  ML runtime system attaches a detailed source position stemming from
  the corresponding @{ML_text raise} keyword.

  \medskip User modules can always introduce their own custom
  exceptions locally, e.g.\ to organize internal failures robustly
  without overlapping with existing exceptions.  Exceptions that are
  exposed in module signatures require extra care, though, and should
  \emph{not} be introduced by default.  Surprise for users of a module
  can often be minimized by using plain user errors instead.

  \paragraph{Interrupts.}  These indicate arbitrary system events:
  both the ML runtime system and the Isabelle/ML infrastructure signal
  various exceptional situations by raising the special
  @{ML Exn.Interrupt} exception in user code.

  This is the one and only way that physical events can intrude on an
  Isabelle/ML program. Such an interrupt can mean out-of-memory, stack
  overflow, timeout, internal signaling of threads, or a POSIX process
  signal. An Isabelle/ML program that intercepts interrupts becomes
  dependent on physical effects of the environment. Even worse, exception
  handling patterns that are too general by accident, e.g.\ by misspelled
  exception constructors, will cover interrupts unintentionally and thus
  render the program semantics ill-defined.

  Note that the Interrupt exception dates back to the original SML90
  language definition.  It was excluded from the SML97 version to
  avoid its malign impact on ML program semantics, but without
  providing a viable alternative.  Isabelle/ML recovers physical
  interruptibility (which is an indispensable tool to implement
  managed evaluation of command transactions), but requires user code
  to be strictly transparent wrt.\ interrupts.

  \begin{warn}
  Isabelle/ML user code needs to terminate promptly on interruption,
  without guessing at its meaning to the system infrastructure.
  Temporary handling of interrupts for cleanup of global resources
  etc.\ needs to be followed immediately by re-raising of the original
  exception.
  \end{warn}
\<close>

text %mlref \<open>
  \begin{mldecls}
  @{index_ML try: "('a -> 'b) -> 'a -> 'b option"} \\
  @{index_ML can: "('a -> 'b) -> 'a -> bool"} \\
  @{index_ML_exception ERROR: string} \\
  @{index_ML_exception Fail: string} \\
  @{index_ML Exn.is_interrupt: "exn -> bool"} \\
  @{index_ML reraise: "exn -> 'a"} \\
  @{index_ML Runtime.exn_trace: "(unit -> 'a) -> 'a"} \\
  \end{mldecls}

  \begin{description}

  \item @{ML try}~@{text "f x"} makes the partiality of evaluating
  @{text "f x"} explicit via the option datatype.  Interrupts are
  \emph{not} handled here, i.e.\ this form serves as safe replacement
  for the \emph{unsafe} version @{ML_text "(SOME"}~@{text "f
  x"}~@{ML_text "handle _ => NONE)"} that is occasionally seen in
  books about SML97, but not in Isabelle/ML.

  \item @{ML can} is similar to @{ML try} with more abstract result.

  \item @{ML ERROR}~@{text "msg"} represents user errors; this
  exception is normally raised indirectly via the @{ML error} function
  (see \secref{sec:message-channels}).

  \item @{ML Fail}~@{text "msg"} represents general program failures.

  \item @{ML Exn.is_interrupt} identifies interrupts robustly, without
  mentioning concrete exception constructors in user code.  Handled
  interrupts need to be re-raised promptly!

  \item @{ML reraise}~@{text "exn"} raises exception @{text "exn"}
  while preserving its implicit position information (if possible,
  depending on the ML platform).

  \item @{ML Runtime.exn_trace}~@{ML_text "(fn () =>"}~@{text
  "e"}@{ML_text ")"} evaluates expression @{text "e"} while printing
  a full trace of its stack of nested exceptions (if possible,
  depending on the ML platform).

  Inserting @{ML Runtime.exn_trace} into ML code temporarily is
  useful for debugging, but not suitable for production code.

  \end{description}
\<close>
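
text %mlex \<open>The following contrived example illustrates @{ML try} and
  @{ML can}: the auxiliary function @{ML_text checked} is made up for this
  illustration and signals a user error for negative arguments.
\<close>

ML \<open>
  fun checked i = if i < 0 then error "negative argument" else i;

  @{assert} (try checked 1 = SOME 1);
  @{assert} (try checked ~1 = NONE);
  @{assert} (can checked 1 andalso not (can checked ~1));
\<close>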

text %mlantiq \<open>
  \begin{matharray}{rcl}
  @{ML_antiquotation_def "assert"} & : & @{text ML_antiquotation} \\
  \end{matharray}

  \begin{description}

  \item @{text "@{assert}"} inlines a function
  @{ML_type "bool -> unit"} that raises @{ML Fail} if the argument is
  @{ML false}.  Due to inlining the source position of failed
  assertions is included in the error output.

  \end{description}
\<close>


section \<open>Strings of symbols \label{sec:symbols}\<close>

text \<open>A \emph{symbol} constitutes the smallest textual unit in
  Isabelle/ML --- raw ML characters are normally not encountered at
  all.  Isabelle strings consist of a sequence of symbols, represented
  as a packed string or an exploded list of strings.  Each symbol is
  in itself a small string, which has one of the following forms:

  \begin{enumerate}

  \item a single ASCII character ``@{text "c"}'', for example
  ``@{verbatim a}'',

  \item a codepoint according to UTF-8 (non-ASCII byte sequence),

  \item a regular symbol ``@{verbatim \<open>\\<close>}@{verbatim "<"}@{text
  "ident"}@{verbatim ">"}'', for example ``@{verbatim "\<alpha>"}'',

  \item a control symbol ``@{verbatim \<open>\\<close>}@{verbatim "<^"}@{text
  "ident"}@{verbatim ">"}'', for example ``@{verbatim "\<^bold>"}'',

  \item a raw symbol ``@{verbatim \<open>\\<close>}@{verbatim "<^raw:"}@{text
  text}@{verbatim ">"}'' where @{text text} consists of printable characters
  excluding ``@{verbatim "."}'' and ``@{verbatim ">"}'', for example
  ``@{verbatim "\<^raw:$\sum_{i = 1}^n$>"}'',

  \item a numbered raw control symbol ``@{verbatim \<open>\\<close>}@{verbatim
  "<^raw"}@{text n}@{verbatim ">"}'', where @{text n} consists of digits, for
  example ``@{verbatim "\<^raw42>"}''.

  \end{enumerate}

  The @{text "ident"} syntax for symbol names is @{text "letter
  (letter | digit)\<^sup>*"}, where @{text "letter = A..Za..z"} and @{text
  "digit = 0..9"}.  There are infinitely many regular symbols and
  control symbols, but a fixed collection of standard symbols is
  treated specifically.  For example, ``@{verbatim "\<alpha>"}'' is
  classified as a letter, which means it may occur within regular
  Isabelle identifiers.

  The character set underlying Isabelle symbols is 7-bit ASCII, but 8-bit
  character sequences are passed through unchanged. Unicode/UCS data in UTF-8
  encoding is processed in a non-strict fashion, such that well-formed code
  sequences are recognized accordingly. Unicode provides its own collection of
  mathematical symbols, but within the core Isabelle/ML world there is no link
  to the standard collection of Isabelle regular symbols.

  \medskip Output of Isabelle symbols depends on the print mode. For example,
  the standard {\LaTeX} setup of the Isabelle document preparation system
  would present ``@{verbatim "\<alpha>"}'' as @{text "\<alpha>"}, and ``@{verbatim
  "\<^bold>\<alpha>"}'' as @{text "\<^bold>\<alpha>"}. On-screen rendering usually works by mapping a
  finite subset of Isabelle symbols to suitable Unicode characters.
\<close>

text %mlref \<open>
  \begin{mldecls}
  @{index_ML_type "Symbol.symbol": string} \\
  @{index_ML Symbol.explode: "string -> Symbol.symbol list"} \\
  @{index_ML Symbol.is_letter: "Symbol.symbol -> bool"} \\
  @{index_ML Symbol.is_digit: "Symbol.symbol -> bool"} \\
  @{index_ML Symbol.is_quasi: "Symbol.symbol -> bool"} \\
  @{index_ML Symbol.is_blank: "Symbol.symbol -> bool"} \\
  \end{mldecls}
  \begin{mldecls}
  @{index_ML_type "Symbol.sym"} \\
  @{index_ML Symbol.decode: "Symbol.symbol -> Symbol.sym"} \\
  \end{mldecls}

  \begin{description}

  \item Type @{ML_type "Symbol.symbol"} represents individual Isabelle
  symbols.

  \item @{ML "Symbol.explode"}~@{text "str"} produces a symbol list
  from the packed form.  This function supersedes @{ML
  "String.explode"} for virtually all purposes of manipulating text in
  Isabelle!\footnote{The runtime overhead for exploded strings is
  mainly that of the list structure: individual symbols that happen to
  be a singleton string do not require extra memory in Poly/ML.}

  \item @{ML "Symbol.is_letter"}, @{ML "Symbol.is_digit"}, @{ML
  "Symbol.is_quasi"}, @{ML "Symbol.is_blank"} classify standard
  symbols according to fixed syntactic conventions of Isabelle, cf.\
  @{cite "isabelle-isar-ref"}.

  \item Type @{ML_type "Symbol.sym"} is a concrete datatype that
  represents the different kinds of symbols explicitly, with
  constructors @{ML "Symbol.Char"}, @{ML "Symbol.UTF8"},
  @{ML "Symbol.Sym"}, @{ML "Symbol.Ctrl"}, @{ML "Symbol.Raw"},
  @{ML "Symbol.Malformed"}.

  \item @{ML "Symbol.decode"} converts the string representation of a
  symbol into the datatype version.

  \end{description}

  \paragraph{Historical note.} In the original SML90 standard the
  primitive ML type @{ML_type char} did not exist, and @{ML_text
  "explode: string -> string list"} produced a list of singleton
  strings like @{ML "raw_explode: string -> string list"} in
  Isabelle/ML today.  When SML97 came out, Isabelle did not adopt its
  somewhat anachronistic 8-bit or 16-bit characters, but the idea of
  exploding a string into a list of small strings was extended to
  ``symbols'' as explained above.  Thus Isabelle sources can refer to
  an infinite store of user-defined symbols, without having to worry
  about the multitude of Unicode encodings that have emerged over the
  years.\<close>
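
text %mlex \<open>The following snippet is a small sanity check of the symbol
  operations above; the particular symbols are arbitrary examples.
\<close>

ML \<open>
  @{assert} (length (Symbol.explode "a\<alpha>\<^bold>") = 3);
  @{assert} (Symbol.is_letter "a" andalso Symbol.is_letter "\<alpha>");
  @{assert} (Symbol.is_digit "1" andalso not (Symbol.is_letter "1"));
\<close>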


section \<open>Basic data types\<close>

text \<open>The basis library proposal of SML97 needs to be treated with
  caution.  Many of its operations simply do not fit with important
  Isabelle/ML conventions (like ``canonical argument order'', see
  \secref{sec:canonical-argument-order}), others cause problems with
  the parallel evaluation model of Isabelle/ML (such as @{ML
  TextIO.print} or @{ML OS.Process.system}).

  Subsequently we give a brief overview of important operations on
  basic ML data types.
\<close>


subsection \<open>Characters\<close>

text %mlref \<open>
  \begin{mldecls}
  @{index_ML_type char} \\
  \end{mldecls}

  \begin{description}

  \item Type @{ML_type char} is \emph{not} used.  The smallest textual
  unit in Isabelle is represented as a ``symbol'' (see
  \secref{sec:symbols}).

  \end{description}
\<close>


subsection \<open>Strings\<close>

text %mlref \<open>
  \begin{mldecls}
  @{index_ML_type string} \\
  \end{mldecls}

  \begin{description}

  \item Type @{ML_type string} represents immutable vectors of 8-bit
  characters.  There are operations in SML to convert back and forth
  to actual byte vectors, which are seldom used.

  This historically important raw text representation is used for
  Isabelle-specific purposes with the following implicit substructures
  packed into the string content:

  \begin{enumerate}

  \item sequence of Isabelle symbols (see also \secref{sec:symbols}),
  with @{ML Symbol.explode} as key operation;

  \item XML tree structure via YXML (see also @{cite "isabelle-sys"}),
  with @{ML YXML.parse_body} as key operation.

  \end{enumerate}

  Note that Isabelle/ML string literals may refer to Isabelle symbols like
  ``@{verbatim \<alpha>}'' natively, \emph{without} escaping the backslash. This is a
  consequence of Isabelle treating all source text as strings of symbols,
  instead of raw characters.

  \end{description}
\<close>

text %mlex \<open>The subsequent example illustrates the difference of
  physical addressing of bytes versus logical addressing of symbols in
  Isabelle strings.
\<close>

ML_val \<open>
  val s = "\<A>";

  @{assert} (length (Symbol.explode s) = 1);
  @{assert} (size s = 4);
\<close>

text \<open>Note that in Unicode renderings of the symbol @{text "\<A>"},
  variations of encodings like UTF-8 or UTF-16 pose delicate questions
  about the multi-byte representations of its codepoint, which is outside
  of the 16-bit address space of the original Unicode standard from
  the 1990s.  In Isabelle/ML it is just ``@{verbatim \<A>}''
  literally, using plain ASCII characters beyond any doubt.\<close>


subsection \<open>Integers\<close>

text %mlref \<open>
  \begin{mldecls}
  @{index_ML_type int} \\
  \end{mldecls}

  \begin{description}

  \item Type @{ML_type int} represents regular mathematical integers, which
  are \emph{unbounded}. Overflow is treated properly, but should never happen
  in practice.\footnote{The size limit for integer bit patterns in memory is
  64\,MB for 32-bit Poly/ML, and much higher for 64-bit systems.} This works
  uniformly for all supported ML platforms (Poly/ML and SML/NJ).

  Literal integers in ML text are forced to be of this one true
  integer type --- adhoc overloading of SML97 is disabled.

  Structure @{ML_structure IntInf} of SML97 is obsolete and superseded by
  @{ML_structure Int}.  Structure @{ML_structure Integer} in @{file
  "~~/src/Pure/General/integer.ML"} provides some additional
  operations.

  \end{description}
\<close>
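
text %mlex \<open>The following example gives a rough impression of unbounded
  integer arithmetic, re-using @{ML fold}, @{ML_text upto}, and
  @{ML string_of_int} from above; the concrete numbers are arbitrary.
\<close>

ML \<open>
  (*2 ^ 100 exceeds any fixed-size machine integer*)
  val big = fold (fn _ => fn n => 2 * n) (1 upto 100) 1;

  @{assert} (size (string_of_int big) = 31);
\<close>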


subsection \<open>Time\<close>

text %mlref \<open>
  \begin{mldecls}
  @{index_ML_type Time.time} \\
  @{index_ML seconds: "real -> Time.time"} \\
  \end{mldecls}

  \begin{description}

  \item Type @{ML_type Time.time} represents time abstractly according
  to the SML97 basis library definition.  This is adequate for
  internal ML operations, but awkward in concrete time specifications.

  \item @{ML seconds}~@{text "s"} turns the concrete scalar @{text
  "s"} (measured in seconds) into an abstract time value.  Floating
  point numbers are easy to use as configuration options in the
  context (see \secref{sec:config-options}) or system options that
  are maintained externally.

  \end{description}
\<close>
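
text %mlex \<open>A minimal sketch of @{ML seconds} in action, assuming the
  usual conversions of the SML97 basis library structure
  @{ML_structure Time}.
\<close>

ML \<open>
  val timeout = seconds 2.5;

  @{assert} (Time.toMilliseconds timeout = 2500);
\<close>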


subsection \<open>Options\<close>

text %mlref \<open>
  \begin{mldecls}
  @{index_ML Option.map: "('a -> 'b) -> 'a option -> 'b option"} \\
  @{index_ML is_some: "'a option -> bool"} \\
  @{index_ML is_none: "'a option -> bool"} \\
  @{index_ML the: "'a option -> 'a"} \\
  @{index_ML these: "'a list option -> 'a list"} \\
  @{index_ML the_list: "'a option -> 'a list"} \\
  @{index_ML the_default: "'a -> 'a option -> 'a"} \\
  \end{mldecls}
\<close>

text \<open>Apart from @{ML Option.map} most other operations defined in
  structure @{ML_structure Option} are alien to Isabelle/ML and never
  used.  The operations shown above are defined in @{file
  "~~/src/Pure/General/basics.ML"}.\<close>


subsection \<open>Lists\<close>

text \<open>Lists are ubiquitous in ML as simple and light-weight
  ``collections'' for many everyday programming tasks.  Isabelle/ML
  provides important additions and improvements over operations that
  are predefined in the SML97 library.\<close>

text %mlref \<open>
  \begin{mldecls}
  @{index_ML cons: "'a -> 'a list -> 'a list"} \\
  @{index_ML member: "('b * 'a -> bool) -> 'a list -> 'b -> bool"} \\
  @{index_ML insert: "('a * 'a -> bool) -> 'a -> 'a list -> 'a list"} \\
  @{index_ML remove: "('b * 'a -> bool) -> 'b -> 'a list -> 'a list"} \\
  @{index_ML update: "('a * 'a -> bool) -> 'a -> 'a list -> 'a list"} \\
  \end{mldecls}

  \begin{description}

  \item @{ML cons}~@{text "x xs"} evaluates to @{text "x :: xs"}.

  Tupled infix operators are a historical accident in Standard ML.
  The curried @{ML cons} amends this, but it should only be used when
  partial application is required.

  \item @{ML member}, @{ML insert}, @{ML remove}, @{ML update} treat
  lists as a set-like container that maintains the order of elements.
  See @{file "~~/src/Pure/library.ML"} for the full specifications
  (written in ML).  There are some further derived operations like
  @{ML union} or @{ML inter}.

  Note that @{ML insert} is conservative about elements that are
  already a @{ML member} of the list, while @{ML update} ensures that
  the latest entry is always put in front.  The latter discipline is
  often more appropriate in declarations of context data
  (\secref{sec:context-data}) that are issued by the user in Isar
  source: later declarations take precedence over earlier ones.

  \end{description}
\<close>
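
text %mlex \<open>The following example illustrates the set-like operations
  above, using plain structural equality @{ML_text "(op =)"} as element
  relation.
\<close>

ML \<open>
  val xs = [1, 2, 3];

  @{assert} (member (op =) xs 2 andalso not (member (op =) xs 4));
  @{assert} (insert (op =) 2 xs = xs);
  @{assert} (insert (op =) 4 xs = [4, 1, 2, 3]);
  @{assert} (remove (op =) 2 xs = [1, 3]);
  @{assert} (update (op =) 2 xs = [2, 1, 3]);
\<close>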

text %mlex \<open>Using canonical @{ML fold} together with @{ML cons} (or
  similar standard operations) alternates the orientation of data.
  This is quite natural and should not be altered forcibly by inserting
  extra applications of @{ML rev}.  The alternative @{ML fold_rev} can
  be used in the few situations where alternation should be
  prevented.
\<close>

ML \<open>
  val items = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];

  val list1 = fold cons items [];
  @{assert} (list1 = rev items);

  val list2 = fold_rev cons items [];
  @{assert} (list2 = items);
\<close>

text \<open>The subsequent example demonstrates how to \emph{merge} two
  lists in a natural way.\<close>

ML \<open>
  fun merge_lists eq (xs, ys) = fold_rev (insert eq) ys xs;
\<close>

text \<open>Here the first list is treated conservatively: only the new
  elements from the second list are inserted.  The inside-out order of
  insertion via @{ML fold_rev} attempts to preserve the order of
  elements in the result.

  This way of merging lists is typical for context data
  (\secref{sec:context-data}).  See also @{ML merge} as defined in
  @{file "~~/src/Pure/library.ML"}.
\<close>


subsection \<open>Association lists\<close>

text \<open>The operations for association lists interpret a concrete list
  of pairs as a finite function from keys to values.  Redundant
  representations with multiple occurrences of the same key are
  implicitly normalized: lookup and update only take the first
  occurrence into account.
\<close>

text \<open>
  \begin{mldecls}
  @{index_ML AList.lookup: "('a * 'b -> bool) -> ('b * 'c) list -> 'a -> 'c option"} \\
  @{index_ML AList.defined: "('a * 'b -> bool) -> ('b * 'c) list -> 'a -> bool"} \\
  @{index_ML AList.update: "('a * 'a -> bool) -> 'a * 'b -> ('a * 'b) list -> ('a * 'b) list"} \\
  \end{mldecls}

  \begin{description}

  \item @{ML AList.lookup}, @{ML AList.defined}, @{ML AList.update}
  implement the main ``framework operations'' for mappings in
  Isabelle/ML, following standard conventions for their names and
  types.

  Note that a function called @{verbatim lookup} is obliged to express its
  partiality via an explicit option element.  There is no choice to
  raise an exception, without changing the name to something like
  @{text "the_element"} or @{text "get"}.

  The @{text "defined"} operation is essentially a contraction of @{ML
  is_some} and @{verbatim "lookup"}, but this is sufficiently frequent to
  justify its independent existence.  This also gives the
  implementation some opportunity for peep-hole optimization.

  \end{description}

  Association lists are adequate as a simple implementation of finite mappings
  in many practical situations. A more advanced table structure is defined in
  @{file "~~/src/Pure/General/table.ML"}; that version scales easily to
  thousands or millions of elements.
\<close>
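
text %mlex \<open>The following example is a basic illustration of these
  association list operations; the keys and values are arbitrary.
\<close>

ML \<open>
  val table = [("a", 1), ("b", 2)];

  @{assert} (AList.lookup (op =) table "a" = SOME 1);
  @{assert} (not (AList.defined (op =) table "c"));

  val table' = AList.update (op =) ("c", 3) table;
  @{assert} (AList.lookup (op =) table' "c" = SOME 3);
\<close>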


subsection \<open>Unsynchronized references\<close>

text %mlref \<open>
  \begin{mldecls}
  @{index_ML_type "'a Unsynchronized.ref"} \\
  @{index_ML Unsynchronized.ref: "'a -> 'a Unsynchronized.ref"} \\
  @{index_ML "!": "'a Unsynchronized.ref -> 'a"} \\
  @{index_ML_op ":=": "'a Unsynchronized.ref * 'a -> unit"} \\
  \end{mldecls}
\<close>

text \<open>Due to ubiquitous parallelism in Isabelle/ML (see also
  \secref{sec:multi-threading}), the mutable reference cells of
  Standard ML are notorious for causing problems.  In a highly
  parallel system, both correctness \emph{and} performance are easily
  degraded when using mutable data.

  The unwieldy name of @{ML Unsynchronized.ref} for the constructor
  for references in Isabelle/ML emphasizes the inconveniences caused by
  mutability.  Existing operations @{ML "!"} and @{ML_op ":="} are
  unchanged, but should be used with special precautions, say in a
  strictly local situation that is guaranteed to be restricted to
  sequential evaluation --- now and in the future.

  \begin{warn}
  Never @{ML_text "open Unsynchronized"}, not even in a local scope!
  Pretending that mutable state is no problem is a very bad idea.
  \end{warn}
\<close>
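
text %mlex \<open>The following contrived function uses a mutable reference
  in the strictly local and sequential manner described above; a plain
  @{ML fold} would normally be preferred.
\<close>

ML \<open>
  (*strictly local, sequential use of an unsynchronized reference*)
  fun sum_list xs =
    let
      val sum = Unsynchronized.ref 0;
      val _ = List.app (fn x => sum := ! sum + x) xs;
    in ! sum end;

  @{assert} (sum_list [1, 2, 3, 4] = 10);
\<close>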


section \<open>Thread-safe programming \label{sec:multi-threading}\<close>

text \<open>Multi-threaded execution has become an everyday reality in
  Isabelle since Poly/ML 5.2.1 and Isabelle2008.  Isabelle/ML provides
  implicit and explicit parallelism by default, and there is no way
  for user-space tools to ``opt out''.  ML programs that are purely
  functional, output messages only via the official channels
  (\secref{sec:message-channels}), and do not intercept interrupts
  (\secref{sec:exceptions}) can participate in the multi-threaded
  environment immediately without further ado.

  More ambitious tools with more fine-grained interaction with the
  environment need to observe the principles explained below.
\<close>


subsection \<open>Multi-threading with shared memory\<close>

text \<open>Multiple threads help to organize advanced operations of the
  system, such as real-time conditions on command transactions,
  sub-components with explicit communication, general asynchronous
  interaction etc.  Moreover, parallel evaluation is a prerequisite to
  make adequate use of the CPU resources that are available on
  multi-core systems.\footnote{Multi-core computing does not mean that
  there are ``spare cycles'' to be wasted.  It means that the
  continued exponential speedup of CPU performance due to ``Moore's
  Law'' follows different rules: clock frequency has reached its peak
  around 2005, and applications need to be parallelized in order to
  avoid a perceived loss of performance.  See also
  @{cite "Sutter:2005"}.}

  Isabelle/Isar exploits the inherent structure of theories and proofs to
  support \emph{implicit parallelism} to a large extent. LCF-style theorem
  proving provides almost ideal conditions for that, see also
  @{cite "Wenzel:2009"}. This means that significant parts of theory and
  proof checking are parallelized by default. In Isabelle2013, a maximum
  speedup-factor of 3.5 on 4 cores and 6.5 on 8 cores can be expected
  @{cite "Wenzel:2013:ITP"}.

  \medskip ML threads lack the memory protection of separate
  processes, and operate concurrently on shared heap memory.  This has
  the advantage that results of independent computations are directly
  available to other threads: abstract values can be passed without
  copying or awkward serialization that is typically required for
  separate processes.

  To make shared-memory multi-threading work robustly and efficiently,
  some programming guidelines need to be observed.  While the ML
  system is responsible for maintaining basic integrity of the
  representation of ML values in memory, the application programmer
  needs to ensure that multi-threaded execution does not break the
  intended semantics.

  \begin{warn}
  To participate in implicit parallelism, tools need to be
  thread-safe.  A single ill-behaved tool can affect the stability and
  performance of the whole system.
  \end{warn}

  Apart from observing the principles of thread-safeness passively, advanced
  tools may also exploit parallelism actively, e.g.\ by using library
  functions for parallel list operations (\secref{sec:parlist}).

  \begin{warn}
  Parallel computing resources are managed centrally by the
  Isabelle/ML infrastructure.  User programs should not fork their own
  ML threads to perform heavy computations.
  \end{warn}
\<close>


subsection \<open>Critical shared resources\<close>

text \<open>Thread-safeness is mainly concerned with concurrent
  read/write access to shared resources, which are outside the purely
  functional world of ML.  This covers the following in particular.

  \begin{itemize}

  \item Global references (or arrays), i.e.\ mutable memory cells that
  persist over several invocations of associated
  operations.\footnote{This is independent of the visibility of such
  mutable values in the toplevel scope.}

  \item Global state of the running Isabelle/ML process, i.e.\ raw I/O
  channels, environment variables, current working directory.

  \item Writable resources in the file-system that are shared among
  different threads or external processes.

  \end{itemize}

  Isabelle/ML provides various mechanisms to avoid critical shared
  resources in most situations.  As a last resort there are some
  mechanisms for explicit synchronization.  The following guidelines
  help to make Isabelle/ML programs work smoothly in a concurrent
  environment.

  \begin{itemize}

  \item Avoid global references altogether.  Isabelle/Isar maintains a
  uniform context that incorporates arbitrary data declared by user
  programs (\secref{sec:context-data}).  This context is passed as a
  plain value and user tools can get/map their own data in a purely
  functional manner.  Configuration options within the context
  (\secref{sec:config-options}) provide simple drop-in replacements
  for historic reference variables.

  \item Keep components with local state information re-entrant.
  Instead of poking initial values into (private) global references, a
  new state record can be created on each invocation, and passed
  through any auxiliary functions of the component.  The state record
  may contain mutable references in special situations, without requiring
  any synchronization, as long as each invocation gets its own copy and
  the tool itself is single-threaded.

  \item Avoid raw output on @{text "stdout"} or @{text "stderr"}.  The
  Poly/ML library is thread-safe for each individual output operation,
  but the ordering of parallel invocations is arbitrary.  This means
  raw output will appear on some system console with unpredictable
  interleaving of atomic chunks.

  Note that this does not affect regular message output channels
  (\secref{sec:message-channels}).  An official message id is associated
  with the command transaction from where it originates, independently
  of other transactions.  This means each running Isar command has
  effectively its own set of message channels, and interleaving can
  only happen when commands use parallelism internally (and only at
  message boundaries).

  \item Treat environment variables and the current working directory
  of the running process as read-only.

  \item Restrict writing to the file-system to unique temporary files.
  Isabelle already provides a temporary directory that is unique for
  the running process, and there is a centralized source of unique
  serial numbers in Isabelle/ML.  Thus temporary files that are passed
  to some external process will always be disjoint, and thus
  thread-safe.

  \end{itemize}
\<close>

text %mlref \<open>
  \begin{mldecls}
  @{index_ML File.tmp_path: "Path.T -> Path.T"} \\
  @{index_ML serial_string: "unit -> string"} \\
  \end{mldecls}

  \begin{description}

  \item @{ML File.tmp_path}~@{text "path"} relocates the base
  component of @{text "path"} into the unique temporary directory of
  the running Isabelle/ML process.

  \item @{ML serial_string}~@{text "()"} creates a new serial number
  that is unique over the runtime of the Isabelle/ML process.

  \end{description}
\<close>

text %mlex \<open>The following example shows how to create unique
  temporary file names.
\<close>

ML \<open>
  val tmp1 = File.tmp_path (Path.basic ("foo" ^ serial_string ()));
  val tmp2 = File.tmp_path (Path.basic ("foo" ^ serial_string ()));
  @{assert} (tmp1 <> tmp2);
\<close>


subsection \<open>Explicit synchronization\<close>

text \<open>Isabelle/ML provides explicit synchronization for mutable variables over
  immutable data, which may be updated atomically and exclusively. This
  addresses the rare situations where mutable shared resources are really
  required. Synchronization in Isabelle/ML is based on primitives of Poly/ML,
  which have been adapted to the specific assumptions of the concurrent
  Isabelle environment. User code should not break this abstraction, but stay
  within the confines of concurrent Isabelle/ML.

  A \emph{synchronized variable} is an explicit state component associated
  with mechanisms for locking and signaling. There are operations to await a
  condition, change the state, and signal the change to all other waiting
  threads. Synchronized access to the state variable is \emph{not} re-entrant:
  direct or indirect nesting within the same thread will cause a deadlock!\<close>

text %mlref \<open>
  \begin{mldecls}
  @{index_ML_type "'a Synchronized.var"} \\
  @{index_ML Synchronized.var: "string -> 'a -> 'a Synchronized.var"} \\
  @{index_ML Synchronized.guarded_access: "'a Synchronized.var ->
  ('a -> ('b * 'a) option) -> 'b"} \\
  \end{mldecls}

  \begin{description}

  \item Type @{ML_type "'a Synchronized.var"} represents synchronized
  variables with state of type @{ML_type 'a}.

  \item @{ML Synchronized.var}~@{text "name x"} creates a synchronized
  variable that is initialized with value @{text "x"}.  The @{text
  "name"} is used for tracing.

  \item @{ML Synchronized.guarded_access}~@{text "var f"} lets the
  function @{text "f"} operate within a critical section on the state
  @{text "x"} as follows: if @{text "f x"} produces @{ML NONE}, it
  continues to wait on the internal condition variable, expecting that
  some other thread will eventually change the content in a suitable
  manner; if @{text "f x"} produces @{ML SOME}~@{text "(y, x')"} it is
  satisfied and assigns the new state value @{text "x'"}, broadcasts a
  signal to all waiting threads on the associated condition variable,
  and returns the result @{text "y"}.

  \end{description}

  There are some further variants of the @{ML
  Synchronized.guarded_access} combinator, see @{file
  "~~/src/Pure/Concurrent/synchronized.ML"} for details.
\<close>

text %mlex \<open>The following example implements a counter that produces
  positive integers that are unique over the runtime of the Isabelle
  process:\<close>

ML \<open>
  local
    val counter = Synchronized.var "counter" 0;
  in
    fun next () =
      Synchronized.guarded_access counter
        (fn i =>
          let val j = i + 1
          in SOME (j, j) end);
  end;
\<close>

ML \<open>
  val a = next ();
  val b = next ();
  @{assert} (a <> b);
\<close>

text \<open>\medskip See @{file "~~/src/Pure/Concurrent/mailbox.ML"} for how
  to implement a mailbox as a synchronized variable over a purely
  functional list.\<close>


section \<open>Managed evaluation\<close>

text \<open>Execution of Standard ML follows the model of strict
  functional evaluation with optional exceptions.  Evaluation happens
  whenever some function is applied to (sufficiently many)
  arguments. The result is either an explicit value or an implicit
  exception.

  \emph{Managed evaluation} in Isabelle/ML organizes expressions and
  results to control certain physical side-conditions, to say more
  specifically when and how evaluation happens.  For example, the
  Isabelle/ML library supports lazy evaluation with memoing, parallel
  evaluation via futures, asynchronous evaluation via promises,
  evaluation with time limit etc.

  \medskip An \emph{unevaluated expression} is represented either as
  unit abstraction @{verbatim "fn () => a"} of type
  @{verbatim "unit -> 'a"} or as regular function
  @{verbatim "fn a => b"} of type @{verbatim "'a -> 'b"}.  Both forms
  occur routinely, and special care is required to tell them apart ---
  the static type-system of SML is only of limited help here.

  The first form is more intuitive: some combinator @{text "(unit ->
  'a) -> 'a"} applies the given function to @{text "()"} to initiate
  the postponed evaluation process.  The second form is more flexible:
  some combinator @{text "('a -> 'b) -> 'a -> 'b"} acts like a
  modified form of function application; several such combinators may
  be cascaded to modify a given function, before it is ultimately
  applied to some argument.

  \medskip \emph{Reified results} make the disjoint sum of regular
  values versus exceptional situations explicit as the ML datatype
  @{text "'a result = Res of 'a | Exn of exn"}.  This is typically
  used for administrative purposes, to store the overall outcome of an
  evaluation process.

  \emph{Parallel exceptions} aggregate reified results, such that
  multiple exceptions are digested as a collection in canonical form
  that identifies exceptions according to their original occurrence.
  This is particularly important for parallel evaluation via futures
  (\secref{sec:futures}), which are organized as an acyclic graph of
  evaluations that depend on other evaluations: exceptions stemming
  from shared sub-graphs are exposed exactly once and in the order of
  their original occurrence (e.g.\ when printed at the toplevel).
  Interrupt counts as a neutral element here: it is treated as minimal
  information about some canceled evaluation process, and is absorbed
  by the presence of regular program exceptions.\<close>
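
text \<open>The two forms of unevaluated expressions above can be illustrated as
  follows; the names @{text delay}, @{text run}, and @{text traced} are
  ad-hoc for this illustration:\<close>

ML_val \<open>
  (*unit abstraction: evaluation is postponed until applied to ()*)
  val delay = fn () => 20 + 22;

  (*combinator of type (unit -> 'a) -> 'a that initiates evaluation*)
  fun run e = e ();

  (*combinator of type ('a -> 'b) -> 'a -> 'b: modified function application*)
  fun traced f x = (writeln "applying function"; f x);

  @{assert} (run delay = 42);
  @{assert} (traced (fn n => n + 1) 41 = 42);
\<close>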

text %mlref \<open>
  \begin{mldecls}
  @{index_ML_type "'a Exn.result"} \\
  @{index_ML Exn.capture: "('a -> 'b) -> 'a -> 'b Exn.result"} \\
  @{index_ML Exn.interruptible_capture: "('a -> 'b) -> 'a -> 'b Exn.result"} \\
  @{index_ML Exn.release: "'a Exn.result -> 'a"} \\
  @{index_ML Par_Exn.release_all: "'a Exn.result list -> 'a list"} \\
  @{index_ML Par_Exn.release_first: "'a Exn.result list -> 'a list"} \\
  \end{mldecls}

  \begin{description}

  \item Type @{ML_type "'a Exn.result"} represents the disjoint sum of
  ML results explicitly, with constructor @{ML Exn.Res} for regular
  values and @{ML "Exn.Exn"} for exceptions.

  \item @{ML Exn.capture}~@{text "f x"} manages the evaluation of
  @{text "f x"} such that exceptions are made explicit as @{ML
  "Exn.Exn"}.  Note that this includes physical interrupts (see also
  \secref{sec:exceptions}), so the same precautions apply to user
  code: interrupts must not be absorbed accidentally!

  \item @{ML Exn.interruptible_capture} is similar to @{ML
  Exn.capture}, but interrupts are immediately re-raised as required
  for user code.

  \item @{ML Exn.release}~@{text "result"} releases the original
  runtime result, exposing its regular value or raising the reified
  exception.

  \item @{ML Par_Exn.release_all}~@{text "results"} combines results
  that were produced independently (e.g.\ by parallel evaluation).  If
  all results are regular values, that list is returned.  Otherwise,
  the collection of all exceptions is raised, wrapped-up as a collective
  parallel exception.  Note that the latter prevents access to
  individual exceptions by the conventional @{verbatim "handle"} of ML.

  \item @{ML Par_Exn.release_first} is similar to @{ML
  Par_Exn.release_all}, but only the first (meaningful) exception that has
  occurred in the original evaluation process is raised again; the others are
  ignored.  That single exception may get handled by conventional
  means in ML.

  \end{description}
\<close>
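
text %mlex \<open>The following example illustrates reified results; the
  function @{text trouble} is ad-hoc for this illustration:\<close>

ML_val \<open>
  fun trouble n = if n < 0 then raise Fail "negative" else n + 1;

  (*evaluate with exceptions made explicit as reified results*)
  val results = map (Exn.capture trouble) [1, 2, ~1];

  @{assert} (Exn.release (nth results 0) = 2);
  @{assert} (case nth results 2 of Exn.Exn _ => true | _ => false);
\<close>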


subsection \<open>Parallel skeletons \label{sec:parlist}\<close>

text \<open>
  Algorithmic skeletons are combinators that operate on lists in
  parallel, in the manner of the well-known @{text map}, @{text exists},
  @{text forall} etc.  Management of futures (\secref{sec:futures})
  and their results as reified exceptions is wrapped up into simple
  programming interfaces that resemble the sequential versions.

  What remains is the application-specific problem of presenting
  expressions with suitable \emph{granularity}: each list element
  corresponds to one evaluation task.  If the granularity is too
  coarse, the available CPUs are not saturated.  If it is too
  fine-grained, CPU cycles are wasted due to the overhead of
  organizing parallel processing.  In the worst case, parallel
  performance will be less than the sequential counterpart!
\<close>

text %mlref \<open>
  \begin{mldecls}
  @{index_ML Par_List.map: "('a -> 'b) -> 'a list -> 'b list"} \\
  @{index_ML Par_List.get_some: "('a -> 'b option) -> 'a list -> 'b option"} \\
  \end{mldecls}

  \begin{description}

  \item @{ML Par_List.map}~@{text "f [x\<^sub>1, \<dots>, x\<^sub>n]"} is like @{ML
  "map"}~@{text "f [x\<^sub>1, \<dots>, x\<^sub>n]"}, but the evaluation of @{text "f x\<^sub>i"}
  for @{text "i = 1, \<dots>, n"} is performed in parallel.

  An exception in any @{text "f x\<^sub>i"} cancels the overall evaluation
  process.  The final result is produced via @{ML
  Par_Exn.release_first} as explained above, which means the first
  program exception that happened to occur in the parallel evaluation
  is propagated, and all other failures are ignored.

  \item @{ML Par_List.get_some}~@{text "f [x\<^sub>1, \<dots>, x\<^sub>n]"} produces some
  @{text "f x\<^sub>i"} that is of the form @{text "SOME y\<^sub>i"}, if that
  exists, otherwise @{text "NONE"}.  Thus it is similar to @{ML
  Library.get_first}, but subject to a non-deterministic parallel
  choice process.  The first successful result cancels the overall
  evaluation process; other exceptions are propagated as for @{ML
  Par_List.map}.

  This generic parallel choice combinator is the basis for derived
  forms, such as @{ML Par_List.find_some}, @{ML Par_List.exists}, @{ML
  Par_List.forall}.

  \end{description}
\<close>

text %mlex \<open>Subsequently, the Ackermann function is evaluated in
  parallel for some ranges of arguments.\<close>

ML_val \<open>
  fun ackermann 0 n = n + 1
    | ackermann m 0 = ackermann (m - 1) 1
    | ackermann m n = ackermann (m - 1) (ackermann m (n - 1));

  Par_List.map (ackermann 2) (500 upto 1000);
  Par_List.map (ackermann 3) (5 upto 10);
\<close>
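
text \<open>The parallel choice combinator can be illustrated by a simple search
  for a witness; the predicate used here is ad-hoc for this illustration:\<close>

ML_val \<open>
  (*produces some successful result, chosen non-deterministically*)
  val result =
    Par_List.get_some
      (fn n => if n mod 97 = 0 then SOME n else NONE) (1 upto 1000);

  @{assert} (case result of SOME n => n mod 97 = 0 | NONE => false);
\<close>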


subsection \<open>Lazy evaluation\<close>

text \<open>
  Classic lazy evaluation works via the @{text lazy}~/ @{text force} pair of
  operations: @{text lazy} to wrap an unevaluated expression, and @{text
  force} to evaluate it once and store its result persistently. Later
  invocations of @{text force} retrieve the stored result without another
  evaluation. Isabelle/ML refines this idea to accommodate the aspects of
  multi-threading, synchronous program exceptions and asynchronous interrupts.

  The first thread that invokes @{text force} on an unfinished lazy value
  changes its state into a \emph{promise} of the eventual result and starts
  evaluating it. Any other threads that @{text force} the same lazy value in
  the meantime need to wait for it to finish, by producing a regular result or
  program exception. If the evaluation attempt is interrupted, this event is
  propagated to all waiting threads and the lazy value is reset to its
  original state.

  This means a lazy value is completely evaluated at most once, in a
  thread-safe manner. There might be multiple interrupted evaluation attempts,
  and multiple receivers of intermediate interrupt events. Interrupts are
  \emph{not} made persistent: later evaluation attempts start again from the
  original expression.
\<close>

text %mlref \<open>
  \begin{mldecls}
  @{index_ML_type "'a lazy"} \\
  @{index_ML Lazy.lazy: "(unit -> 'a) -> 'a lazy"} \\
  @{index_ML Lazy.value: "'a -> 'a lazy"} \\
  @{index_ML Lazy.force: "'a lazy -> 'a"} \\
  \end{mldecls}

  \begin{description}

  \item Type @{ML_type "'a lazy"} represents lazy values over type @{verbatim
  "'a"}.

  \item @{ML Lazy.lazy}~@{text "(fn () => e)"} wraps the unevaluated
  expression @{text e} as an unfinished lazy value.

  \item @{ML Lazy.value}~@{text a} wraps the value @{text a} as a finished lazy
  value.  When forced, it returns @{text a} without any further evaluation.

  There is very low overhead for this proforma wrapping of strict values as
  lazy values.

  \item @{ML Lazy.force}~@{text x} produces the result of the lazy value in a
  thread-safe manner as explained above. Thus it may cause the current thread
  to wait on a pending evaluation attempt by another thread.

  \end{description}
\<close>
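
text %mlex \<open>The following example demonstrates that a lazy value is
  evaluated at most once.  The auxiliary counter, based on the @{ML
  Synchronized.var} facilities from above (including the @{ML
  Synchronized.change} and @{ML Synchronized.value} variants), merely
  observes the number of evaluations and is ad-hoc for this
  illustration:\<close>

ML_val \<open>
  val count = Synchronized.var "count" 0;
  val x = Lazy.lazy (fn () => (Synchronized.change count (fn i => i + 1); 42));

  (*forcing twice yields the stored result; the body runs only once*)
  @{assert} (Lazy.force x = 42);
  @{assert} (Lazy.force x = 42);
  @{assert} (Synchronized.value count = 1);
\<close>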


subsection \<open>Futures \label{sec:futures}\<close>

text \<open>
  Futures help to organize parallel execution in a value-oriented manner, with
  @{text fork}~/ @{text join} as the main pair of operations, and some further
  variants; see also @{cite "Wenzel:2009" and "Wenzel:2013:ITP"}. Unlike lazy
  values, futures are evaluated strictly and spontaneously on separate worker
  threads. Futures may be canceled, which leads to interrupts on running
  evaluation attempts, and forces structurally related futures to fail for all
  time; already finished futures remain unchanged. Exceptions between related
  futures are propagated as well, and turned into parallel exceptions (see
  above).

  Technically, a future is a single-assignment variable together with a
  \emph{task} that serves administrative purposes, notably within the
  \emph{task queue} where new futures are registered for eventual evaluation
  and the worker threads retrieve their work.

  The pool of worker threads is limited, in correlation with the number of
  physical cores on the machine. Note that allocation of runtime resources may
  be distorted either if workers yield CPU time (e.g.\ via system sleep or
  wait operations), or if non-worker threads contend for significant runtime
  resources independently. There is a limited number of replacement worker
  threads that get activated in certain explicit wait conditions, after a
  timeout.

  \medskip Each future task belongs to some \emph{task group}, which
  represents the hierarchic structure of related tasks, together with the
  exception status at that point. By default, the task group of a newly created
  future is a new sub-group of the presently running one, but it is also
  possible to indicate different group layouts under program control.

  Cancellation of futures actually refers to the corresponding task group and
  all its sub-groups. Thus interrupts are propagated down the group hierarchy.
  Regular program exceptions are treated likewise: failure of the evaluation
  of some future task affects its own group and all sub-groups. Given a
  particular task group, its \emph{group status} cumulates all relevant
  exceptions according to its position within the group hierarchy. Interrupted
  tasks that lack regular result information will pick up parallel exceptions
  from the cumulative group status.

  \medskip A \emph{passive future} or \emph{promise} is a future with slightly
  different evaluation policies: there is only a single-assignment variable
  and some expression to evaluate for the \emph{failed} case (e.g.\ to clean
  up resources when canceled). A regular result is produced by external means,
  using a separate \emph{fulfill} operation.

  Promises are managed in the same task queue, so regular futures may depend
  on them. This allows a form of reactive programming, where some promises are
  used as minimal elements (or guards) within the future dependency graph:
  when these promises are fulfilled the evaluation of subsequent futures
  starts spontaneously, according to their own inter-dependencies.
\<close>

text %mlref \<open>
  \begin{mldecls}
  @{index_ML_type "'a future"} \\
  @{index_ML Future.fork: "(unit -> 'a) -> 'a future"} \\
  @{index_ML Future.forks: "Future.params -> (unit -> 'a) list -> 'a future list"} \\
  @{index_ML Future.join: "'a future -> 'a"} \\
  @{index_ML Future.joins: "'a future list -> 'a list"} \\
  @{index_ML Future.value: "'a -> 'a future"} \\
  @{index_ML Future.map: "('a -> 'b) -> 'a future -> 'b future"} \\
  @{index_ML Future.cancel: "'a future -> unit"} \\
  @{index_ML Future.cancel_group: "Future.group -> unit"} \\[0.5ex]
  @{index_ML Future.promise: "(unit -> unit) -> 'a future"} \\
  @{index_ML Future.fulfill: "'a future -> 'a -> unit"} \\
  \end{mldecls}

  \begin{description}

  \item Type @{ML_type "'a future"} represents future values over type
  @{verbatim "'a"}.

  \item @{ML Future.fork}~@{text "(fn () => e)"} registers the unevaluated
  expression @{text e} as unfinished future value, to be evaluated eventually
  on the parallel worker-thread farm. This is a shorthand for @{ML
  Future.forks} below, with default parameters and a single expression.

  \item @{ML Future.forks}~@{text "params exprs"} is the general interface to
  fork several futures simultaneously. The @{text params} consist of the
  following fields:

  \begin{itemize}

  \item @{text "name : string"} (default @{ML "\"\""}) specifies a common name
  for the tasks of the forked futures, which serves diagnostic purposes.

  \item @{text "group : Future.group option"} (default @{ML NONE}) specifies
  an optional task group for the forked futures. @{ML NONE} means that a new
  sub-group of the current worker-thread task context is created. If this is
  not a worker thread, the group will be a new root in the group hierarchy.

  \item @{text "deps : Future.task list"} (default @{ML "[]"}) specifies
  dependencies on other future tasks, i.e.\ the adjacency relation in the
  global task queue. Dependencies on already finished tasks are ignored.

  \item @{text "pri : int"} (default @{ML 0}) specifies a priority within the
  task queue.

  Typically there is only little deviation from the default priority @{ML 0}.
  As a rule of thumb, @{ML "~1"} means ``low priority'' and @{ML 1} means
  ``high priority''.

  Note that the task priority only affects the position in the queue, not the
  thread priority. When a worker thread picks up a task for processing, it
  runs with the normal thread priority to the end (or until canceled). Higher
  priority tasks that are queued later need to wait until this (or another)
  worker thread becomes free again.

  \item @{text "interrupts : bool"} (default @{ML true}) tells whether the
  worker thread that processes the corresponding task is initially put into
  an interruptible state. This state may change again while running, by
  modifying the thread attributes.

  With interrupts disabled, a running future task cannot be canceled.  It is
  the responsibility of the programmer that this special state is retained
  only briefly.

  \end{itemize}

  \item @{ML Future.join}~@{text x} retrieves the value of an already finished
  future, which may lead to an exception, according to the result of its
  previous evaluation.

  For an unfinished future there are several cases depending on the role of
  the current thread and the status of the future. A non-worker thread waits
  passively until the future is eventually evaluated. A worker thread
  temporarily changes its task context and takes over the responsibility to
  evaluate the future expression on the spot. The latter is done in a
  thread-safe manner: other threads that intend to join the same future need
  to wait until the ongoing evaluation is finished.

  Note that excessive use of dynamic dependencies of futures by ad-hoc joining
  may lead to bad utilization of CPU cores, due to threads waiting on other
  threads to finish required futures. The future task farm has a limited
  number of replacement threads that continue working on unrelated tasks after
  some timeout.

  Whenever possible, static dependencies of futures should be specified
  explicitly when forked (see @{text deps} above). Thus the evaluation can
  work from the bottom up, without join conflicts and wait states.

  \item @{ML Future.joins}~@{text xs} joins the given list of futures
  simultaneously, which is more efficient than @{ML "map Future.join"}~@{text
  xs}.

  Based on the dependency graph of tasks, the current thread takes over the
  responsibility to evaluate future expressions that are required for the main
  result, working from the bottom up. Waiting on future results that are
  presently evaluated on other threads only happens as a last resort, when no
  other unfinished futures are left over.

  \item @{ML Future.value}~@{text a} wraps the value @{text a} as finished
  future value, bypassing the worker-thread farm. When joined, it returns
  @{text a} without any further evaluation.

  There is very low overhead for this proforma wrapping of strict values as
  futures.

  \item @{ML Future.map}~@{text "f x"} is a fast-path implementation of @{ML
  Future.fork}~@{text "(fn () => f ("}@{ML Future.join}~@{text "x))"}, which
  avoids the full overhead of the task queue and worker-thread farm as far as
  possible. The function @{text f} is supposed to be some trivial
  post-processing or projection of the future result.

  \item @{ML Future.cancel}~@{text "x"} cancels the task group of the given
  future, using @{ML Future.cancel_group} below.

  \item @{ML Future.cancel_group}~@{text "group"} cancels all tasks of the
  given task group for all time. Threads that are presently processing a task
  of the given group are interrupted: it may take some time until they are
  actually terminated. Tasks that are queued but not yet processed are
  dequeued and forced into an interrupted state. Since the task group is itself
  invalidated, any further attempt to fork a future that belongs to it will
  yield a canceled result as well.

  \item @{ML Future.promise}~@{text abort} registers a passive future with the
  given @{text abort} operation: it is invoked when the future task group is
  canceled.

  \item @{ML Future.fulfill}~@{text "x a"} finishes the passive future @{text
  x} by the given value @{text a}. If the promise has already been canceled,
  the attempt to fulfill it causes an exception.

  \end{description}
\<close>
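
text %mlex \<open>The following example forks a list of trivial tasks and joins
  their results; the expressions are ad-hoc for this illustration:\<close>

ML_val \<open>
  (*fork independent evaluations on the worker-thread farm*)
  val futures = map (fn i => Future.fork (fn () => i * i)) (1 upto 10);
  @{assert} (Future.joins futures = map (fn i => i * i) (1 upto 10));

  (*fast-path wrapping and mapping of a strict value*)
  val x = Future.value 6;
  val y = Future.map (fn i => i * 7) x;
  @{assert} (Future.join y = 42);
\<close>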
end