(*:maxLineLen=78:*)

theory "ML"
imports Base
begin

chapter \<open>Isabelle/ML\<close>

text \<open>
  Isabelle/ML is best understood as a certain culture based on Standard ML.
  Thus it is not a new programming language, but a certain way to use SML at
  an advanced level within the Isabelle environment. This covers a variety of
  aspects that are geared towards an efficient and robust platform for
  applications of formal logic with fully foundational proof construction ---
  according to the well-known \<^emph>\<open>LCF principle\<close>. There is specific
  infrastructure with library modules to address the needs of this difficult
  task. For example, the raw parallel programming model of Poly/ML is
  presented as a considerably more abstract concept of \<^emph>\<open>futures\<close>, which is
  then used to augment the inference kernel, Isar theory and proof
  interpreter, and PIDE document management.

  The main aspects of Isabelle/ML are introduced below. These first-hand
  explanations should help to understand how proper Isabelle/ML is to be read
  and written, and to get access to the wealth of experience that is expressed
  in the source text and its history of changes.\<^footnote>\<open>See @{url
  "http://isabelle.in.tum.de/repos/isabelle"} for the full Mercurial history.
  There are symbolic tags to refer to official Isabelle releases, as opposed
  to arbitrary \<^emph>\<open>tip\<close> versions that merely reflect snapshots that are never
  really up-to-date.\<close>
\<close>


section \<open>Style and orthography\<close>

text \<open>
  The sources of Isabelle/Isar are optimized for \<^emph>\<open>readability\<close> and
  \<^emph>\<open>maintainability\<close>. The main purpose is to tell an informed reader what is
  really going on and how things really work. This is a non-trivial aim, but
  it is supported by a certain style of writing Isabelle/ML that has emerged
  from long years of system development.\<^footnote>\<open>See also the interesting style guide
  for OCaml @{url
  "http://caml.inria.fr/resources/doc/guides/guidelines.en.html"} which shares
  many of our means and ends.\<close>

  The main principle behind any coding style is \<^emph>\<open>consistency\<close>. For a single
  author of a small program this merely means ``choose your style and stick to
  it''. A complex project like Isabelle, with long years of development and
  different contributors, requires more standardization. A coding style that
  is changed every few years or with every new contributor is no style at all,
  because consistency is quickly lost. Global consistency is hard to achieve,
  though. Nonetheless, one should always strive at least for local consistency
  of modules and sub-systems, without deviating from some general principles
  of how to write Isabelle/ML.

  In a sense, good coding style is like an \<^emph>\<open>orthography\<close> for the sources: it
  helps to read quickly over the text and see through the main points, without
  getting distracted by accidental presentation of free-style code.
\<close>


subsection \<open>Header and sectioning\<close>

text \<open>
  Isabelle source files have a certain standardized header format (with
  precise spacing) that follows ancient traditions reaching back to the
  earliest versions of the system by Larry Paulson. See @{file
  "~~/src/Pure/thm.ML"}, for example.

  The header includes at least \<^verbatim>\<open>Title\<close> and \<^verbatim>\<open>Author\<close> entries, followed by a
  prose description of the purpose of the module. The latter can range from a
  single line to several paragraphs of explanations.
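
  For illustration, such a header might look as follows; the exact spacing
  and wording below are only a sketch, the canonical examples are the actual
  files cited above:

  @{verbatim [display]
\<open>  (*  Title:      Pure/foo_bar.ML
      Author:     Some Author, Some Institution

  Short description of the purpose of this hypothetical module.
  *)\<close>}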

  The rest of the file is divided into chapters, sections, subsections,
  subsubsections, paragraphs etc.\ using a simple layout via ML comments as
  follows.

  @{verbatim [display]
\<open>  (**** chapter ****)

  (*** section ***)

  (** subsection **)

  (* subsubsection *)

  (*short paragraph*)

  (*
    long paragraph,
    with more text
  *)\<close>}

  As in regular typography, there is some extra space \<^emph>\<open>before\<close> section
  headings that are adjacent to plain text, but not before other headings as
  in the example above.

  \<^medskip>
  The precise wording of the prose text given in these headings is chosen
  carefully to introduce the main theme of the subsequent formal ML text.
\<close>


subsection \<open>Naming conventions\<close>

text \<open>
  Since ML is the primary medium to express the meaning of the source text,
  naming of ML entities requires special care.
\<close>

paragraph \<open>Notation.\<close>
text \<open>
  A name consists of 1--3 \<^emph>\<open>words\<close> (rarely 4, but not more) that are separated
  by underscore. There are three variants concerning upper or lower case
  letters, which are used for certain ML categories as follows:

  \<^medskip>
  \begin{tabular}{lll}
  variant & example & ML categories \\\hline
  lower-case & @{ML_text foo_bar} & values, types, record fields \\
  capitalized & @{ML_text Foo_Bar} & datatype constructors, structures, functors \\
  upper-case & @{ML_text FOO_BAR} & special values, exception constructors, signatures \\
  \end{tabular}
  \<^medskip>

  For historical reasons, many capitalized names omit underscores, e.g.\
  old-style @{ML_text FooBar} instead of @{ML_text Foo_Bar}. Genuine
  mixed-case names are \<^emph>\<open>not\<close> used, because clear division of words is
  essential for readability.\<^footnote>\<open>Camel-case was invented to work around the lack
  of underscore in some early non-ASCII character sets. Later it became
  habitual in some language communities that are now strong in numbers.\<close>

  A single (capital) character does not count as ``word'' in this respect:
  some Isabelle/ML names are suffixed by extra markers like this: @{ML_text
  foo_barT}.

  Name variants are produced by adding 1--3 primes, e.g.\ @{ML_text foo'},
  @{ML_text foo''}, or @{ML_text foo'''}, but not @{ML_text foo''''} or more.
  Decimal digits scale better to larger numbers, e.g.\ @{ML_text foo0},
  @{ML_text foo1}, @{ML_text foo42}.
\<close>

paragraph \<open>Scopes.\<close>
text \<open>
  Apart from very basic library modules, ML structures are not ``opened'', but
  names are referenced with explicit qualification, as in @{ML
  Syntax.string_of_term} for example. When devising names for structures and
  their components it is important to aim at eye-catching compositions of both
  parts, because this is how they are seen in the sources and documentation.
  For the same reasons, aliases of well-known library functions should be
  avoided.

  Local names of function abstraction or case/let bindings are typically
  shorter, sometimes using only rudiments of ``words'', while still avoiding
  cryptic shorthands. An auxiliary function called @{ML_text helper},
  @{ML_text aux}, or @{ML_text f} is considered bad style.

  Example:

  @{verbatim [display]
\<open>  (* RIGHT *)

  fun print_foo ctxt foo =
    let
      fun print t = ... Syntax.string_of_term ctxt t ...
    in ... end;


  (* RIGHT *)

  fun print_foo ctxt foo =
    let
      val string_of_term = Syntax.string_of_term ctxt;
      fun print t = ... string_of_term t ...
    in ... end;


  (* WRONG *)

  val string_of_term = Syntax.string_of_term;

  fun print_foo ctxt foo =
    let
      fun aux t = ... string_of_term ctxt t ...
    in ... end;\<close>}
\<close>

paragraph \<open>Specific conventions.\<close>
text \<open>
  Here are some specific name forms that occur frequently in the sources.

  \<^item> A function that maps @{ML_text foo} to @{ML_text bar} is called @{ML_text
  foo_to_bar} or @{ML_text bar_of_foo} (never @{ML_text foo2bar}, nor
  @{ML_text bar_from_foo}, nor @{ML_text bar_for_foo}, nor @{ML_text
  bar4foo}).

  \<^item> The name component @{ML_text legacy} means that the operation is about to
  be discontinued soon.

  \<^item> The name component @{ML_text global} means that this works with the
  background theory instead of the regular local context
  (\secref{sec:context}), sometimes for historical reasons, sometimes due to a
  genuine lack of locality of the concept involved, sometimes as a fall-back
  for the lack of a proper context in the application code. Whenever there is
  a non-global variant available, the application should be migrated to use it
  with a proper local context.

  \<^item> Variables of the main context types of the Isabelle/Isar framework
  (\secref{sec:context} and \chref{ch:local-theory}) have firm naming
  conventions as follows:

    \<^item> theories are called @{ML_text thy}, rarely @{ML_text theory}
    (never @{ML_text thry})

    \<^item> proof contexts are called @{ML_text ctxt}, rarely @{ML_text
    context} (never @{ML_text ctx})

    \<^item> generic contexts are called @{ML_text context}

    \<^item> local theories are called @{ML_text lthy}, except for local
    theories that are treated as proof context (which is a semantic
    super-type)

  Variations with primed or decimal numbers are always possible, as well as
  semantic prefixes like @{ML_text foo_thy} or @{ML_text bar_ctxt}, but the
  base conventions above need to be preserved. This makes it possible to
  emphasize their data flow via plain regular expressions in the text editor.

  \<^item> The main logical entities (\secref{ch:logic}) have established naming
  conventions as follows:

    \<^item> sorts are called @{ML_text S}

    \<^item> types are called @{ML_text T}, @{ML_text U}, or @{ML_text ty} (never
    @{ML_text t})

    \<^item> terms are called @{ML_text t}, @{ML_text u}, or @{ML_text tm} (never
    @{ML_text trm})

    \<^item> certified types are called @{ML_text cT}, rarely @{ML_text T}, with
    variants as for types

    \<^item> certified terms are called @{ML_text ct}, rarely @{ML_text t}, with
    variants as for terms (never @{ML_text ctrm})

    \<^item> theorems are called @{ML_text th}, or @{ML_text thm}

  Proper semantic names override these conventions completely. For example,
  the left-hand side of an equation (as a term) can be called @{ML_text lhs}
  (not @{ML_text lhs_tm}). Or a term that is known to be a variable can be
  called @{ML_text v} or @{ML_text x}.

  \<^item> Tactics (\secref{sec:tactics}) are sufficiently important to have specific
  naming conventions. The name of a basic tactic definition always has a
  @{ML_text "_tac"} suffix, the subgoal index (if applicable) is always called
  @{ML_text i}, and the goal state (if made explicit) is usually called
  @{ML_text st} instead of the somewhat misleading @{ML_text thm}. Any other
  arguments are given before the latter two, and the general context is given
  first. Example:

  @{verbatim [display] \<open>  fun my_tac ctxt arg1 arg2 i st = ...\<close>}

  Note that the goal state @{ML_text st} above is rarely made explicit, if
  tactic combinators (tacticals) are used as usual.

  A tactic that requires a proof context needs to make that explicit as seen
  in the \<^verbatim>\<open>ctxt\<close> argument above. Do not refer to the background theory of
  \<^verbatim>\<open>st\<close> -- it is not a proper context, but merely a formal certificate.
\<close>


subsection \<open>General source layout\<close>

text \<open>
  The general Isabelle/ML source layout imitates regular type-setting
  conventions, augmented by the requirements for deeply nested expressions
  that are commonplace in functional programming.
\<close>

paragraph \<open>Line length\<close>
text \<open>
  is limited to 80 characters according to ancient standards, but we allow as
  much as 100 characters (not more).\<^footnote>\<open>Readability requires keeping the
  beginning of a line in view while watching its end. Modern wide-screen
  displays do not change the way the human brain works. Sources also need to
  be printable on plain paper with reasonable font size.\<close> The extra 20
  characters acknowledge the space requirements due to qualified library
  references in Isabelle/ML.
\<close>

paragraph \<open>White-space\<close>
text \<open>
  is used to emphasize the structure of expressions, following mostly standard
  conventions for mathematical typesetting, as can be seen in plain {\TeX} or
  {\LaTeX}. This defines positioning of spaces for parentheses, punctuation,
  and infixes as illustrated here:

  @{verbatim [display]
\<open>  val x = y + z * (a + b);
  val pair = (a, b);
  val record = {foo = 1, bar = 2};\<close>}

  Lines are normally broken \<^emph>\<open>after\<close> an infix operator or punctuation
  character. For example:

  @{verbatim [display]
\<open>
  val x =
    a +
    b +
    c;

  val tuple =
   (a,
    b,
    c);
\<close>}

  Some special infixes (e.g.\ @{ML_text "|>"}) work better at the start of the
  line, but punctuation is always at the end.

  Function application follows the tradition of \<open>\<lambda>\<close>-calculus, not informal
  mathematics. For example: @{ML_text "f a b"} for a curried function, or
  @{ML_text "g (a, b)"} for a tupled function. Note that the space between
  @{ML_text g} and the pair @{ML_text "(a, b)"} follows the important
  principle of \<^emph>\<open>compositionality\<close>: the layout of @{ML_text "g p"} does not
  change when @{ML_text "p"} is refined to the concrete pair @{ML_text "(a,
  b)"}.
\<close>

paragraph \<open>Indentation\<close>
text \<open>
  uses plain spaces, never hard tabulators.\<^footnote>\<open>Tabulators were invented to move
  the carriage of a type-writer to certain predefined positions. In software
  they could be used as a primitive run-length compression of consecutive
  spaces, but the precise result would depend on non-standardized text editor
  configuration.\<close>

  Each level of nesting is indented by 2 spaces, sometimes 1, very rarely 4,
  never 8 or any other odd number.

  Indentation follows a simple logical format that only depends on the nesting
  depth, not the accidental length of the text that initiates a level of
  nesting. Example:

  @{verbatim [display]
\<open>  (* RIGHT *)

  if b then
    expr1_part1
    expr1_part2
  else
    expr2_part1
    expr2_part2


  (* WRONG *)

  if b then expr1_part1
            expr1_part2
  else expr2_part1
       expr2_part2\<close>}

  The second form has many problems: it assumes a fixed-width font when
  viewing the sources, it uses more space on the line and thus makes it hard
  to observe its strict length limit (working against \<^emph>\<open>readability\<close>), it
  requires extra editing to adapt the layout to changes of the initial text
  (working against \<^emph>\<open>maintainability\<close>) etc.

  \<^medskip>
  For similar reasons, any kind of two-dimensional or tabular layouts,
  ASCII-art with lines or boxes of asterisks etc.\ should be avoided.
\<close>

paragraph \<open>Complex expressions\<close>
text \<open>
  that consist of multi-clausal function definitions, @{ML_text handle},
  @{ML_text case}, @{ML_text let} (and combinations) require special
  attention. The syntax of Standard ML is quite ambitious and admits a lot of
  variance that can distort the meaning of the text.

  Multiple clauses of @{ML_text fun}, @{ML_text fn}, @{ML_text handle},
  @{ML_text case} get extra indentation to indicate the nesting clearly.
  Example:

  @{verbatim [display]
\<open>  (* RIGHT *)

  fun foo p1 =
        expr1
    | foo p2 =
        expr2


  (* WRONG *)

  fun foo p1 =
    expr1
    | foo p2 =
    expr2\<close>}

  Body expressions consisting of @{ML_text case} or @{ML_text let} require
  care to maintain compositionality, to prevent loss of logical indentation
  where it is especially important to see the structure of the text. Example:

  @{verbatim [display]
\<open>  (* RIGHT *)

  fun foo p1 =
        (case e of
          q1 => ...
        | q2 => ...)
    | foo p2 =
        let
          ...
        in
          ...
        end


  (* WRONG *)

  fun foo p1 = case e of
      q1 => ...
    | q2 => ...
    | foo p2 =
    let
      ...
    in
      ...
    end\<close>}

  Extra parentheses around @{ML_text case} expressions are optional, but help
  to analyse the nesting based on character matching in the text editor.

  \<^medskip>
  There are two main exceptions to the overall principle of compositionality
  in the layout of complex expressions.

  \<^enum> @{ML_text "if"} expressions are iterated as if ML had multi-branch
  conditionals, e.g.

  @{verbatim [display]
\<open>  (* RIGHT *)

  if b1 then e1
  else if b2 then e2
  else e3\<close>}

  \<^enum> @{ML_text fn} abstractions are often laid out as if they lacked any
  structure by themselves. This traditional form is motivated by the
  possibility to shift function arguments back and forth wrt.\ additional
  combinators. Example:

  @{verbatim [display]
\<open>  (* RIGHT *)

  fun foo x y = fold (fn z =>
    expr)\<close>}

  Here the visual appearance is that of three arguments @{ML_text x},
  @{ML_text y}, @{ML_text z} in a row.


  Such weakly structured layout should be used with great care. Here are some
  counter-examples involving @{ML_text let} expressions:

  @{verbatim [display]
\<open>  (* WRONG *)

  fun foo x = let
      val y = ...
    in ... end


  (* WRONG *)

  fun foo x = let
    val y = ...
  in ... end


  (* WRONG *)

  fun foo x =
  let
    val y = ...
  in ... end


  (* WRONG *)

  fun foo x =
    let
      val y = ...
    in
      ... end\<close>}

  \<^medskip>
  In general the source layout is meant to emphasize the structure of complex
  language expressions, not to pretend that SML had a completely different
  syntax (say that of Haskell, Scala, Java).
\<close>


section \<open>ML embedded into Isabelle/Isar\<close>

text \<open>
  ML and Isar are intertwined via an open-ended bootstrap process that
  provides more and more programming facilities and logical content in an
  alternating manner. Bootstrapping starts from the raw environment of
  existing implementations of Standard ML (mainly Poly/ML).

  Isabelle/Pure marks the point where the raw ML toplevel is superseded by
  Isabelle/ML within the Isar theory and proof language, with a uniform
  context for arbitrary ML values (see also \secref{sec:context}). This formal
  environment holds ML compiler bindings, logical entities, and many other
  things.

  Object-logics like Isabelle/HOL are built within the Isabelle/ML/Isar
  environment by introducing suitable theories with associated ML modules,
  either inlined within \<^verbatim>\<open>.thy\<close> files, or as separate \<^verbatim>\<open>.ML\<close> files that are
  loaded from some theory. Thus Isabelle/HOL is defined as a regular
  user-space application within the Isabelle framework. Further add-on tools
  can be implemented in ML within the Isar context in the same manner: ML is
  part of the standard repertoire of Isabelle, and there is no distinction
  between ``users'' and ``developers'' in this respect.
\<close>


subsection \<open>Isar ML commands\<close>

text \<open>
  The primary Isar source language provides facilities to ``open a window'' to
  the underlying ML compiler. Especially see the Isar commands @{command_ref
  "ML_file"} and @{command_ref "ML"}: both work the same way, but the source
  text is provided differently, via a file vs.\ inlined, respectively. Apart
  from embedding ML into the main theory definition like that, there are many
  more commands that refer to ML source, such as @{command_ref setup} or
  @{command_ref declaration}. Even more fine-grained embedding of ML into Isar
  is encountered in the proof method @{method_ref tactic}, which refines the
  pending goal state via a given expression of type @{ML_type tactic}.
\<close>

text %mlex \<open>
  The following artificial example demonstrates some ML toplevel declarations
  within the implicit Isar theory context. This is regular functional
  programming without referring to logical entities yet.
\<close>

ML \<open>
  fun factorial 0 = 1
    | factorial n = n * factorial (n - 1)
\<close>

text \<open>
  Here the ML environment is already managed by Isabelle, i.e.\ the @{ML
  factorial} function is not yet accessible in the preceding paragraph, nor in
  a different theory that is independent from the current one in the import
  hierarchy.

  Removing the above ML declaration from the source text will remove any trace
  of this definition, as expected. The Isabelle/ML toplevel environment is
  managed in a \<^emph>\<open>stateless\<close> way: in contrast to the raw ML toplevel, there are
  no global side-effects involved here.\<^footnote>\<open>Such a stateless compilation
  environment is also a prerequisite for robust parallel compilation within
  independent nodes of the implicit theory development graph.\<close>

  \<^medskip>
  The next example shows how to embed ML into Isar proofs, using @{command_ref
  "ML_prf"} instead of @{command_ref "ML"}. As illustrated below, the effect
  on the ML environment is local to the whole proof body, while ignoring the
  Isar block structure.
\<close>

notepad
begin
  ML_prf %"ML" \<open>val a = 1\<close>
  {
    ML_prf %"ML" \<open>val b = a + 1\<close>
  } \<comment> \<open>Isar block structure ignored by ML environment\<close>
  ML_prf %"ML" \<open>val c = b + 1\<close>
end

text \<open>
  By side-stepping the normal scoping rules for Isar proof blocks, embedded ML
  code can refer to the different contexts and manipulate corresponding
  entities, e.g.\ export a fact from a block context.

  \<^medskip>
  Two further ML commands are useful in certain situations: @{command_ref
  ML_val} and @{command_ref ML_command} are \<^emph>\<open>diagnostic\<close> in the sense that
  there is no effect on the underlying environment, and can thus be used
  anywhere. The examples below produce long strings of digits by invoking @{ML
  factorial}: @{command ML_val} takes care of printing the ML toplevel result,
  but @{command ML_command} is silent so we produce an explicit output
  message.
\<close>

ML_val \<open>factorial 100\<close>
ML_command \<open>writeln (string_of_int (factorial 100))\<close>

notepad
begin
  ML_val \<open>factorial 100\<close>
  ML_command \<open>writeln (string_of_int (factorial 100))\<close>
end


subsection \<open>Compile-time context\<close>

text \<open>
  Whenever the ML compiler is invoked within Isabelle/Isar, the formal context
  is passed as a thread-local reference variable. Thus ML code may access the
  theory context during compilation, by reading or writing the (local) theory
  under construction. Note that such direct access to the compile-time context
  is rare. In practice it is typically done via some derived ML functions
  instead.
\<close>

text %mlref \<open>
  \begin{mldecls}
  @{index_ML Context.the_generic_context: "unit -> Context.generic"} \\
  @{index_ML "Context.>>": "(Context.generic -> Context.generic) -> unit"} \\
  @{index_ML ML_Thms.bind_thms: "string * thm list -> unit"} \\
  @{index_ML ML_Thms.bind_thm: "string * thm -> unit"} \\
  \end{mldecls}

    \<^descr> @{ML "Context.the_generic_context ()"} refers to the theory context of
    the ML toplevel --- at compile time. ML code needs to take care to refer to
    @{ML "Context.the_generic_context ()"} correctly. Recall that evaluation
    of a function body is delayed until actual run-time.

    \<^descr> @{ML "Context.>>"}~\<open>f\<close> applies context transformation \<open>f\<close> to the implicit
    context of the ML toplevel.

    \<^descr> @{ML ML_Thms.bind_thms}~\<open>(name, thms)\<close> stores a list of theorems produced
    in ML both in the (global) theory context and the ML toplevel, associating
    it with the provided name.

    \<^descr> @{ML ML_Thms.bind_thm} is similar to @{ML ML_Thms.bind_thms} but refers to
    a singleton fact.

  It is important to note that the above functions are really restricted to
  the compile time, even though the ML compiler is invoked at run-time. The
  majority of ML code either uses static antiquotations
  (\secref{sec:ML-antiq}) or refers to the theory or proof context at
  run-time, by explicit functional abstraction.
\<close>
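
text %mlex \<open>
  The following sketch (with freely invented names) illustrates the
  compile-time nature of this context access: the first binding captures the
  generic context while the ML block is being compiled, whereas the second
  merely wraps the access in a function body, so it refers to whatever context
  happens to be implicit when that function is eventually applied at run-time.
\<close>

ML \<open>
  (*captured once, at compile time of this ML block*)
  val static_context = Context.the_generic_context ();

  (*accesses the implicit context only when applied later at run-time*)
  fun dynamic_context () = Context.the_generic_context ();
\<close>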


subsection \<open>Antiquotations \label{sec:ML-antiq}\<close>

text \<open>
  A very important consequence of embedding ML into Isar is the concept of
  \<^emph>\<open>ML antiquotation\<close>. The standard token language of ML is augmented by
  special syntactic entities of the following form:

  @{rail \<open>
  @{syntax_def antiquote}: '@{' name args '}'
  \<close>}

  Here @{syntax name} and @{syntax args} are outer syntax categories, as
  defined in @{cite "isabelle-isar-ref"}.

  \<^medskip>
  A regular antiquotation \<open>@{name args}\<close> processes its arguments by the usual
  means of the Isar source language, and produces corresponding ML source
  text, either as literal \<^emph>\<open>inline\<close> text (e.g.\ \<open>@{term t}\<close>) or abstract
  \<^emph>\<open>value\<close> (e.g.\ \<open>@{thm th}\<close>). This pre-compilation scheme makes it possible
  to refer to formal entities in a robust manner, with proper static scoping
  and with some degree of logical checking of small portions of the code.
\<close>
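
text %mlex \<open>
  As a minimal sketch, the following diagnostic command uses two such
  antiquotations: \<open>@{context}\<close> yields the compile-time proof context as an
  abstract value, and \<open>@{thm reflexive}\<close> refers to the Pure theorem
  \<open>reflexive\<close> by name; both are resolved statically, when the block is
  compiled.
\<close>

ML_val \<open>
  val ctxt = @{context};        (*compile-time context as an ML value*)
  val th = @{thm reflexive};    (*a thm value, checked at compile time*)
\<close>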


subsection \<open>Printing ML values\<close>

text \<open>
  The ML compiler knows about the structure of values according to their
  static type, and can print them in the manner of its toplevel, although the
  details are non-portable. The antiquotations @{ML_antiquotation_def
  "make_string"} and @{ML_antiquotation_def "print"} provide a quasi-portable
  way to refer to this potential capability of the underlying ML system in
  generic Isabelle/ML sources.

  This is occasionally useful for diagnostic or demonstration purposes. Note
  that production-quality tools require proper user-level error messages,
  avoiding raw ML values in the output.
\<close>

text %mlantiq \<open>
  \begin{matharray}{rcl}
  @{ML_antiquotation_def "make_string"} & : & \<open>ML_antiquotation\<close> \\
  @{ML_antiquotation_def "print"} & : & \<open>ML_antiquotation\<close> \\
  \end{matharray}

  @{rail \<open>
  @@{ML_antiquotation make_string}
  ;
  @@{ML_antiquotation print} @{syntax name}?
  \<close>}

  \<^descr> \<open>@{make_string}\<close> inlines a function to print arbitrary values similar to
  the ML toplevel. The result is compiler dependent and may fall back on "?"
  in certain situations. The value of configuration option @{attribute_ref
  ML_print_depth} determines further details of output.

  \<^descr> \<open>@{print f}\<close> uses the ML function \<open>f: string -> unit\<close> to output the result
  of \<open>@{make_string}\<close> above, together with the source position of the
  antiquotation. The default output function is @{ML writeln}.
\<close>

text %mlex \<open>
  The following artificial examples show how to produce adhoc output of ML
  values for debugging purposes.
\<close>

ML_val \<open>
  val x = 42;
  val y = true;

  writeln (@{make_string} {x = x, y = y});

  @{print} {x = x, y = y};
  @{print tracing} {x = x, y = y};
\<close>


section \<open>Canonical argument order \label{sec:canonical-argument-order}\<close>

text \<open>
  Standard ML is a language in the tradition of \<open>\<lambda>\<close>-calculus and
  \<^emph>\<open>higher-order functional programming\<close>, similar to OCaml, Haskell, or
  Isabelle/Pure and HOL as logical languages. Getting acquainted with the
  native style of representing functions in that setting can save a lot of
  extra boiler-plate of redundant shuffling of arguments, auxiliary
  abstractions etc.

  Functions are usually \<^emph>\<open>curried\<close>: the idea of turning arguments of type
  \<open>\<tau>\<^sub>i\<close> (for \<open>i \<in> {1, \<dots> n}\<close>) into a result of type \<open>\<tau>\<close> is represented by the
  iterated function space \<open>\<tau>\<^sub>1 \<rightarrow> \<dots> \<rightarrow> \<tau>\<^sub>n \<rightarrow> \<tau>\<close>. This is isomorphic to the
  well-known encoding via tuples \<open>\<tau>\<^sub>1 \<times> \<dots> \<times> \<tau>\<^sub>n \<rightarrow> \<tau>\<close>, but the curried version
  fits more smoothly into the basic calculus.\<^footnote>\<open>The difference is even more
  significant in HOL, because the redundant tuple structure needs to be
  accommodated by extraneous proof steps.\<close>

  Currying gives some flexibility due to \<^emph>\<open>partial application\<close>. A function
  \<open>f: \<tau>\<^sub>1 \<rightarrow> \<tau>\<^sub>2 \<rightarrow> \<tau>\<close> can be applied to \<open>x: \<tau>\<^sub>1\<close> and the remaining \<open>(f x): \<tau>\<^sub>2
  \<rightarrow> \<tau>\<close> passed to another function etc. How well this works in practice depends
  on the order of arguments. In the worst case, arguments are arranged
  erratically, and using a function in a certain situation always requires
  some glue code. Thus we would get exponentially many opportunities to
  decorate the code with meaningless permutations of arguments.

  This can be avoided by \<^emph>\<open>canonical argument order\<close>, which observes certain
  standard patterns and minimizes adhoc permutations in their application. In
  Isabelle/ML, large portions of text can be written without auxiliary
  operations like \<open>swap: \<alpha> \<times> \<beta> \<rightarrow> \<beta> \<times> \<alpha>\<close> or \<open>C: (\<alpha> \<rightarrow> \<beta> \<rightarrow> \<gamma>) \<rightarrow> (\<beta> \<rightarrow> \<alpha> \<rightarrow> \<gamma>)\<close> (the
  latter is not present in the Isabelle/ML library).

  \<^medskip>
  The main idea is that arguments that vary less are moved further to the left
  than those that vary more. Two particularly important categories of
  functions are \<^emph>\<open>selectors\<close> and \<^emph>\<open>updates\<close>.

  The subsequent scheme is based on a hypothetical set-like container of type
  \<open>\<beta>\<close> that manages elements of type \<open>\<alpha>\<close>. Both the names and types of the
  associated operations are canonical for Isabelle/ML.

  \begin{center}
  \begin{tabular}{ll}
  kind & canonical name and type \\\hline
  selector & \<open>member: \<beta> \<rightarrow> \<alpha> \<rightarrow> bool\<close> \\
  update & \<open>insert: \<alpha> \<rightarrow> \<beta> \<rightarrow> \<beta>\<close> \\
  \end{tabular}
  \end{center}

  Given a container \<open>B: \<beta>\<close>, the partially applied \<open>member B\<close> is a predicate
  over elements \<open>\<alpha> \<rightarrow> bool\<close>, and thus represents the intended denotation
  directly. It is customary to pass the abstract predicate to further
  operations, not the concrete container. The argument order makes it easy to
  use other combinators: \<open>forall (member B) list\<close> will check a list of
  elements for membership in \<open>B\<close> etc. Often the explicit \<open>list\<close> is pointless
  and can be contracted to \<open>forall (member B)\<close> to get directly a predicate
  again.

  In contrast, an update operation varies the container, so it moves to the
  right: \<open>insert a\<close> is a function \<open>\<beta> \<rightarrow> \<beta>\<close> to insert a value \<open>a\<close>. These can be
  composed naturally as \<open>insert c \<circ> insert b \<circ> insert a\<close>. The slightly awkward
  inversion of the composition order is due to conventional mathematical
  notation, which can be easily amended as explained below.
\<close>


subsection \<open>Forward application and composition\<close>

text \<open>
  Regular function application and infix notation works best for relatively
  deeply structured expressions, e.g.\ \<open>h (f x y + g z)\<close>. The important
  special case of \<^emph>\<open>linear transformation\<close> applies a cascade of functions \<open>f\<^sub>n
  (\<dots> (f\<^sub>1 x))\<close>. This becomes hard to read and maintain if the functions are
  themselves given as complex expressions. The notation can be significantly
  improved by introducing \<^emph>\<open>forward\<close> versions of application and composition
  as follows:

  \<^medskip>
  \begin{tabular}{lll}
  \<open>x |> f\<close> & \<open>\<equiv>\<close> & \<open>f x\<close> \\
  \<open>(f #> g) x\<close> & \<open>\<equiv>\<close> & \<open>x |> f |> g\<close> \\
  \end{tabular}
  \<^medskip>

  This makes it possible to write \<open>x |> f\<^sub>1 |> \<dots> |> f\<^sub>n\<close> conveniently, or
  \<open>f\<^sub>1 #> \<dots> #> f\<^sub>n\<close> for its functional abstraction over \<open>x\<close>.

  \<^medskip>
  There is an additional set of combinators to accommodate multiple results
  (via pairs) that are passed on as multiple arguments (via currying).

  \<^medskip>
  \begin{tabular}{lll}
  \<open>(x, y) |-> f\<close> & \<open>\<equiv>\<close> & \<open>f x y\<close> \\
  \<open>(f #-> g) x\<close> & \<open>\<equiv>\<close> & \<open>x |> f |-> g\<close> \\
  \end{tabular}
  \<^medskip>
\<close>

text %mlref \<open>
  \begin{mldecls}
  @{index_ML_op "|> ": "'a * ('a -> 'b) -> 'b"} \\
  @{index_ML_op "|-> ": "('c * 'a) * ('c -> 'a -> 'b) -> 'b"} \\
  @{index_ML_op "#> ": "('a -> 'b) * ('b -> 'c) -> 'a -> 'c"} \\
  @{index_ML_op "#-> ": "('a -> 'c * 'b) * ('c -> 'b -> 'd) -> 'a -> 'd"} \\
  \end{mldecls}
\<close>
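
text %mlex \<open>
  The following sketch instantiates the hypothetical set-like container of
  \secref{sec:canonical-argument-order} naively by unordered lists, merely to
  illustrate how canonical argument order combines with the forward
  combinators; the bindings are local to this diagnostic command and shadow
  the library operations of the same names only there.
\<close>

ML_val \<open>
  (*selector: container -> element -> bool*)
  fun member B x = List.exists (fn y => y = x) B;

  (*update: element -> container -> container*)
  fun insert x B = if member B x then B else x :: B;

  val B = [] |> insert 1 |> insert 2 |> insert 2 |> insert 3;

  @{assert} (forall (member B) [1, 2, 3]);
\<close>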


subsection \<open>Canonical iteration\<close>

text \<open>
  As explained above, a function \<open>f: \<alpha> \<rightarrow> \<beta> \<rightarrow> \<beta>\<close> can be understood as update on
  a configuration of type \<open>\<beta>\<close>, parameterized by an argument of type \<open>\<alpha>\<close>. Given
  \<open>a: \<alpha>\<close> the partial application \<open>(f a): \<beta> \<rightarrow> \<beta>\<close> operates homogeneously on \<open>\<beta>\<close>.
  This can be iterated naturally over a list of parameters \<open>[a\<^sub>1, \<dots>, a\<^sub>n]\<close> as
  \<open>f a\<^sub>1 #> \<dots> #> f a\<^sub>n\<close>. The latter expression is again a function \<open>\<beta> \<rightarrow> \<beta>\<close>. It
  can be applied to an initial configuration \<open>b: \<beta>\<close> to start the iteration
  over the given list of arguments: each \<open>a\<close> in \<open>a\<^sub>1, \<dots>, a\<^sub>n\<close> is applied
  consecutively by updating a cumulative configuration.

  The \<open>fold\<close> combinator in Isabelle/ML lifts a function \<open>f\<close> as above to its
  iterated version over a list of arguments. Lifting can be repeated, e.g.\
  \<open>(fold \<circ> fold) f\<close> iterates over a list of lists as expected.

  The variant \<open>fold_rev\<close> works inside-out over the list of arguments, such
  that \<open>fold_rev f \<equiv> fold f \<circ> rev\<close> holds.

  The \<open>fold_map\<close> combinator essentially performs \<open>fold\<close> and \<open>map\<close>
  simultaneously: each application of \<open>f\<close> produces an updated configuration
  together with a side-result; the iteration collects all such side-results as
  a separate list.
\<close>

text %mlref \<open>
  \begin{mldecls}
  @{index_ML fold: "('a -> 'b -> 'b) -> 'a list -> 'b -> 'b"} \\
  @{index_ML fold_rev: "('a -> 'b -> 'b) -> 'a list -> 'b -> 'b"} \\
  @{index_ML fold_map: "('a -> 'b -> 'c * 'b) -> 'a list -> 'b -> 'c list * 'b"} \\
  \end{mldecls}

  \<^descr> @{ML fold}~\<open>f\<close> lifts the parametrized update function \<open>f\<close> to a list of
  parameters.

  \<^descr> @{ML fold_rev}~\<open>f\<close> is similar to @{ML fold}~\<open>f\<close>, but works inside-out, as
  if the list were reversed.

  \<^descr> @{ML fold_map}~\<open>f\<close> lifts the parametrized update function \<open>f\<close> (with
  side-result) to a list of parameters and cumulative side-results.


  \begin{warn}
  The literature on functional programming provides a confusing multitude of
  combinators called \<open>foldl\<close>, \<open>foldr\<close> etc. SML97 provides its own variations
  as @{ML List.foldl} and @{ML List.foldr}, while the classic Isabelle library
  also has the historic @{ML Library.foldl} and @{ML Library.foldr}. To avoid
  unnecessary complication, all these historical versions should be ignored,
  and the canonical @{ML fold} (or @{ML fold_rev}) used exclusively.
  \end{warn}
\<close>
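
text %mlex \<open>
  As a small illustration of @{ML fold_map} (a sketch with made-up data), the
  following snippet numbers the elements of a list while threading a counter
  through the iteration; the final counter is returned alongside the new list.
\<close>

ML_val \<open>
  val (lines, next) =
    fold_map (fn s => fn i => (string_of_int i ^ ": " ^ s, i + 1))
      ["foo", "bar", "baz"] 1;

  @{assert} (lines = ["1: foo", "2: bar", "3: baz"] andalso next = 4);
\<close>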
wenzelm@58618
   888
wenzelm@61854
   889
text %mlex \<open>
wenzelm@61854
   890
  The following example shows how to fill a text buffer incrementally by
wenzelm@61854
   891
  adding strings, either individually or from a given list.
wenzelm@58618
   892
\<close>
wenzelm@58618
   893
wenzelm@59902
   894
ML_val \<open>
wenzelm@39883
   895
  val s =
wenzelm@39883
   896
    Buffer.empty
wenzelm@39883
   897
    |> Buffer.add "digits: "
wenzelm@39883
   898
    |> fold (Buffer.add o string_of_int) (0 upto 9)
wenzelm@39883
   899
    |> Buffer.content;
wenzelm@39883
   900
wenzelm@39883
   901
  @{assert} (s = "digits: 0123456789");
wenzelm@58618
   902
\<close>
wenzelm@58618
   903
wenzelm@61854
   904
text \<open>
wenzelm@61854
   905
  Note how @{ML "fold (Buffer.add o string_of_int)"} above saves an extra @{ML
wenzelm@61854
   906
  "map"} over the given list. This kind of peephole optimization reduces both
wenzelm@61854
   907
  the code size and the tree structures in memory (``deforestation''), but it
wenzelm@61854
   908
  requires some practice to read and write fluently.
wenzelm@39883
   909
wenzelm@61416
   910
  \<^medskip>
wenzelm@61854
   911
  The next example elaborates the idea of canonical iteration, demonstrating
wenzelm@61854
   912
  fast accumulation of tree content using a text buffer.
wenzelm@58618
   913
\<close>
wenzelm@58618
   914
wenzelm@58618
   915
ML \<open>
wenzelm@39883
   916
  datatype tree = Text of string | Elem of string * tree list;
wenzelm@39883
   917
wenzelm@39883
   918
  fun slow_content (Text txt) = txt
wenzelm@39883
   919
    | slow_content (Elem (name, ts)) =
wenzelm@39883
   920
        "<" ^ name ^ ">" ^
wenzelm@39883
   921
        implode (map slow_content ts) ^
wenzelm@39883
   922
        "</" ^ name ^ ">"
wenzelm@39883
   923
wenzelm@39883
   924
  fun add_content (Text txt) = Buffer.add txt
wenzelm@39883
   925
    | add_content (Elem (name, ts)) =
wenzelm@39883
   926
        Buffer.add ("<" ^ name ^ ">") #>
wenzelm@39883
   927
        fold add_content ts #>
wenzelm@39883
   928
        Buffer.add ("</" ^ name ^ ">");
wenzelm@39883
   929
wenzelm@39883
   930
  fun fast_content tree =
wenzelm@39883
   931
    Buffer.empty |> add_content tree |> Buffer.content;
wenzelm@58618
   932
\<close>
wenzelm@58618
   933
wenzelm@61854
   934
text \<open>
wenzelm@61854
   935
  The slowness of @{ML slow_content} is due to the @{ML implode} of the
wenzelm@61854
   936
  recursive results, because it copies previously produced strings again and
wenzelm@61854
   937
  again.
wenzelm@39883
   938
wenzelm@61854
   939
  The incremental @{ML add_content} avoids this by operating on a buffer that
wenzelm@61854
   940
  is passed through in a linear fashion. Using @{ML_text "#>"} and contraction
wenzelm@61854
   941
  over the actual buffer argument saves some additional boiler-plate. Of
wenzelm@61854
   942
  course, the two @{ML "Buffer.add"} invocations with concatenated strings
wenzelm@61854
   943
  could have been split into smaller parts, but this would have obfuscated the
wenzelm@61854
   944
  source without making a big difference in performance. Here we have done
wenzelm@61854
   945
  some peephole-optimization for the sake of readability.
wenzelm@39883
   946
wenzelm@61854
   947
  Another benefit of @{ML add_content} is its ``open'' form as a function on
wenzelm@61854
   948
  buffers that can be continued in further linear transformations, folding
wenzelm@61854
   949
  etc. Thus it is more compositional than the naive @{ML slow_content}. As
wenzelm@61854
   950
  realistic example, compare the old-style @{ML "Term.maxidx_of_term: term ->
wenzelm@61854
   951
  int"} with the newer @{ML "Term.maxidx_term: term -> int -> int"} in
wenzelm@61854
   952
  Isabelle/Pure.
wenzelm@39883
   953
wenzelm@61854
   954
  Note that @{ML fast_content} above is only defined as an example. In many
wenzelm@61854
   955
  practical situations, it is customary to provide the incremental @{ML
wenzelm@61854
   956
  add_content} only and leave the initialization and termination to the
wenzelm@61854
   957
  concrete application by the user.
wenzelm@58618
   958
\<close>
wenzelm@58618
   959
wenzelm@58618
   960
wenzelm@58618
   961
section \<open>Message output channels \label{sec:message-channels}\<close>
wenzelm@58618
   962
wenzelm@61854
   963
text \<open>
wenzelm@61854
   964
  Isabelle provides output channels for different kinds of messages: regular
wenzelm@61854
   965
  output, high-volume tracing information, warnings, and errors.
wenzelm@39835
   966
wenzelm@61854
   967
  Depending on the user interface involved, these messages may appear in
wenzelm@61854
   968
  different text styles or colours. The standard output for batch sessions
wenzelm@61854
   969
  prefixes each line of warnings by \<^verbatim>\<open>###\<close> and errors by \<^verbatim>\<open>***\<close>, but leaves
wenzelm@61854
   970
  anything else unchanged. The message body may contain further markup and
wenzelm@61854
   971
  formatting, which is routinely used in the Prover IDE @{cite
wenzelm@61854
   972
  "isabelle-jedit"}.
wenzelm@39835
   973
wenzelm@61854
   974
  Messages are associated with the transaction context of the running Isar
wenzelm@61854
   975
  command. This enables the front-end to manage commands and resulting
wenzelm@61854
   976
  messages together. For example, after deleting a command from a given theory
wenzelm@61854
   977
  document version, the corresponding message output can be retracted from the
wenzelm@61854
   978
  display.
wenzelm@58618
   979
\<close>
wenzelm@58618
   980
wenzelm@58618
   981
text %mlref \<open>
wenzelm@39835
   982
  \begin{mldecls}
wenzelm@39835
   983
  @{index_ML writeln: "string -> unit"} \\
wenzelm@39835
   984
  @{index_ML tracing: "string -> unit"} \\
wenzelm@39835
   985
  @{index_ML warning: "string -> unit"} \\
wenzelm@57421
   986
  @{index_ML error: "string -> 'a"} % FIXME Output.error_message (!?) \\
wenzelm@39835
   987
  \end{mldecls}
wenzelm@39835
   988
wenzelm@61854
   989
  \<^descr> @{ML writeln}~\<open>text\<close> outputs \<open>text\<close> as regular message. This is the
wenzelm@61854
   990
  primary message output operation of Isabelle and should be used by default.
wenzelm@39835
   991
wenzelm@61854
   992
  \<^descr> @{ML tracing}~\<open>text\<close> outputs \<open>text\<close> as special tracing message, indicating
wenzelm@61854
   993
  potential high-volume output to the front-end (hundreds or thousands of
wenzelm@61854
   994
  messages issued by a single command). The idea is to allow the
wenzelm@61854
   995
  user-interface to downgrade the quality of message display to achieve higher
wenzelm@61854
   996
  throughput.
wenzelm@39835
   997
wenzelm@61854
   998
  Note that the user might have to take special actions to see tracing output,
wenzelm@61854
   999
  e.g.\ switch to a different output window. So this channel should not be
wenzelm@61854
  1000
  used for regular output.
wenzelm@39835
  1001
wenzelm@61854
  1002
  \<^descr> @{ML warning}~\<open>text\<close> outputs \<open>text\<close> as warning, which typically means some
wenzelm@61854
  1003
  extra emphasis on the front-end side (color highlighting, icons, etc.).
wenzelm@39835
  1004
wenzelm@61854
  1005
  \<^descr> @{ML error}~\<open>text\<close> raises exception @{ML ERROR}~\<open>text\<close> and thus lets the
wenzelm@61854
  1006
  Isar toplevel print \<open>text\<close> on the error channel, which typically means some
wenzelm@61854
  1007
  extra emphasis on the front-end side (color highlighting, icons, etc.).
wenzelm@39835
  1008
wenzelm@39835
  1009
  This assumes that the exception is not handled before the command
wenzelm@61854
  1010
  terminates. Handling exception @{ML ERROR}~\<open>text\<close> is a perfectly legal
wenzelm@61854
  1011
  alternative: it means that the error is absorbed without any message output.
wenzelm@39835
  1012
wenzelm@39861
  1013
  \begin{warn}
wenzelm@54387
  1014
  The actual error channel is accessed via @{ML Output.error_message}, but
wenzelm@58842
  1015
  this is normally not used directly in user code.
wenzelm@39861
  1016
  \end{warn}
wenzelm@39835
  1017
wenzelm@39861
  1018
wenzelm@39861
  1019
  \begin{warn}
wenzelm@61854
  1020
  Regular Isabelle/ML code should output messages exclusively by the official
wenzelm@61854
  1021
  channels. Using raw I/O on \<^emph>\<open>stdout\<close> or \<^emph>\<open>stderr\<close> instead (e.g.\ via @{ML
wenzelm@61854
  1022
  TextIO.output}) is apt to cause problems in the presence of parallel and
wenzelm@61854
  1023
  asynchronous processing of Isabelle theories. Such raw output might be
wenzelm@61854
  1024
  displayed by the front-end in some system console log, with a low chance
wenzelm@61854
  1025
  that the user will ever see it. Moreover, as a genuine side-effect on global
wenzelm@61854
  1026
  process channels, there is no proper way to retract output when Isar command
wenzelm@40126
  1027
  transactions are reset by the system.
wenzelm@39861
  1028
  \end{warn}
wenzelm@39872
  1029
wenzelm@39872
  1030
  \begin{warn}
wenzelm@61854
  1031
  The message channels should be used in a message-oriented manner. This means
wenzelm@61854
  1032
  that multi-line output that logically belongs together is issued by a single
wenzelm@61854
  1033
  invocation of @{ML writeln} etc.\ with the functional concatenation of all
wenzelm@61854
  1034
  message constituents.
wenzelm@39872
  1035
  \end{warn}
wenzelm@58618
  1036
\<close>
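
text %mlex \<open>
  The following small sketch illustrates how handling @{ML ERROR} absorbs a
  user error without any message output, as explained above.
\<close>

ML_val \<open>
  val result = (error "Bad input" handle ERROR _ => "absorbed");
  @{assert} (result = "absorbed");
\<close>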
wenzelm@58618
  1037
wenzelm@61854
  1038
text %mlex \<open>
wenzelm@61854
  1039
  The following example demonstrates a multi-line warning. Note that in some
wenzelm@61854
  1040
  situations the user sees only the first line, so the most important point
wenzelm@61854
  1041
  should be made first.
wenzelm@58618
  1042
\<close>
wenzelm@58618
  1043
wenzelm@58618
  1044
ML_command \<open>
wenzelm@39872
  1045
  warning (cat_lines
wenzelm@39872
  1046
   ["Beware the Jabberwock, my son!",
wenzelm@39872
  1047
    "The jaws that bite, the claws that catch!",
wenzelm@39872
  1048
    "Beware the Jubjub Bird, and shun",
wenzelm@39872
  1049
    "The frumious Bandersnatch!"]);
wenzelm@58618
  1050
\<close>
wenzelm@58618
  1051
wenzelm@59902
  1052
text \<open>
wenzelm@61416
  1053
  \<^medskip>
wenzelm@61854
  1054
  An alternative is to make a paragraph of freely-floating words as follows.
wenzelm@59902
  1055
\<close>
wenzelm@59902
  1056
wenzelm@59902
  1057
ML_command \<open>
wenzelm@59902
  1058
  warning (Pretty.string_of (Pretty.para
wenzelm@59902
  1059
    "Beware the Jabberwock, my son! \
wenzelm@59902
  1060
    \The jaws that bite, the claws that catch! \
wenzelm@59902
  1061
    \Beware the Jubjub Bird, and shun \
wenzelm@59902
  1062
    \The frumious Bandersnatch!"))
wenzelm@59902
  1063
\<close>
wenzelm@59902
  1064
wenzelm@59902
  1065
text \<open>
wenzelm@59902
  1066
  This has advantages with variable window / popup sizes, but might make it
wenzelm@59902
  1067
  harder to search for message content systematically, e.g.\ by other tools or
wenzelm@59902
  1068
  by humans expecting the ``verse'' of a formal message in a fixed layout.
wenzelm@59902
  1069
\<close>
wenzelm@59902
  1070
wenzelm@58618
  1071
wenzelm@58618
  1072
section \<open>Exceptions \label{sec:exceptions}\<close>
wenzelm@58618
  1073
wenzelm@61854
  1074
text \<open>
wenzelm@61854
  1075
  The Standard ML semantics of strict functional evaluation together with
wenzelm@61854
  1076
  exceptions is rather well defined, but some delicate points need to be
wenzelm@61854
  1077
  observed to avoid ML programs going wrong despite static type-checking.
wenzelm@61854
  1078
  Exceptions in Isabelle/ML are subsequently categorized as follows.
wenzelm@61854
  1079
\<close>
wenzelm@61506
  1080
wenzelm@61506
  1081
paragraph \<open>Regular user errors.\<close>
wenzelm@61854
  1082
text \<open>
wenzelm@61854
  1083
  These are meant to provide informative feedback about malformed input etc.
wenzelm@61854
  1084
wenzelm@61854
  1085
  The \<^emph>\<open>error\<close> function raises the corresponding @{ML ERROR} exception, with a
wenzelm@61854
  1086
  plain text message as argument. @{ML ERROR} exceptions can be handled
wenzelm@61854
  1087
  internally, in order to be ignored, turned into other exceptions, or
wenzelm@61854
  1088
  cascaded by appending messages. If the corresponding Isabelle/Isar command
wenzelm@61854
  1089
  terminates with an @{ML ERROR} exception state, the system will print the
wenzelm@61854
  1090
  result on the error channel (see \secref{sec:message-channels}).
wenzelm@39854
  1091
wenzelm@61854
  1092
  It is considered bad style to refer to internal function names or values in
wenzelm@61854
  1093
  ML source notation in user error messages. Do not use \<open>@{make_string}\<close> or
wenzelm@61854
  1094
  \<open>@{here}\<close>!
wenzelm@39854
  1095
wenzelm@61854
  1096
  Grammatical correctness of error messages can be improved by \<^emph>\<open>omitting\<close>
wenzelm@61854
  1097
  final punctuation: messages are often concatenated or put into a larger
wenzelm@61854
  1098
  context (e.g.\ augmented with source position). Note that punctuation after
wenzelm@61854
  1099
  formal entities (types, terms, theorems) is particularly prone to user
wenzelm@61854
  1100
  confusion.
wenzelm@61506
  1101
\<close>
wenzelm@61506
  1102
wenzelm@61506
  1103
paragraph \<open>Program failures.\<close>
wenzelm@61854
  1104
text \<open>
wenzelm@61854
  1105
  There is a handful of standard exceptions that indicate general failure
wenzelm@61506
  1106
  situations, or failures of core operations on logical entities (types,
wenzelm@61506
  1107
  terms, theorems, theories, see \chref{ch:logic}).
wenzelm@39854
  1108
wenzelm@61854
  1109
  These exceptions indicate a genuine breakdown of the program, so the main
wenzelm@61854
  1110
  purpose is to determine quickly what has happened where. Traditionally, the
wenzelm@61854
  1111
  (short) exception message would include the name of an ML function, although
wenzelm@61854
  1112
  this is no longer necessary, because the ML runtime system attaches detailed
wenzelm@61854
  1113
  source position stemming from the corresponding @{ML_text raise} keyword.
wenzelm@39854
  1114
wenzelm@61416
  1115
  \<^medskip>
wenzelm@61854
  1116
  User modules can always introduce their own custom exceptions locally, e.g.\
wenzelm@61854
  1117
  to organize internal failures robustly without overlapping with existing
wenzelm@61854
  1118
  exceptions. Exceptions that are exposed in module signatures require extra
wenzelm@61854
  1119
  care, though, and should \<^emph>\<open>not\<close> be introduced by default. Surprise by users
wenzelm@61854
  1120
  of a module can often be minimized by using plain user errors instead.
wenzelm@61506
  1121
\<close>
wenzelm@61506
  1122
wenzelm@61506
  1123
paragraph \<open>Interrupts.\<close>
wenzelm@61854
  1124
text \<open>
wenzelm@61854
  1125
  These indicate arbitrary system events: both the ML runtime system and the
wenzelm@61854
  1126
  Isabelle/ML infrastructure signal various exceptional situations by raising
wenzelm@61854
  1127
  the special @{ML Exn.Interrupt} exception in user code.
wenzelm@57421
  1128
wenzelm@57421
  1129
  This is the only way that physical events can intrude on an Isabelle/ML
wenzelm@57421
  1130
  program. Such an interrupt can mean out-of-memory, stack overflow, timeout,
wenzelm@57421
  1131
  internal signaling of threads, or a POSIX process signal. An Isabelle/ML
wenzelm@57421
  1132
  program that intercepts interrupts becomes dependent on physical effects of
wenzelm@57421
  1133
  the environment. Even worse, exception handling patterns that are too
wenzelm@57421
  1134
  general by accident, e.g.\ by misspelled exception constructors, will cover
wenzelm@57421
  1135
  interrupts unintentionally and thus render the program semantics
wenzelm@57421
  1136
  ill-defined.
wenzelm@39854
  1137
wenzelm@61854
  1138
  Note that the Interrupt exception dates back to the original SML90 language
wenzelm@61854
  1139
  definition. It was excluded from the SML97 version to avoid its malign
wenzelm@61854
  1140
  impact on ML program semantics, but without providing a viable alternative.
wenzelm@61854
  1141
  Isabelle/ML recovers physical interruptibility (which is an indispensable
wenzelm@61854
  1142
  tool to implement managed evaluation of command transactions), but requires
wenzelm@61854
  1143
  user code to be strictly transparent wrt.\ interrupts.
wenzelm@39854
  1144
wenzelm@39854
  1145
  \begin{warn}
wenzelm@61854
  1146
  Isabelle/ML user code needs to terminate promptly on interruption, without
wenzelm@61854
  1147
  guessing at its meaning to the system infrastructure. Temporary handling of
wenzelm@61854
  1148
  interrupts for cleanup of global resources etc.\ needs to be followed
wenzelm@61854
  1149
  immediately by re-raising of the original exception.
wenzelm@39854
  1150
  \end{warn}
wenzelm@58618
  1151
\<close>
wenzelm@58618
  1152
wenzelm@58618
  1153
text %mlref \<open>
wenzelm@39855
  1154
  \begin{mldecls}
wenzelm@39855
  1155
  @{index_ML try: "('a -> 'b) -> 'a -> 'b option"} \\
wenzelm@39855
  1156
  @{index_ML can: "('a -> 'b) -> 'a -> bool"} \\
wenzelm@55838
  1157
  @{index_ML_exception ERROR: string} \\
wenzelm@55838
  1158
  @{index_ML_exception Fail: string} \\
wenzelm@39856
  1159
  @{index_ML Exn.is_interrupt: "exn -> bool"} \\
wenzelm@62505
  1160
  @{index_ML Exn.reraise: "exn -> 'a"} \\
wenzelm@56303
  1161
  @{index_ML Runtime.exn_trace: "(unit -> 'a) -> 'a"} \\
wenzelm@39855
  1162
  \end{mldecls}
wenzelm@39855
  1163
wenzelm@61854
  1164
  \<^descr> @{ML try}~\<open>f x\<close> makes the partiality of evaluating \<open>f x\<close> explicit via the
wenzelm@61854
  1165
  option datatype. Interrupts are \<^emph>\<open>not\<close> handled here, i.e.\ this form serves
wenzelm@61854
  1166
  as safe replacement for the \<^emph>\<open>unsafe\<close> version @{ML_text "(SOME"}~\<open>f
wenzelm@61854
  1167
  x\<close>~@{ML_text "handle _ => NONE)"} that is occasionally seen in books about
wenzelm@61854
  1168
  SML97, but not in Isabelle/ML.
wenzelm@39855
  1169
wenzelm@61439
  1170
  \<^descr> @{ML can} is similar to @{ML try} with more abstract result.
wenzelm@61439
  1171
wenzelm@61854
  1172
  \<^descr> @{ML ERROR}~\<open>msg\<close> represents user errors; this exception is normally
wenzelm@61854
  1173
  raised indirectly via the @{ML error} function (see
wenzelm@61854
  1174
  \secref{sec:message-channels}).
wenzelm@39856
  1175
wenzelm@61493
  1176
  \<^descr> @{ML Fail}~\<open>msg\<close> represents general program failures.
wenzelm@61439
  1177
wenzelm@61854
  1178
  \<^descr> @{ML Exn.is_interrupt} identifies interrupts robustly, without mentioning
wenzelm@61854
  1179
  concrete exception constructors in user code. Handled interrupts need to be
wenzelm@61854
  1180
  re-raised promptly!
wenzelm@61854
  1181
wenzelm@62505
  1182
  \<^descr> @{ML Exn.reraise}~\<open>exn\<close> raises exception \<open>exn\<close> while preserving its implicit
wenzelm@61854
  1183
  position information (if possible, depending on the ML platform).
wenzelm@39856
  1184
wenzelm@61854
  1185
  \<^descr> @{ML Runtime.exn_trace}~@{ML_text "(fn () =>"}~\<open>e\<close>@{ML_text ")"} evaluates
wenzelm@61854
  1186
  expression \<open>e\<close> while printing a full trace of its stack of nested exceptions
wenzelm@61854
  1187
  (if possible, depending on the ML platform).
wenzelm@39855
  1188
wenzelm@61854
  1189
  Inserting @{ML Runtime.exn_trace} into ML code temporarily is useful for
wenzelm@61854
  1190
  debugging, but not suitable for production code.
wenzelm@58618
  1191
\<close>
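
text %mlex \<open>
  The following small sketch shows @{ML try} and @{ML can} making the
  partiality of @{ML hd} explicit, without any exception handling
  boilerplate in user code.
\<close>

ML_val \<open>
  @{assert} (try hd ([]: int list) = NONE);
  @{assert} (try hd [1, 2, 3] = SOME 1);

  @{assert} (can hd [1]);
  @{assert} (not (can hd ([]: int list)));
\<close>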
wenzelm@58618
  1192
wenzelm@58618
  1193
text %mlantiq \<open>
wenzelm@39866
  1194
  \begin{matharray}{rcl}
wenzelm@61493
  1195
  @{ML_antiquotation_def "assert"} & : & \<open>ML_antiquotation\<close> \\
wenzelm@39866
  1196
  \end{matharray}
wenzelm@39866
  1197
wenzelm@61854
  1198
  \<^descr> \<open>@{assert}\<close> inlines a function @{ML_type "bool -> unit"} that raises @{ML
wenzelm@61854
  1199
  Fail} if the argument is @{ML false}. Due to inlining the source position of
wenzelm@61854
  1200
  failed assertions is included in the error output.
wenzelm@58618
  1201
\<close>
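
text %mlex \<open>
  A trivial sanity check; the second assertion also sketches how a failed
  assertion raises @{ML Fail}, which can be handled like any other exception.
\<close>

ML_val \<open>
  @{assert} (1 + 1 = 2);
  @{assert} ((@{assert} false; false) handle Fail _ => true);
\<close>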
wenzelm@58618
  1202
wenzelm@58618
  1203
wenzelm@58618
  1204
section \<open>Strings of symbols \label{sec:symbols}\<close>
wenzelm@58618
  1205
wenzelm@61854
  1206
text \<open>
wenzelm@61854
  1207
  A \<^emph>\<open>symbol\<close> constitutes the smallest textual unit in Isabelle/ML --- raw ML
wenzelm@61854
  1208
  characters are normally not encountered at all. Isabelle strings consist of
wenzelm@61854
  1209
  a sequence of symbols, represented as a packed string or an exploded list of
wenzelm@61854
  1210
  strings. Each symbol is in itself a small string, which has either one of
wenzelm@61504
  1211
  the following forms:
wenzelm@61504
  1212
wenzelm@61504
  1213
    \<^enum> a single ASCII character ``\<open>c\<close>'', for example ``\<^verbatim>\<open>a\<close>'',
wenzelm@61504
  1214
wenzelm@61504
  1215
    \<^enum> a codepoint according to UTF-8 (non-ASCII byte sequence),
wenzelm@61504
  1216
wenzelm@61504
  1217
    \<^enum> a regular symbol ``\<^verbatim>\<open>\<ident>\<close>'', for example ``\<^verbatim>\<open>\<alpha>\<close>'',
wenzelm@61504
  1218
wenzelm@61504
  1219
    \<^enum> a control symbol ``\<^verbatim>\<open>\<^ident>\<close>'', for example ``\<^verbatim>\<open>\<^bold>\<close>'',
wenzelm@61504
  1220
wenzelm@61504
  1221
    \<^enum> a raw symbol ``\<^verbatim>\<open>\\<close>\<^verbatim>\<open><^raw:\<close>\<open>text\<close>\<^verbatim>\<open>>\<close>'' where \<open>text\<close> consists of
wenzelm@61504
  1222
    printable characters excluding ``\<^verbatim>\<open>.\<close>'' and ``\<^verbatim>\<open>>\<close>'', for example
wenzelm@61504
  1223
    ``\<^verbatim>\<open>\<^raw:$\sum_{i = 1}^n$>\<close>'',
wenzelm@61504
  1224
wenzelm@61504
  1225
    \<^enum> a numbered raw control symbol ``\<^verbatim>\<open>\\<close>\<^verbatim>\<open><^raw\<close>\<open>n\<close>\<^verbatim>\<open>>\<close>'', where \<open>n\<close> consists
wenzelm@61504
  1226
    of digits, for example ``\<^verbatim>\<open>\<^raw42>\<close>''.
wenzelm@61504
  1227
wenzelm@61504
  1228
  The \<open>ident\<close> syntax for symbol names is \<open>letter (letter | digit)\<^sup>*\<close>, where
wenzelm@61504
  1229
  \<open>letter = A..Za..z\<close> and \<open>digit = 0..9\<close>. There are infinitely many regular
wenzelm@61504
  1230
  symbols and control symbols, but a fixed collection of standard symbols is
wenzelm@61504
  1231
  treated specifically. For example, ``\<^verbatim>\<open>\<alpha>\<close>'' is classified as a letter, which
wenzelm@61504
  1232
  means it may occur within regular Isabelle identifiers.
wenzelm@52421
  1233
wenzelm@57421
  1234
  The character set underlying Isabelle symbols is 7-bit ASCII, but 8-bit
wenzelm@57421
  1235
  character sequences are passed through unchanged. Unicode/UCS data in UTF-8
wenzelm@57421
  1236
  encoding is processed in a non-strict fashion, such that well-formed code
wenzelm@57421
  1237
  sequences are recognized accordingly. Unicode provides its own collection of
wenzelm@57421
  1238
  mathematical symbols, but within the core Isabelle/ML world there is no link
wenzelm@57421
  1239
  to the standard collection of Isabelle regular symbols.
wenzelm@57421
  1240
wenzelm@61416
  1241
  \<^medskip>
wenzelm@61504
  1242
  Output of Isabelle symbols depends on the print mode. For example, the
wenzelm@61504
  1243
  standard {\LaTeX} setup of the Isabelle document preparation system would
wenzelm@61504
  1244
  present ``\<^verbatim>\<open>\<alpha>\<close>'' as \<open>\<alpha>\<close>, and ``\<^verbatim>\<open>\<^bold>\<alpha>\<close>'' as \<open>\<^bold>\<alpha>\<close>. On-screen rendering usually
wenzelm@61504
  1245
  works by mapping a finite subset of Isabelle symbols to suitable Unicode
wenzelm@61504
  1246
  characters.
wenzelm@58618
  1247
\<close>
wenzelm@58618
  1248
wenzelm@58618
  1249
text %mlref \<open>
wenzelm@52421
  1250
  \begin{mldecls}
wenzelm@52421
  1251
  @{index_ML_type "Symbol.symbol": string} \\
wenzelm@52421
  1252
  @{index_ML Symbol.explode: "string -> Symbol.symbol list"} \\
wenzelm@52421
  1253
  @{index_ML Symbol.is_letter: "Symbol.symbol -> bool"} \\
wenzelm@52421
  1254
  @{index_ML Symbol.is_digit: "Symbol.symbol -> bool"} \\
wenzelm@52421
  1255
  @{index_ML Symbol.is_quasi: "Symbol.symbol -> bool"} \\
wenzelm@52421
  1256
  @{index_ML Symbol.is_blank: "Symbol.symbol -> bool"} \\
wenzelm@52421
  1257
  \end{mldecls}
wenzelm@52421
  1258
  \begin{mldecls}
wenzelm@52421
  1259
  @{index_ML_type "Symbol.sym"} \\
wenzelm@52421
  1260
  @{index_ML Symbol.decode: "Symbol.symbol -> Symbol.sym"} \\
wenzelm@52421
  1261
  \end{mldecls}
wenzelm@52421
  1262
wenzelm@61854
  1263
  \<^descr> Type @{ML_type "Symbol.symbol"} represents individual Isabelle symbols.
wenzelm@52421
  1264
wenzelm@61854
  1265
  \<^descr> @{ML "Symbol.explode"}~\<open>str\<close> produces a symbol list from the packed form.
wenzelm@61854
  1266
  This function supersedes @{ML "String.explode"} for virtually all purposes
wenzelm@61854
  1267
  of manipulating text in Isabelle!\<^footnote>\<open>The runtime overhead for exploded strings
wenzelm@61854
  1268
  is mainly that of the list structure: individual symbols that happen to be a
wenzelm@61854
  1269
  singleton string do not require extra memory in Poly/ML.\<close>
wenzelm@52421
  1270
wenzelm@61439
  1271
  \<^descr> @{ML "Symbol.is_letter"}, @{ML "Symbol.is_digit"}, @{ML
wenzelm@61854
  1272
  "Symbol.is_quasi"}, @{ML "Symbol.is_blank"} classify standard symbols
wenzelm@61854
  1273
  according to fixed syntactic conventions of Isabelle, cf.\ @{cite
wenzelm@61854
  1274
  "isabelle-isar-ref"}.
wenzelm@52421
  1275
wenzelm@61854
  1276
  \<^descr> Type @{ML_type "Symbol.sym"} is a concrete datatype that represents the
wenzelm@61854
  1277
  different kinds of symbols explicitly, with constructors @{ML
wenzelm@61854
  1278
  "Symbol.Char"}, @{ML "Symbol.UTF8"}, @{ML "Symbol.Sym"}, @{ML
wenzelm@61854
  1279
  "Symbol.Control"}, @{ML "Symbol.Raw"}, @{ML "Symbol.Malformed"}.
wenzelm@52421
  1280
wenzelm@61854
  1281
  \<^descr> @{ML "Symbol.decode"} converts the string representation of a symbol into
wenzelm@61854
  1282
  the datatype version.
wenzelm@61506
  1283
\<close>
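
text %mlex \<open>
  The following small sketch decomposes a string into Isabelle symbols and
  classifies them; note that the regular symbol ``\<^verbatim>\<open>\<alpha>\<close>'' counts as a letter.
\<close>

ML_val \<open>
  val syms = Symbol.explode "a\<alpha>b";

  @{assert} (length syms = 3);
  @{assert} (forall Symbol.is_letter syms);
\<close>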
wenzelm@61506
  1284
wenzelm@61506
  1285
paragraph \<open>Historical note.\<close>
wenzelm@61854
  1286
text \<open>
wenzelm@61854
  1287
  In the original SML90 standard the primitive ML type @{ML_type char} did not
wenzelm@61854
  1288
  exist, and @{ML_text "explode: string -> string list"} produced a list of
wenzelm@61854
  1289
  singleton strings like @{ML "raw_explode: string -> string list"} in
wenzelm@61506
  1290
  Isabelle/ML today. When SML97 came out, Isabelle did not adopt its somewhat
wenzelm@61506
  1291
  anachronistic 8-bit or 16-bit characters, but the idea of exploding a string
wenzelm@61506
  1292
  into a list of small strings was extended to ``symbols'' as explained above.
wenzelm@61506
  1293
  Thus Isabelle sources can refer to an infinite store of user-defined
wenzelm@61506
  1294
  symbols, without having to worry about the multitude of Unicode encodings
wenzelm@61854
  1295
  that have emerged over the years.
wenzelm@61854
  1296
\<close>
wenzelm@58618
  1297
wenzelm@58618
  1298
wenzelm@58618
  1299
section \<open>Basic data types\<close>
wenzelm@58618
  1300
wenzelm@61854
  1301
text \<open>
wenzelm@61854
  1302
  The basis library proposal of SML97 needs to be treated with caution. Many
wenzelm@61854
  1303
  of its operations simply do not fit with important Isabelle/ML conventions
wenzelm@61854
  1304
  (like ``canonical argument order'', see
wenzelm@61854
  1305
  \secref{sec:canonical-argument-order}), others cause problems with the
wenzelm@61854
  1306
  parallel evaluation model of Isabelle/ML (such as @{ML TextIO.print} or @{ML
wenzelm@61854
  1307
  OS.Process.system}).
wenzelm@39859
  1308
wenzelm@61854
  1309
  Subsequently we give a brief overview of important operations on basic ML
wenzelm@61854
  1310
  data types.
wenzelm@58618
  1311
\<close>
wenzelm@58618
  1312
wenzelm@58618
  1313
wenzelm@58618
  1314
subsection \<open>Characters\<close>
wenzelm@58618
  1315
wenzelm@58618
  1316
text %mlref \<open>
wenzelm@39863
  1317
  \begin{mldecls}
wenzelm@39863
  1318
  @{index_ML_type char} \\
wenzelm@39863
  1319
  \end{mldecls}
wenzelm@39863
  1320
wenzelm@61854
  1321
  \<^descr> Type @{ML_type char} is \<^emph>\<open>not\<close> used. The smallest textual unit in Isabelle
wenzelm@61854
  1322
  is represented as a ``symbol'' (see \secref{sec:symbols}).
wenzelm@58618
  1323
\<close>
wenzelm@58618
  1324
wenzelm@58618
  1325
wenzelm@58618
  1326
subsection \<open>Strings\<close>
wenzelm@58618
  1327
wenzelm@58618
  1328
text %mlref \<open>
wenzelm@52421
  1329
  \begin{mldecls}
wenzelm@52421
  1330
  @{index_ML_type string} \\
wenzelm@52421
  1331
  \end{mldecls}
wenzelm@52421
  1332
wenzelm@61854
  1333
  \<^descr> Type @{ML_type string} represents immutable vectors of 8-bit characters.
wenzelm@61854
  1334
  There are operations in SML to convert back and forth to actual byte
wenzelm@61854
  1335
  vectors, which are seldom used.
wenzelm@52421
  1336
wenzelm@52421
  1337
  This historically important raw text representation is used for
wenzelm@61854
  1338
  Isabelle-specific purposes with the following implicit substructures packed
wenzelm@61854
  1339
  into the string content:
wenzelm@52421
  1340
wenzelm@61854
  1341
    \<^enum> sequence of Isabelle symbols (see also \secref{sec:symbols}), with @{ML
wenzelm@61854
  1342
    Symbol.explode} as key operation;
wenzelm@61458
  1343
  
wenzelm@61854
  1344
    \<^enum> XML tree structure via YXML (see also @{cite "isabelle-system"}), with
wenzelm@61854
  1345
    @{ML YXML.parse_body} as key operation.
wenzelm@52421
  1346
wenzelm@58723
  1347
  Note that Isabelle/ML string literals may refer to Isabelle symbols like
wenzelm@61854
  1348
  ``\<^verbatim>\<open>\<alpha>\<close>'' natively, \<^emph>\<open>without\<close> escaping the backslash. This is a consequence
wenzelm@61854
  1349
  of Isabelle treating all source text as strings of symbols, instead of raw
wenzelm@61854
  1350
  characters.
wenzelm@58618
  1351
\<close>
wenzelm@58618
  1352
wenzelm@61854
  1353
text %mlex \<open>
wenzelm@61854
  1354
  The subsequent example illustrates the difference of physical addressing of
wenzelm@61854
  1355
  bytes versus logical addressing of symbols in Isabelle strings.
wenzelm@58618
  1356
\<close>
wenzelm@58618
  1357
wenzelm@58618
  1358
ML_val \<open>
wenzelm@52421
  1359
  val s = "\<A>";
wenzelm@52421
  1360
wenzelm@52421
  1361
  @{assert} (length (Symbol.explode s) = 1);
wenzelm@52421
  1362
  @{assert} (size s = 4);
wenzelm@58618
  1363
\<close>
wenzelm@58618
  1364
wenzelm@61854
  1365
text \<open>
wenzelm@61854
  1366
  Note that in Unicode renderings of the symbol \<open>\<A>\<close>, variations of encodings
wenzelm@61854
  1367
  like UTF-8 or UTF-16 pose delicate questions about the multi-byte
wenzelm@61854
  1368
  representations of its codepoint, which is outside of the 16-bit address
wenzelm@61854
  1369
  space of the original Unicode standard from the 1990s. In Isabelle/ML it
wenzelm@61854
  1370
  is just ``\<^verbatim>\<open>\<A>\<close>'' literally, using plain ASCII characters beyond any
wenzelm@61854
  1371
  doubts.
wenzelm@61854
  1372
\<close>
wenzelm@58618
  1373
wenzelm@58618
  1374
wenzelm@58618
  1375
subsection \<open>Integers\<close>
wenzelm@58618
  1376
wenzelm@58618
  1377
text %mlref \<open>
wenzelm@39862
  1378
  \begin{mldecls}
wenzelm@39862
  1379
  @{index_ML_type int} \\
wenzelm@39862
  1380
  \end{mldecls}
wenzelm@39862
  1381
wenzelm@61854
  1382
  \<^descr> Type @{ML_type int} represents regular mathematical integers, which are
wenzelm@61854
  1383
  \<^emph>\<open>unbounded\<close>. Overflow is treated properly, but should never happen in
wenzelm@61854
  1384
  practice.\<^footnote>\<open>The size limit for integer bit patterns in memory is 64\,MB for
wenzelm@62354
  1385
  32-bit Poly/ML, and much higher for 64-bit systems.\<close>
wenzelm@39862
  1386
wenzelm@55837
  1387
  Structure @{ML_structure IntInf} of SML97 is obsolete and superseded by
wenzelm@61854
  1388
  @{ML_structure Int}. Structure @{ML_structure Integer} in @{file
wenzelm@61854
  1389
  "~~/src/Pure/General/integer.ML"} provides some additional operations.
wenzelm@58618
  1390
\<close>
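
text %mlex \<open>
  The following small sketch demonstrates unbounded integer arithmetic: the
  larger factorial already exceeds the usual machine word size, yet is
  computed exactly.
\<close>

ML_val \<open>
  fun fact n = fold (fn i => fn k => k * i) (1 upto n) 1;

  @{assert} (fact 20 = 2432902008176640000);
  @{assert} (fact 40 > fact 20 * fact 20);
\<close>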
wenzelm@58618
  1391
wenzelm@58618
  1392
wenzelm@63215
  1393
subsection \<open>Rational numbers\<close>
wenzelm@63215
  1394
wenzelm@63215
  1395
text %mlref \<open>
wenzelm@63215
  1396
  \begin{mldecls}
wenzelm@63215
  1397
  @{index_ML_type Rat.rat} \\
wenzelm@63215
  1398
  \end{mldecls}
wenzelm@63215
  1399
wenzelm@63215
  1400
  \<^descr> Type @{ML_type Rat.rat} represents rational numbers, based on the
wenzelm@63215
  1401
  unbounded integers of Poly/ML.
wenzelm@63215
  1402
wenzelm@63215
  1403
  Literal rationals may be written with special antiquotation syntax
wenzelm@63215
  1404
  \<^verbatim>\<open>@\<close>\<open>int\<close>\<^verbatim>\<open>/\<close>\<open>nat\<close> or \<^verbatim>\<open>@\<close>\<open>int\<close> (without any white space). For example
wenzelm@63215
  1405
  \<^verbatim>\<open>@~1/4\<close> or \<^verbatim>\<open>@10\<close>. The ML toplevel pretty printer uses the same format.
wenzelm@63215
  1406
wenzelm@63215
  1407
  Standard operations are provided via ad-hoc overloading of \<^verbatim>\<open>+\<close>, \<^verbatim>\<open>-\<close>, \<^verbatim>\<open>*\<close>,
wenzelm@63215
  1408
  \<^verbatim>\<open>/\<close>, etc.
wenzelm@63215
  1409
\<close>
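
text %mlex \<open>
  A small sketch of the literal syntax described above, assuming that the
  ad-hoc overloaded \<^verbatim>\<open>+\<close> resolves on rational operands; the ML toplevel
  prints the result as \<^verbatim>\<open>@1/2\<close>.
\<close>

ML_val \<open>
  val r = @1/3 + @1/6;
\<close>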
wenzelm@63215
  1410
wenzelm@63215
  1411
wenzelm@58618
  1412
subsection \<open>Time\<close>
wenzelm@58618
  1413
wenzelm@58618
  1414
text %mlref \<open>
wenzelm@40302
  1415
  \begin{mldecls}
wenzelm@40302
  1416
  @{index_ML_type Time.time} \\
wenzelm@40302
  1417
  @{index_ML seconds: "real -> Time.time"} \\
wenzelm@40302
  1418
  \end{mldecls}
wenzelm@40302
  1419
wenzelm@61854
  1420
  \<^descr> Type @{ML_type Time.time} represents time abstractly according to the
wenzelm@61854
  1421
  SML97 basis library definition. This is adequate for internal ML operations,
wenzelm@61854
  1422
  but awkward in concrete time specifications.
wenzelm@40302
  1423
wenzelm@61854
  1424
  \<^descr> @{ML seconds}~\<open>s\<close> turns the concrete scalar \<open>s\<close> (measured in seconds) into
wenzelm@61854
  1425
  an abstract time value. Floating point numbers are easy to use as
wenzelm@61854
  1426
  configuration options in the context (see \secref{sec:config-options}) or
wenzelm@61854
  1427
  system options that are maintained externally.
wenzelm@58618
  1428
\<close>
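
text %mlex \<open>
  The following small sketch turns a scalar specification into an abstract
  time value.
\<close>

ML_val \<open>
  val timeout = seconds 0.5;
  @{assert} (Time.toMilliseconds timeout = 500);
\<close>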
wenzelm@58618
  1429
wenzelm@58618
  1430
wenzelm@58618
  1431
subsection \<open>Options\<close>
wenzelm@58618
  1432
wenzelm@58618
  1433
text %mlref \<open>
wenzelm@39859
  1434
  \begin{mldecls}
wenzelm@39859
  1435
  @{index_ML Option.map: "('a -> 'b) -> 'a option -> 'b option"} \\
wenzelm@39859
  1436
  @{index_ML is_some: "'a option -> bool"} \\
wenzelm@39859
  1437
  @{index_ML is_none: "'a option -> bool"} \\
wenzelm@39859
  1438
  @{index_ML the: "'a option -> 'a"} \\
wenzelm@39859
  1439
  @{index_ML these: "'a list option -> 'a list"} \\
wenzelm@39859
  1440
  @{index_ML the_list: "'a option -> 'a list"} \\
wenzelm@39859
  1441
  @{index_ML the_default: "'a -> 'a option -> 'a"} \\
wenzelm@39859
  1442
  \end{mldecls}
wenzelm@58618
  1443
\<close>
wenzelm@58618
  1444
wenzelm@61854
  1445
text \<open>
wenzelm@61854
  1446
  Apart from @{ML Option.map} most other operations defined in structure
wenzelm@61854
  1447
  @{ML_structure Option} are alien to Isabelle/ML and never used. The
wenzelm@61854
  1448
  operations shown above are defined in @{file
wenzelm@61854
  1449
  "~~/src/Pure/General/basics.ML"}.
wenzelm@61854
  1450
\<close>
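
text %mlex \<open>
  A few sketched sanity checks for the option combinators listed above:
\<close>

ML_val \<open>
  @{assert} (Option.map (fn i => i + 1) (SOME 1) = SOME 2);
  @{assert} (is_none (try hd ([]: int list)));
  @{assert} (the_default 0 NONE = 0);
  @{assert} (the_list (SOME "a") = ["a"]);
  @{assert} (these NONE = ([]: int list));
\<close>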
wenzelm@58618
  1451
wenzelm@58618
  1452
wenzelm@58618
  1453
subsection \<open>Lists\<close>
wenzelm@58618
  1454
wenzelm@61854
  1455
text \<open>
wenzelm@61854
  1456
  Lists are ubiquitous in ML as simple and light-weight ``collections'' for
wenzelm@61854
  1457
  many everyday programming tasks. Isabelle/ML provides important additions
wenzelm@61854
  1458
  and improvements over operations that are predefined in the SML97 library.
wenzelm@61854
  1459
\<close>
wenzelm@58618
  1460
wenzelm@58618
  1461
text %mlref \<open>
wenzelm@39863
  1462
  \begin{mldecls}
wenzelm@39863
  1463
  @{index_ML cons: "'a -> 'a list -> 'a list"} \\
wenzelm@39874
  1464
  @{index_ML member: "('b * 'a -> bool) -> 'a list -> 'b -> bool"} \\
wenzelm@39874
  1465
  @{index_ML insert: "('a * 'a -> bool) -> 'a -> 'a list -> 'a list"} \\
wenzelm@39874
  1466
  @{index_ML remove: "('b * 'a -> bool) -> 'b -> 'a list -> 'a list"} \\
wenzelm@39874
  1467
  @{index_ML update: "('a * 'a -> bool) -> 'a -> 'a list -> 'a list"} \\
wenzelm@39863
  1468
  \end{mldecls}
wenzelm@39863
  1469
wenzelm@61493
  1470
  \<^descr> @{ML cons}~\<open>x xs\<close> evaluates to \<open>x :: xs\<close>.
wenzelm@39863
  1471
wenzelm@61854
  1472
  Tupled infix operators are a historical accident in Standard ML. The curried
wenzelm@61854
  1473
  @{ML cons} amends this, but it should only be used when partial application
wenzelm@61854
  1474
  is required.
wenzelm@39863
  1475
wenzelm@61854
  1476
  \<^descr> @{ML member}, @{ML insert}, @{ML remove}, @{ML update} treat lists as a
wenzelm@61854
  1477
  set-like container that maintains the order of elements. See @{file
wenzelm@61854
  1478
  "~~/src/Pure/library.ML"} for the full specifications (written in ML). There
wenzelm@61854
  1479
  are some further derived operations like @{ML union} or @{ML inter}.
wenzelm@39874
  1480
wenzelm@61854
  1481
  Note that @{ML insert} is conservative about elements that are already a
wenzelm@61854
  1482
  @{ML member} of the list, while @{ML update} ensures that the latest entry
wenzelm@61854
  1483
  is always put in front. The latter discipline is often more appropriate in
wenzelm@61854
  1484
  declarations of context data (\secref{sec:context-data}) that are issued by
wenzelm@61854
  1485
  the user in Isar source: later declarations take precedence over earlier
wenzelm@61854
  1486
  ones.
\<close>
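
text %mlex \<open>
  The following small sketch exercises the set-like list operations; note how
  @{ML update} puts the updated element in front, as described above.
\<close>

ML_val \<open>
  val xs = [1, 2, 3];

  @{assert} (member (op =) xs 2);
  @{assert} (insert (op =) 2 xs = xs);
  @{assert} (remove (op =) 2 xs = [1, 3]);
  @{assert} (update (op =) 2 xs = [2, 1, 3]);
\<close>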
wenzelm@58618
  1487
wenzelm@61854
  1488
text %mlex \<open>
wenzelm@61854
  1489
  Using canonical @{ML fold} together with @{ML cons} (or similar standard
wenzelm@61854
  1490
  operations) alternates the orientation of data. This is quite natural and
wenzelm@61854
  1491
  should not be altered forcibly by inserting extra applications of @{ML rev}.
wenzelm@61854
  1492
  The alternative @{ML fold_rev} can be used in the few situations where
wenzelm@61854
  1493
  alternation should be prevented.
wenzelm@58618
  1494
\<close>
wenzelm@58618
  1495
wenzelm@59902
  1496
ML_val \<open>
wenzelm@39863
  1497
  val items = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
wenzelm@39863
  1498
wenzelm@39863
  1499
  val list1 = fold cons items [];
wenzelm@39866
  1500
  @{assert} (list1 = rev items);
wenzelm@39866
  1501
wenzelm@39863
  1502
  val list2 = fold_rev cons items [];
wenzelm@39866
  1503
  @{assert} (list2 = items);
wenzelm@58618
  1504
\<close>
wenzelm@58618
  1505
wenzelm@61854
  1506
text \<open>
wenzelm@61854
  1507
  The subsequent example demonstrates how to \<^emph>\<open>merge\<close> two lists in a natural
wenzelm@61854
  1508
  way.
wenzelm@61854
  1509
\<close>
wenzelm@58618
  1510
wenzelm@59902
  1511
ML_val \<open>
wenzelm@39883
  1512
  fun merge_lists eq (xs, ys) = fold_rev (insert eq) ys xs;
wenzelm@58618
  1513
\<close>
wenzelm@58618
  1514
wenzelm@61854
  1515
text \<open>
wenzelm@61854
  1516
  Here the first list is treated conservatively: only the new elements from
wenzelm@61854
  1517
  the second list are inserted. The inside-out order of insertion via @{ML
wenzelm@61854
  1518
  fold_rev} attempts to preserve the order of elements in the result.
wenzelm@39883
  1519
wenzelm@39883
  1520
  This way of merging lists is typical for context data
wenzelm@61854
  1521
  (\secref{sec:context-data}). See also @{ML merge} as defined in @{file
wenzelm@61854
  1522
  "~~/src/Pure/library.ML"}.
wenzelm@58618
  1523
\<close>
wenzelm@58618
  1524
wenzelm@58618
  1525
wenzelm@58618
  1526
subsection \<open>Association lists\<close>
wenzelm@58618
  1527
wenzelm@61854
  1528
text \<open>
wenzelm@61854
  1529
  The operations for association lists interpret a concrete list of pairs as a
wenzelm@61854
  1530
  finite function from keys to values. Redundant representations with multiple
wenzelm@61854
  1531
  occurrences of the same key are implicitly normalized: lookup and update
wenzelm@61854
  1532
  only take the first occurrence into account.
wenzelm@58618
  1533
\<close>
wenzelm@58618
  1534
wenzelm@58618
  1535
text \<open>
wenzelm@39875
  1536
  \begin{mldecls}
wenzelm@39875
  1537
  @{index_ML AList.lookup: "('a * 'b -> bool) -> ('b * 'c) list -> 'a -> 'c option"} \\
wenzelm@39875
  1538
  @{index_ML AList.defined: "('a * 'b -> bool) -> ('b * 'c) list -> 'a -> bool"} \\
wenzelm@39875
  1539
  @{index_ML AList.update: "('a * 'a -> bool) -> 'a * 'b -> ('a * 'b) list -> ('a * 'b) list"} \\
wenzelm@39875
  1540
  \end{mldecls}
wenzelm@39875
  1541
wenzelm@61854
  1542
  \<^descr> @{ML AList.lookup}, @{ML AList.defined}, @{ML AList.update} implement the
wenzelm@61854
  1543
  main ``framework operations'' for mappings in Isabelle/ML, following
wenzelm@61854
  1544
  standard conventions for their names and types.
wenzelm@39875
  1545
wenzelm@61854
  1546
  Note that a function called \<^verbatim>\<open>lookup\<close> is obliged to express its partiality
wenzelm@61854
  1547
  via an explicit option element. There is no choice to raise an exception,
wenzelm@61854
  1548
  without changing the name to something like \<open>the_element\<close> or \<open>get\<close>.
wenzelm@61493
  1549
wenzelm@61854
  1550
  The \<open>defined\<close> operation is essentially a contraction of @{ML is_some} and
wenzelm@61854
  1551
  \<^verbatim>\<open>lookup\<close>, but this is sufficiently frequent to justify its independent
wenzelm@61854
  1552
  existence. This also gives the implementation some opportunity for peep-hole
wenzelm@61854
  1553
  optimization.
wenzelm@39875
  1554
wenzelm@39875
  1555
wenzelm@57421
  1556
  Association lists are adequate as a simple implementation of finite mappings
wenzelm@57421
  1557
  in many practical situations. A more advanced table structure is defined in
wenzelm@57421
  1558
  @{file "~~/src/Pure/General/table.ML"}; that version scales easily to
wenzelm@39875
  1559
  thousands or millions of elements.
wenzelm@58618
  1560
\<close>
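
text %mlex \<open>
  The following small sketch exercises the association list operations as a
  finite mapping from strings to integers.
\<close>

ML_val \<open>
  val tab = [("a", 1), ("b", 2)];

  @{assert} (AList.lookup (op =) tab "a" = SOME 1);
  @{assert} (AList.lookup (op =) tab "c" = NONE);
  @{assert} (AList.defined (op =) tab "b");

  val tab' = AList.update (op =) ("b", 3) tab;
  @{assert} (AList.lookup (op =) tab' "b" = SOME 3);
\<close>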
wenzelm@58618
  1561
wenzelm@58618
  1562
wenzelm@58618
  1563
subsection \<open>Unsynchronized references\<close>
wenzelm@58618
  1564
wenzelm@58618
  1565
text %mlref \<open>
wenzelm@39859
  1566
  \begin{mldecls}
wenzelm@39870
  1567
  @{index_ML_type "'a Unsynchronized.ref"} \\
wenzelm@39859
  1568
  @{index_ML Unsynchronized.ref: "'a -> 'a Unsynchronized.ref"} \\
wenzelm@39859
  1569
  @{index_ML "!": "'a Unsynchronized.ref -> 'a"} \\
wenzelm@46262
  1570
  @{index_ML_op ":=": "'a Unsynchronized.ref * 'a -> unit"} \\
wenzelm@39859
  1571
  \end{mldecls}
wenzelm@58618
  1572
\<close>
wenzelm@58618
  1573
wenzelm@61854
  1574
text \<open>
wenzelm@61854
  1575
  Due to ubiquitous parallelism in Isabelle/ML (see also
wenzelm@61854
  1576
  \secref{sec:multi-threading}), the mutable reference cells of Standard ML
wenzelm@61854
  1577
  are notorious for causing problems. In a highly parallel system, both
wenzelm@61854
  1578
  correctness \<^emph>\<open>and\<close> performance are easily degraded when using mutable data.
wenzelm@39859
  1579
wenzelm@61854
  1580
  The unwieldy name of @{ML Unsynchronized.ref} for the constructor for
wenzelm@61854
  1581
  references in Isabelle/ML emphasizes the inconveniences caused by
wenzelm@61854
  1582
  mutability. Existing operations @{ML "!"} and @{ML_op ":="} are unchanged,
wenzelm@61854
  1583
  but should be used with special precautions, say in a strictly local
wenzelm@61854
  1584
  situation that is guaranteed to be restricted to sequential evaluation ---
wenzelm@61854
  1585
  now and in the future.
wenzelm@40508
  1586
wenzelm@40508
  1587
  \begin{warn}
wenzelm@40508
  1588
  Never @{ML_text "open Unsynchronized"}, not even in a local scope!
wenzelm@40508
  1589
  Pretending that mutable state is no problem is a very bad idea.
wenzelm@40508
  1590
  \end{warn}
wenzelm@58618
  1591
\<close>
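
text %mlex \<open>
  The following small sketch is confined to strictly sequential evaluation
  within a single ML block, which is the kind of local situation mentioned
  above where a raw reference is tolerable.
\<close>

ML_val \<open>
  val i = Unsynchronized.ref 0;
  i := ! i + 1;
  @{assert} (! i = 1);
\<close>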
wenzelm@58618
  1592
wenzelm@58618
  1593
wenzelm@58618
  1594
section \<open>Thread-safe programming \label{sec:multi-threading}\<close>
wenzelm@58618
  1595
wenzelm@61854
  1596
text \<open>
wenzelm@61854
  1597
  Multi-threaded execution has become an everyday reality in Isabelle since
wenzelm@61854
  1598
  Poly/ML 5.2.1 and Isabelle2008. Isabelle/ML provides implicit and explicit
wenzelm@61854
  1599
  parallelism by default, and there is no way for user-space tools to ``opt
wenzelm@61854
  1600
  out''. ML programs that are purely functional, output messages only via the
wenzelm@61854
  1601
  official channels (\secref{sec:message-channels}), and do not intercept
wenzelm@61854
  1602
  interrupts (\secref{sec:exceptions}) can participate in the multi-threaded
wenzelm@39868
  1603
  environment immediately without further ado.
wenzelm@39868
  1604
wenzelm@61854
  1605
  More ambitious tools with more fine-grained interaction with the environment
wenzelm@61854
  1606
  need to observe the principles explained below.
wenzelm@58618
  1607
\<close>
wenzelm@58618
  1608
wenzelm@58618
  1609
wenzelm@58618
  1610
subsection \<open>Multi-threading with shared memory\<close>
wenzelm@58618
  1611
wenzelm@61854
  1612
text \<open>
wenzelm@61854
  1613
  Multiple threads help to organize advanced operations of the system, such as
wenzelm@61854
  1614
  real-time conditions on command transactions, sub-components with explicit
wenzelm@61854
  1615
  communication, general asynchronous interaction etc. Moreover, parallel
wenzelm@61854
  1616
  evaluation is a prerequisite to make adequate use of the CPU resources that
wenzelm@61854
  1617
  are available on multi-core systems.\<^footnote>\<open>Multi-core computing does not mean
wenzelm@61854
  1618
  that there are ``spare cycles'' to be wasted. It means that the continued
wenzelm@61854
  1619
  exponential speedup of CPU performance due to ``Moore's Law'' follows
wenzelm@61854
  1620
  different rules: clock frequency has reached its peak around 2005, and
wenzelm@61854
  1621
  applications need to be parallelized in order to avoid a perceived loss of
wenzelm@61854
  1622
  performance. See also @{cite "Sutter:2005"}.\<close>
wenzelm@39867
  1623
wenzelm@57421
  1624
  Isabelle/Isar exploits the inherent structure of theories and proofs to
wenzelm@61854
  1625
  support \<^emph>\<open>implicit parallelism\<close> to a large extent. LCF-style theorem proving
wenzelm@61854
  1626
  provides almost ideal conditions for that, see also @{cite "Wenzel:2009"}.
wenzelm@61854
  1627
  Thus significant parts of theory and proof checking are parallelized
wenzelm@61854
  1628
  by default. In Isabelle2013, a maximum speedup-factor of 3.5 on 4 cores and
wenzelm@61854
  1629
  6.5 on 8 cores can be expected @{cite "Wenzel:2013:ITP"}.
wenzelm@39867
  1630
wenzelm@61416
  1631
  \<^medskip>
wenzelm@61854
  1632
  ML threads lack the memory protection of separate processes, and operate
wenzelm@61854
  1633
  concurrently on shared heap memory. This has the advantage that results of
wenzelm@61854
  1634
  independent computations are directly available to other threads: abstract
wenzelm@61854
  1635
  values can be passed without copying or awkward serialization that is
wenzelm@61854
  1636
  typically required for separate processes.
wenzelm@39867
  1637
wenzelm@61854
  1638
  To make shared-memory multi-threading work robustly and efficiently, some
wenzelm@61854
  1639
  programming guidelines need to be observed. While the ML system is
wenzelm@61854
  1640
  responsible for the basic integrity of the representation of ML values
wenzelm@61854
  1641
  in memory, the application programmer needs to ensure that multi-threaded
wenzelm@61854
  1642
  execution does not break the intended semantics.
wenzelm@39867
  1643
wenzelm@39867
  1644
  \begin{warn}
wenzelm@61854
  1645
  To participate in implicit parallelism, tools need to be thread-safe. A
wenzelm@61854
  1646
  single ill-behaved tool can affect the stability and performance of the
wenzelm@61854
  1647
  whole system.
wenzelm@39867
  1648
  \end{warn}
wenzelm@39867
  1649
wenzelm@57421
  1650
  Apart from observing the principles of thread-safeness passively, advanced
wenzelm@57421
  1651
  tools may also exploit parallelism actively, e.g.\ by using library
wenzelm@39868
  1652
  functions for parallel list operations (\secref{sec:parlist}).
wenzelm@39867
  1653
wenzelm@39867
  1654
  \begin{warn}
wenzelm@61854
  1655
  Parallel computing resources are managed centrally by the Isabelle/ML
wenzelm@61854
  1656
  infrastructure. User programs should not fork their own ML threads to
wenzelm@61854
  1657
  perform heavy computations.
wenzelm@39867
  1658
  \end{warn}
wenzelm@58618
  1659
\<close>
wenzelm@58618
  1660
wenzelm@58618
  1661
wenzelm@58618
  1662
subsection \<open>Critical shared resources\<close>
wenzelm@58618
  1663
wenzelm@61854
  1664
text \<open>
wenzelm@61854
  1665
  Thread-safeness is mainly concerned with concurrent read/write access to
wenzelm@61854
  1666
  shared resources, which are outside the purely functional world of ML. This
wenzelm@61854
  1667
  covers the following in particular.
wenzelm@39867
  1668
wenzelm@61854
  1669
    \<^item> Global references (or arrays), i.e.\ mutable memory cells that persist
wenzelm@61854
  1670
    over several invocations of associated operations.\<^footnote>\<open>This is independent of
wenzelm@61854
  1671
    the visibility of such mutable values in the toplevel scope.\<close>
wenzelm@39867
  1672
wenzelm@61854
  1673
    \<^item> Global state of the running Isabelle/ML process, i.e.\ raw I/O channels,
wenzelm@61854
  1674
    environment variables, current working directory.
wenzelm@39867
  1675
wenzelm@61854
  1676
    \<^item> Writable resources in the file-system that are shared among different
wenzelm@61854
  1677
    threads or external processes.
wenzelm@39867
  1678
wenzelm@61854
  1679
  Isabelle/ML provides various mechanisms to avoid critical shared resources
wenzelm@61854
  1680
  in most situations. As last resort there are some mechanisms for explicit
wenzelm@61854
  1681
  synchronization. The following guidelines help to make Isabelle/ML programs
wenzelm@61854
  1682
  work smoothly in a concurrent environment.
wenzelm@61854
  1683
wenzelm@61854
  1684
  \<^item> Avoid global references altogether. Isabelle/Isar maintains a uniform
wenzelm@61854
  1685
  context that incorporates arbitrary data declared by user programs
wenzelm@61854
  1686
  (\secref{sec:context-data}). This context is passed as plain value and user
wenzelm@61854
  1687
  tools can get/map their own data in a purely functional manner.
wenzelm@61854
  1688
  Configuration options within the context (\secref{sec:config-options})
wenzelm@61854
  1689
  provide simple drop-in replacements for historic reference variables.
wenzelm@39867
  1690
wenzelm@61854
  1691
  \<^item> Keep components with local state information re-entrant. Instead of poking
wenzelm@61854
  1692
  initial values into (private) global references, a new state record can be
wenzelm@61854
  1693
  created on each invocation, and passed through any auxiliary functions of
wenzelm@61854
  1694
  the component. The state record may contain mutable references in special
wenzelm@61854
  1695
  situations, without requiring any synchronization, as long as each
wenzelm@61854
  1696
  invocation gets its own copy and the tool itself is single-threaded.
wenzelm@39867
  1697
wenzelm@61854
  1698
  \<^item> Avoid raw output on \<open>stdout\<close> or \<open>stderr\<close>. The Poly/ML library is
wenzelm@61854
  1699
  thread-safe for each individual output operation, but the ordering of
wenzelm@61854
  1700
  parallel invocations is arbitrary. This means raw output will appear on some
wenzelm@61854
  1701
  system console with unpredictable interleaving of atomic chunks.
wenzelm@39867
  1702
wenzelm@39868
  1703
  Note that this does not affect regular message output channels
wenzelm@61854
  1704
  (\secref{sec:message-channels}). An official message id is associated with
wenzelm@61854
  1705
  the command transaction from where it originates, independently of other
wenzelm@61854
  1706
  transactions. This means each running Isar command has effectively its own
wenzelm@61854
  1707
  set of message channels, and interleaving can only happen when commands use
wenzelm@61854
  1708
  parallelism internally (and only at message boundaries).
wenzelm@39867
  1709
wenzelm@61854
  1710
  \<^item> Treat environment variables and the current working directory of the
wenzelm@61854
  1711
  running process as read-only.
wenzelm@39867
  1712
wenzelm@61854
  1713
  \<^item> Restrict writing to the file-system to unique temporary files. Isabelle
wenzelm@61854
  1714
  already provides a temporary directory that is unique for the running
wenzelm@61854
  1715
  process, and there is a centralized source of unique serial numbers in
wenzelm@61854
  1716
  Isabelle/ML. Thus temporary files that are passed to some external
wenzelm@61854
  1717
  process will always be disjoint, and thus thread-safe.
wenzelm@58618
  1718
\<close>
wenzelm@58618
  1719
wenzelm@58618
  1720
text %mlref \<open>
wenzelm@39868
  1721
  \begin{mldecls}
wenzelm@39868
  1722
  @{index_ML File.tmp_path: "Path.T -> Path.T"} \\
wenzelm@39868
  1723
  @{index_ML serial_string: "unit -> string"} \\
wenzelm@39868
  1724
  \end{mldecls}
wenzelm@39868
  1725
wenzelm@61854
  1726
  \<^descr> @{ML File.tmp_path}~\<open>path\<close> relocates the base component of \<open>path\<close> into the
wenzelm@61854
  1727
  unique temporary directory of the running Isabelle/ML process.
wenzelm@39868
  1728
wenzelm@61854
  1729
  \<^descr> @{ML serial_string}~\<open>()\<close> creates a new serial number that is unique over
wenzelm@61854
  1730
  the runtime of the Isabelle/ML process.
wenzelm@58618
  1731
\<close>
wenzelm@58618
  1732
wenzelm@61854
  1733
text %mlex \<open>
wenzelm@61854
  1734
  The following example shows how to create unique temporary file names.
wenzelm@58618
  1735
\<close>
wenzelm@58618
  1736
wenzelm@59902
  1737
ML_val \<open>
wenzelm@39868
  1738
  val tmp1 = File.tmp_path (Path.basic ("foo" ^ serial_string ()));
wenzelm@39868
  1739
  val tmp2 = File.tmp_path (Path.basic ("foo" ^ serial_string ()));
wenzelm@39868
  1740
  @{assert} (tmp1 <> tmp2);
wenzelm@58618
  1741
\<close>
wenzelm@58618
  1742
wenzelm@58618
  1743
wenzelm@58618
  1744
subsection \<open>Explicit synchronization\<close>
wenzelm@58618
  1745
wenzelm@61854
  1746
text \<open>
wenzelm@61854
  1747
  Isabelle/ML provides explicit synchronization for mutable variables over
wenzelm@59180
  1748
  immutable data, which may be updated atomically and exclusively. This
wenzelm@59180
  1749
  addresses the rare situations where mutable shared resources are really
wenzelm@59180
  1750
  required. Synchronization in Isabelle/ML is based on primitives of Poly/ML,
wenzelm@59180
  1751
  which have been adapted to the specific assumptions of the concurrent
wenzelm@59180
  1752
  Isabelle environment. User code should not break this abstraction, but stay
wenzelm@59180
  1753
  within the confines of concurrent Isabelle/ML.
wenzelm@59180
  1754
wenzelm@61854
  1755
  A \<^emph>\<open>synchronized variable\<close> is an explicit state component associated with
wenzelm@61854
  1756
  mechanisms for locking and signaling. There are operations to await a
wenzelm@59180
  1757
  condition, change the state, and signal the change to all other waiting
wenzelm@61477
  1758
  threads. Synchronized access to the state variable is \<^emph>\<open>not\<close> re-entrant:
wenzelm@61854
  1759
  direct or indirect nesting within the same thread will cause a deadlock!
wenzelm@61854
  1760
\<close>
wenzelm@58618
  1761
wenzelm@58618
  1762
text %mlref \<open>
wenzelm@39867
  1763
  \begin{mldecls}
wenzelm@39871
  1764
  @{index_ML_type "'a Synchronized.var"} \\
wenzelm@39871
  1765
  @{index_ML Synchronized.var: "string -> 'a -> 'a Synchronized.var"} \\
wenzelm@39871
  1766
  @{index_ML Synchronized.guarded_access: "'a Synchronized.var ->
wenzelm@39871
  1767
  ('a -> ('b * 'a) option) -> 'b"} \\
wenzelm@39871
  1768
  \end{mldecls}
wenzelm@39867
  1769
wenzelm@61854
  1770
    \<^descr> Type @{ML_type "'a Synchronized.var"} represents synchronized variables
wenzelm@61854
  1771
    with state of type @{ML_type 'a}.
wenzelm@39871
  1772
wenzelm@61854
  1773
    \<^descr> @{ML Synchronized.var}~\<open>name x\<close> creates a synchronized variable that is
wenzelm@61854
  1774
    initialized with value \<open>x\<close>. The \<open>name\<close> is used for tracing.
wenzelm@61493
  1775
wenzelm@61854
  1776
    \<^descr> @{ML Synchronized.guarded_access}~\<open>var f\<close> lets the function \<open>f\<close> operate
wenzelm@61854
  1777
    within a critical section on the state \<open>x\<close> as follows: if \<open>f x\<close> produces
wenzelm@61854
  1778
    @{ML NONE}, it continues to wait on the internal condition variable,
wenzelm@61854
  1779
    expecting that some other thread will eventually change the content in a
wenzelm@61854
  1780
    suitable manner; if \<open>f x\<close> produces @{ML SOME}~\<open>(y, x')\<close> it is satisfied and
wenzelm@61854
  1781
    assigns the new state value \<open>x'\<close>, broadcasts a signal to all waiting threads
wenzelm@61854
  1782
    on the associated condition variable, and returns the result \<open>y\<close>.
wenzelm@39871
  1783
wenzelm@61854
  1784
  There are some further variants of the @{ML Synchronized.guarded_access}
wenzelm@61854
  1785
  combinator, see @{file "~~/src/Pure/Concurrent/synchronized.ML"} for
wenzelm@61854
  1786
  details.
wenzelm@58618
  1787
\<close>
wenzelm@58618
  1788
wenzelm@61854
  1789
text %mlex \<open>
wenzelm@61854
  1790
  The following example implements a counter that produces positive integers
wenzelm@61854
  1791
  that are unique over the runtime of the Isabelle process:
wenzelm@61854
  1792
\<close>
wenzelm@58618
  1793
wenzelm@59902
  1794
ML_val \<open>
wenzelm@39871
  1795
  local
wenzelm@39871
  1796
    val counter = Synchronized.var "counter" 0;
wenzelm@39871
  1797
  in
wenzelm@39871
  1798
    fun next () =
wenzelm@39871
  1799
      Synchronized.guarded_access counter
wenzelm@39871
  1800
        (fn i =>
wenzelm@39871
  1801
          let val j = i + 1
wenzelm@39871
  1802
          in SOME (j, j) end);
wenzelm@39871
  1803
  end;
wenzelm@59902
  1804
wenzelm@39871
  1805
  val a = next ();
wenzelm@39871
  1806
  val b = next ();
wenzelm@39871
  1807
  @{assert} (a <> b);
wenzelm@58618
  1808
\<close>
wenzelm@58618
  1809
wenzelm@61416
  1810
text \<open>
wenzelm@61416
  1811
  \<^medskip>
wenzelm@61854
  1812
  See @{file "~~/src/Pure/Concurrent/mailbox.ML"} for how to implement a
wenzelm@61854
  1813
  mailbox as a synchronized variable over a purely functional list.
wenzelm@61854
  1814
\<close>
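
text %mlex \<open>
  The following sketch is not part of the Isabelle library, but merely
  illustrates how @{ML Synchronized.guarded_access} waits for a condition: a
  simplistic mailbox over a purely functional list, where the hypothetical
  \<open>receive\<close> operation blocks until some other thread has performed \<open>send\<close>.
\<close>

ML_val \<open>
  local
    val mailbox = Synchronized.var "mailbox" ([]: string list);
  in
    (*append a message; this always succeeds immediately*)
    fun send msg =
      Synchronized.guarded_access mailbox
        (fn msgs => SOME ((), msgs @ [msg]));

    (*take the first message, waiting while the mailbox is empty*)
    fun receive () =
      Synchronized.guarded_access mailbox
        (fn [] => NONE | msg :: rest => SOME (msg, rest));
  end;

  send "hello";
  @{assert} (receive () = "hello");
\<close>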
wenzelm@58618
  1815
wenzelm@58618
  1816
wenzelm@58618
  1817
section \<open>Managed evaluation\<close>
wenzelm@58618
  1818
wenzelm@61854
  1819
text \<open>
wenzelm@61854
  1820
  Execution of Standard ML follows the model of strict functional evaluation
wenzelm@61854
  1821
  with optional exceptions. Evaluation happens whenever some function is
wenzelm@61854
  1822
  applied to (sufficiently many) arguments. The result is either an explicit
wenzelm@61854
  1823
  value or an implicit exception.
wenzelm@52419
  1824
wenzelm@61854
  1825
  \<^emph>\<open>Managed evaluation\<close> in Isabelle/ML organizes expressions and results to
wenzelm@61854
  1826
  control certain physical side-conditions, to say more specifically when and
wenzelm@61854
  1827
  how evaluation happens. For example, the Isabelle/ML library supports lazy
wenzelm@61854
  1828
  evaluation with memoing, parallel evaluation via futures, asynchronous
wenzelm@61854
  1829
  evaluation via promises, evaluation with time limit etc.
wenzelm@52419
  1830
wenzelm@61416
  1831
  \<^medskip>
wenzelm@61854
  1832
  An \<^emph>\<open>unevaluated expression\<close> is represented either as unit abstraction \<^verbatim>\<open>fn
wenzelm@61854
  1833
  () => a\<close> of type \<^verbatim>\<open>unit -> 'a\<close> or as regular function \<^verbatim>\<open>fn a => b\<close> of type
wenzelm@61854
  1834
  \<^verbatim>\<open>'a -> 'b\<close>. Both forms occur routinely, and special care is required to
wenzelm@61854
  1835
  tell them apart --- the static type-system of SML is only of limited help
wenzelm@61854
  1836
  here.
wenzelm@52419
  1837
wenzelm@61854
  1838
  The first form is more intuitive: some combinator \<open>(unit -> 'a) -> 'a\<close>
wenzelm@61854
  1839
  applies the given function to \<open>()\<close> to initiate the postponed evaluation
wenzelm@61854
  1840
  process. The second form is more flexible: some combinator \<open>('a -> 'b) -> 'a
wenzelm@61854
  1841
  -> 'b\<close> acts like a modified form of function application; several such
wenzelm@61854
  1842
  combinators may be cascaded to modify a given function, before it is
wenzelm@61854
  1843
  ultimately applied to some argument.
wenzelm@52419
  1844
wenzelm@61416
  1845
  \<^medskip>
wenzelm@61854
  1846
  \<^emph>\<open>Reified results\<close> make the disjoint sum of regular values versus
wenzelm@61854
  1847
  exceptional situations explicit as ML datatype: \<open>'a result = Res of 'a | Exn
wenzelm@61854
  1848
  of exn\<close>. This is typically used for administrative purposes, to store the
wenzelm@61854
  1849
  overall outcome of an evaluation process.
wenzelm@52419
  1850
wenzelm@61854
  1851
  \<^emph>\<open>Parallel exceptions\<close> aggregate reified results, such that multiple
wenzelm@61854
  1852
  exceptions are digested as a collection in canonical form that identifies
wenzelm@61854
  1853
  exceptions according to their original occurrence. This is particularly
wenzelm@61854
  1854
  important for parallel evaluation via futures (\secref{sec:futures}), which
wenzelm@61854
  1855
  are organized as an acyclic graph of evaluations that depend on other
wenzelm@61854
  1856
  evaluations: exceptions stemming from shared sub-graphs are exposed exactly
wenzelm@61854
  1857
  once and in the order of their original occurrence (e.g.\ when printed at
wenzelm@61854
  1858
  the toplevel). Interrupt counts as neutral element here: it is treated as
wenzelm@61854
  1859
  minimal information about some canceled evaluation process, and is absorbed
wenzelm@61854
  1860
  by the presence of regular program exceptions.
wenzelm@61854
  1861
\<close>
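
text %mlex \<open>
  The difference between the two forms of unevaluated expressions may be
  illustrated as follows. The combinators \<open>evaluate\<close> and \<open>with_trace\<close> are
  hypothetical and serve demonstration purposes only; they are not part of
  the Isabelle/ML library.
\<close>

ML_val \<open>
  (*unit abstraction: the combinator initiates the postponed evaluation
    by applying the given function to ()*)
  fun evaluate (e: unit -> 'a) : 'a = e ();
  val x = evaluate (fn () => 40 + 2);

  (*modified function application: the given function is wrapped first and
    only later applied to its argument*)
  fun with_trace (f: 'a -> 'b) : 'a -> 'b =
    fn a => (writeln "applying function"; f a);
  val y = with_trace (fn i => i + 1) 41;

  @{assert} (x = 42 andalso y = 42);
\<close>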
wenzelm@58618
  1862
wenzelm@58618
  1863
text %mlref \<open>
wenzelm@52419
  1864
  \begin{mldecls}
wenzelm@52419
  1865
  @{index_ML_type "'a Exn.result"} \\
wenzelm@52419
  1866
  @{index_ML Exn.capture: "('a -> 'b) -> 'a -> 'b Exn.result"} \\
wenzelm@52419
  1867
  @{index_ML Exn.interruptible_capture: "('a -> 'b) -> 'a -> 'b Exn.result"} \\
wenzelm@52419
  1868
  @{index_ML Exn.release: "'a Exn.result -> 'a"} \\
wenzelm@52419
  1869
  @{index_ML Par_Exn.release_all: "'a Exn.result list -> 'a list"} \\
wenzelm@52419
  1870
  @{index_ML Par_Exn.release_first: "'a Exn.result list -> 'a list"} \\
wenzelm@52419
  1871
  \end{mldecls}
wenzelm@52419
  1872
wenzelm@61854
  1873
  \<^descr> Type @{ML_type "'a Exn.result"} represents the disjoint sum of ML results
wenzelm@61854
  1874
  explicitly, with constructor @{ML Exn.Res} for regular values and @{ML
wenzelm@61854
  1875
  "Exn.Exn"} for exceptions.
wenzelm@52419
  1876
wenzelm@61854
  1877
  \<^descr> @{ML Exn.capture}~\<open>f x\<close> manages the evaluation of \<open>f x\<close> such that
wenzelm@61854
  1878
  exceptions are made explicit as @{ML "Exn.Exn"}. Note that this includes
wenzelm@61854
  1879
  physical interrupts (see also \secref{sec:exceptions}), so the same
wenzelm@61854
  1880
  precautions apply to user code: interrupts must not be absorbed
wenzelm@61854
  1881
  accidentally!
wenzelm@52419
  1882
wenzelm@61854
  1883
  \<^descr> @{ML Exn.interruptible_capture} is similar to @{ML Exn.capture}, but
wenzelm@61854
  1884
  interrupts are immediately re-raised as required for user code.
wenzelm@52419
  1885
wenzelm@61854
  1886
  \<^descr> @{ML Exn.release}~\<open>result\<close> releases the original runtime result, exposing
wenzelm@61854
  1887
  its regular value or raising the reified exception.
wenzelm@52419
  1888
wenzelm@61854
  1889
  \<^descr> @{ML Par_Exn.release_all}~\<open>results\<close> combines results that were produced
wenzelm@61854
  1890
  independently (e.g.\ by parallel evaluation). If all results are regular
wenzelm@61854
  1891
  values, that list is returned. Otherwise, the collection of all exceptions
wenzelm@61854
  1892
  is raised, wrapped up as a collective parallel exception. Note that the latter
wenzelm@61854
  1893
  prevents access to individual exceptions by conventional \<^verbatim>\<open>handle\<close> of ML.
wenzelm@52419
  1894
wenzelm@61854
  1895
  \<^descr> @{ML Par_Exn.release_first} is similar to @{ML Par_Exn.release_all}, but
wenzelm@61854
  1896
  only the first (meaningful) exception that has occurred in the original
wenzelm@61854
  1897
  evaluation process is raised again, the others are ignored. That single
wenzelm@61854
  1898
  exception may get handled by conventional means in ML.
wenzelm@58618
  1899
\<close>
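
text %mlex \<open>
  The following example is a hypothetical use of reified results to inspect
  the outcome of evaluations that may fail; the division by zero merely
  provides a convenient runtime exception.
\<close>

ML_val \<open>
  val ok = Exn.capture (fn () => 10 div 2) ();
  val err = Exn.capture (fn () => 10 div 0) ();

  (*releasing a regular result yields its value*)
  @{assert} (Exn.release ok = 5);

  (*an exceptional result is an explicit constructor value*)
  @{assert} (case err of Exn.Exn Div => true | _ => false);

  (*a list of regular results is released as a plain list of values*)
  @{assert} (Par_Exn.release_all [ok, Exn.capture (fn () => 3) ()] = [5, 3]);
\<close>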
wenzelm@58618
  1900
wenzelm@58618
  1901
wenzelm@58618
  1902
subsection \<open>Parallel skeletons \label{sec:parlist}\<close>
wenzelm@58618
  1903
wenzelm@58618
  1904
text \<open>
wenzelm@61854
  1905
  Algorithmic skeletons are combinators that operate on lists in parallel, in
wenzelm@61854
  1906
  the manner of well-known \<open>map\<close>, \<open>exists\<close>, \<open>forall\<close> etc. Management of
wenzelm@61854
  1907
  futures (\secref{sec:futures}) and their results as reified exceptions is
wenzelm@61854
  1908
  wrapped up into simple programming interfaces that resemble the sequential
wenzelm@61854
  1909
  versions.
wenzelm@52420
  1910
wenzelm@61854
  1911
  What remains is the application-specific problem to present expressions with
wenzelm@61854
  1912
  suitable \<^emph>\<open>granularity\<close>: each list element corresponds to one evaluation
wenzelm@61854
  1913
  task. If the granularity is too coarse, the available CPUs are not
wenzelm@61854
  1914
  saturated. If it is too fine-grained, CPU cycles are wasted due to the
wenzelm@61854
  1915
  overhead of organizing parallel processing. In the worst case, parallel
wenzelm@52420
  1916
  performance will be less than the sequential counterpart!
wenzelm@58618
  1917
\<close>
wenzelm@58618
  1918
wenzelm@58618
  1919
text %mlref \<open>
wenzelm@52420
  1920
  \begin{mldecls}
wenzelm@52420
  1921
  @{index_ML Par_List.map: "('a -> 'b) -> 'a list -> 'b list"} \\
wenzelm@52420
  1922
  @{index_ML Par_List.get_some: "('a -> 'b option) -> 'a list -> 'b option"} \\
wenzelm@52420
  1923
  \end{mldecls}
wenzelm@52420
  1924
wenzelm@61854
  1925
  \<^descr> @{ML Par_List.map}~\<open>f [x\<^sub>1, \<dots>, x\<^sub>n]\<close> is like @{ML "map"}~\<open>f [x\<^sub>1, \<dots>,
wenzelm@61854
  1926
  x\<^sub>n]\<close>, but the evaluation of \<open>f x\<^sub>i\<close> for \<open>i = 1, \<dots>, n\<close> is performed in
wenzelm@61854
  1927
  parallel.
wenzelm@61493
  1928
wenzelm@61854
  1929
  An exception in any \<open>f x\<^sub>i\<close> cancels the overall evaluation process. The
wenzelm@61854
  1930
  final result is produced via @{ML Par_Exn.release_first} as explained above,
wenzelm@61854
  1931
  which means the first program exception that happened to occur in the
wenzelm@61854
  1932
  parallel evaluation is propagated, and all other failures are ignored.
wenzelm@52420
  1933
wenzelm@61854
  1934
  \<^descr> @{ML Par_List.get_some}~\<open>f [x\<^sub>1, \<dots>, x\<^sub>n]\<close> produces some \<open>f x\<^sub>i\<close> that is of
wenzelm@61854
  1935
  the form \<open>SOME y\<^sub>i\<close>, if that exists, otherwise \<open>NONE\<close>. Thus it is similar to
wenzelm@61854
  1936
  @{ML Library.get_first}, but subject to a non-deterministic parallel choice
wenzelm@61854
  1937
  process. The first successful result cancels the overall evaluation process;
wenzelm@61854
  1938
  other exceptions are propagated as for @{ML Par_List.map}.
wenzelm@52420
  1939
wenzelm@61854
  1940
  This generic parallel choice combinator is the basis for derived forms, such
wenzelm@61854
  1941
  as @{ML Par_List.find_some}, @{ML Par_List.exists}, @{ML Par_List.forall}.
wenzelm@58618
  1942
\<close>
wenzelm@58618
  1943
wenzelm@61854
  1944
text %mlex \<open>
wenzelm@61854
  1945
  Subsequently, the Ackermann function is evaluated in parallel for some
wenzelm@61854
  1946
  ranges of arguments.
wenzelm@61854
  1947
\<close>
wenzelm@58618
  1948
wenzelm@58618
  1949
ML_val \<open>
wenzelm@52420
  1950
  fun ackermann 0 n = n + 1
wenzelm@52420
  1951
    | ackermann m 0 = ackermann (m - 1) 1
wenzelm@52420
  1952
    | ackermann m n = ackermann (m - 1) (ackermann m (n - 1));
wenzelm@52420
  1953
wenzelm@52420
  1954
  Par_List.map (ackermann 2) (500 upto 1000);
wenzelm@52420
  1955
  Par_List.map (ackermann 3) (5 upto 10);
wenzelm@58618
  1956
\<close>
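
text \<open>
  \<^medskip>
  The parallel choice combinator can be used for simple search problems. The
  function \<open>find_factor\<close> below is a hypothetical example, not part of the
  Isabelle library: it returns some proper factor of a given number, if there
  is one.
\<close>

ML_val \<open>
  fun find_factor n =
    Par_List.get_some
      (fn k => if n mod k = 0 then SOME k else NONE)
      (2 upto (n - 1));

  @{assert} (find_factor 17 = NONE);
  @{assert} (is_some (find_factor 91));
\<close>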
wenzelm@58618
  1957
wenzelm@58618
  1958
wenzelm@58618
  1959
subsection \<open>Lazy evaluation\<close>
wenzelm@58618
  1960
wenzelm@58618
  1961
text \<open>
wenzelm@61854
  1962
  Classic lazy evaluation works via the \<open>lazy\<close>~/ \<open>force\<close> pair of operations:
wenzelm@61854
  1963
  \<open>lazy\<close> to wrap an unevaluated expression, and \<open>force\<close> to evaluate it once
wenzelm@61854
  1964
  and store its result persistently. Later invocations of \<open>force\<close> retrieve the
wenzelm@61854
  1965
  stored result without another evaluation. Isabelle/ML refines this idea to
wenzelm@61854
  1966
  accommodate the aspects of multi-threading, synchronous program exceptions
wenzelm@61854
  1967
  and asynchronous interrupts.
wenzelm@57347
  1968
wenzelm@61854
  1969
  The first thread that invokes \<open>force\<close> on an unfinished lazy value changes
wenzelm@61854
  1970
  its state into a \<^emph>\<open>promise\<close> of the eventual result and starts evaluating it.
wenzelm@61854
  1971
  Any other threads that \<open>force\<close> the same lazy value in the meantime need to
wenzelm@61854
  1972
  wait for it to finish, by producing a regular result or program exception.
wenzelm@61854
  1973
  If the evaluation attempt is interrupted, this event is propagated to all
wenzelm@61854
  1974
  waiting threads and the lazy value is reset to its original state.
wenzelm@57347
  1975
wenzelm@57347
  1976
  This means a lazy value is completely evaluated at most once, in a
wenzelm@57347
  1977
  thread-safe manner. There might be multiple interrupted evaluation attempts,
wenzelm@57347
  1978
  and multiple receivers of intermediate interrupt events. Interrupts are
wenzelm@61477
  1979
  \<^emph>\<open>not\<close> made persistent: later evaluation attempts start again from the
wenzelm@57347
  1980
  original expression.
wenzelm@58618
  1981
\<close>
wenzelm@58618
  1982
wenzelm@58618
  1983
text %mlref \<open>
wenzelm@57347
  1984
  \begin{mldecls}
wenzelm@57347
  1985
  @{index_ML_type "'a lazy"} \\
wenzelm@57347
  1986
  @{index_ML Lazy.lazy: "(unit -> 'a) -> 'a lazy"} \\
wenzelm@57347
  1987
  @{index_ML Lazy.value: "'a -> 'a lazy"} \\
wenzelm@57347
  1988
  @{index_ML Lazy.force: "'a lazy -> 'a"} \\
wenzelm@57347
  1989
  \end{mldecls}
wenzelm@57347
  1990
wenzelm@61503
  1991
  \<^descr> Type @{ML_type "'a lazy"} represents lazy values over type \<^verbatim>\<open>'a\<close>.
wenzelm@57347
  1992
wenzelm@61854
  1993
  \<^descr> @{ML Lazy.lazy}~\<open>(fn () => e)\<close> wraps the unevaluated expression \<open>e\<close> as
wenzelm@61854
  1994
  unfinished lazy value.
wenzelm@61493
  1995
wenzelm@61854
  1996
  \<^descr> @{ML Lazy.value}~\<open>a\<close> wraps the value \<open>a\<close> as finished lazy value. When
wenzelm@61854
  1997
  forced, it returns \<open>a\<close> without any further evaluation.
wenzelm@57347
  1998
wenzelm@57349
  1999
  There is very low overhead for this proforma wrapping of strict values as
wenzelm@57349
  2000
  lazy values.
wenzelm@57347
  2001
wenzelm@61493
  2002
  \<^descr> @{ML Lazy.force}~\<open>x\<close> produces the result of the lazy value in a
wenzelm@57347
  2003
  thread-safe manner as explained above. Thus it may cause the current thread
wenzelm@57347
  2004
  to wait on a pending evaluation attempt by another thread.
wenzelm@58618
  2005
\<close>
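
text %mlex \<open>
  The following example demonstrates that a lazy value is evaluated at most
  once; the unsynchronized counter merely serves local demonstration
  purposes here (assuming that no interrupts occur).
\<close>

ML_val \<open>
  val n = Unsynchronized.ref 0;
  val x = Lazy.lazy (fn () => (n := ! n + 1; 42));

  @{assert} (Lazy.force x = 42);
  @{assert} (Lazy.force x = 42);
  @{assert} (! n = 1);

  (*a finished lazy value is forced without any further evaluation*)
  @{assert} (Lazy.force (Lazy.value "a") = "a");
\<close>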
wenzelm@58618
  2006
wenzelm@58618
  2007
wenzelm@58618
  2008
subsection \<open>Futures \label{sec:futures}\<close>
wenzelm@58618
  2009
wenzelm@58618
  2010
text \<open>
wenzelm@57349
  2011
  Futures help to organize parallel execution in a value-oriented manner, with
wenzelm@61854
  2012
  \<open>fork\<close>~/ \<open>join\<close> as the main pair of operations, and some further variants;
wenzelm@61854
  2013
  see also @{cite "Wenzel:2009" and "Wenzel:2013:ITP"}. Unlike lazy values,
wenzelm@61854
  2014
  futures are evaluated strictly and spontaneously on separate worker threads.
wenzelm@61854
  2015
  Futures may be canceled, which leads to interrupts on running evaluation
wenzelm@61854
  2016
  attempts, and forces structurally related futures to fail for all time;
wenzelm@61854
  2017
  already finished futures remain unchanged. Exceptions between related
wenzelm@57350
  2018
  futures are propagated as well, and turned into parallel exceptions (see
wenzelm@57350
  2019
  above).
wenzelm@57349
  2020
wenzelm@57349
  2021
  Technically, a future is a single-assignment variable together with a
wenzelm@61854
  2022
  \<^emph>\<open>task\<close> that serves administrative purposes, notably within the \<^emph>\<open>task
wenzelm@61854
  2023
  queue\<close> where new futures are registered for eventual evaluation and the
wenzelm@61854
  2024
  worker threads retrieve their work.
wenzelm@57349
  2025
wenzelm@57350
  2026
  The pool of worker threads is limited, in correlation with the number of
wenzelm@57350
  2027
  physical cores on the machine. Note that allocation of runtime resources may
wenzelm@57350
  2028
  be distorted either if workers yield CPU time (e.g.\ via system sleep or
wenzelm@57350
  2029
  wait operations), or if non-worker threads contend for significant runtime
wenzelm@57350
  2030
  resources independently. There is a limited number of replacement worker
wenzelm@57350
  2031
  threads that get activated in certain explicit wait conditions, after a
wenzelm@57350
  2032
  timeout.
wenzelm@57350
  2033
wenzelm@61416
  2034
  \<^medskip>
wenzelm@61854
  2035
  Each future task belongs to some \<^emph>\<open>task group\<close>, which represents the
wenzelm@61854
  2036
  hierarchic structure of related tasks, together with the exception status at
wenzelm@61854
  2037
  that point. By default, the task group of a newly created future is a new
wenzelm@61854
  2038
  sub-group of the presently running one, but it is also possible to indicate
wenzelm@61854
  2039
  different group layouts under program control.
wenzelm@57349
  2040
wenzelm@57349
  2041
  Cancellation of futures actually refers to the corresponding task group and
wenzelm@57349
  2042
  all its sub-groups. Thus interrupts are propagated down the group hierarchy.
wenzelm@57349
  2043
  Regular program exceptions are treated likewise: failure of the evaluation
wenzelm@57349
  2044
  of some future task affects its own group and all sub-groups. Given a
wenzelm@61854
  2045
  particular task group, its \<^emph>\<open>group status\<close> cumulates all relevant exceptions
wenzelm@61854
  2046
  according to its position within the group hierarchy. Interrupted tasks that
wenzelm@61854
  2047
  lack regular result information will pick up parallel exceptions from the
wenzelm@61854
  2048
  cumulative group status.
wenzelm@57349
  2049
wenzelm@61416
  2050
  \<^medskip>
wenzelm@61854
  2051
  A \<^emph>\<open>passive future\<close> or \<^emph>\<open>promise\<close> is a future with slightly different
wenzelm@61854
  2052
  evaluation policies: there is only a single-assignment variable and some
wenzelm@61854
  2053
  expression to evaluate for the \<^emph>\<open>failed\<close> case (e.g.\ to clean up resources
wenzelm@61854
  2054
  when canceled). A regular result is produced by external means, using a
wenzelm@61854
  2055
  separate \<^emph>\<open>fulfill\<close> operation.
wenzelm@57349
  2056
wenzelm@57349
  2057
  Promises are managed in the same task queue, so regular futures may depend
wenzelm@57349
  2058
  on them. This allows a form of reactive programming, where some promises are
wenzelm@57349
  2059
  used as minimal elements (or guards) within the future dependency graph:
wenzelm@57349
  2060
  when these promises are fulfilled the evaluation of subsequent futures
wenzelm@57349
  2061
  starts spontaneously, according to their own inter-dependencies.
wenzelm@58618
  2062
\<close>
wenzelm@58618
  2063
wenzelm@58618
  2064
text %mlref \<open>
wenzelm@57348
  2065
  \begin{mldecls}
wenzelm@57348
  2066
  @{index_ML_type "'a future"} \\
wenzelm@57348
  2067
  @{index_ML Future.fork: "(unit -> 'a) -> 'a future"} \\
wenzelm@57348
  2068
  @{index_ML Future.forks: "Future.params -> (unit -> 'a) list -> 'a future list"} \\
wenzelm@57349
  2069
  @{index_ML Future.join: "'a future -> 'a"} \\
wenzelm@57349
  2070
  @{index_ML Future.joins: "'a future list -> 'a list"} \\
wenzelm@57348
  2071
  @{index_ML Future.value: "'a -> 'a future"} \\
wenzelm@57348
  2072
  @{index_ML Future.map: "('a -> 'b) -> 'a future -> 'b future"} \\
wenzelm@57348
  2073
  @{index_ML Future.cancel: "'a future -> unit"} \\
wenzelm@57348
  2074
  @{index_ML Future.cancel_group: "Future.group -> unit"} \\[0.5ex]
wenzelm@57348
  2075
  @{index_ML Future.promise: "(unit -> unit) -> 'a future"} \\
wenzelm@57348
  2076
  @{index_ML Future.fulfill: "'a future -> 'a -> unit"} \\
wenzelm@57348
  2077
  \end{mldecls}
wenzelm@57348
  2078
wenzelm@61503
  2079
  \<^descr> Type @{ML_type "'a future"} represents future values over type \<^verbatim>\<open>'a\<close>.
wenzelm@57348
  2080
wenzelm@61854
  2081
  \<^descr> @{ML Future.fork}~\<open>(fn () => e)\<close> registers the unevaluated expression \<open>e\<close>
wenzelm@61854
  2082
  as unfinished future value, to be evaluated eventually on the parallel
wenzelm@61854
  2083
  worker-thread farm. This is a shorthand for @{ML Future.forks} below, with
wenzelm@61854
  2084
  default parameters and a single expression.
wenzelm@57348
  2085
wenzelm@61854
  2086
  \<^descr> @{ML Future.forks}~\<open>params exprs\<close> is the general interface to fork several
wenzelm@61854
  2087
  futures simultaneously. The \<open>params\<close> consist of the following fields:
wenzelm@57348
  2088
wenzelm@61854
  2089
    \<^item> \<open>name : string\<close> (default @{ML "\"\""}) specifies a common name for the
wenzelm@61854
  2090
    tasks of the forked futures, which serves diagnostic purposes.
wenzelm@61458
  2091
wenzelm@61854
  2092
    \<^item> \<open>group : Future.group option\<close> (default @{ML NONE}) specifies an optional
wenzelm@61854
  2093
    task group for the forked futures. @{ML NONE} means that a new sub-group
wenzelm@61854
  2094
    of the current worker-thread task context is created. If this is not a
wenzelm@61854
  2095
    worker thread, the group will be a new root in the group hierarchy.
wenzelm@61458
  2096
wenzelm@61854
  2097
    \<^item> \<open>deps : Future.task list\<close> (default @{ML "[]"}) specifies dependencies on
wenzelm@61854
  2098
    other future tasks, i.e.\ the adjacency relation in the global task queue.
wenzelm@61854
  2099
    Dependencies on already finished tasks are ignored.
wenzelm@61458
  2100
wenzelm@61854
  2101
    \<^item> \<open>pri : int\<close> (default @{ML 0}) specifies a priority within the task
wenzelm@61854
  2102
    queue.
wenzelm@61458
  2103
wenzelm@61854
  2104
    Typically there is only little deviation from the default priority @{ML
wenzelm@61854
  2105
    0}. As a rule of thumb, @{ML "~1"} means ``low priority'' and @{ML 1} means
wenzelm@61458
  2106
    ``high priority''.
wenzelm@61458
  2107
wenzelm@61854
  2108
    Note that the task priority only affects the position in the queue, not
wenzelm@61854
  2109
    the thread priority. When a worker thread picks up a task for processing,
wenzelm@61854
  2110
    it runs with the normal thread priority to the end (or until canceled).
wenzelm@61854
  2111
    Higher priority tasks that are queued later need to wait until this (or
wenzelm@61854
  2112
    another) worker thread becomes free again.
wenzelm@61458
  2113
wenzelm@61854
  2114
    \<^item> \<open>interrupts : bool\<close> (default @{ML true}) tells whether the worker thread
wenzelm@61854
  2115
    that processes the corresponding task is initially put into interruptible
wenzelm@61854
  2116
    state. This state may change again while running, by modifying the thread
wenzelm@61854
  2117
    attributes.
wenzelm@61458
  2118
wenzelm@61854
  2119
    With interrupts disabled, a running future task cannot be canceled. It is
wenzelm@61458
  2120
    the responsibility of the programmer that this special state is retained
wenzelm@61458
  2121
    only briefly.
wenzelm@57348
  2122
wenzelm@61854
  2123
  \<^descr> @{ML Future.join}~\<open>x\<close> retrieves the value of an already finished future,
wenzelm@61854
  2124
  which may lead to an exception, according to the result of its previous
wenzelm@61854
  2125
  evaluation.
wenzelm@57348
  2126
wenzelm@57348
  2127
  For an unfinished future there are several cases depending on the role of
wenzelm@57348
  2128
  the current thread and the status of the future. A non-worker thread waits
wenzelm@57348
  2129
  passively until the future is eventually evaluated. A worker thread
wenzelm@57348
  2130
  temporarily changes its task context and takes over the responsibility to
wenzelm@57349
  2131
  evaluate the future expression on the spot. The latter is done in a
wenzelm@57349
  2132
  thread-safe manner: other threads that intend to join the same future need
wenzelm@57349
  2133
  to wait until the ongoing evaluation is finished.
wenzelm@57349
  2134
wenzelm@57349
  2135
  Note that excessive use of dynamic dependencies of futures by adhoc joining
wenzelm@57349
  2136
  may lead to bad utilization of CPU cores, due to threads waiting on other
wenzelm@57349
  2137
  threads to finish required futures. The future task farm has a limited
wenzelm@57349
  2138
  amount of replacement threads that continue working on unrelated tasks after
wenzelm@57349
  2139
  some timeout.
wenzelm@57348
  2140
wenzelm@57348
  2141
  Whenever possible, static dependencies of futures should be specified
wenzelm@61854
  2142
  explicitly when forked (see \<open>deps\<close> above). Thus the evaluation can work from
wenzelm@61854
  2143
  the bottom up, without join conflicts and wait states.
wenzelm@57349
  2144
wenzelm@61854
  2145
  \<^descr> @{ML Future.joins}~\<open>xs\<close> joins the given list of futures simultaneously,
wenzelm@61854
  2146
  which is more efficient than @{ML "map Future.join"}~\<open>xs\<close>.
wenzelm@57349
  2147
wenzelm@57349
  2148
  Based on the dependency graph of tasks, the current thread takes over the
wenzelm@57349
  2149
  responsibility to evaluate future expressions that are required for the main
wenzelm@57349
  2150
  result, working from the bottom up. Waiting on future results that are
wenzelm@57349
  2151
  presently evaluated on other threads only happens as last resort, when no
wenzelm@57349
  2152
  other unfinished futures are left over.
wenzelm@57349
  2153
wenzelm@61854
  2154
  \<^descr> @{ML Future.value}~\<open>a\<close> wraps the value \<open>a\<close> as finished future value,
wenzelm@61854
  2155
  bypassing the worker-thread farm. When joined, it returns \<open>a\<close> without any
wenzelm@61854
  2156
  further evaluation.
wenzelm@57349
  2157
wenzelm@57349
  2158
  There is very low overhead for this proforma wrapping of strict values as
wenzelm@57421
  2159
  futures.
wenzelm@57348
  2160
wenzelm@61493
  2161
  \<^descr> @{ML Future.map}~\<open>f x\<close> is a fast-path implementation of @{ML
wenzelm@61854
  2162
  Future.fork}~\<open>(fn () => f (\<close>@{ML Future.join}~\<open>x))\<close>, which avoids the full
wenzelm@61854
  2163
  overhead of the task queue and worker-thread farm as far as possible. The
wenzelm@61854
  2164
  function \<open>f\<close> is supposed to be some trivial post-processing or projection of
wenzelm@61854
  2165
  the future result.
wenzelm@57348
  2166
wenzelm@61854
  2167
  \<^descr> @{ML Future.cancel}~\<open>x\<close> cancels the task group of the given future, using
wenzelm@61854
  2168
  @{ML Future.cancel_group} below.
wenzelm@57348
  2169
wenzelm@61854
  2170
  \<^descr> @{ML Future.cancel_group}~\<open>group\<close> cancels all tasks of the given task
wenzelm@61854
  2171
  group for all time. Threads that are presently processing a task of the
wenzelm@61854
  2172
  given group are interrupted: it may take some time until they are actually
wenzelm@61854
  2173
  terminated. Tasks that are queued but not yet processed are dequeued and
wenzelm@61854
  2174
  forced into interrupted state. Since the task group is itself invalidated,
wenzelm@61854
  2175
  any further attempt to fork a future that belongs to it will yield a
wenzelm@61854
  2176
  canceled result as well.
wenzelm@57348
  2177
wenzelm@61854
  2178
  \<^descr> @{ML Future.promise}~\<open>abort\<close> registers a passive future with the given
wenzelm@61854
  2179
  \<open>abort\<close> operation: it is invoked when the future task group is canceled.
wenzelm@57348
  2180
wenzelm@61854
  2181
  \<^descr> @{ML Future.fulfill}~\<open>x a\<close> finishes the passive future \<open>x\<close> by the given
wenzelm@61854
  2182
  value \<open>a\<close>. If the promise has already been canceled, the attempt to fulfill
wenzelm@61854
  2183
  it causes an exception.
wenzelm@58618
  2184
\<close>
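
text %mlex \<open>
  The following contrived examples demonstrate basic use of futures and
  promises; they are not taken from the Isabelle sources.
\<close>

ML_val \<open>
  (*fork independent evaluations and join their results*)
  val x = Future.fork (fn () => 6 * 7);
  val y = Future.map (fn i => i + 1) (Future.fork (fn () => 41));
  @{assert} (Future.join x = 42);
  @{assert} (Future.join y = 42);

  (*a finished future is joined without any further evaluation*)
  @{assert} (Future.join (Future.value "a") = "a");

  (*simultaneous join of several futures*)
  val zs = map (fn i => Future.fork (fn () => i * i)) (1 upto 4);
  @{assert} (Future.joins zs = [1, 4, 9, 16]);

  (*a passive future is finished by explicit fulfillment*)
  val p: int future = Future.promise (fn () => ());
  Future.fulfill p 42;
  @{assert} (Future.join p = 42);
\<close>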
wenzelm@57348
  2185
bulwahn@47180
  2186
end