| author | wenzelm | 
| Fri, 09 Jun 2017 21:57:30 +0200 | |
| changeset 66055 | 07175635f78c | 
| parent 64465 | 73069f272f42 | 
| child 66663 | 49318345c332 | 
| permissions | -rw-r--r-- | 
| 61656 | 1 | (*:maxLineLen=78:*) | 
| 57347 | 2 | |
| 29755 | 3 | theory "ML" | 
| 4 | imports Base | |
| 5 | begin | |
| 18538 | 6 | |
| 58618 | 7 | chapter \<open>Isabelle/ML\<close> | 
| 8 | ||
| 61854 | 9 | text \<open> | 
| 10 | Isabelle/ML is best understood as a certain culture based on Standard ML. | |
| 11 | Thus it is not a new programming language, but a certain way to use SML at | |
| 12 | an advanced level within the Isabelle environment. This covers a variety of | |
| 13 | aspects that are geared towards an efficient and robust platform for | |
| 14 | applications of formal logic with fully foundational proof construction --- | |
| 15 | according to the well-known \<^emph>\<open>LCF principle\<close>. There is specific | |
| 16 | infrastructure with library modules to address the needs of this difficult | |
| 17 | task. For example, the raw parallel programming model of Poly/ML is | |
| 18 | presented as considerably more abstract concept of \<^emph>\<open>futures\<close>, which is then | |
| 19 | used to augment the inference kernel, Isar theory and proof interpreter, and | |
| 20 | PIDE document management. | |
| 39823 | 21 | |
| 61854 | 22 | The main aspects of Isabelle/ML are introduced below. These first-hand | 
| 23 | explanations should help to understand how proper Isabelle/ML is to be read | |
| 24 | and written, and to get access to the wealth of experience that is expressed | |
| 63680 | 25 | in the source text and its history of changes.\<^footnote>\<open>See | 
| 26 | \<^url>\<open>http://isabelle.in.tum.de/repos/isabelle\<close> for the full Mercurial history. | |
| 61854 | 27 | There are symbolic tags to refer to official Isabelle releases, as opposed | 
| 28 | to arbitrary \<^emph>\<open>tip\<close> versions that merely reflect snapshots that are never | |
| 29 | really up-to-date.\<close> | |
| 30 | \<close> | |
| 58618 | 31 | |
| 32 | ||
| 33 | section \<open>Style and orthography\<close> | |
| 34 | ||
| 61854 | 35 | text \<open> | 
| 36 | The sources of Isabelle/Isar are optimized for \<^emph>\<open>readability\<close> and | |
| 37 | \<^emph>\<open>maintainability\<close>. The main purpose is to tell an informed reader what is | |
| 38 | really going on and how things really work. This is a non-trivial aim, but | |
| 39 | it is supported by a certain style of writing Isabelle/ML that has emerged | |
| 40 | from long years of system development.\<^footnote>\<open>See also the interesting style guide | |
| 63680 | 41 | for OCaml \<^url>\<open>http://caml.inria.fr/resources/doc/guides/guidelines.en.html\<close> | 
| 42 | which shares many of our means and ends.\<close> | |
| 39878 | 43 | |
| 61854 | 44 | The main principle behind any coding style is \<^emph>\<open>consistency\<close>. For a single | 
| 45 | author of a small program this merely means ``choose your style and stick to | |
| 46 | it''. A complex project like Isabelle, with long years of development and | |
| 47 | different contributors, requires more standardization. A coding style that | |
| 48 | is changed every few years or with every new contributor is no style at all, | |
| 49 | because consistency is quickly lost. Global consistency is hard to achieve, | |
| 50 | though. Nonetheless, one should always strive at least for local consistency | |
| 51 | of modules and sub-systems, without deviating from some general principles | |
| 52 | how to write Isabelle/ML. | |
| 39878 | 53 | |
| 61854 | 54 | In a sense, good coding style is like an \<^emph>\<open>orthography\<close> for the sources: it | 
| 55 | helps to read quickly over the text and see through the main points, without | |
| 56 | getting distracted by accidental presentation of free-style code. | |
| 58618 | 57 | \<close> | 
| 58 | ||
| 59 | ||
| 60 | subsection \<open>Header and sectioning\<close> | |
| 61 | ||
| 61854 | 62 | text \<open> | 
| 63 | Isabelle source files have a certain standardized header format (with | |
| 64 | precise spacing) that follows ancient traditions reaching back to the | |
| 63680 | 65 | earliest versions of the system by Larry Paulson. See | 
| 66 | \<^file>\<open>~~/src/Pure/thm.ML\<close>, for example. | |
| 39878 | 67 | |
| 61854 | 68 | The header includes at least \<^verbatim>\<open>Title\<close> and \<^verbatim>\<open>Author\<close> entries, followed by a | 
| 69 | prose description of the purpose of the module. The latter can range from a | |
| 70 | single line to several paragraphs of explanations. | |
| 39878 | 71 | |
| 63610 | 72 | The rest of the file is divided into chapters, sections, subsections, | 
| 73 | subsubsections, paragraphs etc.\ using a simple layout via ML comments as | |
| 74 | follows. | |
| 39878 | 75 | |
| 61458 | 76 |   @{verbatim [display]
 | 
| 63610 | 77 | \<open> (**** chapter ****) | 
| 78 | ||
| 79 | (*** section ***) | |
| 58723 | 80 | |
| 81 | (** subsection **) | |
| 82 | ||
| 83 | (* subsubsection *) | |
| 84 | ||
| 85 | (*short paragraph*) | |
| 86 | ||
| 87 | (* | |
| 88 | long paragraph, | |
| 89 | with more text | |
| 61458 | 90 | *)\<close>} | 
| 39878 | 91 | |
| 61854 | 92 | As in regular typography, there is some extra space \<^emph>\<open>before\<close> section | 
| 93 | headings that are adjacent to plain text, but not other headings as in the | |
| 94 | example above. | |
| 39878 | 95 | |
| 61416 | 96 | \<^medskip> | 
| 61854 | 97 | The precise wording of the prose text given in these headings is chosen | 
| 98 | carefully to introduce the main theme of the subsequent formal ML text. | |
| 58618 | 99 | \<close> | 
| 100 | ||
| 101 | ||
| 102 | subsection \<open>Naming conventions\<close> | |
| 103 | ||
| 61854 | 104 | text \<open> | 
| 105 | Since ML is the primary medium to express the meaning of the source text, | |
| 106 | naming of ML entities requires special care. | |
| 107 | \<close> | |
| 61506 | 108 | |
| 109 | paragraph \<open>Notation.\<close> | |
| 61854 | 110 | text \<open> | 
| 111 | A name consists of 1--3 \<^emph>\<open>words\<close> (rarely 4, but not more) that are separated | |
| 112 | by underscore. There are three variants concerning upper or lower case | |
| 113 | letters, which are used for certain ML categories as follows: | |
| 39880 | 114 | |
| 61416 | 115 | \<^medskip> | 
| 39880 | 116 |   \begin{tabular}{lll}
 | 
| 117 | variant & example & ML categories \\\hline | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 118 |   lower-case & @{ML_text foo_bar} & values, types, record fields \\
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 119 |   capitalized & @{ML_text Foo_Bar} & datatype constructors, structures, functors \\
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 120 |   upper-case & @{ML_text FOO_BAR} & special values, exception constructors, signatures \\
 | 
| 39880 | 121 |   \end{tabular}
 | 
| 61416 | 122 | \<^medskip> | 
| 39880 | 123 | |
| 61854 | 124 | For historical reasons, many capitalized names omit underscores, e.g.\ | 
| 125 |   old-style @{ML_text FooBar} instead of @{ML_text Foo_Bar}. Genuine
 | |
| 126 | mixed-case names are \<^emph>\<open>not\<close> used, because clear division of words is | |
| 127 | essential for readability.\<^footnote>\<open>Camel-case was invented to workaround the lack | |
| 128 | of underscore in some early non-ASCII character sets. Later it became | |
| 129 | habitual in some language communities that are now strong in numbers.\<close> | |
| 39880 | 130 | |
| 61854 | 131 | A single (capital) character does not count as ``word'' in this respect: | 
| 132 |   some Isabelle/ML names are suffixed by extra markers like this: @{ML_text
 | |
| 133 | foo_barT}. | |
| 39881 | 134 | |
| 61854 | 135 |   Name variants are produced by adding 1--3 primes, e.g.\ @{ML_text foo'},
 | 
| 136 |   @{ML_text foo''}, or @{ML_text foo'''}, but not @{ML_text foo''''} or more.
 | |
| 137 |   Decimal digits scale better to larger numbers, e.g.\ @{ML_text foo0},
 | |
| 138 |   @{ML_text foo1}, @{ML_text foo42}.
 | |
| 61506 | 139 | \<close> | 
| 140 | ||
| 64464 | 141 | paragraph \<open>Scopes.\<close> | 
| 61854 | 142 | text \<open> | 
| 143 | Apart from very basic library modules, ML structures are not ``opened'', but | |
| 144 |   names are referenced with explicit qualification, as in @{ML
 | |
| 61506 | 145 | Syntax.string_of_term} for example. When devising names for structures and | 
| 146 | their components it is important to aim at eye-catching compositions of both | |
| 147 | parts, because this is how they are seen in the sources and documentation. | |
| 148 | For the same reasons, aliases of well-known library functions should be | |
| 40126 | 149 | avoided. | 
| 39880 | 150 | |
| 61854 | 151 | Local names of function abstraction or case/let bindings are typically | 
| 152 | shorter, sometimes using only rudiments of ``words'', while still avoiding | |
| 153 |   cryptic shorthands. An auxiliary function called @{ML_text helper},
 | |
| 154 |   @{ML_text aux}, or @{ML_text f} is considered bad style.
 | |
| 39880 | 155 | |
| 156 | Example: | |
| 157 | ||
| 61458 | 158 |   @{verbatim [display]
 | 
| 159 | \<open> (* RIGHT *) | |
| 39880 | 160 | |
| 161 | fun print_foo ctxt foo = | |
| 162 | let | |
| 39881 | 163 | fun print t = ... Syntax.string_of_term ctxt t ... | 
| 164 | in ... end; | |
| 165 | ||
| 166 | ||
| 167 | (* RIGHT *) | |
| 168 | ||
| 169 | fun print_foo ctxt foo = | |
| 170 | let | |
| 39880 | 171 | val string_of_term = Syntax.string_of_term ctxt; | 
| 172 | fun print t = ... string_of_term t ... | |
| 173 | in ... end; | |
| 174 | ||
| 175 | ||
| 176 | (* WRONG *) | |
| 177 | ||
| 178 | val string_of_term = Syntax.string_of_term; | |
| 179 | ||
| 180 | fun print_foo ctxt foo = | |
| 181 | let | |
| 182 | fun aux t = ... string_of_term ctxt t ... | |
| 61458 | 183 | in ... end;\<close>} | 
| 61506 | 184 | \<close> | 
| 185 | ||
| 186 | paragraph \<open>Specific conventions.\<close> | |
| 61854 | 187 | text \<open> | 
| 188 | Here are some specific name forms that occur frequently in the sources. | |
| 39881 | 189 | |
| 61854 | 190 |   \<^item> A function that maps @{ML_text foo} to @{ML_text bar} is called @{ML_text
 | 
| 191 |   foo_to_bar} or @{ML_text bar_of_foo} (never @{ML_text foo2bar}, nor
 | |
| 192 |   @{ML_text bar_from_foo}, nor @{ML_text bar_for_foo}, nor @{ML_text
 | |
| 193 | bar4foo}). | |
| 39881 | 194 | |
| 61854 | 195 |   \<^item> The name component @{ML_text legacy} means that the operation is about to
 | 
| 196 | be discontinued soon. | |
| 39881 | 197 | |
| 61854 | 198 |   \<^item> The name component @{ML_text global} means that this works with the
 | 
| 199 | background theory instead of the regular local context | |
| 200 |   (\secref{sec:context}), sometimes for historical reasons, sometimes due a
 | |
| 201 | genuine lack of locality of the concept involved, sometimes as a fall-back | |
| 202 | for the lack of a proper context in the application code. Whenever there is | |
| 203 | a non-global variant available, the application should be migrated to use it | |
| 204 | with a proper local context. | |
| 39881 | 205 | |
| 61854 | 206 | \<^item> Variables of the main context types of the Isabelle/Isar framework | 
| 207 |   (\secref{sec:context} and \chref{ch:local-theory}) have firm naming
 | |
| 208 | conventions as follows: | |
| 39881 | 209 | |
| 61459 
5f2ddeb15c06
clarified nesting of paragraphs: indentation is taken into account more uniformly;
 wenzelm parents: 
61458diff
changeset | 210 |     \<^item> theories are called @{ML_text thy}, rarely @{ML_text theory}
 | 
| 61458 | 211 |     (never @{ML_text thry})
 | 
| 212 | ||
| 61459 
5f2ddeb15c06
clarified nesting of paragraphs: indentation is taken into account more uniformly;
 wenzelm parents: 
61458diff
changeset | 213 |     \<^item> proof contexts are called @{ML_text ctxt}, rarely @{ML_text
 | 
| 61458 | 214 |     context} (never @{ML_text ctx})
 | 
| 215 | ||
| 61459 
5f2ddeb15c06
clarified nesting of paragraphs: indentation is taken into account more uniformly;
 wenzelm parents: 
61458diff
changeset | 216 |     \<^item> generic contexts are called @{ML_text context}
 | 
| 
5f2ddeb15c06
clarified nesting of paragraphs: indentation is taken into account more uniformly;
 wenzelm parents: 
61458diff
changeset | 217 | |
| 
5f2ddeb15c06
clarified nesting of paragraphs: indentation is taken into account more uniformly;
 wenzelm parents: 
61458diff
changeset | 218 |     \<^item> local theories are called @{ML_text lthy}, except for local
 | 
| 61458 | 219 | theories that are treated as proof context (which is a semantic | 
| 220 | super-type) | |
| 39881 | 221 | |
| 61854 | 222 | Variations with primed or decimal numbers are always possible, as well as | 
| 223 |   semantic prefixes like @{ML_text foo_thy} or @{ML_text bar_ctxt}, but the
 | |
| 224 | base conventions above need to be preserved. This allows to emphasize their | |
| 225 | data flow via plain regular expressions in the text editor. | |
| 39881 | 226 | |
| 61854 | 227 |   \<^item> The main logical entities (\secref{ch:logic}) have established naming
 | 
| 228 | convention as follows: | |
| 39881 | 229 | |
| 61459 
5f2ddeb15c06
clarified nesting of paragraphs: indentation is taken into account more uniformly;
 wenzelm parents: 
61458diff
changeset | 230 |     \<^item> sorts are called @{ML_text S}
 | 
| 
5f2ddeb15c06
clarified nesting of paragraphs: indentation is taken into account more uniformly;
 wenzelm parents: 
61458diff
changeset | 231 | |
| 61854 | 232 |     \<^item> types are called @{ML_text T}, @{ML_text U}, or @{ML_text ty} (never
 | 
| 233 |     @{ML_text t})
 | |
| 61458 | 234 | |
| 61854 | 235 |     \<^item> terms are called @{ML_text t}, @{ML_text u}, or @{ML_text tm} (never
 | 
| 236 |     @{ML_text trm})
 | |
| 61458 | 237 | |
| 61854 | 238 |     \<^item> certified types are called @{ML_text cT}, rarely @{ML_text T}, with
 | 
| 239 | variants as for types | |
| 61458 | 240 | |
| 61854 | 241 |     \<^item> certified terms are called @{ML_text ct}, rarely @{ML_text t}, with
 | 
| 242 |     variants as for terms (never @{ML_text ctrm})
 | |
| 61458 | 243 | |
| 61459 
5f2ddeb15c06
clarified nesting of paragraphs: indentation is taken into account more uniformly;
 wenzelm parents: 
61458diff
changeset | 244 |     \<^item> theorems are called @{ML_text th}, or @{ML_text thm}
 | 
| 39881 | 245 | |
| 61854 | 246 | Proper semantic names override these conventions completely. For example, | 
| 247 |   the left-hand side of an equation (as a term) can be called @{ML_text lhs}
 | |
| 248 |   (not @{ML_text lhs_tm}). Or a term that is known to be a variable can be
 | |
| 249 |   called @{ML_text v} or @{ML_text x}.
 | |
| 39881 | 250 | |
| 61854 | 251 |   \<^item> Tactics (\secref{sec:tactics}) are sufficiently important to have specific
 | 
| 252 | naming conventions. The name of a basic tactic definition always has a | |
| 253 |   @{ML_text "_tac"} suffix, the subgoal index (if applicable) is always called
 | |
| 254 |   @{ML_text i}, and the goal state (if made explicit) is usually called
 | |
| 255 |   @{ML_text st} instead of the somewhat misleading @{ML_text thm}. Any other
 | |
| 256 | arguments are given before the latter two, and the general context is given | |
| 257 | first. Example: | |
| 40310 | 258 | |
| 61458 | 259 |   @{verbatim [display] \<open>  fun my_tac ctxt arg1 arg2 i st = ...\<close>}
 | 
| 40310 | 260 | |
| 61854 | 261 |   Note that the goal state @{ML_text st} above is rarely made explicit, if
 | 
| 262 | tactic combinators (tacticals) are used as usual. | |
| 40310 | 263 | |
| 57421 | 264 | A tactic that requires a proof context needs to make that explicit as seen | 
| 61854 | 265 | in the \<^verbatim>\<open>ctxt\<close> argument above. Do not refer to the background theory of | 
| 266 | \<^verbatim>\<open>st\<close> -- it is not a proper context, but merely a formal certificate. | |
| 58618 | 267 | \<close> | 
| 268 | ||
| 269 | ||
| 270 | subsection \<open>General source layout\<close> | |
| 271 | ||
| 272 | text \<open> | |
| 57421 | 273 | The general Isabelle/ML source layout imitates regular type-setting | 
| 274 | conventions, augmented by the requirements for deeply nested expressions | |
| 61854 | 275 | that are commonplace in functional programming. | 
| 276 | \<close> | |
| 61506 | 277 | |
| 278 | paragraph \<open>Line length\<close> | |
| 61854 | 279 | text \<open> | 
| 280 | is limited to 80 characters according to ancient standards, but we allow as | |
| 281 | much as 100 characters (not more).\<^footnote>\<open>Readability requires to keep the | |
| 282 | beginning of a line in view while watching its end. Modern wide-screen | |
| 61506 | 283 | displays do not change the way how the human brain works. Sources also need | 
| 61572 | 284 | to be printable on plain paper with reasonable font-size.\<close> The extra 20 | 
| 61506 | 285 | characters acknowledge the space requirements due to qualified library | 
| 61854 | 286 | references in Isabelle/ML. | 
| 287 | \<close> | |
| 61506 | 288 | |
| 289 | paragraph \<open>White-space\<close> | |
| 61854 | 290 | text \<open> | 
| 291 | is used to emphasize the structure of expressions, following mostly standard | |
| 292 |   conventions for mathematical typesetting, as can be seen in plain {\TeX} or
 | |
| 293 |   {\LaTeX}. This defines positioning of spaces for parentheses, punctuation,
 | |
| 294 | and infixes as illustrated here: | |
| 39878 | 295 | |
| 61458 | 296 |   @{verbatim [display]
 | 
| 297 | \<open> val x = y + z * (a + b); | |
| 39878 | 298 | val pair = (a, b); | 
| 61458 | 299 |   val record = {foo = 1, bar = 2};\<close>}
 | 
| 39878 | 300 | |
| 61854 | 301 | Lines are normally broken \<^emph>\<open>after\<close> an infix operator or punctuation | 
| 302 | character. For example: | |
| 39878 | 303 | |
| 61458 | 304 |   @{verbatim [display]
 | 
| 305 | \<open> | |
| 39878 | 306 | val x = | 
| 307 | a + | |
| 308 | b + | |
| 309 | c; | |
| 310 | ||
| 311 | val tuple = | |
| 312 | (a, | |
| 313 | b, | |
| 314 | c); | |
| 61458 | 315 | \<close>} | 
| 39878 | 316 | |
| 61854 | 317 |   Some special infixes (e.g.\ @{ML_text "|>"}) work better at the start of the
 | 
| 318 | line, but punctuation is always at the end. | |
| 39878 | 319 | |
| 61854 | 320 | Function application follows the tradition of \<open>\<lambda>\<close>-calculus, not informal | 
| 321 |   mathematics. For example: @{ML_text "f a b"} for a curried function, or
 | |
| 322 |   @{ML_text "g (a, b)"} for a tupled function. Note that the space between
 | |
| 323 |   @{ML_text g} and the pair @{ML_text "(a, b)"} follows the important
 | |
| 324 |   principle of \<^emph>\<open>compositionality\<close>: the layout of @{ML_text "g p"} does not
 | |
| 325 |   change when @{ML_text "p"} is refined to the concrete pair @{ML_text "(a,
 | |
| 326 | b)"}. | |
| 61506 | 327 | \<close> | 
| 328 | ||
| 329 | paragraph \<open>Indentation\<close> | |
| 61854 | 330 | text \<open> | 
| 331 | uses plain spaces, never hard tabulators.\<^footnote>\<open>Tabulators were invented to move | |
| 332 | the carriage of a type-writer to certain predefined positions. In software | |
| 333 | they could be used as a primitive run-length compression of consecutive | |
| 334 | spaces, but the precise result would depend on non-standardized text editor | |
| 335 | configuration.\<close> | |
| 39878 | 336 | |
| 61854 | 337 | Each level of nesting is indented by 2 spaces, sometimes 1, very rarely 4, | 
| 338 | never 8 or any other odd number. | |
| 39878 | 339 | |
| 61854 | 340 | Indentation follows a simple logical format that only depends on the nesting | 
| 341 | depth, not the accidental length of the text that initiates a level of | |
| 342 | nesting. Example: | |
| 39878 | 343 | |
| 61458 | 344 |   @{verbatim [display]
 | 
| 345 | \<open> (* RIGHT *) | |
| 39880 | 346 | |
| 39878 | 347 | if b then | 
| 39879 | 348 | expr1_part1 | 
| 349 | expr1_part2 | |
| 39878 | 350 | else | 
| 39879 | 351 | expr2_part1 | 
| 352 | expr2_part2 | |
| 39878 | 353 | |
| 39880 | 354 | |
| 355 | (* WRONG *) | |
| 356 | ||
| 39879 | 357 | if b then expr1_part1 | 
| 358 | expr1_part2 | |
| 359 | else expr2_part1 | |
| 61458 | 360 | expr2_part2\<close>} | 
| 39878 | 361 | |
| 61854 | 362 | The second form has many problems: it assumes a fixed-width font when | 
| 363 | viewing the sources, it uses more space on the line and thus makes it hard | |
| 364 | to observe its strict length limit (working against \<^emph>\<open>readability\<close>), it | |
| 365 | requires extra editing to adapt the layout to changes of the initial text | |
| 366 | (working against \<^emph>\<open>maintainability\<close>) etc. | |
| 39878 | 367 | |
| 61416 | 368 | \<^medskip> | 
| 61854 | 369 | For similar reasons, any kind of two-dimensional or tabular layouts, | 
| 370 | ASCII-art with lines or boxes of asterisks etc.\ should be avoided. | |
| 61506 | 371 | \<close> | 
| 372 | ||
| 373 | paragraph \<open>Complex expressions\<close> | |
| 61854 | 374 | text \<open> | 
| 375 |   that consist of multi-clausal function definitions, @{ML_text handle},
 | |
| 61506 | 376 |   @{ML_text case}, @{ML_text let} (and combinations) require special
 | 
| 377 | attention. The syntax of Standard ML is quite ambitious and admits a lot of | |
| 40126 | 378 | variance that can distort the meaning of the text. | 
| 39881 | 379 | |
| 57421 | 380 |   Multiple clauses of @{ML_text fun}, @{ML_text fn}, @{ML_text handle},
 | 
| 61854 | 381 |   @{ML_text case} get extra indentation to indicate the nesting clearly.
 | 
| 382 | Example: | |
| 39881 | 383 | |
| 61458 | 384 |   @{verbatim [display]
 | 
| 385 | \<open> (* RIGHT *) | |
| 39881 | 386 | |
| 387 | fun foo p1 = | |
| 388 | expr1 | |
| 389 | | foo p2 = | |
| 390 | expr2 | |
| 391 | ||
| 392 | ||
| 393 | (* WRONG *) | |
| 394 | ||
| 395 | fun foo p1 = | |
| 396 | expr1 | |
| 397 | | foo p2 = | |
| 61458 | 398 | expr2\<close>} | 
| 39881 | 399 | |
| 61854 | 400 |   Body expressions consisting of @{ML_text case} or @{ML_text let} require
 | 
| 401 | care to maintain compositionality, to prevent loss of logical indentation | |
| 402 | where it is especially important to see the structure of the text. Example: | |
| 39881 | 403 | |
| 61458 | 404 |   @{verbatim [display]
 | 
| 405 | \<open> (* RIGHT *) | |
| 39881 | 406 | |
| 407 | fun foo p1 = | |
| 408 | (case e of | |
| 409 | q1 => ... | |
| 410 | | q2 => ...) | |
| 411 | | foo p2 = | |
| 412 | let | |
| 413 | ... | |
| 414 | in | |
| 415 | ... | |
| 416 | end | |
| 417 | ||
| 418 | ||
| 419 | (* WRONG *) | |
| 420 | ||
| 421 | fun foo p1 = case e of | |
| 422 | q1 => ... | |
| 423 | | q2 => ... | |
| 424 | | foo p2 = | |
| 425 | let | |
| 426 | ... | |
| 427 | in | |
| 428 | ... | |
| 61458 | 429 | end\<close>} | 
| 39881 | 430 | |
| 61854 | 431 |   Extra parentheses around @{ML_text case} expressions are optional, but help
 | 
| 432 | to analyse the nesting based on character matching in the text editor. | |
| 39881 | 433 | |
| 61416 | 434 | \<^medskip> | 
| 61854 | 435 | There are two main exceptions to the overall principle of compositionality | 
| 436 | in the layout of complex expressions. | |
| 39881 | 437 | |
| 61416 | 438 |   \<^enum> @{ML_text "if"} expressions are iterated as if ML had multi-branch
 | 
| 57421 | 439 | conditionals, e.g. | 
| 39881 | 440 | |
| 61458 | 441 |   @{verbatim [display]
 | 
| 442 | \<open> (* RIGHT *) | |
| 39881 | 443 | |
| 444 | if b1 then e1 | |
| 445 | else if b2 then e2 | |
| 61458 | 446 | else e3\<close>} | 
| 39881 | 447 | |
| 61854 | 448 |   \<^enum> @{ML_text fn} abstractions are often layed-out as if they would lack any
 | 
| 449 | structure by themselves. This traditional form is motivated by the | |
| 450 | possibility to shift function arguments back and forth wrt.\ additional | |
| 451 | combinators. Example: | |
| 39881 | 452 | |
| 61458 | 453 |   @{verbatim [display]
 | 
| 454 | \<open> (* RIGHT *) | |
| 39881 | 455 | |
| 456 | fun foo x y = fold (fn z => | |
| 61458 | 457 | expr)\<close>} | 
| 39881 | 458 | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 459 |   Here the visual appearance is that of three arguments @{ML_text x},
 | 
| 57421 | 460 |   @{ML_text y}, @{ML_text z} in a row.
 | 
| 39881 | 461 | |
| 462 | ||
| 61854 | 463 | Such weakly structured layout should be use with great care. Here are some | 
| 464 |   counter-examples involving @{ML_text let} expressions:
 | |
| 39881 | 465 | |
| 61458 | 466 |   @{verbatim [display]
 | 
| 467 | \<open> (* WRONG *) | |
| 39881 | 468 | |
| 469 | fun foo x = let | |
| 470 | val y = ... | |
| 471 | in ... end | |
| 472 | ||
| 41162 | 473 | |
| 474 | (* WRONG *) | |
| 475 | ||
| 40153 | 476 | fun foo x = let | 
| 477 | val y = ... | |
| 478 | in ... end | |
| 479 | ||
| 41162 | 480 | |
| 481 | (* WRONG *) | |
| 482 | ||
| 39881 | 483 | fun foo x = | 
| 484 | let | |
| 485 | val y = ... | |
| 486 | in ... end | |
| 57421 | 487 | |
| 488 | ||
| 489 | (* WRONG *) | |
| 490 | ||
| 491 | fun foo x = | |
| 492 | let | |
| 493 | val y = ... | |
| 494 | in | |
| 61458 | 495 | ... end\<close>} | 
| 39881 | 496 | |
| 61416 | 497 | \<^medskip> | 
| 61854 | 498 | In general the source layout is meant to emphasize the structure of complex | 
| 499 | language expressions, not to pretend that SML had a completely different | |
| 500 | syntax (say that of Haskell, Scala, Java). | |
| 58618 | 501 | \<close> | 
| 502 | ||
| 503 | ||
| 504 | section \<open>ML embedded into Isabelle/Isar\<close> | |
| 505 | ||
| 61854 | 506 | text \<open> | 
| 507 | ML and Isar are intertwined via an open-ended bootstrap process that | |
| 508 | provides more and more programming facilities and logical content in an | |
| 509 | alternating manner. Bootstrapping starts from the raw environment of | |
| 62354 | 510 | existing implementations of Standard ML (mainly Poly/ML). | 
| 39823 | 511 | |
| 57421 | 512 | Isabelle/Pure marks the point where the raw ML toplevel is superseded by | 
| 513 | Isabelle/ML within the Isar theory and proof language, with a uniform | |
| 514 |   context for arbitrary ML values (see also \secref{sec:context}). This formal
 | |
| 515 | environment holds ML compiler bindings, logical entities, and many other | |
| 516 | things. | |
| 517 | ||
| 518 | Object-logics like Isabelle/HOL are built within the Isabelle/ML/Isar | |
| 519 | environment by introducing suitable theories with associated ML modules, | |
| 61503 | 520 | either inlined within \<^verbatim>\<open>.thy\<close> files, or as separate \<^verbatim>\<open>.ML\<close> files that are | 
| 61854 | 521 | loading from some theory. Thus Isabelle/HOL is defined as a regular | 
| 522 | user-space application within the Isabelle framework. Further add-on tools | |
| 523 | can be implemented in ML within the Isar context in the same manner: ML is | |
| 524 | part of the standard repertoire of Isabelle, and there is no distinction | |
| 525 | between ``users'' and ``developers'' in this respect. | |
| 58618 | 526 | \<close> | 
| 527 | ||
| 528 | ||
| 529 | subsection \<open>Isar ML commands\<close> | |
| 530 | ||
| 531 | text \<open> | |
| 57421 | 532 | The primary Isar source language provides facilities to ``open a window'' to | 
| 533 |   the underlying ML compiler. Especially see the Isar commands @{command_ref
 | |
| 534 |   "ML_file"} and @{command_ref "ML"}: both work the same way, but the source
 | |
| 535 | text is provided differently, via a file vs.\ inlined, respectively. Apart | |
| 536 | from embedding ML into the main theory definition like that, there are many | |
| 537 |   more commands that refer to ML source, such as @{command_ref setup} or
 | |
| 538 |   @{command_ref declaration}. Even more fine-grained embedding of ML into Isar
 | |
| 539 |   is encountered in the proof method @{method_ref tactic}, which refines the
 | |
| 540 |   pending goal state via a given expression of type @{ML_type tactic}.
 | |
| 58618 | 541 | \<close> | 
| 542 | ||
| 61854 | 543 | text %mlex \<open> | 
| 544 | The following artificial example demonstrates some ML toplevel declarations | |
| 545 | within the implicit Isar theory context. This is regular functional | |
| 546 | programming without referring to logical entities yet. | |
| 58618 | 547 | \<close> | 
| 548 | ||
| 549 | ML \<open> | |
| 39823 | 550 | fun factorial 0 = 1 | 
| 551 | | factorial n = n * factorial (n - 1) | |
| 58618 | 552 | \<close> | 
| 553 | ||
| 61854 | 554 | text \<open> | 
| 555 |   Here the ML environment is already managed by Isabelle, i.e.\ the @{ML
 | |
| 556 | factorial} function is not yet accessible in the preceding paragraph, nor in | |
| 557 | a different theory that is independent from the current one in the import | |
| 558 | hierarchy. | |
| 39823 | 559 | |
| 57421 | 560 | Removing the above ML declaration from the source text will remove any trace | 
| 561 | of this definition, as expected. The Isabelle/ML toplevel environment is | |
| 61854 | 562 | managed in a \<^emph>\<open>stateless\<close> way: in contrast to the raw ML toplevel, there are | 
| 563 | no global side-effects involved here.\<^footnote>\<open>Such a stateless compilation | |
| 564 | environment is also a prerequisite for robust parallel compilation within | |
| 565 | independent nodes of the implicit theory development graph.\<close> | |
| 39823 | 566 | |
| 61416 | 567 | \<^medskip> | 
| 61854 | 568 |   The next example shows how to embed ML into Isar proofs, using @{command_ref
 | 
| 569 |   "ML_prf"} instead of @{command_ref "ML"}. As illustrated below, the effect
 | |
| 570 | on the ML environment is local to the whole proof body, but ignoring the | |
| 571 | block structure. | |
| 572 | \<close> | |
| 39823 | 573 | |
| 40964 | 574 | notepad | 
| 575 | begin | |
| 58618 | 576 | ML_prf %"ML" \<open>val a = 1\<close> | 
| 40126 | 577 |   {
 | 
| 58618 | 578 | ML_prf %"ML" \<open>val b = a + 1\<close> | 
| 61580 | 579 | } \<comment> \<open>Isar block structure ignored by ML environment\<close> | 
| 58618 | 580 | ML_prf %"ML" \<open>val c = b + 1\<close> | 
| 40964 | 581 | end | 
| 39823 | 582 | |
| 61854 | 583 | text \<open> | 
| 584 | By side-stepping the normal scoping rules for Isar proof blocks, embedded ML | |
| 585 | code can refer to the different contexts and manipulate corresponding | |
| 586 | entities, e.g.\ export a fact from a block context. | |
| 39823 | 587 | |
| 61416 | 588 | \<^medskip> | 
| 61854 | 589 |   Two further ML commands are useful in certain situations: @{command_ref
 | 
| 590 |   ML_val} and @{command_ref ML_command} are \<^emph>\<open>diagnostic\<close> in the sense that
 | |
| 591 | there is no effect on the underlying environment, and can thus be used | |
| 592 |   anywhere. The examples below produce long strings of digits by invoking @{ML
 | |
| 593 |   factorial}: @{command ML_val} takes care of printing the ML toplevel result,
 | |
| 594 |   but @{command ML_command} is silent so we produce an explicit output
 | |
| 595 | message. | |
| 58618 | 596 | \<close> | 
| 597 | ||
| 598 | ML_val \<open>factorial 100\<close> | |
| 599 | ML_command \<open>writeln (string_of_int (factorial 100))\<close> | |
| 39823 | 600 | |
| 40964 | 601 | notepad | 
| 602 | begin | |
| 58618 | 603 | ML_val \<open>factorial 100\<close> | 
| 604 | ML_command \<open>writeln (string_of_int (factorial 100))\<close> | |
| 40964 | 605 | end | 
| 39823 | 606 | |
| 607 | ||
| 58618 | 608 | subsection \<open>Compile-time context\<close> | 
| 609 | ||
| 61854 | 610 | text \<open> | 
| 611 | Whenever the ML compiler is invoked within Isabelle/Isar, the formal context | |
| 612 | is passed as a thread-local reference variable. Thus ML code may access the | |
| 613 | theory context during compilation, by reading or writing the (local) theory | |
| 614 | under construction. Note that such direct access to the compile-time context | |
| 615 | is rare. In practice it is typically done via some derived ML functions | |
| 616 | instead. | |
| 58618 | 617 | \<close> | 
| 618 | ||
| 619 | text %mlref \<open> | |
| 39825 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 620 |   \begin{mldecls}
 | 
| 62876 | 621 |   @{index_ML Context.the_generic_context: "unit -> Context.generic"} \\
 | 
| 40126 | 622 |   @{index_ML "Context.>>": "(Context.generic -> Context.generic) -> unit"} \\
 | 
| 56199 | 623 |   @{index_ML ML_Thms.bind_thms: "string * thm list -> unit"} \\
 | 
| 624 |   @{index_ML ML_Thms.bind_thm: "string * thm -> unit"} \\
 | |
| 39825 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 625 |   \end{mldecls}
 | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 626 | |
| 62876 | 627 |     \<^descr> @{ML "Context.the_generic_context ()"} refers to the theory context of
 | 
| 61854 | 628 | the ML toplevel --- at compile time. ML code needs to take care to refer to | 
| 62876 | 629 |     @{ML "Context.the_generic_context ()"} correctly. Recall that evaluation
 | 
| 61854 | 630 | of a function body is delayed until actual run-time. | 
| 39825 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 631 | |
| 61854 | 632 |     \<^descr> @{ML "Context.>>"}~\<open>f\<close> applies context transformation \<open>f\<close> to the implicit
 | 
| 633 | context of the ML toplevel. | |
| 61493 | 634 | |
| 61854 | 635 |     \<^descr> @{ML ML_Thms.bind_thms}~\<open>(name, thms)\<close> stores a list of theorems produced
 | 
| 636 | in ML both in the (global) theory context and the ML toplevel, associating | |
| 637 | it with the provided name. | |
| 39850 | 638 | |
| 61854 | 639 |     \<^descr> @{ML ML_Thms.bind_thm} is similar to @{ML ML_Thms.bind_thms} but refers to
 | 
| 640 | a singleton fact. | |
| 39825 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 641 | |
| 61854 | 642 | It is important to note that the above functions are really restricted to | 
| 643 | the compile time, even though the ML compiler is invoked at run-time. The | |
| 644 | majority of ML code either uses static antiquotations | |
| 645 |   (\secref{sec:ML-antiq}) or refers to the theory or proof context at
 | |
| 646 | run-time, by explicit functional abstraction. | |
| 58618 | 647 | \<close> | 
| 648 | ||
| 649 | ||
| 650 | subsection \<open>Antiquotations \label{sec:ML-antiq}\<close>
 | |
| 651 | ||
| 61854 | 652 | text \<open> | 
| 653 | A very important consequence of embedding ML into Isar is the concept of | |
| 654 | \<^emph>\<open>ML antiquotation\<close>. The standard token language of ML is augmented by | |
| 655 | special syntactic entities of the following form: | |
| 39827 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 656 | |
| 55112 
b1a5d603fd12
prefer rail cartouche -- avoid back-slashed quotes;
 wenzelm parents: 
54703diff
changeset | 657 |   @{rail \<open>
 | 
| 62969 | 658 |   @{syntax_def antiquote}: '@{' name args '}'
 | 
| 55112 
b1a5d603fd12
prefer rail cartouche -- avoid back-slashed quotes;
 wenzelm parents: 
54703diff
changeset | 659 | \<close>} | 
| 39827 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 660 | |
| 62969 | 661 |   Here @{syntax name} and @{syntax args} are outer syntax categories, as
 | 
| 58555 | 662 |   defined in @{cite "isabelle-isar-ref"}.
 | 
| 39823 | 663 | |
| 61416 | 664 | \<^medskip> | 
| 61854 | 665 |   A regular antiquotation \<open>@{name args}\<close> processes its arguments by the usual
 | 
| 666 | means of the Isar source language, and produces corresponding ML source | |
| 667 |   text, either as literal \<^emph>\<open>inline\<close> text (e.g.\ \<open>@{term t}\<close>) or abstract
 | |
| 668 |   \<^emph>\<open>value\<close> (e.g. \<open>@{thm th}\<close>). This pre-compilation scheme allows to refer to
 | |
| 669 | formal entities in a robust manner, with proper static scoping and with some | |
| 670 | degree of logical checking of small portions of the code. | |
| 58618 | 671 | \<close> | 
| 672 | ||
| 673 | ||
| 674 | subsection \<open>Printing ML values\<close> | |
| 675 | ||
| 61854 | 676 | text \<open> | 
| 677 | The ML compiler knows about the structure of values according to their | |
| 678 | static type, and can print them in the manner of its toplevel, although the | |
| 679 |   details are non-portable. The antiquotations @{ML_antiquotation_def
 | |
| 680 |   "make_string"} and @{ML_antiquotation_def "print"} provide a quasi-portable
 | |
| 681 | way to refer to this potential capability of the underlying ML system in | |
| 56399 | 682 | generic Isabelle/ML sources. | 
| 683 | ||
| 61854 | 684 | This is occasionally useful for diagnostic or demonstration purposes. Note | 
| 685 | that production-quality tools require proper user-level error messages, | |
| 686 | avoiding raw ML values in the output. | |
| 687 | \<close> | |
| 58618 | 688 | |
| 689 | text %mlantiq \<open> | |
| 51636 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 690 |   \begin{matharray}{rcl}
 | 
| 61493 | 691 |   @{ML_antiquotation_def "make_string"} & : & \<open>ML_antiquotation\<close> \\
 | 
| 692 |   @{ML_antiquotation_def "print"} & : & \<open>ML_antiquotation\<close> \\
 | |
| 51636 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 693 |   \end{matharray}
 | 
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 694 | |
| 55112 
b1a5d603fd12
prefer rail cartouche -- avoid back-slashed quotes;
 wenzelm parents: 
54703diff
changeset | 695 |   @{rail \<open>
 | 
| 51636 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 696 |   @@{ML_antiquotation make_string}
 | 
| 56399 | 697 | ; | 
| 698 |   @@{ML_antiquotation print} @{syntax name}?
 | |
| 55112 
b1a5d603fd12
prefer rail cartouche -- avoid back-slashed quotes;
 wenzelm parents: 
54703diff
changeset | 699 | \<close>} | 
| 51636 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 700 | |
| 61854 | 701 |   \<^descr> \<open>@{make_string}\<close> inlines a function to print arbitrary values similar to
 | 
| 702 | the ML toplevel. The result is compiler dependent and may fall back on "?" | |
| 703 |   in certain situations. The value of configuration option @{attribute_ref
 | |
| 704 | ML_print_depth} determines further details of output. | |
| 56399 | 705 | |
| 61854 | 706 |   \<^descr> \<open>@{print f}\<close> uses the ML function \<open>f: string -> unit\<close> to output the result
 | 
| 707 |   of \<open>@{make_string}\<close> above, together with the source position of the
 | |
| 708 |   antiquotation. The default output function is @{ML writeln}.
 | |
| 58618 | 709 | \<close> | 
| 710 | ||
| 61854 | 711 | text %mlex \<open> | 
| 712 | The following artificial examples show how to produce adhoc output of ML | |
| 713 | values for debugging purposes. | |
| 714 | \<close> | |
| 58618 | 715 | |
| 59902 | 716 | ML_val \<open> | 
| 51636 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 717 | val x = 42; | 
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 718 | val y = true; | 
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 719 | |
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 720 |   writeln (@{make_string} {x = x, y = y});
 | 
| 56399 | 721 | |
| 722 |   @{print} {x = x, y = y};
 | |
| 723 |   @{print tracing} {x = x, y = y};
 | |
| 58618 | 724 | \<close> | 
| 725 | ||
| 726 | ||
| 727 | section \<open>Canonical argument order \label{sec:canonical-argument-order}\<close>
 | |
| 728 | ||
| 61854 | 729 | text \<open> | 
| 730 | Standard ML is a language in the tradition of \<open>\<lambda>\<close>-calculus and | |
| 731 | \<^emph>\<open>higher-order functional programming\<close>, similar to OCaml, Haskell, or | |
| 732 | Isabelle/Pure and HOL as logical languages. Getting acquainted with the | |
| 733 | native style of representing functions in that setting can save a lot of | |
| 734 | extra boiler-plate of redundant shuffling of arguments, auxiliary | |
| 735 | abstractions etc. | |
| 39883 | 736 | |
| 61854 | 737 | Functions are usually \<^emph>\<open>curried\<close>: the idea of turning arguments of type | 
| 738 |   \<open>\<tau>\<^sub>i\<close> (for \<open>i \<in> {1, \<dots> n}\<close>) into a result of type \<open>\<tau>\<close> is represented by the
 | |
| 739 | iterated function space \<open>\<tau>\<^sub>1 \<rightarrow> \<dots> \<rightarrow> \<tau>\<^sub>n \<rightarrow> \<tau>\<close>. This is isomorphic to the | |
| 740 | well-known encoding via tuples \<open>\<tau>\<^sub>1 \<times> \<dots> \<times> \<tau>\<^sub>n \<rightarrow> \<tau>\<close>, but the curried version | |
| 741 | fits more smoothly into the basic calculus.\<^footnote>\<open>The difference is even more | |
| 742 | significant in HOL, because the redundant tuple structure needs to be | |
| 743 | accommodated extraneous proof steps.\<close> | |
| 39883 | 744 | |
| 61854 | 745 | Currying gives some flexibility due to \<^emph>\<open>partial application\<close>. A function | 
| 746 | \<open>f: \<tau>\<^sub>1 \<rightarrow> \<tau>\<^sub>2 \<rightarrow> \<tau>\<close> can be applied to \<open>x: \<tau>\<^sub>1\<close> and the remaining \<open>(f x): \<tau>\<^sub>2 | |
| 747 | \<rightarrow> \<tau>\<close> passed to another function etc. How well this works in practice depends | |
| 748 | on the order of arguments. In the worst case, arguments are arranged | |
| 749 | erratically, and using a function in a certain situation always requires | |
| 750 | some glue code. Thus we would get exponentially many opportunities to | |
| 39883 | 751 | decorate the code with meaningless permutations of arguments. | 
| 752 | ||
| 61854 | 753 | This can be avoided by \<^emph>\<open>canonical argument order\<close>, which observes certain | 
| 754 | standard patterns and minimizes adhoc permutations in their application. In | |
| 755 | Isabelle/ML, large portions of text can be written without auxiliary | |
| 756 | operations like \<open>swap: \<alpha> \<times> \<beta> \<rightarrow> \<beta> \<times> \<alpha>\<close> or \<open>C: (\<alpha> \<rightarrow> \<beta> \<rightarrow> \<gamma>) \<rightarrow> (\<beta> \<rightarrow> \<alpha> \<rightarrow> \<gamma>)\<close> (the | |
| 757 | latter is not present in the Isabelle/ML library). | |
| 39883 | 758 | |
| 61416 | 759 | \<^medskip> | 
| 61854 | 760 | The main idea is that arguments that vary less are moved further to the left | 
| 761 | than those that vary more. Two particularly important categories of | |
| 762 | functions are \<^emph>\<open>selectors\<close> and \<^emph>\<open>updates\<close>. | |
| 39883 | 763 | |
| 61854 | 764 | The subsequent scheme is based on a hypothetical set-like container of type | 
| 765 | \<open>\<beta>\<close> that manages elements of type \<open>\<alpha>\<close>. Both the names and types of the | |
| 766 | associated operations are canonical for Isabelle/ML. | |
| 39883 | 767 | |
| 52416 | 768 |   \begin{center}
 | 
| 39883 | 769 |   \begin{tabular}{ll}
 | 
| 770 | kind & canonical name and type \\\hline | |
| 61493 | 771 | selector & \<open>member: \<beta> \<rightarrow> \<alpha> \<rightarrow> bool\<close> \\ | 
| 772 | update & \<open>insert: \<alpha> \<rightarrow> \<beta> \<rightarrow> \<beta>\<close> \\ | |
| 39883 | 773 |   \end{tabular}
 | 
| 52416 | 774 |   \end{center}
 | 
| 39883 | 775 | |
| 61854 | 776 | Given a container \<open>B: \<beta>\<close>, the partially applied \<open>member B\<close> is a predicate | 
| 777 | over elements \<open>\<alpha> \<rightarrow> bool\<close>, and thus represents the intended denotation | |
| 778 | directly. It is customary to pass the abstract predicate to further | |
| 779 | operations, not the concrete container. The argument order makes it easy to | |
| 780 | use other combinators: \<open>forall (member B) list\<close> will check a list of | |
| 781 | elements for membership in \<open>B\<close> etc. Often the explicit \<open>list\<close> is pointless | |
| 782 | and can be contracted to \<open>forall (member B)\<close> to get directly a predicate | |
| 783 | again. | |
| 39883 | 784 | |
| 61854 | 785 | In contrast, an update operation varies the container, so it moves to the | 
| 786 | right: \<open>insert a\<close> is a function \<open>\<beta> \<rightarrow> \<beta>\<close> to insert a value \<open>a\<close>. These can be | |
| 787 | composed naturally as \<open>insert c \<circ> insert b \<circ> insert a\<close>. The slightly awkward | |
| 788 | inversion of the composition order is due to conventional mathematical | |
| 789 | notation, which can be easily amended as explained below. | |
| 58618 | 790 | \<close> | 
| 791 | ||
| 792 | ||
| 793 | subsection \<open>Forward application and composition\<close> | |
| 794 | ||
| 61854 | 795 | text \<open> | 
| 796 | Regular function application and infix notation works best for relatively | |
| 797 | deeply structured expressions, e.g.\ \<open>h (f x y + g z)\<close>. The important | |
| 798 | special case of \<^emph>\<open>linear transformation\<close> applies a cascade of functions \<open>f\<^sub>n | |
| 799 | (\<dots> (f\<^sub>1 x))\<close>. This becomes hard to read and maintain if the functions are | |
| 800 | themselves given as complex expressions. The notation can be significantly | |
| 801 | improved by introducing \<^emph>\<open>forward\<close> versions of application and composition | |
| 802 | as follows: | |
| 39883 | 803 | |
| 61416 | 804 | \<^medskip> | 
| 39883 | 805 |   \begin{tabular}{lll}
 | 
| 61493 | 806 | \<open>x |> f\<close> & \<open>\<equiv>\<close> & \<open>f x\<close> \\ | 
| 807 | \<open>(f #> g) x\<close> & \<open>\<equiv>\<close> & \<open>x |> f |> g\<close> \\ | |
| 39883 | 808 |   \end{tabular}
 | 
| 61416 | 809 | \<^medskip> | 
| 39883 | 810 | |
| 61854 | 811 | This enables to write conveniently \<open>x |> f\<^sub>1 |> \<dots> |> f\<^sub>n\<close> or \<open>f\<^sub>1 #> \<dots> #> | 
| 812 | f\<^sub>n\<close> for its functional abstraction over \<open>x\<close>. | |
| 39883 | 813 | |
| 61416 | 814 | \<^medskip> | 
| 61854 | 815 | There is an additional set of combinators to accommodate multiple results | 
| 816 | (via pairs) that are passed on as multiple arguments (via currying). | |
| 39883 | 817 | |
| 61416 | 818 | \<^medskip> | 
| 39883 | 819 |   \begin{tabular}{lll}
 | 
| 61493 | 820 | \<open>(x, y) |-> f\<close> & \<open>\<equiv>\<close> & \<open>f x y\<close> \\ | 
| 821 | \<open>(f #-> g) x\<close> & \<open>\<equiv>\<close> & \<open>x |> f |-> g\<close> \\ | |
| 39883 | 822 |   \end{tabular}
 | 
| 61416 | 823 | \<^medskip> | 
| 58618 | 824 | \<close> | 
| 825 | ||
| 826 | text %mlref \<open> | |
| 39883 | 827 |   \begin{mldecls}
 | 
| 46262 | 828 |   @{index_ML_op "|> ": "'a * ('a -> 'b) -> 'b"} \\
 | 
| 829 |   @{index_ML_op "|-> ": "('c * 'a) * ('c -> 'a -> 'b) -> 'b"} \\
 | |
| 830 |   @{index_ML_op "#> ": "('a -> 'b) * ('b -> 'c) -> 'a -> 'c"} \\
 | |
| 831 |   @{index_ML_op "#-> ": "('a -> 'c * 'b) * ('c -> 'b -> 'd) -> 'a -> 'd"} \\
 | |
| 39883 | 832 |   \end{mldecls}
 | 
| 58618 | 833 | \<close> | 
| 834 | ||
| 835 | ||
| 836 | subsection \<open>Canonical iteration\<close> | |
| 837 | ||
| 61854 | 838 | text \<open> | 
| 839 | As explained above, a function \<open>f: \<alpha> \<rightarrow> \<beta> \<rightarrow> \<beta>\<close> can be understood as update on | |
| 840 | a configuration of type \<open>\<beta>\<close>, parameterized by an argument of type \<open>\<alpha>\<close>. Given | |
| 841 | \<open>a: \<alpha>\<close> the partial application \<open>(f a): \<beta> \<rightarrow> \<beta>\<close> operates homogeneously on \<open>\<beta>\<close>. | |
| 842 | This can be iterated naturally over a list of parameters \<open>[a\<^sub>1, \<dots>, a\<^sub>n]\<close> as | |
| 843 | \<open>f a\<^sub>1 #> \<dots> #> f a\<^sub>n\<close>. The latter expression is again a function \<open>\<beta> \<rightarrow> \<beta>\<close>. It | |
| 844 | can be applied to an initial configuration \<open>b: \<beta>\<close> to start the iteration | |
| 845 | over the given list of arguments: each \<open>a\<close> in \<open>a\<^sub>1, \<dots>, a\<^sub>n\<close> is applied | |
| 846 | consecutively by updating a cumulative configuration. | |
| 39883 | 847 | |
| 61854 | 848 | The \<open>fold\<close> combinator in Isabelle/ML lifts a function \<open>f\<close> as above to its | 
| 849 | iterated version over a list of arguments. Lifting can be repeated, e.g.\ | |
| 850 | \<open>(fold \<circ> fold) f\<close> iterates over a list of lists as expected. | |
| 39883 | 851 | |
| 61854 | 852 | The variant \<open>fold_rev\<close> works inside-out over the list of arguments, such | 
| 853 | that \<open>fold_rev f \<equiv> fold f \<circ> rev\<close> holds. | |
| 61493 | 854 | |
| 61854 | 855 | The \<open>fold_map\<close> combinator essentially performs \<open>fold\<close> and \<open>map\<close> | 
| 856 | simultaneously: each application of \<open>f\<close> produces an updated configuration | |
| 857 | together with a side-result; the iteration collects all such side-results as | |
| 858 | a separate list. | |
| 58618 | 859 | \<close> | 
| 860 | ||
| 861 | text %mlref \<open> | |
| 39883 | 862 |   \begin{mldecls}
 | 
| 863 |   @{index_ML fold: "('a -> 'b -> 'b) -> 'a list -> 'b -> 'b"} \\
 | |
| 864 |   @{index_ML fold_rev: "('a -> 'b -> 'b) -> 'a list -> 'b -> 'b"} \\
 | |
| 865 |   @{index_ML fold_map: "('a -> 'b -> 'c * 'b) -> 'a list -> 'b -> 'c list * 'b"} \\
 | |
| 866 |   \end{mldecls}
 | |
| 867 | ||
| 61854 | 868 |   \<^descr> @{ML fold}~\<open>f\<close> lifts the parametrized update function \<open>f\<close> to a list of
 | 
| 869 | parameters. | |
| 61493 | 870 | |
| 61854 | 871 |   \<^descr> @{ML fold_rev}~\<open>f\<close> is similar to @{ML fold}~\<open>f\<close>, but works inside-out, as
 | 
| 872 | if the list would be reversed. | |
| 61493 | 873 | |
| 61854 | 874 |   \<^descr> @{ML fold_map}~\<open>f\<close> lifts the parametrized update function \<open>f\<close> (with
 | 
| 875 | side-result) to a list of parameters and cumulative side-results. | |
| 39883 | 876 | |
| 877 | ||
| 878 |   \begin{warn}
 | |
| 57421 | 879 | The literature on functional programming provides a confusing multitude of | 
| 61854 | 880 | combinators called \<open>foldl\<close>, \<open>foldr\<close> etc. SML97 provides its own variations | 
| 881 |   as @{ML List.foldl} and @{ML List.foldr}, while the classic Isabelle library
 | |
| 882 |   also has the historic @{ML Library.foldl} and @{ML Library.foldr}. To avoid
 | |
| 883 | unnecessary complication, all these historical versions should be ignored, | |
| 884 |   and the canonical @{ML fold} (or @{ML fold_rev}) used exclusively.
 | |
| 39883 | 885 |   \end{warn}
 | 
| 58618 | 886 | \<close> | 
| 887 | ||
| 61854 | 888 | text %mlex \<open> | 
| 889 | The following example shows how to fill a text buffer incrementally by | |
| 890 | adding strings, either individually or from a given list. | |
| 58618 | 891 | \<close> | 
| 892 | ||
| 59902 | 893 | ML_val \<open> | 
| 39883 | 894 | val s = | 
| 895 | Buffer.empty | |
| 896 | |> Buffer.add "digits: " | |
| 897 | |> fold (Buffer.add o string_of_int) (0 upto 9) | |
| 898 | |> Buffer.content; | |
| 899 | ||
| 900 |   @{assert} (s = "digits: 0123456789");
 | |
| 58618 | 901 | \<close> | 
| 902 | ||
| 61854 | 903 | text \<open> | 
| 904 |   Note how @{ML "fold (Buffer.add o string_of_int)"} above saves an extra @{ML
 | |
| 905 | "map"} over the given list. This kind of peephole optimization reduces both | |
| 906 | the code size and the tree structures in memory (``deforestation''), but it | |
| 907 | requires some practice to read and write fluently. | |
| 39883 | 908 | |
| 61416 | 909 | \<^medskip> | 
| 61854 | 910 | The next example elaborates the idea of canonical iteration, demonstrating | 
| 911 | fast accumulation of tree content using a text buffer. | |
| 58618 | 912 | \<close> | 
| 913 | ||
| 914 | ML \<open> | |
| 39883 | 915 | datatype tree = Text of string | Elem of string * tree list; | 
| 916 | ||
| 917 | fun slow_content (Text txt) = txt | |
| 918 | | slow_content (Elem (name, ts)) = | |
| 919 | "<" ^ name ^ ">" ^ | |
| 920 | implode (map slow_content ts) ^ | |
| 921 | "</" ^ name ^ ">" | |
| 922 | ||
| 923 | fun add_content (Text txt) = Buffer.add txt | |
| 924 | | add_content (Elem (name, ts)) = | |
| 925 |         Buffer.add ("<" ^ name ^ ">") #>
 | |
| 926 | fold add_content ts #> | |
| 927 |         Buffer.add ("</" ^ name ^ ">");
 | |
| 928 | ||
| 929 | fun fast_content tree = | |
| 930 | Buffer.empty |> add_content tree |> Buffer.content; | |
| 58618 | 931 | \<close> | 
| 932 | ||
| 61854 | 933 | text \<open> | 
| 934 |   The slowness of @{ML slow_content} is due to the @{ML implode} of the
 | |
| 935 | recursive results, because it copies previously produced strings again and | |
| 936 | again. | |
| 39883 | 937 | |
| 61854 | 938 |   The incremental @{ML add_content} avoids this by operating on a buffer that
 | 
| 939 |   is passed through in a linear fashion. Using @{ML_text "#>"} and contraction
 | |
| 940 | over the actual buffer argument saves some additional boiler-plate. Of | |
| 941 |   course, the two @{ML "Buffer.add"} invocations with concatenated strings
 | |
| 942 | could have been split into smaller parts, but this would have obfuscated the | |
| 943 | source without making a big difference in performance. Here we have done | |
| 944 | some peephole-optimization for the sake of readability. | |
| 39883 | 945 | |
| 61854 | 946 |   Another benefit of @{ML add_content} is its ``open'' form as a function on
 | 
| 947 | buffers that can be continued in further linear transformations, folding | |
| 948 |   etc. Thus it is more compositional than the naive @{ML slow_content}. As
 | |
| 949 |   realistic example, compare the old-style @{ML "Term.maxidx_of_term: term ->
 | |
| 950 |   int"} with the newer @{ML "Term.maxidx_term: term -> int -> int"} in
 | |
| 951 | Isabelle/Pure. | |
| 39883 | 952 | |
| 61854 | 953 |   Note that @{ML fast_content} above is only defined as example. In many
 | 
| 954 |   practical situations, it is customary to provide the incremental @{ML
 | |
| 955 | add_content} only and leave the initialization and termination to the | |
| 956 | concrete application to the user. | |
| 58618 | 957 | \<close> | 
| 958 | ||
| 959 | ||
| 960 | section \<open>Message output channels \label{sec:message-channels}\<close>
 | |
| 961 | ||
| 61854 | 962 | text \<open> | 
| 963 | Isabelle provides output channels for different kinds of messages: regular | |
| 964 | output, high-volume tracing information, warnings, and errors. | |
| 39835 | 965 | |
| 61854 | 966 | Depending on the user interface involved, these messages may appear in | 
| 967 | different text styles or colours. The standard output for batch sessions | |
| 968 | prefixes each line of warnings by \<^verbatim>\<open>###\<close> and errors by \<^verbatim>\<open>***\<close>, but leaves | |
| 969 | anything else unchanged. The message body may contain further markup and | |
| 970 |   formatting, which is routinely used in the Prover IDE @{cite
 | |
| 971 | "isabelle-jedit"}. | |
| 39835 | 972 | |
| 61854 | 973 | Messages are associated with the transaction context of the running Isar | 
| 974 | command. This enables the front-end to manage commands and resulting | |
| 975 | messages together. For example, after deleting a command from a given theory | |
| 976 | document version, the corresponding message output can be retracted from the | |
| 977 | display. | |
| 58618 | 978 | \<close> | 
| 979 | ||
| 980 | text %mlref \<open> | |
| 39835 | 981 |   \begin{mldecls}
 | 
| 982 |   @{index_ML writeln: "string -> unit"} \\
 | |
| 983 |   @{index_ML tracing: "string -> unit"} \\
 | |
| 984 |   @{index_ML warning: "string -> unit"} \\
 | |
| 57421 | 985 |   @{index_ML error: "string -> 'a"} % FIXME Output.error_message (!?) \\
 | 
| 39835 | 986 |   \end{mldecls}
 | 
| 987 | ||
| 61854 | 988 |   \<^descr> @{ML writeln}~\<open>text\<close> outputs \<open>text\<close> as regular message. This is the
 | 
| 989 | primary message output operation of Isabelle and should be used by default. | |
| 39835 | 990 | |
| 61854 | 991 |   \<^descr> @{ML tracing}~\<open>text\<close> outputs \<open>text\<close> as special tracing message, indicating
 | 
| 992 | potential high-volume output to the front-end (hundreds or thousands of | |
| 993 | messages issued by a single command). The idea is to allow the | |
| 994 | user-interface to downgrade the quality of message display to achieve higher | |
| 995 | throughput. | |
| 39835 | 996 | |
| 61854 | 997 | Note that the user might have to take special actions to see tracing output, | 
| 998 | e.g.\ switch to a different output window. So this channel should not be | |
| 999 | used for regular output. | |
| 39835 | 1000 | |
| 61854 | 1001 |   \<^descr> @{ML warning}~\<open>text\<close> outputs \<open>text\<close> as warning, which typically means some
 | 
| 1002 | extra emphasis on the front-end side (color highlighting, icons, etc.). | |
| 39835 | 1003 | |
| 61854 | 1004 |   \<^descr> @{ML error}~\<open>text\<close> raises exception @{ML ERROR}~\<open>text\<close> and thus lets the
 | 
| 1005 | Isar toplevel print \<open>text\<close> on the error channel, which typically means some | |
| 1006 | extra emphasis on the front-end side (color highlighting, icons, etc.). | |
| 39835 | 1007 | |
| 1008 | This assumes that the exception is not handled before the command | |
| 61854 | 1009 |   terminates. Handling exception @{ML ERROR}~\<open>text\<close> is a perfectly legal
 | 
| 1010 | alternative: it means that the error is absorbed without any message output. | |
| 39835 | 1011 | |
| 39861 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 wenzelm parents: 
39859diff
changeset | 1012 |   \begin{warn}
 | 
| 54387 | 1013 |   The actual error channel is accessed via @{ML Output.error_message}, but
 | 
| 58842 | 1014 | this is normally not used directly in user code. | 
| 39861 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 wenzelm parents: 
39859diff
changeset | 1015 |   \end{warn}
 | 
| 39835 | 1016 | |
| 39861 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 wenzelm parents: 
39859diff
changeset | 1017 | |
| 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 wenzelm parents: 
39859diff
changeset | 1018 |   \begin{warn}
 | 
| 61854 | 1019 | Regular Isabelle/ML code should output messages exclusively by the official | 
| 1020 |   channels. Using raw I/O on \<^emph>\<open>stdout\<close> or \<^emph>\<open>stderr\<close> instead (e.g.\ via @{ML
 | |
| 1021 | TextIO.output}) is apt to cause problems in the presence of parallel and | |
| 1022 | asynchronous processing of Isabelle theories. Such raw output might be | |
| 1023 | displayed by the front-end in some system console log, with a low chance | |
| 1024 | that the user will ever see it. Moreover, as a genuine side-effect on global | |
| 1025 | process channels, there is no proper way to retract output when Isar command | |
| 40126 | 1026 | transactions are reset by the system. | 
| 39861 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 wenzelm parents: 
39859diff
changeset | 1027 |   \end{warn}
 | 
| 39872 | 1028 | |
| 1029 |   \begin{warn}
 | |
| 61854 | 1030 | The message channels should be used in a message-oriented manner. This means | 
| 1031 | that multi-line output that logically belongs together is issued by a single | |
| 1032 |   invocation of @{ML writeln} etc.\ with the functional concatenation of all
 | |
| 1033 | message constituents. | |
| 39872 | 1034 |   \end{warn}
 | 
| 58618 | 1035 | \<close> | 
| 1036 | ||
| 61854 | 1037 | text %mlex \<open> | 
| 1038 | The following example demonstrates a multi-line warning. Note that in some | |
| 1039 | situations the user sees only the first line, so the most important point | |
| 1040 | should be made first. | |
| 58618 | 1041 | \<close> | 
| 1042 | ||
| 1043 | ML_command \<open> | |
| 39872 | 1044 | warning (cat_lines | 
| 1045 | ["Beware the Jabberwock, my son!", | |
| 1046 | "The jaws that bite, the claws that catch!", | |
| 1047 | "Beware the Jubjub Bird, and shun", | |
| 1048 | "The frumious Bandersnatch!"]); | |
| 58618 | 1049 | \<close> | 
| 1050 | ||
| 59902 | 1051 | text \<open> | 
| 61416 | 1052 | \<^medskip> | 
| 61854 | 1053 | An alternative is to make a paragraph of freely-floating words as follows. | 
| 59902 | 1054 | \<close> | 
| 1055 | ||
| 1056 | ML_command \<open> | |
| 1057 | warning (Pretty.string_of (Pretty.para | |
| 1058 | "Beware the Jabberwock, my son! \ | |
| 1059 | \The jaws that bite, the claws that catch! \ | |
| 1060 | \Beware the Jubjub Bird, and shun \ | |
| 1061 | \The frumious Bandersnatch!")) | |
| 1062 | \<close> | |
| 1063 | ||
| 1064 | text \<open> | |
| 1065 | This has advantages with variable window / popup sizes, but might make it | |
| 1066 | harder to search for message content systematically, e.g.\ by other tools or | |
| 1067 | by humans expecting the ``verse'' of a formal message in a fixed layout. | |
| 1068 | \<close> | |
| 1069 | ||
| 58618 | 1070 | |
| 1071 | section \<open>Exceptions \label{sec:exceptions}\<close>
 | |
| 1072 | ||
| 61854 | 1073 | text \<open> | 
| 1074 | The Standard ML semantics of strict functional evaluation together with | |
| 1075 | exceptions is rather well defined, but some delicate points need to be | |
| 1076 | observed to avoid that ML programs go wrong despite static type-checking. | |
| 1077 | Exceptions in Isabelle/ML are subsequently categorized as follows. | |
| 1078 | \<close> | |
| 61506 | 1079 | |
| 1080 | paragraph \<open>Regular user errors.\<close> | |
| 61854 | 1081 | text \<open> | 
| 1082 | These are meant to provide informative feedback about malformed input etc. | |
| 1083 | ||
| 1084 |   The \<^emph>\<open>error\<close> function raises the corresponding @{ML ERROR} exception, with a
 | |
| 1085 |   plain text message as argument. @{ML ERROR} exceptions can be handled
 | |
| 1086 | internally, in order to be ignored, turned into other exceptions, or | |
| 1087 | cascaded by appending messages. If the corresponding Isabelle/Isar command | |
| 1088 |   terminates with an @{ML ERROR} exception state, the system will print the
 | |
| 1089 |   result on the error channel (see \secref{sec:message-channels}).
 | |
| 39854 | 1090 | |
| 61854 | 1091 | It is considered bad style to refer to internal function names or values in | 
| 1092 |   ML source notation in user error messages. Do not use \<open>@{make_string}\<close> nor
 | |
| 1093 |   \<open>@{here}\<close>!
 | |
| 39854 | 1094 | |
| 61854 | 1095 | Grammatical correctness of error messages can be improved by \<^emph>\<open>omitting\<close> | 
| 1096 | final punctuation: messages are often concatenated or put into a larger | |
| 1097 | context (e.g.\ augmented with source position). Note that punctuation after | |
| 1098 | formal entities (types, terms, theorems) is particularly prone to user | |
| 1099 | confusion. | |
| 61506 | 1100 | \<close> | 
| 1101 | ||
| 1102 | paragraph \<open>Program failures.\<close> | |
| 61854 | 1103 | text \<open> | 
| 1104 | There is a handful of standard exceptions that indicate general failure | |
| 61506 | 1105 | situations, or failures of core operations on logical entities (types, | 
| 1106 |   terms, theorems, theories, see \chref{ch:logic}).
 | |
| 39854 | 1107 | |
| 61854 | 1108 | These exceptions indicate a genuine breakdown of the program, so the main | 
| 1109 | purpose is to determine quickly what has happened where. Traditionally, the | |
| 1110 | (short) exception message would include the name of an ML function, although | |
| 1111 | this is no longer necessary, because the ML runtime system attaches detailed | |
| 1112 |   source position stemming from the corresponding @{ML_text raise} keyword.
 | |
| 39854 | 1113 | |
| 61416 | 1114 | \<^medskip> | 
| 61854 | 1115 | User modules can always introduce their own custom exceptions locally, e.g.\ | 
| 1116 | to organize internal failures robustly without overlapping with existing | |
| 1117 | exceptions. Exceptions that are exposed in module signatures require extra | |
| 1118 | care, though, and should \<^emph>\<open>not\<close> be introduced by default. Surprise by users | |
| 1119 | of a module can be often minimized by using plain user errors instead. | |
| 61506 | 1120 | \<close> | 
| 1121 | ||
| 1122 | paragraph \<open>Interrupts.\<close> | |
| 61854 | 1123 | text \<open> | 
| 1124 | These indicate arbitrary system events: both the ML runtime system and the | |
| 1125 | Isabelle/ML infrastructure signal various exceptional situations by raising | |
| 1126 |   the special @{ML Exn.Interrupt} exception in user code.
 | |
| 57421 | 1127 | |
| 1128 | This is the one and only way that physical events can intrude an Isabelle/ML | |
| 1129 | program. Such an interrupt can mean out-of-memory, stack overflow, timeout, | |
| 1130 | internal signaling of threads, or a POSIX process signal. An Isabelle/ML | |
| 1131 | program that intercepts interrupts becomes dependent on physical effects of | |
| 1132 | the environment. Even worse, exception handling patterns that are too | |
| 1133 | general by accident, e.g.\ by misspelled exception constructors, will cover | |
| 1134 | interrupts unintentionally and thus render the program semantics | |
| 1135 | ill-defined. | |
| 39854 | 1136 | |
| 61854 | 1137 | Note that the Interrupt exception dates back to the original SML90 language | 
| 1138 | definition. It was excluded from the SML97 version to avoid its malign | |
| 1139 | impact on ML program semantics, but without providing a viable alternative. | |
| 1140 | Isabelle/ML recovers physical interruptibility (which is an indispensable | |
| 1141 | tool to implement managed evaluation of command transactions), but requires | |
| 1142 | user code to be strictly transparent wrt.\ interrupts. | |
| 39854 | 1143 | |
| 1144 |   \begin{warn}
 | |
| 61854 | 1145 | Isabelle/ML user code needs to terminate promptly on interruption, without | 
| 1146 | guessing at its meaning to the system infrastructure. Temporary handling of | |
| 1147 | interrupts for cleanup of global resources etc.\ needs to be followed | |
| 1148 | immediately by re-raising of the original exception. | |
| 39854 | 1149 |   \end{warn}
 | 
| 58618 | 1150 | \<close> | 
| 1151 | ||
| 1152 | text %mlref \<open> | |
| 39855 | 1153 |   \begin{mldecls}
 | 
| 1154 |   @{index_ML try: "('a -> 'b) -> 'a -> 'b option"} \\
 | |
| 1155 |   @{index_ML can: "('a -> 'b) -> 'a -> bool"} \\
 | |
| 55838 | 1156 |   @{index_ML_exception ERROR: string} \\
 | 
| 1157 |   @{index_ML_exception Fail: string} \\
 | |
| 39856 | 1158 |   @{index_ML Exn.is_interrupt: "exn -> bool"} \\
 | 
| 62505 | 1159 |   @{index_ML Exn.reraise: "exn -> 'a"} \\
 | 
| 56303 
4cc3f4db3447
clarified Isabelle/ML bootstrap, such that Execution does not require ML_Compiler;
 wenzelm parents: 
56199diff
changeset | 1160 |   @{index_ML Runtime.exn_trace: "(unit -> 'a) -> 'a"} \\
 | 
| 39855 | 1161 |   \end{mldecls}
 | 
| 1162 | ||
| 61854 | 1163 |   \<^descr> @{ML try}~\<open>f x\<close> makes the partiality of evaluating \<open>f x\<close> explicit via the
 | 
| 1164 | option datatype. Interrupts are \<^emph>\<open>not\<close> handled here, i.e.\ this form serves | |
| 1165 |   as safe replacement for the \<^emph>\<open>unsafe\<close> version @{ML_text "(SOME"}~\<open>f
 | |
| 1166 |   x\<close>~@{ML_text "handle _ => NONE)"} that is occasionally seen in books about
 | |
| 1167 | SML97, but not in Isabelle/ML. | |
| 39855 | 1168 | |
| 61439 | 1169 |   \<^descr> @{ML can} is similar to @{ML try} with more abstract result.
 | 
| 1170 | ||
| 61854 | 1171 |   \<^descr> @{ML ERROR}~\<open>msg\<close> represents user errors; this exception is normally
 | 
| 1172 |   raised indirectly via the @{ML error} function (see
 | |
| 1173 |   \secref{sec:message-channels}).
 | |
| 39856 | 1174 | |
| 61493 | 1175 |   \<^descr> @{ML Fail}~\<open>msg\<close> represents general program failures.
 | 
| 61439 | 1176 | |
| 61854 | 1177 |   \<^descr> @{ML Exn.is_interrupt} identifies interrupts robustly, without mentioning
 | 
| 1178 | concrete exception constructors in user code. Handled interrupts need to be | |
| 1179 | re-raised promptly! | |
| 1180 | ||
| 62505 | 1181 |   \<^descr> @{ML Exn.reraise}~\<open>exn\<close> raises exception \<open>exn\<close> while preserving its implicit
 | 
| 61854 | 1182 | position information (if possible, depending on the ML platform). | 
| 39856 | 1183 | |
| 61854 | 1184 |   \<^descr> @{ML Runtime.exn_trace}~@{ML_text "(fn () =>"}~\<open>e\<close>@{ML_text ")"} evaluates
 | 
| 1185 | expression \<open>e\<close> while printing a full trace of its stack of nested exceptions | |
| 1186 | (if possible, depending on the ML platform). | |
| 39855 | 1187 | |
| 61854 | 1188 |   Inserting @{ML Runtime.exn_trace} into ML code temporarily is useful for
 | 
| 1189 | debugging, but not suitable for production code. | |
| 58618 | 1190 | \<close> | 
| 1191 | ||
| 1192 | text %mlantiq \<open> | |
| 39866 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1193 |   \begin{matharray}{rcl}
 | 
| 61493 | 1194 |   @{ML_antiquotation_def "assert"} & : & \<open>ML_antiquotation\<close> \\
 | 
| 64465 
73069f272f42
documentation of @{undefined} (actually introduced in Isabelle2016);
 wenzelm parents: 
64464diff
changeset | 1195 |   @{ML_antiquotation_def "undefined"} & : & \<open>ML_antiquotation\<close> \\
 | 
| 39866 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1196 |   \end{matharray}
 | 
| 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1197 | |
| 61854 | 1198 |   \<^descr> \<open>@{assert}\<close> inlines a function @{ML_type "bool -> unit"} that raises @{ML
 | 
| 1199 |   Fail} if the argument is @{ML false}. Due to inlining the source position of
 | |
| 1200 | failed assertions is included in the error output. | |
| 64465 
73069f272f42
documentation of @{undefined} (actually introduced in Isabelle2016);
 wenzelm parents: 
64464diff
changeset | 1201 | |
| 
73069f272f42
documentation of @{undefined} (actually introduced in Isabelle2016);
 wenzelm parents: 
64464diff
changeset | 1202 |   \<^descr> \<open>@{undefined}\<close> inlines @{verbatim raise}~@{ML Match}, i.e.\ the ML program
 | 
| 
73069f272f42
documentation of @{undefined} (actually introduced in Isabelle2016);
 wenzelm parents: 
64464diff
changeset | 1203 | behaves as in some function application of an undefined case. | 
| 
73069f272f42
documentation of @{undefined} (actually introduced in Isabelle2016);
 wenzelm parents: 
64464diff
changeset | 1204 | \<close> | 
| 
73069f272f42
documentation of @{undefined} (actually introduced in Isabelle2016);
 wenzelm parents: 
64464diff
changeset | 1205 | |
| 
73069f272f42
documentation of @{undefined} (actually introduced in Isabelle2016);
 wenzelm parents: 
64464diff
changeset | 1206 | text %mlex \<open> | 
| 
73069f272f42
documentation of @{undefined} (actually introduced in Isabelle2016);
 wenzelm parents: 
64464diff
changeset | 1207 |   The ML function @{ML undefined} is defined in \<^file>\<open>~~/src/Pure/library.ML\<close>
 | 
| 
73069f272f42
documentation of @{undefined} (actually introduced in Isabelle2016);
 wenzelm parents: 
64464diff
changeset | 1208 | as follows: | 
| 
73069f272f42
documentation of @{undefined} (actually introduced in Isabelle2016);
 wenzelm parents: 
64464diff
changeset | 1209 | \<close> | 
| 
73069f272f42
documentation of @{undefined} (actually introduced in Isabelle2016);
 wenzelm parents: 
64464diff
changeset | 1210 | |
| 
73069f272f42
documentation of @{undefined} (actually introduced in Isabelle2016);
 wenzelm parents: 
64464diff
changeset | 1211 | ML \<open>fun undefined _ = raise Match\<close> | 
| 
73069f272f42
documentation of @{undefined} (actually introduced in Isabelle2016);
 wenzelm parents: 
64464diff
changeset | 1212 | |
| 
73069f272f42
documentation of @{undefined} (actually introduced in Isabelle2016);
 wenzelm parents: 
64464diff
changeset | 1213 | text \<open> | 
| 
73069f272f42
documentation of @{undefined} (actually introduced in Isabelle2016);
 wenzelm parents: 
64464diff
changeset | 1214 | \<^medskip> | 
| 
73069f272f42
documentation of @{undefined} (actually introduced in Isabelle2016);
 wenzelm parents: 
64464diff
changeset | 1215 |   The following variant uses the antiquotation @{ML_antiquotation undefined}
 | 
| 
73069f272f42
documentation of @{undefined} (actually introduced in Isabelle2016);
 wenzelm parents: 
64464diff
changeset | 1216 | instead: | 
| 
73069f272f42
documentation of @{undefined} (actually introduced in Isabelle2016);
 wenzelm parents: 
64464diff
changeset | 1217 | \<close> | 
| 
73069f272f42
documentation of @{undefined} (actually introduced in Isabelle2016);
 wenzelm parents: 
64464diff
changeset | 1218 | |
| 
73069f272f42
documentation of @{undefined} (actually introduced in Isabelle2016);
 wenzelm parents: 
64464diff
changeset | 1219 | ML \<open>fun undefined _ = @{undefined}\<close>
 | 
| 
73069f272f42
documentation of @{undefined} (actually introduced in Isabelle2016);
 wenzelm parents: 
64464diff
changeset | 1220 | |
| 
73069f272f42
documentation of @{undefined} (actually introduced in Isabelle2016);
 wenzelm parents: 
64464diff
changeset | 1221 | text \<open> | 
| 
73069f272f42
documentation of @{undefined} (actually introduced in Isabelle2016);
 wenzelm parents: 
64464diff
changeset | 1222 | \<^medskip> | 
| 
73069f272f42
documentation of @{undefined} (actually introduced in Isabelle2016);
 wenzelm parents: 
64464diff
changeset | 1223 | Here is the same, using control-symbol notation for the antiquotation, with | 
| 
73069f272f42
documentation of @{undefined} (actually introduced in Isabelle2016);
 wenzelm parents: 
64464diff
changeset | 1224 | special rendering of \<^verbatim>\<open>\<^undefined>\<close>: | 
| 
73069f272f42
documentation of @{undefined} (actually introduced in Isabelle2016);
 wenzelm parents: 
64464diff
changeset | 1225 | \<close> | 
| 
73069f272f42
documentation of @{undefined} (actually introduced in Isabelle2016);
 wenzelm parents: 
64464diff
changeset | 1226 | |
| 
73069f272f42
documentation of @{undefined} (actually introduced in Isabelle2016);
 wenzelm parents: 
64464diff
changeset | 1227 | ML \<open>fun undefined _ = \<^undefined>\<close> | 
| 
73069f272f42
documentation of @{undefined} (actually introduced in Isabelle2016);
 wenzelm parents: 
64464diff
changeset | 1228 | |
| 
73069f272f42
documentation of @{undefined} (actually introduced in Isabelle2016);
 wenzelm parents: 
64464diff
changeset | 1229 | text \<open> | 
| 
73069f272f42
documentation of @{undefined} (actually introduced in Isabelle2016);
 wenzelm parents: 
64464diff
changeset | 1230 | \medskip Semantically, all forms are equivalent to a function definition | 
| 
73069f272f42
documentation of @{undefined} (actually introduced in Isabelle2016);
 wenzelm parents: 
64464diff
changeset | 1231 | without any clauses, but that is syntactically not allowed in ML. | 
| 58618 | 1232 | \<close> | 
| 1233 | ||
| 1234 | ||
| 1235 | section \<open>Strings of symbols \label{sec:symbols}\<close>
 | |
| 1236 | ||
| 61854 | 1237 | text \<open> | 
| 1238 | A \<^emph>\<open>symbol\<close> constitutes the smallest textual unit in Isabelle/ML --- raw ML | |
| 1239 | characters are normally not encountered at all. Isabelle strings consist of | |
| 1240 | a sequence of symbols, represented as a packed string or an exploded list of | |
| 1241 | strings. Each symbol is in itself a small string, which has either one of | |
| 61504 | 1242 | the following forms: | 
| 1243 | ||
| 1244 | \<^enum> a single ASCII character ``\<open>c\<close>'', for example ``\<^verbatim>\<open>a\<close>'', | |
| 1245 | ||
| 1246 | \<^enum> a codepoint according to UTF-8 (non-ASCII byte sequence), | |
| 1247 | ||
| 1248 | \<^enum> a regular symbol ``\<^verbatim>\<open>\<ident>\<close>'', for example ``\<^verbatim>\<open>\<alpha>\<close>'', | |
| 1249 | ||
| 1250 | \<^enum> a control symbol ``\<^verbatim>\<open>\<^ident>\<close>'', for example ``\<^verbatim>\<open>\<^bold>\<close>'', | |
| 1251 | ||
| 1252 | The \<open>ident\<close> syntax for symbol names is \<open>letter (letter | digit)\<^sup>*\<close>, where | |
| 1253 | \<open>letter = A..Za..z\<close> and \<open>digit = 0..9\<close>. There are infinitely many regular | |
| 1254 | symbols and control symbols, but a fixed collection of standard symbols is | |
| 1255 | treated specifically. For example, ``\<^verbatim>\<open>\<alpha>\<close>'' is classified as a letter, which | |
| 1256 | means it may occur within regular Isabelle identifiers. | |
| 52421 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1257 | |
| 57421 | 1258 | The character set underlying Isabelle symbols is 7-bit ASCII, but 8-bit | 
| 1259 | character sequences are passed-through unchanged. Unicode/UCS data in UTF-8 | |
| 1260 | encoding is processed in a non-strict fashion, such that well-formed code | |
| 1261 | sequences are recognized accordingly. Unicode provides its own collection of | |
| 1262 | mathematical symbols, but within the core Isabelle/ML world there is no link | |
| 1263 | to the standard collection of Isabelle regular symbols. | |
| 1264 | ||
| 61416 | 1265 | \<^medskip> | 
| 61504 | 1266 | Output of Isabelle symbols depends on the print mode. For example, the | 
| 1267 |   standard {\LaTeX} setup of the Isabelle document preparation system would
 | |
| 1268 | present ``\<^verbatim>\<open>\<alpha>\<close>'' as \<open>\<alpha>\<close>, and ``\<^verbatim>\<open>\<^bold>\<alpha>\<close>'' as \<open>\<^bold>\<alpha>\<close>. On-screen rendering usually | |
| 1269 | works by mapping a finite subset of Isabelle symbols to suitable Unicode | |
| 1270 | characters. | |
| 58618 | 1271 | \<close> | 
| 1272 | ||
| 1273 | text %mlref \<open> | |
| 52421 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1274 |   \begin{mldecls}
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1275 |   @{index_ML_type "Symbol.symbol": string} \\
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1276 |   @{index_ML Symbol.explode: "string -> Symbol.symbol list"} \\
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1277 |   @{index_ML Symbol.is_letter: "Symbol.symbol -> bool"} \\
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1278 |   @{index_ML Symbol.is_digit: "Symbol.symbol -> bool"} \\
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1279 |   @{index_ML Symbol.is_quasi: "Symbol.symbol -> bool"} \\
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1280 |   @{index_ML Symbol.is_blank: "Symbol.symbol -> bool"} \\
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1281 |   \end{mldecls}
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1282 |   \begin{mldecls}
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1283 |   @{index_ML_type "Symbol.sym"} \\
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1284 |   @{index_ML Symbol.decode: "Symbol.symbol -> Symbol.sym"} \\
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1285 |   \end{mldecls}
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1286 | |
| 61854 | 1287 |   \<^descr> Type @{ML_type "Symbol.symbol"} represents individual Isabelle symbols.
 | 
| 52421 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1288 | |
| 61854 | 1289 |   \<^descr> @{ML "Symbol.explode"}~\<open>str\<close> produces a symbol list from the packed form.
 | 
| 1290 |   This function supersedes @{ML "String.explode"} for virtually all purposes
 | |
| 1291 | of manipulating text in Isabelle!\<^footnote>\<open>The runtime overhead for exploded strings | |
| 1292 | is mainly that of the list structure: individual symbols that happen to be a | |
| 1293 | singleton string do not require extra memory in Poly/ML.\<close> | |
| 52421 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1294 | |
| 61439 | 1295 |   \<^descr> @{ML "Symbol.is_letter"}, @{ML "Symbol.is_digit"}, @{ML
 | 
| 61854 | 1296 |   "Symbol.is_quasi"}, @{ML "Symbol.is_blank"} classify standard symbols
 | 
| 1297 |   according to fixed syntactic conventions of Isabelle, cf.\ @{cite
 | |
| 1298 | "isabelle-isar-ref"}. | |
| 52421 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1299 | |
| 61854 | 1300 |   \<^descr> Type @{ML_type "Symbol.sym"} is a concrete datatype that represents the
 | 
| 1301 |   different kinds of symbols explicitly, with constructors @{ML
 | |
| 1302 |   "Symbol.Char"}, @{ML "Symbol.UTF8"}, @{ML "Symbol.Sym"}, @{ML
 | |
| 63936 | 1303 |   "Symbol.Control"}, @{ML "Symbol.Malformed"}.
 | 
| 52421 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1304 | |
| 61854 | 1305 |   \<^descr> @{ML "Symbol.decode"} converts the string representation of a symbol into
 | 
| 1306 | the datatype version. | |
| 61506 | 1307 | \<close> | 
| 1308 | ||
| 1309 | paragraph \<open>Historical note.\<close> | |
| 61854 | 1310 | text \<open> | 
| 1311 |   In the original SML90 standard the primitive ML type @{ML_type char} did not
 | |
| 1312 |   exists, and @{ML_text "explode: string -> string list"} produced a list of
 | |
| 1313 |   singleton strings like @{ML "raw_explode: string -> string list"} in
 | |
| 61506 | 1314 | Isabelle/ML today. When SML97 came out, Isabelle did not adopt its somewhat | 
| 1315 | anachronistic 8-bit or 16-bit characters, but the idea of exploding a string | |
| 1316 | into a list of small strings was extended to ``symbols'' as explained above. | |
| 1317 | Thus Isabelle sources can refer to an infinite store of user-defined | |
| 1318 | symbols, without having to worry about the multitude of Unicode encodings | |
| 61854 | 1319 | that have emerged over the years. | 
| 1320 | \<close> | |
| 58618 | 1321 | |
| 1322 | ||
| 1323 | section \<open>Basic data types\<close> | |
| 1324 | ||
| 61854 | 1325 | text \<open> | 
| 1326 | The basis library proposal of SML97 needs to be treated with caution. Many | |
| 1327 | of its operations simply do not fit with important Isabelle/ML conventions | |
| 1328 | (like ``canonical argument order'', see | |
| 1329 |   \secref{sec:canonical-argument-order}), others cause problems with the
 | |
| 1330 |   parallel evaluation model of Isabelle/ML (such as @{ML TextIO.print} or @{ML
 | |
| 1331 | OS.Process.system}). | |
| 39859 | 1332 | |
| 61854 | 1333 | Subsequently we give a brief overview of important operations on basic ML | 
| 1334 | data types. | |
| 58618 | 1335 | \<close> | 
| 1336 | ||
| 1337 | ||
| 1338 | subsection \<open>Characters\<close> | |
| 1339 | ||
| 1340 | text %mlref \<open> | |
| 39863 | 1341 |   \begin{mldecls}
 | 
| 1342 |   @{index_ML_type char} \\
 | |
| 1343 |   \end{mldecls}
 | |
| 1344 | ||
| 61854 | 1345 |   \<^descr> Type @{ML_type char} is \<^emph>\<open>not\<close> used. The smallest textual unit in Isabelle
 | 
| 1346 |   is represented as a ``symbol'' (see \secref{sec:symbols}).
 | |
| 58618 | 1347 | \<close> | 
| 1348 | ||
| 1349 | ||
| 1350 | subsection \<open>Strings\<close> | |
| 1351 | ||
| 1352 | text %mlref \<open> | |
| 52421 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1353 |   \begin{mldecls}
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1354 |   @{index_ML_type string} \\
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1355 |   \end{mldecls}
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1356 | |
| 61854 | 1357 |   \<^descr> Type @{ML_type string} represents immutable vectors of 8-bit characters.
 | 
| 1358 | There are operations in SML to convert back and forth to actual byte | |
| 1359 | vectors, which are seldom used. | |
| 52421 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1360 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1361 | This historically important raw text representation is used for | 
| 61854 | 1362 | Isabelle-specific purposes with the following implicit substructures packed | 
| 1363 | into the string content: | |
| 52421 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1364 | |
| 61854 | 1365 |     \<^enum> sequence of Isabelle symbols (see also \secref{sec:symbols}), with @{ML
 | 
| 1366 | Symbol.explode} as key operation; | |
| 61458 | 1367 | |
| 61854 | 1368 |     \<^enum> XML tree structure via YXML (see also @{cite "isabelle-system"}), with
 | 
| 1369 |     @{ML YXML.parse_body} as key operation.
 | |
| 52421 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1370 | |
| 58723 | 1371 | Note that Isabelle/ML string literals may refer Isabelle symbols like | 
| 61854 | 1372 | ``\<^verbatim>\<open>\<alpha>\<close>'' natively, \<^emph>\<open>without\<close> escaping the backslash. This is a consequence | 
| 1373 | of Isabelle treating all source text as strings of symbols, instead of raw | |
| 1374 | characters. | |
| 58618 | 1375 | \<close> | 
| 1376 | ||
| 61854 | 1377 | text %mlex \<open> | 
| 1378 | The subsequent example illustrates the difference of physical addressing of | |
| 1379 | bytes versus logical addressing of symbols in Isabelle strings. | |
| 58618 | 1380 | \<close> | 
| 1381 | ||
| 1382 | ML_val \<open> | |
| 52421 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1383 | val s = "\<A>"; | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1384 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1385 |   @{assert} (length (Symbol.explode s) = 1);
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1386 |   @{assert} (size s = 4);
 | 
| 58618 | 1387 | \<close> | 
| 1388 | ||
| 61854 | 1389 | text \<open> | 
| 1390 | Note that in Unicode renderings of the symbol \<open>\<A>\<close>, variations of encodings | |
| 1391 | like UTF-8 or UTF-16 pose delicate questions about the multi-byte | |
| 1392 | representations of its codepoint, which is outside of the 16-bit address | |
| 1393 | space of the original Unicode standard from the 1990-ies. In Isabelle/ML it | |
| 1394 | is just ``\<^verbatim>\<open>\<A>\<close>'' literally, using plain ASCII characters beyond any | |
| 1395 | doubts. | |
| 1396 | \<close> | |
| 58618 | 1397 | |
| 1398 | ||
| 1399 | subsection \<open>Integers\<close> | |
| 1400 | ||
| 1401 | text %mlref \<open> | |
| 39862 | 1402 |   \begin{mldecls}
 | 
| 1403 |   @{index_ML_type int} \\
 | |
| 1404 |   \end{mldecls}
 | |
| 1405 | ||
| 61854 | 1406 |   \<^descr> Type @{ML_type int} represents regular mathematical integers, which are
 | 
| 1407 | \<^emph>\<open>unbounded\<close>. Overflow is treated properly, but should never happen in | |
| 1408 | practice.\<^footnote>\<open>The size limit for integer bit patterns in memory is 64\,MB for | |
| 62354 | 1409 | 32-bit Poly/ML, and much higher for 64-bit systems.\<close> | 
| 39862 | 1410 | |
| 55837 | 1411 |   Structure @{ML_structure IntInf} of SML97 is obsolete and superseded by
 | 
| 63680 | 1412 |   @{ML_structure Int}. Structure @{ML_structure Integer} in
 | 
| 1413 | \<^file>\<open>~~/src/Pure/General/integer.ML\<close> provides some additional operations. | |
| 58618 | 1414 | \<close> | 
| 1415 | ||
| 1416 | ||
| 63215 | 1417 | subsection \<open>Rational numbers\<close> | 
| 1418 | ||
| 1419 | text %mlref \<open> | |
| 1420 |   \begin{mldecls}
 | |
| 1421 |   @{index_ML_type Rat.rat} \\
 | |
| 1422 |   \end{mldecls}
 | |
| 1423 | ||
| 1424 |   \<^descr> Type @{ML_type Rat.rat} represents rational numbers, based on the
 | |
| 1425 | unbounded integers of Poly/ML. | |
| 1426 | ||
| 1427 | Literal rationals may be written with special antiquotation syntax | |
| 1428 | \<^verbatim>\<open>@\<close>\<open>int\<close>\<^verbatim>\<open>/\<close>\<open>nat\<close> or \<^verbatim>\<open>@\<close>\<open>int\<close> (without any white space). For example | |
| 1429 | \<^verbatim>\<open>@~1/4\<close> or \<^verbatim>\<open>@10\<close>. The ML toplevel pretty printer uses the same format. | |
| 1430 | ||
| 1431 | Standard operations are provided via ad-hoc overloading of \<^verbatim>\<open>+\<close>, \<^verbatim>\<open>-\<close>, \<^verbatim>\<open>*\<close>, | |
| 1432 | \<^verbatim>\<open>/\<close>, etc. | |
| 1433 | \<close> | |
| 1434 | ||
| 1435 | ||
| 58618 | 1436 | subsection \<open>Time\<close> | 
| 1437 | ||
| 1438 | text %mlref \<open> | |
| 40302 | 1439 |   \begin{mldecls}
 | 
| 1440 |   @{index_ML_type Time.time} \\
 | |
| 1441 |   @{index_ML seconds: "real -> Time.time"} \\
 | |
| 1442 |   \end{mldecls}
 | |
| 1443 | ||
| 61854 | 1444 |   \<^descr> Type @{ML_type Time.time} represents time abstractly according to the
 | 
| 1445 | SML97 basis library definition. This is adequate for internal ML operations, | |
| 1446 | but awkward in concrete time specifications. | |
| 40302 | 1447 | |
| 61854 | 1448 |   \<^descr> @{ML seconds}~\<open>s\<close> turns the concrete scalar \<open>s\<close> (measured in seconds) into
 | 
| 1449 | an abstract time value. Floating point numbers are easy to use as | |
| 1450 |   configuration options in the context (see \secref{sec:config-options}) or
 | |
| 1451 | system options that are maintained externally. | |
| 58618 | 1452 | \<close> | 
| 1453 | ||
| 1454 | ||
| 1455 | subsection \<open>Options\<close> | |
| 1456 | ||
| 1457 | text %mlref \<open> | |
| 39859 | 1458 |   \begin{mldecls}
 | 
| 1459 |   @{index_ML Option.map: "('a -> 'b) -> 'a option -> 'b option"} \\
 | |
| 1460 |   @{index_ML is_some: "'a option -> bool"} \\
 | |
| 1461 |   @{index_ML is_none: "'a option -> bool"} \\
 | |
| 1462 |   @{index_ML the: "'a option -> 'a"} \\
 | |
| 1463 |   @{index_ML these: "'a list option -> 'a list"} \\
 | |
| 1464 |   @{index_ML the_list: "'a option -> 'a list"} \\
 | |
| 1465 |   @{index_ML the_default: "'a -> 'a option -> 'a"} \\
 | |
| 1466 |   \end{mldecls}
 | |
| 58618 | 1467 | \<close> | 
| 1468 | ||
| 61854 | 1469 | text \<open> | 
| 1470 |   Apart from @{ML Option.map} most other operations defined in structure
 | |
| 1471 |   @{ML_structure Option} are alien to Isabelle/ML and never used. The
 | |
| 63680 | 1472 | operations shown above are defined in \<^file>\<open>~~/src/Pure/General/basics.ML\<close>. | 
| 61854 | 1473 | \<close> | 
| 58618 | 1474 | |
| 1475 | ||
| 1476 | subsection \<open>Lists\<close> | |
| 1477 | ||
| 61854 | 1478 | text \<open> | 
| 1479 | Lists are ubiquitous in ML as simple and light-weight ``collections'' for | |
| 1480 | many everyday programming tasks. Isabelle/ML provides important additions | |
| 1481 | and improvements over operations that are predefined in the SML97 library. | |
| 1482 | \<close> | |
| 58618 | 1483 | |
| 1484 | text %mlref \<open> | |
| 39863 | 1485 |   \begin{mldecls}
 | 
| 1486 |   @{index_ML cons: "'a -> 'a list -> 'a list"} \\
 | |
| 39874 | 1487 |   @{index_ML member: "('b * 'a -> bool) -> 'a list -> 'b -> bool"} \\
 | 
| 1488 |   @{index_ML insert: "('a * 'a -> bool) -> 'a -> 'a list -> 'a list"} \\
 | |
| 1489 |   @{index_ML remove: "('b * 'a -> bool) -> 'b -> 'a list -> 'a list"} \\
 | |
| 1490 |   @{index_ML update: "('a * 'a -> bool) -> 'a -> 'a list -> 'a list"} \\
 | |
| 39863 | 1491 |   \end{mldecls}
 | 
| 1492 | ||
| 61493 | 1493 |   \<^descr> @{ML cons}~\<open>x xs\<close> evaluates to \<open>x :: xs\<close>.
 | 
| 39863 | 1494 | |
| 61854 | 1495 | Tupled infix operators are a historical accident in Standard ML. The curried | 
| 1496 |   @{ML cons} amends this, but it should be only used when partial application
 | |
| 1497 | is required. | |
| 39863 | 1498 | |
| 61854 | 1499 |   \<^descr> @{ML member}, @{ML insert}, @{ML remove}, @{ML update} treat lists as a
 | 
| 63680 | 1500 | set-like container that maintains the order of elements. See | 
| 1501 | \<^file>\<open>~~/src/Pure/library.ML\<close> for the full specifications (written in ML). | |
| 1502 |   There are some further derived operations like @{ML union} or @{ML inter}.
 | |
| 39874 | 1503 | |
| 61854 | 1504 |   Note that @{ML insert} is conservative about elements that are already a
 | 
| 1505 |   @{ML member} of the list, while @{ML update} ensures that the latest entry
 | |
| 1506 | is always put in front. The latter discipline is often more appropriate in | |
| 1507 |   declarations of context data (\secref{sec:context-data}) that are issued by
 | |
| 1508 | the user in Isar source: later declarations take precedence over earlier | |
| 1509 | ones. \<close> | |
| 58618 | 1510 | |
| 61854 | 1511 | text %mlex \<open> | 
| 1512 |   Using canonical @{ML fold} together with @{ML cons} (or similar standard
 | |
| 1513 | operations) alternates the orientation of data. The is quite natural and | |
| 1514 |   should not be altered forcible by inserting extra applications of @{ML rev}.
 | |
| 1515 |   The alternative @{ML fold_rev} can be used in the few situations, where
 | |
| 1516 | alternation should be prevented. | |
| 58618 | 1517 | \<close> | 
| 1518 | ||
| 59902 | 1519 | ML_val \<open> | 
| 39863 | 1520 | val items = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]; | 
| 1521 | ||
| 1522 | val list1 = fold cons items []; | |
| 39866 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1523 |   @{assert} (list1 = rev items);
 | 
| 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1524 | |
| 39863 | 1525 | val list2 = fold_rev cons items []; | 
| 39866 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1526 |   @{assert} (list2 = items);
 | 
| 58618 | 1527 | \<close> | 
| 1528 | ||
| 61854 | 1529 | text \<open> | 
| 1530 | The subsequent example demonstrates how to \<^emph>\<open>merge\<close> two lists in a natural | |
| 1531 | way. | |
| 1532 | \<close> | |
| 58618 | 1533 | |
| 59902 | 1534 | ML_val \<open> | 
| 39883 | 1535 | fun merge_lists eq (xs, ys) = fold_rev (insert eq) ys xs; | 
| 58618 | 1536 | \<close> | 
| 1537 | ||
| 61854 | 1538 | text \<open> | 
| 1539 | Here the first list is treated conservatively: only the new elements from | |
| 1540 |   the second list are inserted. The inside-out order of insertion via @{ML
 | |
| 1541 | fold_rev} attempts to preserve the order of elements in the result. | |
| 39883 | 1542 | |
| 1543 | This way of merging lists is typical for context data | |
| 63680 | 1544 |   (\secref{sec:context-data}). See also @{ML merge} as defined in
 | 
| 1545 | \<^file>\<open>~~/src/Pure/library.ML\<close>. | |
| 58618 | 1546 | \<close> | 
| 1547 | ||
| 1548 | ||
| 1549 | subsection \<open>Association lists\<close> | |
| 1550 | ||
| 61854 | 1551 | text \<open> | 
| 1552 | The operations for association lists interpret a concrete list of pairs as a | |
| 1553 | finite function from keys to values. Redundant representations with multiple | |
| 1554 | occurrences of the same key are implicitly normalized: lookup and update | |
| 1555 | only take the first occurrence into account. | |
| 58618 | 1556 | \<close> | 
| 1557 | ||
| 1558 | text \<open> | |
| 39875 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1559 |   \begin{mldecls}
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1560 |   @{index_ML AList.lookup: "('a * 'b -> bool) -> ('b * 'c) list -> 'a -> 'c option"} \\
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1561 |   @{index_ML AList.defined: "('a * 'b -> bool) -> ('b * 'c) list -> 'a -> bool"} \\
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1562 |   @{index_ML AList.update: "('a * 'a -> bool) -> 'a * 'b -> ('a * 'b) list -> ('a * 'b) list"} \\
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1563 |   \end{mldecls}
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1564 | |
| 61854 | 1565 |   \<^descr> @{ML AList.lookup}, @{ML AList.defined}, @{ML AList.update} implement the
 | 
| 1566 | main ``framework operations'' for mappings in Isabelle/ML, following | |
| 1567 | standard conventions for their names and types. | |
| 39875 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1568 | |
| 61854 | 1569 | Note that a function called \<^verbatim>\<open>lookup\<close> is obliged to express its partiality | 
| 1570 | via an explicit option element. There is no choice to raise an exception, | |
| 1571 | without changing the name to something like \<open>the_element\<close> or \<open>get\<close>. | |
| 61493 | 1572 | |
| 61854 | 1573 |   The \<open>defined\<close> operation is essentially a contraction of @{ML is_some} and
 | 
| 1574 | \<^verbatim>\<open>lookup\<close>, but this is sufficiently frequent to justify its independent | |
| 1575 | existence. This also gives the implementation some opportunity for peep-hole | |
| 1576 | optimization. | |
| 39875 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1577 | |
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1578 | |
| 57421 | 1579 | Association lists are adequate as simple implementation of finite mappings | 
| 1580 | in many practical situations. A more advanced table structure is defined in | |
| 63680 | 1581 | \<^file>\<open>~~/src/Pure/General/table.ML\<close>; that version scales easily to thousands or | 
| 1582 | millions of elements. | |
| 58618 | 1583 | \<close> | 
| 1584 | ||
| 1585 | ||
| 1586 | subsection \<open>Unsynchronized references\<close> | |
| 1587 | ||
| 1588 | text %mlref \<open> | |
| 39859 | 1589 |   \begin{mldecls}
 | 
| 39870 | 1590 |   @{index_ML_type "'a Unsynchronized.ref"} \\
 | 
| 39859 | 1591 |   @{index_ML Unsynchronized.ref: "'a -> 'a Unsynchronized.ref"} \\
 | 
| 1592 |   @{index_ML "!": "'a Unsynchronized.ref -> 'a"} \\
 | |
| 46262 | 1593 |   @{index_ML_op ":=": "'a Unsynchronized.ref * 'a -> unit"} \\
 | 
| 39859 | 1594 |   \end{mldecls}
 | 
| 58618 | 1595 | \<close> | 
| 1596 | ||
| 61854 | 1597 | text \<open> | 
| 1598 | Due to ubiquitous parallelism in Isabelle/ML (see also | |
| 1599 |   \secref{sec:multi-threading}), the mutable reference cells of Standard ML
 | |
| 1600 | are notorious for causing problems. In a highly parallel system, both | |
| 1601 | correctness \<^emph>\<open>and\<close> performance are easily degraded when using mutable data. | |
| 39859 | 1602 | |
| 61854 | 1603 |   The unwieldy name of @{ML Unsynchronized.ref} for the constructor for
 | 
| 1604 | references in Isabelle/ML emphasizes the inconveniences caused by | |
| 1605 |   mutability. Existing operations @{ML "!"} and @{ML_op ":="} are unchanged,
 | |
| 1606 | but should be used with special precautions, say in a strictly local | |
| 1607 | situation that is guaranteed to be restricted to sequential evaluation --- | |
| 1608 | now and in the future. | |
| 40508 | 1609 | |
| 1610 |   \begin{warn}
 | |
| 1611 |   Never @{ML_text "open Unsynchronized"}, not even in a local scope!
 | |
| 1612 | Pretending that mutable state is no problem is a very bad idea. | |
| 1613 |   \end{warn}
 | |
| 58618 | 1614 | \<close> | 
| 1615 | ||
| 1616 | ||
| 1617 | section \<open>Thread-safe programming \label{sec:multi-threading}\<close>
 | |
| 1618 | ||
| 61854 | 1619 | text \<open> | 
| 1620 | Multi-threaded execution has become an everyday reality in Isabelle since | |
| 1621 | Poly/ML 5.2.1 and Isabelle2008. Isabelle/ML provides implicit and explicit | |
| 1622 | parallelism by default, and there is no way for user-space tools to ``opt | |
| 1623 | out''. ML programs that are purely functional, output messages only via the | |
| 1624 |   official channels (\secref{sec:message-channels}), and do not intercept
 | |
| 1625 |   interrupts (\secref{sec:exceptions}) can participate in the multi-threaded
 | |
| 39868 | 1626 | environment immediately without further ado. | 
| 1627 | ||
| 61854 | 1628 | More ambitious tools with more fine-grained interaction with the environment | 
| 1629 | need to observe the principles explained below. | |
| 58618 | 1630 | \<close> | 
| 1631 | ||
| 1632 | ||
| 1633 | subsection \<open>Multi-threading with shared memory\<close> | |
| 1634 | ||
| 61854 | 1635 | text \<open> | 
| 1636 | Multiple threads help to organize advanced operations of the system, such as | |
| 1637 | real-time conditions on command transactions, sub-components with explicit | |
| 1638 | communication, general asynchronous interaction etc. Moreover, parallel | |
| 1639 | evaluation is a prerequisite to make adequate use of the CPU resources that | |
| 1640 | are available on multi-core systems.\<^footnote>\<open>Multi-core computing does not mean | |
| 1641 | that there are ``spare cycles'' to be wasted. It means that the continued | |
| 1642 | exponential speedup of CPU performance due to ``Moore's Law'' follows | |
| 1643 | different rules: clock frequency has reached its peak around 2005, and | |
| 1644 | applications need to be parallelized in order to avoid a perceived loss of | |
| 1645 |   performance. See also @{cite "Sutter:2005"}.\<close>
 | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1646 | |
| 57421 | 1647 | Isabelle/Isar exploits the inherent structure of theories and proofs to | 
| 61854 | 1648 | support \<^emph>\<open>implicit parallelism\<close> to a large extent. LCF-style theorem proving | 
| 1649 |   provides almost ideal conditions for that, see also @{cite "Wenzel:2009"}.
 | |
| 1650 | This means, significant parts of theory and proof checking is parallelized | |
| 1651 | by default. In Isabelle2013, a maximum speedup-factor of 3.5 on 4 cores and | |
| 1652 |   6.5 on 8 cores can be expected @{cite "Wenzel:2013:ITP"}.
 | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1653 | |
| 61416 | 1654 | \<^medskip> | 
| 61854 | 1655 | ML threads lack the memory protection of separate processes, and operate | 
| 1656 | concurrently on shared heap memory. This has the advantage that results of | |
| 1657 | independent computations are directly available to other threads: abstract | |
| 1658 | values can be passed without copying or awkward serialization that is | |
| 1659 | typically required for separate processes. | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1660 | |
| 61854 | 1661 | To make shared-memory multi-threading work robustly and efficiently, some | 
| 1662 | programming guidelines need to be observed. While the ML system is | |
| 1663 | responsible to maintain basic integrity of the representation of ML values | |
| 1664 | in memory, the application programmer needs to ensure that multi-threaded | |
| 1665 | execution does not break the intended semantics. | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1666 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1667 |   \begin{warn}
 | 
| 61854 | 1668 | To participate in implicit parallelism, tools need to be thread-safe. A | 
| 1669 | single ill-behaved tool can affect the stability and performance of the | |
| 1670 | whole system. | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1671 |   \end{warn}
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1672 | |
| 57421 | 1673 | Apart from observing the principles of thread-safeness passively, advanced | 
| 1674 | tools may also exploit parallelism actively, e.g.\ by using library | |
| 39868 | 1675 |   functions for parallel list operations (\secref{sec:parlist}).
 | 
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1676 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1677 |   \begin{warn}
 | 
| 61854 | 1678 | Parallel computing resources are managed centrally by the Isabelle/ML | 
| 1679 | infrastructure. User programs should not fork their own ML threads to | |
| 1680 | perform heavy computations. | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1681 |   \end{warn}
 | 
| 58618 | 1682 | \<close> | 
| 1683 | ||
| 1684 | ||
| 1685 | subsection \<open>Critical shared resources\<close> | |
| 1686 | ||
| 61854 | 1687 | text \<open> | 
| 1688 | Thread-safeness is mainly concerned about concurrent read/write access to | |
| 1689 | shared resources, which are outside the purely functional world of ML. This | |
| 1690 | covers the following in particular. | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1691 | |
| 61854 | 1692 | \<^item> Global references (or arrays), i.e.\ mutable memory cells that persist | 
| 1693 | over several invocations of associated operations.\<^footnote>\<open>This is independent of | |
| 1694 | the visibility of such mutable values in the toplevel scope.\<close> | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1695 | |
| 61854 | 1696 | \<^item> Global state of the running Isabelle/ML process, i.e.\ raw I/O channels, | 
| 1697 | environment variables, current working directory. | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1698 | |
| 61854 | 1699 | \<^item> Writable resources in the file-system that are shared among different | 
| 1700 | threads or external processes. | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1701 | |
| 61854 | 1702 | Isabelle/ML provides various mechanisms to avoid critical shared resources | 
| 1703 | in most situations. As last resort there are some mechanisms for explicit | |
| 1704 | synchronization. The following guidelines help to make Isabelle/ML programs | |
| 1705 | work smoothly in a concurrent environment. | |
| 1706 | ||
| 1707 | \<^item> Avoid global references altogether. Isabelle/Isar maintains a uniform | |
| 1708 | context that incorporates arbitrary data declared by user programs | |
| 1709 |   (\secref{sec:context-data}). This context is passed as plain value and user
 | |
| 1710 | tools can get/map their own data in a purely functional manner. | |
| 1711 |   Configuration options within the context (\secref{sec:config-options})
 | |
| 1712 | provide simple drop-in replacements for historic reference variables. | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1713 | |
| 61854 | 1714 | \<^item> Keep components with local state information re-entrant. Instead of poking | 
| 1715 | initial values into (private) global references, a new state record can be | |
| 1716 | created on each invocation, and passed through any auxiliary functions of | |
| 1717 | the component. The state record contain mutable references in special | |
| 1718 | situations, without requiring any synchronization, as long as each | |
| 1719 | invocation gets its own copy and the tool itself is single-threaded. | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1720 | |
| 61854 | 1721 | \<^item> Avoid raw output on \<open>stdout\<close> or \<open>stderr\<close>. The Poly/ML library is | 
| 1722 | thread-safe for each individual output operation, but the ordering of | |
| 1723 | parallel invocations is arbitrary. This means raw output will appear on some | |
| 1724 | system console with unpredictable interleaving of atomic chunks. | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1725 | |
| 39868 | 1726 | Note that this does not affect regular message output channels | 
| 61854 | 1727 |   (\secref{sec:message-channels}). An official message id is associated with
 | 
| 1728 | the command transaction from where it originates, independently of other | |
| 1729 | transactions. This means each running Isar command has effectively its own | |
| 1730 | set of message channels, and interleaving can only happen when commands use | |
| 1731 | parallelism internally (and only at message boundaries). | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1732 | |
| 61854 | 1733 | \<^item> Treat environment variables and the current working directory of the | 
| 1734 | running process as read-only. | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1735 | |
| 61854 | 1736 | \<^item> Restrict writing to the file-system to unique temporary files. Isabelle | 
| 1737 | already provides a temporary directory that is unique for the running | |
| 1738 | process, and there is a centralized source of unique serial numbers in | |
| 1739 | Isabelle/ML. Thus temporary files that are passed to to some external | |
| 1740 | process will be always disjoint, and thus thread-safe. | |
| 58618 | 1741 | \<close> | 
| 1742 | ||
| 1743 | text %mlref \<open> | |
| 39868 | 1744 |   \begin{mldecls}
 | 
| 1745 |   @{index_ML File.tmp_path: "Path.T -> Path.T"} \\
 | |
| 1746 |   @{index_ML serial_string: "unit -> string"} \\
 | |
| 1747 |   \end{mldecls}
 | |
| 1748 | ||
| 61854 | 1749 |   \<^descr> @{ML File.tmp_path}~\<open>path\<close> relocates the base component of \<open>path\<close> into the
 | 
| 1750 | unique temporary directory of the running Isabelle/ML process. | |
| 39868 | 1751 | |
| 61854 | 1752 |   \<^descr> @{ML serial_string}~\<open>()\<close> creates a new serial number that is unique over
 | 
| 1753 | the runtime of the Isabelle/ML process. | |
| 58618 | 1754 | \<close> | 
| 1755 | ||
| 61854 | 1756 | text %mlex \<open> | 
| 1757 | The following example shows how to create unique temporary file names. | |
| 58618 | 1758 | \<close> | 
| 1759 | ||
| 59902 | 1760 | ML_val \<open> | 
| 39868 | 1761 |   val tmp1 = File.tmp_path (Path.basic ("foo" ^ serial_string ()));
 | 
| 1762 |   val tmp2 = File.tmp_path (Path.basic ("foo" ^ serial_string ()));
 | |
| 1763 |   @{assert} (tmp1 <> tmp2);
 | |
| 58618 | 1764 | \<close> | 
| 1765 | ||
| 1766 | ||
| 1767 | subsection \<open>Explicit synchronization\<close> | |
| 1768 | ||
| 61854 | 1769 | text \<open> | 
| 1770 | Isabelle/ML provides explicit synchronization for mutable variables over | |
| 59180 
c0fa3b3bdabd
discontinued central critical sections: NAMED_CRITICAL / CRITICAL;
 wenzelm parents: 
59138diff
changeset | 1771 | immutable data, which may be updated atomically and exclusively. This | 
| 
c0fa3b3bdabd
discontinued central critical sections: NAMED_CRITICAL / CRITICAL;
 wenzelm parents: 
59138diff
changeset | 1772 | addresses the rare situations where mutable shared resources are really | 
| 
c0fa3b3bdabd
discontinued central critical sections: NAMED_CRITICAL / CRITICAL;
 wenzelm parents: 
59138diff
changeset | 1773 | required. Synchronization in Isabelle/ML is based on primitives of Poly/ML, | 
| 
c0fa3b3bdabd
discontinued central critical sections: NAMED_CRITICAL / CRITICAL;
 wenzelm parents: 
59138diff
changeset | 1774 | which have been adapted to the specific assumptions of the concurrent | 
| 
c0fa3b3bdabd
discontinued central critical sections: NAMED_CRITICAL / CRITICAL;
 wenzelm parents: 
59138diff
changeset | 1775 | Isabelle environment. User code should not break this abstraction, but stay | 
| 
c0fa3b3bdabd
discontinued central critical sections: NAMED_CRITICAL / CRITICAL;
 wenzelm parents: 
59138diff
changeset | 1776 | within the confines of concurrent Isabelle/ML. | 
| 
c0fa3b3bdabd
discontinued central critical sections: NAMED_CRITICAL / CRITICAL;
 wenzelm parents: 
59138diff
changeset | 1777 | |
| 61854 | 1778 | A \<^emph>\<open>synchronized variable\<close> is an explicit state component associated with | 
| 1779 | mechanisms for locking and signaling. There are operations to await a | |
| 59180 
c0fa3b3bdabd
discontinued central critical sections: NAMED_CRITICAL / CRITICAL;
 wenzelm parents: 
59138diff
changeset | 1780 | condition, change the state, and signal the change to all other waiting | 
| 61477 | 1781 | threads. Synchronized access to the state variable is \<^emph>\<open>not\<close> re-entrant: | 
| 61854 | 1782 | direct or indirect nesting within the same thread will cause a deadlock! | 
| 1783 | \<close> | |
| 58618 | 1784 | |
| 1785 | text %mlref \<open> | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1786 |   \begin{mldecls}
 | 
| 39871 | 1787 |   @{index_ML_type "'a Synchronized.var"} \\
 | 
| 1788 |   @{index_ML Synchronized.var: "string -> 'a -> 'a Synchronized.var"} \\
 | |
| 1789 |   @{index_ML Synchronized.guarded_access: "'a Synchronized.var ->
 | |
| 1790 |   ('a -> ('b * 'a) option) -> 'b"} \\
 | |
| 1791 |   \end{mldecls}
 | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1792 | |
| 61854 | 1793 |     \<^descr> Type @{ML_type "'a Synchronized.var"} represents synchronized variables
 | 
| 1794 |     with state of type @{ML_type 'a}.
 | |
| 39871 | 1795 | |
| 61854 | 1796 |     \<^descr> @{ML Synchronized.var}~\<open>name x\<close> creates a synchronized variable that is
 | 
| 1797 | initialized with value \<open>x\<close>. The \<open>name\<close> is used for tracing. | |
| 61493 | 1798 | |
| 61854 | 1799 |     \<^descr> @{ML Synchronized.guarded_access}~\<open>var f\<close> lets the function \<open>f\<close> operate
 | 
| 1800 | within a critical section on the state \<open>x\<close> as follows: if \<open>f x\<close> produces | |
| 1801 |     @{ML NONE}, it continues to wait on the internal condition variable,
 | |
| 1802 | expecting that some other thread will eventually change the content in a | |
| 1803 |     suitable manner; if \<open>f x\<close> produces @{ML SOME}~\<open>(y, x')\<close> it is satisfied and
 | |
| 1804 | assigns the new state value \<open>x'\<close>, broadcasts a signal to all waiting threads | |
| 1805 | on the associated condition variable, and returns the result \<open>y\<close>. | |
| 39871 | 1806 | |
| 61854 | 1807 |   There are some further variants of the @{ML Synchronized.guarded_access}
 | 
| 63680 | 1808 | combinator, see \<^file>\<open>~~/src/Pure/Concurrent/synchronized.ML\<close> for details. | 
| 58618 | 1809 | \<close> | 
| 1810 | ||
| 61854 | 1811 | text %mlex \<open> | 
| 1812 | The following example implements a counter that produces positive integers | |
| 1813 | that are unique over the runtime of the Isabelle process: | |
| 1814 | \<close> | |
| 58618 | 1815 | |
| 59902 | 1816 | ML_val \<open> | 
| 39871 | 1817 | local | 
| 1818 | val counter = Synchronized.var "counter" 0; | |
| 1819 | in | |
| 1820 | fun next () = | |
| 1821 | Synchronized.guarded_access counter | |
| 1822 | (fn i => | |
| 1823 | let val j = i + 1 | |
| 1824 | in SOME (j, j) end); | |
| 1825 | end; | |
| 59902 | 1826 | |
| 39871 | 1827 | val a = next (); | 
| 1828 | val b = next (); | |
| 1829 |   @{assert} (a <> b);
 | |
| 58618 | 1830 | \<close> | 
| 1831 | ||
| 61416 | 1832 | text \<open> | 
| 1833 | \<^medskip> | |
| 63680 | 1834 | See \<^file>\<open>~~/src/Pure/Concurrent/mailbox.ML\<close> how to implement a mailbox as | 
| 1835 | synchronized variable over a purely functional list. | |
| 61854 | 1836 | \<close> | 
| 58618 | 1837 | |
| 1838 | ||
| 1839 | section \<open>Managed evaluation\<close> | |
| 1840 | ||
| 61854 | 1841 | text \<open> | 
| 1842 | Execution of Standard ML follows the model of strict functional evaluation | |
| 1843 | with optional exceptions. Evaluation happens whenever some function is | |
| 1844 | applied to (sufficiently many) arguments. The result is either an explicit | |
| 1845 | value or an implicit exception. | |
| 52419 | 1846 | |
| 61854 | 1847 | \<^emph>\<open>Managed evaluation\<close> in Isabelle/ML organizes expressions and results to | 
| 1848 | control certain physical side-conditions, to say more specifically when and | |
| 1849 | how evaluation happens. For example, the Isabelle/ML library supports lazy | |
| 1850 | evaluation with memoing, parallel evaluation via futures, asynchronous | |
| 1851 | evaluation via promises, evaluation with time limit etc. | |
| 52419 | 1852 | |
| 61416 | 1853 | \<^medskip> | 
| 61854 | 1854 | An \<^emph>\<open>unevaluated expression\<close> is represented either as unit abstraction \<^verbatim>\<open>fn | 
| 1855 | () => a\<close> of type \<^verbatim>\<open>unit -> 'a\<close> or as regular function \<^verbatim>\<open>fn a => b\<close> of type | |
| 1856 | \<^verbatim>\<open>'a -> 'b\<close>. Both forms occur routinely, and special care is required to | |
| 1857 | tell them apart --- the static type-system of SML is only of limited help | |
| 1858 | here. | |
| 52419 | 1859 | |
| 61854 | 1860 | The first form is more intuitive: some combinator \<open>(unit -> 'a) -> 'a\<close> | 
| 1861 | applies the given function to \<open>()\<close> to initiate the postponed evaluation | |
| 1862 |   process. The second form is more flexible: some combinator \<open>('a -> 'b) -> 'a
 | |
| 1863 | -> 'b\<close> acts like a modified form of function application; several such | |
| 1864 | combinators may be cascaded to modify a given function, before it is | |
| 1865 | ultimately applied to some argument. | |
| 52419 | 1866 | |
| 61416 | 1867 | \<^medskip> | 
| 61854 | 1868 | \<^emph>\<open>Reified results\<close> make the disjoint sum of regular values versions | 
| 1869 | exceptional situations explicit as ML datatype: \<open>'a result = Res of 'a | Exn | |
| 1870 | of exn\<close>. This is typically used for administrative purposes, to store the | |
| 1871 | overall outcome of an evaluation process. | |
| 52419 | 1872 | |
| 61854 | 1873 | \<^emph>\<open>Parallel exceptions\<close> aggregate reified results, such that multiple | 
| 1874 | exceptions are digested as a collection in canonical form that identifies | |
| 1875 | exceptions according to their original occurrence. This is particular | |
| 1876 |   important for parallel evaluation via futures \secref{sec:futures}, which
 | |
| 1877 | are organized as acyclic graph of evaluations that depend on other | |
| 1878 | evaluations: exceptions stemming from shared sub-graphs are exposed exactly | |
| 1879 | once and in the order of their original occurrence (e.g.\ when printed at | |
| 1880 | the toplevel). Interrupt counts as neutral element here: it is treated as | |
| 1881 | minimal information about some canceled evaluation process, and is absorbed | |
| 1882 | by the presence of regular program exceptions. | |
| 1883 | \<close> | |
| 58618 | 1884 | |
| 1885 | text %mlref \<open> | |
| 52419 | 1886 |   \begin{mldecls}
 | 
| 1887 |   @{index_ML_type "'a Exn.result"} \\
 | |
| 1888 |   @{index_ML Exn.capture: "('a -> 'b) -> 'a -> 'b Exn.result"} \\
 | |
| 1889 |   @{index_ML Exn.interruptible_capture: "('a -> 'b) -> 'a -> 'b Exn.result"} \\
 | |
| 1890 |   @{index_ML Exn.release: "'a Exn.result -> 'a"} \\
 | |
| 1891 |   @{index_ML Par_Exn.release_all: "'a Exn.result list -> 'a list"} \\
 | |
| 1892 |   @{index_ML Par_Exn.release_first: "'a Exn.result list -> 'a list"} \\
 | |
| 1893 |   \end{mldecls}
 | |
| 1894 | ||
| 61854 | 1895 |   \<^descr> Type @{ML_type "'a Exn.result"} represents the disjoint sum of ML results
 | 
| 1896 |   explicitly, with constructor @{ML Exn.Res} for regular values and @{ML
 | |
| 1897 | "Exn.Exn"} for exceptions. | |
| 52419 | 1898 | |
| 61854 | 1899 |   \<^descr> @{ML Exn.capture}~\<open>f x\<close> manages the evaluation of \<open>f x\<close> such that
 | 
| 1900 |   exceptions are made explicit as @{ML "Exn.Exn"}. Note that this includes
 | |
| 1901 |   physical interrupts (see also \secref{sec:exceptions}), so the same
 | |
| 1902 | precautions apply to user code: interrupts must not be absorbed | |
| 1903 | accidentally! | |
| 52419 | 1904 | |
| 61854 | 1905 |   \<^descr> @{ML Exn.interruptible_capture} is similar to @{ML Exn.capture}, but
 | 
| 1906 | interrupts are immediately re-raised as required for user code. | |
| 52419 | 1907 | |
| 61854 | 1908 |   \<^descr> @{ML Exn.release}~\<open>result\<close> releases the original runtime result, exposing
 | 
| 1909 | its regular value or raising the reified exception. | |
| 52419 | 1910 | |
| 61854 | 1911 |   \<^descr> @{ML Par_Exn.release_all}~\<open>results\<close> combines results that were produced
 | 
| 1912 | independently (e.g.\ by parallel evaluation). If all results are regular | |
| 1913 | values, that list is returned. Otherwise, the collection of all exceptions | |
| 1914 | is raised, wrapped-up as collective parallel exception. Note that the latter | |
| 1915 | prevents access to individual exceptions by conventional \<^verbatim>\<open>handle\<close> of ML. | |
| 52419 | 1916 | |
| 61854 | 1917 |   \<^descr> @{ML Par_Exn.release_first} is similar to @{ML Par_Exn.release_all}, but
 | 
| 1918 | only the first (meaningful) exception that has occurred in the original | |
| 1919 | evaluation process is raised again, the others are ignored. That single | |
| 1920 | exception may get handled by conventional means in ML. | |
| 58618 | 1921 | \<close> | 
| 1922 | ||
| 1923 | ||
| 1924 | subsection \<open>Parallel skeletons \label{sec:parlist}\<close>
 | |
| 1925 | ||
| 1926 | text \<open> | |
| 61854 | 1927 | Algorithmic skeletons are combinators that operate on lists in parallel, in | 
| 1928 | the manner of well-known \<open>map\<close>, \<open>exists\<close>, \<open>forall\<close> etc. Management of | |
| 1929 |   futures (\secref{sec:futures}) and their results as reified exceptions is
 | |
| 1930 | wrapped up into simple programming interfaces that resemble the sequential | |
| 1931 | versions. | |
| 52420 | 1932 | |
| 61854 | 1933 | What remains is the application-specific problem to present expressions with | 
| 1934 | suitable \<^emph>\<open>granularity\<close>: each list element corresponds to one evaluation | |
| 1935 | task. If the granularity is too coarse, the available CPUs are not | |
| 1936 | saturated. If it is too fine-grained, CPU cycles are wasted due to the | |
| 1937 | overhead of organizing parallel processing. In the worst case, parallel | |
| 52420 | 1938 | performance will be less than the sequential counterpart! | 
| 58618 | 1939 | \<close> | 
| 1940 | ||
| 1941 | text %mlref \<open> | |
| 52420 | 1942 |   \begin{mldecls}
 | 
| 1943 |   @{index_ML Par_List.map: "('a -> 'b) -> 'a list -> 'b list"} \\
 | |
| 1944 |   @{index_ML Par_List.get_some: "('a -> 'b option) -> 'a list -> 'b option"} \\
 | |
| 1945 |   \end{mldecls}
 | |
| 1946 | ||
| 61854 | 1947 |   \<^descr> @{ML Par_List.map}~\<open>f [x\<^sub>1, \<dots>, x\<^sub>n]\<close> is like @{ML "map"}~\<open>f [x\<^sub>1, \<dots>,
 | 
| 1948 | x\<^sub>n]\<close>, but the evaluation of \<open>f x\<^sub>i\<close> for \<open>i = 1, \<dots>, n\<close> is performed in | |
| 1949 | parallel. | |
| 61493 | 1950 | |
| 61854 | 1951 | An exception in any \<open>f x\<^sub>i\<close> cancels the overall evaluation process. The | 
| 1952 |   final result is produced via @{ML Par_Exn.release_first} as explained above,
 | |
| 1953 | which means the first program exception that happened to occur in the | |
| 1954 | parallel evaluation is propagated, and all other failures are ignored. | |
| 52420 | 1955 | |
| 61854 | 1956 |   \<^descr> @{ML Par_List.get_some}~\<open>f [x\<^sub>1, \<dots>, x\<^sub>n]\<close> produces some \<open>f x\<^sub>i\<close> that is of
 | 
| 1957 | the form \<open>SOME y\<^sub>i\<close>, if that exists, otherwise \<open>NONE\<close>. Thus it is similar to | |
| 1958 |   @{ML Library.get_first}, but subject to a non-deterministic parallel choice
 | |
| 1959 | process. The first successful result cancels the overall evaluation process; | |
| 1960 |   other exceptions are propagated as for @{ML Par_List.map}.
 | |
| 52420 | 1961 | |
| 61854 | 1962 | This generic parallel choice combinator is the basis for derived forms, such | 
| 1963 |   as @{ML Par_List.find_some}, @{ML Par_List.exists}, @{ML Par_List.forall}.
 | |
| 58618 | 1964 | \<close> | 
| 1965 | ||
| 61854 | 1966 | text %mlex \<open> | 
| 1967 | Subsequently, the Ackermann function is evaluated in parallel for some | |
| 1968 | ranges of arguments. | |
| 1969 | \<close> | |
| 58618 | 1970 | |
| 1971 | ML_val \<open> | |
| 52420 | 1972 | fun ackermann 0 n = n + 1 | 
| 1973 | | ackermann m 0 = ackermann (m - 1) 1 | |
| 1974 | | ackermann m n = ackermann (m - 1) (ackermann m (n - 1)); | |
| 1975 | ||
| 1976 | Par_List.map (ackermann 2) (500 upto 1000); | |
| 1977 | Par_List.map (ackermann 3) (5 upto 10); | |
| 58618 | 1978 | \<close> | 
| 1979 | ||
| 1980 | ||
| 1981 | subsection \<open>Lazy evaluation\<close> | |
| 1982 | ||
| 1983 | text \<open> | |
| 61854 | 1984 | Classic lazy evaluation works via the \<open>lazy\<close>~/ \<open>force\<close> pair of operations: | 
| 1985 | \<open>lazy\<close> to wrap an unevaluated expression, and \<open>force\<close> to evaluate it once | |
| 1986 | and store its result persistently. Later invocations of \<open>force\<close> retrieve the | |
| 1987 | stored result without another evaluation. Isabelle/ML refines this idea to | |
| 1988 | accommodate the aspects of multi-threading, synchronous program exceptions | |
| 1989 | and asynchronous interrupts. | |
| 57347 | 1990 | |
| 61854 | 1991 | The first thread that invokes \<open>force\<close> on an unfinished lazy value changes | 
| 1992 | its state into a \<^emph>\<open>promise\<close> of the eventual result and starts evaluating it. | |
| 1993 | Any other threads that \<open>force\<close> the same lazy value in the meantime need to | |
| 1994 | wait for it to finish, by producing a regular result or program exception. | |
| 1995 | If the evaluation attempt is interrupted, this event is propagated to all | |
| 1996 | waiting threads and the lazy value is reset to its original state. | |
| 57347 | 1997 | |
| 1998 | This means a lazy value is completely evaluated at most once, in a | |
| 1999 | thread-safe manner. There might be multiple interrupted evaluation attempts, | |
| 2000 | and multiple receivers of intermediate interrupt events. Interrupts are | |
| 61477 | 2001 | \<^emph>\<open>not\<close> made persistent: later evaluation attempts start again from the | 
| 57347 | 2002 | original expression. | 
| 58618 | 2003 | \<close> | 
| 2004 | ||
| 2005 | text %mlref \<open> | |
| 57347 | 2006 |   \begin{mldecls}
 | 
| 2007 |   @{index_ML_type "'a lazy"} \\
 | |
| 2008 |   @{index_ML Lazy.lazy: "(unit -> 'a) -> 'a lazy"} \\
 | |
| 2009 |   @{index_ML Lazy.value: "'a -> 'a lazy"} \\
 | |
| 2010 |   @{index_ML Lazy.force: "'a lazy -> 'a"} \\
 | |
| 2011 |   \end{mldecls}
 | |
| 2012 | ||
| 61503 | 2013 |   \<^descr> Type @{ML_type "'a lazy"} represents lazy values over type \<^verbatim>\<open>'a\<close>.
 | 
| 57347 | 2014 | |
| 61854 | 2015 |   \<^descr> @{ML Lazy.lazy}~\<open>(fn () => e)\<close> wraps the unevaluated expression \<open>e\<close> as
 | 
| 2016 | unfinished lazy value. | |
| 61493 | 2017 | |
| 61854 | 2018 |   \<^descr> @{ML Lazy.value}~\<open>a\<close> wraps the value \<open>a\<close> as finished lazy value. When
 | 
| 2019 | forced, it returns \<open>a\<close> without any further evaluation. | |
| 57347 | 2020 | |
| 57349 | 2021 | There is very low overhead for this proforma wrapping of strict values as | 
| 2022 | lazy values. | |
| 57347 | 2023 | |
| 61493 | 2024 |   \<^descr> @{ML Lazy.force}~\<open>x\<close> produces the result of the lazy value in a
 | 
| 57347 | 2025 | thread-safe manner as explained above. Thus it may cause the current thread | 
| 2026 | to wait on a pending evaluation attempt by another thread. | |
| 58618 | 2027 | \<close> | 
| 2028 | ||
| 2029 | ||
| 2030 | subsection \<open>Futures \label{sec:futures}\<close>
 | |
| 2031 | ||
| 2032 | text \<open> | |
| 57349 | 2033 | Futures help to organize parallel execution in a value-oriented manner, with | 
| 61854 | 2034 | \<open>fork\<close>~/ \<open>join\<close> as the main pair of operations, and some further variants; | 
| 2035 |   see also @{cite "Wenzel:2009" and "Wenzel:2013:ITP"}. Unlike lazy values,
 | |
| 2036 | futures are evaluated strictly and spontaneously on separate worker threads. | |
| 2037 | Futures may be canceled, which leads to interrupts on running evaluation | |
| 2038 | attempts, and forces structurally related futures to fail for all time; | |
| 2039 | already finished futures remain unchanged. Exceptions between related | |
| 57350 | 2040 | futures are propagated as well, and turned into parallel exceptions (see | 
| 2041 | above). | |
| 57349 | 2042 | |
| 2043 | Technically, a future is a single-assignment variable together with a | |
| 61854 | 2044 | \<^emph>\<open>task\<close> that serves administrative purposes, notably within the \<^emph>\<open>task | 
| 2045 | queue\<close> where new futures are registered for eventual evaluation and the | |
| 2046 | worker threads retrieve their work. | |
| 57349 | 2047 | |
| 57350 | 2048 | The pool of worker threads is limited, in correlation with the number of | 
| 2049 | physical cores on the machine. Note that allocation of runtime resources may | |
| 2050 | be distorted either if workers yield CPU time (e.g.\ via system sleep or | |
| 2051 | wait operations), or if non-worker threads contend for significant runtime | |
| 2052 | resources independently. There is a limited number of replacement worker | |
| 2053 | threads that get activated in certain explicit wait conditions, after a | |
| 2054 | timeout. | |
| 2055 | ||
| 61416 | 2056 | \<^medskip> | 
| 61854 | 2057 | Each future task belongs to some \<^emph>\<open>task group\<close>, which represents the | 
| 2058 | hierarchic structure of related tasks, together with the exception status a | |
| 2059 | that point. By default, the task group of a newly created future is a new | |
| 2060 | sub-group of the presently running one, but it is also possible to indicate | |
| 2061 | different group layouts under program control. | |
| 57349 | 2062 | |
| 2063 | Cancellation of futures actually refers to the corresponding task group and | |
| 2064 | all its sub-groups. Thus interrupts are propagated down the group hierarchy. | |
| 2065 | Regular program exceptions are treated likewise: failure of the evaluation | |
| 2066 | of some future task affects its own group and all sub-groups. Given a | |
| 61854 | 2067 | particular task group, its \<^emph>\<open>group status\<close> cumulates all relevant exceptions | 
| 2068 | according to its position within the group hierarchy. Interrupted tasks that | |
| 2069 | lack regular result information, will pick up parallel exceptions from the | |
| 2070 | cumulative group status. | |
| 57349 | 2071 | |
| 61416 | 2072 | \<^medskip> | 
| 61854 | 2073 | A \<^emph>\<open>passive future\<close> or \<^emph>\<open>promise\<close> is a future with slightly different | 
| 2074 | evaluation policies: there is only a single-assignment variable and some | |
| 2075 | expression to evaluate for the \<^emph>\<open>failed\<close> case (e.g.\ to clean up resources | |
| 2076 | when canceled). A regular result is produced by external means, using a | |
| 2077 | separate \<^emph>\<open>fulfill\<close> operation. | |
| 57349 | 2078 | |
| 2079 | Promises are managed in the same task queue, so regular futures may depend | |
| 2080 | on them. This allows a form of reactive programming, where some promises are | |
| 2081 | used as minimal elements (or guards) within the future dependency graph: | |
| 2082 | when these promises are fulfilled the evaluation of subsequent futures | |
| 2083 | starts spontaneously, according to their own inter-dependencies. | |
| 58618 | 2084 | \<close> | 
| 2085 | ||
| 2086 | text %mlref \<open> | |
| 57348 | 2087 |   \begin{mldecls}
 | 
| 2088 |   @{index_ML_type "'a future"} \\
 | |
| 2089 |   @{index_ML Future.fork: "(unit -> 'a) -> 'a future"} \\
 | |
| 2090 |   @{index_ML Future.forks: "Future.params -> (unit -> 'a) list -> 'a future list"} \\
 | |
| 57349 | 2091 |   @{index_ML Future.join: "'a future -> 'a"} \\
 | 
| 2092 |   @{index_ML Future.joins: "'a future list -> 'a list"} \\
 | |
| 57348 | 2093 |   @{index_ML Future.value: "'a -> 'a future"} \\
 | 
| 2094 |   @{index_ML Future.map: "('a -> 'b) -> 'a future -> 'b future"} \\
 | |
| 2095 |   @{index_ML Future.cancel: "'a future -> unit"} \\
 | |
| 2096 |   @{index_ML Future.cancel_group: "Future.group -> unit"} \\[0.5ex]
 | |
| 2097 |   @{index_ML Future.promise: "(unit -> unit) -> 'a future"} \\
 | |
| 2098 |   @{index_ML Future.fulfill: "'a future -> 'a -> unit"} \\
 | |
| 2099 |   \end{mldecls}
 | |
| 2100 | ||
| 61503 | 2101 |   \<^descr> Type @{ML_type "'a future"} represents future values over type \<^verbatim>\<open>'a\<close>.
 | 
| 57348 | 2102 | |
| 61854 | 2103 |   \<^descr> @{ML Future.fork}~\<open>(fn () => e)\<close> registers the unevaluated expression \<open>e\<close>
 | 
| 2104 | as unfinished future value, to be evaluated eventually on the parallel | |
| 2105 |   worker-thread farm. This is a shorthand for @{ML Future.forks} below, with
 | |
| 2106 | default parameters and a single expression. | |
| 57348 | 2107 | |
| 61854 | 2108 |   \<^descr> @{ML Future.forks}~\<open>params exprs\<close> is the general interface to fork several
 | 
| 2109 | futures simultaneously. The \<open>params\<close> consist of the following fields: | |
| 57348 | 2110 | |
| 61854 | 2111 |     \<^item> \<open>name : string\<close> (default @{ML "\"\""}) specifies a common name for the
 | 
| 2112 | tasks of the forked futures, which serves diagnostic purposes. | |
| 61458 | 2113 | |
| 61854 | 2114 |     \<^item> \<open>group : Future.group option\<close> (default @{ML NONE}) specifies an optional
 | 
| 2115 |     task group for the forked futures. @{ML NONE} means that a new sub-group
 | |
| 2116 | of the current worker-thread task context is created. If this is not a | |
| 2117 | worker thread, the group will be a new root in the group hierarchy. | |
| 61458 | 2118 | |
| 61854 | 2119 |     \<^item> \<open>deps : Future.task list\<close> (default @{ML "[]"}) specifies dependencies on
 | 
| 2120 | other future tasks, i.e.\ the adjacency relation in the global task queue. | |
| 2121 | Dependencies on already finished tasks are ignored. | |
| 61458 | 2122 | |
| 61854 | 2123 |     \<^item> \<open>pri : int\<close> (default @{ML 0}) specifies a priority within the task
 | 
| 2124 | queue. | |
| 61458 | 2125 | |
| 61854 | 2126 |     Typically there is only little deviation from the default priority @{ML
 | 
| 2127 |     0}. As a rule of thumb, @{ML "~1"} means ``low priority" and @{ML 1} means
 | |
| 61458 | 2128 | ``high priority''. | 
| 2129 | ||
| 61854 | 2130 | Note that the task priority only affects the position in the queue, not | 
| 2131 | the thread priority. When a worker thread picks up a task for processing, | |
| 2132 | it runs with the normal thread priority to the end (or until canceled). | |
| 2133 | Higher priority tasks that are queued later need to wait until this (or | |
| 2134 | another) worker thread becomes free again. | |
| 61458 | 2135 | |
| 61854 | 2136 |     \<^item> \<open>interrupts : bool\<close> (default @{ML true}) tells whether the worker thread
 | 
| 2137 | that processes the corresponding task is initially put into interruptible | |
| 2138 | state. This state may change again while running, by modifying the thread | |
| 2139 | attributes. | |
| 61458 | 2140 | |
| 61854 | 2141 | With interrupts disabled, a running future task cannot be canceled. It is | 
| 61458 | 2142 | the responsibility of the programmer that this special state is retained | 
| 2143 | only briefly. | |
| 57348 | 2144 | |
| 61854 | 2145 |   \<^descr> @{ML Future.join}~\<open>x\<close> retrieves the value of an already finished future,
 | 
| 2146 | which may lead to an exception, according to the result of its previous | |
| 2147 | evaluation. | |
| 57348 | 2148 | |
| 2149 | For an unfinished future there are several cases depending on the role of | |
| 2150 | the current thread and the status of the future. A non-worker thread waits | |
| 2151 | passively until the future is eventually evaluated. A worker thread | |
| 2152 | temporarily changes its task context and takes over the responsibility to | |
| 57349 | 2153 | evaluate the future expression on the spot. The latter is done in a | 
| 2154 | thread-safe manner: other threads that intend to join the same future need | |
| 2155 | to wait until the ongoing evaluation is finished. | |
| 2156 | ||
| 2157 | Note that excessive use of dynamic dependencies of futures by adhoc joining | |
| 2158 | may lead to bad utilization of CPU cores, due to threads waiting on other | |
| 2159 | threads to finish required futures. The future task farm has a limited | |
| 2160 | amount of replacement threads that continue working on unrelated tasks after | |
| 2161 | some timeout. | |
| 57348 | 2162 | |
| 2163 | Whenever possible, static dependencies of futures should be specified | |
| 61854 | 2164 | explicitly when forked (see \<open>deps\<close> above). Thus the evaluation can work from | 
| 2165 | the bottom up, without join conflicts and wait states. | |
| 57349 | 2166 | |
| 61854 | 2167 |   \<^descr> @{ML Future.joins}~\<open>xs\<close> joins the given list of futures simultaneously,
 | 
| 2168 |   which is more efficient than @{ML "map Future.join"}~\<open>xs\<close>.
 | |
| 57349 | 2169 | |
| 2170 | Based on the dependency graph of tasks, the current thread takes over the | |
| 2171 | responsibility to evaluate future expressions that are required for the main | |
| 2172 | result, working from the bottom up. Waiting on future results that are | |
| 2173 | presently evaluated on other threads only happens as last resort, when no | |
| 2174 | other unfinished futures are left over. | |
| 2175 | ||
| 61854 | 2176 |   \<^descr> @{ML Future.value}~\<open>a\<close> wraps the value \<open>a\<close> as finished future value,
 | 
| 2177 | bypassing the worker-thread farm. When joined, it returns \<open>a\<close> without any | |
| 2178 | further evaluation. | |
| 57349 | 2179 | |
| 2180 | There is very low overhead for this proforma wrapping of strict values as | |
| 57421 | 2181 | futures. | 
| 57348 | 2182 | |
| 61493 | 2183 |   \<^descr> @{ML Future.map}~\<open>f x\<close> is a fast-path implementation of @{ML
 | 
| 61854 | 2184 |   Future.fork}~\<open>(fn () => f (\<close>@{ML Future.join}~\<open>x))\<close>, which avoids the full
 | 
| 2185 | overhead of the task queue and worker-thread farm as far as possible. The | |
| 2186 | function \<open>f\<close> is supposed to be some trivial post-processing or projection of | |
| 2187 | the future result. | |
| 57348 | 2188 | |
| 61854 | 2189 |   \<^descr> @{ML Future.cancel}~\<open>x\<close> cancels the task group of the given future, using
 | 
| 2190 |   @{ML Future.cancel_group} below.
 | |
| 57348 | 2191 | |
| 61854 | 2192 |   \<^descr> @{ML Future.cancel_group}~\<open>group\<close> cancels all tasks of the given task
 | 
| 2193 | group for all time. Threads that are presently processing a task of the | |
| 2194 | given group are interrupted: it may take some time until they are actually | |
| 2195 | terminated. Tasks that are queued but not yet processed are dequeued and | |
| 2196 | forced into interrupted state. Since the task group is itself invalidated, | |
| 2197 | any further attempt to fork a future that belongs to it will yield a | |
| 2198 | canceled result as well. | |
| 57348 | 2199 | |
| 61854 | 2200 |   \<^descr> @{ML Future.promise}~\<open>abort\<close> registers a passive future with the given
 | 
| 2201 | \<open>abort\<close> operation: it is invoked when the future task group is canceled. | |
| 57348 | 2202 | |
| 61854 | 2203 |   \<^descr> @{ML Future.fulfill}~\<open>x a\<close> finishes the passive future \<open>x\<close> by the given
 | 
| 2204 | value \<open>a\<close>. If the promise has already been canceled, the attempt to fulfill | |
| 2205 | it causes an exception. | |
| 58618 | 2206 | \<close> | 
| 57348 | 2207 | |
| 47180 | 2208 | end |