| author | desharna | 
| Fri, 12 Sep 2014 13:50:51 +0200 | |
| changeset 58327 | a147bd03cad0 | 
| parent 57834 | 0d295e339f52 | 
| child 58555 | 7975676c08c0 | 
| permissions | -rw-r--r-- | 
| 57347 | 1 | (*:wrap=hard:maxLineLen=78:*) | 
| 2 | ||
| 29755 | 3 | theory "ML" | 
| 4 | imports Base | |
| 5 | begin | |
| 18538 | 6 | |
| 39822 | 7 | chapter {* Isabelle/ML *}
 | 
| 20489 | 8 | |
| 39823 | 9 | text {* Isabelle/ML is best understood as a certain culture based on
 | 
| 10 | Standard ML. Thus it is not a new programming language, but a | |
| 11 | certain way to use SML at an advanced level within the Isabelle | |
| 12 | environment. This covers a variety of aspects that are geared | |
| 13 | towards an efficient and robust platform for applications of formal | |
| 14 | logic with fully foundational proof construction --- according to | |
| 40126 | 15 |   the well-known \emph{LCF principle}.  There is specific
 | 
| 16 | infrastructure with library modules to address the needs of this | |
| 17 | difficult task. For example, the raw parallel programming model of | |
| 18 | Poly/ML is presented as considerably more abstract concept of | |
| 57421 | 19 |   \emph{futures}, which is then used to augment the inference
 | 
| 20 | kernel, Isar theory and proof interpreter, and PIDE document management. | |
| 39823 | 21 | |
| 22 | The main aspects of Isabelle/ML are introduced below. These | |
| 23 | first-hand explanations should help to understand how proper | |
| 24 | Isabelle/ML is to be read and written, and to get access to the | |
| 25 | wealth of experience that is expressed in the source text and its | |
| 26 |   history of changes.\footnote{See
 | |
| 54703 | 27 |   @{url "http://isabelle.in.tum.de/repos/isabelle"} for the full
 | 
| 39823 | 28 | Mercurial history. There are symbolic tags to refer to official | 
| 29 |   Isabelle releases, as opposed to arbitrary \emph{tip} versions that
 | |
| 30 | merely reflect snapshots that are never really up-to-date.} *} | |
| 31 | ||
| 32 | ||
| 39878 | 33 | section {* Style and orthography *}
 | 
| 34 | ||
| 39879 | 35 | text {* The sources of Isabelle/Isar are optimized for
 | 
| 36 |   \emph{readability} and \emph{maintainability}.  The main purpose is
 | |
| 37 | to tell an informed reader what is really going on and how things | |
| 38 | really work. This is a non-trivial aim, but it is supported by a | |
| 39 | certain style of writing Isabelle/ML that has emerged from long | |
| 40 |   years of system development.\footnote{See also the interesting style
 | |
| 41 | guide for OCaml | |
| 54703 | 42 |   @{url "http://caml.inria.fr/resources/doc/guides/guidelines.en.html"}
 | 
| 39878 | 43 | which shares many of our means and ends.} | 
| 44 | ||
| 45 |   The main principle behind any coding style is \emph{consistency}.
 | |
| 39879 | 46 | For a single author of a small program this merely means ``choose | 
| 39878 | 47 | your style and stick to it''. A complex project like Isabelle, with | 
| 39879 | 48 | long years of development and different contributors, requires more | 
| 49 | standardization. A coding style that is changed every few years or | |
| 50 | with every new contributor is no style at all, because consistency | |
| 51 | is quickly lost. Global consistency is hard to achieve, though. | |
| 40126 | 52 | Nonetheless, one should always strive at least for local consistency | 
| 53 | of modules and sub-systems, without deviating from some general | |
| 54 | principles how to write Isabelle/ML. | |
| 39878 | 55 | |
| 40126 | 56 |   In a sense, good coding style is like an \emph{orthography} for the
 | 
| 57 | sources: it helps to read quickly over the text and see through the | |
| 58 | main points, without getting distracted by accidental presentation | |
| 59 | of free-style code. | |
| 39878 | 60 | *} | 
| 61 | ||
| 62 | ||
| 63 | subsection {* Header and sectioning *}
 | |
| 64 | ||
| 39879 | 65 | text {* Isabelle source files have a certain standardized header
 | 
| 66 | format (with precise spacing) that follows ancient traditions | |
| 67 | reaching back to the earliest versions of the system by Larry | |
| 40800 
330eb65c9469
Parse.liberal_name for document antiquotations and attributes;
 wenzelm parents: 
40508diff
changeset | 68 |   Paulson.  See @{file "~~/src/Pure/thm.ML"}, for example.
 | 
| 39878 | 69 | |
| 70 |   The header includes at least @{verbatim Title} and @{verbatim
 | |
| 71 | Author} entries, followed by a prose description of the purpose of | |
| 72 | the module. The latter can range from a single line to several | |
| 73 | paragraphs of explanations. | |
| 74 | ||
| 75 | The rest of the file is divided into sections, subsections, | |
| 40126 | 76 | subsubsections, paragraphs etc.\ using a simple layout via ML | 
| 77 | comments as follows. | |
| 39878 | 78 | |
| 79 | \begin{verbatim}
 | |
| 80 | (*** section ***) | |
| 81 | ||
| 82 | (** subsection **) | |
| 83 | ||
| 84 | (* subsubsection *) | |
| 85 | ||
| 86 | (*short paragraph*) | |
| 87 | ||
| 88 | (* | |
| 40126 | 89 | long paragraph, | 
| 90 | with more text | |
| 39878 | 91 | *) | 
| 92 | \end{verbatim}
 | |
| 93 | ||
| 94 |   As in regular typography, there is some extra space \emph{before}
 | |
| 57421 | 95 | section headings that are adjacent to plain text, bit not other headings | 
| 96 | as in the example above. | |
| 39878 | 97 | |
| 98 | \medskip The precise wording of the prose text given in these | |
| 40126 | 99 | headings is chosen carefully to introduce the main theme of the | 
| 39879 | 100 | subsequent formal ML text. | 
| 101 | *} | |
| 102 | ||
| 103 | ||
| 39880 | 104 | subsection {* Naming conventions *}
 | 
| 39879 | 105 | |
| 106 | text {* Since ML is the primary medium to express the meaning of the
 | |
| 107 | source text, naming of ML entities requires special care. | |
| 108 | ||
| 39881 | 109 |   \paragraph{Notation.}  A name consists of 1--3 \emph{words} (rarely
 | 
| 110 | 4, but not more) that are separated by underscore. There are three | |
| 40126 | 111 | variants concerning upper or lower case letters, which are used for | 
| 39881 | 112 | certain ML categories as follows: | 
| 39880 | 113 | |
| 114 | \medskip | |
| 115 |   \begin{tabular}{lll}
 | |
| 116 | variant & example & ML categories \\\hline | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 117 |   lower-case & @{ML_text foo_bar} & values, types, record fields \\
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 118 |   capitalized & @{ML_text Foo_Bar} & datatype constructors, structures, functors \\
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 119 |   upper-case & @{ML_text FOO_BAR} & special values, exception constructors, signatures \\
 | 
| 39880 | 120 |   \end{tabular}
 | 
| 121 | \medskip | |
| 122 | ||
| 123 | For historical reasons, many capitalized names omit underscores, | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 124 |   e.g.\ old-style @{ML_text FooBar} instead of @{ML_text Foo_Bar}.
 | 
| 47180 | 125 |   Genuine mixed-case names are \emph{not} used, because clear division
 | 
| 40126 | 126 |   of words is essential for readability.\footnote{Camel-case was
 | 
| 127 | invented to workaround the lack of underscore in some early | |
| 128 | non-ASCII character sets. Later it became habitual in some language | |
| 129 | communities that are now strong in numbers.} | |
| 39880 | 130 | |
| 39881 | 131 | A single (capital) character does not count as ``word'' in this | 
| 40126 | 132 | respect: some Isabelle/ML names are suffixed by extra markers like | 
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 133 |   this: @{ML_text foo_barT}.
 | 
| 39881 | 134 | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 135 |   Name variants are produced by adding 1--3 primes, e.g.\ @{ML_text
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 136 |   foo'}, @{ML_text foo''}, or @{ML_text foo'''}, but not @{ML_text
 | 
| 39881 | 137 | foo''''} or more. Decimal digits scale better to larger numbers, | 
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 138 |   e.g.\ @{ML_text foo0}, @{ML_text foo1}, @{ML_text foo42}.
 | 
| 39880 | 139 | |
| 140 |   \paragraph{Scopes.}  Apart from very basic library modules, ML
 | |
| 141 | structures are not ``opened'', but names are referenced with | |
| 39881 | 142 |   explicit qualification, as in @{ML Syntax.string_of_term} for
 | 
| 39880 | 143 | example. When devising names for structures and their components it | 
| 57421 | 144 | is important to aim at eye-catching compositions of both parts, because | 
| 40126 | 145 | this is how they are seen in the sources and documentation. For the | 
| 146 | same reasons, aliases of well-known library functions should be | |
| 147 | avoided. | |
| 39880 | 148 | |
| 40126 | 149 | Local names of function abstraction or case/let bindings are | 
| 39880 | 150 | typically shorter, sometimes using only rudiments of ``words'', | 
| 151 | while still avoiding cryptic shorthands. An auxiliary function | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 152 |   called @{ML_text helper}, @{ML_text aux}, or @{ML_text f} is
 | 
| 39880 | 153 | considered bad style. | 
| 154 | ||
| 155 | Example: | |
| 156 | ||
| 157 |   \begin{verbatim}
 | |
| 158 | (* RIGHT *) | |
| 159 | ||
| 160 | fun print_foo ctxt foo = | |
| 161 | let | |
| 39881 | 162 | fun print t = ... Syntax.string_of_term ctxt t ... | 
| 163 | in ... end; | |
| 164 | ||
| 165 | ||
| 166 | (* RIGHT *) | |
| 167 | ||
| 168 | fun print_foo ctxt foo = | |
| 169 | let | |
| 39880 | 170 | val string_of_term = Syntax.string_of_term ctxt; | 
| 171 | fun print t = ... string_of_term t ... | |
| 172 | in ... end; | |
| 173 | ||
| 174 | ||
| 175 | (* WRONG *) | |
| 176 | ||
| 177 | val string_of_term = Syntax.string_of_term; | |
| 178 | ||
| 179 | fun print_foo ctxt foo = | |
| 180 | let | |
| 181 | fun aux t = ... string_of_term ctxt t ... | |
| 182 | in ... end; | |
| 183 | ||
| 184 |   \end{verbatim}
 | |
| 185 | ||
| 186 | ||
| 40126 | 187 |   \paragraph{Specific conventions.} Here are some specific name forms
 | 
| 188 | that occur frequently in the sources. | |
| 39881 | 189 | |
| 190 |   \begin{itemize}
 | |
| 191 | ||
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 192 |   \item A function that maps @{ML_text foo} to @{ML_text bar} is
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 193 |   called @{ML_text foo_to_bar} or @{ML_text bar_of_foo} (never
 | 
| 57421 | 194 |   @{ML_text foo2bar}, nor @{ML_text bar_from_foo}, nor @{ML_text
 | 
| 195 |   bar_for_foo}, nor @{ML_text bar4foo}).
 | |
| 39881 | 196 | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 197 |   \item The name component @{ML_text legacy} means that the operation
 | 
| 39881 | 198 | is about to be discontinued soon. | 
| 199 | ||
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 200 |   \item The name component @{ML_text global} means that this works
 | 
| 39881 | 201 | with the background theory instead of the regular local context | 
| 202 |   (\secref{sec:context}), sometimes for historical reasons, sometimes
 | |
| 203 | due a genuine lack of locality of the concept involved, sometimes as | |
| 204 | a fall-back for the lack of a proper context in the application | |
| 205 | code. Whenever there is a non-global variant available, the | |
| 206 | application should be migrated to use it with a proper local | |
| 207 | context. | |
| 208 | ||
| 209 | \item Variables of the main context types of the Isabelle/Isar | |
| 210 |   framework (\secref{sec:context} and \chref{ch:local-theory}) have
 | |
| 40126 | 211 | firm naming conventions as follows: | 
| 39881 | 212 | |
| 213 |   \begin{itemize}
 | |
| 214 | ||
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 215 |   \item theories are called @{ML_text thy}, rarely @{ML_text theory}
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 216 |   (never @{ML_text thry})
 | 
| 39881 | 217 | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 218 |   \item proof contexts are called @{ML_text ctxt}, rarely @{ML_text
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 219 |   context} (never @{ML_text ctx})
 | 
| 39881 | 220 | |
| 57421 | 221 |   \item generic contexts are called @{ML_text context}
 | 
| 39881 | 222 | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 223 |   \item local theories are called @{ML_text lthy}, except for local
 | 
| 40126 | 224 | theories that are treated as proof context (which is a semantic | 
| 225 | super-type) | |
| 39881 | 226 | |
| 227 |   \end{itemize}
 | |
| 228 | ||
| 229 | Variations with primed or decimal numbers are always possible, as | |
| 56579 | 230 |   well as semantic prefixes like @{ML_text foo_thy} or @{ML_text
 | 
| 39881 | 231 | bar_ctxt}, but the base conventions above need to be preserved. | 
| 57421 | 232 | This allows to emphasize their data flow via plain regular | 
| 233 | expressions in the text editor. | |
| 39881 | 234 | |
| 40126 | 235 |   \item The main logical entities (\secref{ch:logic}) have established
 | 
| 236 | naming convention as follows: | |
| 39881 | 237 | |
| 238 |   \begin{itemize}
 | |
| 239 | ||
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 240 |   \item sorts are called @{ML_text S}
 | 
| 39881 | 241 | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 242 |   \item types are called @{ML_text T}, @{ML_text U}, or @{ML_text
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 243 |   ty} (never @{ML_text t})
 | 
| 39881 | 244 | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 245 |   \item terms are called @{ML_text t}, @{ML_text u}, or @{ML_text
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 246 |   tm} (never @{ML_text trm})
 | 
| 39881 | 247 | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 248 |   \item certified types are called @{ML_text cT}, rarely @{ML_text
 | 
| 39881 | 249 | T}, with variants as for types | 
| 250 | ||
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 251 |   \item certified terms are called @{ML_text ct}, rarely @{ML_text
 | 
| 52733 | 252 |   t}, with variants as for terms (never @{ML_text ctrm})
 | 
| 39881 | 253 | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 254 |   \item theorems are called @{ML_text th}, or @{ML_text thm}
 | 
| 39881 | 255 | |
| 256 |   \end{itemize}
 | |
| 257 | ||
| 258 | Proper semantic names override these conventions completely. For | |
| 259 | example, the left-hand side of an equation (as a term) can be called | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 260 |   @{ML_text lhs} (not @{ML_text lhs_tm}).  Or a term that is known
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 261 |   to be a variable can be called @{ML_text v} or @{ML_text x}.
 | 
| 39881 | 262 | |
| 40310 | 263 |   \item Tactics (\secref{sec:tactics}) are sufficiently important to
 | 
| 264 | have specific naming conventions. The name of a basic tactic | |
| 265 |   definition always has a @{ML_text "_tac"} suffix, the subgoal index
 | |
| 266 |   (if applicable) is always called @{ML_text i}, and the goal state
 | |
| 267 |   (if made explicit) is usually called @{ML_text st} instead of the
 | |
| 268 |   somewhat misleading @{ML_text thm}.  Any other arguments are given
 | |
| 269 | before the latter two, and the general context is given first. | |
| 270 | Example: | |
| 271 | ||
| 272 |   \begin{verbatim}
 | |
| 273 | fun my_tac ctxt arg1 arg2 i st = ... | |
| 274 |   \end{verbatim}
 | |
| 275 | ||
| 276 |   Note that the goal state @{ML_text st} above is rarely made
 | |
| 277 | explicit, if tactic combinators (tacticals) are used as usual. | |
| 278 | ||
| 57421 | 279 | A tactic that requires a proof context needs to make that explicit as seen | 
| 280 |   in the @{verbatim ctxt} argument above. Do not refer to the background
 | |
| 281 |   theory of @{verbatim st} -- it is not a proper context, but merely a formal
 | |
| 282 | certificate. | |
| 283 | ||
| 39881 | 284 |   \end{itemize}
 | 
| 39878 | 285 | *} | 
| 286 | ||
| 287 | ||
| 288 | subsection {* General source layout *}
 | |
| 289 | ||
| 57421 | 290 | text {*
 | 
| 291 | The general Isabelle/ML source layout imitates regular type-setting | |
| 292 | conventions, augmented by the requirements for deeply nested expressions | |
| 293 | that are commonplace in functional programming. | |
| 294 | ||
| 295 |   \paragraph{Line length} is limited to 80 characters according to ancient
 | |
| 40126 | 296 | standards, but we allow as much as 100 characters (not | 
| 297 |   more).\footnote{Readability requires to keep the beginning of a line
 | |
| 39881 | 298 | in view while watching its end. Modern wide-screen displays do not | 
| 40126 | 299 | change the way how the human brain works. Sources also need to be | 
| 300 | printable on plain paper with reasonable font-size.} The extra 20 | |
| 39880 | 301 | characters acknowledge the space requirements due to qualified | 
| 302 | library references in Isabelle/ML. | |
| 39878 | 303 | |
| 304 |   \paragraph{White-space} is used to emphasize the structure of
 | |
| 305 | expressions, following mostly standard conventions for mathematical | |
| 306 |   typesetting, as can be seen in plain {\TeX} or {\LaTeX}.  This
 | |
| 39879 | 307 | defines positioning of spaces for parentheses, punctuation, and | 
| 308 | infixes as illustrated here: | |
| 39878 | 309 | |
| 310 |   \begin{verbatim}
 | |
| 311 | val x = y + z * (a + b); | |
| 312 | val pair = (a, b); | |
| 313 |   val record = {foo = 1, bar = 2};
 | |
| 314 |   \end{verbatim}
 | |
| 315 | ||
| 39879 | 316 |   Lines are normally broken \emph{after} an infix operator or
 | 
| 317 | punctuation character. For example: | |
| 39878 | 318 | |
| 319 |   \begin{verbatim}
 | |
| 320 | val x = | |
| 321 | a + | |
| 322 | b + | |
| 323 | c; | |
| 324 | ||
| 325 | val tuple = | |
| 326 | (a, | |
| 327 | b, | |
| 328 | c); | |
| 329 |   \end{verbatim}
 | |
| 330 | ||
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 331 |   Some special infixes (e.g.\ @{ML_text "|>"}) work better at the
 | 
| 39879 | 332 | start of the line, but punctuation is always at the end. | 
| 39878 | 333 | |
| 334 |   Function application follows the tradition of @{text "\<lambda>"}-calculus,
 | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 335 |   not informal mathematics.  For example: @{ML_text "f a b"} for a
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 336 |   curried function, or @{ML_text "g (a, b)"} for a tupled function.
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 337 |   Note that the space between @{ML_text g} and the pair @{ML_text
 | 
| 39879 | 338 | "(a, b)"} follows the important principle of | 
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 339 |   \emph{compositionality}: the layout of @{ML_text "g p"} does not
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 340 |   change when @{ML_text "p"} is refined to the concrete pair
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 341 |   @{ML_text "(a, b)"}.
 | 
| 39878 | 342 | |
| 343 |   \paragraph{Indentation} uses plain spaces, never hard
 | |
| 344 |   tabulators.\footnote{Tabulators were invented to move the carriage
 | |
| 345 | of a type-writer to certain predefined positions. In software they | |
| 346 | could be used as a primitive run-length compression of consecutive | |
| 347 | spaces, but the precise result would depend on non-standardized | |
| 57421 | 348 | text editor configuration.} | 
| 39878 | 349 | |
| 39879 | 350 | Each level of nesting is indented by 2 spaces, sometimes 1, very | 
| 40126 | 351 | rarely 4, never 8 or any other odd number. | 
| 39878 | 352 | |
| 39879 | 353 | Indentation follows a simple logical format that only depends on the | 
| 354 | nesting depth, not the accidental length of the text that initiates | |
| 355 | a level of nesting. Example: | |
| 39878 | 356 | |
| 357 |   \begin{verbatim}
 | |
| 39880 | 358 | (* RIGHT *) | 
| 359 | ||
| 39878 | 360 | if b then | 
| 39879 | 361 | expr1_part1 | 
| 362 | expr1_part2 | |
| 39878 | 363 | else | 
| 39879 | 364 | expr2_part1 | 
| 365 | expr2_part2 | |
| 39878 | 366 | |
| 39880 | 367 | |
| 368 | (* WRONG *) | |
| 369 | ||
| 39879 | 370 | if b then expr1_part1 | 
| 371 | expr1_part2 | |
| 372 | else expr2_part1 | |
| 373 | expr2_part2 | |
| 39878 | 374 |   \end{verbatim}
 | 
| 375 | ||
| 376 | The second form has many problems: it assumes a fixed-width font | |
| 39879 | 377 | when viewing the sources, it uses more space on the line and thus | 
| 378 | makes it hard to observe its strict length limit (working against | |
| 39878 | 379 |   \emph{readability}), it requires extra editing to adapt the layout
 | 
| 39879 | 380 | to changes of the initial text (working against | 
| 39878 | 381 |   \emph{maintainability}) etc.
 | 
| 382 | ||
| 39879 | 383 | \medskip For similar reasons, any kind of two-dimensional or tabular | 
| 40126 | 384 | layouts, ASCII-art with lines or boxes of asterisks etc.\ should be | 
| 39879 | 385 | avoided. | 
| 39881 | 386 | |
| 40126 | 387 |   \paragraph{Complex expressions} that consist of multi-clausal
 | 
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 388 |   function definitions, @{ML_text handle}, @{ML_text case},
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 389 |   @{ML_text let} (and combinations) require special attention.  The
 | 
| 40126 | 390 | syntax of Standard ML is quite ambitious and admits a lot of | 
| 391 | variance that can distort the meaning of the text. | |
| 39881 | 392 | |
| 57421 | 393 |   Multiple clauses of @{ML_text fun}, @{ML_text fn}, @{ML_text handle},
 | 
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 394 |   @{ML_text case} get extra indentation to indicate the nesting
 | 
| 40126 | 395 | clearly. Example: | 
| 39881 | 396 | |
| 397 |   \begin{verbatim}
 | |
| 398 | (* RIGHT *) | |
| 399 | ||
| 400 | fun foo p1 = | |
| 401 | expr1 | |
| 402 | | foo p2 = | |
| 403 | expr2 | |
| 404 | ||
| 405 | ||
| 406 | (* WRONG *) | |
| 407 | ||
| 408 | fun foo p1 = | |
| 409 | expr1 | |
| 410 | | foo p2 = | |
| 411 | expr2 | |
| 412 |   \end{verbatim}
 | |
| 413 | ||
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 414 |   Body expressions consisting of @{ML_text case} or @{ML_text let}
 | 
| 39881 | 415 | require care to maintain compositionality, to prevent loss of | 
| 40126 | 416 | logical indentation where it is especially important to see the | 
| 417 | structure of the text. Example: | |
| 39881 | 418 | |
| 419 |   \begin{verbatim}
 | |
| 420 | (* RIGHT *) | |
| 421 | ||
| 422 | fun foo p1 = | |
| 423 | (case e of | |
| 424 | q1 => ... | |
| 425 | | q2 => ...) | |
| 426 | | foo p2 = | |
| 427 | let | |
| 428 | ... | |
| 429 | in | |
| 430 | ... | |
| 431 | end | |
| 432 | ||
| 433 | ||
| 434 | (* WRONG *) | |
| 435 | ||
| 436 | fun foo p1 = case e of | |
| 437 | q1 => ... | |
| 438 | | q2 => ... | |
| 439 | | foo p2 = | |
| 440 | let | |
| 441 | ... | |
| 442 | in | |
| 443 | ... | |
| 444 | end | |
| 445 |   \end{verbatim}
 | |
| 446 | ||
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 447 |   Extra parentheses around @{ML_text case} expressions are optional,
 | 
| 40126 | 448 | but help to analyse the nesting based on character matching in the | 
| 57421 | 449 | text editor. | 
| 39881 | 450 | |
| 451 | \medskip There are two main exceptions to the overall principle of | |
| 452 | compositionality in the layout of complex expressions. | |
| 453 | ||
| 454 |   \begin{enumerate}
 | |
| 455 | ||
| 57421 | 456 |   \item @{ML_text "if"} expressions are iterated as if ML had multi-branch
 | 
| 457 | conditionals, e.g. | |
| 39881 | 458 | |
| 459 |   \begin{verbatim}
 | |
| 460 | (* RIGHT *) | |
| 461 | ||
| 462 | if b1 then e1 | |
| 463 | else if b2 then e2 | |
| 464 | else e3 | |
| 465 |   \end{verbatim}
 | |
| 466 | ||
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 467 |   \item @{ML_text fn} abstractions are often layed-out as if they
 | 
| 39881 | 468 | would lack any structure by themselves. This traditional form is | 
| 469 | motivated by the possibility to shift function arguments back and | |
| 40126 | 470 | forth wrt.\ additional combinators. Example: | 
| 39881 | 471 | |
| 472 |   \begin{verbatim}
 | |
| 473 | (* RIGHT *) | |
| 474 | ||
| 475 | fun foo x y = fold (fn z => | |
| 476 | expr) | |
| 477 |   \end{verbatim}
 | |
| 478 | ||
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 479 |   Here the visual appearance is that of three arguments @{ML_text x},
 | 
| 57421 | 480 |   @{ML_text y}, @{ML_text z} in a row.
 | 
| 39881 | 481 | |
| 482 |   \end{enumerate}
 | |
| 483 | ||
| 484 | Such weakly structured layout should be use with great care. Here | |
| 40153 | 485 |   are some counter-examples involving @{ML_text let} expressions:
 | 
| 39881 | 486 | |
| 487 |   \begin{verbatim}
 | |
| 488 | (* WRONG *) | |
| 489 | ||
| 490 | fun foo x = let | |
| 491 | val y = ... | |
| 492 | in ... end | |
| 493 | ||
| 41162 | 494 | |
| 495 | (* WRONG *) | |
| 496 | ||
| 40153 | 497 | fun foo x = let | 
| 498 | val y = ... | |
| 499 | in ... end | |
| 500 | ||
| 41162 | 501 | |
| 502 | (* WRONG *) | |
| 503 | ||
| 39881 | 504 | fun foo x = | 
| 505 | let | |
| 506 | val y = ... | |
| 507 | in ... end | |
| 57421 | 508 | |
| 509 | ||
| 510 | (* WRONG *) | |
| 511 | ||
| 512 | fun foo x = | |
| 513 | let | |
| 514 | val y = ... | |
| 515 | in | |
| 516 | ... end | |
| 39881 | 517 |   \end{verbatim}
 | 
| 518 | ||
| 519 | \medskip In general the source layout is meant to emphasize the | |
| 520 | structure of complex language expressions, not to pretend that SML | |
| 57421 | 521 | had a completely different syntax (say that of Haskell, Scala, Java). | 
| 39878 | 522 | *} | 
| 523 | ||
| 524 | ||
| 57421 | 525 | section {* ML embedded into Isabelle/Isar *}
 | 
| 39823 | 526 | |
| 39824 | 527 | text {* ML and Isar are intertwined via an open-ended bootstrap
 | 
| 528 | process that provides more and more programming facilities and | |
| 529 | logical content in an alternating manner. Bootstrapping starts from | |
| 530 | the raw environment of existing implementations of Standard ML | |
| 531 | (mainly Poly/ML, but also SML/NJ). | |
| 39823 | 532 | |
| 57421 | 533 | Isabelle/Pure marks the point where the raw ML toplevel is superseded by | 
| 534 | Isabelle/ML within the Isar theory and proof language, with a uniform | |
| 535 |   context for arbitrary ML values (see also \secref{sec:context}). This formal
 | |
| 536 | environment holds ML compiler bindings, logical entities, and many other | |
| 537 | things. | |
| 538 | ||
| 539 | Object-logics like Isabelle/HOL are built within the Isabelle/ML/Isar | |
| 540 | environment by introducing suitable theories with associated ML modules, | |
| 541 |   either inlined within @{verbatim ".thy"} files, or as separate @{verbatim
 | |
| 542 | ".ML"} files that are loading from some theory. Thus Isabelle/HOL is defined | |
| 543 | as a regular user-space application within the Isabelle framework. Further | |
| 544 | add-on tools can be implemented in ML within the Isar context in the same | |
| 545 | manner: ML is part of the standard repertoire of Isabelle, and there is no | |
| 546 | distinction between ``users'' and ``developers'' in this respect. | |
| 39823 | 547 | *} | 
| 548 | ||
| 39824 | 549 | |
| 39827 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 550 | subsection {* Isar ML commands *}
 | 
| 39823 | 551 | |
| 57421 | 552 | text {*
 | 
| 553 | The primary Isar source language provides facilities to ``open a window'' to | |
| 554 |   the underlying ML compiler. Especially see the Isar commands @{command_ref
 | |
| 555 |   "ML_file"} and @{command_ref "ML"}: both work the same way, but the source
 | |
| 556 | text is provided differently, via a file vs.\ inlined, respectively. Apart | |
| 557 | from embedding ML into the main theory definition like that, there are many | |
| 558 |   more commands that refer to ML source, such as @{command_ref setup} or
 | |
| 559 |   @{command_ref declaration}. Even more fine-grained embedding of ML into Isar
 | |
| 560 |   is encountered in the proof method @{method_ref tactic}, which refines the
 | |
| 561 |   pending goal state via a given expression of type @{ML_type tactic}.
 | |
| 39824 | 562 | *} | 
| 39823 | 563 | |
| 39824 | 564 | text %mlex {* The following artificial example demonstrates some ML
 | 
| 565 | toplevel declarations within the implicit Isar theory context. This | |
| 566 | is regular functional programming without referring to logical | |
| 567 | entities yet. | |
| 39823 | 568 | *} | 
| 569 | ||
| 570 | ML {*
 | |
| 571 | fun factorial 0 = 1 | |
| 572 | | factorial n = n * factorial (n - 1) | |
| 573 | *} | |
| 574 | ||
| 40126 | 575 | text {* Here the ML environment is already managed by Isabelle, i.e.\
 | 
| 39861 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 wenzelm parents: 
39859diff
changeset | 576 |   the @{ML factorial} function is not yet accessible in the preceding
 | 
| 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 wenzelm parents: 
39859diff
changeset | 577 | paragraph, nor in a different theory that is independent from the | 
| 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 wenzelm parents: 
39859diff
changeset | 578 | current one in the import hierarchy. | 
| 39823 | 579 | |
| 57421 | 580 | Removing the above ML declaration from the source text will remove any trace | 
| 581 | of this definition, as expected. The Isabelle/ML toplevel environment is | |
| 582 |   managed in a \emph{stateless} way: in contrast to the raw ML toplevel, there
 | |
| 583 |   are no global side-effects involved here.\footnote{Such a stateless
 | |
| 584 | compilation environment is also a prerequisite for robust parallel | |
| 585 | compilation within independent nodes of the implicit theory development | |
| 586 | graph.} | |
| 39823 | 587 | |
| 40126 | 588 | \medskip The next example shows how to embed ML into Isar proofs, using | 
| 57421 | 589 |   @{command_ref "ML_prf"} instead of Instead of @{command_ref "ML"}.
 | 
| 40126 | 590 | As illustrated below, the effect on the ML environment is local to | 
| 57421 | 591 | the whole proof body, but ignoring the block structure. | 
| 40126 | 592 | *} | 
| 39823 | 593 | |
| 40964 | 594 | notepad | 
| 595 | begin | |
| 39851 | 596 |   ML_prf %"ML" {* val a = 1 *}
 | 
| 40126 | 597 |   {
 | 
| 39851 | 598 |     ML_prf %"ML" {* val b = a + 1 *}
 | 
| 39824 | 599 |   } -- {* Isar block structure ignored by ML environment *}
 | 
| 39851 | 600 |   ML_prf %"ML" {* val c = b + 1 *}
 | 
| 40964 | 601 | end | 
| 39823 | 602 | |
| 39861 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 wenzelm parents: 
39859diff
changeset | 603 | text {* By side-stepping the normal scoping rules for Isar proof
 | 
| 40126 | 604 | blocks, embedded ML code can refer to the different contexts and | 
| 605 | manipulate corresponding entities, e.g.\ export a fact from a block | |
| 606 | context. | |
| 39823 | 607 | |
| 39861 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 wenzelm parents: 
39859diff
changeset | 608 | \medskip Two further ML commands are useful in certain situations: | 
| 57421 | 609 |   @{command_ref ML_val} and @{command_ref ML_command} are \emph{diagnostic} in
 | 
| 610 | the sense that there is no effect on the underlying environment, and can | |
| 611 | thus be used anywhere. The examples below produce long strings of digits by | |
| 612 |   invoking @{ML factorial}: @{command ML_val} takes care of printing the ML
 | |
| 613 |   toplevel result, but @{command ML_command} is silent so we produce an
 | |
| 614 | explicit output message. | |
| 615 | *} | |
| 39823 | 616 | |
| 617 | ML_val {* factorial 100 *}
 | |
| 618 | ML_command {* writeln (string_of_int (factorial 100)) *}
 | |
| 619 | ||
| 40964 | 620 | notepad | 
| 621 | begin | |
| 52417 | 622 |   ML_val {* factorial 100 *}
 | 
| 39823 | 623 |   ML_command {* writeln (string_of_int (factorial 100)) *}
 | 
| 40964 | 624 | end | 
| 39823 | 625 | |
| 626 | ||
| 39827 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 627 | subsection {* Compile-time context *}
 | 
| 39823 | 628 | |
| 39825 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 629 | text {* Whenever the ML compiler is invoked within Isabelle/Isar, the
 | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 630 | formal context is passed as a thread-local reference variable. Thus | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 631 | ML code may access the theory context during compilation, by reading | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 632 | or writing the (local) theory under construction. Note that such | 
| 40126 | 633 | direct access to the compile-time context is rare. In practice it | 
| 634 | is typically done via some derived ML functions instead. | |
| 39825 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 635 | *} | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 636 | |
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 637 | text %mlref {*
 | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 638 |   \begin{mldecls}
 | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 639 |   @{index_ML ML_Context.the_generic_context: "unit -> Context.generic"} \\
 | 
| 40126 | 640 |   @{index_ML "Context.>>": "(Context.generic -> Context.generic) -> unit"} \\
 | 
| 56199 | 641 |   @{index_ML ML_Thms.bind_thms: "string * thm list -> unit"} \\
 | 
| 642 |   @{index_ML ML_Thms.bind_thm: "string * thm -> unit"} \\
 | |
| 39825 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 643 |   \end{mldecls}
 | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 644 | |
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 645 |   \begin{description}
 | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 646 | |
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 647 |   \item @{ML "ML_Context.the_generic_context ()"} refers to the theory
 | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 648 | context of the ML toplevel --- at compile time. ML code needs to | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 649 |   take care to refer to @{ML "ML_Context.the_generic_context ()"}
 | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 650 | correctly. Recall that evaluation of a function body is delayed | 
| 39827 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 651 | until actual run-time. | 
| 39825 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 652 | |
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 653 |   \item @{ML "Context.>>"}~@{text f} applies context transformation
 | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 654 |   @{text f} to the implicit context of the ML toplevel.
 | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 655 | |
| 56199 | 656 |   \item @{ML ML_Thms.bind_thms}~@{text "(name, thms)"} stores a list of
 | 
| 39850 | 657 | theorems produced in ML both in the (global) theory context and the | 
| 57421 | 658 | ML toplevel, associating it with the provided name. | 
| 659 | ||
| 660 |   \item @{ML ML_Thms.bind_thm} is similar to @{ML ML_Thms.bind_thms} but
 | |
| 661 | refers to a singleton fact. | |
| 39850 | 662 | |
| 39825 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 663 |   \end{description}
 | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 664 | |
| 40126 | 665 | It is important to note that the above functions are really | 
| 39825 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 666 | restricted to the compile time, even though the ML compiler is | 
| 39827 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 667 | invoked at run-time. The majority of ML code either uses static | 
| 39825 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 668 |   antiquotations (\secref{sec:ML-antiq}) or refers to the theory or
 | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 669 | proof context at run-time, by explicit functional abstraction. | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 670 | *} | 
| 39823 | 671 | |
| 672 | ||
| 39827 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 673 | subsection {* Antiquotations \label{sec:ML-antiq} *}
 | 
| 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 674 | |
| 57421 | 675 | text {* A very important consequence of embedding ML into Isar is the
 | 
| 40126 | 676 |   concept of \emph{ML antiquotation}.  The standard token language of
 | 
| 677 | ML is augmented by special syntactic entities of the following form: | |
| 39827 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 678 | |
| 55112 
b1a5d603fd12
prefer rail cartouche -- avoid back-slashed quotes;
 wenzelm parents: 
54703diff
changeset | 679 |   @{rail \<open>
 | 
| 53167 | 680 |   @{syntax_def antiquote}: '@{' nameref args '}'
 | 
| 55112 
b1a5d603fd12
prefer rail cartouche -- avoid back-slashed quotes;
 wenzelm parents: 
54703diff
changeset | 681 | \<close>} | 
| 39827 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 682 | |
| 57421 | 683 |   Here @{syntax nameref} and @{syntax args} are outer syntax categories, as
 | 
| 684 |   defined in \cite{isabelle-isar-ref}.
 | |
| 39823 | 685 | |
| 39827 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 686 |   \medskip A regular antiquotation @{text "@{name args}"} processes
 | 
| 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 687 | its arguments by the usual means of the Isar source language, and | 
| 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 688 | produces corresponding ML source text, either as literal | 
| 57421 | 689 |   \emph{inline} text (e.g.\ @{text "@{term t}"}) or abstract
 | 
| 39827 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 690 |   \emph{value} (e.g. @{text "@{thm th}"}).  This pre-compilation
 | 
| 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 691 | scheme allows to refer to formal entities in a robust manner, with | 
| 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 692 | proper static scoping and with some degree of logical checking of | 
| 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 693 | small portions of the code. | 
| 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 694 | *} | 
| 39823 | 695 | |
| 39835 | 696 | |
| 51636 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 697 | subsection {* Printing ML values *}
 | 
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 698 | |
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 699 | text {* The ML compiler knows about the structure of values according
 | 
| 57421 | 700 | to their static type, and can print them in the manner of its | 
| 701 | toplevel, although the details are non-portable. The | |
| 56399 | 702 |   antiquotations @{ML_antiquotation_def "make_string"} and
 | 
| 703 |   @{ML_antiquotation_def "print"} provide a quasi-portable way to
 | |
| 704 | refer to this potential capability of the underlying ML system in | |
| 705 | generic Isabelle/ML sources. | |
| 706 | ||
| 707 | This is occasionally useful for diagnostic or demonstration | |
| 708 | purposes. Note that production-quality tools require proper | |
| 57421 | 709 | user-level error messages, avoiding raw ML values in the output. *} | 
| 51636 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 710 | |
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 711 | text %mlantiq {*
 | 
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 712 |   \begin{matharray}{rcl}
 | 
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 713 |   @{ML_antiquotation_def "make_string"} & : & @{text ML_antiquotation} \\
 | 
| 56399 | 714 |   @{ML_antiquotation_def "print"} & : & @{text ML_antiquotation} \\
 | 
| 51636 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 715 |   \end{matharray}
 | 
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 716 | |
| 55112 
b1a5d603fd12
prefer rail cartouche -- avoid back-slashed quotes;
 wenzelm parents: 
54703diff
changeset | 717 |   @{rail \<open>
 | 
| 51636 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 718 |   @@{ML_antiquotation make_string}
 | 
| 56399 | 719 | ; | 
| 720 |   @@{ML_antiquotation print} @{syntax name}?
 | |
| 55112 
b1a5d603fd12
prefer rail cartouche -- avoid back-slashed quotes;
 wenzelm parents: 
54703diff
changeset | 721 | \<close>} | 
| 51636 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 722 | |
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 723 |   \begin{description}
 | 
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 724 | |
| 57832 | 725 |   \item @{text "@{make_string}"} inlines a function to print arbitrary values
 | 
| 726 | similar to the ML toplevel. The result is compiler dependent and may fall | |
| 727 | back on "?" in certain situations. The value of configuration option | |
| 57834 
0d295e339f52
prefer dynamic ML_print_depth if context happens to be available;
 wenzelm parents: 
57832diff
changeset | 728 |   @{attribute_ref ML_print_depth} determines further details of output.
 | 
| 56399 | 729 | |
| 730 |   \item @{text "@{print f}"} uses the ML function @{text "f: string ->
 | |
| 731 |   unit"} to output the result of @{text "@{make_string}"} above,
 | |
| 732 | together with the source position of the antiquotation. The default | |
| 733 |   output function is @{ML writeln}.
 | |
| 51636 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 734 | |
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 735 |   \end{description}
 | 
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 736 | *} | 
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 737 | |
| 56399 | 738 | text %mlex {* The following artificial examples show how to produce
 | 
| 739 | adhoc output of ML values for debugging purposes. *} | |
| 51636 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 740 | |
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 741 | ML {*
 | 
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 742 | val x = 42; | 
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 743 | val y = true; | 
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 744 | |
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 745 |   writeln (@{make_string} {x = x, y = y});
 | 
| 56399 | 746 | |
| 747 |   @{print} {x = x, y = y};
 | |
| 748 |   @{print tracing} {x = x, y = y};
 | |
| 51636 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 749 | *} | 
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 750 | |
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 751 | |
| 39883 | 752 | section {* Canonical argument order \label{sec:canonical-argument-order} *}
 | 
| 753 | ||
| 754 | text {* Standard ML is a language in the tradition of @{text
 | |
| 755 |   "\<lambda>"}-calculus and \emph{higher-order functional programming},
 | |
| 756 | similar to OCaml, Haskell, or Isabelle/Pure and HOL as logical | |
| 757 | languages. Getting acquainted with the native style of representing | |
| 758 | functions in that setting can save a lot of extra boiler-plate of | |
| 759 | redundant shuffling of arguments, auxiliary abstractions etc. | |
| 760 | ||
| 40126 | 761 |   Functions are usually \emph{curried}: the idea of turning arguments
 | 
| 762 |   of type @{text "\<tau>\<^sub>i"} (for @{text "i \<in> {1, \<dots> n}"}) into a result of
 | |
| 763 |   type @{text "\<tau>"} is represented by the iterated function space
 | |
| 764 |   @{text "\<tau>\<^sub>1 \<rightarrow> \<dots> \<rightarrow> \<tau>\<^sub>n \<rightarrow> \<tau>"}.  This is isomorphic to the well-known
 | |
| 765 |   encoding via tuples @{text "\<tau>\<^sub>1 \<times> \<dots> \<times> \<tau>\<^sub>n \<rightarrow> \<tau>"}, but the curried
 | |
| 766 |   version fits more smoothly into the basic calculus.\footnote{The
 | |
| 57421 | 767 | difference is even more significant in HOL, because the redundant | 
| 768 | tuple structure needs to be accommodated extraneous proof steps.} | |
| 39883 | 769 | |
| 56594 | 770 |   Currying gives some flexibility due to \emph{partial application}.  A
 | 
| 53071 | 771 |   function @{text "f: \<tau>\<^sub>1 \<rightarrow> \<tau>\<^sub>2 \<rightarrow> \<tau>"} can be applied to @{text "x: \<tau>\<^sub>1"}
 | 
| 40126 | 772 |   and the remaining @{text "(f x): \<tau>\<^sub>2 \<rightarrow> \<tau>"} passed to another function
 | 
| 39883 | 773 | etc. How well this works in practice depends on the order of | 
| 774 | arguments. In the worst case, arguments are arranged erratically, | |
| 775 | and using a function in a certain situation always requires some | |
| 56579 | 776 | glue code. Thus we would get exponentially many opportunities to | 
| 39883 | 777 | decorate the code with meaningless permutations of arguments. | 
| 778 | ||
| 779 |   This can be avoided by \emph{canonical argument order}, which
 | |
| 40126 | 780 | observes certain standard patterns and minimizes adhoc permutations | 
| 40229 | 781 | in their application. In Isabelle/ML, large portions of text can be | 
| 52416 | 782 |   written without auxiliary operations like @{text "swap: \<alpha> \<times> \<beta> \<rightarrow> \<beta> \<times>
 | 
| 57421 | 783 |   \<alpha>"} or @{text "C: (\<alpha> \<rightarrow> \<beta> \<rightarrow> \<gamma>) \<rightarrow> (\<beta> \<rightarrow> \<alpha> \<rightarrow> \<gamma>)"} (the latter is not
 | 
| 52416 | 784 | present in the Isabelle/ML library). | 
| 39883 | 785 | |
| 57421 | 786 | \medskip The main idea is that arguments that vary less are moved | 
| 39883 | 787 | further to the left than those that vary more. Two particularly | 
| 788 |   important categories of functions are \emph{selectors} and
 | |
| 789 |   \emph{updates}.
 | |
| 790 | ||
| 791 | The subsequent scheme is based on a hypothetical set-like container | |
| 792 |   of type @{text "\<beta>"} that manages elements of type @{text "\<alpha>"}.  Both
 | |
| 793 | the names and types of the associated operations are canonical for | |
| 794 | Isabelle/ML. | |
| 795 | ||
| 52416 | 796 |   \begin{center}
 | 
| 39883 | 797 |   \begin{tabular}{ll}
 | 
| 798 | kind & canonical name and type \\\hline | |
| 799 |   selector & @{text "member: \<beta> \<rightarrow> \<alpha> \<rightarrow> bool"} \\
 | |
| 800 |   update & @{text "insert: \<alpha> \<rightarrow> \<beta> \<rightarrow> \<beta>"} \\
 | |
| 801 |   \end{tabular}
 | |
| 52416 | 802 |   \end{center}
 | 
| 39883 | 803 | |
| 804 |   Given a container @{text "B: \<beta>"}, the partially applied @{text
 | |
| 805 |   "member B"} is a predicate over elements @{text "\<alpha> \<rightarrow> bool"}, and
 | |
| 806 | thus represents the intended denotation directly. It is customary | |
| 807 | to pass the abstract predicate to further operations, not the | |
| 808 | concrete container. The argument order makes it easy to use other | |
| 809 |   combinators: @{text "forall (member B) list"} will check a list of
 | |
| 810 |   elements for membership in @{text "B"} etc. Often the explicit
 | |
| 40126 | 811 |   @{text "list"} is pointless and can be contracted to @{text "forall
 | 
| 812 | (member B)"} to get directly a predicate again. | |
| 39883 | 813 | |
| 40126 | 814 | In contrast, an update operation varies the container, so it moves | 
| 39883 | 815 |   to the right: @{text "insert a"} is a function @{text "\<beta> \<rightarrow> \<beta>"} to
 | 
| 816 |   insert a value @{text "a"}.  These can be composed naturally as
 | |
| 40126 | 817 |   @{text "insert c \<circ> insert b \<circ> insert a"}.  The slightly awkward
 | 
| 40229 | 818 | inversion of the composition order is due to conventional | 
| 40126 | 819 | mathematical notation, which can be easily amended as explained | 
| 820 | below. | |
| 39883 | 821 | *} | 
| 822 | ||
| 823 | ||
| 824 | subsection {* Forward application and composition *}
 | |
| 825 | ||
| 826 | text {* Regular function application and infix notation works best for
 | |
| 827 |   relatively deeply structured expressions, e.g.\ @{text "h (f x y + g
 | |
| 40126 | 828 |   z)"}.  The important special case of \emph{linear transformation}
 | 
| 829 |   applies a cascade of functions @{text "f\<^sub>n (\<dots> (f\<^sub>1 x))"}.  This
 | |
| 830 | becomes hard to read and maintain if the functions are themselves | |
| 831 | given as complex expressions. The notation can be significantly | |
| 39883 | 832 |   improved by introducing \emph{forward} versions of application and
 | 
| 833 | composition as follows: | |
| 834 | ||
| 835 | \medskip | |
| 836 |   \begin{tabular}{lll}
 | |
| 837 |   @{text "x |> f"} & @{text "\<equiv>"} & @{text "f x"} \\
 | |
| 41162 | 838 |   @{text "(f #> g) x"} & @{text "\<equiv>"} & @{text "x |> f |> g"} \\
 | 
| 39883 | 839 |   \end{tabular}
 | 
| 840 | \medskip | |
| 841 | ||
| 842 |   This enables to write conveniently @{text "x |> f\<^sub>1 |> \<dots> |> f\<^sub>n"} or
 | |
| 843 |   @{text "f\<^sub>1 #> \<dots> #> f\<^sub>n"} for its functional abstraction over @{text
 | |
| 844 | "x"}. | |
| 845 | ||
| 846 | \medskip There is an additional set of combinators to accommodate | |
| 847 | multiple results (via pairs) that are passed on as multiple | |
| 848 | arguments (via currying). | |
| 849 | ||
| 850 | \medskip | |
| 851 |   \begin{tabular}{lll}
 | |
| 852 |   @{text "(x, y) |-> f"} & @{text "\<equiv>"} & @{text "f x y"} \\
 | |
| 41162 | 853 |   @{text "(f #-> g) x"} & @{text "\<equiv>"} & @{text "x |> f |-> g"} \\
 | 
| 39883 | 854 |   \end{tabular}
 | 
| 855 | \medskip | |
| 856 | *} | |
| 857 | ||
| 858 | text %mlref {*
 | |
| 859 |   \begin{mldecls}
 | |
| 46262 | 860 |   @{index_ML_op "|> ": "'a * ('a -> 'b) -> 'b"} \\
 | 
| 861 |   @{index_ML_op "|-> ": "('c * 'a) * ('c -> 'a -> 'b) -> 'b"} \\
 | |
| 862 |   @{index_ML_op "#> ": "('a -> 'b) * ('b -> 'c) -> 'a -> 'c"} \\
 | |
| 863 |   @{index_ML_op "#-> ": "('a -> 'c * 'b) * ('c -> 'b -> 'd) -> 'a -> 'd"} \\
 | |
| 39883 | 864 |   \end{mldecls}
 | 
| 865 | *} | |
| 866 | ||
| 867 | ||
| 868 | subsection {* Canonical iteration *}
 | |
| 869 | ||
| 870 | text {* As explained above, a function @{text "f: \<alpha> \<rightarrow> \<beta> \<rightarrow> \<beta>"} can be
 | |
| 40126 | 871 |   understood as update on a configuration of type @{text "\<beta>"},
 | 
| 57421 | 872 |   parameterized by an argument of type @{text "\<alpha>"}.  Given @{text "a: \<alpha>"}
 | 
| 39883 | 873 |   the partial application @{text "(f a): \<beta> \<rightarrow> \<beta>"} operates
 | 
| 874 |   homogeneously on @{text "\<beta>"}.  This can be iterated naturally over a
 | |
| 53071 | 875 |   list of parameters @{text "[a\<^sub>1, \<dots>, a\<^sub>n]"} as @{text "f a\<^sub>1 #> \<dots> #> f a\<^sub>n"}.
 | 
| 876 |   The latter expression is again a function @{text "\<beta> \<rightarrow> \<beta>"}.
 | |
| 39883 | 877 |   It can be applied to an initial configuration @{text "b: \<beta>"} to
 | 
| 878 |   start the iteration over the given list of arguments: each @{text
 | |
| 879 |   "a"} in @{text "a\<^sub>1, \<dots>, a\<^sub>n"} is applied consecutively by updating a
 | |
| 880 | cumulative configuration. | |
| 881 | ||
| 882 |   The @{text fold} combinator in Isabelle/ML lifts a function @{text
 | |
| 883 | "f"} as above to its iterated version over a list of arguments. | |
| 884 |   Lifting can be repeated, e.g.\ @{text "(fold \<circ> fold) f"} iterates
 | |
| 885 | over a list of lists as expected. | |
| 886 | ||
| 887 |   The variant @{text "fold_rev"} works inside-out over the list of
 | |
| 888 |   arguments, such that @{text "fold_rev f \<equiv> fold f \<circ> rev"} holds.
 | |
| 889 | ||
| 890 |   The @{text "fold_map"} combinator essentially performs @{text
 | |
| 891 |   "fold"} and @{text "map"} simultaneously: each application of @{text
 | |
| 892 | "f"} produces an updated configuration together with a side-result; | |
| 893 | the iteration collects all such side-results as a separate list. | |
| 894 | *} | |
| 895 | ||
| 896 | text %mlref {*
 | |
| 897 |   \begin{mldecls}
 | |
| 898 |   @{index_ML fold: "('a -> 'b -> 'b) -> 'a list -> 'b -> 'b"} \\
 | |
| 899 |   @{index_ML fold_rev: "('a -> 'b -> 'b) -> 'a list -> 'b -> 'b"} \\
 | |
| 900 |   @{index_ML fold_map: "('a -> 'b -> 'c * 'b) -> 'a list -> 'b -> 'c list * 'b"} \\
 | |
| 901 |   \end{mldecls}
 | |
| 902 | ||
| 903 |   \begin{description}
 | |
| 904 | ||
| 905 |   \item @{ML fold}~@{text f} lifts the parametrized update function
 | |
| 906 |   @{text "f"} to a list of parameters.
 | |
| 907 | ||
| 908 |   \item @{ML fold_rev}~@{text "f"} is similar to @{ML fold}~@{text
 | |
| 57421 | 909 | "f"}, but works inside-out, as if the list would be reversed. | 
| 39883 | 910 | |
| 911 |   \item @{ML fold_map}~@{text "f"} lifts the parametrized update
 | |
| 912 |   function @{text "f"} (with side-result) to a list of parameters and
 | |
| 913 | cumulative side-results. | |
| 914 | ||
| 915 |   \end{description}
 | |
| 916 | ||
| 917 |   \begin{warn}
 | |
| 57421 | 918 | The literature on functional programming provides a confusing multitude of | 
| 919 |   combinators called @{text "foldl"}, @{text "foldr"} etc. SML97 provides its
 | |
| 920 |   own variations as @{ML List.foldl} and @{ML List.foldr}, while the classic
 | |
| 921 |   Isabelle library also has the historic @{ML Library.foldl} and @{ML
 | |
| 922 | Library.foldr}. To avoid unnecessary complication, all these historical | |
| 923 |   versions should be ignored, and the canonical @{ML fold} (or @{ML fold_rev})
 | |
| 924 | used exclusively. | |
| 39883 | 925 |   \end{warn}
 | 
| 926 | *} | |
| 927 | ||
| 928 | text %mlex {* The following example shows how to fill a text buffer
 | |
| 929 | incrementally by adding strings, either individually or from a given | |
| 930 | list. | |
| 931 | *} | |
| 932 | ||
| 933 | ML {*
 | |
| 934 | val s = | |
| 935 | Buffer.empty | |
| 936 | |> Buffer.add "digits: " | |
| 937 | |> fold (Buffer.add o string_of_int) (0 upto 9) | |
| 938 | |> Buffer.content; | |
| 939 | ||
| 940 |   @{assert} (s = "digits: 0123456789");
 | |
| 941 | *} | |
| 942 | ||
| 943 | text {* Note how @{ML "fold (Buffer.add o string_of_int)"} above saves
 | |
| 944 |   an extra @{ML "map"} over the given list.  This kind of peephole
 | |
| 945 | optimization reduces both the code size and the tree structures in | |
| 52416 | 946 | memory (``deforestation''), but it requires some practice to read | 
| 947 | and write fluently. | |
| 39883 | 948 | |
| 40126 | 949 | \medskip The next example elaborates the idea of canonical | 
| 950 | iteration, demonstrating fast accumulation of tree content using a | |
| 951 | text buffer. | |
| 39883 | 952 | *} | 
| 953 | ||
| 954 | ML {*
 | |
| 955 | datatype tree = Text of string | Elem of string * tree list; | |
| 956 | ||
| 957 | fun slow_content (Text txt) = txt | |
| 958 | | slow_content (Elem (name, ts)) = | |
| 959 | "<" ^ name ^ ">" ^ | |
| 960 | implode (map slow_content ts) ^ | |
| 961 | "</" ^ name ^ ">" | |
| 962 | ||
| 963 | fun add_content (Text txt) = Buffer.add txt | |
| 964 | | add_content (Elem (name, ts)) = | |
| 965 |         Buffer.add ("<" ^ name ^ ">") #>
 | |
| 966 | fold add_content ts #> | |
| 967 |         Buffer.add ("</" ^ name ^ ">");
 | |
| 968 | ||
| 969 | fun fast_content tree = | |
| 970 | Buffer.empty |> add_content tree |> Buffer.content; | |
| 971 | *} | |
| 972 | ||
| 57421 | 973 | text {* The slowness of @{ML slow_content} is due to the @{ML implode} of
 | 
| 39883 | 974 | the recursive results, because it copies previously produced strings | 
| 57421 | 975 | again and again. | 
| 39883 | 976 | |
| 977 |   The incremental @{ML add_content} avoids this by operating on a
 | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 978 |   buffer that is passed through in a linear fashion.  Using @{ML_text
 | 
| 40126 | 979 | "#>"} and contraction over the actual buffer argument saves some | 
| 980 |   additional boiler-plate.  Of course, the two @{ML "Buffer.add"}
 | |
| 981 | invocations with concatenated strings could have been split into | |
| 982 | smaller parts, but this would have obfuscated the source without | |
| 57421 | 983 | making a big difference in performance. Here we have done some | 
| 39883 | 984 | peephole-optimization for the sake of readability. | 
| 985 | ||
| 986 |   Another benefit of @{ML add_content} is its ``open'' form as a
 | |
| 40126 | 987 | function on buffers that can be continued in further linear | 
| 988 | transformations, folding etc. Thus it is more compositional than | |
| 989 |   the naive @{ML slow_content}.  As realistic example, compare the
 | |
| 990 |   old-style @{ML "Term.maxidx_of_term: term -> int"} with the newer
 | |
| 991 |   @{ML "Term.maxidx_term: term -> int -> int"} in Isabelle/Pure.
 | |
| 39883 | 992 | |
| 40126 | 993 |   Note that @{ML fast_content} above is only defined as example.  In
 | 
| 994 | many practical situations, it is customary to provide the | |
| 995 |   incremental @{ML add_content} only and leave the initialization and
 | |
| 57421 | 996 | termination to the concrete application to the user. | 
| 39883 | 997 | *} | 
| 998 | ||
| 999 | ||
| 39854 | 1000 | section {* Message output channels \label{sec:message-channels} *}
 | 
| 39835 | 1001 | |
| 1002 | text {* Isabelle provides output channels for different kinds of
 | |
| 1003 | messages: regular output, high-volume tracing information, warnings, | |
| 1004 | and errors. | |
| 1005 | ||
| 1006 | Depending on the user interface involved, these messages may appear | |
| 1007 | in different text styles or colours. The standard output for | |
| 57421 | 1008 |   batch sessions prefixes each line of warnings by @{verbatim
 | 
| 39835 | 1009 |   "###"} and errors by @{verbatim "***"}, but leaves anything else
 | 
| 57421 | 1010 | unchanged. The message body may contain further markup and formatting, | 
| 1011 |   which is routinely used in the Prover IDE \cite{isabelle-jedit}.
 | |
| 39835 | 1012 | |
| 1013 | Messages are associated with the transaction context of the running | |
| 1014 | Isar command. This enables the front-end to manage commands and | |
| 1015 | resulting messages together. For example, after deleting a command | |
| 1016 | from a given theory document version, the corresponding message | |
| 39872 | 1017 | output can be retracted from the display. | 
| 1018 | *} | |
| 39835 | 1019 | |
| 1020 | text %mlref {*
 | |
| 1021 |   \begin{mldecls}
 | |
| 1022 |   @{index_ML writeln: "string -> unit"} \\
 | |
| 1023 |   @{index_ML tracing: "string -> unit"} \\
 | |
| 1024 |   @{index_ML warning: "string -> unit"} \\
 | |
| 57421 | 1025 |   @{index_ML error: "string -> 'a"} % FIXME Output.error_message (!?) \\
 | 
| 39835 | 1026 |   \end{mldecls}
 | 
| 1027 | ||
| 1028 |   \begin{description}
 | |
| 1029 | ||
| 1030 |   \item @{ML writeln}~@{text "text"} outputs @{text "text"} as regular
 | |
| 1031 | message. This is the primary message output operation of Isabelle | |
| 1032 | and should be used by default. | |
| 1033 | ||
| 1034 |   \item @{ML tracing}~@{text "text"} outputs @{text "text"} as special
 | |
| 1035 | tracing message, indicating potential high-volume output to the | |
| 1036 | front-end (hundreds or thousands of messages issued by a single | |
| 1037 | command). The idea is to allow the user-interface to downgrade the | |
| 1038 | quality of message display to achieve higher throughput. | |
| 1039 | ||
| 1040 | Note that the user might have to take special actions to see tracing | |
| 1041 | output, e.g.\ switch to a different output window. So this channel | |
| 1042 | should not be used for regular output. | |
| 1043 | ||
| 1044 |   \item @{ML warning}~@{text "text"} outputs @{text "text"} as
 | |
| 1045 | warning, which typically means some extra emphasis on the front-end | |
| 40126 | 1046 | side (color highlighting, icons, etc.). | 
| 39835 | 1047 | |
| 1048 |   \item @{ML error}~@{text "text"} raises exception @{ML ERROR}~@{text
 | |
| 1049 |   "text"} and thus lets the Isar toplevel print @{text "text"} on the
 | |
| 1050 | error channel, which typically means some extra emphasis on the | |
| 40126 | 1051 | front-end side (color highlighting, icons, etc.). | 
| 39835 | 1052 | |
| 1053 | This assumes that the exception is not handled before the command | |
| 1054 |   terminates.  Handling exception @{ML ERROR}~@{text "text"} is a
 | |
| 1055 | perfectly legal alternative: it means that the error is absorbed | |
| 1056 | without any message output. | |
| 1057 | ||
| 39861 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 wenzelm parents: 
39859diff
changeset | 1058 |   \begin{warn}
 | 
| 54387 | 1059 |   The actual error channel is accessed via @{ML Output.error_message}, but
 | 
| 51058 | 1060 |   the old interaction protocol of Proof~General \emph{crashes} if that
 | 
| 39835 | 1061 | function is used in regular ML code: error output and toplevel | 
| 52416 | 1062 | command failure always need to coincide in classic TTY interaction. | 
| 39861 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 wenzelm parents: 
39859diff
changeset | 1063 |   \end{warn}
 | 
| 39835 | 1064 | |
| 39861 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 wenzelm parents: 
39859diff
changeset | 1065 |   \end{description}
 | 
| 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 wenzelm parents: 
39859diff
changeset | 1066 | |
| 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 wenzelm parents: 
39859diff
changeset | 1067 |   \begin{warn}
 | 
| 39835 | 1068 | Regular Isabelle/ML code should output messages exclusively by the | 
| 1069 |   official channels.  Using raw I/O on \emph{stdout} or \emph{stderr}
 | |
| 1070 |   instead (e.g.\ via @{ML TextIO.output}) is apt to cause problems in
 | |
| 1071 | the presence of parallel and asynchronous processing of Isabelle | |
| 1072 | theories. Such raw output might be displayed by the front-end in | |
| 1073 | some system console log, with a low chance that the user will ever | |
| 1074 | see it. Moreover, as a genuine side-effect on global process | |
| 1075 | channels, there is no proper way to retract output when Isar command | |
| 40126 | 1076 | transactions are reset by the system. | 
| 39861 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 wenzelm parents: 
39859diff
changeset | 1077 |   \end{warn}
 | 
| 39872 | 1078 | |
| 1079 |   \begin{warn}
 | |
| 1080 | The message channels should be used in a message-oriented manner. | |
| 40126 | 1081 | This means that multi-line output that logically belongs together is | 
| 57421 | 1082 |   issued by a single invocation of @{ML writeln} etc.\ with the
 | 
| 40126 | 1083 | functional concatenation of all message constituents. | 
| 39872 | 1084 |   \end{warn}
 | 
| 1085 | *} | |
| 1086 | ||
| 1087 | text %mlex {* The following example demonstrates a multi-line
 | |
| 1088 | warning. Note that in some situations the user sees only the first | |
| 1089 | line, so the most important point should be made first. | |
| 1090 | *} | |
| 1091 | ||
| 1092 | ML_command {*
 | |
| 1093 | warning (cat_lines | |
| 1094 | ["Beware the Jabberwock, my son!", | |
| 1095 | "The jaws that bite, the claws that catch!", | |
| 1096 | "Beware the Jubjub Bird, and shun", | |
| 1097 | "The frumious Bandersnatch!"]); | |
| 39835 | 1098 | *} | 
| 1099 | ||
| 39854 | 1100 | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1101 | section {* Exceptions \label{sec:exceptions} *}
 | 
| 39854 | 1102 | |
| 1103 | text {* The Standard ML semantics of strict functional evaluation
 | |
| 1104 | together with exceptions is rather well defined, but some delicate | |
| 1105 | points need to be observed to avoid that ML programs go wrong | |
| 1106 | despite static type-checking. Exceptions in Isabelle/ML are | |
| 1107 | subsequently categorized as follows. | |
| 1108 | ||
| 1109 |   \paragraph{Regular user errors.}  These are meant to provide
 | |
| 1110 | informative feedback about malformed input etc. | |
| 1111 | ||
| 57421 | 1112 |   The \emph{error} function raises the corresponding @{ML ERROR}
 | 
| 1113 |   exception, with a plain text message as argument.  @{ML ERROR}
 | |
| 39854 | 1114 | exceptions can be handled internally, in order to be ignored, turned | 
| 1115 | into other exceptions, or cascaded by appending messages. If the | |
| 57421 | 1116 |   corresponding Isabelle/Isar command terminates with an @{ML ERROR}
 | 
| 1117 | exception state, the system will print the result on the error | |
| 39855 | 1118 |   channel (see \secref{sec:message-channels}).
 | 
| 39854 | 1119 | |
| 1120 | It is considered bad style to refer to internal function names or | |
| 57421 | 1121 | values in ML source notation in user error messages. Do not use | 
| 1122 |   @{text "@{make_string}"} here!
 | |
| 39854 | 1123 | |
| 1124 | Grammatical correctness of error messages can be improved by | |
| 1125 |   \emph{omitting} final punctuation: messages are often concatenated
 | |
| 1126 | or put into a larger context (e.g.\ augmented with source position). | |
| 57421 | 1127 | Note that punctuation after formal entities (types, terms, theorems) is | 
| 1128 | particularly prone to user confusion. | |
| 39854 | 1129 | |
| 1130 |   \paragraph{Program failures.}  There is a handful of standard
 | |
| 1131 | exceptions that indicate general failure situations, or failures of | |
| 1132 | core operations on logical entities (types, terms, theorems, | |
| 39856 | 1133 |   theories, see \chref{ch:logic}).
 | 
| 39854 | 1134 | |
| 1135 | These exceptions indicate a genuine breakdown of the program, so the | |
| 1136 | main purpose is to determine quickly what has happened where. | |
| 39855 | 1137 | Traditionally, the (short) exception message would include the name | 
| 40126 | 1138 | of an ML function, although this is no longer necessary, because the | 
| 57421 | 1139 | ML runtime system attaches detailed source position stemming from the | 
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 1140 |   corresponding @{ML_text raise} keyword.
 | 
| 39854 | 1141 | |
| 1142 | \medskip User modules can always introduce their own custom | |
| 1143 | exceptions locally, e.g.\ to organize internal failures robustly | |
| 1144 | without overlapping with existing exceptions. Exceptions that are | |
| 1145 | exposed in module signatures require extra care, though, and should | |
| 40126 | 1146 |   \emph{not} be introduced by default.  Surprise by users of a module
 | 
| 1147 | can be often minimized by using plain user errors instead. | |
| 39854 | 1148 | |
| 1149 |   \paragraph{Interrupts.}  These indicate arbitrary system events:
 | |
| 1150 | both the ML runtime system and the Isabelle/ML infrastructure signal | |
| 1151 | various exceptional situations by raising the special | |
| 57421 | 1152 |   @{ML Exn.Interrupt} exception in user code.
 | 
| 1153 | ||
| 1154 | This is the one and only way that physical events can intrude an Isabelle/ML | |
| 1155 | program. Such an interrupt can mean out-of-memory, stack overflow, timeout, | |
| 1156 | internal signaling of threads, or a POSIX process signal. An Isabelle/ML | |
| 1157 | program that intercepts interrupts becomes dependent on physical effects of | |
| 1158 | the environment. Even worse, exception handling patterns that are too | |
| 1159 | general by accident, e.g.\ by misspelled exception constructors, will cover | |
| 1160 | interrupts unintentionally and thus render the program semantics | |
| 1161 | ill-defined. | |
| 39854 | 1162 | |
| 1163 | Note that the Interrupt exception dates back to the original SML90 | |
| 1164 | language definition. It was excluded from the SML97 version to | |
| 1165 | avoid its malign impact on ML program semantics, but without | |
| 1166 | providing a viable alternative. Isabelle/ML recovers physical | |
| 40229 | 1167 | interruptibility (which is an indispensable tool to implement | 
| 1168 | managed evaluation of command transactions), but requires user code | |
| 1169 | to be strictly transparent wrt.\ interrupts. | |
| 39854 | 1170 | |
| 1171 |   \begin{warn}
 | |
| 1172 | Isabelle/ML user code needs to terminate promptly on interruption, | |
| 1173 | without guessing at its meaning to the system infrastructure. | |
| 1174 | Temporary handling of interrupts for cleanup of global resources | |
| 1175 | etc.\ needs to be followed immediately by re-raising of the original | |
| 1176 | exception. | |
| 1177 |   \end{warn}
 | |
| 1178 | *} | |
| 1179 | ||
| 39855 | 1180 | text %mlref {*
 | 
| 1181 |   \begin{mldecls}
 | |
| 1182 |   @{index_ML try: "('a -> 'b) -> 'a -> 'b option"} \\
 | |
| 1183 |   @{index_ML can: "('a -> 'b) -> 'a -> bool"} \\
 | |
| 55838 | 1184 |   @{index_ML_exception ERROR: string} \\
 | 
| 1185 |   @{index_ML_exception Fail: string} \\
 | |
| 39856 | 1186 |   @{index_ML Exn.is_interrupt: "exn -> bool"} \\
 | 
| 39855 | 1187 |   @{index_ML reraise: "exn -> 'a"} \\
 | 
| 56303 
4cc3f4db3447
clarified Isabelle/ML bootstrap, such that Execution does not require ML_Compiler;
 wenzelm parents: 
56199diff
changeset | 1188 |   @{index_ML Runtime.exn_trace: "(unit -> 'a) -> 'a"} \\
 | 
| 39855 | 1189 |   \end{mldecls}
 | 
| 1190 | ||
| 1191 |   \begin{description}
 | |
| 1192 | ||
| 1193 |   \item @{ML try}~@{text "f x"} makes the partiality of evaluating
 | |
| 1194 |   @{text "f x"} explicit via the option datatype.  Interrupts are
 | |
| 1195 |   \emph{not} handled here, i.e.\ this form serves as safe replacement
 | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 1196 |   for the \emph{unsafe} version @{ML_text "(SOME"}~@{text "f
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 1197 |   x"}~@{ML_text "handle _ => NONE)"} that is occasionally seen in
 | 
| 57421 | 1198 | books about SML97, but not in Isabelle/ML. | 
| 39855 | 1199 | |
| 1200 |   \item @{ML can} is similar to @{ML try} with more abstract result.
 | |
| 1201 | ||
| 39856 | 1202 |   \item @{ML ERROR}~@{text "msg"} represents user errors; this
 | 
| 40126 | 1203 |   exception is normally raised indirectly via the @{ML error} function
 | 
| 1204 |   (see \secref{sec:message-channels}).
 | |
| 39856 | 1205 | |
| 1206 |   \item @{ML Fail}~@{text "msg"} represents general program failures.
 | |
| 1207 | ||
| 1208 |   \item @{ML Exn.is_interrupt} identifies interrupts robustly, without
 | |
| 1209 | mentioning concrete exception constructors in user code. Handled | |
| 1210 | interrupts need to be re-raised promptly! | |
| 1211 | ||
| 39855 | 1212 |   \item @{ML reraise}~@{text "exn"} raises exception @{text "exn"}
 | 
| 1213 | while preserving its implicit position information (if possible, | |
| 1214 | depending on the ML platform). | |
| 1215 | ||
| 56303 
4cc3f4db3447
clarified Isabelle/ML bootstrap, such that Execution does not require ML_Compiler;
 wenzelm parents: 
56199diff
changeset | 1216 |   \item @{ML Runtime.exn_trace}~@{ML_text "(fn () =>"}~@{text
 | 
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 1217 |   "e"}@{ML_text ")"} evaluates expression @{text "e"} while printing
 | 
| 39855 | 1218 | a full trace of its stack of nested exceptions (if possible, | 
| 53739 | 1219 | depending on the ML platform). | 
| 39855 | 1220 | |
| 56303 
4cc3f4db3447
clarified Isabelle/ML bootstrap, such that Execution does not require ML_Compiler;
 wenzelm parents: 
56199diff
changeset | 1221 |   Inserting @{ML Runtime.exn_trace} into ML code temporarily is
 | 
| 53709 
84522727f9d3
improved printing of exception trace in Poly/ML 5.5.1;
 wenzelm parents: 
53167diff
changeset | 1222 | useful for debugging, but not suitable for production code. | 
| 39855 | 1223 | |
| 1224 |   \end{description}
 | |
| 1225 | *} | |
| 1226 | ||
| 39866 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1227 | text %mlantiq {*
 | 
| 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1228 |   \begin{matharray}{rcl}
 | 
| 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1229 |   @{ML_antiquotation_def "assert"} & : & @{text ML_antiquotation} \\
 | 
| 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1230 |   \end{matharray}
 | 
| 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1231 | |
| 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1232 |   \begin{description}
 | 
| 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1233 | |
| 40110 | 1234 |   \item @{text "@{assert}"} inlines a function
 | 
| 1235 |   @{ML_type "bool -> unit"} that raises @{ML Fail} if the argument is
 | |
| 1236 |   @{ML false}.  Due to inlining the source position of failed
 | |
| 1237 | assertions is included in the error output. | |
| 39866 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1238 | |
| 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1239 |   \end{description}
 | 
| 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1240 | *} | 
| 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1241 | |
| 39859 | 1242 | |
| 52421 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1243 | section {* Strings of symbols \label{sec:symbols} *}
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1244 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1245 | text {* A \emph{symbol} constitutes the smallest textual unit in
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1246 | Isabelle/ML --- raw ML characters are normally not encountered at | 
| 57421 | 1247 | all. Isabelle strings consist of a sequence of symbols, represented | 
| 52421 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1248 | as a packed string or an exploded list of strings. Each symbol is | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1249 | in itself a small string, which has either one of the following | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1250 | forms: | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1251 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1252 |   \begin{enumerate}
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1253 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1254 |   \item a single ASCII character ``@{text "c"}'', for example
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1255 | ``\verb,a,'', | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1256 | |
| 56579 | 1257 | \item a codepoint according to UTF-8 (non-ASCII byte sequence), | 
| 52421 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1258 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1259 |   \item a regular symbol ``\verb,\,\verb,<,@{text "ident"}\verb,>,'',
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1260 | for example ``\verb,\,\verb,<alpha>,'', | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1261 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1262 |   \item a control symbol ``\verb,\,\verb,<^,@{text "ident"}\verb,>,'',
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1263 | for example ``\verb,\,\verb,<^bold>,'', | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1264 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1265 |   \item a raw symbol ``\verb,\,\verb,<^raw:,@{text text}\verb,>,''
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1266 |   where @{text text} consists of printable characters excluding
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1267 | ``\verb,.,'' and ``\verb,>,'', for example | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1268 |   ``\verb,\,\verb,<^raw:$\sum_{i = 1}^n$>,'',
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1269 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1270 |   \item a numbered raw control symbol ``\verb,\,\verb,<^raw,@{text
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1271 |   n}\verb,>, where @{text n} consists of digits, for example
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1272 | ``\verb,\,\verb,<^raw42>,''. | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1273 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1274 |   \end{enumerate}
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1275 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1276 |   The @{text "ident"} syntax for symbol names is @{text "letter
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1277 |   (letter | digit)\<^sup>*"}, where @{text "letter = A..Za..z"} and @{text
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1278 | "digit = 0..9"}. There are infinitely many regular symbols and | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1279 | control symbols, but a fixed collection of standard symbols is | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1280 | treated specifically. For example, ``\verb,\,\verb,<alpha>,'' is | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1281 | classified as a letter, which means it may occur within regular | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1282 | Isabelle identifiers. | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1283 | |
| 57421 | 1284 | The character set underlying Isabelle symbols is 7-bit ASCII, but 8-bit | 
| 1285 | character sequences are passed-through unchanged. Unicode/UCS data in UTF-8 | |
| 1286 | encoding is processed in a non-strict fashion, such that well-formed code | |
| 1287 | sequences are recognized accordingly. Unicode provides its own collection of | |
| 1288 | mathematical symbols, but within the core Isabelle/ML world there is no link | |
| 1289 | to the standard collection of Isabelle regular symbols. | |
| 1290 | ||
| 1291 | \medskip Output of Isabelle symbols depends on the print mode. For example, | |
| 1292 |   the standard {\LaTeX} setup of the Isabelle document preparation system
 | |
| 1293 |   would present ``\verb,\,\verb,<alpha>,'' as @{text "\<alpha>"}, and
 | |
| 1294 |   ``\verb,\,\verb,<^bold>,\verb,\,\verb,<alpha>,'' as @{text "\<^bold>\<alpha>"}. On-screen
 | |
| 1295 | rendering usually works by mapping a finite subset of Isabelle symbols to | |
| 1296 | suitable Unicode characters. | |
| 52421 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1297 | *} | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1298 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1299 | text %mlref {*
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1300 |   \begin{mldecls}
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1301 |   @{index_ML_type "Symbol.symbol": string} \\
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1302 |   @{index_ML Symbol.explode: "string -> Symbol.symbol list"} \\
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1303 |   @{index_ML Symbol.is_letter: "Symbol.symbol -> bool"} \\
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1304 |   @{index_ML Symbol.is_digit: "Symbol.symbol -> bool"} \\
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1305 |   @{index_ML Symbol.is_quasi: "Symbol.symbol -> bool"} \\
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1306 |   @{index_ML Symbol.is_blank: "Symbol.symbol -> bool"} \\
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1307 |   \end{mldecls}
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1308 |   \begin{mldecls}
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1309 |   @{index_ML_type "Symbol.sym"} \\
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1310 |   @{index_ML Symbol.decode: "Symbol.symbol -> Symbol.sym"} \\
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1311 |   \end{mldecls}
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1312 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1313 |   \begin{description}
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1314 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1315 |   \item Type @{ML_type "Symbol.symbol"} represents individual Isabelle
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1316 | symbols. | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1317 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1318 |   \item @{ML "Symbol.explode"}~@{text "str"} produces a symbol list
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1319 |   from the packed form.  This function supersedes @{ML
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1320 | "String.explode"} for virtually all purposes of manipulating text in | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1321 |   Isabelle!\footnote{The runtime overhead for exploded strings is
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1322 | mainly that of the list structure: individual symbols that happen to | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1323 | be a singleton string do not require extra memory in Poly/ML.} | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1324 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1325 |   \item @{ML "Symbol.is_letter"}, @{ML "Symbol.is_digit"}, @{ML
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1326 |   "Symbol.is_quasi"}, @{ML "Symbol.is_blank"} classify standard
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1327 | symbols according to fixed syntactic conventions of Isabelle, cf.\ | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1328 |   \cite{isabelle-isar-ref}.
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1329 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1330 |   \item Type @{ML_type "Symbol.sym"} is a concrete datatype that
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1331 | represents the different kinds of symbols explicitly, with | 
| 57421 | 1332 |   constructors @{ML "Symbol.Char"}, @{ML "Symbol.UTF8"},
 | 
| 1333 |   @{ML "Symbol.Sym"}, @{ML "Symbol.Ctrl"}, @{ML "Symbol.Raw"},
 | |
| 1334 |   @{ML "Symbol.Malformed"}.
 | |
| 52421 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1335 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1336 |   \item @{ML "Symbol.decode"} converts the string representation of a
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1337 | symbol into the datatype version. | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1338 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1339 |   \end{description}
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1340 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1341 |   \paragraph{Historical note.} In the original SML90 standard the
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1342 |   primitive ML type @{ML_type char} did not exists, and @{ML_text
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1343 | "explode: string -> string list"} produced a list of singleton | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1344 |   strings like @{ML "raw_explode: string -> string list"} in
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1345 | Isabelle/ML today. When SML97 came out, Isabelle did not adopt its | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1346 | somewhat anachronistic 8-bit or 16-bit characters, but the idea of | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1347 | exploding a string into a list of small strings was extended to | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1348 | ``symbols'' as explained above. Thus Isabelle sources can refer to | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1349 | an infinite store of user-defined symbols, without having to worry | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1350 | about the multitude of Unicode encodings that have emerged over the | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1351 | years. *} | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1352 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1353 | |
| 39863 | 1354 | section {* Basic data types *}
 | 
| 39859 | 1355 | |
| 40126 | 1356 | text {* The basis library proposal of SML97 needs to be treated with
 | 
| 39859 | 1357 | caution. Many of its operations simply do not fit with important | 
| 1358 | Isabelle/ML conventions (like ``canonical argument order'', see | |
| 40126 | 1359 |   \secref{sec:canonical-argument-order}), others cause problems with
 | 
| 1360 |   the parallel evaluation model of Isabelle/ML (such as @{ML
 | |
| 1361 |   TextIO.print} or @{ML OS.Process.system}).
 | |
| 39859 | 1362 | |
| 1363 | Subsequently we give a brief overview of important operations on | |
| 1364 | basic ML data types. | |
| 1365 | *} | |
| 1366 | ||
| 1367 | ||
| 39863 | 1368 | subsection {* Characters *}
 | 
| 1369 | ||
| 1370 | text %mlref {*
 | |
| 1371 |   \begin{mldecls}
 | |
| 1372 |   @{index_ML_type char} \\
 | |
| 1373 |   \end{mldecls}
 | |
| 1374 | ||
| 1375 |   \begin{description}
 | |
| 1376 | ||
| 39864 | 1377 |   \item Type @{ML_type char} is \emph{not} used.  The smallest textual
 | 
| 40126 | 1378 | unit in Isabelle is represented as a ``symbol'' (see | 
| 39864 | 1379 |   \secref{sec:symbols}).
 | 
| 39863 | 1380 | |
| 1381 |   \end{description}
 | |
| 1382 | *} | |
| 1383 | ||
| 1384 | ||
| 52421 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1385 | subsection {* Strings *}
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1386 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1387 | text %mlref {*
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1388 |   \begin{mldecls}
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1389 |   @{index_ML_type string} \\
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1390 |   \end{mldecls}
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1391 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1392 |   \begin{description}
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1393 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1394 |   \item Type @{ML_type string} represents immutable vectors of 8-bit
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1395 | characters. There are operations in SML to convert back and forth | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1396 | to actual byte vectors, which are seldom used. | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1397 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1398 | This historically important raw text representation is used for | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1399 | Isabelle-specific purposes with the following implicit substructures | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1400 | packed into the string content: | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1401 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1402 |   \begin{enumerate}
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1403 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1404 |   \item sequence of Isabelle symbols (see also \secref{sec:symbols}),
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1405 |   with @{ML Symbol.explode} as key operation;
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1406 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1407 |   \item XML tree structure via YXML (see also \cite{isabelle-sys}),
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1408 |   with @{ML YXML.parse_body} as key operation.
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1409 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1410 |   \end{enumerate}
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1411 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1412 | Note that Isabelle/ML string literals may refer Isabelle symbols | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1413 |   like ``\verb,\,\verb,<alpha>,'' natively, \emph{without} escaping
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1414 | the backslash. This is a consequence of Isabelle treating all | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1415 | source text as strings of symbols, instead of raw characters. | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1416 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1417 |   \end{description}
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1418 | *} | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1419 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1420 | text %mlex {* The subsequent example illustrates the difference of
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1421 | physical addressing of bytes versus logical addressing of symbols in | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1422 | Isabelle strings. | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1423 | *} | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1424 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1425 | ML_val {*
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1426 | val s = "\<A>"; | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1427 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1428 |   @{assert} (length (Symbol.explode s) = 1);
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1429 |   @{assert} (size s = 4);
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1430 | *} | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1431 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1432 | text {* Note that in Unicode renderings of the symbol @{text "\<A>"},
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1433 | variations of encodings like UTF-8 or UTF-16 pose delicate questions | 
| 57421 | 1434 | about the multi-byte representations of its codepoint, which is outside | 
| 52421 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1435 | of the 16-bit address space of the original Unicode standard from | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1436 | the 1990-ies. In Isabelle/ML it is just ``\verb,\,\verb,<A>,'' | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1437 | literally, using plain ASCII characters beyond any doubts. *} | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1438 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1439 | |
| 39862 | 1440 | subsection {* Integers *}
 | 
| 1441 | ||
| 1442 | text %mlref {*
 | |
| 1443 |   \begin{mldecls}
 | |
| 1444 |   @{index_ML_type int} \\
 | |
| 1445 |   \end{mldecls}
 | |
| 1446 | ||
| 1447 |   \begin{description}
 | |
| 1448 | ||
| 57421 | 1449 |   \item Type @{ML_type int} represents regular mathematical integers, which
 | 
| 1450 |   are \emph{unbounded}. Overflow is treated properly, but should never happen
 | |
| 1451 |   in practice.\footnote{The size limit for integer bit patterns in memory is
 | |
| 1452 | 64\,MB for 32-bit Poly/ML, and much higher for 64-bit systems.} This works | |
| 1453 | uniformly for all supported ML platforms (Poly/ML and SML/NJ). | |
| 39862 | 1454 | |
| 40126 | 1455 | Literal integers in ML text are forced to be of this one true | 
| 52417 | 1456 | integer type --- adhoc overloading of SML97 is disabled. | 
| 39862 | 1457 | |
| 55837 | 1458 |   Structure @{ML_structure IntInf} of SML97 is obsolete and superseded by
 | 
| 1459 |   @{ML_structure Int}.  Structure @{ML_structure Integer} in @{file
 | |
| 39862 | 1460 | "~~/src/Pure/General/integer.ML"} provides some additional | 
| 1461 | operations. | |
| 1462 | ||
| 1463 |   \end{description}
 | |
| 1464 | *} | |
| 1465 | ||
| 1466 | ||
| 40302 | 1467 | subsection {* Time *}
 | 
| 1468 | ||
| 1469 | text %mlref {*
 | |
| 1470 |   \begin{mldecls}
 | |
| 1471 |   @{index_ML_type Time.time} \\
 | |
| 1472 |   @{index_ML seconds: "real -> Time.time"} \\
 | |
| 1473 |   \end{mldecls}
 | |
| 1474 | ||
| 1475 |   \begin{description}
 | |
| 1476 | ||
| 1477 |   \item Type @{ML_type Time.time} represents time abstractly according
 | |
| 1478 | to the SML97 basis library definition. This is adequate for | |
| 1479 | internal ML operations, but awkward in concrete time specifications. | |
| 1480 | ||
| 1481 |   \item @{ML seconds}~@{text "s"} turns the concrete scalar @{text
 | |
| 1482 | "s"} (measured in seconds) into an abstract time value. Floating | |
| 52417 | 1483 | point numbers are easy to use as configuration options in the | 
| 57421 | 1484 |   context (see \secref{sec:config-options}) or system options that
 | 
| 52417 | 1485 | are maintained externally. | 
| 40302 | 1486 | |
| 1487 |   \end{description}
 | |
| 1488 | *} | |
| 1489 | ||
| 1490 | ||
| 39859 | 1491 | subsection {* Options *}
 | 
| 1492 | ||
| 1493 | text %mlref {*
 | |
| 1494 |   \begin{mldecls}
 | |
| 1495 |   @{index_ML Option.map: "('a -> 'b) -> 'a option -> 'b option"} \\
 | |
| 1496 |   @{index_ML is_some: "'a option -> bool"} \\
 | |
| 1497 |   @{index_ML is_none: "'a option -> bool"} \\
 | |
| 1498 |   @{index_ML the: "'a option -> 'a"} \\
 | |
| 1499 |   @{index_ML these: "'a list option -> 'a list"} \\
 | |
| 1500 |   @{index_ML the_list: "'a option -> 'a list"} \\
 | |
| 1501 |   @{index_ML the_default: "'a -> 'a option -> 'a"} \\
 | |
| 1502 |   \end{mldecls}
 | |
| 1503 | *} | |
| 1504 | ||
| 52417 | 1505 | text {* Apart from @{ML Option.map} most other operations defined in
 | 
| 57421 | 1506 |   structure @{ML_structure Option} are alien to Isabelle/ML and never
 | 
| 52417 | 1507 |   used.  The operations shown above are defined in @{file
 | 
| 1508 | "~~/src/Pure/General/basics.ML"}. *} | |
| 39859 | 1509 | |
| 1510 | ||
| 39863 | 1511 | subsection {* Lists *}
 | 
| 1512 | ||
| 1513 | text {* Lists are ubiquitous in ML as simple and light-weight
 | |
| 1514 | ``collections'' for many everyday programming tasks. Isabelle/ML | |
| 39874 | 1515 | provides important additions and improvements over operations that | 
| 1516 | are predefined in the SML97 library. *} | |
| 39863 | 1517 | |
| 1518 | text %mlref {*
 | |
| 1519 |   \begin{mldecls}
 | |
| 1520 |   @{index_ML cons: "'a -> 'a list -> 'a list"} \\
 | |
| 39874 | 1521 |   @{index_ML member: "('b * 'a -> bool) -> 'a list -> 'b -> bool"} \\
 | 
| 1522 |   @{index_ML insert: "('a * 'a -> bool) -> 'a -> 'a list -> 'a list"} \\
 | |
| 1523 |   @{index_ML remove: "('b * 'a -> bool) -> 'b -> 'a list -> 'a list"} \\
 | |
| 1524 |   @{index_ML update: "('a * 'a -> bool) -> 'a -> 'a list -> 'a list"} \\
 | |
| 39863 | 1525 |   \end{mldecls}
 | 
| 1526 | ||
| 1527 |   \begin{description}
 | |
| 1528 | ||
| 1529 |   \item @{ML cons}~@{text "x xs"} evaluates to @{text "x :: xs"}.
 | |
| 1530 | ||
| 1531 | Tupled infix operators are a historical accident in Standard ML. | |
| 1532 |   The curried @{ML cons} amends this, but it should be only used when
 | |
| 1533 | partial application is required. | |
| 1534 | ||
| 39874 | 1535 |   \item @{ML member}, @{ML insert}, @{ML remove}, @{ML update} treat
 | 
| 1536 | lists as a set-like container that maintains the order of elements. | |
| 40800 
330eb65c9469
Parse.liberal_name for document antiquotations and attributes;
 wenzelm parents: 
40508diff
changeset | 1537 |   See @{file "~~/src/Pure/library.ML"} for the full specifications
 | 
| 39874 | 1538 | (written in ML). There are some further derived operations like | 
| 1539 |   @{ML union} or @{ML inter}.
 | |
| 1540 | ||
| 1541 |   Note that @{ML insert} is conservative about elements that are
 | |
| 1542 |   already a @{ML member} of the list, while @{ML update} ensures that
 | |
| 40126 | 1543 | the latest entry is always put in front. The latter discipline is | 
| 39874 | 1544 | often more appropriate in declarations of context data | 
| 1545 |   (\secref{sec:context-data}) that are issued by the user in Isar
 | |
| 52417 | 1546 | source: later declarations take precedence over earlier ones. | 
| 39874 | 1547 | |
| 39863 | 1548 |   \end{description}
 | 
| 1549 | *} | |
| 1550 | ||
| 52417 | 1551 | text %mlex {* Using canonical @{ML fold} together with @{ML cons} (or
 | 
| 1552 | similar standard operations) alternates the orientation of data. | |
| 40126 | 1553 | The is quite natural and should not be altered forcible by inserting | 
| 1554 |   extra applications of @{ML rev}.  The alternative @{ML fold_rev} can
 | |
| 1555 | be used in the few situations, where alternation should be | |
| 1556 | prevented. | |
| 39863 | 1557 | *} | 
| 1558 | ||
| 1559 | ML {*
 | |
| 1560 | val items = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]; | |
| 1561 | ||
| 1562 | val list1 = fold cons items []; | |
| 39866 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1563 |   @{assert} (list1 = rev items);
 | 
| 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1564 | |
| 39863 | 1565 | val list2 = fold_rev cons items []; | 
| 39866 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1566 |   @{assert} (list2 = items);
 | 
| 39863 | 1567 | *} | 
| 1568 | ||
| 39883 | 1569 | text {* The subsequent example demonstrates how to \emph{merge} two
 | 
| 1570 | lists in a natural way. *} | |
| 1571 | ||
| 1572 | ML {*
 | |
| 1573 | fun merge_lists eq (xs, ys) = fold_rev (insert eq) ys xs; | |
| 1574 | *} | |
| 1575 | ||
| 1576 | text {* Here the first list is treated conservatively: only the new
 | |
| 1577 | elements from the second list are inserted. The inside-out order of | |
| 1578 |   insertion via @{ML fold_rev} attempts to preserve the order of
 | |
| 1579 | elements in the result. | |
| 1580 | ||
| 1581 | This way of merging lists is typical for context data | |
| 1582 |   (\secref{sec:context-data}).  See also @{ML merge} as defined in
 | |
| 40800 
330eb65c9469
Parse.liberal_name for document antiquotations and attributes;
 wenzelm parents: 
40508diff
changeset | 1583 |   @{file "~~/src/Pure/library.ML"}.
 | 
| 39883 | 1584 | *} | 
| 1585 | ||
| 39863 | 1586 | |
| 39875 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1587 | subsection {* Association lists *}
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1588 | |
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1589 | text {* The operations for association lists interpret a concrete list
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1590 | of pairs as a finite function from keys to values. Redundant | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1591 | representations with multiple occurrences of the same key are | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1592 | implicitly normalized: lookup and update only take the first | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1593 | occurrence into account. | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1594 | *} | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1595 | |
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1596 | text {*
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1597 |   \begin{mldecls}
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1598 |   @{index_ML AList.lookup: "('a * 'b -> bool) -> ('b * 'c) list -> 'a -> 'c option"} \\
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1599 |   @{index_ML AList.defined: "('a * 'b -> bool) -> ('b * 'c) list -> 'a -> bool"} \\
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1600 |   @{index_ML AList.update: "('a * 'a -> bool) -> 'a * 'b -> ('a * 'b) list -> ('a * 'b) list"} \\
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1601 |   \end{mldecls}
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1602 | |
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1603 |   \begin{description}
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1604 | |
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1605 |   \item @{ML AList.lookup}, @{ML AList.defined}, @{ML AList.update}
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1606 | implement the main ``framework operations'' for mappings in | 
| 40126 | 1607 | Isabelle/ML, following standard conventions for their names and | 
| 1608 | types. | |
| 39875 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1609 | |
| 57421 | 1610 |   Note that a function called @{verbatim lookup} is obliged to express its
 | 
| 39875 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1611 | partiality via an explicit option element. There is no choice to | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1612 | raise an exception, without changing the name to something like | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1613 |   @{text "the_element"} or @{text "get"}.
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1614 | |
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1615 |   The @{text "defined"} operation is essentially a contraction of @{ML
 | 
| 57421 | 1616 |   is_some} and @{verbatim "lookup"}, but this is sufficiently frequent to
 | 
| 39875 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1617 | justify its independent existence. This also gives the | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1618 | implementation some opportunity for peep-hole optimization. | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1619 | |
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1620 |   \end{description}
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1621 | |
| 57421 | 1622 | Association lists are adequate as simple implementation of finite mappings | 
| 1623 | in many practical situations. A more advanced table structure is defined in | |
| 1624 |   @{file "~~/src/Pure/General/table.ML"}; that version scales easily to
 | |
| 39875 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1625 | thousands or millions of elements. | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1626 | *} | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1627 | |
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1628 | |
| 39859 | 1629 | subsection {* Unsynchronized references *}
 | 
| 1630 | ||
| 1631 | text %mlref {*
 | |
| 1632 |   \begin{mldecls}
 | |
| 39870 | 1633 |   @{index_ML_type "'a Unsynchronized.ref"} \\
 | 
| 39859 | 1634 |   @{index_ML Unsynchronized.ref: "'a -> 'a Unsynchronized.ref"} \\
 | 
| 1635 |   @{index_ML "!": "'a Unsynchronized.ref -> 'a"} \\
 | |
| 46262 | 1636 |   @{index_ML_op ":=": "'a Unsynchronized.ref * 'a -> unit"} \\
 | 
| 39859 | 1637 |   \end{mldecls}
 | 
| 1638 | *} | |
| 1639 | ||
| 1640 | text {* Due to ubiquitous parallelism in Isabelle/ML (see also
 | |
| 1641 |   \secref{sec:multi-threading}), the mutable reference cells of
 | |
| 1642 | Standard ML are notorious for causing problems. In a highly | |
| 1643 |   parallel system, both correctness \emph{and} performance are easily
 | |
| 1644 | degraded when using mutable data. | |
| 1645 | ||
| 1646 |   The unwieldy name of @{ML Unsynchronized.ref} for the constructor
 | |
| 1647 | for references in Isabelle/ML emphasizes the inconveniences caused by | |
| 46262 | 1648 |   mutability.  Existing operations @{ML "!"}  and @{ML_op ":="} are
 | 
| 39859 | 1649 | unchanged, but should be used with special precautions, say in a | 
| 1650 | strictly local situation that is guaranteed to be restricted to | |
| 40508 | 1651 | sequential evaluation --- now and in the future. | 
| 1652 | ||
| 1653 |   \begin{warn}
 | |
| 1654 |   Never @{ML_text "open Unsynchronized"}, not even in a local scope!
 | |
| 1655 | Pretending that mutable state is no problem is a very bad idea. | |
| 1656 |   \end{warn}
 | |
| 1657 | *} | |
| 39859 | 1658 | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1659 | |
| 39870 | 1660 | section {* Thread-safe programming \label{sec:multi-threading} *}
 | 
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1661 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1662 | text {* Multi-threaded execution has become an everyday reality in
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1663 | Isabelle since Poly/ML 5.2.1 and Isabelle2008. Isabelle/ML provides | 
| 39868 | 1664 | implicit and explicit parallelism by default, and there is no way | 
| 1665 | for user-space tools to ``opt out''. ML programs that are purely | |
| 1666 | functional, output messages only via the official channels | |
| 1667 |   (\secref{sec:message-channels}), and do not intercept interrupts
 | |
| 1668 |   (\secref{sec:exceptions}) can participate in the multi-threaded
 | |
| 1669 | environment immediately without further ado. | |
| 1670 | ||
| 1671 | More ambitious tools with more fine-grained interaction with the | |
| 1672 | environment need to observe the principles explained below. | |
| 1673 | *} | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1674 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1675 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1676 | subsection {* Multi-threading with shared memory *}
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1677 | |
| 39868 | 1678 | text {* Multiple threads help to organize advanced operations of the
 | 
| 1679 | system, such as real-time conditions on command transactions, | |
| 1680 | sub-components with explicit communication, general asynchronous | |
| 1681 | interaction etc. Moreover, parallel evaluation is a prerequisite to | |
| 1682 | make adequate use of the CPU resources that are available on | |
| 1683 |   multi-core systems.\footnote{Multi-core computing does not mean that
 | |
| 1684 | there are ``spare cycles'' to be wasted. It means that the | |
| 1685 | continued exponential speedup of CPU performance due to ``Moore's | |
| 1686 | Law'' follows different rules: clock frequency has reached its peak | |
| 1687 | around 2005, and applications need to be parallelized in order to | |
| 1688 | avoid a perceived loss of performance. See also | |
| 1689 |   \cite{Sutter:2005}.}
 | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1690 | |
| 57421 | 1691 | Isabelle/Isar exploits the inherent structure of theories and proofs to | 
| 1692 |   support \emph{implicit parallelism} to a large extent. LCF-style theorem
 | |
| 1693 | proving provides almost ideal conditions for that, see also | |
| 1694 |   \cite{Wenzel:2009}. This means, significant parts of theory and proof
 | |
| 1695 | checking is parallelized by default. In Isabelle2013, a maximum | |
| 1696 | speedup-factor of 3.5 on 4 cores and 6.5 on 8 cores can be expected | |
| 1697 |   \cite{Wenzel:2013:ITP}.
 | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1698 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1699 | \medskip ML threads lack the memory protection of separate | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1700 | processes, and operate concurrently on shared heap memory. This has | 
| 40126 | 1701 | the advantage that results of independent computations are directly | 
| 1702 | available to other threads: abstract values can be passed without | |
| 1703 | copying or awkward serialization that is typically required for | |
| 1704 | separate processes. | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1705 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1706 | To make shared-memory multi-threading work robustly and efficiently, | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1707 | some programming guidelines need to be observed. While the ML | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1708 | system is responsible to maintain basic integrity of the | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1709 | representation of ML values in memory, the application programmer | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1710 | needs to ensure that multi-threaded execution does not break the | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1711 | intended semantics. | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1712 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1713 |   \begin{warn}
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1714 | To participate in implicit parallelism, tools need to be | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1715 | thread-safe. A single ill-behaved tool can affect the stability and | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1716 | performance of the whole system. | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1717 |   \end{warn}
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1718 | |
| 57421 | 1719 | Apart from observing the principles of thread-safeness passively, advanced | 
| 1720 | tools may also exploit parallelism actively, e.g.\ by using library | |
| 39868 | 1721 |   functions for parallel list operations (\secref{sec:parlist}).
 | 
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1722 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1723 |   \begin{warn}
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1724 | Parallel computing resources are managed centrally by the | 
| 39868 | 1725 | Isabelle/ML infrastructure. User programs must not fork their own | 
| 57421 | 1726 | ML threads to perform heavy computations. | 
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1727 |   \end{warn}
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1728 | *} | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1729 | |
| 39868 | 1730 | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1731 | subsection {* Critical shared resources *}
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1732 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1733 | text {* Thread-safeness is mainly concerned about concurrent
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1734 | read/write access to shared resources, which are outside the purely | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1735 | functional world of ML. This covers the following in particular. | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1736 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1737 |   \begin{itemize}
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1738 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1739 | \item Global references (or arrays), i.e.\ mutable memory cells that | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1740 | persist over several invocations of associated | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1741 |   operations.\footnote{This is independent of the visibility of such
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1742 | mutable values in the toplevel scope.} | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1743 | |
| 39868 | 1744 | \item Global state of the running Isabelle/ML process, i.e.\ raw I/O | 
| 1745 | channels, environment variables, current working directory. | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1746 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1747 | \item Writable resources in the file-system that are shared among | 
| 40126 | 1748 | different threads or external processes. | 
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1749 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1750 |   \end{itemize}
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1751 | |
| 39868 | 1752 | Isabelle/ML provides various mechanisms to avoid critical shared | 
| 40126 | 1753 | resources in most situations. As last resort there are some | 
| 1754 | mechanisms for explicit synchronization. The following guidelines | |
| 1755 | help to make Isabelle/ML programs work smoothly in a concurrent | |
| 1756 | environment. | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1757 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1758 |   \begin{itemize}
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1759 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1760 | \item Avoid global references altogether. Isabelle/Isar maintains a | 
| 39868 | 1761 | uniform context that incorporates arbitrary data declared by user | 
| 1762 |   programs (\secref{sec:context-data}).  This context is passed as
 | |
| 1763 | plain value and user tools can get/map their own data in a purely | |
| 1764 | functional manner. Configuration options within the context | |
| 1765 |   (\secref{sec:config-options}) provide simple drop-in replacements
 | |
| 40126 | 1766 | for historic reference variables. | 
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1767 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1768 | \item Keep components with local state information re-entrant. | 
| 39868 | 1769 | Instead of poking initial values into (private) global references, a | 
| 1770 | new state record can be created on each invocation, and passed | |
| 1771 | through any auxiliary functions of the component. The state record | |
| 1772 | may well contain mutable references, without requiring any special | |
| 57421 | 1773 | synchronizations, as long as each invocation gets its own copy, and the | 
| 1774 | tool itself is single-threaded. | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1775 | |
| 39868 | 1776 |   \item Avoid raw output on @{text "stdout"} or @{text "stderr"}.  The
 | 
| 1777 | Poly/ML library is thread-safe for each individual output operation, | |
| 1778 | but the ordering of parallel invocations is arbitrary. This means | |
| 1779 | raw output will appear on some system console with unpredictable | |
| 1780 | interleaving of atomic chunks. | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1781 | |
| 39868 | 1782 | Note that this does not affect regular message output channels | 
| 57421 | 1783 |   (\secref{sec:message-channels}).  An official message id is associated
 | 
| 39868 | 1784 | with the command transaction from where it originates, independently | 
| 1785 | of other transactions. This means each running Isar command has | |
| 1786 | effectively its own set of message channels, and interleaving can | |
| 1787 | only happen when commands use parallelism internally (and only at | |
| 1788 | message boundaries). | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1789 | |
| 39868 | 1790 | \item Treat environment variables and the current working directory | 
| 57421 | 1791 | of the running process as read-only. | 
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1792 | |
| 39868 | 1793 | \item Restrict writing to the file-system to unique temporary files. | 
| 1794 | Isabelle already provides a temporary directory that is unique for | |
| 1795 | the running process, and there is a centralized source of unique | |
| 1796 | serial numbers in Isabelle/ML. Thus temporary files that are passed | |
| 1797 | to to some external process will be always disjoint, and thus | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1798 | thread-safe. | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1799 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1800 |   \end{itemize}
 | 
| 39868 | 1801 | *} | 
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1802 | |
| 39868 | 1803 | text %mlref {*
 | 
| 1804 |   \begin{mldecls}
 | |
| 1805 |   @{index_ML File.tmp_path: "Path.T -> Path.T"} \\
 | |
| 1806 |   @{index_ML serial_string: "unit -> string"} \\
 | |
| 1807 |   \end{mldecls}
 | |
| 1808 | ||
| 1809 |   \begin{description}
 | |
| 1810 | ||
| 1811 |   \item @{ML File.tmp_path}~@{text "path"} relocates the base
 | |
| 1812 |   component of @{text "path"} into the unique temporary directory of
 | |
| 1813 | the running Isabelle/ML process. | |
| 1814 | ||
| 1815 |   \item @{ML serial_string}~@{text "()"} creates a new serial number
 | |
| 1816 | that is unique over the runtime of the Isabelle/ML process. | |
| 1817 | ||
| 1818 |   \end{description}
 | |
| 1819 | *} | |
| 1820 | ||
| 1821 | text %mlex {* The following example shows how to create unique
 | |
| 1822 | temporary file names. | |
| 1823 | *} | |
| 1824 | ||
| 1825 | ML {*
 | |
| 1826 |   val tmp1 = File.tmp_path (Path.basic ("foo" ^ serial_string ()));
 | |
| 1827 |   val tmp2 = File.tmp_path (Path.basic ("foo" ^ serial_string ()));
 | |
| 1828 |   @{assert} (tmp1 <> tmp2);
 | |
| 1829 | *} | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1830 | |
| 39868 | 1831 | |
| 1832 | subsection {* Explicit synchronization *}
 | |
| 1833 | ||
| 1834 | text {* Isabelle/ML also provides some explicit synchronization
 | |
| 1835 | mechanisms, for the rare situations where mutable shared resources | |
| 1836 | are really required. These are based on the synchronizations | |
| 1837 | primitives of Poly/ML, which have been adapted to the specific | |
| 1838 | assumptions of the concurrent Isabelle/ML environment. User code | |
| 1839 | must not use the Poly/ML primitives directly! | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1840 | |
| 39868 | 1841 | \medskip The most basic synchronization concept is a single | 
| 1842 |   \emph{critical section} (also called ``monitor'' in the literature).
 | |
| 1843 | A thread that enters the critical section prevents all other threads | |
| 1844 | from doing the same. A thread that is already within the critical | |
| 1845 | section may re-enter it in an idempotent manner. | |
| 1846 | ||
| 1847 | Such centralized locking is convenient, because it prevents | |
| 1848 | deadlocks by construction. | |
| 1849 | ||
| 1850 |   \medskip More fine-grained locking works via \emph{synchronized
 | |
| 1851 | variables}. An explicit state component is associated with | |
| 1852 | mechanisms for locking and signaling. There are operations to | |
| 1853 | await a condition, change the state, and signal the change to all | |
| 1854 | other waiting threads. | |
| 1855 | ||
| 1856 |   Here the synchronized access to the state variable is \emph{not}
 | |
| 1857 | re-entrant: direct or indirect nesting within the same thread will | |
| 1858 | cause a deadlock! | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1859 | *} | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1860 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1861 | text %mlref {*
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1862 |   \begin{mldecls}
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1863 |   @{index_ML NAMED_CRITICAL: "string -> (unit -> 'a) -> 'a"} \\
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1864 |   @{index_ML CRITICAL: "(unit -> 'a) -> 'a"} \\
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1865 |   \end{mldecls}
 | 
| 39871 | 1866 |   \begin{mldecls}
 | 
| 1867 |   @{index_ML_type "'a Synchronized.var"} \\
 | |
| 1868 |   @{index_ML Synchronized.var: "string -> 'a -> 'a Synchronized.var"} \\
 | |
| 1869 |   @{index_ML Synchronized.guarded_access: "'a Synchronized.var ->
 | |
| 1870 |   ('a -> ('b * 'a) option) -> 'b"} \\
 | |
| 1871 |   \end{mldecls}
 | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1872 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1873 |   \begin{description}
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1874 | |
| 39868 | 1875 |   \item @{ML NAMED_CRITICAL}~@{text "name e"} evaluates @{text "e ()"}
 | 
| 1876 | within the central critical section of Isabelle/ML. No other thread | |
| 1877 | may do so at the same time, but non-critical parallel execution will | |
| 39871 | 1878 |   continue.  The @{text "name"} argument is used for tracing and might
 | 
| 39868 | 1879 | help to spot sources of congestion. | 
| 1880 | ||
| 52418 | 1881 | Entering the critical section without contention is very fast. Each | 
| 1882 | thread should stay within the critical section only very briefly, | |
| 40126 | 1883 | otherwise parallel performance may degrade. | 
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1884 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1885 |   \item @{ML CRITICAL} is the same as @{ML NAMED_CRITICAL} with empty
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1886 | name argument. | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1887 | |
| 39871 | 1888 |   \item Type @{ML_type "'a Synchronized.var"} represents synchronized
 | 
| 1889 |   variables with state of type @{ML_type 'a}.
 | |
| 1890 | ||
| 1891 |   \item @{ML Synchronized.var}~@{text "name x"} creates a synchronized
 | |
| 1892 |   variable that is initialized with value @{text "x"}.  The @{text
 | |
| 1893 | "name"} is used for tracing. | |
| 1894 | ||
| 1895 |   \item @{ML Synchronized.guarded_access}~@{text "var f"} lets the
 | |
| 1896 |   function @{text "f"} operate within a critical section on the state
 | |
| 40126 | 1897 |   @{text "x"} as follows: if @{text "f x"} produces @{ML NONE}, it
 | 
| 1898 | continues to wait on the internal condition variable, expecting that | |
| 39871 | 1899 | some other thread will eventually change the content in a suitable | 
| 40126 | 1900 |   manner; if @{text "f x"} produces @{ML SOME}~@{text "(y, x')"} it is
 | 
| 1901 |   satisfied and assigns the new state value @{text "x'"}, broadcasts a
 | |
| 1902 | signal to all waiting threads on the associated condition variable, | |
| 1903 |   and returns the result @{text "y"}.
 | |
| 39871 | 1904 | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1905 |   \end{description}
 | 
| 39871 | 1906 | |
| 40126 | 1907 |   There are some further variants of the @{ML
 | 
| 40800 
330eb65c9469
Parse.liberal_name for document antiquotations and attributes;
 wenzelm parents: 
40508diff
changeset | 1908 |   Synchronized.guarded_access} combinator, see @{file
 | 
| 39871 | 1909 | "~~/src/Pure/Concurrent/synchronized.ML"} for details. | 
| 1910 | *} | |
| 1911 | ||
| 40126 | 1912 | text %mlex {* The following example implements a counter that produces
 | 
| 39871 | 1913 | positive integers that are unique over the runtime of the Isabelle | 
| 1914 | process: | |
| 1915 | *} | |
| 1916 | ||
| 1917 | ML {*
 | |
| 1918 | local | |
| 1919 | val counter = Synchronized.var "counter" 0; | |
| 1920 | in | |
| 1921 | fun next () = | |
| 1922 | Synchronized.guarded_access counter | |
| 1923 | (fn i => | |
| 1924 | let val j = i + 1 | |
| 1925 | in SOME (j, j) end); | |
| 1926 | end; | |
| 1927 | *} | |
| 1928 | ||
| 1929 | ML {*
 | |
| 1930 | val a = next (); | |
| 1931 | val b = next (); | |
| 1932 |   @{assert} (a <> b);
 | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1933 | *} | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1934 | |
| 40800 
330eb65c9469
Parse.liberal_name for document antiquotations and attributes;
 wenzelm parents: 
40508diff
changeset | 1935 | text {* \medskip See @{file "~~/src/Pure/Concurrent/mailbox.ML"} how
 | 
| 40126 | 1936 | to implement a mailbox as synchronized variable over a purely | 
| 57417 
29fe9bac501b
more tight Mailbox: single list is sufficient for single receiver, reverse outside critical section;
 wenzelm parents: 
57350diff
changeset | 1937 | functional list. *} | 
| 40126 | 1938 | |
| 52419 | 1939 | |
| 1940 | section {* Managed evaluation *}
 | |
| 1941 | ||
| 1942 | text {* Execution of Standard ML follows the model of strict
 | |
| 1943 | functional evaluation with optional exceptions. Evaluation happens | |
| 1944 | whenever some function is applied to (sufficiently many) | |
| 1945 | arguments. The result is either an explicit value or an implicit | |
| 1946 | exception. | |
| 1947 | ||
| 1948 |   \emph{Managed evaluation} in Isabelle/ML organizes expressions and
 | |
| 1949 | results to control certain physical side-conditions, to say more | |
| 1950 | specifically when and how evaluation happens. For example, the | |
| 1951 | Isabelle/ML library supports lazy evaluation with memoing, parallel | |
| 1952 | evaluation via futures, asynchronous evaluation via promises, | |
| 1953 | evaluation with time limit etc. | |
| 1954 | ||
| 1955 |   \medskip An \emph{unevaluated expression} is represented either as
 | |
| 1956 |   unit abstraction @{verbatim "fn () => a"} of type
 | |
| 1957 |   @{verbatim "unit -> 'a"} or as regular function
 | |
| 1958 |   @{verbatim "fn a => b"} of type @{verbatim "'a -> 'b"}.  Both forms
 | |
| 1959 | occur routinely, and special care is required to tell them apart --- | |
| 1960 | the static type-system of SML is only of limited help here. | |
| 1961 | ||
| 1962 |   The first form is more intuitive: some combinator @{text "(unit ->
 | |
| 1963 |   'a) -> 'a"} applies the given function to @{text "()"} to initiate
 | |
| 1964 | the postponed evaluation process. The second form is more flexible: | |
| 1965 |   some combinator @{text "('a -> 'b) -> 'a -> 'b"} acts like a
 | |
| 1966 | modified form of function application; several such combinators may | |
| 1967 | be cascaded to modify a given function, before it is ultimately | |
| 1968 | applied to some argument. | |
| 1969 | ||
| 1970 |   \medskip \emph{Reified results} make the disjoint sum of regular
 | |
| 1971 | values versions exceptional situations explicit as ML datatype: | |
| 1972 |   @{text "'a result = Res of 'a | Exn of exn"}.  This is typically
 | |
| 1973 | used for administrative purposes, to store the overall outcome of an | |
| 1974 | evaluation process. | |
| 1975 | ||
| 1976 |   \emph{Parallel exceptions} aggregate reified results, such that
 | |
| 1977 | multiple exceptions are digested as a collection in canonical form | |
| 1978 | that identifies exceptions according to their original occurrence. | |
| 1979 | This is particular important for parallel evaluation via futures | |
| 1980 |   \secref{sec:futures}, which are organized as acyclic graph of
 | |
| 1981 | evaluations that depend on other evaluations: exceptions stemming | |
| 1982 | from shared sub-graphs are exposed exactly once and in the order of | |
| 1983 | their original occurrence (e.g.\ when printed at the toplevel). | |
| 1984 | Interrupt counts as neutral element here: it is treated as minimal | |
| 1985 | information about some canceled evaluation process, and is absorbed | |
| 1986 | by the presence of regular program exceptions. *} | |
| 1987 | ||
| 1988 | text %mlref {*
 | |
| 1989 |   \begin{mldecls}
 | |
| 1990 |   @{index_ML_type "'a Exn.result"} \\
 | |
| 1991 |   @{index_ML Exn.capture: "('a -> 'b) -> 'a -> 'b Exn.result"} \\
 | |
| 1992 |   @{index_ML Exn.interruptible_capture: "('a -> 'b) -> 'a -> 'b Exn.result"} \\
 | |
| 1993 |   @{index_ML Exn.release: "'a Exn.result -> 'a"} \\
 | |
| 1994 |   @{index_ML Par_Exn.release_all: "'a Exn.result list -> 'a list"} \\
 | |
| 1995 |   @{index_ML Par_Exn.release_first: "'a Exn.result list -> 'a list"} \\
 | |
| 1996 |   \end{mldecls}
 | |
| 1997 | ||
| 1998 |   \begin{description}
 | |
| 1999 | ||
| 2000 |   \item Type @{ML_type "'a Exn.result"} represents the disjoint sum of
 | |
| 2001 |   ML results explicitly, with constructor @{ML Exn.Res} for regular
 | |
| 2002 |   values and @{ML "Exn.Exn"} for exceptions.
 | |
| 2003 | ||
| 2004 |   \item @{ML Exn.capture}~@{text "f x"} manages the evaluation of
 | |
| 2005 |   @{text "f x"} such that exceptions are made explicit as @{ML
 | |
| 2006 | "Exn.Exn"}. Note that this includes physical interrupts (see also | |
| 2007 |   \secref{sec:exceptions}), so the same precautions apply to user
 | |
| 2008 | code: interrupts must not be absorbed accidentally! | |
| 2009 | ||
| 2010 |   \item @{ML Exn.interruptible_capture} is similar to @{ML
 | |
| 2011 | Exn.capture}, but interrupts are immediately re-raised as required | |
| 2012 | for user code. | |
| 2013 | ||
| 2014 |   \item @{ML Exn.release}~@{text "result"} releases the original
 | |
| 2015 | runtime result, exposing its regular value or raising the reified | |
| 2016 | exception. | |
| 2017 | ||
| 2018 |   \item @{ML Par_Exn.release_all}~@{text "results"} combines results
 | |
| 2019 | that were produced independently (e.g.\ by parallel evaluation). If | |
| 2020 | all results are regular values, that list is returned. Otherwise, | |
| 2021 | the collection of all exceptions is raised, wrapped-up as collective | |
| 2022 | parallel exception. Note that the latter prevents access to | |
| 57421 | 2023 |   individual exceptions by conventional @{verbatim "handle"} of ML.
 | 
| 52419 | 2024 | |
| 2025 |   \item @{ML Par_Exn.release_first} is similar to @{ML
 | |
| 2026 | Par_Exn.release_all}, but only the first exception that has occurred | |
| 2027 | in the original evaluation process is raised again, the others are | |
| 2028 | ignored. That single exception may get handled by conventional | |
| 57421 | 2029 | means in ML. | 
| 52419 | 2030 | |
| 2031 |   \end{description}
 | |
| 2032 | *} | |
| 2033 | ||
| 2034 | ||
| 52420 | 2035 | subsection {* Parallel skeletons \label{sec:parlist} *}
 | 
| 2036 | ||
| 2037 | text {*
 | |
| 2038 | Algorithmic skeletons are combinators that operate on lists in | |
| 2039 |   parallel, in the manner of well-known @{text map}, @{text exists},
 | |
| 2040 |   @{text forall} etc.  Management of futures (\secref{sec:futures})
 | |
| 2041 | and their results as reified exceptions is wrapped up into simple | |
| 2042 | programming interfaces that resemble the sequential versions. | |
| 2043 | ||
| 2044 | What remains is the application-specific problem to present | |
| 2045 |   expressions with suitable \emph{granularity}: each list element
 | |
| 2046 | corresponds to one evaluation task. If the granularity is too | |
| 2047 | coarse, the available CPUs are not saturated. If it is too | |
| 2048 | fine-grained, CPU cycles are wasted due to the overhead of | |
| 2049 | organizing parallel processing. In the worst case, parallel | |
| 2050 | performance will be less than the sequential counterpart! | |
| 2051 | *} | |
| 2052 | ||
| 2053 | text %mlref {*
 | |
| 2054 |   \begin{mldecls}
 | |
| 2055 |   @{index_ML Par_List.map: "('a -> 'b) -> 'a list -> 'b list"} \\
 | |
| 2056 |   @{index_ML Par_List.get_some: "('a -> 'b option) -> 'a list -> 'b option"} \\
 | |
| 2057 |   \end{mldecls}
 | |
| 2058 | ||
| 2059 |   \begin{description}
 | |
| 2060 | ||
| 2061 |   \item @{ML Par_List.map}~@{text "f [x\<^sub>1, \<dots>, x\<^sub>n]"} is like @{ML
 | |
| 2062 |   "map"}~@{text "f [x\<^sub>1, \<dots>, x\<^sub>n]"}, but the evaluation of @{text "f x\<^sub>i"}
 | |
| 2063 |   for @{text "i = 1, \<dots>, n"} is performed in parallel.
 | |
| 2064 | ||
| 2065 |   An exception in any @{text "f x\<^sub>i"} cancels the overall evaluation
 | |
| 2066 |   process.  The final result is produced via @{ML
 | |
| 2067 | Par_Exn.release_first} as explained above, which means the first | |
| 2068 | program exception that happened to occur in the parallel evaluation | |
| 2069 | is propagated, and all other failures are ignored. | |
| 2070 | ||
| 2071 |   \item @{ML Par_List.get_some}~@{text "f [x\<^sub>1, \<dots>, x\<^sub>n]"} produces some
 | |
| 2072 |   @{text "f x\<^sub>i"} that is of the form @{text "SOME y\<^sub>i"}, if that
 | |
| 2073 |   exists, otherwise @{text "NONE"}.  Thus it is similar to @{ML
 | |
| 2074 | Library.get_first}, but subject to a non-deterministic parallel | |
| 2075 | choice process. The first successful result cancels the overall | |
| 2076 |   evaluation process; other exceptions are propagated as for @{ML
 | |
| 2077 | Par_List.map}. | |
| 2078 | ||
| 2079 | This generic parallel choice combinator is the basis for derived | |
| 2080 |   forms, such as @{ML Par_List.find_some}, @{ML Par_List.exists}, @{ML
 | |
| 2081 | Par_List.forall}. | |
| 2082 | ||
| 2083 |   \end{description}
 | |
| 2084 | *} | |
| 2085 | ||
| 2086 | text %mlex {* Subsequently, the Ackermann function is evaluated in
 | |
| 2087 | parallel for some ranges of arguments. *} | |
| 2088 | ||
| 2089 | ML_val {*
 | |
| 2090 | fun ackermann 0 n = n + 1 | |
| 2091 | | ackermann m 0 = ackermann (m - 1) 1 | |
| 2092 | | ackermann m n = ackermann (m - 1) (ackermann m (n - 1)); | |
| 2093 | ||
| 2094 | Par_List.map (ackermann 2) (500 upto 1000); | |
| 2095 | Par_List.map (ackermann 3) (5 upto 10); | |
| 2096 | *} | |
| 2097 | ||
| 2098 | ||
| 2099 | subsection {* Lazy evaluation *}
 | |
| 2100 | ||
| 2101 | text {*
 | |
| 57349 | 2102 |   Classic lazy evaluation works via the @{text lazy}~/ @{text force} pair of
 | 
| 2103 |   operations: @{text lazy} to wrap an unevaluated expression, and @{text
 | |
| 2104 | force} to evaluate it once and store its result persistently. Later | |
| 2105 |   invocations of @{text force} retrieve the stored result without another
 | |
| 2106 | evaluation. Isabelle/ML refines this idea to accommodate the aspects of | |
| 2107 | multi-threading, synchronous program exceptions and asynchronous interrupts. | |
| 57347 | 2108 | |
| 2109 |   The first thread that invokes @{text force} on an unfinished lazy value
 | |
| 2110 |   changes its state into a \emph{promise} of the eventual result and starts
 | |
| 2111 |   evaluating it. Any other threads that @{text force} the same lazy value in
 | |
| 2112 | the meantime need to wait for it to finish, by producing a regular result or | |
| 2113 | program exception. If the evaluation attempt is interrupted, this event is | |
| 2114 | propagated to all waiting threads and the lazy value is reset to its | |
| 2115 | original state. | |
| 2116 | ||
| 2117 | This means a lazy value is completely evaluated at most once, in a | |
| 2118 | thread-safe manner. There might be multiple interrupted evaluation attempts, | |
| 2119 | and multiple receivers of intermediate interrupt events. Interrupts are | |
| 2120 |   \emph{not} made persistent: later evaluation attempts start again from the
 | |
| 2121 | original expression. | |
| 2122 | *} | |
| 2123 | ||
| 2124 | text %mlref {*
 | |
| 2125 |   \begin{mldecls}
 | |
| 2126 |   @{index_ML_type "'a lazy"} \\
 | |
| 2127 |   @{index_ML Lazy.lazy: "(unit -> 'a) -> 'a lazy"} \\
 | |
| 2128 |   @{index_ML Lazy.value: "'a -> 'a lazy"} \\
 | |
| 2129 |   @{index_ML Lazy.force: "'a lazy -> 'a"} \\
 | |
| 2130 |   \end{mldecls}
 | |
| 2131 | ||
| 2132 |   \begin{description}
 | |
| 2133 | ||
| 2134 |   \item Type @{ML_type "'a lazy"} represents lazy values over type @{verbatim
 | |
| 2135 | "'a"}. | |
| 2136 | ||
| 2137 |   \item @{ML Lazy.lazy}~@{text "(fn () => e)"} wraps the unevaluated
 | |
| 2138 |   expression @{text e} as unfinished lazy value.
 | |
| 2139 | ||
| 2140 |   \item @{ML Lazy.value}~@{text a} wraps the value @{text a} as finished lazy
 | |
| 2141 |   value.  When forced, it returns @{text a} without any further evaluation.
 | |
| 2142 | ||
| 57349 | 2143 | There is very low overhead for this proforma wrapping of strict values as | 
| 2144 | lazy values. | |
| 57347 | 2145 | |
| 2146 |   \item @{ML Lazy.force}~@{text x} produces the result of the lazy value in a
 | |
| 2147 | thread-safe manner as explained above. Thus it may cause the current thread | |
| 2148 | to wait on a pending evaluation attempt by another thread. | |
| 2149 | ||
| 2150 |   \end{description}
 | |
| 52420 | 2151 | *} | 
| 2152 | ||
| 2153 | ||
| 57349 | 2154 | subsection {* Futures \label{sec:futures} *}
 | 
| 52420 | 2155 | |
| 2156 | text {*
 | |
| 57349 | 2157 | Futures help to organize parallel execution in a value-oriented manner, with | 
| 2158 |   @{text fork}~/ @{text join} as the main pair of operations, and some further
 | |
| 57350 | 2159 |   variants; see also \cite{Wenzel:2009,Wenzel:2013:ITP}. Unlike lazy values,
 | 
| 2160 | futures are evaluated strictly and spontaneously on separate worker threads. | |
| 2161 | Futures may be canceled, which leads to interrupts on running evaluation | |
| 2162 | attempts, and forces structurally related futures to fail for all time; | |
| 2163 | already finished futures remain unchanged. Exceptions between related | |
| 2164 | futures are propagated as well, and turned into parallel exceptions (see | |
| 2165 | above). | |
| 57349 | 2166 | |
| 2167 | Technically, a future is a single-assignment variable together with a | |
| 2168 |   \emph{task} that serves administrative purposes, notably within the
 | |
| 2169 |   \emph{task queue} where new futures are registered for eventual evaluation
 | |
| 2170 | and the worker threads retrieve their work. | |
| 2171 | ||
| 57350 | 2172 | The pool of worker threads is limited, in correlation with the number of | 
| 2173 | physical cores on the machine. Note that allocation of runtime resources may | |
| 2174 | be distorted either if workers yield CPU time (e.g.\ via system sleep or | |
| 2175 | wait operations), or if non-worker threads contend for significant runtime | |
| 2176 | resources independently. There is a limited number of replacement worker | |
| 2177 | threads that get activated in certain explicit wait conditions, after a | |
| 2178 | timeout. | |
| 2179 | ||
| 57349 | 2180 |   \medskip Each future task belongs to some \emph{task group}, which
 | 
| 2181 | represents the hierarchic structure of related tasks, together with the | |
| 2182 | exception status a that point. By default, the task group of a newly created | |
| 2183 | future is a new sub-group of the presently running one, but it is also | |
| 2184 | possible to indicate different group layouts under program control. | |
| 2185 | ||
| 2186 | Cancellation of futures actually refers to the corresponding task group and | |
| 2187 | all its sub-groups. Thus interrupts are propagated down the group hierarchy. | |
| 2188 | Regular program exceptions are treated likewise: failure of the evaluation | |
| 2189 | of some future task affects its own group and all sub-groups. Given a | |
| 2190 |   particular task group, its \emph{group status} cumulates all relevant
 | |
| 57350 | 2191 | exceptions according to its position within the group hierarchy. Interrupted | 
| 2192 | tasks that lack regular result information, will pick up parallel exceptions | |
| 2193 | from the cumulative group status. | |
| 57349 | 2194 | |
| 2195 |   \medskip A \emph{passive future} or \emph{promise} is a future with slightly
 | |
| 2196 | different evaluation policies: there is only a single-assignment variable | |
| 2197 |   and some expression to evaluate for the \emph{failed} case (e.g.\ to clean
 | |
| 2198 | up resources when canceled). A regular result is produced by external means, | |
| 2199 |   using a separate \emph{fulfill} operation.
 | |
| 2200 | ||
| 2201 | Promises are managed in the same task queue, so regular futures may depend | |
| 2202 | on them. This allows a form of reactive programming, where some promises are | |
| 2203 | used as minimal elements (or guards) within the future dependency graph: | |
| 2204 | when these promises are fulfilled the evaluation of subsequent futures | |
| 2205 | starts spontaneously, according to their own inter-dependencies. | |
| 52420 | 2206 | *} | 
| 2207 | ||
| 57348 | 2208 | text %mlref {*
 | 
| 2209 |   \begin{mldecls}
 | |
| 2210 |   @{index_ML_type "'a future"} \\
 | |
| 2211 |   @{index_ML Future.fork: "(unit -> 'a) -> 'a future"} \\
 | |
| 2212 |   @{index_ML Future.forks: "Future.params -> (unit -> 'a) list -> 'a future list"} \\
 | |
| 57349 | 2213 |   @{index_ML Future.join: "'a future -> 'a"} \\
 | 
| 2214 |   @{index_ML Future.joins: "'a future list -> 'a list"} \\
 | |
| 57348 | 2215 |   @{index_ML Future.value: "'a -> 'a future"} \\
 | 
| 2216 |   @{index_ML Future.map: "('a -> 'b) -> 'a future -> 'b future"} \\
 | |
| 2217 |   @{index_ML Future.cancel: "'a future -> unit"} \\
 | |
| 2218 |   @{index_ML Future.cancel_group: "Future.group -> unit"} \\[0.5ex]
 | |
| 2219 |   @{index_ML Future.promise: "(unit -> unit) -> 'a future"} \\
 | |
| 2220 |   @{index_ML Future.fulfill: "'a future -> 'a -> unit"} \\
 | |
| 2221 |   \end{mldecls}
 | |
| 2222 | ||
| 2223 |   \begin{description}
 | |
| 2224 | ||
| 2225 |   \item Type @{ML_type "'a future"} represents future values over type
 | |
| 2226 |   @{verbatim "'a"}.
 | |
| 2227 | ||
| 2228 |   \item @{ML Future.fork}~@{text "(fn () => e)"} registers the unevaluated
 | |
| 2229 |   expression @{text e} as unfinished future value, to be evaluated eventually
 | |
| 2230 |   on the parallel worker-thread farm. This is a shorthand for @{ML
 | |
| 2231 | Future.forks} below, with default parameters and a single expression. | |
| 2232 | ||
| 2233 |   \item @{ML Future.forks}~@{text "params exprs"} is the general interface to
 | |
| 2234 |   fork several futures simultaneously. The @{text params} consist of the
 | |
| 2235 | following fields: | |
| 2236 | ||
| 2237 |   \begin{itemize}
 | |
| 2238 | ||
| 2239 |   \item @{text "name : string"} (default @{ML "\"\""}) specifies a common name
 | |
| 2240 | for the tasks of the forked futures, which serves diagnostic purposes. | |
| 2241 | ||
| 2242 |   \item @{text "group : Future.group option"} (default @{ML NONE}) specifies
 | |
| 2243 |   an optional task group for the forked futures. @{ML NONE} means that a new
 | |
| 2244 | sub-group of the current worker-thread task context is created. If this is | |
| 2245 | not a worker thread, the group will be a new root in the group hierarchy. | |
| 2246 | ||
| 2247 |   \item @{text "deps : Future.task list"} (default @{ML "[]"}) specifies
 | |
| 2248 | dependencies on other future tasks, i.e.\ the adjacency relation in the | |
| 57349 | 2249 | global task queue. Dependencies on already finished tasks are ignored. | 
| 57348 | 2250 | |
| 2251 |   \item @{text "pri : int"} (default @{ML 0}) specifies a priority within the
 | |
| 2252 | task queue. | |
| 2253 | ||
| 2254 |   Typically there is only little deviation from the default priority @{ML 0}.
 | |
| 2255 |   As a rule of thumb, @{ML "~1"} means ``low priority" and @{ML 1} means
 | |
| 2256 | ``high priority''. | |
| 2257 | ||
| 57349 | 2258 | Note that the task priority only affects the position in the queue, not the | 
| 57348 | 2259 | thread priority. When a worker thread picks up a task for processing, it | 
| 2260 | runs with the normal thread priority to the end (or until canceled). Higher | |
| 2261 | priority tasks that are queued later need to wait until this (or another) | |
| 2262 | worker thread becomes free again. | |
| 2263 | ||
| 2264 |   \item @{text "interrupts : bool"} (default @{ML true}) tells whether the
 | |
| 2265 | worker thread that processes the corresponding task is initially put into | |
| 57349 | 2266 | interruptible state. This state may change again while running, by modifying | 
| 2267 | the thread attributes. | |
| 57348 | 2268 | |
| 2269 | With interrupts disabled, a running future task cannot be canceled. It is | |
| 2270 | the responsibility of the programmer that this special state is retained | |
| 2271 | only briefly. | |
| 2272 | ||
| 2273 |   \end{itemize}
 | |
| 2274 | ||
| 2275 |   \item @{ML Future.join}~@{text x} retrieves the value of an already finished
 | |
| 2276 | future, which may lead to an exception, according to the result of its | |
| 2277 | previous evaluation. | |
| 2278 | ||
| 2279 | For an unfinished future there are several cases depending on the role of | |
| 2280 | the current thread and the status of the future. A non-worker thread waits | |
| 2281 | passively until the future is eventually evaluated. A worker thread | |
| 2282 | temporarily changes its task context and takes over the responsibility to | |
| 57349 | 2283 | evaluate the future expression on the spot. The latter is done in a | 
| 2284 | thread-safe manner: other threads that intend to join the same future need | |
| 2285 | to wait until the ongoing evaluation is finished. | |
| 2286 | ||
| 2287 | Note that excessive use of dynamic dependencies of futures by adhoc joining | |
| 2288 | may lead to bad utilization of CPU cores, due to threads waiting on other | |
| 2289 | threads to finish required futures. The future task farm has a limited | |
| 2290 | amount of replacement threads that continue working on unrelated tasks after | |
| 2291 | some timeout. | |
| 57348 | 2292 | |
| 2293 | Whenever possible, static dependencies of futures should be specified | |
| 57349 | 2294 |   explicitly when forked (see @{text deps} above). Thus the evaluation can
 | 
| 2295 | work from the bottom up, without join conflicts and wait states. | |
| 2296 | ||
| 2297 |   \item @{ML Future.joins}~@{text xs} joins the given list of futures
 | |
| 2298 |   simultaneously, which is more efficient than @{ML "map Future.join"}~@{text
 | |
| 2299 | xs}. | |
| 2300 | ||
| 2301 | Based on the dependency graph of tasks, the current thread takes over the | |
| 2302 | responsibility to evaluate future expressions that are required for the main | |
| 2303 | result, working from the bottom up. Waiting on future results that are | |
| 2304 | presently evaluated on other threads only happens as last resort, when no | |
| 2305 | other unfinished futures are left over. | |
| 2306 | ||
| 2307 |   \item @{ML Future.value}~@{text a} wraps the value @{text a} as finished
 | |
| 2308 | future value, bypassing the worker-thread farm. When joined, it returns | |
| 2309 |   @{text a} without any further evaluation.
 | |
| 2310 | ||
| 2311 | There is very low overhead for this proforma wrapping of strict values as | |
| 57421 | 2312 | futures. | 
| 57348 | 2313 | |
| 2314 |   \item @{ML Future.map}~@{text "f x"} is a fast-path implementation of @{ML
 | |
| 2315 |   Future.fork}~@{text "(fn () => f ("}@{ML Future.join}~@{text "x))"}, which
 | |
| 2316 | avoids the full overhead of the task queue and worker-thread farm as far as | |
| 2317 |   possible. The function @{text f} is supposed to be some trivial
 | |
| 2318 | post-processing or projection of the future result. | |
| 2319 | ||
| 2320 |   \item @{ML Future.cancel}~@{text "x"} cancels the task group of the given
 | |
| 2321 |   future, using @{ML Future.cancel_group} below.
 | |
| 2322 | ||
| 57349 | 2323 |   \item @{ML Future.cancel_group}~@{text "group"} cancels all tasks of the
 | 
| 2324 | given task group for all time. Threads that are presently processing a task | |
| 2325 | of the given group are interrupted: it may take some time until they are | |
| 2326 | actually terminated. Tasks that are queued but not yet processed are | |
| 57348 | 2327 | dequeued and forced into interrupted state. Since the task group is itself | 
| 2328 | invalidated, any further attempt to fork a future that belongs to it will | |
| 2329 | yield a canceled result as well. | |
| 2330 | ||
| 2331 |   \item @{ML Future.promise}~@{text abort} registers a passive future with the
 | |
| 2332 |   given @{text abort} operation: it is invoked when the future task group is
 | |
| 2333 | canceled. | |
| 2334 | ||
| 2335 |   \item @{ML Future.fulfill}~@{text "x a"} finishes the passive future @{text
 | |
| 2336 |   x} by the given value @{text a}. If the promise has already been canceled,
 | |
| 2337 | the attempt to fulfill it causes an exception. | |
| 2338 | ||
| 2339 |   \end{description}
 | |
| 2340 | *} | |
| 2341 | ||
| 2342 | ||
| 47180 | 2343 | end |