| author | huffman | 
| Mon, 24 Mar 2014 14:51:10 -0700 | |
| changeset 56271 | 61b1e3d88e91 | 
| parent 56199 | 8e8d28ed7529 | 
| child 56303 | 4cc3f4db3447 | 
| permissions | -rw-r--r-- | 
| 29755 | 1 | theory "ML" | 
| 2 | imports Base | |
| 3 | begin | |
| 18538 | 4 | |
| 39822 | 5 | chapter {* Isabelle/ML *}
 | 
| 20489 | 6 | |
| 39823 | 7 | text {* Isabelle/ML is best understood as a certain culture based on
 | 
| 8 | Standard ML. Thus it is not a new programming language, but a | |
| 9 | certain way to use SML at an advanced level within the Isabelle | |
| 10 | environment. This covers a variety of aspects that are geared | |
| 11 | towards an efficient and robust platform for applications of formal | |
| 12 | logic with fully foundational proof construction --- according to | |
| 40126 | 13 |   the well-known \emph{LCF principle}.  There is specific
 | 
| 14 | infrastructure with library modules to address the needs of this | |
| 15 | difficult task. For example, the raw parallel programming model of | |
| 16 | Poly/ML is presented as considerably more abstract concept of | |
| 17 |   \emph{future values}, which is then used to augment the inference
 | |
| 18 | kernel, proof interpreter, and theory loader accordingly. | |
| 39823 | 19 | |
| 20 | The main aspects of Isabelle/ML are introduced below. These | |
| 21 | first-hand explanations should help to understand how proper | |
| 22 | Isabelle/ML is to be read and written, and to get access to the | |
| 23 | wealth of experience that is expressed in the source text and its | |
| 24 |   history of changes.\footnote{See
 | |
| 54703 | 25 |   @{url "http://isabelle.in.tum.de/repos/isabelle"} for the full
 | 
| 39823 | 26 | Mercurial history. There are symbolic tags to refer to official | 
| 27 |   Isabelle releases, as opposed to arbitrary \emph{tip} versions that
 | |
| 28 | merely reflect snapshots that are never really up-to-date.} *} | |
| 29 | ||
| 30 | ||
| 39878 | 31 | section {* Style and orthography *}
 | 
| 32 | ||
| 39879 | 33 | text {* The sources of Isabelle/Isar are optimized for
 | 
| 34 |   \emph{readability} and \emph{maintainability}.  The main purpose is
 | |
| 35 | to tell an informed reader what is really going on and how things | |
| 36 | really work. This is a non-trivial aim, but it is supported by a | |
| 37 | certain style of writing Isabelle/ML that has emerged from long | |
| 38 |   years of system development.\footnote{See also the interesting style
 | |
| 39 | guide for OCaml | |
| 54703 | 40 |   @{url "http://caml.inria.fr/resources/doc/guides/guidelines.en.html"}
 | 
| 39878 | 41 | which shares many of our means and ends.} | 
| 42 | ||
| 43 |   The main principle behind any coding style is \emph{consistency}.
 | |
| 39879 | 44 | For a single author of a small program this merely means ``choose | 
| 39878 | 45 | your style and stick to it''. A complex project like Isabelle, with | 
| 39879 | 46 | long years of development and different contributors, requires more | 
| 47 | standardization. A coding style that is changed every few years or | |
| 48 | with every new contributor is no style at all, because consistency | |
| 49 | is quickly lost. Global consistency is hard to achieve, though. | |
| 40126 | 50 | Nonetheless, one should always strive at least for local consistency | 
| 51 | of modules and sub-systems, without deviating from some general | |
| 52 | principles how to write Isabelle/ML. | |
| 39878 | 53 | |
| 40126 | 54 |   In a sense, good coding style is like an \emph{orthography} for the
 | 
| 55 | sources: it helps to read quickly over the text and see through the | |
| 56 | main points, without getting distracted by accidental presentation | |
| 57 | of free-style code. | |
| 39878 | 58 | *} | 
| 59 | ||
| 60 | ||
| 61 | subsection {* Header and sectioning *}
 | |
| 62 | ||
| 39879 | 63 | text {* Isabelle source files have a certain standardized header
 | 
| 64 | format (with precise spacing) that follows ancient traditions | |
| 65 | reaching back to the earliest versions of the system by Larry | |
| 40800 
330eb65c9469
Parse.liberal_name for document antiquotations and attributes;
 wenzelm parents: 
40508diff
changeset | 66 |   Paulson.  See @{file "~~/src/Pure/thm.ML"}, for example.
 | 
| 39878 | 67 | |
| 68 |   The header includes at least @{verbatim Title} and @{verbatim
 | |
| 69 | Author} entries, followed by a prose description of the purpose of | |
| 70 | the module. The latter can range from a single line to several | |
| 71 | paragraphs of explanations. | |
| 72 | ||
| 73 | The rest of the file is divided into sections, subsections, | |
| 40126 | 74 | subsubsections, paragraphs etc.\ using a simple layout via ML | 
| 75 | comments as follows. | |
| 39878 | 76 | |
| 77 | \begin{verbatim}
 | |
| 78 | (*** section ***) | |
| 79 | ||
| 80 | (** subsection **) | |
| 81 | ||
| 82 | (* subsubsection *) | |
| 83 | ||
| 84 | (*short paragraph*) | |
| 85 | ||
| 86 | (* | |
| 40126 | 87 | long paragraph, | 
| 88 | with more text | |
| 39878 | 89 | *) | 
| 90 | \end{verbatim}
 | |
| 91 | ||
| 92 |   As in regular typography, there is some extra space \emph{before}
 | |
| 93 | section headings that are adjacent to plain text (not other headings | |
| 94 | as in the example above). | |
| 95 | ||
| 96 | \medskip The precise wording of the prose text given in these | |
| 40126 | 97 | headings is chosen carefully to introduce the main theme of the | 
| 39879 | 98 | subsequent formal ML text. | 
| 99 | *} | |
| 100 | ||
| 101 | ||
| 39880 | 102 | subsection {* Naming conventions *}
 | 
| 39879 | 103 | |
| 104 | text {* Since ML is the primary medium to express the meaning of the
 | |
| 105 | source text, naming of ML entities requires special care. | |
| 106 | ||
| 39881 | 107 |   \paragraph{Notation.}  A name consists of 1--3 \emph{words} (rarely
 | 
| 108 | 4, but not more) that are separated by underscore. There are three | |
| 40126 | 109 | variants concerning upper or lower case letters, which are used for | 
| 39881 | 110 | certain ML categories as follows: | 
| 39880 | 111 | |
| 112 | \medskip | |
| 113 |   \begin{tabular}{lll}
 | |
| 114 | variant & example & ML categories \\\hline | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 115 |   lower-case & @{ML_text foo_bar} & values, types, record fields \\
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 116 |   capitalized & @{ML_text Foo_Bar} & datatype constructors, structures, functors \\
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 117 |   upper-case & @{ML_text FOO_BAR} & special values, exception constructors, signatures \\
 | 
| 39880 | 118 |   \end{tabular}
 | 
| 119 | \medskip | |
| 120 | ||
| 121 | For historical reasons, many capitalized names omit underscores, | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 122 |   e.g.\ old-style @{ML_text FooBar} instead of @{ML_text Foo_Bar}.
 | 
| 47180 | 123 |   Genuine mixed-case names are \emph{not} used, because clear division
 | 
| 40126 | 124 |   of words is essential for readability.\footnote{Camel-case was
 | 
| 125 | invented to workaround the lack of underscore in some early | |
| 126 | non-ASCII character sets. Later it became habitual in some language | |
| 127 | communities that are now strong in numbers.} | |
| 39880 | 128 | |
| 39881 | 129 | A single (capital) character does not count as ``word'' in this | 
| 40126 | 130 | respect: some Isabelle/ML names are suffixed by extra markers like | 
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 131 |   this: @{ML_text foo_barT}.
 | 
| 39881 | 132 | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 133 |   Name variants are produced by adding 1--3 primes, e.g.\ @{ML_text
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 134 |   foo'}, @{ML_text foo''}, or @{ML_text foo'''}, but not @{ML_text
 | 
| 39881 | 135 | foo''''} or more. Decimal digits scale better to larger numbers, | 
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 136 |   e.g.\ @{ML_text foo0}, @{ML_text foo1}, @{ML_text foo42}.
 | 
| 39880 | 137 | |
| 138 |   \paragraph{Scopes.}  Apart from very basic library modules, ML
 | |
| 139 | structures are not ``opened'', but names are referenced with | |
| 39881 | 140 |   explicit qualification, as in @{ML Syntax.string_of_term} for
 | 
| 39880 | 141 | example. When devising names for structures and their components it | 
| 142 | is important aim at eye-catching compositions of both parts, because | |
| 40126 | 143 | this is how they are seen in the sources and documentation. For the | 
| 144 | same reasons, aliases of well-known library functions should be | |
| 145 | avoided. | |
| 39880 | 146 | |
| 40126 | 147 | Local names of function abstraction or case/let bindings are | 
| 39880 | 148 | typically shorter, sometimes using only rudiments of ``words'', | 
| 149 | while still avoiding cryptic shorthands. An auxiliary function | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 150 |   called @{ML_text helper}, @{ML_text aux}, or @{ML_text f} is
 | 
| 39880 | 151 | considered bad style. | 
| 152 | ||
| 153 | Example: | |
| 154 | ||
| 155 |   \begin{verbatim}
 | |
| 156 | (* RIGHT *) | |
| 157 | ||
| 158 | fun print_foo ctxt foo = | |
| 159 | let | |
| 39881 | 160 | fun print t = ... Syntax.string_of_term ctxt t ... | 
| 161 | in ... end; | |
| 162 | ||
| 163 | ||
| 164 | (* RIGHT *) | |
| 165 | ||
| 166 | fun print_foo ctxt foo = | |
| 167 | let | |
| 39880 | 168 | val string_of_term = Syntax.string_of_term ctxt; | 
| 169 | fun print t = ... string_of_term t ... | |
| 170 | in ... end; | |
| 171 | ||
| 172 | ||
| 173 | (* WRONG *) | |
| 174 | ||
| 175 | val string_of_term = Syntax.string_of_term; | |
| 176 | ||
| 177 | fun print_foo ctxt foo = | |
| 178 | let | |
| 179 | fun aux t = ... string_of_term ctxt t ... | |
| 180 | in ... end; | |
| 181 | ||
| 182 |   \end{verbatim}
 | |
| 183 | ||
| 184 | ||
| 40126 | 185 |   \paragraph{Specific conventions.} Here are some specific name forms
 | 
| 186 | that occur frequently in the sources. | |
| 39881 | 187 | |
| 188 |   \begin{itemize}
 | |
| 189 | ||
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 190 |   \item A function that maps @{ML_text foo} to @{ML_text bar} is
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 191 |   called @{ML_text foo_to_bar} or @{ML_text bar_of_foo} (never
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 192 |   @{ML_text foo2bar}, @{ML_text bar_from_foo}, @{ML_text
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 193 |   bar_for_foo}, or @{ML_text bar4foo}).
 | 
| 39881 | 194 | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 195 |   \item The name component @{ML_text legacy} means that the operation
 | 
| 39881 | 196 | is about to be discontinued soon. | 
| 197 | ||
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 198 |   \item The name component @{ML_text old} means that this is historic
 | 
| 39881 | 199 | material that might disappear at some later stage. | 
| 200 | ||
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 201 |   \item The name component @{ML_text global} means that this works
 | 
| 39881 | 202 | with the background theory instead of the regular local context | 
| 203 |   (\secref{sec:context}), sometimes for historical reasons, sometimes
 | |
| 204 | due a genuine lack of locality of the concept involved, sometimes as | |
| 205 | a fall-back for the lack of a proper context in the application | |
| 206 | code. Whenever there is a non-global variant available, the | |
| 207 | application should be migrated to use it with a proper local | |
| 208 | context. | |
| 209 | ||
| 210 | \item Variables of the main context types of the Isabelle/Isar | |
| 211 |   framework (\secref{sec:context} and \chref{ch:local-theory}) have
 | |
| 40126 | 212 | firm naming conventions as follows: | 
| 39881 | 213 | |
| 214 |   \begin{itemize}
 | |
| 215 | ||
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 216 |   \item theories are called @{ML_text thy}, rarely @{ML_text theory}
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 217 |   (never @{ML_text thry})
 | 
| 39881 | 218 | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 219 |   \item proof contexts are called @{ML_text ctxt}, rarely @{ML_text
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 220 |   context} (never @{ML_text ctx})
 | 
| 39881 | 221 | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 222 |   \item generic contexts are called @{ML_text context}, rarely
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 223 |   @{ML_text ctxt}
 | 
| 39881 | 224 | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 225 |   \item local theories are called @{ML_text lthy}, except for local
 | 
| 40126 | 226 | theories that are treated as proof context (which is a semantic | 
| 227 | super-type) | |
| 39881 | 228 | |
| 229 |   \end{itemize}
 | |
| 230 | ||
| 231 | Variations with primed or decimal numbers are always possible, as | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 232 |   well as sematic prefixes like @{ML_text foo_thy} or @{ML_text
 | 
| 39881 | 233 | bar_ctxt}, but the base conventions above need to be preserved. | 
| 40126 | 234 | This allows to visualize the their data flow via plain regular | 
| 235 | expressions in the editor. | |
| 39881 | 236 | |
| 40126 | 237 |   \item The main logical entities (\secref{ch:logic}) have established
 | 
| 238 | naming convention as follows: | |
| 39881 | 239 | |
| 240 |   \begin{itemize}
 | |
| 241 | ||
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 242 |   \item sorts are called @{ML_text S}
 | 
| 39881 | 243 | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 244 |   \item types are called @{ML_text T}, @{ML_text U}, or @{ML_text
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 245 |   ty} (never @{ML_text t})
 | 
| 39881 | 246 | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 247 |   \item terms are called @{ML_text t}, @{ML_text u}, or @{ML_text
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 248 |   tm} (never @{ML_text trm})
 | 
| 39881 | 249 | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 250 |   \item certified types are called @{ML_text cT}, rarely @{ML_text
 | 
| 39881 | 251 | T}, with variants as for types | 
| 252 | ||
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 253 |   \item certified terms are called @{ML_text ct}, rarely @{ML_text
 | 
| 52733 | 254 |   t}, with variants as for terms (never @{ML_text ctrm})
 | 
| 39881 | 255 | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 256 |   \item theorems are called @{ML_text th}, or @{ML_text thm}
 | 
| 39881 | 257 | |
| 258 |   \end{itemize}
 | |
| 259 | ||
| 260 | Proper semantic names override these conventions completely. For | |
| 261 | example, the left-hand side of an equation (as a term) can be called | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 262 |   @{ML_text lhs} (not @{ML_text lhs_tm}).  Or a term that is known
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 263 |   to be a variable can be called @{ML_text v} or @{ML_text x}.
 | 
| 39881 | 264 | |
| 40310 | 265 |   \item Tactics (\secref{sec:tactics}) are sufficiently important to
 | 
| 266 | have specific naming conventions. The name of a basic tactic | |
| 267 |   definition always has a @{ML_text "_tac"} suffix, the subgoal index
 | |
| 268 |   (if applicable) is always called @{ML_text i}, and the goal state
 | |
| 269 |   (if made explicit) is usually called @{ML_text st} instead of the
 | |
| 270 |   somewhat misleading @{ML_text thm}.  Any other arguments are given
 | |
| 271 | before the latter two, and the general context is given first. | |
| 272 | Example: | |
| 273 | ||
| 274 |   \begin{verbatim}
 | |
| 275 | fun my_tac ctxt arg1 arg2 i st = ... | |
| 276 |   \end{verbatim}
 | |
| 277 | ||
| 278 |   Note that the goal state @{ML_text st} above is rarely made
 | |
| 279 | explicit, if tactic combinators (tacticals) are used as usual. | |
| 280 | ||
| 39881 | 281 |   \end{itemize}
 | 
| 39878 | 282 | *} | 
| 283 | ||
| 284 | ||
| 285 | subsection {* General source layout *}
 | |
| 286 | ||
| 287 | text {* The general Isabelle/ML source layout imitates regular
 | |
| 288 | type-setting to some extent, augmented by the requirements for | |
| 289 | deeply nested expressions that are commonplace in functional | |
| 290 | programming. | |
| 291 | ||
| 292 |   \paragraph{Line length} is 80 characters according to ancient
 | |
| 40126 | 293 | standards, but we allow as much as 100 characters (not | 
| 294 |   more).\footnote{Readability requires to keep the beginning of a line
 | |
| 39881 | 295 | in view while watching its end. Modern wide-screen displays do not | 
| 40126 | 296 | change the way how the human brain works. Sources also need to be | 
| 297 | printable on plain paper with reasonable font-size.} The extra 20 | |
| 39880 | 298 | characters acknowledge the space requirements due to qualified | 
| 299 | library references in Isabelle/ML. | |
| 39878 | 300 | |
| 301 |   \paragraph{White-space} is used to emphasize the structure of
 | |
| 302 | expressions, following mostly standard conventions for mathematical | |
| 303 |   typesetting, as can be seen in plain {\TeX} or {\LaTeX}.  This
 | |
| 39879 | 304 | defines positioning of spaces for parentheses, punctuation, and | 
| 305 | infixes as illustrated here: | |
| 39878 | 306 | |
| 307 |   \begin{verbatim}
 | |
| 308 | val x = y + z * (a + b); | |
| 309 | val pair = (a, b); | |
| 310 |   val record = {foo = 1, bar = 2};
 | |
| 311 |   \end{verbatim}
 | |
| 312 | ||
| 39879 | 313 |   Lines are normally broken \emph{after} an infix operator or
 | 
| 314 | punctuation character. For example: | |
| 39878 | 315 | |
| 316 |   \begin{verbatim}
 | |
| 317 | val x = | |
| 318 | a + | |
| 319 | b + | |
| 320 | c; | |
| 321 | ||
| 322 | val tuple = | |
| 323 | (a, | |
| 324 | b, | |
| 325 | c); | |
| 326 |   \end{verbatim}
 | |
| 327 | ||
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 328 |   Some special infixes (e.g.\ @{ML_text "|>"}) work better at the
 | 
| 39879 | 329 | start of the line, but punctuation is always at the end. | 
| 39878 | 330 | |
| 331 |   Function application follows the tradition of @{text "\<lambda>"}-calculus,
 | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 332 |   not informal mathematics.  For example: @{ML_text "f a b"} for a
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 333 |   curried function, or @{ML_text "g (a, b)"} for a tupled function.
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 334 |   Note that the space between @{ML_text g} and the pair @{ML_text
 | 
| 39879 | 335 | "(a, b)"} follows the important principle of | 
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 336 |   \emph{compositionality}: the layout of @{ML_text "g p"} does not
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 337 |   change when @{ML_text "p"} is refined to the concrete pair
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 338 |   @{ML_text "(a, b)"}.
 | 
| 39878 | 339 | |
| 340 |   \paragraph{Indentation} uses plain spaces, never hard
 | |
| 341 |   tabulators.\footnote{Tabulators were invented to move the carriage
 | |
| 342 | of a type-writer to certain predefined positions. In software they | |
| 343 | could be used as a primitive run-length compression of consecutive | |
| 344 | spaces, but the precise result would depend on non-standardized | |
| 345 | editor configuration.} | |
| 346 | ||
| 39879 | 347 | Each level of nesting is indented by 2 spaces, sometimes 1, very | 
| 40126 | 348 | rarely 4, never 8 or any other odd number. | 
| 39878 | 349 | |
| 39879 | 350 | Indentation follows a simple logical format that only depends on the | 
| 351 | nesting depth, not the accidental length of the text that initiates | |
| 352 | a level of nesting. Example: | |
| 39878 | 353 | |
| 354 |   \begin{verbatim}
 | |
| 39880 | 355 | (* RIGHT *) | 
| 356 | ||
| 39878 | 357 | if b then | 
| 39879 | 358 | expr1_part1 | 
| 359 | expr1_part2 | |
| 39878 | 360 | else | 
| 39879 | 361 | expr2_part1 | 
| 362 | expr2_part2 | |
| 39878 | 363 | |
| 39880 | 364 | |
| 365 | (* WRONG *) | |
| 366 | ||
| 39879 | 367 | if b then expr1_part1 | 
| 368 | expr1_part2 | |
| 369 | else expr2_part1 | |
| 370 | expr2_part2 | |
| 39878 | 371 |   \end{verbatim}
 | 
| 372 | ||
| 373 | The second form has many problems: it assumes a fixed-width font | |
| 39879 | 374 | when viewing the sources, it uses more space on the line and thus | 
| 375 | makes it hard to observe its strict length limit (working against | |
| 39878 | 376 |   \emph{readability}), it requires extra editing to adapt the layout
 | 
| 39879 | 377 | to changes of the initial text (working against | 
| 39878 | 378 |   \emph{maintainability}) etc.
 | 
| 379 | ||
| 39879 | 380 | \medskip For similar reasons, any kind of two-dimensional or tabular | 
| 40126 | 381 | layouts, ASCII-art with lines or boxes of asterisks etc.\ should be | 
| 39879 | 382 | avoided. | 
| 39881 | 383 | |
| 40126 | 384 |   \paragraph{Complex expressions} that consist of multi-clausal
 | 
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 385 |   function definitions, @{ML_text handle}, @{ML_text case},
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 386 |   @{ML_text let} (and combinations) require special attention.  The
 | 
| 40126 | 387 | syntax of Standard ML is quite ambitious and admits a lot of | 
| 388 | variance that can distort the meaning of the text. | |
| 39881 | 389 | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 390 |   Clauses of @{ML_text fun}, @{ML_text fn}, @{ML_text handle},
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 391 |   @{ML_text case} get extra indentation to indicate the nesting
 | 
| 40126 | 392 | clearly. Example: | 
| 39881 | 393 | |
| 394 |   \begin{verbatim}
 | |
| 395 | (* RIGHT *) | |
| 396 | ||
| 397 | fun foo p1 = | |
| 398 | expr1 | |
| 399 | | foo p2 = | |
| 400 | expr2 | |
| 401 | ||
| 402 | ||
| 403 | (* WRONG *) | |
| 404 | ||
| 405 | fun foo p1 = | |
| 406 | expr1 | |
| 407 | | foo p2 = | |
| 408 | expr2 | |
| 409 |   \end{verbatim}
 | |
| 410 | ||
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 411 |   Body expressions consisting of @{ML_text case} or @{ML_text let}
 | 
| 39881 | 412 | require care to maintain compositionality, to prevent loss of | 
| 40126 | 413 | logical indentation where it is especially important to see the | 
| 414 | structure of the text. Example: | |
| 39881 | 415 | |
| 416 |   \begin{verbatim}
 | |
| 417 | (* RIGHT *) | |
| 418 | ||
| 419 | fun foo p1 = | |
| 420 | (case e of | |
| 421 | q1 => ... | |
| 422 | | q2 => ...) | |
| 423 | | foo p2 = | |
| 424 | let | |
| 425 | ... | |
| 426 | in | |
| 427 | ... | |
| 428 | end | |
| 429 | ||
| 430 | ||
| 431 | (* WRONG *) | |
| 432 | ||
| 433 | fun foo p1 = case e of | |
| 434 | q1 => ... | |
| 435 | | q2 => ... | |
| 436 | | foo p2 = | |
| 437 | let | |
| 438 | ... | |
| 439 | in | |
| 440 | ... | |
| 441 | end | |
| 442 |   \end{verbatim}
 | |
| 443 | ||
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 444 |   Extra parentheses around @{ML_text case} expressions are optional,
 | 
| 40126 | 445 | but help to analyse the nesting based on character matching in the | 
| 446 | editor. | |
| 39881 | 447 | |
| 448 | \medskip There are two main exceptions to the overall principle of | |
| 449 | compositionality in the layout of complex expressions. | |
| 450 | ||
| 451 |   \begin{enumerate}
 | |
| 452 | ||
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 453 |   \item @{ML_text "if"} expressions are iterated as if there would be
 | 
| 39881 | 454 | a multi-branch conditional in SML, e.g. | 
| 455 | ||
| 456 |   \begin{verbatim}
 | |
| 457 | (* RIGHT *) | |
| 458 | ||
| 459 | if b1 then e1 | |
| 460 | else if b2 then e2 | |
| 461 | else e3 | |
| 462 |   \end{verbatim}
 | |
| 463 | ||
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 464 |   \item @{ML_text fn} abstractions are often layed-out as if they
 | 
| 39881 | 465 | would lack any structure by themselves. This traditional form is | 
| 466 | motivated by the possibility to shift function arguments back and | |
| 40126 | 467 | forth wrt.\ additional combinators. Example: | 
| 39881 | 468 | |
| 469 |   \begin{verbatim}
 | |
| 470 | (* RIGHT *) | |
| 471 | ||
| 472 | fun foo x y = fold (fn z => | |
| 473 | expr) | |
| 474 |   \end{verbatim}
 | |
| 475 | ||
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 476 |   Here the visual appearance is that of three arguments @{ML_text x},
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 477 |   @{ML_text y}, @{ML_text z}.
 | 
| 39881 | 478 | |
| 479 |   \end{enumerate}
 | |
| 480 | ||
| 481 | Such weakly structured layout should be use with great care. Here | |
| 40153 | 482 |   are some counter-examples involving @{ML_text let} expressions:
 | 
| 39881 | 483 | |
| 484 |   \begin{verbatim}
 | |
| 485 | (* WRONG *) | |
| 486 | ||
| 487 | fun foo x = let | |
| 488 | val y = ... | |
| 489 | in ... end | |
| 490 | ||
| 41162 | 491 | |
| 492 | (* WRONG *) | |
| 493 | ||
| 40153 | 494 | fun foo x = let | 
| 495 | val y = ... | |
| 496 | in ... end | |
| 497 | ||
| 41162 | 498 | |
| 499 | (* WRONG *) | |
| 500 | ||
| 39881 | 501 | fun foo x = | 
| 502 | let | |
| 503 | val y = ... | |
| 504 | in ... end | |
| 505 |   \end{verbatim}
 | |
| 506 | ||
| 507 | \medskip In general the source layout is meant to emphasize the | |
| 508 | structure of complex language expressions, not to pretend that SML | |
| 509 | had a completely different syntax (say that of Haskell or Java). | |
| 39878 | 510 | *} | 
| 511 | ||
| 512 | ||
| 39823 | 513 | section {* SML embedded into Isabelle/Isar *}
 | 
| 514 | ||
| 39824 | 515 | text {* ML and Isar are intertwined via an open-ended bootstrap
 | 
| 516 | process that provides more and more programming facilities and | |
| 517 | logical content in an alternating manner. Bootstrapping starts from | |
| 518 | the raw environment of existing implementations of Standard ML | |
| 519 | (mainly Poly/ML, but also SML/NJ). | |
| 39823 | 520 | |
| 39824 | 521 | Isabelle/Pure marks the point where the original ML toplevel is | 
| 40126 | 522 | superseded by the Isar toplevel that maintains a uniform context for | 
| 523 |   arbitrary ML values (see also \secref{sec:context}).  This formal
 | |
| 524 | environment holds ML compiler bindings, logical entities, and many | |
| 525 | other things. Raw SML is never encountered again after the initial | |
| 526 | bootstrap of Isabelle/Pure. | |
| 39823 | 527 | |
| 40126 | 528 | Object-logics like Isabelle/HOL are built within the | 
| 529 | Isabelle/ML/Isar environment by introducing suitable theories with | |
| 530 | associated ML modules, either inlined or as separate files. Thus | |
| 531 | Isabelle/HOL is defined as a regular user-space application within | |
| 532 | the Isabelle framework. Further add-on tools can be implemented in | |
| 533 | ML within the Isar context in the same manner: ML is part of the | |
| 534 | standard repertoire of Isabelle, and there is no distinction between | |
| 535 | ``user'' and ``developer'' in this respect. | |
| 39823 | 536 | *} | 
| 537 | ||
| 39824 | 538 | |
| 39827 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 539 | subsection {* Isar ML commands *}
 | 
| 39823 | 540 | |
| 40126 | 541 | text {* The primary Isar source language provides facilities to ``open
 | 
| 542 | a window'' to the underlying ML compiler. Especially see the Isar | |
| 51295 | 543 |   commands @{command_ref "ML_file"} and @{command_ref "ML"}: both work the
 | 
| 39824 | 544 | same way, only the source text is provided via a file vs.\ inlined, | 
| 545 | respectively. Apart from embedding ML into the main theory | |
| 546 | definition like that, there are many more commands that refer to ML | |
| 547 |   source, such as @{command_ref setup} or @{command_ref declaration}.
 | |
| 40126 | 548 | Even more fine-grained embedding of ML into Isar is encountered in | 
| 549 |   the proof method @{method_ref tactic}, which refines the pending
 | |
| 550 |   goal state via a given expression of type @{ML_type tactic}.
 | |
| 39824 | 551 | *} | 
| 39823 | 552 | |
| 39824 | 553 | text %mlex {* The following artificial example demonstrates some ML
 | 
| 554 | toplevel declarations within the implicit Isar theory context. This | |
| 555 | is regular functional programming without referring to logical | |
| 556 | entities yet. | |
| 39823 | 557 | *} | 
| 558 | ||
| 559 | ML {*
 | |
| 560 | fun factorial 0 = 1 | |
| 561 | | factorial n = n * factorial (n - 1) | |
| 562 | *} | |
| 563 | ||
| 40126 | 564 | text {* Here the ML environment is already managed by Isabelle, i.e.\
 | 
| 39861 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 wenzelm parents: 
39859diff
changeset | 565 |   the @{ML factorial} function is not yet accessible in the preceding
 | 
| 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 wenzelm parents: 
39859diff
changeset | 566 | paragraph, nor in a different theory that is independent from the | 
| 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 wenzelm parents: 
39859diff
changeset | 567 | current one in the import hierarchy. | 
| 39823 | 568 | |
| 569 | Removing the above ML declaration from the source text will remove | |
| 570 | any trace of this definition as expected. The Isabelle/ML toplevel | |
| 571 |   environment is managed in a \emph{stateless} way: unlike the raw ML
 | |
| 40126 | 572 | toplevel there are no global side-effects involved | 
| 573 |   here.\footnote{Such a stateless compilation environment is also a
 | |
| 574 | prerequisite for robust parallel compilation within independent | |
| 575 | nodes of the implicit theory development graph.} | |
| 39823 | 576 | |
| 40126 | 577 | \medskip The next example shows how to embed ML into Isar proofs, using | 
| 578 |  @{command_ref "ML_prf"} instead of Instead of @{command_ref "ML"}.
 | |
| 579 | As illustrated below, the effect on the ML environment is local to | |
| 580 | the whole proof body, ignoring the block structure. | |
| 581 | *} | |
| 39823 | 582 | |
| 40964 | 583 | notepad | 
| 584 | begin | |
| 39851 | 585 |   ML_prf %"ML" {* val a = 1 *}
 | 
| 40126 | 586 |   {
 | 
| 39851 | 587 |     ML_prf %"ML" {* val b = a + 1 *}
 | 
| 39824 | 588 |   } -- {* Isar block structure ignored by ML environment *}
 | 
| 39851 | 589 |   ML_prf %"ML" {* val c = b + 1 *}
 | 
| 40964 | 590 | end | 
| 39823 | 591 | |
| 39861 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 wenzelm parents: 
39859diff
changeset | 592 | text {* By side-stepping the normal scoping rules for Isar proof
 | 
| 40126 | 593 | blocks, embedded ML code can refer to the different contexts and | 
| 594 | manipulate corresponding entities, e.g.\ export a fact from a block | |
| 595 | context. | |
| 39823 | 596 | |
| 39861 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 wenzelm parents: 
39859diff
changeset | 597 | \medskip Two further ML commands are useful in certain situations: | 
| 40126 | 598 |   @{command_ref ML_val} and @{command_ref ML_command} are
 | 
| 39824 | 599 |   \emph{diagnostic} in the sense that there is no effect on the
 | 
| 600 | underlying environment, and can thus used anywhere (even outside a | |
| 601 | theory). The examples below produce long strings of digits by | |
| 602 |   invoking @{ML factorial}: @{command ML_val} already takes care of
 | |
| 603 |   printing the ML toplevel result, but @{command ML_command} is silent
 | |
| 39861 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 wenzelm parents: 
39859diff
changeset | 604 | so we produce an explicit output message. *} | 
| 39823 | 605 | |
| 606 | ML_val {* factorial 100 *}
 | |
| 607 | ML_command {* writeln (string_of_int (factorial 100)) *}
 | |
| 608 | ||
| 40964 | 609 | notepad | 
| 610 | begin | |
| 52417 | 611 |   ML_val {* factorial 100 *}
 | 
| 39823 | 612 |   ML_command {* writeln (string_of_int (factorial 100)) *}
 | 
| 40964 | 613 | end | 
| 39823 | 614 | |
| 615 | ||
| 39827 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 616 | subsection {* Compile-time context *}
 | 
| 39823 | 617 | |
| 39825 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 618 | text {* Whenever the ML compiler is invoked within Isabelle/Isar, the
 | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 619 | formal context is passed as a thread-local reference variable. Thus | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 620 | ML code may access the theory context during compilation, by reading | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 621 | or writing the (local) theory under construction. Note that such | 
| 40126 | 622 | direct access to the compile-time context is rare. In practice it | 
| 623 | is typically done via some derived ML functions instead. | |
| 39825 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 624 | *} | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 625 | |
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 626 | text %mlref {*
 | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 627 |   \begin{mldecls}
 | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 628 |   @{index_ML ML_Context.the_generic_context: "unit -> Context.generic"} \\
 | 
| 40126 | 629 |   @{index_ML "Context.>>": "(Context.generic -> Context.generic) -> unit"} \\
 | 
| 56199 | 630 |   @{index_ML ML_Thms.bind_thms: "string * thm list -> unit"} \\
 | 
| 631 |   @{index_ML ML_Thms.bind_thm: "string * thm -> unit"} \\
 | |
| 39825 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 632 |   \end{mldecls}
 | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 633 | |
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 634 |   \begin{description}
 | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 635 | |
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 636 |   \item @{ML "ML_Context.the_generic_context ()"} refers to the theory
 | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 637 | context of the ML toplevel --- at compile time. ML code needs to | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 638 |   take care to refer to @{ML "ML_Context.the_generic_context ()"}
 | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 639 | correctly. Recall that evaluation of a function body is delayed | 
| 39827 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 640 | until actual run-time. | 
| 39825 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 641 | |
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 642 |   \item @{ML "Context.>>"}~@{text f} applies context transformation
 | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 643 |   @{text f} to the implicit context of the ML toplevel.
 | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 644 | |
| 56199 | 645 |   \item @{ML ML_Thms.bind_thms}~@{text "(name, thms)"} stores a list of
 | 
| 39850 | 646 | theorems produced in ML both in the (global) theory context and the | 
| 647 | ML toplevel, associating it with the provided name. Theorems are | |
| 648 | put into a global ``standard'' format before being stored. | |
| 649 | ||
| 56199 | 650 |   \item @{ML ML_Thms.bind_thm} is similar to @{ML ML_Thms.bind_thms} but refers to a
 | 
| 40126 | 651 | singleton fact. | 
| 39850 | 652 | |
| 39825 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 653 |   \end{description}
 | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 654 | |
| 40126 | 655 | It is important to note that the above functions are really | 
| 39825 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 656 | restricted to the compile time, even though the ML compiler is | 
| 39827 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 657 | invoked at run-time. The majority of ML code either uses static | 
| 39825 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 658 |   antiquotations (\secref{sec:ML-antiq}) or refers to the theory or
 | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 659 | proof context at run-time, by explicit functional abstraction. | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 660 | *} | 
| 39823 | 661 | |
| 662 | ||
| 39827 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 663 | subsection {* Antiquotations \label{sec:ML-antiq} *}
 | 
| 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 664 | |
| 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 665 | text {* A very important consequence of embedding SML into Isar is the
 | 
| 40126 | 666 |   concept of \emph{ML antiquotation}.  The standard token language of
 | 
| 667 | ML is augmented by special syntactic entities of the following form: | |
| 39827 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 668 | |
| 55112 
b1a5d603fd12
prefer rail cartouche -- avoid back-slashed quotes;
 wenzelm parents: 
54703diff
changeset | 669 |   @{rail \<open>
 | 
| 53167 | 670 |   @{syntax_def antiquote}: '@{' nameref args '}'
 | 
| 55112 
b1a5d603fd12
prefer rail cartouche -- avoid back-slashed quotes;
 wenzelm parents: 
54703diff
changeset | 671 | \<close>} | 
| 39827 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 672 | |
| 39861 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 wenzelm parents: 
39859diff
changeset | 673 |   Here @{syntax nameref} and @{syntax args} are regular outer syntax
 | 
| 40126 | 674 |   categories \cite{isabelle-isar-ref}.  Attributes and proof methods
 | 
| 675 | use similar syntax. | |
| 39823 | 676 | |
| 39827 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 677 |   \medskip A regular antiquotation @{text "@{name args}"} processes
 | 
| 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 678 | its arguments by the usual means of the Isar source language, and | 
| 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 679 | produces corresponding ML source text, either as literal | 
| 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 680 |   \emph{inline} text (e.g. @{text "@{term t}"}) or abstract
 | 
| 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 681 |   \emph{value} (e.g. @{text "@{thm th}"}).  This pre-compilation
 | 
| 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 682 | scheme allows to refer to formal entities in a robust manner, with | 
| 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 683 | proper static scoping and with some degree of logical checking of | 
| 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 684 | small portions of the code. | 
| 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 685 | *} | 
| 39823 | 686 | |
| 39835 | 687 | |
| 51636 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 688 | subsection {* Printing ML values *}
 | 
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 689 | |
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 690 | text {* The ML compiler knows about the structure of values according
 | 
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 691 | to their static type, and can print them in the manner of the | 
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 692 | toplevel loop, although the details are non-portable. The | 
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 693 |   antiquotation @{ML_antiquotation_def "make_string"} provides a
 | 
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 694 | quasi-portable way to refer to this potential capability of the | 
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 695 | underlying ML system in generic Isabelle/ML sources. *} | 
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 696 | |
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 697 | text %mlantiq {*
 | 
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 698 |   \begin{matharray}{rcl}
 | 
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 699 |   @{ML_antiquotation_def "make_string"} & : & @{text ML_antiquotation} \\
 | 
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 700 |   \end{matharray}
 | 
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 701 | |
| 55112 
b1a5d603fd12
prefer rail cartouche -- avoid back-slashed quotes;
 wenzelm parents: 
54703diff
changeset | 702 |   @{rail \<open>
 | 
| 51636 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 703 |   @@{ML_antiquotation make_string}
 | 
| 55112 
b1a5d603fd12
prefer rail cartouche -- avoid back-slashed quotes;
 wenzelm parents: 
54703diff
changeset | 704 | \<close>} | 
| 51636 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 705 | |
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 706 |   \begin{description}
 | 
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 707 | |
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 708 |   \item @{text "@{make_string}"} inlines a function to print arbitrary
 | 
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 709 | values similar to the ML toplevel. The result is compiler dependent | 
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 710 | and may fall back on "?" in certain situations. | 
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 711 | |
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 712 |   \end{description}
 | 
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 713 | *} | 
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 714 | |
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 715 | text %mlex {* The following artificial example shows how to produce
 | 
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 716 | ad-hoc output of ML values for debugging purposes. This should not | 
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 717 | be done in production code, and not persist in regular Isabelle/ML | 
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 718 | sources. *} | 
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 719 | |
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 720 | ML {*
 | 
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 721 | val x = 42; | 
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 722 | val y = true; | 
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 723 | |
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 724 |   writeln (@{make_string} {x = x, y = y});
 | 
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 725 | *} | 
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 726 | |
| 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 wenzelm parents: 
51295diff
changeset | 727 | |
| 39883 | 728 | section {* Canonical argument order \label{sec:canonical-argument-order} *}
 | 
| 729 | ||
| 730 | text {* Standard ML is a language in the tradition of @{text
 | |
| 731 |   "\<lambda>"}-calculus and \emph{higher-order functional programming},
 | |
| 732 | similar to OCaml, Haskell, or Isabelle/Pure and HOL as logical | |
| 733 | languages. Getting acquainted with the native style of representing | |
| 734 | functions in that setting can save a lot of extra boiler-plate of | |
| 735 | redundant shuffling of arguments, auxiliary abstractions etc. | |
| 736 | ||
| 40126 | 737 |   Functions are usually \emph{curried}: the idea of turning arguments
 | 
| 738 |   of type @{text "\<tau>\<^sub>i"} (for @{text "i \<in> {1, \<dots> n}"}) into a result of
 | |
| 739 |   type @{text "\<tau>"} is represented by the iterated function space
 | |
| 740 |   @{text "\<tau>\<^sub>1 \<rightarrow> \<dots> \<rightarrow> \<tau>\<^sub>n \<rightarrow> \<tau>"}.  This is isomorphic to the well-known
 | |
| 741 |   encoding via tuples @{text "\<tau>\<^sub>1 \<times> \<dots> \<times> \<tau>\<^sub>n \<rightarrow> \<tau>"}, but the curried
 | |
| 742 |   version fits more smoothly into the basic calculus.\footnote{The
 | |
| 743 | difference is even more significant in higher-order logic, because | |
| 744 | the redundant tuple structure needs to be accommodated by formal | |
| 745 | reasoning.} | |
| 39883 | 746 | |
| 747 |   Currying gives some flexiblity due to \emph{partial application}.  A
 | |
| 53071 | 748 |   function @{text "f: \<tau>\<^sub>1 \<rightarrow> \<tau>\<^sub>2 \<rightarrow> \<tau>"} can be applied to @{text "x: \<tau>\<^sub>1"}
 | 
| 40126 | 749 |   and the remaining @{text "(f x): \<tau>\<^sub>2 \<rightarrow> \<tau>"} passed to another function
 | 
| 39883 | 750 | etc. How well this works in practice depends on the order of | 
| 751 | arguments. In the worst case, arguments are arranged erratically, | |
| 752 | and using a function in a certain situation always requires some | |
| 753 | glue code. Thus we would get exponentially many oppurtunities to | |
| 754 | decorate the code with meaningless permutations of arguments. | |
| 755 | ||
| 756 |   This can be avoided by \emph{canonical argument order}, which
 | |
| 40126 | 757 | observes certain standard patterns and minimizes adhoc permutations | 
| 40229 | 758 | in their application. In Isabelle/ML, large portions of text can be | 
| 52416 | 759 |   written without auxiliary operations like @{text "swap: \<alpha> \<times> \<beta> \<rightarrow> \<beta> \<times>
 | 
| 760 |   \<alpha>"} or @{text "C: (\<alpha> \<rightarrow> \<beta> \<rightarrow> \<gamma>) \<rightarrow> (\<beta> \<rightarrow> \<alpha> \<rightarrow> \<gamma>)"} (the latter not
 | |
| 761 | present in the Isabelle/ML library). | |
| 39883 | 762 | |
| 763 | \medskip The basic idea is that arguments that vary less are moved | |
| 764 | further to the left than those that vary more. Two particularly | |
| 765 |   important categories of functions are \emph{selectors} and
 | |
| 766 |   \emph{updates}.
 | |
| 767 | ||
| 768 | The subsequent scheme is based on a hypothetical set-like container | |
| 769 |   of type @{text "\<beta>"} that manages elements of type @{text "\<alpha>"}.  Both
 | |
| 770 | the names and types of the associated operations are canonical for | |
| 771 | Isabelle/ML. | |
| 772 | ||
| 52416 | 773 |   \begin{center}
 | 
| 39883 | 774 |   \begin{tabular}{ll}
 | 
| 775 | kind & canonical name and type \\\hline | |
| 776 |   selector & @{text "member: \<beta> \<rightarrow> \<alpha> \<rightarrow> bool"} \\
 | |
| 777 |   update & @{text "insert: \<alpha> \<rightarrow> \<beta> \<rightarrow> \<beta>"} \\
 | |
| 778 |   \end{tabular}
 | |
| 52416 | 779 |   \end{center}
 | 
| 39883 | 780 | |
| 781 |   Given a container @{text "B: \<beta>"}, the partially applied @{text
 | |
| 782 |   "member B"} is a predicate over elements @{text "\<alpha> \<rightarrow> bool"}, and
 | |
| 783 | thus represents the intended denotation directly. It is customary | |
| 784 | to pass the abstract predicate to further operations, not the | |
| 785 | concrete container. The argument order makes it easy to use other | |
| 786 |   combinators: @{text "forall (member B) list"} will check a list of
 | |
| 787 |   elements for membership in @{text "B"} etc. Often the explicit
 | |
| 40126 | 788 |   @{text "list"} is pointless and can be contracted to @{text "forall
 | 
| 789 | (member B)"} to get directly a predicate again. | |
| 39883 | 790 | |
| 40126 | 791 | In contrast, an update operation varies the container, so it moves | 
| 39883 | 792 |   to the right: @{text "insert a"} is a function @{text "\<beta> \<rightarrow> \<beta>"} to
 | 
| 793 |   insert a value @{text "a"}.  These can be composed naturally as
 | |
| 40126 | 794 |   @{text "insert c \<circ> insert b \<circ> insert a"}.  The slightly awkward
 | 
| 40229 | 795 | inversion of the composition order is due to conventional | 
| 40126 | 796 | mathematical notation, which can be easily amended as explained | 
| 797 | below. | |
| 39883 | 798 | *} | 
| 799 | ||
| 800 | ||
| 801 | subsection {* Forward application and composition *}
 | |
| 802 | ||
| 803 | text {* Regular function application and infix notation works best for
 | |
| 804 |   relatively deeply structured expressions, e.g.\ @{text "h (f x y + g
 | |
| 40126 | 805 |   z)"}.  The important special case of \emph{linear transformation}
 | 
| 806 |   applies a cascade of functions @{text "f\<^sub>n (\<dots> (f\<^sub>1 x))"}.  This
 | |
| 807 | becomes hard to read and maintain if the functions are themselves | |
| 808 | given as complex expressions. The notation can be significantly | |
| 39883 | 809 |   improved by introducing \emph{forward} versions of application and
 | 
| 810 | composition as follows: | |
| 811 | ||
| 812 | \medskip | |
| 813 |   \begin{tabular}{lll}
 | |
| 814 |   @{text "x |> f"} & @{text "\<equiv>"} & @{text "f x"} \\
 | |
| 41162 | 815 |   @{text "(f #> g) x"} & @{text "\<equiv>"} & @{text "x |> f |> g"} \\
 | 
| 39883 | 816 |   \end{tabular}
 | 
| 817 | \medskip | |
| 818 | ||
| 819 |   This enables to write conveniently @{text "x |> f\<^sub>1 |> \<dots> |> f\<^sub>n"} or
 | |
| 820 |   @{text "f\<^sub>1 #> \<dots> #> f\<^sub>n"} for its functional abstraction over @{text
 | |
| 821 | "x"}. | |
| 822 | ||
| 823 | \medskip There is an additional set of combinators to accommodate | |
| 824 | multiple results (via pairs) that are passed on as multiple | |
| 825 | arguments (via currying). | |
| 826 | ||
| 827 | \medskip | |
| 828 |   \begin{tabular}{lll}
 | |
| 829 |   @{text "(x, y) |-> f"} & @{text "\<equiv>"} & @{text "f x y"} \\
 | |
| 41162 | 830 |   @{text "(f #-> g) x"} & @{text "\<equiv>"} & @{text "x |> f |-> g"} \\
 | 
| 39883 | 831 |   \end{tabular}
 | 
| 832 | \medskip | |
| 833 | *} | |
| 834 | ||
| 835 | text %mlref {*
 | |
| 836 |   \begin{mldecls}
 | |
| 46262 | 837 |   @{index_ML_op "|> ": "'a * ('a -> 'b) -> 'b"} \\
 | 
| 838 |   @{index_ML_op "|-> ": "('c * 'a) * ('c -> 'a -> 'b) -> 'b"} \\
 | |
| 839 |   @{index_ML_op "#> ": "('a -> 'b) * ('b -> 'c) -> 'a -> 'c"} \\
 | |
| 840 |   @{index_ML_op "#-> ": "('a -> 'c * 'b) * ('c -> 'b -> 'd) -> 'a -> 'd"} \\
 | |
| 39883 | 841 |   \end{mldecls}
 | 
| 842 | *} | |
| 843 | ||
| 844 | ||
| 845 | subsection {* Canonical iteration *}
 | |
| 846 | ||
| 847 | text {* As explained above, a function @{text "f: \<alpha> \<rightarrow> \<beta> \<rightarrow> \<beta>"} can be
 | |
| 40126 | 848 |   understood as update on a configuration of type @{text "\<beta>"},
 | 
| 39883 | 849 |   parametrized by arguments of type @{text "\<alpha>"}.  Given @{text "a: \<alpha>"}
 | 
| 850 |   the partial application @{text "(f a): \<beta> \<rightarrow> \<beta>"} operates
 | |
| 851 |   homogeneously on @{text "\<beta>"}.  This can be iterated naturally over a
 | |
| 53071 | 852 |   list of parameters @{text "[a\<^sub>1, \<dots>, a\<^sub>n]"} as @{text "f a\<^sub>1 #> \<dots> #> f a\<^sub>n"}.
 | 
| 853 |   The latter expression is again a function @{text "\<beta> \<rightarrow> \<beta>"}.
 | |
| 39883 | 854 |   It can be applied to an initial configuration @{text "b: \<beta>"} to
 | 
| 855 |   start the iteration over the given list of arguments: each @{text
 | |
| 856 |   "a"} in @{text "a\<^sub>1, \<dots>, a\<^sub>n"} is applied consecutively by updating a
 | |
| 857 | cumulative configuration. | |
| 858 | ||
| 859 |   The @{text fold} combinator in Isabelle/ML lifts a function @{text
 | |
| 860 | "f"} as above to its iterated version over a list of arguments. | |
| 861 |   Lifting can be repeated, e.g.\ @{text "(fold \<circ> fold) f"} iterates
 | |
| 862 | over a list of lists as expected. | |
| 863 | ||
| 864 |   The variant @{text "fold_rev"} works inside-out over the list of
 | |
| 865 |   arguments, such that @{text "fold_rev f \<equiv> fold f \<circ> rev"} holds.
 | |
| 866 | ||
| 867 |   The @{text "fold_map"} combinator essentially performs @{text
 | |
| 868 |   "fold"} and @{text "map"} simultaneously: each application of @{text
 | |
| 869 | "f"} produces an updated configuration together with a side-result; | |
| 870 | the iteration collects all such side-results as a separate list. | |
| 871 | *} | |
| 872 | ||
| 873 | text %mlref {*
 | |
| 874 |   \begin{mldecls}
 | |
| 875 |   @{index_ML fold: "('a -> 'b -> 'b) -> 'a list -> 'b -> 'b"} \\
 | |
| 876 |   @{index_ML fold_rev: "('a -> 'b -> 'b) -> 'a list -> 'b -> 'b"} \\
 | |
| 877 |   @{index_ML fold_map: "('a -> 'b -> 'c * 'b) -> 'a list -> 'b -> 'c list * 'b"} \\
 | |
| 878 |   \end{mldecls}
 | |
| 879 | ||
| 880 |   \begin{description}
 | |
| 881 | ||
| 882 |   \item @{ML fold}~@{text f} lifts the parametrized update function
 | |
| 883 |   @{text "f"} to a list of parameters.
 | |
| 884 | ||
| 885 |   \item @{ML fold_rev}~@{text "f"} is similar to @{ML fold}~@{text
 | |
| 886 | "f"}, but works inside-out. | |
| 887 | ||
| 888 |   \item @{ML fold_map}~@{text "f"} lifts the parametrized update
 | |
| 889 |   function @{text "f"} (with side-result) to a list of parameters and
 | |
| 890 | cumulative side-results. | |
| 891 | ||
| 892 |   \end{description}
 | |
| 893 | ||
| 894 |   \begin{warn}
 | |
| 895 | The literature on functional programming provides a multitude of | |
| 896 |   combinators called @{text "foldl"}, @{text "foldr"} etc.  SML97
 | |
| 897 |   provides its own variations as @{ML List.foldl} and @{ML
 | |
| 40126 | 898 | List.foldr}, while the classic Isabelle library also has the | 
| 899 |   historic @{ML Library.foldl} and @{ML Library.foldr}.  To avoid
 | |
| 52416 | 900 | unnecessary complication and confusion, all these historical | 
| 901 |   versions should be ignored, and @{ML fold} (or @{ML fold_rev}) used
 | |
| 902 | exclusively. | |
| 39883 | 903 |   \end{warn}
 | 
| 904 | *} | |
| 905 | ||
| 906 | text %mlex {* The following example shows how to fill a text buffer
 | |
| 907 | incrementally by adding strings, either individually or from a given | |
| 908 | list. | |
| 909 | *} | |
| 910 | ||
| 911 | ML {*
 | |
| 912 | val s = | |
| 913 | Buffer.empty | |
| 914 | |> Buffer.add "digits: " | |
| 915 | |> fold (Buffer.add o string_of_int) (0 upto 9) | |
| 916 | |> Buffer.content; | |
| 917 | ||
| 918 |   @{assert} (s = "digits: 0123456789");
 | |
| 919 | *} | |
| 920 | ||
| 921 | text {* Note how @{ML "fold (Buffer.add o string_of_int)"} above saves
 | |
| 922 |   an extra @{ML "map"} over the given list.  This kind of peephole
 | |
| 923 | optimization reduces both the code size and the tree structures in | |
| 52416 | 924 | memory (``deforestation''), but it requires some practice to read | 
| 925 | and write fluently. | |
| 39883 | 926 | |
| 40126 | 927 | \medskip The next example elaborates the idea of canonical | 
| 928 | iteration, demonstrating fast accumulation of tree content using a | |
| 929 | text buffer. | |
| 39883 | 930 | *} | 
| 931 | ||
| 932 | ML {*
 | |
| 933 | datatype tree = Text of string | Elem of string * tree list; | |
| 934 | ||
| 935 | fun slow_content (Text txt) = txt | |
| 936 | | slow_content (Elem (name, ts)) = | |
| 937 | "<" ^ name ^ ">" ^ | |
| 938 | implode (map slow_content ts) ^ | |
| 939 | "</" ^ name ^ ">" | |
| 940 | ||
| 941 | fun add_content (Text txt) = Buffer.add txt | |
| 942 | | add_content (Elem (name, ts)) = | |
| 943 |         Buffer.add ("<" ^ name ^ ">") #>
 | |
| 944 | fold add_content ts #> | |
| 945 |         Buffer.add ("</" ^ name ^ ">");
 | |
| 946 | ||
| 947 | fun fast_content tree = | |
| 948 | Buffer.empty |> add_content tree |> Buffer.content; | |
| 949 | *} | |
| 950 | ||
| 951 | text {* The slow part of @{ML slow_content} is the @{ML implode} of
 | |
| 952 | the recursive results, because it copies previously produced strings | |
| 40126 | 953 | again. | 
| 39883 | 954 | |
| 955 |   The incremental @{ML add_content} avoids this by operating on a
 | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 956 |   buffer that is passed through in a linear fashion.  Using @{ML_text
 | 
| 40126 | 957 | "#>"} and contraction over the actual buffer argument saves some | 
| 958 |   additional boiler-plate.  Of course, the two @{ML "Buffer.add"}
 | |
| 959 | invocations with concatenated strings could have been split into | |
| 960 | smaller parts, but this would have obfuscated the source without | |
| 961 | making a big difference in allocations. Here we have done some | |
| 39883 | 962 | peephole-optimization for the sake of readability. | 
| 963 | ||
| 964 |   Another benefit of @{ML add_content} is its ``open'' form as a
 | |
| 40126 | 965 | function on buffers that can be continued in further linear | 
| 966 | transformations, folding etc. Thus it is more compositional than | |
| 967 |   the naive @{ML slow_content}.  As realistic example, compare the
 | |
| 968 |   old-style @{ML "Term.maxidx_of_term: term -> int"} with the newer
 | |
| 969 |   @{ML "Term.maxidx_term: term -> int -> int"} in Isabelle/Pure.
 | |
| 39883 | 970 | |
| 40126 | 971 |   Note that @{ML fast_content} above is only defined as example.  In
 | 
| 972 | many practical situations, it is customary to provide the | |
| 973 |   incremental @{ML add_content} only and leave the initialization and
 | |
| 974 | termination to the concrete application by the user. | |
| 39883 | 975 | *} | 
| 976 | ||
| 977 | ||
| 39854 | 978 | section {* Message output channels \label{sec:message-channels} *}
 | 
| 39835 | 979 | |
| 980 | text {* Isabelle provides output channels for different kinds of
 | |
| 981 | messages: regular output, high-volume tracing information, warnings, | |
| 982 | and errors. | |
| 983 | ||
| 984 | Depending on the user interface involved, these messages may appear | |
| 985 | in different text styles or colours. The standard output for | |
| 986 |   terminal sessions prefixes each line of warnings by @{verbatim
 | |
| 987 |   "###"} and errors by @{verbatim "***"}, but leaves anything else
 | |
| 988 | unchanged. | |
| 989 | ||
| 990 | Messages are associated with the transaction context of the running | |
| 991 | Isar command. This enables the front-end to manage commands and | |
| 992 | resulting messages together. For example, after deleting a command | |
| 993 | from a given theory document version, the corresponding message | |
| 39872 | 994 | output can be retracted from the display. | 
| 995 | *} | |
| 39835 | 996 | |
| 997 | text %mlref {*
 | |
| 998 |   \begin{mldecls}
 | |
| 999 |   @{index_ML writeln: "string -> unit"} \\
 | |
| 1000 |   @{index_ML tracing: "string -> unit"} \\
 | |
| 1001 |   @{index_ML warning: "string -> unit"} \\
 | |
| 1002 |   @{index_ML error: "string -> 'a"} \\
 | |
| 1003 |   \end{mldecls}
 | |
| 1004 | ||
| 1005 |   \begin{description}
 | |
| 1006 | ||
| 1007 |   \item @{ML writeln}~@{text "text"} outputs @{text "text"} as regular
 | |
| 1008 | message. This is the primary message output operation of Isabelle | |
| 1009 | and should be used by default. | |
| 1010 | ||
| 1011 |   \item @{ML tracing}~@{text "text"} outputs @{text "text"} as special
 | |
| 1012 | tracing message, indicating potential high-volume output to the | |
| 1013 | front-end (hundreds or thousands of messages issued by a single | |
| 1014 | command). The idea is to allow the user-interface to downgrade the | |
| 1015 | quality of message display to achieve higher throughput. | |
| 1016 | ||
| 1017 | Note that the user might have to take special actions to see tracing | |
| 1018 | output, e.g.\ switch to a different output window. So this channel | |
| 1019 | should not be used for regular output. | |
| 1020 | ||
| 1021 |   \item @{ML warning}~@{text "text"} outputs @{text "text"} as
 | |
| 1022 | warning, which typically means some extra emphasis on the front-end | |
| 40126 | 1023 | side (color highlighting, icons, etc.). | 
| 39835 | 1024 | |
| 1025 |   \item @{ML error}~@{text "text"} raises exception @{ML ERROR}~@{text
 | |
| 1026 |   "text"} and thus lets the Isar toplevel print @{text "text"} on the
 | |
| 1027 | error channel, which typically means some extra emphasis on the | |
| 40126 | 1028 | front-end side (color highlighting, icons, etc.). | 
| 39835 | 1029 | |
| 1030 | This assumes that the exception is not handled before the command | |
| 1031 |   terminates.  Handling exception @{ML ERROR}~@{text "text"} is a
 | |
| 1032 | perfectly legal alternative: it means that the error is absorbed | |
| 1033 | without any message output. | |
| 1034 | ||
| 39861 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 wenzelm parents: 
39859diff
changeset | 1035 |   \begin{warn}
 | 
| 54387 | 1036 |   The actual error channel is accessed via @{ML Output.error_message}, but
 | 
| 51058 | 1037 |   the old interaction protocol of Proof~General \emph{crashes} if that
 | 
| 39835 | 1038 | function is used in regular ML code: error output and toplevel | 
| 52416 | 1039 | command failure always need to coincide in classic TTY interaction. | 
| 39861 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 wenzelm parents: 
39859diff
changeset | 1040 |   \end{warn}
 | 
| 39835 | 1041 | |
| 39861 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 wenzelm parents: 
39859diff
changeset | 1042 |   \end{description}
 | 
| 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 wenzelm parents: 
39859diff
changeset | 1043 | |
| 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 wenzelm parents: 
39859diff
changeset | 1044 |   \begin{warn}
 | 
| 39835 | 1045 | Regular Isabelle/ML code should output messages exclusively by the | 
| 1046 |   official channels.  Using raw I/O on \emph{stdout} or \emph{stderr}
 | |
| 1047 |   instead (e.g.\ via @{ML TextIO.output}) is apt to cause problems in
 | |
| 1048 | the presence of parallel and asynchronous processing of Isabelle | |
| 1049 | theories. Such raw output might be displayed by the front-end in | |
| 1050 | some system console log, with a low chance that the user will ever | |
| 1051 | see it. Moreover, as a genuine side-effect on global process | |
| 1052 | channels, there is no proper way to retract output when Isar command | |
| 40126 | 1053 | transactions are reset by the system. | 
| 39861 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 wenzelm parents: 
39859diff
changeset | 1054 |   \end{warn}
 | 
| 39872 | 1055 | |
| 1056 |   \begin{warn}
 | |
| 1057 | The message channels should be used in a message-oriented manner. | |
| 40126 | 1058 | This means that multi-line output that logically belongs together is | 
| 1059 |   issued by a \emph{single} invocation of @{ML writeln} etc.\ with the
 | |
| 1060 | functional concatenation of all message constituents. | |
| 39872 | 1061 |   \end{warn}
 | 
| 1062 | *} | |
| 1063 | ||
| 1064 | text %mlex {* The following example demonstrates a multi-line
 | |
| 1065 | warning. Note that in some situations the user sees only the first | |
| 1066 | line, so the most important point should be made first. | |
| 1067 | *} | |
| 1068 | ||
| 1069 | ML_command {*
 | |
| 1070 | warning (cat_lines | |
| 1071 | ["Beware the Jabberwock, my son!", | |
| 1072 | "The jaws that bite, the claws that catch!", | |
| 1073 | "Beware the Jubjub Bird, and shun", | |
| 1074 | "The frumious Bandersnatch!"]); | |
| 39835 | 1075 | *} | 
| 1076 | ||
| 39854 | 1077 | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1078 | section {* Exceptions \label{sec:exceptions} *}
 | 
| 39854 | 1079 | |
| 1080 | text {* The Standard ML semantics of strict functional evaluation
 | |
| 1081 | together with exceptions is rather well defined, but some delicate | |
| 1082 | points need to be observed to avoid that ML programs go wrong | |
| 1083 | despite static type-checking. Exceptions in Isabelle/ML are | |
| 1084 | subsequently categorized as follows. | |
| 1085 | ||
| 1086 |   \paragraph{Regular user errors.}  These are meant to provide
 | |
| 1087 | informative feedback about malformed input etc. | |
| 1088 | ||
| 1089 |   The \emph{error} function raises the corresponding \emph{ERROR}
 | |
| 1090 |   exception, with a plain text message as argument.  \emph{ERROR}
 | |
| 1091 | exceptions can be handled internally, in order to be ignored, turned | |
| 1092 | into other exceptions, or cascaded by appending messages. If the | |
| 1093 |   corresponding Isabelle/Isar command terminates with an \emph{ERROR}
 | |
| 39855 | 1094 | exception state, the toplevel will print the result on the error | 
| 1095 |   channel (see \secref{sec:message-channels}).
 | |
| 39854 | 1096 | |
| 1097 | It is considered bad style to refer to internal function names or | |
| 1098 | values in ML source notation in user error messages. | |
| 1099 | ||
| 1100 | Grammatical correctness of error messages can be improved by | |
| 1101 |   \emph{omitting} final punctuation: messages are often concatenated
 | |
| 1102 | or put into a larger context (e.g.\ augmented with source position). | |
| 1103 | By not insisting in the final word at the origin of the error, the | |
| 1104 | system can perform its administrative tasks more easily and | |
| 1105 | robustly. | |
| 1106 | ||
| 1107 |   \paragraph{Program failures.}  There is a handful of standard
 | |
| 1108 | exceptions that indicate general failure situations, or failures of | |
| 1109 | core operations on logical entities (types, terms, theorems, | |
| 39856 | 1110 |   theories, see \chref{ch:logic}).
 | 
| 39854 | 1111 | |
| 1112 | These exceptions indicate a genuine breakdown of the program, so the | |
| 1113 | main purpose is to determine quickly what has happened where. | |
| 39855 | 1114 | Traditionally, the (short) exception message would include the name | 
| 40126 | 1115 | of an ML function, although this is no longer necessary, because the | 
| 1116 | ML runtime system prints a detailed source position of the | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 1117 |   corresponding @{ML_text raise} keyword.
 | 
| 39854 | 1118 | |
| 1119 | \medskip User modules can always introduce their own custom | |
| 1120 | exceptions locally, e.g.\ to organize internal failures robustly | |
| 1121 | without overlapping with existing exceptions. Exceptions that are | |
| 1122 | exposed in module signatures require extra care, though, and should | |
| 40126 | 1123 |   \emph{not} be introduced by default.  Surprise by users of a module
 | 
| 1124 | can be often minimized by using plain user errors instead. | |
| 39854 | 1125 | |
| 1126 |   \paragraph{Interrupts.}  These indicate arbitrary system events:
 | |
| 1127 | both the ML runtime system and the Isabelle/ML infrastructure signal | |
| 1128 | various exceptional situations by raising the special | |
| 1129 |   \emph{Interrupt} exception in user code.
 | |
| 1130 | ||
| 1131 | This is the one and only way that physical events can intrude an | |
| 1132 | Isabelle/ML program. Such an interrupt can mean out-of-memory, | |
| 1133 | stack overflow, timeout, internal signaling of threads, or the user | |
| 39855 | 1134 | producing a console interrupt manually etc. An Isabelle/ML program | 
| 1135 | that intercepts interrupts becomes dependent on physical effects of | |
| 1136 | the environment. Even worse, exception handling patterns that are | |
| 1137 | too general by accident, e.g.\ by mispelled exception constructors, | |
| 40126 | 1138 | will cover interrupts unintentionally and thus render the program | 
| 39855 | 1139 | semantics ill-defined. | 
| 39854 | 1140 | |
| 1141 | Note that the Interrupt exception dates back to the original SML90 | |
| 1142 | language definition. It was excluded from the SML97 version to | |
| 1143 | avoid its malign impact on ML program semantics, but without | |
| 1144 | providing a viable alternative. Isabelle/ML recovers physical | |
| 40229 | 1145 | interruptibility (which is an indispensable tool to implement | 
| 1146 | managed evaluation of command transactions), but requires user code | |
| 1147 | to be strictly transparent wrt.\ interrupts. | |
| 39854 | 1148 | |
| 1149 |   \begin{warn}
 | |
| 1150 | Isabelle/ML user code needs to terminate promptly on interruption, | |
| 1151 | without guessing at its meaning to the system infrastructure. | |
| 1152 | Temporary handling of interrupts for cleanup of global resources | |
| 1153 | etc.\ needs to be followed immediately by re-raising of the original | |
| 1154 | exception. | |
| 1155 |   \end{warn}
 | |
| 1156 | *} | |
| 1157 | ||
| 39855 | 1158 | text %mlref {*
 | 
| 1159 |   \begin{mldecls}
 | |
| 1160 |   @{index_ML try: "('a -> 'b) -> 'a -> 'b option"} \\
 | |
| 1161 |   @{index_ML can: "('a -> 'b) -> 'a -> bool"} \\
 | |
| 55838 | 1162 |   @{index_ML_exception ERROR: string} \\
 | 
| 1163 |   @{index_ML_exception Fail: string} \\
 | |
| 39856 | 1164 |   @{index_ML Exn.is_interrupt: "exn -> bool"} \\
 | 
| 39855 | 1165 |   @{index_ML reraise: "exn -> 'a"} \\
 | 
| 53709 
84522727f9d3
improved printing of exception trace in Poly/ML 5.5.1;
 wenzelm parents: 
53167diff
changeset | 1166 |   @{index_ML ML_Compiler.exn_trace: "(unit -> 'a) -> 'a"} \\
 | 
| 39855 | 1167 |   \end{mldecls}
 | 
| 1168 | ||
| 1169 |   \begin{description}
 | |
| 1170 | ||
| 1171 |   \item @{ML try}~@{text "f x"} makes the partiality of evaluating
 | |
| 1172 |   @{text "f x"} explicit via the option datatype.  Interrupts are
 | |
| 1173 |   \emph{not} handled here, i.e.\ this form serves as safe replacement
 | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 1174 |   for the \emph{unsafe} version @{ML_text "(SOME"}~@{text "f
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 1175 |   x"}~@{ML_text "handle _ => NONE)"} that is occasionally seen in
 | 
| 52417 | 1176 | books about SML97, not Isabelle/ML. | 
| 39855 | 1177 | |
| 1178 |   \item @{ML can} is similar to @{ML try} with more abstract result.
 | |
| 1179 | ||
| 39856 | 1180 |   \item @{ML ERROR}~@{text "msg"} represents user errors; this
 | 
| 40126 | 1181 |   exception is normally raised indirectly via the @{ML error} function
 | 
| 1182 |   (see \secref{sec:message-channels}).
 | |
| 39856 | 1183 | |
| 1184 |   \item @{ML Fail}~@{text "msg"} represents general program failures.
 | |
| 1185 | ||
| 1186 |   \item @{ML Exn.is_interrupt} identifies interrupts robustly, without
 | |
| 1187 | mentioning concrete exception constructors in user code. Handled | |
| 1188 | interrupts need to be re-raised promptly! | |
| 1189 | ||
| 39855 | 1190 |   \item @{ML reraise}~@{text "exn"} raises exception @{text "exn"}
 | 
| 1191 | while preserving its implicit position information (if possible, | |
| 1192 | depending on the ML platform). | |
| 1193 | ||
| 53709 
84522727f9d3
improved printing of exception trace in Poly/ML 5.5.1;
 wenzelm parents: 
53167diff
changeset | 1194 |   \item @{ML ML_Compiler.exn_trace}~@{ML_text "(fn () =>"}~@{text
 | 
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 1195 |   "e"}@{ML_text ")"} evaluates expression @{text "e"} while printing
 | 
| 39855 | 1196 | a full trace of its stack of nested exceptions (if possible, | 
| 53739 | 1197 | depending on the ML platform). | 
| 39855 | 1198 | |
| 53709 
84522727f9d3
improved printing of exception trace in Poly/ML 5.5.1;
 wenzelm parents: 
53167diff
changeset | 1199 |   Inserting @{ML ML_Compiler.exn_trace} into ML code temporarily is
 | 
| 
84522727f9d3
improved printing of exception trace in Poly/ML 5.5.1;
 wenzelm parents: 
53167diff
changeset | 1200 | useful for debugging, but not suitable for production code. | 
| 39855 | 1201 | |
| 1202 |   \end{description}
 | |
| 1203 | *} | |
| 1204 | ||
| 39866 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1205 | text %mlantiq {*
 | 
| 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1206 |   \begin{matharray}{rcl}
 | 
| 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1207 |   @{ML_antiquotation_def "assert"} & : & @{text ML_antiquotation} \\
 | 
| 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1208 |   \end{matharray}
 | 
| 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1209 | |
| 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1210 |   \begin{description}
 | 
| 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1211 | |
| 40110 | 1212 |   \item @{text "@{assert}"} inlines a function
 | 
| 1213 |   @{ML_type "bool -> unit"} that raises @{ML Fail} if the argument is
 | |
| 1214 |   @{ML false}.  Due to inlining the source position of failed
 | |
| 1215 | assertions is included in the error output. | |
| 39866 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1216 | |
| 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1217 |   \end{description}
 | 
| 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1218 | *} | 
| 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1219 | |
| 39859 | 1220 | |
| 52421 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1221 | section {* Strings of symbols \label{sec:symbols} *}
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1222 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1223 | text {* A \emph{symbol} constitutes the smallest textual unit in
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1224 | Isabelle/ML --- raw ML characters are normally not encountered at | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1225 | all! Isabelle strings consist of a sequence of symbols, represented | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1226 | as a packed string or an exploded list of strings. Each symbol is | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1227 | in itself a small string, which has either one of the following | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1228 | forms: | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1229 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1230 |   \begin{enumerate}
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1231 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1232 |   \item a single ASCII character ``@{text "c"}'', for example
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1233 | ``\verb,a,'', | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1234 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1235 | \item a codepoint according to UTF8 (non-ASCII byte sequence), | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1236 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1237 |   \item a regular symbol ``\verb,\,\verb,<,@{text "ident"}\verb,>,'',
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1238 | for example ``\verb,\,\verb,<alpha>,'', | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1239 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1240 |   \item a control symbol ``\verb,\,\verb,<^,@{text "ident"}\verb,>,'',
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1241 | for example ``\verb,\,\verb,<^bold>,'', | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1242 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1243 |   \item a raw symbol ``\verb,\,\verb,<^raw:,@{text text}\verb,>,''
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1244 |   where @{text text} consists of printable characters excluding
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1245 | ``\verb,.,'' and ``\verb,>,'', for example | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1246 |   ``\verb,\,\verb,<^raw:$\sum_{i = 1}^n$>,'',
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1247 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1248 |   \item a numbered raw control symbol ``\verb,\,\verb,<^raw,@{text
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1249 |   n}\verb,>, where @{text n} consists of digits, for example
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1250 | ``\verb,\,\verb,<^raw42>,''. | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1251 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1252 |   \end{enumerate}
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1253 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1254 |   The @{text "ident"} syntax for symbol names is @{text "letter
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1255 |   (letter | digit)\<^sup>*"}, where @{text "letter = A..Za..z"} and @{text
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1256 | "digit = 0..9"}. There are infinitely many regular symbols and | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1257 | control symbols, but a fixed collection of standard symbols is | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1258 | treated specifically. For example, ``\verb,\,\verb,<alpha>,'' is | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1259 | classified as a letter, which means it may occur within regular | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1260 | Isabelle identifiers. | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1261 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1262 | The character set underlying Isabelle symbols is 7-bit ASCII, but | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1263 | 8-bit character sequences are passed-through unchanged. Unicode/UCS | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1264 | data in UTF-8 encoding is processed in a non-strict fashion, such | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1265 | that well-formed code sequences are recognized | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1266 |   accordingly.\footnote{Note that ISO-Latin-1 differs from UTF-8 only
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1267 | in some special punctuation characters that even have replacements | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1268 | within the standard collection of Isabelle symbols. Text consisting | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1269 | of ASCII plus accented letters can be processed in either encoding.} | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1270 | Unicode provides its own collection of mathematical symbols, but | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1271 | within the core Isabelle/ML world there is no link to the standard | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1272 | collection of Isabelle regular symbols. | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1273 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1274 | \medskip Output of Isabelle symbols depends on the print mode | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1275 |   \cite{isabelle-isar-ref}.  For example, the standard {\LaTeX}
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1276 | setup of the Isabelle document preparation system would present | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1277 |   ``\verb,\,\verb,<alpha>,'' as @{text "\<alpha>"}, and
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1278 |   ``\verb,\,\verb,<^bold>,\verb,\,\verb,<alpha>,'' as @{text "\<^bold>\<alpha>"}.
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1279 | On-screen rendering usually works by mapping a finite subset of | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1280 | Isabelle symbols to suitable Unicode characters. | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1281 | *} | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1282 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1283 | text %mlref {*
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1284 |   \begin{mldecls}
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1285 |   @{index_ML_type "Symbol.symbol": string} \\
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1286 |   @{index_ML Symbol.explode: "string -> Symbol.symbol list"} \\
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1287 |   @{index_ML Symbol.is_letter: "Symbol.symbol -> bool"} \\
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1288 |   @{index_ML Symbol.is_digit: "Symbol.symbol -> bool"} \\
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1289 |   @{index_ML Symbol.is_quasi: "Symbol.symbol -> bool"} \\
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1290 |   @{index_ML Symbol.is_blank: "Symbol.symbol -> bool"} \\
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1291 |   \end{mldecls}
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1292 |   \begin{mldecls}
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1293 |   @{index_ML_type "Symbol.sym"} \\
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1294 |   @{index_ML Symbol.decode: "Symbol.symbol -> Symbol.sym"} \\
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1295 |   \end{mldecls}
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1296 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1297 |   \begin{description}
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1298 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1299 |   \item Type @{ML_type "Symbol.symbol"} represents individual Isabelle
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1300 | symbols. | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1301 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1302 |   \item @{ML "Symbol.explode"}~@{text "str"} produces a symbol list
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1303 |   from the packed form.  This function supersedes @{ML
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1304 | "String.explode"} for virtually all purposes of manipulating text in | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1305 |   Isabelle!\footnote{The runtime overhead for exploded strings is
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1306 | mainly that of the list structure: individual symbols that happen to | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1307 | be a singleton string do not require extra memory in Poly/ML.} | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1308 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1309 |   \item @{ML "Symbol.is_letter"}, @{ML "Symbol.is_digit"}, @{ML
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1310 |   "Symbol.is_quasi"}, @{ML "Symbol.is_blank"} classify standard
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1311 | symbols according to fixed syntactic conventions of Isabelle, cf.\ | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1312 |   \cite{isabelle-isar-ref}.
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1313 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1314 |   \item Type @{ML_type "Symbol.sym"} is a concrete datatype that
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1315 | represents the different kinds of symbols explicitly, with | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1316 |   constructors @{ML "Symbol.Char"}, @{ML "Symbol.Sym"}, @{ML
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1317 |   "Symbol.UTF8"}, @{ML "Symbol.Ctrl"}, @{ML "Symbol.Raw"}.
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1318 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1319 |   \item @{ML "Symbol.decode"} converts the string representation of a
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1320 | symbol into the datatype version. | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1321 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1322 |   \end{description}
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1323 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1324 |   \paragraph{Historical note.} In the original SML90 standard the
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1325 |   primitive ML type @{ML_type char} did not exists, and @{ML_text
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1326 | "explode: string -> string list"} produced a list of singleton | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1327 |   strings like @{ML "raw_explode: string -> string list"} in
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1328 | Isabelle/ML today. When SML97 came out, Isabelle did not adopt its | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1329 | somewhat anachronistic 8-bit or 16-bit characters, but the idea of | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1330 | exploding a string into a list of small strings was extended to | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1331 | ``symbols'' as explained above. Thus Isabelle sources can refer to | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1332 | an infinite store of user-defined symbols, without having to worry | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1333 | about the multitude of Unicode encodings that have emerged over the | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1334 | years. *} | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1335 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1336 | |
| 39863 | 1337 | section {* Basic data types *}
 | 
| 39859 | 1338 | |
| 40126 | 1339 | text {* The basis library proposal of SML97 needs to be treated with
 | 
| 39859 | 1340 | caution. Many of its operations simply do not fit with important | 
| 1341 | Isabelle/ML conventions (like ``canonical argument order'', see | |
| 40126 | 1342 |   \secref{sec:canonical-argument-order}), others cause problems with
 | 
| 1343 |   the parallel evaluation model of Isabelle/ML (such as @{ML
 | |
| 1344 |   TextIO.print} or @{ML OS.Process.system}).
 | |
| 39859 | 1345 | |
| 1346 | Subsequently we give a brief overview of important operations on | |
| 1347 | basic ML data types. | |
| 1348 | *} | |
| 1349 | ||
| 1350 | ||
| 39863 | 1351 | subsection {* Characters *}
 | 
| 1352 | ||
| 1353 | text %mlref {*
 | |
| 1354 |   \begin{mldecls}
 | |
| 1355 |   @{index_ML_type char} \\
 | |
| 1356 |   \end{mldecls}
 | |
| 1357 | ||
| 1358 |   \begin{description}
 | |
| 1359 | ||
| 39864 | 1360 |   \item Type @{ML_type char} is \emph{not} used.  The smallest textual
 | 
| 40126 | 1361 | unit in Isabelle is represented as a ``symbol'' (see | 
| 39864 | 1362 |   \secref{sec:symbols}).
 | 
| 39863 | 1363 | |
| 1364 |   \end{description}
 | |
| 1365 | *} | |
| 1366 | ||
| 1367 | ||
| 52421 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1368 | subsection {* Strings *}
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1369 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1370 | text %mlref {*
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1371 |   \begin{mldecls}
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1372 |   @{index_ML_type string} \\
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1373 |   \end{mldecls}
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1374 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1375 |   \begin{description}
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1376 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1377 |   \item Type @{ML_type string} represents immutable vectors of 8-bit
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1378 | characters. There are operations in SML to convert back and forth | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1379 | to actual byte vectors, which are seldom used. | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1380 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1381 | This historically important raw text representation is used for | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1382 | Isabelle-specific purposes with the following implicit substructures | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1383 | packed into the string content: | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1384 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1385 |   \begin{enumerate}
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1386 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1387 |   \item sequence of Isabelle symbols (see also \secref{sec:symbols}),
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1388 |   with @{ML Symbol.explode} as key operation;
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1389 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1390 |   \item XML tree structure via YXML (see also \cite{isabelle-sys}),
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1391 |   with @{ML YXML.parse_body} as key operation.
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1392 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1393 |   \end{enumerate}
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1394 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1395 | Note that Isabelle/ML string literals may refer Isabelle symbols | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1396 |   like ``\verb,\,\verb,<alpha>,'' natively, \emph{without} escaping
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1397 | the backslash. This is a consequence of Isabelle treating all | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1398 | source text as strings of symbols, instead of raw characters. | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1399 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1400 |   \end{description}
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1401 | *} | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1402 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1403 | text %mlex {* The subsequent example illustrates the difference of
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1404 | physical addressing of bytes versus logical addressing of symbols in | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1405 | Isabelle strings. | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1406 | *} | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1407 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1408 | ML_val {*
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1409 | val s = "\<A>"; | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1410 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1411 |   @{assert} (length (Symbol.explode s) = 1);
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1412 |   @{assert} (size s = 4);
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1413 | *} | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1414 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1415 | text {* Note that in Unicode renderings of the symbol @{text "\<A>"},
 | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1416 | variations of encodings like UTF-8 or UTF-16 pose delicate questions | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1417 | about the multi-byte representations its codepoint, which is outside | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1418 | of the 16-bit address space of the original Unicode standard from | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1419 | the 1990-ies. In Isabelle/ML it is just ``\verb,\,\verb,<A>,'' | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1420 | literally, using plain ASCII characters beyond any doubts. *} | 
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1421 | |
| 
6d93140a206c
clarified strings of symbols, including ML string literals;
 wenzelm parents: 
52420diff
changeset | 1422 | |
| 39862 | 1423 | subsection {* Integers *}
 | 
| 1424 | ||
| 1425 | text %mlref {*
 | |
| 1426 |   \begin{mldecls}
 | |
| 1427 |   @{index_ML_type int} \\
 | |
| 1428 |   \end{mldecls}
 | |
| 1429 | ||
| 1430 |   \begin{description}
 | |
| 1431 | ||
| 39864 | 1432 |   \item Type @{ML_type int} represents regular mathematical integers,
 | 
| 1433 |   which are \emph{unbounded}.  Overflow never happens in
 | |
| 39862 | 1434 |   practice.\footnote{The size limit for integer bit patterns in memory
 | 
| 1435 | is 64\,MB for 32-bit Poly/ML, and much higher for 64-bit systems.} | |
| 1436 | This works uniformly for all supported ML platforms (Poly/ML and | |
| 1437 | SML/NJ). | |
| 1438 | ||
| 40126 | 1439 | Literal integers in ML text are forced to be of this one true | 
| 52417 | 1440 | integer type --- adhoc overloading of SML97 is disabled. | 
| 39862 | 1441 | |
| 55837 | 1442 |   Structure @{ML_structure IntInf} of SML97 is obsolete and superseded by
 | 
| 1443 |   @{ML_structure Int}.  Structure @{ML_structure Integer} in @{file
 | |
| 39862 | 1444 | "~~/src/Pure/General/integer.ML"} provides some additional | 
| 1445 | operations. | |
| 1446 | ||
| 1447 |   \end{description}
 | |
| 1448 | *} | |
| 1449 | ||
| 1450 | ||
| 40302 | 1451 | subsection {* Time *}
 | 
| 1452 | ||
| 1453 | text %mlref {*
 | |
| 1454 |   \begin{mldecls}
 | |
| 1455 |   @{index_ML_type Time.time} \\
 | |
| 1456 |   @{index_ML seconds: "real -> Time.time"} \\
 | |
| 1457 |   \end{mldecls}
 | |
| 1458 | ||
| 1459 |   \begin{description}
 | |
| 1460 | ||
| 1461 |   \item Type @{ML_type Time.time} represents time abstractly according
 | |
| 1462 | to the SML97 basis library definition. This is adequate for | |
| 1463 | internal ML operations, but awkward in concrete time specifications. | |
| 1464 | ||
| 1465 |   \item @{ML seconds}~@{text "s"} turns the concrete scalar @{text
 | |
| 1466 | "s"} (measured in seconds) into an abstract time value. Floating | |
| 52417 | 1467 | point numbers are easy to use as configuration options in the | 
| 1468 |   context (see \secref{sec:config-options}) or system preferences that
 | |
| 1469 | are maintained externally. | |
| 40302 | 1470 | |
| 1471 |   \end{description}
 | |
| 1472 | *} | |
| 1473 | ||
| 1474 | ||
| 39859 | 1475 | subsection {* Options *}
 | 
| 1476 | ||
| 1477 | text %mlref {*
 | |
| 1478 |   \begin{mldecls}
 | |
| 1479 |   @{index_ML Option.map: "('a -> 'b) -> 'a option -> 'b option"} \\
 | |
| 1480 |   @{index_ML is_some: "'a option -> bool"} \\
 | |
| 1481 |   @{index_ML is_none: "'a option -> bool"} \\
 | |
| 1482 |   @{index_ML the: "'a option -> 'a"} \\
 | |
| 1483 |   @{index_ML these: "'a list option -> 'a list"} \\
 | |
| 1484 |   @{index_ML the_list: "'a option -> 'a list"} \\
 | |
| 1485 |   @{index_ML the_default: "'a -> 'a option -> 'a"} \\
 | |
| 1486 |   \end{mldecls}
 | |
| 1487 | *} | |
| 1488 | ||
| 52417 | 1489 | text {* Apart from @{ML Option.map} most other operations defined in
 | 
| 55837 | 1490 |   structure @{ML_structure Option} are alien to Isabelle/ML an never
 | 
| 52417 | 1491 |   used.  The operations shown above are defined in @{file
 | 
| 1492 | "~~/src/Pure/General/basics.ML"}. *} | |
| 39859 | 1493 | |
| 1494 | ||
| 39863 | 1495 | subsection {* Lists *}
 | 
| 1496 | ||
| 1497 | text {* Lists are ubiquitous in ML as simple and light-weight
 | |
| 1498 | ``collections'' for many everyday programming tasks. Isabelle/ML | |
| 39874 | 1499 | provides important additions and improvements over operations that | 
| 1500 | are predefined in the SML97 library. *} | |
| 39863 | 1501 | |
| 1502 | text %mlref {*
 | |
| 1503 |   \begin{mldecls}
 | |
| 1504 |   @{index_ML cons: "'a -> 'a list -> 'a list"} \\
 | |
| 39874 | 1505 |   @{index_ML member: "('b * 'a -> bool) -> 'a list -> 'b -> bool"} \\
 | 
| 1506 |   @{index_ML insert: "('a * 'a -> bool) -> 'a -> 'a list -> 'a list"} \\
 | |
| 1507 |   @{index_ML remove: "('b * 'a -> bool) -> 'b -> 'a list -> 'a list"} \\
 | |
| 1508 |   @{index_ML update: "('a * 'a -> bool) -> 'a -> 'a list -> 'a list"} \\
 | |
| 39863 | 1509 |   \end{mldecls}
 | 
| 1510 | ||
| 1511 |   \begin{description}
 | |
| 1512 | ||
| 1513 |   \item @{ML cons}~@{text "x xs"} evaluates to @{text "x :: xs"}.
 | |
| 1514 | ||
| 1515 | Tupled infix operators are a historical accident in Standard ML. | |
| 1516 |   The curried @{ML cons} amends this, but it should be only used when
 | |
| 1517 | partial application is required. | |
| 1518 | ||
| 39874 | 1519 |   \item @{ML member}, @{ML insert}, @{ML remove}, @{ML update} treat
 | 
| 1520 | lists as a set-like container that maintains the order of elements. | |
| 40800 
330eb65c9469
Parse.liberal_name for document antiquotations and attributes;
 wenzelm parents: 
40508diff
changeset | 1521 |   See @{file "~~/src/Pure/library.ML"} for the full specifications
 | 
| 39874 | 1522 | (written in ML). There are some further derived operations like | 
| 1523 |   @{ML union} or @{ML inter}.
 | |
| 1524 | ||
| 1525 |   Note that @{ML insert} is conservative about elements that are
 | |
| 1526 |   already a @{ML member} of the list, while @{ML update} ensures that
 | |
| 40126 | 1527 | the latest entry is always put in front. The latter discipline is | 
| 39874 | 1528 | often more appropriate in declarations of context data | 
| 1529 |   (\secref{sec:context-data}) that are issued by the user in Isar
 | |
| 52417 | 1530 | source: later declarations take precedence over earlier ones. | 
| 39874 | 1531 | |
| 39863 | 1532 |   \end{description}
 | 
| 1533 | *} | |
| 1534 | ||
| 52417 | 1535 | text %mlex {* Using canonical @{ML fold} together with @{ML cons} (or
 | 
| 1536 | similar standard operations) alternates the orientation of data. | |
| 40126 | 1537 | The is quite natural and should not be altered forcible by inserting | 
| 1538 |   extra applications of @{ML rev}.  The alternative @{ML fold_rev} can
 | |
| 1539 | be used in the few situations, where alternation should be | |
| 1540 | prevented. | |
| 39863 | 1541 | *} | 
| 1542 | ||
| 1543 | ML {*
 | |
| 1544 | val items = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]; | |
| 1545 | ||
| 1546 | val list1 = fold cons items []; | |
| 39866 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1547 |   @{assert} (list1 = rev items);
 | 
| 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1548 | |
| 39863 | 1549 | val list2 = fold_rev cons items []; | 
| 39866 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1550 |   @{assert} (list2 = items);
 | 
| 39863 | 1551 | *} | 
| 1552 | ||
| 39883 | 1553 | text {* The subsequent example demonstrates how to \emph{merge} two
 | 
| 1554 | lists in a natural way. *} | |
| 1555 | ||
| 1556 | ML {*
 | |
| 1557 | fun merge_lists eq (xs, ys) = fold_rev (insert eq) ys xs; | |
| 1558 | *} | |
| 1559 | ||
| 1560 | text {* Here the first list is treated conservatively: only the new
 | |
| 1561 | elements from the second list are inserted. The inside-out order of | |
| 1562 |   insertion via @{ML fold_rev} attempts to preserve the order of
 | |
| 1563 | elements in the result. | |
| 1564 | ||
| 1565 | This way of merging lists is typical for context data | |
| 1566 |   (\secref{sec:context-data}).  See also @{ML merge} as defined in
 | |
| 40800 
330eb65c9469
Parse.liberal_name for document antiquotations and attributes;
 wenzelm parents: 
40508diff
changeset | 1567 |   @{file "~~/src/Pure/library.ML"}.
 | 
| 39883 | 1568 | *} | 
| 1569 | ||
| 39863 | 1570 | |
| 39875 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1571 | subsection {* Association lists *}
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1572 | |
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1573 | text {* The operations for association lists interpret a concrete list
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1574 | of pairs as a finite function from keys to values. Redundant | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1575 | representations with multiple occurrences of the same key are | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1576 | implicitly normalized: lookup and update only take the first | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1577 | occurrence into account. | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1578 | *} | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1579 | |
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1580 | text {*
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1581 |   \begin{mldecls}
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1582 |   @{index_ML AList.lookup: "('a * 'b -> bool) -> ('b * 'c) list -> 'a -> 'c option"} \\
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1583 |   @{index_ML AList.defined: "('a * 'b -> bool) -> ('b * 'c) list -> 'a -> bool"} \\
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1584 |   @{index_ML AList.update: "('a * 'a -> bool) -> 'a * 'b -> ('a * 'b) list -> ('a * 'b) list"} \\
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1585 |   \end{mldecls}
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1586 | |
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1587 |   \begin{description}
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1588 | |
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1589 |   \item @{ML AList.lookup}, @{ML AList.defined}, @{ML AList.update}
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1590 | implement the main ``framework operations'' for mappings in | 
| 40126 | 1591 | Isabelle/ML, following standard conventions for their names and | 
| 1592 | types. | |
| 39875 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1593 | |
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1594 |   Note that a function called @{text lookup} is obliged to express its
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1595 | partiality via an explicit option element. There is no choice to | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1596 | raise an exception, without changing the name to something like | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1597 |   @{text "the_element"} or @{text "get"}.
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1598 | |
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1599 |   The @{text "defined"} operation is essentially a contraction of @{ML
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1600 |   is_some} and @{text "lookup"}, but this is sufficiently frequent to
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1601 | justify its independent existence. This also gives the | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1602 | implementation some opportunity for peep-hole optimization. | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1603 | |
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1604 |   \end{description}
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1605 | |
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1606 | Association lists are adequate as simple and light-weight | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1607 | implementation of finite mappings in many practical situations. A | 
| 40800 
330eb65c9469
Parse.liberal_name for document antiquotations and attributes;
 wenzelm parents: 
40508diff
changeset | 1608 |   more heavy-duty table structure is defined in @{file
 | 
| 39875 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1609 | "~~/src/Pure/General/table.ML"}; that version scales easily to | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1610 | thousands or millions of elements. | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1611 | *} | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1612 | |
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1613 | |
| 39859 | 1614 | subsection {* Unsynchronized references *}
 | 
| 1615 | ||
| 1616 | text %mlref {*
 | |
| 1617 |   \begin{mldecls}
 | |
| 39870 | 1618 |   @{index_ML_type "'a Unsynchronized.ref"} \\
 | 
| 39859 | 1619 |   @{index_ML Unsynchronized.ref: "'a -> 'a Unsynchronized.ref"} \\
 | 
| 1620 |   @{index_ML "!": "'a Unsynchronized.ref -> 'a"} \\
 | |
| 46262 | 1621 |   @{index_ML_op ":=": "'a Unsynchronized.ref * 'a -> unit"} \\
 | 
| 39859 | 1622 |   \end{mldecls}
 | 
| 1623 | *} | |
| 1624 | ||
| 1625 | text {* Due to ubiquitous parallelism in Isabelle/ML (see also
 | |
| 1626 |   \secref{sec:multi-threading}), the mutable reference cells of
 | |
| 1627 | Standard ML are notorious for causing problems. In a highly | |
| 1628 |   parallel system, both correctness \emph{and} performance are easily
 | |
| 1629 | degraded when using mutable data. | |
| 1630 | ||
| 1631 |   The unwieldy name of @{ML Unsynchronized.ref} for the constructor
 | |
| 1632 | for references in Isabelle/ML emphasizes the inconveniences caused by | |
| 46262 | 1633 |   mutability.  Existing operations @{ML "!"}  and @{ML_op ":="} are
 | 
| 39859 | 1634 | unchanged, but should be used with special precautions, say in a | 
| 1635 | strictly local situation that is guaranteed to be restricted to | |
| 40508 | 1636 | sequential evaluation --- now and in the future. | 
| 1637 | ||
| 1638 |   \begin{warn}
 | |
| 1639 |   Never @{ML_text "open Unsynchronized"}, not even in a local scope!
 | |
| 1640 | Pretending that mutable state is no problem is a very bad idea. | |
| 1641 |   \end{warn}
 | |
| 1642 | *} | |
| 39859 | 1643 | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1644 | |
| 39870 | 1645 | section {* Thread-safe programming \label{sec:multi-threading} *}
 | 
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1646 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1647 | text {* Multi-threaded execution has become an everyday reality in
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1648 | Isabelle since Poly/ML 5.2.1 and Isabelle2008. Isabelle/ML provides | 
| 39868 | 1649 | implicit and explicit parallelism by default, and there is no way | 
| 1650 | for user-space tools to ``opt out''. ML programs that are purely | |
| 1651 | functional, output messages only via the official channels | |
| 1652 |   (\secref{sec:message-channels}), and do not intercept interrupts
 | |
| 1653 |   (\secref{sec:exceptions}) can participate in the multi-threaded
 | |
| 1654 | environment immediately without further ado. | |
| 1655 | ||
| 1656 | More ambitious tools with more fine-grained interaction with the | |
| 1657 | environment need to observe the principles explained below. | |
| 1658 | *} | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1659 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1660 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1661 | subsection {* Multi-threading with shared memory *}
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1662 | |
| 39868 | 1663 | text {* Multiple threads help to organize advanced operations of the
 | 
| 1664 | system, such as real-time conditions on command transactions, | |
| 1665 | sub-components with explicit communication, general asynchronous | |
| 1666 | interaction etc. Moreover, parallel evaluation is a prerequisite to | |
| 1667 | make adequate use of the CPU resources that are available on | |
| 1668 |   multi-core systems.\footnote{Multi-core computing does not mean that
 | |
| 1669 | there are ``spare cycles'' to be wasted. It means that the | |
| 1670 | continued exponential speedup of CPU performance due to ``Moore's | |
| 1671 | Law'' follows different rules: clock frequency has reached its peak | |
| 1672 | around 2005, and applications need to be parallelized in order to | |
| 1673 | avoid a perceived loss of performance. See also | |
| 1674 |   \cite{Sutter:2005}.}
 | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1675 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1676 | Isabelle/Isar exploits the inherent structure of theories and proofs | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1677 |   to support \emph{implicit parallelism} to a large extent.  LCF-style
 | 
| 40126 | 1678 | theorem provides almost ideal conditions for that, see also | 
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1679 |   \cite{Wenzel:2009}.  This means, significant parts of theory and
 | 
| 52418 | 1680 | proof checking is parallelized by default. In Isabelle2013, a | 
| 1681 | maximum speedup-factor of 3.5 on 4 cores and 6.5 on 8 cores can be | |
| 1682 | expected. | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1683 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1684 | \medskip ML threads lack the memory protection of separate | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1685 | processes, and operate concurrently on shared heap memory. This has | 
| 40126 | 1686 | the advantage that results of independent computations are directly | 
| 1687 | available to other threads: abstract values can be passed without | |
| 1688 | copying or awkward serialization that is typically required for | |
| 1689 | separate processes. | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1690 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1691 | To make shared-memory multi-threading work robustly and efficiently, | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1692 | some programming guidelines need to be observed. While the ML | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1693 | system is responsible to maintain basic integrity of the | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1694 | representation of ML values in memory, the application programmer | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1695 | needs to ensure that multi-threaded execution does not break the | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1696 | intended semantics. | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1697 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1698 |   \begin{warn}
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1699 | To participate in implicit parallelism, tools need to be | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1700 | thread-safe. A single ill-behaved tool can affect the stability and | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1701 | performance of the whole system. | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1702 |   \end{warn}
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1703 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1704 | Apart from observing the principles of thread-safeness passively, | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1705 | advanced tools may also exploit parallelism actively, e.g.\ by using | 
| 39868 | 1706 |   ``future values'' (\secref{sec:futures}) or the more basic library
 | 
| 1707 |   functions for parallel list operations (\secref{sec:parlist}).
 | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1708 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1709 |   \begin{warn}
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1710 | Parallel computing resources are managed centrally by the | 
| 39868 | 1711 | Isabelle/ML infrastructure. User programs must not fork their own | 
| 1712 | ML threads to perform computations. | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1713 |   \end{warn}
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1714 | *} | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1715 | |
| 39868 | 1716 | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1717 | subsection {* Critical shared resources *}
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1718 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1719 | text {* Thread-safeness is mainly concerned about concurrent
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1720 | read/write access to shared resources, which are outside the purely | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1721 | functional world of ML. This covers the following in particular. | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1722 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1723 |   \begin{itemize}
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1724 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1725 | \item Global references (or arrays), i.e.\ mutable memory cells that | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1726 | persist over several invocations of associated | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1727 |   operations.\footnote{This is independent of the visibility of such
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1728 | mutable values in the toplevel scope.} | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1729 | |
| 39868 | 1730 | \item Global state of the running Isabelle/ML process, i.e.\ raw I/O | 
| 1731 | channels, environment variables, current working directory. | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1732 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1733 | \item Writable resources in the file-system that are shared among | 
| 40126 | 1734 | different threads or external processes. | 
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1735 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1736 |   \end{itemize}
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1737 | |
| 39868 | 1738 | Isabelle/ML provides various mechanisms to avoid critical shared | 
| 40126 | 1739 | resources in most situations. As last resort there are some | 
| 1740 | mechanisms for explicit synchronization. The following guidelines | |
| 1741 | help to make Isabelle/ML programs work smoothly in a concurrent | |
| 1742 | environment. | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1743 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1744 |   \begin{itemize}
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1745 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1746 | \item Avoid global references altogether. Isabelle/Isar maintains a | 
| 39868 | 1747 | uniform context that incorporates arbitrary data declared by user | 
| 1748 |   programs (\secref{sec:context-data}).  This context is passed as
 | |
| 1749 | plain value and user tools can get/map their own data in a purely | |
| 1750 | functional manner. Configuration options within the context | |
| 1751 |   (\secref{sec:config-options}) provide simple drop-in replacements
 | |
| 40126 | 1752 | for historic reference variables. | 
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1753 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1754 | \item Keep components with local state information re-entrant. | 
| 39868 | 1755 | Instead of poking initial values into (private) global references, a | 
| 1756 | new state record can be created on each invocation, and passed | |
| 1757 | through any auxiliary functions of the component. The state record | |
| 1758 | may well contain mutable references, without requiring any special | |
| 1759 | synchronizations, as long as each invocation gets its own copy. | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1760 | |
| 39868 | 1761 |   \item Avoid raw output on @{text "stdout"} or @{text "stderr"}.  The
 | 
| 1762 | Poly/ML library is thread-safe for each individual output operation, | |
| 1763 | but the ordering of parallel invocations is arbitrary. This means | |
| 1764 | raw output will appear on some system console with unpredictable | |
| 1765 | interleaving of atomic chunks. | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1766 | |
| 39868 | 1767 | Note that this does not affect regular message output channels | 
| 1768 |   (\secref{sec:message-channels}).  An official message is associated
 | |
| 1769 | with the command transaction from where it originates, independently | |
| 1770 | of other transactions. This means each running Isar command has | |
| 1771 | effectively its own set of message channels, and interleaving can | |
| 1772 | only happen when commands use parallelism internally (and only at | |
| 1773 | message boundaries). | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1774 | |
| 39868 | 1775 | \item Treat environment variables and the current working directory | 
| 1776 | of the running process as strictly read-only. | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1777 | |
| 39868 | 1778 | \item Restrict writing to the file-system to unique temporary files. | 
| 1779 | Isabelle already provides a temporary directory that is unique for | |
| 1780 | the running process, and there is a centralized source of unique | |
| 1781 | serial numbers in Isabelle/ML. Thus temporary files that are passed | |
| 1782 | to to some external process will be always disjoint, and thus | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1783 | thread-safe. | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1784 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1785 |   \end{itemize}
 | 
| 39868 | 1786 | *} | 
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1787 | |
| 39868 | 1788 | text %mlref {*
 | 
| 1789 |   \begin{mldecls}
 | |
| 1790 |   @{index_ML File.tmp_path: "Path.T -> Path.T"} \\
 | |
| 1791 |   @{index_ML serial_string: "unit -> string"} \\
 | |
| 1792 |   \end{mldecls}
 | |
| 1793 | ||
| 1794 |   \begin{description}
 | |
| 1795 | ||
| 1796 |   \item @{ML File.tmp_path}~@{text "path"} relocates the base
 | |
| 1797 |   component of @{text "path"} into the unique temporary directory of
 | |
| 1798 | the running Isabelle/ML process. | |
| 1799 | ||
| 1800 |   \item @{ML serial_string}~@{text "()"} creates a new serial number
 | |
| 1801 | that is unique over the runtime of the Isabelle/ML process. | |
| 1802 | ||
| 1803 |   \end{description}
 | |
| 1804 | *} | |
| 1805 | ||
| 1806 | text %mlex {* The following example shows how to create unique
 | |
| 1807 | temporary file names. | |
| 1808 | *} | |
| 1809 | ||
| 1810 | ML {*
 | |
| 1811 |   val tmp1 = File.tmp_path (Path.basic ("foo" ^ serial_string ()));
 | |
| 1812 |   val tmp2 = File.tmp_path (Path.basic ("foo" ^ serial_string ()));
 | |
| 1813 |   @{assert} (tmp1 <> tmp2);
 | |
| 1814 | *} | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1815 | |
| 39868 | 1816 | |
| 1817 | subsection {* Explicit synchronization *}
 | |
| 1818 | ||
| 1819 | text {* Isabelle/ML also provides some explicit synchronization
 | |
| 1820 | mechanisms, for the rare situations where mutable shared resources | |
| 1821 | are really required. These are based on the synchronizations | |
| 1822 | primitives of Poly/ML, which have been adapted to the specific | |
| 1823 | assumptions of the concurrent Isabelle/ML environment. User code | |
| 1824 | must not use the Poly/ML primitives directly! | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1825 | |
| 39868 | 1826 | \medskip The most basic synchronization concept is a single | 
| 1827 |   \emph{critical section} (also called ``monitor'' in the literature).
 | |
| 1828 | A thread that enters the critical section prevents all other threads | |
| 1829 | from doing the same. A thread that is already within the critical | |
| 1830 | section may re-enter it in an idempotent manner. | |
| 1831 | ||
| 1832 | Such centralized locking is convenient, because it prevents | |
| 1833 | deadlocks by construction. | |
| 1834 | ||
| 1835 |   \medskip More fine-grained locking works via \emph{synchronized
 | |
| 1836 | variables}. An explicit state component is associated with | |
| 1837 | mechanisms for locking and signaling. There are operations to | |
| 1838 | await a condition, change the state, and signal the change to all | |
| 1839 | other waiting threads. | |
| 1840 | ||
| 1841 |   Here the synchronized access to the state variable is \emph{not}
 | |
| 1842 | re-entrant: direct or indirect nesting within the same thread will | |
| 1843 | cause a deadlock! | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1844 | *} | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1845 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1846 | text %mlref {*
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1847 |   \begin{mldecls}
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1848 |   @{index_ML NAMED_CRITICAL: "string -> (unit -> 'a) -> 'a"} \\
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1849 |   @{index_ML CRITICAL: "(unit -> 'a) -> 'a"} \\
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1850 |   \end{mldecls}
 | 
| 39871 | 1851 |   \begin{mldecls}
 | 
| 1852 |   @{index_ML_type "'a Synchronized.var"} \\
 | |
| 1853 |   @{index_ML Synchronized.var: "string -> 'a -> 'a Synchronized.var"} \\
 | |
| 1854 |   @{index_ML Synchronized.guarded_access: "'a Synchronized.var ->
 | |
| 1855 |   ('a -> ('b * 'a) option) -> 'b"} \\
 | |
| 1856 |   \end{mldecls}
 | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1857 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1858 |   \begin{description}
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1859 | |
| 39868 | 1860 |   \item @{ML NAMED_CRITICAL}~@{text "name e"} evaluates @{text "e ()"}
 | 
| 1861 | within the central critical section of Isabelle/ML. No other thread | |
| 1862 | may do so at the same time, but non-critical parallel execution will | |
| 39871 | 1863 |   continue.  The @{text "name"} argument is used for tracing and might
 | 
| 39868 | 1864 | help to spot sources of congestion. | 
| 1865 | ||
| 52418 | 1866 | Entering the critical section without contention is very fast. Each | 
| 1867 | thread should stay within the critical section only very briefly, | |
| 40126 | 1868 | otherwise parallel performance may degrade. | 
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1869 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1870 |   \item @{ML CRITICAL} is the same as @{ML NAMED_CRITICAL} with empty
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1871 | name argument. | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1872 | |
| 39871 | 1873 |   \item Type @{ML_type "'a Synchronized.var"} represents synchronized
 | 
| 1874 |   variables with state of type @{ML_type 'a}.
 | |
| 1875 | ||
| 1876 |   \item @{ML Synchronized.var}~@{text "name x"} creates a synchronized
 | |
| 1877 |   variable that is initialized with value @{text "x"}.  The @{text
 | |
| 1878 | "name"} is used for tracing. | |
| 1879 | ||
| 1880 |   \item @{ML Synchronized.guarded_access}~@{text "var f"} lets the
 | |
| 1881 |   function @{text "f"} operate within a critical section on the state
 | |
| 40126 | 1882 |   @{text "x"} as follows: if @{text "f x"} produces @{ML NONE}, it
 | 
| 1883 | continues to wait on the internal condition variable, expecting that | |
| 39871 | 1884 | some other thread will eventually change the content in a suitable | 
| 40126 | 1885 |   manner; if @{text "f x"} produces @{ML SOME}~@{text "(y, x')"} it is
 | 
| 1886 |   satisfied and assigns the new state value @{text "x'"}, broadcasts a
 | |
| 1887 | signal to all waiting threads on the associated condition variable, | |
| 1888 |   and returns the result @{text "y"}.
 | |
| 39871 | 1889 | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1890 |   \end{description}
 | 
| 39871 | 1891 | |
| 40126 | 1892 |   There are some further variants of the @{ML
 | 
| 40800 
330eb65c9469
Parse.liberal_name for document antiquotations and attributes;
 wenzelm parents: 
40508diff
changeset | 1893 |   Synchronized.guarded_access} combinator, see @{file
 | 
| 39871 | 1894 | "~~/src/Pure/Concurrent/synchronized.ML"} for details. | 
| 1895 | *} | |
| 1896 | ||
| 40126 | 1897 | text %mlex {* The following example implements a counter that produces
 | 
| 39871 | 1898 | positive integers that are unique over the runtime of the Isabelle | 
| 1899 | process: | |
| 1900 | *} | |
| 1901 | ||
| 1902 | ML {*
 | |
| 1903 | local | |
| 1904 | val counter = Synchronized.var "counter" 0; | |
| 1905 | in | |
| 1906 | fun next () = | |
| 1907 | Synchronized.guarded_access counter | |
| 1908 | (fn i => | |
| 1909 | let val j = i + 1 | |
| 1910 | in SOME (j, j) end); | |
| 1911 | end; | |
| 1912 | *} | |
| 1913 | ||
| 1914 | ML {*
 | |
| 1915 | val a = next (); | |
| 1916 | val b = next (); | |
| 1917 |   @{assert} (a <> b);
 | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1918 | *} | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1919 | |
| 40800 
330eb65c9469
Parse.liberal_name for document antiquotations and attributes;
 wenzelm parents: 
40508diff
changeset | 1920 | text {* \medskip See @{file "~~/src/Pure/Concurrent/mailbox.ML"} how
 | 
| 40126 | 1921 | to implement a mailbox as synchronized variable over a purely | 
| 1922 | functional queue. *} | |
| 1923 | ||
| 52419 | 1924 | |
| 1925 | section {* Managed evaluation *}
 | |
| 1926 | ||
| 1927 | text {* Execution of Standard ML follows the model of strict
 | |
| 1928 | functional evaluation with optional exceptions. Evaluation happens | |
| 1929 | whenever some function is applied to (sufficiently many) | |
| 1930 | arguments. The result is either an explicit value or an implicit | |
| 1931 | exception. | |
| 1932 | ||
| 1933 |   \emph{Managed evaluation} in Isabelle/ML organizes expressions and
 | |
| 1934 | results to control certain physical side-conditions, to say more | |
| 1935 | specifically when and how evaluation happens. For example, the | |
| 1936 | Isabelle/ML library supports lazy evaluation with memoing, parallel | |
| 1937 | evaluation via futures, asynchronous evaluation via promises, | |
| 1938 | evaluation with time limit etc. | |
| 1939 | ||
| 1940 |   \medskip An \emph{unevaluated expression} is represented either as
 | |
| 1941 |   unit abstraction @{verbatim "fn () => a"} of type
 | |
| 1942 |   @{verbatim "unit -> 'a"} or as regular function
 | |
| 1943 |   @{verbatim "fn a => b"} of type @{verbatim "'a -> 'b"}.  Both forms
 | |
| 1944 | occur routinely, and special care is required to tell them apart --- | |
| 1945 | the static type-system of SML is only of limited help here. | |
| 1946 | ||
| 1947 |   The first form is more intuitive: some combinator @{text "(unit ->
 | |
| 1948 |   'a) -> 'a"} applies the given function to @{text "()"} to initiate
 | |
| 1949 | the postponed evaluation process. The second form is more flexible: | |
| 1950 |   some combinator @{text "('a -> 'b) -> 'a -> 'b"} acts like a
 | |
| 1951 | modified form of function application; several such combinators may | |
| 1952 | be cascaded to modify a given function, before it is ultimately | |
| 1953 | applied to some argument. | |
| 1954 | ||
| 1955 |   \medskip \emph{Reified results} make the disjoint sum of regular
 | |
| 1956 | values versions exceptional situations explicit as ML datatype: | |
| 1957 |   @{text "'a result = Res of 'a | Exn of exn"}.  This is typically
 | |
| 1958 | used for administrative purposes, to store the overall outcome of an | |
| 1959 | evaluation process. | |
| 1960 | ||
| 1961 |   \emph{Parallel exceptions} aggregate reified results, such that
 | |
| 1962 | multiple exceptions are digested as a collection in canonical form | |
| 1963 | that identifies exceptions according to their original occurrence. | |
| 1964 | This is particular important for parallel evaluation via futures | |
| 1965 |   \secref{sec:futures}, which are organized as acyclic graph of
 | |
| 1966 | evaluations that depend on other evaluations: exceptions stemming | |
| 1967 | from shared sub-graphs are exposed exactly once and in the order of | |
| 1968 | their original occurrence (e.g.\ when printed at the toplevel). | |
| 1969 | Interrupt counts as neutral element here: it is treated as minimal | |
| 1970 | information about some canceled evaluation process, and is absorbed | |
| 1971 | by the presence of regular program exceptions. *} | |
| 1972 | ||
| 1973 | text %mlref {*
 | |
| 1974 |   \begin{mldecls}
 | |
| 1975 |   @{index_ML_type "'a Exn.result"} \\
 | |
| 1976 |   @{index_ML Exn.capture: "('a -> 'b) -> 'a -> 'b Exn.result"} \\
 | |
| 1977 |   @{index_ML Exn.interruptible_capture: "('a -> 'b) -> 'a -> 'b Exn.result"} \\
 | |
| 1978 |   @{index_ML Exn.release: "'a Exn.result -> 'a"} \\
 | |
| 1979 |   @{index_ML Par_Exn.release_all: "'a Exn.result list -> 'a list"} \\
 | |
| 1980 |   @{index_ML Par_Exn.release_first: "'a Exn.result list -> 'a list"} \\
 | |
| 1981 |   \end{mldecls}
 | |
| 1982 | ||
| 1983 |   \begin{description}
 | |
| 1984 | ||
| 1985 |   \item Type @{ML_type "'a Exn.result"} represents the disjoint sum of
 | |
| 1986 |   ML results explicitly, with constructor @{ML Exn.Res} for regular
 | |
| 1987 |   values and @{ML "Exn.Exn"} for exceptions.
 | |
| 1988 | ||
| 1989 |   \item @{ML Exn.capture}~@{text "f x"} manages the evaluation of
 | |
| 1990 |   @{text "f x"} such that exceptions are made explicit as @{ML
 | |
| 1991 | "Exn.Exn"}. Note that this includes physical interrupts (see also | |
| 1992 |   \secref{sec:exceptions}), so the same precautions apply to user
 | |
| 1993 | code: interrupts must not be absorbed accidentally! | |
| 1994 | ||
| 1995 |   \item @{ML Exn.interruptible_capture} is similar to @{ML
 | |
| 1996 | Exn.capture}, but interrupts are immediately re-raised as required | |
| 1997 | for user code. | |
| 1998 | ||
| 1999 |   \item @{ML Exn.release}~@{text "result"} releases the original
 | |
| 2000 | runtime result, exposing its regular value or raising the reified | |
| 2001 | exception. | |
| 2002 | ||
| 2003 |   \item @{ML Par_Exn.release_all}~@{text "results"} combines results
 | |
| 2004 | that were produced independently (e.g.\ by parallel evaluation). If | |
| 2005 | all results are regular values, that list is returned. Otherwise, | |
| 2006 | the collection of all exceptions is raised, wrapped-up as collective | |
| 2007 | parallel exception. Note that the latter prevents access to | |
| 2008 |   individual exceptions by conventional @{verbatim "handle"} of SML.
 | |
| 2009 | ||
| 2010 |   \item @{ML Par_Exn.release_first} is similar to @{ML
 | |
| 2011 | Par_Exn.release_all}, but only the first exception that has occurred | |
| 2012 | in the original evaluation process is raised again, the others are | |
| 2013 | ignored. That single exception may get handled by conventional | |
| 2014 | means in SML. | |
| 2015 | ||
| 2016 |   \end{description}
 | |
| 2017 | *} | |
| 2018 | ||
| 2019 | ||
| 52420 | 2020 | subsection {* Parallel skeletons \label{sec:parlist} *}
 | 
| 2021 | ||
| 2022 | text {*
 | |
| 2023 | Algorithmic skeletons are combinators that operate on lists in | |
| 2024 |   parallel, in the manner of well-known @{text map}, @{text exists},
 | |
| 2025 |   @{text forall} etc.  Management of futures (\secref{sec:futures})
 | |
| 2026 | and their results as reified exceptions is wrapped up into simple | |
| 2027 | programming interfaces that resemble the sequential versions. | |
| 2028 | ||
| 2029 | What remains is the application-specific problem to present | |
| 2030 |   expressions with suitable \emph{granularity}: each list element
 | |
| 2031 | corresponds to one evaluation task. If the granularity is too | |
| 2032 | coarse, the available CPUs are not saturated. If it is too | |
| 2033 | fine-grained, CPU cycles are wasted due to the overhead of | |
| 2034 | organizing parallel processing. In the worst case, parallel | |
| 2035 | performance will be less than the sequential counterpart! | |
| 2036 | *} | |
| 2037 | ||
| 2038 | text %mlref {*
 | |
| 2039 |   \begin{mldecls}
 | |
| 2040 |   @{index_ML Par_List.map: "('a -> 'b) -> 'a list -> 'b list"} \\
 | |
| 2041 |   @{index_ML Par_List.get_some: "('a -> 'b option) -> 'a list -> 'b option"} \\
 | |
| 2042 |   \end{mldecls}
 | |
| 2043 | ||
| 2044 |   \begin{description}
 | |
| 2045 | ||
| 2046 |   \item @{ML Par_List.map}~@{text "f [x\<^sub>1, \<dots>, x\<^sub>n]"} is like @{ML
 | |
| 2047 |   "map"}~@{text "f [x\<^sub>1, \<dots>, x\<^sub>n]"}, but the evaluation of @{text "f x\<^sub>i"}
 | |
| 2048 |   for @{text "i = 1, \<dots>, n"} is performed in parallel.
 | |
| 2049 | ||
| 2050 |   An exception in any @{text "f x\<^sub>i"} cancels the overall evaluation
 | |
| 2051 |   process.  The final result is produced via @{ML
 | |
| 2052 | Par_Exn.release_first} as explained above, which means the first | |
| 2053 | program exception that happened to occur in the parallel evaluation | |
| 2054 | is propagated, and all other failures are ignored. | |
| 2055 | ||
| 2056 |   \item @{ML Par_List.get_some}~@{text "f [x\<^sub>1, \<dots>, x\<^sub>n]"} produces some
 | |
| 2057 |   @{text "f x\<^sub>i"} that is of the form @{text "SOME y\<^sub>i"}, if that
 | |
| 2058 |   exists, otherwise @{text "NONE"}.  Thus it is similar to @{ML
 | |
| 2059 | Library.get_first}, but subject to a non-deterministic parallel | |
| 2060 | choice process. The first successful result cancels the overall | |
| 2061 |   evaluation process; other exceptions are propagated as for @{ML
 | |
| 2062 | Par_List.map}. | |
| 2063 | ||
| 2064 | This generic parallel choice combinator is the basis for derived | |
| 2065 |   forms, such as @{ML Par_List.find_some}, @{ML Par_List.exists}, @{ML
 | |
| 2066 | Par_List.forall}. | |
| 2067 | ||
| 2068 |   \end{description}
 | |
| 2069 | *} | |
| 2070 | ||
| 2071 | text %mlex {* Subsequently, the Ackermann function is evaluated in
 | |
| 2072 | parallel for some ranges of arguments. *} | |
| 2073 | ||
| 2074 | ML_val {*
 | |
| 2075 | fun ackermann 0 n = n + 1 | |
| 2076 | | ackermann m 0 = ackermann (m - 1) 1 | |
| 2077 | | ackermann m n = ackermann (m - 1) (ackermann m (n - 1)); | |
| 2078 | ||
| 2079 | Par_List.map (ackermann 2) (500 upto 1000); | |
| 2080 | Par_List.map (ackermann 3) (5 upto 10); | |
| 2081 | *} | |
| 2082 | ||
| 2083 | ||
| 2084 | subsection {* Lazy evaluation *}
 | |
| 2085 | ||
| 2086 | text {*
 | |
| 2087 | %FIXME | |
| 2088 | ||
| 53982 | 2089 |   See also @{file "~~/src/Pure/Concurrent/lazy.ML"}.
 | 
| 52420 | 2090 | *} | 
| 2091 | ||
| 2092 | ||
| 2093 | subsection {* Future values \label{sec:futures} *}
 | |
| 2094 | ||
| 2095 | text {*
 | |
| 2096 | %FIXME | |
| 2097 | ||
| 53982 | 2098 |   See also @{file "~~/src/Pure/Concurrent/future.ML"}.
 | 
| 52420 | 2099 | *} | 
| 2100 | ||
| 2101 | ||
| 47180 | 2102 | end |