| author | hoelzl | 
| Tue, 29 Mar 2011 14:27:42 +0200 | |
| changeset 42148 | d596e7bb251f | 
| parent 41162 | 2181c47a02fe | 
| child 42510 | b9c106763325 | 
| permissions | -rw-r--r-- | 
| 29755 | 1 | theory "ML" | 
| 2 | imports Base | |
| 3 | begin | |
| 18538 | 4 | |
| 39822 | 5 | chapter {* Isabelle/ML *}
 | 
| 20489 | 6 | |
| 39823 | 7 | text {* Isabelle/ML is best understood as a certain culture based on
 | 
| 8 | Standard ML. Thus it is not a new programming language, but a | |
| 9 | certain way to use SML at an advanced level within the Isabelle | |
| 10 | environment. This covers a variety of aspects that are geared | |
| 11 | towards an efficient and robust platform for applications of formal | |
| 12 | logic with fully foundational proof construction --- according to | |
| 40126 | 13 |   the well-known \emph{LCF principle}.  There is specific
 | 
| 14 | infrastructure with library modules to address the needs of this | |
| 15 | difficult task. For example, the raw parallel programming model of | |
| 16 | Poly/ML is presented as considerably more abstract concept of | |
| 17 |   \emph{future values}, which is then used to augment the inference
 | |
| 18 | kernel, proof interpreter, and theory loader accordingly. | |
| 39823 | 19 | |
| 20 | The main aspects of Isabelle/ML are introduced below. These | |
| 21 | first-hand explanations should help to understand how proper | |
| 22 | Isabelle/ML is to be read and written, and to get access to the | |
| 23 | wealth of experience that is expressed in the source text and its | |
| 24 |   history of changes.\footnote{See
 | |
| 25 |   \url{http://isabelle.in.tum.de/repos/isabelle} for the full
 | |
| 26 | Mercurial history. There are symbolic tags to refer to official | |
| 27 |   Isabelle releases, as opposed to arbitrary \emph{tip} versions that
 | |
| 28 | merely reflect snapshots that are never really up-to-date.} *} | |
| 29 | ||
| 30 | ||
| 39878 | 31 | section {* Style and orthography *}
 | 
| 32 | ||
| 39879 | 33 | text {* The sources of Isabelle/Isar are optimized for
 | 
| 34 |   \emph{readability} and \emph{maintainability}.  The main purpose is
 | |
| 35 | to tell an informed reader what is really going on and how things | |
| 36 | really work. This is a non-trivial aim, but it is supported by a | |
| 37 | certain style of writing Isabelle/ML that has emerged from long | |
| 38 |   years of system development.\footnote{See also the interesting style
 | |
| 39 | guide for OCaml | |
| 39878 | 40 |   \url{http://caml.inria.fr/resources/doc/guides/guidelines.en.html}
 | 
| 41 | which shares many of our means and ends.} | |
| 42 | ||
| 43 |   The main principle behind any coding style is \emph{consistency}.
 | |
| 39879 | 44 | For a single author of a small program this merely means ``choose | 
| 39878 | 45 | your style and stick to it''. A complex project like Isabelle, with | 
| 39879 | 46 | long years of development and different contributors, requires more | 
| 47 | standardization. A coding style that is changed every few years or | |
| 48 | with every new contributor is no style at all, because consistency | |
| 49 | is quickly lost. Global consistency is hard to achieve, though. | |
| 40126 | 50 | Nonetheless, one should always strive at least for local consistency | 
| 51 | of modules and sub-systems, without deviating from some general | |
| 52 | principles how to write Isabelle/ML. | |
| 39878 | 53 | |
| 40126 | 54 |   In a sense, good coding style is like an \emph{orthography} for the
 | 
| 55 | sources: it helps to read quickly over the text and see through the | |
| 56 | main points, without getting distracted by accidental presentation | |
| 57 | of free-style code. | |
| 39878 | 58 | *} | 
| 59 | ||
| 60 | ||
| 61 | subsection {* Header and sectioning *}
 | |
| 62 | ||
| 39879 | 63 | text {* Isabelle source files have a certain standardized header
 | 
| 64 | format (with precise spacing) that follows ancient traditions | |
| 65 | reaching back to the earliest versions of the system by Larry | |
| 40800 
330eb65c9469
Parse.liberal_name for document antiquotations and attributes;
 wenzelm parents: 
40508diff
changeset | 66 |   Paulson.  See @{file "~~/src/Pure/thm.ML"}, for example.
 | 
| 39878 | 67 | |
| 68 |   The header includes at least @{verbatim Title} and @{verbatim
 | |
| 69 | Author} entries, followed by a prose description of the purpose of | |
| 70 | the module. The latter can range from a single line to several | |
| 71 | paragraphs of explanations. | |
| 72 | ||
| 73 | The rest of the file is divided into sections, subsections, | |
| 40126 | 74 | subsubsections, paragraphs etc.\ using a simple layout via ML | 
| 75 | comments as follows. | |
| 39878 | 76 | |
| 77 | \begin{verbatim}
 | |
| 78 | (*** section ***) | |
| 79 | ||
| 80 | (** subsection **) | |
| 81 | ||
| 82 | (* subsubsection *) | |
| 83 | ||
| 84 | (*short paragraph*) | |
| 85 | ||
| 86 | (* | |
| 40126 | 87 | long paragraph, | 
| 88 | with more text | |
| 39878 | 89 | *) | 
| 90 | \end{verbatim}
 | |
| 91 | ||
| 92 |   As in regular typography, there is some extra space \emph{before}
 | |
| 93 | section headings that are adjacent to plain text (not other headings | |
| 94 | as in the example above). | |
| 95 | ||
| 96 | \medskip The precise wording of the prose text given in these | |
| 40126 | 97 | headings is chosen carefully to introduce the main theme of the | 
| 39879 | 98 | subsequent formal ML text. | 
| 99 | *} | |
| 100 | ||
| 101 | ||
| 39880 | 102 | subsection {* Naming conventions *}
 | 
| 39879 | 103 | |
| 104 | text {* Since ML is the primary medium to express the meaning of the
 | |
| 105 | source text, naming of ML entities requires special care. | |
| 106 | ||
| 39881 | 107 |   \paragraph{Notation.}  A name consists of 1--3 \emph{words} (rarely
 | 
| 108 | 4, but not more) that are separated by underscore. There are three | |
| 40126 | 109 | variants concerning upper or lower case letters, which are used for | 
| 39881 | 110 | certain ML categories as follows: | 
| 39880 | 111 | |
| 112 | \medskip | |
| 113 |   \begin{tabular}{lll}
 | |
| 114 | variant & example & ML categories \\\hline | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 115 |   lower-case & @{ML_text foo_bar} & values, types, record fields \\
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 116 |   capitalized & @{ML_text Foo_Bar} & datatype constructors, structures, functors \\
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 117 |   upper-case & @{ML_text FOO_BAR} & special values, exception constructors, signatures \\
 | 
| 39880 | 118 |   \end{tabular}
 | 
| 119 | \medskip | |
| 120 | ||
| 121 | For historical reasons, many capitalized names omit underscores, | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 122 |   e.g.\ old-style @{ML_text FooBar} instead of @{ML_text Foo_Bar}.
 | 
| 40126 | 123 |   Genuine mixed-case names are \emph{not} used, bacause clear division
 | 
| 124 |   of words is essential for readability.\footnote{Camel-case was
 | |
| 125 | invented to workaround the lack of underscore in some early | |
| 126 | non-ASCII character sets. Later it became habitual in some language | |
| 127 | communities that are now strong in numbers.} | |
| 39880 | 128 | |
| 39881 | 129 | A single (capital) character does not count as ``word'' in this | 
| 40126 | 130 | respect: some Isabelle/ML names are suffixed by extra markers like | 
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 131 |   this: @{ML_text foo_barT}.
 | 
| 39881 | 132 | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 133 |   Name variants are produced by adding 1--3 primes, e.g.\ @{ML_text
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 134 |   foo'}, @{ML_text foo''}, or @{ML_text foo'''}, but not @{ML_text
 | 
| 39881 | 135 | foo''''} or more. Decimal digits scale better to larger numbers, | 
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 136 |   e.g.\ @{ML_text foo0}, @{ML_text foo1}, @{ML_text foo42}.
 | 
| 39880 | 137 | |
| 138 |   \paragraph{Scopes.}  Apart from very basic library modules, ML
 | |
| 139 | structures are not ``opened'', but names are referenced with | |
| 39881 | 140 |   explicit qualification, as in @{ML Syntax.string_of_term} for
 | 
| 39880 | 141 | example. When devising names for structures and their components it | 
| 142 | is important aim at eye-catching compositions of both parts, because | |
| 40126 | 143 | this is how they are seen in the sources and documentation. For the | 
| 144 | same reasons, aliases of well-known library functions should be | |
| 145 | avoided. | |
| 39880 | 146 | |
| 40126 | 147 | Local names of function abstraction or case/let bindings are | 
| 39880 | 148 | typically shorter, sometimes using only rudiments of ``words'', | 
| 149 | while still avoiding cryptic shorthands. An auxiliary function | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 150 |   called @{ML_text helper}, @{ML_text aux}, or @{ML_text f} is
 | 
| 39880 | 151 | considered bad style. | 
| 152 | ||
| 153 | Example: | |
| 154 | ||
| 155 |   \begin{verbatim}
 | |
| 156 | (* RIGHT *) | |
| 157 | ||
| 158 | fun print_foo ctxt foo = | |
| 159 | let | |
| 39881 | 160 | fun print t = ... Syntax.string_of_term ctxt t ... | 
| 161 | in ... end; | |
| 162 | ||
| 163 | ||
| 164 | (* RIGHT *) | |
| 165 | ||
| 166 | fun print_foo ctxt foo = | |
| 167 | let | |
| 39880 | 168 | val string_of_term = Syntax.string_of_term ctxt; | 
| 169 | fun print t = ... string_of_term t ... | |
| 170 | in ... end; | |
| 171 | ||
| 172 | ||
| 173 | (* WRONG *) | |
| 174 | ||
| 175 | val string_of_term = Syntax.string_of_term; | |
| 176 | ||
| 177 | fun print_foo ctxt foo = | |
| 178 | let | |
| 179 | fun aux t = ... string_of_term ctxt t ... | |
| 180 | in ... end; | |
| 181 | ||
| 182 |   \end{verbatim}
 | |
| 183 | ||
| 184 | ||
| 40126 | 185 |   \paragraph{Specific conventions.} Here are some specific name forms
 | 
| 186 | that occur frequently in the sources. | |
| 39881 | 187 | |
| 188 |   \begin{itemize}
 | |
| 189 | ||
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 190 |   \item A function that maps @{ML_text foo} to @{ML_text bar} is
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 191 |   called @{ML_text foo_to_bar} or @{ML_text bar_of_foo} (never
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 192 |   @{ML_text foo2bar}, @{ML_text bar_from_foo}, @{ML_text
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 193 |   bar_for_foo}, or @{ML_text bar4foo}).
 | 
| 39881 | 194 | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 195 |   \item The name component @{ML_text legacy} means that the operation
 | 
| 39881 | 196 | is about to be discontinued soon. | 
| 197 | ||
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 198 |   \item The name component @{ML_text old} means that this is historic
 | 
| 39881 | 199 | material that might disappear at some later stage. | 
| 200 | ||
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 201 |   \item The name component @{ML_text global} means that this works
 | 
| 39881 | 202 | with the background theory instead of the regular local context | 
| 203 |   (\secref{sec:context}), sometimes for historical reasons, sometimes
 | |
| 204 | due a genuine lack of locality of the concept involved, sometimes as | |
| 205 | a fall-back for the lack of a proper context in the application | |
| 206 | code. Whenever there is a non-global variant available, the | |
| 207 | application should be migrated to use it with a proper local | |
| 208 | context. | |
| 209 | ||
| 210 | \item Variables of the main context types of the Isabelle/Isar | |
| 211 |   framework (\secref{sec:context} and \chref{ch:local-theory}) have
 | |
| 40126 | 212 | firm naming conventions as follows: | 
| 39881 | 213 | |
| 214 |   \begin{itemize}
 | |
| 215 | ||
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 216 |   \item theories are called @{ML_text thy}, rarely @{ML_text theory}
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 217 |   (never @{ML_text thry})
 | 
| 39881 | 218 | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 219 |   \item proof contexts are called @{ML_text ctxt}, rarely @{ML_text
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 220 |   context} (never @{ML_text ctx})
 | 
| 39881 | 221 | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 222 |   \item generic contexts are called @{ML_text context}, rarely
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 223 |   @{ML_text ctxt}
 | 
| 39881 | 224 | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 225 |   \item local theories are called @{ML_text lthy}, except for local
 | 
| 40126 | 226 | theories that are treated as proof context (which is a semantic | 
| 227 | super-type) | |
| 39881 | 228 | |
| 229 |   \end{itemize}
 | |
| 230 | ||
| 231 | Variations with primed or decimal numbers are always possible, as | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 232 |   well as sematic prefixes like @{ML_text foo_thy} or @{ML_text
 | 
| 39881 | 233 | bar_ctxt}, but the base conventions above need to be preserved. | 
| 40126 | 234 | This allows to visualize the their data flow via plain regular | 
| 235 | expressions in the editor. | |
| 39881 | 236 | |
| 40126 | 237 |   \item The main logical entities (\secref{ch:logic}) have established
 | 
| 238 | naming convention as follows: | |
| 39881 | 239 | |
| 240 |   \begin{itemize}
 | |
| 241 | ||
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 242 |   \item sorts are called @{ML_text S}
 | 
| 39881 | 243 | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 244 |   \item types are called @{ML_text T}, @{ML_text U}, or @{ML_text
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 245 |   ty} (never @{ML_text t})
 | 
| 39881 | 246 | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 247 |   \item terms are called @{ML_text t}, @{ML_text u}, or @{ML_text
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 248 |   tm} (never @{ML_text trm})
 | 
| 39881 | 249 | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 250 |   \item certified types are called @{ML_text cT}, rarely @{ML_text
 | 
| 39881 | 251 | T}, with variants as for types | 
| 252 | ||
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 253 |   \item certified terms are called @{ML_text ct}, rarely @{ML_text
 | 
| 39881 | 254 | t}, with variants as for terms | 
| 255 | ||
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 256 |   \item theorems are called @{ML_text th}, or @{ML_text thm}
 | 
| 39881 | 257 | |
| 258 |   \end{itemize}
 | |
| 259 | ||
| 260 | Proper semantic names override these conventions completely. For | |
| 261 | example, the left-hand side of an equation (as a term) can be called | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 262 |   @{ML_text lhs} (not @{ML_text lhs_tm}).  Or a term that is known
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 263 |   to be a variable can be called @{ML_text v} or @{ML_text x}.
 | 
| 39881 | 264 | |
| 40310 | 265 |   \item Tactics (\secref{sec:tactics}) are sufficiently important to
 | 
| 266 | have specific naming conventions. The name of a basic tactic | |
| 267 |   definition always has a @{ML_text "_tac"} suffix, the subgoal index
 | |
| 268 |   (if applicable) is always called @{ML_text i}, and the goal state
 | |
| 269 |   (if made explicit) is usually called @{ML_text st} instead of the
 | |
| 270 |   somewhat misleading @{ML_text thm}.  Any other arguments are given
 | |
| 271 | before the latter two, and the general context is given first. | |
| 272 | Example: | |
| 273 | ||
| 274 |   \begin{verbatim}
 | |
| 275 | fun my_tac ctxt arg1 arg2 i st = ... | |
| 276 |   \end{verbatim}
 | |
| 277 | ||
| 278 |   Note that the goal state @{ML_text st} above is rarely made
 | |
| 279 | explicit, if tactic combinators (tacticals) are used as usual. | |
| 280 | ||
| 39881 | 281 |   \end{itemize}
 | 
| 39878 | 282 | *} | 
| 283 | ||
| 284 | ||
| 285 | subsection {* General source layout *}
 | |
| 286 | ||
| 287 | text {* The general Isabelle/ML source layout imitates regular
 | |
| 288 | type-setting to some extent, augmented by the requirements for | |
| 289 | deeply nested expressions that are commonplace in functional | |
| 290 | programming. | |
| 291 | ||
| 292 |   \paragraph{Line length} is 80 characters according to ancient
 | |
| 40126 | 293 | standards, but we allow as much as 100 characters (not | 
| 294 |   more).\footnote{Readability requires to keep the beginning of a line
 | |
| 39881 | 295 | in view while watching its end. Modern wide-screen displays do not | 
| 40126 | 296 | change the way how the human brain works. Sources also need to be | 
| 297 | printable on plain paper with reasonable font-size.} The extra 20 | |
| 39880 | 298 | characters acknowledge the space requirements due to qualified | 
| 299 | library references in Isabelle/ML. | |
| 39878 | 300 | |
| 301 |   \paragraph{White-space} is used to emphasize the structure of
 | |
| 302 | expressions, following mostly standard conventions for mathematical | |
| 303 |   typesetting, as can be seen in plain {\TeX} or {\LaTeX}.  This
 | |
| 39879 | 304 | defines positioning of spaces for parentheses, punctuation, and | 
| 305 | infixes as illustrated here: | |
| 39878 | 306 | |
| 307 |   \begin{verbatim}
 | |
| 308 | val x = y + z * (a + b); | |
| 309 | val pair = (a, b); | |
| 310 |   val record = {foo = 1, bar = 2};
 | |
| 311 |   \end{verbatim}
 | |
| 312 | ||
| 39879 | 313 |   Lines are normally broken \emph{after} an infix operator or
 | 
| 314 | punctuation character. For example: | |
| 39878 | 315 | |
| 316 |   \begin{verbatim}
 | |
| 317 | val x = | |
| 318 | a + | |
| 319 | b + | |
| 320 | c; | |
| 321 | ||
| 322 | val tuple = | |
| 323 | (a, | |
| 324 | b, | |
| 325 | c); | |
| 326 |   \end{verbatim}
 | |
| 327 | ||
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 328 |   Some special infixes (e.g.\ @{ML_text "|>"}) work better at the
 | 
| 39879 | 329 | start of the line, but punctuation is always at the end. | 
| 39878 | 330 | |
| 331 |   Function application follows the tradition of @{text "\<lambda>"}-calculus,
 | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 332 |   not informal mathematics.  For example: @{ML_text "f a b"} for a
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 333 |   curried function, or @{ML_text "g (a, b)"} for a tupled function.
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 334 |   Note that the space between @{ML_text g} and the pair @{ML_text
 | 
| 39879 | 335 | "(a, b)"} follows the important principle of | 
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 336 |   \emph{compositionality}: the layout of @{ML_text "g p"} does not
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 337 |   change when @{ML_text "p"} is refined to the concrete pair
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 338 |   @{ML_text "(a, b)"}.
 | 
| 39878 | 339 | |
| 340 |   \paragraph{Indentation} uses plain spaces, never hard
 | |
| 341 |   tabulators.\footnote{Tabulators were invented to move the carriage
 | |
| 342 | of a type-writer to certain predefined positions. In software they | |
| 343 | could be used as a primitive run-length compression of consecutive | |
| 344 | spaces, but the precise result would depend on non-standardized | |
| 345 | editor configuration.} | |
| 346 | ||
| 39879 | 347 | Each level of nesting is indented by 2 spaces, sometimes 1, very | 
| 40126 | 348 | rarely 4, never 8 or any other odd number. | 
| 39878 | 349 | |
| 39879 | 350 | Indentation follows a simple logical format that only depends on the | 
| 351 | nesting depth, not the accidental length of the text that initiates | |
| 352 | a level of nesting. Example: | |
| 39878 | 353 | |
| 354 |   \begin{verbatim}
 | |
| 39880 | 355 | (* RIGHT *) | 
| 356 | ||
| 39878 | 357 | if b then | 
| 39879 | 358 | expr1_part1 | 
| 359 | expr1_part2 | |
| 39878 | 360 | else | 
| 39879 | 361 | expr2_part1 | 
| 362 | expr2_part2 | |
| 39878 | 363 | |
| 39880 | 364 | |
| 365 | (* WRONG *) | |
| 366 | ||
| 39879 | 367 | if b then expr1_part1 | 
| 368 | expr1_part2 | |
| 369 | else expr2_part1 | |
| 370 | expr2_part2 | |
| 39878 | 371 |   \end{verbatim}
 | 
| 372 | ||
| 373 | The second form has many problems: it assumes a fixed-width font | |
| 39879 | 374 | when viewing the sources, it uses more space on the line and thus | 
| 375 | makes it hard to observe its strict length limit (working against | |
| 39878 | 376 |   \emph{readability}), it requires extra editing to adapt the layout
 | 
| 39879 | 377 | to changes of the initial text (working against | 
| 39878 | 378 |   \emph{maintainability}) etc.
 | 
| 379 | ||
| 39879 | 380 | \medskip For similar reasons, any kind of two-dimensional or tabular | 
| 40126 | 381 | layouts, ASCII-art with lines or boxes of asterisks etc.\ should be | 
| 39879 | 382 | avoided. | 
| 39881 | 383 | |
| 40126 | 384 |   \paragraph{Complex expressions} that consist of multi-clausal
 | 
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 385 |   function definitions, @{ML_text handle}, @{ML_text case},
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 386 |   @{ML_text let} (and combinations) require special attention.  The
 | 
| 40126 | 387 | syntax of Standard ML is quite ambitious and admits a lot of | 
| 388 | variance that can distort the meaning of the text. | |
| 39881 | 389 | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 390 |   Clauses of @{ML_text fun}, @{ML_text fn}, @{ML_text handle},
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 391 |   @{ML_text case} get extra indentation to indicate the nesting
 | 
| 40126 | 392 | clearly. Example: | 
| 39881 | 393 | |
| 394 |   \begin{verbatim}
 | |
| 395 | (* RIGHT *) | |
| 396 | ||
| 397 | fun foo p1 = | |
| 398 | expr1 | |
| 399 | | foo p2 = | |
| 400 | expr2 | |
| 401 | ||
| 402 | ||
| 403 | (* WRONG *) | |
| 404 | ||
| 405 | fun foo p1 = | |
| 406 | expr1 | |
| 407 | | foo p2 = | |
| 408 | expr2 | |
| 409 |   \end{verbatim}
 | |
| 410 | ||
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 411 |   Body expressions consisting of @{ML_text case} or @{ML_text let}
 | 
| 39881 | 412 | require care to maintain compositionality, to prevent loss of | 
| 40126 | 413 | logical indentation where it is especially important to see the | 
| 414 | structure of the text. Example: | |
| 39881 | 415 | |
| 416 |   \begin{verbatim}
 | |
| 417 | (* RIGHT *) | |
| 418 | ||
| 419 | fun foo p1 = | |
| 420 | (case e of | |
| 421 | q1 => ... | |
| 422 | | q2 => ...) | |
| 423 | | foo p2 = | |
| 424 | let | |
| 425 | ... | |
| 426 | in | |
| 427 | ... | |
| 428 | end | |
| 429 | ||
| 430 | ||
| 431 | (* WRONG *) | |
| 432 | ||
| 433 | fun foo p1 = case e of | |
| 434 | q1 => ... | |
| 435 | | q2 => ... | |
| 436 | | foo p2 = | |
| 437 | let | |
| 438 | ... | |
| 439 | in | |
| 440 | ... | |
| 441 | end | |
| 442 |   \end{verbatim}
 | |
| 443 | ||
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 444 |   Extra parentheses around @{ML_text case} expressions are optional,
 | 
| 40126 | 445 | but help to analyse the nesting based on character matching in the | 
| 446 | editor. | |
| 39881 | 447 | |
| 448 | \medskip There are two main exceptions to the overall principle of | |
| 449 | compositionality in the layout of complex expressions. | |
| 450 | ||
| 451 |   \begin{enumerate}
 | |
| 452 | ||
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 453 |   \item @{ML_text "if"} expressions are iterated as if there would be
 | 
| 39881 | 454 | a multi-branch conditional in SML, e.g. | 
| 455 | ||
| 456 |   \begin{verbatim}
 | |
| 457 | (* RIGHT *) | |
| 458 | ||
| 459 | if b1 then e1 | |
| 460 | else if b2 then e2 | |
| 461 | else e3 | |
| 462 |   \end{verbatim}
 | |
| 463 | ||
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 464 |   \item @{ML_text fn} abstractions are often layed-out as if they
 | 
| 39881 | 465 | would lack any structure by themselves. This traditional form is | 
| 466 | motivated by the possibility to shift function arguments back and | |
| 40126 | 467 | forth wrt.\ additional combinators. Example: | 
| 39881 | 468 | |
| 469 |   \begin{verbatim}
 | |
| 470 | (* RIGHT *) | |
| 471 | ||
| 472 | fun foo x y = fold (fn z => | |
| 473 | expr) | |
| 474 |   \end{verbatim}
 | |
| 475 | ||
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 476 |   Here the visual appearance is that of three arguments @{ML_text x},
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 477 |   @{ML_text y}, @{ML_text z}.
 | 
| 39881 | 478 | |
| 479 |   \end{enumerate}
 | |
| 480 | ||
| 481 | Such weakly structured layout should be use with great care. Here | |
| 40153 | 482 |   are some counter-examples involving @{ML_text let} expressions:
 | 
| 39881 | 483 | |
| 484 |   \begin{verbatim}
 | |
| 485 | (* WRONG *) | |
| 486 | ||
| 487 | fun foo x = let | |
| 488 | val y = ... | |
| 489 | in ... end | |
| 490 | ||
| 41162 | 491 | |
| 492 | (* WRONG *) | |
| 493 | ||
| 40153 | 494 | fun foo x = let | 
| 495 | val y = ... | |
| 496 | in ... end | |
| 497 | ||
| 41162 | 498 | |
| 499 | (* WRONG *) | |
| 500 | ||
| 39881 | 501 | fun foo x = | 
| 502 | let | |
| 503 | val y = ... | |
| 504 | in ... end | |
| 505 |   \end{verbatim}
 | |
| 506 | ||
| 507 | \medskip In general the source layout is meant to emphasize the | |
| 508 | structure of complex language expressions, not to pretend that SML | |
| 509 | had a completely different syntax (say that of Haskell or Java). | |
| 39878 | 510 | *} | 
| 511 | ||
| 512 | ||
| 39823 | 513 | section {* SML embedded into Isabelle/Isar *}
 | 
| 514 | ||
| 39824 | 515 | text {* ML and Isar are intertwined via an open-ended bootstrap
 | 
| 516 | process that provides more and more programming facilities and | |
| 517 | logical content in an alternating manner. Bootstrapping starts from | |
| 518 | the raw environment of existing implementations of Standard ML | |
| 519 | (mainly Poly/ML, but also SML/NJ). | |
| 39823 | 520 | |
| 39824 | 521 | Isabelle/Pure marks the point where the original ML toplevel is | 
| 40126 | 522 | superseded by the Isar toplevel that maintains a uniform context for | 
| 523 |   arbitrary ML values (see also \secref{sec:context}).  This formal
 | |
| 524 | environment holds ML compiler bindings, logical entities, and many | |
| 525 | other things. Raw SML is never encountered again after the initial | |
| 526 | bootstrap of Isabelle/Pure. | |
| 39823 | 527 | |
| 40126 | 528 | Object-logics like Isabelle/HOL are built within the | 
| 529 | Isabelle/ML/Isar environment by introducing suitable theories with | |
| 530 | associated ML modules, either inlined or as separate files. Thus | |
| 531 | Isabelle/HOL is defined as a regular user-space application within | |
| 532 | the Isabelle framework. Further add-on tools can be implemented in | |
| 533 | ML within the Isar context in the same manner: ML is part of the | |
| 534 | standard repertoire of Isabelle, and there is no distinction between | |
| 535 | ``user'' and ``developer'' in this respect. | |
| 39823 | 536 | *} | 
| 537 | ||
| 39824 | 538 | |
| 39827 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 539 | subsection {* Isar ML commands *}
 | 
| 39823 | 540 | |
| 40126 | 541 | text {* The primary Isar source language provides facilities to ``open
 | 
| 542 | a window'' to the underlying ML compiler. Especially see the Isar | |
| 543 |   commands @{command_ref "use"} and @{command_ref "ML"}: both work the
 | |
| 39824 | 544 | same way, only the source text is provided via a file vs.\ inlined, | 
| 545 | respectively. Apart from embedding ML into the main theory | |
| 546 | definition like that, there are many more commands that refer to ML | |
| 547 |   source, such as @{command_ref setup} or @{command_ref declaration}.
 | |
| 40126 | 548 | Even more fine-grained embedding of ML into Isar is encountered in | 
| 549 |   the proof method @{method_ref tactic}, which refines the pending
 | |
| 550 |   goal state via a given expression of type @{ML_type tactic}.
 | |
| 39824 | 551 | *} | 
| 39823 | 552 | |
| 39824 | 553 | text %mlex {* The following artificial example demonstrates some ML
 | 
| 554 | toplevel declarations within the implicit Isar theory context. This | |
| 555 | is regular functional programming without referring to logical | |
| 556 | entities yet. | |
| 39823 | 557 | *} | 
| 558 | ||
| 559 | ML {*
 | |
| 560 | fun factorial 0 = 1 | |
| 561 | | factorial n = n * factorial (n - 1) | |
| 562 | *} | |
| 563 | ||
| 40126 | 564 | text {* Here the ML environment is already managed by Isabelle, i.e.\
 | 
| 39861 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 wenzelm parents: 
39859diff
changeset | 565 |   the @{ML factorial} function is not yet accessible in the preceding
 | 
| 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 wenzelm parents: 
39859diff
changeset | 566 | paragraph, nor in a different theory that is independent from the | 
| 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 wenzelm parents: 
39859diff
changeset | 567 | current one in the import hierarchy. | 
| 39823 | 568 | |
| 569 | Removing the above ML declaration from the source text will remove | |
| 570 | any trace of this definition as expected. The Isabelle/ML toplevel | |
| 571 |   environment is managed in a \emph{stateless} way: unlike the raw ML
 | |
| 40126 | 572 | toplevel there are no global side-effects involved | 
| 573 |   here.\footnote{Such a stateless compilation environment is also a
 | |
| 574 | prerequisite for robust parallel compilation within independent | |
| 575 | nodes of the implicit theory development graph.} | |
| 39823 | 576 | |
| 40126 | 577 | \medskip The next example shows how to embed ML into Isar proofs, using | 
| 578 |  @{command_ref "ML_prf"} instead of Instead of @{command_ref "ML"}.
 | |
| 579 | As illustrated below, the effect on the ML environment is local to | |
| 580 | the whole proof body, ignoring the block structure. | |
| 581 | *} | |
| 39823 | 582 | |
| 40964 | 583 | notepad | 
| 584 | begin | |
| 39851 | 585 |   ML_prf %"ML" {* val a = 1 *}
 | 
| 40126 | 586 |   {
 | 
| 39851 | 587 |     ML_prf %"ML" {* val b = a + 1 *}
 | 
| 39824 | 588 |   } -- {* Isar block structure ignored by ML environment *}
 | 
| 39851 | 589 |   ML_prf %"ML" {* val c = b + 1 *}
 | 
| 40964 | 590 | end | 
| 39823 | 591 | |
| 39861 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 wenzelm parents: 
39859diff
changeset | 592 | text {* By side-stepping the normal scoping rules for Isar proof
 | 
| 40126 | 593 | blocks, embedded ML code can refer to the different contexts and | 
| 594 | manipulate corresponding entities, e.g.\ export a fact from a block | |
| 595 | context. | |
| 39823 | 596 | |
| 39861 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 wenzelm parents: 
39859diff
changeset | 597 | \medskip Two further ML commands are useful in certain situations: | 
| 40126 | 598 |   @{command_ref ML_val} and @{command_ref ML_command} are
 | 
| 39824 | 599 |   \emph{diagnostic} in the sense that there is no effect on the
 | 
| 600 | underlying environment, and can thus used anywhere (even outside a | |
| 601 | theory). The examples below produce long strings of digits by | |
| 602 |   invoking @{ML factorial}: @{command ML_val} already takes care of
 | |
| 603 |   printing the ML toplevel result, but @{command ML_command} is silent
 | |
| 39861 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 wenzelm parents: 
39859diff
changeset | 604 | so we produce an explicit output message. *} | 
| 39823 | 605 | |
| 606 | ML_val {* factorial 100 *}
 | |
| 607 | ML_command {* writeln (string_of_int (factorial 100)) *}
 | |
| 608 | ||
| 40964 | 609 | notepad | 
| 610 | begin | |
| 39824 | 611 |   ML_val {* factorial 100 *}  (* FIXME check/fix indentation *)
 | 
| 39823 | 612 |   ML_command {* writeln (string_of_int (factorial 100)) *}
 | 
| 40964 | 613 | end | 
| 39823 | 614 | |
| 615 | ||
| 39827 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 616 | subsection {* Compile-time context *}
 | 
| 39823 | 617 | |
| 39825 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 618 | text {* Whenever the ML compiler is invoked within Isabelle/Isar, the
 | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 619 | formal context is passed as a thread-local reference variable. Thus | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 620 | ML code may access the theory context during compilation, by reading | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 621 | or writing the (local) theory under construction. Note that such | 
| 40126 | 622 | direct access to the compile-time context is rare. In practice it | 
| 623 | is typically done via some derived ML functions instead. | |
| 39825 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 624 | *} | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 625 | |
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 626 | text %mlref {*
 | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 627 |   \begin{mldecls}
 | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 628 |   @{index_ML ML_Context.the_generic_context: "unit -> Context.generic"} \\
 | 
| 40126 | 629 |   @{index_ML "Context.>>": "(Context.generic -> Context.generic) -> unit"} \\
 | 
| 39850 | 630 |   @{index_ML bind_thms: "string * thm list -> unit"} \\
 | 
| 631 |   @{index_ML bind_thm: "string * thm -> unit"} \\
 | |
| 39825 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 632 |   \end{mldecls}
 | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 633 | |
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 634 |   \begin{description}
 | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 635 | |
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 636 |   \item @{ML "ML_Context.the_generic_context ()"} refers to the theory
 | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 637 | context of the ML toplevel --- at compile time. ML code needs to | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 638 |   take care to refer to @{ML "ML_Context.the_generic_context ()"}
 | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 639 | correctly. Recall that evaluation of a function body is delayed | 
| 39827 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 640 | until actual run-time. | 
| 39825 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 641 | |
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 642 |   \item @{ML "Context.>>"}~@{text f} applies context transformation
 | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 643 |   @{text f} to the implicit context of the ML toplevel.
 | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 644 | |
| 39850 | 645 |   \item @{ML bind_thms}~@{text "(name, thms)"} stores a list of
 | 
| 646 | theorems produced in ML both in the (global) theory context and the | |
| 647 | ML toplevel, associating it with the provided name. Theorems are | |
| 648 | put into a global ``standard'' format before being stored. | |
| 649 | ||
| 650 |   \item @{ML bind_thm} is similar to @{ML bind_thms} but refers to a
 | |
| 40126 | 651 | singleton fact. | 
| 39850 | 652 | |
| 39825 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 653 |   \end{description}
 | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 654 | |
| 40126 | 655 | It is important to note that the above functions are really | 
| 39825 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 656 | restricted to the compile time, even though the ML compiler is | 
| 39827 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 657 | invoked at run-time. The majority of ML code either uses static | 
| 39825 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 658 |   antiquotations (\secref{sec:ML-antiq}) or refers to the theory or
 | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 659 | proof context at run-time, by explicit functional abstraction. | 
| 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 wenzelm parents: 
39824diff
changeset | 660 | *} | 
| 39823 | 661 | |
| 662 | ||
| 39827 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 663 | subsection {* Antiquotations \label{sec:ML-antiq} *}
 | 
| 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 664 | |
| 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 665 | text {* A very important consequence of embedding SML into Isar is the
 | 
| 40126 | 666 |   concept of \emph{ML antiquotation}.  The standard token language of
 | 
| 667 | ML is augmented by special syntactic entities of the following form: | |
| 39827 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 668 | |
| 39829 | 669 |   \indexouternonterm{antiquote}
 | 
| 39827 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 670 |   \begin{rail}
 | 
| 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 671 | antiquote: atsign lbrace nameref args rbrace | lbracesym | rbracesym | 
| 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 672 | ; | 
| 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 673 |   \end{rail}
 | 
| 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 674 | |
| 39861 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 wenzelm parents: 
39859diff
changeset | 675 |   Here @{syntax nameref} and @{syntax args} are regular outer syntax
 | 
| 40126 | 676 |   categories \cite{isabelle-isar-ref}.  Attributes and proof methods
 | 
| 677 | use similar syntax. | |
| 39823 | 678 | |
| 39827 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 679 |   \medskip A regular antiquotation @{text "@{name args}"} processes
 | 
| 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 680 | its arguments by the usual means of the Isar source language, and | 
| 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 681 | produces corresponding ML source text, either as literal | 
| 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 682 |   \emph{inline} text (e.g. @{text "@{term t}"}) or abstract
 | 
| 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 683 |   \emph{value} (e.g. @{text "@{thm th}"}).  This pre-compilation
 | 
| 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 684 | scheme allows to refer to formal entities in a robust manner, with | 
| 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 685 | proper static scoping and with some degree of logical checking of | 
| 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 686 | small portions of the code. | 
| 39823 | 687 | |
| 39827 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 688 |   Special antiquotations like @{text "@{let \<dots>}"} or @{text "@{note
 | 
| 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 689 | \<dots>}"} augment the compilation context without generating code. The | 
| 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 690 |   non-ASCII braces @{text "\<lbrace>"} and @{text "\<rbrace>"} allow to delimit the
 | 
| 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 691 | effect by introducing local blocks within the pre-compilation | 
| 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 692 | environment. | 
| 39829 | 693 | |
| 40126 | 694 |   \medskip See also \cite{Wenzel-Chaieb:2007b} for a broader
 | 
| 39861 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 wenzelm parents: 
39859diff
changeset | 695 | perspective on Isabelle/ML antiquotations. *} | 
| 39829 | 696 | |
| 39830 | 697 | text %mlantiq {*
 | 
| 39829 | 698 |   \begin{matharray}{rcl}
 | 
| 699 |   @{ML_antiquotation_def "let"} & : & @{text ML_antiquotation} \\
 | |
| 700 |   @{ML_antiquotation_def "note"} & : & @{text ML_antiquotation} \\
 | |
| 701 |   \end{matharray}
 | |
| 702 | ||
| 703 |   \begin{rail}
 | |
| 704 | 'let' ((term + 'and') '=' term + 'and') | |
| 705 | ; | |
| 706 | ||
| 707 | 'note' (thmdef? thmrefs + 'and') | |
| 708 | ; | |
| 709 |   \end{rail}
 | |
| 710 | ||
| 711 |   \begin{description}
 | |
| 712 | ||
| 713 |   \item @{text "@{let p = t}"} binds schematic variables in the
 | |
| 714 |   pattern @{text "p"} by higher-order matching against the term @{text
 | |
| 715 |   "t"}.  This is analogous to the regular @{command_ref let} command
 | |
| 716 | in the Isar proof language. The pre-compilation environment is | |
| 717 | augmented by auxiliary term bindings, without emitting ML source. | |
| 718 | ||
| 719 |   \item @{text "@{note a = b\<^sub>1 \<dots> b\<^sub>n}"} recalls existing facts @{text
 | |
| 720 |   "b\<^sub>1, \<dots>, b\<^sub>n"}, binding the result as @{text a}.  This is analogous to
 | |
| 721 |   the regular @{command_ref note} command in the Isar proof language.
 | |
| 722 | The pre-compilation environment is augmented by auxiliary fact | |
| 723 | bindings, without emitting ML source. | |
| 724 | ||
| 725 |   \end{description}
 | |
| 726 | *} | |
| 727 | ||
| 40126 | 728 | text %mlex {* The following artificial example gives some impression
 | 
| 729 | about the antiquotation elements introduced so far, together with | |
| 730 |   the important @{text "@{thm}"} antiquotation defined later.
 | |
| 39829 | 731 | *} | 
| 732 | ||
| 733 | ML {*
 | |
| 734 | \<lbrace> | |
| 735 |     @{let ?t = my_term}
 | |
| 736 |     @{note my_refl = reflexive [of ?t]}
 | |
| 737 |     fun foo th = Thm.transitive th @{thm my_refl}
 | |
| 738 | \<rbrace> | |
| 739 | *} | |
| 740 | ||
| 40126 | 741 | text {* The extra block delimiters do not affect the compiled code
 | 
| 742 |   itself, i.e.\ function @{ML foo} is available in the present context
 | |
| 743 | of this paragraph. | |
| 39827 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 wenzelm parents: 
39825diff
changeset | 744 | *} | 
| 39823 | 745 | |
| 39835 | 746 | |
| 39883 | 747 | section {* Canonical argument order \label{sec:canonical-argument-order} *}
 | 
| 748 | ||
| 749 | text {* Standard ML is a language in the tradition of @{text
 | |
| 750 |   "\<lambda>"}-calculus and \emph{higher-order functional programming},
 | |
| 751 | similar to OCaml, Haskell, or Isabelle/Pure and HOL as logical | |
| 752 | languages. Getting acquainted with the native style of representing | |
| 753 | functions in that setting can save a lot of extra boiler-plate of | |
| 754 | redundant shuffling of arguments, auxiliary abstractions etc. | |
| 755 | ||
| 40126 | 756 |   Functions are usually \emph{curried}: the idea of turning arguments
 | 
| 757 |   of type @{text "\<tau>\<^sub>i"} (for @{text "i \<in> {1, \<dots> n}"}) into a result of
 | |
| 758 |   type @{text "\<tau>"} is represented by the iterated function space
 | |
| 759 |   @{text "\<tau>\<^sub>1 \<rightarrow> \<dots> \<rightarrow> \<tau>\<^sub>n \<rightarrow> \<tau>"}.  This is isomorphic to the well-known
 | |
| 760 |   encoding via tuples @{text "\<tau>\<^sub>1 \<times> \<dots> \<times> \<tau>\<^sub>n \<rightarrow> \<tau>"}, but the curried
 | |
| 761 |   version fits more smoothly into the basic calculus.\footnote{The
 | |
| 762 | difference is even more significant in higher-order logic, because | |
| 763 | the redundant tuple structure needs to be accommodated by formal | |
| 764 | reasoning.} | |
| 39883 | 765 | |
| 766 |   Currying gives some flexiblity due to \emph{partial application}.  A
 | |
| 767 |   function @{text "f: \<tau>\<^sub>1 \<rightarrow> \<tau>\<^bsub>2\<^esub> \<rightarrow> \<tau>"} can be applied to @{text "x: \<tau>\<^sub>1"}
 | |
| 40126 | 768 |   and the remaining @{text "(f x): \<tau>\<^sub>2 \<rightarrow> \<tau>"} passed to another function
 | 
| 39883 | 769 | etc. How well this works in practice depends on the order of | 
| 770 | arguments. In the worst case, arguments are arranged erratically, | |
| 771 | and using a function in a certain situation always requires some | |
| 772 | glue code. Thus we would get exponentially many oppurtunities to | |
| 773 | decorate the code with meaningless permutations of arguments. | |
| 774 | ||
| 775 |   This can be avoided by \emph{canonical argument order}, which
 | |
| 40126 | 776 | observes certain standard patterns and minimizes adhoc permutations | 
| 40229 | 777 | in their application. In Isabelle/ML, large portions of text can be | 
| 40126 | 778 |   written without ever using @{text "swap: \<alpha> \<times> \<beta> \<rightarrow> \<beta> \<times> \<alpha>"}, or the
 | 
| 779 |   combinator @{text "C: (\<alpha> \<rightarrow> \<beta> \<rightarrow> \<gamma>) \<rightarrow> (\<beta> \<rightarrow> \<alpha> \<rightarrow> \<gamma>)"} that is not even
 | |
| 39883 | 780 | defined in our library. | 
| 781 | ||
| 782 | \medskip The basic idea is that arguments that vary less are moved | |
| 783 | further to the left than those that vary more. Two particularly | |
| 784 |   important categories of functions are \emph{selectors} and
 | |
| 785 |   \emph{updates}.
 | |
| 786 | ||
| 787 | The subsequent scheme is based on a hypothetical set-like container | |
| 788 |   of type @{text "\<beta>"} that manages elements of type @{text "\<alpha>"}.  Both
 | |
| 789 | the names and types of the associated operations are canonical for | |
| 790 | Isabelle/ML. | |
| 791 | ||
| 792 | \medskip | |
| 793 |   \begin{tabular}{ll}
 | |
| 794 | kind & canonical name and type \\\hline | |
| 795 |   selector & @{text "member: \<beta> \<rightarrow> \<alpha> \<rightarrow> bool"} \\
 | |
| 796 |   update & @{text "insert: \<alpha> \<rightarrow> \<beta> \<rightarrow> \<beta>"} \\
 | |
| 797 |   \end{tabular}
 | |
| 798 | \medskip | |
| 799 | ||
| 800 |   Given a container @{text "B: \<beta>"}, the partially applied @{text
 | |
| 801 |   "member B"} is a predicate over elements @{text "\<alpha> \<rightarrow> bool"}, and
 | |
| 802 | thus represents the intended denotation directly. It is customary | |
| 803 | to pass the abstract predicate to further operations, not the | |
| 804 | concrete container. The argument order makes it easy to use other | |
| 805 |   combinators: @{text "forall (member B) list"} will check a list of
 | |
| 806 |   elements for membership in @{text "B"} etc. Often the explicit
 | |
| 40126 | 807 |   @{text "list"} is pointless and can be contracted to @{text "forall
 | 
| 808 | (member B)"} to get directly a predicate again. | |
| 39883 | 809 | |
| 40126 | 810 | In contrast, an update operation varies the container, so it moves | 
| 39883 | 811 |   to the right: @{text "insert a"} is a function @{text "\<beta> \<rightarrow> \<beta>"} to
 | 
| 812 |   insert a value @{text "a"}.  These can be composed naturally as
 | |
| 40126 | 813 |   @{text "insert c \<circ> insert b \<circ> insert a"}.  The slightly awkward
 | 
| 40229 | 814 | inversion of the composition order is due to conventional | 
| 40126 | 815 | mathematical notation, which can be easily amended as explained | 
| 816 | below. | |
| 39883 | 817 | *} | 
| 818 | ||
| 819 | ||
| 820 | subsection {* Forward application and composition *}
 | |
| 821 | ||
| 822 | text {* Regular function application and infix notation works best for
 | |
| 823 |   relatively deeply structured expressions, e.g.\ @{text "h (f x y + g
 | |
| 40126 | 824 |   z)"}.  The important special case of \emph{linear transformation}
 | 
| 825 |   applies a cascade of functions @{text "f\<^sub>n (\<dots> (f\<^sub>1 x))"}.  This
 | |
| 826 | becomes hard to read and maintain if the functions are themselves | |
| 827 | given as complex expressions. The notation can be significantly | |
| 39883 | 828 |   improved by introducing \emph{forward} versions of application and
 | 
| 829 | composition as follows: | |
| 830 | ||
| 831 | \medskip | |
| 832 |   \begin{tabular}{lll}
 | |
| 833 |   @{text "x |> f"} & @{text "\<equiv>"} & @{text "f x"} \\
 | |
| 41162 | 834 |   @{text "(f #> g) x"} & @{text "\<equiv>"} & @{text "x |> f |> g"} \\
 | 
| 39883 | 835 |   \end{tabular}
 | 
| 836 | \medskip | |
| 837 | ||
| 838 |   This enables to write conveniently @{text "x |> f\<^sub>1 |> \<dots> |> f\<^sub>n"} or
 | |
| 839 |   @{text "f\<^sub>1 #> \<dots> #> f\<^sub>n"} for its functional abstraction over @{text
 | |
| 840 | "x"}. | |
| 841 | ||
| 842 | \medskip There is an additional set of combinators to accommodate | |
| 843 | multiple results (via pairs) that are passed on as multiple | |
| 844 | arguments (via currying). | |
| 845 | ||
| 846 | \medskip | |
| 847 |   \begin{tabular}{lll}
 | |
| 848 |   @{text "(x, y) |-> f"} & @{text "\<equiv>"} & @{text "f x y"} \\
 | |
| 41162 | 849 |   @{text "(f #-> g) x"} & @{text "\<equiv>"} & @{text "x |> f |-> g"} \\
 | 
| 39883 | 850 |   \end{tabular}
 | 
| 851 | \medskip | |
| 852 | *} | |
| 853 | ||
| 854 | text %mlref {*
 | |
| 855 |   \begin{mldecls}
 | |
| 856 |   @{index_ML "op |> ": "'a * ('a -> 'b) -> 'b"} \\
 | |
| 857 |   @{index_ML "op |-> ": "('c * 'a) * ('c -> 'a -> 'b) -> 'b"} \\
 | |
| 858 |   @{index_ML "op #> ": "('a -> 'b) * ('b -> 'c) -> 'a -> 'c"} \\
 | |
| 859 |   @{index_ML "op #-> ": "('a -> 'c * 'b) * ('c -> 'b -> 'd) -> 'a -> 'd"} \\
 | |
| 860 |   \end{mldecls}
 | |
| 861 | ||
| 862 | %FIXME description!? | |
| 863 | *} | |
| 864 | ||
| 865 | ||
| 866 | subsection {* Canonical iteration *}
 | |
| 867 | ||
| 868 | text {* As explained above, a function @{text "f: \<alpha> \<rightarrow> \<beta> \<rightarrow> \<beta>"} can be
 | |
| 40126 | 869 |   understood as update on a configuration of type @{text "\<beta>"},
 | 
| 39883 | 870 |   parametrized by arguments of type @{text "\<alpha>"}.  Given @{text "a: \<alpha>"}
 | 
| 871 |   the partial application @{text "(f a): \<beta> \<rightarrow> \<beta>"} operates
 | |
| 872 |   homogeneously on @{text "\<beta>"}.  This can be iterated naturally over a
 | |
| 873 |   list of parameters @{text "[a\<^sub>1, \<dots>, a\<^sub>n]"} as @{text "f a\<^sub>1 #> \<dots> #> f
 | |
| 874 |   a\<^bsub>n\<^esub>\<^bsub>\<^esub>"}.  The latter expression is again a function @{text "\<beta> \<rightarrow> \<beta>"}.
 | |
| 875 |   It can be applied to an initial configuration @{text "b: \<beta>"} to
 | |
| 876 |   start the iteration over the given list of arguments: each @{text
 | |
| 877 |   "a"} in @{text "a\<^sub>1, \<dots>, a\<^sub>n"} is applied consecutively by updating a
 | |
| 878 | cumulative configuration. | |
| 879 | ||
| 880 |   The @{text fold} combinator in Isabelle/ML lifts a function @{text
 | |
| 881 | "f"} as above to its iterated version over a list of arguments. | |
| 882 |   Lifting can be repeated, e.g.\ @{text "(fold \<circ> fold) f"} iterates
 | |
| 883 | over a list of lists as expected. | |
| 884 | ||
| 885 |   The variant @{text "fold_rev"} works inside-out over the list of
 | |
| 886 |   arguments, such that @{text "fold_rev f \<equiv> fold f \<circ> rev"} holds.
 | |
| 887 | ||
| 888 |   The @{text "fold_map"} combinator essentially performs @{text
 | |
| 889 |   "fold"} and @{text "map"} simultaneously: each application of @{text
 | |
| 890 | "f"} produces an updated configuration together with a side-result; | |
| 891 | the iteration collects all such side-results as a separate list. | |
| 892 | *} | |
| 893 | ||
| 894 | text %mlref {*
 | |
| 895 |   \begin{mldecls}
 | |
| 896 |   @{index_ML fold: "('a -> 'b -> 'b) -> 'a list -> 'b -> 'b"} \\
 | |
| 897 |   @{index_ML fold_rev: "('a -> 'b -> 'b) -> 'a list -> 'b -> 'b"} \\
 | |
| 898 |   @{index_ML fold_map: "('a -> 'b -> 'c * 'b) -> 'a list -> 'b -> 'c list * 'b"} \\
 | |
| 899 |   \end{mldecls}
 | |
| 900 | ||
| 901 |   \begin{description}
 | |
| 902 | ||
| 903 |   \item @{ML fold}~@{text f} lifts the parametrized update function
 | |
| 904 |   @{text "f"} to a list of parameters.
 | |
| 905 | ||
| 906 |   \item @{ML fold_rev}~@{text "f"} is similar to @{ML fold}~@{text
 | |
| 907 | "f"}, but works inside-out. | |
| 908 | ||
| 909 |   \item @{ML fold_map}~@{text "f"} lifts the parametrized update
 | |
| 910 |   function @{text "f"} (with side-result) to a list of parameters and
 | |
| 911 | cumulative side-results. | |
| 912 | ||
| 913 |   \end{description}
 | |
| 914 | ||
| 915 |   \begin{warn}
 | |
| 916 | The literature on functional programming provides a multitude of | |
| 917 |   combinators called @{text "foldl"}, @{text "foldr"} etc.  SML97
 | |
| 918 |   provides its own variations as @{ML List.foldl} and @{ML
 | |
| 40126 | 919 | List.foldr}, while the classic Isabelle library also has the | 
| 920 |   historic @{ML Library.foldl} and @{ML Library.foldr}.  To avoid
 | |
| 921 |   further confusion, all of this should be ignored, and @{ML fold} (or
 | |
| 922 |   @{ML fold_rev}) used exclusively.
 | |
| 39883 | 923 |   \end{warn}
 | 
| 924 | *} | |
| 925 | ||
| 926 | text %mlex {* The following example shows how to fill a text buffer
 | |
| 927 | incrementally by adding strings, either individually or from a given | |
| 928 | list. | |
| 929 | *} | |
| 930 | ||
| 931 | ML {*
 | |
| 932 | val s = | |
| 933 | Buffer.empty | |
| 934 | |> Buffer.add "digits: " | |
| 935 | |> fold (Buffer.add o string_of_int) (0 upto 9) | |
| 936 | |> Buffer.content; | |
| 937 | ||
| 938 |   @{assert} (s = "digits: 0123456789");
 | |
| 939 | *} | |
| 940 | ||
| 941 | text {* Note how @{ML "fold (Buffer.add o string_of_int)"} above saves
 | |
| 942 |   an extra @{ML "map"} over the given list.  This kind of peephole
 | |
| 943 | optimization reduces both the code size and the tree structures in | |
| 944 | memory (``deforestation''), but requires some practice to read and | |
| 945 | write it fluently. | |
| 946 | ||
| 40126 | 947 | \medskip The next example elaborates the idea of canonical | 
| 948 | iteration, demonstrating fast accumulation of tree content using a | |
| 949 | text buffer. | |
| 39883 | 950 | *} | 
| 951 | ||
| 952 | ML {*
 | |
| 953 | datatype tree = Text of string | Elem of string * tree list; | |
| 954 | ||
| 955 | fun slow_content (Text txt) = txt | |
| 956 | | slow_content (Elem (name, ts)) = | |
| 957 | "<" ^ name ^ ">" ^ | |
| 958 | implode (map slow_content ts) ^ | |
| 959 | "</" ^ name ^ ">" | |
| 960 | ||
| 961 | fun add_content (Text txt) = Buffer.add txt | |
| 962 | | add_content (Elem (name, ts)) = | |
| 963 |         Buffer.add ("<" ^ name ^ ">") #>
 | |
| 964 | fold add_content ts #> | |
| 965 |         Buffer.add ("</" ^ name ^ ">");
 | |
| 966 | ||
| 967 | fun fast_content tree = | |
| 968 | Buffer.empty |> add_content tree |> Buffer.content; | |
| 969 | *} | |
| 970 | ||
| 971 | text {* The slow part of @{ML slow_content} is the @{ML implode} of
 | |
| 972 | the recursive results, because it copies previously produced strings | |
| 40126 | 973 | again. | 
| 39883 | 974 | |
| 975 |   The incremental @{ML add_content} avoids this by operating on a
 | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 976 |   buffer that is passed through in a linear fashion.  Using @{ML_text
 | 
| 40126 | 977 | "#>"} and contraction over the actual buffer argument saves some | 
| 978 |   additional boiler-plate.  Of course, the two @{ML "Buffer.add"}
 | |
| 979 | invocations with concatenated strings could have been split into | |
| 980 | smaller parts, but this would have obfuscated the source without | |
| 981 | making a big difference in allocations. Here we have done some | |
| 39883 | 982 | peephole-optimization for the sake of readability. | 
| 983 | ||
| 984 |   Another benefit of @{ML add_content} is its ``open'' form as a
 | |
| 40126 | 985 | function on buffers that can be continued in further linear | 
| 986 | transformations, folding etc. Thus it is more compositional than | |
| 987 |   the naive @{ML slow_content}.  As realistic example, compare the
 | |
| 988 |   old-style @{ML "Term.maxidx_of_term: term -> int"} with the newer
 | |
| 989 |   @{ML "Term.maxidx_term: term -> int -> int"} in Isabelle/Pure.
 | |
| 39883 | 990 | |
| 40126 | 991 |   Note that @{ML fast_content} above is only defined as example.  In
 | 
| 992 | many practical situations, it is customary to provide the | |
| 993 |   incremental @{ML add_content} only and leave the initialization and
 | |
| 994 | termination to the concrete application by the user. | |
| 39883 | 995 | *} | 
| 996 | ||
| 997 | ||
| 39854 | 998 | section {* Message output channels \label{sec:message-channels} *}
 | 
| 39835 | 999 | |
| 1000 | text {* Isabelle provides output channels for different kinds of
 | |
| 1001 | messages: regular output, high-volume tracing information, warnings, | |
| 1002 | and errors. | |
| 1003 | ||
| 1004 | Depending on the user interface involved, these messages may appear | |
| 1005 | in different text styles or colours. The standard output for | |
| 1006 |   terminal sessions prefixes each line of warnings by @{verbatim
 | |
| 1007 |   "###"} and errors by @{verbatim "***"}, but leaves anything else
 | |
| 1008 | unchanged. | |
| 1009 | ||
| 1010 | Messages are associated with the transaction context of the running | |
| 1011 | Isar command. This enables the front-end to manage commands and | |
| 1012 | resulting messages together. For example, after deleting a command | |
| 1013 | from a given theory document version, the corresponding message | |
| 39872 | 1014 | output can be retracted from the display. | 
| 1015 | *} | |
| 39835 | 1016 | |
| 1017 | text %mlref {*
 | |
| 1018 |   \begin{mldecls}
 | |
| 1019 |   @{index_ML writeln: "string -> unit"} \\
 | |
| 1020 |   @{index_ML tracing: "string -> unit"} \\
 | |
| 1021 |   @{index_ML warning: "string -> unit"} \\
 | |
| 1022 |   @{index_ML error: "string -> 'a"} \\
 | |
| 1023 |   \end{mldecls}
 | |
| 1024 | ||
| 1025 |   \begin{description}
 | |
| 1026 | ||
| 1027 |   \item @{ML writeln}~@{text "text"} outputs @{text "text"} as regular
 | |
| 1028 | message. This is the primary message output operation of Isabelle | |
| 1029 | and should be used by default. | |
| 1030 | ||
| 1031 |   \item @{ML tracing}~@{text "text"} outputs @{text "text"} as special
 | |
| 1032 | tracing message, indicating potential high-volume output to the | |
| 1033 | front-end (hundreds or thousands of messages issued by a single | |
| 1034 | command). The idea is to allow the user-interface to downgrade the | |
| 1035 | quality of message display to achieve higher throughput. | |
| 1036 | ||
| 1037 | Note that the user might have to take special actions to see tracing | |
| 1038 | output, e.g.\ switch to a different output window. So this channel | |
| 1039 | should not be used for regular output. | |
| 1040 | ||
| 1041 |   \item @{ML warning}~@{text "text"} outputs @{text "text"} as
 | |
| 1042 | warning, which typically means some extra emphasis on the front-end | |
| 40126 | 1043 | side (color highlighting, icons, etc.). | 
| 39835 | 1044 | |
| 1045 |   \item @{ML error}~@{text "text"} raises exception @{ML ERROR}~@{text
 | |
| 1046 |   "text"} and thus lets the Isar toplevel print @{text "text"} on the
 | |
| 1047 | error channel, which typically means some extra emphasis on the | |
| 40126 | 1048 | front-end side (color highlighting, icons, etc.). | 
| 39835 | 1049 | |
| 1050 | This assumes that the exception is not handled before the command | |
| 1051 |   terminates.  Handling exception @{ML ERROR}~@{text "text"} is a
 | |
| 1052 | perfectly legal alternative: it means that the error is absorbed | |
| 1053 | without any message output. | |
| 1054 | ||
| 39861 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 wenzelm parents: 
39859diff
changeset | 1055 |   \begin{warn}
 | 
| 39835 | 1056 |   The actual error channel is accessed via @{ML Output.error_msg}, but
 | 
| 1057 |   the interaction protocol of Proof~General \emph{crashes} if that
 | |
| 1058 | function is used in regular ML code: error output and toplevel | |
| 40126 | 1059 | command failure always need to coincide. | 
| 39861 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 wenzelm parents: 
39859diff
changeset | 1060 |   \end{warn}
 | 
| 39835 | 1061 | |
| 39861 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 wenzelm parents: 
39859diff
changeset | 1062 |   \end{description}
 | 
| 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 wenzelm parents: 
39859diff
changeset | 1063 | |
| 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 wenzelm parents: 
39859diff
changeset | 1064 |   \begin{warn}
 | 
| 39835 | 1065 | Regular Isabelle/ML code should output messages exclusively by the | 
| 1066 |   official channels.  Using raw I/O on \emph{stdout} or \emph{stderr}
 | |
| 1067 |   instead (e.g.\ via @{ML TextIO.output}) is apt to cause problems in
 | |
| 1068 | the presence of parallel and asynchronous processing of Isabelle | |
| 1069 | theories. Such raw output might be displayed by the front-end in | |
| 1070 | some system console log, with a low chance that the user will ever | |
| 1071 | see it. Moreover, as a genuine side-effect on global process | |
| 1072 | channels, there is no proper way to retract output when Isar command | |
| 40126 | 1073 | transactions are reset by the system. | 
| 39861 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 wenzelm parents: 
39859diff
changeset | 1074 |   \end{warn}
 | 
| 39872 | 1075 | |
| 1076 |   \begin{warn}
 | |
| 1077 | The message channels should be used in a message-oriented manner. | |
| 40126 | 1078 | This means that multi-line output that logically belongs together is | 
| 1079 |   issued by a \emph{single} invocation of @{ML writeln} etc.\ with the
 | |
| 1080 | functional concatenation of all message constituents. | |
| 39872 | 1081 |   \end{warn}
 | 
| 1082 | *} | |
| 1083 | ||
| 1084 | text %mlex {* The following example demonstrates a multi-line
 | |
| 1085 | warning. Note that in some situations the user sees only the first | |
| 1086 | line, so the most important point should be made first. | |
| 1087 | *} | |
| 1088 | ||
| 1089 | ML_command {*
 | |
| 1090 | warning (cat_lines | |
| 1091 | ["Beware the Jabberwock, my son!", | |
| 1092 | "The jaws that bite, the claws that catch!", | |
| 1093 | "Beware the Jubjub Bird, and shun", | |
| 1094 | "The frumious Bandersnatch!"]); | |
| 39835 | 1095 | *} | 
| 1096 | ||
| 39854 | 1097 | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1098 | section {* Exceptions \label{sec:exceptions} *}
 | 
| 39854 | 1099 | |
| 1100 | text {* The Standard ML semantics of strict functional evaluation
 | |
| 1101 | together with exceptions is rather well defined, but some delicate | |
| 1102 | points need to be observed to avoid that ML programs go wrong | |
| 1103 | despite static type-checking. Exceptions in Isabelle/ML are | |
| 1104 | subsequently categorized as follows. | |
| 1105 | ||
| 1106 |   \paragraph{Regular user errors.}  These are meant to provide
 | |
| 1107 | informative feedback about malformed input etc. | |
| 1108 | ||
| 1109 |   The \emph{error} function raises the corresponding \emph{ERROR}
 | |
| 1110 |   exception, with a plain text message as argument.  \emph{ERROR}
 | |
| 1111 | exceptions can be handled internally, in order to be ignored, turned | |
| 1112 | into other exceptions, or cascaded by appending messages. If the | |
| 1113 |   corresponding Isabelle/Isar command terminates with an \emph{ERROR}
 | |
| 39855 | 1114 | exception state, the toplevel will print the result on the error | 
| 1115 |   channel (see \secref{sec:message-channels}).
 | |
| 39854 | 1116 | |
| 1117 | It is considered bad style to refer to internal function names or | |
| 1118 | values in ML source notation in user error messages. | |
| 1119 | ||
| 1120 | Grammatical correctness of error messages can be improved by | |
| 1121 |   \emph{omitting} final punctuation: messages are often concatenated
 | |
| 1122 | or put into a larger context (e.g.\ augmented with source position). | |
| 1123 | By not insisting in the final word at the origin of the error, the | |
| 1124 | system can perform its administrative tasks more easily and | |
| 1125 | robustly. | |
| 1126 | ||
| 1127 |   \paragraph{Program failures.}  There is a handful of standard
 | |
| 1128 | exceptions that indicate general failure situations, or failures of | |
| 1129 | core operations on logical entities (types, terms, theorems, | |
| 39856 | 1130 |   theories, see \chref{ch:logic}).
 | 
| 39854 | 1131 | |
| 1132 | These exceptions indicate a genuine breakdown of the program, so the | |
| 1133 | main purpose is to determine quickly what has happened where. | |
| 39855 | 1134 | Traditionally, the (short) exception message would include the name | 
| 40126 | 1135 | of an ML function, although this is no longer necessary, because the | 
| 1136 | ML runtime system prints a detailed source position of the | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 1137 |   corresponding @{ML_text raise} keyword.
 | 
| 39854 | 1138 | |
| 1139 | \medskip User modules can always introduce their own custom | |
| 1140 | exceptions locally, e.g.\ to organize internal failures robustly | |
| 1141 | without overlapping with existing exceptions. Exceptions that are | |
| 1142 | exposed in module signatures require extra care, though, and should | |
| 40126 | 1143 |   \emph{not} be introduced by default.  Surprise by users of a module
 | 
| 1144 | can be often minimized by using plain user errors instead. | |
| 39854 | 1145 | |
| 1146 |   \paragraph{Interrupts.}  These indicate arbitrary system events:
 | |
| 1147 | both the ML runtime system and the Isabelle/ML infrastructure signal | |
| 1148 | various exceptional situations by raising the special | |
| 1149 |   \emph{Interrupt} exception in user code.
 | |
| 1150 | ||
| 1151 | This is the one and only way that physical events can intrude an | |
| 1152 | Isabelle/ML program. Such an interrupt can mean out-of-memory, | |
| 1153 | stack overflow, timeout, internal signaling of threads, or the user | |
| 39855 | 1154 | producing a console interrupt manually etc. An Isabelle/ML program | 
| 1155 | that intercepts interrupts becomes dependent on physical effects of | |
| 1156 | the environment. Even worse, exception handling patterns that are | |
| 1157 | too general by accident, e.g.\ by mispelled exception constructors, | |
| 40126 | 1158 | will cover interrupts unintentionally and thus render the program | 
| 39855 | 1159 | semantics ill-defined. | 
| 39854 | 1160 | |
| 1161 | Note that the Interrupt exception dates back to the original SML90 | |
| 1162 | language definition. It was excluded from the SML97 version to | |
| 1163 | avoid its malign impact on ML program semantics, but without | |
| 1164 | providing a viable alternative. Isabelle/ML recovers physical | |
| 40229 | 1165 | interruptibility (which is an indispensable tool to implement | 
| 1166 | managed evaluation of command transactions), but requires user code | |
| 1167 | to be strictly transparent wrt.\ interrupts. | |
| 39854 | 1168 | |
| 1169 |   \begin{warn}
 | |
| 1170 | Isabelle/ML user code needs to terminate promptly on interruption, | |
| 1171 | without guessing at its meaning to the system infrastructure. | |
| 1172 | Temporary handling of interrupts for cleanup of global resources | |
| 1173 | etc.\ needs to be followed immediately by re-raising of the original | |
| 1174 | exception. | |
| 1175 |   \end{warn}
 | |
| 1176 | *} | |
| 1177 | ||
| 39855 | 1178 | text %mlref {*
 | 
| 1179 |   \begin{mldecls}
 | |
| 1180 |   @{index_ML try: "('a -> 'b) -> 'a -> 'b option"} \\
 | |
| 1181 |   @{index_ML can: "('a -> 'b) -> 'a -> bool"} \\
 | |
| 39856 | 1182 |   @{index_ML ERROR: "string -> exn"} \\
 | 
| 1183 |   @{index_ML Fail: "string -> exn"} \\
 | |
| 1184 |   @{index_ML Exn.is_interrupt: "exn -> bool"} \\
 | |
| 39855 | 1185 |   @{index_ML reraise: "exn -> 'a"} \\
 | 
| 1186 |   @{index_ML exception_trace: "(unit -> 'a) -> 'a"} \\
 | |
| 1187 |   \end{mldecls}
 | |
| 1188 | ||
| 1189 |   \begin{description}
 | |
| 1190 | ||
| 1191 |   \item @{ML try}~@{text "f x"} makes the partiality of evaluating
 | |
| 1192 |   @{text "f x"} explicit via the option datatype.  Interrupts are
 | |
| 1193 |   \emph{not} handled here, i.e.\ this form serves as safe replacement
 | |
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 1194 |   for the \emph{unsafe} version @{ML_text "(SOME"}~@{text "f
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 1195 |   x"}~@{ML_text "handle _ => NONE)"} that is occasionally seen in
 | 
| 39855 | 1196 | books about SML. | 
| 1197 | ||
| 1198 |   \item @{ML can} is similar to @{ML try} with more abstract result.
 | |
| 1199 | ||
| 39856 | 1200 |   \item @{ML ERROR}~@{text "msg"} represents user errors; this
 | 
| 40126 | 1201 |   exception is normally raised indirectly via the @{ML error} function
 | 
| 1202 |   (see \secref{sec:message-channels}).
 | |
| 39856 | 1203 | |
| 1204 |   \item @{ML Fail}~@{text "msg"} represents general program failures.
 | |
| 1205 | ||
| 1206 |   \item @{ML Exn.is_interrupt} identifies interrupts robustly, without
 | |
| 1207 | mentioning concrete exception constructors in user code. Handled | |
| 1208 | interrupts need to be re-raised promptly! | |
| 1209 | ||
| 39855 | 1210 |   \item @{ML reraise}~@{text "exn"} raises exception @{text "exn"}
 | 
| 1211 | while preserving its implicit position information (if possible, | |
| 1212 | depending on the ML platform). | |
| 1213 | ||
| 40149 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 1214 |   \item @{ML exception_trace}~@{ML_text "(fn () =>"}~@{text
 | 
| 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 wenzelm parents: 
40126diff
changeset | 1215 |   "e"}@{ML_text ")"} evaluates expression @{text "e"} while printing
 | 
| 39855 | 1216 | a full trace of its stack of nested exceptions (if possible, | 
| 40126 | 1217 |   depending on the ML platform).\footnote{In versions of Poly/ML the
 | 
| 1218 | trace will appear on raw stdout of the Isabelle process.} | |
| 39855 | 1219 | |
| 1220 |   Inserting @{ML exception_trace} into ML code temporarily is useful
 | |
| 1221 | for debugging, but not suitable for production code. | |
| 1222 | ||
| 1223 |   \end{description}
 | |
| 1224 | *} | |
| 1225 | ||
| 39866 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1226 | text %mlantiq {*
 | 
| 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1227 |   \begin{matharray}{rcl}
 | 
| 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1228 |   @{ML_antiquotation_def "assert"} & : & @{text ML_antiquotation} \\
 | 
| 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1229 |   \end{matharray}
 | 
| 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1230 | |
| 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1231 |   \begin{description}
 | 
| 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1232 | |
| 40110 | 1233 |   \item @{text "@{assert}"} inlines a function
 | 
| 1234 |   @{ML_type "bool -> unit"} that raises @{ML Fail} if the argument is
 | |
| 1235 |   @{ML false}.  Due to inlining the source position of failed
 | |
| 1236 | assertions is included in the error output. | |
| 39866 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1237 | |
| 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1238 |   \end{description}
 | 
| 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1239 | *} | 
| 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1240 | |
| 39859 | 1241 | |
| 39863 | 1242 | section {* Basic data types *}
 | 
| 39859 | 1243 | |
| 40126 | 1244 | text {* The basis library proposal of SML97 needs to be treated with
 | 
| 39859 | 1245 | caution. Many of its operations simply do not fit with important | 
| 1246 | Isabelle/ML conventions (like ``canonical argument order'', see | |
| 40126 | 1247 |   \secref{sec:canonical-argument-order}), others cause problems with
 | 
| 1248 |   the parallel evaluation model of Isabelle/ML (such as @{ML
 | |
| 1249 |   TextIO.print} or @{ML OS.Process.system}).
 | |
| 39859 | 1250 | |
| 1251 | Subsequently we give a brief overview of important operations on | |
| 1252 | basic ML data types. | |
| 1253 | *} | |
| 1254 | ||
| 1255 | ||
| 39863 | 1256 | subsection {* Characters *}
 | 
| 1257 | ||
| 1258 | text %mlref {*
 | |
| 1259 |   \begin{mldecls}
 | |
| 1260 |   @{index_ML_type char} \\
 | |
| 1261 |   \end{mldecls}
 | |
| 1262 | ||
| 1263 |   \begin{description}
 | |
| 1264 | ||
| 39864 | 1265 |   \item Type @{ML_type char} is \emph{not} used.  The smallest textual
 | 
| 40126 | 1266 | unit in Isabelle is represented as a ``symbol'' (see | 
| 39864 | 1267 |   \secref{sec:symbols}).
 | 
| 39863 | 1268 | |
| 1269 |   \end{description}
 | |
| 1270 | *} | |
| 1271 | ||
| 1272 | ||
| 39862 | 1273 | subsection {* Integers *}
 | 
| 1274 | ||
| 1275 | text %mlref {*
 | |
| 1276 |   \begin{mldecls}
 | |
| 1277 |   @{index_ML_type int} \\
 | |
| 1278 |   \end{mldecls}
 | |
| 1279 | ||
| 1280 |   \begin{description}
 | |
| 1281 | ||
| 39864 | 1282 |   \item Type @{ML_type int} represents regular mathematical integers,
 | 
| 1283 |   which are \emph{unbounded}.  Overflow never happens in
 | |
| 39862 | 1284 |   practice.\footnote{The size limit for integer bit patterns in memory
 | 
| 1285 | is 64\,MB for 32-bit Poly/ML, and much higher for 64-bit systems.} | |
| 1286 | This works uniformly for all supported ML platforms (Poly/ML and | |
| 1287 | SML/NJ). | |
| 1288 | ||
| 40126 | 1289 | Literal integers in ML text are forced to be of this one true | 
| 39862 | 1290 | integer type --- overloading of SML97 is disabled. | 
| 1291 | ||
| 40126 | 1292 |   Structure @{ML_struct IntInf} of SML97 is obsolete and superseded by
 | 
| 40800 
330eb65c9469
Parse.liberal_name for document antiquotations and attributes;
 wenzelm parents: 
40508diff
changeset | 1293 |   @{ML_struct Int}.  Structure @{ML_struct Integer} in @{file
 | 
| 39862 | 1294 | "~~/src/Pure/General/integer.ML"} provides some additional | 
| 1295 | operations. | |
| 1296 | ||
| 1297 |   \end{description}
 | |
| 1298 | *} | |
| 1299 | ||
| 1300 | ||
| 40302 | 1301 | subsection {* Time *}
 | 
| 1302 | ||
| 1303 | text %mlref {*
 | |
| 1304 |   \begin{mldecls}
 | |
| 1305 |   @{index_ML_type Time.time} \\
 | |
| 1306 |   @{index_ML seconds: "real -> Time.time"} \\
 | |
| 1307 |   \end{mldecls}
 | |
| 1308 | ||
| 1309 |   \begin{description}
 | |
| 1310 | ||
| 1311 |   \item Type @{ML_type Time.time} represents time abstractly according
 | |
| 1312 | to the SML97 basis library definition. This is adequate for | |
| 1313 | internal ML operations, but awkward in concrete time specifications. | |
| 1314 | ||
| 1315 |   \item @{ML seconds}~@{text "s"} turns the concrete scalar @{text
 | |
| 1316 | "s"} (measured in seconds) into an abstract time value. Floating | |
| 1317 | point numbers are easy to use as context parameters (e.g.\ via | |
| 1318 |   configuration options, see \secref{sec:config-options}) or
 | |
| 1319 | preferences that are maintained by external tools as well. | |
| 1320 | ||
| 1321 |   \end{description}
 | |
| 1322 | *} | |
| 1323 | ||
| 1324 | ||
| 39859 | 1325 | subsection {* Options *}
 | 
| 1326 | ||
| 1327 | text %mlref {*
 | |
| 1328 |   \begin{mldecls}
 | |
| 1329 |   @{index_ML Option.map: "('a -> 'b) -> 'a option -> 'b option"} \\
 | |
| 1330 |   @{index_ML is_some: "'a option -> bool"} \\
 | |
| 1331 |   @{index_ML is_none: "'a option -> bool"} \\
 | |
| 1332 |   @{index_ML the: "'a option -> 'a"} \\
 | |
| 1333 |   @{index_ML these: "'a list option -> 'a list"} \\
 | |
| 1334 |   @{index_ML the_list: "'a option -> 'a list"} \\
 | |
| 1335 |   @{index_ML the_default: "'a -> 'a option -> 'a"} \\
 | |
| 1336 |   \end{mldecls}
 | |
| 1337 | *} | |
| 1338 | ||
| 1339 | text {* Apart from @{ML Option.map} most operations defined in
 | |
| 1340 |   structure @{ML_struct Option} are alien to Isabelle/ML.  The
 | |
| 40800 
330eb65c9469
Parse.liberal_name for document antiquotations and attributes;
 wenzelm parents: 
40508diff
changeset | 1341 |   operations shown above are defined in @{file
 | 
| 39859 | 1342 | "~~/src/Pure/General/basics.ML"}, among others. *} | 
| 1343 | ||
| 1344 | ||
| 39863 | 1345 | subsection {* Lists *}
 | 
| 1346 | ||
| 1347 | text {* Lists are ubiquitous in ML as simple and light-weight
 | |
| 1348 | ``collections'' for many everyday programming tasks. Isabelle/ML | |
| 39874 | 1349 | provides important additions and improvements over operations that | 
| 1350 | are predefined in the SML97 library. *} | |
| 39863 | 1351 | |
| 1352 | text %mlref {*
 | |
| 1353 |   \begin{mldecls}
 | |
| 1354 |   @{index_ML cons: "'a -> 'a list -> 'a list"} \\
 | |
| 39874 | 1355 |   @{index_ML member: "('b * 'a -> bool) -> 'a list -> 'b -> bool"} \\
 | 
| 1356 |   @{index_ML insert: "('a * 'a -> bool) -> 'a -> 'a list -> 'a list"} \\
 | |
| 1357 |   @{index_ML remove: "('b * 'a -> bool) -> 'b -> 'a list -> 'a list"} \\
 | |
| 1358 |   @{index_ML update: "('a * 'a -> bool) -> 'a -> 'a list -> 'a list"} \\
 | |
| 39863 | 1359 |   \end{mldecls}
 | 
| 1360 | ||
| 1361 |   \begin{description}
 | |
| 1362 | ||
| 1363 |   \item @{ML cons}~@{text "x xs"} evaluates to @{text "x :: xs"}.
 | |
| 1364 | ||
| 1365 | Tupled infix operators are a historical accident in Standard ML. | |
| 1366 |   The curried @{ML cons} amends this, but it should be only used when
 | |
| 1367 | partial application is required. | |
| 1368 | ||
| 39874 | 1369 |   \item @{ML member}, @{ML insert}, @{ML remove}, @{ML update} treat
 | 
| 1370 | lists as a set-like container that maintains the order of elements. | |
| 40800 
330eb65c9469
Parse.liberal_name for document antiquotations and attributes;
 wenzelm parents: 
40508diff
changeset | 1371 |   See @{file "~~/src/Pure/library.ML"} for the full specifications
 | 
| 39874 | 1372 | (written in ML). There are some further derived operations like | 
| 1373 |   @{ML union} or @{ML inter}.
 | |
| 1374 | ||
| 1375 |   Note that @{ML insert} is conservative about elements that are
 | |
| 1376 |   already a @{ML member} of the list, while @{ML update} ensures that
 | |
| 40126 | 1377 | the latest entry is always put in front. The latter discipline is | 
| 39874 | 1378 | often more appropriate in declarations of context data | 
| 1379 |   (\secref{sec:context-data}) that are issued by the user in Isar
 | |
| 1380 | source: more recent declarations normally take precedence over | |
| 1381 | earlier ones. | |
| 1382 | ||
| 39863 | 1383 |   \end{description}
 | 
| 1384 | *} | |
| 1385 | ||
| 40126 | 1386 | text %mlex {* Using canonical @{ML fold} together with @{ML cons}, or
 | 
| 1387 | similar standard operations, alternates the orientation of data. | |
| 1388 | The is quite natural and should not be altered forcible by inserting | |
| 1389 |   extra applications of @{ML rev}.  The alternative @{ML fold_rev} can
 | |
| 1390 | be used in the few situations, where alternation should be | |
| 1391 | prevented. | |
| 39863 | 1392 | *} | 
| 1393 | ||
| 1394 | ML {*
 | |
| 1395 | val items = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]; | |
| 1396 | ||
| 1397 | val list1 = fold cons items []; | |
| 39866 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1398 |   @{assert} (list1 = rev items);
 | 
| 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1399 | |
| 39863 | 1400 | val list2 = fold_rev cons items []; | 
| 39866 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 wenzelm parents: 
39864diff
changeset | 1401 |   @{assert} (list2 = items);
 | 
| 39863 | 1402 | *} | 
| 1403 | ||
| 39883 | 1404 | text {* The subsequent example demonstrates how to \emph{merge} two
 | 
| 1405 | lists in a natural way. *} | |
| 1406 | ||
| 1407 | ML {*
 | |
| 1408 | fun merge_lists eq (xs, ys) = fold_rev (insert eq) ys xs; | |
| 1409 | *} | |
| 1410 | ||
| 1411 | text {* Here the first list is treated conservatively: only the new
 | |
| 1412 | elements from the second list are inserted. The inside-out order of | |
| 1413 |   insertion via @{ML fold_rev} attempts to preserve the order of
 | |
| 1414 | elements in the result. | |
| 1415 | ||
| 1416 | This way of merging lists is typical for context data | |
| 1417 |   (\secref{sec:context-data}).  See also @{ML merge} as defined in
 | |
| 40800 
330eb65c9469
Parse.liberal_name for document antiquotations and attributes;
 wenzelm parents: 
40508diff
changeset | 1418 |   @{file "~~/src/Pure/library.ML"}.
 | 
| 39883 | 1419 | *} | 
| 1420 | ||
| 39863 | 1421 | |
| 39875 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1422 | subsection {* Association lists *}
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1423 | |
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1424 | text {* The operations for association lists interpret a concrete list
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1425 | of pairs as a finite function from keys to values. Redundant | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1426 | representations with multiple occurrences of the same key are | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1427 | implicitly normalized: lookup and update only take the first | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1428 | occurrence into account. | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1429 | *} | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1430 | |
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1431 | text {*
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1432 |   \begin{mldecls}
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1433 |   @{index_ML AList.lookup: "('a * 'b -> bool) -> ('b * 'c) list -> 'a -> 'c option"} \\
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1434 |   @{index_ML AList.defined: "('a * 'b -> bool) -> ('b * 'c) list -> 'a -> bool"} \\
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1435 |   @{index_ML AList.update: "('a * 'a -> bool) -> 'a * 'b -> ('a * 'b) list -> ('a * 'b) list"} \\
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1436 |   \end{mldecls}
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1437 | |
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1438 |   \begin{description}
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1439 | |
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1440 |   \item @{ML AList.lookup}, @{ML AList.defined}, @{ML AList.update}
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1441 | implement the main ``framework operations'' for mappings in | 
| 40126 | 1442 | Isabelle/ML, following standard conventions for their names and | 
| 1443 | types. | |
| 39875 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1444 | |
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1445 |   Note that a function called @{text lookup} is obliged to express its
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1446 | partiality via an explicit option element. There is no choice to | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1447 | raise an exception, without changing the name to something like | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1448 |   @{text "the_element"} or @{text "get"}.
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1449 | |
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1450 |   The @{text "defined"} operation is essentially a contraction of @{ML
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1451 |   is_some} and @{text "lookup"}, but this is sufficiently frequent to
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1452 | justify its independent existence. This also gives the | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1453 | implementation some opportunity for peep-hole optimization. | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1454 | |
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1455 |   \end{description}
 | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1456 | |
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1457 | Association lists are adequate as simple and light-weight | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1458 | implementation of finite mappings in many practical situations. A | 
| 40800 
330eb65c9469
Parse.liberal_name for document antiquotations and attributes;
 wenzelm parents: 
40508diff
changeset | 1459 |   more heavy-duty table structure is defined in @{file
 | 
| 39875 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1460 | "~~/src/Pure/General/table.ML"}; that version scales easily to | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1461 | thousands or millions of elements. | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1462 | *} | 
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1463 | |
| 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 wenzelm parents: 
39874diff
changeset | 1464 | |
| 39859 | 1465 | subsection {* Unsynchronized references *}
 | 
| 1466 | ||
| 1467 | text %mlref {*
 | |
| 1468 |   \begin{mldecls}
 | |
| 39870 | 1469 |   @{index_ML_type "'a Unsynchronized.ref"} \\
 | 
| 39859 | 1470 |   @{index_ML Unsynchronized.ref: "'a -> 'a Unsynchronized.ref"} \\
 | 
| 1471 |   @{index_ML "!": "'a Unsynchronized.ref -> 'a"} \\
 | |
| 1472 |   @{index_ML "op :=": "'a Unsynchronized.ref * 'a -> unit"} \\
 | |
| 1473 |   \end{mldecls}
 | |
| 1474 | *} | |
| 1475 | ||
| 1476 | text {* Due to ubiquitous parallelism in Isabelle/ML (see also
 | |
| 1477 |   \secref{sec:multi-threading}), the mutable reference cells of
 | |
| 1478 | Standard ML are notorious for causing problems. In a highly | |
| 1479 |   parallel system, both correctness \emph{and} performance are easily
 | |
| 1480 | degraded when using mutable data. | |
| 1481 | ||
| 1482 |   The unwieldy name of @{ML Unsynchronized.ref} for the constructor
 | |
| 1483 | for references in Isabelle/ML emphasizes the inconveniences caused by | |
| 1484 |   mutability.  Existing operations @{ML "!"}  and @{ML "op :="} are
 | |
| 1485 | unchanged, but should be used with special precautions, say in a | |
| 1486 | strictly local situation that is guaranteed to be restricted to | |
| 40508 | 1487 | sequential evaluation --- now and in the future. | 
| 1488 | ||
| 1489 |   \begin{warn}
 | |
| 1490 |   Never @{ML_text "open Unsynchronized"}, not even in a local scope!
 | |
| 1491 | Pretending that mutable state is no problem is a very bad idea. | |
| 1492 |   \end{warn}
 | |
| 1493 | *} | |
| 39859 | 1494 | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1495 | |
| 39870 | 1496 | section {* Thread-safe programming \label{sec:multi-threading} *}
 | 
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1497 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1498 | text {* Multi-threaded execution has become an everyday reality in
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1499 | Isabelle since Poly/ML 5.2.1 and Isabelle2008. Isabelle/ML provides | 
| 39868 | 1500 | implicit and explicit parallelism by default, and there is no way | 
| 1501 | for user-space tools to ``opt out''. ML programs that are purely | |
| 1502 | functional, output messages only via the official channels | |
| 1503 |   (\secref{sec:message-channels}), and do not intercept interrupts
 | |
| 1504 |   (\secref{sec:exceptions}) can participate in the multi-threaded
 | |
| 1505 | environment immediately without further ado. | |
| 1506 | ||
| 1507 | More ambitious tools with more fine-grained interaction with the | |
| 1508 | environment need to observe the principles explained below. | |
| 1509 | *} | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1510 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1511 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1512 | subsection {* Multi-threading with shared memory *}
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1513 | |
| 39868 | 1514 | text {* Multiple threads help to organize advanced operations of the
 | 
| 1515 | system, such as real-time conditions on command transactions, | |
| 1516 | sub-components with explicit communication, general asynchronous | |
| 1517 | interaction etc. Moreover, parallel evaluation is a prerequisite to | |
| 1518 | make adequate use of the CPU resources that are available on | |
| 1519 |   multi-core systems.\footnote{Multi-core computing does not mean that
 | |
| 1520 | there are ``spare cycles'' to be wasted. It means that the | |
| 1521 | continued exponential speedup of CPU performance due to ``Moore's | |
| 1522 | Law'' follows different rules: clock frequency has reached its peak | |
| 1523 | around 2005, and applications need to be parallelized in order to | |
| 1524 | avoid a perceived loss of performance. See also | |
| 1525 |   \cite{Sutter:2005}.}
 | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1526 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1527 | Isabelle/Isar exploits the inherent structure of theories and proofs | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1528 |   to support \emph{implicit parallelism} to a large extent.  LCF-style
 | 
| 40126 | 1529 | theorem provides almost ideal conditions for that, see also | 
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1530 |   \cite{Wenzel:2009}.  This means, significant parts of theory and
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1531 | proof checking is parallelized by default. A maximum speedup-factor | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1532 | of 3.0 on 4 cores and 5.0 on 8 cores can be | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1533 |   expected.\footnote{Further scalability is limited due to garbage
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1534 | collection, which is still sequential in Poly/ML 5.2/5.3/5.4. It | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1535 | helps to provide initial heap space generously, using the | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1536 |   \texttt{-H} option.  Initial heap size needs to be scaled-up
 | 
| 39868 | 1537 | together with the number of CPU cores: approximately 1--2\,GB per | 
| 1538 | core..} | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1539 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1540 | \medskip ML threads lack the memory protection of separate | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1541 | processes, and operate concurrently on shared heap memory. This has | 
| 40126 | 1542 | the advantage that results of independent computations are directly | 
| 1543 | available to other threads: abstract values can be passed without | |
| 1544 | copying or awkward serialization that is typically required for | |
| 1545 | separate processes. | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1546 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1547 | To make shared-memory multi-threading work robustly and efficiently, | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1548 | some programming guidelines need to be observed. While the ML | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1549 | system is responsible to maintain basic integrity of the | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1550 | representation of ML values in memory, the application programmer | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1551 | needs to ensure that multi-threaded execution does not break the | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1552 | intended semantics. | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1553 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1554 |   \begin{warn}
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1555 | To participate in implicit parallelism, tools need to be | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1556 | thread-safe. A single ill-behaved tool can affect the stability and | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1557 | performance of the whole system. | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1558 |   \end{warn}
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1559 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1560 | Apart from observing the principles of thread-safeness passively, | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1561 | advanced tools may also exploit parallelism actively, e.g.\ by using | 
| 39868 | 1562 |   ``future values'' (\secref{sec:futures}) or the more basic library
 | 
| 1563 |   functions for parallel list operations (\secref{sec:parlist}).
 | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1564 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1565 |   \begin{warn}
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1566 | Parallel computing resources are managed centrally by the | 
| 39868 | 1567 | Isabelle/ML infrastructure. User programs must not fork their own | 
| 1568 | ML threads to perform computations. | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1569 |   \end{warn}
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1570 | *} | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1571 | |
| 39868 | 1572 | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1573 | subsection {* Critical shared resources *}
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1574 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1575 | text {* Thread-safeness is mainly concerned about concurrent
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1576 | read/write access to shared resources, which are outside the purely | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1577 | functional world of ML. This covers the following in particular. | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1578 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1579 |   \begin{itemize}
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1580 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1581 | \item Global references (or arrays), i.e.\ mutable memory cells that | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1582 | persist over several invocations of associated | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1583 |   operations.\footnote{This is independent of the visibility of such
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1584 | mutable values in the toplevel scope.} | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1585 | |
| 39868 | 1586 | \item Global state of the running Isabelle/ML process, i.e.\ raw I/O | 
| 1587 | channels, environment variables, current working directory. | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1588 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1589 | \item Writable resources in the file-system that are shared among | 
| 40126 | 1590 | different threads or external processes. | 
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1591 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1592 |   \end{itemize}
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1593 | |
| 39868 | 1594 | Isabelle/ML provides various mechanisms to avoid critical shared | 
| 40126 | 1595 | resources in most situations. As last resort there are some | 
| 1596 | mechanisms for explicit synchronization. The following guidelines | |
| 1597 | help to make Isabelle/ML programs work smoothly in a concurrent | |
| 1598 | environment. | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1599 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1600 |   \begin{itemize}
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1601 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1602 | \item Avoid global references altogether. Isabelle/Isar maintains a | 
| 39868 | 1603 | uniform context that incorporates arbitrary data declared by user | 
| 1604 |   programs (\secref{sec:context-data}).  This context is passed as
 | |
| 1605 | plain value and user tools can get/map their own data in a purely | |
| 1606 | functional manner. Configuration options within the context | |
| 1607 |   (\secref{sec:config-options}) provide simple drop-in replacements
 | |
| 40126 | 1608 | for historic reference variables. | 
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1609 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1610 | \item Keep components with local state information re-entrant. | 
| 39868 | 1611 | Instead of poking initial values into (private) global references, a | 
| 1612 | new state record can be created on each invocation, and passed | |
| 1613 | through any auxiliary functions of the component. The state record | |
| 1614 | may well contain mutable references, without requiring any special | |
| 1615 | synchronizations, as long as each invocation gets its own copy. | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1616 | |
| 39868 | 1617 |   \item Avoid raw output on @{text "stdout"} or @{text "stderr"}.  The
 | 
| 1618 | Poly/ML library is thread-safe for each individual output operation, | |
| 1619 | but the ordering of parallel invocations is arbitrary. This means | |
| 1620 | raw output will appear on some system console with unpredictable | |
| 1621 | interleaving of atomic chunks. | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1622 | |
| 39868 | 1623 | Note that this does not affect regular message output channels | 
| 1624 |   (\secref{sec:message-channels}).  An official message is associated
 | |
| 1625 | with the command transaction from where it originates, independently | |
| 1626 | of other transactions. This means each running Isar command has | |
| 1627 | effectively its own set of message channels, and interleaving can | |
| 1628 | only happen when commands use parallelism internally (and only at | |
| 1629 | message boundaries). | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1630 | |
| 39868 | 1631 | \item Treat environment variables and the current working directory | 
| 1632 | of the running process as strictly read-only. | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1633 | |
| 39868 | 1634 | \item Restrict writing to the file-system to unique temporary files. | 
| 1635 | Isabelle already provides a temporary directory that is unique for | |
| 1636 | the running process, and there is a centralized source of unique | |
| 1637 | serial numbers in Isabelle/ML. Thus temporary files that are passed | |
| 1638 | to to some external process will be always disjoint, and thus | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1639 | thread-safe. | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1640 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1641 |   \end{itemize}
 | 
| 39868 | 1642 | *} | 
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1643 | |
| 39868 | 1644 | text %mlref {*
 | 
| 1645 |   \begin{mldecls}
 | |
| 1646 |   @{index_ML File.tmp_path: "Path.T -> Path.T"} \\
 | |
| 1647 |   @{index_ML serial_string: "unit -> string"} \\
 | |
| 1648 |   \end{mldecls}
 | |
| 1649 | ||
| 1650 |   \begin{description}
 | |
| 1651 | ||
| 1652 |   \item @{ML File.tmp_path}~@{text "path"} relocates the base
 | |
| 1653 |   component of @{text "path"} into the unique temporary directory of
 | |
| 1654 | the running Isabelle/ML process. | |
| 1655 | ||
| 1656 |   \item @{ML serial_string}~@{text "()"} creates a new serial number
 | |
| 1657 | that is unique over the runtime of the Isabelle/ML process. | |
| 1658 | ||
| 1659 |   \end{description}
 | |
| 1660 | *} | |
| 1661 | ||
| 1662 | text %mlex {* The following example shows how to create unique
 | |
| 1663 | temporary file names. | |
| 1664 | *} | |
| 1665 | ||
| 1666 | ML {*
 | |
| 1667 |   val tmp1 = File.tmp_path (Path.basic ("foo" ^ serial_string ()));
 | |
| 1668 |   val tmp2 = File.tmp_path (Path.basic ("foo" ^ serial_string ()));
 | |
| 1669 |   @{assert} (tmp1 <> tmp2);
 | |
| 1670 | *} | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1671 | |
| 39868 | 1672 | |
| 1673 | subsection {* Explicit synchronization *}
 | |
| 1674 | ||
| 1675 | text {* Isabelle/ML also provides some explicit synchronization
 | |
| 1676 | mechanisms, for the rare situations where mutable shared resources | |
| 1677 | are really required. These are based on the synchronizations | |
| 1678 | primitives of Poly/ML, which have been adapted to the specific | |
| 1679 | assumptions of the concurrent Isabelle/ML environment. User code | |
| 1680 | must not use the Poly/ML primitives directly! | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1681 | |
| 39868 | 1682 | \medskip The most basic synchronization concept is a single | 
| 1683 |   \emph{critical section} (also called ``monitor'' in the literature).
 | |
| 1684 | A thread that enters the critical section prevents all other threads | |
| 1685 | from doing the same. A thread that is already within the critical | |
| 1686 | section may re-enter it in an idempotent manner. | |
| 1687 | ||
| 1688 | Such centralized locking is convenient, because it prevents | |
| 1689 | deadlocks by construction. | |
| 1690 | ||
| 1691 |   \medskip More fine-grained locking works via \emph{synchronized
 | |
| 1692 | variables}. An explicit state component is associated with | |
| 1693 | mechanisms for locking and signaling. There are operations to | |
| 1694 | await a condition, change the state, and signal the change to all | |
| 1695 | other waiting threads. | |
| 1696 | ||
| 1697 |   Here the synchronized access to the state variable is \emph{not}
 | |
| 1698 | re-entrant: direct or indirect nesting within the same thread will | |
| 1699 | cause a deadlock! | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1700 | *} | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1701 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1702 | text %mlref {*
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1703 |   \begin{mldecls}
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1704 |   @{index_ML NAMED_CRITICAL: "string -> (unit -> 'a) -> 'a"} \\
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1705 |   @{index_ML CRITICAL: "(unit -> 'a) -> 'a"} \\
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1706 |   \end{mldecls}
 | 
| 39871 | 1707 |   \begin{mldecls}
 | 
| 1708 |   @{index_ML_type "'a Synchronized.var"} \\
 | |
| 1709 |   @{index_ML Synchronized.var: "string -> 'a -> 'a Synchronized.var"} \\
 | |
| 1710 |   @{index_ML Synchronized.guarded_access: "'a Synchronized.var ->
 | |
| 1711 |   ('a -> ('b * 'a) option) -> 'b"} \\
 | |
| 1712 |   \end{mldecls}
 | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1713 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1714 |   \begin{description}
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1715 | |
| 39868 | 1716 |   \item @{ML NAMED_CRITICAL}~@{text "name e"} evaluates @{text "e ()"}
 | 
| 1717 | within the central critical section of Isabelle/ML. No other thread | |
| 1718 | may do so at the same time, but non-critical parallel execution will | |
| 39871 | 1719 |   continue.  The @{text "name"} argument is used for tracing and might
 | 
| 39868 | 1720 | help to spot sources of congestion. | 
| 1721 | ||
| 1722 | Entering the critical section without contention is very fast, and | |
| 1723 | several basic system operations do so frequently. Each thread | |
| 40126 | 1724 | should stay within the critical section quickly only very briefly, | 
| 1725 | otherwise parallel performance may degrade. | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1726 | |
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1727 |   \item @{ML CRITICAL} is the same as @{ML NAMED_CRITICAL} with empty
 | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1728 | name argument. | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1729 | |
| 39871 | 1730 |   \item Type @{ML_type "'a Synchronized.var"} represents synchronized
 | 
| 1731 |   variables with state of type @{ML_type 'a}.
 | |
| 1732 | ||
| 1733 |   \item @{ML Synchronized.var}~@{text "name x"} creates a synchronized
 | |
| 1734 |   variable that is initialized with value @{text "x"}.  The @{text
 | |
| 1735 | "name"} is used for tracing. | |
| 1736 | ||
| 1737 |   \item @{ML Synchronized.guarded_access}~@{text "var f"} lets the
 | |
| 1738 |   function @{text "f"} operate within a critical section on the state
 | |
| 40126 | 1739 |   @{text "x"} as follows: if @{text "f x"} produces @{ML NONE}, it
 | 
| 1740 | continues to wait on the internal condition variable, expecting that | |
| 39871 | 1741 | some other thread will eventually change the content in a suitable | 
| 40126 | 1742 |   manner; if @{text "f x"} produces @{ML SOME}~@{text "(y, x')"} it is
 | 
| 1743 |   satisfied and assigns the new state value @{text "x'"}, broadcasts a
 | |
| 1744 | signal to all waiting threads on the associated condition variable, | |
| 1745 |   and returns the result @{text "y"}.
 | |
| 39871 | 1746 | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1747 |   \end{description}
 | 
| 39871 | 1748 | |
| 40126 | 1749 |   There are some further variants of the @{ML
 | 
| 40800 
330eb65c9469
Parse.liberal_name for document antiquotations and attributes;
 wenzelm parents: 
40508diff
changeset | 1750 |   Synchronized.guarded_access} combinator, see @{file
 | 
| 39871 | 1751 | "~~/src/Pure/Concurrent/synchronized.ML"} for details. | 
| 1752 | *} | |
| 1753 | ||
| 40126 | 1754 | text %mlex {* The following example implements a counter that produces
 | 
| 39871 | 1755 | positive integers that are unique over the runtime of the Isabelle | 
| 1756 | process: | |
| 1757 | *} | |
| 1758 | ||
| 1759 | ML {*
 | |
| 1760 | local | |
| 1761 | val counter = Synchronized.var "counter" 0; | |
| 1762 | in | |
| 1763 | fun next () = | |
| 1764 | Synchronized.guarded_access counter | |
| 1765 | (fn i => | |
| 1766 | let val j = i + 1 | |
| 1767 | in SOME (j, j) end); | |
| 1768 | end; | |
| 1769 | *} | |
| 1770 | ||
| 1771 | ML {*
 | |
| 1772 | val a = next (); | |
| 1773 | val b = next (); | |
| 1774 |   @{assert} (a <> b);
 | |
| 39867 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1775 | *} | 
| 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 wenzelm parents: 
39866diff
changeset | 1776 | |
| 40800 
330eb65c9469
Parse.liberal_name for document antiquotations and attributes;
 wenzelm parents: 
40508diff
changeset | 1777 | text {* \medskip See @{file "~~/src/Pure/Concurrent/mailbox.ML"} how
 | 
| 40126 | 1778 | to implement a mailbox as synchronized variable over a purely | 
| 1779 | functional queue. *} | |
| 1780 | ||
| 30270 
61811c9224a6
dummy changes to produce a new changeset of these files;
 wenzelm parents: 
29755diff
changeset | 1781 | end |