| author | wenzelm | 
| Tue, 05 Apr 2016 20:03:24 +0200 | |
| changeset 62876 | 507c90523113 | 
| parent 62505 | 9e2a65912111 | 
| child 62969 | 9f394a16c557 | 
| permissions | -rw-r--r-- | 
| 61656 | 1  | 
(*:maxLineLen=78:*)  | 
| 57347 | 2  | 
|
| 29755 | 3  | 
theory "ML"  | 
4  | 
imports Base  | 
|
5  | 
begin  | 
|
| 18538 | 6  | 
|
| 58618 | 7  | 
chapter \<open>Isabelle/ML\<close>  | 
8  | 
||
| 61854 | 9  | 
text \<open>  | 
10  | 
Isabelle/ML is best understood as a certain culture based on Standard ML.  | 
|
11  | 
Thus it is not a new programming language, but a certain way to use SML at  | 
|
12  | 
an advanced level within the Isabelle environment. This covers a variety of  | 
|
13  | 
aspects that are geared towards an efficient and robust platform for  | 
|
14  | 
applications of formal logic with fully foundational proof construction ---  | 
|
15  | 
according to the well-known \<^emph>\<open>LCF principle\<close>. There is specific  | 
|
16  | 
infrastructure with library modules to address the needs of this difficult  | 
|
17  | 
task. For example, the raw parallel programming model of Poly/ML is  | 
|
18  | 
presented as considerably more abstract concept of \<^emph>\<open>futures\<close>, which is then  | 
|
19  | 
used to augment the inference kernel, Isar theory and proof interpreter, and  | 
|
20  | 
PIDE document management.  | 
|
| 39823 | 21  | 
|
| 61854 | 22  | 
The main aspects of Isabelle/ML are introduced below. These first-hand  | 
23  | 
explanations should help to understand how proper Isabelle/ML is to be read  | 
|
24  | 
and written, and to get access to the wealth of experience that is expressed  | 
|
25  | 
  in the source text and its history of changes.\<^footnote>\<open>See @{url
 | 
|
26  | 
"http://isabelle.in.tum.de/repos/isabelle"} for the full Mercurial history.  | 
|
27  | 
There are symbolic tags to refer to official Isabelle releases, as opposed  | 
|
28  | 
to arbitrary \<^emph>\<open>tip\<close> versions that merely reflect snapshots that are never  | 
|
29  | 
really up-to-date.\<close>  | 
|
30  | 
\<close>  | 
|
| 58618 | 31  | 
|
32  | 
||
33  | 
section \<open>Style and orthography\<close>  | 
|
34  | 
||
| 61854 | 35  | 
text \<open>  | 
36  | 
The sources of Isabelle/Isar are optimized for \<^emph>\<open>readability\<close> and  | 
|
37  | 
\<^emph>\<open>maintainability\<close>. The main purpose is to tell an informed reader what is  | 
|
38  | 
really going on and how things really work. This is a non-trivial aim, but  | 
|
39  | 
it is supported by a certain style of writing Isabelle/ML that has emerged  | 
|
40  | 
from long years of system development.\<^footnote>\<open>See also the interesting style guide  | 
|
41  | 
  for OCaml @{url
 | 
|
42  | 
"http://caml.inria.fr/resources/doc/guides/guidelines.en.html"} which shares  | 
|
43  | 
many of our means and ends.\<close>  | 
|
| 39878 | 44  | 
|
| 61854 | 45  | 
The main principle behind any coding style is \<^emph>\<open>consistency\<close>. For a single  | 
46  | 
author of a small program this merely means ``choose your style and stick to  | 
|
47  | 
it''. A complex project like Isabelle, with long years of development and  | 
|
48  | 
different contributors, requires more standardization. A coding style that  | 
|
49  | 
is changed every few years or with every new contributor is no style at all,  | 
|
50  | 
because consistency is quickly lost. Global consistency is hard to achieve,  | 
|
51  | 
though. Nonetheless, one should always strive at least for local consistency  | 
|
52  | 
of modules and sub-systems, without deviating from some general principles  | 
|
53  | 
how to write Isabelle/ML.  | 
|
| 39878 | 54  | 
|
| 61854 | 55  | 
In a sense, good coding style is like an \<^emph>\<open>orthography\<close> for the sources: it  | 
56  | 
helps to read quickly over the text and see through the main points, without  | 
|
57  | 
getting distracted by accidental presentation of free-style code.  | 
|
| 58618 | 58  | 
\<close>  | 
59  | 
||
60  | 
||
61  | 
subsection \<open>Header and sectioning\<close>  | 
|
62  | 
||
| 61854 | 63  | 
text \<open>  | 
64  | 
Isabelle source files have a certain standardized header format (with  | 
|
65  | 
precise spacing) that follows ancient traditions reaching back to the  | 
|
66  | 
  earliest versions of the system by Larry Paulson. See @{file
 | 
|
67  | 
"~~/src/Pure/thm.ML"}, for example.  | 
|
| 39878 | 68  | 
|
| 61854 | 69  | 
The header includes at least \<^verbatim>\<open>Title\<close> and \<^verbatim>\<open>Author\<close> entries, followed by a  | 
70  | 
prose description of the purpose of the module. The latter can range from a  | 
|
71  | 
single line to several paragraphs of explanations.  | 
|
| 39878 | 72  | 
|
| 61854 | 73  | 
The rest of the file is divided into sections, subsections, subsubsections,  | 
74  | 
paragraphs etc.\ using a simple layout via ML comments as follows.  | 
|
| 39878 | 75  | 
|
| 61458 | 76  | 
  @{verbatim [display]
 | 
77  | 
\<open> (*** section ***)  | 
|
| 58723 | 78  | 
|
79  | 
(** subsection **)  | 
|
80  | 
||
81  | 
(* subsubsection *)  | 
|
82  | 
||
83  | 
(*short paragraph*)  | 
|
84  | 
||
85  | 
(*  | 
|
86  | 
long paragraph,  | 
|
87  | 
with more text  | 
|
| 61458 | 88  | 
*)\<close>}  | 
| 39878 | 89  | 
|
| 61854 | 90  | 
As in regular typography, there is some extra space \<^emph>\<open>before\<close> section  | 
91  | 
headings that are adjacent to plain text, but not other headings as in the  | 
|
92  | 
example above.  | 
|
| 39878 | 93  | 
|
| 61416 | 94  | 
\<^medskip>  | 
| 61854 | 95  | 
The precise wording of the prose text given in these headings is chosen  | 
96  | 
carefully to introduce the main theme of the subsequent formal ML text.  | 
|
| 58618 | 97  | 
\<close>  | 
98  | 
||
99  | 
||
100  | 
subsection \<open>Naming conventions\<close>  | 
|
101  | 
||
| 61854 | 102  | 
text \<open>  | 
103  | 
Since ML is the primary medium to express the meaning of the source text,  | 
|
104  | 
naming of ML entities requires special care.  | 
|
105  | 
\<close>  | 
|
| 61506 | 106  | 
|
107  | 
paragraph \<open>Notation.\<close>  | 
|
| 61854 | 108  | 
text \<open>  | 
109  | 
A name consists of 1--3 \<^emph>\<open>words\<close> (rarely 4, but not more) that are separated  | 
|
110  | 
by underscore. There are three variants concerning upper or lower case  | 
|
111  | 
letters, which are used for certain ML categories as follows:  | 
|
| 39880 | 112  | 
|
| 61416 | 113  | 
\<^medskip>  | 
| 39880 | 114  | 
  \begin{tabular}{lll}
 | 
115  | 
variant & example & ML categories \\\hline  | 
|
| 
40149
 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 
wenzelm 
parents: 
40126 
diff
changeset
 | 
116  | 
  lower-case & @{ML_text foo_bar} & values, types, record fields \\
 | 
| 
 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 
wenzelm 
parents: 
40126 
diff
changeset
 | 
117  | 
  capitalized & @{ML_text Foo_Bar} & datatype constructors, structures, functors \\
 | 
| 
 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 
wenzelm 
parents: 
40126 
diff
changeset
 | 
118  | 
  upper-case & @{ML_text FOO_BAR} & special values, exception constructors, signatures \\
 | 
| 39880 | 119  | 
  \end{tabular}
 | 
| 61416 | 120  | 
\<^medskip>  | 
| 39880 | 121  | 
|
| 61854 | 122  | 
For historical reasons, many capitalized names omit underscores, e.g.\  | 
123  | 
  old-style @{ML_text FooBar} instead of @{ML_text Foo_Bar}. Genuine
 | 
|
124  | 
mixed-case names are \<^emph>\<open>not\<close> used, because clear division of words is  | 
|
125  | 
essential for readability.\<^footnote>\<open>Camel-case was invented to workaround the lack  | 
|
126  | 
of underscore in some early non-ASCII character sets. Later it became  | 
|
127  | 
habitual in some language communities that are now strong in numbers.\<close>  | 
|
| 39880 | 128  | 
|
| 61854 | 129  | 
A single (capital) character does not count as ``word'' in this respect:  | 
130  | 
  some Isabelle/ML names are suffixed by extra markers like this: @{ML_text
 | 
|
131  | 
foo_barT}.  | 
|
| 39881 | 132  | 
|
| 61854 | 133  | 
  Name variants are produced by adding 1--3 primes, e.g.\ @{ML_text foo'},
 | 
134  | 
  @{ML_text foo''}, or @{ML_text foo'''}, but not @{ML_text foo''''} or more.
 | 
|
135  | 
  Decimal digits scale better to larger numbers, e.g.\ @{ML_text foo0},
 | 
|
136  | 
  @{ML_text foo1}, @{ML_text foo42}.
 | 
|
| 61506 | 137  | 
\<close>  | 
138  | 
||
139  | 
paragraph\<open>Scopes.\<close>  | 
|
| 61854 | 140  | 
text \<open>  | 
141  | 
Apart from very basic library modules, ML structures are not ``opened'', but  | 
|
142  | 
  names are referenced with explicit qualification, as in @{ML
 | 
|
| 61506 | 143  | 
Syntax.string_of_term} for example. When devising names for structures and  | 
144  | 
their components it is important to aim at eye-catching compositions of both  | 
|
145  | 
parts, because this is how they are seen in the sources and documentation.  | 
|
146  | 
For the same reasons, aliases of well-known library functions should be  | 
|
| 40126 | 147  | 
avoided.  | 
| 39880 | 148  | 
|
| 61854 | 149  | 
Local names of function abstraction or case/let bindings are typically  | 
150  | 
shorter, sometimes using only rudiments of ``words'', while still avoiding  | 
|
151  | 
  cryptic shorthands. An auxiliary function called @{ML_text helper},
 | 
|
152  | 
  @{ML_text aux}, or @{ML_text f} is considered bad style.
 | 
|
| 39880 | 153  | 
|
154  | 
Example:  | 
|
155  | 
||
| 61458 | 156  | 
  @{verbatim [display]
 | 
157  | 
\<open> (* RIGHT *)  | 
|
| 39880 | 158  | 
|
159  | 
fun print_foo ctxt foo =  | 
|
160  | 
let  | 
|
| 39881 | 161  | 
fun print t = ... Syntax.string_of_term ctxt t ...  | 
162  | 
in ... end;  | 
|
163  | 
||
164  | 
||
165  | 
(* RIGHT *)  | 
|
166  | 
||
167  | 
fun print_foo ctxt foo =  | 
|
168  | 
let  | 
|
| 39880 | 169  | 
val string_of_term = Syntax.string_of_term ctxt;  | 
170  | 
fun print t = ... string_of_term t ...  | 
|
171  | 
in ... end;  | 
|
172  | 
||
173  | 
||
174  | 
(* WRONG *)  | 
|
175  | 
||
176  | 
val string_of_term = Syntax.string_of_term;  | 
|
177  | 
||
178  | 
fun print_foo ctxt foo =  | 
|
179  | 
let  | 
|
180  | 
fun aux t = ... string_of_term ctxt t ...  | 
|
| 61458 | 181  | 
in ... end;\<close>}  | 
| 61506 | 182  | 
\<close>  | 
183  | 
||
184  | 
paragraph \<open>Specific conventions.\<close>  | 
|
| 61854 | 185  | 
text \<open>  | 
186  | 
Here are some specific name forms that occur frequently in the sources.  | 
|
| 39881 | 187  | 
|
| 61854 | 188  | 
  \<^item> A function that maps @{ML_text foo} to @{ML_text bar} is called @{ML_text
 | 
189  | 
  foo_to_bar} or @{ML_text bar_of_foo} (never @{ML_text foo2bar}, nor
 | 
|
190  | 
  @{ML_text bar_from_foo}, nor @{ML_text bar_for_foo}, nor @{ML_text
 | 
|
191  | 
bar4foo}).  | 
|
| 39881 | 192  | 
|
| 61854 | 193  | 
  \<^item> The name component @{ML_text legacy} means that the operation is about to
 | 
194  | 
be discontinued soon.  | 
|
| 39881 | 195  | 
|
| 61854 | 196  | 
  \<^item> The name component @{ML_text global} means that this works with the
 | 
197  | 
background theory instead of the regular local context  | 
|
198  | 
  (\secref{sec:context}), sometimes for historical reasons, sometimes due a
 | 
|
199  | 
genuine lack of locality of the concept involved, sometimes as a fall-back  | 
|
200  | 
for the lack of a proper context in the application code. Whenever there is  | 
|
201  | 
a non-global variant available, the application should be migrated to use it  | 
|
202  | 
with a proper local context.  | 
|
| 39881 | 203  | 
|
| 61854 | 204  | 
\<^item> Variables of the main context types of the Isabelle/Isar framework  | 
205  | 
  (\secref{sec:context} and \chref{ch:local-theory}) have firm naming
 | 
|
206  | 
conventions as follows:  | 
|
| 39881 | 207  | 
|
| 
61459
 
5f2ddeb15c06
clarified nesting of paragraphs: indentation is taken into account more uniformly;
 
wenzelm 
parents: 
61458 
diff
changeset
 | 
208  | 
    \<^item> theories are called @{ML_text thy}, rarely @{ML_text theory}
 | 
| 61458 | 209  | 
    (never @{ML_text thry})
 | 
210  | 
||
| 
61459
 
5f2ddeb15c06
clarified nesting of paragraphs: indentation is taken into account more uniformly;
 
wenzelm 
parents: 
61458 
diff
changeset
 | 
211  | 
    \<^item> proof contexts are called @{ML_text ctxt}, rarely @{ML_text
 | 
| 61458 | 212  | 
    context} (never @{ML_text ctx})
 | 
213  | 
||
| 
61459
 
5f2ddeb15c06
clarified nesting of paragraphs: indentation is taken into account more uniformly;
 
wenzelm 
parents: 
61458 
diff
changeset
 | 
214  | 
    \<^item> generic contexts are called @{ML_text context}
 | 
| 
 
5f2ddeb15c06
clarified nesting of paragraphs: indentation is taken into account more uniformly;
 
wenzelm 
parents: 
61458 
diff
changeset
 | 
215  | 
|
| 
 
5f2ddeb15c06
clarified nesting of paragraphs: indentation is taken into account more uniformly;
 
wenzelm 
parents: 
61458 
diff
changeset
 | 
216  | 
    \<^item> local theories are called @{ML_text lthy}, except for local
 | 
| 61458 | 217  | 
theories that are treated as proof context (which is a semantic  | 
218  | 
super-type)  | 
|
| 39881 | 219  | 
|
| 61854 | 220  | 
Variations with primed or decimal numbers are always possible, as well as  | 
221  | 
  semantic prefixes like @{ML_text foo_thy} or @{ML_text bar_ctxt}, but the
 | 
|
222  | 
base conventions above need to be preserved. This allows to emphasize their  | 
|
223  | 
data flow via plain regular expressions in the text editor.  | 
|
| 39881 | 224  | 
|
| 61854 | 225  | 
  \<^item> The main logical entities (\secref{ch:logic}) have established naming
 | 
226  | 
convention as follows:  | 
|
| 39881 | 227  | 
|
| 
61459
 
5f2ddeb15c06
clarified nesting of paragraphs: indentation is taken into account more uniformly;
 
wenzelm 
parents: 
61458 
diff
changeset
 | 
228  | 
    \<^item> sorts are called @{ML_text S}
 | 
| 
 
5f2ddeb15c06
clarified nesting of paragraphs: indentation is taken into account more uniformly;
 
wenzelm 
parents: 
61458 
diff
changeset
 | 
229  | 
|
| 61854 | 230  | 
    \<^item> types are called @{ML_text T}, @{ML_text U}, or @{ML_text ty} (never
 | 
231  | 
    @{ML_text t})
 | 
|
| 61458 | 232  | 
|
| 61854 | 233  | 
    \<^item> terms are called @{ML_text t}, @{ML_text u}, or @{ML_text tm} (never
 | 
234  | 
    @{ML_text trm})
 | 
|
| 61458 | 235  | 
|
| 61854 | 236  | 
    \<^item> certified types are called @{ML_text cT}, rarely @{ML_text T}, with
 | 
237  | 
variants as for types  | 
|
| 61458 | 238  | 
|
| 61854 | 239  | 
    \<^item> certified terms are called @{ML_text ct}, rarely @{ML_text t}, with
 | 
240  | 
    variants as for terms (never @{ML_text ctrm})
 | 
|
| 61458 | 241  | 
|
| 
61459
 
5f2ddeb15c06
clarified nesting of paragraphs: indentation is taken into account more uniformly;
 
wenzelm 
parents: 
61458 
diff
changeset
 | 
242  | 
    \<^item> theorems are called @{ML_text th}, or @{ML_text thm}
 | 
| 39881 | 243  | 
|
| 61854 | 244  | 
Proper semantic names override these conventions completely. For example,  | 
245  | 
  the left-hand side of an equation (as a term) can be called @{ML_text lhs}
 | 
|
246  | 
  (not @{ML_text lhs_tm}). Or a term that is known to be a variable can be
 | 
|
247  | 
  called @{ML_text v} or @{ML_text x}.
 | 
|
| 39881 | 248  | 
|
| 61854 | 249  | 
  \<^item> Tactics (\secref{sec:tactics}) are sufficiently important to have specific
 | 
250  | 
naming conventions. The name of a basic tactic definition always has a  | 
|
251  | 
  @{ML_text "_tac"} suffix, the subgoal index (if applicable) is always called
 | 
|
252  | 
  @{ML_text i}, and the goal state (if made explicit) is usually called
 | 
|
253  | 
  @{ML_text st} instead of the somewhat misleading @{ML_text thm}. Any other
 | 
|
254  | 
arguments are given before the latter two, and the general context is given  | 
|
255  | 
first. Example:  | 
|
| 40310 | 256  | 
|
| 61458 | 257  | 
  @{verbatim [display] \<open>  fun my_tac ctxt arg1 arg2 i st = ...\<close>}
 | 
| 40310 | 258  | 
|
| 61854 | 259  | 
  Note that the goal state @{ML_text st} above is rarely made explicit, if
 | 
260  | 
tactic combinators (tacticals) are used as usual.  | 
|
| 40310 | 261  | 
|
| 57421 | 262  | 
A tactic that requires a proof context needs to make that explicit as seen  | 
| 61854 | 263  | 
in the \<^verbatim>\<open>ctxt\<close> argument above. Do not refer to the background theory of  | 
264  | 
\<^verbatim>\<open>st\<close> -- it is not a proper context, but merely a formal certificate.  | 
|
| 58618 | 265  | 
\<close>  | 
266  | 
||
267  | 
||
268  | 
subsection \<open>General source layout\<close>  | 
|
269  | 
||
270  | 
text \<open>  | 
|
| 57421 | 271  | 
The general Isabelle/ML source layout imitates regular type-setting  | 
272  | 
conventions, augmented by the requirements for deeply nested expressions  | 
|
| 61854 | 273  | 
that are commonplace in functional programming.  | 
274  | 
\<close>  | 
|
| 61506 | 275  | 
|
276  | 
paragraph \<open>Line length\<close>  | 
|
| 61854 | 277  | 
text \<open>  | 
278  | 
is limited to 80 characters according to ancient standards, but we allow as  | 
|
279  | 
much as 100 characters (not more).\<^footnote>\<open>Readability requires to keep the  | 
|
280  | 
beginning of a line in view while watching its end. Modern wide-screen  | 
|
| 61506 | 281  | 
displays do not change the way how the human brain works. Sources also need  | 
| 61572 | 282  | 
to be printable on plain paper with reasonable font-size.\<close> The extra 20  | 
| 61506 | 283  | 
characters acknowledge the space requirements due to qualified library  | 
| 61854 | 284  | 
references in Isabelle/ML.  | 
285  | 
\<close>  | 
|
| 61506 | 286  | 
|
287  | 
paragraph \<open>White-space\<close>  | 
|
| 61854 | 288  | 
text \<open>  | 
289  | 
is used to emphasize the structure of expressions, following mostly standard  | 
|
290  | 
  conventions for mathematical typesetting, as can be seen in plain {\TeX} or
 | 
|
291  | 
  {\LaTeX}. This defines positioning of spaces for parentheses, punctuation,
 | 
|
292  | 
and infixes as illustrated here:  | 
|
| 39878 | 293  | 
|
| 61458 | 294  | 
  @{verbatim [display]
 | 
295  | 
\<open> val x = y + z * (a + b);  | 
|
| 39878 | 296  | 
val pair = (a, b);  | 
| 61458 | 297  | 
  val record = {foo = 1, bar = 2};\<close>}
 | 
| 39878 | 298  | 
|
| 61854 | 299  | 
Lines are normally broken \<^emph>\<open>after\<close> an infix operator or punctuation  | 
300  | 
character. For example:  | 
|
| 39878 | 301  | 
|
| 61458 | 302  | 
  @{verbatim [display]
 | 
303  | 
\<open>  | 
|
| 39878 | 304  | 
val x =  | 
305  | 
a +  | 
|
306  | 
b +  | 
|
307  | 
c;  | 
|
308  | 
||
309  | 
val tuple =  | 
|
310  | 
(a,  | 
|
311  | 
b,  | 
|
312  | 
c);  | 
|
| 61458 | 313  | 
\<close>}  | 
| 39878 | 314  | 
|
| 61854 | 315  | 
  Some special infixes (e.g.\ @{ML_text "|>"}) work better at the start of the
 | 
316  | 
line, but punctuation is always at the end.  | 
|
| 39878 | 317  | 
|
| 61854 | 318  | 
Function application follows the tradition of \<open>\<lambda>\<close>-calculus, not informal  | 
319  | 
  mathematics. For example: @{ML_text "f a b"} for a curried function, or
 | 
|
320  | 
  @{ML_text "g (a, b)"} for a tupled function. Note that the space between
 | 
|
321  | 
  @{ML_text g} and the pair @{ML_text "(a, b)"} follows the important
 | 
|
322  | 
  principle of \<^emph>\<open>compositionality\<close>: the layout of @{ML_text "g p"} does not
 | 
|
323  | 
  change when @{ML_text "p"} is refined to the concrete pair @{ML_text "(a,
 | 
|
324  | 
b)"}.  | 
|
| 61506 | 325  | 
\<close>  | 
326  | 
||
327  | 
paragraph \<open>Indentation\<close>  | 
|
| 61854 | 328  | 
text \<open>  | 
329  | 
uses plain spaces, never hard tabulators.\<^footnote>\<open>Tabulators were invented to move  | 
|
330  | 
the carriage of a type-writer to certain predefined positions. In software  | 
|
331  | 
they could be used as a primitive run-length compression of consecutive  | 
|
332  | 
spaces, but the precise result would depend on non-standardized text editor  | 
|
333  | 
configuration.\<close>  | 
|
| 39878 | 334  | 
|
| 61854 | 335  | 
Each level of nesting is indented by 2 spaces, sometimes 1, very rarely 4,  | 
336  | 
never 8 or any other odd number.  | 
|
| 39878 | 337  | 
|
| 61854 | 338  | 
Indentation follows a simple logical format that only depends on the nesting  | 
339  | 
depth, not the accidental length of the text that initiates a level of  | 
|
340  | 
nesting. Example:  | 
|
| 39878 | 341  | 
|
| 61458 | 342  | 
  @{verbatim [display]
 | 
343  | 
\<open> (* RIGHT *)  | 
|
| 39880 | 344  | 
|
| 39878 | 345  | 
if b then  | 
| 39879 | 346  | 
expr1_part1  | 
347  | 
expr1_part2  | 
|
| 39878 | 348  | 
else  | 
| 39879 | 349  | 
expr2_part1  | 
350  | 
expr2_part2  | 
|
| 39878 | 351  | 
|
| 39880 | 352  | 
|
353  | 
(* WRONG *)  | 
|
354  | 
||
| 39879 | 355  | 
if b then expr1_part1  | 
356  | 
expr1_part2  | 
|
357  | 
else expr2_part1  | 
|
| 61458 | 358  | 
expr2_part2\<close>}  | 
| 39878 | 359  | 
|
| 61854 | 360  | 
The second form has many problems: it assumes a fixed-width font when  | 
361  | 
viewing the sources, it uses more space on the line and thus makes it hard  | 
|
362  | 
to observe its strict length limit (working against \<^emph>\<open>readability\<close>), it  | 
|
363  | 
requires extra editing to adapt the layout to changes of the initial text  | 
|
364  | 
(working against \<^emph>\<open>maintainability\<close>) etc.  | 
|
| 39878 | 365  | 
|
| 61416 | 366  | 
\<^medskip>  | 
| 61854 | 367  | 
For similar reasons, any kind of two-dimensional or tabular layouts,  | 
368  | 
ASCII-art with lines or boxes of asterisks etc.\ should be avoided.  | 
|
| 61506 | 369  | 
\<close>  | 
370  | 
||
371  | 
paragraph \<open>Complex expressions\<close>  | 
|
| 61854 | 372  | 
text \<open>  | 
373  | 
  that consist of multi-clausal function definitions, @{ML_text handle},
 | 
|
| 61506 | 374  | 
  @{ML_text case}, @{ML_text let} (and combinations) require special
 | 
375  | 
attention. The syntax of Standard ML is quite ambitious and admits a lot of  | 
|
| 40126 | 376  | 
variance that can distort the meaning of the text.  | 
| 39881 | 377  | 
|
| 57421 | 378  | 
  Multiple clauses of @{ML_text fun}, @{ML_text fn}, @{ML_text handle},
 | 
| 61854 | 379  | 
  @{ML_text case} get extra indentation to indicate the nesting clearly.
 | 
380  | 
Example:  | 
|
| 39881 | 381  | 
|
| 61458 | 382  | 
  @{verbatim [display]
 | 
383  | 
\<open> (* RIGHT *)  | 
|
| 39881 | 384  | 
|
385  | 
fun foo p1 =  | 
|
386  | 
expr1  | 
|
387  | 
| foo p2 =  | 
|
388  | 
expr2  | 
|
389  | 
||
390  | 
||
391  | 
(* WRONG *)  | 
|
392  | 
||
393  | 
fun foo p1 =  | 
|
394  | 
expr1  | 
|
395  | 
| foo p2 =  | 
|
| 61458 | 396  | 
expr2\<close>}  | 
| 39881 | 397  | 
|
| 61854 | 398  | 
  Body expressions consisting of @{ML_text case} or @{ML_text let} require
 | 
399  | 
care to maintain compositionality, to prevent loss of logical indentation  | 
|
400  | 
where it is especially important to see the structure of the text. Example:  | 
|
| 39881 | 401  | 
|
| 61458 | 402  | 
  @{verbatim [display]
 | 
403  | 
\<open> (* RIGHT *)  | 
|
| 39881 | 404  | 
|
405  | 
fun foo p1 =  | 
|
406  | 
(case e of  | 
|
407  | 
q1 => ...  | 
|
408  | 
| q2 => ...)  | 
|
409  | 
| foo p2 =  | 
|
410  | 
let  | 
|
411  | 
...  | 
|
412  | 
in  | 
|
413  | 
...  | 
|
414  | 
end  | 
|
415  | 
||
416  | 
||
417  | 
(* WRONG *)  | 
|
418  | 
||
419  | 
fun foo p1 = case e of  | 
|
420  | 
q1 => ...  | 
|
421  | 
| q2 => ...  | 
|
422  | 
| foo p2 =  | 
|
423  | 
let  | 
|
424  | 
...  | 
|
425  | 
in  | 
|
426  | 
...  | 
|
| 61458 | 427  | 
end\<close>}  | 
| 39881 | 428  | 
|
| 61854 | 429  | 
  Extra parentheses around @{ML_text case} expressions are optional, but help
 | 
430  | 
to analyse the nesting based on character matching in the text editor.  | 
|
| 39881 | 431  | 
|
| 61416 | 432  | 
\<^medskip>  | 
| 61854 | 433  | 
There are two main exceptions to the overall principle of compositionality  | 
434  | 
in the layout of complex expressions.  | 
|
| 39881 | 435  | 
|
| 61416 | 436  | 
  \<^enum> @{ML_text "if"} expressions are iterated as if ML had multi-branch
 | 
| 57421 | 437  | 
conditionals, e.g.  | 
| 39881 | 438  | 
|
| 61458 | 439  | 
  @{verbatim [display]
 | 
440  | 
\<open> (* RIGHT *)  | 
|
| 39881 | 441  | 
|
442  | 
if b1 then e1  | 
|
443  | 
else if b2 then e2  | 
|
| 61458 | 444  | 
else e3\<close>}  | 
| 39881 | 445  | 
|
| 61854 | 446  | 
  \<^enum> @{ML_text fn} abstractions are often layed-out as if they would lack any
 | 
447  | 
structure by themselves. This traditional form is motivated by the  | 
|
448  | 
possibility to shift function arguments back and forth wrt.\ additional  | 
|
449  | 
combinators. Example:  | 
|
| 39881 | 450  | 
|
| 61458 | 451  | 
  @{verbatim [display]
 | 
452  | 
\<open> (* RIGHT *)  | 
|
| 39881 | 453  | 
|
454  | 
fun foo x y = fold (fn z =>  | 
|
| 61458 | 455  | 
expr)\<close>}  | 
| 39881 | 456  | 
|
| 
40149
 
4c35be108990
proper markup of uninterpreted ML text as @{ML_text}, not @{verbatim};
 
wenzelm 
parents: 
40126 
diff
changeset
 | 
457  | 
  Here the visual appearance is that of three arguments @{ML_text x},
 | 
| 57421 | 458  | 
  @{ML_text y}, @{ML_text z} in a row.
 | 
| 39881 | 459  | 
|
460  | 
||
| 61854 | 461  | 
Such weakly structured layout should be use with great care. Here are some  | 
462  | 
  counter-examples involving @{ML_text let} expressions:
 | 
|
| 39881 | 463  | 
|
| 61458 | 464  | 
  @{verbatim [display]
 | 
465  | 
\<open> (* WRONG *)  | 
|
| 39881 | 466  | 
|
467  | 
fun foo x = let  | 
|
468  | 
val y = ...  | 
|
469  | 
in ... end  | 
|
470  | 
||
| 41162 | 471  | 
|
472  | 
(* WRONG *)  | 
|
473  | 
||
| 40153 | 474  | 
fun foo x = let  | 
475  | 
val y = ...  | 
|
476  | 
in ... end  | 
|
477  | 
||
| 41162 | 478  | 
|
479  | 
(* WRONG *)  | 
|
480  | 
||
| 39881 | 481  | 
fun foo x =  | 
482  | 
let  | 
|
483  | 
val y = ...  | 
|
484  | 
in ... end  | 
|
| 57421 | 485  | 
|
486  | 
||
487  | 
(* WRONG *)  | 
|
488  | 
||
489  | 
fun foo x =  | 
|
490  | 
let  | 
|
491  | 
val y = ...  | 
|
492  | 
in  | 
|
| 61458 | 493  | 
... end\<close>}  | 
| 39881 | 494  | 
|
| 61416 | 495  | 
\<^medskip>  | 
| 61854 | 496  | 
In general the source layout is meant to emphasize the structure of complex  | 
497  | 
language expressions, not to pretend that SML had a completely different  | 
|
498  | 
syntax (say that of Haskell, Scala, Java).  | 
|
| 58618 | 499  | 
\<close>  | 
500  | 
||
501  | 
||
502  | 
section \<open>ML embedded into Isabelle/Isar\<close>  | 
|
503  | 
||
| 61854 | 504  | 
text \<open>  | 
505  | 
ML and Isar are intertwined via an open-ended bootstrap process that  | 
|
506  | 
provides more and more programming facilities and logical content in an  | 
|
507  | 
alternating manner. Bootstrapping starts from the raw environment of  | 
|
| 62354 | 508  | 
existing implementations of Standard ML (mainly Poly/ML).  | 
| 39823 | 509  | 
|
| 57421 | 510  | 
Isabelle/Pure marks the point where the raw ML toplevel is superseded by  | 
511  | 
Isabelle/ML within the Isar theory and proof language, with a uniform  | 
|
512  | 
  context for arbitrary ML values (see also \secref{sec:context}). This formal
 | 
|
513  | 
environment holds ML compiler bindings, logical entities, and many other  | 
|
514  | 
things.  | 
|
515  | 
||
516  | 
Object-logics like Isabelle/HOL are built within the Isabelle/ML/Isar  | 
|
517  | 
environment by introducing suitable theories with associated ML modules,  | 
|
| 61503 | 518  | 
either inlined within \<^verbatim>\<open>.thy\<close> files, or as separate \<^verbatim>\<open>.ML\<close> files that are  | 
| 61854 | 519  | 
loading from some theory. Thus Isabelle/HOL is defined as a regular  | 
520  | 
user-space application within the Isabelle framework. Further add-on tools  | 
|
521  | 
can be implemented in ML within the Isar context in the same manner: ML is  | 
|
522  | 
part of the standard repertoire of Isabelle, and there is no distinction  | 
|
523  | 
between ``users'' and ``developers'' in this respect.  | 
|
| 58618 | 524  | 
\<close>  | 
525  | 
||
526  | 
||
527  | 
subsection \<open>Isar ML commands\<close>  | 
|
528  | 
||
529  | 
text \<open>  | 
|
| 57421 | 530  | 
The primary Isar source language provides facilities to ``open a window'' to  | 
531  | 
  the underlying ML compiler. Especially see the Isar commands @{command_ref
 | 
|
532  | 
  "ML_file"} and @{command_ref "ML"}: both work the same way, but the source
 | 
|
533  | 
text is provided differently, via a file vs.\ inlined, respectively. Apart  | 
|
534  | 
from embedding ML into the main theory definition like that, there are many  | 
|
535  | 
  more commands that refer to ML source, such as @{command_ref setup} or
 | 
|
536  | 
  @{command_ref declaration}. Even more fine-grained embedding of ML into Isar
 | 
|
537  | 
  is encountered in the proof method @{method_ref tactic}, which refines the
 | 
|
538  | 
  pending goal state via a given expression of type @{ML_type tactic}.
 | 
|
| 58618 | 539  | 
\<close>  | 
540  | 
||
| 61854 | 541  | 
text %mlex \<open>  | 
542  | 
The following artificial example demonstrates some ML toplevel declarations  | 
|
543  | 
within the implicit Isar theory context. This is regular functional  | 
|
544  | 
programming without referring to logical entities yet.  | 
|
| 58618 | 545  | 
\<close>  | 
546  | 
||
547  | 
ML \<open>  | 
|
| 39823 | 548  | 
fun factorial 0 = 1  | 
549  | 
| factorial n = n * factorial (n - 1)  | 
|
| 58618 | 550  | 
\<close>  | 
551  | 
||
| 61854 | 552  | 
text \<open>  | 
553  | 
  Here the ML environment is already managed by Isabelle, i.e.\ the @{ML
 | 
|
554  | 
factorial} function is not yet accessible in the preceding paragraph, nor in  | 
|
555  | 
a different theory that is independent from the current one in the import  | 
|
556  | 
hierarchy.  | 
|
| 39823 | 557  | 
|
| 57421 | 558  | 
Removing the above ML declaration from the source text will remove any trace  | 
559  | 
of this definition, as expected. The Isabelle/ML toplevel environment is  | 
|
| 61854 | 560  | 
managed in a \<^emph>\<open>stateless\<close> way: in contrast to the raw ML toplevel, there are  | 
561  | 
no global side-effects involved here.\<^footnote>\<open>Such a stateless compilation  | 
|
562  | 
environment is also a prerequisite for robust parallel compilation within  | 
|
563  | 
independent nodes of the implicit theory development graph.\<close>  | 
|
| 39823 | 564  | 
|
| 61416 | 565  | 
\<^medskip>  | 
| 61854 | 566  | 
  The next example shows how to embed ML into Isar proofs, using @{command_ref
 | 
567  | 
  "ML_prf"} instead of @{command_ref "ML"}. As illustrated below, the effect
 | 
|
568  | 
on the ML environment is local to the whole proof body, but ignoring the  | 
|
569  | 
block structure.  | 
|
570  | 
\<close>  | 
|
| 39823 | 571  | 
|
| 40964 | 572  | 
notepad  | 
573  | 
begin  | 
|
| 58618 | 574  | 
ML_prf %"ML" \<open>val a = 1\<close>  | 
| 40126 | 575  | 
  {
 | 
| 58618 | 576  | 
ML_prf %"ML" \<open>val b = a + 1\<close>  | 
| 61580 | 577  | 
} \<comment> \<open>Isar block structure ignored by ML environment\<close>  | 
| 58618 | 578  | 
ML_prf %"ML" \<open>val c = b + 1\<close>  | 
| 40964 | 579  | 
end  | 
| 39823 | 580  | 
|
| 61854 | 581  | 
text \<open>  | 
582  | 
By side-stepping the normal scoping rules for Isar proof blocks, embedded ML  | 
|
583  | 
code can refer to the different contexts and manipulate corresponding  | 
|
584  | 
entities, e.g.\ export a fact from a block context.  | 
|
| 39823 | 585  | 
|
| 61416 | 586  | 
\<^medskip>  | 
| 61854 | 587  | 
  Two further ML commands are useful in certain situations: @{command_ref
 | 
588  | 
  ML_val} and @{command_ref ML_command} are \<^emph>\<open>diagnostic\<close> in the sense that
 | 
|
589  | 
there is no effect on the underlying environment, and can thus be used  | 
|
590  | 
  anywhere. The examples below produce long strings of digits by invoking @{ML
 | 
|
591  | 
  factorial}: @{command ML_val} takes care of printing the ML toplevel result,
 | 
|
592  | 
  but @{command ML_command} is silent so we produce an explicit output
 | 
|
593  | 
message.  | 
|
| 58618 | 594  | 
\<close>  | 
595  | 
||
596  | 
ML_val \<open>factorial 100\<close>  | 
|
597  | 
ML_command \<open>writeln (string_of_int (factorial 100))\<close>  | 
|
| 39823 | 598  | 
|
| 40964 | 599  | 
notepad  | 
600  | 
begin  | 
|
| 58618 | 601  | 
ML_val \<open>factorial 100\<close>  | 
602  | 
ML_command \<open>writeln (string_of_int (factorial 100))\<close>  | 
|
| 40964 | 603  | 
end  | 
| 39823 | 604  | 
|
605  | 
||
| 58618 | 606  | 
subsection \<open>Compile-time context\<close>  | 
607  | 
||
| 61854 | 608  | 
text \<open>  | 
609  | 
Whenever the ML compiler is invoked within Isabelle/Isar, the formal context  | 
|
610  | 
is passed as a thread-local reference variable. Thus ML code may access the  | 
|
611  | 
theory context during compilation, by reading or writing the (local) theory  | 
|
612  | 
under construction. Note that such direct access to the compile-time context  | 
|
613  | 
is rare. In practice it is typically done via some derived ML functions  | 
|
614  | 
instead.  | 
|
| 58618 | 615  | 
\<close>  | 
616  | 
||
617  | 
text %mlref \<open>  | 
|
| 
39825
 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 
wenzelm 
parents: 
39824 
diff
changeset
 | 
618  | 
  \begin{mldecls}
 | 
| 62876 | 619  | 
  @{index_ML Context.the_generic_context: "unit -> Context.generic"} \\
 | 
| 40126 | 620  | 
  @{index_ML "Context.>>": "(Context.generic -> Context.generic) -> unit"} \\
 | 
| 56199 | 621  | 
  @{index_ML ML_Thms.bind_thms: "string * thm list -> unit"} \\
 | 
622  | 
  @{index_ML ML_Thms.bind_thm: "string * thm -> unit"} \\
 | 
|
| 
39825
 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 
wenzelm 
parents: 
39824 
diff
changeset
 | 
623  | 
  \end{mldecls}
 | 
| 
 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 
wenzelm 
parents: 
39824 
diff
changeset
 | 
624  | 
|
| 62876 | 625  | 
    \<^descr> @{ML "Context.the_generic_context ()"} refers to the theory context of
 | 
| 61854 | 626  | 
the ML toplevel --- at compile time. ML code needs to take care to refer to  | 
| 62876 | 627  | 
    @{ML "Context.the_generic_context ()"} correctly. Recall that evaluation
 | 
| 61854 | 628  | 
of a function body is delayed until actual run-time.  | 
| 
39825
 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 
wenzelm 
parents: 
39824 
diff
changeset
 | 
629  | 
|
| 61854 | 630  | 
    \<^descr> @{ML "Context.>>"}~\<open>f\<close> applies context transformation \<open>f\<close> to the implicit
 | 
631  | 
context of the ML toplevel.  | 
|
| 61493 | 632  | 
|
| 61854 | 633  | 
    \<^descr> @{ML ML_Thms.bind_thms}~\<open>(name, thms)\<close> stores a list of theorems produced
 | 
634  | 
in ML both in the (global) theory context and the ML toplevel, associating  | 
|
635  | 
it with the provided name.  | 
|
| 39850 | 636  | 
|
| 61854 | 637  | 
    \<^descr> @{ML ML_Thms.bind_thm} is similar to @{ML ML_Thms.bind_thms} but refers to
 | 
638  | 
a singleton fact.  | 
|
| 
39825
 
f9066b94bf07
eliminated fancy \ML logo for the sake of simpler source text (less dependence on LaTeX);
 
wenzelm 
parents: 
39824 
diff
changeset
 | 
639  | 
|
| 61854 | 640  | 
It is important to note that the above functions are really restricted to  | 
641  | 
the compile time, even though the ML compiler is invoked at run-time. The  | 
|
642  | 
majority of ML code either uses static antiquotations  | 
|
643  | 
  (\secref{sec:ML-antiq}) or refers to the theory or proof context at
 | 
|
644  | 
run-time, by explicit functional abstraction.  | 
|
| 58618 | 645  | 
\<close>  | 
646  | 
||
647  | 
||
648  | 
subsection \<open>Antiquotations \label{sec:ML-antiq}\<close>
 | 
|
649  | 
||
| 61854 | 650  | 
text \<open>  | 
651  | 
A very important consequence of embedding ML into Isar is the concept of  | 
|
652  | 
\<^emph>\<open>ML antiquotation\<close>. The standard token language of ML is augmented by  | 
|
653  | 
special syntactic entities of the following form:  | 
|
| 
39827
 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 
wenzelm 
parents: 
39825 
diff
changeset
 | 
654  | 
|
| 
55112
 
b1a5d603fd12
prefer rail cartouche -- avoid back-slashed quotes;
 
wenzelm 
parents: 
54703 
diff
changeset
 | 
655  | 
  @{rail \<open>
 | 
| 53167 | 656  | 
  @{syntax_def antiquote}: '@{' nameref args '}'
 | 
| 
55112
 
b1a5d603fd12
prefer rail cartouche -- avoid back-slashed quotes;
 
wenzelm 
parents: 
54703 
diff
changeset
 | 
657  | 
\<close>}  | 
| 
39827
 
d829ce302ca4
basic setup for ML antiquotations -- with rail diagrams;
 
wenzelm 
parents: 
39825 
diff
changeset
 | 
658  | 
|
| 57421 | 659  | 
  Here @{syntax nameref} and @{syntax args} are outer syntax categories, as
 | 
| 58555 | 660  | 
  defined in @{cite "isabelle-isar-ref"}.
 | 
| 39823 | 661  | 
|
| 61416 | 662  | 
\<^medskip>  | 
| 61854 | 663  | 
  A regular antiquotation \<open>@{name args}\<close> processes its arguments by the usual
 | 
664  | 
means of the Isar source language, and produces corresponding ML source  | 
|
665  | 
  text, either as literal \<^emph>\<open>inline\<close> text (e.g.\ \<open>@{term t}\<close>) or abstract
 | 
|
666  | 
  \<^emph>\<open>value\<close> (e.g. \<open>@{thm th}\<close>). This pre-compilation scheme allows to refer to
 | 
|
667  | 
formal entities in a robust manner, with proper static scoping and with some  | 
|
668  | 
degree of logical checking of small portions of the code.  | 
|
| 58618 | 669  | 
\<close>  | 
670  | 
||
671  | 
||
672  | 
subsection \<open>Printing ML values\<close>  | 
|
673  | 
||
| 61854 | 674  | 
text \<open>  | 
675  | 
The ML compiler knows about the structure of values according to their  | 
|
676  | 
static type, and can print them in the manner of its toplevel, although the  | 
|
677  | 
  details are non-portable. The antiquotations @{ML_antiquotation_def
 | 
|
678  | 
  "make_string"} and @{ML_antiquotation_def "print"} provide a quasi-portable
 | 
|
679  | 
way to refer to this potential capability of the underlying ML system in  | 
|
| 56399 | 680  | 
generic Isabelle/ML sources.  | 
681  | 
||
| 61854 | 682  | 
This is occasionally useful for diagnostic or demonstration purposes. Note  | 
683  | 
that production-quality tools require proper user-level error messages,  | 
|
684  | 
avoiding raw ML values in the output.  | 
|
685  | 
\<close>  | 
|
| 58618 | 686  | 
|
687  | 
text %mlantiq \<open>  | 
|
| 
51636
 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 
wenzelm 
parents: 
51295 
diff
changeset
 | 
688  | 
  \begin{matharray}{rcl}
 | 
| 61493 | 689  | 
  @{ML_antiquotation_def "make_string"} & : & \<open>ML_antiquotation\<close> \\
 | 
690  | 
  @{ML_antiquotation_def "print"} & : & \<open>ML_antiquotation\<close> \\
 | 
|
| 
51636
 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 
wenzelm 
parents: 
51295 
diff
changeset
 | 
691  | 
  \end{matharray}
 | 
| 
 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 
wenzelm 
parents: 
51295 
diff
changeset
 | 
692  | 
|
| 
55112
 
b1a5d603fd12
prefer rail cartouche -- avoid back-slashed quotes;
 
wenzelm 
parents: 
54703 
diff
changeset
 | 
693  | 
  @{rail \<open>
 | 
| 
51636
 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 
wenzelm 
parents: 
51295 
diff
changeset
 | 
694  | 
  @@{ML_antiquotation make_string}
 | 
| 56399 | 695  | 
;  | 
696  | 
  @@{ML_antiquotation print} @{syntax name}?
 | 
|
| 
55112
 
b1a5d603fd12
prefer rail cartouche -- avoid back-slashed quotes;
 
wenzelm 
parents: 
54703 
diff
changeset
 | 
697  | 
\<close>}  | 
| 
51636
 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 
wenzelm 
parents: 
51295 
diff
changeset
 | 
698  | 
|
| 61854 | 699  | 
  \<^descr> \<open>@{make_string}\<close> inlines a function to print arbitrary values similar to
 | 
700  | 
the ML toplevel. The result is compiler dependent and may fall back on "?"  | 
|
701  | 
  in certain situations. The value of configuration option @{attribute_ref
 | 
|
702  | 
ML_print_depth} determines further details of output.  | 
|
| 56399 | 703  | 
|
| 61854 | 704  | 
  \<^descr> \<open>@{print f}\<close> uses the ML function \<open>f: string -> unit\<close> to output the result
 | 
705  | 
  of \<open>@{make_string}\<close> above, together with the source position of the
 | 
|
706  | 
  antiquotation. The default output function is @{ML writeln}.
 | 
|
| 58618 | 707  | 
\<close>  | 
708  | 
||
| 61854 | 709  | 
text %mlex \<open>  | 
710  | 
The following artificial examples show how to produce adhoc output of ML  | 
|
711  | 
values for debugging purposes.  | 
|
712  | 
\<close>  | 
|
| 58618 | 713  | 
|
| 59902 | 714  | 
ML_val \<open>  | 
| 
51636
 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 
wenzelm 
parents: 
51295 
diff
changeset
 | 
715  | 
val x = 42;  | 
| 
 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 
wenzelm 
parents: 
51295 
diff
changeset
 | 
716  | 
val y = true;  | 
| 
 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 
wenzelm 
parents: 
51295 
diff
changeset
 | 
717  | 
|
| 
 
e49bf0be79ba
document @{make_string}, cf. NEWS of Isabelle2009-2 (June 2010);
 
wenzelm 
parents: 
51295 
diff
changeset
 | 
718  | 
  writeln (@{make_string} {x = x, y = y});
 | 
| 56399 | 719  | 
|
720  | 
  @{print} {x = x, y = y};
 | 
|
721  | 
  @{print tracing} {x = x, y = y};
 | 
|
| 58618 | 722  | 
\<close>  | 
723  | 
||
724  | 
||
725  | 
section \<open>Canonical argument order \label{sec:canonical-argument-order}\<close>
 | 
|
726  | 
||
| 61854 | 727  | 
text \<open>  | 
728  | 
Standard ML is a language in the tradition of \<open>\<lambda>\<close>-calculus and  | 
|
729  | 
\<^emph>\<open>higher-order functional programming\<close>, similar to OCaml, Haskell, or  | 
|
730  | 
Isabelle/Pure and HOL as logical languages. Getting acquainted with the  | 
|
731  | 
native style of representing functions in that setting can save a lot of  | 
|
732  | 
extra boiler-plate of redundant shuffling of arguments, auxiliary  | 
|
733  | 
abstractions etc.  | 
|
| 39883 | 734  | 
|
| 61854 | 735  | 
Functions are usually \<^emph>\<open>curried\<close>: the idea of turning arguments of type  | 
736  | 
  \<open>\<tau>\<^sub>i\<close> (for \<open>i \<in> {1, \<dots> n}\<close>) into a result of type \<open>\<tau>\<close> is represented by the
 | 
|
737  | 
iterated function space \<open>\<tau>\<^sub>1 \<rightarrow> \<dots> \<rightarrow> \<tau>\<^sub>n \<rightarrow> \<tau>\<close>. This is isomorphic to the  | 
|
738  | 
well-known encoding via tuples \<open>\<tau>\<^sub>1 \<times> \<dots> \<times> \<tau>\<^sub>n \<rightarrow> \<tau>\<close>, but the curried version  | 
|
739  | 
fits more smoothly into the basic calculus.\<^footnote>\<open>The difference is even more  | 
|
740  | 
significant in HOL, because the redundant tuple structure needs to be  | 
|
741  | 
accommodated extraneous proof steps.\<close>  | 
|
| 39883 | 742  | 
|
| 61854 | 743  | 
Currying gives some flexibility due to \<^emph>\<open>partial application\<close>. A function  | 
744  | 
\<open>f: \<tau>\<^sub>1 \<rightarrow> \<tau>\<^sub>2 \<rightarrow> \<tau>\<close> can be applied to \<open>x: \<tau>\<^sub>1\<close> and the remaining \<open>(f x): \<tau>\<^sub>2  | 
|
745  | 
\<rightarrow> \<tau>\<close> passed to another function etc. How well this works in practice depends  | 
|
746  | 
on the order of arguments. In the worst case, arguments are arranged  | 
|
747  | 
erratically, and using a function in a certain situation always requires  | 
|
748  | 
some glue code. Thus we would get exponentially many opportunities to  | 
|
| 39883 | 749  | 
decorate the code with meaningless permutations of arguments.  | 
750  | 
||
| 61854 | 751  | 
This can be avoided by \<^emph>\<open>canonical argument order\<close>, which observes certain  | 
752  | 
standard patterns and minimizes adhoc permutations in their application. In  | 
|
753  | 
Isabelle/ML, large portions of text can be written without auxiliary  | 
|
754  | 
operations like \<open>swap: \<alpha> \<times> \<beta> \<rightarrow> \<beta> \<times> \<alpha>\<close> or \<open>C: (\<alpha> \<rightarrow> \<beta> \<rightarrow> \<gamma>) \<rightarrow> (\<beta> \<rightarrow> \<alpha> \<rightarrow> \<gamma>)\<close> (the  | 
|
755  | 
latter is not present in the Isabelle/ML library).  | 
|
| 39883 | 756  | 
|
| 61416 | 757  | 
\<^medskip>  | 
| 61854 | 758  | 
The main idea is that arguments that vary less are moved further to the left  | 
759  | 
than those that vary more. Two particularly important categories of  | 
|
760  | 
functions are \<^emph>\<open>selectors\<close> and \<^emph>\<open>updates\<close>.  | 
|
| 39883 | 761  | 
|
| 61854 | 762  | 
The subsequent scheme is based on a hypothetical set-like container of type  | 
763  | 
\<open>\<beta>\<close> that manages elements of type \<open>\<alpha>\<close>. Both the names and types of the  | 
|
764  | 
associated operations are canonical for Isabelle/ML.  | 
|
| 39883 | 765  | 
|
| 52416 | 766  | 
  \begin{center}
 | 
| 39883 | 767  | 
  \begin{tabular}{ll}
 | 
768  | 
kind & canonical name and type \\\hline  | 
|
| 61493 | 769  | 
selector & \<open>member: \<beta> \<rightarrow> \<alpha> \<rightarrow> bool\<close> \\  | 
770  | 
update & \<open>insert: \<alpha> \<rightarrow> \<beta> \<rightarrow> \<beta>\<close> \\  | 
|
| 39883 | 771  | 
  \end{tabular}
 | 
| 52416 | 772  | 
  \end{center}
 | 
| 39883 | 773  | 
|
| 61854 | 774  | 
Given a container \<open>B: \<beta>\<close>, the partially applied \<open>member B\<close> is a predicate  | 
775  | 
over elements \<open>\<alpha> \<rightarrow> bool\<close>, and thus represents the intended denotation  | 
|
776  | 
directly. It is customary to pass the abstract predicate to further  | 
|
777  | 
operations, not the concrete container. The argument order makes it easy to  | 
|
778  | 
use other combinators: \<open>forall (member B) list\<close> will check a list of  | 
|
779  | 
elements for membership in \<open>B\<close> etc. Often the explicit \<open>list\<close> is pointless  | 
|
780  | 
and can be contracted to \<open>forall (member B)\<close> to get directly a predicate  | 
|
781  | 
again.  | 
|
| 39883 | 782  | 
|
| 61854 | 783  | 
In contrast, an update operation varies the container, so it moves to the  | 
784  | 
right: \<open>insert a\<close> is a function \<open>\<beta> \<rightarrow> \<beta>\<close> to insert a value \<open>a\<close>. These can be  | 
|
785  | 
composed naturally as \<open>insert c \<circ> insert b \<circ> insert a\<close>. The slightly awkward  | 
|
786  | 
inversion of the composition order is due to conventional mathematical  | 
|
787  | 
notation, which can be easily amended as explained below.  | 
|
| 58618 | 788  | 
\<close>  | 
789  | 
||
790  | 
||
791  | 
subsection \<open>Forward application and composition\<close>  | 
|
792  | 
||
| 61854 | 793  | 
text \<open>  | 
794  | 
Regular function application and infix notation works best for relatively  | 
|
795  | 
deeply structured expressions, e.g.\ \<open>h (f x y + g z)\<close>. The important  | 
|
796  | 
special case of \<^emph>\<open>linear transformation\<close> applies a cascade of functions \<open>f\<^sub>n  | 
|
797  | 
(\<dots> (f\<^sub>1 x))\<close>. This becomes hard to read and maintain if the functions are  | 
|
798  | 
themselves given as complex expressions. The notation can be significantly  | 
|
799  | 
improved by introducing \<^emph>\<open>forward\<close> versions of application and composition  | 
|
800  | 
as follows:  | 
|
| 39883 | 801  | 
|
| 61416 | 802  | 
\<^medskip>  | 
| 39883 | 803  | 
  \begin{tabular}{lll}
 | 
| 61493 | 804  | 
\<open>x |> f\<close> & \<open>\<equiv>\<close> & \<open>f x\<close> \\  | 
805  | 
\<open>(f #> g) x\<close> & \<open>\<equiv>\<close> & \<open>x |> f |> g\<close> \\  | 
|
| 39883 | 806  | 
  \end{tabular}
 | 
| 61416 | 807  | 
\<^medskip>  | 
| 39883 | 808  | 
|
| 61854 | 809  | 
This enables to write conveniently \<open>x |> f\<^sub>1 |> \<dots> |> f\<^sub>n\<close> or \<open>f\<^sub>1 #> \<dots> #>  | 
810  | 
f\<^sub>n\<close> for its functional abstraction over \<open>x\<close>.  | 
|
| 39883 | 811  | 
|
| 61416 | 812  | 
\<^medskip>  | 
| 61854 | 813  | 
There is an additional set of combinators to accommodate multiple results  | 
814  | 
(via pairs) that are passed on as multiple arguments (via currying).  | 
|
| 39883 | 815  | 
|
| 61416 | 816  | 
\<^medskip>  | 
| 39883 | 817  | 
  \begin{tabular}{lll}
 | 
| 61493 | 818  | 
\<open>(x, y) |-> f\<close> & \<open>\<equiv>\<close> & \<open>f x y\<close> \\  | 
819  | 
\<open>(f #-> g) x\<close> & \<open>\<equiv>\<close> & \<open>x |> f |-> g\<close> \\  | 
|
| 39883 | 820  | 
  \end{tabular}
 | 
| 61416 | 821  | 
\<^medskip>  | 
| 58618 | 822  | 
\<close>  | 
823  | 
||
824  | 
text %mlref \<open>  | 
|
| 39883 | 825  | 
  \begin{mldecls}
 | 
| 46262 | 826  | 
  @{index_ML_op "|> ": "'a * ('a -> 'b) -> 'b"} \\
 | 
827  | 
  @{index_ML_op "|-> ": "('c * 'a) * ('c -> 'a -> 'b) -> 'b"} \\
 | 
|
828  | 
  @{index_ML_op "#> ": "('a -> 'b) * ('b -> 'c) -> 'a -> 'c"} \\
 | 
|
829  | 
  @{index_ML_op "#-> ": "('a -> 'c * 'b) * ('c -> 'b -> 'd) -> 'a -> 'd"} \\
 | 
|
| 39883 | 830  | 
  \end{mldecls}
 | 
| 58618 | 831  | 
\<close>  | 
832  | 
||
833  | 
||
834  | 
subsection \<open>Canonical iteration\<close>  | 
|
835  | 
||
| 61854 | 836  | 
text \<open>  | 
837  | 
As explained above, a function \<open>f: \<alpha> \<rightarrow> \<beta> \<rightarrow> \<beta>\<close> can be understood as update on  | 
|
838  | 
a configuration of type \<open>\<beta>\<close>, parameterized by an argument of type \<open>\<alpha>\<close>. Given  | 
|
839  | 
\<open>a: \<alpha>\<close> the partial application \<open>(f a): \<beta> \<rightarrow> \<beta>\<close> operates homogeneously on \<open>\<beta>\<close>.  | 
|
840  | 
This can be iterated naturally over a list of parameters \<open>[a\<^sub>1, \<dots>, a\<^sub>n]\<close> as  | 
|
841  | 
\<open>f a\<^sub>1 #> \<dots> #> f a\<^sub>n\<close>. The latter expression is again a function \<open>\<beta> \<rightarrow> \<beta>\<close>. It  | 
|
842  | 
can be applied to an initial configuration \<open>b: \<beta>\<close> to start the iteration  | 
|
843  | 
over the given list of arguments: each \<open>a\<close> in \<open>a\<^sub>1, \<dots>, a\<^sub>n\<close> is applied  | 
|
844  | 
consecutively by updating a cumulative configuration.  | 
|
| 39883 | 845  | 
|
| 61854 | 846  | 
The \<open>fold\<close> combinator in Isabelle/ML lifts a function \<open>f\<close> as above to its  | 
847  | 
iterated version over a list of arguments. Lifting can be repeated, e.g.\  | 
|
848  | 
\<open>(fold \<circ> fold) f\<close> iterates over a list of lists as expected.  | 
|
| 39883 | 849  | 
|
| 61854 | 850  | 
The variant \<open>fold_rev\<close> works inside-out over the list of arguments, such  | 
851  | 
that \<open>fold_rev f \<equiv> fold f \<circ> rev\<close> holds.  | 
|
| 61493 | 852  | 
|
| 61854 | 853  | 
The \<open>fold_map\<close> combinator essentially performs \<open>fold\<close> and \<open>map\<close>  | 
854  | 
simultaneously: each application of \<open>f\<close> produces an updated configuration  | 
|
855  | 
together with a side-result; the iteration collects all such side-results as  | 
|
856  | 
a separate list.  | 
|
| 58618 | 857  | 
\<close>  | 
858  | 
||
859  | 
text %mlref \<open>  | 
|
| 39883 | 860  | 
  \begin{mldecls}
 | 
861  | 
  @{index_ML fold: "('a -> 'b -> 'b) -> 'a list -> 'b -> 'b"} \\
 | 
|
862  | 
  @{index_ML fold_rev: "('a -> 'b -> 'b) -> 'a list -> 'b -> 'b"} \\
 | 
|
863  | 
  @{index_ML fold_map: "('a -> 'b -> 'c * 'b) -> 'a list -> 'b -> 'c list * 'b"} \\
 | 
|
864  | 
  \end{mldecls}
 | 
|
865  | 
||
| 61854 | 866  | 
  \<^descr> @{ML fold}~\<open>f\<close> lifts the parametrized update function \<open>f\<close> to a list of
 | 
867  | 
parameters.  | 
|
| 61493 | 868  | 
|
| 61854 | 869  | 
  \<^descr> @{ML fold_rev}~\<open>f\<close> is similar to @{ML fold}~\<open>f\<close>, but works inside-out, as
 | 
870  | 
if the list would be reversed.  | 
|
| 61493 | 871  | 
|
| 61854 | 872  | 
  \<^descr> @{ML fold_map}~\<open>f\<close> lifts the parametrized update function \<open>f\<close> (with
 | 
873  | 
side-result) to a list of parameters and cumulative side-results.  | 
|
| 39883 | 874  | 
|
875  | 
||
876  | 
  \begin{warn}
 | 
|
| 57421 | 877  | 
The literature on functional programming provides a confusing multitude of  | 
| 61854 | 878  | 
combinators called \<open>foldl\<close>, \<open>foldr\<close> etc. SML97 provides its own variations  | 
879  | 
  as @{ML List.foldl} and @{ML List.foldr}, while the classic Isabelle library
 | 
|
880  | 
  also has the historic @{ML Library.foldl} and @{ML Library.foldr}. To avoid
 | 
|
881  | 
unnecessary complication, all these historical versions should be ignored,  | 
|
882  | 
  and the canonical @{ML fold} (or @{ML fold_rev}) used exclusively.
 | 
|
| 39883 | 883  | 
  \end{warn}
 | 
| 58618 | 884  | 
\<close>  | 
885  | 
||
| 61854 | 886  | 
text %mlex \<open>  | 
887  | 
The following example shows how to fill a text buffer incrementally by  | 
|
888  | 
adding strings, either individually or from a given list.  | 
|
| 58618 | 889  | 
\<close>  | 
890  | 
||
| 59902 | 891  | 
ML_val \<open>  | 
| 39883 | 892  | 
val s =  | 
893  | 
Buffer.empty  | 
|
894  | 
|> Buffer.add "digits: "  | 
|
895  | 
|> fold (Buffer.add o string_of_int) (0 upto 9)  | 
|
896  | 
|> Buffer.content;  | 
|
897  | 
||
898  | 
  @{assert} (s = "digits: 0123456789");
 | 
|
| 58618 | 899  | 
\<close>  | 
900  | 
||
| 61854 | 901  | 
text \<open>  | 
902  | 
  Note how @{ML "fold (Buffer.add o string_of_int)"} above saves an extra @{ML
 | 
|
903  | 
"map"} over the given list. This kind of peephole optimization reduces both  | 
|
904  | 
the code size and the tree structures in memory (``deforestation''), but it  | 
|
905  | 
requires some practice to read and write fluently.  | 
|
| 39883 | 906  | 
|
| 61416 | 907  | 
\<^medskip>  | 
| 61854 | 908  | 
The next example elaborates the idea of canonical iteration, demonstrating  | 
909  | 
fast accumulation of tree content using a text buffer.  | 
|
| 58618 | 910  | 
\<close>  | 
911  | 
||
912  | 
ML \<open>  | 
|
| 39883 | 913  | 
datatype tree = Text of string | Elem of string * tree list;  | 
914  | 
||
915  | 
fun slow_content (Text txt) = txt  | 
|
916  | 
| slow_content (Elem (name, ts)) =  | 
|
917  | 
"<" ^ name ^ ">" ^  | 
|
918  | 
implode (map slow_content ts) ^  | 
|
919  | 
"</" ^ name ^ ">"  | 
|
920  | 
||
921  | 
fun add_content (Text txt) = Buffer.add txt  | 
|
922  | 
| add_content (Elem (name, ts)) =  | 
|
923  | 
        Buffer.add ("<" ^ name ^ ">") #>
 | 
|
924  | 
fold add_content ts #>  | 
|
925  | 
        Buffer.add ("</" ^ name ^ ">");
 | 
|
926  | 
||
927  | 
fun fast_content tree =  | 
|
928  | 
Buffer.empty |> add_content tree |> Buffer.content;  | 
|
| 58618 | 929  | 
\<close>  | 
930  | 
||
| 61854 | 931  | 
text \<open>  | 
932  | 
  The slowness of @{ML slow_content} is due to the @{ML implode} of the
 | 
|
933  | 
recursive results, because it copies previously produced strings again and  | 
|
934  | 
again.  | 
|
| 39883 | 935  | 
|
| 61854 | 936  | 
  The incremental @{ML add_content} avoids this by operating on a buffer that
 | 
937  | 
  is passed through in a linear fashion. Using @{ML_text "#>"} and contraction
 | 
|
938  | 
over the actual buffer argument saves some additional boiler-plate. Of  | 
|
939  | 
  course, the two @{ML "Buffer.add"} invocations with concatenated strings
 | 
|
940  | 
could have been split into smaller parts, but this would have obfuscated the  | 
|
941  | 
source without making a big difference in performance. Here we have done  | 
|
942  | 
some peephole-optimization for the sake of readability.  | 
|
| 39883 | 943  | 
|
| 61854 | 944  | 
  Another benefit of @{ML add_content} is its ``open'' form as a function on
 | 
945  | 
buffers that can be continued in further linear transformations, folding  | 
|
946  | 
  etc. Thus it is more compositional than the naive @{ML slow_content}. As
 | 
|
947  | 
  realistic example, compare the old-style @{ML "Term.maxidx_of_term: term ->
 | 
|
948  | 
  int"} with the newer @{ML "Term.maxidx_term: term -> int -> int"} in
 | 
|
949  | 
Isabelle/Pure.  | 
|
| 39883 | 950  | 
|
| 61854 | 951  | 
  Note that @{ML fast_content} above is only defined as example. In many
 | 
952  | 
  practical situations, it is customary to provide the incremental @{ML
 | 
|
953  | 
add_content} only and leave the initialization and termination to the  | 
|
954  | 
concrete application to the user.  | 
|
| 58618 | 955  | 
\<close>  | 
956  | 
||
957  | 
||
958  | 
section \<open>Message output channels \label{sec:message-channels}\<close>
 | 
|
959  | 
||
| 61854 | 960  | 
text \<open>  | 
961  | 
Isabelle provides output channels for different kinds of messages: regular  | 
|
962  | 
output, high-volume tracing information, warnings, and errors.  | 
|
| 39835 | 963  | 
|
| 61854 | 964  | 
Depending on the user interface involved, these messages may appear in  | 
965  | 
different text styles or colours. The standard output for batch sessions  | 
|
966  | 
prefixes each line of warnings by \<^verbatim>\<open>###\<close> and errors by \<^verbatim>\<open>***\<close>, but leaves  | 
|
967  | 
anything else unchanged. The message body may contain further markup and  | 
|
968  | 
  formatting, which is routinely used in the Prover IDE @{cite
 | 
|
969  | 
"isabelle-jedit"}.  | 
|
| 39835 | 970  | 
|
| 61854 | 971  | 
Messages are associated with the transaction context of the running Isar  | 
972  | 
command. This enables the front-end to manage commands and resulting  | 
|
973  | 
messages together. For example, after deleting a command from a given theory  | 
|
974  | 
document version, the corresponding message output can be retracted from the  | 
|
975  | 
display.  | 
|
| 58618 | 976  | 
\<close>  | 
977  | 
||
978  | 
text %mlref \<open>  | 
|
| 39835 | 979  | 
  \begin{mldecls}
 | 
980  | 
  @{index_ML writeln: "string -> unit"} \\
 | 
|
981  | 
  @{index_ML tracing: "string -> unit"} \\
 | 
|
982  | 
  @{index_ML warning: "string -> unit"} \\
 | 
|
| 57421 | 983  | 
  @{index_ML error: "string -> 'a"} % FIXME Output.error_message (!?) \\
 | 
| 39835 | 984  | 
  \end{mldecls}
 | 
985  | 
||
| 61854 | 986  | 
  \<^descr> @{ML writeln}~\<open>text\<close> outputs \<open>text\<close> as regular message. This is the
 | 
987  | 
primary message output operation of Isabelle and should be used by default.  | 
|
| 39835 | 988  | 
|
| 61854 | 989  | 
  \<^descr> @{ML tracing}~\<open>text\<close> outputs \<open>text\<close> as special tracing message, indicating
 | 
990  | 
potential high-volume output to the front-end (hundreds or thousands of  | 
|
991  | 
messages issued by a single command). The idea is to allow the  | 
|
992  | 
user-interface to downgrade the quality of message display to achieve higher  | 
|
993  | 
throughput.  | 
|
| 39835 | 994  | 
|
| 61854 | 995  | 
Note that the user might have to take special actions to see tracing output,  | 
996  | 
e.g.\ switch to a different output window. So this channel should not be  | 
|
997  | 
used for regular output.  | 
|
| 39835 | 998  | 
|
| 61854 | 999  | 
  \<^descr> @{ML warning}~\<open>text\<close> outputs \<open>text\<close> as warning, which typically means some
 | 
1000  | 
extra emphasis on the front-end side (color highlighting, icons, etc.).  | 
|
| 39835 | 1001  | 
|
| 61854 | 1002  | 
  \<^descr> @{ML error}~\<open>text\<close> raises exception @{ML ERROR}~\<open>text\<close> and thus lets the
 | 
1003  | 
Isar toplevel print \<open>text\<close> on the error channel, which typically means some  | 
|
1004  | 
extra emphasis on the front-end side (color highlighting, icons, etc.).  | 
|
| 39835 | 1005  | 
|
1006  | 
This assumes that the exception is not handled before the command  | 
|
| 61854 | 1007  | 
  terminates. Handling exception @{ML ERROR}~\<open>text\<close> is a perfectly legal
 | 
1008  | 
alternative: it means that the error is absorbed without any message output.  | 
|
| 39835 | 1009  | 
|
| 
39861
 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 
wenzelm 
parents: 
39859 
diff
changeset
 | 
1010  | 
  \begin{warn}
 | 
| 54387 | 1011  | 
  The actual error channel is accessed via @{ML Output.error_message}, but
 | 
| 58842 | 1012  | 
this is normally not used directly in user code.  | 
| 
39861
 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 
wenzelm 
parents: 
39859 
diff
changeset
 | 
1013  | 
  \end{warn}
 | 
| 39835 | 1014  | 
|
| 
39861
 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 
wenzelm 
parents: 
39859 
diff
changeset
 | 
1015  | 
|
| 
 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 
wenzelm 
parents: 
39859 
diff
changeset
 | 
1016  | 
  \begin{warn}
 | 
| 61854 | 1017  | 
Regular Isabelle/ML code should output messages exclusively by the official  | 
1018  | 
  channels. Using raw I/O on \<^emph>\<open>stdout\<close> or \<^emph>\<open>stderr\<close> instead (e.g.\ via @{ML
 | 
|
1019  | 
TextIO.output}) is apt to cause problems in the presence of parallel and  | 
|
1020  | 
asynchronous processing of Isabelle theories. Such raw output might be  | 
|
1021  | 
displayed by the front-end in some system console log, with a low chance  | 
|
1022  | 
that the user will ever see it. Moreover, as a genuine side-effect on global  | 
|
1023  | 
process channels, there is no proper way to retract output when Isar command  | 
|
| 40126 | 1024  | 
transactions are reset by the system.  | 
| 
39861
 
b8d89db3e238
use continental paragraph style, which works better with mixture of (in)formal text;
 
wenzelm 
parents: 
39859 
diff
changeset
 | 
1025  | 
  \end{warn}
 | 
| 39872 | 1026  | 
|
1027  | 
  \begin{warn}
 | 
|
| 61854 | 1028  | 
The message channels should be used in a message-oriented manner. This means  | 
1029  | 
that multi-line output that logically belongs together is issued by a single  | 
|
1030  | 
  invocation of @{ML writeln} etc.\ with the functional concatenation of all
 | 
|
1031  | 
message constituents.  | 
|
| 39872 | 1032  | 
  \end{warn}
 | 
| 58618 | 1033  | 
\<close>  | 
1034  | 
||
| 61854 | 1035  | 
text %mlex \<open>  | 
1036  | 
The following example demonstrates a multi-line warning. Note that in some  | 
|
1037  | 
situations the user sees only the first line, so the most important point  | 
|
1038  | 
should be made first.  | 
|
| 58618 | 1039  | 
\<close>  | 
1040  | 
||
1041  | 
ML_command \<open>  | 
|
| 39872 | 1042  | 
warning (cat_lines  | 
1043  | 
["Beware the Jabberwock, my son!",  | 
|
1044  | 
"The jaws that bite, the claws that catch!",  | 
|
1045  | 
"Beware the Jubjub Bird, and shun",  | 
|
1046  | 
"The frumious Bandersnatch!"]);  | 
|
| 58618 | 1047  | 
\<close>  | 
1048  | 
||
| 59902 | 1049  | 
text \<open>  | 
| 61416 | 1050  | 
\<^medskip>  | 
| 61854 | 1051  | 
An alternative is to make a paragraph of freely-floating words as follows.  | 
| 59902 | 1052  | 
\<close>  | 
1053  | 
||
1054  | 
ML_command \<open>  | 
|
1055  | 
warning (Pretty.string_of (Pretty.para  | 
|
1056  | 
"Beware the Jabberwock, my son! \  | 
|
1057  | 
\The jaws that bite, the claws that catch! \  | 
|
1058  | 
\Beware the Jubjub Bird, and shun \  | 
|
1059  | 
\The frumious Bandersnatch!"))  | 
|
1060  | 
\<close>  | 
|
1061  | 
||
1062  | 
text \<open>  | 
|
1063  | 
This has advantages with variable window / popup sizes, but might make it  | 
|
1064  | 
harder to search for message content systematically, e.g.\ by other tools or  | 
|
1065  | 
by humans expecting the ``verse'' of a formal message in a fixed layout.  | 
|
1066  | 
\<close>  | 
|
1067  | 
||
| 58618 | 1068  | 
|
1069  | 
section \<open>Exceptions \label{sec:exceptions}\<close>
 | 
|
1070  | 
||
| 61854 | 1071  | 
text \<open>  | 
1072  | 
The Standard ML semantics of strict functional evaluation together with  | 
|
1073  | 
exceptions is rather well defined, but some delicate points need to be  | 
|
1074  | 
observed to avoid that ML programs go wrong despite static type-checking.  | 
|
1075  | 
Exceptions in Isabelle/ML are subsequently categorized as follows.  | 
|
1076  | 
\<close>  | 
|
| 61506 | 1077  | 
|
1078  | 
paragraph \<open>Regular user errors.\<close>  | 
|
| 61854 | 1079  | 
text \<open>  | 
1080  | 
These are meant to provide informative feedback about malformed input etc.  | 
|
1081  | 
||
1082  | 
  The \<^emph>\<open>error\<close> function raises the corresponding @{ML ERROR} exception, with a
 | 
|
1083  | 
  plain text message as argument. @{ML ERROR} exceptions can be handled
 | 
|
1084  | 
internally, in order to be ignored, turned into other exceptions, or  | 
|
1085  | 
cascaded by appending messages. If the corresponding Isabelle/Isar command  | 
|
1086  | 
  terminates with an @{ML ERROR} exception state, the system will print the
 | 
|
1087  | 
  result on the error channel (see \secref{sec:message-channels}).
 | 
|
| 39854 | 1088  | 
|
| 61854 | 1089  | 
It is considered bad style to refer to internal function names or values in  | 
1090  | 
  ML source notation in user error messages. Do not use \<open>@{make_string}\<close> nor
 | 
|
1091  | 
  \<open>@{here}\<close>!
 | 
|
| 39854 | 1092  | 
|
| 61854 | 1093  | 
Grammatical correctness of error messages can be improved by \<^emph>\<open>omitting\<close>  | 
1094  | 
final punctuation: messages are often concatenated or put into a larger  | 
|
1095  | 
context (e.g.\ augmented with source position). Note that punctuation after  | 
|
1096  | 
formal entities (types, terms, theorems) is particularly prone to user  | 
|
1097  | 
confusion.  | 
|
| 61506 | 1098  | 
\<close>  | 
1099  | 
||
1100  | 
paragraph \<open>Program failures.\<close>  | 
|
| 61854 | 1101  | 
text \<open>  | 
1102  | 
There is a handful of standard exceptions that indicate general failure  | 
|
| 61506 | 1103  | 
situations, or failures of core operations on logical entities (types,  | 
1104  | 
  terms, theorems, theories, see \chref{ch:logic}).
 | 
|
| 39854 | 1105  | 
|
| 61854 | 1106  | 
These exceptions indicate a genuine breakdown of the program, so the main  | 
1107  | 
purpose is to determine quickly what has happened where. Traditionally, the  | 
|
1108  | 
(short) exception message would include the name of an ML function, although  | 
|
1109  | 
this is no longer necessary, because the ML runtime system attaches detailed  | 
|
1110  | 
  source position stemming from the corresponding @{ML_text raise} keyword.
 | 
|
| 39854 | 1111  | 
|
| 61416 | 1112  | 
\<^medskip>  | 
| 61854 | 1113  | 
User modules can always introduce their own custom exceptions locally, e.g.\  | 
1114  | 
to organize internal failures robustly without overlapping with existing  | 
|
1115  | 
exceptions. Exceptions that are exposed in module signatures require extra  | 
|
1116  | 
care, though, and should \<^emph>\<open>not\<close> be introduced by default. Surprise by users  | 
|
1117  | 
of a module can be often minimized by using plain user errors instead.  | 
|
| 61506 | 1118  | 
\<close>  | 
1119  | 
||
1120  | 
paragraph \<open>Interrupts.\<close>  | 
|
| 61854 | 1121  | 
text \<open>  | 
1122  | 
These indicate arbitrary system events: both the ML runtime system and the  | 
|
1123  | 
Isabelle/ML infrastructure signal various exceptional situations by raising  | 
|
1124  | 
  the special @{ML Exn.Interrupt} exception in user code.
 | 
|
| 57421 | 1125  | 
|
1126  | 
This is the one and only way that physical events can intrude an Isabelle/ML  | 
|
1127  | 
program. Such an interrupt can mean out-of-memory, stack overflow, timeout,  | 
|
1128  | 
internal signaling of threads, or a POSIX process signal. An Isabelle/ML  | 
|
1129  | 
program that intercepts interrupts becomes dependent on physical effects of  | 
|
1130  | 
the environment. Even worse, exception handling patterns that are too  | 
|
1131  | 
general by accident, e.g.\ by misspelled exception constructors, will cover  | 
|
1132  | 
interrupts unintentionally and thus render the program semantics  | 
|
1133  | 
ill-defined.  | 
|
| 39854 | 1134  | 
|
| 61854 | 1135  | 
Note that the Interrupt exception dates back to the original SML90 language  | 
1136  | 
definition. It was excluded from the SML97 version to avoid its malign  | 
|
1137  | 
impact on ML program semantics, but without providing a viable alternative.  | 
|
1138  | 
Isabelle/ML recovers physical interruptibility (which is an indispensable  | 
|
1139  | 
tool to implement managed evaluation of command transactions), but requires  | 
|
1140  | 
user code to be strictly transparent wrt.\ interrupts.  | 
|
| 39854 | 1141  | 
|
1142  | 
  \begin{warn}
 | 
|
| 61854 | 1143  | 
Isabelle/ML user code needs to terminate promptly on interruption, without  | 
1144  | 
guessing at its meaning to the system infrastructure. Temporary handling of  | 
|
1145  | 
interrupts for cleanup of global resources etc.\ needs to be followed  | 
|
1146  | 
immediately by re-raising of the original exception.  | 
|
| 39854 | 1147  | 
  \end{warn}
 | 
| 58618 | 1148  | 
\<close>  | 
1149  | 
||
1150  | 
text %mlref \<open>  | 
|
| 39855 | 1151  | 
  \begin{mldecls}
 | 
1152  | 
  @{index_ML try: "('a -> 'b) -> 'a -> 'b option"} \\
 | 
|
1153  | 
  @{index_ML can: "('a -> 'b) -> 'a -> bool"} \\
 | 
|
| 55838 | 1154  | 
  @{index_ML_exception ERROR: string} \\
 | 
1155  | 
  @{index_ML_exception Fail: string} \\
 | 
|
| 39856 | 1156  | 
  @{index_ML Exn.is_interrupt: "exn -> bool"} \\
 | 
| 62505 | 1157  | 
  @{index_ML Exn.reraise: "exn -> 'a"} \\
 | 
| 
56303
 
4cc3f4db3447
clarified Isabelle/ML bootstrap, such that Execution does not require ML_Compiler;
 
wenzelm 
parents: 
56199 
diff
changeset
 | 
1158  | 
  @{index_ML Runtime.exn_trace: "(unit -> 'a) -> 'a"} \\
 | 
| 39855 | 1159  | 
  \end{mldecls}
 | 
1160  | 
||
| 61854 | 1161  | 
  \<^descr> @{ML try}~\<open>f x\<close> makes the partiality of evaluating \<open>f x\<close> explicit via the
 | 
1162  | 
option datatype. Interrupts are \<^emph>\<open>not\<close> handled here, i.e.\ this form serves  | 
|
1163  | 
  as safe replacement for the \<^emph>\<open>unsafe\<close> version @{ML_text "(SOME"}~\<open>f
 | 
|
1164  | 
  x\<close>~@{ML_text "handle _ => NONE)"} that is occasionally seen in books about
 | 
|
1165  | 
SML97, but not in Isabelle/ML.  | 
|
| 39855 | 1166  | 
|
| 61439 | 1167  | 
  \<^descr> @{ML can} is similar to @{ML try} with more abstract result.
 | 
1168  | 
||
| 61854 | 1169  | 
  \<^descr> @{ML ERROR}~\<open>msg\<close> represents user errors; this exception is normally
 | 
1170  | 
  raised indirectly via the @{ML error} function (see
 | 
|
1171  | 
  \secref{sec:message-channels}).
 | 
|
| 39856 | 1172  | 
|
| 61493 | 1173  | 
  \<^descr> @{ML Fail}~\<open>msg\<close> represents general program failures.
 | 
| 61439 | 1174  | 
|
| 61854 | 1175  | 
  \<^descr> @{ML Exn.is_interrupt} identifies interrupts robustly, without mentioning
 | 
1176  | 
concrete exception constructors in user code. Handled interrupts need to be  | 
|
1177  | 
re-raised promptly!  | 
|
1178  | 
||
| 62505 | 1179  | 
  \<^descr> @{ML Exn.reraise}~\<open>exn\<close> raises exception \<open>exn\<close> while preserving its implicit
 | 
| 61854 | 1180  | 
position information (if possible, depending on the ML platform).  | 
| 39856 | 1181  | 
|
| 61854 | 1182  | 
  \<^descr> @{ML Runtime.exn_trace}~@{ML_text "(fn () =>"}~\<open>e\<close>@{ML_text ")"} evaluates
 | 
1183  | 
expression \<open>e\<close> while printing a full trace of its stack of nested exceptions  | 
|
1184  | 
(if possible, depending on the ML platform).  | 
|
| 39855 | 1185  | 
|
| 61854 | 1186  | 
  Inserting @{ML Runtime.exn_trace} into ML code temporarily is useful for
 | 
1187  | 
debugging, but not suitable for production code.  | 
|
| 58618 | 1188  | 
\<close>  | 
1189  | 
||
1190  | 
text %mlantiq \<open>  | 
|
| 
39866
 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 
wenzelm 
parents: 
39864 
diff
changeset
 | 
1191  | 
  \begin{matharray}{rcl}
 | 
| 61493 | 1192  | 
  @{ML_antiquotation_def "assert"} & : & \<open>ML_antiquotation\<close> \\
 | 
| 
39866
 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 
wenzelm 
parents: 
39864 
diff
changeset
 | 
1193  | 
  \end{matharray}
 | 
| 
 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 
wenzelm 
parents: 
39864 
diff
changeset
 | 
1194  | 
|
| 61854 | 1195  | 
  \<^descr> \<open>@{assert}\<close> inlines a function @{ML_type "bool -> unit"} that raises @{ML
 | 
1196  | 
  Fail} if the argument is @{ML false}. Due to inlining the source position of
 | 
|
1197  | 
failed assertions is included in the error output.  | 
|
| 58618 | 1198  | 
\<close>  | 
1199  | 
||
1200  | 
||
1201  | 
section \<open>Strings of symbols \label{sec:symbols}\<close>
 | 
|
1202  | 
||
| 61854 | 1203  | 
text \<open>  | 
1204  | 
A \<^emph>\<open>symbol\<close> constitutes the smallest textual unit in Isabelle/ML --- raw ML  | 
|
1205  | 
characters are normally not encountered at all. Isabelle strings consist of  | 
|
1206  | 
a sequence of symbols, represented as a packed string or an exploded list of  | 
|
1207  | 
strings. Each symbol is in itself a small string, which has either one of  | 
|
| 61504 | 1208  | 
the following forms:  | 
1209  | 
||
1210  | 
\<^enum> a single ASCII character ``\<open>c\<close>'', for example ``\<^verbatim>\<open>a\<close>'',  | 
|
1211  | 
||
1212  | 
\<^enum> a codepoint according to UTF-8 (non-ASCII byte sequence),  | 
|
1213  | 
||
1214  | 
\<^enum> a regular symbol ``\<^verbatim>\<open>\<ident>\<close>'', for example ``\<^verbatim>\<open>\<alpha>\<close>'',  | 
|
1215  | 
||
1216  | 
\<^enum> a control symbol ``\<^verbatim>\<open>\<^ident>\<close>'', for example ``\<^verbatim>\<open>\<^bold>\<close>'',  | 
|
1217  | 
||
1218  | 
\<^enum> a raw symbol ``\<^verbatim>\<open>\\<close>\<^verbatim>\<open><^raw:\<close>\<open>text\<close>\<^verbatim>\<open>>\<close>'' where \<open>text\<close> consists of  | 
|
1219  | 
printable characters excluding ``\<^verbatim>\<open>.\<close>'' and ``\<^verbatim>\<open>>\<close>'', for example  | 
|
1220  | 
    ``\<^verbatim>\<open>\<^raw:$\sum_{i = 1}^n$>\<close>'',
 | 
|
1221  | 
||
1222  | 
\<^enum> a numbered raw control symbol ``\<^verbatim>\<open>\\<close>\<^verbatim>\<open><^raw\<close>\<open>n\<close>\<^verbatim>\<open>>\<close>, where \<open>n\<close> consists  | 
|
1223  | 
of digits, for example ``\<^verbatim>\<open>\<^raw42>\<close>''.  | 
|
1224  | 
||
1225  | 
The \<open>ident\<close> syntax for symbol names is \<open>letter (letter | digit)\<^sup>*\<close>, where  | 
|
1226  | 
\<open>letter = A..Za..z\<close> and \<open>digit = 0..9\<close>. There are infinitely many regular  | 
|
1227  | 
symbols and control symbols, but a fixed collection of standard symbols is  | 
|
1228  | 
treated specifically. For example, ``\<^verbatim>\<open>\<alpha>\<close>'' is classified as a letter, which  | 
|
1229  | 
means it may occur within regular Isabelle identifiers.  | 
|
| 
52421
 
6d93140a206c
clarified strings of symbols, including ML string literals;
 
wenzelm 
parents: 
52420 
diff
changeset
 | 
1230  | 
|
| 57421 | 1231  | 
The character set underlying Isabelle symbols is 7-bit ASCII, but 8-bit  | 
1232  | 
character sequences are passed-through unchanged. Unicode/UCS data in UTF-8  | 
|
1233  | 
encoding is processed in a non-strict fashion, such that well-formed code  | 
|
1234  | 
sequences are recognized accordingly. Unicode provides its own collection of  | 
|
1235  | 
mathematical symbols, but within the core Isabelle/ML world there is no link  | 
|
1236  | 
to the standard collection of Isabelle regular symbols.  | 
|
1237  | 
||
| 61416 | 1238  | 
\<^medskip>  | 
| 61504 | 1239  | 
Output of Isabelle symbols depends on the print mode. For example, the  | 
1240  | 
  standard {\LaTeX} setup of the Isabelle document preparation system would
 | 
|
1241  | 
present ``\<^verbatim>\<open>\<alpha>\<close>'' as \<open>\<alpha>\<close>, and ``\<^verbatim>\<open>\<^bold>\<alpha>\<close>'' as \<open>\<^bold>\<alpha>\<close>. On-screen rendering usually  | 
|
1242  | 
works by mapping a finite subset of Isabelle symbols to suitable Unicode  | 
|
1243  | 
characters.  | 
|
| 58618 | 1244  | 
\<close>  | 
1245  | 
||
1246  | 
text %mlref \<open>  | 
|
| 
52421
 
6d93140a206c
clarified strings of symbols, including ML string literals;
 
wenzelm 
parents: 
52420 
diff
changeset
 | 
1247  | 
  \begin{mldecls}
 | 
| 
 
6d93140a206c
clarified strings of symbols, including ML string literals;
 
wenzelm 
parents: 
52420 
diff
changeset
 | 
1248  | 
  @{index_ML_type "Symbol.symbol": string} \\
 | 
| 
 
6d93140a206c
clarified strings of symbols, including ML string literals;
 
wenzelm 
parents: 
52420 
diff
changeset
 | 
1249  | 
  @{index_ML Symbol.explode: "string -> Symbol.symbol list"} \\
 | 
| 
 
6d93140a206c
clarified strings of symbols, including ML string literals;
 
wenzelm 
parents: 
52420 
diff
changeset
 | 
1250  | 
  @{index_ML Symbol.is_letter: "Symbol.symbol -> bool"} \\
 | 
| 
 
6d93140a206c
clarified strings of symbols, including ML string literals;
 
wenzelm 
parents: 
52420 
diff
changeset
 | 
1251  | 
  @{index_ML Symbol.is_digit: "Symbol.symbol -> bool"} \\
 | 
| 
 
6d93140a206c
clarified strings of symbols, including ML string literals;
 
wenzelm 
parents: 
52420 
diff
changeset
 | 
1252  | 
  @{index_ML Symbol.is_quasi: "Symbol.symbol -> bool"} \\
 | 
| 
 
6d93140a206c
clarified strings of symbols, including ML string literals;
 
wenzelm 
parents: 
52420 
diff
changeset
 | 
1253  | 
  @{index_ML Symbol.is_blank: "Symbol.symbol -> bool"} \\
 | 
| 
 
6d93140a206c
clarified strings of symbols, including ML string literals;
 
wenzelm 
parents: 
52420 
diff
changeset
 | 
1254  | 
  \end{mldecls}
 | 
| 
 
6d93140a206c
clarified strings of symbols, including ML string literals;
 
wenzelm 
parents: 
52420 
diff
changeset
 | 
1255  | 
  \begin{mldecls}
 | 
| 
 
6d93140a206c
clarified strings of symbols, including ML string literals;
 
wenzelm 
parents: 
52420 
diff
changeset
 | 
1256  | 
  @{index_ML_type "Symbol.sym"} \\
 | 
| 
 
6d93140a206c
clarified strings of symbols, including ML string literals;
 
wenzelm 
parents: 
52420 
diff
changeset
 | 
1257  | 
  @{index_ML Symbol.decode: "Symbol.symbol -> Symbol.sym"} \\
 | 
| 
 
6d93140a206c
clarified strings of symbols, including ML string literals;
 
wenzelm 
parents: 
52420 
diff
changeset
 | 
1258  | 
  \end{mldecls}
 | 
| 
 
6d93140a206c
clarified strings of symbols, including ML string literals;
 
wenzelm 
parents: 
52420 
diff
changeset
 | 
1259  | 
|
| 61854 | 1260  | 
  \<^descr> Type @{ML_type "Symbol.symbol"} represents individual Isabelle symbols.
 | 
| 
52421
 
6d93140a206c
clarified strings of symbols, including ML string literals;
 
wenzelm 
parents: 
52420 
diff
changeset
 | 
1261  | 
|
| 61854 | 1262  | 
  \<^descr> @{ML "Symbol.explode"}~\<open>str\<close> produces a symbol list from the packed form.
 | 
1263  | 
  This function supersedes @{ML "String.explode"} for virtually all purposes
 | 
|
1264  | 
of manipulating text in Isabelle!\<^footnote>\<open>The runtime overhead for exploded strings  | 
|
1265  | 
is mainly that of the list structure: individual symbols that happen to be a  | 
|
1266  | 
singleton string do not require extra memory in Poly/ML.\<close>  | 
|
| 
52421
 
6d93140a206c
clarified strings of symbols, including ML string literals;
 
wenzelm 
parents: 
52420 
diff
changeset
 | 
1267  | 
|
| 61439 | 1268  | 
  \<^descr> @{ML "Symbol.is_letter"}, @{ML "Symbol.is_digit"}, @{ML
 | 
| 61854 | 1269  | 
  "Symbol.is_quasi"}, @{ML "Symbol.is_blank"} classify standard symbols
 | 
1270  | 
  according to fixed syntactic conventions of Isabelle, cf.\ @{cite
 | 
|
1271  | 
"isabelle-isar-ref"}.  | 
|
| 
52421
 
6d93140a206c
clarified strings of symbols, including ML string literals;
 
wenzelm 
parents: 
52420 
diff
changeset
 | 
1272  | 
|
| 61854 | 1273  | 
  \<^descr> Type @{ML_type "Symbol.sym"} is a concrete datatype that represents the
 | 
1274  | 
  different kinds of symbols explicitly, with constructors @{ML
 | 
|
1275  | 
  "Symbol.Char"}, @{ML "Symbol.UTF8"}, @{ML "Symbol.Sym"}, @{ML
 | 
|
1276  | 
  "Symbol.Control"}, @{ML "Symbol.Raw"}, @{ML "Symbol.Malformed"}.
 | 
|
| 
52421
 
6d93140a206c
clarified strings of symbols, including ML string literals;
 
wenzelm 
parents: 
52420 
diff
changeset
 | 
1277  | 
|
| 61854 | 1278  | 
  \<^descr> @{ML "Symbol.decode"} converts the string representation of a symbol into
 | 
1279  | 
the datatype version.  | 
|
| 61506 | 1280  | 
\<close>  | 
1281  | 
||
1282  | 
paragraph \<open>Historical note.\<close>  | 
|
| 61854 | 1283  | 
text \<open>  | 
1284  | 
  In the original SML90 standard the primitive ML type @{ML_type char} did not
 | 
|
1285  | 
  exists, and @{ML_text "explode: string -> string list"} produced a list of
 | 
|
1286  | 
  singleton strings like @{ML "raw_explode: string -> string list"} in
 | 
|
| 61506 | 1287  | 
Isabelle/ML today. When SML97 came out, Isabelle did not adopt its somewhat  | 
1288  | 
anachronistic 8-bit or 16-bit characters, but the idea of exploding a string  | 
|
1289  | 
into a list of small strings was extended to ``symbols'' as explained above.  | 
|
1290  | 
Thus Isabelle sources can refer to an infinite store of user-defined  | 
|
1291  | 
symbols, without having to worry about the multitude of Unicode encodings  | 
|
| 61854 | 1292  | 
that have emerged over the years.  | 
1293  | 
\<close>  | 
|
| 58618 | 1294  | 
|
1295  | 
||
1296  | 
section \<open>Basic data types\<close>  | 
|
1297  | 
||
| 61854 | 1298  | 
text \<open>  | 
1299  | 
The basis library proposal of SML97 needs to be treated with caution. Many  | 
|
1300  | 
of its operations simply do not fit with important Isabelle/ML conventions  | 
|
1301  | 
(like ``canonical argument order'', see  | 
|
1302  | 
  \secref{sec:canonical-argument-order}), others cause problems with the
 | 
|
1303  | 
  parallel evaluation model of Isabelle/ML (such as @{ML TextIO.print} or @{ML
 | 
|
1304  | 
OS.Process.system}).  | 
|
| 39859 | 1305  | 
|
| 61854 | 1306  | 
Subsequently we give a brief overview of important operations on basic ML  | 
1307  | 
data types.  | 
|
| 58618 | 1308  | 
\<close>  | 
1309  | 
||
1310  | 
||
1311  | 
subsection \<open>Characters\<close>  | 
|
1312  | 
||
1313  | 
text %mlref \<open>  | 
|
| 39863 | 1314  | 
  \begin{mldecls}
 | 
1315  | 
  @{index_ML_type char} \\
 | 
|
1316  | 
  \end{mldecls}
 | 
|
1317  | 
||
| 61854 | 1318  | 
  \<^descr> Type @{ML_type char} is \<^emph>\<open>not\<close> used. The smallest textual unit in Isabelle
 | 
1319  | 
  is represented as a ``symbol'' (see \secref{sec:symbols}).
 | 
|
| 58618 | 1320  | 
\<close>  | 
1321  | 
||
1322  | 
||
1323  | 
subsection \<open>Strings\<close>  | 
|
1324  | 
||
1325  | 
text %mlref \<open>  | 
|
| 
52421
 
6d93140a206c
clarified strings of symbols, including ML string literals;
 
wenzelm 
parents: 
52420 
diff
changeset
 | 
1326  | 
  \begin{mldecls}
 | 
| 
 
6d93140a206c
clarified strings of symbols, including ML string literals;
 
wenzelm 
parents: 
52420 
diff
changeset
 | 
1327  | 
  @{index_ML_type string} \\
 | 
| 
 
6d93140a206c
clarified strings of symbols, including ML string literals;
 
wenzelm 
parents: 
52420 
diff
changeset
 | 
1328  | 
  \end{mldecls}
 | 
| 
 
6d93140a206c
clarified strings of symbols, including ML string literals;
 
wenzelm 
parents: 
52420 
diff
changeset
 | 
1329  | 
|
| 61854 | 1330  | 
  \<^descr> Type @{ML_type string} represents immutable vectors of 8-bit characters.
 | 
1331  | 
There are operations in SML to convert back and forth to actual byte  | 
|
1332  | 
vectors, which are seldom used.  | 
|
| 
52421
 
6d93140a206c
clarified strings of symbols, including ML string literals;
 
wenzelm 
parents: 
52420 
diff
changeset
 | 
1333  | 
|
| 
 
6d93140a206c
clarified strings of symbols, including ML string literals;
 
wenzelm 
parents: 
52420 
diff
changeset
 | 
1334  | 
This historically important raw text representation is used for  | 
| 61854 | 1335  | 
Isabelle-specific purposes with the following implicit substructures packed  | 
1336  | 
into the string content:  | 
|
| 
52421
 
6d93140a206c
clarified strings of symbols, including ML string literals;
 
wenzelm 
parents: 
52420 
diff
changeset
 | 
1337  | 
|
| 61854 | 1338  | 
    \<^enum> sequence of Isabelle symbols (see also \secref{sec:symbols}), with @{ML
 | 
1339  | 
Symbol.explode} as key operation;  | 
|
| 61458 | 1340  | 
|
| 61854 | 1341  | 
    \<^enum> XML tree structure via YXML (see also @{cite "isabelle-system"}), with
 | 
1342  | 
    @{ML YXML.parse_body} as key operation.
 | 
|
| 
52421
 
6d93140a206c
clarified strings of symbols, including ML string literals;
 
wenzelm 
parents: 
52420 
diff
changeset
 | 
1343  | 
|
| 58723 | 1344  | 
Note that Isabelle/ML string literals may refer Isabelle symbols like  | 
| 61854 | 1345  | 
``\<^verbatim>\<open>\<alpha>\<close>'' natively, \<^emph>\<open>without\<close> escaping the backslash. This is a consequence  | 
1346  | 
of Isabelle treating all source text as strings of symbols, instead of raw  | 
|
1347  | 
characters.  | 
|
| 58618 | 1348  | 
\<close>  | 
1349  | 
||
| 61854 | 1350  | 
text %mlex \<open>  | 
1351  | 
The subsequent example illustrates the difference of physical addressing of  | 
|
1352  | 
bytes versus logical addressing of symbols in Isabelle strings.  | 
|
| 58618 | 1353  | 
\<close>  | 
1354  | 
||
1355  | 
ML_val \<open>  | 
|
| 
52421
 
6d93140a206c
clarified strings of symbols, including ML string literals;
 
wenzelm 
parents: 
52420 
diff
changeset
 | 
1356  | 
val s = "\<A>";  | 
| 
 
6d93140a206c
clarified strings of symbols, including ML string literals;
 
wenzelm 
parents: 
52420 
diff
changeset
 | 
1357  | 
|
| 
 
6d93140a206c
clarified strings of symbols, including ML string literals;
 
wenzelm 
parents: 
52420 
diff
changeset
 | 
1358  | 
  @{assert} (length (Symbol.explode s) = 1);
 | 
| 
 
6d93140a206c
clarified strings of symbols, including ML string literals;
 
wenzelm 
parents: 
52420 
diff
changeset
 | 
1359  | 
  @{assert} (size s = 4);
 | 
| 58618 | 1360  | 
\<close>  | 
1361  | 
||
| 61854 | 1362  | 
text \<open>  | 
1363  | 
Note that in Unicode renderings of the symbol \<open>\<A>\<close>, variations of encodings  | 
|
1364  | 
like UTF-8 or UTF-16 pose delicate questions about the multi-byte  | 
|
1365  | 
representations of its codepoint, which is outside of the 16-bit address  | 
|
1366  | 
space of the original Unicode standard from the 1990-ies. In Isabelle/ML it  | 
|
1367  | 
is just ``\<^verbatim>\<open>\<A>\<close>'' literally, using plain ASCII characters beyond any  | 
|
1368  | 
doubts.  | 
|
1369  | 
\<close>  | 
|
| 58618 | 1370  | 
|
1371  | 
||
1372  | 
subsection \<open>Integers\<close>  | 
|
1373  | 
||
1374  | 
text %mlref \<open>  | 
|
| 39862 | 1375  | 
  \begin{mldecls}
 | 
1376  | 
  @{index_ML_type int} \\
 | 
|
1377  | 
  \end{mldecls}
 | 
|
1378  | 
||
| 61854 | 1379  | 
  \<^descr> Type @{ML_type int} represents regular mathematical integers, which are
 | 
1380  | 
\<^emph>\<open>unbounded\<close>. Overflow is treated properly, but should never happen in  | 
|
1381  | 
practice.\<^footnote>\<open>The size limit for integer bit patterns in memory is 64\,MB for  | 
|
| 62354 | 1382  | 
32-bit Poly/ML, and much higher for 64-bit systems.\<close>  | 
| 39862 | 1383  | 
|
| 61854 | 1384  | 
Literal integers in ML text are forced to be of this one true integer type  | 
1385  | 
--- adhoc overloading of SML97 is disabled.  | 
|
| 39862 | 1386  | 
|
| 55837 | 1387  | 
  Structure @{ML_structure IntInf} of SML97 is obsolete and superseded by
 | 
| 61854 | 1388  | 
  @{ML_structure Int}. Structure @{ML_structure Integer} in @{file
 | 
1389  | 
"~~/src/Pure/General/integer.ML"} provides some additional operations.  | 
|
| 58618 | 1390  | 
\<close>  | 
1391  | 
||
1392  | 
||
1393  | 
subsection \<open>Time\<close>  | 
|
1394  | 
||
1395  | 
text %mlref \<open>  | 
|
| 40302 | 1396  | 
  \begin{mldecls}
 | 
1397  | 
  @{index_ML_type Time.time} \\
 | 
|
1398  | 
  @{index_ML seconds: "real -> Time.time"} \\
 | 
|
1399  | 
  \end{mldecls}
 | 
|
1400  | 
||
| 61854 | 1401  | 
  \<^descr> Type @{ML_type Time.time} represents time abstractly according to the
 | 
1402  | 
SML97 basis library definition. This is adequate for internal ML operations,  | 
|
1403  | 
but awkward in concrete time specifications.  | 
|
| 40302 | 1404  | 
|
| 61854 | 1405  | 
  \<^descr> @{ML seconds}~\<open>s\<close> turns the concrete scalar \<open>s\<close> (measured in seconds) into
 | 
1406  | 
an abstract time value. Floating point numbers are easy to use as  | 
|
1407  | 
  configuration options in the context (see \secref{sec:config-options}) or
 | 
|
1408  | 
system options that are maintained externally.  | 
|
| 58618 | 1409  | 
\<close>  | 
1410  | 
||
1411  | 
||
1412  | 
subsection \<open>Options\<close>  | 
|
1413  | 
||
1414  | 
text %mlref \<open>  | 
|
| 39859 | 1415  | 
  \begin{mldecls}
 | 
1416  | 
  @{index_ML Option.map: "('a -> 'b) -> 'a option -> 'b option"} \\
 | 
|
1417  | 
  @{index_ML is_some: "'a option -> bool"} \\
 | 
|
1418  | 
  @{index_ML is_none: "'a option -> bool"} \\
 | 
|
1419  | 
  @{index_ML the: "'a option -> 'a"} \\
 | 
|
1420  | 
  @{index_ML these: "'a list option -> 'a list"} \\
 | 
|
1421  | 
  @{index_ML the_list: "'a option -> 'a list"} \\
 | 
|
1422  | 
  @{index_ML the_default: "'a -> 'a option -> 'a"} \\
 | 
|
1423  | 
  \end{mldecls}
 | 
|
| 58618 | 1424  | 
\<close>  | 
1425  | 
||
| 61854 | 1426  | 
text \<open>  | 
1427  | 
  Apart from @{ML Option.map} most other operations defined in structure
 | 
|
1428  | 
  @{ML_structure Option} are alien to Isabelle/ML and never used. The
 | 
|
1429  | 
  operations shown above are defined in @{file
 | 
|
1430  | 
"~~/src/Pure/General/basics.ML"}.  | 
|
1431  | 
\<close>  | 
|
| 58618 | 1432  | 
|
1433  | 
||
1434  | 
subsection \<open>Lists\<close>  | 
|
1435  | 
||
| 61854 | 1436  | 
text \<open>  | 
1437  | 
Lists are ubiquitous in ML as simple and light-weight ``collections'' for  | 
|
1438  | 
many everyday programming tasks. Isabelle/ML provides important additions  | 
|
1439  | 
and improvements over operations that are predefined in the SML97 library.  | 
|
1440  | 
\<close>  | 
|
| 58618 | 1441  | 
|
1442  | 
text %mlref \<open>  | 
|
| 39863 | 1443  | 
  \begin{mldecls}
 | 
1444  | 
  @{index_ML cons: "'a -> 'a list -> 'a list"} \\
 | 
|
| 39874 | 1445  | 
  @{index_ML member: "('b * 'a -> bool) -> 'a list -> 'b -> bool"} \\
 | 
1446  | 
  @{index_ML insert: "('a * 'a -> bool) -> 'a -> 'a list -> 'a list"} \\
 | 
|
1447  | 
  @{index_ML remove: "('b * 'a -> bool) -> 'b -> 'a list -> 'a list"} \\
 | 
|
1448  | 
  @{index_ML update: "('a * 'a -> bool) -> 'a -> 'a list -> 'a list"} \\
 | 
|
| 39863 | 1449  | 
  \end{mldecls}
 | 
1450  | 
||
| 61493 | 1451  | 
  \<^descr> @{ML cons}~\<open>x xs\<close> evaluates to \<open>x :: xs\<close>.
 | 
| 39863 | 1452  | 
|
| 61854 | 1453  | 
Tupled infix operators are a historical accident in Standard ML. The curried  | 
1454  | 
  @{ML cons} amends this, but it should be only used when partial application
 | 
|
1455  | 
is required.  | 
|
| 39863 | 1456  | 
|
| 61854 | 1457  | 
  \<^descr> @{ML member}, @{ML insert}, @{ML remove}, @{ML update} treat lists as a
 | 
1458  | 
  set-like container that maintains the order of elements. See @{file
 | 
|
1459  | 
"~~/src/Pure/library.ML"} for the full specifications (written in ML). There  | 
|
1460  | 
  are some further derived operations like @{ML union} or @{ML inter}.
 | 
|
| 39874 | 1461  | 
|
| 61854 | 1462  | 
  Note that @{ML insert} is conservative about elements that are already a
 | 
1463  | 
  @{ML member} of the list, while @{ML update} ensures that the latest entry
 | 
|
1464  | 
is always put in front. The latter discipline is often more appropriate in  | 
|
1465  | 
  declarations of context data (\secref{sec:context-data}) that are issued by
 | 
|
1466  | 
the user in Isar source: later declarations take precedence over earlier  | 
|
1467  | 
ones. \<close>  | 
|
| 58618 | 1468  | 
|
| 61854 | 1469  | 
text %mlex \<open>  | 
1470  | 
  Using canonical @{ML fold} together with @{ML cons} (or similar standard
 | 
|
1471  | 
operations) alternates the orientation of data. The is quite natural and  | 
|
1472  | 
  should not be altered forcible by inserting extra applications of @{ML rev}.
 | 
|
1473  | 
  The alternative @{ML fold_rev} can be used in the few situations, where
 | 
|
1474  | 
alternation should be prevented.  | 
|
| 58618 | 1475  | 
\<close>  | 
1476  | 
||
| 59902 | 1477  | 
ML_val \<open>  | 
| 39863 | 1478  | 
val items = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];  | 
1479  | 
||
1480  | 
val list1 = fold cons items [];  | 
|
| 
39866
 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 
wenzelm 
parents: 
39864 
diff
changeset
 | 
1481  | 
  @{assert} (list1 = rev items);
 | 
| 
 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 
wenzelm 
parents: 
39864 
diff
changeset
 | 
1482  | 
|
| 39863 | 1483  | 
val list2 = fold_rev cons items [];  | 
| 
39866
 
5ec01d5acd0c
more robust examples: explicit @{assert} instead of unchecked output;
 
wenzelm 
parents: 
39864 
diff
changeset
 | 
1484  | 
  @{assert} (list2 = items);
 | 
| 58618 | 1485  | 
\<close>  | 
1486  | 
||
| 61854 | 1487  | 
text \<open>  | 
1488  | 
The subsequent example demonstrates how to \<^emph>\<open>merge\<close> two lists in a natural  | 
|
1489  | 
way.  | 
|
1490  | 
\<close>  | 
|
| 58618 | 1491  | 
|
| 59902 | 1492  | 
ML_val \<open>  | 
| 39883 | 1493  | 
fun merge_lists eq (xs, ys) = fold_rev (insert eq) ys xs;  | 
| 58618 | 1494  | 
\<close>  | 
1495  | 
||
| 61854 | 1496  | 
text \<open>  | 
1497  | 
Here the first list is treated conservatively: only the new elements from  | 
|
1498  | 
  the second list are inserted. The inside-out order of insertion via @{ML
 | 
|
1499  | 
fold_rev} attempts to preserve the order of elements in the result.  | 
|
| 39883 | 1500  | 
|
1501  | 
This way of merging lists is typical for context data  | 
|
| 61854 | 1502  | 
  (\secref{sec:context-data}). See also @{ML merge} as defined in @{file
 | 
1503  | 
"~~/src/Pure/library.ML"}.  | 
|
| 58618 | 1504  | 
\<close>  | 
1505  | 
||
1506  | 
||
1507  | 
subsection \<open>Association lists\<close>  | 
|
1508  | 
||
| 61854 | 1509  | 
text \<open>  | 
1510  | 
The operations for association lists interpret a concrete list of pairs as a  | 
|
1511  | 
finite function from keys to values. Redundant representations with multiple  | 
|
1512  | 
occurrences of the same key are implicitly normalized: lookup and update  | 
|
1513  | 
only take the first occurrence into account.  | 
|
| 58618 | 1514  | 
\<close>  | 
1515  | 
||
1516  | 
text \<open>  | 
|
| 
39875
 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 
wenzelm 
parents: 
39874 
diff
changeset
 | 
1517  | 
  \begin{mldecls}
 | 
| 
 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 
wenzelm 
parents: 
39874 
diff
changeset
 | 
1518  | 
  @{index_ML AList.lookup: "('a * 'b -> bool) -> ('b * 'c) list -> 'a -> 'c option"} \\
 | 
| 
 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 
wenzelm 
parents: 
39874 
diff
changeset
 | 
1519  | 
  @{index_ML AList.defined: "('a * 'b -> bool) -> ('b * 'c) list -> 'a -> bool"} \\
 | 
| 
 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 
wenzelm 
parents: 
39874 
diff
changeset
 | 
1520  | 
  @{index_ML AList.update: "('a * 'a -> bool) -> 'a * 'b -> ('a * 'b) list -> ('a * 'b) list"} \\
 | 
| 
 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 
wenzelm 
parents: 
39874 
diff
changeset
 | 
1521  | 
  \end{mldecls}
 | 
| 
 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 
wenzelm 
parents: 
39874 
diff
changeset
 | 
1522  | 
|
| 61854 | 1523  | 
  \<^descr> @{ML AList.lookup}, @{ML AList.defined}, @{ML AList.update} implement the
 | 
1524  | 
main ``framework operations'' for mappings in Isabelle/ML, following  | 
|
1525  | 
standard conventions for their names and types.  | 
|
| 
39875
 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 
wenzelm 
parents: 
39874 
diff
changeset
 | 
1526  | 
|
| 61854 | 1527  | 
Note that a function called \<^verbatim>\<open>lookup\<close> is obliged to express its partiality  | 
1528  | 
via an explicit option element. There is no choice to raise an exception,  | 
|
1529  | 
without changing the name to something like \<open>the_element\<close> or \<open>get\<close>.  | 
|
| 61493 | 1530  | 
|
| 61854 | 1531  | 
  The \<open>defined\<close> operation is essentially a contraction of @{ML is_some} and
 | 
1532  | 
\<^verbatim>\<open>lookup\<close>, but this is sufficiently frequent to justify its independent  | 
|
1533  | 
existence. This also gives the implementation some opportunity for peep-hole  | 
|
1534  | 
optimization.  | 
|
| 
39875
 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 
wenzelm 
parents: 
39874 
diff
changeset
 | 
1535  | 
|
| 
 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 
wenzelm 
parents: 
39874 
diff
changeset
 | 
1536  | 
|
| 57421 | 1537  | 
Association lists are adequate as simple implementation of finite mappings  | 
1538  | 
in many practical situations. A more advanced table structure is defined in  | 
|
1539  | 
  @{file "~~/src/Pure/General/table.ML"}; that version scales easily to
 | 
|
| 
39875
 
648c930125f6
more on "Association lists", based on more succinct version of older material;
 
wenzelm 
parents: 
39874 
diff
changeset
 | 
1540  | 
thousands or millions of elements.  | 
| 58618 | 1541  | 
\<close>  | 
1542  | 
||
1543  | 
||
1544  | 
subsection \<open>Unsynchronized references\<close>  | 
|
1545  | 
||
1546  | 
text %mlref \<open>  | 
|
| 39859 | 1547  | 
  \begin{mldecls}
 | 
| 39870 | 1548  | 
  @{index_ML_type "'a Unsynchronized.ref"} \\
 | 
| 39859 | 1549  | 
  @{index_ML Unsynchronized.ref: "'a -> 'a Unsynchronized.ref"} \\
 | 
1550  | 
  @{index_ML "!": "'a Unsynchronized.ref -> 'a"} \\
 | 
|
| 46262 | 1551  | 
  @{index_ML_op ":=": "'a Unsynchronized.ref * 'a -> unit"} \\
 | 
| 39859 | 1552  | 
  \end{mldecls}
 | 
| 58618 | 1553  | 
\<close>  | 
1554  | 
||
| 61854 | 1555  | 
text \<open>  | 
1556  | 
Due to ubiquitous parallelism in Isabelle/ML (see also  | 
|
1557  | 
  \secref{sec:multi-threading}), the mutable reference cells of Standard ML
 | 
|
1558  | 
are notorious for causing problems. In a highly parallel system, both  | 
|
1559  | 
correctness \<^emph>\<open>and\<close> performance are easily degraded when using mutable data.  | 
|
| 39859 | 1560  | 
|
| 61854 | 1561  | 
  The unwieldy name of @{ML Unsynchronized.ref} for the constructor for
 | 
1562  | 
references in Isabelle/ML emphasizes the inconveniences caused by  | 
|
1563  | 
  mutability. Existing operations @{ML "!"} and @{ML_op ":="} are unchanged,
 | 
|
1564  | 
but should be used with special precautions, say in a strictly local  | 
|
1565  | 
situation that is guaranteed to be restricted to sequential evaluation ---  | 
|
1566  | 
now and in the future.  | 
|
| 40508 | 1567  | 
|
1568  | 
  \begin{warn}
 | 
|
1569  | 
  Never @{ML_text "open Unsynchronized"}, not even in a local scope!
 | 
|
1570  | 
Pretending that mutable state is no problem is a very bad idea.  | 
|
1571  | 
  \end{warn}
 | 
|
| 58618 | 1572  | 
\<close>  | 
1573  | 
||
1574  | 
||
1575  | 
section \<open>Thread-safe programming \label{sec:multi-threading}\<close>
 | 
|
1576  | 
||
| 61854 | 1577  | 
text \<open>  | 
1578  | 
Multi-threaded execution has become an everyday reality in Isabelle since  | 
|
1579  | 
Poly/ML 5.2.1 and Isabelle2008. Isabelle/ML provides implicit and explicit  | 
|
1580  | 
parallelism by default, and there is no way for user-space tools to ``opt  | 
|
1581  | 
out''. ML programs that are purely functional, output messages only via the  | 
|
1582  | 
  official channels (\secref{sec:message-channels}), and do not intercept
 | 
|
1583  | 
  interrupts (\secref{sec:exceptions}) can participate in the multi-threaded
 | 
|
| 39868 | 1584  | 
environment immediately without further ado.  | 
1585  | 
||
| 61854 | 1586  | 
More ambitious tools with more fine-grained interaction with the environment  | 
1587  | 
need to observe the principles explained below.  | 
|
| 58618 | 1588  | 
\<close>  | 
1589  | 
||
1590  | 
||
1591  | 
subsection \<open>Multi-threading with shared memory\<close>  | 
|
1592  | 
||
| 61854 | 1593  | 
text \<open>  | 
1594  | 
Multiple threads help to organize advanced operations of the system, such as  | 
|
1595  | 
real-time conditions on command transactions, sub-components with explicit  | 
|
1596  | 
communication, general asynchronous interaction etc. Moreover, parallel  | 
|
1597  | 
evaluation is a prerequisite to make adequate use of the CPU resources that  | 
|
1598  | 
are available on multi-core systems.\<^footnote>\<open>Multi-core computing does not mean  | 
|
1599  | 
that there are ``spare cycles'' to be wasted. It means that the continued  | 
|
1600  | 
exponential speedup of CPU performance due to ``Moore's Law'' follows  | 
|
1601  | 
different rules: clock frequency has reached its peak around 2005, and  | 
|
1602  | 
applications need to be parallelized in order to avoid a perceived loss of  | 
|
1603  | 
  performance. See also @{cite "Sutter:2005"}.\<close>
 | 
|
| 
39867
 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 
wenzelm 
parents: 
39866 
diff
changeset
 | 
1604  | 
|
| 57421 | 1605  | 
Isabelle/Isar exploits the inherent structure of theories and proofs to  | 
| 61854 | 1606  | 
support \<^emph>\<open>implicit parallelism\<close> to a large extent. LCF-style theorem proving  | 
1607  | 
  provides almost ideal conditions for that, see also @{cite "Wenzel:2009"}.
 | 
|
1608  | 
This means, significant parts of theory and proof checking is parallelized  | 
|
1609  | 
by default. In Isabelle2013, a maximum speedup-factor of 3.5 on 4 cores and  | 
|
1610  | 
  6.5 on 8 cores can be expected @{cite "Wenzel:2013:ITP"}.
 | 
|
| 
39867
 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 
wenzelm 
parents: 
39866 
diff
changeset
 | 
1611  | 
|
| 61416 | 1612  | 
\<^medskip>  | 
| 61854 | 1613  | 
ML threads lack the memory protection of separate processes, and operate  | 
1614  | 
concurrently on shared heap memory. This has the advantage that results of  | 
|
1615  | 
independent computations are directly available to other threads: abstract  | 
|
1616  | 
values can be passed without copying or awkward serialization that is  | 
|
1617  | 
typically required for separate processes.  | 
|
| 
39867
 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 
wenzelm 
parents: 
39866 
diff
changeset
 | 
1618  | 
|
| 61854 | 1619  | 
To make shared-memory multi-threading work robustly and efficiently, some  | 
1620  | 
programming guidelines need to be observed. While the ML system is  | 
|
1621  | 
responsible to maintain basic integrity of the representation of ML values  | 
|
1622  | 
in memory, the application programmer needs to ensure that multi-threaded  | 
|
1623  | 
execution does not break the intended semantics.  | 
|
| 
39867
 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 
wenzelm 
parents: 
39866 
diff
changeset
 | 
1624  | 
|
| 
 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 
wenzelm 
parents: 
39866 
diff
changeset
 | 
1625  | 
  \begin{warn}
 | 
| 61854 | 1626  | 
To participate in implicit parallelism, tools need to be thread-safe. A  | 
1627  | 
single ill-behaved tool can affect the stability and performance of the  | 
|
1628  | 
whole system.  | 
|
| 
39867
 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 
wenzelm 
parents: 
39866 
diff
changeset
 | 
1629  | 
  \end{warn}
 | 
| 
 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 
wenzelm 
parents: 
39866 
diff
changeset
 | 
1630  | 
|
| 57421 | 1631  | 
Apart from observing the principles of thread-safeness passively, advanced  | 
1632  | 
tools may also exploit parallelism actively, e.g.\ by using library  | 
|
| 39868 | 1633  | 
  functions for parallel list operations (\secref{sec:parlist}).
 | 
| 
39867
 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 
wenzelm 
parents: 
39866 
diff
changeset
 | 
1634  | 
|
| 
 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 
wenzelm 
parents: 
39866 
diff
changeset
 | 
1635  | 
  \begin{warn}
 | 
| 61854 | 1636  | 
Parallel computing resources are managed centrally by the Isabelle/ML  | 
1637  | 
infrastructure. User programs should not fork their own ML threads to  | 
|
1638  | 
perform heavy computations.  | 
|
| 
39867
 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 
wenzelm 
parents: 
39866 
diff
changeset
 | 
1639  | 
  \end{warn}
 | 
| 58618 | 1640  | 
\<close>  | 
1641  | 
||
1642  | 
||
1643  | 
subsection \<open>Critical shared resources\<close>  | 
|
1644  | 
||
| 61854 | 1645  | 
text \<open>  | 
1646  | 
Thread-safeness is mainly concerned about concurrent read/write access to  | 
|
1647  | 
shared resources, which are outside the purely functional world of ML. This  | 
|
1648  | 
covers the following in particular.  | 
|
| 
39867
 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 
wenzelm 
parents: 
39866 
diff
changeset
 | 
1649  | 
|
| 61854 | 1650  | 
\<^item> Global references (or arrays), i.e.\ mutable memory cells that persist  | 
1651  | 
over several invocations of associated operations.\<^footnote>\<open>This is independent of  | 
|
1652  | 
the visibility of such mutable values in the toplevel scope.\<close>  | 
|
| 
39867
 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 
wenzelm 
parents: 
39866 
diff
changeset
 | 
1653  | 
|
| 61854 | 1654  | 
\<^item> Global state of the running Isabelle/ML process, i.e.\ raw I/O channels,  | 
1655  | 
environment variables, current working directory.  | 
|
| 
39867
 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 
wenzelm 
parents: 
39866 
diff
changeset
 | 
1656  | 
|
| 61854 | 1657  | 
\<^item> Writable resources in the file-system that are shared among different  | 
1658  | 
threads or external processes.  | 
|
| 
39867
 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 
wenzelm 
parents: 
39866 
diff
changeset
 | 
1659  | 
|
| 61854 | 1660  | 
Isabelle/ML provides various mechanisms to avoid critical shared resources  | 
1661  | 
in most situations. As last resort there are some mechanisms for explicit  | 
|
1662  | 
synchronization. The following guidelines help to make Isabelle/ML programs  | 
|
1663  | 
work smoothly in a concurrent environment.  | 
|
1664  | 
||
1665  | 
\<^item> Avoid global references altogether. Isabelle/Isar maintains a uniform  | 
|
1666  | 
context that incorporates arbitrary data declared by user programs  | 
|
1667  | 
  (\secref{sec:context-data}). This context is passed as plain value and user
 | 
|
1668  | 
tools can get/map their own data in a purely functional manner.  | 
|
1669  | 
  Configuration options within the context (\secref{sec:config-options})
 | 
|
1670  | 
provide simple drop-in replacements for historic reference variables.  | 
|
| 
39867
 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 
wenzelm 
parents: 
39866 
diff
changeset
 | 
1671  | 
|
| 61854 | 1672  | 
\<^item> Keep components with local state information re-entrant. Instead of poking  | 
1673  | 
initial values into (private) global references, a new state record can be  | 
|
1674  | 
created on each invocation, and passed through any auxiliary functions of  | 
|
1675  | 
the component. The state record contain mutable references in special  | 
|
1676  | 
situations, without requiring any synchronization, as long as each  | 
|
1677  | 
invocation gets its own copy and the tool itself is single-threaded.  | 
|
| 
39867
 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 
wenzelm 
parents: 
39866 
diff
changeset
 | 
1678  | 
|
| 61854 | 1679  | 
\<^item> Avoid raw output on \<open>stdout\<close> or \<open>stderr\<close>. The Poly/ML library is  | 
1680  | 
thread-safe for each individual output operation, but the ordering of  | 
|
1681  | 
parallel invocations is arbitrary. This means raw output will appear on some  | 
|
1682  | 
system console with unpredictable interleaving of atomic chunks.  | 
|
| 
39867
 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 
wenzelm 
parents: 
39866 
diff
changeset
 | 
1683  | 
|
| 39868 | 1684  | 
Note that this does not affect regular message output channels  | 
| 61854 | 1685  | 
  (\secref{sec:message-channels}). An official message id is associated with
 | 
1686  | 
the command transaction from where it originates, independently of other  | 
|
1687  | 
transactions. This means each running Isar command has effectively its own  | 
|
1688  | 
set of message channels, and interleaving can only happen when commands use  | 
|
1689  | 
parallelism internally (and only at message boundaries).  | 
|
| 
39867
 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 
wenzelm 
parents: 
39866 
diff
changeset
 | 
1690  | 
|
| 61854 | 1691  | 
\<^item> Treat environment variables and the current working directory of the  | 
1692  | 
running process as read-only.  | 
|
| 
39867
 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 
wenzelm 
parents: 
39866 
diff
changeset
 | 
1693  | 
|
| 61854 | 1694  | 
\<^item> Restrict writing to the file-system to unique temporary files. Isabelle  | 
1695  | 
already provides a temporary directory that is unique for the running  | 
|
1696  | 
process, and there is a centralized source of unique serial numbers in  | 
|
1697  | 
Isabelle/ML. Thus temporary files that are passed to to some external  | 
|
1698  | 
process will be always disjoint, and thus thread-safe.  | 
|
| 58618 | 1699  | 
\<close>  | 
1700  | 
||
1701  | 
text %mlref \<open>  | 
|
| 39868 | 1702  | 
  \begin{mldecls}
 | 
1703  | 
  @{index_ML File.tmp_path: "Path.T -> Path.T"} \\
 | 
|
1704  | 
  @{index_ML serial_string: "unit -> string"} \\
 | 
|
1705  | 
  \end{mldecls}
 | 
|
1706  | 
||
| 61854 | 1707  | 
  \<^descr> @{ML File.tmp_path}~\<open>path\<close> relocates the base component of \<open>path\<close> into the
 | 
1708  | 
unique temporary directory of the running Isabelle/ML process.  | 
|
| 39868 | 1709  | 
|
| 61854 | 1710  | 
  \<^descr> @{ML serial_string}~\<open>()\<close> creates a new serial number that is unique over
 | 
1711  | 
the runtime of the Isabelle/ML process.  | 
|
| 58618 | 1712  | 
\<close>  | 
1713  | 
||
| 61854 | 1714  | 
text %mlex \<open>  | 
1715  | 
The following example shows how to create unique temporary file names.  | 
|
| 58618 | 1716  | 
\<close>  | 
1717  | 
||
| 59902 | 1718  | 
ML_val \<open>  | 
| 39868 | 1719  | 
  val tmp1 = File.tmp_path (Path.basic ("foo" ^ serial_string ()));
 | 
1720  | 
  val tmp2 = File.tmp_path (Path.basic ("foo" ^ serial_string ()));
 | 
|
1721  | 
  @{assert} (tmp1 <> tmp2);
 | 
|
| 58618 | 1722  | 
\<close>  | 
1723  | 
||
1724  | 
||
1725  | 
subsection \<open>Explicit synchronization\<close>  | 
|
1726  | 
||
| 61854 | 1727  | 
text \<open>  | 
1728  | 
Isabelle/ML provides explicit synchronization for mutable variables over  | 
|
| 
59180
 
c0fa3b3bdabd
discontinued central critical sections: NAMED_CRITICAL / CRITICAL;
 
wenzelm 
parents: 
59138 
diff
changeset
 | 
1729  | 
immutable data, which may be updated atomically and exclusively. This  | 
| 
 
c0fa3b3bdabd
discontinued central critical sections: NAMED_CRITICAL / CRITICAL;
 
wenzelm 
parents: 
59138 
diff
changeset
 | 
1730  | 
addresses the rare situations where mutable shared resources are really  | 
| 
 
c0fa3b3bdabd
discontinued central critical sections: NAMED_CRITICAL / CRITICAL;
 
wenzelm 
parents: 
59138 
diff
changeset
 | 
1731  | 
required. Synchronization in Isabelle/ML is based on primitives of Poly/ML,  | 
| 
 
c0fa3b3bdabd
discontinued central critical sections: NAMED_CRITICAL / CRITICAL;
 
wenzelm 
parents: 
59138 
diff
changeset
 | 
1732  | 
which have been adapted to the specific assumptions of the concurrent  | 
| 
 
c0fa3b3bdabd
discontinued central critical sections: NAMED_CRITICAL / CRITICAL;
 
wenzelm 
parents: 
59138 
diff
changeset
 | 
1733  | 
Isabelle environment. User code should not break this abstraction, but stay  | 
| 
 
c0fa3b3bdabd
discontinued central critical sections: NAMED_CRITICAL / CRITICAL;
 
wenzelm 
parents: 
59138 
diff
changeset
 | 
1734  | 
within the confines of concurrent Isabelle/ML.  | 
| 
 
c0fa3b3bdabd
discontinued central critical sections: NAMED_CRITICAL / CRITICAL;
 
wenzelm 
parents: 
59138 
diff
changeset
 | 
1735  | 
|
| 61854 | 1736  | 
A \<^emph>\<open>synchronized variable\<close> is an explicit state component associated with  | 
1737  | 
mechanisms for locking and signaling. There are operations to await a  | 
|
| 
59180
 
c0fa3b3bdabd
discontinued central critical sections: NAMED_CRITICAL / CRITICAL;
 
wenzelm 
parents: 
59138 
diff
changeset
 | 
1738  | 
condition, change the state, and signal the change to all other waiting  | 
| 61477 | 1739  | 
threads. Synchronized access to the state variable is \<^emph>\<open>not\<close> re-entrant:  | 
| 61854 | 1740  | 
direct or indirect nesting within the same thread will cause a deadlock!  | 
1741  | 
\<close>  | 
|
| 58618 | 1742  | 
|
1743  | 
text %mlref \<open>  | 
|
| 
39867
 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 
wenzelm 
parents: 
39866 
diff
changeset
 | 
1744  | 
  \begin{mldecls}
 | 
| 39871 | 1745  | 
  @{index_ML_type "'a Synchronized.var"} \\
 | 
1746  | 
  @{index_ML Synchronized.var: "string -> 'a -> 'a Synchronized.var"} \\
 | 
|
1747  | 
  @{index_ML Synchronized.guarded_access: "'a Synchronized.var ->
 | 
|
1748  | 
  ('a -> ('b * 'a) option) -> 'b"} \\
 | 
|
1749  | 
  \end{mldecls}
 | 
|
| 
39867
 
a8363532cd4d
somewhat modernized version of "Thread-safe programming";
 
wenzelm 
parents: 
39866 
diff
changeset
 | 
1750  | 
|
| 61854 | 1751  | 
    \<^descr> Type @{ML_type "'a Synchronized.var"} represents synchronized variables
 | 
1752  | 
    with state of type @{ML_type 'a}.
 | 
|
| 39871 | 1753  | 
|
| 61854 | 1754  | 
    \<^descr> @{ML Synchronized.var}~\<open>name x\<close> creates a synchronized variable that is
 | 
1755  | 
initialized with value \<open>x\<close>. The \<open>name\<close> is used for tracing.  | 
|
| 61493 | 1756  | 
|
| 61854 | 1757  | 
    \<^descr> @{ML Synchronized.guarded_access}~\<open>var f\<close> lets the function \<open>f\<close> operate
 | 
1758  | 
within a critical section on the state \<open>x\<close> as follows: if \<open>f x\<close> produces  | 
|
1759  | 
    @{ML NONE}, it continues to wait on the internal condition variable,
 | 
|
1760  | 
expecting that some other thread will eventually change the content in a  | 
|
1761  | 
    suitable manner; if \<open>f x\<close> produces @{ML SOME}~\<open>(y, x')\<close> it is satisfied and
 | 
|
1762  | 
assigns the new state value \<open>x'\<close>, broadcasts a signal to all waiting threads  | 
|
1763  | 
on the associated condition variable, and returns the result \<open>y\<close>.  | 
|
| 39871 | 1764  | 
|
| 61854 | 1765  | 
  There are some further variants of the @{ML Synchronized.guarded_access}
 | 
1766  | 
  combinator, see @{file "~~/src/Pure/Concurrent/synchronized.ML"} for
 | 
|
1767  | 
details.  | 
|
| 58618 | 1768  | 
\<close>  | 
1769  | 
||
| 61854 | 1770  | 
text %mlex \<open>  | 
1771  | 
The following example implements a counter that produces positive integers  | 
|
1772  | 
that are unique over the runtime of the Isabelle process:  | 
|
1773  | 
\<close>  | 
|
| 58618 | 1774  | 
|
| 59902 | 1775  | 
ML_val \<open>  | 
| 39871 | 1776  | 
local  | 
1777  | 
val counter = Synchronized.var "counter" 0;  | 
|
1778  | 
in  | 
|
1779  | 
fun next () =  | 
|
1780  | 
Synchronized.guarded_access counter  | 
|
1781  | 
(fn i =>  | 
|
1782  | 
let val j = i + 1  | 
|
1783  | 
in SOME (j, j) end);  | 
|
1784  | 
end;  | 
|
| 59902 | 1785  | 
|
| 39871 | 1786  | 
val a = next ();  | 
1787  | 
val b = next ();  | 
|
1788  | 
  @{assert} (a <> b);
 | 
|
| 58618 | 1789  | 
\<close>  | 
1790  | 
||
| 61416 | 1791  | 
text \<open>  | 
1792  | 
\<^medskip>  | 
|
| 61854 | 1793  | 
  See @{file "~~/src/Pure/Concurrent/mailbox.ML"} how to implement a mailbox
 | 
1794  | 
as synchronized variable over a purely functional list.  | 
|
1795  | 
\<close>  | 
|
| 58618 | 1796  | 
|
1797  | 
||
1798  | 
section \<open>Managed evaluation\<close>  | 
|
1799  | 
||
| 61854 | 1800  | 
text \<open>  | 
1801  | 
Execution of Standard ML follows the model of strict functional evaluation  | 
|
1802  | 
with optional exceptions. Evaluation happens whenever some function is  | 
|
1803  | 
applied to (sufficiently many) arguments. The result is either an explicit  | 
|
1804  | 
value or an implicit exception.  | 
|
| 52419 | 1805  | 
|
| 61854 | 1806  | 
\<^emph>\<open>Managed evaluation\<close> in Isabelle/ML organizes expressions and results to  | 
1807  | 
control certain physical side-conditions, to say more specifically when and  | 
|
1808  | 
how evaluation happens. For example, the Isabelle/ML library supports lazy  | 
|
1809  | 
evaluation with memoing, parallel evaluation via futures, asynchronous  | 
|
1810  | 
evaluation via promises, evaluation with time limit etc.  | 
|
| 52419 | 1811  | 
|
| 61416 | 1812  | 
\<^medskip>  | 
| 61854 | 1813  | 
An \<^emph>\<open>unevaluated expression\<close> is represented either as unit abstraction \<^verbatim>\<open>fn  | 
1814  | 
() => a\<close> of type \<^verbatim>\<open>unit -> 'a\<close> or as regular function \<^verbatim>\<open>fn a => b\<close> of type  | 
|
1815  | 
\<^verbatim>\<open>'a -> 'b\<close>. Both forms occur routinely, and special care is required to  | 
|
1816  | 
tell them apart --- the static type-system of SML is only of limited help  | 
|
1817  | 
here.  | 
|
| 52419 | 1818  | 
|
| 61854 | 1819  | 
The first form is more intuitive: some combinator \<open>(unit -> 'a) -> 'a\<close>  | 
1820  | 
applies the given function to \<open>()\<close> to initiate the postponed evaluation  | 
|
1821  | 
  process. The second form is more flexible: some combinator \<open>('a -> 'b) -> 'a
 | 
|
1822  | 
-> 'b\<close> acts like a modified form of function application; several such  | 
|
1823  | 
combinators may be cascaded to modify a given function, before it is  | 
|
1824  | 
ultimately applied to some argument.  | 
|
| 52419 | 1825  | 
|
| 61416 | 1826  | 
\<^medskip>  | 
| 61854 | 1827  | 
\<^emph>\<open>Reified results\<close> make the disjoint sum of regular values versions  | 
1828  | 
exceptional situations explicit as ML datatype: \<open>'a result = Res of 'a | Exn  | 
|
1829  | 
of exn\<close>. This is typically used for administrative purposes, to store the  | 
|
1830  | 
overall outcome of an evaluation process.  | 
|
| 52419 | 1831  | 
|
| 61854 | 1832  | 
\<^emph>\<open>Parallel exceptions\<close> aggregate reified results, such that multiple  | 
1833  | 
exceptions are digested as a collection in canonical form that identifies  | 
|
1834  | 
exceptions according to their original occurrence. This is particular  | 
|
1835  | 
  important for parallel evaluation via futures \secref{sec:futures}, which
 | 
|
1836  | 
are organized as acyclic graph of evaluations that depend on other  | 
|
1837  | 
evaluations: exceptions stemming from shared sub-graphs are exposed exactly  | 
|
1838  | 
once and in the order of their original occurrence (e.g.\ when printed at  | 
|
1839  | 
the toplevel). Interrupt counts as neutral element here: it is treated as  | 
|
1840  | 
minimal information about some canceled evaluation process, and is absorbed  | 
|
1841  | 
by the presence of regular program exceptions.  | 
|
1842  | 
\<close>  | 
|
| 58618 | 1843  | 
|
1844  | 
text %mlref \<open>  | 
|
| 52419 | 1845  | 
  \begin{mldecls}
 | 
1846  | 
  @{index_ML_type "'a Exn.result"} \\
 | 
|
1847  | 
  @{index_ML Exn.capture: "('a -> 'b) -> 'a -> 'b Exn.result"} \\
 | 
|
1848  | 
  @{index_ML Exn.interruptible_capture: "('a -> 'b) -> 'a -> 'b Exn.result"} \\
 | 
|
1849  | 
  @{index_ML Exn.release: "'a Exn.result -> 'a"} \\
 | 
|
1850  | 
  @{index_ML Par_Exn.release_all: "'a Exn.result list -> 'a list"} \\
 | 
|
1851  | 
  @{index_ML Par_Exn.release_first: "'a Exn.result list -> 'a list"} \\
 | 
|
1852  | 
  \end{mldecls}
 | 
|
1853  | 
||
| 61854 | 1854  | 
  \<^descr> Type @{ML_type "'a Exn.result"} represents the disjoint sum of ML results
 | 
1855  | 
  explicitly, with constructor @{ML Exn.Res} for regular values and @{ML
 | 
|
1856  | 
"Exn.Exn"} for exceptions.  | 
|
| 52419 | 1857  | 
|
| 61854 | 1858  | 
  \<^descr> @{ML Exn.capture}~\<open>f x\<close> manages the evaluation of \<open>f x\<close> such that
 | 
1859  | 
  exceptions are made explicit as @{ML "Exn.Exn"}. Note that this includes
 | 
|
1860  | 
  physical interrupts (see also \secref{sec:exceptions}), so the same
 | 
|
1861  | 
precautions apply to user code: interrupts must not be absorbed  | 
|
1862  | 
accidentally!  | 
|
| 52419 | 1863  | 
|
| 61854 | 1864  | 
  \<^descr> @{ML Exn.interruptible_capture} is similar to @{ML Exn.capture}, but
 | 
1865  | 
interrupts are immediately re-raised as required for user code.  | 
|
| 52419 | 1866  | 
|
| 61854 | 1867  | 
  \<^descr> @{ML Exn.release}~\<open>result\<close> releases the original runtime result, exposing
 | 
1868  | 
its regular value or raising the reified exception.  | 
|
| 52419 | 1869  | 
|
| 61854 | 1870  | 
  \<^descr> @{ML Par_Exn.release_all}~\<open>results\<close> combines results that were produced
 | 
1871  | 
independently (e.g.\ by parallel evaluation). If all results are regular  | 
|
1872  | 
values, that list is returned. Otherwise, the collection of all exceptions  | 
|
1873  | 
is raised, wrapped-up as collective parallel exception. Note that the latter  | 
|
1874  | 
prevents access to individual exceptions by conventional \<^verbatim>\<open>handle\<close> of ML.  | 
|
| 52419 | 1875  | 
|
| 61854 | 1876  | 
  \<^descr> @{ML Par_Exn.release_first} is similar to @{ML Par_Exn.release_all}, but
 | 
1877  | 
only the first (meaningful) exception that has occurred in the original  | 
|
1878  | 
evaluation process is raised again, the others are ignored. That single  | 
|
1879  | 
exception may get handled by conventional means in ML.  | 
|
| 58618 | 1880  | 
\<close>  | 
1881  | 
||
1882  | 
||
1883  | 
subsection \<open>Parallel skeletons \label{sec:parlist}\<close>
 | 
|
1884  | 
||
1885  | 
text \<open>  | 
|
| 61854 | 1886  | 
Algorithmic skeletons are combinators that operate on lists in parallel, in  | 
1887  | 
the manner of well-known \<open>map\<close>, \<open>exists\<close>, \<open>forall\<close> etc. Management of  | 
|
1888  | 
  futures (\secref{sec:futures}) and their results as reified exceptions is
 | 
|
1889  | 
wrapped up into simple programming interfaces that resemble the sequential  | 
|
1890  | 
versions.  | 
|
| 52420 | 1891  | 
|
| 61854 | 1892  | 
What remains is the application-specific problem to present expressions with  | 
1893  | 
suitable \<^emph>\<open>granularity\<close>: each list element corresponds to one evaluation  | 
|
1894  | 
task. If the granularity is too coarse, the available CPUs are not  | 
|
1895  | 
saturated. If it is too fine-grained, CPU cycles are wasted due to the  | 
|
1896  | 
overhead of organizing parallel processing. In the worst case, parallel  | 
|
| 52420 | 1897  | 
performance will be less than the sequential counterpart!  | 
| 58618 | 1898  | 
\<close>  | 
1899  | 
||
1900  | 
text %mlref \<open>  | 
|
| 52420 | 1901  | 
  \begin{mldecls}
 | 
1902  | 
  @{index_ML Par_List.map: "('a -> 'b) -> 'a list -> 'b list"} \\
 | 
|
1903  | 
  @{index_ML Par_List.get_some: "('a -> 'b option) -> 'a list -> 'b option"} \\
 | 
|
1904  | 
  \end{mldecls}
 | 
|
1905  | 
||
| 61854 | 1906  | 
  \<^descr> @{ML Par_List.map}~\<open>f [x\<^sub>1, \<dots>, x\<^sub>n]\<close> is like @{ML "map"}~\<open>f [x\<^sub>1, \<dots>,
 | 
1907  | 
x\<^sub>n]\<close>, but the evaluation of \<open>f x\<^sub>i\<close> for \<open>i = 1, \<dots>, n\<close> is performed in  | 
|
1908  | 
parallel.  | 
|
| 61493 | 1909  | 
|
| 61854 | 1910  | 
An exception in any \<open>f x\<^sub>i\<close> cancels the overall evaluation process. The  | 
1911  | 
  final result is produced via @{ML Par_Exn.release_first} as explained above,
 | 
|
1912  | 
which means the first program exception that happened to occur in the  | 
|
1913  | 
parallel evaluation is propagated, and all other failures are ignored.  | 
|
| 52420 | 1914  | 
|
| 61854 | 1915  | 
  \<^descr> @{ML Par_List.get_some}~\<open>f [x\<^sub>1, \<dots>, x\<^sub>n]\<close> produces some \<open>f x\<^sub>i\<close> that is of
 | 
1916  | 
the form \<open>SOME y\<^sub>i\<close>, if that exists, otherwise \<open>NONE\<close>. Thus it is similar to  | 
|
1917  | 
  @{ML Library.get_first}, but subject to a non-deterministic parallel choice
 | 
|
1918  | 
process. The first successful result cancels the overall evaluation process;  | 
|
1919  | 
  other exceptions are propagated as for @{ML Par_List.map}.
 | 
|
| 52420 | 1920  | 
|
| 61854 | 1921  | 
This generic parallel choice combinator is the basis for derived forms, such  | 
1922  | 
  as @{ML Par_List.find_some}, @{ML Par_List.exists}, @{ML Par_List.forall}.
 | 
|
| 58618 | 1923  | 
\<close>  | 
1924  | 
||
| 61854 | 1925  | 
text %mlex \<open>  | 
1926  | 
Subsequently, the Ackermann function is evaluated in parallel for some  | 
|
1927  | 
ranges of arguments.  | 
|
1928  | 
\<close>  | 
|
| 58618 | 1929  | 
|
1930  | 
ML_val \<open>  | 
|
| 52420 | 1931  | 
fun ackermann 0 n = n + 1  | 
1932  | 
| ackermann m 0 = ackermann (m - 1) 1  | 
|
1933  | 
| ackermann m n = ackermann (m - 1) (ackermann m (n - 1));  | 
|
1934  | 
||
1935  | 
Par_List.map (ackermann 2) (500 upto 1000);  | 
|
1936  | 
Par_List.map (ackermann 3) (5 upto 10);  | 
|
| 58618 | 1937  | 
\<close>  | 
1938  | 
||
1939  | 
||
1940  | 
subsection \<open>Lazy evaluation\<close>  | 
|
1941  | 
||
1942  | 
text \<open>  | 
|
| 61854 | 1943  | 
Classic lazy evaluation works via the \<open>lazy\<close>~/ \<open>force\<close> pair of operations:  | 
1944  | 
\<open>lazy\<close> to wrap an unevaluated expression, and \<open>force\<close> to evaluate it once  | 
|
1945  | 
and store its result persistently. Later invocations of \<open>force\<close> retrieve the  | 
|
1946  | 
stored result without another evaluation. Isabelle/ML refines this idea to  | 
|
1947  | 
accommodate the aspects of multi-threading, synchronous program exceptions  | 
|
1948  | 
and asynchronous interrupts.  | 
|
| 57347 | 1949  | 
|
| 61854 | 1950  | 
The first thread that invokes \<open>force\<close> on an unfinished lazy value changes  | 
1951  | 
its state into a \<^emph>\<open>promise\<close> of the eventual result and starts evaluating it.  | 
|
1952  | 
Any other threads that \<open>force\<close> the same lazy value in the meantime need to  | 
|
1953  | 
wait for it to finish, by producing a regular result or program exception.  | 
|
1954  | 
If the evaluation attempt is interrupted, this event is propagated to all  | 
|
1955  | 
waiting threads and the lazy value is reset to its original state.  | 
|
| 57347 | 1956  | 
|
1957  | 
This means a lazy value is completely evaluated at most once, in a  | 
|
1958  | 
thread-safe manner. There might be multiple interrupted evaluation attempts,  | 
|
1959  | 
and multiple receivers of intermediate interrupt events. Interrupts are  | 
|
| 61477 | 1960  | 
\<^emph>\<open>not\<close> made persistent: later evaluation attempts start again from the  | 
| 57347 | 1961  | 
original expression.  | 
| 58618 | 1962  | 
\<close>  | 
1963  | 
||
1964  | 
text %mlref \<open>  | 
|
| 57347 | 1965  | 
  \begin{mldecls}
 | 
1966  | 
  @{index_ML_type "'a lazy"} \\
 | 
|
1967  | 
  @{index_ML Lazy.lazy: "(unit -> 'a) -> 'a lazy"} \\
 | 
|
1968  | 
  @{index_ML Lazy.value: "'a -> 'a lazy"} \\
 | 
|
1969  | 
  @{index_ML Lazy.force: "'a lazy -> 'a"} \\
 | 
|
1970  | 
  \end{mldecls}
 | 
|
1971  | 
||
| 61503 | 1972  | 
  \<^descr> Type @{ML_type "'a lazy"} represents lazy values over type \<^verbatim>\<open>'a\<close>.
 | 
| 57347 | 1973  | 
|
| 61854 | 1974  | 
  \<^descr> @{ML Lazy.lazy}~\<open>(fn () => e)\<close> wraps the unevaluated expression \<open>e\<close> as
 | 
1975  | 
unfinished lazy value.  | 
|
| 61493 | 1976  | 
|
| 61854 | 1977  | 
  \<^descr> @{ML Lazy.value}~\<open>a\<close> wraps the value \<open>a\<close> as finished lazy value. When
 | 
1978  | 
forced, it returns \<open>a\<close> without any further evaluation.  | 
|
| 57347 | 1979  | 
|
| 57349 | 1980  | 
There is very low overhead for this proforma wrapping of strict values as  | 
1981  | 
lazy values.  | 
|
| 57347 | 1982  | 
|
| 61493 | 1983  | 
  \<^descr> @{ML Lazy.force}~\<open>x\<close> produces the result of the lazy value in a
 | 
| 57347 | 1984  | 
thread-safe manner as explained above. Thus it may cause the current thread  | 
1985  | 
to wait on a pending evaluation attempt by another thread.  | 
|
| 58618 | 1986  | 
\<close>  | 
1987  | 
||
1988  | 
||
1989  | 
subsection \<open>Futures \label{sec:futures}\<close>
 | 
|
1990  | 
||
1991  | 
text \<open>  | 
|
| 57349 | 1992  | 
Futures help to organize parallel execution in a value-oriented manner, with  | 
| 61854 | 1993  | 
\<open>fork\<close>~/ \<open>join\<close> as the main pair of operations, and some further variants;  | 
1994  | 
  see also @{cite "Wenzel:2009" and "Wenzel:2013:ITP"}. Unlike lazy values,
 | 
|
1995  | 
futures are evaluated strictly and spontaneously on separate worker threads.  | 
|
1996  | 
Futures may be canceled, which leads to interrupts on running evaluation  | 
|
1997  | 
attempts, and forces structurally related futures to fail for all time;  | 
|
1998  | 
already finished futures remain unchanged. Exceptions between related  | 
|
| 57350 | 1999  | 
futures are propagated as well, and turned into parallel exceptions (see  | 
2000  | 
above).  | 
|
| 57349 | 2001  | 
|
2002  | 
Technically, a future is a single-assignment variable together with a  | 
|
| 61854 | 2003  | 
\<^emph>\<open>task\<close> that serves administrative purposes, notably within the \<^emph>\<open>task  | 
2004  | 
queue\<close> where new futures are registered for eventual evaluation and the  | 
|
2005  | 
worker threads retrieve their work.  | 
|
| 57349 | 2006  | 
|
| 57350 | 2007  | 
The pool of worker threads is limited, in correlation with the number of  | 
2008  | 
physical cores on the machine. Note that allocation of runtime resources may  | 
|
2009  | 
be distorted either if workers yield CPU time (e.g.\ via system sleep or  | 
|
2010  | 
wait operations), or if non-worker threads contend for significant runtime  | 
|
2011  | 
resources independently. There is a limited number of replacement worker  | 
|
2012  | 
threads that get activated in certain explicit wait conditions, after a  | 
|
2013  | 
timeout.  | 
|
2014  | 
||
| 61416 | 2015  | 
\<^medskip>  | 
| 61854 | 2016  | 
Each future task belongs to some \<^emph>\<open>task group\<close>, which represents the  | 
2017  | 
hierarchic structure of related tasks, together with the exception status a  | 
|
2018  | 
that point. By default, the task group of a newly created future is a new  | 
|
2019  | 
sub-group of the presently running one, but it is also possible to indicate  | 
|
2020  | 
different group layouts under program control.  | 
|
| 57349 | 2021  | 
|
2022  | 
Cancellation of futures actually refers to the corresponding task group and  | 
|
2023  | 
all its sub-groups. Thus interrupts are propagated down the group hierarchy.  | 
|
2024  | 
Regular program exceptions are treated likewise: failure of the evaluation  | 
|
2025  | 
of some future task affects its own group and all sub-groups. Given a  | 
|
| 61854 | 2026  | 
particular task group, its \<^emph>\<open>group status\<close> cumulates all relevant exceptions  | 
2027  | 
according to its position within the group hierarchy. Interrupted tasks that  | 
|
2028  | 
lack regular result information, will pick up parallel exceptions from the  | 
|
2029  | 
cumulative group status.  | 
|
| 57349 | 2030  | 
|
| 61416 | 2031  | 
\<^medskip>  | 
| 61854 | 2032  | 
A \<^emph>\<open>passive future\<close> or \<^emph>\<open>promise\<close> is a future with slightly different  | 
2033  | 
evaluation policies: there is only a single-assignment variable and some  | 
|
2034  | 
expression to evaluate for the \<^emph>\<open>failed\<close> case (e.g.\ to clean up resources  | 
|
2035  | 
when canceled). A regular result is produced by external means, using a  | 
|
2036  | 
separate \<^emph>\<open>fulfill\<close> operation.  | 
|
| 57349 | 2037  | 
|
2038  | 
Promises are managed in the same task queue, so regular futures may depend  | 
|
2039  | 
on them. This allows a form of reactive programming, where some promises are  | 
|
2040  | 
used as minimal elements (or guards) within the future dependency graph:  | 
|
2041  | 
when these promises are fulfilled the evaluation of subsequent futures  | 
|
2042  | 
starts spontaneously, according to their own inter-dependencies.  | 
|
| 58618 | 2043  | 
\<close>  | 
2044  | 
||
2045  | 
text %mlref \<open>  | 
|
| 57348 | 2046  | 
  \begin{mldecls}
 | 
2047  | 
  @{index_ML_type "'a future"} \\
 | 
|
2048  | 
  @{index_ML Future.fork: "(unit -> 'a) -> 'a future"} \\
 | 
|
2049  | 
  @{index_ML Future.forks: "Future.params -> (unit -> 'a) list -> 'a future list"} \\
 | 
|
| 57349 | 2050  | 
  @{index_ML Future.join: "'a future -> 'a"} \\
 | 
2051  | 
  @{index_ML Future.joins: "'a future list -> 'a list"} \\
 | 
|
| 57348 | 2052  | 
  @{index_ML Future.value: "'a -> 'a future"} \\
 | 
2053  | 
  @{index_ML Future.map: "('a -> 'b) -> 'a future -> 'b future"} \\
 | 
|
2054  | 
  @{index_ML Future.cancel: "'a future -> unit"} \\
 | 
|
2055  | 
  @{index_ML Future.cancel_group: "Future.group -> unit"} \\[0.5ex]
 | 
|
2056  | 
  @{index_ML Future.promise: "(unit -> unit) -> 'a future"} \\
 | 
|
2057  | 
  @{index_ML Future.fulfill: "'a future -> 'a -> unit"} \\
 | 
|
2058  | 
  \end{mldecls}
 | 
|
2059  | 
||
| 61503 | 2060  | 
  \<^descr> Type @{ML_type "'a future"} represents future values over type \<^verbatim>\<open>'a\<close>.
 | 
| 57348 | 2061  | 
|
| 61854 | 2062  | 
  \<^descr> @{ML Future.fork}~\<open>(fn () => e)\<close> registers the unevaluated expression \<open>e\<close>
 | 
2063  | 
as unfinished future value, to be evaluated eventually on the parallel  | 
|
2064  | 
  worker-thread farm. This is a shorthand for @{ML Future.forks} below, with
 | 
|
2065  | 
default parameters and a single expression.  | 
|
| 57348 | 2066  | 
|
| 61854 | 2067  | 
  \<^descr> @{ML Future.forks}~\<open>params exprs\<close> is the general interface to fork several
 | 
2068  | 
futures simultaneously. The \<open>params\<close> consist of the following fields:  | 
|
| 57348 | 2069  | 
|
| 61854 | 2070  | 
    \<^item> \<open>name : string\<close> (default @{ML "\"\""}) specifies a common name for the
 | 
2071  | 
tasks of the forked futures, which serves diagnostic purposes.  | 
|
| 61458 | 2072  | 
|
| 61854 | 2073  | 
    \<^item> \<open>group : Future.group option\<close> (default @{ML NONE}) specifies an optional
 | 
2074  | 
    task group for the forked futures. @{ML NONE} means that a new sub-group
 | 
|
2075  | 
of the current worker-thread task context is created. If this is not a  | 
|
2076  | 
worker thread, the group will be a new root in the group hierarchy.  | 
|
| 61458 | 2077  | 
|
| 61854 | 2078  | 
    \<^item> \<open>deps : Future.task list\<close> (default @{ML "[]"}) specifies dependencies on
 | 
2079  | 
other future tasks, i.e.\ the adjacency relation in the global task queue.  | 
|
2080  | 
Dependencies on already finished tasks are ignored.  | 
|
| 61458 | 2081  | 
|
| 61854 | 2082  | 
    \<^item> \<open>pri : int\<close> (default @{ML 0}) specifies a priority within the task
 | 
2083  | 
queue.  | 
|
| 61458 | 2084  | 
|
| 61854 | 2085  | 
    Typically there is only little deviation from the default priority @{ML
 | 
2086  | 
    0}. As a rule of thumb, @{ML "~1"} means ``low priority" and @{ML 1} means
 | 
|
| 61458 | 2087  | 
``high priority''.  | 
2088  | 
||
| 61854 | 2089  | 
Note that the task priority only affects the position in the queue, not  | 
2090  | 
the thread priority. When a worker thread picks up a task for processing,  | 
|
2091  | 
it runs with the normal thread priority to the end (or until canceled).  | 
|
2092  | 
Higher priority tasks that are queued later need to wait until this (or  | 
|
2093  | 
another) worker thread becomes free again.  | 
|
| 61458 | 2094  | 
|
| 61854 | 2095  | 
    \<^item> \<open>interrupts : bool\<close> (default @{ML true}) tells whether the worker thread
 | 
2096  | 
that processes the corresponding task is initially put into interruptible  | 
|
2097  | 
state. This state may change again while running, by modifying the thread  | 
|
2098  | 
attributes.  | 
|
| 61458 | 2099  | 
|
| 61854 | 2100  | 
With interrupts disabled, a running future task cannot be canceled. It is  | 
| 61458 | 2101  | 
the responsibility of the programmer that this special state is retained  | 
2102  | 
only briefly.  | 
|
| 57348 | 2103  | 
|
| 61854 | 2104  | 
  \<^descr> @{ML Future.join}~\<open>x\<close> retrieves the value of an already finished future,
 | 
2105  | 
which may lead to an exception, according to the result of its previous  | 
|
2106  | 
evaluation.  | 
|
| 57348 | 2107  | 
|
2108  | 
For an unfinished future there are several cases depending on the role of  | 
|
2109  | 
the current thread and the status of the future. A non-worker thread waits  | 
|
2110  | 
passively until the future is eventually evaluated. A worker thread  | 
|
2111  | 
temporarily changes its task context and takes over the responsibility to  | 
|
| 57349 | 2112  | 
evaluate the future expression on the spot. The latter is done in a  | 
2113  | 
thread-safe manner: other threads that intend to join the same future need  | 
|
2114  | 
to wait until the ongoing evaluation is finished.  | 
|
2115  | 
||
2116  | 
Note that excessive use of dynamic dependencies of futures by adhoc joining  | 
|
2117  | 
may lead to bad utilization of CPU cores, due to threads waiting on other  | 
|
2118  | 
threads to finish required futures. The future task farm has a limited  | 
|
2119  | 
amount of replacement threads that continue working on unrelated tasks after  | 
|
2120  | 
some timeout.  | 
|
| 57348 | 2121  | 
|
2122  | 
Whenever possible, static dependencies of futures should be specified  | 
|
| 61854 | 2123  | 
explicitly when forked (see \<open>deps\<close> above). Thus the evaluation can work from  | 
2124  | 
the bottom up, without join conflicts and wait states.  | 
|
| 57349 | 2125  | 
|
| 61854 | 2126  | 
  \<^descr> @{ML Future.joins}~\<open>xs\<close> joins the given list of futures simultaneously,
 | 
2127  | 
  which is more efficient than @{ML "map Future.join"}~\<open>xs\<close>.
 | 
|
| 57349 | 2128  | 
|
2129  | 
Based on the dependency graph of tasks, the current thread takes over the  | 
|
2130  | 
responsibility to evaluate future expressions that are required for the main  | 
|
2131  | 
result, working from the bottom up. Waiting on future results that are  | 
|
2132  | 
presently evaluated on other threads only happens as last resort, when no  | 
|
2133  | 
other unfinished futures are left over.  | 
|
2134  | 
||
| 61854 | 2135  | 
  \<^descr> @{ML Future.value}~\<open>a\<close> wraps the value \<open>a\<close> as finished future value,
 | 
2136  | 
bypassing the worker-thread farm. When joined, it returns \<open>a\<close> without any  | 
|
2137  | 
further evaluation.  | 
|
| 57349 | 2138  | 
|
2139  | 
There is very low overhead for this proforma wrapping of strict values as  | 
|
| 57421 | 2140  | 
futures.  | 
| 57348 | 2141  | 
|
| 61493 | 2142  | 
  \<^descr> @{ML Future.map}~\<open>f x\<close> is a fast-path implementation of @{ML
 | 
| 61854 | 2143  | 
  Future.fork}~\<open>(fn () => f (\<close>@{ML Future.join}~\<open>x))\<close>, which avoids the full
 | 
2144  | 
overhead of the task queue and worker-thread farm as far as possible. The  | 
|
2145  | 
function \<open>f\<close> is supposed to be some trivial post-processing or projection of  | 
|
2146  | 
the future result.  | 
|
| 57348 | 2147  | 
|
| 61854 | 2148  | 
  \<^descr> @{ML Future.cancel}~\<open>x\<close> cancels the task group of the given future, using
 | 
2149  | 
  @{ML Future.cancel_group} below.
 | 
|
| 57348 | 2150  | 
|
| 61854 | 2151  | 
  \<^descr> @{ML Future.cancel_group}~\<open>group\<close> cancels all tasks of the given task
 | 
2152  | 
group for all time. Threads that are presently processing a task of the  | 
|
2153  | 
given group are interrupted: it may take some time until they are actually  | 
|
2154  | 
terminated. Tasks that are queued but not yet processed are dequeued and  | 
|
2155  | 
forced into interrupted state. Since the task group is itself invalidated,  | 
|
2156  | 
any further attempt to fork a future that belongs to it will yield a  | 
|
2157  | 
canceled result as well.  | 
|
| 57348 | 2158  | 
|
| 61854 | 2159  | 
  \<^descr> @{ML Future.promise}~\<open>abort\<close> registers a passive future with the given
 | 
2160  | 
\<open>abort\<close> operation: it is invoked when the future task group is canceled.  | 
|
| 57348 | 2161  | 
|
| 61854 | 2162  | 
  \<^descr> @{ML Future.fulfill}~\<open>x a\<close> finishes the passive future \<open>x\<close> by the given
 | 
2163  | 
value \<open>a\<close>. If the promise has already been canceled, the attempt to fulfill  | 
|
2164  | 
it causes an exception.  | 
|
| 58618 | 2165  | 
\<close>  | 
| 57348 | 2166  | 
|
| 47180 | 2167  | 
end  |