--- a/doc-src/IsarImplementation/Thy/ML.thy Mon Oct 18 19:06:07 2010 +0100
+++ b/doc-src/IsarImplementation/Thy/ML.thy Mon Oct 18 21:37:26 2010 +0100
@@ -341,7 +341,7 @@
*}
-section {* Exceptions *}
+section {* Exceptions \label{sec:exceptions} *}
text {* The Standard ML semantics of strict functional evaluation
together with exceptions is rather well defined, but some delicate
@@ -693,4 +693,186 @@
strictly local situation that is guaranteed to be restricted to
sequential evaluation -- now and in the future. *}
+
+section {* Thread-safe programming *}
+
+text {* Multi-threaded execution has become an everyday reality in
+ Isabelle since Poly/ML 5.2.1 and Isabelle2008. Isabelle/ML provides
+ implicit and explicit parallelism by default, without any option to
+ ``drop out''. User-code that is purely functional and does not
+ intercept interrupts (\secref{sec:exceptions}) can participate
+ within the multi-threaded environment without further ado. More
+ ambitious tools need to observe the principles explained below. *}
+
+
+subsection {* Multi-threading with shared memory *}
+
+text {*
+ Multiple threads help to organize advanced operations of the system,
+ such as real-time conditions on command transactions, sub-components
+ with explicit communication, general asynchronous interaction etc.
+ Moreover, truely parallel evaluation is inevitable to make adequate
+ use of the CPU resources that are available on multi-core
+ systems.\footnote{Multi-core computing does not mean that there are
+ ``spare cycles'' to be wasted. It means that the continued
+ exponential speedup of CPU performance due to ``Moore's Law''
+ follows different rules: clock frequency has reached its peak around
+ 2005, and applications need to be parallelized in order to avoid a
+ perceived loss of performance, see also \cite{Sutter:2005}.}
+
+ Isabelle/Isar exploits the inherent structure of theories and proofs
+ to support \emph{implicit parallelism} to a large extent. LCF-style
+ theorem provides unusually good conditions for parallelism; see also
+ \cite{Wenzel:2009}. This means, significant parts of theory and
+ proof checking is parallelized by default. A maximum speedup-factor
+ of 3.0 on 4 cores and 5.0 on 8 cores can be
+ expected.\footnote{Further scalability is limited due to garbage
+ collection, which is still sequential in Poly/ML 5.2/5.3/5.4. It
+ helps to provide initial heap space generously, using the
+ \texttt{-H} option. Initial heap size needs to be scaled-up
+ together with the number of CPU cores.}
+
+ \medskip ML threads lack the memory protection of separate
+ processes, and operate concurrently on shared heap memory. This has
+ the advantage that results of independent computations are
+ immediately available to other threads. Abstract values can be
+ passed between threads without copying or awkward serialization that
+ is typically required for explicit message passing between separate
+ processes.
+
+ To make shared-memory multi-threading work robustly and efficiently,
+ some programming guidelines need to be observed. While the ML
+ system is responsible to maintain basic integrity of the
+ representation of ML values in memory, the application programmer
+ needs to ensure that multi-threaded execution does not break the
+ intended semantics.
+
+ \begin{warn}
+ To participate in implicit parallelism, tools need to be
+ thread-safe. A single ill-behaved tool can affect the stability and
+ performance of the whole system.
+ \end{warn}
+
+ Apart from observing the principles of thread-safeness passively,
+ advanced tools may also exploit parallelism actively, e.g.\ by using
+ ``future values'' (\secref{sec:futures}) or or the more basic
+ library functions for parallel list operations
+ (\secref{sec:parlist}).
+
+ \begin{warn}
+ Parallel computing resources are managed centrally by the
+ Isabelle/ML infrastructure. User-space programs must not fork their
+ own threads to perform computations.
+ \end{warn}
+*}
+
+subsection {* Critical shared resources *}
+
+text {* Thread-safeness is mainly concerned about concurrent
+ read/write access to shared resources, which are outside the purely
+ functional world of ML. This covers the following in particular.
+
+ \begin{itemize}
+
+ \item Global references (or arrays), i.e.\ mutable memory cells that
+ persist over several invocations of associated
+ operations.\footnote{This is independent of the visibility of such
+ mutable values in the toplevel scope.}
+
+ \item Global state of the running process, i.e.\ raw I/O channels,
+ environment variables, current working directory.
+
+ \item Writable resources in the file-system that are shared among
+ different threads or other processes.
+
+ \end{itemize}
+
+ Isabelle/ML provides various mechanisms to \emph{avoid} critical
+ shared resources in most practical situations. As last resort there
+ are some mechanisms for explicit synchronization. The following
+ guidelines help to make Isabelle/ML programs work smoothly in a
+ highly parallel environment.
+
+ \begin{itemize}
+
+ \item Avoid global references altogether. Isabelle/Isar maintains a
+ uniform context that incorporates arbitrary data declared by
+ Isabelle/ML programs in user-space (see \secref{sec:context-data}).
+ This context is passed as plain value and user tools can get/map
+ their own data in a purely functional manner. Configuration options
+ within the context (\secref{sec:config-options}) provide simple
+ drop-in replacements for formerly writable flags.
+
+ \item Keep components with local state information re-entrant.
+ Instead of poking initial values into (private) global references,
+ create a new state record on each invocation, and pass that through
+ any auxiliary functions of the component. The state record may well
+ contain mutable references, without requiring any special
+ synchronizations, as long as each invocation sees its own copy.
+
+ \item Raw output on @{text "stdout"} or @{text "stderr"} should be
+ avoided altogether, or at least performed as a single atomic
+ action.\footnote{The Poly/ML library is thread-safe for each
+ individual output operation, but the ordering of parallel
+ invocations is arbitrary. This means raw output will appear on some
+ system console with unpredictable interleaving of atomic chunks.}
+
+ Note that regular message output channels
+ (\secref{sec:message-channels}) are not directly affected: each
+ message is associated with the command transaction from where it
+ originates, independently of other transactions. This means each
+ running command has effectively its own set of message channels, and
+ interleaving can only happen (at message boundary) when commands use
+ parallelism internally.
+
+ \item Environment variables and the current working directory of the
+ running Isabelle process are considered strictly read-only.
+
+ \item Writable files are in most situations just temporary input to
+ external processes invoked by some ML thread. By ensuring a unique
+ file name for each instance, such write operations will be disjoint
+ over the life-time of a given Isabelle process, and thus
+ thread-safe.
+
+ \end{itemize}
+
+ In rare situations where actual mutable content needs to be
+ manipulated, Isabelle provides a single \emph{critical section} that
+ may be entered while preventing any other thread from doing the
+ same. Entering the critical section without contention is very
+ fast, and several basic system operations do so frequently. This
+ also means that each thread should leave the critical section
+ quickly, otherwise parallel execution performance may degrade
+ significantly.
+
+ Despite this potential bottle-neck, centralized locking is
+ convenient, because it prevents deadlocks without demanding special
+ precautions. Explicit communication demands other means, though.
+ The high-level abstraction of synchronized variables @{"file"
+ "~~/src/Pure/Concurrent/synchronized.ML"} enables parallel
+ components to communicate via shared state; see also @{"file"
+ "~~/src/Pure/Concurrent/mailbox.ML"} as canonical example.
+
+*}
+
+text %mlref {*
+ \begin{mldecls}
+ @{index_ML NAMED_CRITICAL: "string -> (unit -> 'a) -> 'a"} \\
+ @{index_ML CRITICAL: "(unit -> 'a) -> 'a"} \\
+ \end{mldecls}
+
+ \begin{description}
+
+ \item @{ML NAMED_CRITICAL}~@{text "name f"} evaluates @{text "f ()"}
+ while staying within the critical section of Isabelle/Isar. No
+ other thread may do so at the same time, but non-critical parallel
+ execution will continue. The @{text "name"} argument serves for
+ diagnostic purposes and might help to spot sources of congestion.
+
+ \item @{ML CRITICAL} is the same as @{ML NAMED_CRITICAL} with empty
+ name argument.
+
+ \end{description}
+*}
+
end
\ No newline at end of file
--- a/doc-src/IsarImplementation/Thy/ML_old.thy Mon Oct 18 19:06:07 2010 +0100
+++ b/doc-src/IsarImplementation/Thy/ML_old.thy Mon Oct 18 21:37:26 2010 +0100
@@ -104,184 +104,6 @@
*}
-section {* Thread-safe programming *}
-
-text {*
- Recent versions of Poly/ML (5.2.1 or later) support robust
- multithreaded execution, based on native operating system threads of
- the underlying platform. Thus threads will actually be executed in
- parallel on multi-core systems. A speedup-factor of approximately
- 1.5--3 can be expected on a regular 4-core machine.\footnote{There
- is some inherent limitation of the speedup factor due to garbage
- collection, which is still sequential. It helps to provide initial
- heap space generously, using the \texttt{-H} option of Poly/ML.}
- Threads also help to organize advanced operations of the system,
- with explicit communication between sub-components, real-time
- conditions, time-outs etc.
-
- Threads lack the memory protection of separate processes, and
- operate concurrently on shared heap memory. This has the advantage
- that results of independent computations are immediately available
- to other threads, without requiring untyped character streams,
- awkward serialization etc.
-
- On the other hand, some programming guidelines need to be observed
- in order to make unprotected parallelism work out smoothly. While
- the ML system implementation is responsible to maintain basic
- integrity of the representation of ML values in memory, the
- application programmer needs to ensure that multithreaded execution
- does not break the intended semantics.
-
- \medskip \paragraph{Critical shared resources.} Actually only those
- parts outside the purely functional world of ML are critical. In
- particular, this covers
-
- \begin{itemize}
-
- \item global references (or arrays), i.e.\ those that persist over
- several invocations of associated operations,\footnote{This is
- independent of the visibility of such mutable values in the toplevel
- scope.}
-
- \item direct I/O on shared channels, notably @{text "stdin"}, @{text
- "stdout"}, @{text "stderr"}.
-
- \end{itemize}
-
- The majority of tools implemented within the Isabelle/Isar framework
- will not require any of these critical elements: nothing special
- needs to be observed when staying in the purely functional fragment
- of ML. Note that output via the official Isabelle channels does not
- count as direct I/O, so the operations @{ML "writeln"}, @{ML
- "warning"}, @{ML "tracing"} etc.\ are safe.
-
- Moreover, ML bindings within the toplevel environment (@{verbatim
- "type"}, @{verbatim val}, @{verbatim "structure"} etc.) due to
- run-time invocation of the compiler are also safe, because
- Isabelle/Isar manages this as part of the theory or proof context.
-
- \paragraph{Multithreading in Isabelle/Isar.} The theory loader
- automatically exploits the overall parallelism of independent nodes
- in the development graph, as well as the inherent irrelevance of
- proofs for goals being fully specified in advance. This means,
- checking of individual Isar proofs is parallelized by default.
- Beyond that, very sophisticated proof tools may use local
- parallelism internally, via the general programming model of
- ``future values'' (see also @{"file"
- "~~/src/Pure/Concurrent/future.ML"}).
-
- Any ML code that works relatively to the present background theory
- is already safe. Contextual data may be easily stored within the
- theory or proof context, thanks to the generic data concept of
- Isabelle/Isar (see \secref{sec:context-data}). This greatly
- diminishes the demand for global state information in the first
- place.
-
- \medskip In rare situations where actual mutable content needs to be
- manipulated, Isabelle provides a single \emph{critical section} that
- may be entered while preventing any other thread from doing the
- same. Entering the critical section without contention is very
- fast, and several basic system operations do so frequently. This
- also means that each thread should leave the critical section
- quickly, otherwise parallel execution performance may degrade
- significantly.
-
- Despite this potential bottle-neck, centralized locking is
- convenient, because it prevents deadlocks without demanding special
- precautions. Explicit communication demands other means, though.
- The high-level abstraction of synchronized variables @{"file"
- "~~/src/Pure/Concurrent/synchronized.ML"} enables parallel
- components to communicate via shared state; see also @{"file"
- "~~/src/Pure/Concurrent/mailbox.ML"} as canonical example.
-
- \paragraph{Good conduct of impure programs.} The following
- guidelines enable non-functional programs to participate in
- multithreading.
-
- \begin{itemize}
-
- \item Minimize global state information. Using proper theory and
- proof context data will actually return to functional update of
- values, without any special precautions for multithreading. Apart
- from the fully general functors for theory and proof data (see
- \secref{sec:context-data}) there are drop-in replacements that
- emulate primitive references for common cases of \emph{configuration
- options} for type @{ML_type "bool"}/@{ML_type "int"}/@{ML_type
- "string"} (see structure @{ML_struct Config} and @{ML
- Attrib.config_bool} etc.), and lists of theorems (see functor
- @{ML_functor Named_Thms}).
-
- \item Keep components with local state information
- \emph{re-entrant}. Instead of poking initial values into (private)
- global references, create a new state record on each invocation, and
- pass that through any auxiliary functions of the component. The
- state record may well contain mutable references, without requiring
- any special synchronizations, as long as each invocation sees its
- own copy. Occasionally, one might even return to plain functional
- updates on non-mutable record values here.
-
- \item Isolate process configuration flags. The main legitimate
- application of global references is to configure the whole process
- in a certain way, essentially affecting all threads. A typical
- example is the @{ML show_types} flag, which tells the pretty printer
- to output explicit type information for terms. Such flags usually
- do not affect the functionality of the core system, but only the
- view being presented to the user.
-
- Occasionally, such global process flags are treated like implicit
- arguments to certain operations, by using the @{ML setmp_CRITICAL} combinator
- for safe temporary assignment. Its traditional purpose was to
- ensure proper recovery of the original value when exceptions are
- raised in the body, now the functionality is extended to enter the
- \emph{critical section} (with its usual potential of degrading
- parallelism).
-
- Note that recovery of plain value passing semantics via @{ML
- setmp_CRITICAL}~@{text "ref value"} assumes that this @{text "ref"} is
- exclusively manipulated within the critical section. In particular,
- any persistent global assignment of @{text "ref := value"} needs to
- be marked critical as well, to prevent intruding another threads
- local view, and a lost-update in the global scope, too.
-
- \end{itemize}
-
- Recall that in an open ``LCF-style'' system like Isabelle/Isar, the
- user participates in constructing the overall environment. This
- means that state-based facilities offered by one component will
- require special caution later on. So minimizing critical elements,
- by staying within the plain value-oriented view relative to theory
- or proof contexts most of the time, will also reduce the chance of
- mishaps occurring to end-users.
-*}
-
-text %mlref {*
- \begin{mldecls}
- @{index_ML NAMED_CRITICAL: "string -> (unit -> 'a) -> 'a"} \\
- @{index_ML CRITICAL: "(unit -> 'a) -> 'a"} \\
- @{index_ML setmp_CRITICAL: "'a Unsynchronized.ref -> 'a -> ('b -> 'c) -> 'b -> 'c"} \\
- \end{mldecls}
-
- \begin{description}
-
- \item @{ML NAMED_CRITICAL}~@{text "name f"} evaluates @{text "f ()"}
- while staying within the critical section of Isabelle/Isar. No
- other thread may do so at the same time, but non-critical parallel
- execution will continue. The @{text "name"} argument serves for
- diagnostic purposes and might help to spot sources of congestion.
-
- \item @{ML CRITICAL} is the same as @{ML NAMED_CRITICAL} with empty
- name argument.
-
- \item @{ML setmp_CRITICAL}~@{text "ref value f x"} evaluates @{text "f x"}
- while staying within the critical section and having @{text "ref :=
- value"} assigned temporarily. This recovers a value-passing
- semantics involving global references, regardless of exceptions or
- concurrency.
-
- \end{description}
-*}
-
-
chapter {* Basic library functions *}
text {*
--- a/doc-src/manual.bib Mon Oct 18 19:06:07 2010 +0100
+++ b/doc-src/manual.bib Mon Oct 18 21:37:26 2010 +0100
@@ -1393,6 +1393,14 @@
year = 2000,
publisher = Springer}
+@Article{Sutter:2005,
+ author = {H. Sutter},
+ title = {The Free Lunch Is Over --- A Fundamental Turn Toward Concurrency in Software},
+ journal = {Dr. Dobb's Journal},
+ year = 2005,
+ volume = 30,
+ number = 3}
+
@InCollection{szasz93,
author = {Nora Szasz},
title = {A Machine Checked Proof that {Ackermann's} Function is not
@@ -1579,6 +1587,14 @@
year = {2007}
}
+@InProceedings{Wenzel:2009,
+ author = {M. Wenzel},
+ title = {Parallel Proof Checking in {Isabelle/Isar}},
+ booktitle = {ACM SIGSAM Workshop on Programming Languages for Mechanized Mathematics Systems (PLMMS 2009)},
+ year = 2009,
+ editor = {Dos Reis, G. and L. Th\'ery},
+ publisher = {ACM Digital Library}}
+
@book{principia,
author = {A. N. Whitehead and B. Russell},
title = {Principia Mathematica},