# HG changeset patch
# User wenzelm
# Date 1287434246 -3600
# Node ID a8363532cd4de55c7a9352e512aabe2b737c9b3e
# Parent  5ec01d5acd0ce5e104d70c74377259d778fe352f
somewhat modernized version of "Thread-safe programming";

diff -r 5ec01d5acd0c -r a8363532cd4d doc-src/IsarImplementation/Thy/ML.thy
--- a/doc-src/IsarImplementation/Thy/ML.thy	Mon Oct 18 19:06:07 2010 +0100
+++ b/doc-src/IsarImplementation/Thy/ML.thy	Mon Oct 18 21:37:26 2010 +0100
@@ -341,7 +341,7 @@
 *}
 
 
-section {* Exceptions *}
+section {* Exceptions \label{sec:exceptions} *}
 
 text {* The Standard ML semantics of strict functional evaluation
   together with exceptions is rather well defined, but some delicate
@@ -693,4 +693,186 @@
   strictly local situation that is guaranteed to be restricted to
   sequential evaluation -- now and in the future.
 *}
+
+section {* Thread-safe programming *}
+
+text {* Multi-threaded execution has become an everyday reality in
+  Isabelle since Poly/ML 5.2.1 and Isabelle2008.  Isabelle/ML provides
+  implicit and explicit parallelism by default, without any option to
+  ``drop out''.  User code that is purely functional and does not
+  intercept interrupts (\secref{sec:exceptions}) can participate
+  within the multi-threaded environment without further ado.  More
+  ambitious tools need to observe the principles explained below. *}
+
+
+subsection {* Multi-threading with shared memory *}
+
+text {*
+  Multiple threads help to organize advanced operations of the system,
+  such as real-time conditions on command transactions, sub-components
+  with explicit communication, general asynchronous interaction etc.
+  Moreover, truly parallel evaluation is essential to make adequate
+  use of the CPU resources that are available on multi-core
+  systems.\footnote{Multi-core computing does not mean that there are
+  ``spare cycles'' to be wasted.  It means that the continued
+  exponential speedup of CPU performance due to ``Moore's Law''
+  follows different rules: clock frequency has reached its peak around
+  2005, and applications need to be parallelized in order to avoid a
+  perceived loss of performance; see also \cite{Sutter:2005}.}
+
+  Isabelle/Isar exploits the inherent structure of theories and proofs
+  to support \emph{implicit parallelism} to a large extent.  LCF-style
+  theorem proving provides unusually good conditions for parallelism;
+  see also \cite{Wenzel:2009}.  This means that significant parts of
+  theory and proof checking are parallelized by default.  A maximum
+  speedup factor of 3.0 on 4 cores and 5.0 on 8 cores can be
+  expected.\footnote{Further scalability is limited due to garbage
+  collection, which is still sequential in Poly/ML 5.2/5.3/5.4.  It
+  helps to provide initial heap space generously, using the
+  \texttt{-H} option.  Initial heap size needs to be scaled up
+  together with the number of CPU cores.}
+
+  \medskip ML threads lack the memory protection of separate
+  processes, and operate concurrently on shared heap memory.  This has
+  the advantage that results of independent computations are
+  immediately available to other threads.  Abstract values can be
+  passed between threads without copying or the awkward serialization
+  that is typically required for explicit message passing between
+  separate processes.
+
+  To make shared-memory multi-threading work robustly and efficiently,
+  some programming guidelines need to be observed.  While the ML
+  system is responsible for maintaining the basic integrity of the
+  representation of ML values in memory, the application programmer
+  needs to ensure that multi-threaded execution does not break the
+  intended semantics.
+
+  \begin{warn}
+  To participate in implicit parallelism, tools need to be
+  thread-safe.  A single ill-behaved tool can affect the stability and
+  performance of the whole system.
+  \end{warn}
+
+  Apart from observing the principles of thread-safety passively,
+  advanced tools may also exploit parallelism actively, e.g.\ by using
+  ``future values'' (\secref{sec:futures}) or the more basic library
+  functions for parallel list operations (\secref{sec:parlist}); see
+  the sketch below.
+
+  \begin{warn}
+  Parallel computing resources are managed centrally by the
+  Isabelle/ML infrastructure.  User-space programs must not fork their
+  own threads to perform computations.
+  \end{warn}
+*}
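+
+text {* As a minimal sketch of such actively parallel evaluation
+  (assuming the @{ML "Par_List.map"} operation of the Pure concurrency
+  library; the concrete task below is merely a made-up example),
+  independent purely functional computations can be evaluated in
+  parallel without forking any threads manually: *}
+
+ML {*
+  (*independent, purely functional tasks: there is no shared state,
+    so no synchronization is needed; worker threads and scheduling
+    are managed by the Isabelle/ML infrastructure*)
+  val results = Par_List.map (fn n => n * n) (1 upto 1000);
+*}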
+
+subsection {* Critical shared resources *}
+
+text {* Thread-safety is mainly concerned with concurrent read/write
+  access to shared resources, which are outside the purely functional
+  world of ML.  This covers the following in particular.
+
+  \begin{itemize}
+
+  \item Global references (or arrays), i.e.\ mutable memory cells that
+  persist over several invocations of associated
+  operations.\footnote{This is independent of the visibility of such
+  mutable values in the toplevel scope.}
+
+  \item Global state of the running process, i.e.\ raw I/O channels,
+  environment variables, current working directory.
+
+  \item Writable resources in the file-system that are shared among
+  different threads or other processes.
+
+  \end{itemize}
+
+  Isabelle/ML provides various mechanisms to \emph{avoid} critical
+  shared resources in most practical situations.  As a last resort
+  there are some mechanisms for explicit synchronization.  The
+  following guidelines help to make Isabelle/ML programs work smoothly
+  in a highly parallel environment.
+
+  \begin{itemize}
+
+  \item Avoid global references altogether.  Isabelle/Isar maintains a
+  uniform context that incorporates arbitrary data declared by
+  Isabelle/ML programs in user-space (see \secref{sec:context-data}).
+  This context is passed as a plain value and user tools can get/map
+  their own data in a purely functional manner.  Configuration options
+  within the context (\secref{sec:config-options}) provide simple
+  drop-in replacements for formerly writable flags.
+
+  \item Keep components with local state information re-entrant.
+  Instead of poking initial values into (private) global references,
+  create a new state record on each invocation, and pass that through
+  any auxiliary functions of the component.  The state record may well
+  contain mutable references, without requiring any special
+  synchronization, as long as each invocation sees its own copy.
+
+  \item Raw output on @{text "stdout"} or @{text "stderr"} should be
+  avoided altogether, or at least performed as a single atomic
+  action.\footnote{The Poly/ML library is thread-safe for each
+  individual output operation, but the ordering of parallel
+  invocations is arbitrary.  This means raw output will appear on some
+  system console with unpredictable interleaving of atomic chunks.}
+
+  Note that regular message output channels
+  (\secref{sec:message-channels}) are not directly affected: each
+  message is associated with the command transaction from where it
+  originates, independently of other transactions.
This means each + running command has effectively its own set of message channels, and + interleaving can only happen (at message boundary) when commands use + parallelism internally. + + \item Environment variables and the current working directory of the + running Isabelle process are considered strictly read-only. + + \item Writable files are in most situations just temporary input to + external processes invoked by some ML thread. By ensuring a unique + file name for each instance, such write operations will be disjoint + over the life-time of a given Isabelle process, and thus + thread-safe. + + \end{itemize} + + In rare situations where actual mutable content needs to be + manipulated, Isabelle provides a single \emph{critical section} that + may be entered while preventing any other thread from doing the + same. Entering the critical section without contention is very + fast, and several basic system operations do so frequently. This + also means that each thread should leave the critical section + quickly, otherwise parallel execution performance may degrade + significantly. + + Despite this potential bottle-neck, centralized locking is + convenient, because it prevents deadlocks without demanding special + precautions. Explicit communication demands other means, though. + The high-level abstraction of synchronized variables @{"file" + "~~/src/Pure/Concurrent/synchronized.ML"} enables parallel + components to communicate via shared state; see also @{"file" + "~~/src/Pure/Concurrent/mailbox.ML"} as canonical example. + +*} + +text %mlref {* + \begin{mldecls} + @{index_ML NAMED_CRITICAL: "string -> (unit -> 'a) -> 'a"} \\ + @{index_ML CRITICAL: "(unit -> 'a) -> 'a"} \\ + \end{mldecls} + + \begin{description} + + \item @{ML NAMED_CRITICAL}~@{text "name f"} evaluates @{text "f ()"} + while staying within the critical section of Isabelle/Isar. No + other thread may do so at the same time, but non-critical parallel + execution will continue. The @{text "name"} argument serves for + diagnostic purposes and might help to spot sources of congestion. + + \item @{ML CRITICAL} is the same as @{ML NAMED_CRITICAL} with empty + name argument. + + \end{description} +*} + end \ No newline at end of file diff -r 5ec01d5acd0c -r a8363532cd4d doc-src/IsarImplementation/Thy/ML_old.thy --- a/doc-src/IsarImplementation/Thy/ML_old.thy Mon Oct 18 19:06:07 2010 +0100 +++ b/doc-src/IsarImplementation/Thy/ML_old.thy Mon Oct 18 21:37:26 2010 +0100 @@ -104,184 +104,6 @@ *} -section {* Thread-safe programming *} - -text {* - Recent versions of Poly/ML (5.2.1 or later) support robust - multithreaded execution, based on native operating system threads of - the underlying platform. Thus threads will actually be executed in - parallel on multi-core systems. A speedup-factor of approximately - 1.5--3 can be expected on a regular 4-core machine.\footnote{There - is some inherent limitation of the speedup factor due to garbage - collection, which is still sequential. It helps to provide initial - heap space generously, using the \texttt{-H} option of Poly/ML.} - Threads also help to organize advanced operations of the system, - with explicit communication between sub-components, real-time - conditions, time-outs etc. - - Threads lack the memory protection of separate processes, and - operate concurrently on shared heap memory. This has the advantage - that results of independent computations are immediately available - to other threads, without requiring untyped character streams, - awkward serialization etc. 
- - On the other hand, some programming guidelines need to be observed - in order to make unprotected parallelism work out smoothly. While - the ML system implementation is responsible to maintain basic - integrity of the representation of ML values in memory, the - application programmer needs to ensure that multithreaded execution - does not break the intended semantics. - - \medskip \paragraph{Critical shared resources.} Actually only those - parts outside the purely functional world of ML are critical. In - particular, this covers - - \begin{itemize} - - \item global references (or arrays), i.e.\ those that persist over - several invocations of associated operations,\footnote{This is - independent of the visibility of such mutable values in the toplevel - scope.} - - \item direct I/O on shared channels, notably @{text "stdin"}, @{text - "stdout"}, @{text "stderr"}. - - \end{itemize} - - The majority of tools implemented within the Isabelle/Isar framework - will not require any of these critical elements: nothing special - needs to be observed when staying in the purely functional fragment - of ML. Note that output via the official Isabelle channels does not - count as direct I/O, so the operations @{ML "writeln"}, @{ML - "warning"}, @{ML "tracing"} etc.\ are safe. - - Moreover, ML bindings within the toplevel environment (@{verbatim - "type"}, @{verbatim val}, @{verbatim "structure"} etc.) due to - run-time invocation of the compiler are also safe, because - Isabelle/Isar manages this as part of the theory or proof context. - - \paragraph{Multithreading in Isabelle/Isar.} The theory loader - automatically exploits the overall parallelism of independent nodes - in the development graph, as well as the inherent irrelevance of - proofs for goals being fully specified in advance. This means, - checking of individual Isar proofs is parallelized by default. - Beyond that, very sophisticated proof tools may use local - parallelism internally, via the general programming model of - ``future values'' (see also @{"file" - "~~/src/Pure/Concurrent/future.ML"}). - - Any ML code that works relatively to the present background theory - is already safe. Contextual data may be easily stored within the - theory or proof context, thanks to the generic data concept of - Isabelle/Isar (see \secref{sec:context-data}). This greatly - diminishes the demand for global state information in the first - place. - - \medskip In rare situations where actual mutable content needs to be - manipulated, Isabelle provides a single \emph{critical section} that - may be entered while preventing any other thread from doing the - same. Entering the critical section without contention is very - fast, and several basic system operations do so frequently. This - also means that each thread should leave the critical section - quickly, otherwise parallel execution performance may degrade - significantly. - - Despite this potential bottle-neck, centralized locking is - convenient, because it prevents deadlocks without demanding special - precautions. Explicit communication demands other means, though. - The high-level abstraction of synchronized variables @{"file" - "~~/src/Pure/Concurrent/synchronized.ML"} enables parallel - components to communicate via shared state; see also @{"file" - "~~/src/Pure/Concurrent/mailbox.ML"} as canonical example. - - \paragraph{Good conduct of impure programs.} The following - guidelines enable non-functional programs to participate in - multithreading. 
- - \begin{itemize} - - \item Minimize global state information. Using proper theory and - proof context data will actually return to functional update of - values, without any special precautions for multithreading. Apart - from the fully general functors for theory and proof data (see - \secref{sec:context-data}) there are drop-in replacements that - emulate primitive references for common cases of \emph{configuration - options} for type @{ML_type "bool"}/@{ML_type "int"}/@{ML_type - "string"} (see structure @{ML_struct Config} and @{ML - Attrib.config_bool} etc.), and lists of theorems (see functor - @{ML_functor Named_Thms}). - - \item Keep components with local state information - \emph{re-entrant}. Instead of poking initial values into (private) - global references, create a new state record on each invocation, and - pass that through any auxiliary functions of the component. The - state record may well contain mutable references, without requiring - any special synchronizations, as long as each invocation sees its - own copy. Occasionally, one might even return to plain functional - updates on non-mutable record values here. - - \item Isolate process configuration flags. The main legitimate - application of global references is to configure the whole process - in a certain way, essentially affecting all threads. A typical - example is the @{ML show_types} flag, which tells the pretty printer - to output explicit type information for terms. Such flags usually - do not affect the functionality of the core system, but only the - view being presented to the user. - - Occasionally, such global process flags are treated like implicit - arguments to certain operations, by using the @{ML setmp_CRITICAL} combinator - for safe temporary assignment. Its traditional purpose was to - ensure proper recovery of the original value when exceptions are - raised in the body, now the functionality is extended to enter the - \emph{critical section} (with its usual potential of degrading - parallelism). - - Note that recovery of plain value passing semantics via @{ML - setmp_CRITICAL}~@{text "ref value"} assumes that this @{text "ref"} is - exclusively manipulated within the critical section. In particular, - any persistent global assignment of @{text "ref := value"} needs to - be marked critical as well, to prevent intruding another threads - local view, and a lost-update in the global scope, too. - - \end{itemize} - - Recall that in an open ``LCF-style'' system like Isabelle/Isar, the - user participates in constructing the overall environment. This - means that state-based facilities offered by one component will - require special caution later on. So minimizing critical elements, - by staying within the plain value-oriented view relative to theory - or proof contexts most of the time, will also reduce the chance of - mishaps occurring to end-users. -*} - -text %mlref {* - \begin{mldecls} - @{index_ML NAMED_CRITICAL: "string -> (unit -> 'a) -> 'a"} \\ - @{index_ML CRITICAL: "(unit -> 'a) -> 'a"} \\ - @{index_ML setmp_CRITICAL: "'a Unsynchronized.ref -> 'a -> ('b -> 'c) -> 'b -> 'c"} \\ - \end{mldecls} - - \begin{description} - - \item @{ML NAMED_CRITICAL}~@{text "name f"} evaluates @{text "f ()"} - while staying within the critical section of Isabelle/Isar. No - other thread may do so at the same time, but non-critical parallel - execution will continue. The @{text "name"} argument serves for - diagnostic purposes and might help to spot sources of congestion. 
- - \item @{ML CRITICAL} is the same as @{ML NAMED_CRITICAL} with empty - name argument. - - \item @{ML setmp_CRITICAL}~@{text "ref value f x"} evaluates @{text "f x"} - while staying within the critical section and having @{text "ref := - value"} assigned temporarily. This recovers a value-passing - semantics involving global references, regardless of exceptions or - concurrency. - - \end{description} -*} - - chapter {* Basic library functions *} text {* diff -r 5ec01d5acd0c -r a8363532cd4d doc-src/manual.bib --- a/doc-src/manual.bib Mon Oct 18 19:06:07 2010 +0100 +++ b/doc-src/manual.bib Mon Oct 18 21:37:26 2010 +0100 @@ -1393,6 +1393,14 @@ year = 2000, publisher = Springer} +@Article{Sutter:2005, + author = {H. Sutter}, + title = {The Free Lunch Is Over --- A Fundamental Turn Toward Concurrency in Software}, + journal = {Dr. Dobb's Journal}, + year = 2005, + volume = 30, + number = 3} + @InCollection{szasz93, author = {Nora Szasz}, title = {A Machine Checked Proof that {Ackermann's} Function is not @@ -1579,6 +1587,14 @@ year = {2007} } +@InProceedings{Wenzel:2009, + author = {M. Wenzel}, + title = {Parallel Proof Checking in {Isabelle/Isar}}, + booktitle = {ACM SIGSAM Workshop on Programming Languages for Mechanized Mathematics Systems (PLMMS 2009)}, + year = 2009, + editor = {Dos Reis, G. and L. Th\'ery}, + publisher = {ACM Digital Library}} + @book{principia, author = {A. N. Whitehead and B. Russell}, title = {Principia Mathematica},