src/Doc/Implementation/ML.thy
changeset 61506 436b7fe89cdc
parent 61504 a7ae3ef886a9
child 61572 ddb3ac3fef45
equal deleted inserted replaced
61505:a4478ca660a5 61506:436b7fe89cdc
   102 
   102 
   103 
   103 
   104 subsection \<open>Naming conventions\<close>
   104 subsection \<open>Naming conventions\<close>
   105 
   105 
   106 text \<open>Since ML is the primary medium to express the meaning of the
   106 text \<open>Since ML is the primary medium to express the meaning of the
   107   source text, naming of ML entities requires special care.
   107   source text, naming of ML entities requires special care.\<close>
   108 
   108 
   109   \paragraph{Notation.}  A name consists of 1--3 \<^emph>\<open>words\<close> (rarely
   109 paragraph \<open>Notation.\<close>
   110   4, but not more) that are separated by underscore.  There are three
   110 text \<open>A name consists of 1--3 \<^emph>\<open>words\<close> (rarely 4, but not more) that are
   111   variants concerning upper or lower case letters, which are used for
   111   separated by underscore. There are three variants concerning upper or lower
   112   certain ML categories as follows:
   112   case letters, which are used for certain ML categories as follows:
   113 
   113 
   114   \<^medskip>
   114   \<^medskip>
   115   \begin{tabular}{lll}
   115   \begin{tabular}{lll}
   116   variant & example & ML categories \\\hline
   116   variant & example & ML categories \\\hline
   117   lower-case & @{ML_text foo_bar} & values, types, record fields \\
   117   lower-case & @{ML_text foo_bar} & values, types, record fields \\
   134 
   134 
   135   Name variants are produced by adding 1--3 primes, e.g.\ @{ML_text
   135   Name variants are produced by adding 1--3 primes, e.g.\ @{ML_text
   136   foo'}, @{ML_text foo''}, or @{ML_text foo'''}, but not @{ML_text
   136   foo'}, @{ML_text foo''}, or @{ML_text foo'''}, but not @{ML_text
   137   foo''''} or more.  Decimal digits scale better to larger numbers,
   137   foo''''} or more.  Decimal digits scale better to larger numbers,
   138   e.g.\ @{ML_text foo0}, @{ML_text foo1}, @{ML_text foo42}.
   138   e.g.\ @{ML_text foo0}, @{ML_text foo1}, @{ML_text foo42}.
   139 
   139 \<close>
   140   \paragraph{Scopes.}  Apart from very basic library modules, ML
   140 
   141   structures are not ``opened'', but names are referenced with
   141 paragraph\<open>Scopes.\<close>
   142   explicit qualification, as in @{ML Syntax.string_of_term} for
   142 text \<open>Apart from very basic library modules, ML structures are not ``opened'',
   143   example.  When devising names for structures and their components it
   143   but names are referenced with explicit qualification, as in @{ML
   144   is important to aim at eye-catching compositions of both parts, because
   144   Syntax.string_of_term} for example. When devising names for structures and
   145   this is how they are seen in the sources and documentation.  For the
   145   their components it is important to aim at eye-catching compositions of both
   146   same reasons, aliases of well-known library functions should be
   146   parts, because this is how they are seen in the sources and documentation.
       
   147   For the same reasons, aliases of well-known library functions should be
   147   avoided.
   148   avoided.
   148 
   149 
   149   Local names of function abstraction or case/let bindings are
   150   Local names of function abstraction or case/let bindings are
   150   typically shorter, sometimes using only rudiments of ``words'',
   151   typically shorter, sometimes using only rudiments of ``words'',
   151   while still avoiding cryptic shorthands.  An auxiliary function
   152   while still avoiding cryptic shorthands.  An auxiliary function
   178 
   179 
   179   fun print_foo ctxt foo =
   180   fun print_foo ctxt foo =
   180     let
   181     let
   181       fun aux t = ... string_of_term ctxt t ...
   182       fun aux t = ... string_of_term ctxt t ...
   182     in ... end;\<close>}
   183     in ... end;\<close>}
   183 
   184 \<close>
   184 
   185 
   185   \paragraph{Specific conventions.} Here are some specific name forms
   186 paragraph \<open>Specific conventions.\<close>
   186   that occur frequently in the sources.
   187 text \<open>Here are some specific name forms that occur frequently in the sources.
   187 
   188 
   188   \<^item> A function that maps @{ML_text foo} to @{ML_text bar} is
   189   \<^item> A function that maps @{ML_text foo} to @{ML_text bar} is
   189   called @{ML_text foo_to_bar} or @{ML_text bar_of_foo} (never
   190   called @{ML_text foo_to_bar} or @{ML_text bar_of_foo} (never
   190   @{ML_text foo2bar}, nor @{ML_text bar_from_foo}, nor @{ML_text
   191   @{ML_text foo2bar}, nor @{ML_text bar_from_foo}, nor @{ML_text
   191   bar_for_foo}, nor @{ML_text bar4foo}).
   192   bar_for_foo}, nor @{ML_text bar4foo}).
   272 subsection \<open>General source layout\<close>
   273 subsection \<open>General source layout\<close>
   273 
   274 
   274 text \<open>
   275 text \<open>
   275   The general Isabelle/ML source layout imitates regular type-setting
   276   The general Isabelle/ML source layout imitates regular type-setting
   276   conventions, augmented by the requirements for deeply nested expressions
   277   conventions, augmented by the requirements for deeply nested expressions
   277   that are commonplace in functional programming.
   278   that are commonplace in functional programming.\<close>
   278 
   279 
   279   \paragraph{Line length} is limited to 80 characters according to ancient
   280 paragraph \<open>Line length\<close>
   280   standards, but we allow as much as 100 characters (not
   281 text \<open>is limited to 80 characters according to ancient standards, but we allow
   281   more).\footnote{Readability requires to keep the beginning of a line
   282   as much as 100 characters (not more).\footnote{Readability requires to keep
   282   in view while watching its end.  Modern wide-screen displays do not
   283   the beginning of a line in view while watching its end. Modern wide-screen
   283   change the way how the human brain works.  Sources also need to be
   284   displays do not change the way how the human brain works. Sources also need
   284   printable on plain paper with reasonable font-size.} The extra 20
   285   to be printable on plain paper with reasonable font-size.} The extra 20
   285   characters acknowledge the space requirements due to qualified
   286   characters acknowledge the space requirements due to qualified library
   286   library references in Isabelle/ML.
   287   references in Isabelle/ML.\<close>
   287 
   288 
   288   \paragraph{White-space} is used to emphasize the structure of
   289 paragraph \<open>White-space\<close>
   289   expressions, following mostly standard conventions for mathematical
   290 text \<open>is used to emphasize the structure of expressions, following mostly
   290   typesetting, as can be seen in plain {\TeX} or {\LaTeX}.  This
   291   standard conventions for mathematical typesetting, as can be seen in plain
   291   defines positioning of spaces for parentheses, punctuation, and
   292   {\TeX} or {\LaTeX}. This defines positioning of spaces for parentheses,
   292   infixes as illustrated here:
   293   punctuation, and infixes as illustrated here:
   293 
   294 
   294   @{verbatim [display]
   295   @{verbatim [display]
   295 \<open>  val x = y + z * (a + b);
   296 \<open>  val x = y + z * (a + b);
   296   val pair = (a, b);
   297   val pair = (a, b);
   297   val record = {foo = 1, bar = 2};\<close>}
   298   val record = {foo = 1, bar = 2};\<close>}
   321   Note that the space between @{ML_text g} and the pair @{ML_text
   322   Note that the space between @{ML_text g} and the pair @{ML_text
   322   "(a, b)"} follows the important principle of
   323   "(a, b)"} follows the important principle of
   323   \<^emph>\<open>compositionality\<close>: the layout of @{ML_text "g p"} does not
   324   \<^emph>\<open>compositionality\<close>: the layout of @{ML_text "g p"} does not
   324   change when @{ML_text "p"} is refined to the concrete pair
   325   change when @{ML_text "p"} is refined to the concrete pair
   325   @{ML_text "(a, b)"}.
   326   @{ML_text "(a, b)"}.
   326 
   327 \<close>
   327   \paragraph{Indentation} uses plain spaces, never hard
   328 
   328   tabulators.\footnote{Tabulators were invented to move the carriage
   329 paragraph \<open>Indentation\<close>
   329   of a type-writer to certain predefined positions.  In software they
   330 text \<open>uses plain spaces, never hard tabulators.\footnote{Tabulators were
   330   could be used as a primitive run-length compression of consecutive
   331   invented to move the carriage of a type-writer to certain predefined
   331   spaces, but the precise result would depend on non-standardized
   332   positions. In software they could be used as a primitive run-length
   332   text editor configuration.}
   333   compression of consecutive spaces, but the precise result would depend on
       
   334   non-standardized text editor configuration.}
   333 
   335 
   334   Each level of nesting is indented by 2 spaces, sometimes 1, very
   336   Each level of nesting is indented by 2 spaces, sometimes 1, very
   335   rarely 4, never 8 or any other odd number.
   337   rarely 4, never 8 or any other odd number.
   336 
   338 
   337   Indentation follows a simple logical format that only depends on the
   339   Indentation follows a simple logical format that only depends on the
   365 
   367 
   366   \<^medskip>
   368   \<^medskip>
   367   For similar reasons, any kind of two-dimensional or tabular
   369   For similar reasons, any kind of two-dimensional or tabular
   368   layouts, ASCII-art with lines or boxes of asterisks etc.\ should be
   370   layouts, ASCII-art with lines or boxes of asterisks etc.\ should be
   369   avoided.
   371   avoided.
   370 
   372 \<close>
   371   \paragraph{Complex expressions} that consist of multi-clausal
   373 
   372   function definitions, @{ML_text handle}, @{ML_text case},
   374 paragraph \<open>Complex expressions\<close>
   373   @{ML_text let} (and combinations) require special attention.  The
   375 
   374   syntax of Standard ML is quite ambitious and admits a lot of
   376 text \<open>that consist of multi-clausal function definitions, @{ML_text handle},
       
   377   @{ML_text case}, @{ML_text let} (and combinations) require special
       
   378   attention. The syntax of Standard ML is quite ambitious and admits a lot of
   375   variance that can distort the meaning of the text.
   379   variance that can distort the meaning of the text.
   376 
   380 
   377   Multiple clauses of @{ML_text fun}, @{ML_text fn}, @{ML_text handle},
   381   Multiple clauses of @{ML_text fun}, @{ML_text fn}, @{ML_text handle},
   378   @{ML_text case} get extra indentation to indicate the nesting
   382   @{ML_text case} get extra indentation to indicate the nesting
   379   clearly.  Example:
   383   clearly.  Example:
  1081 
  1085 
  1082 text \<open>The Standard ML semantics of strict functional evaluation
  1086 text \<open>The Standard ML semantics of strict functional evaluation
  1083   together with exceptions is rather well defined, but some delicate
  1087   together with exceptions is rather well defined, but some delicate
  1084   points need to be observed to avoid that ML programs go wrong
  1088   points need to be observed to avoid that ML programs go wrong
  1085   despite static type-checking.  Exceptions in Isabelle/ML are
  1089   despite static type-checking.  Exceptions in Isabelle/ML are
  1086   subsequently categorized as follows.
  1090   subsequently categorized as follows.\<close>
  1087 
  1091 
  1088   \paragraph{Regular user errors.}  These are meant to provide
  1092 paragraph \<open>Regular user errors.\<close>
  1089   informative feedback about malformed input etc.
  1093 text \<open>These are meant to provide informative feedback about malformed input
       
  1094   etc.
  1090 
  1095 
  1091   The \<^emph>\<open>error\<close> function raises the corresponding @{ML ERROR}
  1096   The \<^emph>\<open>error\<close> function raises the corresponding @{ML ERROR}
  1092   exception, with a plain text message as argument.  @{ML ERROR}
  1097   exception, with a plain text message as argument.  @{ML ERROR}
  1093   exceptions can be handled internally, in order to be ignored, turned
  1098   exceptions can be handled internally, in order to be ignored, turned
  1094   into other exceptions, or cascaded by appending messages.  If the
  1099   into other exceptions, or cascaded by appending messages.  If the
  1103   Grammatical correctness of error messages can be improved by
  1108   Grammatical correctness of error messages can be improved by
  1104   \<^emph>\<open>omitting\<close> final punctuation: messages are often concatenated
  1109   \<^emph>\<open>omitting\<close> final punctuation: messages are often concatenated
  1105   or put into a larger context (e.g.\ augmented with source position).
  1110   or put into a larger context (e.g.\ augmented with source position).
  1106   Note that punctuation after formal entities (types, terms, theorems) is
  1111   Note that punctuation after formal entities (types, terms, theorems) is
  1107   particularly prone to user confusion.
  1112   particularly prone to user confusion.
  1108 
  1113 \<close>
  1109   \paragraph{Program failures.}  There is a handful of standard
  1114 
  1110   exceptions that indicate general failure situations, or failures of
  1115 paragraph \<open>Program failures.\<close>
  1111   core operations on logical entities (types, terms, theorems,
  1116 text \<open>There is a handful of standard exceptions that indicate general failure
  1112   theories, see \chref{ch:logic}).
  1117   situations, or failures of core operations on logical entities (types,
       
  1118   terms, theorems, theories, see \chref{ch:logic}).
  1113 
  1119 
  1114   These exceptions indicate a genuine breakdown of the program, so the
  1120   These exceptions indicate a genuine breakdown of the program, so the
  1115   main purpose is to determine quickly what has happened where.
  1121   main purpose is to determine quickly what has happened where.
  1116   Traditionally, the (short) exception message would include the name
  1122   Traditionally, the (short) exception message would include the name
  1117   of an ML function, although this is no longer necessary, because the
  1123   of an ML function, although this is no longer necessary, because the
  1123   exceptions locally, e.g.\ to organize internal failures robustly
  1129   exceptions locally, e.g.\ to organize internal failures robustly
  1124   without overlapping with existing exceptions.  Exceptions that are
  1130   without overlapping with existing exceptions.  Exceptions that are
  1125   exposed in module signatures require extra care, though, and should
  1131   exposed in module signatures require extra care, though, and should
  1126   \<^emph>\<open>not\<close> be introduced by default.  Surprise by users of a module
  1132   \<^emph>\<open>not\<close> be introduced by default.  Surprise by users of a module
  1127   can be often minimized by using plain user errors instead.
  1133   can be often minimized by using plain user errors instead.
  1128 
  1134 \<close>
  1129   \paragraph{Interrupts.}  These indicate arbitrary system events:
  1135 
  1130   both the ML runtime system and the Isabelle/ML infrastructure signal
  1136 paragraph \<open>Interrupts.\<close>
  1131   various exceptional situations by raising the special
  1137 text \<open>These indicate arbitrary system events: both the ML runtime system and
  1132   @{ML Exn.Interrupt} exception in user code.
  1138   the Isabelle/ML infrastructure signal various exceptional situations by
       
  1139   raising the special @{ML Exn.Interrupt} exception in user code.
  1133 
  1140 
  1134   This is the one and only way that physical events can intrude an Isabelle/ML
  1141   This is the one and only way that physical events can intrude an Isabelle/ML
  1135   program. Such an interrupt can mean out-of-memory, stack overflow, timeout,
  1142   program. Such an interrupt can mean out-of-memory, stack overflow, timeout,
  1136   internal signaling of threads, or a POSIX process signal. An Isabelle/ML
  1143   internal signaling of threads, or a POSIX process signal. An Isabelle/ML
  1137   program that intercepts interrupts becomes dependent on physical effects of
  1144   program that intercepts interrupts becomes dependent on physical effects of
  1290   @{ML "Symbol.Sym"}, @{ML "Symbol.Control"}, @{ML "Symbol.Raw"},
  1297   @{ML "Symbol.Sym"}, @{ML "Symbol.Control"}, @{ML "Symbol.Raw"},
  1291   @{ML "Symbol.Malformed"}.
  1298   @{ML "Symbol.Malformed"}.
  1292 
  1299 
  1293   \<^descr> @{ML "Symbol.decode"} converts the string representation of a
  1300   \<^descr> @{ML "Symbol.decode"} converts the string representation of a
  1294   symbol into the datatype version.
  1301   symbol into the datatype version.
  1295 
  1302 \<close>
  1296 
  1303 
  1297   \paragraph{Historical note.} In the original SML90 standard the
  1304 paragraph \<open>Historical note.\<close>
  1298   primitive ML type @{ML_type char} did not exists, and @{ML_text
  1305 text \<open>In the original SML90 standard the primitive ML type @{ML_type char} did
  1299   "explode: string -> string list"} produced a list of singleton
  1306   not exists, and @{ML_text "explode: string -> string list"} produced a list
  1300   strings like @{ML "raw_explode: string -> string list"} in
  1307   of singleton strings like @{ML "raw_explode: string -> string list"} in
  1301   Isabelle/ML today.  When SML97 came out, Isabelle did not adopt its
  1308   Isabelle/ML today. When SML97 came out, Isabelle did not adopt its somewhat
  1302   somewhat anachronistic 8-bit or 16-bit characters, but the idea of
  1309   anachronistic 8-bit or 16-bit characters, but the idea of exploding a string
  1303   exploding a string into a list of small strings was extended to
  1310   into a list of small strings was extended to ``symbols'' as explained above.
  1304   ``symbols'' as explained above.  Thus Isabelle sources can refer to
  1311   Thus Isabelle sources can refer to an infinite store of user-defined
  1305   an infinite store of user-defined symbols, without having to worry
  1312   symbols, without having to worry about the multitude of Unicode encodings
  1306   about the multitude of Unicode encodings that have emerged over the
  1313   that have emerged over the years.\<close>
  1307   years.\<close>
       
  1308 
  1314 
  1309 
  1315 
  1310 section \<open>Basic data types\<close>
  1316 section \<open>Basic data types\<close>
  1311 
  1317 
  1312 text \<open>The basis library proposal of SML97 needs to be treated with
  1318 text \<open>The basis library proposal of SML97 needs to be treated with