changed MaSh to use SML version instead of Python version of naive Bayes by default (i.e. if MASH=yes in the settings, or 'fact_filter=mash' with no other explicit setting)
authorblanchet
Wed, 28 May 2014 17:42:34 +0200
changeset 57107 2d502370ee99
parent 57106 52e1e424eec1
child 57108 dc0b4f50e288
changed MaSh to use SML version instead of Python version of naive Bayes by default (i.e. if MASH=yes in the settings, or 'fact_filter=mash' with no other explicit setting)
src/Doc/Sledgehammer/document/root.tex
src/HOL/Tools/Sledgehammer/sledgehammer_mash.ML
--- a/src/Doc/Sledgehammer/document/root.tex	Wed May 28 17:42:33 2014 +0200
+++ b/src/Doc/Sledgehammer/document/root.tex	Wed May 28 17:42:34 2014 +0200
@@ -1037,7 +1037,7 @@
 interactions recorded in the name of science, please enable this feature and send the statistics
 file every now and then to the author of this manual (\authoremail).
 To change the default value of this option globally, set the environment variable
-\texttt{SLEDGEHAMMER\_SPY} to \texttt{yes}.
+\texttt{SLEDGEHAMMER\_SPY} to \textit{yes}.
 
 \nopagebreak
 {\small See also \textit{debug} (\S\ref{output-format}).}
@@ -1070,43 +1070,44 @@
 The experimental MaSh machine learner. Three learning engines are provided:
 
 \begin{enum}
-\item[\labelitemi] \textbf{\textit{sml\_nb}} (also called \textbf{\textit{sml}})
-is a Standard ML implementation of naive Bayes.
+\item[\labelitemi] \textbf{\textit{sml\_nb}} (also called \textbf{\textit{sml}}
+and \textbf{\textit{yes}}) is a Standard ML implementation of naive Bayes.
 
 \item[\labelitemi] \textbf{\textit{sml\_knn}} is a Standard ML implementation of
 $k$-nearest neighbors.
 
-\item[\labelitemi] \textbf{\textit{py}} (also called \textbf{\textit{yes}}) is a
-Python implementation of naive Bayes. The program is included with Isabelle as
-\texttt{mash.py}.
+\item[\labelitemi] \textbf{\textit{py}} is a Python implementation of naive Bayes.
+The program is included with Isabelle as \texttt{mash.py}.
 \end{enum}
 
-To enable MaSh, set the variable \texttt{MASH} to the name of the desired
+The engine can be selected by setting \texttt{MASH} to the name of the desired
 engine---either in the environment in which Isabelle is launched, in your
 \texttt{\$ISABELLE\_HOME\_USER/etc/settings} file, or via the ``MaSh'' option
 under ``Plugins > Plugin Options > Isabelle > General'' in Isabelle/jEdit.
 Persistent data for both engines is stored in the directory
 \texttt{\$ISABELLE\_HOME\_USER/mash}. When switching to the \textit{py} engine,
-it is recommended to invoke the \textit{relearn\_isar} subcommand
-(\S\ref{sledgehammer}) to synchronize the Python persistent databases.
+you will need to invoke the \textit{relearn\_isar} subcommand
+(\S\ref{sledgehammer}) to synchronize the persistent databases on the
+Python side.
 
 \item[\labelitemi] \textbf{\textit{mesh}:} The MeSh filter, which combines the
 rankings from MePo and MaSh.
 
-\item[\labelitemi] \textbf{\textit{smart}:} A mixture of MePo, MaSh, and MeSh if
-MaSh is enabled; otherwise, MePo.
+\item[\labelitemi] \textbf{\textit{smart}:} A combination of MePo, MaSh, and
+MeSh if MaSh is a learning engine has been specified (as explained above);
+otherwise, MePo.
 \end{enum}
 
+The behavior of \textit{smart} ensures that MaSh is not used by default,
+reflecting the experimental nature of the filter. Nevertheless, there is ample
+empirical evidence in favor of a combination of MePo, MaSh, and MeSh, and you
+are warmly encouraged to try it out.
+
 \opdefault{max\_facts}{smart\_int}{smart}
 Specifies the maximum number of facts that may be returned by the relevance
 filter. If the option is set to \textit{smart} (the default), it effectively
 takes a value that was empirically found to be appropriate for the prover.
-Typical values range between 50 and 1000.
-
-For the MaSh-related subcommands \textit{learn\_isar}, \textit{learn\_prover},
-\textit{relearn\_isar}, and \textit{relearn\_prover} (\S\ref{sledgehammer}),
-this option specifies the maximum number of facts from the background library
-that should be learned ($\infty$ by default).
+Typical values lie between 50 and 1000.
 
 \opdefault{fact\_thresholds}{float\_pair}{\upshape 0.45~0.85}
 Specifies the thresholds above which facts are considered relevant by the
--- a/src/HOL/Tools/Sledgehammer/sledgehammer_mash.ML	Wed May 28 17:42:33 2014 +0200
+++ b/src/HOL/Tools/Sledgehammer/sledgehammer_mash.ML	Wed May 28 17:42:34 2014 +0200
@@ -140,7 +140,7 @@
 fun mash_engine () =
   let val flag1 = Options.default_string @{system_option MaSh} in
     (case if flag1 <> "none" (* default *) then flag1 else getenv "MASH" of
-      "yes" => SOME MaSh_Py
+      "yes" => SOME MaSh_SML_NB
     | "py" => SOME MaSh_Py
     | "sml" => SOME MaSh_SML_NB
     | "sml_knn" => SOME MaSh_SML_kNN
@@ -149,10 +149,10 @@
   end
 
 val is_mash_enabled = is_some o mash_engine
-val the_mash_engine = the_default MaSh_Py o mash_engine
+val the_mash_engine = the_default MaSh_SML_NB o mash_engine
 
 
-(*** Low-level communication with Python version of MaSh ***)
+(*** Low-level communication with the Python version of MaSh ***)
 
 val save_models_arg = "--saveModels"
 val shutdown_server_arg = "--shutdownServer"
@@ -1634,8 +1634,7 @@
           SOME ff => (ff <> mepoN andalso maybe_learn (), ff)
         | NONE =>
           if is_mash_enabled () then
-            (maybe_learn (),
-             if mash_can_suggest_facts ctxt overlord then meshN else mepoN)
+            (maybe_learn (), if mash_can_suggest_facts ctxt overlord then meshN else mepoN)
           else
             (false, mepoN))