

\documentclass[a4paper,12pt]{article}
\usepackage[T1]{fontenc}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage[english,french]{babel}
\usepackage{color}
\usepackage{footmisc}
\usepackage{graphicx}
%\usepackage{mathpazo}
\usepackage{multicol}
\usepackage{stmaryrd}
%\usepackage[scaled=.85]{beramono}
\usepackage{../../lib/texinputs/isabelle,../iman,../pdfsetup}

%\oddsidemargin=4.6mm
%\evensidemargin=4.6mm
%\textwidth=150mm
%\topmargin=4.6mm
%\headheight=0mm
%\headsep=0mm
%\textheight=234mm

\def\Colon{\mathord{:\mkern-1.5mu:}}
%\def\lbrakk{\mathopen{\lbrack\mkern-3.25mu\lbrack}}
%\def\rbrakk{\mathclose{\rbrack\mkern-3.255mu\rbrack}}
\def\lparr{\mathopen{(\mkern-4mu\mid}}
\def\rparr{\mathclose{\mid\mkern-4mu)}}

\def\unk{{?}}
\def\undef{(\lambda x.\; \unk)}
%\def\unr{\textit{others}}
\def\unr{\ldots}
\def\Abs#1{\hbox{\rm{\flqq}}{\,#1\,}\hbox{\rm{\frqq}}}
\def\Q{{\smash{\lower.2ex\hbox{$\scriptstyle?$}}}}

\urlstyle{tt}

\begin{document}

\selectlanguage{english}

\title{\includegraphics[scale=0.5]{isabelle_sledgehammer} \\[4ex]
Hammering Away \\[\smallskipamount]
\Large A User's Guide to Sledgehammer for Isabelle/HOL}
\author{\hbox{} \\
Jasmin Christian Blanchette \\
{\normalsize Institut f\"ur Informatik, Technische Universit\"at M\"unchen} \\
\hbox{}}

\maketitle

\tableofcontents

\setlength{\parskip}{.7em plus .2em minus .1em}
\setlength{\parindent}{0pt}
\setlength{\abovedisplayskip}{\parskip}
\setlength{\abovedisplayshortskip}{.9\parskip}
\setlength{\belowdisplayskip}{\parskip}
\setlength{\belowdisplayshortskip}{.9\parskip}

% General-purpose enum environment with correct spacing
\newenvironment{enum}%
    {\begin{list}{}{%
        \setlength{\topsep}{.1\parskip}%
        \setlength{\partopsep}{.1\parskip}%
        \setlength{\itemsep}{\parskip}%
        \advance\itemsep by-\parsep}}
    {\end{list}}

\def\pre{\begingroup\vskip0pt plus1ex\advance\leftskip by\leftmargin
\advance\rightskip by\leftmargin}
\def\post{\vskip0pt plus1ex\endgroup}

\def\prew{\pre\advance\rightskip by-\leftmargin}
\def\postw{\post}

\section{Introduction}
\label{introduction}

Sledgehammer is a tool that applies first-order automatic theorem provers (ATPs)
and satisfiability-modulo-theories (SMT) solvers to the current goal. The
supported ATPs are E \cite{schulz-2002}, SPASS \cite{weidenbach-et-al-2009},
Vampire \cite{riazanov-voronkov-2002}, SInE-E \cite{sine}, SNARK
\cite{snark}, and ToFoF-E \cite{tofof}. The ATPs are run either locally or
remotely via the System\-On\-TPTP web service \cite{sutcliffe-2000}. In addition
to the ATPs, the SMT solver Z3 \cite{z3} is used by default, and you can tell
Sledgehammer to try Yices \cite{yices} and CVC3 \cite{cvc3} as well; these
are run either locally or on a server in Munich.

The problem passed to the automatic provers consists of your current goal
together with a heuristic selection of hundreds of facts (theorems) from the
current theory context, filtered by relevance. Because jobs are run in the
background, you can continue to work on your proof by other means. Provers can
be run in parallel. Any reply (which may arrive half a minute later) will appear
in the Proof General response buffer.

The result of a successful proof search is some source text that usually (but
not always) reconstructs the proof within Isabelle. For ATPs, the reconstructed
proof relies on the general-purpose Metis prover \cite{metis}, which is fully
integrated into Isabelle/HOL, with explicit inferences going through the kernel.
Thus its results are correct by construction.

In this manual, we will explicitly invoke the \textbf{sledgehammer} command.
Sledgehammer also provides an automatic mode that can be enabled via the
``Auto Sledgehammer'' option from the ``Isabelle'' menu in Proof General. In
this mode, Sledgehammer is run on every newly entered theorem. The time limit
for Auto Sledgehammer and other automatic tools can be set using the ``Auto
Tools Time Limit'' option.

\newbox\boxA
\setbox\boxA=\hbox{\texttt{nospam}}

\newcommand\authoremail{\texttt{blan{\color{white}nospam}\kern-\wd\boxA{}chette@\allowbreak
in.\allowbreak tum.\allowbreak de}}

To run Sledgehammer, you must make sure that the theory \textit{Sledgehammer} is
imported---this is rarely a problem in practice since it is part of
\textit{Main}. Examples of Sledgehammer use can be found in Isabelle's
\texttt{src/HOL/Metis\_Examples} directory.
Comments and bug reports concerning Sledgehammer or this manual should be
directed to \authoremail.

\vskip2.5\smallskipamount

%\textbf{Acknowledgment.} The author would like to thank Mark Summerfield for
%suggesting several textual improvements.

\section{Installation}
\label{installation}

Sledgehammer is part of Isabelle, so you don't need to install it. However, it
relies on third-party automatic theorem provers (ATPs) and SMT solvers.

\subsection{Installing ATPs}

Currently, E, SPASS, and Vampire can be run locally; in addition, E, Vampire,
SInE-E, SNARK, and ToFoF-E are available remotely via System\-On\-TPTP
\cite{sutcliffe-2000}. If you want better performance, you should at least
install E and SPASS locally.

There are three main ways to install ATPs on your machine:

\begin{enum}
\item[$\bullet$] If you installed an official Isabelle package with everything
inside, it should already include properly set up executables for E and SPASS,
ready to use.%
\footnote{Vampire's license prevents us from doing the same for this otherwise
wonderful tool.}

\item[$\bullet$] Alternatively, you can download the Isabelle-aware E and SPASS
binary packages from Isabelle's download page. Extract the archives, then add a
line to your \texttt{\$ISABELLE\_HOME\_USER/etc/components}%
\footnote{The variable \texttt{\$ISABELLE\_HOME\_USER} is set by Isabelle at
startup. Its value can be retrieved by invoking \texttt{isabelle}
\texttt{getenv} \texttt{ISABELLE\_HOME\_USER} on the command line.}
file with the absolute
path to E or SPASS. For example, if the \texttt{components} file does not exist yet
and you extracted SPASS to \texttt{/usr/local/spass-3.7}, create the
\texttt{components} file with the single line

\prew
\texttt{/usr/local/spass-3.7}
\postw

in it.

\item[$\bullet$] If you prefer to build E or SPASS yourself, or obtained a
Vampire executable from somewhere (e.g., \url{http://www.vprover.org/}),
set the environment variable \texttt{E\_HOME}, \texttt{SPASS\_HOME}, or
\texttt{VAMPIRE\_HOME} to the directory that contains the \texttt{eproof},
\texttt{SPASS}, or \texttt{vampire} executable. Sledgehammer has been tested
with E 1.0 and 1.2, SPASS 3.5 and 3.7, and Vampire 0.6 and 1.0%
\footnote{Following the rewrite of Vampire, the counter for version numbers was
reset to 0; hence the (new) Vampire versions 0.6 and 1.0 are more recent than,
say, Vampire 11.5.}%
. Since the ATPs' output formats are neither documented nor stable, other
versions of the ATPs might or might not work well with Sledgehammer. Ideally,
also set \texttt{E\_VERSION}, \texttt{SPASS\_VERSION}, or
\texttt{VAMPIRE\_VERSION} to the ATP's version number (e.g., ``1.2'').
\end{enum}
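
As an illustration of the last approach, the variables for a self-built E might
be set as shown below. The installation path is hypothetical, and placing the
assignments in your \texttt{\$ISABELLE\_HOME\_USER/etc/settings} file (rather
than in the environment in which Isabelle is launched) is an assumption; adapt
both to your system.

\prew
\texttt{E\_HOME=/opt/E/PROVER} \\
\texttt{E\_VERSION=1.2}
\postw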

To check whether E and SPASS are successfully installed, follow the example in
\S\ref{first-steps}. If the remote versions of E and SPASS are used (identified
by the prefix ``\emph{remote\_}''), or if the local versions fail to solve the
easy goal presented there, this is a sign that something is wrong with your
installation.

Remote ATP invocation via the System\-On\-TPTP web service requires Perl with the
World Wide Web Library (\texttt{libwww-perl}) installed. If you must use a proxy
server to access the Internet, set the \texttt{http\_proxy} environment variable
to the proxy, either in the environment in which Isabelle is launched or in your
\texttt{\$ISABELLE\_HOME\_USER/etc/settings} file. Here are a few examples:

\prew
\texttt{http\_proxy=http://proxy.example.org} \\
\texttt{http\_proxy=http://proxy.example.org:8080} \\
\texttt{http\_proxy=http://joeblow:pAsSwRd@proxy.example.org}
\postw

\subsection{Installing SMT Solvers}

CVC3, Yices, and Z3 can be run locally or remotely on a Munich server. If you
want better performance and the ability to replay proofs that rely on the
\emph{smt} proof method, you should at least install Z3 locally.

There are two main ways of installing SMT solvers locally:

\begin{enum}
\item[$\bullet$] If you installed an official Isabelle package with everything
inside, it should already include properly set up executables for CVC3 and Z3,
ready to use.%
\footnote{Yices's license prevents us from doing the same for this otherwise
wonderful tool.}
For Z3, you additionally need to set the environment variable
\texttt{Z3\_NON\_COMMERCIAL} to ``yes'' to confirm that you are a noncommercial
user.

\item[$\bullet$] Otherwise, follow the instructions documented in the \emph{SMT}
theory (\texttt{\$ISABELLE\_HOME/src/HOL/SMT.thy}).
\end{enum}
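
For example, the noncommercial-use confirmation for Z3 could be recorded with a
line such as the following. Putting it in your
\texttt{\$ISABELLE\_HOME\_USER/etc/settings} file is an assumption made for
illustration; setting the variable in the environment in which Isabelle is
launched should serve the same purpose.

\prew
\texttt{Z3\_NON\_COMMERCIAL=yes}
\postw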

\section{First Steps}
\label{first-steps}

To illustrate Sledgehammer in context, let us start a theory file and
attempt to prove a simple lemma:

\prew
\textbf{theory}~\textit{Scratch} \\
\textbf{imports}~\textit{Main} \\
\textbf{begin} \\[2\smallskipamount]
%
\textbf{lemma} ``$[a] = [b] \,\longleftrightarrow\, a = b$'' \\
\textbf{sledgehammer}
\postw

Instead of issuing the \textbf{sledgehammer} command, you can also find
Sledgehammer in the ``Commands'' submenu of the ``Isabelle'' menu in Proof
General or press the Emacs key sequence C-c C-a C-s.
Either way, Sledgehammer produces the following output after a few seconds:

\prew
\slshape
Sledgehammer: ``\textit{e}'' for subgoal 1: \\
$([a] = [b]) = (a = b)$ \\
Try this command: \textbf{by} (\textit{metis hd.simps}). \\
To minimize the number of lemmas, try this: \\
\textbf{sledgehammer} \textit{minimize} [\textit{prover} = \textit{e}] (\textit{hd.simps}). \\[3\smallskipamount]
%
Sledgehammer: ``\textit{spass}'' for subgoal 1: \\
$([a] = [b]) = (a = b)$ \\
Try this command: \textbf{by} (\textit{metis insert\_Nil last\_ConsL}). \\
To minimize the number of lemmas, try this: \\
\textbf{sledgehammer} \textit{minimize} [\textit{prover} = \textit{spass}] (\textit{insert\_Nil last\_ConsL}). \\[3\smallskipamount]
%
Sledgehammer: ``\textit{vampire}'' for subgoal 1: \\
$([a] = [b]) = (a = b)$ \\
Try this command: \textbf{by} (\textit{metis eq\_commute last\_snoc}) \\
To minimize the number of lemmas, try this: \\
\textbf{sledgehammer} \textit{minimize} [\textit{prover} = \textit{vampire}]~(\textit{eq\_commute last\_snoc}). \\[3\smallskipamount]
%
Sledgehammer: ``\textit{remote\_sine\_e}'' for subgoal 1: \\
$([a] = [b]) = (a = b)$ \\
Try this command: \textbf{by} (\textit{metis hd.simps}) \\
To minimize the number of lemmas, try this: \\
\textbf{sledgehammer} \textit{minimize} [\textit{prover} = \textit{remote\_sine\_e}]~(\textit{hd.simps}). \\[3\smallskipamount]
%
Sledgehammer: ``\textit{remote\_z3}'' for subgoal 1: \\
$([a] = [b]) = (a = b)$ \\
Try this command: \textbf{by} (\textit{metis hd.simps}) \\
To minimize the number of lemmas, try this: \\
\textbf{sledgehammer} \textit{minimize} [\textit{prover} = \textit{remote\_z3}]~(\textit{hd.simps}).
\postw

Sledgehammer ran E, SPASS, Vampire, SInE-E, and Z3 in parallel. Depending on
which provers are installed and how many processor cores are available, some of
the provers might be missing or present with a \textit{remote\_} prefix.

For each successful prover, Sledgehammer gives a one-liner proof that uses the
\textit{metis} or \textit{smt} method. You can click the proof to insert it into
the theory text. You can click the ``\textbf{sledgehammer} \textit{minimize}''
command if you want to look for a shorter (and probably faster) proof. But here
the proof found by E looks perfect, so click it to finish the proof.

You can ask Sledgehammer for an Isar text proof by passing the
\textit{isar\_proof} option:

\prew
\textbf{sledgehammer} [\textit{isar\_proof}]
\postw

When Isar proof construction is successful, it can yield proofs that are more
readable and also faster than the \textit{metis} one-liners. This feature is
experimental and is only available for ATPs.

\section{Hints}
\label{hints}

\newcommand\point[1]{{\sl\bfseries#1}\par\nopagebreak}

\point{Presimplify the goal}

For best results, first simplify your problem by calling \textit{auto} or at
least \textit{safe} followed by \textit{simp\_all}. None of the ATPs contain
arithmetic decision procedures. They are not especially good at heavy rewriting,
but because they regard equations as undirected, they often prove theorems that
require the reverse orientation of a \textit{simp} rule. Higher-order problems
can be tackled, but the success rate is better for first-order problems. Hence,
you may get better results if you first simplify the problem to remove
higher-order features.
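
As a sketch, a goal might be presimplified before Sledgehammer is invoked as
follows. The lemma statement is a placeholder, and whether \textit{auto} or the
combination of \textit{safe} and \textit{simp\_all} is preferable depends on the
goal.

\prew
\textbf{lemma} ``$\ldots$'' \\
\textbf{apply} \textit{safe} \\
\textbf{apply} \textit{simp\_all} \\
\textbf{sledgehammer}
\postw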

\point{Make sure at least E, SPASS, Vampire, and Z3 are installed}

Locally installed provers are faster and more reliable than those running on
servers. See \S\ref{installation} for details on how to install them.

\point{Familiarize yourself with the most important options}

Sledgehammer's options are fully documented in \S\ref{command-syntax}. Many of
the options are very specialized, but serious users of the tool should at least
familiarize themselves with the following options:

\begin{enum}
\item[$\bullet$] \textbf{\textit{provers}} specifies the ATP and SMT solvers to
use (e.g., ``\textit{provers} = \textit{e spass remote\_vampire}'').

\item[$\bullet$] \textbf{\textit{timeout}} controls the time limit. By default, it is set to
30 seconds, but since Sledgehammer runs asynchronously you should not hesitate
to crank up this limit to 60 or 120 seconds if you are the kind of user who can
think clearly while ATPs are active.

\item[$\bullet$] \textbf{\textit{full\_types}} specifies whether type-sound
encodings should be used. By default, Sledgehammer employs a mixture of
type-sound and type-unsound encodings, occasionally yielding unsound ATP proofs.
(SMT solver proofs should always be sound, although we occasionally find
soundness bugs in the solvers.)

\item[$\bullet$] \textbf{\textit{max\_relevant}} specifies the maximum number of
facts that should be passed to the provers. By default, the value is
prover-dependent but varies between about 150 and 1000. If the provers time out,
you can try lowering this value to, say, 100 or 50 and see if that helps.

\item[$\bullet$] \textbf{\textit{isar\_proof}} specifies that Isar proofs should
be generated, instead of one-liner Metis proofs. The length of the Isar proofs
can be controlled by setting \textit{isar\_shrink\_factor}.
\end{enum}

Options can be set globally using \textbf{sledgehammer\_params}. Fact selection
can be influenced by specifying ``$(\textit{add}{:}~\textit{some\_facts})$'' after
the \textbf{sledgehammer} call to ensure that certain facts are included, or
simply ``$(\textit{some\_facts})$'' to force Sledgehammer to run only with
$\textit{some\_facts}$.
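
A minimal sketch of both mechanisms is given below. The particular provers,
time limit, and fact bound are arbitrary choices, and \textit{some\_facts} is a
placeholder for facts of your own.

\prew
\textbf{sledgehammer\_params} [\textit{provers} = \textit{e spass}, \,\textit{timeout} = 60] \\
\textbf{sledgehammer} [\textit{max\_relevant} = 100] (\textit{add}:\ \textit{some\_facts})
\postw

The first line changes the defaults for all subsequent invocations; the second
overrides \textit{max\_relevant} for a single call and asks the relevance filter
to include \textit{some\_facts}.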

\section{Frequently Asked Questions}
\label{frequently-asked-questions}

\point{Why does Metis fail to reconstruct the proof?}

There can be many reasons. If Metis runs seemingly forever, that's a sign that
the proof is too difficult for it. Metis is complete, so it should eventually
find it, but that's little consolation. There are several possible solutions:

\begin{enum}
\item[$\bullet$] Try the \textit{isar\_proof} option to obtain a step-by-step
Isar proof where each step is justified by Metis. Since the steps are fairly
small, Metis is more likely to be able to replay them.

\item[$\bullet$] Try the \textit{smt} proof method instead of \textit{metis}. It
is usually stronger, but you need to have Z3 available to replay the proofs,
trust the SMT solver, or use certificates. See the documentation in the
\emph{SMT} theory (\texttt{\$ISABELLE\_HOME/src/HOL/SMT.thy}) for details.

\item[$\bullet$] Try the \textit{blast} or \textit{auto} proof methods, passing
facts via \textbf{unfolding}, \textbf{using}, \textit{intro}{:},
\textit{elim}{:}, \textit{dest}{:}, or \textit{simp}{:}, as appropriate.
\end{enum}
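
The last two alternatives might look as follows, where each line is a separate
attempt and \textit{some\_facts} stands for the facts listed in the failing
\textit{metis} call. This is only a sketch; whether a given method succeeds
depends on the goal.

\prew
\textbf{by} (\textit{smt} \textit{some\_facts}) \\
\textbf{using} \textit{some\_facts} \textbf{by} \textit{blast} \\
\textbf{by} (\textit{auto} \textit{simp}:\ \textit{some\_facts})
\postw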

%    * sometimes Metis runs into some error, e.g. a type error. then it tries
%      again with metisFT, where FT stands for ``full type information'
%    * metisFT is much slower, but its proof search is fully typed, and it also
%      includes more powerful rules such as the axiom ``$x = \mathit{True}
%      \mathrel{\lor} x = \mathit{False}$'' for reasoning in higher-order places
%      (e.g., in set comprehensions)
%
%    * finally, in some cases the ATP proof is simply type-incorrect.
%      Sledgehammer drops some type information to speed up the search. Try
%      Sledgehammer again with full type information: \textit{full\_types}
%      (\S\ref{problem-encoding}), or choose a specific type encoding with
%      \textit{type\_sys} (\S\ref{problem-encoding}). Older versions of
%      Sledgehammer were frequent victims of this problem. Now this should very
%      seldom be an issue, but if you notice too many unsound proofs, contact
%
%\point{How can I easily tell whether a Sledgehammer proof is sound?}
%
%Easiest way: Once it's found: ... by (metis facts)
%try
%sledgehammer [full\_types] (facts)
%
%should usually give unprovable or refind the proof fairly quickly
%
%Same trick if you believe that there exists a proof with certain facts.
%
%\point{Which facts does Sledgehammer select?}
%
%    * heuristic
%    * and several hundreds
%    * show them: debug
%    * influence it with sledgehammer (add: xxx)
%
%    * S/h good at finding short proofs combining a handful of existing lemmas
%    * for deeper proofs, you must restrict the number of facts, e.g.
%      max\_relevant = 50
%    * but then proof reconstruction is an issue
%
%\point{Why are the Isar proofs generated by Sledgehammer so ugly?}
%
%    * experimental
%    * working on this
%    * there is a large body of research into transforming resolution proofs into
%      natural deduction proofs (e.g., Isar proofs)
%    * meantime: isar\_shrink\_factor
%
%
%\point{Should I let Sledgehammer minimize the number of lemmas?}
%
%    * in general, yes
%    * proofs involving fewer lemmas tend to be shorter as well, and hence easier
%      to re-find by Metis
%    * but the opposite is sometimes the case

\point{I got a strange error from Sledgehammer---what should I do?}

Sledgehammer tries to give informative error messages. Please report any strange
error to \authoremail. This applies doubly if you get the message

\begin{quote}
\slshape
The prover found a type-unsound proof even though a supposedly type-sound
encoding was used (or, very unlikely, your axioms are inconsistent). You
might want to report this to the Isabelle developers.
\end{quote}

\point{Auto can solve it---why not Sledgehammer?}

Problems can be easy for \textit{auto} and difficult for automatic provers, but
the reverse is also true, so don't be discouraged if your first attempts fail.
Because the system refers to all theorems known to Isabelle, it is particularly
suitable when your goal has a short proof from lemmas that you don't know about.

\section{Command Syntax}
\label{command-syntax}

Sledgehammer can be invoked at any point when there is an open goal by entering
the \textbf{sledgehammer} command in the theory file. Its general syntax is as
follows:

\prew
\textbf{sledgehammer} \textit{subcommand\/$^?$ options\/$^?$ facts\_override\/$^?$ num\/$^?$}
\postw

For convenience, Sledgehammer is also available in the ``Commands'' submenu of
the ``Isabelle'' menu in Proof General or by pressing the Emacs key sequence C-c
C-a C-s. This is equivalent to entering the \textbf{sledgehammer} command with
no arguments in the theory text.

In the general syntax, the \textit{subcommand} may be any of the following:

\begin{enum}
\item[$\bullet$] \textbf{\textit{run} (the default):} Runs Sledgehammer on
subgoal number \textit{num} (1 by default), with the given options and facts.

\item[$\bullet$] \textbf{\textit{minimize}:} Attempts to minimize the provided facts
(specified in the \textit{facts\_override} argument) to obtain a simpler proof
involving fewer facts. The options and goal number are as for \textit{run}.

\item[$\bullet$] \textbf{\textit{messages}:} Redisplays recent messages issued
by Sledgehammer. This allows you to examine results that might have been lost
due to Sledgehammer's asynchronous nature. The \textit{num} argument specifies a
limit on the number of messages to display (5 by default).

\item[$\bullet$] \textbf{\textit{supported\_provers}:} Prints the list of
automatic provers supported by Sledgehammer. See \S\ref{installation} and
\S\ref{mode-of-operation} for more information on how to install automatic
provers.

\item[$\bullet$] \textbf{\textit{running\_provers}:} Prints information about
currently running automatic provers, including elapsed runtime and remaining
time until timeout.

\item[$\bullet$] \textbf{\textit{kill\_provers}:} Terminates all running
automatic provers.

\item[$\bullet$] \textbf{\textit{refresh\_tptp}:} Refreshes the list of remote
ATPs available at System\-On\-TPTP \cite{sutcliffe-2000}.
\end{enum}
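
For illustration, some of the subcommands might be invoked as follows. The
prover, subgoal number, and message limit are arbitrary choices, and
\textit{some\_facts} stands for whichever facts a previous run reported.

\prew
\textbf{sledgehammer} \textit{run} [\textit{provers} = \textit{e}] 2 \\
\textbf{sledgehammer} \textit{minimize} [\textit{prover} = \textit{e}] (\textit{some\_facts}) \\
\textbf{sledgehammer} \textit{messages} 10 \\
\textbf{sledgehammer} \textit{running\_provers} \\
\textbf{sledgehammer} \textit{kill\_provers}
\postw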

Sledgehammer's behavior can be influenced by various \textit{options}, which can
be specified in brackets after the \textbf{sledgehammer} command. The
\textit{options} are a list of key--value pairs of the form ``[$k_1 = v_1,
\ldots, k_n = v_n$]''. For Boolean options, ``= \textit{true}'' is optional. For
example:

\prew
\textbf{sledgehammer} [\textit{isar\_proof}, \,\textit{timeout} = 120$\,s$]
\postw

Default values can be set using \textbf{sledgehammer\_\allowbreak params}:

\prew
\textbf{sledgehammer\_params} \textit{options}
\postw

The supported options are described in \S\ref{option-reference}.

The \textit{facts\_override} argument lets you alter the set of facts that go
through the relevance filter. It may be of the form ``(\textit{facts})'', where
\textit{facts} is a space-separated list of Isabelle facts (theorems, local
assumptions, etc.), in which case the relevance filter is bypassed and the given
facts are used. It may also be of the form ``(\textit{add}:\ \textit{facts}$_1$)'',
``(\textit{del}:\ \textit{facts}$_2$)'', or ``(\textit{add}:\ \textit{facts}$_1$\
\textit{del}:\ \textit{facts}$_2$)'', where the relevance filter is instructed to
proceed as usual except that it should consider \textit{facts}$_1$
highly relevant and \textit{facts}$_2$ fully irrelevant.
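
As a sketch, with \textit{foo}, \textit{bar}, and \textit{baz} as placeholder
fact names, the forms described above look like this:

\prew
\textbf{sledgehammer} (\textit{foo} \textit{bar}) \\
\textbf{sledgehammer} (\textit{add}:\ \textit{foo}) \\
\textbf{sledgehammer} (\textit{add}:\ \textit{foo} \textit{bar}\ \textit{del}:\ \textit{baz})
\postw

The first call bypasses the relevance filter and uses only \textit{foo} and
\textit{bar}; the other two leave the filter in charge while adjusting how it
treats the named facts.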

You can instruct Sledgehammer to run automatically on newly entered theorems by
enabling the ``Auto Sledgehammer'' option from the ``Isabelle'' menu in Proof
General. For automatic runs, only the first prover set using \textit{provers}
(\S\ref{mode-of-operation}) is considered, fewer facts are passed to the prover,
\textit{slicing} (\S\ref{mode-of-operation}) is disabled, \textit{timeout}
(\S\ref{mode-of-operation}) is superseded by the ``Auto Tools Time Limit'' in
Proof General's ``Isabelle'' menu, \textit{full\_types}
(\S\ref{problem-encoding}) is enabled, and \textit{verbose}
(\S\ref{output-format}) and \textit{debug} (\S\ref{output-format}) are disabled.
Sledgehammer's output is also more concise.

\section{Option Reference}
\label{option-reference}

\def\flushitem#1{\item[]\noindent\kern-\leftmargin \textbf{#1}}
\def\qty#1{$\left<\textit{#1}\right>$}
\def\qtybf#1{$\mathbf{\left<\textbf{\textit{#1}}\right>}$}
\def\optrue#1#2{\flushitem{\textit{#1} $\bigl[$= \qtybf{bool}$\bigr]$\quad [\textit{true}]\hfill (neg.: \textit{#2})}\nopagebreak\\[\parskip]}
\def\opfalse#1#2{\flushitem{\textit{#1} $\bigl[$= \qtybf{bool}$\bigr]$\quad [\textit{false}]\hfill (neg.: \textit{#2})}\nopagebreak\\[\parskip]}
\def\opsmart#1#2{\flushitem{\textit{#1} $\bigl[$= \qtybf{bool\_or\_smart}$\bigr]$\quad [\textit{smart}]\hfill (neg.: \textit{#2})}\nopagebreak\\[\parskip]}
\def\opsmartx#1#2{\flushitem{\textit{#1} $\bigl[$= \qtybf{bool\_or\_smart}$\bigr]$\quad [\textit{smart}]\hfill\\\hbox{}\hfill (neg.: \textit{#2})}\nopagebreak\\[\parskip]}
\def\opnodefault#1#2{\flushitem{\textit{#1} = \qtybf{#2}} \nopagebreak\\[\parskip]}
\def\opdefault#1#2#3{\flushitem{\textit{#1} = \qtybf{#2}\quad [\textit{#3}]} \nopagebreak\\[\parskip]}
\def\oparg#1#2#3{\flushitem{\textit{#1} \qtybf{#2} = \qtybf{#3}} \nopagebreak\\[\parskip]}
\def\opargbool#1#2#3{\flushitem{\textit{#1} \qtybf{#2} $\bigl[$= \qtybf{bool}$\bigr]$\hfill (neg.: \textit{#3})}\nopagebreak\\[\parskip]}
\def\opargboolorsmart#1#2#3{\flushitem{\textit{#1} \qtybf{#2} $\bigl[$= \qtybf{bool\_or\_smart}$\bigr]$\hfill (neg.: \textit{#3})}\nopagebreak\\[\parskip]}

Sledgehammer's options are categorized as follows:\ mode of operation
(\S\ref{mode-of-operation}), problem encoding (\S\ref{problem-encoding}),
relevance filter (\S\ref{relevance-filter}), output format
(\S\ref{output-format}), and authentication (\S\ref{authentication}).

The descriptions below refer to the following syntactic quantities:

\begin{enum}
\item[$\bullet$] \qtybf{string}: A string.
\item[$\bullet$] \qtybf{bool\/}: \textit{true} or \textit{false}.
\item[$\bullet$] \qtybf{bool\_or\_smart\/}: \textit{true}, \textit{false}, or
\textit{smart}.
\item[$\bullet$] \qtybf{int\/}: An integer.
%\item[$\bullet$] \qtybf{float\/}: A floating-point number (e.g., 2.5).
\item[$\bullet$] \qtybf{float\_pair\/}: A pair of floating-point numbers
(e.g., 0.6 0.95).
\item[$\bullet$] \qtybf{int\_or\_smart\/}: An integer or \textit{smart}.
\item[$\bullet$] \qtybf{float\_or\_none\/}: An integer (e.g., 60) or
floating-point number (e.g., 0.5) expressing a number of seconds, or the keyword
\textit{none} ($\infty$ seconds).
\end{enum}

Default values are indicated in square brackets. Boolean options have a negated
counterpart (e.g., \textit{blocking} vs.\ \textit{non\_blocking}). When setting
Boolean options, ``= \textit{true}'' may be omitted.
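
For example, under the rule just stated, the first two invocations below should
be interchangeable, and the third selects the negated counterpart. This is a
sketch based only on the option names given in this section.

\prew
\textbf{sledgehammer} [\textit{blocking}] \\
\textbf{sledgehammer} [\textit{blocking} = \textit{true}] \\
\textbf{sledgehammer} [\textit{non\_blocking}]
\postw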

\subsection{Mode of Operation}
\label{mode-of-operation}

\begin{enum}
\opnodefault{provers}{string}
Specifies the automatic provers to use as a space-separated list (e.g.,
``\textit{e}~\textit{spass}''). The following provers are supported:

\begin{enum}
\item[$\bullet$] \textbf{\textit{e}:} E is an ATP developed by Stephan Schulz
\cite{schulz-2002}. To use E, set the environment variable
\texttt{E\_HOME} to the directory that contains the \texttt{eproof} executable,
or install the prebuilt E package from Isabelle's download page. See
\S\ref{installation} for details.

\item[$\bullet$] \textbf{\textit{spass}:} SPASS is an ATP developed by Christoph
Weidenbach et al.\ \cite{weidenbach-et-al-2009}. To use SPASS, set the
environment variable \texttt{SPASS\_HOME} to the directory that contains the
\texttt{SPASS} executable, or install the prebuilt SPASS package from Isabelle's
download page. Sledgehammer requires version 3.5 or above. See
\S\ref{installation} for details.

\item[$\bullet$] \textbf{\textit{vampire}:} Vampire is an ATP developed by
Andrei Voronkov and his colleagues \cite{riazanov-voronkov-2002}. To use
Vampire, set the environment variable \texttt{VAMPIRE\_HOME} to the directory
that contains the \texttt{vampire} executable. Sledgehammer has been tested with
versions 11, 0.6, and 1.0.

\item[$\bullet$] \textbf{\textit{cvc3}:} CVC3 is an SMT solver developed by
Clark Barrett, Cesare Tinelli, and their colleagues \cite{cvc3}. To use CVC3,
set the environment variable \texttt{CVC3\_SOLVER} to the complete path of the
executable, including the file name. Sledgehammer has been tested with version
2.2.

\item[$\bullet$] \textbf{\textit{yices}:} Yices is an SMT solver developed at
SRI \cite{yices}. To use Yices, set the environment variable
\texttt{YICES\_SOLVER} to the complete path of the executable, including the
file name. Sledgehammer has been tested with version 1.0.

\item[$\bullet$] \textbf{\textit{z3}:} Z3 is an SMT solver developed at
Microsoft Research \cite{z3}. To use Z3, set the environment variable
\texttt{Z3\_SOLVER} to the complete path of the executable, including the file
name. Sledgehammer has been tested with versions 2.7 to 2.18.

\item[$\bullet$] \textbf{\textit{z3\_atp}:} This version of Z3 pretends to be an
ATP, exploiting Z3's undocumented support for the TPTP format. It is included
for experimental purposes. It requires version 2.18 or above.

\item[$\bullet$] \textbf{\textit{remote\_e}:} The remote version of E runs
on Geoff Sutcliffe's Miami servers \cite{sutcliffe-2000}.

\item[$\bullet$] \textbf{\textit{remote\_vampire}:} The remote version of
Vampire runs on Geoff Sutcliffe's Miami servers. Version 9 is used.

\item[$\bullet$] \textbf{\textit{remote\_tofof\_e}:} ToFoF-E is a metaprover
developed by Geoff Sutcliffe \cite{tofof} based on E running on his Miami
servers. This ATP supports a fragment of the TPTP many-typed first-order format
(TFF). It is supported primarily for experimenting with the
\textit{type\_sys} $=$ \textit{simple\_types} option (\S\ref{problem-encoding}).

\item[$\bullet$] \textbf{\textit{remote\_sine\_e}:} SInE-E is a metaprover
developed by Kry\v stof Hoder \cite{sine} based on E. The remote version of
SInE runs on Geoff Sutcliffe's Miami servers.

\item[$\bullet$] \textbf{\textit{remote\_snark}:} SNARK is a prover
developed by Stickel et al.\ \cite{snark}. The remote version of
SNARK runs on Geoff Sutcliffe's Miami servers.

\item[$\bullet$] \textbf{\textit{remote\_cvc3}:} The remote version of CVC3 runs
on servers at the TU M\"unchen (or wherever \texttt{REMOTE\_SMT\_URL} is set to
point).

\item[$\bullet$] \textbf{\textit{remote\_z3}:} The remote version of Z3 runs on
servers at the TU M\"unchen (or wherever \texttt{REMOTE\_SMT\_URL} is set to
point).

\item[$\bullet$] \textbf{\textit{remote\_z3\_atp}:} The remote version of ``Z3
as an ATP'' runs on Geoff Sutcliffe's Miami servers.
\end{enum}

By default, Sledgehammer will run E, SPASS, Vampire, SInE-E, and Z3 (or whatever
the SMT module's \textit{smt\_solver} configuration option is set to) in
parallel---either locally or remotely, depending on the number of processor
cores available. For historical reasons, the default value of this option can be
overridden using the option ``Sledgehammer: Provers'' from the ``Isabelle'' menu
in Proof General.

It is a good idea to run several provers in parallel, although it could slow
down your machine. Running E, SPASS, Vampire, and SInE-E together for 5 seconds
yields a better success rate than running the most effective of these (Vampire)
for 120 seconds \cite{boehme-nipkow-2010}.

\opnodefault{prover}{string}
Alias for \textit{provers}.

\opnodefault{atps}{string}
Legacy alias for \textit{provers}.

\opnodefault{atp}{string}
Legacy alias for \textit{provers}.

\opdefault{timeout}{float\_or\_none}{\upshape 30}
Specifies the maximum number of seconds that the automatic provers should spend
searching for a proof. For historical reasons, the default value of this option
can be overridden using the option ``Sledgehammer: Time Limit'' from the
``Isabelle'' menu in Proof General.

\opfalse{blocking}{non\_blocking}
Specifies whether the \textbf{sledgehammer} command should operate
synchronously. The asynchronous (non-blocking) mode lets the user start proving
the putative theorem manually while Sledgehammer looks for a proof, but it can
also be more confusing.

\optrue{slicing}{no\_slicing}
Specifies whether the time allocated to a prover should be sliced into several
segments, each of which has its own set of possibly prover-dependent options.
For SPASS and Vampire, the first slice tries the fast but incomplete
set-of-support (SOS) strategy, whereas the second slice runs without it. For E,
up to three slices are tried, with different weighted search strategies and
numbers of facts. For SMT solvers, several slices are tried with the same options
each time but fewer and fewer facts. According to benchmarks with a timeout of
30 seconds, slicing is a valuable optimization, and you should probably leave it
enabled unless you are conducting experiments. This option is implicitly
disabled for (short) automatic runs.

\nopagebreak
{\small See also \textit{verbose} (\S\ref{output-format}).}

\opfalse{overlord}{no\_overlord}
Specifies whether Sledgehammer should put its temporary files in
\texttt{\$ISA\-BELLE\_\allowbreak HOME\_\allowbreak USER}, which is useful for
debugging Sledgehammer but also unsafe if several instances of the tool are run
simultaneously. The files are identified by the prefix \texttt{prob\_}; you may
safely remove them after Sledgehammer has run.

\nopagebreak
{\small See also \textit{debug} (\S\ref{output-format}).}
\end{enum}

\subsection{Problem Encoding}
\label{problem-encoding}

\begin{enum}
\opfalse{explicit\_apply}{implicit\_apply}
Specifies whether function application should be encoded as an explicit
``apply'' operator in ATP problems. If the option is set to \textit{false}, each
function will be directly applied to as many arguments as possible. Enabling
this option can sometimes help discover higher-order proofs that otherwise would
not be found.

\opfalse{full\_types}{partial\_types}
Specifies whether full type information is encoded in ATP problems. Enabling
this option prevents the discovery of type-incorrect proofs, but it can slow
down the ATP slightly. This option is implicitly enabled for automatic runs. For
historical reasons, the default value of this option can be overridden using the
option ``Sledgehammer: Full Types'' from the ``Isabelle'' menu in Proof General.

\opdefault{type\_sys}{string}{smart}
Specifies the type system to use in ATP problems. The option can take the
following values:

\begin{enum}
\item[$\bullet$] \textbf{\textit{poly\_preds}:} Types are encoded using a predicate
$\mathit{has\_type\/}(\tau, t)$ that restricts the range of bound variables.
Constants are annotated with their types, supplied as extra arguments, to
resolve overloading.

\item[$\bullet$] \textbf{\textit{poly\_tags}:} Each term and subterm is tagged with
its type using a function $\mathit{type\_info\/}(\tau, t)$.

\item[$\bullet$] \textbf{\textit{poly\_args}:}
As with \textit{poly\_preds}, constants are annotated with their types to
resolve overloading, but otherwise no type information is encoded.

\item[$\bullet$] \textbf{\textit{erased}:} No type information is supplied to
the ATP. Types are simply erased.

\item[$\bullet$]
\textbf{%
\textit{mono\_preds},
\textit{mono\_tags},
\textit{mono\_args}:} \\
Similar to \textit{poly\_preds}, \textit{poly\_tags}, and \textit{poly\_args},
respectively, but the problem is additionally monomorphized, meaning that type
variables are instantiated with heuristically chosen ground types.
Monomorphization can simplify reasoning but also leads to larger fact bases,
which can slow down the ATPs.

\item[$\bullet$] \textbf{\textit{simple\_types}:} Use the prover's support for
simply typed first-order logic if available; otherwise, fall back on
\textit{mangled\_preds}. The problem is monomorphized.

\item[$\bullet$]
\textbf{%
\textit{mangled\_preds},
\textit{mangled\_tags},
\textit{mangled\_args}:} \\
Similar to
\textit{mono\_preds}, \textit{mono\_tags}, and \textit{mono\_args},
respectively, but types are mangled in constant names instead of being supplied
as ground term arguments. The binary predicate $\mathit{has\_type\/}(\tau, t)$
becomes a unary predicate $\mathit{has\_type\_}\tau(t)$, and the binary function
$\mathit{type\_info\/}(\tau, t)$ becomes a unary function
$\mathit{type\_info\_}\tau(t)$.

\item[$\bullet$]
\textbf{%
\textit{mono\_preds}?, \textit{mono\_tags}?, \textit{simple\_types}?, \\
\textit{mangled\_preds}?, \textit{mangled\_tags}?:} \\
The type systems \textit{mono\_preds}, \textit{mono\_tags}, \textit{simple\_types},
\textit{mangled\_preds}, and \textit{mangled\_tags} are fully typed and
virtually sound---except for pathological cases, all found proofs are
type-correct. For each of these, Sledgehammer also provides a lighter (but
virtually sound) variant identified by a question mark (`{?}')\ that detects and
erases monotonic types, notably infinite types. (For \textit{simple\_types}, the
types are not actually erased but rather replaced by a shared uniform type of
individuals.)

\item[$\bullet$]
\textbf{%
\textit{poly\_tags}!, \textit{mono\_preds}!, \textit{mono\_tags}!, \\
\textit{simple\_types}!, \textit{mangled\_preds}!, \textit{mangled\_tags}!:} \\
The type systems \textit{poly\_preds}, \textit{poly\_tags},
\textit{mono\_preds}, \textit{mono\_tags}, \textit{simple\_types},
\textit{mangled\_preds}, and \textit{mangled\_tags} also admit a somewhat
unsound (but very efficient) variant identified by an exclamation mark (`{!}')
that detects and erases all types except those that are clearly finite
(e.g., \textit{bool}). (For \textit{simple\_types}, the types are not actually
erased but rather replaced by a shared uniform type of individuals.)

\item[$\bullet$] \textbf{\textit{smart}:} If \textit{full\_types} is enabled,
uses a fully typed, virtually sound encoding; otherwise, uses any encoding. The
actual encoding used depends on the ATP and should be the most efficient for
that ATP.
\end{enum}

For SMT solvers and ToFoF-E, the type system is always \textit{simple\_types}.
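
As a schematic illustration (a simplified sketch, not the literal output sent to
the provers), consider a fact of the form
$\forall x \Colon \mathit{nat}.\; p(x)$, where $p$ and $\mathit{nat}$ are
placeholders chosen for this example. Ignoring the type annotations on
constants, the encodings described above would treat the bound variable roughly
as follows:
\begin{align*}
\textit{poly\_preds}: &\quad \forall x.\; \mathit{has\_type\/}(\mathit{nat}, x) \longrightarrow p(x) \\
\textit{poly\_tags}: &\quad \forall x.\; p(\mathit{type\_info\/}(\mathit{nat}, x)) \\
\textit{mangled\_preds}: &\quad \forall x.\; \mathit{has\_type\_nat}(x) \longrightarrow p(x) \\
\textit{mangled\_tags}: &\quad \forall x.\; p(\mathit{type\_info\_nat}(x)) \\
\textit{erased}: &\quad \forall x.\; p(x)
\end{align*}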

\opdefault{max\_mono\_iters}{int}{\upshape 5}
Specifies the maximum number of iterations for the monomorphization fixpoint
construction. The higher this limit is, the more monomorphic instances are
potentially generated. Whether monomorphization takes place depends on the
type system used.

\opdefault{max\_new\_mono\_instances}{int}{\upshape 250}
Specifies the maximum number of monomorphic instances to generate beyond
\textit{max\_relevant}. The higher this limit is, the more monomorphic instances
are potentially generated. Whether monomorphization takes place depends on the
type system used.
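
For instance, assuming the bracketed option syntax of the \textbf{sledgehammer}
command, a monomorphizing type system could be combined with tighter limits as
sketched below. The values 3 and 100 are arbitrary illustrations, not
recommendations:

\begin{quote}
\textbf{sledgehammer} [\textit{type\_sys} = \textit{mono\_preds}, \textit{max\_mono\_iters} = 3, \\
\hbox{}\quad \textit{max\_new\_mono\_instances} = 100]
\end{quote}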
\end{enum}

\subsection{Relevance Filter}
\label{relevance-filter}

\begin{enum}
\opdefault{relevance\_thresholds}{float\_pair}{\upshape 0.45~0.85}
Specifies the thresholds above which facts are considered relevant by the
relevance filter. The first threshold is used for the first iteration of the
relevance filter and the second threshold is used for the last iteration (if it
is reached). The effective threshold is quadratically interpolated for the other
iterations. Each threshold ranges from 0 to 1, where 0 means that all theorems
are relevant and 1 means that only theorems that refer to previously seen
constants are relevant.

\opsmart{max\_relevant}{int\_or\_smart}
Specifies the maximum number of facts that may be returned by the relevance
filter. If the option is set to \textit{smart}, it is set to a value that was
empirically found to be appropriate for the prover. A typical value would be
300.
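
The following sketch, which assumes the bracketed option syntax of the
\textbf{sledgehammer} command and a space-separated notation for the pair of
thresholds, merely spells out the default thresholds and the typical fact limit
mentioned above. Lowering the thresholds toward 0 would let more facts through
the filter:

\begin{quote}
\textbf{sledgehammer} [\textit{relevance\_thresholds} = 0.45~0.85, \textit{max\_relevant} = 300]
\end{quote}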

\end{enum}

\subsection{Output Format}
\label{output-format}

\begin{enum}

\opfalse{verbose}{quiet}
Specifies whether the \textbf{sledgehammer} command should explain what it does.
This option is implicitly disabled for automatic runs.

\opfalse{debug}{no\_debug}
Specifies whether Sledgehammer should display additional debugging information
beyond what \textit{verbose} already displays. Enabling \textit{debug} also
enables \textit{verbose} and \textit{blocking} (\S\ref{mode-of-operation})
behind the scenes. The \textit{debug} option is implicitly disabled for
automatic runs.

\nopagebreak
{\small See also \textit{overlord} (\S\ref{mode-of-operation}).}

\opfalse{isar\_proof}{no\_isar\_proof}
Specifies whether Isar proofs should be output in addition to one-liner
\textit{metis} proofs. Isar proof construction is still experimental and often
fails; however, when it succeeds, the resulting Isar proofs are usually faster
and sometimes more robust than \textit{metis} proofs.

\opdefault{isar\_shrink\_factor}{int}{\upshape 1}
Specifies the granularity of the Isar proof. A value of $n$ indicates that each
Isar proof step should correspond to a group of up to $n$ consecutive proof
steps in the ATP proof.
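
As a sketch, assuming the bracketed option syntax of the \textbf{sledgehammer}
command, the following invocation would request an Isar proof in which each
step covers up to two consecutive ATP proof steps (the value 2 is an arbitrary
illustration):

\begin{quote}
\textbf{sledgehammer} [\textit{isar\_proof}, \textit{isar\_shrink\_factor} = 2]
\end{quote}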

\end{enum}

\subsection{Authentication}
\label{authentication}

\begin{enum}
\opnodefault{expect}{string}
Specifies the expected outcome, which must be one of the following:

\begin{enum}
\item[$\bullet$] \textbf{\textit{some}:} Sledgehammer found a (potentially
unsound) proof.
\item[$\bullet$] \textbf{\textit{none}:} Sledgehammer found no proof.
\item[$\bullet$] \textbf{\textit{unknown}:} Sledgehammer encountered some
problem.
\end{enum}

Sledgehammer emits an error (if \textit{blocking} is enabled) or a warning
(otherwise) if the actual outcome differs from the expected outcome. This option
is useful for regression testing.

\nopagebreak
{\small See also \textit{blocking} (\S\ref{mode-of-operation}).}
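
In a regression test, the option might be used as sketched below, assuming the
bracketed option syntax of the \textbf{sledgehammer} command. With
\textit{blocking} enabled, an outcome other than \textit{some} is reported as an
error rather than a warning:

\begin{quote}
\textbf{sledgehammer} [\textit{blocking}, \textit{expect} = \textit{some}]
\end{quote}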
\end{enum}

\let\em=\sl
\bibliography{../manual}{}
\bibliographystyle{abbrv}

\end{document}