doc-src/Sledgehammer/sledgehammer.tex
changeset 40059 6ad9081665db
parent 39335 87a9ff4d5817
child 40060 5ef6747aa619
equal deleted inserted replaced
40058:b4f62d0660e0 40059:6ad9081665db
    77 \section{Introduction}
    77 \section{Introduction}
    78 \label{introduction}
    78 \label{introduction}
    79 
    79 
    80 Sledgehammer is a tool that applies first-order automatic theorem provers (ATPs)
    80 Sledgehammer is a tool that applies first-order automatic theorem provers (ATPs)
    81 on the current goal. The supported ATPs are E \cite{schulz-2002}, SPASS
    81 on the current goal. The supported ATPs are E \cite{schulz-2002}, SPASS
    82 \cite{weidenbach-et-al-2009}, and Vampire \cite{riazanov-voronkov-2002}, which
    82 \cite{weidenbach-et-al-2009}, Vampire \cite{riazanov-voronkov-2002},
    83 can be run locally or remotely via the SystemOnTPTP web service
    83 SInE-E \cite{sine}, and SNARK \cite{snark}. The ATPs are run either locally or
    84 \cite{sutcliffe-2000}.
    84 remotely via the SystemOnTPTP web service \cite{sutcliffe-2000}.
    85 
    85 
    86 The problem passed to ATPs consists of your current goal together with a
    86 The problem passed to ATPs consists of your current goal together with a
    87 heuristic selection of hundreds of facts (theorems) from the current theory
    87 heuristic selection of hundreds of facts (theorems) from the current theory
    88 context, filtered by relevance. Because jobs are run in the background, you can
    88 context, filtered by relevance. Because jobs are run in the background, you can
    89 continue to work on your proof by other means. Provers can be run in parallel.
    89 continue to work on your proof by other means. Provers can be run in parallel.
   122 \section{Installation}
   122 \section{Installation}
   123 \label{installation}
   123 \label{installation}
   124 
   124 
   125 Sledgehammer is part of Isabelle, so you don't need to install it. However, it
   125 Sledgehammer is part of Isabelle, so you don't need to install it. However, it
   126 relies on third-party automatic theorem provers (ATPs). Currently, E, SPASS, and
   126 relies on third-party automatic theorem provers (ATPs). Currently, E, SPASS, and
   127 Vampire are supported. All of these are available remotely via SystemOnTPTP
   127 Vampire can be run locally; in addition, E, Vampire, SInE-E, and SNARK
   128 \cite{sutcliffe-2000}, but if you want better performance you will need to
   128 are available remotely via SystemOnTPTP \cite{sutcliffe-2000}. If you want
   129 install at least E and SPASS locally.
   129 better performance, you should install at least E and SPASS locally.
   130 
   130 
   131 There are three main ways to install ATPs on your machine:
   131 There are three main ways to install ATPs on your machine:
   132 
   132 
   133 \begin{enum}
   133 \begin{enum}
   134 \item[$\bullet$] If you installed an official Isabelle package with everything
   134 \item[$\bullet$] If you installed an official Isabelle package with everything
   197 General or press the Emacs key sequence C-c C-a C-s.
   197 General or press the Emacs key sequence C-c C-a C-s.
   198 Either way, Sledgehammer produces the following output after a few seconds:
   198 Either way, Sledgehammer produces the following output after a few seconds:
   199 
   199 
   200 \prew
   200 \prew
   201 \slshape
   201 \slshape
   202 Sledgehammer: ATP ``\textit{e}'' for subgoal 1: \\
   202 Sledgehammer: Prover ``\textit{e}'' for subgoal 1: \\
   203 $([a] = [b]) = (a = b)$ \\
   203 $([a] = [b]) = (a = b)$ \\
   204 Try this command: \textbf{by} (\textit{metis hd.simps}). \\
   204 Try this command: \textbf{by} (\textit{metis hd.simps}). \\
   205 To minimize the number of lemmas, try this: \\
   205 To minimize the number of lemmas, try this: \\
   206 \textbf{sledgehammer} \textit{minimize} [\textit{atp} = \textit{e}] (\textit{hd.simps}). \\[3\smallskipamount]
   206 \textbf{sledgehammer} \textit{minimize} [\textit{prover} = \textit{e}] (\textit{hd.simps}). \\[3\smallskipamount]
   207 %
   207 %
   208 Sledgehammer: ATP ``\textit{spass}'' for subgoal 1: \\
   208 Sledgehammer: Prover ``\textit{spass}'' for subgoal 1: \\
   209 $([a] = [b]) = (a = b)$ \\
   209 $([a] = [b]) = (a = b)$ \\
   210 Try this command: \textbf{by} (\textit{metis insert\_Nil last\_ConsL}). \\
   210 Try this command: \textbf{by} (\textit{metis insert\_Nil last\_ConsL}). \\
   211 To minimize the number of lemmas, try this: \\
   211 To minimize the number of lemmas, try this: \\
   212 \textbf{sledgehammer} \textit{minimize} [\textit{atp} = \textit{spass}] (\textit{insert\_Nil last\_ConsL}). \\[3\smallskipamount]
   212 \textbf{sledgehammer} \textit{minimize} [\textit{prover} = \textit{spass}] (\textit{insert\_Nil last\_ConsL}). \\[3\smallskipamount]
   213 %
   213 %
   214 Sledgehammer: ATP ``\textit{remote\_vampire}'' for subgoal 1: \\
   214 Sledgehammer: Prover ``\textit{remote\_vampire}'' for subgoal 1: \\
   215 $([a] = [b]) = (a = b)$ \\
   215 $([a] = [b]) = (a = b)$ \\
   216 Try this command: \textbf{by} (\textit{metis One\_nat\_def\_raw empty\_replicate} \\
   216 Try this command: \textbf{by} (\textit{metis One\_nat\_def\_raw empty\_replicate} \\
   217 \phantom{Try this command: \textbf{by} (\textit{metis~}}\textit{insert\_Nil last\_ConsL replicate\_Suc}). \\
   217 \phantom{Try this command: \textbf{by} (\textit{metis~}}\textit{insert\_Nil last\_ConsL replicate\_Suc}). \\
   218 To minimize the number of lemmas, try this: \\
   218 To minimize the number of lemmas, try this: \\
   219 \textbf{sledgehammer} \textit{minimize} [\textit{atp} = \textit{remote\_vampire}] \\
   219 \textbf{sledgehammer} \textit{minimize} [\textit{prover} = \textit{remote\_vampire}] \\
   220 \phantom{\textbf{sledgehammer}~}(\textit{One\_nat\_def\_raw empty\_replicate insert\_Nil} \\
   220 \phantom{\textbf{sledgehammer}~}(\textit{One\_nat\_def\_raw empty\_replicate insert\_Nil} \\
   221 \phantom{\textbf{sledgehammer}~(}\textit{last\_ConsL replicate\_Suc}).
   221 \phantom{\textbf{sledgehammer}~(}\textit{last\_ConsL replicate\_Suc}).
   222 \postw
   222 \postw
   223 
   223 
   224 Sledgehammer ran E, SPASS, and the remote version of Vampire in parallel. If E
   224 Sledgehammer ran E, SPASS, and the remote version of Vampire in parallel. If E
   289 \item[$\bullet$] \textbf{\textit{messages}:} Redisplays recent messages issued by
   289 \item[$\bullet$] \textbf{\textit{messages}:} Redisplays recent messages issued by
   290 Sledgehammer. This allows you to examine results that might have been lost due
   290 Sledgehammer. This allows you to examine results that might have been lost due
   291 to Sledgehammer's asynchronous nature. The \textit{num} argument specifies a
   291 to Sledgehammer's asynchronous nature. The \textit{num} argument specifies a
   292 limit on the number of messages to display (5 by default).
   292 limit on the number of messages to display (5 by default).
   293 
   293 
   294 \item[$\bullet$] \textbf{\textit{available\_atps}:} Prints the list of installed ATPs.
   294 \item[$\bullet$] \textbf{\textit{available\_provers}:} Prints the list of installed provers.
   295 See \S\ref{installation} and \S\ref{mode-of-operation} for more information on
   295 See \S\ref{installation} and \S\ref{mode-of-operation} for more information on
   296 how to install ATPs.
   296 how to install automatic provers.
   297 
   297 
   298 \item[$\bullet$] \textbf{\textit{running\_atps}:} Prints information about currently
   298 \item[$\bullet$] \textbf{\textit{running\_provers}:} Prints information about
   299 running ATPs, including elapsed runtime and remaining time until timeout.
   299 currently running automatic provers, including elapsed runtime and remaining
   300 
   300 time until timeout.
   301 \item[$\bullet$] \textbf{\textit{kill\_atps}:} Terminates all running ATPs.
   301 
       
   302 \item[$\bullet$] \textbf{\textit{kill\_provers}:} Terminates all running
       
   303 automatic provers.
   302 
   304 
   303 \item[$\bullet$] \textbf{\textit{refresh\_tptp}:} Refreshes the list of remote
   305 \item[$\bullet$] \textbf{\textit{refresh\_tptp}:} Refreshes the list of remote
   304 ATPs available at System\-On\-TPTP \cite{sutcliffe-2000}.
   306 ATPs available at System\-On\-TPTP \cite{sutcliffe-2000}.
   305 \end{enum}
   307 \end{enum}
   306 
   308 
   332 proceed as usual except that it should consider \textit{facts}$_1$
   334 proceed as usual except that it should consider \textit{facts}$_1$
   333 highly-relevant and \textit{facts}$_2$ fully irrelevant.
   335 highly-relevant and \textit{facts}$_2$ fully irrelevant.
   334 
   336 
   335 You can instruct Sledgehammer to run automatically on newly entered theorems by
   337 You can instruct Sledgehammer to run automatically on newly entered theorems by
   336 enabling the ``Auto Sledgehammer'' option from the ``Isabelle'' menu in Proof
   338 enabling the ``Auto Sledgehammer'' option from the ``Isabelle'' menu in Proof
   337 General. For automatic runs, only the first ATP set using \textit{atps}
   339 General. For automatic runs, only the first prover set using \textit{provers}
   338 (\S\ref{mode-of-operation}) is considered, \textit{verbose}
   340 (\S\ref{mode-of-operation}) is considered, \textit{verbose}
   339 (\S\ref{output-format}) and \textit{debug} (\S\ref{output-format}) are disabled,
   341 (\S\ref{output-format}) and \textit{debug} (\S\ref{output-format}) are disabled,
   340 fewer facts are passed to the ATP, and \textit{timeout} (\S\ref{timeouts}) is
   342 fewer facts are passed to the prover, and \textit{timeout} (\S\ref{timeouts}) is
   341 superseded by the ``Auto Tools Time Limit'' in Proof General's ``Isabelle''
   343 superseded by the ``Auto Tools Time Limit'' in Proof General's ``Isabelle''
   342 menu. Sledgehammer's output is also more concise.
   344 menu. Sledgehammer's output is also more concise.
   343 
   345 
   344 \section{Option Reference}
   346 \section{Option Reference}
   345 \label{option-reference}
   347 \label{option-reference}
   380 
   382 
   381 \subsection{Mode of Operation}
   383 \subsection{Mode of Operation}
   382 \label{mode-of-operation}
   384 \label{mode-of-operation}
   383 
   385 
   384 \begin{enum}
   386 \begin{enum}
   385 \opnodefault{atps}{string}
   387 \opnodefault{provers}{string}
   386 Specifies the ATPs (automated theorem provers) to use as a space-separated list
   388 Specifies the automatic provers to use as a space-separated list (e.g.,
   387 (e.g., ``\textit{e}~\textit{spass}''). The following ATPs are supported:
   389 ``\textit{e}~\textit{spass}''). The following provers are supported:
   388 
   390 
   389 \begin{enum}
   391 \begin{enum}
   390 \item[$\bullet$] \textbf{\textit{e}:} E is an ATP developed by Stephan Schulz
   392 \item[$\bullet$] \textbf{\textit{e}:} E is an ATP developed by Stephan Schulz
   391 \cite{schulz-2002}. To use E, set the environment variable
   393 \cite{schulz-2002}. To use E, set the environment variable
   392 \texttt{E\_HOME} to the directory that contains the \texttt{eproof} executable,
   394 \texttt{E\_HOME} to the directory that contains the \texttt{eproof} executable,
   421 \end{enum}
   423 \end{enum}
   422 
   424 
   423 By default, Sledgehammer will run E, SPASS, Vampire, and SInE-E in parallel.
   425 By default, Sledgehammer will run E, SPASS, Vampire, and SInE-E in parallel.
   424 For at most two of E, SPASS, and Vampire, it will use any locally installed
   426 For at most two of E, SPASS, and Vampire, it will use any locally installed
   425 version if available. For historical reasons, the default value of this option
   427 version if available. For historical reasons, the default value of this option
   426 can be overridden using the option ``Sledgehammer: ATPs'' from the ``Isabelle''
   428 can be overridden using the option ``Sledgehammer: Provers'' from the
   427 menu in Proof General.
   429 ``Isabelle'' menu in Proof General.
   428 
   430 
   429 It is a good idea to run several ATPs in parallel, although it could slow down
   431 It is a good idea to run several provers in parallel, although it could slow
   430 your machine. Running E, SPASS, and Vampire together for 5 seconds yields about
   432 down your machine. Running E, SPASS, and Vampire together for 5 seconds yields
   431 the same success rate as running the most effective of these (Vampire) for 120
   433 about the same success rate as running the most effective of these (Vampire) for
   432 seconds \cite{boehme-nipkow-2010}.
   434 120 seconds \cite{boehme-nipkow-2010}.
       
   435 
       
   436 \opnodefault{prover}{string}
       
   437 Alias for \textit{provers}.
       
   438 
       
   439 \opnodefault{atps}{string}
       
   440 Legacy alias for \textit{provers}.
   433 
   441 
   434 \opnodefault{atp}{string}
   442 \opnodefault{atp}{string}
   435 Alias for \textit{atps}.
   443 Legacy alias for \textit{provers}.
   436 
   444 
   437 \opdefault{timeout}{time}{$\mathbf{30}$ s}
   445 \opdefault{timeout}{time}{$\mathbf{30}$ s}
   438 Specifies the maximum amount of time that the ATPs should spend searching for a
   446 Specifies the maximum amount of time that the automatic provers should spend
   439 proof. For historical reasons, the default value of this option can be
   447 searching for a proof. For historical reasons, the default value of this option
   440 overridden using the option ``Sledgehammer: Time Limit'' from the ``Isabelle''
   448 can be overridden using the option ``Sledgehammer: Time Limit'' from the
   441 menu in Proof General.
   449 ``Isabelle'' menu in Proof General.
   442 
   450 
   443 \opfalse{blocking}{non\_blocking}
   451 \opfalse{blocking}{non\_blocking}
   444 Specifies whether the \textbf{sledgehammer} command should operate
   452 Specifies whether the \textbf{sledgehammer} command should operate
   445 synchronously. The asynchronous (non-blocking) mode lets the user start proving
   453 synchronously. The asynchronous (non-blocking) mode lets the user start proving
   446 the putative theorem manually while Sledgehammer looks for a proof, but it can
   454 the putative theorem manually while Sledgehammer looks for a proof, but it can
   469 
   477 
   470 \opfalse{full\_types}{partial\_types}
   478 \opfalse{full\_types}{partial\_types}
   471 Specifies whether full-type information is exported. Enabling this option can
   479 Specifies whether full-type information is exported. Enabling this option can
   472 prevent the discovery of type-incorrect proofs, but it also tends to slow down
   480 prevent the discovery of type-incorrect proofs, but it also tends to slow down
   473 the ATPs significantly. For historical reasons, the default value of this option
   481 the ATPs significantly. For historical reasons, the default value of this option
   474 can be overridden using the option ``Sledgehammer: ATPs'' from the ``Isabelle''
   482 can be overridden using the option ``Sledgehammer: Full Types'' from the
   475 menu in Proof General.
   483 ``Isabelle'' menu in Proof General.
   476 \end{enum}
   484 \end{enum}
   477 
   485 
   478 \subsection{Relevance Filter}
   486 \subsection{Relevance Filter}
   479 \label{relevance-filter}
   487 \label{relevance-filter}
   480 
   488 
   488 are relevant and 100 only theorems that refer to previously seen constants.
   496 are relevant and 100 only theorems that refer to previously seen constants.
   489 
   497 
   490 \opdefault{max\_relevant}{int\_or\_smart}{\textit{smart}}
   498 \opdefault{max\_relevant}{int\_or\_smart}{\textit{smart}}
   491 Specifies the maximum number of facts that may be returned by the relevance
   499 Specifies the maximum number of facts that may be returned by the relevance
   492 filter. If the option is set to \textit{smart}, it is set to a value that was
   500 filter. If the option is set to \textit{smart}, it is set to a value that was
   493 empirically found to be appropriate for the ATP. A typical value would be 300.
   501 empirically found to be appropriate for the prover. A typical value would be
   494 
   502 300.
   495 %\opsmartx{theory\_relevant}{theory\_irrelevant}
       
   496 %Specifies whether the theory from which a fact comes should be taken into
       
   497 %consideration by the relevance filter. If the option is set to \textit{smart},
       
   498 %it is taken to be \textit{true} for SPASS and \textit{false} for the other ATPs;
       
   499 %empirical results suggest that these are the best settings.
       
   500 
       
   501 %\opfalse{defs\_relevant}{defs\_irrelevant}
       
   502 %Specifies whether the definition of constants occurring in the formula to prove
       
   503 %should be considered particularly relevant. Enabling this option tends to lead
       
   504 %to larger problems and typically slows down the ATPs.
       
   505 \end{enum}
   503 \end{enum}
   506 
   504 
   507 \subsection{Output Format}
   505 \subsection{Output Format}
   508 \label{output-format}
   506 \label{output-format}
   509 
   507