77 \section{Introduction} |
77 \section{Introduction} |
78 \label{introduction} |
78 \label{introduction} |
79 |
79 |
80 Sledgehammer is a tool that applies first-order automatic theorem provers (ATPs) |
80 Sledgehammer is a tool that applies first-order automatic theorem provers (ATPs) |
81 on the current goal. The supported ATPs are E \cite{schulz-2002}, SPASS |
81 on the current goal. The supported ATPs are E \cite{schulz-2002}, SPASS |
82 \cite{weidenbach-et-al-2009}, and Vampire \cite{riazanov-voronkov-2002}, which |
82 \cite{weidenbach-et-al-2009}, Vampire \cite{riazanov-voronkov-2002}, |
83 can be run locally or remotely via the SystemOnTPTP web service |
83 SInE-E \cite{sine}, and SNARK \cite{snark}. The ATPs are run either locally or |
84 \cite{sutcliffe-2000}. |
84 remotely via the SystemOnTPTP web service \cite{sutcliffe-2000}. |
85 |
85 |
86 The problem passed to ATPs consists of your current goal together with a |
86 The problem passed to ATPs consists of your current goal together with a |
87 heuristic selection of hundreds of facts (theorems) from the current theory |
87 heuristic selection of hundreds of facts (theorems) from the current theory |
88 context, filtered by relevance. Because jobs are run in the background, you can |
88 context, filtered by relevance. Because jobs are run in the background, you can |
89 continue to work on your proof by other means. Provers can be run in parallel. |
89 continue to work on your proof by other means. Provers can be run in parallel. |
122 \section{Installation} |
122 \section{Installation} |
123 \label{installation} |
123 \label{installation} |
124 |
124 |
125 Sledgehammer is part of Isabelle, so you don't need to install it. However, it |
125 Sledgehammer is part of Isabelle, so you don't need to install it. However, it |
126 relies on third-party automatic theorem provers (ATPs). Currently, E, SPASS, and |
126 relies on third-party automatic theorem provers (ATPs). Currently, E, SPASS, and |
127 Vampire are supported. All of these are available remotely via SystemOnTPTP |
127 Vampire can be run locally; in addition, E, Vampire, SInE-E, and SNARK |
128 \cite{sutcliffe-2000}, but if you want better performance you will need to |
128 are available remotely via SystemOnTPTP \cite{sutcliffe-2000}. If you want |
129 install at least E and SPASS locally. |
129 better performance, you should install at least E and SPASS locally. |
130 |
130 |
131 There are three main ways to install ATPs on your machine: |
131 There are three main ways to install ATPs on your machine: |
132 |
132 |
133 \begin{enum} |
133 \begin{enum} |
134 \item[$\bullet$] If you installed an official Isabelle package with everything |
134 \item[$\bullet$] If you installed an official Isabelle package with everything |
197 General or press the Emacs key sequence C-c C-a C-s. |
197 General or press the Emacs key sequence C-c C-a C-s. |
198 Either way, Sledgehammer produces the following output after a few seconds: |
198 Either way, Sledgehammer produces the following output after a few seconds: |
199 |
199 |
200 \prew |
200 \prew |
201 \slshape |
201 \slshape |
202 Sledgehammer: ATP ``\textit{e}'' for subgoal 1: \\ |
202 Sledgehammer: Prover ``\textit{e}'' for subgoal 1: \\ |
203 $([a] = [b]) = (a = b)$ \\ |
203 $([a] = [b]) = (a = b)$ \\ |
204 Try this command: \textbf{by} (\textit{metis hd.simps}). \\ |
204 Try this command: \textbf{by} (\textit{metis hd.simps}). \\ |
205 To minimize the number of lemmas, try this: \\ |
205 To minimize the number of lemmas, try this: \\ |
206 \textbf{sledgehammer} \textit{minimize} [\textit{atp} = \textit{e}] (\textit{hd.simps}). \\[3\smallskipamount] |
206 \textbf{sledgehammer} \textit{minimize} [\textit{prover} = \textit{e}] (\textit{hd.simps}). \\[3\smallskipamount] |
207 % |
207 % |
208 Sledgehammer: ATP ``\textit{spass}'' for subgoal 1: \\ |
208 Sledgehammer: Prover ``\textit{spass}'' for subgoal 1: \\ |
209 $([a] = [b]) = (a = b)$ \\ |
209 $([a] = [b]) = (a = b)$ \\ |
210 Try this command: \textbf{by} (\textit{metis insert\_Nil last\_ConsL}). \\ |
210 Try this command: \textbf{by} (\textit{metis insert\_Nil last\_ConsL}). \\ |
211 To minimize the number of lemmas, try this: \\ |
211 To minimize the number of lemmas, try this: \\ |
212 \textbf{sledgehammer} \textit{minimize} [\textit{atp} = \textit{spass}] (\textit{insert\_Nil last\_ConsL}). \\[3\smallskipamount] |
212 \textbf{sledgehammer} \textit{minimize} [\textit{prover} = \textit{spass}] (\textit{insert\_Nil last\_ConsL}). \\[3\smallskipamount] |
213 % |
213 % |
214 Sledgehammer: ATP ``\textit{remote\_vampire}'' for subgoal 1: \\ |
214 Sledgehammer: Prover ``\textit{remote\_vampire}'' for subgoal 1: \\ |
215 $([a] = [b]) = (a = b)$ \\ |
215 $([a] = [b]) = (a = b)$ \\ |
216 Try this command: \textbf{by} (\textit{metis One\_nat\_def\_raw empty\_replicate} \\ |
216 Try this command: \textbf{by} (\textit{metis One\_nat\_def\_raw empty\_replicate} \\ |
217 \phantom{Try this command: \textbf{by} (\textit{metis~}}\textit{insert\_Nil last\_ConsL replicate\_Suc}). \\ |
217 \phantom{Try this command: \textbf{by} (\textit{metis~}}\textit{insert\_Nil last\_ConsL replicate\_Suc}). \\ |
218 To minimize the number of lemmas, try this: \\ |
218 To minimize the number of lemmas, try this: \\ |
219 \textbf{sledgehammer} \textit{minimize} [\textit{atp} = \textit{remote\_vampire}] \\ |
219 \textbf{sledgehammer} \textit{minimize} [\textit{prover} = \textit{remote\_vampire}] \\ |
220 \phantom{\textbf{sledgehammer}~}(\textit{One\_nat\_def\_raw empty\_replicate insert\_Nil} \\ |
220 \phantom{\textbf{sledgehammer}~}(\textit{One\_nat\_def\_raw empty\_replicate insert\_Nil} \\ |
221 \phantom{\textbf{sledgehammer}~(}\textit{last\_ConsL replicate\_Suc}). |
221 \phantom{\textbf{sledgehammer}~(}\textit{last\_ConsL replicate\_Suc}). |
222 \postw |
222 \postw |
223 |
223 |
224 Sledgehammer ran E, SPASS, and the remote version of Vampire in parallel. If E |
224 Sledgehammer ran E, SPASS, and the remote version of Vampire in parallel. If E |
289 \item[$\bullet$] \textbf{\textit{messages}:} Redisplays recent messages issued by |
289 \item[$\bullet$] \textbf{\textit{messages}:} Redisplays recent messages issued by |
290 Sledgehammer. This allows you to examine results that might have been lost due |
290 Sledgehammer. This allows you to examine results that might have been lost due |
291 to Sledgehammer's asynchronous nature. The \textit{num} argument specifies a |
291 to Sledgehammer's asynchronous nature. The \textit{num} argument specifies a |
292 limit on the number of messages to display (5 by default). |
292 limit on the number of messages to display (5 by default). |
293 |
293 |
294 \item[$\bullet$] \textbf{\textit{available\_atps}:} Prints the list of installed ATPs. |
294 \item[$\bullet$] \textbf{\textit{available\_provers}:} Prints the list of installed provers. |
295 See \S\ref{installation} and \S\ref{mode-of-operation} for more information on |
295 See \S\ref{installation} and \S\ref{mode-of-operation} for more information on |
296 how to install ATPs. |
296 how to install automatic provers. |
297 |
297 |
298 \item[$\bullet$] \textbf{\textit{running\_atps}:} Prints information about currently |
298 \item[$\bullet$] \textbf{\textit{running\_provers}:} Prints information about |
299 running ATPs, including elapsed runtime and remaining time until timeout. |
299 currently running automatic provers, including elapsed runtime and remaining |
300 |
300 time until timeout. |
301 \item[$\bullet$] \textbf{\textit{kill\_atps}:} Terminates all running ATPs. |
301 |
|
302 \item[$\bullet$] \textbf{\textit{kill\_provers}:} Terminates all running |
|
303 automatic provers. |
302 |
304 |
303 \item[$\bullet$] \textbf{\textit{refresh\_tptp}:} Refreshes the list of remote |
305 \item[$\bullet$] \textbf{\textit{refresh\_tptp}:} Refreshes the list of remote |
304 ATPs available at System\-On\-TPTP \cite{sutcliffe-2000}. |
306 ATPs available at System\-On\-TPTP \cite{sutcliffe-2000}. |
305 \end{enum} |
307 \end{enum} |
306 |
308 |
332 proceed as usual except that it should consider \textit{facts}$_1$ |
334 proceed as usual except that it should consider \textit{facts}$_1$ |
333 highly-relevant and \textit{facts}$_2$ fully irrelevant. |
335 highly-relevant and \textit{facts}$_2$ fully irrelevant. |
334 |
336 |
335 You can instruct Sledgehammer to run automatically on newly entered theorems by |
337 You can instruct Sledgehammer to run automatically on newly entered theorems by |
336 enabling the ``Auto Sledgehammer'' option from the ``Isabelle'' menu in Proof |
338 enabling the ``Auto Sledgehammer'' option from the ``Isabelle'' menu in Proof |
337 General. For automatic runs, only the first ATP set using \textit{atps} |
339 General. For automatic runs, only the first prover set using \textit{provers} |
338 (\S\ref{mode-of-operation}) is considered, \textit{verbose} |
340 (\S\ref{mode-of-operation}) is considered, \textit{verbose} |
339 (\S\ref{output-format}) and \textit{debug} (\S\ref{output-format}) are disabled, |
341 (\S\ref{output-format}) and \textit{debug} (\S\ref{output-format}) are disabled, |
340 fewer facts are passed to the ATP, and \textit{timeout} (\S\ref{timeouts}) is |
342 fewer facts are passed to the prover, and \textit{timeout} (\S\ref{timeouts}) is |
341 superseded by the ``Auto Tools Time Limit'' in Proof General's ``Isabelle'' |
343 superseded by the ``Auto Tools Time Limit'' in Proof General's ``Isabelle'' |
342 menu. Sledgehammer's output is also more concise. |
344 menu. Sledgehammer's output is also more concise. |
343 |
345 |
344 \section{Option Reference} |
346 \section{Option Reference} |
345 \label{option-reference} |
347 \label{option-reference} |
380 |
382 |
381 \subsection{Mode of Operation} |
383 \subsection{Mode of Operation} |
382 \label{mode-of-operation} |
384 \label{mode-of-operation} |
383 |
385 |
384 \begin{enum} |
386 \begin{enum} |
385 \opnodefault{atps}{string} |
387 \opnodefault{provers}{string} |
386 Specifies the ATPs (automated theorem provers) to use as a space-separated list |
388 Specifies the automatic provers to use as a space-separated list (e.g., |
387 (e.g., ``\textit{e}~\textit{spass}''). The following ATPs are supported: |
389 ``\textit{e}~\textit{spass}''). The following provers are supported: |
388 |
390 |
389 \begin{enum} |
391 \begin{enum} |
390 \item[$\bullet$] \textbf{\textit{e}:} E is an ATP developed by Stephan Schulz |
392 \item[$\bullet$] \textbf{\textit{e}:} E is an ATP developed by Stephan Schulz |
391 \cite{schulz-2002}. To use E, set the environment variable |
393 \cite{schulz-2002}. To use E, set the environment variable |
392 \texttt{E\_HOME} to the directory that contains the \texttt{eproof} executable, |
394 \texttt{E\_HOME} to the directory that contains the \texttt{eproof} executable, |
421 \end{enum} |
423 \end{enum} |
422 |
424 |
423 By default, Sledgehammer will run E, SPASS, Vampire, and SInE-E in parallel. |
425 By default, Sledgehammer will run E, SPASS, Vampire, and SInE-E in parallel. |
424 For at most two of E, SPASS, and Vampire, it will use any locally installed |
426 For at most two of E, SPASS, and Vampire, it will use any locally installed |
425 version if available. For historical reasons, the default value of this option |
427 version if available. For historical reasons, the default value of this option |
426 can be overridden using the option ``Sledgehammer: ATPs'' from the ``Isabelle'' |
428 can be overridden using the option ``Sledgehammer: Provers'' from the |
427 menu in Proof General. |
429 ``Isabelle'' menu in Proof General. |
428 |
430 |
429 It is a good idea to run several ATPs in parallel, although it could slow down |
431 It is a good idea to run several provers in parallel, although it could slow |
430 your machine. Running E, SPASS, and Vampire together for 5 seconds yields about |
432 down your machine. Running E, SPASS, and Vampire together for 5 seconds yields |
431 the same success rate as running the most effective of these (Vampire) for 120 |
433 about the same success rate as running the most effective of these (Vampire) for |
432 seconds \cite{boehme-nipkow-2010}. |
434 120 seconds \cite{boehme-nipkow-2010}. |
|
435 |
|
436 \opnodefault{prover}{string} |
|
437 Alias for \textit{provers}. |
|
438 |
|
439 \opnodefault{atps}{string} |
|
440 Legacy alias for \textit{provers}. |
433 |
441 |
434 \opnodefault{atp}{string} |
442 \opnodefault{atp}{string} |
435 Alias for \textit{atps}. |
443 Legacy alias for \textit{provers}. |
436 |
444 |
437 \opdefault{timeout}{time}{$\mathbf{30}$ s} |
445 \opdefault{timeout}{time}{$\mathbf{30}$ s} |
438 Specifies the maximum amount of time that the ATPs should spend searching for a |
446 Specifies the maximum amount of time that the automatic provers should spend |
439 proof. For historical reasons, the default value of this option can be |
447 searching for a proof. For historical reasons, the default value of this option |
440 overridden using the option ``Sledgehammer: Time Limit'' from the ``Isabelle'' |
448 can be overridden using the option ``Sledgehammer: Time Limit'' from the |
441 menu in Proof General. |
449 ``Isabelle'' menu in Proof General. |
442 |
450 |
443 \opfalse{blocking}{non\_blocking} |
451 \opfalse{blocking}{non\_blocking} |
444 Specifies whether the \textbf{sledgehammer} command should operate |
452 Specifies whether the \textbf{sledgehammer} command should operate |
445 synchronously. The asynchronous (non-blocking) mode lets the user start proving |
453 synchronously. The asynchronous (non-blocking) mode lets the user start proving |
446 the putative theorem manually while Sledgehammer looks for a proof, but it can |
454 the putative theorem manually while Sledgehammer looks for a proof, but it can |
469 |
477 |
470 \opfalse{full\_types}{partial\_types} |
478 \opfalse{full\_types}{partial\_types} |
471 Specifies whether full-type information is exported. Enabling this option can |
479 Specifies whether full-type information is exported. Enabling this option can |
472 prevent the discovery of type-incorrect proofs, but it also tends to slow down |
480 prevent the discovery of type-incorrect proofs, but it also tends to slow down |
473 the ATPs significantly. For historical reasons, the default value of this option |
481 the ATPs significantly. For historical reasons, the default value of this option |
474 can be overridden using the option ``Sledgehammer: ATPs'' from the ``Isabelle'' |
482 can be overridden using the option ``Sledgehammer: Full Types'' from the |
475 menu in Proof General. |
483 ``Isabelle'' menu in Proof General. |
476 \end{enum} |
484 \end{enum} |
477 |
485 |
478 \subsection{Relevance Filter} |
486 \subsection{Relevance Filter} |
479 \label{relevance-filter} |
487 \label{relevance-filter} |
480 |
488 |
488 are relevant and 100 only theorems that refer to previously seen constants. |
496 are relevant and 100 only theorems that refer to previously seen constants. |
489 |
497 |
490 \opdefault{max\_relevant}{int\_or\_smart}{\textit{smart}} |
498 \opdefault{max\_relevant}{int\_or\_smart}{\textit{smart}} |
491 Specifies the maximum number of facts that may be returned by the relevance |
499 Specifies the maximum number of facts that may be returned by the relevance |
492 filter. If the option is set to \textit{smart}, it is set to a value that was |
500 filter. If the option is set to \textit{smart}, it is set to a value that was |
493 empirically found to be appropriate for the ATP. A typical value would be 300. |
501 empirically found to be appropriate for the prover. A typical value would be |
494 |
502 300. |
495 %\opsmartx{theory\_relevant}{theory\_irrelevant} |
|
496 %Specifies whether the theory from which a fact comes should be taken into |
|
497 %consideration by the relevance filter. If the option is set to \textit{smart}, |
|
498 %it is taken to be \textit{true} for SPASS and \textit{false} for the other ATPs; |
|
499 %empirical results suggest that these are the best settings. |
|
500 |
|
501 %\opfalse{defs\_relevant}{defs\_irrelevant} |
|
502 %Specifies whether the definition of constants occurring in the formula to prove |
|
503 %should be considered particularly relevant. Enabling this option tends to lead |
|
504 %to larger problems and typically slows down the ATPs. |
|
505 \end{enum} |
503 \end{enum} |
506 |
504 |
507 \subsection{Output Format} |
505 \subsection{Output Format} |
508 \label{output-format} |
506 \label{output-format} |
509 |
507 |