summary |
shortlog |
changelog |
graph |
tags |
branches |
files |
changeset |
file |
revisions |
annotate |
diff |
raw

doc-src/Sledgehammer/sledgehammer.tex

author | blanchet |

Sun Apr 22 14:16:46 2012 +0200 (2012-04-22) | |

changeset 47672 | 1bf4fa90cd03 |

parent 47642 | 9a9218111085 |

child 47963 | 46c73ad5f7c0 |

permissions | -rw-r--r-- |

fixed typos

1 \documentclass[a4paper,12pt]{article}

2 \usepackage[T1]{fontenc}

3 \usepackage{amsmath}

4 \usepackage{amssymb}

5 \usepackage[english,french]{babel}

6 \usepackage{color}

7 \usepackage{footmisc}

8 \usepackage{graphicx}

9 %\usepackage{mathpazo}

10 \usepackage{multicol}

11 \usepackage{stmaryrd}

12 %\usepackage[scaled=.85]{beramono}

13 \usepackage{../../lib/texinputs/isabelle,../iman,../pdfsetup}

15 \newcommand\download{\url{http://www21.in.tum.de/~blanchet/\#software}}

17 \def\qty#1{\ensuremath{\left<\mathit{#1\/}\right>}}

18 \def\qtybf#1{$\mathbf{\left<\textbf{\textit{#1\/}}\right>}$}

20 \newcommand\const[1]{\textsf{#1}}

22 %\oddsidemargin=4.6mm

23 %\evensidemargin=4.6mm

24 %\textwidth=150mm

25 %\topmargin=4.6mm

26 %\headheight=0mm

27 %\headsep=0mm

28 %\textheight=234mm

30 \def\Colon{\mathord{:\mkern-1.5mu:}}

31 %\def\lbrakk{\mathopen{\lbrack\mkern-3.25mu\lbrack}}

32 %\def\rbrakk{\mathclose{\rbrack\mkern-3.255mu\rbrack}}

33 \def\lparr{\mathopen{(\mkern-4mu\mid}}

34 \def\rparr{\mathclose{\mid\mkern-4mu)}}

36 \def\unk{{?}}

37 \def\undef{(\lambda x.\; \unk)}

38 %\def\unr{\textit{others}}

39 \def\unr{\ldots}

40 \def\Abs#1{\hbox{\rm{\flqq}}{\,#1\,}\hbox{\rm{\frqq}}}

41 \def\Q{{\smash{\lower.2ex\hbox{$\scriptstyle?$}}}}

43 \urlstyle{tt}

45 \begin{document}

47 %%% TYPESETTING

48 %\renewcommand\labelitemi{$\bullet$}

49 \renewcommand\labelitemi{\raise.065ex\hbox{\small\textbullet}}

51 \selectlanguage{english}

53 \title{\includegraphics[scale=0.5]{isabelle_sledgehammer} \\[4ex]

54 Hammering Away \\[\smallskipamount]

55 \Large A User's Guide to Sledgehammer for Isabelle/HOL}

56 \author{\hbox{} \\

57 Jasmin Christian Blanchette \\

58 {\normalsize Institut f\"ur Informatik, Technische Universit\"at M\"unchen} \\[4\smallskipamount]

59 {\normalsize with contributions from} \\[4\smallskipamount]

60 Lawrence C. Paulson \\

61 {\normalsize Computer Laboratory, University of Cambridge} \\

62 \hbox{}}

64 \maketitle

66 \tableofcontents

68 \setlength{\parskip}{.7em plus .2em minus .1em}

69 \setlength{\parindent}{0pt}

70 \setlength{\abovedisplayskip}{\parskip}

71 \setlength{\abovedisplayshortskip}{.9\parskip}

72 \setlength{\belowdisplayskip}{\parskip}

73 \setlength{\belowdisplayshortskip}{.9\parskip}

75 % General-purpose enum environment with correct spacing

76 \newenvironment{enum}%

77 {\begin{list}{}{%

78 \setlength{\topsep}{.1\parskip}%

79 \setlength{\partopsep}{.1\parskip}%

80 \setlength{\itemsep}{\parskip}%

81 \advance\itemsep by-\parsep}}

82 {\end{list}}

84 \def\pre{\begingroup\vskip0pt plus1ex\advance\leftskip by\leftmargin

85 \advance\rightskip by\leftmargin}

86 \def\post{\vskip0pt plus1ex\endgroup}

88 \def\prew{\pre\advance\rightskip by-\leftmargin}

89 \def\postw{\post}

91 \section{Introduction}

92 \label{introduction}

94 Sledgehammer is a tool that applies automatic theorem provers (ATPs)

95 and satisfiability-modulo-theories (SMT) solvers on the current goal.%

96 \footnote{The distinction between ATPs and SMT solvers is convenient but mostly

97 historical. The two communities are converging, with more and more ATPs

98 supporting typical SMT features such as arithmetic and sorts, and a few SMT

99 solvers parsing ATP syntaxes. There is also a strong technological connection

100 between instantiation-based ATPs (such as iProver and iProver-Eq) and SMT

101 solvers.}

102 %

103 The supported ATPs are E \cite{schulz-2002}, E-SInE \cite{sine}, E-ToFoF

104 \cite{tofof}, iProver \cite{korovin-2009}, iProver-Eq

105 \cite{korovin-sticksel-2010}, LEO-II \cite{leo2}, Satallax \cite{satallax},

106 SNARK \cite{snark}, SPASS \cite{weidenbach-et-al-2009}, Vampire

107 \cite{riazanov-voronkov-2002}, and Waldmeister \cite{waldmeister}. The ATPs are

108 run either locally or remotely via the System\-On\-TPTP web service

109 \cite{sutcliffe-2000}. In addition to the ATPs, the SMT solvers Z3 \cite{z3} is

110 used by default, and you can tell Sledgehammer to try Alt-Ergo \cite{alt-ergo},

111 CVC3 \cite{cvc3}, and Yices \cite{yices} as well; these are run either locally

112 or (for CVC3 and Z3) on a server at the TU M\"unchen.

114 The problem passed to the automatic provers consists of your current goal

115 together with a heuristic selection of hundreds of facts (theorems) from the

116 current theory context, filtered by relevance. Because jobs are run in the

117 background, you can continue to work on your proof by other means. Provers can

118 be run in parallel. Any reply (which may arrive half a minute later) will appear

119 in the Proof General response buffer.

121 The result of a successful proof search is some source text that usually (but

122 not always) reconstructs the proof within Isabelle. For ATPs, the reconstructed

123 proof relies on the general-purpose \textit{metis} proof method, which

124 integrates the Metis ATP in Isabelle/HOL with explicit inferences going through

125 the kernel. Thus its results are correct by construction.

127 In this manual, we will explicitly invoke the \textbf{sledgehammer} command.

128 Sledgehammer also provides an automatic mode that can be enabled via the ``Auto

129 Sledgehammer'' option in Proof General's ``Isabelle'' menu. In this mode,

130 Sledgehammer is run on every newly entered theorem. The time limit for Auto

131 Sledgehammer and other automatic tools can be set using the ``Auto Tools Time

132 Limit'' option.

134 \newbox\boxA

135 \setbox\boxA=\hbox{\texttt{NOSPAM}}

137 \newcommand\authoremail{\texttt{blan{\color{white}NOSPAM}\kern-\wd\boxA{}chette@\allowbreak

138 in.\allowbreak tum.\allowbreak de}}

140 To run Sledgehammer, you must make sure that the theory \textit{Sledgehammer} is

141 imported---this is rarely a problem in practice since it is part of

142 \textit{Main}. Examples of Sledgehammer use can be found in Isabelle's

143 \texttt{src/HOL/Metis\_Examples} directory.

144 Comments and bug reports concerning Sledgehammer or this manual should be

145 directed to the author at \authoremail.

147 \vskip2.5\smallskipamount

149 %\textbf{Acknowledgment.} The author would like to thank Mark Summerfield for

150 %suggesting several textual improvements.

152 \section{Installation}

153 \label{installation}

155 Sledgehammer is part of Isabelle, so you don't need to install it. However, it

156 relies on third-party automatic provers (ATPs and SMT solvers).

158 Among the ATPs, E, LEO-II, Satallax, SPASS, and Vampire can be run locally; in

159 addition, E, E-SInE, E-ToFoF, iProver, iProver-Eq, LEO-II, Satallax, SNARK,

160 Vampire, and Waldmeister are available remotely via System\-On\-TPTP

161 \cite{sutcliffe-2000}. If you want better performance, you should at least

162 install E and SPASS locally.

164 Among the SMT solvers, Alt-Ergo, CVC3, Yices, and Z3 can be run locally, and

165 CVC3 and Z3 can be run remotely on a TU M\"unchen server. If you want better

166 performance and get the ability to replay proofs that rely on the \emph{smt}

167 proof method without an Internet connection, you should at least install Z3

168 locally.

170 There are three main ways to install automatic provers on your machine:

172 \begin{sloppy}

173 \begin{enum}

174 \item[\labelitemi] If you installed an official Isabelle package, it should

175 already include properly setup executables for CVC3, E, SPASS, and Z3, ready to use.%

176 \footnote{Vampire's and Yices's licenses prevent us from doing the same for

177 these otherwise remarkable tools.}

178 For Z3, you must additionally set the variable

179 \texttt{Z3\_NON\_COMMERCIAL} to ``yes'' to confirm that you are a

180 noncommercial user, either in the environment in which Isabelle is

181 launched or in your

182 \texttt{\$ISABELLE\_HOME\_USER/etc/settings} file.

184 \item[\labelitemi] Alternatively, you can download the Isabelle-aware CVC3, E,

185 SPASS, and Z3 binary packages from \download. Extract the archives, then add a

186 line to your \texttt{\$ISABELLE\_HOME\_USER\slash etc\slash components}%

187 \footnote{The variable \texttt{\$ISABELLE\_HOME\_USER} is set by Isabelle at

188 startup. Its value can be retrieved by executing \texttt{isabelle}

189 \texttt{getenv} \texttt{ISABELLE\_HOME\_USER} on the command line.}

190 file with the absolute path to CVC3, E, SPASS, or Z3. For example, if the

191 \texttt{components} file does not exist yet and you extracted SPASS to

192 \texttt{/usr/local/spass-3.8ds}, create it with the single line

194 \prew

195 \texttt{/usr/local/spass-3.8ds}

196 \postw

198 in it.

200 \item[\labelitemi] If you prefer to build E, LEO-II, Satallax, or SPASS

201 manually, or found a Vampire executable somewhere (e.g.,

202 \url{http://www.vprover.org/}), set the environment variable \texttt{E\_HOME},

203 \texttt{LEO2\_HOME}, \texttt{SATALLAX\_HOME}, \texttt{SPASS\_HOME}, or

204 \texttt{VAMPIRE\_HOME} to the directory that contains the \texttt{eproof},

205 \texttt{leo}, \texttt{satallax}, \texttt{SPASS}, or \texttt{vampire} executable.

206 Sledgehammer has been tested with E 1.0 to 1.4, LEO-II 1.2.9, Satallax 2.2 and~2.3,

207 SPASS 3.5, 3.7, and 3.8ds, and Vampire 0.6, 1.0, and 1.8%

208 \footnote{Following the rewrite of Vampire, the counter for version numbers was

209 reset to 0; hence the (new) Vampire versions 0.6, 1.0, and 1.8 are more recent

210 than 9.0 or 11.5.}%

211 . Since the ATPs' output formats are neither documented nor stable, other

212 versions might not work well with Sledgehammer. Ideally,

213 you should also set \texttt{E\_VERSION}, \texttt{LEO2\_VERSION},

214 \texttt{SATALLAX\_VERSION}, \texttt{SPASS\_VERSION}, or

215 \texttt{VAMPIRE\_VERSION} to the prover's version number (e.g., ``1.4'').

217 Similarly, if you want to build Alt-Ergo or CVC3, or found a

218 Yices or Z3 executable somewhere (e.g.,

219 \url{http://yices.csl.sri.com/download.shtml} or

220 \url{http://research.microsoft.com/en-us/um/redmond/projects/z3/download.html}),

221 set the environment variable \texttt{CVC3\_\allowbreak SOLVER},

222 \texttt{YICES\_SOLVER}, or \texttt{Z3\_SOLVER} to the complete path of

223 the executable, \emph{including the file name}. Sledgehammer has been tested

224 with Alt-Ergo 0.93, CVC3 2.2 and 2.4.1, Yices 1.0.28 and 1.0.33, and Z3 3.0 to 3.2.

225 Since the SMT solvers' output formats are somewhat unstable, other

226 versions of the solvers might not work well with Sledgehammer. Ideally,

227 also set \texttt{CVC3\_VERSION}, \texttt{YICES\_VERSION}, or

228 \texttt{Z3\_VERSION} to the solver's version number (e.g., ``3.2'').

229 \end{enum}

230 \end{sloppy}

232 To check whether E, SPASS, Vampire, and/or Z3 are successfully installed, try

233 out the example in \S\ref{first-steps}. If the remote versions of any of these

234 provers is used (identified by the prefix ``\emph{remote\_\/}''), or if the

235 local versions fail to solve the easy goal presented there, something must be

236 wrong with the installation.

238 Remote prover invocation requires Perl with the World Wide Web Library

239 (\texttt{libwww-perl}) installed. If you must use a proxy server to access the

240 Internet, set the \texttt{http\_proxy} environment variable to the proxy, either

241 in the environment in which Isabelle is launched or in your

242 \texttt{\$ISABELLE\_HOME\_USER/etc/settings} file. Here are a few

243 examples:

245 \prew

246 \texttt{http\_proxy=http://proxy.example.org} \\

247 \texttt{http\_proxy=http://proxy.example.org:8080} \\

248 \texttt{http\_proxy=http://joeblow:pAsSwRd@proxy.example.org}

249 \postw

251 \section{First Steps}

252 \label{first-steps}

254 To illustrate Sledgehammer in context, let us start a theory file and

255 attempt to prove a simple lemma:

257 \prew

258 \textbf{theory}~\textit{Scratch} \\

259 \textbf{imports}~\textit{Main} \\

260 \textbf{begin} \\[2\smallskipamount]

261 %

262 \textbf{lemma} ``$[a] = [b] \,\Longrightarrow\, a = b$'' \\

263 \textbf{sledgehammer}

264 \postw

266 Instead of issuing the \textbf{sledgehammer} command, you can also find

267 Sledgehammer in the ``Commands'' submenu of the ``Isabelle'' menu in Proof

268 General or press the Emacs key sequence C-c C-a C-s.

269 Either way, Sledgehammer produces the following output after a few seconds:

271 \prew

272 \slshape

273 Sledgehammer: ``\textit{e\/}'' on goal \\

274 $[a] = [b] \,\Longrightarrow\, a = b$ \\

275 Try this: \textbf{by} (\textit{metis last\_ConsL}) (64 ms). \\[3\smallskipamount]

276 %

277 Sledgehammer: ``\textit{z3\/}'' on goal \\

278 $[a] = [b] \,\Longrightarrow\, a = b$ \\

279 Try this: \textbf{by} (\textit{metis list.inject}) (20 ms). \\[3\smallskipamount]

280 %

281 Sledgehammer: ``\textit{vampire\/}'' on goal \\

282 $[a] = [b] \,\Longrightarrow\, a = b$ \\

283 Try this: \textbf{by} (\textit{metis hd.simps}) (14 ms). \\[3\smallskipamount]

284 %

285 Sledgehammer: ``\textit{spass\/}'' on goal \\

286 $[a] = [b] \,\Longrightarrow\, a = b$ \\

287 Try this: \textbf{by} (\textit{metis list.inject}) (17 ms). \\[3\smallskipamount]

288 %

289 Sledgehammer: ``\textit{remote\_waldmeister\/}'' on goal \\

290 $[a] = [b] \,\Longrightarrow\, a = b$ \\

291 Try this: \textbf{by} (\textit{metis hd.simps}) (15 ms). \\[3\smallskipamount]

292 %

293 Sledgehammer: ``\textit{remote\_e\_sine\/}'' on goal \\

294 $[a] = [b] \,\Longrightarrow\, a = b$ \\

295 Try this: \textbf{by} (\textit{metis hd.simps}) (18 ms).

296 \postw

298 Sledgehammer ran E, E-SInE, SPASS, Vampire, Waldmeister, and Z3 in parallel.

299 Depending on which provers are installed and how many processor cores are

300 available, some of the provers might be missing or present with a

301 \textit{remote\_} prefix. Waldmeister is run only for unit equational problems,

302 where the goal's conclusion is a (universally quantified) equation.

304 For each successful prover, Sledgehammer gives a one-liner proof that uses

305 the \textit{metis} or \textit{smt} proof method. Approximate timings are shown

306 in parentheses, indicating how fast the call is. You can click the proof to

307 insert it into the theory text.

309 In addition, you can ask Sledgehammer for an Isar text proof by passing the

310 \textit{isar\_proof} option (\S\ref{output-format}):

312 \prew

313 \textbf{sledgehammer} [\textit{isar\_proof}]

314 \postw

316 When Isar proof construction is successful, it can yield proofs that are more

317 readable and also faster than the \textit{metis} or \textit{smt} one-liners.

318 This feature is experimental and is only available for ATPs.

320 \section{Hints}

321 \label{hints}

323 This section presents a few hints that should help you get the most out of

324 Sledgehammer. Frequently asked questions are answered in

325 \S\ref{frequently-asked-questions}.

327 %\newcommand\point[1]{\medskip\par{\sl\bfseries#1}\par\nopagebreak}

328 \newcommand\point[1]{\subsection{\emph{#1}}}

330 \point{Presimplify the goal}

332 For best results, first simplify your problem by calling \textit{auto} or at

333 least \textit{safe} followed by \textit{simp\_all}. The SMT solvers provide

334 arithmetic decision procedures, but the ATPs typically do not (or if they do,

335 Sledgehammer does not use it yet). Apart from Waldmeister, they are not

336 especially good at heavy rewriting, but because they regard equations as

337 undirected, they often prove theorems that require the reverse orientation of a

338 \textit{simp} rule. Higher-order problems can be tackled, but the success rate

339 is better for first-order problems. Hence, you may get better results if you

340 first simplify the problem to remove higher-order features.

342 \point{Make sure E, SPASS, Vampire, and Z3 are locally installed}

344 Locally installed provers are faster and more reliable than those running on

345 servers. See \S\ref{installation} for details on how to install them.

347 \point{Familiarize yourself with the most important options}

349 Sledgehammer's options are fully documented in \S\ref{command-syntax}. Many of

350 the options are very specialized, but serious users of the tool should at least

351 familiarize themselves with the following options:

353 \begin{enum}

354 \item[\labelitemi] \textbf{\textit{provers}} (\S\ref{mode-of-operation}) specifies

355 the automatic provers (ATPs and SMT solvers) that should be run whenever

356 Sledgehammer is invoked (e.g., ``\textit{provers}~= \textit{e spass

357 remote\_vampire\/}''). For convenience, you can omit ``\textit{provers}~=''

358 and simply write the prover names as a space-separated list (e.g., ``\textit{e

359 spass remote\_vampire\/}'').

361 \item[\labelitemi] \textbf{\textit{max\_relevant}} (\S\ref{relevance-filter})

362 specifies the maximum number of facts that should be passed to the provers. By

363 default, the value is prover-dependent but varies between about 150 and 1000. If

364 the provers time out, you can try lowering this value to, say, 100 or 50 and see

365 if that helps.

367 \item[\labelitemi] \textbf{\textit{isar\_proof}} (\S\ref{output-format}) specifies

368 that Isar proofs should be generated, instead of one-liner \textit{metis} or

369 \textit{smt} proofs. The length of the Isar proofs can be controlled by setting

370 \textit{isar\_shrink\_factor} (\S\ref{output-format}).

372 \item[\labelitemi] \textbf{\textit{timeout}} (\S\ref{timeouts}) controls the

373 provers' time limit. It is set to 30 seconds, but since Sledgehammer runs

374 asynchronously you should not hesitate to raise this limit to 60 or 120 seconds

375 if you are the kind of user who can think clearly while ATPs are active.

376 \end{enum}

378 Options can be set globally using \textbf{sledgehammer\_params}

379 (\S\ref{command-syntax}). The command also prints the list of all available

380 options with their current value. Fact selection can be influenced by specifying

381 ``$(\textit{add}{:}~\textit{my\_facts})$'' after the \textbf{sledgehammer} call

382 to ensure that certain facts are included, or simply ``$(\textit{my\_facts})$''

383 to force Sledgehammer to run only with $\textit{my\_facts}$.

385 \section{Frequently Asked Questions}

386 \label{frequently-asked-questions}

388 This sections answers frequently (and infrequently) asked questions about

389 Sledgehammer. It is a good idea to skim over it now even if you don't have any

390 questions at this stage. And if you have any further questions not listed here,

391 send them to the author at \authoremail.

393 \point{Which facts are passed to the automatic provers?}

395 The relevance filter assigns a score to every available fact (lemma, theorem,

396 definition, or axiom) based upon how many constants that fact shares with the

397 conjecture. This process iterates to include facts relevant to those just

398 accepted, but with a decay factor to ensure termination. The constants are

399 weighted to give unusual ones greater significance. The relevance filter copes

400 best when the conjecture contains some unusual constants; if all the constants

401 are common, it is unable to discriminate among the hundreds of facts that are

402 picked up. The relevance filter is also memoryless: It has no information about

403 how many times a particular fact has been used in a proof, and it cannot learn.

405 The number of facts included in a problem varies from prover to prover, since

406 some provers get overwhelmed more easily than others. You can show the number of

407 facts given using the \textit{verbose} option (\S\ref{output-format}) and the

408 actual facts using \textit{debug} (\S\ref{output-format}).

410 Sledgehammer is good at finding short proofs combining a handful of existing

411 lemmas. If you are looking for longer proofs, you must typically restrict the

412 number of facts, by setting the \textit{max\_relevant} option

413 (\S\ref{relevance-filter}) to, say, 25 or 50.

415 You can also influence which facts are actually selected in a number of ways. If

416 you simply want to ensure that a fact is included, you can specify it using the

417 ``$(\textit{add}{:}~\textit{my\_facts})$'' syntax. For example:

418 %

419 \prew

420 \textbf{sledgehammer} (\textit{add}: \textit{hd.simps} \textit{tl.simps})

421 \postw

422 %

423 The specified facts then replace the least relevant facts that would otherwise be

424 included; the other selected facts remain the same.

425 If you want to direct the selection in a particular direction, you can specify

426 the facts via \textbf{using}:

427 %

428 \prew

429 \textbf{using} \textit{hd.simps} \textit{tl.simps} \\

430 \textbf{sledgehammer}

431 \postw

432 %

433 The facts are then more likely to be selected than otherwise, and if they are

434 selected at iteration $j$ they also influence which facts are selected at

435 iterations $j + 1$, $j + 2$, etc. To give them even more weight, try

436 %

437 \prew

438 \textbf{using} \textit{hd.simps} \textit{tl.simps} \\

439 \textbf{apply}~\textbf{--} \\

440 \textbf{sledgehammer}

441 \postw

443 \point{Why does Metis fail to reconstruct the proof?}

445 There are many reasons. If Metis runs seemingly forever, that is a sign that the

446 proof is too difficult for it. Metis's search is complete, so it should

447 eventually find it, but that's little consolation. There are several possible

448 solutions:

450 \begin{enum}

451 \item[\labelitemi] Try the \textit{isar\_proof} option (\S\ref{output-format}) to

452 obtain a step-by-step Isar proof where each step is justified by \textit{metis}.

453 Since the steps are fairly small, \textit{metis} is more likely to be able to

454 replay them.

456 \item[\labelitemi] Try the \textit{smt} proof method instead of \textit{metis}. It

457 is usually stronger, but you need to either have Z3 available to replay the

458 proofs, trust the SMT solver, or use certificates. See the documentation in the

459 \emph{SMT} theory (\texttt{\$ISABELLE\_HOME/src/HOL/SMT.thy}) for details.

461 \item[\labelitemi] Try the \textit{blast} or \textit{auto} proof methods, passing

462 the necessary facts via \textbf{unfolding}, \textbf{using}, \textit{intro}{:},

463 \textit{elim}{:}, \textit{dest}{:}, or \textit{simp}{:}, as appropriate.

464 \end{enum}

466 In some rare cases, \textit{metis} fails fairly quickly, and you get the error

467 message

469 \prew

470 \slshape

471 One-line proof reconstruction failed.

472 \postw

474 This message indicates that Sledgehammer determined that the goal is provable,

475 but the proof is, for technical reasons, beyond \textit{metis}'s power. You can

476 then try again with the \textit{strict} option (\S\ref{problem-encoding}).

478 If the goal is actually unprovable and you did not specify an unsound encoding

479 using \textit{type\_enc} (\S\ref{problem-encoding}), this is a bug, and you are

480 strongly encouraged to report this to the author at \authoremail.

482 \point{Why are the generated Isar proofs so ugly/broken?}

484 The current implementation of the Isar proof feature,

485 enabled by the \textit{isar\_proof} option (\S\ref{output-format}),

486 is highly experimental. Work on a new implementation has begun. There is a large body of

487 research into transforming resolution proofs into natural deduction proofs (such

488 as Isar proofs), which we hope to leverage. In the meantime, a workaround is to

489 set the \textit{isar\_shrink\_factor} option (\S\ref{output-format}) to a larger

490 value or to try several provers and keep the nicest-looking proof.

492 \point{How can I tell whether a suggested proof is sound?}

494 Earlier versions of Sledgehammer often suggested unsound proofs---either proofs

495 of nontheorems or simply proofs that rely on type-unsound inferences. This

496 is a thing of the past, unless you explicitly specify an unsound encoding

497 using \textit{type\_enc} (\S\ref{problem-encoding}).

498 %

499 Officially, the only form of ``unsoundness'' that lurks in the sound

500 encodings is related to missing characteristic theorems of datatypes. For

501 example,

503 \prew

504 \textbf{lemma}~``$\exists \mathit{xs}.\; \mathit{xs} \neq []$'' \\

505 \textbf{sledgehammer} ()

506 \postw

508 suggests an argumentless \textit{metis} call that fails. However, the conjecture

509 does actually hold, and the \textit{metis} call can be repaired by adding

510 \textit{list.distinct}.

511 %

512 We hope to address this problem in a future version of Isabelle. In the

513 meantime, you can avoid it by passing the \textit{strict} option

514 (\S\ref{problem-encoding}).

516 \point{What are the \textit{full\_types}, \textit{no\_types}, and

517 \textit{mono\_tags} arguments to Metis?}

519 The \textit{metis}~(\textit{full\_types}) proof method

520 and its cousin \textit{metis}~(\textit{mono\_tags}) are fully-typed

521 version of Metis. It is somewhat slower than \textit{metis}, but the proof

522 search is fully typed, and it also includes more powerful rules such as the

523 axiom ``$x = \const{True} \mathrel{\lor} x = \const{False}$'' for reasoning in

524 higher-order places (e.g., in set comprehensions). The method kicks in

525 automatically as a fallback when \textit{metis} fails, and it is sometimes

526 generated by Sledgehammer instead of \textit{metis} if the proof obviously

527 requires type information or if \textit{metis} failed when Sledgehammer

528 preplayed the proof. (By default, Sledgehammer tries to run \textit{metis} with

529 various options for up to 3 seconds each time to ensure that the generated

530 one-line proofs actually work and to display timing information. This can be

531 configured using the \textit{preplay\_timeout} and \textit{dont\_preplay}

532 options (\S\ref{timeouts}).)

533 %

534 At the other end of the soundness spectrum, \textit{metis} (\textit{no\_types})

535 uses no type information at all during the proof search, which is more efficient

536 but often fails. Calls to \textit{metis} (\textit{no\_types}) are occasionally

537 generated by Sledgehammer.

538 %

539 See the \textit{type\_enc} option (\S\ref{problem-encoding}) for details.

541 Incidentally, if you ever see warnings such as

543 \prew

544 \slshape

545 Metis: Falling back on ``\textit{metis} (\textit{full\_types})''.

546 \postw

548 for a successful \textit{metis} proof, you can advantageously pass the

549 \textit{full\_types} option to \textit{metis} directly.

551 \point{And what are the \textit{lifting} and \textit{hide\_lams} arguments

552 to Metis?}

554 Orthogonally to the encoding of types, it is important to choose an appropriate

555 translation of $\lambda$-abstractions. Metis supports three translation schemes,

556 in decreasing order of power: Curry combinators (the default),

557 $\lambda$-lifting, and a ``hiding'' scheme that disables all reasoning under

558 $\lambda$-abstractions. The more powerful schemes also give the automatic

559 provers more rope to hang themselves. See the \textit{lam\_trans} option (\S\ref{problem-encoding}) for details.

561 \point{Are generated proofs minimal?}

563 Automatic provers frequently use many more facts than are necessary.

564 Sledgehammer inclues a minimization tool that takes a set of facts returned by a

565 given prover and repeatedly calls the same prover, \textit{metis}, or

566 \textit{smt} with subsets of those axioms in order to find a minimal set.

567 Reducing the number of axioms typically improves Metis's speed and success rate,

568 while also removing superfluous clutter from the proof scripts.

570 In earlier versions of Sledgehammer, generated proofs were systematically

571 accompanied by a suggestion to invoke the minimization tool. This step is now

572 performed implicitly if it can be done in a reasonable amount of time (something

573 that can be guessed from the number of facts in the original proof and the time

574 it took to find or preplay it).

576 In addition, some provers (e.g., Yices) do not provide proofs or sometimes

577 produce incomplete proofs. The minimizer is then invoked to find out which facts

578 are actually needed from the (large) set of facts that was initially given to

579 the prover. Finally, if a prover returns a proof with lots of facts, the

580 minimizer is invoked automatically since Metis would be unlikely to re-find the

581 proof.

582 %

583 Automatic minimization can be forced or disabled using the \textit{minimize}

584 option (\S\ref{mode-of-operation}).

586 \point{A strange error occurred---what should I do?}

588 Sledgehammer tries to give informative error messages. Please report any strange

589 error to the author at \authoremail. This applies double if you get the message

591 \prew

592 \slshape

593 The prover found a type-unsound proof involving ``\textit{foo\/}'',

594 ``\textit{bar\/}'', and ``\textit{baz\/}'' even though a supposedly type-sound

595 encoding was used (or, less likely, your axioms are inconsistent). You might

596 want to report this to the Isabelle developers.

597 \postw

599 \point{Auto can solve it---why not Sledgehammer?}

601 Problems can be easy for \textit{auto} and difficult for automatic provers, but

602 the reverse is also true, so don't be discouraged if your first attempts fail.

603 Because the system refers to all theorems known to Isabelle, it is particularly

604 suitable when your goal has a short proof from lemmas that you don't know about.

606 \point{Why are there so many options?}

608 Sledgehammer's philosophy should work out of the box, without user guidance.

609 Many of the options are meant to be used mostly by the Sledgehammer developers

610 for experimentation purposes. Of course, feel free to experiment with them if

611 you are so inclined.

613 \section{Command Syntax}

614 \label{command-syntax}

616 \subsection{Sledgehammer}

618 Sledgehammer can be invoked at any point when there is an open goal by entering

619 the \textbf{sledgehammer} command in the theory file. Its general syntax is as

620 follows:

622 \prew

623 \textbf{sledgehammer} \qty{subcommand}$^?$ \qty{options}$^?$ \qty{facts\_override}$^?$ \qty{num}$^?$

624 \postw

626 For convenience, Sledgehammer is also available in the ``Commands'' submenu of

627 the ``Isabelle'' menu in Proof General or by pressing the Emacs key sequence C-c

628 C-a C-s. This is equivalent to entering the \textbf{sledgehammer} command with

629 no arguments in the theory text.

631 In the general syntax, the \qty{subcommand} may be any of the following:

633 \begin{enum}

634 \item[\labelitemi] \textbf{\textit{run} (the default):} Runs Sledgehammer on

635 subgoal number \qty{num} (1 by default), with the given options and facts.

637 \item[\labelitemi] \textbf{\textit{min}:} Attempts to minimize the facts

638 specified in the \qty{facts\_override} argument to obtain a simpler proof

639 involving fewer facts. The options and goal number are as for \textit{run}.

641 \item[\labelitemi] \textbf{\textit{messages}:} Redisplays recent messages issued

642 by Sledgehammer. This allows you to examine results that might have been lost

643 due to Sledgehammer's asynchronous nature. The \qty{num} argument specifies a

644 limit on the number of messages to display (10 by default).

646 \item[\labelitemi] \textbf{\textit{supported\_provers}:} Prints the list of

647 automatic provers supported by Sledgehammer. See \S\ref{installation} and

648 \S\ref{mode-of-operation} for more information on how to install automatic

649 provers.

651 \item[\labelitemi] \textbf{\textit{running\_provers}:} Prints information about

652 currently running automatic provers, including elapsed runtime and remaining

653 time until timeout.

655 \item[\labelitemi] \textbf{\textit{kill\_provers}:} Terminates all running

656 automatic provers.

658 \item[\labelitemi] \textbf{\textit{refresh\_tptp}:} Refreshes the list of remote

659 ATPs available at System\-On\-TPTP \cite{sutcliffe-2000}.

660 \end{enum}

662 Sledgehammer's behavior can be influenced by various \qty{options}, which can be

663 specified in brackets after the \textbf{sledgehammer} command. The

664 \qty{options} are a list of key--value pairs of the form ``[$k_1 = v_1,

665 \ldots, k_n = v_n$]''. For Boolean options, ``= \textit{true\/}'' is optional. For

666 example:

668 \prew

669 \textbf{sledgehammer} [\textit{isar\_proof}, \,\textit{timeout} = 120]

670 \postw

672 Default values can be set using \textbf{sledgehammer\_\allowbreak params}:

674 \prew

675 \textbf{sledgehammer\_params} \qty{options}

676 \postw

678 The supported options are described in \S\ref{option-reference}.

680 The \qty{facts\_override} argument lets you alter the set of facts that go

681 through the relevance filter. It may be of the form ``(\qty{facts})'', where

682 \qty{facts} is a space-separated list of Isabelle facts (theorems, local

683 assumptions, etc.), in which case the relevance filter is bypassed and the given

684 facts are used. It may also be of the form ``(\textit{add}:\ \qty{facts\/_{\mathrm{1}}})'',

685 ``(\textit{del}:\ \qty{facts\/_{\mathrm{2}}})'', or ``(\textit{add}:\ \qty{facts\/_{\mathrm{1}}}\

686 \textit{del}:\ \qty{facts\/_{\mathrm{2}}})'', where the relevance filter is instructed to

687 proceed as usual except that it should consider \qty{facts\/_{\mathrm{1}}}

688 highly-relevant and \qty{facts\/_{\mathrm{2}}} fully irrelevant.

690 You can instruct Sledgehammer to run automatically on newly entered theorems by

691 enabling the ``Auto Sledgehammer'' option in Proof General's ``Isabelle'' menu.

692 For automatic runs, only the first prover set using \textit{provers}

693 (\S\ref{mode-of-operation}) is considered, fewer facts are passed to the prover,

694 \textit{slice} (\S\ref{mode-of-operation}) is disabled, \textit{strict}

695 (\S\ref{problem-encoding}) is enabled, \textit{verbose} (\S\ref{output-format})

696 and \textit{debug} (\S\ref{output-format}) are disabled, and \textit{timeout}

697 (\S\ref{timeouts}) is superseded by the ``Auto Tools Time Limit'' in Proof

698 General's ``Isabelle'' menu. Sledgehammer's output is also more concise.

700 \subsection{Metis}

702 The \textit{metis} proof method has the syntax

704 \prew

705 \textbf{\textit{metis}}~(\qty{options})${}^?$~\qty{facts}${}^?$

706 \postw

708 where \qty{facts} is a list of arbitrary facts and \qty{options} is a

709 comma-separated list consisting of at most one $\lambda$ translation scheme

710 specification with the same semantics as Sledgehammer's \textit{lam\_trans}

711 option (\S\ref{problem-encoding}) and at most one type encoding specification

712 with the same semantics as Sledgehammer's \textit{type\_enc} option

713 (\S\ref{problem-encoding}).

714 %

715 The supported $\lambda$ translation schemes are \textit{hide\_lams},

716 \textit{lifting}, and \textit{combs} (the default).

717 %

718 All the untyped type encodings listed in \S\ref{problem-encoding} are supported.

719 For convenience, the following aliases are provided:

720 \begin{enum}

721 \item[\labelitemi] \textbf{\textit{full\_types}:} Synonym for \textit{poly\_guards\_query}.

722 \item[\labelitemi] \textbf{\textit{partial\_types}:} Synonym for \textit{poly\_args}.

723 \item[\labelitemi] \textbf{\textit{no\_types}:} Synonym for \textit{erased}.

724 \end{enum}

726 \section{Option Reference}

727 \label{option-reference}

729 \def\defl{\{}

730 \def\defr{\}}

732 \def\flushitem#1{\item[]\noindent\kern-\leftmargin \textbf{#1}}

733 \def\optrueonly#1{\flushitem{\textit{#1} $\bigl[$= \textit{true}$\bigr]$\enskip}\nopagebreak\\[\parskip]}

734 \def\optrue#1#2{\flushitem{\textit{#1} $\bigl[$= \qtybf{bool}$\bigr]$\enskip \defl\textit{true}\defr\hfill (neg.: \textit{#2})}\nopagebreak\\[\parskip]}

735 \def\opfalse#1#2{\flushitem{\textit{#1} $\bigl[$= \qtybf{bool}$\bigr]$\enskip \defl\textit{false}\defr\hfill (neg.: \textit{#2})}\nopagebreak\\[\parskip]}

736 \def\opsmart#1#2{\flushitem{\textit{#1} $\bigl[$= \qtybf{smart\_bool}$\bigr]$\enskip \defl\textit{smart}\defr\hfill (neg.: \textit{#2})}\nopagebreak\\[\parskip]}

737 \def\opsmartx#1#2{\flushitem{\textit{#1} $\bigl[$= \qtybf{smart\_bool}$\bigr]$\enskip \defl\textit{smart}\defr\\\hbox{}\hfill (neg.: \textit{#2})}\nopagebreak\\[\parskip]}

738 \def\opnodefault#1#2{\flushitem{\textit{#1} = \qtybf{#2}} \nopagebreak\\[\parskip]}

739 \def\opnodefaultbrk#1#2{\flushitem{$\bigl[$\textit{#1} =$\bigr]$ \qtybf{#2}} \nopagebreak\\[\parskip]}

740 \def\opdefault#1#2#3{\flushitem{\textit{#1} = \qtybf{#2}\enskip \defl\textit{#3}\defr} \nopagebreak\\[\parskip]}

741 \def\oparg#1#2#3{\flushitem{\textit{#1} \qtybf{#2} = \qtybf{#3}} \nopagebreak\\[\parskip]}

742 \def\opargbool#1#2#3{\flushitem{\textit{#1} \qtybf{#2} $\bigl[$= \qtybf{bool}$\bigr]$\hfill (neg.: \textit{#3})}\nopagebreak\\[\parskip]}

743 \def\opargboolorsmart#1#2#3{\flushitem{\textit{#1} \qtybf{#2} $\bigl[$= \qtybf{smart\_bool}$\bigr]$\hfill (neg.: \textit{#3})}\nopagebreak\\[\parskip]}

745 Sledgehammer's options are categorized as follows:\ mode of operation

746 (\S\ref{mode-of-operation}), problem encoding (\S\ref{problem-encoding}),

747 relevance filter (\S\ref{relevance-filter}), output format

748 (\S\ref{output-format}), authentication (\S\ref{authentication}), and timeouts

749 (\S\ref{timeouts}).

751 The descriptions below refer to the following syntactic quantities:

753 \begin{enum}

754 \item[\labelitemi] \qtybf{string}: A string.

755 \item[\labelitemi] \qtybf{bool\/}: \textit{true} or \textit{false}.

756 \item[\labelitemi] \qtybf{smart\_bool\/}: \textit{true}, \textit{false}, or

757 \textit{smart}.

758 \item[\labelitemi] \qtybf{int\/}: An integer.

759 %\item[\labelitemi] \qtybf{float\/}: A floating-point number (e.g., 2.5).

760 \item[\labelitemi] \qtybf{float\_pair\/}: A pair of floating-point numbers

761 (e.g., 0.6 0.95).

762 \item[\labelitemi] \qtybf{smart\_int\/}: An integer or \textit{smart}.

763 \item[\labelitemi] \qtybf{float\_or\_none\/}: A floating-point number (e.g., 60 or

764 0.5) expressing a number of seconds, or the keyword \textit{none} ($\infty$

765 seconds).

766 \end{enum}

768 Default values are indicated in curly brackets (\textrm{\{\}}). Boolean options

769 have a negated counterpart (e.g., \textit{blocking} vs.\

770 \textit{non\_blocking}). When setting them, ``= \textit{true\/}'' may be omitted.

772 \subsection{Mode of Operation}

773 \label{mode-of-operation}

775 \begin{enum}

776 \opnodefaultbrk{provers}{string}

777 Specifies the automatic provers to use as a space-separated list (e.g.,

778 ``\textit{e}~\textit{spass}~\textit{remote\_vampire\/}'').

779 Provers can be run locally or remotely; see \S\ref{installation} for

780 installation instructions.

782 The following local provers are supported:

784 \begin{enum}

785 \item[\labelitemi] \textbf{\textit{alt\_ergo}:} Alt-Ergo is a polymorphic

786 SMT solver developed by Bobot et al.\ \cite{alt-ergo}.

787 It supports the TPTP polymorphic typed first-order format (TFF1) via Why3

788 \cite{why3}. It is included for experimental purposes. To use Alt-Ergo, set the

789 environment variable \texttt{WHY3\_HOME} to the directory that contains the

790 \texttt{why3} executable. Sledgehammer has been tested with Alt-Ergo 0.93 and an

791 unidentified development version of Why3.

793 \item[\labelitemi] \textbf{\textit{cvc3}:} CVC3 is an SMT solver developed by

794 Clark Barrett, Cesare Tinelli, and their colleagues \cite{cvc3}. To use CVC3,

795 set the environment variable \texttt{CVC3\_SOLVER} to the complete path of the

796 executable, including the file name, or install the prebuilt CVC3 package from

797 \download. Sledgehammer has been tested with version 2.2.

799 \item[\labelitemi] \textbf{\textit{e}:} E is a first-order resolution prover

800 developed by Stephan Schulz \cite{schulz-2002}. To use E, set the environment

801 variable \texttt{E\_HOME} to the directory that contains the \texttt{eproof}

802 executable and \texttt{E\_VERSION} to the version number (e.g., ``1.4''), or

803 install the prebuilt E package from \download. Sledgehammer has been tested with

804 versions 1.0 to 1.4.

806 \item[\labelitemi] \textbf{\textit{leo2}:} LEO-II is an automatic

807 higher-order prover developed by Christoph Benzm\"uller et al.\ \cite{leo2},

808 with support for the TPTP typed higher-order syntax (THF0). To use LEO-II, set

809 the environment variable \texttt{LEO2\_HOME} to the directory that contains the

810 \texttt{leo} executable. Sledgehammer requires version 1.2.9 or above.

812 \item[\labelitemi] \textbf{\textit{metis}:} Although it is much less powerful than

813 the external provers, Metis itself can be used for proof search.

815 \item[\labelitemi] \textbf{\textit{satallax}:} Satallax is an automatic

816 higher-order prover developed by Chad Brown et al.\ \cite{satallax}, with

817 support for the TPTP typed higher-order syntax (THF0). To use Satallax, set the

818 environment variable \texttt{SATALLAX\_HOME} to the directory that contains the

819 \texttt{satallax} executable. Sledgehammer requires version 2.2 or above.

821 \item[\labelitemi] \textbf{\textit{smt}:} The \textit{smt} proof method with the

822 current settings (usually:\ Z3 with proof reconstruction).

824 \item[\labelitemi] \textbf{\textit{spass}:} SPASS is a first-order resolution

825 prover developed by Christoph Weidenbach et al.\ \cite{weidenbach-et-al-2009}.

826 To use SPASS, set the environment variable \texttt{SPASS\_HOME} to the directory

827 that contains the \texttt{SPASS} executable and \texttt{SPASS\_VERSION} to the

828 version number (e.g., ``3.8ds''), or install the prebuilt SPASS package from

829 \download. Sledgehammer requires version 3.5 or above.

831 \item[\labelitemi] \textbf{\textit{vampire}:} Vampire is a first-order resolution

832 prover developed by Andrei Voronkov and his colleagues

833 \cite{riazanov-voronkov-2002}. To use Vampire, set the environment variable

834 \texttt{VAMPIRE\_HOME} to the directory that contains the \texttt{vampire}

835 executable and \texttt{VAMPIRE\_VERSION} to the version number (e.g., ``1.8'').

836 Sledgehammer has been tested with versions 0.6, 1.0, and 1.8.

837 Versions above 1.8 support the TPTP typed first-order format (TFF0).

839 \item[\labelitemi] \textbf{\textit{yices}:} Yices is an SMT solver developed at

840 SRI \cite{yices}. To use Yices, set the environment variable

841 \texttt{YICES\_SOLVER} to the complete path of the executable, including the

842 file name. Sledgehammer has been tested with version 1.0.28.

844 \item[\labelitemi] \textbf{\textit{z3}:} Z3 is an SMT solver developed at

845 Microsoft Research \cite{z3}. To use Z3, set the environment variable

846 \texttt{Z3\_SOLVER} to the complete path of the executable, including the file

847 name, and set \texttt{Z3\_NON\_COMMERCIAL} to ``yes'' to confirm that you are a

848 noncommercial user. Sledgehammer has been tested with versions 3.0 to 3.2.

850 \item[\labelitemi] \textbf{\textit{z3\_tptp}:} This version of Z3 pretends to be

851 an ATP, exploiting Z3's support for the TPTP untyped and typed first-order

852 formats (FOF and TFF0). It is included for experimental purposes. It

853 requires version 3.0 or above. To use it, set the environment variable

854 \texttt{Z3\_HOME} to the directory that contains the \texttt{z3}

855 executable.

856 \end{enum}

858 The following remote provers are supported:

860 \begin{enum}

861 \item[\labelitemi] \textbf{\textit{remote\_cvc3}:} The remote version of CVC3 runs

862 on servers at the TU M\"unchen (or wherever \texttt{REMOTE\_SMT\_URL} is set to

863 point).

865 \item[\labelitemi] \textbf{\textit{remote\_e}:} The remote version of E runs

866 on Geoff Sutcliffe's Miami servers \cite{sutcliffe-2000}.

868 \item[\labelitemi] \textbf{\textit{remote\_e\_sine}:} E-SInE is a metaprover

869 developed by Kry\v stof Hoder \cite{sine} based on E. It runs on Geoff

870 Sutcliffe's Miami servers.

872 \item[\labelitemi] \textbf{\textit{remote\_e\_tofof}:} E-ToFoF is a metaprover

873 developed by Geoff Sutcliffe \cite{tofof} based on E running on his Miami

874 servers. This ATP supports the TPTP typed first-order format (TFF0). The

875 remote version of E-ToFoF runs on Geoff Sutcliffe's Miami servers.

877 \item[\labelitemi] \textbf{\textit{remote\_iprover}:} iProver is a pure

878 instantiation-based prover developed by Konstantin Korovin \cite{korovin-2009}. The

879 remote version of iProver runs on Geoff Sutcliffe's Miami servers

880 \cite{sutcliffe-2000}.

882 \item[\labelitemi] \textbf{\textit{remote\_iprover\_eq}:} iProver-Eq is an

883 instantiation-based prover with native support for equality developed by

884 Konstantin Korovin and Christoph Sticksel \cite{korovin-sticksel-2010}. The

885 remote version of iProver-Eq runs on Geoff Sutcliffe's Miami servers

886 \cite{sutcliffe-2000}.

888 \item[\labelitemi] \textbf{\textit{remote\_leo2}:} The remote version of LEO-II

889 runs on Geoff Sutcliffe's Miami servers \cite{sutcliffe-2000}.

891 \item[\labelitemi] \textbf{\textit{remote\_satallax}:} The remote version of

892 Satallax runs on Geoff Sutcliffe's Miami servers \cite{sutcliffe-2000}.

894 \item[\labelitemi] \textbf{\textit{remote\_snark}:} SNARK is a first-order

895 resolution prover developed by Stickel et al.\ \cite{snark}. It supports the

896 TPTP typed first-order format (TFF0). The remote version of SNARK runs on

897 Geoff Sutcliffe's Miami servers.

899 \item[\labelitemi] \textbf{\textit{remote\_vampire}:} The remote version of

900 Vampire runs on Geoff Sutcliffe's Miami servers. Version 1.8 is used.

902 \item[\labelitemi] \textbf{\textit{remote\_waldmeister}:} Waldmeister is a unit

903 equality prover developed by Hillenbrand et al.\ \cite{waldmeister}. It can be

904 used to prove universally quantified equations using unconditional equations,

905 corresponding to the TPTP CNF UEQ division. The remote version of Waldmeister

906 runs on Geoff Sutcliffe's Miami servers.

908 \item[\labelitemi] \textbf{\textit{remote\_z3}:} The remote version of Z3 runs on

909 servers at the TU M\"unchen (or wherever \texttt{REMOTE\_SMT\_URL} is set to

910 point).

912 \item[\labelitemi] \textbf{\textit{remote\_z3\_tptp}:} The remote version of ``Z3

913 with TPTP syntax'' runs on Geoff Sutcliffe's Miami servers.

914 \end{enum}

916 By default, Sledgehammer runs E, E-SInE, SPASS, Vampire, Z3 (or whatever

917 the SMT module's \textit{smt\_solver} configuration option is set to), and (if

918 appropriate) Waldmeister in parallel---either locally or remotely, depending on

919 the number of processor cores available. For historical reasons, the default

920 value of this option can be overridden using the option ``Sledgehammer:

921 Provers'' in Proof General's ``Isabelle'' menu.

923 It is generally a good idea to run several provers in parallel. Running E,

924 SPASS, and Vampire for 5~seconds yields a similar success rate to running the

925 most effective of these for 120~seconds \cite{boehme-nipkow-2010}.

927 For the \textit{min} subcommand, the default prover is \textit{metis}. If

928 several provers are set, the first one is used.

930 \opnodefault{prover}{string}

931 Alias for \textit{provers}.

933 \opfalse{blocking}{non\_blocking}

934 Specifies whether the \textbf{sledgehammer} command should operate

935 synchronously. The asynchronous (non-blocking) mode lets the user start proving

936 the putative theorem manually while Sledgehammer looks for a proof, but it can

937 also be more confusing. Irrespective of the value of this option, Sledgehammer

938 is always run synchronously for the new jEdit-based user interface or if

939 \textit{debug} (\S\ref{output-format}) is enabled.

941 \optrue{slice}{dont\_slice}

942 Specifies whether the time allocated to a prover should be sliced into several

943 segments, each of which has its own set of possibly prover-dependent options.

944 For SPASS and Vampire, the first slice tries the fast but incomplete

945 set-of-support (SOS) strategy, whereas the second slice runs without it. For E,

946 up to three slices are tried, with different weighted search strategies and

947 number of facts. For SMT solvers, several slices are tried with the same options

948 each time but fewer and fewer facts. According to benchmarks with a timeout of

949 30 seconds, slicing is a valuable optimization, and you should probably leave it

950 enabled unless you are conducting experiments. This option is implicitly

951 disabled for (short) automatic runs.

953 \nopagebreak

954 {\small See also \textit{verbose} (\S\ref{output-format}).}

956 \opsmart{minimize}{dont\_minimize}

957 Specifies whether the minimization tool should be invoked automatically after

958 proof search. By default, automatic minimization takes place only if

959 it can be done in a reasonable amount of time (as determined by

960 the number of facts in the original proof and the time it took to find or

961 preplay it) or the proof involves an unreasonably large number of facts.

963 \nopagebreak

964 {\small See also \textit{preplay\_timeout} (\S\ref{timeouts})

965 and \textit{dont\_preplay} (\S\ref{timeouts}).}

967 \opfalse{overlord}{no\_overlord}

968 Specifies whether Sledgehammer should put its temporary files in

969 \texttt{\$ISA\-BELLE\_\allowbreak HOME\_\allowbreak USER}, which is useful for

970 debugging Sledgehammer but also unsafe if several instances of the tool are run

971 simultaneously. The files are identified by the prefix \texttt{prob\_}; you may

972 safely remove them after Sledgehammer has run.

974 \nopagebreak

975 {\small See also \textit{debug} (\S\ref{output-format}).}

976 \end{enum}

978 \subsection{Problem Encoding}

979 \label{problem-encoding}

981 \newcommand\comb[1]{\const{#1}}

983 \begin{enum}

984 \opdefault{lam\_trans}{string}{smart}

985 Specifies the $\lambda$ translation scheme to use in ATP problems. The supported

986 translation schemes are listed below:

988 \begin{enum}

989 \item[\labelitemi] \textbf{\textit{hide\_lams}:} Hide the $\lambda$-abstractions

990 by replacing them by unspecified fresh constants, effectively disabling all

991 reasoning under $\lambda$-abstractions.

993 \item[\labelitemi] \textbf{\textit{lifting}:} Introduce a new

994 supercombinator \const{c} for each cluster of $n$~$\lambda$-abstractions,

995 defined using an equation $\const{c}~x_1~\ldots~x_n = t$ ($\lambda$-lifting).

997 \item[\labelitemi] \textbf{\textit{combs}:} Rewrite lambdas to the Curry

998 combinators (\comb{I}, \comb{K}, \comb{S}, \comb{B}, \comb{C}). Combinators

999 enable the ATPs to synthesize $\lambda$-terms but tend to yield bulkier formulas

1000 than $\lambda$-lifting: The translation is quadratic in the worst case, and the

1001 equational definitions of the combinators are very prolific in the context of

1002 resolution.

1004 \item[\labelitemi] \textbf{\textit{combs\_and\_lifting}:} Introduce a new

1005 supercombinator \const{c} for each cluster of $\lambda$-abstractions and characterize it both using a

1006 lifted equation $\const{c}~x_1~\ldots~x_n = t$ and via Curry combinators.

1008 \item[\labelitemi] \textbf{\textit{combs\_or\_lifting}:} For each cluster of

1009 $\lambda$-abstractions, heuristically choose between $\lambda$-lifting and Curry

1010 combinators.

1012 \item[\labelitemi] \textbf{\textit{keep\_lams}:}

1013 Keep the $\lambda$-abstractions in the generated problems. This is available

1014 only with provers that support the THF0 syntax.

1016 \item[\labelitemi] \textbf{\textit{smart}:} The actual translation scheme used

1017 depends on the ATP and should be the most efficient scheme for that ATP.

1018 \end{enum}

1020 For SMT solvers, the $\lambda$ translation scheme is always \textit{lifting},

1021 irrespective of the value of this option.

1023 \opsmartx{uncurried\_aliases}{no\_uncurried\_aliases}

1024 Specifies whether fresh function symbols should be generated as aliases for

1025 applications of curried functions in ATP problems.

1027 \opdefault{type\_enc}{string}{smart}

1028 Specifies the type encoding to use in ATP problems. Some of the type encodings

1029 are unsound, meaning that they can give rise to spurious proofs

1030 (unreconstructible using \textit{metis}). The supported type encodings are

1031 listed below, with an indication of their soundness in parentheses.

1032 An asterisk (*) means that the encoding is slightly incomplete for

1033 reconstruction with \textit{metis}, unless the \emph{strict} option (described

1034 below) is enabled.

1036 \begin{enum}

1037 \item[\labelitemi] \textbf{\textit{erased} (very unsound):} No type information is

1038 supplied to the ATP, not even to resolve overloading. Types are simply erased.

1040 \item[\labelitemi] \textbf{\textit{poly\_guards} (sound):} Types are encoded using

1041 a predicate \const{g}$(\tau, t)$ that guards bound

1042 variables. Constants are annotated with their types, supplied as additional

1043 arguments, to resolve overloading.

1045 \item[\labelitemi] \textbf{\textit{poly\_tags} (sound):} Each term and subterm is

1046 tagged with its type using a function $\const{t\/}(\tau, t)$.

1048 \item[\labelitemi] \textbf{\textit{poly\_args} (unsound):}

1049 Like for \textit{poly\_guards} constants are annotated with their types to

1050 resolve overloading, but otherwise no type information is encoded. This

1051 coincides with the default encoding used by the \textit{metis} command.

1053 \item[\labelitemi]

1054 \textbf{%

1055 \textit{raw\_mono\_guards}, \textit{raw\_mono\_tags} (sound); \\

1056 \textit{raw\_mono\_args} (unsound):} \\

1057 Similar to \textit{poly\_guards}, \textit{poly\_tags}, and \textit{poly\_args},

1058 respectively, but the problem is additionally monomorphized, meaning that type

1059 variables are instantiated with heuristically chosen ground types.

1060 Monomorphization can simplify reasoning but also leads to larger fact bases,

1061 which can slow down the ATPs.

1063 \item[\labelitemi]

1064 \textbf{%

1065 \textit{mono\_guards}, \textit{mono\_tags} (sound);

1066 \textit{mono\_args} (unsound):} \\

1067 Similar to

1068 \textit{raw\_mono\_guards}, \textit{raw\_mono\_tags}, and

1069 \textit{raw\_mono\_args}, respectively but types are mangled in constant names

1070 instead of being supplied as ground term arguments. The binary predicate

1071 $\const{g}(\tau, t)$ becomes a unary predicate

1072 $\const{g\_}\tau(t)$, and the binary function

1073 $\const{t}(\tau, t)$ becomes a unary function

1074 $\const{t\_}\tau(t)$.

1076 \item[\labelitemi] \textbf{\textit{mono\_native} (sound):} Exploits native

1077 first-order types if the prover supports the TFF0, TFF1, or THF0 syntax;

1078 otherwise, falls back on \textit{mono\_guards}. The problem is monomorphized.

1080 \item[\labelitemi] \textbf{\textit{mono\_native\_higher} (sound):} Exploits

1081 native higher-order types if the prover supports the THF0 syntax; otherwise,

1082 falls back on \textit{mono\_native} or \textit{mono\_guards}. The problem is

1083 monomorphized.

1085 \item[\labelitemi] \textbf{\textit{poly\_native} (sound):} Exploits native

1086 polymorphic first-order types if the prover supports the TFF1 syntax; otherwise,

1087 falls back on \textit{mono\_native}.

1089 \item[\labelitemi]

1090 \textbf{%

1091 \textit{poly\_guards}?, \textit{poly\_tags}?, \textit{raw\_mono\_guards}?, \\

1092 \textit{raw\_mono\_tags}?, \textit{mono\_guards}?, \textit{mono\_tags}?, \\

1093 \textit{mono\_native}? (sound*):} \\

1094 The type encodings \textit{poly\_guards}, \textit{poly\_tags},

1095 \textit{raw\_mono\_guards}, \textit{raw\_mono\_tags}, \textit{mono\_guards},

1096 \textit{mono\_tags}, and \textit{mono\_native} are fully typed and sound. For

1097 each of these, Sledgehammer also provides a lighter variant identified by a

1098 question mark (`\hbox{?}')\ that detects and erases monotonic types, notably

1099 infinite types. (For \textit{mono\_native}, the types are not actually erased

1100 but rather replaced by a shared uniform type of individuals.) As argument to the

1101 \textit{metis} proof method, the question mark is replaced by a

1102 \hbox{``\textit{\_query\/}''} suffix.

1104 \item[\labelitemi]

1105 \textbf{%

1106 \textit{poly\_guards}??, \textit{poly\_tags}??, \textit{raw\_mono\_guards}??, \\

1107 \textit{raw\_mono\_tags}??, \textit{mono\_guards}??, \textit{mono\_tags}?? \\

1108 (sound*):} \\

1109 Even lighter versions of the `\hbox{?}' encodings. As argument to the

1110 \textit{metis} proof method, the `\hbox{??}' suffix is replaced by

1111 \hbox{``\textit{\_query\_query\/}''}.

1113 \item[\labelitemi]

1114 \textbf{%

1115 \textit{poly\_guards}@?, \textit{raw\_mono\_guards}@? (sound*):} \\

1116 Alternative versions of the `\hbox{??}' encodings. As argument to the

1117 \textit{metis} proof method, the `\hbox{@?}' suffix is replaced by

1118 \hbox{``\textit{\_at\_query\/}''}.

1120 \item[\labelitemi]

1121 \textbf{%

1122 \textit{poly\_guards}!, \textit{poly\_tags}!, \textit{raw\_mono\_guards}!, \\

1123 \textit{raw\_mono\_tags}!, \textit{mono\_guards}!, \textit{mono\_tags}!, \\

1124 \textit{mono\_native}!, \textit{mono\_native\_higher}! (mildly unsound):} \\

1125 The type encodings \textit{poly\_guards}, \textit{poly\_tags},

1126 \textit{raw\_mono\_guards}, \textit{raw\_mono\_tags}, \textit{mono\_guards},

1127 \textit{mono\_tags}, \textit{mono\_native}, and \textit{mono\_native\_higher}

1128 also admit a mildly unsound (but very efficient) variant identified by an

1129 exclamation mark (`\hbox{!}') that detects and erases erases all types except

1130 those that are clearly finite (e.g., \textit{bool}). (For \textit{mono\_native}

1131 and \textit{mono\_native\_higher}, the types are not actually erased but rather

1132 replaced by a shared uniform type of individuals.) As argument to the

1133 \textit{metis} proof method, the exclamation mark is replaced by the suffix

1134 \hbox{``\textit{\_bang\/}''}.

1136 \item[\labelitemi]

1137 \textbf{%

1138 \textit{poly\_guards}!!, \textit{poly\_tags}!!, \textit{raw\_mono\_guards}!!, \\

1139 \textit{raw\_mono\_tags}!!, \textit{mono\_guards}!!, \textit{mono\_tags}!! \\

1140 (mildly unsound):} \\

1141 Even lighter versions of the `\hbox{!}' encodings. As argument to the

1142 \textit{metis} proof method, the `\hbox{!!}' suffix is replaced by

1143 \hbox{``\textit{\_bang\_bang\/}''}.

1145 \item[\labelitemi]

1146 \textbf{%

1147 \textit{poly\_guards}@!, \textit{raw\_mono\_guards}@! (mildly unsound):} \\

1148 Alternative versions of the `\hbox{!!}' encodings. As argument to the

1149 \textit{metis} proof method, the `\hbox{@!}' suffix is replaced by

1150 \hbox{``\textit{\_at\_bang\/}''}.

1152 \item[\labelitemi] \textbf{\textit{smart}:} The actual encoding used depends on

1153 the ATP and should be the most efficient sound encoding for that ATP.

1154 \end{enum}

1156 For SMT solvers, the type encoding is always \textit{mono\_native}, irrespective

1157 of the value of this option.

1159 \nopagebreak

1160 {\small See also \textit{max\_new\_mono\_instances} (\S\ref{relevance-filter})

1161 and \textit{max\_mono\_iters} (\S\ref{relevance-filter}).}

1163 \opfalse{strict}{non\_strict}

1164 Specifies whether Sledgehammer should run in its strict mode. In that mode,

1165 sound type encodings marked with an asterisk (*) above are made complete

1166 for reconstruction with \textit{metis}, at the cost of some clutter in the

1167 generated problems. This option has no effect if \textit{type\_enc} is

1168 deliberately set to an unsound encoding.

1169 \end{enum}

1171 \subsection{Relevance Filter}

1172 \label{relevance-filter}

1174 \begin{enum}

1175 \opdefault{relevance\_thresholds}{float\_pair}{\upshape 0.45~0.85}

1176 Specifies the thresholds above which facts are considered relevant by the

1177 relevance filter. The first threshold is used for the first iteration of the

1178 relevance filter and the second threshold is used for the last iteration (if it

1179 is reached). The effective threshold is quadratically interpolated for the other

1180 iterations. Each threshold ranges from 0 to 1, where 0 means that all theorems

1181 are relevant and 1 only theorems that refer to previously seen constants.

1183 \opdefault{max\_relevant}{smart\_int}{smart}

1184 Specifies the maximum number of facts that may be returned by the relevance

1185 filter. If the option is set to \textit{smart}, it is set to a value that was

1186 empirically found to be appropriate for the prover. A typical value would be

1187 250.

1189 \opdefault{max\_new\_mono\_instances}{int}{\upshape 200}

1190 Specifies the maximum number of monomorphic instances to generate beyond

1191 \textit{max\_relevant}. The higher this limit is, the more monomorphic instances

1192 are potentially generated. Whether monomorphization takes place depends on the

1193 type encoding used.

1195 \nopagebreak

1196 {\small See also \textit{type\_enc} (\S\ref{problem-encoding}).}

1198 \opdefault{max\_mono\_iters}{int}{\upshape 3}

1199 Specifies the maximum number of iterations for the monomorphization fixpoint

1200 construction. The higher this limit is, the more monomorphic instances are

1201 potentially generated. Whether monomorphization takes place depends on the

1202 type encoding used.

1204 \nopagebreak

1205 {\small See also \textit{type\_enc} (\S\ref{problem-encoding}).}

1206 \end{enum}

1208 \subsection{Output Format}

1209 \label{output-format}

1211 \begin{enum}

1213 \opfalse{verbose}{quiet}

1214 Specifies whether the \textbf{sledgehammer} command should explain what it does.

1215 This option is implicitly disabled for automatic runs.

1217 \opfalse{debug}{no\_debug}

1218 Specifies whether Sledgehammer should display additional debugging information

1219 beyond what \textit{verbose} already displays. Enabling \textit{debug} also

1220 enables \textit{verbose} and \textit{blocking} (\S\ref{mode-of-operation})

1221 behind the scenes. The \textit{debug} option is implicitly disabled for

1222 automatic runs.

1224 \nopagebreak

1225 {\small See also \textit{overlord} (\S\ref{mode-of-operation}).}

1227 \opfalse{isar\_proof}{no\_isar\_proof}

1228 Specifies whether Isar proofs should be output in addition to one-liner

1229 \textit{metis} proofs. Isar proof construction is still experimental and often

1230 fails; however, they are usually faster and sometimes more robust than

1231 \textit{metis} proofs.

1233 \opdefault{isar\_shrink\_factor}{int}{\upshape 1}

1234 Specifies the granularity of the Isar proof. A value of $n$ indicates that each

1235 Isar proof step should correspond to a group of up to $n$ consecutive proof

1236 steps in the ATP proof.

1237 \end{enum}

1239 \subsection{Authentication}

1240 \label{authentication}

1242 \begin{enum}

1243 \opnodefault{expect}{string}

1244 Specifies the expected outcome, which must be one of the following:

1246 \begin{enum}

1247 \item[\labelitemi] \textbf{\textit{some}:} Sledgehammer found a proof.

1248 \item[\labelitemi] \textbf{\textit{none}:} Sledgehammer found no proof.

1249 \item[\labelitemi] \textbf{\textit{timeout}:} Sledgehammer timed out.

1250 \item[\labelitemi] \textbf{\textit{unknown}:} Sledgehammer encountered some

1251 problem.

1252 \end{enum}

1254 Sledgehammer emits an error (if \textit{blocking} is enabled) or a warning

1255 (otherwise) if the actual outcome differs from the expected outcome. This option

1256 is useful for regression testing.

1258 \nopagebreak

1259 {\small See also \textit{blocking} (\S\ref{mode-of-operation}) and

1260 \textit{timeout} (\S\ref{timeouts}).}

1261 \end{enum}

1263 \subsection{Timeouts}

1264 \label{timeouts}

1266 \begin{enum}

1267 \opdefault{timeout}{float\_or\_none}{\upshape 30}

1268 Specifies the maximum number of seconds that the automatic provers should spend

1269 searching for a proof. This excludes problem preparation and is a soft limit.

1270 For historical reasons, the default value of this option can be overridden using

1271 the option ``Sledgehammer: Time Limit'' in Proof General's ``Isabelle'' menu.

1273 \opdefault{preplay\_timeout}{float\_or\_none}{\upshape 3}

1274 Specifies the maximum number of seconds that \textit{metis} or \textit{smt}

1275 should spend trying to ``preplay'' the found proof. If this option is set to 0,

1276 no preplaying takes place, and no timing information is displayed next to the

1277 suggested \textit{metis} calls.

1279 \nopagebreak

1280 {\small See also \textit{minimize} (\S\ref{mode-of-operation}).}

1282 \optrueonly{dont\_preplay}

1283 Alias for ``\textit{preplay\_timeout} = 0''.

1285 \end{enum}

1287 \let\em=\sl

1288 \bibliography{../manual}{}

1289 \bibliographystyle{abbrv}

1291 \end{document}