doc-src/Sledgehammer/sledgehammer.tex
changeset 46300 6ded25a6098a
parent 46299 14459b9671a1
child 46302 adf10579fe43
equal deleted inserted replaced
46299:14459b9671a1 46300:6ded25a6098a
   378 This sections answers frequently (and infrequently) asked questions about
   378 This sections answers frequently (and infrequently) asked questions about
   379 Sledgehammer. It is a good idea to skim over it now even if you don't have any
   379 Sledgehammer. It is a good idea to skim over it now even if you don't have any
   380 questions at this stage. And if you have any further questions not listed here,
   380 questions at this stage. And if you have any further questions not listed here,
   381 send them to the author at \authoremail.
   381 send them to the author at \authoremail.
   382 
   382 
   383 \point{Why does Metis fail to reconstruct the proof?}
       
   384 
       
   385 There are many reasons. If Metis runs seemingly forever, that is a sign that the
       
   386 proof is too difficult for it. Metis's search is complete, so it should
       
   387 eventually find it, but that's little consolation. There are several possible
       
   388 solutions:
       
   389 
       
   390 \begin{enum}
       
   391 \item[\labelitemi] Try the \textit{isar\_proof} option (\S\ref{output-format}) to
       
   392 obtain a step-by-step Isar proof where each step is justified by \textit{metis}.
       
   393 Since the steps are fairly small, \textit{metis} is more likely to be able to
       
   394 replay them.
       
   395 
       
   396 \item[\labelitemi] Try the \textit{smt} proof method instead of \textit{metis}. It
       
   397 is usually stronger, but you need to either have Z3 available to replay the
       
   398 proofs, trust the SMT solver, or use certificates. See the documentation in the
       
   399 \emph{SMT} theory (\texttt{\$ISABELLE\_HOME/src/HOL/SMT.thy}) for details.
       
   400 
       
   401 \item[\labelitemi] Try the \textit{blast} or \textit{auto} proof methods, passing
       
   402 the necessary facts via \textbf{unfolding}, \textbf{using}, \textit{intro}{:},
       
   403 \textit{elim}{:}, \textit{dest}{:}, or \textit{simp}{:}, as appropriate.
       
   404 \end{enum}
       
   405 
       
   406 In some rare cases, \textit{metis} fails fairly quickly, and you get the error
       
   407 message
       
   408 
       
   409 \prew
       
   410 \slshape
       
   411 Proof reconstruction failed.
       
   412 \postw
       
   413 
       
   414 This message usually indicates that Sledgehammer found a type-incorrect proof.
       
   415 This was a frequent issue with older versions of Sledgehammer, which did not
       
   416 supply enough typing information to the ATPs by default. If you notice many
       
   417 unsound proofs and are not using \textit{type\_enc} (\S\ref{problem-encoding}),
       
   418 contact the author at \authoremail.
       
   419 
       
   420 \point{How can I tell whether a generated proof is sound?}
       
   421 
       
   422 First, if \textit{metis} can reconstruct it, the proof is sound (assuming
       
   423 Isabelle's inference kernel is sound). If it fails or runs seemingly forever,
       
   424 you can try
       
   425 
       
   426 \prew
       
   427 \textbf{apply}~\textbf{--} \\
       
   428 \textbf{sledgehammer} [\textit{sound}] (\textit{metis\_facts})
       
   429 \postw
       
   430 
       
   431 where \textit{metis\_facts} is the list of facts appearing in the suggested
       
   432 \textit{metis} call. The automatic provers should be able to re-find the proof
       
   433 quickly if it is sound, and the \textit{sound} option (\S\ref{problem-encoding})
       
   434 ensures that no unsound proofs are found.
       
   435 
       
   436 \point{Which facts are passed to the automatic provers?}
   383 \point{Which facts are passed to the automatic provers?}
   437 
   384 
   438 The relevance filter assigns a score to every available fact (lemma, theorem,
   385 The relevance filter assigns a score to every available fact (lemma, theorem,
   439 definition, or axiom)\ based upon how many constants that fact shares with the
   386 definition, or axiom) based upon how many constants that fact shares with the
   440 conjecture. This process iterates to include facts relevant to those just
   387 conjecture. This process iterates to include facts relevant to those just
   441 accepted, but with a decay factor to ensure termination. The constants are
   388 accepted, but with a decay factor to ensure termination. The constants are
   442 weighted to give unusual ones greater significance. The relevance filter copes
   389 weighted to give unusual ones greater significance. The relevance filter copes
   443 best when the conjecture contains some unusual constants; if all the constants
   390 best when the conjecture contains some unusual constants; if all the constants
   444 are common, it is unable to discriminate among the hundreds of facts that are
   391 are common, it is unable to discriminate among the hundreds of facts that are
   481 \textbf{using} \textit{hd.simps} \textit{tl.simps} \\
   428 \textbf{using} \textit{hd.simps} \textit{tl.simps} \\
   482 \textbf{apply}~\textbf{--} \\
   429 \textbf{apply}~\textbf{--} \\
   483 \textbf{sledgehammer}
   430 \textbf{sledgehammer}
   484 \postw
   431 \postw
   485 
   432 
       
   433 \point{Why does Metis fail to reconstruct the proof?}
       
   434 
       
   435 There are many reasons. If Metis runs seemingly forever, that is a sign that the
       
   436 proof is too difficult for it. Metis's search is complete, so it should
       
   437 eventually find it, but that's little consolation. There are several possible
       
   438 solutions:
       
   439 
       
   440 \begin{enum}
       
   441 \item[\labelitemi] Try the \textit{isar\_proof} option (\S\ref{output-format}) to
       
   442 obtain a step-by-step Isar proof where each step is justified by \textit{metis}.
       
   443 Since the steps are fairly small, \textit{metis} is more likely to be able to
       
   444 replay them.
       
   445 
       
   446 \item[\labelitemi] Try the \textit{smt} proof method instead of \textit{metis}. It
       
   447 is usually stronger, but you need to either have Z3 available to replay the
       
   448 proofs, trust the SMT solver, or use certificates. See the documentation in the
       
   449 \emph{SMT} theory (\texttt{\$ISABELLE\_HOME/src/HOL/SMT.thy}) for details.
       
   450 
       
   451 \item[\labelitemi] Try the \textit{blast} or \textit{auto} proof methods, passing
       
   452 the necessary facts via \textbf{unfolding}, \textbf{using}, \textit{intro}{:},
       
   453 \textit{elim}{:}, \textit{dest}{:}, or \textit{simp}{:}, as appropriate.
       
   454 \end{enum}
       
   455 
       
   456 In some rare cases, \textit{metis} fails fairly quickly, and you get the error
       
   457 message
       
   458 
       
   459 \prew
       
   460 \slshape
       
   461 One-line proof reconstruction failed.
       
   462 \postw
       
   463 
       
   464 This message indicates that Sledgehammer determined that the goal is provable,
       
   465 but the proof is, for technical reasons, beyond \textit{metis}'s power. You can
       
   466 then try again with the \textit{strict} option (\S\ref{problem-encoding}).
       
   467 
       
   468 If the goal is actually unprovable are you did not specify an unsound encoding
       
   469 using \textit{type\_enc} (\S\ref{problem-encoding}), this is a bug, and you are
       
   470 strongly encouraged to report this to the author at \authoremail.
       
   471 
   486 \point{Why are the generated Isar proofs so ugly/broken?}
   472 \point{Why are the generated Isar proofs so ugly/broken?}
   487 
   473 
   488 The current implementation is experimental and explodes exponentially in the
   474 The current implementation of the Isar proof feature,
   489 worst case. Work on a new implementation has begun. There is a large body of
   475 enabled by the \textit{isar\_proof} option (\S\ref{output-format}),
       
   476 is highly experimental. Work on a new implementation has begun. There is a large body of
   490 research into transforming resolution proofs into natural deduction proofs (such
   477 research into transforming resolution proofs into natural deduction proofs (such
   491 as Isar proofs), which we hope to leverage. In the meantime, a workaround is to
   478 as Isar proofs), which we hope to leverage. In the meantime, a workaround is to
   492 set the \textit{isar\_shrink\_factor} option (\S\ref{output-format}) to a larger
   479 set the \textit{isar\_shrink\_factor} option (\S\ref{output-format}) to a larger
   493 value or to try several provers and keep the nicest-looking proof.
   480 value or to try several provers and keep the nicest-looking proof.
       
   481 
       
   482 \point{How can I tell whether a suggested proof is sound?}
       
   483 
       
   484 Earlier versions of Sledgehammer often suggested unsound proofs---either proofs
       
   485 of nontheorems or simply proofs that rely on type-unsound inferences. This
       
   486 is a thing of the pass, unless you explicitly specify an unsound encoding
       
   487 using \textit{type\_enc} (\S\ref{problem-encoding}).
       
   488 %
       
   489 Officially, the only form of ``unsoundness'' that lurks in the sound
       
   490 encodings is related to missing characteristic theorems of datatypes. For
       
   491 example,
       
   492 
       
   493 \prew
       
   494 \textbf{lemma}~``$\exists \mathit{xs}.\; \mathit{xs} \neq []$'' \\
       
   495 \textbf{sledgehammer} ()
       
   496 \postw
       
   497 
       
   498 suggests an argumentless \textit{metis} call that fails. However, the conjecture
       
   499 does actually hold, and the \textit{metis} call can be repaired by adding
       
   500 \textit{list.distinct}.
       
   501 %
       
   502 We hope to address this problem in a future version of Isabelle. In the
       
   503 meantime, you can avoid it by passing the \textit{strict} option
       
   504 (\S\ref{problem-encoding}).
   494 
   505 
   495 \point{What are the \textit{full\_types}, \textit{no\_types}, and
   506 \point{What are the \textit{full\_types}, \textit{no\_types}, and
   496 \textit{mono\_tags} arguments to Metis?}
   507 \textit{mono\_tags} arguments to Metis?}
   497 
   508 
   498 The \textit{metis}~(\textit{full\_types}) proof method
   509 The \textit{metis}~(\textit{full\_types}) proof method
   667 
   678 
   668 You can instruct Sledgehammer to run automatically on newly entered theorems by
   679 You can instruct Sledgehammer to run automatically on newly entered theorems by
   669 enabling the ``Auto Sledgehammer'' option in Proof General's ``Isabelle'' menu.
   680 enabling the ``Auto Sledgehammer'' option in Proof General's ``Isabelle'' menu.
   670 For automatic runs, only the first prover set using \textit{provers}
   681 For automatic runs, only the first prover set using \textit{provers}
   671 (\S\ref{mode-of-operation}) is considered, fewer facts are passed to the prover,
   682 (\S\ref{mode-of-operation}) is considered, fewer facts are passed to the prover,
   672 \textit{slice} (\S\ref{mode-of-operation}) is disabled, \textit{sound}
   683 \textit{slice} (\S\ref{mode-of-operation}) is disabled, \textit{strict}
   673 (\S\ref{problem-encoding}) is enabled, \textit{verbose} (\S\ref{output-format})
   684 (\S\ref{problem-encoding}) is enabled, \textit{verbose} (\S\ref{output-format})
   674 and \textit{debug} (\S\ref{output-format}) are disabled, and \textit{timeout}
   685 and \textit{debug} (\S\ref{output-format}) are disabled, and \textit{timeout}
   675 (\S\ref{timeouts}) is superseded by the ``Auto Tools Time Limit'' in Proof
   686 (\S\ref{timeouts}) is superseded by the ``Auto Tools Time Limit'' in Proof
   676 General's ``Isabelle'' menu. Sledgehammer's output is also more concise.
   687 General's ``Isabelle'' menu. Sledgehammer's output is also more concise.
   677 
   688 
   694 \textit{lam\_lifting}, and \textit{combinators} (the default).
   705 \textit{lam\_lifting}, and \textit{combinators} (the default).
   695 %
   706 %
   696 All the untyped type encodings listed in \S\ref{problem-encoding} are supported.
   707 All the untyped type encodings listed in \S\ref{problem-encoding} are supported.
   697 For convenience, the following aliases are provided:
   708 For convenience, the following aliases are provided:
   698 \begin{enum}
   709 \begin{enum}
   699 \item[\labelitemi] \textbf{\textit{full\_types}:} Synonym for An appropriate type-sound encoding.
   710 \item[\labelitemi] \textbf{\textit{full\_types}:} Synonym for \textit{poly\_guards\_query}.
   700 \item[\labelitemi] \textbf{\textit{partial\_types}:} Synonym for \textit{poly\_args}.
   711 \item[\labelitemi] \textbf{\textit{partial\_types}:} Synonym for \textit{poly\_args}.
   701 \item[\labelitemi] \textbf{\textit{no\_types}:} Synonym for \textit{erased}.
   712 \item[\labelitemi] \textbf{\textit{no\_types}:} Synonym for \textit{erased}.
   702 \end{enum}
   713 \end{enum}
   703 
   714 
   704 \section{Option Reference}
   715 \section{Option Reference}
   992 
  1003 
   993 \opdefault{type\_enc}{string}{smart}
  1004 \opdefault{type\_enc}{string}{smart}
   994 Specifies the type encoding to use in ATP problems. Some of the type encodings
  1005 Specifies the type encoding to use in ATP problems. Some of the type encodings
   995 are unsound, meaning that they can give rise to spurious proofs
  1006 are unsound, meaning that they can give rise to spurious proofs
   996 (unreconstructible using \textit{metis}). The supported type encodings are
  1007 (unreconstructible using \textit{metis}). The supported type encodings are
   997 listed below, with an indication of their soundness in parentheses:
  1008 listed below, with an indication of their soundness in parentheses.
       
  1009 An asterisk (*) indicates that the encodings sometimes lead to reconstruction
       
  1010 failures in \textit{metis}, unless the \emph{strict} option (described below) is
       
  1011 enabled.
   998 
  1012 
   999 \begin{enum}
  1013 \begin{enum}
  1000 \item[\labelitemi] \textbf{\textit{erased} (very unsound):} No type information is
  1014 \item[\labelitemi] \textbf{\textit{erased} (very unsound):} No type information is
  1001 supplied to the ATP. Types are simply erased.
  1015 supplied to the ATP, not even to resolve overloading. Types are simply erased.
  1002 
  1016 
  1003 \item[\labelitemi] \textbf{\textit{poly\_guards} (sound):} Types are encoded using
  1017 \item[\labelitemi] \textbf{\textit{poly\_guards} (sound):} Types are encoded using
  1004 a predicate \textit{has\_\allowbreak type\/}$(\tau, t)$ that guards bound
  1018 a predicate \const{g}$(\tau, t)$ that guards bound
  1005 variables. Constants are annotated with their types, supplied as additional
  1019 variables. Constants are annotated with their types, supplied as additional
  1006 arguments, to resolve overloading.
  1020 arguments, to resolve overloading.
  1007 
  1021 
  1008 \item[\labelitemi] \textbf{\textit{poly\_tags} (sound):} Each term and subterm is
  1022 \item[\labelitemi] \textbf{\textit{poly\_tags} (sound):} Each term and subterm is
  1009 tagged with its type using a function $\const{type\/}(\tau, t)$.
  1023 tagged with its type using a function $\const{t\/}(\tau, t)$.
  1010 
  1024 
  1011 \item[\labelitemi] \textbf{\textit{poly\_args} (unsound):}
  1025 \item[\labelitemi] \textbf{\textit{poly\_args} (unsound):}
  1012 Like for \textit{poly\_guards} constants are annotated with their types to
  1026 Like for \textit{poly\_guards} constants are annotated with their types to
  1013 resolve overloading, but otherwise no type information is encoded. This
  1027 resolve overloading, but otherwise no type information is encoded. This
  1014 coincides with the default encoding used by the \textit{metis} command.
  1028 coincides with the default encoding used by the \textit{metis} command.
  1029 \textit{mono\_args} (unsound):} \\
  1043 \textit{mono\_args} (unsound):} \\
  1030 Similar to
  1044 Similar to
  1031 \textit{raw\_mono\_guards}, \textit{raw\_mono\_tags}, and
  1045 \textit{raw\_mono\_guards}, \textit{raw\_mono\_tags}, and
  1032 \textit{raw\_mono\_args}, respectively but types are mangled in constant names
  1046 \textit{raw\_mono\_args}, respectively but types are mangled in constant names
  1033 instead of being supplied as ground term arguments. The binary predicate
  1047 instead of being supplied as ground term arguments. The binary predicate
  1034 $\const{has\_type\/}(\tau, t)$ becomes a unary predicate
  1048 $\const{g}(\tau, t)$ becomes a unary predicate
  1035 $\const{has\_type\_}\tau(t)$, and the binary function
  1049 $\const{g\_}\tau(t)$, and the binary function
  1036 $\const{type\/}(\tau, t)$ becomes a unary function
  1050 $\const{t}(\tau, t)$ becomes a unary function
  1037 $\const{type\_}\tau(t)$.
  1051 $\const{t\_}\tau(t)$.
  1038 
  1052 
  1039 \item[\labelitemi] \textbf{\textit{mono\_simple} (sound):} Exploits simple
  1053 \item[\labelitemi] \textbf{\textit{mono\_simple} (sound):} Exploits simple
  1040 first-order types if the prover supports the TFF0 or THF0 syntax; otherwise,
  1054 first-order types if the prover supports the TFF0 or THF0 syntax; otherwise,
  1041 falls back on \textit{mono\_guards}. The problem is monomorphized.
  1055 falls back on \textit{mono\_guards}. The problem is monomorphized.
  1042 
  1056 
  1046 
  1060 
  1047 \item[\labelitemi]
  1061 \item[\labelitemi]
  1048 \textbf{%
  1062 \textbf{%
  1049 \textit{poly\_guards}?, \textit{poly\_tags}?, \textit{raw\_mono\_guards}?, \\
  1063 \textit{poly\_guards}?, \textit{poly\_tags}?, \textit{raw\_mono\_guards}?, \\
  1050 \textit{raw\_mono\_tags}?, \textit{mono\_guards}?, \textit{mono\_tags}?, \\
  1064 \textit{raw\_mono\_tags}?, \textit{mono\_guards}?, \textit{mono\_tags}?, \\
  1051 \textit{mono\_simple}? (quasi-sound):} \\
  1065 \textit{mono\_simple}? (sound*):} \\
  1052 The type encodings \textit{poly\_guards}, \textit{poly\_tags},
  1066 The type encodings \textit{poly\_guards}, \textit{poly\_tags},
  1053 \textit{raw\_mono\_guards}, \textit{raw\_mono\_tags}, \textit{mono\_guards},
  1067 \textit{raw\_mono\_guards}, \textit{raw\_mono\_tags}, \textit{mono\_guards},
  1054 \textit{mono\_tags}, and \textit{mono\_simple} are fully
  1068 \textit{mono\_tags}, and \textit{mono\_simple} are fully
  1055 typed and sound. For each of these, Sledgehammer also provides a lighter,
  1069 typed and sound. For each of these, Sledgehammer also provides a lighter,
  1056 virtually sound variant identified by a question mark (`\hbox{?}')\ that detects
  1070 virtually sound variant identified by a question mark (`\hbox{?}')\ that detects
  1057 and erases monotonic types, notably infinite types. (For \textit{mono\_simple},
  1071 and erases monotonic types, notably infinite types. (For \textit{mono\_simple},
  1058 the types are not actually erased but rather replaced by a shared uniform type
  1072 the types are not actually erased but rather replaced by a shared uniform type
  1059 of individuals.) As argument to the \textit{metis} proof method, the question
  1073 of individuals.) As argument to the \textit{metis} proof method, the question
  1060 mark is replaced by a \hbox{``\textit{\_query\/}''} suffix. If the \emph{sound}
  1074 mark is replaced by a \hbox{``\textit{\_query\/}''} suffix.
  1061 option is enabled, these encodings are fully sound.
       
  1062 
  1075 
  1063 \item[\labelitemi]
  1076 \item[\labelitemi]
  1064 \textbf{%
  1077 \textbf{%
  1065 \textit{poly\_guards}??, \textit{poly\_tags}??, \textit{raw\_mono\_guards}??, \\
  1078 \textit{poly\_guards}??, \textit{poly\_tags}??, \textit{raw\_mono\_guards}??, \\
  1066 \textit{raw\_mono\_tags}??, \textit{mono\_guards}??, \textit{mono\_tags}?? \\
  1079 \textit{raw\_mono\_tags}??, \textit{mono\_guards}??, \textit{mono\_tags}?? \\
  1067 (quasi-sound):} \\
  1080 (sound*):} \\
  1068 Even lighter versions of the `\hbox{?}' encodings. As argument to the
  1081 Even lighter versions of the `\hbox{?}' encodings. As argument to the
  1069 \textit{metis} proof method, the `\hbox{??}' suffix is replaced by
  1082 \textit{metis} proof method, the `\hbox{??}' suffix is replaced by
  1070 \hbox{``\textit{\_query\_query\/}''}.
  1083 \hbox{``\textit{\_query\_query\/}''}.
  1071 
  1084 
  1072 \item[\labelitemi]
  1085 \item[\labelitemi]
  1073 \textbf{%
  1086 \textbf{%
  1074 \textit{poly\_guards}@?, \textit{raw\_mono\_guards}@? (quasi-sound):} \\
  1087 \textit{poly\_guards}@?, \textit{raw\_mono\_guards}@? (sound*):} \\
  1075 Alternative versions of the `\hbox{??}' encodings. As argument to the
  1088 Alternative versions of the `\hbox{??}' encodings. As argument to the
  1076 \textit{metis} proof method, the `\hbox{@?}' suffix is replaced by
  1089 \textit{metis} proof method, the `\hbox{@?}' suffix is replaced by
  1077 \hbox{``\textit{\_at\_query\/}''}.
  1090 \hbox{``\textit{\_at\_query\/}''}.
  1078 
  1091 
  1079 \item[\labelitemi]
  1092 \item[\labelitemi]
  1117 
  1130 
  1118 \nopagebreak
  1131 \nopagebreak
  1119 {\small See also \textit{max\_new\_mono\_instances} (\S\ref{relevance-filter})
  1132 {\small See also \textit{max\_new\_mono\_instances} (\S\ref{relevance-filter})
  1120 and \textit{max\_mono\_iters} (\S\ref{relevance-filter}).}
  1133 and \textit{max\_mono\_iters} (\S\ref{relevance-filter}).}
  1121 
  1134 
  1122 \opfalse{sound}{unsound}
  1135 \opfalse{strict}{non\_stric}
  1123 Specifies whether Sledgehammer should run in its fully sound mode. In that mode,
  1136 Specifies whether Sledgehammer should run in its strict mode. In that mode,
  1124 quasi-sound type encodings (which are the default) are made fully sound, at the
  1137 sound type encodings marked with an asterisk (*) above are made more reliable
  1125 cost of some clutter in the generated problems.
  1138 for reconstruction with \textit{metis}, at the cost of some clutter in the
  1126 This option is ignored if \textit{type\_enc} is deliberately set to an encoding
  1139 generated problems. This option has no effect if \textit{type\_enc} is
  1127 that is not sound or quasi-sound.
  1140 deliberately set to an unsound encoding.
  1128 \end{enum}
  1141 \end{enum}
  1129 
  1142 
  1130 \subsection{Relevance Filter}
  1143 \subsection{Relevance Filter}
  1131 \label{relevance-filter}
  1144 \label{relevance-filter}
  1132 
  1145 
  1201 \begin{enum}
  1214 \begin{enum}
  1202 \opnodefault{expect}{string}
  1215 \opnodefault{expect}{string}
  1203 Specifies the expected outcome, which must be one of the following:
  1216 Specifies the expected outcome, which must be one of the following:
  1204 
  1217 
  1205 \begin{enum}
  1218 \begin{enum}
  1206 \item[\labelitemi] \textbf{\textit{some}:} Sledgehammer found a (potentially
  1219 \item[\labelitemi] \textbf{\textit{some}:} Sledgehammer found a proof.
  1207 unsound) proof.
       
  1208 \item[\labelitemi] \textbf{\textit{none}:} Sledgehammer found no proof.
  1220 \item[\labelitemi] \textbf{\textit{none}:} Sledgehammer found no proof.
  1209 \item[\labelitemi] \textbf{\textit{timeout}:} Sledgehammer timed out.
  1221 \item[\labelitemi] \textbf{\textit{timeout}:} Sledgehammer timed out.
  1210 \item[\labelitemi] \textbf{\textit{unknown}:} Sledgehammer encountered some
  1222 \item[\labelitemi] \textbf{\textit{unknown}:} Sledgehammer encountered some
  1211 problem.
  1223 problem.
  1212 \end{enum}
  1224 \end{enum}