summary |
shortlog |
changelog |
graph |
tags |
bookmarks |
branches |
files |
changeset |
raw | gz |
help

author | blanchet |

Mon, 13 Jul 2015 16:54:27 +0200 | |

changeset 60716 | 8e82a83757df |

parent 60715 | ee0afee54b1d |

child 60717 | 9a14d574ea65 |

imported patch up_casc

--- a/src/HOL/TPTP/CASC/ReadMe Mon Jul 13 14:40:16 2015 +0200 +++ b/src/HOL/TPTP/CASC/ReadMe Mon Jul 13 16:54:27 2015 +0200 @@ -1,31 +1,42 @@ -Isabelle/HOL 2013 at CASC-24 +Notes from Geoff: + +I added a few lines to the top of bin/isabelle ... + + ## Geoff makes Isabelle a robust tool, because he's kind + function cleanup { + rm -rf $HOME + } + if [ -z ${HOME+x} ]; then + HOME="/tmp/Isabelle_$$" + trap cleanup EXIT + fi + +... which you might like to adopt. Now it works on SystemOnTPTP. + Notes to Geoff: Once you have open the archive, Isabelle and its tool are ready to go. The - various tools are invoked as follows: + various tools are invoked as follows (given a file name %s): - Isabelle, competition version: - ./bin/isabelle tptp_isabelle %d %s + Isabelle, competition version: + STAREXEC_WALLCLOCK_LIMIT=300 ./bin/starexec_run_isabelle %s - Isabelle, demo version: - ./bin/isabelle tptp_isabelle_hot %d %s + Isabelle, demo version: + STAREXEC_WALLCLOCK_LIMIT=300 ./bin/starexec_run_isabelle_hot %s - Nitpick and Nitrox: - ./bin/isabelle tptp_nitpick %d %s + Nitpick (formerly also called Nitrox): + STAREXEC_WALLCLOCK_LIMIT=300 ./bin/starexec_run_nitpick %s - Refute: - ./bin/isabelle tptp_refute %d %s + Refute: + STAREXEC_WALLCLOCK_LIMIT=300 ./bin/starexec_run_refute %s Here's an example: - ./bin/isabelle tptp_isabelle_hot 300 $TPTP/Problems/SET/SET014^4.p + STAREXEC_WALLCLOCK_LIMIT=300 ./bin/starexec_run_isabelle $TPTP/Problems/SET/SET014^4.p The output should look as follows: - > val it = (): unit - val commit = fn: unit -> bool - Loading theory "Scratch_tptp_isabelle_hot_29414_2568" running nitpick for 7 s FAILURE: nitpick running simp for 15 s @@ -34,37 +45,35 @@ Additional sanity tests: - ./bin/isabelle tptp_isabelle_hot 300 $TPTP/Problems/CSR/CSR150^3.p - ./bin/isabelle tptp_isabelle_hot 300 $TPTP/Problems/SYO/SYO304^5.p - ./bin/isabelle tptp_isabelle_hot 300 $TPTP/Problems/PUZ/PUZ087^1.p + STAREXEC_WALLCLOCK_LIMIT=300 ./bin/starexec_run_isabelle_hot $TPTP/Problems/CSR/CSR150^3.p + STAREXEC_WALLCLOCK_LIMIT=300 ./bin/starexec_run_isabelle_hot $TPTP/Problems/SYO/SYO304^5.p + + The first problem is unprovable; the second one is proved by Satallax. - The first problem is unprovable; the second one is proved by Satallax; the - third one is proved by LEO-II. + All the tools accept CNF, FOF, TFF0, or THF0 problems and output SZS + statuses of the form - All the tools accept CNF, FOF, TFF0, or THF0 problems and output SZS statuses - of the form - - % SZS status XXX + % SZS status XXX where XXX is in the set {Unknown, TimedOut, Unsatisfiable, Theorem, Satisfiable, CounterSatisfiable} - Nitpick and Nitrox also output a model within "% SZS begin" and "% SZS end" - tags. + Nitpick also output a model within "% SZS begin" and "% SZS end" tags, in + its idiosyncratic syntax. In 2011, there were some problems with Java (needed for Nitpick), because it required so much memory at startup. I doubt there will be any problems this year, because Isabelle now includes its own version of Java, but the solution back then was to replace - exec "$ISABELLE_TOOL" java + exec "$ISABELLE_TOOL" java in the last line of the "contrib/kodkodi-1.5.2/bin/kodkodi" script with - /usr/lib64/jvm/java-1.5.0-gcj-4.5-1.5.0.0/jre/bin/java + /usr/lib64/jvm/java-1.5.0-gcj-4.5-1.5.0.0/jre/bin/java - See the emails we exchanged on July 18, 2011, with the subject "No problem on + See the emails we exchanged on 18 July 2011, with the subject "No problem on my Linux 64-bit". Enjoy! @@ -72,39 +81,43 @@ Notes to myself: - I downloaded the official Isabelle2013 Linux package from + I downloaded the official Isabelle2015 Linux package from + + http://isabelle.in.tum.de/dist/Isabelle2015_linux.tar.gz - http://www.cl.cam.ac.uk/research/hvg/Isabelle/dist/Isabelle2013_linux.tar.gz + on "macbroy21" and renamed the directory "Isabelle2015-CASC". I modified - on a "macbroy" machine and renamed the directory "Isabelle2013-CASC". I built - a "HOL-TPTP" image: + src/HOL/TPTP/atp_problem_import.ML + + to include changes backported from the development version of Isabelle. I + then built a "HOL-TPTP" image: ./bin/isabelle build -b HOL-TPTP I copied the heaps over to "./heaps": - mv ~/.isabelle/Isabelle2013/heaps . + mv ~/.isabelle/Isabelle2015/heaps . - To use this image and suppress some scary output, I added - - HOL-TPTP | grep --line-buffered -v "^###\|^PROOF FAILED for depth\|^Failure node\|inferences so far. Searching to depth\|^val it = (): unit" + I created some wrapper scripts in "./bin": - to the next-to-last lines of "src/HOL/TPTP/lib/Tools/tptp_[inrs]*". + starexec_run_default + starexec_run_isabelle + starexec_run_isabelle_hot + starexec_run_nitpick + starexec_run_refute - At this point I tested the "SYN044^4" mentioned above. - - I renamed "README" to "README.orig" and copied this "ReadMe" over. + I tested the "SET014^4" problem mentioned above. Next, I installed and enabled ATPs. - LEO-II (1.4.3): + LEO-II (1.6.2): I logged to a 32-bit Linux ("lxlabbroy") machine. I retrieved LEO-II from - http://www.ags.uni-sb.de/~leo/leo2_v1.4.3.tgz + http://page.mi.fu-berlin.de/cbenzmueller/leo/leo2_v1.6.2.tgz - I did "make opt". I copied - "bin/leo.opt" to "~/Isabelle2013-CASC/contrib/leo". + I did "make opt". I copied "bin/leo.opt" to + "~/Isabelle2015-CASC/contrib/leo". I added this line to "etc/settings": @@ -130,39 +143,57 @@ Vampire (2.6): - I copied the file "vampire_rel.linux64" from the 2012 CASC archive to - "~/Isabelle2013-CASC/contrib/vampire". + I copied the file "vampire", which I probably got from the 2013 CASC + archive and moved it to "~/Isabelle2013-CASC/contrib/vampire". I added these lines to "etc/settings": VAMPIRE_HOME=$ISABELLE_HOME/contrib - VAMPIRE_VERSION=2.6 + VAMPIRE_VERSION=3.0 + + Z3 TPTP (4.3.2.0 postrelease): - Z3 (3.2): + I cloned out the git repository: + + git clone https://git01.codeplex.com/z3 - I uncommented the following line in "contrib/z3-3.2/etc/settings": + I build Z3 and from "build", ran "make examples" to build "z3_tptp". + I copied "z3_tptp" as "z3_tptp-solver" and "libz3.so" to "./contrib", + and put a wrapper called "z3_tptp" to set the library path correctly + (inspired by the CVC4 setup on Mac OS X). - # Z3_NON_COMMERCIAL="yes" + I added this line to "etc/settings": + + Z3_TPTP_HOME=$ISABELLE_HOME/contrib - To test that the examples actually worked, I did + Unfortunately, I got "z3::exception" errors. I did not investigate this + further and commented out the environment variable in "etc/settings". + + To test that the examples actually worked, I create a file called + "/tmp/T.thy" with the following content: - ./bin/isabelle tty - theory T imports Main begin; - lemma "a = b ==> [b] = [a]"; - sledgehammer [e leo2 satallax spass vampire z3 z3_tptp] (); + theory T imports Main begin + lemma "a = b ==> [b] = [a]" + sledgehammer [cvc4 e leo2 satallax spass vampire z3 z3_tptp] () + oops + end + + Then I ran - and I performed the aforementioned sanity tests. + ./bin/isabelle_process -e 'use_thy "/tmp/T";' + + I also performed the aforementioned sanity tests. - Ideas for next year: + Finally, I renamed "README" to "README.orig" and copied this "ReadMe" over. + + Ideas for a future year: * Unfold definitions, esp. if it makes the problem more first-order (cf. "SEU466^1"). * Detect and remove needless definitions. - * Expand "p b" to "(b & p True) | (~ b & p False)" (cf. "CSR148^3"). - * Select subset of axioms (cf. "CSR148^3"). That's it. Jasmin Blanchette - 21 May 2013 + 10 June 2015

--- a/src/HOL/TPTP/CASC/SysDesc_Isabelle.html Mon Jul 13 14:40:16 2015 +0200 +++ b/src/HOL/TPTP/CASC/SysDesc_Isabelle.html Mon Jul 13 16:54:27 2015 +0200 @@ -1,5 +1,5 @@ <HR><!------------------------------------------------------------------------> -<H2>Isabelle/HOL 2013</H2> +<H2>Isabelle/HOL 2015</H2> Jasmin C. Blanchette<sup>1</sup>, Lawrence C. Paulson<sup>2</sup>, Tobias Nipkow<sup>1</sup>, Makarius Wenzel<sup>3</sup> <BR> <sup>1</sup>Technische Universität München, Germany <BR> @@ -8,9 +8,9 @@ <H3>Architecture</H3> -Isabelle/HOL 2013 [<A HREF="#References">NPW13</A>] is the higher-order +Isabelle/HOL 2015 [<A HREF="#References">NPW13</A>] is the higher-order logic incarnation of the generic proof assistant -<A HREF="http://www.cl.cam.ac.uk/research/hvg/Isabelle/">Isabelle2013</A>. +<A HREF="http://www.cl.cam.ac.uk/research/hvg/Isabelle/">Isabelle2015</A>. Isabelle/HOL provides several automatic proof tactics, notably an equational reasoner [<A HREF="#References">Nip89</A>], a classical reasoner [<A HREF="#References">Pau94</A>], and a tableau prover [<A @@ -108,7 +108,10 @@ <H3>Expected Competition Performance</H3> <P> -Isabelle won last year by a thin margin. This year: first or second place. +Thanks to the addition of CVC4 and a new version of Vampire, +Isabelle might have become now strong enough to take on Satallax +and its various declensions. But we expect Isabelle to end in +second or third place, to be honest. <H3>References</H3> <DL>

--- a/src/HOL/TPTP/CASC/SysDesc_Nitpick.html Mon Jul 13 14:40:16 2015 +0200 +++ b/src/HOL/TPTP/CASC/SysDesc_Nitpick.html Mon Jul 13 16:54:27 2015 +0200 @@ -1,5 +1,5 @@ <HR><!------------------------------------------------------------------------> -<H2>Nitpick 2013</H2> +<H2>Nitpick 2015</H2> Jasmin C. Blanchette<BR> Technische Universität München, Germany <BR> @@ -51,8 +51,9 @@ <H3>Expected Competition Performance</H3> +<P> Thanks to Kodkod's amazing power, we expect that Nitpick will beat both Satallax -and Refute with its hands tied behind its back in the TNT category. +and Refute with its hands tied behind its back. <H3>References</H3> <DL>

--- a/src/HOL/TPTP/CASC/SysDesc_Nitrox.html Mon Jul 13 14:40:16 2015 +0200 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,78 +0,0 @@ -<HR><!------------------------------------------------------------------------> -<H2>Nitrox 2013</H2> -Jasmin C. Blanchette<sup>1</sup>, Emina Torlak<sup>2</sup><BR> -<sup>1</sup>Technische Universität München, Germany <BR> -<sup>2</sup>University of California, Berkeley, USA <BR> - -<H3>Architecture</H3> - -Nitrox is the first-order version of Nitpick [<A HREF="#References">BN10</A>], -an open source counterexample generator for Isabelle/HOL -[<A HREF="#References">NPW13</A>]. It builds on Kodkod -[<A HREF="#References">TJ07</A>], a highly optimized first-order relational -model finder based on SAT. The name Nitrox is a portmanteau of <b><i>Nit</i></b>pick -and Pa<b><i>r</i></b>ad<b><i>ox</i></b> (clever, eh?). - -<H3>Strategies</H3> - -<p> -Nitrox employs Kodkod to find a finite model of the negated conjecture. It -performs a few transformations on the input, such as pushing quantifiers inside, -but 99% of the solving logic is in Kodkod and the underlying SAT solver. - -<p> -The translation from HOL to Kodkod's first-order relational logic (FORL) is -parameterized by the cardinalities of the atomic types occurring in it. Nitrox -enumerates the possible cardinalities for the universe. If a formula has a -finite counterexample, the tool eventually finds it, unless it runs out of -resources. - -<p> -Nitpick is optimized to work with higher-order logic (HOL) and its definitional -principles (e.g., (co)inductive predicates, (co)inductive datatypes, -(co)recursive functions). When invoked on untyped first-order problem, few of -its optimizations come into play, and the problem handed to Kodkod is -essentially a first-order relational logic (FORL) rendering of the TPTP FOF -problem. There are two main exceptions: -<ul> -<li> Nested quantifiers are moved as far inside the formula as possible before -Kodkod gets a chance to look at them [<A HREF="#References">BN10</A>]. -<li> Definitions invoked with fixed arguments are specialized. -</ul> - -<H3>Implementation</H3> - -<p> -Nitrox, like most of Isabelle/HOL, is written in Standard ML. Unlike Isabelle -itself, which adheres to the LCF small-kernel discipline, Nitrox does not -certify its results and must be trusted. Kodkod is written in Java. MiniSat 1.14 -is used as the SAT solver. - -<H3>Expected Competition Performance</H3> - -Since Nitpick was designed for HOL, it doesn't have any type inference à -la Paradox. It also doesn't use the SAT solver incrementally, which penalizes it -a bit (but not as much as the missing type inference). Kodkod itself is known to -perform less well on FOF than Paradox, because it is designed and optimized for -a somewhat different logic, FORL. On the other hand, Kodkod's symmetry breaking -might be better calibrated than Paradox's. Hence, we expect Nitrox to end up in -second or third place in the FNT category. - -<H3>References</H3> -<DL> -<DT> BN10 -<DD> Blanchette J. C., Nipkow T. (2010), - <STRONG>Nitpick: A Counterexample Generator for Higher-Order Logic Based on a Relational Model Finder</STRONG>, - ITP 2010, <EM>LNCS</EM> 6172, pp. 131–146, Springer. -<DT> NPW13 -<DD> Nipkow T., Paulson L. C., Wenzel M. (2013), - <STRONG>Isabelle/HOL: A Proof Assistant for Higher-Order Logic</STRONG>, - <A HREF="http://www.cl.cam.ac.uk/research/hvg/Isabelle/dist/Isabelle/doc/tutorial.pdf">http://www.cl.cam.ac.uk/research/hvg/Isabelle/dist/Isabelle/doc/tutorial.pdf</a>. -<DT> TJ07 -<DD> Torlak E., Jackson D. (2007), - <STRONG>Kodkod: A Relational Model Finder</STRONG>, TACAS 2007, - <EM>LNCS</EM> 4424, pp. 632–647, Springer. -</DL> -<P> - -<HR><!------------------------------------------------------------------------>

--- a/src/HOL/TPTP/CASC/SysDesc_Refute.html Mon Jul 13 14:40:16 2015 +0200 +++ b/src/HOL/TPTP/CASC/SysDesc_Refute.html Mon Jul 13 16:54:27 2015 +0200 @@ -1,5 +1,5 @@ <HR><!------------------------------------------------------------------------> -<H2>Refute 2013</H2> +<H2>Refute 2015</H2> Jasmin C. Blanchette<sup>1</sup>, Tjark Weber<sup>2</sup><BR> <sup>1</sup>Technische Universität München, Germany <BR> <sup>2</sup>Uppsala Universitet, Sweden <BR> @@ -33,8 +33,8 @@ <H3>Expected Competition Performance</H3> -We expect that Refute will solve about 75% of the problems solved by Nitpick in -the TNT category, and perhaps a few problems that Nitpick cannot solve. +<P> +We expect Refute to beat Satallax but also to be beaten by Nitpick. <H3>References</H3> <DL>