imported patch up_casc
authorblanchet
Mon, 13 Jul 2015 16:54:27 +0200
changeset 60716 8e82a83757df
parent 60715 ee0afee54b1d
child 60717 9a14d574ea65
imported patch up_casc
src/HOL/TPTP/CASC/ReadMe
src/HOL/TPTP/CASC/SysDesc_Isabelle.html
src/HOL/TPTP/CASC/SysDesc_Nitpick.html
src/HOL/TPTP/CASC/SysDesc_Nitrox.html
src/HOL/TPTP/CASC/SysDesc_Refute.html
--- a/src/HOL/TPTP/CASC/ReadMe	Mon Jul 13 14:40:16 2015 +0200
+++ b/src/HOL/TPTP/CASC/ReadMe	Mon Jul 13 16:54:27 2015 +0200
@@ -1,31 +1,42 @@
-Isabelle/HOL 2013 at CASC-24
+Notes from Geoff:
+
+I added a few lines to the top of bin/isabelle ...
+
+   ## Geoff makes Isabelle a robust tool, because he's kind
+   function cleanup {
+       rm -rf $HOME
+   }
+   if [ -z ${HOME+x} ]; then
+       HOME="/tmp/Isabelle_$$"
+       trap cleanup EXIT
+   fi
+
+... which you might like to adopt. Now it works on SystemOnTPTP.
+
 
 Notes to Geoff:
 
   Once you have open the archive, Isabelle and its tool are ready to go. The
-  various tools are invoked as follows:
+  various tools are invoked as follows (given a file name %s):
 
-    Isabelle, competition version:
-      ./bin/isabelle tptp_isabelle %d %s
+  	Isabelle, competition version:
+  		STAREXEC_WALLCLOCK_LIMIT=300 ./bin/starexec_run_isabelle %s
 
-    Isabelle, demo version:
-      ./bin/isabelle tptp_isabelle_hot %d %s
+  	Isabelle, demo version:
+  		STAREXEC_WALLCLOCK_LIMIT=300 ./bin/starexec_run_isabelle_hot %s
 
-    Nitpick and Nitrox:
-      ./bin/isabelle tptp_nitpick %d %s
+  	Nitpick (formerly also called Nitrox):
+  		STAREXEC_WALLCLOCK_LIMIT=300 ./bin/starexec_run_nitpick %s
 
-    Refute:
-      ./bin/isabelle tptp_refute %d %s
+  	Refute:
+  		STAREXEC_WALLCLOCK_LIMIT=300 ./bin/starexec_run_refute %s
 
   Here's an example:
 
-    ./bin/isabelle tptp_isabelle_hot 300 $TPTP/Problems/SET/SET014^4.p
+		STAREXEC_WALLCLOCK_LIMIT=300 ./bin/starexec_run_isabelle $TPTP/Problems/SET/SET014^4.p
 
   The output should look as follows:
 
-    > val it = (): unit
-    val commit = fn: unit -> bool
-    Loading theory "Scratch_tptp_isabelle_hot_29414_2568"
     running nitpick for 7 s
     FAILURE: nitpick
     running simp for 15 s
@@ -34,37 +45,35 @@
 
   Additional sanity tests:
 
-    ./bin/isabelle tptp_isabelle_hot 300 $TPTP/Problems/CSR/CSR150^3.p
-    ./bin/isabelle tptp_isabelle_hot 300 $TPTP/Problems/SYO/SYO304^5.p
-    ./bin/isabelle tptp_isabelle_hot 300 $TPTP/Problems/PUZ/PUZ087^1.p
+		STAREXEC_WALLCLOCK_LIMIT=300 ./bin/starexec_run_isabelle_hot $TPTP/Problems/CSR/CSR150^3.p
+		STAREXEC_WALLCLOCK_LIMIT=300 ./bin/starexec_run_isabelle_hot $TPTP/Problems/SYO/SYO304^5.p
+
+  The first problem is unprovable; the second one is proved by Satallax.
 
-  The first problem is unprovable; the second one is proved by Satallax; the
-  third one is proved by LEO-II.
+  All the tools accept CNF, FOF, TFF0, or THF0 problems and output SZS
+  statuses of the form
 
-  All the tools accept CNF, FOF, TFF0, or THF0 problems and output SZS statuses
-  of the form
-
-    % SZS status XXX
+  	% SZS status XXX
 
   where XXX is in the set
 
     {Unknown, TimedOut, Unsatisfiable, Theorem, Satisfiable, CounterSatisfiable}
 
-  Nitpick and Nitrox also output a model within "% SZS begin" and "% SZS end"
-  tags.
+  Nitpick also output a model within "% SZS begin" and "% SZS end" tags, in
+  its idiosyncratic syntax.
 
   In 2011, there were some problems with Java (needed for Nitpick), because it
   required so much memory at startup. I doubt there will be any problems this
   year, because Isabelle now includes its own version of Java, but the solution
   back then was to replace
 
-    exec "$ISABELLE_TOOL" java
+  	exec "$ISABELLE_TOOL" java
 
   in the last line of the "contrib/kodkodi-1.5.2/bin/kodkodi" script with
 
-    /usr/lib64/jvm/java-1.5.0-gcj-4.5-1.5.0.0/jre/bin/java
+  	/usr/lib64/jvm/java-1.5.0-gcj-4.5-1.5.0.0/jre/bin/java
 
-  See the emails we exchanged on July 18, 2011, with the subject "No problem on
+  See the emails we exchanged on 18 July 2011, with the subject "No problem on
   my Linux 64-bit".
 
   Enjoy!
@@ -72,39 +81,43 @@
 
 Notes to myself:
 
-  I downloaded the official Isabelle2013 Linux package from
+  I downloaded the official Isabelle2015 Linux package from
+
+    http://isabelle.in.tum.de/dist/Isabelle2015_linux.tar.gz
 
-    http://www.cl.cam.ac.uk/research/hvg/Isabelle/dist/Isabelle2013_linux.tar.gz
+  on "macbroy21" and renamed the directory "Isabelle2015-CASC". I modified
 
-  on a "macbroy" machine and renamed the directory "Isabelle2013-CASC". I built
-  a "HOL-TPTP" image:
+    src/HOL/TPTP/atp_problem_import.ML
+
+  to include changes backported from the development version of Isabelle. I
+  then built a "HOL-TPTP" image:
 
     ./bin/isabelle build -b HOL-TPTP
 
   I copied the heaps over to "./heaps":
 
-    mv ~/.isabelle/Isabelle2013/heaps .
+    mv ~/.isabelle/Isabelle2015/heaps .
 
-  To use this image and suppress some scary output, I added
-
-    HOL-TPTP | grep --line-buffered -v "^###\|^PROOF FAILED for depth\|^Failure node\|inferences so far.  Searching to depth\|^val it = (): unit"
+  I created some wrapper scripts in "./bin":
 
-  to the next-to-last lines of "src/HOL/TPTP/lib/Tools/tptp_[inrs]*".
+    starexec_run_default
+    starexec_run_isabelle
+    starexec_run_isabelle_hot
+    starexec_run_nitpick
+    starexec_run_refute
 
-  At this point I tested the "SYN044^4" mentioned above.
-
-  I renamed "README" to "README.orig" and copied this "ReadMe" over.
+  I tested the "SET014^4" problem mentioned above.
 
   Next, I installed and enabled ATPs.
 
-  LEO-II (1.4.3):
+  LEO-II (1.6.2):
 
     I logged to a 32-bit Linux ("lxlabbroy") machine. I retrieved LEO-II from
 
-      http://www.ags.uni-sb.de/~leo/leo2_v1.4.3.tgz
+      http://page.mi.fu-berlin.de/cbenzmueller/leo/leo2_v1.6.2.tgz
 
-    I did "make opt". I copied
-    "bin/leo.opt" to "~/Isabelle2013-CASC/contrib/leo".
+    I did "make opt". I copied "bin/leo.opt" to
+    "~/Isabelle2015-CASC/contrib/leo".
 
     I added this line to "etc/settings":
 
@@ -130,39 +143,57 @@
 
   Vampire (2.6):
 
-    I copied the file "vampire_rel.linux64" from the 2012 CASC archive to
-    "~/Isabelle2013-CASC/contrib/vampire".
+    I copied the file "vampire", which I probably got from the 2013 CASC
+    archive and moved it to "~/Isabelle2013-CASC/contrib/vampire".
 
     I added these lines to "etc/settings":
 
       VAMPIRE_HOME=$ISABELLE_HOME/contrib
-      VAMPIRE_VERSION=2.6
+      VAMPIRE_VERSION=3.0
+
+  Z3 TPTP (4.3.2.0 postrelease):
 
-  Z3 (3.2):
+    I cloned out the git repository:
+
+      git clone https://git01.codeplex.com/z3
 
-    I uncommented the following line in "contrib/z3-3.2/etc/settings":
+    I build Z3 and from "build", ran "make examples" to build "z3_tptp".
+    I copied "z3_tptp" as "z3_tptp-solver" and "libz3.so" to "./contrib",
+    and put a wrapper called "z3_tptp" to set the library path correctly
+    (inspired by the CVC4 setup on Mac OS X).
 
-      # Z3_NON_COMMERCIAL="yes"
+    I added this line to "etc/settings":
+
+      Z3_TPTP_HOME=$ISABELLE_HOME/contrib
 
-  To test that the examples actually worked, I did
+    Unfortunately, I got "z3::exception" errors. I did not investigate this
+    further and commented out the environment variable in "etc/settings".
+
+  To test that the examples actually worked, I create a file called
+  "/tmp/T.thy" with the following content:
 
-    ./bin/isabelle tty
-    theory T imports Main begin;
-    lemma "a = b ==> [b] = [a]";
-    sledgehammer [e leo2 satallax spass vampire z3 z3_tptp] ();
+    theory T imports Main begin
+    lemma "a = b ==> [b] = [a]"
+    sledgehammer [cvc4 e leo2 satallax spass vampire z3 z3_tptp] ()
+    oops
+    end
+
+  Then I ran
 
-  and I performed the aforementioned sanity tests.
+    ./bin/isabelle_process -e 'use_thy "/tmp/T";'  
+
+  I also performed the aforementioned sanity tests.
 
-  Ideas for next year:
+  Finally, I renamed "README" to "README.orig" and copied this "ReadMe" over.
+
+  Ideas for a future year:
 
     * Unfold definitions, esp. if it makes the problem more first-order (cf.
       "SEU466^1").
     * Detect and remove needless definitions.
-    * Expand "p b" to "(b & p True) | (~ b & p False)" (cf. "CSR148^3").
-    * Select subset of axioms (cf. "CSR148^3").
 
   That's it.
 
 
                 Jasmin Blanchette
-                21 May 2013
+                10 June 2015
--- a/src/HOL/TPTP/CASC/SysDesc_Isabelle.html	Mon Jul 13 14:40:16 2015 +0200
+++ b/src/HOL/TPTP/CASC/SysDesc_Isabelle.html	Mon Jul 13 16:54:27 2015 +0200
@@ -1,5 +1,5 @@
 <HR><!------------------------------------------------------------------------>
-<H2>Isabelle/HOL 2013</H2>
+<H2>Isabelle/HOL 2015</H2>
 Jasmin C. Blanchette<sup>1</sup>, Lawrence C. Paulson<sup>2</sup>,
 Tobias Nipkow<sup>1</sup>, Makarius Wenzel<sup>3</sup> <BR>
 <sup>1</sup>Technische Universit&auml;t M&uuml;nchen, Germany <BR>
@@ -8,9 +8,9 @@
 
 <H3>Architecture</H3>
 
-Isabelle/HOL 2013 [<A HREF="#References">NPW13</A>] is the higher-order 
+Isabelle/HOL 2015 [<A HREF="#References">NPW13</A>] is the higher-order 
 logic incarnation of the generic proof assistant 
-<A HREF="http://www.cl.cam.ac.uk/research/hvg/Isabelle/">Isabelle2013</A>.
+<A HREF="http://www.cl.cam.ac.uk/research/hvg/Isabelle/">Isabelle2015</A>.
 Isabelle/HOL provides several automatic proof tactics, notably an equational
 reasoner [<A HREF="#References">Nip89</A>], a classical reasoner [<A
 HREF="#References">Pau94</A>], and a tableau prover [<A
@@ -108,7 +108,10 @@
 <H3>Expected Competition Performance</H3>
 
 <P>
-Isabelle won last year by a thin margin. This year: first or second place.
+Thanks to the addition of CVC4 and a new version of Vampire,
+Isabelle might have become now strong enough to take on Satallax
+and its various declensions. But we expect Isabelle to end in
+second or third place, to be honest.
 
 <H3>References</H3>
 <DL>
--- a/src/HOL/TPTP/CASC/SysDesc_Nitpick.html	Mon Jul 13 14:40:16 2015 +0200
+++ b/src/HOL/TPTP/CASC/SysDesc_Nitpick.html	Mon Jul 13 16:54:27 2015 +0200
@@ -1,5 +1,5 @@
 <HR><!------------------------------------------------------------------------>
-<H2>Nitpick 2013</H2>
+<H2>Nitpick 2015</H2>
 Jasmin C. Blanchette<BR>
 Technische Universit&auml;t M&uuml;nchen, Germany <BR>
 
@@ -51,8 +51,9 @@
 
 <H3>Expected Competition Performance</H3>
 
+<P>
 Thanks to Kodkod's amazing power, we expect that Nitpick will beat both Satallax
-and Refute with its hands tied behind its back in the TNT category.
+and Refute with its hands tied behind its back.
 
 <H3>References</H3>
 <DL>
--- a/src/HOL/TPTP/CASC/SysDesc_Nitrox.html	Mon Jul 13 14:40:16 2015 +0200
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,78 +0,0 @@
-<HR><!------------------------------------------------------------------------>
-<H2>Nitrox 2013</H2>
-Jasmin C. Blanchette<sup>1</sup>, Emina Torlak<sup>2</sup><BR>
-<sup>1</sup>Technische Universit&auml;t M&uuml;nchen, Germany <BR>
-<sup>2</sup>University of California, Berkeley, USA <BR>
-
-<H3>Architecture</H3>
-
-Nitrox is the first-order version of Nitpick [<A HREF="#References">BN10</A>],
-an open source counterexample generator for Isabelle/HOL
-[<A HREF="#References">NPW13</A>]. It builds on Kodkod
-[<A HREF="#References">TJ07</A>], a highly optimized first-order relational
-model finder based on SAT. The name Nitrox is a portmanteau of <b><i>Nit</i></b>pick
-and Pa<b><i>r</i></b>ad<b><i>ox</i></b> (clever, eh?).
-
-<H3>Strategies</H3>
-
-<p>
-Nitrox employs Kodkod to find a finite model of the negated conjecture. It
-performs a few transformations on the input, such as pushing quantifiers inside,
-but 99% of the solving logic is in Kodkod and the underlying SAT solver.
-
-<p>
-The translation from HOL to Kodkod's first-order relational logic (FORL) is
-parameterized by the cardinalities of the atomic types occurring in it. Nitrox
-enumerates the possible cardinalities for the universe. If a formula has a
-finite counterexample, the tool eventually finds it, unless it runs out of
-resources.
-
-<p>
-Nitpick is optimized to work with higher-order logic (HOL) and its definitional
-principles (e.g., (co)inductive predicates, (co)inductive datatypes,
-(co)recursive functions). When invoked on untyped first-order problem, few of
-its optimizations come into play, and the problem handed to Kodkod is
-essentially a first-order relational logic (FORL) rendering of the TPTP FOF
-problem. There are two main exceptions:
-<ul>
-<li> Nested quantifiers are moved as far inside the formula as possible before
-Kodkod gets a chance to look at them [<A HREF="#References">BN10</A>].
-<li> Definitions invoked with fixed arguments are specialized.
-</ul>
-
-<H3>Implementation</H3>
-
-<p>
-Nitrox, like most of Isabelle/HOL, is written in Standard ML. Unlike Isabelle
-itself, which adheres to the LCF small-kernel discipline, Nitrox does not
-certify its results and must be trusted. Kodkod is written in Java. MiniSat 1.14
-is used as the SAT solver.
-
-<H3>Expected Competition Performance</H3>
-
-Since Nitpick was designed for HOL, it doesn't have any type inference &agrave;
-la Paradox. It also doesn't use the SAT solver incrementally, which penalizes it
-a bit (but not as much as the missing type inference). Kodkod itself is known to
-perform less well on FOF than Paradox, because it is designed and optimized for
-a somewhat different logic, FORL. On the other hand, Kodkod's symmetry breaking
-might be better calibrated than Paradox's. Hence, we expect Nitrox to end up in
-second or third place in the FNT category.
-
-<H3>References</H3>
-<DL>
-<DT> BN10
-<DD> Blanchette J. C., Nipkow T. (2010),
-     <STRONG>Nitpick: A Counterexample Generator for Higher-Order Logic Based on a Relational Model Finder</STRONG>,
-     ITP 2010, <EM>LNCS</EM> 6172, pp. 131&ndash;146, Springer.
-<DT> NPW13
-<DD> Nipkow T., Paulson L. C., Wenzel M. (2013),
-     <STRONG>Isabelle/HOL: A Proof Assistant for Higher-Order Logic</STRONG>,
-     <A HREF="http://www.cl.cam.ac.uk/research/hvg/Isabelle/dist/Isabelle/doc/tutorial.pdf">http://www.cl.cam.ac.uk/research/hvg/Isabelle/dist/Isabelle/doc/tutorial.pdf</a>.
-<DT> TJ07
-<DD> Torlak E., Jackson D. (2007),
-     <STRONG>Kodkod: A Relational Model Finder</STRONG>, TACAS 2007,
-	 <EM>LNCS</EM> 4424, pp. 632&ndash;647, Springer.
-</DL>
-<P>
-
-<HR><!------------------------------------------------------------------------>
--- a/src/HOL/TPTP/CASC/SysDesc_Refute.html	Mon Jul 13 14:40:16 2015 +0200
+++ b/src/HOL/TPTP/CASC/SysDesc_Refute.html	Mon Jul 13 16:54:27 2015 +0200
@@ -1,5 +1,5 @@
 <HR><!------------------------------------------------------------------------>
-<H2>Refute 2013</H2>
+<H2>Refute 2015</H2>
 Jasmin C. Blanchette<sup>1</sup>, Tjark Weber<sup>2</sup><BR>
 <sup>1</sup>Technische Universit&auml;t M&uuml;nchen, Germany <BR>
 <sup>2</sup>Uppsala Universitet, Sweden <BR>
@@ -33,8 +33,8 @@
 
 <H3>Expected Competition Performance</H3>
 
-We expect that Refute will solve about 75% of the problems solved by Nitpick in
-the TNT category, and perhaps a few problems that Nitpick cannot solve.
+<P>
+We expect Refute to beat Satallax but also to be beaten by Nitpick.
 
 <H3>References</H3>
 <DL>