| author | paulson | 
| Thu, 21 Jun 2018 00:01:21 +0100 | |
| changeset 68475 | b6e48841d0a5 | 
| parent 67560 | 0fa87bd86566 | 
| child 69593 | 3dda49e08b9d | 
| permissions | -rw-r--r-- | 
| 48380 | 1 | (* Title: HOL/Tools/Sledgehammer/sledgehammer_mash.ML | 
| 48248 | 2 | Author: Jasmin Blanchette, TU Muenchen | 
| 57009 | 3 | Author: Cezary Kaliszyk, University of Innsbruck | 
| 48248 | 4 | |
| 5 | Sledgehammer's machine-learning-based relevance filter (MaSh). | |
| 6 | *) | |
| 7 | ||
| 48381 | 8 | signature SLEDGEHAMMER_MASH = | 
| 48248 | 9 | sig | 
| 48251 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 10 | type stature = ATP_Problem_Generate.stature | 
| 51004 
5f2788c38127
distinguish raw and non-raw facts, using raw for 10 000s of facts and non-raw after selection of some hundreds
 blanchet parents: 
51003diff
changeset | 11 | type raw_fact = Sledgehammer_Fact.raw_fact | 
| 48296 
e7f01b7e244e
gracefully handle the case of empty theories when going up the accessibility chain
 blanchet parents: 
48293diff
changeset | 12 | type fact = Sledgehammer_Fact.fact | 
| 
e7f01b7e244e
gracefully handle the case of empty theories when going up the accessibility chain
 blanchet parents: 
48293diff
changeset | 13 | type fact_override = Sledgehammer_Fact.fact_override | 
| 55201 | 14 | type params = Sledgehammer_Prover.params | 
| 15 | type prover_result = Sledgehammer_Prover.prover_result | |
| 48251 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 16 | |
| 48308 | 17 | val trace : bool Config.T | 
| 57150 
f591096a9c94
add option to keep duplicates, for more precise evaluation of relevance filters
 blanchet parents: 
57133diff
changeset | 18 | val duplicates : bool Config.T | 
| 51008 | 19 | val MePoN : string | 
| 48319 
340187063d84
use async manager to manage MaSh learners to make sure they get killed cleanly
 blanchet parents: 
48318diff
changeset | 20 | val MaShN : string | 
| 51008 | 21 | val MeShN : string | 
| 48379 
2b5ad61e2ccc
renamed "iter" fact filter to "MePo" (Meng--Paulson)
 blanchet parents: 
48378diff
changeset | 22 | val mepoN : string | 
| 
2b5ad61e2ccc
renamed "iter" fact filter to "MePo" (Meng--Paulson)
 blanchet parents: 
48378diff
changeset | 23 | val mashN : string | 
| 48314 
ee33ba3c0e05
added option to control which fact filter is used
 blanchet parents: 
48313diff
changeset | 24 | val meshN : string | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 25 | val unlearnN : string | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 26 | val learn_isarN : string | 
| 50484 
8ec31bdb9d36
adopt the neutral "prover" terminology for MaSh rather than the ambiguous/wrong ATP terminology (which sometimes excludes SMT solvers)
 blanchet parents: 
50450diff
changeset | 27 | val learn_proverN : string | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 28 | val relearn_isarN : string | 
| 50484 
8ec31bdb9d36
adopt the neutral "prover" terminology for MaSh rather than the ambiguous/wrong ATP terminology (which sometimes excludes SMT solvers)
 blanchet parents: 
50450diff
changeset | 29 | val relearn_proverN : string | 
| 48314 
ee33ba3c0e05
added option to control which fact filter is used
 blanchet parents: 
48313diff
changeset | 30 | val fact_filters : string list | 
| 50826 | 31 | val encode_str : string -> string | 
| 32 | val encode_strs : string list -> string | |
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 33 | val decode_str : string -> string | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 34 | val decode_strs : string -> string list | 
| 50632 
12c097ff3241
slightly more elegant naming convention (to keep low-level and high-level APIs separated)
 blanchet parents: 
50631diff
changeset | 35 | |
| 57532 | 36 | datatype mash_algorithm = | 
| 57458 | 37 | MaSh_NB | 
| 57459 | 38 | | MaSh_kNN | 
| 39 | | MaSh_NB_kNN | |
| 57458 | 40 | | MaSh_NB_Ext | 
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 41 | | MaSh_kNN_Ext | 
| 57106 | 42 | |
| 57120 | 43 | val is_mash_enabled : unit -> bool | 
| 57532 | 44 | val the_mash_algorithm : unit -> mash_algorithm | 
| 57557 | 45 | val str_of_mash_algorithm : mash_algorithm -> string | 
| 57120 | 46 | |
| 61322 | 47 |   val mesh_facts : ('a list -> 'a list) -> ('a * 'a -> bool) -> int ->
 | 
| 48 |     (real * (('a * real) list * 'a list)) list -> 'a list
 | |
| 61321 | 49 | val nickname_of_thm : thm -> string | 
| 57006 | 50 |   val find_suggested_facts : Proof.context -> ('b * thm) list -> string list -> ('b * thm) list
 | 
| 60641 | 51 | val crude_thm_ord : Proof.context -> thm * thm -> order | 
| 51182 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 52 | val thm_less : thm * thm -> bool | 
| 48251 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 53 | val goal_of_thm : theory -> thm -> thm | 
| 57006 | 54 | val run_prover_for_mash : Proof.context -> params -> string -> string -> fact list -> thm -> | 
| 55 | prover_result | |
| 60640 | 56 | val features_of : Proof.context -> string -> stature -> term list -> string list | 
| 51177 | 57 | val trim_dependencies : string list -> string list option | 
| 61321 | 58 | val isar_dependencies_of : string Symtab.table * string Symtab.table -> thm -> string list option | 
| 57006 | 59 | val prover_dependencies_of : Proof.context -> params -> string -> int -> raw_fact list -> | 
| 60 | string Symtab.table * string Symtab.table -> thm -> bool * string list | |
| 61321 | 61 |   val attach_parents_to_facts : ('a * thm) list -> ('a * thm) list ->
 | 
| 57006 | 62 |     (string list * ('a * thm)) list
 | 
| 53140 
a1235e90da5f
pour extra features from proximate facts into goal, in exporter
 blanchet parents: 
53137diff
changeset | 63 | val num_extra_feature_facts : int | 
| 53141 
d27e99a6a679
take chained and proximate facts into consideration when computing MaSh features
 blanchet parents: 
53140diff
changeset | 64 | val extra_feature_factor : real | 
| 53140 
a1235e90da5f
pour extra features from proximate facts into goal, in exporter
 blanchet parents: 
53137diff
changeset | 65 |   val weight_facts_smoothly : 'a list -> ('a * real) list
 | 
| 
a1235e90da5f
pour extra features from proximate facts into goal, in exporter
 blanchet parents: 
53137diff
changeset | 66 |   val weight_facts_steeply : 'a list -> ('a * real) list
 | 
| 57432 | 67 |   val find_mash_suggestions : Proof.context -> int -> string list -> ('a * thm) list ->
 | 
| 68 |     ('a * thm) list -> ('a * thm) list -> ('a * thm) list * ('a * thm) list
 | |
| 60649 
e007aa6a8aa2
more explicit use of context and elimination of Thm.theory_of_thm, although unclear (and untested?) situations remain;
 wenzelm parents: 
60641diff
changeset | 69 | val mash_suggested_facts : Proof.context -> string -> params -> int -> term list -> term -> | 
| 57407 
140e16bc2a40
use right theory name for theorems in evaluation driver
 blanchet parents: 
57406diff
changeset | 70 | raw_fact list -> fact list * fact list | 
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 71 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 72 | val mash_unlearn : Proof.context -> unit | 
| 54503 | 73 |   val mash_learn_proof : Proof.context -> params -> term -> ('a * thm) list -> thm list -> unit
 | 
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 74 | val mash_learn_facts : Proof.context -> params -> string -> int -> bool -> Time.time -> | 
| 57120 | 75 | raw_fact list -> string | 
| 57006 | 76 | val mash_learn : Proof.context -> params -> fact_override -> thm list -> bool -> unit | 
| 57432 | 77 | val mash_can_suggest_facts : Proof.context -> bool | 
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 78 | val mash_can_suggest_facts_fast : Proof.context -> bool | 
| 53819 
e55d641d0a70
honor MaSh's zero-overhead policy -- no learning if the tool is disabled
 blanchet parents: 
53790diff
changeset | 79 | |
| 57108 
dc0b4f50e288
more generous max number of suggestions, for more safety
 blanchet parents: 
57107diff
changeset | 80 | val generous_max_suggestions : int -> int | 
| 50814 | 81 | val mepo_weight : real | 
| 82 | val mash_weight : real | |
| 57006 | 83 | val relevant_facts : Proof.context -> params -> string -> int -> fact_override -> term list -> | 
| 84 | term -> raw_fact list -> (string * fact list) list | |
| 48248 | 85 | end; | 
| 86 | ||
| 48381 | 87 | structure Sledgehammer_MaSh : SLEDGEHAMMER_MASH = | 
| 48248 | 88 | struct | 
| 48249 | 89 | |
| 48251 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 90 | open ATP_Util | 
| 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 91 | open ATP_Problem_Generate | 
| 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 92 | open Sledgehammer_Util | 
| 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 93 | open Sledgehammer_Fact | 
| 55201 | 94 | open Sledgehammer_Prover | 
| 55202 
824c48a539c9
renamed many Sledgehammer ML files to clarify structure
 blanchet parents: 
55201diff
changeset | 95 | open Sledgehammer_Prover_Minimize | 
| 48381 | 96 | open Sledgehammer_MePo | 
| 48251 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 97 | |
| 61318 | 98 | val anonymous_proof_prefix = "." | 
| 99 | ||
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 100 | val trace = Attrib.setup_config_bool @{binding sledgehammer_mash_trace} (K false)
 | 
| 57150 
f591096a9c94
add option to keep duplicates, for more precise evaluation of relevance filters
 blanchet parents: 
57133diff
changeset | 101 | val duplicates = Attrib.setup_config_bool @{binding sledgehammer_fact_duplicates} (K false)
 | 
| 51032 
69da236d7838
added option to use SNoW as machine learning algo
 blanchet parents: 
51029diff
changeset | 102 | |
| 48308 | 103 | fun trace_msg ctxt msg = if Config.get ctxt trace then tracing (msg ()) else () | 
| 104 | ||
| 57150 
f591096a9c94
add option to keep duplicates, for more precise evaluation of relevance filters
 blanchet parents: 
57133diff
changeset | 105 | fun gen_eq_thm ctxt = if Config.get ctxt duplicates then Thm.eq_thm_strict else Thm.eq_thm_prop | 
| 
f591096a9c94
add option to keep duplicates, for more precise evaluation of relevance filters
 blanchet parents: 
57133diff
changeset | 106 | |
| 51008 | 107 | val MePoN = "MePo" | 
| 48319 
340187063d84
use async manager to manage MaSh learners to make sure they get killed cleanly
 blanchet parents: 
48318diff
changeset | 108 | val MaShN = "MaSh" | 
| 51008 | 109 | val MeShN = "MeSh" | 
| 48319 
340187063d84
use async manager to manage MaSh learners to make sure they get killed cleanly
 blanchet parents: 
48318diff
changeset | 110 | |
| 48379 
2b5ad61e2ccc
renamed "iter" fact filter to "MePo" (Meng--Paulson)
 blanchet parents: 
48378diff
changeset | 111 | val mepoN = "mepo" | 
| 48314 
ee33ba3c0e05
added option to control which fact filter is used
 blanchet parents: 
48313diff
changeset | 112 | val mashN = "mash" | 
| 48379 
2b5ad61e2ccc
renamed "iter" fact filter to "MePo" (Meng--Paulson)
 blanchet parents: 
48378diff
changeset | 113 | val meshN = "mesh" | 
| 48314 
ee33ba3c0e05
added option to control which fact filter is used
 blanchet parents: 
48313diff
changeset | 114 | |
| 48379 
2b5ad61e2ccc
renamed "iter" fact filter to "MePo" (Meng--Paulson)
 blanchet parents: 
48378diff
changeset | 115 | val fact_filters = [meshN, mepoN, mashN] | 
| 48314 
ee33ba3c0e05
added option to control which fact filter is used
 blanchet parents: 
48313diff
changeset | 116 | |
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 117 | val unlearnN = "unlearn" | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 118 | val learn_isarN = "learn_isar" | 
| 50484 
8ec31bdb9d36
adopt the neutral "prover" terminology for MaSh rather than the ambiguous/wrong ATP terminology (which sometimes excludes SMT solvers)
 blanchet parents: 
50450diff
changeset | 119 | val learn_proverN = "learn_prover" | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 120 | val relearn_isarN = "relearn_isar" | 
| 50484 
8ec31bdb9d36
adopt the neutral "prover" terminology for MaSh rather than the ambiguous/wrong ATP terminology (which sometimes excludes SMT solvers)
 blanchet parents: 
50450diff
changeset | 121 | val relearn_proverN = "relearn_prover" | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 122 | |
| 57368 | 123 | fun map_array_at ary f i = Array.update (ary, i, f (Array.sub (ary, i))) | 
| 124 | ||
| 57371 | 125 | type xtab = int * int Symtab.table | 
| 126 | ||
| 127 | val empty_xtab = (0, Symtab.empty) | |
| 128 | ||
| 129 | fun add_to_xtab key (next, tab) = (next + 1, Symtab.update_new (key, next) tab) | |
| 130 | fun maybe_add_to_xtab key = perhaps (try (add_to_xtab key)) | |
| 131 | ||
| 57552 
1072599c43f6
no need for 'mash' subdirectory after removal of Python program
 blanchet parents: 
57546diff
changeset | 132 | fun state_file () = Path.expand (Path.explode "$ISABELLE_HOME_USER/mash_state") | 
| 
1072599c43f6
no need for 'mash' subdirectory after removal of Python program
 blanchet parents: 
57546diff
changeset | 133 | val remove_state_file = try File.rm o state_file | 
| 48330 | 134 | |
| 57532 | 135 | datatype mash_algorithm = | 
| 57458 | 136 | MaSh_NB | 
| 57459 | 137 | | MaSh_kNN | 
| 138 | | MaSh_NB_kNN | |
| 57458 | 139 | | MaSh_NB_Ext | 
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 140 | | MaSh_kNN_Ext | 
| 57278 | 141 | |
| 57532 | 142 | fun mash_algorithm () = | 
| 61043 | 143 |   (case Options.default_string @{system_option MaSh} of
 | 
| 144 | "yes" => SOME MaSh_NB_kNN | |
| 145 | | "sml" => SOME MaSh_NB_kNN | |
| 146 | | "nb" => SOME MaSh_NB | |
| 147 | | "knn" => SOME MaSh_kNN | |
| 148 | | "nb_knn" => SOME MaSh_NB_kNN | |
| 149 | | "nb_ext" => SOME MaSh_NB_Ext | |
| 150 | | "knn_ext" => SOME MaSh_kNN_Ext | |
| 151 | | "none" => NONE | |
| 152 | | "" => NONE | |
| 63692 | 153 |   | algorithm => (warning ("Unknown MaSh algorithm: " ^ quote algorithm); NONE))
 | 
| 57018 | 154 | |
| 57532 | 155 | val is_mash_enabled = is_some o mash_algorithm | 
| 156 | val the_mash_algorithm = the_default MaSh_NB_kNN o mash_algorithm | |
| 57018 | 157 | |
| 57557 | 158 | fun str_of_mash_algorithm MaSh_NB = "nb" | 
| 159 | | str_of_mash_algorithm MaSh_kNN = "knn" | |
| 160 | | str_of_mash_algorithm MaSh_NB_kNN = "nb_knn" | |
| 161 | | str_of_mash_algorithm MaSh_NB_Ext = "nb_ext" | |
| 162 | | str_of_mash_algorithm MaSh_kNN_Ext = "knn_ext" | |
| 163 | ||
| 57459 | 164 | fun scaled_avg [] = 0 | 
| 165 | | scaled_avg xs = Real.ceil (100000000.0 * fold (curry (op +)) xs 0.0) div length xs | |
| 166 | ||
| 167 | fun avg [] = 0.0 | |
| 168 | | avg xs = fold (curry (op +)) xs 0.0 / Real.fromInt (length xs) | |
| 169 | ||
| 170 | fun normalize_scores _ [] = [] | |
| 171 | | normalize_scores max_facts xs = | |
| 62826 | 172 | map (apsnd (curry (op *) (1.0 / avg (map snd (take max_facts xs))))) xs | 
| 57459 | 173 | |
| 61322 | 174 | fun mesh_facts maybe_distinct _ max_facts [(_, (sels, unks))] = | 
| 175 | map fst (take max_facts sels) @ take (max_facts - length sels) unks | |
| 176 | |> maybe_distinct | |
| 177 | | mesh_facts _ fact_eq max_facts mess = | |
| 57459 | 178 | let | 
| 179 | val mess = mess |> map (apsnd (apfst (normalize_scores max_facts))) | |
| 180 | ||
| 181 | fun score_in fact (global_weight, (sels, unks)) = | |
| 182 | let val score_at = try (nth sels) #> Option.map (fn (_, score) => global_weight * score) in | |
| 183 | (case find_index (curry fact_eq fact o fst) sels of | |
| 184 | ~1 => if member fact_eq unks fact then NONE else SOME 0.0 | |
| 185 | | rank => score_at rank) | |
| 186 | end | |
| 187 | ||
| 188 | fun weight_of fact = mess |> map_filter (score_in fact) |> scaled_avg | |
| 189 | in | |
| 190 | fold (union fact_eq o map fst o take max_facts o fst o snd) mess [] | |
| 59058 
a78612c67ec0
renamed "pairself" to "apply2", in accordance to @{apply 2};
 wenzelm parents: 
58919diff
changeset | 191 | |> map (`weight_of) |> sort (int_ord o apply2 fst o swap) | 
| 57459 | 192 | |> map snd |> take max_facts | 
| 193 | end | |
| 194 | ||
| 195 | fun smooth_weight_of_fact rank = Math.pow (1.3, 15.5 - 0.2 * Real.fromInt rank) + 15.0 (* FUDGE *) | |
| 196 | fun steep_weight_of_fact rank = Math.pow (0.62, log2 (Real.fromInt (rank + 1))) (* FUDGE *) | |
| 197 | ||
| 198 | fun weight_facts_smoothly facts = facts ~~ map smooth_weight_of_fact (0 upto length facts - 1) | |
| 199 | fun weight_facts_steeply facts = facts ~~ map steep_weight_of_fact (0 upto length facts - 1) | |
| 200 | ||
| 57660 | 201 | fun sort_array_suffix cmp needed a = | 
| 57009 | 202 | let | 
| 57406 | 203 | exception BOTTOM of int | 
| 204 | ||
| 57460 | 205 | val al = Array.length a | 
| 206 | ||
| 57009 | 207 | fun maxson l i = | 
| 57017 | 208 | let val i31 = i + i + i + 1 in | 
| 57009 | 209 | if i31 + 2 < l then | 
| 57017 | 210 | let val x = Unsynchronized.ref i31 in | 
| 67560 | 211 | if is_less (cmp (Array.sub (a, i31), Array.sub (a, i31 + 1))) then x := i31 + 1 else (); | 
| 212 | if is_less (cmp (Array.sub (a, !x), Array.sub (a, i31 + 2))) then x := i31 + 2 else (); | |
| 57009 | 213 | !x | 
| 214 | end | |
| 215 | else | |
| 67560 | 216 | if i31 + 1 < l andalso is_less (cmp (Array.sub (a, i31), Array.sub (a, i31 + 1))) | 
| 57009 | 217 | then i31 + 1 else if i31 < l then i31 else raise BOTTOM i | 
| 218 | end | |
| 219 | ||
| 220 | fun trickledown l i e = | |
| 57029 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 221 | let val j = maxson l i in | 
| 67560 | 222 | if is_greater (cmp (Array.sub (a, j), e)) then | 
| 57029 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 223 | (Array.update (a, i, Array.sub (a, j)); trickledown l j e) | 
| 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 224 | else | 
| 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 225 | Array.update (a, i, e) | 
| 57009 | 226 | end | 
| 227 | ||
| 228 | fun trickle l i e = trickledown l i e handle BOTTOM i => Array.update (a, i, e) | |
| 229 | ||
| 230 | fun bubbledown l i = | |
| 57029 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 231 | let val j = maxson l i in | 
| 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 232 | Array.update (a, i, Array.sub (a, j)); | 
| 57009 | 233 | bubbledown l j | 
| 234 | end | |
| 235 | ||
| 236 | fun bubble l i = bubbledown l i handle BOTTOM i => i | |
| 237 | ||
| 238 | fun trickleup i e = | |
| 57029 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 239 | let val father = (i - 1) div 3 in | 
| 67560 | 240 | if is_less (cmp (Array.sub (a, father), e)) then | 
| 57029 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 241 | (Array.update (a, i, Array.sub (a, father)); | 
| 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 242 | if father > 0 then trickleup father e else Array.update (a, 0, e)) | 
| 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 243 | else | 
| 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 244 | Array.update (a, i, e) | 
| 57009 | 245 | end | 
| 246 | ||
| 57355 | 247 | fun for i = if i < 0 then () else (trickle al i (Array.sub (a, i)); for (i - 1)) | 
| 57009 | 248 | |
| 249 | fun for2 i = | |
| 57660 | 250 | if i < Integer.max 2 (al - needed) then | 
| 57029 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 251 | () | 
| 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 252 | else | 
| 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 253 | let val e = Array.sub (a, i) in | 
| 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 254 | Array.update (a, i, Array.sub (a, 0)); | 
| 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 255 | trickleup (bubble i 0) e; | 
| 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 256 | for2 (i - 1) | 
| 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 257 | end | 
| 57009 | 258 | in | 
| 57355 | 259 | for (((al + 1) div 3) - 1); | 
| 260 | for2 (al - 1); | |
| 261 | if al > 1 then | |
| 57029 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 262 | let val e = Array.sub (a, 1) in | 
| 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 263 | Array.update (a, 1, Array.sub (a, 0)); | 
| 57009 | 264 | Array.update (a, 0, e) | 
| 265 | end | |
| 57029 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 266 | else | 
| 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 267 | () | 
| 57009 | 268 | end | 
| 269 | ||
| 57660 | 270 | fun rev_sort_list_prefix cmp needed xs = | 
| 57460 | 271 | let val ary = Array.fromList xs in | 
| 57660 | 272 | sort_array_suffix cmp needed ary; | 
| 273 | Array.foldl (op ::) [] ary | |
| 57460 | 274 | end | 
| 275 | ||
| 276 | ||
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 277 | (*** Convenience functions for synchronized access ***) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 278 | |
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 279 | fun synchronized_timed_value var time_limit = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 280 | Synchronized.timed_access var time_limit (fn value => SOME (value, value)) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 281 | fun synchronized_timed_change_result var time_limit f = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 282 | Synchronized.timed_access var time_limit (SOME o f) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 283 | fun synchronized_timed_change var time_limit f = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 284 | synchronized_timed_change_result var time_limit (fn x => ((), f x)) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 285 | |
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 286 | fun mash_time_limit _ = SOME (seconds 0.1) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 287 | |
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 288 | |
| 57460 | 289 | (*** Isabelle-agnostic machine learning ***) | 
| 290 | ||
| 291 | structure MaSh = | |
| 292 | struct | |
| 293 | ||
| 57574 | 294 | fun select_visible_facts (big_number : real) recommends = | 
| 57364 | 295 | List.app (fn at => | 
| 296 | let val (j, ov) = Array.sub (recommends, at) in | |
| 57371 | 297 | Array.update (recommends, at, (j, big_number + ov)) | 
| 57364 | 298 | end) | 
| 299 | ||
| 57374 | 300 | fun wider_array_of_vector init vec = | 
| 301 | let val ary = Array.array init in | |
| 302 |     Array.copyVec {src = vec, dst = ary, di = 0};
 | |
| 303 | ary | |
| 304 | end | |
| 305 | ||
| 57531 
4d9895d39b59
improvements to the machine learning algos (due to Cezary K.)
 blanchet parents: 
57462diff
changeset | 306 | val nb_def_prior_weight = 1000 (* FUDGE *) | 
| 57095 | 307 | |
| 57374 | 308 | fun learn_facts (tfreq0, sfreq0, dffreq0) num_facts0 num_facts num_feats depss featss = | 
| 57354 | 309 | let | 
| 57374 | 310 | val tfreq = wider_array_of_vector (num_facts, 0) tfreq0 | 
| 311 | val sfreq = wider_array_of_vector (num_facts, Inttab.empty) sfreq0 | |
| 312 | val dffreq = wider_array_of_vector (num_feats, 0) dffreq0 | |
| 313 | ||
| 314 | fun learn_one th feats deps = | |
| 57354 | 315 | let | 
| 57355 | 316 | fun add_th weight t = | 
| 317 | let | |
| 318 | val im = Array.sub (sfreq, t) | |
| 57374 | 319 | fun fold_fn s = Inttab.map_default (s, 0) (Integer.add weight) | 
| 57355 | 320 | in | 
| 57368 | 321 | map_array_at tfreq (Integer.add weight) t; | 
| 57355 | 322 | Array.update (sfreq, t, fold fold_fn feats im) | 
| 323 | end | |
| 324 | ||
| 57368 | 325 | val add_sym = map_array_at dffreq (Integer.add 1) | 
| 57354 | 326 | in | 
| 57355 | 327 | add_th nb_def_prior_weight th; | 
| 328 | List.app (add_th 1) deps; | |
| 329 | List.app add_sym feats | |
| 57354 | 330 | end | 
| 331 | ||
| 57102 | 332 | fun for i = | 
| 57383 | 333 | if i = num_facts then () | 
| 334 | else (learn_one i (Vector.sub (featss, i)) (Vector.sub (depss, i)); for (i + 1)) | |
| 57102 | 335 | in | 
| 57383 | 336 | for num_facts0; | 
| 57374 | 337 | (Array.vector tfreq, Array.vector sfreq, Array.vector dffreq) | 
| 57102 | 338 | end | 
| 57029 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 339 | |
| 57376 | 340 | fun naive_bayes (tfreq, sfreq, dffreq) num_facts max_suggs visible_facts goal_feats = | 
| 57278 | 341 | let | 
| 57531 
4d9895d39b59
improvements to the machine learning algos (due to Cezary K.)
 blanchet parents: 
57462diff
changeset | 342 | val tau = 0.2 (* FUDGE *) | 
| 
4d9895d39b59
improvements to the machine learning algos (due to Cezary K.)
 blanchet parents: 
57462diff
changeset | 343 | val pos_weight = 5.0 (* FUDGE *) | 
| 
4d9895d39b59
improvements to the machine learning algos (due to Cezary K.)
 blanchet parents: 
57462diff
changeset | 344 | val def_val = ~18.0 (* FUDGE *) | 
| 
4d9895d39b59
improvements to the machine learning algos (due to Cezary K.)
 blanchet parents: 
57462diff
changeset | 345 | val init_val = 30.0 (* FUDGE *) | 
| 57124 | 346 | |
| 57366 | 347 | val ln_afreq = Math.ln (Real.fromInt num_facts) | 
| 57374 | 348 | val idf = Vector.map (fn i => ln_afreq - Math.ln (Real.fromInt i)) dffreq | 
| 57366 | 349 | |
| 57359 | 350 | fun tfidf feat = Vector.sub (idf, feat) | 
| 57102 | 351 | |
| 352 | fun log_posterior i = | |
| 57029 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 353 | let | 
| 57374 | 354 | val tfreq = Real.fromInt (Vector.sub (tfreq, i)) | 
| 57097 | 355 | |
| 57531 
4d9895d39b59
improvements to the machine learning algos (due to Cezary K.)
 blanchet parents: 
57462diff
changeset | 356 | fun add_feat (f, fw0) (res, sfh) = | 
| 57102 | 357 | (case Inttab.lookup sfh f of | 
| 358 | SOME sf => | |
| 57401 
02f56126b4e4
reintroduced 'extra features' but with lower weight than before (to account for tfidf)
 blanchet parents: 
57387diff
changeset | 359 | (res + fw0 * tfidf f * Math.ln (pos_weight * Real.fromInt sf / tfreq), | 
| 57102 | 360 | Inttab.delete f sfh) | 
| 57409 | 361 | | NONE => (res + fw0 * tfidf f * def_val, sfh)) | 
| 57097 | 362 | |
| 57531 
4d9895d39b59
improvements to the machine learning algos (due to Cezary K.)
 blanchet parents: 
57462diff
changeset | 363 | val (res, sfh) = fold add_feat goal_feats (init_val * Math.ln tfreq, Vector.sub (sfreq, i)) | 
| 57097 | 364 | |
| 57409 | 365 | fun fold_sfh (f, sf) sow = | 
| 59476 | 366 | sow + tfidf f * Math.ln (1.0 - Real.fromInt (sf - 1) / tfreq) | 
| 57097 | 367 | |
| 57102 | 368 | val sum_of_weights = Inttab.fold fold_sfh sfh 0.0 | 
| 57029 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 369 | in | 
| 57278 | 370 | res + tau * sum_of_weights | 
| 57097 | 371 | end | 
| 57029 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 372 | |
| 57356 | 373 | val posterior = Array.tabulate (num_facts, (fn j => (j, log_posterior j))) | 
| 57102 | 374 | |
| 57364 | 375 | fun ret at acc = | 
| 376 | if at = num_facts then acc else ret (at + 1) (Array.sub (posterior, at) :: acc) | |
| 57029 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 377 | in | 
| 57371 | 378 | select_visible_facts 100000.0 posterior visible_facts; | 
| 59058 
a78612c67ec0
renamed "pairself" to "apply2", in accordance to @{apply 2};
 wenzelm parents: 
58919diff
changeset | 379 | sort_array_suffix (Real.compare o apply2 snd) max_suggs posterior; | 
| 57364 | 380 | ret (Integer.max 0 (num_facts - max_suggs)) [] | 
| 57029 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 381 | end | 
| 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 382 | |
| 59477 | 383 | val initial_k = 0 | 
| 57458 | 384 | |
| 57459 | 385 | fun k_nearest_neighbors dffreq num_facts num_feats depss featss max_suggs visible_facts goal_feats = | 
| 57458 | 386 | let | 
| 387 | exception EXIT of unit | |
| 388 | ||
| 389 | val ln_afreq = Math.ln (Real.fromInt num_facts) | |
| 390 | fun tfidf feat = ln_afreq - Math.ln (Real.fromInt (Vector.sub (dffreq, feat))) | |
| 391 | ||
| 392 | val overlaps_sqr = Array.tabulate (num_facts, rpair 0.0) | |
| 393 | ||
| 57459 | 394 | val feat_facts = Array.array (num_feats, []) | 
| 395 | val _ = Vector.foldl (fn (feats, fact) => | |
| 396 | (List.app (map_array_at feat_facts (cons fact)) feats; fact + 1)) 0 featss | |
| 397 | ||
| 57458 | 398 | fun do_feat (s, sw0) = | 
| 399 | let | |
| 400 | val sw = sw0 * tfidf s | |
| 59476 | 401 | val w6 = Math.pow (sw, 6.0 (* FUDGE *)) | 
| 57458 | 402 | |
| 403 | fun inc_overlap j = | |
| 404 | let val (_, ov) = Array.sub (overlaps_sqr, j) in | |
| 57531 
4d9895d39b59
improvements to the machine learning algos (due to Cezary K.)
 blanchet parents: 
57462diff
changeset | 405 | Array.update (overlaps_sqr, j, (j, w6 + ov)) | 
| 57458 | 406 | end | 
| 407 | in | |
| 408 | List.app inc_overlap (Array.sub (feat_facts, s)) | |
| 409 | end | |
| 410 | ||
| 411 | val _ = List.app do_feat goal_feats | |
| 59058 
a78612c67ec0
renamed "pairself" to "apply2", in accordance to @{apply 2};
 wenzelm parents: 
58919diff
changeset | 412 | val _ = sort_array_suffix (Real.compare o apply2 snd) num_facts overlaps_sqr | 
| 57458 | 413 | val no_recommends = Unsynchronized.ref 0 | 
| 414 | val recommends = Array.tabulate (num_facts, rpair 0.0) | |
| 415 | val age = Unsynchronized.ref 500000000.0 | |
| 416 | ||
| 417 | fun inc_recommend v j = | |
| 418 | let val (_, ov) = Array.sub (recommends, j) in | |
| 419 | if ov <= 0.0 then | |
| 420 | (no_recommends := !no_recommends + 1; Array.update (recommends, j, (j, !age + ov))) | |
| 59476 | 421 | else | 
| 422 | Array.update (recommends, j, (j, v + ov)) | |
| 57458 | 423 | end | 
| 424 | ||
| 425 | val k = Unsynchronized.ref 0 | |
| 426 | fun do_k k = | |
| 427 | if k >= num_facts then | |
| 428 | raise EXIT () | |
| 429 | else | |
| 430 | let | |
| 57531 
4d9895d39b59
improvements to the machine learning algos (due to Cezary K.)
 blanchet parents: 
57462diff
changeset | 431 | val deps_factor = 2.7 (* FUDGE *) | 
| 57458 | 432 | val (j, o2) = Array.sub (overlaps_sqr, num_facts - k - 1) | 
| 57531 
4d9895d39b59
improvements to the machine learning algos (due to Cezary K.)
 blanchet parents: 
57462diff
changeset | 433 | val _ = inc_recommend o2 j | 
| 57458 | 434 | val ds = Vector.sub (depss, j) | 
| 435 | val l = Real.fromInt (length ds) | |
| 436 | in | |
| 57531 
4d9895d39b59
improvements to the machine learning algos (due to Cezary K.)
 blanchet parents: 
57462diff
changeset | 437 | List.app (inc_recommend (deps_factor * o2 / l)) ds | 
| 57458 | 438 | end | 
| 439 | ||
| 440 | fun while1 () = | |
| 59477 | 441 | if !k = initial_k + 1 then () else (do_k (!k); k := !k + 1; while1 ()) | 
| 57458 | 442 | handle EXIT () => () | 
| 443 | ||
| 444 | fun while2 () = | |
| 445 | if !no_recommends >= max_suggs then () | |
| 446 | else (do_k (!k); k := !k + 1; age := !age - 10000.0; while2 ()) | |
| 447 | handle EXIT () => () | |
| 448 | ||
| 449 | fun ret acc at = | |
| 450 | if at = num_facts then acc else ret (Array.sub (recommends, at) :: acc) (at + 1) | |
| 451 | in | |
| 452 | while1 (); | |
| 453 | while2 (); | |
| 454 | select_visible_facts 1000000000.0 recommends visible_facts; | |
| 59058 
a78612c67ec0
renamed "pairself" to "apply2", in accordance to @{apply 2};
 wenzelm parents: 
58919diff
changeset | 455 | sort_array_suffix (Real.compare o apply2 snd) max_suggs recommends; | 
| 57458 | 456 | ret [] (Integer.max 0 (num_facts - max_suggs)) | 
| 457 | end | |
| 458 | ||
| 57125 
2f620ef839ee
added another way of invoking Python code, for experiments
 blanchet parents: 
57124diff
changeset | 459 | (* experimental *) | 
| 57376 | 460 | fun external_tool tool max_suggs learns goal_feats = | 
| 57291 | 461 | let | 
| 57297 | 462 | val ser = string_of_int (serial ()) (* poor person's attempt at thread-safety *) | 
| 463 |     val ocs = TextIO.openOut ("adv_syms" ^ ser)
 | |
| 464 |     val ocd = TextIO.openOut ("adv_deps" ^ ser)
 | |
| 465 |     val ocq = TextIO.openOut ("adv_seq" ^ ser)
 | |
| 466 |     val occ = TextIO.openOut ("adv_conj" ^ ser)
 | |
| 57296 | 467 | |
| 57291 | 468 | fun os oc s = TextIO.output (oc, s) | 
| 57296 | 469 | |
| 57297 | 470 | fun ol _ _ _ [] = () | 
| 471 | | ol _ f _ [e] = f e | |
| 57291 | 472 | | ol oc f sep (h :: t) = (f h; os oc sep; ol oc f sep t) | 
| 57296 | 473 | |
| 57294 
ef9d4e1ceb00
use strings to communicate with external process, to ease debugging
 blanchet parents: 
57291diff
changeset | 474 | fun do_learn (name, feats, deps) = | 
| 57357 | 475 | (os ocs name; os ocs ":"; ol ocs (os ocs o quote) ", " feats; os ocs "\n"; | 
| 57297 | 476 | os ocd name; os ocd ":"; ol ocd (os ocd) " " deps; os ocd "\n"; os ocq name; os ocq "\n") | 
| 57296 | 477 | |
| 57291 | 478 | fun forkexec no = | 
| 479 | let | |
| 480 | val cmd = | |
| 57297 | 481 | "~/misc/" ^ tool ^ " adv_syms" ^ ser ^ " adv_deps" ^ ser ^ " " ^ string_of_int no ^ | 
| 482 | " adv_seq" ^ ser ^ " < adv_conj" ^ ser | |
| 57291 | 483 | in | 
| 484 | fst (Isabelle_System.bash_output cmd) | |
| 485 | |> space_explode " " | |
| 57294 
ef9d4e1ceb00
use strings to communicate with external process, to ease debugging
 blanchet parents: 
57291diff
changeset | 486 | |> filter_out (curry (op =) "") | 
| 57291 | 487 | end | 
| 488 | in | |
| 57401 
02f56126b4e4
reintroduced 'extra features' but with lower weight than before (to account for tfidf)
 blanchet parents: 
57387diff
changeset | 489 | (List.app do_learn learns; ol occ (os occ o quote) ", " (map fst goal_feats); | 
| 57297 | 490 | TextIO.closeOut ocs; TextIO.closeOut ocd; TextIO.closeOut ocq; TextIO.closeOut occ; | 
| 57294 
ef9d4e1ceb00
use strings to communicate with external process, to ease debugging
 blanchet parents: 
57291diff
changeset | 491 | forkexec max_suggs) | 
| 57291 | 492 | end | 
| 493 | ||
| 57561 | 494 | fun k_nearest_neighbors_ext max_suggs = | 
| 59477 | 495 |   external_tool ("newknn/knn" ^ " " ^ string_of_int initial_k) max_suggs
 | 
| 57561 | 496 | fun naive_bayes_ext max_suggs = external_tool "predict/nbayes" max_suggs | 
| 57374 | 497 | |
| 57532 | 498 | fun query_external ctxt algorithm max_suggs learns goal_feats = | 
| 57432 | 499 | (trace_msg ctxt (fn () => "MaSh query external " ^ commas (map fst goal_feats)); | 
| 57532 | 500 | (case algorithm of | 
| 57458 | 501 | MaSh_NB_Ext => naive_bayes_ext max_suggs learns goal_feats | 
| 502 | | MaSh_kNN_Ext => k_nearest_neighbors_ext max_suggs learns goal_feats)) | |
| 57371 | 503 | |
| 57532 | 504 | fun query_internal ctxt algorithm num_facts num_feats (fact_names, featss, depss) | 
| 57378 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 505 | (freqs as (_, _, dffreq)) visible_facts max_suggs goal_feats int_goal_feats = | 
| 57459 | 506 | let | 
| 507 | fun nb () = | |
| 508 | naive_bayes freqs num_facts max_suggs visible_facts int_goal_feats | |
| 509 | |> map fst | |
| 510 | fun knn () = | |
| 511 | k_nearest_neighbors dffreq num_facts num_feats depss featss max_suggs visible_facts | |
| 512 | int_goal_feats | |
| 513 | |> map fst | |
| 514 | in | |
| 515 |     (trace_msg ctxt (fn () => "MaSh query internal " ^ commas (map fst goal_feats) ^ " from {" ^
 | |
| 516 | elide_string 1000 (space_implode " " (Vector.foldr (op ::) [] fact_names)) ^ "}"); | |
| 57532 | 517 | (case algorithm of | 
| 57459 | 518 | MaSh_NB => nb () | 
| 519 | | MaSh_kNN => knn () | |
| 520 | | MaSh_NB_kNN => | |
| 61322 | 521 | mesh_facts I (op =) max_suggs | 
| 57552 
1072599c43f6
no need for 'mash' subdirectory after removal of Python program
 blanchet parents: 
57546diff
changeset | 522 | [(0.5 (* FUDGE *), (weight_facts_steeply (nb ()), [])), | 
| 
1072599c43f6
no need for 'mash' subdirectory after removal of Python program
 blanchet parents: 
57546diff
changeset | 523 | (0.5 (* FUDGE *), (weight_facts_steeply (knn ()), []))]) | 
| 57459 | 524 | |> map (curry Vector.sub fact_names)) | 
| 525 | end | |
| 57007 | 526 | |
| 527 | end; | |
| 528 | ||
| 529 | ||
| 57432 | 530 | (*** Persistent, stringly-typed state ***) | 
| 531 | ||
| 532 | fun meta_char c = | |
| 533 |   if Char.isAlphaNum c orelse c = #"_" orelse c = #"." orelse c = #"(" orelse c = #")" orelse
 | |
| 63696 
af310e622d64
tuned MaSh's metacharacters to avoid needless decoding
 blanchet parents: 
63695diff
changeset | 534 | c = #"," orelse c = #"'" then | 
| 57432 | 535 | String.str c | 
| 536 | else | |
| 537 | (* fixed width, in case more digits follow *) | |
| 538 | "%" ^ stringN_of_int 3 (Char.ord c) | |
| 539 | ||
| 540 | fun unmeta_chars accum [] = String.implode (rev accum) | |
| 541 | | unmeta_chars accum (#"%" :: d1 :: d2 :: d3 :: cs) = | |
| 542 | (case Int.fromString (String.implode [d1, d2, d3]) of | |
| 543 | SOME n => unmeta_chars (Char.chr n :: accum) cs | |
| 544 | | NONE => "" (* error *)) | |
| 545 | | unmeta_chars _ (#"%" :: _) = "" (* error *) | |
| 546 | | unmeta_chars accum (c :: cs) = unmeta_chars (c :: accum) cs | |
| 547 | ||
| 548 | val encode_str = String.translate meta_char | |
| 63695 | 549 | val encode_strs = map encode_str #> space_implode " " | 
| 57432 | 550 | |
| 63695 | 551 | fun decode_str s = | 
| 552 | if String.isSubstring "%" s then unmeta_chars [] (String.explode s) else s; | |
| 553 | ||
| 554 | fun decode_strs s = | |
| 555 | space_explode " " s |> String.isSubstring "%" s ? map decode_str; | |
| 50311 | 556 | |
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 557 | datatype proof_kind = Isar_Proof | Automatic_Proof | Isar_Proof_wegen_Prover_Flop | 
| 50311 | 558 | |
| 559 | fun str_of_proof_kind Isar_Proof = "i" | |
| 50484 
8ec31bdb9d36
adopt the neutral "prover" terminology for MaSh rather than the ambiguous/wrong ATP terminology (which sometimes excludes SMT solvers)
 blanchet parents: 
50450diff
changeset | 560 | | str_of_proof_kind Automatic_Proof = "a" | 
| 
8ec31bdb9d36
adopt the neutral "prover" terminology for MaSh rather than the ambiguous/wrong ATP terminology (which sometimes excludes SMT solvers)
 blanchet parents: 
50450diff
changeset | 561 | | str_of_proof_kind Isar_Proof_wegen_Prover_Flop = "x" | 
| 50311 | 562 | |
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 563 | fun proof_kind_of_str "a" = Automatic_Proof | 
| 50484 
8ec31bdb9d36
adopt the neutral "prover" terminology for MaSh rather than the ambiguous/wrong ATP terminology (which sometimes excludes SMT solvers)
 blanchet parents: 
50450diff
changeset | 564 | | proof_kind_of_str "x" = Isar_Proof_wegen_Prover_Flop | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 565 | | proof_kind_of_str _ (* "i" *) = Isar_Proof | 
| 50311 | 566 | |
| 57011 
a4428f517f46
implemented learning of single proofs in SML MaSh
 blanchet parents: 
57010diff
changeset | 567 | fun add_edge_to name parent = | 
| 
a4428f517f46
implemented learning of single proofs in SML MaSh
 blanchet parents: 
57010diff
changeset | 568 | Graph.default_node (parent, (Isar_Proof, [], [])) | 
| 
a4428f517f46
implemented learning of single proofs in SML MaSh
 blanchet parents: 
57010diff
changeset | 569 | #> Graph.add_edge (parent, name) | 
| 
a4428f517f46
implemented learning of single proofs in SML MaSh
 blanchet parents: 
57010diff
changeset | 570 | |
| 57661 | 571 | fun add_node kind name parents feats deps (accum as (access_G, (fact_xtab, feat_xtab), learns)) = | 
| 572 | let val fact_xtab' = add_to_xtab name fact_xtab in | |
| 573 | ((Graph.new_node (name, (kind, feats, deps)) access_G | |
| 574 | handle Graph.DUP _ => Graph.map_node name (K (kind, feats, deps)) access_G) | |
| 575 | |> fold (add_edge_to name) parents, | |
| 576 | (fact_xtab', fold maybe_add_to_xtab feats feat_xtab), | |
| 577 | (name, feats, deps) :: learns) | |
| 578 | end | |
| 579 | handle Symtab.DUP _ => accum (* robustness (in case the state file violates the invariant) *) | |
| 57011 
a4428f517f46
implemented learning of single proofs in SML MaSh
 blanchet parents: 
57010diff
changeset | 580 | |
| 50311 | 581 | fun try_graph ctxt when def f = | 
| 582 | f () | |
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 583 | handle | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 584 | Graph.CYCLES (cycle :: _) => | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 585 | (trace_msg ctxt (fn () => "Cycle involving " ^ commas cycle ^ " when " ^ when); def) | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 586 | | Graph.DUP name => | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 587 | (trace_msg ctxt (fn () => "Duplicate fact " ^ quote name ^ " when " ^ when); def) | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 588 | | Graph.UNDEF name => | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 589 | (trace_msg ctxt (fn () => "Unknown fact " ^ quote name ^ " when " ^ when); def) | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 590 | | exn => | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 591 | if Exn.is_interrupt exn then | 
| 62505 | 592 | Exn.reraise exn | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 593 | else | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 594 | (trace_msg ctxt (fn () => "Internal error when " ^ when ^ ":\n" ^ Runtime.exn_message exn); | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 595 | def) | 
| 50311 | 596 | |
| 597 | fun graph_info G = | |
| 598 | string_of_int (length (Graph.keys G)) ^ " node(s), " ^ | |
| 57006 | 599 | string_of_int (fold (Integer.add o length o snd) (Graph.dest G) 0) ^ " edge(s), " ^ | 
| 50311 | 600 | string_of_int (length (Graph.maximals G)) ^ " maximal" | 
| 601 | ||
| 57545 | 602 | type ffds = string vector * int list vector * int list vector | 
| 603 | type freqs = int vector * int Inttab.table vector * int vector | |
| 604 | ||
| 53095 | 605 | type mash_state = | 
| 57358 | 606 |   {access_G : (proof_kind * string list * string list) Graph.T,
 | 
| 57374 | 607 | xtabs : xtab * xtab, | 
| 57545 | 608 | ffds : ffds, | 
| 609 | freqs : freqs, | |
| 57365 | 610 | dirty_facts : string list option} | 
| 50311 | 611 | |
| 57378 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 612 | val empty_xtabs = (empty_xtab, empty_xtab) | 
| 57545 | 613 | val empty_ffds = (Vector.fromList [], Vector.fromList [], Vector.fromList []) : ffds | 
| 614 | val empty_freqs = (Vector.fromList [], Vector.fromList [], Vector.fromList []) : freqs | |
| 57378 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 615 | |
| 57371 | 616 | val empty_state = | 
| 617 |   {access_G = Graph.empty,
 | |
| 57378 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 618 | xtabs = empty_xtabs, | 
| 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 619 | ffds = empty_ffds, | 
| 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 620 | freqs = empty_freqs, | 
| 57371 | 621 | dirty_facts = SOME []} : mash_state | 
| 622 | ||
| 57545 | 623 | fun recompute_ffds_freqs_from_learns (learns : (string * string list * string list) list) | 
| 624 | ((num_facts, fact_tab), (num_feats, feat_tab)) num_facts0 (fact_names0, featss0, depss0) freqs0 = | |
| 57379 | 625 | let | 
| 57380 | 626 | val fact_names = Vector.concat [fact_names0, Vector.fromList (map #1 learns)] | 
| 627 | val featss = Vector.concat [featss0, | |
| 628 | Vector.fromList (map (map_filter (Symtab.lookup feat_tab) o #2) learns)] | |
| 629 | val depss = Vector.concat [depss0, | |
| 630 | Vector.fromList (map (map_filter (Symtab.lookup fact_tab) o #3) learns)] | |
| 57379 | 631 | in | 
| 632 | ((fact_names, featss, depss), | |
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 633 | MaSh.learn_facts freqs0 num_facts0 num_facts num_feats depss featss) | 
| 57379 | 634 | end | 
| 635 | ||
| 57378 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 636 | fun reorder_learns (num_facts, fact_tab) learns = | 
| 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 637 |   let val ary = Array.array (num_facts, ("", [], [])) in
 | 
| 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 638 | List.app (fn learn as (fact, _, _) => | 
| 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 639 | Array.update (ary, the (Symtab.lookup fact_tab fact), learn)) | 
| 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 640 | learns; | 
| 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 641 | Array.foldr (op ::) [] ary | 
| 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 642 | end | 
| 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 643 | |
| 57380 | 644 | fun recompute_ffds_freqs_from_access_G access_G (xtabs as (fact_xtab, _)) = | 
| 57378 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 645 | let | 
| 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 646 | val learns = | 
| 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 647 | Graph.schedule (fn _ => fn (fact, (_, feats, deps)) => (fact, feats, deps)) access_G | 
| 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 648 | |> reorder_learns fact_xtab | 
| 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 649 | in | 
| 57380 | 650 | recompute_ffds_freqs_from_learns learns xtabs 0 empty_ffds empty_freqs | 
| 57378 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 651 | end | 
| 50311 | 652 | |
| 653 | local | |
| 654 | ||
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 655 | val version = "*** MaSh version 20161123 ***" | 
| 50357 
187ae76a1757
more robustness in the face of MaSh format changes -- don't overwrite new versions with old versions
 blanchet parents: 
50356diff
changeset | 656 | |
| 53095 | 657 | exception FILE_VERSION_TOO_NEW of unit | 
| 50311 | 658 | |
| 659 | fun extract_node line = | |
| 55286 | 660 | (case space_explode ":" line of | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 661 | [head, tail] => | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 662 | (case (space_explode " " head, map (unprefix " ") (space_explode ";" tail)) of | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 663 | ([kind, name], [parents, feats, deps]) => | 
| 57358 | 664 | SOME (proof_kind_of_str kind, decode_str name, decode_strs parents, decode_strs feats, | 
| 57010 | 665 | decode_strs deps) | 
| 55286 | 666 | | _ => NONE) | 
| 667 | | _ => NONE) | |
| 50311 | 668 | |
| 63697 
0afe49623cf9
avoid loading MaSh file first time around for higher responsiveness of Sledgehammer
 blanchet parents: 
63696diff
changeset | 669 | fun would_load_state (memory_time, _) = | 
| 
0afe49623cf9
avoid loading MaSh file first time around for higher responsiveness of Sledgehammer
 blanchet parents: 
63696diff
changeset | 670 | let val path = state_file () in | 
| 
0afe49623cf9
avoid loading MaSh file first time around for higher responsiveness of Sledgehammer
 blanchet parents: 
63696diff
changeset | 671 | (case try OS.FileSys.modTime (File.platform_path path) of | 
| 
0afe49623cf9
avoid loading MaSh file first time around for higher responsiveness of Sledgehammer
 blanchet parents: 
63696diff
changeset | 672 | NONE => false | 
| 
0afe49623cf9
avoid loading MaSh file first time around for higher responsiveness of Sledgehammer
 blanchet parents: 
63696diff
changeset | 673 | | SOME disk_time => memory_time < disk_time) | 
| 
0afe49623cf9
avoid loading MaSh file first time around for higher responsiveness of Sledgehammer
 blanchet parents: 
63696diff
changeset | 674 | end; | 
| 
0afe49623cf9
avoid loading MaSh file first time around for higher responsiveness of Sledgehammer
 blanchet parents: 
63696diff
changeset | 675 | |
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 676 | fun load_state ctxt (time_state as (memory_time, _)) = | 
| 57552 
1072599c43f6
no need for 'mash' subdirectory after removal of Python program
 blanchet parents: 
57546diff
changeset | 677 | let val path = state_file () in | 
| 60971 | 678 | (case try OS.FileSys.modTime (File.platform_path path) of | 
| 57076 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 679 | NONE => time_state | 
| 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 680 | | SOME disk_time => | 
| 62826 | 681 | if memory_time >= disk_time then | 
| 57076 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 682 | time_state | 
| 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 683 | else | 
| 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 684 | (disk_time, | 
| 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 685 | (case try File.read_lines path of | 
| 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 686 | SOME (version' :: node_lines) => | 
| 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 687 | let | 
| 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 688 | fun extract_line_and_add_node line = | 
| 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 689 | (case extract_node line of | 
| 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 690 | NONE => I (* should not happen *) | 
| 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 691 | | SOME (kind, name, parents, feats, deps) => add_node kind name parents feats deps) | 
| 57011 
a4428f517f46
implemented learning of single proofs in SML MaSh
 blanchet parents: 
57010diff
changeset | 692 | |
| 57379 | 693 | val empty_G_etc = (Graph.empty, empty_xtabs, []) | 
| 694 | ||
| 695 | val (access_G, xtabs, rev_learns) = | |
| 57076 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 696 | (case string_ord (version', version) of | 
| 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 697 | EQUAL => | 
| 57379 | 698 | try_graph ctxt "loading state" empty_G_etc | 
| 699 | (fn () => fold extract_line_and_add_node node_lines empty_G_etc) | |
| 57552 
1072599c43f6
no need for 'mash' subdirectory after removal of Python program
 blanchet parents: 
57546diff
changeset | 700 | | LESS => (remove_state_file (); empty_G_etc) (* cannot parse old file *) | 
| 57076 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 701 | | GREATER => raise FILE_VERSION_TOO_NEW ()) | 
| 57378 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 702 | |
| 57380 | 703 | val (ffds, freqs) = | 
| 704 | recompute_ffds_freqs_from_learns (rev rev_learns) xtabs 0 empty_ffds empty_freqs | |
| 57076 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 705 | in | 
| 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 706 |              trace_msg ctxt (fn () => "Loaded fact graph (" ^ graph_info access_G ^ ")");
 | 
| 57378 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 707 |              {access_G = access_G, xtabs = xtabs, ffds = ffds, freqs = freqs, dirty_facts = SOME []}
 | 
| 57076 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 708 | end | 
| 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 709 | | _ => empty_state))) | 
| 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 710 | end | 
| 50311 | 711 | |
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 712 | fun str_of_entry (kind, name, parents, feats, deps) = | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 713 | str_of_proof_kind kind ^ " " ^ encode_str name ^ ": " ^ encode_strs parents ^ "; " ^ | 
| 57358 | 714 | encode_strs feats ^ "; " ^ encode_strs deps ^ "\n" | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 715 | |
| 57365 | 716 | fun save_state _ (time_state as (_, {dirty_facts = SOME [], ...})) = time_state
 | 
| 57378 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 717 |   | save_state ctxt (memory_time, {access_G, xtabs, ffds, freqs, dirty_facts}) =
 | 
| 50311 | 718 | let | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 719 | fun append_entry (name, ((kind, feats, deps), (parents, _))) = | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 720 | cons (kind, name, Graph.Keys.dest parents, feats, deps) | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 721 | |
| 57552 
1072599c43f6
no need for 'mash' subdirectory after removal of Python program
 blanchet parents: 
57546diff
changeset | 722 | val path = state_file () | 
| 57365 | 723 | val dirty_facts' = | 
| 60971 | 724 | (case try OS.FileSys.modTime (File.platform_path path) of | 
| 57076 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 725 | NONE => NONE | 
| 62826 | 726 | | SOME disk_time => if disk_time <= memory_time then dirty_facts else NONE) | 
| 50311 | 727 | val (banner, entries) = | 
| 57365 | 728 | (case dirty_facts' of | 
| 55286 | 729 | SOME names => (NONE, fold (append_entry o Graph.get_entry access_G) names []) | 
| 730 | | NONE => (SOME (version ^ "\n"), Graph.fold append_entry access_G [])) | |
| 50311 | 731 | in | 
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 732 | (case banner of SOME s => File.write path s | NONE => (); | 
| 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 733 | entries |> chunk_list 500 |> List.app (File.append path o implode o map str_of_entry)) | 
| 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 734 | handle IO.Io _ => (); | 
| 50311 | 735 | trace_msg ctxt (fn () => | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 736 |         "Saved fact graph (" ^ graph_info access_G ^
 | 
| 57365 | 737 | (case dirty_facts of | 
| 738 | SOME dirty_facts => "; " ^ string_of_int (length dirty_facts) ^ " dirty fact(s)" | |
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 739 | | _ => "") ^ ")"); | 
| 57378 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 740 | (Time.now (), | 
| 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 741 |        {access_G = access_G, xtabs = xtabs, ffds = ffds, freqs = freqs, dirty_facts = SOME []})
 | 
| 50311 | 742 | end | 
| 743 | ||
| 57365 | 744 | val global_state = Synchronized.var "Sledgehammer_MaSh.global_state" (Time.zeroTime, empty_state) | 
| 50311 | 745 | |
| 746 | in | |
| 747 | ||
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 748 | fun map_state ctxt f = | 
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 749 | (trace_msg ctxt (fn () => "Changing MaSh state"); | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 750 | synchronized_timed_change global_state mash_time_limit | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 751 | (load_state ctxt ##> f #> save_state ctxt)) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 752 | |> ignore | 
| 53095 | 753 | handle FILE_VERSION_TOO_NEW () => () | 
| 50311 | 754 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 755 | fun peek_state ctxt = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 756 | (trace_msg ctxt (fn () => "Peeking at MaSh state"); | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 757 | (case synchronized_timed_value global_state mash_time_limit of | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 758 | NONE => NONE | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 759 | | SOME state => if would_load_state state then NONE else SOME state)) | 
| 63697 
0afe49623cf9
avoid loading MaSh file first time around for higher responsiveness of Sledgehammer
 blanchet parents: 
63696diff
changeset | 760 | |
| 
0afe49623cf9
avoid loading MaSh file first time around for higher responsiveness of Sledgehammer
 blanchet parents: 
63696diff
changeset | 761 | fun get_state ctxt = | 
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 762 | (trace_msg ctxt (fn () => "Retrieving MaSh state"); | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 763 | synchronized_timed_change_result global_state mash_time_limit | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 764 | (perhaps (try (load_state ctxt)) #> `snd)) | 
| 50311 | 765 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 766 | fun clear_state ctxt = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 767 | (trace_msg ctxt (fn () => "Clearing MaSh state"); | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 768 | Synchronized.change global_state (fn _ => (remove_state_file (); (Time.zeroTime, empty_state)))) | 
| 50311 | 769 | |
| 770 | end | |
| 771 | ||
| 772 | ||
| 773 | (*** Isabelle helpers ***) | |
| 774 | ||
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 775 | fun crude_printed_term size t = | 
| 61321 | 776 | let | 
| 777 | fun term _ (res, 0) = (res, 0) | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 778 | | term (t $ u) (res, size) = | 
| 61321 | 779 | let | 
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 780 |           val (res, size) = term t (res ^ "(", size)
 | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 781 | val (res, size) = term u (res ^ " ", size) | 
| 61321 | 782 | in | 
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 783 | (res ^ ")", size) | 
| 61321 | 784 | end | 
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 785 | | term (Abs (s, _, t)) (res, size) = term t (res ^ "%" ^ s ^ ".", size - 1) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 786 | | term (Bound n) (res, size) = (res ^ "#" ^ string_of_int n, size - 1) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 787 | | term (Const (s, _)) (res, size) = (res ^ Long_Name.base_name s, size - 1) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 788 | | term (Free (s, _)) (res, size) = (res ^ s, size - 1) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 789 | | term (Var ((s, _), _)) (res, size) = (res ^ s, size - 1) | 
| 61321 | 790 | in | 
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 791 |     fst (term t ("", size))
 | 
| 61321 | 792 | end | 
| 48378 | 793 | |
| 61321 | 794 | fun nickname_of_thm th = | 
| 48394 
82fc8c956cdc
fixed various issues with MaSh's file handling + tune output + generate local facts again + handle nameless facts gracefully
 blanchet parents: 
48392diff
changeset | 795 | if Thm.has_name_hint th then | 
| 
82fc8c956cdc
fixed various issues with MaSh's file handling + tune output + generate local facts again + handle nameless facts gracefully
 blanchet parents: 
48392diff
changeset | 796 | let val hint = Thm.get_name_hint th in | 
| 50722 | 797 | (* There must be a better way to detect local facts. *) | 
| 59888 | 798 | (case Long_Name.dest_local hint of | 
| 60641 | 799 | SOME suf => | 
| 65458 | 800 | Long_Name.implode [Thm.theory_name th, suf, crude_printed_term 25 (Thm.prop_of th)] | 
| 55286 | 801 | | NONE => hint) | 
| 48394 
82fc8c956cdc
fixed various issues with MaSh's file handling + tune output + generate local facts again + handle nameless facts gracefully
 blanchet parents: 
48392diff
changeset | 802 | end | 
| 
82fc8c956cdc
fixed various issues with MaSh's file handling + tune output + generate local facts again + handle nameless facts gracefully
 blanchet parents: 
48392diff
changeset | 803 | else | 
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 804 | crude_printed_term 50 (Thm.prop_of th) | 
| 48378 | 805 | |
| 51134 | 806 | fun find_suggested_facts ctxt facts = | 
| 48330 | 807 | let | 
| 61321 | 808 | fun add (fact as (_, th)) = Symtab.default (nickname_of_thm th, fact) | 
| 51134 | 809 | val tab = fold add facts Symtab.empty | 
| 810 | fun lookup nick = | |
| 811 | Symtab.lookup tab nick | |
| 57006 | 812 | |> tap (fn NONE => trace_msg ctxt (fn () => "Cannot find " ^ quote nick) | _ => ()) | 
| 51134 | 813 | in map_filter lookup end | 
| 48311 | 814 | |
| 57404 
a68ae60c1504
got rid of hard-coded weights, now that we have TFIDF
 blanchet parents: 
57403diff
changeset | 815 | fun free_feature_of s = "f" ^ s | 
| 
a68ae60c1504
got rid of hard-coded weights, now that we have TFIDF
 blanchet parents: 
57403diff
changeset | 816 | fun thy_feature_of s = "y" ^ s | 
| 
a68ae60c1504
got rid of hard-coded weights, now that we have TFIDF
 blanchet parents: 
57403diff
changeset | 817 | fun type_feature_of s = "t" ^ s | 
| 
a68ae60c1504
got rid of hard-coded weights, now that we have TFIDF
 blanchet parents: 
57403diff
changeset | 818 | fun class_feature_of s = "s" ^ s | 
| 
a68ae60c1504
got rid of hard-coded weights, now that we have TFIDF
 blanchet parents: 
57403diff
changeset | 819 | val local_feature = "local" | 
| 48251 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 820 | |
| 60948 | 821 | fun crude_thm_ord ctxt = | 
| 822 | let | |
| 823 | val ancestor_lengths = | |
| 824 | fold (fn thy => Symtab.update (Context.theory_name thy, length (Context.ancestors_of thy))) | |
| 825 | (Theory.nodes_of (Proof_Context.theory_of ctxt)) Symtab.empty | |
| 826 | val ancestor_length = Symtab.lookup ancestor_lengths o Context.theory_id_name | |
| 61321 | 827 | |
| 60948 | 828 | fun crude_theory_ord p = | 
| 829 | if Context.eq_thy_id p then EQUAL | |
| 830 | else if Context.proper_subthy_id p then LESS | |
| 831 | else if Context.proper_subthy_id (swap p) then GREATER | |
| 832 | else | |
| 833 | (case apply2 ancestor_length p of | |
| 834 | (SOME m, SOME n) => | |
| 835 | (case int_ord (m, n) of | |
| 836 | EQUAL => string_ord (apply2 Context.theory_id_name p) | |
| 837 | | ord => ord) | |
| 838 | | _ => string_ord (apply2 Context.theory_id_name p)) | |
| 839 | in | |
| 840 | fn p => | |
| 65458 | 841 | (case crude_theory_ord (apply2 Thm.theory_id p) of | 
| 60948 | 842 | EQUAL => | 
| 843 | (* The hack below is necessary because of odd dependencies that are not reflected in the theory | |
| 844 | comparison. *) | |
| 61321 | 845 | let val q = apply2 nickname_of_thm p in | 
| 60948 | 846 | (* Hack to put "xxx_def" before "xxxI" and "xxxE" *) | 
| 847 | (case bool_ord (apply2 (String.isSuffix "_def") (swap q)) of | |
| 848 | EQUAL => string_ord q | |
| 849 | | ord => ord) | |
| 850 | end | |
| 851 | | ord => ord) | |
| 852 | end; | |
| 48324 | 853 | |
| 65458 | 854 | val thm_less_eq = Context.subthy_id o apply2 Thm.theory_id | 
| 51136 | 855 | fun thm_less p = thm_less_eq p andalso not (thm_less_eq (swap p)) | 
| 856 | ||
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 857 | val freezeT = Type.legacy_freeze_type | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 858 | |
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 859 | fun freeze (t $ u) = freeze t $ freeze u | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 860 | | freeze (Abs (s, T, t)) = Abs (s, freezeT T, freeze t) | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 861 | | freeze (Var ((s, _), T)) = Free (s, freezeT T) | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 862 | | freeze (Const (s, T)) = Const (s, freezeT T) | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 863 | | freeze (Free (s, T)) = Free (s, freezeT T) | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 864 | | freeze t = t | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 865 | |
| 59621 
291934bac95e
Thm.cterm_of and Thm.ctyp_of operate on local context;
 wenzelm parents: 
59582diff
changeset | 866 | fun goal_of_thm thy = Thm.prop_of #> freeze #> Thm.global_cterm_of thy #> Goal.init | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 867 | |
| 54141 
f57f8e7a879f
generate a comment storing the goal nickname in "learn_prover"
 blanchet parents: 
54140diff
changeset | 868 | fun run_prover_for_mash ctxt params prover goal_name facts goal = | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 869 | let | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 870 | val problem = | 
| 54141 
f57f8e7a879f
generate a comment storing the goal nickname in "learn_prover"
 blanchet parents: 
54140diff
changeset | 871 |       {comment = "Goal: " ^ goal_name, state = Proof.init ctxt, goal = goal, subgoal = 1,
 | 
| 62735 | 872 |        subgoal_count = 1, factss = [("", facts)], found_proof = I}
 | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 873 | in | 
| 57735 
056a55b44ec7
eliminated Sledgehammer's "min" subcommand (and lots of complications in the code)
 blanchet parents: 
57662diff
changeset | 874 | get_minimizing_prover ctxt MaSh (K ()) prover params problem | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 875 | end | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 876 | |
| 48326 | 877 | val bad_types = [@{type_name prop}, @{type_name bool}, @{type_name fun}]
 | 
| 878 | ||
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 879 | val crude_str_of_sort = space_implode "," o map Long_Name.base_name o subtract (op =) @{sort type}
 | 
| 53086 
15fe0ca088b3
MaSh tweaking: shorter names + killed (broken) SNoW
 blanchet parents: 
53085diff
changeset | 880 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 881 | fun crude_str_of_typ (Type (s, [])) = Long_Name.base_name s | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 882 | | crude_str_of_typ (Type (s, Ts)) = Long_Name.base_name s ^ implode (map crude_str_of_typ Ts) | 
| 53083 
019ecbb18e3f
generate patterns for variables as well in MaSh (cf. HOL(y)Hammer)
 blanchet parents: 
53082diff
changeset | 883 | | crude_str_of_typ (TFree (_, S)) = crude_str_of_sort S | 
| 
019ecbb18e3f
generate patterns for variables as well in MaSh (cf. HOL(y)Hammer)
 blanchet parents: 
53082diff
changeset | 884 | | crude_str_of_typ (TVar (_, S)) = crude_str_of_sort S | 
| 
019ecbb18e3f
generate patterns for variables as well in MaSh (cf. HOL(y)Hammer)
 blanchet parents: 
53082diff
changeset | 885 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 886 | fun maybe_singleton_str "" = [] | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 887 | | maybe_singleton_str s = [s] | 
| 53128 | 888 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 889 | val max_pat_breadth = 5 (* FUDGE *) | 
| 50585 | 890 | |
| 57406 | 891 | fun term_features_of ctxt thy_name term_max_depth type_max_depth ts = | 
| 48251 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 892 | let | 
| 50392 
190053ee24ed
expand type classes into their ancestors for MaSh
 blanchet parents: 
50391diff
changeset | 893 | val thy = Proof_Context.theory_of ctxt | 
| 53082 | 894 | |
| 50393 | 895 | val fixes = map snd (Variable.dest_fixes ctxt) | 
| 50392 
190053ee24ed
expand type classes into their ancestors for MaSh
 blanchet parents: 
50391diff
changeset | 896 | val classes = Sign.classes_of thy | 
| 53082 | 897 | |
| 48304 | 898 |     fun add_classes @{sort type} = I
 | 
| 50392 
190053ee24ed
expand type classes into their ancestors for MaSh
 blanchet parents: 
50391diff
changeset | 899 | | add_classes S = | 
| 
190053ee24ed
expand type classes into their ancestors for MaSh
 blanchet parents: 
50391diff
changeset | 900 | fold (`(Sorts.super_classes classes) | 
| 57006 | 901 | #> swap #> op :: | 
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 902 |           #> subtract (op =) @{sort type}
 | 
| 57006 | 903 | #> map class_feature_of | 
| 57404 
a68ae60c1504
got rid of hard-coded weights, now that we have TFIDF
 blanchet parents: 
57403diff
changeset | 904 | #> union (op =)) S | 
| 53082 | 905 | |
| 906 | fun pattify_type 0 _ = [] | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 907 | | pattify_type depth (Type (s, [])) = if member (op =) bad_types s then [] else [s] | 
| 53082 | 908 | | pattify_type depth (Type (s, U :: Ts)) = | 
| 909 | let | |
| 910 | val T = Type (s, Ts) | |
| 57055 
df3a26987a8d
reverted '|' features in MaSh -- these sounded like a good idea but never really worked
 blanchet parents: 
57052diff
changeset | 911 | val ps = take max_pat_breadth (pattify_type depth T) | 
| 
df3a26987a8d
reverted '|' features in MaSh -- these sounded like a good idea but never really worked
 blanchet parents: 
57052diff
changeset | 912 |           val qs = take max_pat_breadth ("" :: pattify_type (depth - 1) U)
 | 
| 57006 | 913 | in | 
| 914 |           map_product (fn p => fn "" => p | q => p ^ "(" ^ q ^ ")") ps qs
 | |
| 915 | end | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 916 | | pattify_type _ (TFree (_, S)) = maybe_singleton_str (crude_str_of_sort S) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 917 | | pattify_type _ (TVar (_, S)) = maybe_singleton_str (crude_str_of_sort S) | 
| 57006 | 918 | |
| 53082 | 919 | fun add_type_pat depth T = | 
| 57404 
a68ae60c1504
got rid of hard-coded weights, now that we have TFIDF
 blanchet parents: 
57403diff
changeset | 920 | union (op =) (map type_feature_of (pattify_type depth T)) | 
| 57006 | 921 | |
| 53082 | 922 | fun add_type_pats 0 _ = I | 
| 57404 
a68ae60c1504
got rid of hard-coded weights, now that we have TFIDF
 blanchet parents: 
57403diff
changeset | 923 | | add_type_pats depth t = add_type_pat depth t #> add_type_pats (depth - 1) t | 
| 57006 | 924 | |
| 53083 
019ecbb18e3f
generate patterns for variables as well in MaSh (cf. HOL(y)Hammer)
 blanchet parents: 
53082diff
changeset | 925 | fun add_type T = | 
| 
019ecbb18e3f
generate patterns for variables as well in MaSh (cf. HOL(y)Hammer)
 blanchet parents: 
53082diff
changeset | 926 | add_type_pats type_max_depth T | 
| 53156 | 927 | #> fold_atyps_sorts (add_classes o snd) T | 
| 57006 | 928 | |
| 53084 | 929 | fun add_subtypes (T as Type (_, Ts)) = add_type T #> fold add_subtypes Ts | 
| 930 | | add_subtypes T = add_type T | |
| 53082 | 931 | |
| 57055 
df3a26987a8d
reverted '|' features in MaSh -- these sounded like a good idea but never really worked
 blanchet parents: 
57052diff
changeset | 932 | fun pattify_term _ 0 _ = [] | 
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 933 | | pattify_term _ depth (Const (s, _)) = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 934 | if is_widely_irrelevant_const s then [] else [s] | 
| 54089 
b13f6731f873
use same relevance filter for ATP and SMT solvers -- attempting to filter out certain ground instances of polymorphic symbols like + and 0 has unexpected side-effects that lead to incompletenesses (relevant facts not being selected)
 blanchet parents: 
54085diff
changeset | 935 | | pattify_term _ _ (Free (s, T)) = | 
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 936 | maybe_singleton_str (crude_str_of_typ T) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 937 | |> (if member (op =) fixes s then cons (free_feature_of (Long_Name.append thy_name s)) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 938 | else I) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 939 | | pattify_term _ _ (Var (_, T)) = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 940 | maybe_singleton_str (crude_str_of_typ T) | 
| 54089 
b13f6731f873
use same relevance filter for ATP and SMT solvers -- attempting to filter out certain ground instances of polymorphic symbols like + and 0 has unexpected side-effects that lead to incompletenesses (relevant facts not being selected)
 blanchet parents: 
54085diff
changeset | 941 | | pattify_term Ts _ (Bound j) = | 
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 942 | maybe_singleton_str (crude_str_of_typ (nth Ts j)) | 
| 54089 
b13f6731f873
use same relevance filter for ATP and SMT solvers -- attempting to filter out certain ground instances of polymorphic symbols like + and 0 has unexpected side-effects that lead to incompletenesses (relevant facts not being selected)
 blanchet parents: 
54085diff
changeset | 943 | | pattify_term Ts depth (t $ u) = | 
| 50339 
d8dae91f3107
MaSh improvements: deeper patterns + more respect for chained facts
 blanchet parents: 
50338diff
changeset | 944 | let | 
| 57055 
df3a26987a8d
reverted '|' features in MaSh -- these sounded like a good idea but never really worked
 blanchet parents: 
57052diff
changeset | 945 | val ps = take max_pat_breadth (pattify_term Ts depth t) | 
| 57404 
a68ae60c1504
got rid of hard-coded weights, now that we have TFIDF
 blanchet parents: 
57403diff
changeset | 946 |           val qs = take max_pat_breadth ("" :: pattify_term Ts (depth - 1) u)
 | 
| 53130 | 947 | in | 
| 57404 
a68ae60c1504
got rid of hard-coded weights, now that we have TFIDF
 blanchet parents: 
57403diff
changeset | 948 |           map_product (fn p => fn "" => p | q => p ^ "(" ^ q ^ ")") ps qs
 | 
| 53130 | 949 | end | 
| 54089 
b13f6731f873
use same relevance filter for ATP and SMT solvers -- attempting to filter out certain ground instances of polymorphic symbols like + and 0 has unexpected side-effects that lead to incompletenesses (relevant facts not being selected)
 blanchet parents: 
54085diff
changeset | 950 | | pattify_term _ _ _ = [] | 
| 57006 | 951 | |
| 57404 
a68ae60c1504
got rid of hard-coded weights, now that we have TFIDF
 blanchet parents: 
57403diff
changeset | 952 | fun add_term_pat Ts = union (op =) oo pattify_term Ts | 
| 57006 | 953 | |
| 53130 | 954 | fun add_term_pats _ 0 _ = I | 
| 57055 
df3a26987a8d
reverted '|' features in MaSh -- these sounded like a good idea but never really worked
 blanchet parents: 
57052diff
changeset | 955 | | add_term_pats Ts depth t = add_term_pat Ts depth t #> add_term_pats Ts (depth - 1) t | 
| 57006 | 956 | |
| 53130 | 957 | fun add_term Ts = add_term_pats Ts term_max_depth | 
| 57006 | 958 | |
| 53085 | 959 | fun add_subterms Ts t = | 
| 55286 | 960 | (case strip_comb t of | 
| 54089 
b13f6731f873
use same relevance filter for ATP and SMT solvers -- attempting to filter out certain ground instances of polymorphic symbols like + and 0 has unexpected side-effects that lead to incompletenesses (relevant facts not being selected)
 blanchet parents: 
54085diff
changeset | 961 | (Const (s, T), args) => | 
| 
b13f6731f873
use same relevance filter for ATP and SMT solvers -- attempting to filter out certain ground instances of polymorphic symbols like + and 0 has unexpected side-effects that lead to incompletenesses (relevant facts not being selected)
 blanchet parents: 
54085diff
changeset | 962 | (not (is_widely_irrelevant_const s) ? add_term Ts t) | 
| 57404 
a68ae60c1504
got rid of hard-coded weights, now that we have TFIDF
 blanchet parents: 
57403diff
changeset | 963 | #> add_subtypes T #> fold (add_subterms Ts) args | 
| 50857 
80768e28c9ee
better handlig of built-ins -- at the top-level, not in subterms
 blanchet parents: 
50841diff
changeset | 964 | | (head, args) => | 
| 48251 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 965 | (case head of | 
| 53130 | 966 | Free (_, T) => add_term Ts t #> add_subtypes T | 
| 53084 | 967 | | Var (_, T) => add_subtypes T | 
| 53085 | 968 | | Abs (_, T, body) => add_subtypes T #> add_subterms (T :: Ts) body | 
| 48251 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 969 | | _ => I) | 
| 55286 | 970 | #> fold (add_subterms Ts) args) | 
| 57006 | 971 | in | 
| 972 | fold (add_subterms []) ts [] | |
| 973 | end | |
| 48251 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 974 | |
| 53085 | 975 | val term_max_depth = 2 | 
| 53155 | 976 | val type_max_depth = 1 | 
| 48251 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 977 | |
| 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 978 | (* TODO: Generate type classes for types? *) | 
| 60640 | 979 | fun features_of ctxt thy_name (scope, _) ts = | 
| 980 | thy_feature_of thy_name :: | |
| 981 | term_features_of ctxt thy_name term_max_depth type_max_depth ts | |
| 982 | |> scope <> Global ? cons local_feature | |
| 48251 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 983 | |
| 57006 | 984 | (* Too many dependencies is a sign that a decision procedure is at work. There is not much to learn | 
| 985 | from such proofs. *) | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 986 | val max_dependencies = 20 (* FUDGE *) | 
| 50484 
8ec31bdb9d36
adopt the neutral "prover" terminology for MaSh rather than the ambiguous/wrong ATP terminology (which sometimes excludes SMT solvers)
 blanchet parents: 
50450diff
changeset | 987 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 988 | val prover_default_max_facts = 25 (* FUDGE *) | 
| 48251 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 989 | |
| 48438 
3e45c98fe127
distinguish between recursive and nonrecursive definitions + clean up typedef dependencies in MaSh
 blanchet parents: 
48436diff
changeset | 990 | (* "type_definition_xxx" facts are characterized by their use of "CollectI". *) | 
| 61321 | 991 | val typedef_dep = nickname_of_thm @{thm CollectI}
 | 
| 57006 | 992 | (* Mysterious parts of the class machinery create lots of proofs that refer exclusively to | 
| 993 | "someI_ex" (and to some internal constructions). *) | |
| 61321 | 994 | val class_some_dep = nickname_of_thm @{thm someI_ex}
 | 
| 48438 
3e45c98fe127
distinguish between recursive and nonrecursive definitions + clean up typedef dependencies in MaSh
 blanchet parents: 
48436diff
changeset | 995 | |
| 50828 | 996 | val fundef_ths = | 
| 57006 | 997 |   @{thms fundef_ex1_existence fundef_ex1_uniqueness fundef_ex1_iff fundef_default_value}
 | 
| 61321 | 998 | |> map nickname_of_thm | 
| 50828 | 999 | |
| 48438 
3e45c98fe127
distinguish between recursive and nonrecursive definitions + clean up typedef dependencies in MaSh
 blanchet parents: 
48436diff
changeset | 1000 | (* "Rep_xxx_inject", "Abs_xxx_inverse", etc., are derived using these facts. *) | 
| 
3e45c98fe127
distinguish between recursive and nonrecursive definitions + clean up typedef dependencies in MaSh
 blanchet parents: 
48436diff
changeset | 1001 | val typedef_ths = | 
| 57006 | 1002 |   @{thms type_definition.Abs_inverse type_definition.Rep_inverse type_definition.Rep
 | 
| 1003 | type_definition.Rep_inject type_definition.Abs_inject type_definition.Rep_cases | |
| 1004 | type_definition.Abs_cases type_definition.Rep_induct type_definition.Abs_induct | |
| 1005 | type_definition.Rep_range type_definition.Abs_image} | |
| 61321 | 1006 | |> map nickname_of_thm | 
| 48438 
3e45c98fe127
distinguish between recursive and nonrecursive definitions + clean up typedef dependencies in MaSh
 blanchet parents: 
48436diff
changeset | 1007 | |
| 61321 | 1008 | fun is_size_def [dep] th = | 
| 55642 
63beb38e9258
adapted to renaming of datatype 'cases' and 'recs' to 'case' and 'rec'
 blanchet parents: 
55286diff
changeset | 1009 | (case first_field ".rec" dep of | 
| 57006 | 1010 | SOME (pref, _) => | 
| 61321 | 1011 | (case first_field ".size" (nickname_of_thm th) of | 
| 57006 | 1012 | SOME (pref', _) => pref = pref' | 
| 1013 | | NONE => false) | |
| 1014 | | NONE => false) | |
| 61321 | 1015 | | is_size_def _ _ = false | 
| 48441 | 1016 | |
| 51177 | 1017 | fun trim_dependencies deps = | 
| 50755 | 1018 | if length deps > max_dependencies then NONE else SOME deps | 
| 48251 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 1019 | |
| 61321 | 1020 | fun isar_dependencies_of name_tabs th = | 
| 57306 
ff10067b2248
optimized traversal of proof terms by skipping bad apples (e.g. full_exhaustive_int'.pinduct)
 blanchet parents: 
57305diff
changeset | 1021 | thms_in_proof max_dependencies (SOME name_tabs) th | 
| 
ff10067b2248
optimized traversal of proof terms by skipping bad apples (e.g. full_exhaustive_int'.pinduct)
 blanchet parents: 
57305diff
changeset | 1022 | |> Option.map (fn deps => | 
| 57006 | 1023 | if deps = [typedef_dep] orelse deps = [class_some_dep] orelse | 
| 57757 | 1024 | exists (member (op =) fundef_ths) deps orelse exists (member (op =) typedef_ths) deps orelse | 
| 61321 | 1025 | is_size_def deps th then | 
| 50755 | 1026 | [] | 
| 1027 | else | |
| 57306 
ff10067b2248
optimized traversal of proof terms by skipping bad apples (e.g. full_exhaustive_int'.pinduct)
 blanchet parents: 
57305diff
changeset | 1028 | deps) | 
| 48404 | 1029 | |
| 57006 | 1030 | fun prover_dependencies_of ctxt (params as {verbose, max_facts, ...}) prover auto_level facts
 | 
| 1031 | name_tabs th = | |
| 61321 | 1032 | (case isar_dependencies_of name_tabs th of | 
| 57306 
ff10067b2248
optimized traversal of proof terms by skipping bad apples (e.g. full_exhaustive_int'.pinduct)
 blanchet parents: 
57305diff
changeset | 1033 | SOME [] => (false, []) | 
| 
ff10067b2248
optimized traversal of proof terms by skipping bad apples (e.g. full_exhaustive_int'.pinduct)
 blanchet parents: 
57305diff
changeset | 1034 | | isar_deps0 => | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1035 | let | 
| 57306 
ff10067b2248
optimized traversal of proof terms by skipping bad apples (e.g. full_exhaustive_int'.pinduct)
 blanchet parents: 
57305diff
changeset | 1036 | val isar_deps = these isar_deps0 | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1037 | val thy = Proof_Context.theory_of ctxt | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1038 | val goal = goal_of_thm thy th | 
| 61321 | 1039 | val name = nickname_of_thm th | 
| 52196 
2281f33e8da6
redid rac7830871177 to avoid duplicate fixed variable (e.g. lemma "P (a::nat)" proof - have "!!a::int. Q a" sledgehammer [e])
 blanchet parents: 
52125diff
changeset | 1040 | val (_, hyp_ts, concl_t) = ATP_Util.strip_subgoal goal 1 ctxt | 
| 51136 | 1041 | val facts = facts |> filter (fn (_, th') => thm_less (th', th)) | 
| 57006 | 1042 | |
| 61321 | 1043 | fun nickify ((_, stature), th) = ((nickname_of_thm th, stature), th) | 
| 57006 | 1044 | |
| 61321 | 1045 | fun is_dep dep (_, th) = (nickname_of_thm th = dep) | 
| 57006 | 1046 | |
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1047 | fun add_isar_dep facts dep accum = | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1048 | if exists (is_dep dep) accum then | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1049 | accum | 
| 55286 | 1050 | else | 
| 1051 | (case find_first (is_dep dep) facts of | |
| 1052 |             SOME ((_, status), th) => accum @ [(("", status), th)]
 | |
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1053 | | NONE => accum (* should not happen *)) | 
| 57006 | 1054 | |
| 54123 
271a8377656f
remove overloading of "max_facts" -- it already controls the number of facts passed to ATPs for 'learn_prover'
 blanchet parents: 
54115diff
changeset | 1055 | val mepo_facts = | 
| 50484 
8ec31bdb9d36
adopt the neutral "prover" terminology for MaSh rather than the ambiguous/wrong ATP terminology (which sometimes excludes SMT solvers)
 blanchet parents: 
50450diff
changeset | 1056 | facts | 
| 54095 | 1057 | |> mepo_suggested_facts ctxt params (max_facts |> the_default prover_default_max_facts) NONE | 
| 1058 | hyp_ts concl_t | |
| 54123 
271a8377656f
remove overloading of "max_facts" -- it already controls the number of facts passed to ATPs for 'learn_prover'
 blanchet parents: 
54115diff
changeset | 1059 | val facts = | 
| 
271a8377656f
remove overloading of "max_facts" -- it already controls the number of facts passed to ATPs for 'learn_prover'
 blanchet parents: 
54115diff
changeset | 1060 | mepo_facts | 
| 50754 
74a6adcb96ac
also generate queries for goals with too many Isar dependencies
 blanchet parents: 
50751diff
changeset | 1061 | |> fold (add_isar_dep facts) isar_deps | 
| 50624 
4d0997abce79
improved thm order hack, in case the default names are overridden
 blanchet parents: 
50623diff
changeset | 1062 | |> map nickify | 
| 54123 
271a8377656f
remove overloading of "max_facts" -- it already controls the number of facts passed to ATPs for 'learn_prover'
 blanchet parents: 
54115diff
changeset | 1063 | val num_isar_deps = length isar_deps | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1064 | in | 
| 48404 | 1065 | if verbose andalso auto_level = 0 then | 
| 58843 | 1066 |         writeln ("MaSh: " ^ quote prover ^ " on " ^ quote name ^ " with " ^
 | 
| 57017 | 1067 | string_of_int num_isar_deps ^ " + " ^ string_of_int (length facts - num_isar_deps) ^ | 
| 63692 | 1068 | " facts") | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1069 | else | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1070 | (); | 
| 55286 | 1071 | (case run_prover_for_mash ctxt params prover name facts goal of | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1072 |         {outcome = NONE, used_facts, ...} =>
 | 
| 48404 | 1073 | (if verbose andalso auto_level = 0 then | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1074 | let val num_facts = length used_facts in | 
| 58843 | 1075 |              writeln ("Found proof with " ^ string_of_int num_facts ^ " fact" ^
 | 
| 63692 | 1076 | plural_s num_facts) | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1077 | end | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1078 | else | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1079 | (); | 
| 50754 
74a6adcb96ac
also generate queries for goals with too many Isar dependencies
 blanchet parents: 
50751diff
changeset | 1080 | (true, map fst used_facts)) | 
| 55286 | 1081 | | _ => (false, isar_deps)) | 
| 1082 | end) | |
| 48251 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 1083 | |
| 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 1084 | |
| 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 1085 | (*** High-level communication with MaSh ***) | 
| 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 1086 | |
| 51182 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1087 | (* In the following functions, chunks are risers w.r.t. "thm_less_eq". *) | 
| 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1088 | |
| 61321 | 1089 | fun chunks_and_parents_for chunks th = | 
| 51181 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1090 | let | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1091 | fun insert_parent new parents = | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1092 | let val parents = parents |> filter_out (fn p => thm_less_eq (p, new)) in | 
| 57096 
e4074b91b2a6
always remove duplicates in meshing + use weights for Naive Bayes
 blanchet parents: 
57095diff
changeset | 1093 | parents |> forall (fn p => not (thm_less_eq (new, p))) parents ? cons new | 
| 51181 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1094 | end | 
| 57006 | 1095 | |
| 51181 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1096 | fun rechunk seen (rest as th' :: ths) = | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1097 | if thm_less_eq (th', th) then (rev seen, rest) | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1098 | else rechunk (th' :: seen) ths | 
| 57006 | 1099 | |
| 51181 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1100 | fun do_chunk [] accum = accum | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1101 | | do_chunk (chunk as hd_chunk :: _) (chunks, parents) = | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1102 | if thm_less_eq (hd_chunk, th) then | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1103 | (chunk :: chunks, insert_parent hd_chunk parents) | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1104 | else if thm_less_eq (List.last chunk, th) then | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1105 | let val (front, back as hd_back :: _) = rechunk [] chunk in | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1106 | (front :: back :: chunks, insert_parent hd_back parents) | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1107 | end | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1108 | else | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1109 | (chunk :: chunks, parents) | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1110 | in | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1111 | fold_rev do_chunk chunks ([], []) | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1112 | |>> cons [] | 
| 61321 | 1113 | ||> map nickname_of_thm | 
| 51181 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1114 | end | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1115 | |
| 61321 | 1116 | fun attach_parents_to_facts _ [] = [] | 
| 1117 | | attach_parents_to_facts old_facts (facts as (_, th) :: _) = | |
| 51182 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1118 | let | 
| 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1119 | fun do_facts _ [] = [] | 
| 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1120 | | do_facts (_, parents) [fact] = [(parents, fact)] | 
| 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1121 | | do_facts (chunks, parents) | 
| 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1122 | ((fact as (_, th)) :: (facts as (_, th') :: _)) = | 
| 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1123 | let | 
| 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1124 | val chunks = app_hd (cons th) chunks | 
| 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1125 | val chunks_and_parents' = | 
| 60638 | 1126 | if thm_less_eq (th, th') andalso | 
| 65458 | 1127 | Thm.theory_name th = Thm.theory_name th' | 
| 61321 | 1128 | then (chunks, [nickname_of_thm th]) | 
| 1129 | else chunks_and_parents_for chunks th' | |
| 57006 | 1130 | in | 
| 1131 | (parents, fact) :: do_facts chunks_and_parents' facts | |
| 1132 | end | |
| 51182 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1133 | in | 
| 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1134 | old_facts @ facts | 
| 61321 | 1135 | |> do_facts (chunks_and_parents_for [[]] th) | 
| 51182 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1136 | |> drop (length old_facts) | 
| 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1137 | end | 
| 51177 | 1138 | |
| 61318 | 1139 | fun maximal_wrt_graph _ [] = [] | 
| 1140 | | maximal_wrt_graph G keys = | |
| 1141 | if can (Graph.get_node G o the_single) keys then | |
| 1142 | keys | |
| 1143 | else | |
| 1144 | let | |
| 1145 | val tab = Symtab.empty |> fold (fn name => Symtab.default (name, ())) keys | |
| 57006 | 1146 | |
| 61318 | 1147 | fun insert_new seen name = not (Symtab.defined seen name) ? insert (op =) name | 
| 57006 | 1148 | |
| 61318 | 1149 | fun num_keys keys = Graph.Keys.fold (K (Integer.add 1)) keys 0 | 
| 57006 | 1150 | |
| 61318 | 1151 | fun find_maxes _ (maxs, []) = map snd maxs | 
| 1152 | | find_maxes seen (maxs, new :: news) = | |
| 1153 | find_maxes (seen |> num_keys (Graph.imm_succs G new) > 1 ? Symtab.default (new, ())) | |
| 1154 | (if Symtab.defined tab new then | |
| 1155 | let | |
| 1156 | val newp = Graph.all_preds G [new] | |
| 1157 | fun is_ancestor x yp = member (op =) yp x | |
| 1158 | val maxs = maxs |> filter (fn (_, max) => not (is_ancestor max newp)) | |
| 1159 | in | |
| 1160 | if exists (is_ancestor new o fst) maxs then (maxs, news) | |
| 1161 | else ((newp, new) :: filter_out (fn (_, max) => is_ancestor max newp) maxs, news) | |
| 1162 | end | |
| 1163 | else | |
| 1164 | (maxs, Graph.Keys.fold (insert_new seen) (Graph.imm_preds G new) news)) | |
| 1165 | in | |
| 1166 | find_maxes Symtab.empty ([], | |
| 1167 | G |> Graph.restrict (not o String.isPrefix anonymous_proof_prefix) |> Graph.maximals) | |
| 1168 | end | |
| 53095 | 1169 | |
| 63698 | 1170 | val max_facts_for_shuffle_cleanup = 20 | 
| 1171 | ||
| 61321 | 1172 | fun maximal_wrt_access_graph _ [] = [] | 
| 63698 | 1173 | | maximal_wrt_access_graph access_G (fact :: facts) = | 
| 1174 | let | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1175 | fun cleanup_wrt (_, th) = | 
| 65458 | 1176 | let val thy_id = Thm.theory_id th in | 
| 63698 | 1177 | filter_out (fn (_, th') => | 
| 65458 | 1178 | Context.proper_subthy_id (Thm.theory_id th', thy_id)) | 
| 63698 | 1179 | end | 
| 1180 | ||
| 1181 | fun shuffle_cleanup accum [] = accum | |
| 1182 | | shuffle_cleanup accum (fact :: facts) = | |
| 1183 | let | |
| 1184 | val accum' = accum |> cleanup_wrt fact | |
| 1185 | val facts' = facts |> cleanup_wrt fact | |
| 1186 | in | |
| 1187 | shuffle_cleanup accum' facts' | |
| 1188 | end | |
| 1189 | in | |
| 1190 | fact :: cleanup_wrt fact facts | |
| 1191 | |> (fn facts => facts | |
| 1192 | |> length facts <= max_facts_for_shuffle_cleanup ? shuffle_cleanup []) | |
| 61321 | 1193 | |> map (nickname_of_thm o snd) | 
| 57460 | 1194 | |> maximal_wrt_graph access_G | 
| 1195 | end | |
| 53095 | 1196 | |
| 61321 | 1197 | fun is_fact_in_graph access_G = can (Graph.get_node access_G) o nickname_of_thm | 
| 53095 | 1198 | |
| 53197 | 1199 | val chained_feature_factor = 0.5 (* FUDGE *) | 
| 57405 
1f49da059947
reintroduced 'extra features' + only print message in verbose mode
 blanchet parents: 
57404diff
changeset | 1200 | val extra_feature_factor = 0.1 (* FUDGE *) | 
| 
1f49da059947
reintroduced 'extra features' + only print message in verbose mode
 blanchet parents: 
57404diff
changeset | 1201 | val num_extra_feature_facts = 10 (* FUDGE *) | 
| 53095 | 1202 | |
| 57660 | 1203 | val max_proximity_facts = 100 (* FUDGE *) | 
| 53095 | 1204 | |
| 54060 | 1205 | fun find_mash_suggestions ctxt max_facts suggs facts chained raw_unknown = | 
| 1206 | let | |
| 1207 | val inter_fact = inter (eq_snd Thm.eq_thm_prop) | |
| 1208 | val raw_mash = find_suggested_facts ctxt facts suggs | |
| 1209 | val proximate = take max_proximity_facts facts | |
| 1210 | val unknown_chained = inter_fact raw_unknown chained | |
| 1211 | val unknown_proximate = inter_fact raw_unknown proximate | |
| 1212 | val mess = | |
| 1213 | [(0.9 (* FUDGE *), (map (rpair 1.0) unknown_chained, [])), | |
| 1214 | (0.4 (* FUDGE *), (weight_facts_smoothly unknown_proximate, [])), | |
| 1215 | (0.1 (* FUDGE *), (weight_facts_steeply raw_mash, raw_unknown))] | |
| 57007 | 1216 | val unknown = raw_unknown | 
| 1217 | |> fold (subtract (eq_snd Thm.eq_thm_prop)) [unknown_chained, unknown_proximate] | |
| 57006 | 1218 | in | 
| 61322 | 1219 | (mesh_facts (fact_distinct (op aconv)) (eq_snd (gen_eq_thm ctxt)) max_facts mess, unknown) | 
| 57006 | 1220 | end | 
| 53095 | 1221 | |
| 60649 
e007aa6a8aa2
more explicit use of context and elimination of Thm.theory_of_thm, although unclear (and untested?) situations remain;
 wenzelm parents: 
60641diff
changeset | 1222 | fun mash_suggested_facts ctxt thy_name ({debug, ...} : params) max_suggs hyp_ts concl_t facts =
 | 
| 53095 | 1223 | let | 
| 57532 | 1224 | val algorithm = the_mash_algorithm () | 
| 57052 
ea5912e3b008
until naive Bayes supports weights, don't incorporate 'extra' low-weight features
 blanchet parents: 
57039diff
changeset | 1225 | |
| 57460 | 1226 | val facts = facts | 
| 60641 | 1227 | |> rev_sort_list_prefix (crude_thm_ord ctxt o apply2 snd) | 
| 57460 | 1228 | (Int.max (num_extra_feature_facts, max_proximity_facts)) | 
| 1229 | ||
| 1230 | val chained = filter (fn ((_, (scope, _)), _) => scope = Chained) facts | |
| 54085 | 1231 | |
| 65458 | 1232 | fun fact_has_right_theory (_, th) = thy_name = Thm.theory_name th | 
| 56995 | 1233 | |
| 53141 
d27e99a6a679
take chained and proximate facts into consideration when computing MaSh features
 blanchet parents: 
53140diff
changeset | 1234 | fun chained_or_extra_features_of factor (((_, stature), th), weight) = | 
| 59582 | 1235 | [Thm.prop_of th] | 
| 65458 | 1236 | |> features_of ctxt (Thm.theory_name th) stature | 
| 57404 
a68ae60c1504
got rid of hard-coded weights, now that we have TFIDF
 blanchet parents: 
57403diff
changeset | 1237 | |> map (rpair (weight * factor)) | 
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1238 | in | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1239 | (case get_state ctxt of | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1240 | NONE => ([], []) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1241 |     | SOME {access_G, xtabs = ((num_facts, fact_tab), (num_feats, feat_tab)), ffds, freqs, ...} =>
 | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1242 | let | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1243 | val goal_feats0 = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1244 | features_of ctxt thy_name (Local, General) (concl_t :: hyp_ts) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1245 | val chained_feats = chained | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1246 | |> map (rpair 1.0) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1247 | |> map (chained_or_extra_features_of chained_feature_factor) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1248 | |> rpair [] |-> fold (union (eq_fst (op =))) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1249 | val extra_feats = facts | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1250 | |> take (Int.max (0, num_extra_feature_facts - length chained)) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1251 | |> filter fact_has_right_theory | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1252 | |> weight_facts_steeply | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1253 | |> map (chained_or_extra_features_of extra_feature_factor) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1254 | |> rpair [] |-> fold (union (eq_fst (op =))) | 
| 57017 | 1255 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1256 | val goal_feats = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1257 | fold (union (eq_fst (op =))) [chained_feats, extra_feats] (map (rpair 1.0) goal_feats0) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1258 | |> debug ? sort (Real.compare o swap o apply2 snd) | 
| 57460 | 1259 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1260 | val parents = maximal_wrt_access_graph access_G facts | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1261 | val visible_facts = map_filter (Symtab.lookup fact_tab) (Graph.all_preds access_G parents) | 
| 57460 | 1262 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1263 | val suggs = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1264 | if algorithm = MaSh_NB_Ext orelse algorithm = MaSh_kNN_Ext then | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1265 | let | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1266 | val learns = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1267 | Graph.schedule (fn _ => fn (fact, (_, feats, deps)) => (fact, feats, deps)) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1268 | access_G | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1269 | in | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1270 | MaSh.query_external ctxt algorithm max_suggs learns goal_feats | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1271 | end | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1272 | else | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1273 | let | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1274 | val int_goal_feats = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1275 | map_filter (fn (s, w) => Option.map (rpair w) (Symtab.lookup feat_tab s)) goal_feats | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1276 | in | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1277 | MaSh.query_internal ctxt algorithm num_facts num_feats ffds freqs visible_facts | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1278 | max_suggs goal_feats int_goal_feats | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1279 | end | 
| 57017 | 1280 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1281 | val unknown = filter_out (is_fact_in_graph access_G o snd) facts | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1282 | in | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1283 | find_mash_suggestions ctxt max_suggs suggs facts chained unknown | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1284 | |> apply2 (map fact_of_raw_fact) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1285 | end) | 
| 53095 | 1286 | end | 
| 1287 | ||
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1288 | fun mash_unlearn ctxt = (clear_state ctxt; writeln "Reset MaSh") | 
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1289 | |
| 57661 | 1290 | fun learn_wrt_access_graph ctxt (name, parents, feats, deps) | 
| 1291 | (accum as (access_G, (fact_xtab, feat_xtab))) = | |
| 53095 | 1292 | let | 
| 57371 | 1293 | fun maybe_learn_from from (accum as (parents, access_G)) = | 
| 57013 
ed95456499e6
better way to take invisible facts into account than 'island' business
 blanchet parents: 
57012diff
changeset | 1294 | try_graph ctxt "updating graph" accum (fn () => | 
| 57371 | 1295 | (from :: parents, Graph.add_edge_acyclic (from, name) access_G)) | 
| 1296 | ||
| 1297 | val access_G = access_G |> Graph.default_node (name, (Isar_Proof, feats, deps)) | |
| 1298 | val (parents, access_G) = ([], access_G) |> fold maybe_learn_from parents | |
| 1299 | val (deps, _) = ([], access_G) |> fold maybe_learn_from deps | |
| 1300 | ||
| 57661 | 1301 | val fact_xtab = add_to_xtab name fact_xtab | 
| 57371 | 1302 | val feat_xtab = fold maybe_add_to_xtab feats feat_xtab | 
| 57006 | 1303 | in | 
| 57661 | 1304 | (SOME (name, parents, feats, deps), (access_G, (fact_xtab, feat_xtab))) | 
| 57006 | 1305 | end | 
| 57661 | 1306 | handle Symtab.DUP _ => (NONE, accum) (* facts sometimes have the same name, confusingly *) | 
| 53095 | 1307 | |
| 57381 | 1308 | fun relearn_wrt_access_graph ctxt (name, deps) access_G = | 
| 53095 | 1309 | let | 
| 57371 | 1310 | fun maybe_relearn_from from (accum as (parents, access_G)) = | 
| 53095 | 1311 | try_graph ctxt "updating graph" accum (fn () => | 
| 57371 | 1312 | (from :: parents, Graph.add_edge_acyclic (from, name) access_G)) | 
| 1313 | val access_G = | |
| 1314 | access_G |> Graph.map_node name (fn (_, feats, _) => (Automatic_Proof, feats, deps)) | |
| 1315 | val (deps, _) = ([], access_G) |> fold maybe_relearn_from deps | |
| 57006 | 1316 | in | 
| 57381 | 1317 | ((name, deps), access_G) | 
| 57006 | 1318 | end | 
| 53095 | 1319 | |
| 1320 | fun flop_wrt_access_graph name = | |
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1321 | Graph.map_node name (fn (_, feats, deps) => (Isar_Proof_wegen_Prover_Flop, feats, deps)) | 
| 53095 | 1322 | |
| 57273 
01b68f625550
automatically learn MaSh facts also in 'blocking' mode
 blanchet parents: 
57150diff
changeset | 1323 | val learn_timeout_slack = 20.0 | 
| 53095 | 1324 | |
| 1325 | fun launch_thread timeout task = | |
| 1326 | let | |
| 1327 | val hard_timeout = time_mult learn_timeout_slack timeout | |
| 1328 | val birth_time = Time.now () | |
| 62826 | 1329 | val death_time = birth_time + hard_timeout | 
| 53095 | 1330 |     val desc = ("Machine learner for Sledgehammer", "")
 | 
| 57006 | 1331 | in | 
| 59471 
ca459352d8c5
more explicit indication of Async_Manager_Legacy as Proof General legacy;
 wenzelm parents: 
59172diff
changeset | 1332 | Async_Manager_Legacy.thread MaShN birth_time death_time desc task | 
| 57006 | 1333 | end | 
| 53095 | 1334 | |
| 61318 | 1335 | fun anonymous_proof_name () = | 
| 1336 | Date.fmt (anonymous_proof_prefix ^ "%Y%m%d.%H%M%S.") (Date.fromTimeLocal (Time.now ())) ^ | |
| 1337 | serial_string () | |
| 57013 
ed95456499e6
better way to take invisible facts into account than 'island' business
 blanchet parents: 
57012diff
changeset | 1338 | |
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1339 | fun mash_learn_proof ctxt ({timeout, ...} : params) t facts used_ths =
 | 
| 57387 
2b6fe2a48352
reintroduced MaSh hints, this time as persistent creatures
 blanchet parents: 
57386diff
changeset | 1340 | if not (null used_ths) andalso is_mash_enabled () then | 
| 54816 
10d48c2a3e32
made timeouts in Sledgehammer not be 'option's -- simplified lots of code
 blanchet parents: 
54799diff
changeset | 1341 | launch_thread timeout (fn () => | 
| 57006 | 1342 | let | 
| 1343 | val thy = Proof_Context.theory_of ctxt | |
| 60640 | 1344 | val feats = features_of ctxt (Context.theory_name thy) (Local, General) [t] | 
| 60641 | 1345 | val facts = rev_sort_list_prefix (crude_thm_ord ctxt o apply2 snd) 1 facts | 
| 57006 | 1346 | in | 
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1347 | map_state ctxt | 
| 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1348 |           (fn {access_G, xtabs as ((num_facts0, _), _), ffds, freqs, dirty_facts} =>
 | 
| 57371 | 1349 | let | 
| 61321 | 1350 | val parents = maximal_wrt_access_graph access_G facts | 
| 57371 | 1351 | val deps = used_ths | 
| 61321 | 1352 | |> filter (is_fact_in_graph access_G) | 
| 1353 | |> map nickname_of_thm | |
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1354 | |
| 61318 | 1355 | val name = anonymous_proof_name () | 
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1356 | val (access_G', xtabs', rev_learns) = | 
| 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1357 | add_node Automatic_Proof name parents feats deps (access_G, xtabs, []) | 
| 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1358 | |
| 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1359 | val (ffds', freqs') = | 
| 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1360 | recompute_ffds_freqs_from_learns (rev rev_learns) xtabs' num_facts0 ffds freqs | 
| 57371 | 1361 | in | 
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1362 |                {access_G = access_G', xtabs = xtabs', ffds = ffds', freqs = freqs',
 | 
| 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1363 | dirty_facts = Option.map (cons name) dirty_facts} | 
| 57371 | 1364 | end); | 
| 57006 | 1365 | (true, "") | 
| 1366 | end) | |
| 53819 
e55d641d0a70
honor MaSh's zero-overhead policy -- no learning if the tool is disabled
 blanchet parents: 
53790diff
changeset | 1367 | else | 
| 
e55d641d0a70
honor MaSh's zero-overhead policy -- no learning if the tool is disabled
 blanchet parents: 
53790diff
changeset | 1368 | () | 
| 53095 | 1369 | |
| 63518 | 1370 | fun sendback sub = Active.sendback_markup_command (sledgehammerN ^ " " ^ sub) | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1371 | |
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1372 | val commit_timeout = seconds 30.0 | 
| 48332 
271a4a6af734
optimize parent computation in MaSh + remove temporary files
 blanchet parents: 
48330diff
changeset | 1373 | |
| 50485 
3c6ac2da2f45
merge aliased theorems in MaSh dependencies, modulo symmetry of equality
 blanchet parents: 
50484diff
changeset | 1374 | (* The timeout is understood in a very relaxed fashion. *) | 
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1375 | fun mash_learn_facts ctxt (params as {debug, verbose, ...}) prover auto_level run_prover
 | 
| 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1376 | learn_timeout facts = | 
| 48304 | 1377 | let | 
| 48318 | 1378 | val timer = Timer.startRealTimer () | 
| 62826 | 1379 | fun next_commit_time () = Timer.checkRealTimer timer + commit_timeout | 
| 48308 | 1380 | in | 
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1381 | (case get_state ctxt of | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1382 | NONE => "MaSh is busy\nPlease try again later" | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1383 |     | SOME {access_G, ...} =>
 | 
| 48304 | 1384 | let | 
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1385 | val is_in_access_G = is_fact_in_graph access_G o snd | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1386 | val no_new_facts = forall is_in_access_G facts | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1387 | in | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1388 | if no_new_facts andalso not run_prover then | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1389 | if auto_level < 2 then | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1390 | "No new " ^ (if run_prover then "automatic" else "Isar") ^ " proofs to learn" ^ | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1391 | (if auto_level = 0 andalso not run_prover then | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1392 | "\n\nHint: Try " ^ sendback learn_proverN ^ " to learn from an automatic prover" | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1393 | else | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1394 | "") | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1395 | else | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1396 | "" | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1397 | else | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1398 | let | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1399 | val name_tabs = build_name_tables nickname_of_thm facts | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1400 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1401 | fun deps_of status th = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1402 | if status = Non_Rec_Def orelse status = Rec_Def then | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1403 | SOME [] | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1404 | else if run_prover then | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1405 | prover_dependencies_of ctxt params prover auto_level facts name_tabs th | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1406 | |> (fn (false, _) => NONE | (true, deps) => trim_dependencies deps) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1407 | else | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1408 | isar_dependencies_of name_tabs th | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1409 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1410 | fun do_commit [] [] [] state = state | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1411 | | do_commit learns relearns flops | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1412 |                   {access_G, xtabs as ((num_facts0, _), _), ffds, freqs, dirty_facts} =
 | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1413 | let | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1414 | val was_empty = Graph.is_empty access_G | 
| 57378 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 1415 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1416 | val (learns, (access_G', xtabs')) = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1417 | fold_map (learn_wrt_access_graph ctxt) learns (access_G, xtabs) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1418 | |>> map_filter I | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1419 | val (relearns, access_G'') = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1420 | fold_map (relearn_wrt_access_graph ctxt) relearns access_G' | 
| 57371 | 1421 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1422 | val access_G''' = access_G'' |> fold flop_wrt_access_graph flops | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1423 | val dirty_facts' = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1424 | (case (was_empty, dirty_facts) of | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1425 | (false, SOME names) => SOME (map #1 learns @ map #1 relearns @ names) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1426 | | _ => NONE) | 
| 57378 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 1427 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1428 | val (ffds', freqs') = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1429 | if null relearns then | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1430 | recompute_ffds_freqs_from_learns | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1431 | (map (fn (name, _, feats, deps) => (name, feats, deps)) learns) xtabs' | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1432 | num_facts0 ffds freqs | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1433 | else | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1434 | recompute_ffds_freqs_from_access_G access_G''' xtabs' | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1435 | in | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1436 |                   {access_G = access_G''', xtabs = xtabs', ffds = ffds', freqs = freqs',
 | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1437 | dirty_facts = dirty_facts'} | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1438 | end | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1439 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1440 | fun commit last learns relearns flops = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1441 | (if debug andalso auto_level = 0 then writeln "Committing..." else (); | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1442 | map_state ctxt (do_commit (rev learns) relearns flops); | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1443 | if not last andalso auto_level = 0 then | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1444 | let val num_proofs = length learns + length relearns in | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1445 |                    writeln ("Learned " ^ string_of_int num_proofs ^ " " ^
 | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1446 | (if run_prover then "automatic" else "Isar") ^ " proof" ^ | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1447 | plural_s num_proofs ^ " in the last " ^ string_of_time commit_timeout) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1448 | end | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1449 | else | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1450 | ()) | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1451 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1452 | fun learn_new_fact _ (accum as (_, (_, _, true))) = accum | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1453 | | learn_new_fact (parents, ((_, stature as (_, status)), th)) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1454 | (learns, (num_nontrivial, next_commit, _)) = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1455 | let | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1456 | val name = nickname_of_thm th | 
| 65458 | 1457 | val feats = features_of ctxt (Thm.theory_name th) stature [Thm.prop_of th] | 
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1458 | val deps = these (deps_of status th) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1459 | val num_nontrivial = num_nontrivial |> not (null deps) ? Integer.add 1 | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1460 | val learns = (name, parents, feats, deps) :: learns | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1461 | val (learns, next_commit) = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1462 | if Timer.checkRealTimer timer > next_commit then | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1463 | (commit false learns [] []; ([], next_commit_time ())) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1464 | else | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1465 | (learns, next_commit) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1466 | val timed_out = Timer.checkRealTimer timer > learn_timeout | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1467 | in | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1468 | (learns, (num_nontrivial, next_commit, timed_out)) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1469 | end | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1470 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1471 | val (num_new_facts, num_nontrivial) = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1472 | if no_new_facts then | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1473 | (0, 0) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1474 | else | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1475 | let | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1476 | val new_facts = facts | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1477 | |> sort (crude_thm_ord ctxt o apply2 snd) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1478 | |> attach_parents_to_facts [] | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1479 | |> filter_out (is_in_access_G o snd) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1480 | val (learns, (num_nontrivial, _, _)) = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1481 | ([], (0, next_commit_time (), false)) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1482 | |> fold learn_new_fact new_facts | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1483 | in | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1484 | commit true learns [] []; (length new_facts, num_nontrivial) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1485 | end | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1486 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1487 | fun relearn_old_fact _ (accum as (_, (_, _, true))) = accum | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1488 | | relearn_old_fact ((_, (_, status)), th) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1489 | ((relearns, flops), (num_nontrivial, next_commit, _)) = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1490 | let | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1491 | val name = nickname_of_thm th | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1492 | val (num_nontrivial, relearns, flops) = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1493 | (case deps_of status th of | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1494 | SOME deps => (num_nontrivial + 1, (name, deps) :: relearns, flops) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1495 | | NONE => (num_nontrivial, relearns, name :: flops)) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1496 | val (relearns, flops, next_commit) = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1497 | if Timer.checkRealTimer timer > next_commit then | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1498 | (commit false [] relearns flops; ([], [], next_commit_time ())) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1499 | else | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1500 | (relearns, flops, next_commit) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1501 | val timed_out = Timer.checkRealTimer timer > learn_timeout | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1502 | in | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1503 | ((relearns, flops), (num_nontrivial, next_commit, timed_out)) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1504 | end | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1505 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1506 | val num_nontrivial = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1507 | if not run_prover then | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1508 | num_nontrivial | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1509 | else | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1510 | let | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1511 | val max_isar = 1000 * max_dependencies | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1512 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1513 | fun priority_of th = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1514 | Random.random_range 0 max_isar + | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1515 | (case try (Graph.get_node access_G) (nickname_of_thm th) of | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1516 | SOME (Isar_Proof, _, deps) => ~100 * length deps | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1517 | | SOME (Automatic_Proof, _, _) => 2 * max_isar | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1518 | | SOME (Isar_Proof_wegen_Prover_Flop, _, _) => max_isar | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1519 | | NONE => 0) | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1520 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1521 | val old_facts = facts | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1522 | |> filter is_in_access_G | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1523 | |> map (`(priority_of o snd)) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1524 | |> sort (int_ord o apply2 fst) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1525 | |> map snd | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1526 | val ((relearns, flops), (num_nontrivial, _, _)) = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1527 | (([], []), (num_nontrivial, next_commit_time (), false)) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1528 | |> fold relearn_old_fact old_facts | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1529 | in | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1530 | commit true [] relearns flops; num_nontrivial | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1531 | end | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1532 | in | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1533 | if verbose orelse auto_level < 2 then | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1534 | "Learned " ^ string_of_int num_new_facts ^ " fact" ^ plural_s num_new_facts ^ | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1535 | " and " ^ string_of_int num_nontrivial ^ " nontrivial " ^ | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1536 | (if run_prover then "automatic and " else "") ^ "Isar proof" ^ | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1537 | plural_s num_nontrivial ^ | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1538 | (if verbose then " in " ^ string_of_time (Timer.checkRealTimer timer) else "") | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1539 | else | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1540 | "" | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1541 | end | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1542 | end) | 
| 48308 | 1543 | end | 
| 48304 | 1544 | |
| 54123 
271a8377656f
remove overloading of "max_facts" -- it already controls the number of facts passed to ATPs for 'learn_prover'
 blanchet parents: 
54115diff
changeset | 1545 | fun mash_learn ctxt (params as {provers, timeout, ...}) fact_override chained run_prover =
 | 
| 48316 
252f45c04042
drastic overhaul of MaSh data structures + fixed a few performance issues
 blanchet parents: 
48315diff
changeset | 1546 | let | 
| 48396 | 1547 | val css = Sledgehammer_Fact.clasimpset_rule_table_of ctxt | 
| 48395 
85a7fb65507a
learning should honor the fact override and the chained facts
 blanchet parents: 
48394diff
changeset | 1548 | val ctxt = ctxt |> Config.put instantiate_inducts false | 
| 58919 | 1549 | val facts = | 
| 1550 |       nearly_all_facts ctxt false fact_override Keyword.empty_keywords css chained [] @{prop True}
 | |
| 60641 | 1551 | |> sort (crude_thm_ord ctxt o apply2 snd o swap) | 
| 48404 | 1552 | val num_facts = length facts | 
| 1553 | val prover = hd provers | |
| 57006 | 1554 | |
| 50484 
8ec31bdb9d36
adopt the neutral "prover" terminology for MaSh rather than the ambiguous/wrong ATP terminology (which sometimes excludes SMT solvers)
 blanchet parents: 
50450diff
changeset | 1555 | fun learn auto_level run_prover = | 
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1556 | mash_learn_facts ctxt params prover auto_level run_prover one_year facts | 
| 58843 | 1557 | |> writeln | 
| 48316 
252f45c04042
drastic overhaul of MaSh data structures + fixed a few performance issues
 blanchet parents: 
48315diff
changeset | 1558 | in | 
| 50484 
8ec31bdb9d36
adopt the neutral "prover" terminology for MaSh rather than the ambiguous/wrong ATP terminology (which sometimes excludes SMT solvers)
 blanchet parents: 
50450diff
changeset | 1559 | if run_prover then | 
| 58843 | 1560 |       (writeln ("MaShing through " ^ string_of_int num_facts ^ " fact" ^
 | 
| 57017 | 1561 |          plural_s num_facts ^ " for automatic proofs (" ^ quote prover ^ " timeout: " ^
 | 
| 1562 | string_of_time timeout ^ ").\n\nCollecting Isar proofs first..."); | |
| 50340 
72519bf5f135
simplify MaSh term patterns + add missing global facts if there aren't too many
 blanchet parents: 
50339diff
changeset | 1563 | learn 1 false; | 
| 63692 | 1564 | writeln "Now collecting automatic proofs\n\ | 
| 1565 | \This may take several hours; you can safely stop the learning process at any point"; | |
| 50340 
72519bf5f135
simplify MaSh term patterns + add missing global facts if there aren't too many
 blanchet parents: 
50339diff
changeset | 1566 | learn 0 true) | 
| 
72519bf5f135
simplify MaSh term patterns + add missing global facts if there aren't too many
 blanchet parents: 
50339diff
changeset | 1567 | else | 
| 58843 | 1568 |       (writeln ("MaShing through " ^ string_of_int num_facts ^ " fact" ^
 | 
| 57006 | 1569 | plural_s num_facts ^ " for Isar proofs..."); | 
| 50340 
72519bf5f135
simplify MaSh term patterns + add missing global facts if there aren't too many
 blanchet parents: 
50339diff
changeset | 1570 | learn 0 false) | 
| 48316 
252f45c04042
drastic overhaul of MaSh data structures + fixed a few performance issues
 blanchet parents: 
48315diff
changeset | 1571 | end | 
| 48249 | 1572 | |
| 63697 
0afe49623cf9
avoid loading MaSh file first time around for higher responsiveness of Sledgehammer
 blanchet parents: 
63696diff
changeset | 1573 | fun mash_can_suggest_facts ctxt = | 
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1574 | (case get_state ctxt of | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1575 | NONE => false | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1576 |   | SOME {access_G, ...} => not (Graph.is_empty access_G))
 | 
| 63697 
0afe49623cf9
avoid loading MaSh file first time around for higher responsiveness of Sledgehammer
 blanchet parents: 
63696diff
changeset | 1577 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1578 | fun mash_can_suggest_facts_fast ctxt = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1579 | (case peek_state ctxt of | 
| 63697 
0afe49623cf9
avoid loading MaSh file first time around for higher responsiveness of Sledgehammer
 blanchet parents: 
63696diff
changeset | 1580 | NONE => false | 
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1581 |   | SOME (_, {access_G, ...}) => not (Graph.is_empty access_G))
 | 
| 50311 | 1582 | |
| 57274 
0acfdb151e52
more generous formula -- there are lots of duplicates out there
 blanchet parents: 
57273diff
changeset | 1583 | (* Generate more suggestions than requested, because some might be thrown out later for various | 
| 
0acfdb151e52
more generous formula -- there are lots of duplicates out there
 blanchet parents: 
57273diff
changeset | 1584 | reasons (e.g., duplicates). *) | 
| 
0acfdb151e52
more generous formula -- there are lots of duplicates out there
 blanchet parents: 
57273diff
changeset | 1585 | fun generous_max_suggestions max_facts = 3 * max_facts div 2 + 25 | 
| 50383 
4274b25ff4e7
take proximity into account for MaSh + fix a debilitating bug in feature generation
 blanchet parents: 
50382diff
changeset | 1586 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1587 | val mepo_weight = 0.5 (* FUDGE *) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1588 | val mash_weight = 0.5 (* FUDGE *) | 
| 50814 | 1589 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1590 | val max_facts_to_learn_before_query = 100 (* FUDGE *) | 
| 53152 
cbd3c7c48d2c
learn new facts on query if there aren't too many of them in MaSh
 blanchet parents: 
53150diff
changeset | 1591 | |
| 63697 
0afe49623cf9
avoid loading MaSh file first time around for higher responsiveness of Sledgehammer
 blanchet parents: 
63696diff
changeset | 1592 | (* The threshold should be large enough so that MaSh does not get activated for Auto Sledgehammer. *) | 
| 
0afe49623cf9
avoid loading MaSh file first time around for higher responsiveness of Sledgehammer
 blanchet parents: 
63696diff
changeset | 1593 | val min_secs_for_learning = 10 | 
| 48318 | 1594 | |
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1595 | fun relevant_facts ctxt (params as {verbose, learn, fact_filter, timeout, ...}) prover
 | 
| 57405 
1f49da059947
reintroduced 'extra features' + only print message in verbose mode
 blanchet parents: 
57404diff
changeset | 1596 |     max_facts ({add, only, ...} : fact_override) hyp_ts concl_t facts =
 | 
| 48314 
ee33ba3c0e05
added option to control which fact filter is used
 blanchet parents: 
48313diff
changeset | 1597 | if not (subset (op =) (the_list fact_filter, fact_filters)) then | 
| 63692 | 1598 |     error ("Unknown fact filter: " ^ quote (the fact_filter))
 | 
| 48314 
ee33ba3c0e05
added option to control which fact filter is used
 blanchet parents: 
48313diff
changeset | 1599 | else if only then | 
| 57150 
f591096a9c94
add option to keep duplicates, for more precise evaluation of relevance filters
 blanchet parents: 
57133diff
changeset | 1600 |     [("", map fact_of_raw_fact facts)]
 | 
| 48321 | 1601 | else if max_facts <= 0 orelse null facts then | 
| 51010 | 1602 |     [("", [])]
 | 
| 48288 
255c6e1fd505
rationalize relevance filter, slowing moving code from Iter to MaSh
 blanchet parents: 
48251diff
changeset | 1603 | else | 
| 
255c6e1fd505
rationalize relevance filter, slowing moving code from Iter to MaSh
 blanchet parents: 
48251diff
changeset | 1604 | let | 
| 60649 
e007aa6a8aa2
more explicit use of context and elimination of Thm.theory_of_thm, although unclear (and untested?) situations remain;
 wenzelm parents: 
60641diff
changeset | 1605 | val thy_name = Context.theory_name (Proof_Context.theory_of ctxt) | 
| 57407 
140e16bc2a40
use right theory name for theorems in evaluation driver
 blanchet parents: 
57406diff
changeset | 1606 | |
| 57555 
f60d70566525
also learn when 'fact_filter =' is set explicitly
 blanchet parents: 
57554diff
changeset | 1607 | fun maybe_launch_thread exact min_num_facts_to_learn = | 
| 59471 
ca459352d8c5
more explicit indication of Async_Manager_Legacy as Proof General legacy;
 wenzelm parents: 
59172diff
changeset | 1608 | if not (Async_Manager_Legacy.has_running_threads MaShN) andalso | 
| 54816 
10d48c2a3e32
made timeouts in Sledgehammer not be 'option's -- simplified lots of code
 blanchet parents: 
54799diff
changeset | 1609 | Time.toSeconds timeout >= min_secs_for_learning then | 
| 
10d48c2a3e32
made timeouts in Sledgehammer not be 'option's -- simplified lots of code
 blanchet parents: 
54799diff
changeset | 1610 | let val timeout = time_mult learn_timeout_slack timeout in | 
| 57405 
1f49da059947
reintroduced 'extra features' + only print message in verbose mode
 blanchet parents: 
57404diff
changeset | 1611 | (if verbose then | 
| 58843 | 1612 |                writeln ("Started MaShing through " ^
 | 
| 63697 
0afe49623cf9
avoid loading MaSh file first time around for higher responsiveness of Sledgehammer
 blanchet parents: 
63696diff
changeset | 1613 | (if exact then "" else "up to ") ^ string_of_int min_num_facts_to_learn ^ | 
| 63692 | 1614 | " fact" ^ plural_s min_num_facts_to_learn ^ " in the background") | 
| 57405 
1f49da059947
reintroduced 'extra features' + only print message in verbose mode
 blanchet parents: 
57404diff
changeset | 1615 | else | 
| 
1f49da059947
reintroduced 'extra features' + only print message in verbose mode
 blanchet parents: 
57404diff
changeset | 1616 | ()); | 
| 54816 
10d48c2a3e32
made timeouts in Sledgehammer not be 'option's -- simplified lots of code
 blanchet parents: 
54799diff
changeset | 1617 | launch_thread timeout | 
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1618 | (fn () => (true, mash_learn_facts ctxt params prover 2 false timeout facts)) | 
| 48319 
340187063d84
use async manager to manage MaSh learners to make sure they get killed cleanly
 blanchet parents: 
48318diff
changeset | 1619 | end | 
| 48318 | 1620 | else | 
| 1621 | () | |
| 57018 | 1622 | |
| 63697 
0afe49623cf9
avoid loading MaSh file first time around for higher responsiveness of Sledgehammer
 blanchet parents: 
63696diff
changeset | 1623 | val mash_enabled = is_mash_enabled () | 
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1624 | val mash_fast = mash_can_suggest_facts_fast ctxt | 
| 63697 
0afe49623cf9
avoid loading MaSh file first time around for higher responsiveness of Sledgehammer
 blanchet parents: 
63696diff
changeset | 1625 | |
| 57555 
f60d70566525
also learn when 'fact_filter =' is set explicitly
 blanchet parents: 
57554diff
changeset | 1626 | fun please_learn () = | 
| 63697 
0afe49623cf9
avoid loading MaSh file first time around for higher responsiveness of Sledgehammer
 blanchet parents: 
63696diff
changeset | 1627 | if mash_fast then | 
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1628 | (case get_state ctxt of | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1629 | NONE => maybe_launch_thread false (length facts) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1630 |           | SOME {access_G, xtabs = ((num_facts0, _), _), ...} =>
 | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1631 | let | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1632 | val is_in_access_G = is_fact_in_graph access_G o snd | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1633 | val min_num_facts_to_learn = length facts - num_facts0 | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1634 | in | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1635 | if min_num_facts_to_learn <= max_facts_to_learn_before_query then | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1636 | (case length (filter_out is_in_access_G facts) of | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1637 | 0 => () | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1638 | | num_facts_to_learn => | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1639 | if num_facts_to_learn <= max_facts_to_learn_before_query then | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1640 | mash_learn_facts ctxt params prover 2 false timeout facts | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1641 | |> (fn "" => () | s => writeln (MaShN ^ ": " ^ s)) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1642 | else | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1643 | maybe_launch_thread true num_facts_to_learn) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1644 | else | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1645 | maybe_launch_thread false min_num_facts_to_learn | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1646 | end) | 
| 63697 
0afe49623cf9
avoid loading MaSh file first time around for higher responsiveness of Sledgehammer
 blanchet parents: 
63696diff
changeset | 1647 | else | 
| 
0afe49623cf9
avoid loading MaSh file first time around for higher responsiveness of Sledgehammer
 blanchet parents: 
63696diff
changeset | 1648 | maybe_launch_thread false (length facts) | 
| 57555 
f60d70566525
also learn when 'fact_filter =' is set explicitly
 blanchet parents: 
57554diff
changeset | 1649 | |
| 
f60d70566525
also learn when 'fact_filter =' is set explicitly
 blanchet parents: 
57554diff
changeset | 1650 | val _ = | 
| 
f60d70566525
also learn when 'fact_filter =' is set explicitly
 blanchet parents: 
57554diff
changeset | 1651 | if learn andalso mash_enabled andalso fact_filter <> SOME mepoN then please_learn () else () | 
| 57018 | 1652 | |
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1653 | val effective_fact_filter = | 
| 55286 | 1654 | (case fact_filter of | 
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1655 | SOME ff => ff | 
| 63697 
0afe49623cf9
avoid loading MaSh file first time around for higher responsiveness of Sledgehammer
 blanchet parents: 
63696diff
changeset | 1656 | | NONE => if mash_enabled andalso mash_fast then meshN else mepoN) | 
| 54143 
18def1c73c79
make sure add: doesn't add duplicates, and works for [no_atp] facts
 blanchet parents: 
54141diff
changeset | 1657 | |
| 
18def1c73c79
make sure add: doesn't add duplicates, and works for [no_atp] facts
 blanchet parents: 
54141diff
changeset | 1658 | val unique_facts = drop_duplicate_facts facts | 
| 48288 
255c6e1fd505
rationalize relevance filter, slowing moving code from Iter to MaSh
 blanchet parents: 
48251diff
changeset | 1659 | val add_ths = Attrib.eval_thms ctxt add | 
| 54143 
18def1c73c79
make sure add: doesn't add duplicates, and works for [no_atp] facts
 blanchet parents: 
54141diff
changeset | 1660 | |
| 51004 
5f2788c38127
distinguish raw and non-raw facts, using raw for 10 000s of facts and non-raw after selection of some hundreds
 blanchet parents: 
51003diff
changeset | 1661 | fun in_add (_, th) = member Thm.eq_thm_prop add_ths th | 
| 57018 | 1662 | |
| 51003 
198cb05fb35b
report (MeSh, MePo, MaSh) triple, to be able to use different filters in different slices
 blanchet parents: 
51001diff
changeset | 1663 | fun add_and_take accepts = | 
| 
198cb05fb35b
report (MeSh, MePo, MaSh) triple, to be able to use different filters in different slices
 blanchet parents: 
51001diff
changeset | 1664 | (case add_ths of | 
| 
198cb05fb35b
report (MeSh, MePo, MaSh) triple, to be able to use different filters in different slices
 blanchet parents: 
51001diff
changeset | 1665 | [] => accepts | 
| 57150 
f591096a9c94
add option to keep duplicates, for more precise evaluation of relevance filters
 blanchet parents: 
57133diff
changeset | 1666 | | _ => | 
| 
f591096a9c94
add option to keep duplicates, for more precise evaluation of relevance filters
 blanchet parents: 
57133diff
changeset | 1667 | (unique_facts |> filter in_add |> map fact_of_raw_fact) @ (accepts |> filter_out in_add)) | 
| 48293 | 1668 | |> take max_facts | 
| 57018 | 1669 | |
| 48406 | 1670 | fun mepo () = | 
| 54143 
18def1c73c79
make sure add: doesn't add duplicates, and works for [no_atp] facts
 blanchet parents: 
54141diff
changeset | 1671 | (mepo_suggested_facts ctxt params max_facts NONE hyp_ts concl_t unique_facts | 
| 54091 | 1672 | |> weight_facts_steeply, []) | 
| 57018 | 1673 | |
| 48314 
ee33ba3c0e05
added option to control which fact filter is used
 blanchet parents: 
48313diff
changeset | 1674 | fun mash () = | 
| 61322 | 1675 | mash_suggested_facts ctxt thy_name params (generous_max_suggestions max_facts) hyp_ts | 
| 1676 | concl_t facts | |
| 53140 
a1235e90da5f
pour extra features from proximate facts into goal, in exporter
 blanchet parents: 
53137diff
changeset | 1677 | |>> weight_facts_steeply | 
| 57018 | 1678 | |
| 48314 
ee33ba3c0e05
added option to control which fact filter is used
 blanchet parents: 
48313diff
changeset | 1679 | val mess = | 
| 51003 
198cb05fb35b
report (MeSh, MePo, MaSh) triple, to be able to use different filters in different slices
 blanchet parents: 
51001diff
changeset | 1680 | (* the order is important for the "case" expression below *) | 
| 54091 | 1681 | [] |> effective_fact_filter <> mepoN ? cons (mash_weight, mash) | 
| 1682 | |> effective_fact_filter <> mashN ? cons (mepo_weight, mepo) | |
| 1683 | |> Par_List.map (apsnd (fn f => f ())) | |
| 61322 | 1684 | val mesh = | 
| 1685 | mesh_facts (fact_distinct (op aconv)) (eq_snd (gen_eq_thm ctxt)) max_facts mess | |
| 1686 | |> add_and_take | |
| 48288 
255c6e1fd505
rationalize relevance filter, slowing moving code from Iter to MaSh
 blanchet parents: 
48251diff
changeset | 1687 | in | 
| 55286 | 1688 | (case (fact_filter, mess) of | 
| 51024 
98fb341d32e3
distinguish MeSh and smart -- with smart, allow combinations of MaSh, MeSh, and MePo in different slices -- and use MaSh also with SMT solvers, based on evaluation
 blanchet parents: 
51020diff
changeset | 1689 | (NONE, [(_, (mepo, _)), (_, (mash, _))]) => | 
| 57658 | 1690 | [(meshN, mesh), | 
| 1691 | (mepoN, mepo |> map fst |> add_and_take), | |
| 51010 | 1692 | (mashN, mash |> map fst |> add_and_take)] | 
| 55286 | 1693 | | _ => [(effective_fact_filter, mesh)]) | 
| 48288 
255c6e1fd505
rationalize relevance filter, slowing moving code from Iter to MaSh
 blanchet parents: 
48251diff
changeset | 1694 | end | 
| 
255c6e1fd505
rationalize relevance filter, slowing moving code from Iter to MaSh
 blanchet parents: 
48251diff
changeset | 1695 | |
| 48248 | 1696 | end; |