| author | haftmann | 
| Sat, 05 Sep 2020 08:32:27 +0000 | |
| changeset 72239 | 12e94c2ff6c5 | 
| parent 70586 | 57df8a85317a | 
| child 73383 | 6b104dc069de | 
| permissions | -rw-r--r-- | 
| 48380 | 1 | (* Title: HOL/Tools/Sledgehammer/sledgehammer_mash.ML | 
| 48248 | 2 | Author: Jasmin Blanchette, TU Muenchen | 
| 57009 | 3 | Author: Cezary Kaliszyk, University of Innsbruck | 
| 48248 | 4 | |
| 5 | Sledgehammer's machine-learning-based relevance filter (MaSh). | |
| 6 | *) | |
| 7 | ||
| 48381 | 8 | signature SLEDGEHAMMER_MASH = | 
| 48248 | 9 | sig | 
| 48251 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 10 | type stature = ATP_Problem_Generate.stature | 
| 51004 
5f2788c38127
distinguish raw and non-raw facts, using raw for 10 000s of facts and non-raw after selection of some hundreds
 blanchet parents: 
51003diff
changeset | 11 | type raw_fact = Sledgehammer_Fact.raw_fact | 
| 48296 
e7f01b7e244e
gracefully handle the case of empty theories when going up the accessibility chain
 blanchet parents: 
48293diff
changeset | 12 | type fact = Sledgehammer_Fact.fact | 
| 
e7f01b7e244e
gracefully handle the case of empty theories when going up the accessibility chain
 blanchet parents: 
48293diff
changeset | 13 | type fact_override = Sledgehammer_Fact.fact_override | 
| 55201 | 14 | type params = Sledgehammer_Prover.params | 
| 15 | type prover_result = Sledgehammer_Prover.prover_result | |
| 48251 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 16 | |
| 48308 | 17 | val trace : bool Config.T | 
| 57150 
f591096a9c94
add option to keep duplicates, for more precise evaluation of relevance filters
 blanchet parents: 
57133diff
changeset | 18 | val duplicates : bool Config.T | 
| 51008 | 19 | val MePoN : string | 
| 48319 
340187063d84
use async manager to manage MaSh learners to make sure they get killed cleanly
 blanchet parents: 
48318diff
changeset | 20 | val MaShN : string | 
| 51008 | 21 | val MeShN : string | 
| 48379 
2b5ad61e2ccc
renamed "iter" fact filter to "MePo" (Meng--Paulson)
 blanchet parents: 
48378diff
changeset | 22 | val mepoN : string | 
| 
2b5ad61e2ccc
renamed "iter" fact filter to "MePo" (Meng--Paulson)
 blanchet parents: 
48378diff
changeset | 23 | val mashN : string | 
| 48314 
ee33ba3c0e05
added option to control which fact filter is used
 blanchet parents: 
48313diff
changeset | 24 | val meshN : string | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 25 | val unlearnN : string | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 26 | val learn_isarN : string | 
| 50484 
8ec31bdb9d36
adopt the neutral "prover" terminology for MaSh rather than the ambiguous/wrong ATP terminology (which sometimes excludes SMT solvers)
 blanchet parents: 
50450diff
changeset | 27 | val learn_proverN : string | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 28 | val relearn_isarN : string | 
| 50484 
8ec31bdb9d36
adopt the neutral "prover" terminology for MaSh rather than the ambiguous/wrong ATP terminology (which sometimes excludes SMT solvers)
 blanchet parents: 
50450diff
changeset | 29 | val relearn_proverN : string | 
| 48314 
ee33ba3c0e05
added option to control which fact filter is used
 blanchet parents: 
48313diff
changeset | 30 | val fact_filters : string list | 
| 50826 | 31 | val encode_str : string -> string | 
| 32 | val encode_strs : string list -> string | |
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 33 | val decode_str : string -> string | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 34 | val decode_strs : string -> string list | 
| 50632 
12c097ff3241
slightly more elegant naming convention (to keep low-level and high-level APIs separated)
 blanchet parents: 
50631diff
changeset | 35 | |
| 57532 | 36 | datatype mash_algorithm = | 
| 57458 | 37 | MaSh_NB | 
| 57459 | 38 | | MaSh_kNN | 
| 39 | | MaSh_NB_kNN | |
| 57458 | 40 | | MaSh_NB_Ext | 
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 41 | | MaSh_kNN_Ext | 
| 57106 | 42 | |
| 57120 | 43 | val is_mash_enabled : unit -> bool | 
| 57532 | 44 | val the_mash_algorithm : unit -> mash_algorithm | 
| 57557 | 45 | val str_of_mash_algorithm : mash_algorithm -> string | 
| 57120 | 46 | |
| 61322 | 47 |   val mesh_facts : ('a list -> 'a list) -> ('a * 'a -> bool) -> int ->
 | 
| 48 |     (real * (('a * real) list * 'a list)) list -> 'a list
 | |
| 61321 | 49 | val nickname_of_thm : thm -> string | 
| 57006 | 50 |   val find_suggested_facts : Proof.context -> ('b * thm) list -> string list -> ('b * thm) list
 | 
| 70586 | 51 | val crude_thm_ord : Proof.context -> thm ord | 
| 51182 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 52 | val thm_less : thm * thm -> bool | 
| 48251 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 53 | val goal_of_thm : theory -> thm -> thm | 
| 57006 | 54 | val run_prover_for_mash : Proof.context -> params -> string -> string -> fact list -> thm -> | 
| 55 | prover_result | |
| 60640 | 56 | val features_of : Proof.context -> string -> stature -> term list -> string list | 
| 51177 | 57 | val trim_dependencies : string list -> string list option | 
| 61321 | 58 | val isar_dependencies_of : string Symtab.table * string Symtab.table -> thm -> string list option | 
| 57006 | 59 | val prover_dependencies_of : Proof.context -> params -> string -> int -> raw_fact list -> | 
| 60 | string Symtab.table * string Symtab.table -> thm -> bool * string list | |
| 61321 | 61 |   val attach_parents_to_facts : ('a * thm) list -> ('a * thm) list ->
 | 
| 57006 | 62 |     (string list * ('a * thm)) list
 | 
| 53140 
a1235e90da5f
pour extra features from proximate facts into goal, in exporter
 blanchet parents: 
53137diff
changeset | 63 | val num_extra_feature_facts : int | 
| 53141 
d27e99a6a679
take chained and proximate facts into consideration when computing MaSh features
 blanchet parents: 
53140diff
changeset | 64 | val extra_feature_factor : real | 
| 53140 
a1235e90da5f
pour extra features from proximate facts into goal, in exporter
 blanchet parents: 
53137diff
changeset | 65 |   val weight_facts_smoothly : 'a list -> ('a * real) list
 | 
| 
a1235e90da5f
pour extra features from proximate facts into goal, in exporter
 blanchet parents: 
53137diff
changeset | 66 |   val weight_facts_steeply : 'a list -> ('a * real) list
 | 
| 57432 | 67 |   val find_mash_suggestions : Proof.context -> int -> string list -> ('a * thm) list ->
 | 
| 68 |     ('a * thm) list -> ('a * thm) list -> ('a * thm) list * ('a * thm) list
 | |
| 60649 
e007aa6a8aa2
more explicit use of context and elimination of Thm.theory_of_thm, although unclear (and untested?) situations remain;
 wenzelm parents: 
60641diff
changeset | 69 | val mash_suggested_facts : Proof.context -> string -> params -> int -> term list -> term -> | 
| 57407 
140e16bc2a40
use right theory name for theorems in evaluation driver
 blanchet parents: 
57406diff
changeset | 70 | raw_fact list -> fact list * fact list | 
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 71 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 72 | val mash_unlearn : Proof.context -> unit | 
| 69706 
6d6235b828fc
get rid of visibility in MaSh -- it slows it down more than it helps
 blanchet parents: 
69593diff
changeset | 73 | val mash_learn_proof : Proof.context -> params -> term -> thm list -> unit | 
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 74 | val mash_learn_facts : Proof.context -> params -> string -> int -> bool -> Time.time -> | 
| 57120 | 75 | raw_fact list -> string | 
| 57006 | 76 | val mash_learn : Proof.context -> params -> fact_override -> thm list -> bool -> unit | 
| 57432 | 77 | val mash_can_suggest_facts : Proof.context -> bool | 
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 78 | val mash_can_suggest_facts_fast : Proof.context -> bool | 
| 53819 
e55d641d0a70
honor MaSh's zero-overhead policy -- no learning if the tool is disabled
 blanchet parents: 
53790diff
changeset | 79 | |
| 57108 
dc0b4f50e288
more generous max number of suggestions, for more safety
 blanchet parents: 
57107diff
changeset | 80 | val generous_max_suggestions : int -> int | 
| 50814 | 81 | val mepo_weight : real | 
| 82 | val mash_weight : real | |
| 57006 | 83 | val relevant_facts : Proof.context -> params -> string -> int -> fact_override -> term list -> | 
| 84 | term -> raw_fact list -> (string * fact list) list | |
| 48248 | 85 | end; | 
| 86 | ||
| 48381 | 87 | structure Sledgehammer_MaSh : SLEDGEHAMMER_MASH = | 
| 48248 | 88 | struct | 
| 48249 | 89 | |
| 48251 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 90 | open ATP_Util | 
| 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 91 | open ATP_Problem_Generate | 
| 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 92 | open Sledgehammer_Util | 
| 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 93 | open Sledgehammer_Fact | 
| 55201 | 94 | open Sledgehammer_Prover | 
| 55202 
824c48a539c9
renamed many Sledgehammer ML files to clarify structure
 blanchet parents: 
55201diff
changeset | 95 | open Sledgehammer_Prover_Minimize | 
| 48381 | 96 | open Sledgehammer_MePo | 
| 48251 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 97 | |
| 61318 | 98 | val anonymous_proof_prefix = "." | 
| 99 | ||
| 69593 | 100 | val trace = Attrib.setup_config_bool \<^binding>\<open>sledgehammer_mash_trace\<close> (K false) | 
| 101 | val duplicates = Attrib.setup_config_bool \<^binding>\<open>sledgehammer_fact_duplicates\<close> (K false) | |
| 51032 
69da236d7838
added option to use SNoW as machine learning algo
 blanchet parents: 
51029diff
changeset | 102 | |
| 48308 | 103 | fun trace_msg ctxt msg = if Config.get ctxt trace then tracing (msg ()) else () | 
| 104 | ||
| 57150 
f591096a9c94
add option to keep duplicates, for more precise evaluation of relevance filters
 blanchet parents: 
57133diff
changeset | 105 | fun gen_eq_thm ctxt = if Config.get ctxt duplicates then Thm.eq_thm_strict else Thm.eq_thm_prop | 
| 
f591096a9c94
add option to keep duplicates, for more precise evaluation of relevance filters
 blanchet parents: 
57133diff
changeset | 106 | |
| 51008 | 107 | val MePoN = "MePo" | 
| 48319 
340187063d84
use async manager to manage MaSh learners to make sure they get killed cleanly
 blanchet parents: 
48318diff
changeset | 108 | val MaShN = "MaSh" | 
| 51008 | 109 | val MeShN = "MeSh" | 
| 48319 
340187063d84
use async manager to manage MaSh learners to make sure they get killed cleanly
 blanchet parents: 
48318diff
changeset | 110 | |
| 48379 
2b5ad61e2ccc
renamed "iter" fact filter to "MePo" (Meng--Paulson)
 blanchet parents: 
48378diff
changeset | 111 | val mepoN = "mepo" | 
| 48314 
ee33ba3c0e05
added option to control which fact filter is used
 blanchet parents: 
48313diff
changeset | 112 | val mashN = "mash" | 
| 48379 
2b5ad61e2ccc
renamed "iter" fact filter to "MePo" (Meng--Paulson)
 blanchet parents: 
48378diff
changeset | 113 | val meshN = "mesh" | 
| 48314 
ee33ba3c0e05
added option to control which fact filter is used
 blanchet parents: 
48313diff
changeset | 114 | |
| 48379 
2b5ad61e2ccc
renamed "iter" fact filter to "MePo" (Meng--Paulson)
 blanchet parents: 
48378diff
changeset | 115 | val fact_filters = [meshN, mepoN, mashN] | 
| 48314 
ee33ba3c0e05
added option to control which fact filter is used
 blanchet parents: 
48313diff
changeset | 116 | |
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 117 | val unlearnN = "unlearn" | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 118 | val learn_isarN = "learn_isar" | 
| 50484 
8ec31bdb9d36
adopt the neutral "prover" terminology for MaSh rather than the ambiguous/wrong ATP terminology (which sometimes excludes SMT solvers)
 blanchet parents: 
50450diff
changeset | 119 | val learn_proverN = "learn_prover" | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 120 | val relearn_isarN = "relearn_isar" | 
| 50484 
8ec31bdb9d36
adopt the neutral "prover" terminology for MaSh rather than the ambiguous/wrong ATP terminology (which sometimes excludes SMT solvers)
 blanchet parents: 
50450diff
changeset | 121 | val relearn_proverN = "relearn_prover" | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 122 | |
| 57368 | 123 | fun map_array_at ary f i = Array.update (ary, i, f (Array.sub (ary, i))) | 
| 124 | ||
| 57371 | 125 | type xtab = int * int Symtab.table | 
| 126 | ||
| 127 | val empty_xtab = (0, Symtab.empty) | |
| 128 | ||
| 129 | fun add_to_xtab key (next, tab) = (next + 1, Symtab.update_new (key, next) tab) | |
| 130 | fun maybe_add_to_xtab key = perhaps (try (add_to_xtab key)) | |
| 131 | ||
| 57552 
1072599c43f6
no need for 'mash' subdirectory after removal of Python program
 blanchet parents: 
57546diff
changeset | 132 | fun state_file () = Path.expand (Path.explode "$ISABELLE_HOME_USER/mash_state") | 
| 
1072599c43f6
no need for 'mash' subdirectory after removal of Python program
 blanchet parents: 
57546diff
changeset | 133 | val remove_state_file = try File.rm o state_file | 
| 48330 | 134 | |
| 57532 | 135 | datatype mash_algorithm = | 
| 57458 | 136 | MaSh_NB | 
| 57459 | 137 | | MaSh_kNN | 
| 138 | | MaSh_NB_kNN | |
| 57458 | 139 | | MaSh_NB_Ext | 
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 140 | | MaSh_kNN_Ext | 
| 57278 | 141 | |
| 57532 | 142 | fun mash_algorithm () = | 
| 69593 | 143 | (case Options.default_string \<^system_option>\<open>MaSh\<close> of | 
| 61043 | 144 | "yes" => SOME MaSh_NB_kNN | 
| 145 | | "sml" => SOME MaSh_NB_kNN | |
| 146 | | "nb" => SOME MaSh_NB | |
| 147 | | "knn" => SOME MaSh_kNN | |
| 148 | | "nb_knn" => SOME MaSh_NB_kNN | |
| 149 | | "nb_ext" => SOME MaSh_NB_Ext | |
| 150 | | "knn_ext" => SOME MaSh_kNN_Ext | |
| 151 | | "none" => NONE | |
| 152 | | "" => NONE | |
| 63692 | 153 |   | algorithm => (warning ("Unknown MaSh algorithm: " ^ quote algorithm); NONE))
 | 
| 57018 | 154 | |
| 57532 | 155 | val is_mash_enabled = is_some o mash_algorithm | 
| 156 | val the_mash_algorithm = the_default MaSh_NB_kNN o mash_algorithm | |
| 57018 | 157 | |
| 57557 | 158 | fun str_of_mash_algorithm MaSh_NB = "nb" | 
| 159 | | str_of_mash_algorithm MaSh_kNN = "knn" | |
| 160 | | str_of_mash_algorithm MaSh_NB_kNN = "nb_knn" | |
| 161 | | str_of_mash_algorithm MaSh_NB_Ext = "nb_ext" | |
| 162 | | str_of_mash_algorithm MaSh_kNN_Ext = "knn_ext" | |
| 163 | ||
| 57459 | 164 | fun scaled_avg [] = 0 | 
| 165 | | scaled_avg xs = Real.ceil (100000000.0 * fold (curry (op +)) xs 0.0) div length xs | |
| 166 | ||
| 167 | fun avg [] = 0.0 | |
| 168 | | avg xs = fold (curry (op +)) xs 0.0 / Real.fromInt (length xs) | |
| 169 | ||
| 170 | fun normalize_scores _ [] = [] | |
| 171 | | normalize_scores max_facts xs = | |
| 62826 | 172 | map (apsnd (curry (op *) (1.0 / avg (map snd (take max_facts xs))))) xs | 
| 57459 | 173 | |
| 61322 | 174 | fun mesh_facts maybe_distinct _ max_facts [(_, (sels, unks))] = | 
| 175 | map fst (take max_facts sels) @ take (max_facts - length sels) unks | |
| 176 | |> maybe_distinct | |
| 177 | | mesh_facts _ fact_eq max_facts mess = | |
| 57459 | 178 | let | 
| 179 | val mess = mess |> map (apsnd (apfst (normalize_scores max_facts))) | |
| 180 | ||
| 181 | fun score_in fact (global_weight, (sels, unks)) = | |
| 182 | let val score_at = try (nth sels) #> Option.map (fn (_, score) => global_weight * score) in | |
| 183 | (case find_index (curry fact_eq fact o fst) sels of | |
| 184 | ~1 => if member fact_eq unks fact then NONE else SOME 0.0 | |
| 185 | | rank => score_at rank) | |
| 186 | end | |
| 187 | ||
| 188 | fun weight_of fact = mess |> map_filter (score_in fact) |> scaled_avg | |
| 189 | in | |
| 190 | fold (union fact_eq o map fst o take max_facts o fst o snd) mess [] | |
| 59058 
a78612c67ec0
renamed "pairself" to "apply2", in accordance to @{apply 2};
 wenzelm parents: 
58919diff
changeset | 191 | |> map (`weight_of) |> sort (int_ord o apply2 fst o swap) | 
| 57459 | 192 | |> map snd |> take max_facts | 
| 193 | end | |
| 194 | ||
| 195 | fun smooth_weight_of_fact rank = Math.pow (1.3, 15.5 - 0.2 * Real.fromInt rank) + 15.0 (* FUDGE *) | |
| 196 | fun steep_weight_of_fact rank = Math.pow (0.62, log2 (Real.fromInt (rank + 1))) (* FUDGE *) | |
| 197 | ||
| 198 | fun weight_facts_smoothly facts = facts ~~ map smooth_weight_of_fact (0 upto length facts - 1) | |
| 199 | fun weight_facts_steeply facts = facts ~~ map steep_weight_of_fact (0 upto length facts - 1) | |
| 200 | ||
| 57660 | 201 | fun sort_array_suffix cmp needed a = | 
| 57009 | 202 | let | 
| 57406 | 203 | exception BOTTOM of int | 
| 204 | ||
| 57460 | 205 | val al = Array.length a | 
| 206 | ||
| 57009 | 207 | fun maxson l i = | 
| 57017 | 208 | let val i31 = i + i + i + 1 in | 
| 57009 | 209 | if i31 + 2 < l then | 
| 57017 | 210 | let val x = Unsynchronized.ref i31 in | 
| 67560 | 211 | if is_less (cmp (Array.sub (a, i31), Array.sub (a, i31 + 1))) then x := i31 + 1 else (); | 
| 212 | if is_less (cmp (Array.sub (a, !x), Array.sub (a, i31 + 2))) then x := i31 + 2 else (); | |
| 57009 | 213 | !x | 
| 214 | end | |
| 215 | else | |
| 67560 | 216 | if i31 + 1 < l andalso is_less (cmp (Array.sub (a, i31), Array.sub (a, i31 + 1))) | 
| 57009 | 217 | then i31 + 1 else if i31 < l then i31 else raise BOTTOM i | 
| 218 | end | |
| 219 | ||
| 220 | fun trickledown l i e = | |
| 57029 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 221 | let val j = maxson l i in | 
| 67560 | 222 | if is_greater (cmp (Array.sub (a, j), e)) then | 
| 57029 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 223 | (Array.update (a, i, Array.sub (a, j)); trickledown l j e) | 
| 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 224 | else | 
| 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 225 | Array.update (a, i, e) | 
| 57009 | 226 | end | 
| 227 | ||
| 228 | fun trickle l i e = trickledown l i e handle BOTTOM i => Array.update (a, i, e) | |
| 229 | ||
| 230 | fun bubbledown l i = | |
| 57029 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 231 | let val j = maxson l i in | 
| 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 232 | Array.update (a, i, Array.sub (a, j)); | 
| 57009 | 233 | bubbledown l j | 
| 234 | end | |
| 235 | ||
| 236 | fun bubble l i = bubbledown l i handle BOTTOM i => i | |
| 237 | ||
| 238 | fun trickleup i e = | |
| 57029 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 239 | let val father = (i - 1) div 3 in | 
| 67560 | 240 | if is_less (cmp (Array.sub (a, father), e)) then | 
| 57029 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 241 | (Array.update (a, i, Array.sub (a, father)); | 
| 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 242 | if father > 0 then trickleup father e else Array.update (a, 0, e)) | 
| 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 243 | else | 
| 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 244 | Array.update (a, i, e) | 
| 57009 | 245 | end | 
| 246 | ||
| 57355 | 247 | fun for i = if i < 0 then () else (trickle al i (Array.sub (a, i)); for (i - 1)) | 
| 57009 | 248 | |
| 249 | fun for2 i = | |
| 57660 | 250 | if i < Integer.max 2 (al - needed) then | 
| 57029 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 251 | () | 
| 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 252 | else | 
| 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 253 | let val e = Array.sub (a, i) in | 
| 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 254 | Array.update (a, i, Array.sub (a, 0)); | 
| 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 255 | trickleup (bubble i 0) e; | 
| 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 256 | for2 (i - 1) | 
| 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 257 | end | 
| 57009 | 258 | in | 
| 57355 | 259 | for (((al + 1) div 3) - 1); | 
| 260 | for2 (al - 1); | |
| 261 | if al > 1 then | |
| 57029 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 262 | let val e = Array.sub (a, 1) in | 
| 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 263 | Array.update (a, 1, Array.sub (a, 0)); | 
| 57009 | 264 | Array.update (a, 0, e) | 
| 265 | end | |
| 57029 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 266 | else | 
| 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 267 | () | 
| 57009 | 268 | end | 
| 269 | ||
| 57660 | 270 | fun rev_sort_list_prefix cmp needed xs = | 
| 57460 | 271 | let val ary = Array.fromList xs in | 
| 57660 | 272 | sort_array_suffix cmp needed ary; | 
| 273 | Array.foldl (op ::) [] ary | |
| 57460 | 274 | end | 
| 275 | ||
| 276 | ||
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 277 | (*** Convenience functions for synchronized access ***) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 278 | |
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 279 | fun synchronized_timed_value var time_limit = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 280 | Synchronized.timed_access var time_limit (fn value => SOME (value, value)) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 281 | fun synchronized_timed_change_result var time_limit f = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 282 | Synchronized.timed_access var time_limit (SOME o f) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 283 | fun synchronized_timed_change var time_limit f = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 284 | synchronized_timed_change_result var time_limit (fn x => ((), f x)) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 285 | |
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 286 | fun mash_time_limit _ = SOME (seconds 0.1) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 287 | |
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 288 | |
| 57460 | 289 | (*** Isabelle-agnostic machine learning ***) | 
| 290 | ||
| 291 | structure MaSh = | |
| 292 | struct | |
| 293 | ||
| 69706 
6d6235b828fc
get rid of visibility in MaSh -- it slows it down more than it helps
 blanchet parents: 
69593diff
changeset | 294 | fun select_fact_idxs (big_number : real) recommends = | 
| 57364 | 295 | List.app (fn at => | 
| 296 | let val (j, ov) = Array.sub (recommends, at) in | |
| 57371 | 297 | Array.update (recommends, at, (j, big_number + ov)) | 
| 57364 | 298 | end) | 
| 299 | ||
| 57374 | 300 | fun wider_array_of_vector init vec = | 
| 301 | let val ary = Array.array init in | |
| 302 |     Array.copyVec {src = vec, dst = ary, di = 0};
 | |
| 303 | ary | |
| 304 | end | |
| 305 | ||
| 57531 
4d9895d39b59
improvements to the machine learning algos (due to Cezary K.)
 blanchet parents: 
57462diff
changeset | 306 | val nb_def_prior_weight = 1000 (* FUDGE *) | 
| 57095 | 307 | |
| 57374 | 308 | fun learn_facts (tfreq0, sfreq0, dffreq0) num_facts0 num_facts num_feats depss featss = | 
| 57354 | 309 | let | 
| 57374 | 310 | val tfreq = wider_array_of_vector (num_facts, 0) tfreq0 | 
| 311 | val sfreq = wider_array_of_vector (num_facts, Inttab.empty) sfreq0 | |
| 312 | val dffreq = wider_array_of_vector (num_feats, 0) dffreq0 | |
| 313 | ||
| 314 | fun learn_one th feats deps = | |
| 57354 | 315 | let | 
| 57355 | 316 | fun add_th weight t = | 
| 317 | let | |
| 318 | val im = Array.sub (sfreq, t) | |
| 57374 | 319 | fun fold_fn s = Inttab.map_default (s, 0) (Integer.add weight) | 
| 57355 | 320 | in | 
| 57368 | 321 | map_array_at tfreq (Integer.add weight) t; | 
| 57355 | 322 | Array.update (sfreq, t, fold fold_fn feats im) | 
| 323 | end | |
| 324 | ||
| 57368 | 325 | val add_sym = map_array_at dffreq (Integer.add 1) | 
| 57354 | 326 | in | 
| 57355 | 327 | add_th nb_def_prior_weight th; | 
| 328 | List.app (add_th 1) deps; | |
| 329 | List.app add_sym feats | |
| 57354 | 330 | end | 
| 331 | ||
| 57102 | 332 | fun for i = | 
| 57383 | 333 | if i = num_facts then () | 
| 334 | else (learn_one i (Vector.sub (featss, i)) (Vector.sub (depss, i)); for (i + 1)) | |
| 57102 | 335 | in | 
| 57383 | 336 | for num_facts0; | 
| 57374 | 337 | (Array.vector tfreq, Array.vector sfreq, Array.vector dffreq) | 
| 57102 | 338 | end | 
| 57029 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 339 | |
| 69706 
6d6235b828fc
get rid of visibility in MaSh -- it slows it down more than it helps
 blanchet parents: 
69593diff
changeset | 340 | fun naive_bayes (tfreq, sfreq, dffreq) num_facts max_suggs fact_idxs goal_feats = | 
| 57278 | 341 | let | 
| 57531 
4d9895d39b59
improvements to the machine learning algos (due to Cezary K.)
 blanchet parents: 
57462diff
changeset | 342 | val tau = 0.2 (* FUDGE *) | 
| 
4d9895d39b59
improvements to the machine learning algos (due to Cezary K.)
 blanchet parents: 
57462diff
changeset | 343 | val pos_weight = 5.0 (* FUDGE *) | 
| 
4d9895d39b59
improvements to the machine learning algos (due to Cezary K.)
 blanchet parents: 
57462diff
changeset | 344 | val def_val = ~18.0 (* FUDGE *) | 
| 
4d9895d39b59
improvements to the machine learning algos (due to Cezary K.)
 blanchet parents: 
57462diff
changeset | 345 | val init_val = 30.0 (* FUDGE *) | 
| 57124 | 346 | |
| 57366 | 347 | val ln_afreq = Math.ln (Real.fromInt num_facts) | 
| 57374 | 348 | val idf = Vector.map (fn i => ln_afreq - Math.ln (Real.fromInt i)) dffreq | 
| 57366 | 349 | |
| 57359 | 350 | fun tfidf feat = Vector.sub (idf, feat) | 
| 57102 | 351 | |
| 352 | fun log_posterior i = | |
| 57029 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 353 | let | 
| 57374 | 354 | val tfreq = Real.fromInt (Vector.sub (tfreq, i)) | 
| 57097 | 355 | |
| 57531 
4d9895d39b59
improvements to the machine learning algos (due to Cezary K.)
 blanchet parents: 
57462diff
changeset | 356 | fun add_feat (f, fw0) (res, sfh) = | 
| 57102 | 357 | (case Inttab.lookup sfh f of | 
| 358 | SOME sf => | |
| 57401 
02f56126b4e4
reintroduced 'extra features' but with lower weight than before (to account for tfidf)
 blanchet parents: 
57387diff
changeset | 359 | (res + fw0 * tfidf f * Math.ln (pos_weight * Real.fromInt sf / tfreq), | 
| 57102 | 360 | Inttab.delete f sfh) | 
| 57409 | 361 | | NONE => (res + fw0 * tfidf f * def_val, sfh)) | 
| 57097 | 362 | |
| 57531 
4d9895d39b59
improvements to the machine learning algos (due to Cezary K.)
 blanchet parents: 
57462diff
changeset | 363 | val (res, sfh) = fold add_feat goal_feats (init_val * Math.ln tfreq, Vector.sub (sfreq, i)) | 
| 57097 | 364 | |
| 57409 | 365 | fun fold_sfh (f, sf) sow = | 
| 59476 | 366 | sow + tfidf f * Math.ln (1.0 - Real.fromInt (sf - 1) / tfreq) | 
| 57097 | 367 | |
| 57102 | 368 | val sum_of_weights = Inttab.fold fold_sfh sfh 0.0 | 
| 57029 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 369 | in | 
| 57278 | 370 | res + tau * sum_of_weights | 
| 57097 | 371 | end | 
| 57029 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 372 | |
| 57356 | 373 | val posterior = Array.tabulate (num_facts, (fn j => (j, log_posterior j))) | 
| 57102 | 374 | |
| 57364 | 375 | fun ret at acc = | 
| 376 | if at = num_facts then acc else ret (at + 1) (Array.sub (posterior, at) :: acc) | |
| 57029 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 377 | in | 
| 69706 
6d6235b828fc
get rid of visibility in MaSh -- it slows it down more than it helps
 blanchet parents: 
69593diff
changeset | 378 | select_fact_idxs 100000.0 posterior fact_idxs; | 
| 59058 
a78612c67ec0
renamed "pairself" to "apply2", in accordance to @{apply 2};
 wenzelm parents: 
58919diff
changeset | 379 | sort_array_suffix (Real.compare o apply2 snd) max_suggs posterior; | 
| 57364 | 380 | ret (Integer.max 0 (num_facts - max_suggs)) [] | 
| 57029 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 381 | end | 
| 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 382 | |
| 59477 | 383 | val initial_k = 0 | 
| 57458 | 384 | |
| 69706 
6d6235b828fc
get rid of visibility in MaSh -- it slows it down more than it helps
 blanchet parents: 
69593diff
changeset | 385 | fun k_nearest_neighbors dffreq num_facts num_feats depss featss max_suggs fact_idxs goal_feats = | 
| 57458 | 386 | let | 
| 387 | exception EXIT of unit | |
| 388 | ||
| 389 | val ln_afreq = Math.ln (Real.fromInt num_facts) | |
| 390 | fun tfidf feat = ln_afreq - Math.ln (Real.fromInt (Vector.sub (dffreq, feat))) | |
| 391 | ||
| 392 | val overlaps_sqr = Array.tabulate (num_facts, rpair 0.0) | |
| 393 | ||
| 57459 | 394 | val feat_facts = Array.array (num_feats, []) | 
| 395 | val _ = Vector.foldl (fn (feats, fact) => | |
| 396 | (List.app (map_array_at feat_facts (cons fact)) feats; fact + 1)) 0 featss | |
| 397 | ||
| 57458 | 398 | fun do_feat (s, sw0) = | 
| 399 | let | |
| 400 | val sw = sw0 * tfidf s | |
| 59476 | 401 | val w6 = Math.pow (sw, 6.0 (* FUDGE *)) | 
| 57458 | 402 | |
| 403 | fun inc_overlap j = | |
| 404 | let val (_, ov) = Array.sub (overlaps_sqr, j) in | |
| 57531 
4d9895d39b59
improvements to the machine learning algos (due to Cezary K.)
 blanchet parents: 
57462diff
changeset | 405 | Array.update (overlaps_sqr, j, (j, w6 + ov)) | 
| 57458 | 406 | end | 
| 407 | in | |
| 408 | List.app inc_overlap (Array.sub (feat_facts, s)) | |
| 409 | end | |
| 410 | ||
| 411 | val _ = List.app do_feat goal_feats | |
| 59058 
a78612c67ec0
renamed "pairself" to "apply2", in accordance to @{apply 2};
 wenzelm parents: 
58919diff
changeset | 412 | val _ = sort_array_suffix (Real.compare o apply2 snd) num_facts overlaps_sqr | 
| 57458 | 413 | val no_recommends = Unsynchronized.ref 0 | 
| 414 | val recommends = Array.tabulate (num_facts, rpair 0.0) | |
| 415 | val age = Unsynchronized.ref 500000000.0 | |
| 416 | ||
| 417 | fun inc_recommend v j = | |
| 418 | let val (_, ov) = Array.sub (recommends, j) in | |
| 419 | if ov <= 0.0 then | |
| 420 | (no_recommends := !no_recommends + 1; Array.update (recommends, j, (j, !age + ov))) | |
| 59476 | 421 | else | 
| 422 | Array.update (recommends, j, (j, v + ov)) | |
| 57458 | 423 | end | 
| 424 | ||
| 425 | val k = Unsynchronized.ref 0 | |
| 426 | fun do_k k = | |
| 427 | if k >= num_facts then | |
| 428 | raise EXIT () | |
| 429 | else | |
| 430 | let | |
| 57531 
4d9895d39b59
improvements to the machine learning algos (due to Cezary K.)
 blanchet parents: 
57462diff
changeset | 431 | val deps_factor = 2.7 (* FUDGE *) | 
| 57458 | 432 | val (j, o2) = Array.sub (overlaps_sqr, num_facts - k - 1) | 
| 57531 
4d9895d39b59
improvements to the machine learning algos (due to Cezary K.)
 blanchet parents: 
57462diff
changeset | 433 | val _ = inc_recommend o2 j | 
| 57458 | 434 | val ds = Vector.sub (depss, j) | 
| 435 | val l = Real.fromInt (length ds) | |
| 436 | in | |
| 57531 
4d9895d39b59
improvements to the machine learning algos (due to Cezary K.)
 blanchet parents: 
57462diff
changeset | 437 | List.app (inc_recommend (deps_factor * o2 / l)) ds | 
| 57458 | 438 | end | 
| 439 | ||
| 440 | fun while1 () = | |
| 59477 | 441 | if !k = initial_k + 1 then () else (do_k (!k); k := !k + 1; while1 ()) | 
| 57458 | 442 | handle EXIT () => () | 
| 443 | ||
| 444 | fun while2 () = | |
| 445 | if !no_recommends >= max_suggs then () | |
| 446 | else (do_k (!k); k := !k + 1; age := !age - 10000.0; while2 ()) | |
| 447 | handle EXIT () => () | |
| 448 | ||
| 449 | fun ret acc at = | |
| 450 | if at = num_facts then acc else ret (Array.sub (recommends, at) :: acc) (at + 1) | |
| 451 | in | |
| 452 | while1 (); | |
| 453 | while2 (); | |
| 69706 
6d6235b828fc
get rid of visibility in MaSh -- it slows it down more than it helps
 blanchet parents: 
69593diff
changeset | 454 | select_fact_idxs 1000000000.0 recommends fact_idxs; | 
| 59058 
a78612c67ec0
renamed "pairself" to "apply2", in accordance to @{apply 2};
 wenzelm parents: 
58919diff
changeset | 455 | sort_array_suffix (Real.compare o apply2 snd) max_suggs recommends; | 
| 57458 | 456 | ret [] (Integer.max 0 (num_facts - max_suggs)) | 
| 457 | end | |
| 458 | ||
| 57125 
2f620ef839ee
added another way of invoking Python code, for experiments
 blanchet parents: 
57124diff
changeset | 459 | (* experimental *) | 
| 57376 | 460 | fun external_tool tool max_suggs learns goal_feats = | 
| 57291 | 461 | let | 
| 57297 | 462 | val ser = string_of_int (serial ()) (* poor person's attempt at thread-safety *) | 
| 463 |     val ocs = TextIO.openOut ("adv_syms" ^ ser)
 | |
| 464 |     val ocd = TextIO.openOut ("adv_deps" ^ ser)
 | |
| 465 |     val ocq = TextIO.openOut ("adv_seq" ^ ser)
 | |
| 466 |     val occ = TextIO.openOut ("adv_conj" ^ ser)
 | |
| 57296 | 467 | |
| 57291 | 468 | fun os oc s = TextIO.output (oc, s) | 
| 57296 | 469 | |
| 57297 | 470 | fun ol _ _ _ [] = () | 
| 471 | | ol _ f _ [e] = f e | |
| 57291 | 472 | | ol oc f sep (h :: t) = (f h; os oc sep; ol oc f sep t) | 
| 57296 | 473 | |
| 57294 
ef9d4e1ceb00
use strings to communicate with external process, to ease debugging
 blanchet parents: 
57291diff
changeset | 474 | fun do_learn (name, feats, deps) = | 
| 57357 | 475 | (os ocs name; os ocs ":"; ol ocs (os ocs o quote) ", " feats; os ocs "\n"; | 
| 57297 | 476 | os ocd name; os ocd ":"; ol ocd (os ocd) " " deps; os ocd "\n"; os ocq name; os ocq "\n") | 
| 57296 | 477 | |
| 57291 | 478 | fun forkexec no = | 
| 479 | let | |
| 480 | val cmd = | |
| 57297 | 481 | "~/misc/" ^ tool ^ " adv_syms" ^ ser ^ " adv_deps" ^ ser ^ " " ^ string_of_int no ^ | 
| 482 | " adv_seq" ^ ser ^ " < adv_conj" ^ ser | |
| 57291 | 483 | in | 
| 484 | fst (Isabelle_System.bash_output cmd) | |
| 485 | |> space_explode " " | |
| 57294 
ef9d4e1ceb00
use strings to communicate with external process, to ease debugging
 blanchet parents: 
57291diff
changeset | 486 | |> filter_out (curry (op =) "") | 
| 57291 | 487 | end | 
| 488 | in | |
| 57401 
02f56126b4e4
reintroduced 'extra features' but with lower weight than before (to account for tfidf)
 blanchet parents: 
57387diff
changeset | 489 | (List.app do_learn learns; ol occ (os occ o quote) ", " (map fst goal_feats); | 
| 57297 | 490 | TextIO.closeOut ocs; TextIO.closeOut ocd; TextIO.closeOut ocq; TextIO.closeOut occ; | 
| 57294 
ef9d4e1ceb00
use strings to communicate with external process, to ease debugging
 blanchet parents: 
57291diff
changeset | 491 | forkexec max_suggs) | 
| 57291 | 492 | end | 
| 493 | ||
| 57561 | 494 | fun k_nearest_neighbors_ext max_suggs = | 
| 59477 | 495 |   external_tool ("newknn/knn" ^ " " ^ string_of_int initial_k) max_suggs
 | 
| 57561 | 496 | fun naive_bayes_ext max_suggs = external_tool "predict/nbayes" max_suggs | 
| 57374 | 497 | |
| 57532 | 498 | fun query_external ctxt algorithm max_suggs learns goal_feats = | 
| 57432 | 499 | (trace_msg ctxt (fn () => "MaSh query external " ^ commas (map fst goal_feats)); | 
| 57532 | 500 | (case algorithm of | 
| 57458 | 501 | MaSh_NB_Ext => naive_bayes_ext max_suggs learns goal_feats | 
| 502 | | MaSh_kNN_Ext => k_nearest_neighbors_ext max_suggs learns goal_feats)) | |
| 57371 | 503 | |
| 57532 | 504 | fun query_internal ctxt algorithm num_facts num_feats (fact_names, featss, depss) | 
| 69706 
6d6235b828fc
get rid of visibility in MaSh -- it slows it down more than it helps
 blanchet parents: 
69593diff
changeset | 505 | (freqs as (_, _, dffreq)) fact_idxs max_suggs goal_feats int_goal_feats = | 
| 57459 | 506 | let | 
| 507 | fun nb () = | |
| 69706 
6d6235b828fc
get rid of visibility in MaSh -- it slows it down more than it helps
 blanchet parents: 
69593diff
changeset | 508 | naive_bayes freqs num_facts max_suggs fact_idxs int_goal_feats | 
| 57459 | 509 | |> map fst | 
| 510 | fun knn () = | |
| 69706 
6d6235b828fc
get rid of visibility in MaSh -- it slows it down more than it helps
 blanchet parents: 
69593diff
changeset | 511 | k_nearest_neighbors dffreq num_facts num_feats depss featss max_suggs fact_idxs int_goal_feats | 
| 57459 | 512 | |> map fst | 
| 513 | in | |
| 514 |     (trace_msg ctxt (fn () => "MaSh query internal " ^ commas (map fst goal_feats) ^ " from {" ^
 | |
| 515 | elide_string 1000 (space_implode " " (Vector.foldr (op ::) [] fact_names)) ^ "}"); | |
| 57532 | 516 | (case algorithm of | 
| 57459 | 517 | MaSh_NB => nb () | 
| 518 | | MaSh_kNN => knn () | |
| 519 | | MaSh_NB_kNN => | |
| 61322 | 520 | mesh_facts I (op =) max_suggs | 
| 57552 
1072599c43f6
no need for 'mash' subdirectory after removal of Python program
 blanchet parents: 
57546diff
changeset | 521 | [(0.5 (* FUDGE *), (weight_facts_steeply (nb ()), [])), | 
| 
1072599c43f6
no need for 'mash' subdirectory after removal of Python program
 blanchet parents: 
57546diff
changeset | 522 | (0.5 (* FUDGE *), (weight_facts_steeply (knn ()), []))]) | 
| 57459 | 523 | |> map (curry Vector.sub fact_names)) | 
| 524 | end | |
| 57007 | 525 | |
| 526 | end; | |
| 527 | ||
| 528 | ||
| 57432 | 529 | (*** Persistent, stringly-typed state ***) | 
| 530 | ||
| 531 | fun meta_char c = | |
| 532 |   if Char.isAlphaNum c orelse c = #"_" orelse c = #"." orelse c = #"(" orelse c = #")" orelse
 | |
| 63696 
af310e622d64
tuned MaSh's metacharacters to avoid needless decoding
 blanchet parents: 
63695diff
changeset | 533 | c = #"," orelse c = #"'" then | 
| 57432 | 534 | String.str c | 
| 535 | else | |
| 536 | (* fixed width, in case more digits follow *) | |
| 537 | "%" ^ stringN_of_int 3 (Char.ord c) | |
| 538 | ||
| 539 | fun unmeta_chars accum [] = String.implode (rev accum) | |
| 540 | | unmeta_chars accum (#"%" :: d1 :: d2 :: d3 :: cs) = | |
| 541 | (case Int.fromString (String.implode [d1, d2, d3]) of | |
| 542 | SOME n => unmeta_chars (Char.chr n :: accum) cs | |
| 543 | | NONE => "" (* error *)) | |
| 544 | | unmeta_chars _ (#"%" :: _) = "" (* error *) | |
| 545 | | unmeta_chars accum (c :: cs) = unmeta_chars (c :: accum) cs | |
| 546 | ||
| 547 | val encode_str = String.translate meta_char | |
| 63695 | 548 | val encode_strs = map encode_str #> space_implode " " | 
| 57432 | 549 | |
| 63695 | 550 | fun decode_str s = | 
| 551 | if String.isSubstring "%" s then unmeta_chars [] (String.explode s) else s; | |
| 552 | ||
| 553 | fun decode_strs s = | |
| 554 | space_explode " " s |> String.isSubstring "%" s ? map decode_str; | |
| 50311 | 555 | |
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 556 | datatype proof_kind = Isar_Proof | Automatic_Proof | Isar_Proof_wegen_Prover_Flop | 
| 50311 | 557 | |
| 558 | fun str_of_proof_kind Isar_Proof = "i" | |
| 50484 
8ec31bdb9d36
adopt the neutral "prover" terminology for MaSh rather than the ambiguous/wrong ATP terminology (which sometimes excludes SMT solvers)
 blanchet parents: 
50450diff
changeset | 559 | | str_of_proof_kind Automatic_Proof = "a" | 
| 
8ec31bdb9d36
adopt the neutral "prover" terminology for MaSh rather than the ambiguous/wrong ATP terminology (which sometimes excludes SMT solvers)
 blanchet parents: 
50450diff
changeset | 560 | | str_of_proof_kind Isar_Proof_wegen_Prover_Flop = "x" | 
| 50311 | 561 | |
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 562 | fun proof_kind_of_str "a" = Automatic_Proof | 
| 50484 
8ec31bdb9d36
adopt the neutral "prover" terminology for MaSh rather than the ambiguous/wrong ATP terminology (which sometimes excludes SMT solvers)
 blanchet parents: 
50450diff
changeset | 563 | | proof_kind_of_str "x" = Isar_Proof_wegen_Prover_Flop | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 564 | | proof_kind_of_str _ (* "i" *) = Isar_Proof | 
| 50311 | 565 | |
| 57011 
a4428f517f46
implemented learning of single proofs in SML MaSh
 blanchet parents: 
57010diff
changeset | 566 | fun add_edge_to name parent = | 
| 
a4428f517f46
implemented learning of single proofs in SML MaSh
 blanchet parents: 
57010diff
changeset | 567 | Graph.default_node (parent, (Isar_Proof, [], [])) | 
| 
a4428f517f46
implemented learning of single proofs in SML MaSh
 blanchet parents: 
57010diff
changeset | 568 | #> Graph.add_edge (parent, name) | 
| 
a4428f517f46
implemented learning of single proofs in SML MaSh
 blanchet parents: 
57010diff
changeset | 569 | |
| 57661 | 570 | fun add_node kind name parents feats deps (accum as (access_G, (fact_xtab, feat_xtab), learns)) = | 
| 571 | let val fact_xtab' = add_to_xtab name fact_xtab in | |
| 572 | ((Graph.new_node (name, (kind, feats, deps)) access_G | |
| 573 | handle Graph.DUP _ => Graph.map_node name (K (kind, feats, deps)) access_G) | |
| 574 | |> fold (add_edge_to name) parents, | |
| 575 | (fact_xtab', fold maybe_add_to_xtab feats feat_xtab), | |
| 576 | (name, feats, deps) :: learns) | |
| 577 | end | |
| 578 | handle Symtab.DUP _ => accum (* robustness (in case the state file violates the invariant) *) | |
| 57011 
a4428f517f46
implemented learning of single proofs in SML MaSh
 blanchet parents: 
57010diff
changeset | 579 | |
| 50311 | 580 | fun try_graph ctxt when def f = | 
| 581 | f () | |
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 582 | handle | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 583 | Graph.CYCLES (cycle :: _) => | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 584 | (trace_msg ctxt (fn () => "Cycle involving " ^ commas cycle ^ " when " ^ when); def) | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 585 | | Graph.DUP name => | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 586 | (trace_msg ctxt (fn () => "Duplicate fact " ^ quote name ^ " when " ^ when); def) | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 587 | | Graph.UNDEF name => | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 588 | (trace_msg ctxt (fn () => "Unknown fact " ^ quote name ^ " when " ^ when); def) | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 589 | | exn => | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 590 | if Exn.is_interrupt exn then | 
| 62505 | 591 | Exn.reraise exn | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 592 | else | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 593 | (trace_msg ctxt (fn () => "Internal error when " ^ when ^ ":\n" ^ Runtime.exn_message exn); | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 594 | def) | 
| 50311 | 595 | |
| 596 | fun graph_info G = | |
| 597 | string_of_int (length (Graph.keys G)) ^ " node(s), " ^ | |
| 57006 | 598 | string_of_int (fold (Integer.add o length o snd) (Graph.dest G) 0) ^ " edge(s), " ^ | 
| 50311 | 599 | string_of_int (length (Graph.maximals G)) ^ " maximal" | 
| 600 | ||
| 57545 | 601 | type ffds = string vector * int list vector * int list vector | 
| 602 | type freqs = int vector * int Inttab.table vector * int vector | |
| 603 | ||
| 53095 | 604 | type mash_state = | 
| 57358 | 605 |   {access_G : (proof_kind * string list * string list) Graph.T,
 | 
| 57374 | 606 | xtabs : xtab * xtab, | 
| 57545 | 607 | ffds : ffds, | 
| 608 | freqs : freqs, | |
| 57365 | 609 | dirty_facts : string list option} | 
| 50311 | 610 | |
| 57378 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 611 | val empty_xtabs = (empty_xtab, empty_xtab) | 
| 57545 | 612 | val empty_ffds = (Vector.fromList [], Vector.fromList [], Vector.fromList []) : ffds | 
| 613 | val empty_freqs = (Vector.fromList [], Vector.fromList [], Vector.fromList []) : freqs | |
| 57378 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 614 | |
| 57371 | 615 | val empty_state = | 
| 616 |   {access_G = Graph.empty,
 | |
| 57378 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 617 | xtabs = empty_xtabs, | 
| 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 618 | ffds = empty_ffds, | 
| 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 619 | freqs = empty_freqs, | 
| 57371 | 620 | dirty_facts = SOME []} : mash_state | 
| 621 | ||
| 57545 | 622 | fun recompute_ffds_freqs_from_learns (learns : (string * string list * string list) list) | 
| 623 | ((num_facts, fact_tab), (num_feats, feat_tab)) num_facts0 (fact_names0, featss0, depss0) freqs0 = | |
| 57379 | 624 | let | 
| 57380 | 625 | val fact_names = Vector.concat [fact_names0, Vector.fromList (map #1 learns)] | 
| 626 | val featss = Vector.concat [featss0, | |
| 627 | Vector.fromList (map (map_filter (Symtab.lookup feat_tab) o #2) learns)] | |
| 628 | val depss = Vector.concat [depss0, | |
| 629 | Vector.fromList (map (map_filter (Symtab.lookup fact_tab) o #3) learns)] | |
| 57379 | 630 | in | 
| 631 | ((fact_names, featss, depss), | |
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 632 | MaSh.learn_facts freqs0 num_facts0 num_facts num_feats depss featss) | 
| 57379 | 633 | end | 
| 634 | ||
| 57378 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 635 | fun reorder_learns (num_facts, fact_tab) learns = | 
| 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 636 |   let val ary = Array.array (num_facts, ("", [], [])) in
 | 
| 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 637 | List.app (fn learn as (fact, _, _) => | 
| 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 638 | Array.update (ary, the (Symtab.lookup fact_tab fact), learn)) | 
| 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 639 | learns; | 
| 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 640 | Array.foldr (op ::) [] ary | 
| 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 641 | end | 
| 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 642 | |
| 57380 | 643 | fun recompute_ffds_freqs_from_access_G access_G (xtabs as (fact_xtab, _)) = | 
| 57378 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 644 | let | 
| 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 645 | val learns = | 
| 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 646 | Graph.schedule (fn _ => fn (fact, (_, feats, deps)) => (fact, feats, deps)) access_G | 
| 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 647 | |> reorder_learns fact_xtab | 
| 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 648 | in | 
| 57380 | 649 | recompute_ffds_freqs_from_learns learns xtabs 0 empty_ffds empty_freqs | 
| 57378 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 650 | end | 
| 50311 | 651 | |
| 652 | local | |
| 653 | ||
| 69706 
6d6235b828fc
get rid of visibility in MaSh -- it slows it down more than it helps
 blanchet parents: 
69593diff
changeset | 654 | val version = "*** MaSh version 20190121 ***" | 
| 50357 
187ae76a1757
more robustness in the face of MaSh format changes -- don't overwrite new versions with old versions
 blanchet parents: 
50356diff
changeset | 655 | |
| 53095 | 656 | exception FILE_VERSION_TOO_NEW of unit | 
| 50311 | 657 | |
| 658 | fun extract_node line = | |
| 55286 | 659 | (case space_explode ":" line of | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 660 | [head, tail] => | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 661 | (case (space_explode " " head, map (unprefix " ") (space_explode ";" tail)) of | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 662 | ([kind, name], [parents, feats, deps]) => | 
| 57358 | 663 | SOME (proof_kind_of_str kind, decode_str name, decode_strs parents, decode_strs feats, | 
| 57010 | 664 | decode_strs deps) | 
| 55286 | 665 | | _ => NONE) | 
| 666 | | _ => NONE) | |
| 50311 | 667 | |
| 63697 
0afe49623cf9
avoid loading MaSh file first time around for higher responsiveness of Sledgehammer
 blanchet parents: 
63696diff
changeset | 668 | fun would_load_state (memory_time, _) = | 
| 
0afe49623cf9
avoid loading MaSh file first time around for higher responsiveness of Sledgehammer
 blanchet parents: 
63696diff
changeset | 669 | let val path = state_file () in | 
| 
0afe49623cf9
avoid loading MaSh file first time around for higher responsiveness of Sledgehammer
 blanchet parents: 
63696diff
changeset | 670 | (case try OS.FileSys.modTime (File.platform_path path) of | 
| 
0afe49623cf9
avoid loading MaSh file first time around for higher responsiveness of Sledgehammer
 blanchet parents: 
63696diff
changeset | 671 | NONE => false | 
| 
0afe49623cf9
avoid loading MaSh file first time around for higher responsiveness of Sledgehammer
 blanchet parents: 
63696diff
changeset | 672 | | SOME disk_time => memory_time < disk_time) | 
| 
0afe49623cf9
avoid loading MaSh file first time around for higher responsiveness of Sledgehammer
 blanchet parents: 
63696diff
changeset | 673 | end; | 
| 
0afe49623cf9
avoid loading MaSh file first time around for higher responsiveness of Sledgehammer
 blanchet parents: 
63696diff
changeset | 674 | |
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 675 | fun load_state ctxt (time_state as (memory_time, _)) = | 
| 57552 
1072599c43f6
no need for 'mash' subdirectory after removal of Python program
 blanchet parents: 
57546diff
changeset | 676 | let val path = state_file () in | 
| 60971 | 677 | (case try OS.FileSys.modTime (File.platform_path path) of | 
| 57076 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 678 | NONE => time_state | 
| 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 679 | | SOME disk_time => | 
| 62826 | 680 | if memory_time >= disk_time then | 
| 57076 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 681 | time_state | 
| 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 682 | else | 
| 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 683 | (disk_time, | 
| 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 684 | (case try File.read_lines path of | 
| 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 685 | SOME (version' :: node_lines) => | 
| 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 686 | let | 
| 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 687 | fun extract_line_and_add_node line = | 
| 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 688 | (case extract_node line of | 
| 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 689 | NONE => I (* should not happen *) | 
| 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 690 | | SOME (kind, name, parents, feats, deps) => add_node kind name parents feats deps) | 
| 57011 
a4428f517f46
implemented learning of single proofs in SML MaSh
 blanchet parents: 
57010diff
changeset | 691 | |
| 57379 | 692 | val empty_G_etc = (Graph.empty, empty_xtabs, []) | 
| 693 | ||
| 694 | val (access_G, xtabs, rev_learns) = | |
| 57076 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 695 | (case string_ord (version', version) of | 
| 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 696 | EQUAL => | 
| 57379 | 697 | try_graph ctxt "loading state" empty_G_etc | 
| 698 | (fn () => fold extract_line_and_add_node node_lines empty_G_etc) | |
| 57552 
1072599c43f6
no need for 'mash' subdirectory after removal of Python program
 blanchet parents: 
57546diff
changeset | 699 | | LESS => (remove_state_file (); empty_G_etc) (* cannot parse old file *) | 
| 57076 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 700 | | GREATER => raise FILE_VERSION_TOO_NEW ()) | 
| 57378 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 701 | |
| 57380 | 702 | val (ffds, freqs) = | 
| 703 | recompute_ffds_freqs_from_learns (rev rev_learns) xtabs 0 empty_ffds empty_freqs | |
| 57076 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 704 | in | 
| 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 705 |              trace_msg ctxt (fn () => "Loaded fact graph (" ^ graph_info access_G ^ ")");
 | 
| 57378 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 706 |              {access_G = access_G, xtabs = xtabs, ffds = ffds, freqs = freqs, dirty_facts = SOME []}
 | 
| 57076 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 707 | end | 
| 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 708 | | _ => empty_state))) | 
| 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 709 | end | 
| 50311 | 710 | |
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 711 | fun str_of_entry (kind, name, parents, feats, deps) = | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 712 | str_of_proof_kind kind ^ " " ^ encode_str name ^ ": " ^ encode_strs parents ^ "; " ^ | 
| 57358 | 713 | encode_strs feats ^ "; " ^ encode_strs deps ^ "\n" | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 714 | |
| 57365 | 715 | fun save_state _ (time_state as (_, {dirty_facts = SOME [], ...})) = time_state
 | 
| 57378 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 716 |   | save_state ctxt (memory_time, {access_G, xtabs, ffds, freqs, dirty_facts}) =
 | 
| 50311 | 717 | let | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 718 | fun append_entry (name, ((kind, feats, deps), (parents, _))) = | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 719 | cons (kind, name, Graph.Keys.dest parents, feats, deps) | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 720 | |
| 57552 
1072599c43f6
no need for 'mash' subdirectory after removal of Python program
 blanchet parents: 
57546diff
changeset | 721 | val path = state_file () | 
| 57365 | 722 | val dirty_facts' = | 
| 60971 | 723 | (case try OS.FileSys.modTime (File.platform_path path) of | 
| 57076 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 724 | NONE => NONE | 
| 62826 | 725 | | SOME disk_time => if disk_time <= memory_time then dirty_facts else NONE) | 
| 50311 | 726 | val (banner, entries) = | 
| 57365 | 727 | (case dirty_facts' of | 
| 55286 | 728 | SOME names => (NONE, fold (append_entry o Graph.get_entry access_G) names []) | 
| 729 | | NONE => (SOME (version ^ "\n"), Graph.fold append_entry access_G [])) | |
| 50311 | 730 | in | 
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 731 | (case banner of SOME s => File.write path s | NONE => (); | 
| 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 732 | entries |> chunk_list 500 |> List.app (File.append path o implode o map str_of_entry)) | 
| 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 733 | handle IO.Io _ => (); | 
| 50311 | 734 | trace_msg ctxt (fn () => | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 735 |         "Saved fact graph (" ^ graph_info access_G ^
 | 
| 57365 | 736 | (case dirty_facts of | 
| 737 | SOME dirty_facts => "; " ^ string_of_int (length dirty_facts) ^ " dirty fact(s)" | |
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 738 | | _ => "") ^ ")"); | 
| 57378 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 739 | (Time.now (), | 
| 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 740 |        {access_G = access_G, xtabs = xtabs, ffds = ffds, freqs = freqs, dirty_facts = SOME []})
 | 
| 50311 | 741 | end | 
| 742 | ||
| 57365 | 743 | val global_state = Synchronized.var "Sledgehammer_MaSh.global_state" (Time.zeroTime, empty_state) | 
| 50311 | 744 | |
| 745 | in | |
| 746 | ||
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 747 | fun map_state ctxt f = | 
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 748 | (trace_msg ctxt (fn () => "Changing MaSh state"); | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 749 | synchronized_timed_change global_state mash_time_limit | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 750 | (load_state ctxt ##> f #> save_state ctxt)) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 751 | |> ignore | 
| 53095 | 752 | handle FILE_VERSION_TOO_NEW () => () | 
| 50311 | 753 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 754 | fun peek_state ctxt = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 755 | (trace_msg ctxt (fn () => "Peeking at MaSh state"); | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 756 | (case synchronized_timed_value global_state mash_time_limit of | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 757 | NONE => NONE | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 758 | | SOME state => if would_load_state state then NONE else SOME state)) | 
| 63697 
0afe49623cf9
avoid loading MaSh file first time around for higher responsiveness of Sledgehammer
 blanchet parents: 
63696diff
changeset | 759 | |
| 
0afe49623cf9
avoid loading MaSh file first time around for higher responsiveness of Sledgehammer
 blanchet parents: 
63696diff
changeset | 760 | fun get_state ctxt = | 
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 761 | (trace_msg ctxt (fn () => "Retrieving MaSh state"); | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 762 | synchronized_timed_change_result global_state mash_time_limit | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 763 | (perhaps (try (load_state ctxt)) #> `snd)) | 
| 50311 | 764 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 765 | fun clear_state ctxt = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 766 | (trace_msg ctxt (fn () => "Clearing MaSh state"); | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 767 | Synchronized.change global_state (fn _ => (remove_state_file (); (Time.zeroTime, empty_state)))) | 
| 50311 | 768 | |
| 769 | end | |
| 770 | ||
| 771 | ||
| 772 | (*** Isabelle helpers ***) | |
| 773 | ||
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 774 | fun crude_printed_term size t = | 
| 61321 | 775 | let | 
| 776 | fun term _ (res, 0) = (res, 0) | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 777 | | term (t $ u) (res, size) = | 
| 61321 | 778 | let | 
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 779 |           val (res, size) = term t (res ^ "(", size)
 | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 780 | val (res, size) = term u (res ^ " ", size) | 
| 61321 | 781 | in | 
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 782 | (res ^ ")", size) | 
| 61321 | 783 | end | 
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 784 | | term (Abs (s, _, t)) (res, size) = term t (res ^ "%" ^ s ^ ".", size - 1) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 785 | | term (Bound n) (res, size) = (res ^ "#" ^ string_of_int n, size - 1) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 786 | | term (Const (s, _)) (res, size) = (res ^ Long_Name.base_name s, size - 1) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 787 | | term (Free (s, _)) (res, size) = (res ^ s, size - 1) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 788 | | term (Var ((s, _), _)) (res, size) = (res ^ s, size - 1) | 
| 61321 | 789 | in | 
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 790 |     fst (term t ("", size))
 | 
| 61321 | 791 | end | 
| 48378 | 792 | |
| 61321 | 793 | fun nickname_of_thm th = | 
| 48394 
82fc8c956cdc
fixed various issues with MaSh's file handling + tune output + generate local facts again + handle nameless facts gracefully
 blanchet parents: 
48392diff
changeset | 794 | if Thm.has_name_hint th then | 
| 
82fc8c956cdc
fixed various issues with MaSh's file handling + tune output + generate local facts again + handle nameless facts gracefully
 blanchet parents: 
48392diff
changeset | 795 | let val hint = Thm.get_name_hint th in | 
| 50722 | 796 | (* There must be a better way to detect local facts. *) | 
| 59888 | 797 | (case Long_Name.dest_local hint of | 
| 60641 | 798 | SOME suf => | 
| 65458 | 799 | Long_Name.implode [Thm.theory_name th, suf, crude_printed_term 25 (Thm.prop_of th)] | 
| 55286 | 800 | | NONE => hint) | 
| 48394 
82fc8c956cdc
fixed various issues with MaSh's file handling + tune output + generate local facts again + handle nameless facts gracefully
 blanchet parents: 
48392diff
changeset | 801 | end | 
| 
82fc8c956cdc
fixed various issues with MaSh's file handling + tune output + generate local facts again + handle nameless facts gracefully
 blanchet parents: 
48392diff
changeset | 802 | else | 
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 803 | crude_printed_term 50 (Thm.prop_of th) | 
| 48378 | 804 | |
| 51134 | 805 | fun find_suggested_facts ctxt facts = | 
| 48330 | 806 | let | 
| 61321 | 807 | fun add (fact as (_, th)) = Symtab.default (nickname_of_thm th, fact) | 
| 51134 | 808 | val tab = fold add facts Symtab.empty | 
| 809 | fun lookup nick = | |
| 810 | Symtab.lookup tab nick | |
| 57006 | 811 | |> tap (fn NONE => trace_msg ctxt (fn () => "Cannot find " ^ quote nick) | _ => ()) | 
| 51134 | 812 | in map_filter lookup end | 
| 48311 | 813 | |
| 57404 
a68ae60c1504
got rid of hard-coded weights, now that we have TFIDF
 blanchet parents: 
57403diff
changeset | 814 | fun free_feature_of s = "f" ^ s | 
| 
a68ae60c1504
got rid of hard-coded weights, now that we have TFIDF
 blanchet parents: 
57403diff
changeset | 815 | fun thy_feature_of s = "y" ^ s | 
| 
a68ae60c1504
got rid of hard-coded weights, now that we have TFIDF
 blanchet parents: 
57403diff
changeset | 816 | fun type_feature_of s = "t" ^ s | 
| 
a68ae60c1504
got rid of hard-coded weights, now that we have TFIDF
 blanchet parents: 
57403diff
changeset | 817 | fun class_feature_of s = "s" ^ s | 
| 
a68ae60c1504
got rid of hard-coded weights, now that we have TFIDF
 blanchet parents: 
57403diff
changeset | 818 | val local_feature = "local" | 
| 48251 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 819 | |
| 60948 | 820 | fun crude_thm_ord ctxt = | 
| 821 | let | |
| 822 | val ancestor_lengths = | |
| 823 | fold (fn thy => Symtab.update (Context.theory_name thy, length (Context.ancestors_of thy))) | |
| 824 | (Theory.nodes_of (Proof_Context.theory_of ctxt)) Symtab.empty | |
| 825 | val ancestor_length = Symtab.lookup ancestor_lengths o Context.theory_id_name | |
| 61321 | 826 | |
| 60948 | 827 | fun crude_theory_ord p = | 
| 828 | if Context.eq_thy_id p then EQUAL | |
| 829 | else if Context.proper_subthy_id p then LESS | |
| 830 | else if Context.proper_subthy_id (swap p) then GREATER | |
| 831 | else | |
| 832 | (case apply2 ancestor_length p of | |
| 833 | (SOME m, SOME n) => | |
| 834 | (case int_ord (m, n) of | |
| 835 | EQUAL => string_ord (apply2 Context.theory_id_name p) | |
| 836 | | ord => ord) | |
| 837 | | _ => string_ord (apply2 Context.theory_id_name p)) | |
| 838 | in | |
| 839 | fn p => | |
| 65458 | 840 | (case crude_theory_ord (apply2 Thm.theory_id p) of | 
| 60948 | 841 | EQUAL => | 
| 842 | (* The hack below is necessary because of odd dependencies that are not reflected in the theory | |
| 843 | comparison. *) | |
| 61321 | 844 | let val q = apply2 nickname_of_thm p in | 
| 60948 | 845 | (* Hack to put "xxx_def" before "xxxI" and "xxxE" *) | 
| 846 | (case bool_ord (apply2 (String.isSuffix "_def") (swap q)) of | |
| 847 | EQUAL => string_ord q | |
| 848 | | ord => ord) | |
| 849 | end | |
| 850 | | ord => ord) | |
| 851 | end; | |
| 48324 | 852 | |
| 65458 | 853 | val thm_less_eq = Context.subthy_id o apply2 Thm.theory_id | 
| 51136 | 854 | fun thm_less p = thm_less_eq p andalso not (thm_less_eq (swap p)) | 
| 855 | ||
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 856 | val freezeT = Type.legacy_freeze_type | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 857 | |
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 858 | fun freeze (t $ u) = freeze t $ freeze u | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 859 | | freeze (Abs (s, T, t)) = Abs (s, freezeT T, freeze t) | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 860 | | freeze (Var ((s, _), T)) = Free (s, freezeT T) | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 861 | | freeze (Const (s, T)) = Const (s, freezeT T) | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 862 | | freeze (Free (s, T)) = Free (s, freezeT T) | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 863 | | freeze t = t | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 864 | |
| 59621 
291934bac95e
Thm.cterm_of and Thm.ctyp_of operate on local context;
 wenzelm parents: 
59582diff
changeset | 865 | fun goal_of_thm thy = Thm.prop_of #> freeze #> Thm.global_cterm_of thy #> Goal.init | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 866 | |
| 54141 
f57f8e7a879f
generate a comment storing the goal nickname in "learn_prover"
 blanchet parents: 
54140diff
changeset | 867 | fun run_prover_for_mash ctxt params prover goal_name facts goal = | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 868 | let | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 869 | val problem = | 
| 54141 
f57f8e7a879f
generate a comment storing the goal nickname in "learn_prover"
 blanchet parents: 
54140diff
changeset | 870 |       {comment = "Goal: " ^ goal_name, state = Proof.init ctxt, goal = goal, subgoal = 1,
 | 
| 62735 | 871 |        subgoal_count = 1, factss = [("", facts)], found_proof = I}
 | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 872 | in | 
| 57735 
056a55b44ec7
eliminated Sledgehammer's "min" subcommand (and lots of complications in the code)
 blanchet parents: 
57662diff
changeset | 873 | get_minimizing_prover ctxt MaSh (K ()) prover params problem | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 874 | end | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 875 | |
| 69593 | 876 | val bad_types = [\<^type_name>\<open>prop\<close>, \<^type_name>\<open>bool\<close>, \<^type_name>\<open>fun\<close>] | 
| 48326 | 877 | |
| 69593 | 878 | val crude_str_of_sort = space_implode "," o map Long_Name.base_name o subtract (op =) \<^sort>\<open>type\<close> | 
| 53086 
15fe0ca088b3
MaSh tweaking: shorter names + killed (broken) SNoW
 blanchet parents: 
53085diff
changeset | 879 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 880 | fun crude_str_of_typ (Type (s, [])) = Long_Name.base_name s | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 881 | | crude_str_of_typ (Type (s, Ts)) = Long_Name.base_name s ^ implode (map crude_str_of_typ Ts) | 
| 53083 
019ecbb18e3f
generate patterns for variables as well in MaSh (cf. HOL(y)Hammer)
 blanchet parents: 
53082diff
changeset | 882 | | crude_str_of_typ (TFree (_, S)) = crude_str_of_sort S | 
| 
019ecbb18e3f
generate patterns for variables as well in MaSh (cf. HOL(y)Hammer)
 blanchet parents: 
53082diff
changeset | 883 | | crude_str_of_typ (TVar (_, S)) = crude_str_of_sort S | 
| 
019ecbb18e3f
generate patterns for variables as well in MaSh (cf. HOL(y)Hammer)
 blanchet parents: 
53082diff
changeset | 884 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 885 | fun maybe_singleton_str "" = [] | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 886 | | maybe_singleton_str s = [s] | 
| 53128 | 887 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 888 | val max_pat_breadth = 5 (* FUDGE *) | 
| 50585 | 889 | |
| 57406 | 890 | fun term_features_of ctxt thy_name term_max_depth type_max_depth ts = | 
| 48251 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 891 | let | 
| 50392 
190053ee24ed
expand type classes into their ancestors for MaSh
 blanchet parents: 
50391diff
changeset | 892 | val thy = Proof_Context.theory_of ctxt | 
| 53082 | 893 | |
| 50393 | 894 | val fixes = map snd (Variable.dest_fixes ctxt) | 
| 50392 
190053ee24ed
expand type classes into their ancestors for MaSh
 blanchet parents: 
50391diff
changeset | 895 | val classes = Sign.classes_of thy | 
| 53082 | 896 | |
| 69593 | 897 | fun add_classes \<^sort>\<open>type\<close> = I | 
| 50392 
190053ee24ed
expand type classes into their ancestors for MaSh
 blanchet parents: 
50391diff
changeset | 898 | | add_classes S = | 
| 
190053ee24ed
expand type classes into their ancestors for MaSh
 blanchet parents: 
50391diff
changeset | 899 | fold (`(Sorts.super_classes classes) | 
| 57006 | 900 | #> swap #> op :: | 
| 69593 | 901 | #> subtract (op =) \<^sort>\<open>type\<close> | 
| 57006 | 902 | #> map class_feature_of | 
| 57404 
a68ae60c1504
got rid of hard-coded weights, now that we have TFIDF
 blanchet parents: 
57403diff
changeset | 903 | #> union (op =)) S | 
| 53082 | 904 | |
| 905 | fun pattify_type 0 _ = [] | |
| 69706 
6d6235b828fc
get rid of visibility in MaSh -- it slows it down more than it helps
 blanchet parents: 
69593diff
changeset | 906 | | pattify_type _ (Type (s, [])) = if member (op =) bad_types s then [] else [s] | 
| 53082 | 907 | | pattify_type depth (Type (s, U :: Ts)) = | 
| 908 | let | |
| 909 | val T = Type (s, Ts) | |
| 57055 
df3a26987a8d
reverted '|' features in MaSh -- these sounded like a good idea but never really worked
 blanchet parents: 
57052diff
changeset | 910 | val ps = take max_pat_breadth (pattify_type depth T) | 
| 
df3a26987a8d
reverted '|' features in MaSh -- these sounded like a good idea but never really worked
 blanchet parents: 
57052diff
changeset | 911 |           val qs = take max_pat_breadth ("" :: pattify_type (depth - 1) U)
 | 
| 57006 | 912 | in | 
| 913 |           map_product (fn p => fn "" => p | q => p ^ "(" ^ q ^ ")") ps qs
 | |
| 914 | end | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 915 | | pattify_type _ (TFree (_, S)) = maybe_singleton_str (crude_str_of_sort S) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 916 | | pattify_type _ (TVar (_, S)) = maybe_singleton_str (crude_str_of_sort S) | 
| 57006 | 917 | |
| 53082 | 918 | fun add_type_pat depth T = | 
| 57404 
a68ae60c1504
got rid of hard-coded weights, now that we have TFIDF
 blanchet parents: 
57403diff
changeset | 919 | union (op =) (map type_feature_of (pattify_type depth T)) | 
| 57006 | 920 | |
| 53082 | 921 | fun add_type_pats 0 _ = I | 
| 57404 
a68ae60c1504
got rid of hard-coded weights, now that we have TFIDF
 blanchet parents: 
57403diff
changeset | 922 | | add_type_pats depth t = add_type_pat depth t #> add_type_pats (depth - 1) t | 
| 57006 | 923 | |
| 53083 
019ecbb18e3f
generate patterns for variables as well in MaSh (cf. HOL(y)Hammer)
 blanchet parents: 
53082diff
changeset | 924 | fun add_type T = | 
| 
019ecbb18e3f
generate patterns for variables as well in MaSh (cf. HOL(y)Hammer)
 blanchet parents: 
53082diff
changeset | 925 | add_type_pats type_max_depth T | 
| 53156 | 926 | #> fold_atyps_sorts (add_classes o snd) T | 
| 57006 | 927 | |
| 53084 | 928 | fun add_subtypes (T as Type (_, Ts)) = add_type T #> fold add_subtypes Ts | 
| 929 | | add_subtypes T = add_type T | |
| 53082 | 930 | |
| 57055 
df3a26987a8d
reverted '|' features in MaSh -- these sounded like a good idea but never really worked
 blanchet parents: 
57052diff
changeset | 931 | fun pattify_term _ 0 _ = [] | 
| 69706 
6d6235b828fc
get rid of visibility in MaSh -- it slows it down more than it helps
 blanchet parents: 
69593diff
changeset | 932 | | pattify_term _ _ (Const (s, _)) = | 
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 933 | if is_widely_irrelevant_const s then [] else [s] | 
| 54089 
b13f6731f873
use same relevance filter for ATP and SMT solvers -- attempting to filter out certain ground instances of polymorphic symbols like + and 0 has unexpected side-effects that lead to incompletenesses (relevant facts not being selected)
 blanchet parents: 
54085diff
changeset | 934 | | pattify_term _ _ (Free (s, T)) = | 
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 935 | maybe_singleton_str (crude_str_of_typ T) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 936 | |> (if member (op =) fixes s then cons (free_feature_of (Long_Name.append thy_name s)) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 937 | else I) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 938 | | pattify_term _ _ (Var (_, T)) = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 939 | maybe_singleton_str (crude_str_of_typ T) | 
| 54089 
b13f6731f873
use same relevance filter for ATP and SMT solvers -- attempting to filter out certain ground instances of polymorphic symbols like + and 0 has unexpected side-effects that lead to incompletenesses (relevant facts not being selected)
 blanchet parents: 
54085diff
changeset | 940 | | pattify_term Ts _ (Bound j) = | 
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 941 | maybe_singleton_str (crude_str_of_typ (nth Ts j)) | 
| 54089 
b13f6731f873
use same relevance filter for ATP and SMT solvers -- attempting to filter out certain ground instances of polymorphic symbols like + and 0 has unexpected side-effects that lead to incompletenesses (relevant facts not being selected)
 blanchet parents: 
54085diff
changeset | 942 | | pattify_term Ts depth (t $ u) = | 
| 50339 
d8dae91f3107
MaSh improvements: deeper patterns + more respect for chained facts
 blanchet parents: 
50338diff
changeset | 943 | let | 
| 57055 
df3a26987a8d
reverted '|' features in MaSh -- these sounded like a good idea but never really worked
 blanchet parents: 
57052diff
changeset | 944 | val ps = take max_pat_breadth (pattify_term Ts depth t) | 
| 57404 
a68ae60c1504
got rid of hard-coded weights, now that we have TFIDF
 blanchet parents: 
57403diff
changeset | 945 |           val qs = take max_pat_breadth ("" :: pattify_term Ts (depth - 1) u)
 | 
| 53130 | 946 | in | 
| 57404 
a68ae60c1504
got rid of hard-coded weights, now that we have TFIDF
 blanchet parents: 
57403diff
changeset | 947 |           map_product (fn p => fn "" => p | q => p ^ "(" ^ q ^ ")") ps qs
 | 
| 53130 | 948 | end | 
| 54089 
b13f6731f873
use same relevance filter for ATP and SMT solvers -- attempting to filter out certain ground instances of polymorphic symbols like + and 0 has unexpected side-effects that lead to incompletenesses (relevant facts not being selected)
 blanchet parents: 
54085diff
changeset | 949 | | pattify_term _ _ _ = [] | 
| 57006 | 950 | |
| 57404 
a68ae60c1504
got rid of hard-coded weights, now that we have TFIDF
 blanchet parents: 
57403diff
changeset | 951 | fun add_term_pat Ts = union (op =) oo pattify_term Ts | 
| 57006 | 952 | |
| 53130 | 953 | fun add_term_pats _ 0 _ = I | 
| 57055 
df3a26987a8d
reverted '|' features in MaSh -- these sounded like a good idea but never really worked
 blanchet parents: 
57052diff
changeset | 954 | | add_term_pats Ts depth t = add_term_pat Ts depth t #> add_term_pats Ts (depth - 1) t | 
| 57006 | 955 | |
| 53130 | 956 | fun add_term Ts = add_term_pats Ts term_max_depth | 
| 57006 | 957 | |
| 53085 | 958 | fun add_subterms Ts t = | 
| 55286 | 959 | (case strip_comb t of | 
| 54089 
b13f6731f873
use same relevance filter for ATP and SMT solvers -- attempting to filter out certain ground instances of polymorphic symbols like + and 0 has unexpected side-effects that lead to incompletenesses (relevant facts not being selected)
 blanchet parents: 
54085diff
changeset | 960 | (Const (s, T), args) => | 
| 
b13f6731f873
use same relevance filter for ATP and SMT solvers -- attempting to filter out certain ground instances of polymorphic symbols like + and 0 has unexpected side-effects that lead to incompletenesses (relevant facts not being selected)
 blanchet parents: 
54085diff
changeset | 961 | (not (is_widely_irrelevant_const s) ? add_term Ts t) | 
| 57404 
a68ae60c1504
got rid of hard-coded weights, now that we have TFIDF
 blanchet parents: 
57403diff
changeset | 962 | #> add_subtypes T #> fold (add_subterms Ts) args | 
| 50857 
80768e28c9ee
better handlig of built-ins -- at the top-level, not in subterms
 blanchet parents: 
50841diff
changeset | 963 | | (head, args) => | 
| 48251 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 964 | (case head of | 
| 53130 | 965 | Free (_, T) => add_term Ts t #> add_subtypes T | 
| 53084 | 966 | | Var (_, T) => add_subtypes T | 
| 53085 | 967 | | Abs (_, T, body) => add_subtypes T #> add_subterms (T :: Ts) body | 
| 48251 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 968 | | _ => I) | 
| 55286 | 969 | #> fold (add_subterms Ts) args) | 
| 57006 | 970 | in | 
| 971 | fold (add_subterms []) ts [] | |
| 972 | end | |
| 48251 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 973 | |
| 53085 | 974 | val term_max_depth = 2 | 
| 53155 | 975 | val type_max_depth = 1 | 
| 48251 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 976 | |
| 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 977 | (* TODO: Generate type classes for types? *) | 
| 60640 | 978 | fun features_of ctxt thy_name (scope, _) ts = | 
| 979 | thy_feature_of thy_name :: | |
| 980 | term_features_of ctxt thy_name term_max_depth type_max_depth ts | |
| 981 | |> scope <> Global ? cons local_feature | |
| 48251 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 982 | |
| 57006 | 983 | (* Too many dependencies is a sign that a decision procedure is at work. There is not much to learn | 
| 984 | from such proofs. *) | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 985 | val max_dependencies = 20 (* FUDGE *) | 
| 50484 
8ec31bdb9d36
adopt the neutral "prover" terminology for MaSh rather than the ambiguous/wrong ATP terminology (which sometimes excludes SMT solvers)
 blanchet parents: 
50450diff
changeset | 986 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 987 | val prover_default_max_facts = 25 (* FUDGE *) | 
| 48251 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 988 | |
| 48438 
3e45c98fe127
distinguish between recursive and nonrecursive definitions + clean up typedef dependencies in MaSh
 blanchet parents: 
48436diff
changeset | 989 | (* "type_definition_xxx" facts are characterized by their use of "CollectI". *) | 
| 61321 | 990 | val typedef_dep = nickname_of_thm @{thm CollectI}
 | 
| 57006 | 991 | (* Mysterious parts of the class machinery create lots of proofs that refer exclusively to | 
| 992 | "someI_ex" (and to some internal constructions). *) | |
| 61321 | 993 | val class_some_dep = nickname_of_thm @{thm someI_ex}
 | 
| 48438 
3e45c98fe127
distinguish between recursive and nonrecursive definitions + clean up typedef dependencies in MaSh
 blanchet parents: 
48436diff
changeset | 994 | |
| 50828 | 995 | val fundef_ths = | 
| 57006 | 996 |   @{thms fundef_ex1_existence fundef_ex1_uniqueness fundef_ex1_iff fundef_default_value}
 | 
| 61321 | 997 | |> map nickname_of_thm | 
| 50828 | 998 | |
| 48438 
3e45c98fe127
distinguish between recursive and nonrecursive definitions + clean up typedef dependencies in MaSh
 blanchet parents: 
48436diff
changeset | 999 | (* "Rep_xxx_inject", "Abs_xxx_inverse", etc., are derived using these facts. *) | 
| 
3e45c98fe127
distinguish between recursive and nonrecursive definitions + clean up typedef dependencies in MaSh
 blanchet parents: 
48436diff
changeset | 1000 | val typedef_ths = | 
| 57006 | 1001 |   @{thms type_definition.Abs_inverse type_definition.Rep_inverse type_definition.Rep
 | 
| 1002 | type_definition.Rep_inject type_definition.Abs_inject type_definition.Rep_cases | |
| 1003 | type_definition.Abs_cases type_definition.Rep_induct type_definition.Abs_induct | |
| 1004 | type_definition.Rep_range type_definition.Abs_image} | |
| 61321 | 1005 | |> map nickname_of_thm | 
| 48438 
3e45c98fe127
distinguish between recursive and nonrecursive definitions + clean up typedef dependencies in MaSh
 blanchet parents: 
48436diff
changeset | 1006 | |
| 61321 | 1007 | fun is_size_def [dep] th = | 
| 55642 
63beb38e9258
adapted to renaming of datatype 'cases' and 'recs' to 'case' and 'rec'
 blanchet parents: 
55286diff
changeset | 1008 | (case first_field ".rec" dep of | 
| 57006 | 1009 | SOME (pref, _) => | 
| 61321 | 1010 | (case first_field ".size" (nickname_of_thm th) of | 
| 57006 | 1011 | SOME (pref', _) => pref = pref' | 
| 1012 | | NONE => false) | |
| 1013 | | NONE => false) | |
| 61321 | 1014 | | is_size_def _ _ = false | 
| 48441 | 1015 | |
| 51177 | 1016 | fun trim_dependencies deps = | 
| 50755 | 1017 | if length deps > max_dependencies then NONE else SOME deps | 
| 48251 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 1018 | |
| 61321 | 1019 | fun isar_dependencies_of name_tabs th = | 
| 57306 
ff10067b2248
optimized traversal of proof terms by skipping bad apples (e.g. full_exhaustive_int'.pinduct)
 blanchet parents: 
57305diff
changeset | 1020 | thms_in_proof max_dependencies (SOME name_tabs) th | 
| 
ff10067b2248
optimized traversal of proof terms by skipping bad apples (e.g. full_exhaustive_int'.pinduct)
 blanchet parents: 
57305diff
changeset | 1021 | |> Option.map (fn deps => | 
| 57006 | 1022 | if deps = [typedef_dep] orelse deps = [class_some_dep] orelse | 
| 57757 | 1023 | exists (member (op =) fundef_ths) deps orelse exists (member (op =) typedef_ths) deps orelse | 
| 61321 | 1024 | is_size_def deps th then | 
| 50755 | 1025 | [] | 
| 1026 | else | |
| 57306 
ff10067b2248
optimized traversal of proof terms by skipping bad apples (e.g. full_exhaustive_int'.pinduct)
 blanchet parents: 
57305diff
changeset | 1027 | deps) | 
| 48404 | 1028 | |
| 57006 | 1029 | fun prover_dependencies_of ctxt (params as {verbose, max_facts, ...}) prover auto_level facts
 | 
| 1030 | name_tabs th = | |
| 61321 | 1031 | (case isar_dependencies_of name_tabs th of | 
| 57306 
ff10067b2248
optimized traversal of proof terms by skipping bad apples (e.g. full_exhaustive_int'.pinduct)
 blanchet parents: 
57305diff
changeset | 1032 | SOME [] => (false, []) | 
| 
ff10067b2248
optimized traversal of proof terms by skipping bad apples (e.g. full_exhaustive_int'.pinduct)
 blanchet parents: 
57305diff
changeset | 1033 | | isar_deps0 => | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1034 | let | 
| 57306 
ff10067b2248
optimized traversal of proof terms by skipping bad apples (e.g. full_exhaustive_int'.pinduct)
 blanchet parents: 
57305diff
changeset | 1035 | val isar_deps = these isar_deps0 | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1036 | val thy = Proof_Context.theory_of ctxt | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1037 | val goal = goal_of_thm thy th | 
| 61321 | 1038 | val name = nickname_of_thm th | 
| 52196 
2281f33e8da6
redid rac7830871177 to avoid duplicate fixed variable (e.g. lemma "P (a::nat)" proof - have "!!a::int. Q a" sledgehammer [e])
 blanchet parents: 
52125diff
changeset | 1039 | val (_, hyp_ts, concl_t) = ATP_Util.strip_subgoal goal 1 ctxt | 
| 51136 | 1040 | val facts = facts |> filter (fn (_, th') => thm_less (th', th)) | 
| 57006 | 1041 | |
| 61321 | 1042 | fun nickify ((_, stature), th) = ((nickname_of_thm th, stature), th) | 
| 57006 | 1043 | |
| 61321 | 1044 | fun is_dep dep (_, th) = (nickname_of_thm th = dep) | 
| 57006 | 1045 | |
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1046 | fun add_isar_dep facts dep accum = | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1047 | if exists (is_dep dep) accum then | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1048 | accum | 
| 55286 | 1049 | else | 
| 1050 | (case find_first (is_dep dep) facts of | |
| 1051 |             SOME ((_, status), th) => accum @ [(("", status), th)]
 | |
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1052 | | NONE => accum (* should not happen *)) | 
| 57006 | 1053 | |
| 54123 
271a8377656f
remove overloading of "max_facts" -- it already controls the number of facts passed to ATPs for 'learn_prover'
 blanchet parents: 
54115diff
changeset | 1054 | val mepo_facts = | 
| 50484 
8ec31bdb9d36
adopt the neutral "prover" terminology for MaSh rather than the ambiguous/wrong ATP terminology (which sometimes excludes SMT solvers)
 blanchet parents: 
50450diff
changeset | 1055 | facts | 
| 54095 | 1056 | |> mepo_suggested_facts ctxt params (max_facts |> the_default prover_default_max_facts) NONE | 
| 1057 | hyp_ts concl_t | |
| 54123 
271a8377656f
remove overloading of "max_facts" -- it already controls the number of facts passed to ATPs for 'learn_prover'
 blanchet parents: 
54115diff
changeset | 1058 | val facts = | 
| 
271a8377656f
remove overloading of "max_facts" -- it already controls the number of facts passed to ATPs for 'learn_prover'
 blanchet parents: 
54115diff
changeset | 1059 | mepo_facts | 
| 50754 
74a6adcb96ac
also generate queries for goals with too many Isar dependencies
 blanchet parents: 
50751diff
changeset | 1060 | |> fold (add_isar_dep facts) isar_deps | 
| 50624 
4d0997abce79
improved thm order hack, in case the default names are overridden
 blanchet parents: 
50623diff
changeset | 1061 | |> map nickify | 
| 54123 
271a8377656f
remove overloading of "max_facts" -- it already controls the number of facts passed to ATPs for 'learn_prover'
 blanchet parents: 
54115diff
changeset | 1062 | val num_isar_deps = length isar_deps | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1063 | in | 
| 48404 | 1064 | if verbose andalso auto_level = 0 then | 
| 58843 | 1065 |         writeln ("MaSh: " ^ quote prover ^ " on " ^ quote name ^ " with " ^
 | 
| 57017 | 1066 | string_of_int num_isar_deps ^ " + " ^ string_of_int (length facts - num_isar_deps) ^ | 
| 63692 | 1067 | " facts") | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1068 | else | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1069 | (); | 
| 55286 | 1070 | (case run_prover_for_mash ctxt params prover name facts goal of | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1071 |         {outcome = NONE, used_facts, ...} =>
 | 
| 48404 | 1072 | (if verbose andalso auto_level = 0 then | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1073 | let val num_facts = length used_facts in | 
| 58843 | 1074 |              writeln ("Found proof with " ^ string_of_int num_facts ^ " fact" ^
 | 
| 63692 | 1075 | plural_s num_facts) | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1076 | end | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1077 | else | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1078 | (); | 
| 50754 
74a6adcb96ac
also generate queries for goals with too many Isar dependencies
 blanchet parents: 
50751diff
changeset | 1079 | (true, map fst used_facts)) | 
| 55286 | 1080 | | _ => (false, isar_deps)) | 
| 1081 | end) | |
| 48251 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 1082 | |
| 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 1083 | |
| 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 1084 | (*** High-level communication with MaSh ***) | 
| 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 1085 | |
| 51182 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1086 | (* In the following functions, chunks are risers w.r.t. "thm_less_eq". *) | 
| 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1087 | |
| 61321 | 1088 | fun chunks_and_parents_for chunks th = | 
| 51181 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1089 | let | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1090 | fun insert_parent new parents = | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1091 | let val parents = parents |> filter_out (fn p => thm_less_eq (p, new)) in | 
| 57096 
e4074b91b2a6
always remove duplicates in meshing + use weights for Naive Bayes
 blanchet parents: 
57095diff
changeset | 1092 | parents |> forall (fn p => not (thm_less_eq (new, p))) parents ? cons new | 
| 51181 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1093 | end | 
| 57006 | 1094 | |
| 51181 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1095 | fun rechunk seen (rest as th' :: ths) = | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1096 | if thm_less_eq (th', th) then (rev seen, rest) | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1097 | else rechunk (th' :: seen) ths | 
| 57006 | 1098 | |
| 51181 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1099 | fun do_chunk [] accum = accum | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1100 | | do_chunk (chunk as hd_chunk :: _) (chunks, parents) = | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1101 | if thm_less_eq (hd_chunk, th) then | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1102 | (chunk :: chunks, insert_parent hd_chunk parents) | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1103 | else if thm_less_eq (List.last chunk, th) then | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1104 | let val (front, back as hd_back :: _) = rechunk [] chunk in | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1105 | (front :: back :: chunks, insert_parent hd_back parents) | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1106 | end | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1107 | else | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1108 | (chunk :: chunks, parents) | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1109 | in | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1110 | fold_rev do_chunk chunks ([], []) | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1111 | |>> cons [] | 
| 61321 | 1112 | ||> map nickname_of_thm | 
| 51181 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1113 | end | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1114 | |
| 61321 | 1115 | fun attach_parents_to_facts _ [] = [] | 
| 1116 | | attach_parents_to_facts old_facts (facts as (_, th) :: _) = | |
| 51182 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1117 | let | 
| 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1118 | fun do_facts _ [] = [] | 
| 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1119 | | do_facts (_, parents) [fact] = [(parents, fact)] | 
| 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1120 | | do_facts (chunks, parents) | 
| 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1121 | ((fact as (_, th)) :: (facts as (_, th') :: _)) = | 
| 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1122 | let | 
| 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1123 | val chunks = app_hd (cons th) chunks | 
| 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1124 | val chunks_and_parents' = | 
| 60638 | 1125 | if thm_less_eq (th, th') andalso | 
| 65458 | 1126 | Thm.theory_name th = Thm.theory_name th' | 
| 61321 | 1127 | then (chunks, [nickname_of_thm th]) | 
| 1128 | else chunks_and_parents_for chunks th' | |
| 57006 | 1129 | in | 
| 1130 | (parents, fact) :: do_facts chunks_and_parents' facts | |
| 1131 | end | |
| 51182 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1132 | in | 
| 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1133 | old_facts @ facts | 
| 61321 | 1134 | |> do_facts (chunks_and_parents_for [[]] th) | 
| 51182 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1135 | |> drop (length old_facts) | 
| 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1136 | end | 
| 51177 | 1137 | |
| 61321 | 1138 | fun is_fact_in_graph access_G = can (Graph.get_node access_G) o nickname_of_thm | 
| 53095 | 1139 | |
| 53197 | 1140 | val chained_feature_factor = 0.5 (* FUDGE *) | 
| 57405 
1f49da059947
reintroduced 'extra features' + only print message in verbose mode
 blanchet parents: 
57404diff
changeset | 1141 | val extra_feature_factor = 0.1 (* FUDGE *) | 
| 
1f49da059947
reintroduced 'extra features' + only print message in verbose mode
 blanchet parents: 
57404diff
changeset | 1142 | val num_extra_feature_facts = 10 (* FUDGE *) | 
| 53095 | 1143 | |
| 57660 | 1144 | val max_proximity_facts = 100 (* FUDGE *) | 
| 53095 | 1145 | |
| 54060 | 1146 | fun find_mash_suggestions ctxt max_facts suggs facts chained raw_unknown = | 
| 1147 | let | |
| 1148 | val inter_fact = inter (eq_snd Thm.eq_thm_prop) | |
| 1149 | val raw_mash = find_suggested_facts ctxt facts suggs | |
| 1150 | val proximate = take max_proximity_facts facts | |
| 1151 | val unknown_chained = inter_fact raw_unknown chained | |
| 1152 | val unknown_proximate = inter_fact raw_unknown proximate | |
| 1153 | val mess = | |
| 1154 | [(0.9 (* FUDGE *), (map (rpair 1.0) unknown_chained, [])), | |
| 1155 | (0.4 (* FUDGE *), (weight_facts_smoothly unknown_proximate, [])), | |
| 1156 | (0.1 (* FUDGE *), (weight_facts_steeply raw_mash, raw_unknown))] | |
| 57007 | 1157 | val unknown = raw_unknown | 
| 1158 | |> fold (subtract (eq_snd Thm.eq_thm_prop)) [unknown_chained, unknown_proximate] | |
| 57006 | 1159 | in | 
| 61322 | 1160 | (mesh_facts (fact_distinct (op aconv)) (eq_snd (gen_eq_thm ctxt)) max_facts mess, unknown) | 
| 57006 | 1161 | end | 
| 53095 | 1162 | |
| 60649 
e007aa6a8aa2
more explicit use of context and elimination of Thm.theory_of_thm, although unclear (and untested?) situations remain;
 wenzelm parents: 
60641diff
changeset | 1163 | fun mash_suggested_facts ctxt thy_name ({debug, ...} : params) max_suggs hyp_ts concl_t facts =
 | 
| 53095 | 1164 | let | 
| 57532 | 1165 | val algorithm = the_mash_algorithm () | 
| 57052 
ea5912e3b008
until naive Bayes supports weights, don't incorporate 'extra' low-weight features
 blanchet parents: 
57039diff
changeset | 1166 | |
| 57460 | 1167 | val facts = facts | 
| 60641 | 1168 | |> rev_sort_list_prefix (crude_thm_ord ctxt o apply2 snd) | 
| 57460 | 1169 | (Int.max (num_extra_feature_facts, max_proximity_facts)) | 
| 1170 | ||
| 1171 | val chained = filter (fn ((_, (scope, _)), _) => scope = Chained) facts | |
| 54085 | 1172 | |
| 65458 | 1173 | fun fact_has_right_theory (_, th) = thy_name = Thm.theory_name th | 
| 56995 | 1174 | |
| 53141 
d27e99a6a679
take chained and proximate facts into consideration when computing MaSh features
 blanchet parents: 
53140diff
changeset | 1175 | fun chained_or_extra_features_of factor (((_, stature), th), weight) = | 
| 59582 | 1176 | [Thm.prop_of th] | 
| 65458 | 1177 | |> features_of ctxt (Thm.theory_name th) stature | 
| 57404 
a68ae60c1504
got rid of hard-coded weights, now that we have TFIDF
 blanchet parents: 
57403diff
changeset | 1178 | |> map (rpair (weight * factor)) | 
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1179 | in | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1180 | (case get_state ctxt of | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1181 | NONE => ([], []) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1182 |     | SOME {access_G, xtabs = ((num_facts, fact_tab), (num_feats, feat_tab)), ffds, freqs, ...} =>
 | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1183 | let | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1184 | val goal_feats0 = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1185 | features_of ctxt thy_name (Local, General) (concl_t :: hyp_ts) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1186 | val chained_feats = chained | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1187 | |> map (rpair 1.0) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1188 | |> map (chained_or_extra_features_of chained_feature_factor) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1189 | |> rpair [] |-> fold (union (eq_fst (op =))) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1190 | val extra_feats = facts | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1191 | |> take (Int.max (0, num_extra_feature_facts - length chained)) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1192 | |> filter fact_has_right_theory | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1193 | |> weight_facts_steeply | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1194 | |> map (chained_or_extra_features_of extra_feature_factor) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1195 | |> rpair [] |-> fold (union (eq_fst (op =))) | 
| 57017 | 1196 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1197 | val goal_feats = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1198 | fold (union (eq_fst (op =))) [chained_feats, extra_feats] (map (rpair 1.0) goal_feats0) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1199 | |> debug ? sort (Real.compare o swap o apply2 snd) | 
| 57460 | 1200 | |
| 69706 
6d6235b828fc
get rid of visibility in MaSh -- it slows it down more than it helps
 blanchet parents: 
69593diff
changeset | 1201 | val fact_idxs = map_filter (Symtab.lookup fact_tab o nickname_of_thm o snd) facts | 
| 57460 | 1202 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1203 | val suggs = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1204 | if algorithm = MaSh_NB_Ext orelse algorithm = MaSh_kNN_Ext then | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1205 | let | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1206 | val learns = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1207 | Graph.schedule (fn _ => fn (fact, (_, feats, deps)) => (fact, feats, deps)) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1208 | access_G | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1209 | in | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1210 | MaSh.query_external ctxt algorithm max_suggs learns goal_feats | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1211 | end | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1212 | else | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1213 | let | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1214 | val int_goal_feats = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1215 | map_filter (fn (s, w) => Option.map (rpair w) (Symtab.lookup feat_tab s)) goal_feats | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1216 | in | 
| 69706 
6d6235b828fc
get rid of visibility in MaSh -- it slows it down more than it helps
 blanchet parents: 
69593diff
changeset | 1217 | MaSh.query_internal ctxt algorithm num_facts num_feats ffds freqs fact_idxs max_suggs | 
| 
6d6235b828fc
get rid of visibility in MaSh -- it slows it down more than it helps
 blanchet parents: 
69593diff
changeset | 1218 | goal_feats int_goal_feats | 
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1219 | end | 
| 57017 | 1220 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1221 | val unknown = filter_out (is_fact_in_graph access_G o snd) facts | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1222 | in | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1223 | find_mash_suggestions ctxt max_suggs suggs facts chained unknown | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1224 | |> apply2 (map fact_of_raw_fact) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1225 | end) | 
| 53095 | 1226 | end | 
| 1227 | ||
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1228 | fun mash_unlearn ctxt = (clear_state ctxt; writeln "Reset MaSh") | 
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1229 | |
| 57661 | 1230 | fun learn_wrt_access_graph ctxt (name, parents, feats, deps) | 
| 1231 | (accum as (access_G, (fact_xtab, feat_xtab))) = | |
| 53095 | 1232 | let | 
| 57371 | 1233 | fun maybe_learn_from from (accum as (parents, access_G)) = | 
| 57013 
ed95456499e6
better way to take invisible facts into account than 'island' business
 blanchet parents: 
57012diff
changeset | 1234 | try_graph ctxt "updating graph" accum (fn () => | 
| 57371 | 1235 | (from :: parents, Graph.add_edge_acyclic (from, name) access_G)) | 
| 1236 | ||
| 1237 | val access_G = access_G |> Graph.default_node (name, (Isar_Proof, feats, deps)) | |
| 1238 | val (parents, access_G) = ([], access_G) |> fold maybe_learn_from parents | |
| 1239 | val (deps, _) = ([], access_G) |> fold maybe_learn_from deps | |
| 1240 | ||
| 57661 | 1241 | val fact_xtab = add_to_xtab name fact_xtab | 
| 57371 | 1242 | val feat_xtab = fold maybe_add_to_xtab feats feat_xtab | 
| 57006 | 1243 | in | 
| 57661 | 1244 | (SOME (name, parents, feats, deps), (access_G, (fact_xtab, feat_xtab))) | 
| 57006 | 1245 | end | 
| 57661 | 1246 | handle Symtab.DUP _ => (NONE, accum) (* facts sometimes have the same name, confusingly *) | 
| 53095 | 1247 | |
| 57381 | 1248 | fun relearn_wrt_access_graph ctxt (name, deps) access_G = | 
| 53095 | 1249 | let | 
| 57371 | 1250 | fun maybe_relearn_from from (accum as (parents, access_G)) = | 
| 53095 | 1251 | try_graph ctxt "updating graph" accum (fn () => | 
| 57371 | 1252 | (from :: parents, Graph.add_edge_acyclic (from, name) access_G)) | 
| 1253 | val access_G = | |
| 1254 | access_G |> Graph.map_node name (fn (_, feats, _) => (Automatic_Proof, feats, deps)) | |
| 1255 | val (deps, _) = ([], access_G) |> fold maybe_relearn_from deps | |
| 57006 | 1256 | in | 
| 57381 | 1257 | ((name, deps), access_G) | 
| 57006 | 1258 | end | 
| 53095 | 1259 | |
| 1260 | fun flop_wrt_access_graph name = | |
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1261 | Graph.map_node name (fn (_, feats, deps) => (Isar_Proof_wegen_Prover_Flop, feats, deps)) | 
| 53095 | 1262 | |
| 57273 
01b68f625550
automatically learn MaSh facts also in 'blocking' mode
 blanchet parents: 
57150diff
changeset | 1263 | val learn_timeout_slack = 20.0 | 
| 53095 | 1264 | |
| 1265 | fun launch_thread timeout task = | |
| 1266 | let | |
| 1267 | val hard_timeout = time_mult learn_timeout_slack timeout | |
| 1268 | val birth_time = Time.now () | |
| 62826 | 1269 | val death_time = birth_time + hard_timeout | 
| 53095 | 1270 |     val desc = ("Machine learner for Sledgehammer", "")
 | 
| 57006 | 1271 | in | 
| 59471 
ca459352d8c5
more explicit indication of Async_Manager_Legacy as Proof General legacy;
 wenzelm parents: 
59172diff
changeset | 1272 | Async_Manager_Legacy.thread MaShN birth_time death_time desc task | 
| 57006 | 1273 | end | 
| 53095 | 1274 | |
| 61318 | 1275 | fun anonymous_proof_name () = | 
| 1276 | Date.fmt (anonymous_proof_prefix ^ "%Y%m%d.%H%M%S.") (Date.fromTimeLocal (Time.now ())) ^ | |
| 1277 | serial_string () | |
| 57013 
ed95456499e6
better way to take invisible facts into account than 'island' business
 blanchet parents: 
57012diff
changeset | 1278 | |
| 69706 
6d6235b828fc
get rid of visibility in MaSh -- it slows it down more than it helps
 blanchet parents: 
69593diff
changeset | 1279 | fun mash_learn_proof ctxt ({timeout, ...} : params) t used_ths =
 | 
| 57387 
2b6fe2a48352
reintroduced MaSh hints, this time as persistent creatures
 blanchet parents: 
57386diff
changeset | 1280 | if not (null used_ths) andalso is_mash_enabled () then | 
| 54816 
10d48c2a3e32
made timeouts in Sledgehammer not be 'option's -- simplified lots of code
 blanchet parents: 
54799diff
changeset | 1281 | launch_thread timeout (fn () => | 
| 57006 | 1282 | let | 
| 1283 | val thy = Proof_Context.theory_of ctxt | |
| 60640 | 1284 | val feats = features_of ctxt (Context.theory_name thy) (Local, General) [t] | 
| 57006 | 1285 | in | 
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1286 | map_state ctxt | 
| 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1287 |           (fn {access_G, xtabs as ((num_facts0, _), _), ffds, freqs, dirty_facts} =>
 | 
| 57371 | 1288 | let | 
| 1289 | val deps = used_ths | |
| 61321 | 1290 | |> filter (is_fact_in_graph access_G) | 
| 1291 | |> map nickname_of_thm | |
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1292 | |
| 61318 | 1293 | val name = anonymous_proof_name () | 
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1294 | val (access_G', xtabs', rev_learns) = | 
| 69706 
6d6235b828fc
get rid of visibility in MaSh -- it slows it down more than it helps
 blanchet parents: 
69593diff
changeset | 1295 | add_node Automatic_Proof name [] (* ignore parents *) feats deps | 
| 
6d6235b828fc
get rid of visibility in MaSh -- it slows it down more than it helps
 blanchet parents: 
69593diff
changeset | 1296 | (access_G, xtabs, []) | 
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1297 | |
| 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1298 | val (ffds', freqs') = | 
| 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1299 | recompute_ffds_freqs_from_learns (rev rev_learns) xtabs' num_facts0 ffds freqs | 
| 57371 | 1300 | in | 
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1301 |                {access_G = access_G', xtabs = xtabs', ffds = ffds', freqs = freqs',
 | 
| 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1302 | dirty_facts = Option.map (cons name) dirty_facts} | 
| 57371 | 1303 | end); | 
| 57006 | 1304 | (true, "") | 
| 1305 | end) | |
| 53819 
e55d641d0a70
honor MaSh's zero-overhead policy -- no learning if the tool is disabled
 blanchet parents: 
53790diff
changeset | 1306 | else | 
| 
e55d641d0a70
honor MaSh's zero-overhead policy -- no learning if the tool is disabled
 blanchet parents: 
53790diff
changeset | 1307 | () | 
| 53095 | 1308 | |
| 63518 | 1309 | fun sendback sub = Active.sendback_markup_command (sledgehammerN ^ " " ^ sub) | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1310 | |
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1311 | val commit_timeout = seconds 30.0 | 
| 48332 
271a4a6af734
optimize parent computation in MaSh + remove temporary files
 blanchet parents: 
48330diff
changeset | 1312 | |
| 50485 
3c6ac2da2f45
merge aliased theorems in MaSh dependencies, modulo symmetry of equality
 blanchet parents: 
50484diff
changeset | 1313 | (* The timeout is understood in a very relaxed fashion. *) | 
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1314 | fun mash_learn_facts ctxt (params as {debug, verbose, ...}) prover auto_level run_prover
 | 
| 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1315 | learn_timeout facts = | 
| 48304 | 1316 | let | 
| 48318 | 1317 | val timer = Timer.startRealTimer () | 
| 62826 | 1318 | fun next_commit_time () = Timer.checkRealTimer timer + commit_timeout | 
| 48308 | 1319 | in | 
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1320 | (case get_state ctxt of | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1321 | NONE => "MaSh is busy\nPlease try again later" | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1322 |     | SOME {access_G, ...} =>
 | 
| 48304 | 1323 | let | 
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1324 | val is_in_access_G = is_fact_in_graph access_G o snd | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1325 | val no_new_facts = forall is_in_access_G facts | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1326 | in | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1327 | if no_new_facts andalso not run_prover then | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1328 | if auto_level < 2 then | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1329 | "No new " ^ (if run_prover then "automatic" else "Isar") ^ " proofs to learn" ^ | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1330 | (if auto_level = 0 andalso not run_prover then | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1331 | "\n\nHint: Try " ^ sendback learn_proverN ^ " to learn from an automatic prover" | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1332 | else | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1333 | "") | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1334 | else | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1335 | "" | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1336 | else | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1337 | let | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1338 | val name_tabs = build_name_tables nickname_of_thm facts | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1339 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1340 | fun deps_of status th = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1341 | if status = Non_Rec_Def orelse status = Rec_Def then | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1342 | SOME [] | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1343 | else if run_prover then | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1344 | prover_dependencies_of ctxt params prover auto_level facts name_tabs th | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1345 | |> (fn (false, _) => NONE | (true, deps) => trim_dependencies deps) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1346 | else | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1347 | isar_dependencies_of name_tabs th | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1348 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1349 | fun do_commit [] [] [] state = state | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1350 | | do_commit learns relearns flops | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1351 |                   {access_G, xtabs as ((num_facts0, _), _), ffds, freqs, dirty_facts} =
 | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1352 | let | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1353 | val was_empty = Graph.is_empty access_G | 
| 57378 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 1354 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1355 | val (learns, (access_G', xtabs')) = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1356 | fold_map (learn_wrt_access_graph ctxt) learns (access_G, xtabs) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1357 | |>> map_filter I | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1358 | val (relearns, access_G'') = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1359 | fold_map (relearn_wrt_access_graph ctxt) relearns access_G' | 
| 57371 | 1360 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1361 | val access_G''' = access_G'' |> fold flop_wrt_access_graph flops | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1362 | val dirty_facts' = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1363 | (case (was_empty, dirty_facts) of | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1364 | (false, SOME names) => SOME (map #1 learns @ map #1 relearns @ names) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1365 | | _ => NONE) | 
| 57378 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 1366 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1367 | val (ffds', freqs') = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1368 | if null relearns then | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1369 | recompute_ffds_freqs_from_learns | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1370 | (map (fn (name, _, feats, deps) => (name, feats, deps)) learns) xtabs' | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1371 | num_facts0 ffds freqs | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1372 | else | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1373 | recompute_ffds_freqs_from_access_G access_G''' xtabs' | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1374 | in | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1375 |                   {access_G = access_G''', xtabs = xtabs', ffds = ffds', freqs = freqs',
 | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1376 | dirty_facts = dirty_facts'} | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1377 | end | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1378 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1379 | fun commit last learns relearns flops = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1380 | (if debug andalso auto_level = 0 then writeln "Committing..." else (); | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1381 | map_state ctxt (do_commit (rev learns) relearns flops); | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1382 | if not last andalso auto_level = 0 then | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1383 | let val num_proofs = length learns + length relearns in | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1384 |                    writeln ("Learned " ^ string_of_int num_proofs ^ " " ^
 | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1385 | (if run_prover then "automatic" else "Isar") ^ " proof" ^ | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1386 | plural_s num_proofs ^ " in the last " ^ string_of_time commit_timeout) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1387 | end | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1388 | else | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1389 | ()) | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1390 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1391 | fun learn_new_fact _ (accum as (_, (_, _, true))) = accum | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1392 | | learn_new_fact (parents, ((_, stature as (_, status)), th)) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1393 | (learns, (num_nontrivial, next_commit, _)) = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1394 | let | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1395 | val name = nickname_of_thm th | 
| 65458 | 1396 | val feats = features_of ctxt (Thm.theory_name th) stature [Thm.prop_of th] | 
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1397 | val deps = these (deps_of status th) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1398 | val num_nontrivial = num_nontrivial |> not (null deps) ? Integer.add 1 | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1399 | val learns = (name, parents, feats, deps) :: learns | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1400 | val (learns, next_commit) = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1401 | if Timer.checkRealTimer timer > next_commit then | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1402 | (commit false learns [] []; ([], next_commit_time ())) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1403 | else | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1404 | (learns, next_commit) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1405 | val timed_out = Timer.checkRealTimer timer > learn_timeout | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1406 | in | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1407 | (learns, (num_nontrivial, next_commit, timed_out)) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1408 | end | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1409 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1410 | val (num_new_facts, num_nontrivial) = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1411 | if no_new_facts then | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1412 | (0, 0) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1413 | else | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1414 | let | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1415 | val new_facts = facts | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1416 | |> sort (crude_thm_ord ctxt o apply2 snd) | 
| 69706 
6d6235b828fc
get rid of visibility in MaSh -- it slows it down more than it helps
 blanchet parents: 
69593diff
changeset | 1417 | |> map (pair []) (* ignore parents *) | 
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1418 | |> filter_out (is_in_access_G o snd) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1419 | val (learns, (num_nontrivial, _, _)) = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1420 | ([], (0, next_commit_time (), false)) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1421 | |> fold learn_new_fact new_facts | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1422 | in | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1423 | commit true learns [] []; (length new_facts, num_nontrivial) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1424 | end | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1425 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1426 | fun relearn_old_fact _ (accum as (_, (_, _, true))) = accum | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1427 | | relearn_old_fact ((_, (_, status)), th) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1428 | ((relearns, flops), (num_nontrivial, next_commit, _)) = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1429 | let | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1430 | val name = nickname_of_thm th | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1431 | val (num_nontrivial, relearns, flops) = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1432 | (case deps_of status th of | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1433 | SOME deps => (num_nontrivial + 1, (name, deps) :: relearns, flops) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1434 | | NONE => (num_nontrivial, relearns, name :: flops)) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1435 | val (relearns, flops, next_commit) = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1436 | if Timer.checkRealTimer timer > next_commit then | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1437 | (commit false [] relearns flops; ([], [], next_commit_time ())) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1438 | else | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1439 | (relearns, flops, next_commit) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1440 | val timed_out = Timer.checkRealTimer timer > learn_timeout | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1441 | in | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1442 | ((relearns, flops), (num_nontrivial, next_commit, timed_out)) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1443 | end | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1444 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1445 | val num_nontrivial = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1446 | if not run_prover then | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1447 | num_nontrivial | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1448 | else | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1449 | let | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1450 | val max_isar = 1000 * max_dependencies | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1451 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1452 | fun priority_of th = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1453 | Random.random_range 0 max_isar + | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1454 | (case try (Graph.get_node access_G) (nickname_of_thm th) of | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1455 | SOME (Isar_Proof, _, deps) => ~100 * length deps | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1456 | | SOME (Automatic_Proof, _, _) => 2 * max_isar | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1457 | | SOME (Isar_Proof_wegen_Prover_Flop, _, _) => max_isar | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1458 | | NONE => 0) | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1459 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1460 | val old_facts = facts | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1461 | |> filter is_in_access_G | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1462 | |> map (`(priority_of o snd)) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1463 | |> sort (int_ord o apply2 fst) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1464 | |> map snd | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1465 | val ((relearns, flops), (num_nontrivial, _, _)) = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1466 | (([], []), (num_nontrivial, next_commit_time (), false)) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1467 | |> fold relearn_old_fact old_facts | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1468 | in | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1469 | commit true [] relearns flops; num_nontrivial | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1470 | end | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1471 | in | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1472 | if verbose orelse auto_level < 2 then | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1473 | "Learned " ^ string_of_int num_new_facts ^ " fact" ^ plural_s num_new_facts ^ | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1474 | " and " ^ string_of_int num_nontrivial ^ " nontrivial " ^ | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1475 | (if run_prover then "automatic and " else "") ^ "Isar proof" ^ | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1476 | plural_s num_nontrivial ^ | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1477 | (if verbose then " in " ^ string_of_time (Timer.checkRealTimer timer) else "") | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1478 | else | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1479 | "" | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1480 | end | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1481 | end) | 
| 48308 | 1482 | end | 
| 48304 | 1483 | |
| 54123 
271a8377656f
remove overloading of "max_facts" -- it already controls the number of facts passed to ATPs for 'learn_prover'
 blanchet parents: 
54115diff
changeset | 1484 | fun mash_learn ctxt (params as {provers, timeout, ...}) fact_override chained run_prover =
 | 
| 48316 
252f45c04042
drastic overhaul of MaSh data structures + fixed a few performance issues
 blanchet parents: 
48315diff
changeset | 1485 | let | 
| 48396 | 1486 | val css = Sledgehammer_Fact.clasimpset_rule_table_of ctxt | 
| 48395 
85a7fb65507a
learning should honor the fact override and the chained facts
 blanchet parents: 
48394diff
changeset | 1487 | val ctxt = ctxt |> Config.put instantiate_inducts false | 
| 58919 | 1488 | val facts = | 
| 69593 | 1489 | nearly_all_facts ctxt false fact_override Keyword.empty_keywords css chained [] \<^prop>\<open>True\<close> | 
| 60641 | 1490 | |> sort (crude_thm_ord ctxt o apply2 snd o swap) | 
| 48404 | 1491 | val num_facts = length facts | 
| 1492 | val prover = hd provers | |
| 57006 | 1493 | |
| 50484 
8ec31bdb9d36
adopt the neutral "prover" terminology for MaSh rather than the ambiguous/wrong ATP terminology (which sometimes excludes SMT solvers)
 blanchet parents: 
50450diff
changeset | 1494 | fun learn auto_level run_prover = | 
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1495 | mash_learn_facts ctxt params prover auto_level run_prover one_year facts | 
| 58843 | 1496 | |> writeln | 
| 48316 
252f45c04042
drastic overhaul of MaSh data structures + fixed a few performance issues
 blanchet parents: 
48315diff
changeset | 1497 | in | 
| 50484 
8ec31bdb9d36
adopt the neutral "prover" terminology for MaSh rather than the ambiguous/wrong ATP terminology (which sometimes excludes SMT solvers)
 blanchet parents: 
50450diff
changeset | 1498 | if run_prover then | 
| 58843 | 1499 |       (writeln ("MaShing through " ^ string_of_int num_facts ^ " fact" ^
 | 
| 57017 | 1500 |          plural_s num_facts ^ " for automatic proofs (" ^ quote prover ^ " timeout: " ^
 | 
| 1501 | string_of_time timeout ^ ").\n\nCollecting Isar proofs first..."); | |
| 50340 
72519bf5f135
simplify MaSh term patterns + add missing global facts if there aren't too many
 blanchet parents: 
50339diff
changeset | 1502 | learn 1 false; | 
| 63692 | 1503 | writeln "Now collecting automatic proofs\n\ | 
| 1504 | \This may take several hours; you can safely stop the learning process at any point"; | |
| 50340 
72519bf5f135
simplify MaSh term patterns + add missing global facts if there aren't too many
 blanchet parents: 
50339diff
changeset | 1505 | learn 0 true) | 
| 
72519bf5f135
simplify MaSh term patterns + add missing global facts if there aren't too many
 blanchet parents: 
50339diff
changeset | 1506 | else | 
| 58843 | 1507 |       (writeln ("MaShing through " ^ string_of_int num_facts ^ " fact" ^
 | 
| 57006 | 1508 | plural_s num_facts ^ " for Isar proofs..."); | 
| 50340 
72519bf5f135
simplify MaSh term patterns + add missing global facts if there aren't too many
 blanchet parents: 
50339diff
changeset | 1509 | learn 0 false) | 
| 48316 
252f45c04042
drastic overhaul of MaSh data structures + fixed a few performance issues
 blanchet parents: 
48315diff
changeset | 1510 | end | 
| 48249 | 1511 | |
| 63697 
0afe49623cf9
avoid loading MaSh file first time around for higher responsiveness of Sledgehammer
 blanchet parents: 
63696diff
changeset | 1512 | fun mash_can_suggest_facts ctxt = | 
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1513 | (case get_state ctxt of | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1514 | NONE => false | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1515 |   | SOME {access_G, ...} => not (Graph.is_empty access_G))
 | 
| 63697 
0afe49623cf9
avoid loading MaSh file first time around for higher responsiveness of Sledgehammer
 blanchet parents: 
63696diff
changeset | 1516 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1517 | fun mash_can_suggest_facts_fast ctxt = | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1518 | (case peek_state ctxt of | 
| 63697 
0afe49623cf9
avoid loading MaSh file first time around for higher responsiveness of Sledgehammer
 blanchet parents: 
63696diff
changeset | 1519 | NONE => false | 
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1520 |   | SOME (_, {access_G, ...}) => not (Graph.is_empty access_G))
 | 
| 50311 | 1521 | |
| 57274 
0acfdb151e52
more generous formula -- there are lots of duplicates out there
 blanchet parents: 
57273diff
changeset | 1522 | (* Generate more suggestions than requested, because some might be thrown out later for various | 
| 
0acfdb151e52
more generous formula -- there are lots of duplicates out there
 blanchet parents: 
57273diff
changeset | 1523 | reasons (e.g., duplicates). *) | 
| 69706 
6d6235b828fc
get rid of visibility in MaSh -- it slows it down more than it helps
 blanchet parents: 
69593diff
changeset | 1524 | fun generous_max_suggestions max_facts = 2 * max_facts + 25 (* FUDGE *) | 
| 50383 
4274b25ff4e7
take proximity into account for MaSh + fix a debilitating bug in feature generation
 blanchet parents: 
50382diff
changeset | 1525 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1526 | val mepo_weight = 0.5 (* FUDGE *) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1527 | val mash_weight = 0.5 (* FUDGE *) | 
| 50814 | 1528 | |
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1529 | val max_facts_to_learn_before_query = 100 (* FUDGE *) | 
| 53152 
cbd3c7c48d2c
learn new facts on query if there aren't too many of them in MaSh
 blanchet parents: 
53150diff
changeset | 1530 | |
| 63697 
0afe49623cf9
avoid loading MaSh file first time around for higher responsiveness of Sledgehammer
 blanchet parents: 
63696diff
changeset | 1531 | (* The threshold should be large enough so that MaSh does not get activated for Auto Sledgehammer. *) | 
| 
0afe49623cf9
avoid loading MaSh file first time around for higher responsiveness of Sledgehammer
 blanchet parents: 
63696diff
changeset | 1532 | val min_secs_for_learning = 10 | 
| 48318 | 1533 | |
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1534 | fun relevant_facts ctxt (params as {verbose, learn, fact_filter, timeout, ...}) prover
 | 
| 57405 
1f49da059947
reintroduced 'extra features' + only print message in verbose mode
 blanchet parents: 
57404diff
changeset | 1535 |     max_facts ({add, only, ...} : fact_override) hyp_ts concl_t facts =
 | 
| 48314 
ee33ba3c0e05
added option to control which fact filter is used
 blanchet parents: 
48313diff
changeset | 1536 | if not (subset (op =) (the_list fact_filter, fact_filters)) then | 
| 63692 | 1537 |     error ("Unknown fact filter: " ^ quote (the fact_filter))
 | 
| 48314 
ee33ba3c0e05
added option to control which fact filter is used
 blanchet parents: 
48313diff
changeset | 1538 | else if only then | 
| 57150 
f591096a9c94
add option to keep duplicates, for more precise evaluation of relevance filters
 blanchet parents: 
57133diff
changeset | 1539 |     [("", map fact_of_raw_fact facts)]
 | 
| 48321 | 1540 | else if max_facts <= 0 orelse null facts then | 
| 51010 | 1541 |     [("", [])]
 | 
| 48288 
255c6e1fd505
rationalize relevance filter, slowing moving code from Iter to MaSh
 blanchet parents: 
48251diff
changeset | 1542 | else | 
| 
255c6e1fd505
rationalize relevance filter, slowing moving code from Iter to MaSh
 blanchet parents: 
48251diff
changeset | 1543 | let | 
| 60649 
e007aa6a8aa2
more explicit use of context and elimination of Thm.theory_of_thm, although unclear (and untested?) situations remain;
 wenzelm parents: 
60641diff
changeset | 1544 | val thy_name = Context.theory_name (Proof_Context.theory_of ctxt) | 
| 57407 
140e16bc2a40
use right theory name for theorems in evaluation driver
 blanchet parents: 
57406diff
changeset | 1545 | |
| 57555 
f60d70566525
also learn when 'fact_filter =' is set explicitly
 blanchet parents: 
57554diff
changeset | 1546 | fun maybe_launch_thread exact min_num_facts_to_learn = | 
| 59471 
ca459352d8c5
more explicit indication of Async_Manager_Legacy as Proof General legacy;
 wenzelm parents: 
59172diff
changeset | 1547 | if not (Async_Manager_Legacy.has_running_threads MaShN) andalso | 
| 54816 
10d48c2a3e32
made timeouts in Sledgehammer not be 'option's -- simplified lots of code
 blanchet parents: 
54799diff
changeset | 1548 | Time.toSeconds timeout >= min_secs_for_learning then | 
| 
10d48c2a3e32
made timeouts in Sledgehammer not be 'option's -- simplified lots of code
 blanchet parents: 
54799diff
changeset | 1549 | let val timeout = time_mult learn_timeout_slack timeout in | 
| 57405 
1f49da059947
reintroduced 'extra features' + only print message in verbose mode
 blanchet parents: 
57404diff
changeset | 1550 | (if verbose then | 
| 58843 | 1551 |                writeln ("Started MaShing through " ^
 | 
| 63697 
0afe49623cf9
avoid loading MaSh file first time around for higher responsiveness of Sledgehammer
 blanchet parents: 
63696diff
changeset | 1552 | (if exact then "" else "up to ") ^ string_of_int min_num_facts_to_learn ^ | 
| 63692 | 1553 | " fact" ^ plural_s min_num_facts_to_learn ^ " in the background") | 
| 57405 
1f49da059947
reintroduced 'extra features' + only print message in verbose mode
 blanchet parents: 
57404diff
changeset | 1554 | else | 
| 
1f49da059947
reintroduced 'extra features' + only print message in verbose mode
 blanchet parents: 
57404diff
changeset | 1555 | ()); | 
| 54816 
10d48c2a3e32
made timeouts in Sledgehammer not be 'option's -- simplified lots of code
 blanchet parents: 
54799diff
changeset | 1556 | launch_thread timeout | 
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1557 | (fn () => (true, mash_learn_facts ctxt params prover 2 false timeout facts)) | 
| 48319 
340187063d84
use async manager to manage MaSh learners to make sure they get killed cleanly
 blanchet parents: 
48318diff
changeset | 1558 | end | 
| 48318 | 1559 | else | 
| 1560 | () | |
| 57018 | 1561 | |
| 63697 
0afe49623cf9
avoid loading MaSh file first time around for higher responsiveness of Sledgehammer
 blanchet parents: 
63696diff
changeset | 1562 | val mash_enabled = is_mash_enabled () | 
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1563 | val mash_fast = mash_can_suggest_facts_fast ctxt | 
| 63697 
0afe49623cf9
avoid loading MaSh file first time around for higher responsiveness of Sledgehammer
 blanchet parents: 
63696diff
changeset | 1564 | |
| 57555 
f60d70566525
also learn when 'fact_filter =' is set explicitly
 blanchet parents: 
57554diff
changeset | 1565 | fun please_learn () = | 
| 63697 
0afe49623cf9
avoid loading MaSh file first time around for higher responsiveness of Sledgehammer
 blanchet parents: 
63696diff
changeset | 1566 | if mash_fast then | 
| 64522 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1567 | (case get_state ctxt of | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1568 | NONE => maybe_launch_thread false (length facts) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1569 |           | SOME {access_G, xtabs = ((num_facts0, _), _), ...} =>
 | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1570 | let | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1571 | val is_in_access_G = is_fact_in_graph access_G o snd | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1572 | val min_num_facts_to_learn = length facts - num_facts0 | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1573 | in | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1574 | if min_num_facts_to_learn <= max_facts_to_learn_before_query then | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1575 | (case length (filter_out is_in_access_G facts) of | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1576 | 0 => () | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1577 | | num_facts_to_learn => | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1578 | if num_facts_to_learn <= max_facts_to_learn_before_query then | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1579 | mash_learn_facts ctxt params prover 2 false timeout facts | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1580 | |> (fn "" => () | s => writeln (MaShN ^ ": " ^ s)) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1581 | else | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1582 | maybe_launch_thread true num_facts_to_learn) | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1583 | else | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1584 | maybe_launch_thread false min_num_facts_to_learn | 
| 
b66f8caf86b6
made MaSh faster and less likely to hang seemingly forever
 blanchet parents: 
63698diff
changeset | 1585 | end) | 
| 63697 
0afe49623cf9
avoid loading MaSh file first time around for higher responsiveness of Sledgehammer
 blanchet parents: 
63696diff
changeset | 1586 | else | 
| 
0afe49623cf9
avoid loading MaSh file first time around for higher responsiveness of Sledgehammer
 blanchet parents: 
63696diff
changeset | 1587 | maybe_launch_thread false (length facts) | 
| 57555 
f60d70566525
also learn when 'fact_filter =' is set explicitly
 blanchet parents: 
57554diff
changeset | 1588 | |
| 
f60d70566525
also learn when 'fact_filter =' is set explicitly
 blanchet parents: 
57554diff
changeset | 1589 | val _ = | 
| 
f60d70566525
also learn when 'fact_filter =' is set explicitly
 blanchet parents: 
57554diff
changeset | 1590 | if learn andalso mash_enabled andalso fact_filter <> SOME mepoN then please_learn () else () | 
| 57018 | 1591 | |
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1592 | val effective_fact_filter = | 
| 55286 | 1593 | (case fact_filter of | 
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1594 | SOME ff => ff | 
| 63697 
0afe49623cf9
avoid loading MaSh file first time around for higher responsiveness of Sledgehammer
 blanchet parents: 
63696diff
changeset | 1595 | | NONE => if mash_enabled andalso mash_fast then meshN else mepoN) | 
| 54143 
18def1c73c79
make sure add: doesn't add duplicates, and works for [no_atp] facts
 blanchet parents: 
54141diff
changeset | 1596 | |
| 
18def1c73c79
make sure add: doesn't add duplicates, and works for [no_atp] facts
 blanchet parents: 
54141diff
changeset | 1597 | val unique_facts = drop_duplicate_facts facts | 
| 48288 
255c6e1fd505
rationalize relevance filter, slowing moving code from Iter to MaSh
 blanchet parents: 
48251diff
changeset | 1598 | val add_ths = Attrib.eval_thms ctxt add | 
| 54143 
18def1c73c79
make sure add: doesn't add duplicates, and works for [no_atp] facts
 blanchet parents: 
54141diff
changeset | 1599 | |
| 51004 
5f2788c38127
distinguish raw and non-raw facts, using raw for 10 000s of facts and non-raw after selection of some hundreds
 blanchet parents: 
51003diff
changeset | 1600 | fun in_add (_, th) = member Thm.eq_thm_prop add_ths th | 
| 57018 | 1601 | |
| 51003 
198cb05fb35b
report (MeSh, MePo, MaSh) triple, to be able to use different filters in different slices
 blanchet parents: 
51001diff
changeset | 1602 | fun add_and_take accepts = | 
| 
198cb05fb35b
report (MeSh, MePo, MaSh) triple, to be able to use different filters in different slices
 blanchet parents: 
51001diff
changeset | 1603 | (case add_ths of | 
| 
198cb05fb35b
report (MeSh, MePo, MaSh) triple, to be able to use different filters in different slices
 blanchet parents: 
51001diff
changeset | 1604 | [] => accepts | 
| 57150 
f591096a9c94
add option to keep duplicates, for more precise evaluation of relevance filters
 blanchet parents: 
57133diff
changeset | 1605 | | _ => | 
| 
f591096a9c94
add option to keep duplicates, for more precise evaluation of relevance filters
 blanchet parents: 
57133diff
changeset | 1606 | (unique_facts |> filter in_add |> map fact_of_raw_fact) @ (accepts |> filter_out in_add)) | 
| 48293 | 1607 | |> take max_facts | 
| 57018 | 1608 | |
| 48406 | 1609 | fun mepo () = | 
| 54143 
18def1c73c79
make sure add: doesn't add duplicates, and works for [no_atp] facts
 blanchet parents: 
54141diff
changeset | 1610 | (mepo_suggested_facts ctxt params max_facts NONE hyp_ts concl_t unique_facts | 
| 54091 | 1611 | |> weight_facts_steeply, []) | 
| 57018 | 1612 | |
| 48314 
ee33ba3c0e05
added option to control which fact filter is used
 blanchet parents: 
48313diff
changeset | 1613 | fun mash () = | 
| 61322 | 1614 | mash_suggested_facts ctxt thy_name params (generous_max_suggestions max_facts) hyp_ts | 
| 1615 | concl_t facts | |
| 53140 
a1235e90da5f
pour extra features from proximate facts into goal, in exporter
 blanchet parents: 
53137diff
changeset | 1616 | |>> weight_facts_steeply | 
| 57018 | 1617 | |
| 48314 
ee33ba3c0e05
added option to control which fact filter is used
 blanchet parents: 
48313diff
changeset | 1618 | val mess = | 
| 51003 
198cb05fb35b
report (MeSh, MePo, MaSh) triple, to be able to use different filters in different slices
 blanchet parents: 
51001diff
changeset | 1619 | (* the order is important for the "case" expression below *) | 
| 54091 | 1620 | [] |> effective_fact_filter <> mepoN ? cons (mash_weight, mash) | 
| 1621 | |> effective_fact_filter <> mashN ? cons (mepo_weight, mepo) | |
| 1622 | |> Par_List.map (apsnd (fn f => f ())) | |
| 61322 | 1623 | val mesh = | 
| 1624 | mesh_facts (fact_distinct (op aconv)) (eq_snd (gen_eq_thm ctxt)) max_facts mess | |
| 1625 | |> add_and_take | |
| 48288 
255c6e1fd505
rationalize relevance filter, slowing moving code from Iter to MaSh
 blanchet parents: 
48251diff
changeset | 1626 | in | 
| 55286 | 1627 | (case (fact_filter, mess) of | 
| 51024 
98fb341d32e3
distinguish MeSh and smart -- with smart, allow combinations of MaSh, MeSh, and MePo in different slices -- and use MaSh also with SMT solvers, based on evaluation
 blanchet parents: 
51020diff
changeset | 1628 | (NONE, [(_, (mepo, _)), (_, (mash, _))]) => | 
| 57658 | 1629 | [(meshN, mesh), | 
| 1630 | (mepoN, mepo |> map fst |> add_and_take), | |
| 51010 | 1631 | (mashN, mash |> map fst |> add_and_take)] | 
| 55286 | 1632 | | _ => [(effective_fact_filter, mesh)]) | 
| 48288 
255c6e1fd505
rationalize relevance filter, slowing moving code from Iter to MaSh
 blanchet parents: 
48251diff
changeset | 1633 | end | 
| 
255c6e1fd505
rationalize relevance filter, slowing moving code from Iter to MaSh
 blanchet parents: 
48251diff
changeset | 1634 | |
| 48248 | 1635 | end; |