| author | wenzelm | 
| Fri, 20 Mar 2015 22:18:40 +0100 | |
| changeset 59771 | c6e60787ffe2 | 
| parent 59621 | 291934bac95e | 
| child 59875 | 9779b0c59ad4 | 
| permissions | -rw-r--r-- | 
| 48380 | 1 | (* Title: HOL/Tools/Sledgehammer/sledgehammer_mash.ML | 
| 48248 | 2 | Author: Jasmin Blanchette, TU Muenchen | 
| 57009 | 3 | Author: Cezary Kaliszyk, University of Innsbruck | 
| 48248 | 4 | |
| 5 | Sledgehammer's machine-learning-based relevance filter (MaSh). | |
| 6 | *) | |
| 7 | ||
| 48381 | 8 | signature SLEDGEHAMMER_MASH = | 
| 48248 | 9 | sig | 
| 48251 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 10 | type stature = ATP_Problem_Generate.stature | 
| 51004 
5f2788c38127
distinguish raw and non-raw facts, using raw for 10 000s of facts and non-raw after selection of some hundreds
 blanchet parents: 
51003diff
changeset | 11 | type raw_fact = Sledgehammer_Fact.raw_fact | 
| 48296 
e7f01b7e244e
gracefully handle the case of empty theories when going up the accessibility chain
 blanchet parents: 
48293diff
changeset | 12 | type fact = Sledgehammer_Fact.fact | 
| 
e7f01b7e244e
gracefully handle the case of empty theories when going up the accessibility chain
 blanchet parents: 
48293diff
changeset | 13 | type fact_override = Sledgehammer_Fact.fact_override | 
| 55201 | 14 | type params = Sledgehammer_Prover.params | 
| 15 | type prover_result = Sledgehammer_Prover.prover_result | |
| 48251 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 16 | |
| 48308 | 17 | val trace : bool Config.T | 
| 57150 
f591096a9c94
add option to keep duplicates, for more precise evaluation of relevance filters
 blanchet parents: 
57133diff
changeset | 18 | val duplicates : bool Config.T | 
| 51008 | 19 | val MePoN : string | 
| 48319 
340187063d84
use async manager to manage MaSh learners to make sure they get killed cleanly
 blanchet parents: 
48318diff
changeset | 20 | val MaShN : string | 
| 51008 | 21 | val MeShN : string | 
| 48379 
2b5ad61e2ccc
renamed "iter" fact filter to "MePo" (Meng--Paulson)
 blanchet parents: 
48378diff
changeset | 22 | val mepoN : string | 
| 
2b5ad61e2ccc
renamed "iter" fact filter to "MePo" (Meng--Paulson)
 blanchet parents: 
48378diff
changeset | 23 | val mashN : string | 
| 48314 
ee33ba3c0e05
added option to control which fact filter is used
 blanchet parents: 
48313diff
changeset | 24 | val meshN : string | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 25 | val unlearnN : string | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 26 | val learn_isarN : string | 
| 50484 
8ec31bdb9d36
adopt the neutral "prover" terminology for MaSh rather than the ambiguous/wrong ATP terminology (which sometimes excludes SMT solvers)
 blanchet parents: 
50450diff
changeset | 27 | val learn_proverN : string | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 28 | val relearn_isarN : string | 
| 50484 
8ec31bdb9d36
adopt the neutral "prover" terminology for MaSh rather than the ambiguous/wrong ATP terminology (which sometimes excludes SMT solvers)
 blanchet parents: 
50450diff
changeset | 29 | val relearn_proverN : string | 
| 48314 
ee33ba3c0e05
added option to control which fact filter is used
 blanchet parents: 
48313diff
changeset | 30 | val fact_filters : string list | 
| 50826 | 31 | val encode_str : string -> string | 
| 32 | val encode_strs : string list -> string | |
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 33 | val decode_str : string -> string | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 34 | val decode_strs : string -> string list | 
| 50632 
12c097ff3241
slightly more elegant naming convention (to keep low-level and high-level APIs separated)
 blanchet parents: 
50631diff
changeset | 35 | |
| 57532 | 36 | datatype mash_algorithm = | 
| 57458 | 37 | MaSh_NB | 
| 57459 | 38 | | MaSh_kNN | 
| 39 | | MaSh_NB_kNN | |
| 57458 | 40 | | MaSh_NB_Ext | 
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 41 | | MaSh_kNN_Ext | 
| 57106 | 42 | |
| 57120 | 43 | val is_mash_enabled : unit -> bool | 
| 57532 | 44 | val the_mash_algorithm : unit -> mash_algorithm | 
| 57557 | 45 | val str_of_mash_algorithm : mash_algorithm -> string | 
| 57120 | 46 | |
| 57459 | 47 |   val mesh_facts : ('a * 'a -> bool) -> int -> (real * (('a * real) list * 'a list)) list -> 'a list
 | 
| 50624 
4d0997abce79
improved thm order hack, in case the default names are overridden
 blanchet parents: 
50623diff
changeset | 48 | val nickname_of_thm : thm -> string | 
| 57006 | 49 |   val find_suggested_facts : Proof.context -> ('b * thm) list -> string list -> ('b * thm) list
 | 
| 51135 | 50 | val crude_thm_ord : thm * thm -> order | 
| 51182 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 51 | val thm_less : thm * thm -> bool | 
| 48251 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 52 | val goal_of_thm : theory -> thm -> thm | 
| 57006 | 53 | val run_prover_for_mash : Proof.context -> params -> string -> string -> fact list -> thm -> | 
| 54 | prover_result | |
| 57406 | 55 | val features_of : Proof.context -> theory -> stature -> term list -> string list | 
| 51177 | 56 | val trim_dependencies : string list -> string list option | 
| 57306 
ff10067b2248
optimized traversal of proof terms by skipping bad apples (e.g. full_exhaustive_int'.pinduct)
 blanchet parents: 
57305diff
changeset | 57 | val isar_dependencies_of : string Symtab.table * string Symtab.table -> thm -> string list option | 
| 57006 | 58 | val prover_dependencies_of : Proof.context -> params -> string -> int -> raw_fact list -> | 
| 59 | string Symtab.table * string Symtab.table -> thm -> bool * string list | |
| 60 |   val attach_parents_to_facts : ('a * thm) list -> ('a * thm) list ->
 | |
| 61 |     (string list * ('a * thm)) list
 | |
| 53140 
a1235e90da5f
pour extra features from proximate facts into goal, in exporter
 blanchet parents: 
53137diff
changeset | 62 | val num_extra_feature_facts : int | 
| 53141 
d27e99a6a679
take chained and proximate facts into consideration when computing MaSh features
 blanchet parents: 
53140diff
changeset | 63 | val extra_feature_factor : real | 
| 53140 
a1235e90da5f
pour extra features from proximate facts into goal, in exporter
 blanchet parents: 
53137diff
changeset | 64 |   val weight_facts_smoothly : 'a list -> ('a * real) list
 | 
| 
a1235e90da5f
pour extra features from proximate facts into goal, in exporter
 blanchet parents: 
53137diff
changeset | 65 |   val weight_facts_steeply : 'a list -> ('a * real) list
 | 
| 57432 | 66 |   val find_mash_suggestions : Proof.context -> int -> string list -> ('a * thm) list ->
 | 
| 67 |     ('a * thm) list -> ('a * thm) list -> ('a * thm) list * ('a * thm) list
 | |
| 57407 
140e16bc2a40
use right theory name for theorems in evaluation driver
 blanchet parents: 
57406diff
changeset | 68 | val mash_suggested_facts : Proof.context -> theory -> params -> int -> term list -> term -> | 
| 
140e16bc2a40
use right theory name for theorems in evaluation driver
 blanchet parents: 
57406diff
changeset | 69 | raw_fact list -> fact list * fact list | 
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 70 | |
| 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 71 | val mash_unlearn : unit -> unit | 
| 54503 | 72 |   val mash_learn_proof : Proof.context -> params -> term -> ('a * thm) list -> thm list -> unit
 | 
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 73 | val mash_learn_facts : Proof.context -> params -> string -> int -> bool -> Time.time -> | 
| 57120 | 74 | raw_fact list -> string | 
| 57006 | 75 | val mash_learn : Proof.context -> params -> fact_override -> thm list -> bool -> unit | 
| 57432 | 76 | val mash_can_suggest_facts : Proof.context -> bool | 
| 53819 
e55d641d0a70
honor MaSh's zero-overhead policy -- no learning if the tool is disabled
 blanchet parents: 
53790diff
changeset | 77 | |
| 57108 
dc0b4f50e288
more generous max number of suggestions, for more safety
 blanchet parents: 
57107diff
changeset | 78 | val generous_max_suggestions : int -> int | 
| 50814 | 79 | val mepo_weight : real | 
| 80 | val mash_weight : real | |
| 57006 | 81 | val relevant_facts : Proof.context -> params -> string -> int -> fact_override -> term list -> | 
| 82 | term -> raw_fact list -> (string * fact list) list | |
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 83 | val kill_learners : unit -> unit | 
| 48319 
340187063d84
use async manager to manage MaSh learners to make sure they get killed cleanly
 blanchet parents: 
48318diff
changeset | 84 | val running_learners : unit -> unit | 
| 48248 | 85 | end; | 
| 86 | ||
| 48381 | 87 | structure Sledgehammer_MaSh : SLEDGEHAMMER_MASH = | 
| 48248 | 88 | struct | 
| 48249 | 89 | |
| 48251 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 90 | open ATP_Util | 
| 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 91 | open ATP_Problem_Generate | 
| 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 92 | open Sledgehammer_Util | 
| 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 93 | open Sledgehammer_Fact | 
| 55201 | 94 | open Sledgehammer_Prover | 
| 55202 
824c48a539c9
renamed many Sledgehammer ML files to clarify structure
 blanchet parents: 
55201diff
changeset | 95 | open Sledgehammer_Prover_Minimize | 
| 48381 | 96 | open Sledgehammer_MePo | 
| 48251 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 97 | |
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 98 | val trace = Attrib.setup_config_bool @{binding sledgehammer_mash_trace} (K false)
 | 
| 57150 
f591096a9c94
add option to keep duplicates, for more precise evaluation of relevance filters
 blanchet parents: 
57133diff
changeset | 99 | val duplicates = Attrib.setup_config_bool @{binding sledgehammer_fact_duplicates} (K false)
 | 
| 51032 
69da236d7838
added option to use SNoW as machine learning algo
 blanchet parents: 
51029diff
changeset | 100 | |
| 48308 | 101 | fun trace_msg ctxt msg = if Config.get ctxt trace then tracing (msg ()) else () | 
| 102 | ||
| 57150 
f591096a9c94
add option to keep duplicates, for more precise evaluation of relevance filters
 blanchet parents: 
57133diff
changeset | 103 | fun gen_eq_thm ctxt = if Config.get ctxt duplicates then Thm.eq_thm_strict else Thm.eq_thm_prop | 
| 
f591096a9c94
add option to keep duplicates, for more precise evaluation of relevance filters
 blanchet parents: 
57133diff
changeset | 104 | |
| 51008 | 105 | val MePoN = "MePo" | 
| 48319 
340187063d84
use async manager to manage MaSh learners to make sure they get killed cleanly
 blanchet parents: 
48318diff
changeset | 106 | val MaShN = "MaSh" | 
| 51008 | 107 | val MeShN = "MeSh" | 
| 48319 
340187063d84
use async manager to manage MaSh learners to make sure they get killed cleanly
 blanchet parents: 
48318diff
changeset | 108 | |
| 48379 
2b5ad61e2ccc
renamed "iter" fact filter to "MePo" (Meng--Paulson)
 blanchet parents: 
48378diff
changeset | 109 | val mepoN = "mepo" | 
| 48314 
ee33ba3c0e05
added option to control which fact filter is used
 blanchet parents: 
48313diff
changeset | 110 | val mashN = "mash" | 
| 48379 
2b5ad61e2ccc
renamed "iter" fact filter to "MePo" (Meng--Paulson)
 blanchet parents: 
48378diff
changeset | 111 | val meshN = "mesh" | 
| 48314 
ee33ba3c0e05
added option to control which fact filter is used
 blanchet parents: 
48313diff
changeset | 112 | |
| 48379 
2b5ad61e2ccc
renamed "iter" fact filter to "MePo" (Meng--Paulson)
 blanchet parents: 
48378diff
changeset | 113 | val fact_filters = [meshN, mepoN, mashN] | 
| 48314 
ee33ba3c0e05
added option to control which fact filter is used
 blanchet parents: 
48313diff
changeset | 114 | |
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 115 | val unlearnN = "unlearn" | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 116 | val learn_isarN = "learn_isar" | 
| 50484 
8ec31bdb9d36
adopt the neutral "prover" terminology for MaSh rather than the ambiguous/wrong ATP terminology (which sometimes excludes SMT solvers)
 blanchet parents: 
50450diff
changeset | 117 | val learn_proverN = "learn_prover" | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 118 | val relearn_isarN = "relearn_isar" | 
| 50484 
8ec31bdb9d36
adopt the neutral "prover" terminology for MaSh rather than the ambiguous/wrong ATP terminology (which sometimes excludes SMT solvers)
 blanchet parents: 
50450diff
changeset | 119 | val relearn_proverN = "relearn_prover" | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 120 | |
| 57368 | 121 | fun map_array_at ary f i = Array.update (ary, i, f (Array.sub (ary, i))) | 
| 122 | ||
| 57371 | 123 | type xtab = int * int Symtab.table | 
| 124 | ||
| 125 | val empty_xtab = (0, Symtab.empty) | |
| 126 | ||
| 127 | fun add_to_xtab key (next, tab) = (next + 1, Symtab.update_new (key, next) tab) | |
| 128 | fun maybe_add_to_xtab key = perhaps (try (add_to_xtab key)) | |
| 129 | ||
| 57552 
1072599c43f6
no need for 'mash' subdirectory after removal of Python program
 blanchet parents: 
57546diff
changeset | 130 | fun state_file () = Path.expand (Path.explode "$ISABELLE_HOME_USER/mash_state") | 
| 
1072599c43f6
no need for 'mash' subdirectory after removal of Python program
 blanchet parents: 
57546diff
changeset | 131 | val remove_state_file = try File.rm o state_file | 
| 48330 | 132 | |
| 57532 | 133 | datatype mash_algorithm = | 
| 57458 | 134 | MaSh_NB | 
| 57459 | 135 | | MaSh_kNN | 
| 136 | | MaSh_NB_kNN | |
| 57458 | 137 | | MaSh_NB_Ext | 
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 138 | | MaSh_kNN_Ext | 
| 57278 | 139 | |
| 57554 | 140 | (* TODO: eliminate "MASH" environment variable after Isabelle2014 release *) | 
| 57532 | 141 | fun mash_algorithm () = | 
| 57089 | 142 |   let val flag1 = Options.default_string @{system_option MaSh} in
 | 
| 57028 | 143 | (case if flag1 <> "none" (* default *) then flag1 else getenv "MASH" of | 
| 57462 | 144 | "yes" => SOME MaSh_NB_kNN | 
| 145 | | "sml" => SOME MaSh_NB_kNN | |
| 57458 | 146 | | "nb" => SOME MaSh_NB | 
| 57459 | 147 | | "knn" => SOME MaSh_kNN | 
| 148 | | "nb_knn" => SOME MaSh_NB_kNN | |
| 57458 | 149 | | "nb_ext" => SOME MaSh_NB_Ext | 
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 150 | | "knn_ext" => SOME MaSh_kNN_Ext | 
| 57554 | 151 | | "" => NONE | 
| 57532 | 152 |     | algorithm => (warning ("Unknown MaSh algorithm: " ^ quote algorithm ^ "."); NONE))
 | 
| 57028 | 153 | end | 
| 57018 | 154 | |
| 57532 | 155 | val is_mash_enabled = is_some o mash_algorithm | 
| 156 | val the_mash_algorithm = the_default MaSh_NB_kNN o mash_algorithm | |
| 57018 | 157 | |
| 57557 | 158 | fun str_of_mash_algorithm MaSh_NB = "nb" | 
| 159 | | str_of_mash_algorithm MaSh_kNN = "knn" | |
| 160 | | str_of_mash_algorithm MaSh_NB_kNN = "nb_knn" | |
| 161 | | str_of_mash_algorithm MaSh_NB_Ext = "nb_ext" | |
| 162 | | str_of_mash_algorithm MaSh_kNN_Ext = "knn_ext" | |
| 163 | ||
| 57459 | 164 | fun scaled_avg [] = 0 | 
| 165 | | scaled_avg xs = Real.ceil (100000000.0 * fold (curry (op +)) xs 0.0) div length xs | |
| 166 | ||
| 167 | fun avg [] = 0.0 | |
| 168 | | avg xs = fold (curry (op +)) xs 0.0 / Real.fromInt (length xs) | |
| 169 | ||
| 170 | fun normalize_scores _ [] = [] | |
| 171 | | normalize_scores max_facts xs = | |
| 172 | map (apsnd (curry Real.* (1.0 / avg (map snd (take max_facts xs))))) xs | |
| 173 | ||
| 174 | fun mesh_facts fact_eq max_facts [(_, (sels, unks))] = | |
| 175 | distinct fact_eq (map fst (take max_facts sels) @ take (max_facts - length sels) unks) | |
| 176 | | mesh_facts fact_eq max_facts mess = | |
| 177 | let | |
| 178 | val mess = mess |> map (apsnd (apfst (normalize_scores max_facts))) | |
| 179 | ||
| 180 | fun score_in fact (global_weight, (sels, unks)) = | |
| 181 | let val score_at = try (nth sels) #> Option.map (fn (_, score) => global_weight * score) in | |
| 182 | (case find_index (curry fact_eq fact o fst) sels of | |
| 183 | ~1 => if member fact_eq unks fact then NONE else SOME 0.0 | |
| 184 | | rank => score_at rank) | |
| 185 | end | |
| 186 | ||
| 187 | fun weight_of fact = mess |> map_filter (score_in fact) |> scaled_avg | |
| 188 | in | |
| 189 | fold (union fact_eq o map fst o take max_facts o fst o snd) mess [] | |
| 59058 
a78612c67ec0
renamed "pairself" to "apply2", in accordance to @{apply 2};
 wenzelm parents: 
58919diff
changeset | 190 | |> map (`weight_of) |> sort (int_ord o apply2 fst o swap) | 
| 57459 | 191 | |> map snd |> take max_facts | 
| 192 | end | |
| 193 | ||
| 194 | fun smooth_weight_of_fact rank = Math.pow (1.3, 15.5 - 0.2 * Real.fromInt rank) + 15.0 (* FUDGE *) | |
| 195 | fun steep_weight_of_fact rank = Math.pow (0.62, log2 (Real.fromInt (rank + 1))) (* FUDGE *) | |
| 196 | ||
| 197 | fun weight_facts_smoothly facts = facts ~~ map smooth_weight_of_fact (0 upto length facts - 1) | |
| 198 | fun weight_facts_steeply facts = facts ~~ map steep_weight_of_fact (0 upto length facts - 1) | |
| 199 | ||
| 57660 | 200 | fun sort_array_suffix cmp needed a = | 
| 57009 | 201 | let | 
| 57406 | 202 | exception BOTTOM of int | 
| 203 | ||
| 57460 | 204 | val al = Array.length a | 
| 205 | ||
| 57009 | 206 | fun maxson l i = | 
| 57017 | 207 | let val i31 = i + i + i + 1 in | 
| 57009 | 208 | if i31 + 2 < l then | 
| 57017 | 209 | let val x = Unsynchronized.ref i31 in | 
| 210 | if cmp (Array.sub (a, i31), Array.sub (a, i31 + 1)) = LESS then x := i31 + 1 else (); | |
| 211 | if cmp (Array.sub (a, !x), Array.sub (a, i31 + 2)) = LESS then x := i31 + 2 else (); | |
| 57009 | 212 | !x | 
| 213 | end | |
| 214 | else | |
| 215 | if i31 + 1 < l andalso cmp (Array.sub (a, i31), Array.sub (a, i31 + 1)) = LESS | |
| 216 | then i31 + 1 else if i31 < l then i31 else raise BOTTOM i | |
| 217 | end | |
| 218 | ||
| 219 | fun trickledown l i e = | |
| 57029 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 220 | let val j = maxson l i in | 
| 57009 | 221 | if cmp (Array.sub (a, j), e) = GREATER then | 
| 57029 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 222 | (Array.update (a, i, Array.sub (a, j)); trickledown l j e) | 
| 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 223 | else | 
| 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 224 | Array.update (a, i, e) | 
| 57009 | 225 | end | 
| 226 | ||
| 227 | fun trickle l i e = trickledown l i e handle BOTTOM i => Array.update (a, i, e) | |
| 228 | ||
| 229 | fun bubbledown l i = | |
| 57029 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 230 | let val j = maxson l i in | 
| 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 231 | Array.update (a, i, Array.sub (a, j)); | 
| 57009 | 232 | bubbledown l j | 
| 233 | end | |
| 234 | ||
| 235 | fun bubble l i = bubbledown l i handle BOTTOM i => i | |
| 236 | ||
| 237 | fun trickleup i e = | |
| 57029 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 238 | let val father = (i - 1) div 3 in | 
| 57009 | 239 | if cmp (Array.sub (a, father), e) = LESS then | 
| 57029 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 240 | (Array.update (a, i, Array.sub (a, father)); | 
| 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 241 | if father > 0 then trickleup father e else Array.update (a, 0, e)) | 
| 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 242 | else | 
| 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 243 | Array.update (a, i, e) | 
| 57009 | 244 | end | 
| 245 | ||
| 57355 | 246 | fun for i = if i < 0 then () else (trickle al i (Array.sub (a, i)); for (i - 1)) | 
| 57009 | 247 | |
| 248 | fun for2 i = | |
| 57660 | 249 | if i < Integer.max 2 (al - needed) then | 
| 57029 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 250 | () | 
| 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 251 | else | 
| 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 252 | let val e = Array.sub (a, i) in | 
| 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 253 | Array.update (a, i, Array.sub (a, 0)); | 
| 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 254 | trickleup (bubble i 0) e; | 
| 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 255 | for2 (i - 1) | 
| 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 256 | end | 
| 57009 | 257 | in | 
| 57355 | 258 | for (((al + 1) div 3) - 1); | 
| 259 | for2 (al - 1); | |
| 260 | if al > 1 then | |
| 57029 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 261 | let val e = Array.sub (a, 1) in | 
| 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 262 | Array.update (a, 1, Array.sub (a, 0)); | 
| 57009 | 263 | Array.update (a, 0, e) | 
| 264 | end | |
| 57029 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 265 | else | 
| 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 266 | () | 
| 57009 | 267 | end | 
| 268 | ||
| 57660 | 269 | fun rev_sort_list_prefix cmp needed xs = | 
| 57460 | 270 | let val ary = Array.fromList xs in | 
| 57660 | 271 | sort_array_suffix cmp needed ary; | 
| 272 | Array.foldl (op ::) [] ary | |
| 57460 | 273 | end | 
| 274 | ||
| 275 | ||
| 276 | (*** Isabelle-agnostic machine learning ***) | |
| 277 | ||
| 278 | structure MaSh = | |
| 279 | struct | |
| 280 | ||
| 57574 | 281 | fun select_visible_facts (big_number : real) recommends = | 
| 57364 | 282 | List.app (fn at => | 
| 283 | let val (j, ov) = Array.sub (recommends, at) in | |
| 57371 | 284 | Array.update (recommends, at, (j, big_number + ov)) | 
| 57364 | 285 | end) | 
| 286 | ||
| 57374 | 287 | fun wider_array_of_vector init vec = | 
| 288 | let val ary = Array.array init in | |
| 289 |     Array.copyVec {src = vec, dst = ary, di = 0};
 | |
| 290 | ary | |
| 291 | end | |
| 292 | ||
| 57531 
4d9895d39b59
improvements to the machine learning algos (due to Cezary K.)
 blanchet parents: 
57462diff
changeset | 293 | val nb_def_prior_weight = 1000 (* FUDGE *) | 
| 57095 | 294 | |
| 57374 | 295 | fun learn_facts (tfreq0, sfreq0, dffreq0) num_facts0 num_facts num_feats depss featss = | 
| 57354 | 296 | let | 
| 57374 | 297 | val tfreq = wider_array_of_vector (num_facts, 0) tfreq0 | 
| 298 | val sfreq = wider_array_of_vector (num_facts, Inttab.empty) sfreq0 | |
| 299 | val dffreq = wider_array_of_vector (num_feats, 0) dffreq0 | |
| 300 | ||
| 301 | fun learn_one th feats deps = | |
| 57354 | 302 | let | 
| 57355 | 303 | fun add_th weight t = | 
| 304 | let | |
| 305 | val im = Array.sub (sfreq, t) | |
| 57374 | 306 | fun fold_fn s = Inttab.map_default (s, 0) (Integer.add weight) | 
| 57355 | 307 | in | 
| 57368 | 308 | map_array_at tfreq (Integer.add weight) t; | 
| 57355 | 309 | Array.update (sfreq, t, fold fold_fn feats im) | 
| 310 | end | |
| 311 | ||
| 57368 | 312 | val add_sym = map_array_at dffreq (Integer.add 1) | 
| 57354 | 313 | in | 
| 57355 | 314 | add_th nb_def_prior_weight th; | 
| 315 | List.app (add_th 1) deps; | |
| 316 | List.app add_sym feats | |
| 57354 | 317 | end | 
| 318 | ||
| 57102 | 319 | fun for i = | 
| 57383 | 320 | if i = num_facts then () | 
| 321 | else (learn_one i (Vector.sub (featss, i)) (Vector.sub (depss, i)); for (i + 1)) | |
| 57102 | 322 | in | 
| 57383 | 323 | for num_facts0; | 
| 57374 | 324 | (Array.vector tfreq, Array.vector sfreq, Array.vector dffreq) | 
| 57102 | 325 | end | 
| 57029 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 326 | |
| 57376 | 327 | fun naive_bayes (tfreq, sfreq, dffreq) num_facts max_suggs visible_facts goal_feats = | 
| 57278 | 328 | let | 
| 57531 
4d9895d39b59
improvements to the machine learning algos (due to Cezary K.)
 blanchet parents: 
57462diff
changeset | 329 | val tau = 0.2 (* FUDGE *) | 
| 
4d9895d39b59
improvements to the machine learning algos (due to Cezary K.)
 blanchet parents: 
57462diff
changeset | 330 | val pos_weight = 5.0 (* FUDGE *) | 
| 
4d9895d39b59
improvements to the machine learning algos (due to Cezary K.)
 blanchet parents: 
57462diff
changeset | 331 | val def_val = ~18.0 (* FUDGE *) | 
| 
4d9895d39b59
improvements to the machine learning algos (due to Cezary K.)
 blanchet parents: 
57462diff
changeset | 332 | val init_val = 30.0 (* FUDGE *) | 
| 57124 | 333 | |
| 57366 | 334 | val ln_afreq = Math.ln (Real.fromInt num_facts) | 
| 57374 | 335 | val idf = Vector.map (fn i => ln_afreq - Math.ln (Real.fromInt i)) dffreq | 
| 57366 | 336 | |
| 57359 | 337 | fun tfidf feat = Vector.sub (idf, feat) | 
| 57102 | 338 | |
| 339 | fun log_posterior i = | |
| 57029 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 340 | let | 
| 57374 | 341 | val tfreq = Real.fromInt (Vector.sub (tfreq, i)) | 
| 57097 | 342 | |
| 57531 
4d9895d39b59
improvements to the machine learning algos (due to Cezary K.)
 blanchet parents: 
57462diff
changeset | 343 | fun add_feat (f, fw0) (res, sfh) = | 
| 57102 | 344 | (case Inttab.lookup sfh f of | 
| 345 | SOME sf => | |
| 57401 
02f56126b4e4
reintroduced 'extra features' but with lower weight than before (to account for tfidf)
 blanchet parents: 
57387diff
changeset | 346 | (res + fw0 * tfidf f * Math.ln (pos_weight * Real.fromInt sf / tfreq), | 
| 57102 | 347 | Inttab.delete f sfh) | 
| 57409 | 348 | | NONE => (res + fw0 * tfidf f * def_val, sfh)) | 
| 57097 | 349 | |
| 57531 
4d9895d39b59
improvements to the machine learning algos (due to Cezary K.)
 blanchet parents: 
57462diff
changeset | 350 | val (res, sfh) = fold add_feat goal_feats (init_val * Math.ln tfreq, Vector.sub (sfreq, i)) | 
| 57097 | 351 | |
| 57409 | 352 | fun fold_sfh (f, sf) sow = | 
| 59476 | 353 | sow + tfidf f * Math.ln (1.0 - Real.fromInt (sf - 1) / tfreq) | 
| 57097 | 354 | |
| 57102 | 355 | val sum_of_weights = Inttab.fold fold_sfh sfh 0.0 | 
| 57029 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 356 | in | 
| 57278 | 357 | res + tau * sum_of_weights | 
| 57097 | 358 | end | 
| 57029 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 359 | |
| 57356 | 360 | val posterior = Array.tabulate (num_facts, (fn j => (j, log_posterior j))) | 
| 57102 | 361 | |
| 57364 | 362 | fun ret at acc = | 
| 363 | if at = num_facts then acc else ret (at + 1) (Array.sub (posterior, at) :: acc) | |
| 57029 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 364 | in | 
| 57371 | 365 | select_visible_facts 100000.0 posterior visible_facts; | 
| 59058 
a78612c67ec0
renamed "pairself" to "apply2", in accordance to @{apply 2};
 wenzelm parents: 
58919diff
changeset | 366 | sort_array_suffix (Real.compare o apply2 snd) max_suggs posterior; | 
| 57364 | 367 | ret (Integer.max 0 (num_facts - max_suggs)) [] | 
| 57029 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 368 | end | 
| 
75cc30d2b83f
added naive Bayes ML implementation, due to Cezary Kaliszyk (like k-NN)
 blanchet parents: 
57028diff
changeset | 369 | |
| 59477 | 370 | val initial_k = 0 | 
| 57458 | 371 | |
| 57459 | 372 | fun k_nearest_neighbors dffreq num_facts num_feats depss featss max_suggs visible_facts goal_feats = | 
| 57458 | 373 | let | 
| 374 | exception EXIT of unit | |
| 375 | ||
| 376 | val ln_afreq = Math.ln (Real.fromInt num_facts) | |
| 377 | fun tfidf feat = ln_afreq - Math.ln (Real.fromInt (Vector.sub (dffreq, feat))) | |
| 378 | ||
| 379 | val overlaps_sqr = Array.tabulate (num_facts, rpair 0.0) | |
| 380 | ||
| 57459 | 381 | val feat_facts = Array.array (num_feats, []) | 
| 382 | val _ = Vector.foldl (fn (feats, fact) => | |
| 383 | (List.app (map_array_at feat_facts (cons fact)) feats; fact + 1)) 0 featss | |
| 384 | ||
| 57458 | 385 | fun do_feat (s, sw0) = | 
| 386 | let | |
| 387 | val sw = sw0 * tfidf s | |
| 59476 | 388 | val w6 = Math.pow (sw, 6.0 (* FUDGE *)) | 
| 57458 | 389 | |
| 390 | fun inc_overlap j = | |
| 391 | let val (_, ov) = Array.sub (overlaps_sqr, j) in | |
| 57531 
4d9895d39b59
improvements to the machine learning algos (due to Cezary K.)
 blanchet parents: 
57462diff
changeset | 392 | Array.update (overlaps_sqr, j, (j, w6 + ov)) | 
| 57458 | 393 | end | 
| 394 | in | |
| 395 | List.app inc_overlap (Array.sub (feat_facts, s)) | |
| 396 | end | |
| 397 | ||
| 398 | val _ = List.app do_feat goal_feats | |
| 59058 
a78612c67ec0
renamed "pairself" to "apply2", in accordance to @{apply 2};
 wenzelm parents: 
58919diff
changeset | 399 | val _ = sort_array_suffix (Real.compare o apply2 snd) num_facts overlaps_sqr | 
| 57458 | 400 | val no_recommends = Unsynchronized.ref 0 | 
| 401 | val recommends = Array.tabulate (num_facts, rpair 0.0) | |
| 402 | val age = Unsynchronized.ref 500000000.0 | |
| 403 | ||
| 404 | fun inc_recommend v j = | |
| 405 | let val (_, ov) = Array.sub (recommends, j) in | |
| 406 | if ov <= 0.0 then | |
| 407 | (no_recommends := !no_recommends + 1; Array.update (recommends, j, (j, !age + ov))) | |
| 59476 | 408 | else | 
| 409 | Array.update (recommends, j, (j, v + ov)) | |
| 57458 | 410 | end | 
| 411 | ||
| 412 | val k = Unsynchronized.ref 0 | |
| 413 | fun do_k k = | |
| 414 | if k >= num_facts then | |
| 415 | raise EXIT () | |
| 416 | else | |
| 417 | let | |
| 57531 
4d9895d39b59
improvements to the machine learning algos (due to Cezary K.)
 blanchet parents: 
57462diff
changeset | 418 | val deps_factor = 2.7 (* FUDGE *) | 
| 57458 | 419 | val (j, o2) = Array.sub (overlaps_sqr, num_facts - k - 1) | 
| 57531 
4d9895d39b59
improvements to the machine learning algos (due to Cezary K.)
 blanchet parents: 
57462diff
changeset | 420 | val _ = inc_recommend o2 j | 
| 57458 | 421 | val ds = Vector.sub (depss, j) | 
| 422 | val l = Real.fromInt (length ds) | |
| 423 | in | |
| 57531 
4d9895d39b59
improvements to the machine learning algos (due to Cezary K.)
 blanchet parents: 
57462diff
changeset | 424 | List.app (inc_recommend (deps_factor * o2 / l)) ds | 
| 57458 | 425 | end | 
| 426 | ||
| 427 | fun while1 () = | |
| 59477 | 428 | if !k = initial_k + 1 then () else (do_k (!k); k := !k + 1; while1 ()) | 
| 57458 | 429 | handle EXIT () => () | 
| 430 | ||
| 431 | fun while2 () = | |
| 432 | if !no_recommends >= max_suggs then () | |
| 433 | else (do_k (!k); k := !k + 1; age := !age - 10000.0; while2 ()) | |
| 434 | handle EXIT () => () | |
| 435 | ||
| 436 | fun ret acc at = | |
| 437 | if at = num_facts then acc else ret (Array.sub (recommends, at) :: acc) (at + 1) | |
| 438 | in | |
| 439 | while1 (); | |
| 440 | while2 (); | |
| 441 | select_visible_facts 1000000000.0 recommends visible_facts; | |
| 59058 
a78612c67ec0
renamed "pairself" to "apply2", in accordance to @{apply 2};
 wenzelm parents: 
58919diff
changeset | 442 | sort_array_suffix (Real.compare o apply2 snd) max_suggs recommends; | 
| 57458 | 443 | ret [] (Integer.max 0 (num_facts - max_suggs)) | 
| 444 | end | |
| 445 | ||
| 57125 
2f620ef839ee
added another way of invoking Python code, for experiments
 blanchet parents: 
57124diff
changeset | 446 | (* experimental *) | 
| 57376 | 447 | fun external_tool tool max_suggs learns goal_feats = | 
| 57291 | 448 | let | 
| 57297 | 449 | val ser = string_of_int (serial ()) (* poor person's attempt at thread-safety *) | 
| 450 |     val ocs = TextIO.openOut ("adv_syms" ^ ser)
 | |
| 451 |     val ocd = TextIO.openOut ("adv_deps" ^ ser)
 | |
| 452 |     val ocq = TextIO.openOut ("adv_seq" ^ ser)
 | |
| 453 |     val occ = TextIO.openOut ("adv_conj" ^ ser)
 | |
| 57296 | 454 | |
| 57291 | 455 | fun os oc s = TextIO.output (oc, s) | 
| 57296 | 456 | |
| 57297 | 457 | fun ol _ _ _ [] = () | 
| 458 | | ol _ f _ [e] = f e | |
| 57291 | 459 | | ol oc f sep (h :: t) = (f h; os oc sep; ol oc f sep t) | 
| 57296 | 460 | |
| 57294 
ef9d4e1ceb00
use strings to communicate with external process, to ease debugging
 blanchet parents: 
57291diff
changeset | 461 | fun do_learn (name, feats, deps) = | 
| 57357 | 462 | (os ocs name; os ocs ":"; ol ocs (os ocs o quote) ", " feats; os ocs "\n"; | 
| 57297 | 463 | os ocd name; os ocd ":"; ol ocd (os ocd) " " deps; os ocd "\n"; os ocq name; os ocq "\n") | 
| 57296 | 464 | |
| 57291 | 465 | fun forkexec no = | 
| 466 | let | |
| 467 | val cmd = | |
| 57297 | 468 | "~/misc/" ^ tool ^ " adv_syms" ^ ser ^ " adv_deps" ^ ser ^ " " ^ string_of_int no ^ | 
| 469 | " adv_seq" ^ ser ^ " < adv_conj" ^ ser | |
| 57291 | 470 | in | 
| 471 | fst (Isabelle_System.bash_output cmd) | |
| 472 | |> space_explode " " | |
| 57294 
ef9d4e1ceb00
use strings to communicate with external process, to ease debugging
 blanchet parents: 
57291diff
changeset | 473 | |> filter_out (curry (op =) "") | 
| 57291 | 474 | end | 
| 475 | in | |
| 57401 
02f56126b4e4
reintroduced 'extra features' but with lower weight than before (to account for tfidf)
 blanchet parents: 
57387diff
changeset | 476 | (List.app do_learn learns; ol occ (os occ o quote) ", " (map fst goal_feats); | 
| 57297 | 477 | TextIO.closeOut ocs; TextIO.closeOut ocd; TextIO.closeOut ocq; TextIO.closeOut occ; | 
| 57294 
ef9d4e1ceb00
use strings to communicate with external process, to ease debugging
 blanchet parents: 
57291diff
changeset | 478 | forkexec max_suggs) | 
| 57291 | 479 | end | 
| 480 | ||
| 57561 | 481 | fun k_nearest_neighbors_ext max_suggs = | 
| 59477 | 482 |   external_tool ("newknn/knn" ^ " " ^ string_of_int initial_k) max_suggs
 | 
| 57561 | 483 | fun naive_bayes_ext max_suggs = external_tool "predict/nbayes" max_suggs | 
| 57374 | 484 | |
| 57532 | 485 | fun query_external ctxt algorithm max_suggs learns goal_feats = | 
| 57432 | 486 | (trace_msg ctxt (fn () => "MaSh query external " ^ commas (map fst goal_feats)); | 
| 57532 | 487 | (case algorithm of | 
| 57458 | 488 | MaSh_NB_Ext => naive_bayes_ext max_suggs learns goal_feats | 
| 489 | | MaSh_kNN_Ext => k_nearest_neighbors_ext max_suggs learns goal_feats)) | |
| 57371 | 490 | |
| 57532 | 491 | fun query_internal ctxt algorithm num_facts num_feats (fact_names, featss, depss) | 
| 57378 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 492 | (freqs as (_, _, dffreq)) visible_facts max_suggs goal_feats int_goal_feats = | 
| 57459 | 493 | let | 
| 494 | fun nb () = | |
| 495 | naive_bayes freqs num_facts max_suggs visible_facts int_goal_feats | |
| 496 | |> map fst | |
| 497 | fun knn () = | |
| 498 | k_nearest_neighbors dffreq num_facts num_feats depss featss max_suggs visible_facts | |
| 499 | int_goal_feats | |
| 500 | |> map fst | |
| 501 | in | |
| 502 |     (trace_msg ctxt (fn () => "MaSh query internal " ^ commas (map fst goal_feats) ^ " from {" ^
 | |
| 503 | elide_string 1000 (space_implode " " (Vector.foldr (op ::) [] fact_names)) ^ "}"); | |
| 57532 | 504 | (case algorithm of | 
| 57459 | 505 | MaSh_NB => nb () | 
| 506 | | MaSh_kNN => knn () | |
| 507 | | MaSh_NB_kNN => | |
| 57552 
1072599c43f6
no need for 'mash' subdirectory after removal of Python program
 blanchet parents: 
57546diff
changeset | 508 | mesh_facts (op =) max_suggs | 
| 
1072599c43f6
no need for 'mash' subdirectory after removal of Python program
 blanchet parents: 
57546diff
changeset | 509 | [(0.5 (* FUDGE *), (weight_facts_steeply (nb ()), [])), | 
| 
1072599c43f6
no need for 'mash' subdirectory after removal of Python program
 blanchet parents: 
57546diff
changeset | 510 | (0.5 (* FUDGE *), (weight_facts_steeply (knn ()), []))]) | 
| 57459 | 511 | |> map (curry Vector.sub fact_names)) | 
| 512 | end | |
| 57007 | 513 | |
| 514 | end; | |
| 515 | ||
| 516 | ||
| 57432 | 517 | (*** Persistent, stringly-typed state ***) | 
| 518 | ||
| 519 | fun meta_char c = | |
| 520 |   if Char.isAlphaNum c orelse c = #"_" orelse c = #"." orelse c = #"(" orelse c = #")" orelse
 | |
| 521 | c = #"," then | |
| 522 | String.str c | |
| 523 | else | |
| 524 | (* fixed width, in case more digits follow *) | |
| 525 | "%" ^ stringN_of_int 3 (Char.ord c) | |
| 526 | ||
| 527 | fun unmeta_chars accum [] = String.implode (rev accum) | |
| 528 | | unmeta_chars accum (#"%" :: d1 :: d2 :: d3 :: cs) = | |
| 529 | (case Int.fromString (String.implode [d1, d2, d3]) of | |
| 530 | SOME n => unmeta_chars (Char.chr n :: accum) cs | |
| 531 | | NONE => "" (* error *)) | |
| 532 | | unmeta_chars _ (#"%" :: _) = "" (* error *) | |
| 533 | | unmeta_chars accum (c :: cs) = unmeta_chars (c :: accum) cs | |
| 534 | ||
| 535 | val encode_str = String.translate meta_char | |
| 536 | val decode_str = String.explode #> unmeta_chars [] | |
| 537 | ||
| 538 | val encode_strs = map encode_str #> space_implode " " | |
| 57461 | 539 | val decode_strs = space_explode " " #> map decode_str | 
| 50311 | 540 | |
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 541 | datatype proof_kind = Isar_Proof | Automatic_Proof | Isar_Proof_wegen_Prover_Flop | 
| 50311 | 542 | |
| 543 | fun str_of_proof_kind Isar_Proof = "i" | |
| 50484 
8ec31bdb9d36
adopt the neutral "prover" terminology for MaSh rather than the ambiguous/wrong ATP terminology (which sometimes excludes SMT solvers)
 blanchet parents: 
50450diff
changeset | 544 | | str_of_proof_kind Automatic_Proof = "a" | 
| 
8ec31bdb9d36
adopt the neutral "prover" terminology for MaSh rather than the ambiguous/wrong ATP terminology (which sometimes excludes SMT solvers)
 blanchet parents: 
50450diff
changeset | 545 | | str_of_proof_kind Isar_Proof_wegen_Prover_Flop = "x" | 
| 50311 | 546 | |
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 547 | fun proof_kind_of_str "a" = Automatic_Proof | 
| 50484 
8ec31bdb9d36
adopt the neutral "prover" terminology for MaSh rather than the ambiguous/wrong ATP terminology (which sometimes excludes SMT solvers)
 blanchet parents: 
50450diff
changeset | 548 | | proof_kind_of_str "x" = Isar_Proof_wegen_Prover_Flop | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 549 | | proof_kind_of_str _ (* "i" *) = Isar_Proof | 
| 50311 | 550 | |
| 57011 
a4428f517f46
implemented learning of single proofs in SML MaSh
 blanchet parents: 
57010diff
changeset | 551 | fun add_edge_to name parent = | 
| 
a4428f517f46
implemented learning of single proofs in SML MaSh
 blanchet parents: 
57010diff
changeset | 552 | Graph.default_node (parent, (Isar_Proof, [], [])) | 
| 
a4428f517f46
implemented learning of single proofs in SML MaSh
 blanchet parents: 
57010diff
changeset | 553 | #> Graph.add_edge (parent, name) | 
| 
a4428f517f46
implemented learning of single proofs in SML MaSh
 blanchet parents: 
57010diff
changeset | 554 | |
| 57661 | 555 | fun add_node kind name parents feats deps (accum as (access_G, (fact_xtab, feat_xtab), learns)) = | 
| 556 | let val fact_xtab' = add_to_xtab name fact_xtab in | |
| 557 | ((Graph.new_node (name, (kind, feats, deps)) access_G | |
| 558 | handle Graph.DUP _ => Graph.map_node name (K (kind, feats, deps)) access_G) | |
| 559 | |> fold (add_edge_to name) parents, | |
| 560 | (fact_xtab', fold maybe_add_to_xtab feats feat_xtab), | |
| 561 | (name, feats, deps) :: learns) | |
| 562 | end | |
| 563 | handle Symtab.DUP _ => accum (* robustness (in case the state file violates the invariant) *) | |
| 57011 
a4428f517f46
implemented learning of single proofs in SML MaSh
 blanchet parents: 
57010diff
changeset | 564 | |
| 50311 | 565 | fun try_graph ctxt when def f = | 
| 566 | f () | |
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 567 | handle | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 568 | Graph.CYCLES (cycle :: _) => | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 569 | (trace_msg ctxt (fn () => "Cycle involving " ^ commas cycle ^ " when " ^ when); def) | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 570 | | Graph.DUP name => | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 571 | (trace_msg ctxt (fn () => "Duplicate fact " ^ quote name ^ " when " ^ when); def) | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 572 | | Graph.UNDEF name => | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 573 | (trace_msg ctxt (fn () => "Unknown fact " ^ quote name ^ " when " ^ when); def) | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 574 | | exn => | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 575 | if Exn.is_interrupt exn then | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 576 | reraise exn | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 577 | else | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 578 | (trace_msg ctxt (fn () => "Internal error when " ^ when ^ ":\n" ^ Runtime.exn_message exn); | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 579 | def) | 
| 50311 | 580 | |
| 581 | fun graph_info G = | |
| 582 | string_of_int (length (Graph.keys G)) ^ " node(s), " ^ | |
| 57006 | 583 | string_of_int (fold (Integer.add o length o snd) (Graph.dest G) 0) ^ " edge(s), " ^ | 
| 50311 | 584 | string_of_int (length (Graph.maximals G)) ^ " maximal" | 
| 585 | ||
| 57545 | 586 | type ffds = string vector * int list vector * int list vector | 
| 587 | type freqs = int vector * int Inttab.table vector * int vector | |
| 588 | ||
| 53095 | 589 | type mash_state = | 
| 57358 | 590 |   {access_G : (proof_kind * string list * string list) Graph.T,
 | 
| 57374 | 591 | xtabs : xtab * xtab, | 
| 57545 | 592 | ffds : ffds, | 
| 593 | freqs : freqs, | |
| 57365 | 594 | dirty_facts : string list option} | 
| 50311 | 595 | |
| 57378 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 596 | val empty_xtabs = (empty_xtab, empty_xtab) | 
| 57545 | 597 | val empty_ffds = (Vector.fromList [], Vector.fromList [], Vector.fromList []) : ffds | 
| 598 | val empty_freqs = (Vector.fromList [], Vector.fromList [], Vector.fromList []) : freqs | |
| 57378 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 599 | |
| 57371 | 600 | val empty_state = | 
| 601 |   {access_G = Graph.empty,
 | |
| 57378 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 602 | xtabs = empty_xtabs, | 
| 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 603 | ffds = empty_ffds, | 
| 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 604 | freqs = empty_freqs, | 
| 57371 | 605 | dirty_facts = SOME []} : mash_state | 
| 606 | ||
| 57545 | 607 | fun recompute_ffds_freqs_from_learns (learns : (string * string list * string list) list) | 
| 608 | ((num_facts, fact_tab), (num_feats, feat_tab)) num_facts0 (fact_names0, featss0, depss0) freqs0 = | |
| 57379 | 609 | let | 
| 57380 | 610 | val fact_names = Vector.concat [fact_names0, Vector.fromList (map #1 learns)] | 
| 611 | val featss = Vector.concat [featss0, | |
| 612 | Vector.fromList (map (map_filter (Symtab.lookup feat_tab) o #2) learns)] | |
| 613 | val depss = Vector.concat [depss0, | |
| 614 | Vector.fromList (map (map_filter (Symtab.lookup fact_tab) o #3) learns)] | |
| 57379 | 615 | in | 
| 616 | ((fact_names, featss, depss), | |
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 617 | MaSh.learn_facts freqs0 num_facts0 num_facts num_feats depss featss) | 
| 57379 | 618 | end | 
| 619 | ||
| 57378 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 620 | fun reorder_learns (num_facts, fact_tab) learns = | 
| 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 621 |   let val ary = Array.array (num_facts, ("", [], [])) in
 | 
| 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 622 | List.app (fn learn as (fact, _, _) => | 
| 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 623 | Array.update (ary, the (Symtab.lookup fact_tab fact), learn)) | 
| 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 624 | learns; | 
| 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 625 | Array.foldr (op ::) [] ary | 
| 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 626 | end | 
| 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 627 | |
| 57380 | 628 | fun recompute_ffds_freqs_from_access_G access_G (xtabs as (fact_xtab, _)) = | 
| 57378 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 629 | let | 
| 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 630 | val learns = | 
| 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 631 | Graph.schedule (fn _ => fn (fact, (_, feats, deps)) => (fact, feats, deps)) access_G | 
| 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 632 | |> reorder_learns fact_xtab | 
| 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 633 | in | 
| 57380 | 634 | recompute_ffds_freqs_from_learns learns xtabs 0 empty_ffds empty_freqs | 
| 57378 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 635 | end | 
| 50311 | 636 | |
| 637 | local | |
| 638 | ||
| 57358 | 639 | val version = "*** MaSh version 20140625 ***" | 
| 50357 
187ae76a1757
more robustness in the face of MaSh format changes -- don't overwrite new versions with old versions
 blanchet parents: 
50356diff
changeset | 640 | |
| 53095 | 641 | exception FILE_VERSION_TOO_NEW of unit | 
| 50311 | 642 | |
| 643 | fun extract_node line = | |
| 55286 | 644 | (case space_explode ":" line of | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 645 | [head, tail] => | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 646 | (case (space_explode " " head, map (unprefix " ") (space_explode ";" tail)) of | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 647 | ([kind, name], [parents, feats, deps]) => | 
| 57358 | 648 | SOME (proof_kind_of_str kind, decode_str name, decode_strs parents, decode_strs feats, | 
| 57010 | 649 | decode_strs deps) | 
| 55286 | 650 | | _ => NONE) | 
| 651 | | _ => NONE) | |
| 50311 | 652 | |
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 653 | fun load_state ctxt (time_state as (memory_time, _)) = | 
| 57552 
1072599c43f6
no need for 'mash' subdirectory after removal of Python program
 blanchet parents: 
57546diff
changeset | 654 | let val path = state_file () in | 
| 57386 | 655 | (case try OS.FileSys.modTime (Path.implode path) of | 
| 57076 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 656 | NONE => time_state | 
| 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 657 | | SOME disk_time => | 
| 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 658 | if Time.>= (memory_time, disk_time) then | 
| 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 659 | time_state | 
| 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 660 | else | 
| 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 661 | (disk_time, | 
| 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 662 | (case try File.read_lines path of | 
| 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 663 | SOME (version' :: node_lines) => | 
| 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 664 | let | 
| 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 665 | fun extract_line_and_add_node line = | 
| 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 666 | (case extract_node line of | 
| 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 667 | NONE => I (* should not happen *) | 
| 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 668 | | SOME (kind, name, parents, feats, deps) => add_node kind name parents feats deps) | 
| 57011 
a4428f517f46
implemented learning of single proofs in SML MaSh
 blanchet parents: 
57010diff
changeset | 669 | |
| 57379 | 670 | val empty_G_etc = (Graph.empty, empty_xtabs, []) | 
| 671 | ||
| 672 | val (access_G, xtabs, rev_learns) = | |
| 57076 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 673 | (case string_ord (version', version) of | 
| 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 674 | EQUAL => | 
| 57379 | 675 | try_graph ctxt "loading state" empty_G_etc | 
| 676 | (fn () => fold extract_line_and_add_node node_lines empty_G_etc) | |
| 57552 
1072599c43f6
no need for 'mash' subdirectory after removal of Python program
 blanchet parents: 
57546diff
changeset | 677 | | LESS => (remove_state_file (); empty_G_etc) (* cannot parse old file *) | 
| 57076 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 678 | | GREATER => raise FILE_VERSION_TOO_NEW ()) | 
| 57378 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 679 | |
| 57380 | 680 | val (ffds, freqs) = | 
| 681 | recompute_ffds_freqs_from_learns (rev rev_learns) xtabs 0 empty_ffds empty_freqs | |
| 57076 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 682 | in | 
| 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 683 |              trace_msg ctxt (fn () => "Loaded fact graph (" ^ graph_info access_G ^ ")");
 | 
| 57378 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 684 |              {access_G = access_G, xtabs = xtabs, ffds = ffds, freqs = freqs, dirty_facts = SOME []}
 | 
| 57076 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 685 | end | 
| 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 686 | | _ => empty_state))) | 
| 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 687 | end | 
| 50311 | 688 | |
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 689 | fun str_of_entry (kind, name, parents, feats, deps) = | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 690 | str_of_proof_kind kind ^ " " ^ encode_str name ^ ": " ^ encode_strs parents ^ "; " ^ | 
| 57358 | 691 | encode_strs feats ^ "; " ^ encode_strs deps ^ "\n" | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 692 | |
| 57365 | 693 | fun save_state _ (time_state as (_, {dirty_facts = SOME [], ...})) = time_state
 | 
| 57378 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 694 |   | save_state ctxt (memory_time, {access_G, xtabs, ffds, freqs, dirty_facts}) =
 | 
| 50311 | 695 | let | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 696 | fun append_entry (name, ((kind, feats, deps), (parents, _))) = | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 697 | cons (kind, name, Graph.Keys.dest parents, feats, deps) | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 698 | |
| 57552 
1072599c43f6
no need for 'mash' subdirectory after removal of Python program
 blanchet parents: 
57546diff
changeset | 699 | val path = state_file () | 
| 57365 | 700 | val dirty_facts' = | 
| 57076 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 701 | (case try OS.FileSys.modTime (Path.implode path) of | 
| 
3d4b172d2209
automatically reload state file when it changes on disk
 blanchet parents: 
57062diff
changeset | 702 | NONE => NONE | 
| 57662 
cffd1d6ae1e5
don't needlessly regenerate entire file when the time stamps are equal
 blanchet parents: 
57661diff
changeset | 703 | | SOME disk_time => if Time.<= (disk_time, memory_time) then dirty_facts else NONE) | 
| 50311 | 704 | val (banner, entries) = | 
| 57365 | 705 | (case dirty_facts' of | 
| 55286 | 706 | SOME names => (NONE, fold (append_entry o Graph.get_entry access_G) names []) | 
| 707 | | NONE => (SOME (version ^ "\n"), Graph.fold append_entry access_G [])) | |
| 50311 | 708 | in | 
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 709 | (case banner of SOME s => File.write path s | NONE => (); | 
| 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 710 | entries |> chunk_list 500 |> List.app (File.append path o implode o map str_of_entry)) | 
| 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 711 | handle IO.Io _ => (); | 
| 50311 | 712 | trace_msg ctxt (fn () => | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 713 |         "Saved fact graph (" ^ graph_info access_G ^
 | 
| 57365 | 714 | (case dirty_facts of | 
| 715 | SOME dirty_facts => "; " ^ string_of_int (length dirty_facts) ^ " dirty fact(s)" | |
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 716 | | _ => "") ^ ")"); | 
| 57378 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 717 | (Time.now (), | 
| 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 718 |        {access_G = access_G, xtabs = xtabs, ffds = ffds, freqs = freqs, dirty_facts = SOME []})
 | 
| 50311 | 719 | end | 
| 720 | ||
| 57365 | 721 | val global_state = Synchronized.var "Sledgehammer_MaSh.global_state" (Time.zeroTime, empty_state) | 
| 50311 | 722 | |
| 723 | in | |
| 724 | ||
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 725 | fun map_state ctxt f = | 
| 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 726 | Synchronized.change global_state (load_state ctxt ##> f #> save_state ctxt) | 
| 53095 | 727 | handle FILE_VERSION_TOO_NEW () => () | 
| 50311 | 728 | |
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 729 | fun peek_state ctxt = | 
| 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 730 | Synchronized.change_result global_state (perhaps (try (load_state ctxt)) #> `snd) | 
| 50311 | 731 | |
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 732 | fun clear_state () = | 
| 57552 
1072599c43f6
no need for 'mash' subdirectory after removal of Python program
 blanchet parents: 
57546diff
changeset | 733 | Synchronized.change global_state (fn _ => (remove_state_file (); (Time.zeroTime, empty_state))) | 
| 50311 | 734 | |
| 735 | end | |
| 736 | ||
| 737 | ||
| 738 | (*** Isabelle helpers ***) | |
| 739 | ||
| 50722 | 740 | val local_prefix = "local" ^ Long_Name.separator | 
| 48378 | 741 | |
| 50722 | 742 | fun elided_backquote_thm threshold th = | 
| 57006 | 743 | elide_string threshold (backquote_thm (Proof_Context.init_global (Thm.theory_of_thm th)) th) | 
| 48378 | 744 | |
| 51181 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 745 | val thy_name_of_thm = Context.theory_name o Thm.theory_of_thm | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 746 | |
| 50624 
4d0997abce79
improved thm order hack, in case the default names are overridden
 blanchet parents: 
50623diff
changeset | 747 | fun nickname_of_thm th = | 
| 48394 
82fc8c956cdc
fixed various issues with MaSh's file handling + tune output + generate local facts again + handle nameless facts gracefully
 blanchet parents: 
48392diff
changeset | 748 | if Thm.has_name_hint th then | 
| 
82fc8c956cdc
fixed various issues with MaSh's file handling + tune output + generate local facts again + handle nameless facts gracefully
 blanchet parents: 
48392diff
changeset | 749 | let val hint = Thm.get_name_hint th in | 
| 50722 | 750 | (* There must be a better way to detect local facts. *) | 
| 55286 | 751 | (case try (unprefix local_prefix) hint of | 
| 48394 
82fc8c956cdc
fixed various issues with MaSh's file handling + tune output + generate local facts again + handle nameless facts gracefully
 blanchet parents: 
48392diff
changeset | 752 | SOME suf => | 
| 55286 | 753 | thy_name_of_thm th ^ Long_Name.separator ^ suf ^ Long_Name.separator ^ | 
| 754 | elided_backquote_thm 50 th | |
| 755 | | NONE => hint) | |
| 48394 
82fc8c956cdc
fixed various issues with MaSh's file handling + tune output + generate local facts again + handle nameless facts gracefully
 blanchet parents: 
48392diff
changeset | 756 | end | 
| 
82fc8c956cdc
fixed various issues with MaSh's file handling + tune output + generate local facts again + handle nameless facts gracefully
 blanchet parents: 
48392diff
changeset | 757 | else | 
| 50722 | 758 | elided_backquote_thm 200 th | 
| 48378 | 759 | |
| 51134 | 760 | fun find_suggested_facts ctxt facts = | 
| 48330 | 761 | let | 
| 51134 | 762 | fun add (fact as (_, th)) = Symtab.default (nickname_of_thm th, fact) | 
| 763 | val tab = fold add facts Symtab.empty | |
| 764 | fun lookup nick = | |
| 765 | Symtab.lookup tab nick | |
| 57006 | 766 | |> tap (fn NONE => trace_msg ctxt (fn () => "Cannot find " ^ quote nick) | _ => ()) | 
| 51134 | 767 | in map_filter lookup end | 
| 48311 | 768 | |
| 57404 
a68ae60c1504
got rid of hard-coded weights, now that we have TFIDF
 blanchet parents: 
57403diff
changeset | 769 | fun free_feature_of s = "f" ^ s | 
| 
a68ae60c1504
got rid of hard-coded weights, now that we have TFIDF
 blanchet parents: 
57403diff
changeset | 770 | fun thy_feature_of s = "y" ^ s | 
| 
a68ae60c1504
got rid of hard-coded weights, now that we have TFIDF
 blanchet parents: 
57403diff
changeset | 771 | fun type_feature_of s = "t" ^ s | 
| 
a68ae60c1504
got rid of hard-coded weights, now that we have TFIDF
 blanchet parents: 
57403diff
changeset | 772 | fun class_feature_of s = "s" ^ s | 
| 
a68ae60c1504
got rid of hard-coded weights, now that we have TFIDF
 blanchet parents: 
57403diff
changeset | 773 | val local_feature = "local" | 
| 48251 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 774 | |
| 51135 | 775 | fun crude_theory_ord p = | 
| 50722 | 776 | if Theory.subthy p then | 
| 777 | if Theory.eq_thy p then EQUAL else LESS | |
| 48324 | 778 | else if Theory.subthy (swap p) then | 
| 779 | GREATER | |
| 55286 | 780 | else | 
| 59058 
a78612c67ec0
renamed "pairself" to "apply2", in accordance to @{apply 2};
 wenzelm parents: 
58919diff
changeset | 781 | (case int_ord (apply2 (length o Theory.ancestors_of) p) of | 
| 
a78612c67ec0
renamed "pairself" to "apply2", in accordance to @{apply 2};
 wenzelm parents: 
58919diff
changeset | 782 | EQUAL => string_ord (apply2 Context.theory_name p) | 
| 55286 | 783 | | order => order) | 
| 48324 | 784 | |
| 51135 | 785 | fun crude_thm_ord p = | 
| 59582 | 786 | (case crude_theory_ord (apply2 Thm.theory_of_thm p) of | 
| 50359 
da395f0e7dea
tweaked order of theorems to avoid forward dependencies (MaSh)
 blanchet parents: 
50357diff
changeset | 787 | EQUAL => | 
| 57039 | 788 | (* The hack below is necessary because of odd dependencies that are not reflected in the theory | 
| 789 | comparison. *) | |
| 59058 
a78612c67ec0
renamed "pairself" to "apply2", in accordance to @{apply 2};
 wenzelm parents: 
58919diff
changeset | 790 | let val q = apply2 nickname_of_thm p in | 
| 50624 
4d0997abce79
improved thm order hack, in case the default names are overridden
 blanchet parents: 
50623diff
changeset | 791 | (* Hack to put "xxx_def" before "xxxI" and "xxxE" *) | 
| 59058 
a78612c67ec0
renamed "pairself" to "apply2", in accordance to @{apply 2};
 wenzelm parents: 
58919diff
changeset | 792 | (case bool_ord (apply2 (String.isSuffix "_def") (swap q)) of | 
| 50624 
4d0997abce79
improved thm order hack, in case the default names are overridden
 blanchet parents: 
50623diff
changeset | 793 | EQUAL => string_ord q | 
| 55286 | 794 | | ord => ord) | 
| 50624 
4d0997abce79
improved thm order hack, in case the default names are overridden
 blanchet parents: 
50623diff
changeset | 795 | end | 
| 55286 | 796 | | ord => ord) | 
| 48324 | 797 | |
| 59582 | 798 | val thm_less_eq = Theory.subthy o apply2 Thm.theory_of_thm | 
| 51136 | 799 | fun thm_less p = thm_less_eq p andalso not (thm_less_eq (swap p)) | 
| 800 | ||
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 801 | val freezeT = Type.legacy_freeze_type | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 802 | |
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 803 | fun freeze (t $ u) = freeze t $ freeze u | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 804 | | freeze (Abs (s, T, t)) = Abs (s, freezeT T, freeze t) | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 805 | | freeze (Var ((s, _), T)) = Free (s, freezeT T) | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 806 | | freeze (Const (s, T)) = Const (s, freezeT T) | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 807 | | freeze (Free (s, T)) = Free (s, freezeT T) | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 808 | | freeze t = t | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 809 | |
| 59621 
291934bac95e
Thm.cterm_of and Thm.ctyp_of operate on local context;
 wenzelm parents: 
59582diff
changeset | 810 | fun goal_of_thm thy = Thm.prop_of #> freeze #> Thm.global_cterm_of thy #> Goal.init | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 811 | |
| 54141 
f57f8e7a879f
generate a comment storing the goal nickname in "learn_prover"
 blanchet parents: 
54140diff
changeset | 812 | fun run_prover_for_mash ctxt params prover goal_name facts goal = | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 813 | let | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 814 | val problem = | 
| 54141 
f57f8e7a879f
generate a comment storing the goal nickname in "learn_prover"
 blanchet parents: 
54140diff
changeset | 815 |       {comment = "Goal: " ^ goal_name, state = Proof.init ctxt, goal = goal, subgoal = 1,
 | 
| 
f57f8e7a879f
generate a comment storing the goal nickname in "learn_prover"
 blanchet parents: 
54140diff
changeset | 816 |        subgoal_count = 1, factss = [("", facts)]}
 | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 817 | in | 
| 57735 
056a55b44ec7
eliminated Sledgehammer's "min" subcommand (and lots of complications in the code)
 blanchet parents: 
57662diff
changeset | 818 | get_minimizing_prover ctxt MaSh (K ()) prover params problem | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 819 | end | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 820 | |
| 48326 | 821 | val bad_types = [@{type_name prop}, @{type_name bool}, @{type_name fun}]
 | 
| 822 | ||
| 53086 
15fe0ca088b3
MaSh tweaking: shorter names + killed (broken) SNoW
 blanchet parents: 
53085diff
changeset | 823 | val pat_tvar_prefix = "_" | 
| 
15fe0ca088b3
MaSh tweaking: shorter names + killed (broken) SNoW
 blanchet parents: 
53085diff
changeset | 824 | val pat_var_prefix = "_" | 
| 
15fe0ca088b3
MaSh tweaking: shorter names + killed (broken) SNoW
 blanchet parents: 
53085diff
changeset | 825 | |
| 53089 | 826 | (* try "Long_Name.base_name" for shorter names *) | 
| 57055 
df3a26987a8d
reverted '|' features in MaSh -- these sounded like a good idea but never really worked
 blanchet parents: 
57052diff
changeset | 827 | fun massage_long_name s = s | 
| 53083 
019ecbb18e3f
generate patterns for variables as well in MaSh (cf. HOL(y)Hammer)
 blanchet parents: 
53082diff
changeset | 828 | |
| 57006 | 829 | val crude_str_of_sort = space_implode ":" o map massage_long_name o subtract (op =) @{sort type}
 | 
| 53086 
15fe0ca088b3
MaSh tweaking: shorter names + killed (broken) SNoW
 blanchet parents: 
53085diff
changeset | 830 | |
| 
15fe0ca088b3
MaSh tweaking: shorter names + killed (broken) SNoW
 blanchet parents: 
53085diff
changeset | 831 | fun crude_str_of_typ (Type (s, [])) = massage_long_name s | 
| 57006 | 832 | | crude_str_of_typ (Type (s, Ts)) = massage_long_name s ^ implode (map crude_str_of_typ Ts) | 
| 53083 
019ecbb18e3f
generate patterns for variables as well in MaSh (cf. HOL(y)Hammer)
 blanchet parents: 
53082diff
changeset | 833 | | crude_str_of_typ (TFree (_, S)) = crude_str_of_sort S | 
| 
019ecbb18e3f
generate patterns for variables as well in MaSh (cf. HOL(y)Hammer)
 blanchet parents: 
53082diff
changeset | 834 | | crude_str_of_typ (TVar (_, S)) = crude_str_of_sort S | 
| 
019ecbb18e3f
generate patterns for variables as well in MaSh (cf. HOL(y)Hammer)
 blanchet parents: 
53082diff
changeset | 835 | |
| 53128 | 836 | fun maybe_singleton_str _ "" = [] | 
| 837 | | maybe_singleton_str pref s = [pref ^ s] | |
| 838 | ||
| 53148 
c898409d8630
fixed subtle bug with "take" + thread overlord through
 blanchet parents: 
53142diff
changeset | 839 | val max_pat_breadth = 10 (* FUDGE *) | 
| 50585 | 840 | |
| 57406 | 841 | fun term_features_of ctxt thy_name term_max_depth type_max_depth ts = | 
| 48251 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 842 | let | 
| 50392 
190053ee24ed
expand type classes into their ancestors for MaSh
 blanchet parents: 
50391diff
changeset | 843 | val thy = Proof_Context.theory_of ctxt | 
| 53082 | 844 | |
| 50393 | 845 | val fixes = map snd (Variable.dest_fixes ctxt) | 
| 50392 
190053ee24ed
expand type classes into their ancestors for MaSh
 blanchet parents: 
50391diff
changeset | 846 | val classes = Sign.classes_of thy | 
| 53082 | 847 | |
| 48304 | 848 |     fun add_classes @{sort type} = I
 | 
| 50392 
190053ee24ed
expand type classes into their ancestors for MaSh
 blanchet parents: 
50391diff
changeset | 849 | | add_classes S = | 
| 
190053ee24ed
expand type classes into their ancestors for MaSh
 blanchet parents: 
50391diff
changeset | 850 | fold (`(Sorts.super_classes classes) | 
| 57006 | 851 | #> swap #> op :: | 
| 852 |           #> subtract (op =) @{sort type} #> map massage_long_name
 | |
| 853 | #> map class_feature_of | |
| 57404 
a68ae60c1504
got rid of hard-coded weights, now that we have TFIDF
 blanchet parents: 
57403diff
changeset | 854 | #> union (op =)) S | 
| 53082 | 855 | |
| 856 | fun pattify_type 0 _ = [] | |
| 857 | | pattify_type _ (Type (s, [])) = | |
| 53086 
15fe0ca088b3
MaSh tweaking: shorter names + killed (broken) SNoW
 blanchet parents: 
53085diff
changeset | 858 | if member (op =) bad_types s then [] else [massage_long_name s] | 
| 53082 | 859 | | pattify_type depth (Type (s, U :: Ts)) = | 
| 860 | let | |
| 861 | val T = Type (s, Ts) | |
| 57055 
df3a26987a8d
reverted '|' features in MaSh -- these sounded like a good idea but never really worked
 blanchet parents: 
57052diff
changeset | 862 | val ps = take max_pat_breadth (pattify_type depth T) | 
| 
df3a26987a8d
reverted '|' features in MaSh -- these sounded like a good idea but never really worked
 blanchet parents: 
57052diff
changeset | 863 |           val qs = take max_pat_breadth ("" :: pattify_type (depth - 1) U)
 | 
| 57006 | 864 | in | 
| 865 |           map_product (fn p => fn "" => p | q => p ^ "(" ^ q ^ ")") ps qs
 | |
| 866 | end | |
| 57552 
1072599c43f6
no need for 'mash' subdirectory after removal of Python program
 blanchet parents: 
57546diff
changeset | 867 | | pattify_type _ (TFree (_, S)) = maybe_singleton_str pat_tvar_prefix (crude_str_of_sort S) | 
| 
1072599c43f6
no need for 'mash' subdirectory after removal of Python program
 blanchet parents: 
57546diff
changeset | 868 | | pattify_type _ (TVar (_, S)) = maybe_singleton_str pat_tvar_prefix (crude_str_of_sort S) | 
| 57006 | 869 | |
| 53082 | 870 | fun add_type_pat depth T = | 
| 57404 
a68ae60c1504
got rid of hard-coded weights, now that we have TFIDF
 blanchet parents: 
57403diff
changeset | 871 | union (op =) (map type_feature_of (pattify_type depth T)) | 
| 57006 | 872 | |
| 53082 | 873 | fun add_type_pats 0 _ = I | 
| 57404 
a68ae60c1504
got rid of hard-coded weights, now that we have TFIDF
 blanchet parents: 
57403diff
changeset | 874 | | add_type_pats depth t = add_type_pat depth t #> add_type_pats (depth - 1) t | 
| 57006 | 875 | |
| 53083 
019ecbb18e3f
generate patterns for variables as well in MaSh (cf. HOL(y)Hammer)
 blanchet parents: 
53082diff
changeset | 876 | fun add_type T = | 
| 
019ecbb18e3f
generate patterns for variables as well in MaSh (cf. HOL(y)Hammer)
 blanchet parents: 
53082diff
changeset | 877 | add_type_pats type_max_depth T | 
| 53156 | 878 | #> fold_atyps_sorts (add_classes o snd) T | 
| 57006 | 879 | |
| 53084 | 880 | fun add_subtypes (T as Type (_, Ts)) = add_type T #> fold add_subtypes Ts | 
| 881 | | add_subtypes T = add_type T | |
| 53082 | 882 | |
| 57055 
df3a26987a8d
reverted '|' features in MaSh -- these sounded like a good idea but never really worked
 blanchet parents: 
57052diff
changeset | 883 | fun pattify_term _ 0 _ = [] | 
| 
df3a26987a8d
reverted '|' features in MaSh -- these sounded like a good idea but never really worked
 blanchet parents: 
57052diff
changeset | 884 | | pattify_term _ _ (Const (s, _)) = | 
| 57404 
a68ae60c1504
got rid of hard-coded weights, now that we have TFIDF
 blanchet parents: 
57403diff
changeset | 885 | if is_widely_irrelevant_const s then [] else [massage_long_name s] | 
| 54089 
b13f6731f873
use same relevance filter for ATP and SMT solvers -- attempting to filter out certain ground instances of polymorphic symbols like + and 0 has unexpected side-effects that lead to incompletenesses (relevant facts not being selected)
 blanchet parents: 
54085diff
changeset | 886 | | pattify_term _ _ (Free (s, T)) = | 
| 53128 | 887 | maybe_singleton_str pat_var_prefix (crude_str_of_typ T) | 
| 53090 
1426c97311f2
treat frees as both consts and vars, for more hits
 blanchet parents: 
53089diff
changeset | 888 | |> (if member (op =) fixes s then | 
| 57404 
a68ae60c1504
got rid of hard-coded weights, now that we have TFIDF
 blanchet parents: 
57403diff
changeset | 889 | cons (free_feature_of (massage_long_name (thy_name ^ Long_Name.separator ^ s))) | 
| 53090 
1426c97311f2
treat frees as both consts and vars, for more hits
 blanchet parents: 
53089diff
changeset | 890 | else | 
| 
1426c97311f2
treat frees as both consts and vars, for more hits
 blanchet parents: 
53089diff
changeset | 891 | I) | 
| 57404 
a68ae60c1504
got rid of hard-coded weights, now that we have TFIDF
 blanchet parents: 
57403diff
changeset | 892 | | pattify_term _ _ (Var (_, T)) = maybe_singleton_str pat_var_prefix (crude_str_of_typ T) | 
| 54089 
b13f6731f873
use same relevance filter for ATP and SMT solvers -- attempting to filter out certain ground instances of polymorphic symbols like + and 0 has unexpected side-effects that lead to incompletenesses (relevant facts not being selected)
 blanchet parents: 
54085diff
changeset | 893 | | pattify_term Ts _ (Bound j) = | 
| 54695 | 894 | maybe_singleton_str pat_var_prefix (crude_str_of_typ (nth Ts j)) | 
| 54089 
b13f6731f873
use same relevance filter for ATP and SMT solvers -- attempting to filter out certain ground instances of polymorphic symbols like + and 0 has unexpected side-effects that lead to incompletenesses (relevant facts not being selected)
 blanchet parents: 
54085diff
changeset | 895 | | pattify_term Ts depth (t $ u) = | 
| 50339 
d8dae91f3107
MaSh improvements: deeper patterns + more respect for chained facts
 blanchet parents: 
50338diff
changeset | 896 | let | 
| 57055 
df3a26987a8d
reverted '|' features in MaSh -- these sounded like a good idea but never really worked
 blanchet parents: 
57052diff
changeset | 897 | val ps = take max_pat_breadth (pattify_term Ts depth t) | 
| 57404 
a68ae60c1504
got rid of hard-coded weights, now that we have TFIDF
 blanchet parents: 
57403diff
changeset | 898 |           val qs = take max_pat_breadth ("" :: pattify_term Ts (depth - 1) u)
 | 
| 53130 | 899 | in | 
| 57404 
a68ae60c1504
got rid of hard-coded weights, now that we have TFIDF
 blanchet parents: 
57403diff
changeset | 900 |           map_product (fn p => fn "" => p | q => p ^ "(" ^ q ^ ")") ps qs
 | 
| 53130 | 901 | end | 
| 54089 
b13f6731f873
use same relevance filter for ATP and SMT solvers -- attempting to filter out certain ground instances of polymorphic symbols like + and 0 has unexpected side-effects that lead to incompletenesses (relevant facts not being selected)
 blanchet parents: 
54085diff
changeset | 902 | | pattify_term _ _ _ = [] | 
| 57006 | 903 | |
| 57404 
a68ae60c1504
got rid of hard-coded weights, now that we have TFIDF
 blanchet parents: 
57403diff
changeset | 904 | fun add_term_pat Ts = union (op =) oo pattify_term Ts | 
| 57006 | 905 | |
| 53130 | 906 | fun add_term_pats _ 0 _ = I | 
| 57055 
df3a26987a8d
reverted '|' features in MaSh -- these sounded like a good idea but never really worked
 blanchet parents: 
57052diff
changeset | 907 | | add_term_pats Ts depth t = add_term_pat Ts depth t #> add_term_pats Ts (depth - 1) t | 
| 57006 | 908 | |
| 53130 | 909 | fun add_term Ts = add_term_pats Ts term_max_depth | 
| 57006 | 910 | |
| 53085 | 911 | fun add_subterms Ts t = | 
| 55286 | 912 | (case strip_comb t of | 
| 54089 
b13f6731f873
use same relevance filter for ATP and SMT solvers -- attempting to filter out certain ground instances of polymorphic symbols like + and 0 has unexpected side-effects that lead to incompletenesses (relevant facts not being selected)
 blanchet parents: 
54085diff
changeset | 913 | (Const (s, T), args) => | 
| 
b13f6731f873
use same relevance filter for ATP and SMT solvers -- attempting to filter out certain ground instances of polymorphic symbols like + and 0 has unexpected side-effects that lead to incompletenesses (relevant facts not being selected)
 blanchet parents: 
54085diff
changeset | 914 | (not (is_widely_irrelevant_const s) ? add_term Ts t) | 
| 57404 
a68ae60c1504
got rid of hard-coded weights, now that we have TFIDF
 blanchet parents: 
57403diff
changeset | 915 | #> add_subtypes T #> fold (add_subterms Ts) args | 
| 50857 
80768e28c9ee
better handlig of built-ins -- at the top-level, not in subterms
 blanchet parents: 
50841diff
changeset | 916 | | (head, args) => | 
| 48251 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 917 | (case head of | 
| 53130 | 918 | Free (_, T) => add_term Ts t #> add_subtypes T | 
| 53084 | 919 | | Var (_, T) => add_subtypes T | 
| 53085 | 920 | | Abs (_, T, body) => add_subtypes T #> add_subterms (T :: Ts) body | 
| 48251 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 921 | | _ => I) | 
| 55286 | 922 | #> fold (add_subterms Ts) args) | 
| 57006 | 923 | in | 
| 924 | fold (add_subterms []) ts [] | |
| 925 | end | |
| 48251 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 926 | |
| 53085 | 927 | val term_max_depth = 2 | 
| 53155 | 928 | val type_max_depth = 1 | 
| 48251 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 929 | |
| 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 930 | (* TODO: Generate type classes for types? *) | 
| 57406 | 931 | fun features_of ctxt thy (scope, _) ts = | 
| 50393 | 932 | let val thy_name = Context.theory_name thy in | 
| 933 | thy_feature_of thy_name :: | |
| 57406 | 934 | term_features_of ctxt thy_name term_max_depth type_max_depth ts | 
| 50393 | 935 | |> scope <> Global ? cons local_feature | 
| 936 | end | |
| 48251 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 937 | |
| 57006 | 938 | (* Too many dependencies is a sign that a decision procedure is at work. There is not much to learn | 
| 939 | from such proofs. *) | |
| 50434 
960a3429615c
more MaSh tweaking -- in particular, export the same facts in "MaSh_Export" as are later tried in "MaSh_Eval"
 blanchet parents: 
50412diff
changeset | 940 | val max_dependencies = 20 | 
| 50484 
8ec31bdb9d36
adopt the neutral "prover" terminology for MaSh rather than the ambiguous/wrong ATP terminology (which sometimes excludes SMT solvers)
 blanchet parents: 
50450diff
changeset | 941 | |
| 54125 
420b876ff1e2
if slicing is disabled, don't enforce last slice's "max_facts", but rather the maximum "max_facts"
 blanchet parents: 
54123diff
changeset | 942 | val prover_default_max_facts = 25 | 
| 48251 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 943 | |
| 48438 
3e45c98fe127
distinguish between recursive and nonrecursive definitions + clean up typedef dependencies in MaSh
 blanchet parents: 
48436diff
changeset | 944 | (* "type_definition_xxx" facts are characterized by their use of "CollectI". *) | 
| 50755 | 945 | val typedef_dep = nickname_of_thm @{thm CollectI}
 | 
| 57006 | 946 | (* Mysterious parts of the class machinery create lots of proofs that refer exclusively to | 
| 947 | "someI_ex" (and to some internal constructions). *) | |
| 50755 | 948 | val class_some_dep = nickname_of_thm @{thm someI_ex}
 | 
| 48438 
3e45c98fe127
distinguish between recursive and nonrecursive definitions + clean up typedef dependencies in MaSh
 blanchet parents: 
48436diff
changeset | 949 | |
| 50828 | 950 | val fundef_ths = | 
| 57006 | 951 |   @{thms fundef_ex1_existence fundef_ex1_uniqueness fundef_ex1_iff fundef_default_value}
 | 
| 50828 | 952 | |> map nickname_of_thm | 
| 953 | ||
| 48438 
3e45c98fe127
distinguish between recursive and nonrecursive definitions + clean up typedef dependencies in MaSh
 blanchet parents: 
48436diff
changeset | 954 | (* "Rep_xxx_inject", "Abs_xxx_inverse", etc., are derived using these facts. *) | 
| 
3e45c98fe127
distinguish between recursive and nonrecursive definitions + clean up typedef dependencies in MaSh
 blanchet parents: 
48436diff
changeset | 955 | val typedef_ths = | 
| 57006 | 956 |   @{thms type_definition.Abs_inverse type_definition.Rep_inverse type_definition.Rep
 | 
| 957 | type_definition.Rep_inject type_definition.Abs_inject type_definition.Rep_cases | |
| 958 | type_definition.Abs_cases type_definition.Rep_induct type_definition.Abs_induct | |
| 959 | type_definition.Rep_range type_definition.Abs_image} | |
| 50624 
4d0997abce79
improved thm order hack, in case the default names are overridden
 blanchet parents: 
50623diff
changeset | 960 | |> map nickname_of_thm | 
| 48438 
3e45c98fe127
distinguish between recursive and nonrecursive definitions + clean up typedef dependencies in MaSh
 blanchet parents: 
48436diff
changeset | 961 | |
| 48441 | 962 | fun is_size_def [dep] th = | 
| 55642 
63beb38e9258
adapted to renaming of datatype 'cases' and 'recs' to 'case' and 'rec'
 blanchet parents: 
55286diff
changeset | 963 | (case first_field ".rec" dep of | 
| 57006 | 964 | SOME (pref, _) => | 
| 965 | (case first_field ".size" (nickname_of_thm th) of | |
| 966 | SOME (pref', _) => pref = pref' | |
| 967 | | NONE => false) | |
| 968 | | NONE => false) | |
| 48441 | 969 | | is_size_def _ _ = false | 
| 970 | ||
| 51177 | 971 | fun trim_dependencies deps = | 
| 50755 | 972 | if length deps > max_dependencies then NONE else SOME deps | 
| 48251 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 973 | |
| 50755 | 974 | fun isar_dependencies_of name_tabs th = | 
| 57306 
ff10067b2248
optimized traversal of proof terms by skipping bad apples (e.g. full_exhaustive_int'.pinduct)
 blanchet parents: 
57305diff
changeset | 975 | thms_in_proof max_dependencies (SOME name_tabs) th | 
| 
ff10067b2248
optimized traversal of proof terms by skipping bad apples (e.g. full_exhaustive_int'.pinduct)
 blanchet parents: 
57305diff
changeset | 976 | |> Option.map (fn deps => | 
| 57006 | 977 | if deps = [typedef_dep] orelse deps = [class_some_dep] orelse | 
| 57757 | 978 | exists (member (op =) fundef_ths) deps orelse exists (member (op =) typedef_ths) deps orelse | 
| 979 | is_size_def deps th then | |
| 50755 | 980 | [] | 
| 981 | else | |
| 57306 
ff10067b2248
optimized traversal of proof terms by skipping bad apples (e.g. full_exhaustive_int'.pinduct)
 blanchet parents: 
57305diff
changeset | 982 | deps) | 
| 48404 | 983 | |
| 57006 | 984 | fun prover_dependencies_of ctxt (params as {verbose, max_facts, ...}) prover auto_level facts
 | 
| 985 | name_tabs th = | |
| 55286 | 986 | (case isar_dependencies_of name_tabs th of | 
| 57306 
ff10067b2248
optimized traversal of proof terms by skipping bad apples (e.g. full_exhaustive_int'.pinduct)
 blanchet parents: 
57305diff
changeset | 987 | SOME [] => (false, []) | 
| 
ff10067b2248
optimized traversal of proof terms by skipping bad apples (e.g. full_exhaustive_int'.pinduct)
 blanchet parents: 
57305diff
changeset | 988 | | isar_deps0 => | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 989 | let | 
| 57306 
ff10067b2248
optimized traversal of proof terms by skipping bad apples (e.g. full_exhaustive_int'.pinduct)
 blanchet parents: 
57305diff
changeset | 990 | val isar_deps = these isar_deps0 | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 991 | val thy = Proof_Context.theory_of ctxt | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 992 | val goal = goal_of_thm thy th | 
| 54141 
f57f8e7a879f
generate a comment storing the goal nickname in "learn_prover"
 blanchet parents: 
54140diff
changeset | 993 | val name = nickname_of_thm th | 
| 52196 
2281f33e8da6
redid rac7830871177 to avoid duplicate fixed variable (e.g. lemma "P (a::nat)" proof - have "!!a::int. Q a" sledgehammer [e])
 blanchet parents: 
52125diff
changeset | 994 | val (_, hyp_ts, concl_t) = ATP_Util.strip_subgoal goal 1 ctxt | 
| 51136 | 995 | val facts = facts |> filter (fn (_, th') => thm_less (th', th)) | 
| 57006 | 996 | |
| 51004 
5f2788c38127
distinguish raw and non-raw facts, using raw for 10 000s of facts and non-raw after selection of some hundreds
 blanchet parents: 
51003diff
changeset | 997 | fun nickify ((_, stature), th) = ((nickname_of_thm th, stature), th) | 
| 57006 | 998 | |
| 50624 
4d0997abce79
improved thm order hack, in case the default names are overridden
 blanchet parents: 
50623diff
changeset | 999 | fun is_dep dep (_, th) = nickname_of_thm th = dep | 
| 57006 | 1000 | |
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1001 | fun add_isar_dep facts dep accum = | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1002 | if exists (is_dep dep) accum then | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1003 | accum | 
| 55286 | 1004 | else | 
| 1005 | (case find_first (is_dep dep) facts of | |
| 1006 |             SOME ((_, status), th) => accum @ [(("", status), th)]
 | |
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1007 | | NONE => accum (* should not happen *)) | 
| 57006 | 1008 | |
| 54123 
271a8377656f
remove overloading of "max_facts" -- it already controls the number of facts passed to ATPs for 'learn_prover'
 blanchet parents: 
54115diff
changeset | 1009 | val mepo_facts = | 
| 50484 
8ec31bdb9d36
adopt the neutral "prover" terminology for MaSh rather than the ambiguous/wrong ATP terminology (which sometimes excludes SMT solvers)
 blanchet parents: 
50450diff
changeset | 1010 | facts | 
| 54095 | 1011 | |> mepo_suggested_facts ctxt params (max_facts |> the_default prover_default_max_facts) NONE | 
| 1012 | hyp_ts concl_t | |
| 54123 
271a8377656f
remove overloading of "max_facts" -- it already controls the number of facts passed to ATPs for 'learn_prover'
 blanchet parents: 
54115diff
changeset | 1013 | val facts = | 
| 
271a8377656f
remove overloading of "max_facts" -- it already controls the number of facts passed to ATPs for 'learn_prover'
 blanchet parents: 
54115diff
changeset | 1014 | mepo_facts | 
| 50754 
74a6adcb96ac
also generate queries for goals with too many Isar dependencies
 blanchet parents: 
50751diff
changeset | 1015 | |> fold (add_isar_dep facts) isar_deps | 
| 50624 
4d0997abce79
improved thm order hack, in case the default names are overridden
 blanchet parents: 
50623diff
changeset | 1016 | |> map nickify | 
| 54123 
271a8377656f
remove overloading of "max_facts" -- it already controls the number of facts passed to ATPs for 'learn_prover'
 blanchet parents: 
54115diff
changeset | 1017 | val num_isar_deps = length isar_deps | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1018 | in | 
| 48404 | 1019 | if verbose andalso auto_level = 0 then | 
| 58843 | 1020 |         writeln ("MaSh: " ^ quote prover ^ " on " ^ quote name ^ " with " ^
 | 
| 57017 | 1021 | string_of_int num_isar_deps ^ " + " ^ string_of_int (length facts - num_isar_deps) ^ | 
| 1022 | " facts.") | |
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1023 | else | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1024 | (); | 
| 55286 | 1025 | (case run_prover_for_mash ctxt params prover name facts goal of | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1026 |         {outcome = NONE, used_facts, ...} =>
 | 
| 48404 | 1027 | (if verbose andalso auto_level = 0 then | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1028 | let val num_facts = length used_facts in | 
| 58843 | 1029 |              writeln ("Found proof with " ^ string_of_int num_facts ^ " fact" ^
 | 
| 57017 | 1030 | plural_s num_facts ^ ".") | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1031 | end | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1032 | else | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1033 | (); | 
| 50754 
74a6adcb96ac
also generate queries for goals with too many Isar dependencies
 blanchet parents: 
50751diff
changeset | 1034 | (true, map fst used_facts)) | 
| 55286 | 1035 | | _ => (false, isar_deps)) | 
| 1036 | end) | |
| 48251 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 1037 | |
| 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 1038 | |
| 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 1039 | (*** High-level communication with MaSh ***) | 
| 
6cdcfbddc077
moved most of MaSh exporter code to Sledgehammer
 blanchet parents: 
48249diff
changeset | 1040 | |
| 51182 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1041 | (* In the following functions, chunks are risers w.r.t. "thm_less_eq". *) | 
| 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1042 | |
| 51181 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1043 | fun chunks_and_parents_for chunks th = | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1044 | let | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1045 | fun insert_parent new parents = | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1046 | let val parents = parents |> filter_out (fn p => thm_less_eq (p, new)) in | 
| 57096 
e4074b91b2a6
always remove duplicates in meshing + use weights for Naive Bayes
 blanchet parents: 
57095diff
changeset | 1047 | parents |> forall (fn p => not (thm_less_eq (new, p))) parents ? cons new | 
| 51181 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1048 | end | 
| 57006 | 1049 | |
| 51181 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1050 | fun rechunk seen (rest as th' :: ths) = | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1051 | if thm_less_eq (th', th) then (rev seen, rest) | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1052 | else rechunk (th' :: seen) ths | 
| 57006 | 1053 | |
| 51181 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1054 | fun do_chunk [] accum = accum | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1055 | | do_chunk (chunk as hd_chunk :: _) (chunks, parents) = | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1056 | if thm_less_eq (hd_chunk, th) then | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1057 | (chunk :: chunks, insert_parent hd_chunk parents) | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1058 | else if thm_less_eq (List.last chunk, th) then | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1059 | let val (front, back as hd_back :: _) = rechunk [] chunk in | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1060 | (front :: back :: chunks, insert_parent hd_back parents) | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1061 | end | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1062 | else | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1063 | (chunk :: chunks, parents) | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1064 | in | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1065 | fold_rev do_chunk chunks ([], []) | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1066 | |>> cons [] | 
| 51182 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1067 | ||> map nickname_of_thm | 
| 51181 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1068 | end | 
| 
d0fa18638478
implement (more) accurate computation of parents
 blanchet parents: 
51180diff
changeset | 1069 | |
| 51182 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1070 | fun attach_parents_to_facts _ [] = [] | 
| 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1071 | | attach_parents_to_facts old_facts (facts as (_, th) :: _) = | 
| 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1072 | let | 
| 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1073 | fun do_facts _ [] = [] | 
| 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1074 | | do_facts (_, parents) [fact] = [(parents, fact)] | 
| 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1075 | | do_facts (chunks, parents) | 
| 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1076 | ((fact as (_, th)) :: (facts as (_, th') :: _)) = | 
| 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1077 | let | 
| 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1078 | val chunks = app_hd (cons th) chunks | 
| 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1079 | val chunks_and_parents' = | 
| 57370 | 1080 | if thm_less_eq (th, th') andalso thy_name_of_thm th = thy_name_of_thm th' then | 
| 51182 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1081 | (chunks, [nickname_of_thm th]) | 
| 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1082 | else | 
| 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1083 | chunks_and_parents_for chunks th' | 
| 57006 | 1084 | in | 
| 1085 | (parents, fact) :: do_facts chunks_and_parents' facts | |
| 1086 | end | |
| 51182 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1087 | in | 
| 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1088 | old_facts @ facts | 
| 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1089 | |> do_facts (chunks_and_parents_for [[]] th) | 
| 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1090 | |> drop (length old_facts) | 
| 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1091 | end | 
| 51177 | 1092 | |
| 53095 | 1093 | fun maximal_wrt_graph G keys = | 
| 1094 | let | |
| 1095 | val tab = Symtab.empty |> fold (fn name => Symtab.default (name, ())) keys | |
| 57006 | 1096 | |
| 57012 | 1097 | fun insert_new seen name = not (Symtab.defined seen name) ? insert (op =) name | 
| 57006 | 1098 | |
| 53095 | 1099 | fun num_keys keys = Graph.Keys.fold (K (Integer.add 1)) keys 0 | 
| 57006 | 1100 | |
| 53095 | 1101 | fun find_maxes _ (maxs, []) = map snd maxs | 
| 1102 | | find_maxes seen (maxs, new :: news) = | |
| 57012 | 1103 | find_maxes (seen |> num_keys (Graph.imm_succs G new) > 1 ? Symtab.default (new, ())) | 
| 1104 | (if Symtab.defined tab new then | |
| 1105 | let | |
| 1106 | val newp = Graph.all_preds G [new] | |
| 1107 | fun is_ancestor x yp = member (op =) yp x | |
| 1108 | val maxs = maxs |> filter (fn (_, max) => not (is_ancestor max newp)) | |
| 1109 | in | |
| 1110 | if exists (is_ancestor new o fst) maxs then (maxs, news) | |
| 1111 | else ((newp, new) :: filter_out (fn (_, max) => is_ancestor max newp) maxs, news) | |
| 1112 | end | |
| 1113 | else | |
| 1114 | (maxs, Graph.Keys.fold (insert_new seen) (Graph.imm_preds G new) news)) | |
| 57006 | 1115 | in | 
| 1116 | find_maxes Symtab.empty ([], Graph.maximals G) | |
| 1117 | end | |
| 53095 | 1118 | |
| 59486 
2025a17bb20f
careful about visibility of facts that have the same 'theory' in optimization
 blanchet parents: 
59477diff
changeset | 1119 | fun strict_subthy thyp = Theory.subthy thyp andalso not (Theory.subthy (swap thyp)) | 
| 
2025a17bb20f
careful about visibility of facts that have the same 'theory' in optimization
 blanchet parents: 
59477diff
changeset | 1120 | |
| 57460 | 1121 | fun maximal_wrt_access_graph _ [] = [] | 
| 1122 | | maximal_wrt_access_graph access_G ((fact as (_, th)) :: facts) = | |
| 59582 | 1123 | let val thy = Thm.theory_of_thm th in | 
| 1124 | fact :: filter_out (fn (_, th') => strict_subthy (Thm.theory_of_thm th', thy)) facts | |
| 57460 | 1125 | |> map (nickname_of_thm o snd) | 
| 1126 | |> maximal_wrt_graph access_G | |
| 1127 | end | |
| 53095 | 1128 | |
| 1129 | fun is_fact_in_graph access_G = can (Graph.get_node access_G) o nickname_of_thm | |
| 1130 | ||
| 53197 | 1131 | val chained_feature_factor = 0.5 (* FUDGE *) | 
| 57405 
1f49da059947
reintroduced 'extra features' + only print message in verbose mode
 blanchet parents: 
57404diff
changeset | 1132 | val extra_feature_factor = 0.1 (* FUDGE *) | 
| 
1f49da059947
reintroduced 'extra features' + only print message in verbose mode
 blanchet parents: 
57404diff
changeset | 1133 | val num_extra_feature_facts = 10 (* FUDGE *) | 
| 53095 | 1134 | |
| 57660 | 1135 | val max_proximity_facts = 100 (* FUDGE *) | 
| 53095 | 1136 | |
| 54060 | 1137 | fun find_mash_suggestions ctxt max_facts suggs facts chained raw_unknown = | 
| 1138 | let | |
| 1139 | val inter_fact = inter (eq_snd Thm.eq_thm_prop) | |
| 1140 | val raw_mash = find_suggested_facts ctxt facts suggs | |
| 1141 | val proximate = take max_proximity_facts facts | |
| 1142 | val unknown_chained = inter_fact raw_unknown chained | |
| 1143 | val unknown_proximate = inter_fact raw_unknown proximate | |
| 1144 | val mess = | |
| 1145 | [(0.9 (* FUDGE *), (map (rpair 1.0) unknown_chained, [])), | |
| 1146 | (0.4 (* FUDGE *), (weight_facts_smoothly unknown_proximate, [])), | |
| 1147 | (0.1 (* FUDGE *), (weight_facts_steeply raw_mash, raw_unknown))] | |
| 57007 | 1148 | val unknown = raw_unknown | 
| 1149 | |> fold (subtract (eq_snd Thm.eq_thm_prop)) [unknown_chained, unknown_proximate] | |
| 57006 | 1150 | in | 
| 57150 
f591096a9c94
add option to keep duplicates, for more precise evaluation of relevance filters
 blanchet parents: 
57133diff
changeset | 1151 | (mesh_facts (eq_snd (gen_eq_thm ctxt)) max_facts mess, unknown) | 
| 57006 | 1152 | end | 
| 53095 | 1153 | |
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1154 | fun mash_suggested_facts ctxt thy ({debug, ...} : params) max_suggs hyp_ts concl_t facts =
 | 
| 53095 | 1155 | let | 
| 53559 
3858246c7c8f
when pouring in extra features into the goal, only consider facts from the current theory -- the bottom 10 facts of the last import might be completely unrelated
 blanchet parents: 
53558diff
changeset | 1156 | val thy_name = Context.theory_name thy | 
| 57532 | 1157 | val algorithm = the_mash_algorithm () | 
| 57052 
ea5912e3b008
until naive Bayes supports weights, don't incorporate 'extra' low-weight features
 blanchet parents: 
57039diff
changeset | 1158 | |
| 57460 | 1159 | val facts = facts | 
| 59058 
a78612c67ec0
renamed "pairself" to "apply2", in accordance to @{apply 2};
 wenzelm parents: 
58919diff
changeset | 1160 | |> rev_sort_list_prefix (crude_thm_ord o apply2 snd) | 
| 57460 | 1161 | (Int.max (num_extra_feature_facts, max_proximity_facts)) | 
| 1162 | ||
| 1163 | val chained = filter (fn ((_, (scope, _)), _) => scope = Chained) facts | |
| 54085 | 1164 | |
| 53559 
3858246c7c8f
when pouring in extra features into the goal, only consider facts from the current theory -- the bottom 10 facts of the last import might be completely unrelated
 blanchet parents: 
53558diff
changeset | 1165 | fun fact_has_right_theory (_, th) = | 
| 59582 | 1166 | thy_name = Context.theory_name (Thm.theory_of_thm th) | 
| 56995 | 1167 | |
| 53141 
d27e99a6a679
take chained and proximate facts into consideration when computing MaSh features
 blanchet parents: 
53140diff
changeset | 1168 | fun chained_or_extra_features_of factor (((_, stature), th), weight) = | 
| 59582 | 1169 | [Thm.prop_of th] | 
| 1170 | |> features_of ctxt (Thm.theory_of_thm th) stature | |
| 57404 
a68ae60c1504
got rid of hard-coded weights, now that we have TFIDF
 blanchet parents: 
57403diff
changeset | 1171 | |> map (rpair (weight * factor)) | 
| 54085 | 1172 | |
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1173 |     val {access_G, xtabs = ((num_facts, fact_tab), (num_feats, feat_tab)), ffds, freqs, ...} =
 | 
| 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1174 | peek_state ctxt | 
| 57017 | 1175 | |
| 57460 | 1176 | val goal_feats0 = features_of ctxt thy (Local, General) (concl_t :: hyp_ts) | 
| 1177 | val chained_feats = chained | |
| 1178 | |> map (rpair 1.0) | |
| 1179 | |> map (chained_or_extra_features_of chained_feature_factor) | |
| 1180 | |> rpair [] |-> fold (union (eq_fst (op =))) | |
| 1181 | val extra_feats = facts | |
| 1182 | |> take (Int.max (0, num_extra_feature_facts - length chained)) | |
| 1183 | |> filter fact_has_right_theory | |
| 1184 | |> weight_facts_steeply | |
| 1185 | |> map (chained_or_extra_features_of extra_feature_factor) | |
| 1186 | |> rpair [] |-> fold (union (eq_fst (op =))) | |
| 1187 | ||
| 1188 | val goal_feats = | |
| 1189 | fold (union (eq_fst (op =))) [chained_feats, extra_feats] (map (rpair 1.0) goal_feats0) | |
| 59058 
a78612c67ec0
renamed "pairself" to "apply2", in accordance to @{apply 2};
 wenzelm parents: 
58919diff
changeset | 1190 | |> debug ? sort (Real.compare o swap o apply2 snd) | 
| 57460 | 1191 | |
| 1192 | val parents = maximal_wrt_access_graph access_G facts | |
| 1193 | val visible_facts = map_filter (Symtab.lookup fact_tab) (Graph.all_preds access_G parents) | |
| 1194 | ||
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1195 | val suggs = | 
| 57532 | 1196 | if algorithm = MaSh_NB_Ext orelse algorithm = MaSh_kNN_Ext then | 
| 57460 | 1197 | let | 
| 1198 | val learns = | |
| 1199 | Graph.schedule (fn _ => fn (fact, (_, feats, deps)) => (fact, feats, deps)) access_G | |
| 1200 | in | |
| 57532 | 1201 | MaSh.query_external ctxt algorithm max_suggs learns goal_feats | 
| 57460 | 1202 | end | 
| 1203 | else | |
| 1204 | let | |
| 1205 | val int_goal_feats = | |
| 1206 | map_filter (fn (s, w) => Option.map (rpair w) (Symtab.lookup feat_tab s)) goal_feats | |
| 1207 | in | |
| 57532 | 1208 | MaSh.query_internal ctxt algorithm num_facts num_feats ffds freqs visible_facts max_suggs | 
| 57460 | 1209 | goal_feats int_goal_feats | 
| 1210 | end | |
| 57017 | 1211 | |
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1212 | val unknown = filter_out (is_fact_in_graph access_G o snd) facts | 
| 53095 | 1213 | in | 
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1214 | find_mash_suggestions ctxt max_suggs suggs facts chained unknown | 
| 59058 
a78612c67ec0
renamed "pairself" to "apply2", in accordance to @{apply 2};
 wenzelm parents: 
58919diff
changeset | 1215 | |> apply2 (map fact_of_raw_fact) | 
| 53095 | 1216 | end | 
| 1217 | ||
| 58843 | 1218 | fun mash_unlearn () = (clear_state (); writeln "Reset MaSh.") | 
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1219 | |
| 57661 | 1220 | fun learn_wrt_access_graph ctxt (name, parents, feats, deps) | 
| 1221 | (accum as (access_G, (fact_xtab, feat_xtab))) = | |
| 53095 | 1222 | let | 
| 57371 | 1223 | fun maybe_learn_from from (accum as (parents, access_G)) = | 
| 57013 
ed95456499e6
better way to take invisible facts into account than 'island' business
 blanchet parents: 
57012diff
changeset | 1224 | try_graph ctxt "updating graph" accum (fn () => | 
| 57371 | 1225 | (from :: parents, Graph.add_edge_acyclic (from, name) access_G)) | 
| 1226 | ||
| 1227 | val access_G = access_G |> Graph.default_node (name, (Isar_Proof, feats, deps)) | |
| 1228 | val (parents, access_G) = ([], access_G) |> fold maybe_learn_from parents | |
| 1229 | val (deps, _) = ([], access_G) |> fold maybe_learn_from deps | |
| 1230 | ||
| 57661 | 1231 | val fact_xtab = add_to_xtab name fact_xtab | 
| 57371 | 1232 | val feat_xtab = fold maybe_add_to_xtab feats feat_xtab | 
| 57006 | 1233 | in | 
| 57661 | 1234 | (SOME (name, parents, feats, deps), (access_G, (fact_xtab, feat_xtab))) | 
| 57006 | 1235 | end | 
| 57661 | 1236 | handle Symtab.DUP _ => (NONE, accum) (* facts sometimes have the same name, confusingly *) | 
| 53095 | 1237 | |
| 57381 | 1238 | fun relearn_wrt_access_graph ctxt (name, deps) access_G = | 
| 53095 | 1239 | let | 
| 57371 | 1240 | fun maybe_relearn_from from (accum as (parents, access_G)) = | 
| 53095 | 1241 | try_graph ctxt "updating graph" accum (fn () => | 
| 57371 | 1242 | (from :: parents, Graph.add_edge_acyclic (from, name) access_G)) | 
| 1243 | val access_G = | |
| 1244 | access_G |> Graph.map_node name (fn (_, feats, _) => (Automatic_Proof, feats, deps)) | |
| 1245 | val (deps, _) = ([], access_G) |> fold maybe_relearn_from deps | |
| 57006 | 1246 | in | 
| 57381 | 1247 | ((name, deps), access_G) | 
| 57006 | 1248 | end | 
| 53095 | 1249 | |
| 1250 | fun flop_wrt_access_graph name = | |
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1251 | Graph.map_node name (fn (_, feats, deps) => (Isar_Proof_wegen_Prover_Flop, feats, deps)) | 
| 53095 | 1252 | |
| 57273 
01b68f625550
automatically learn MaSh facts also in 'blocking' mode
 blanchet parents: 
57150diff
changeset | 1253 | val learn_timeout_slack = 20.0 | 
| 53095 | 1254 | |
| 1255 | fun launch_thread timeout task = | |
| 1256 | let | |
| 1257 | val hard_timeout = time_mult learn_timeout_slack timeout | |
| 1258 | val birth_time = Time.now () | |
| 1259 | val death_time = Time.+ (birth_time, hard_timeout) | |
| 1260 |     val desc = ("Machine learner for Sledgehammer", "")
 | |
| 57006 | 1261 | in | 
| 59471 
ca459352d8c5
more explicit indication of Async_Manager_Legacy as Proof General legacy;
 wenzelm parents: 
59172diff
changeset | 1262 | Async_Manager_Legacy.thread MaShN birth_time death_time desc task | 
| 57006 | 1263 | end | 
| 53095 | 1264 | |
| 57013 
ed95456499e6
better way to take invisible facts into account than 'island' business
 blanchet parents: 
57012diff
changeset | 1265 | fun learned_proof_name () = | 
| 
ed95456499e6
better way to take invisible facts into account than 'island' business
 blanchet parents: 
57012diff
changeset | 1266 | Date.fmt ".%Y%m%d.%H%M%S." (Date.fromTimeLocal (Time.now ())) ^ serial_string () | 
| 
ed95456499e6
better way to take invisible facts into account than 'island' business
 blanchet parents: 
57012diff
changeset | 1267 | |
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1268 | fun mash_learn_proof ctxt ({timeout, ...} : params) t facts used_ths =
 | 
| 57387 
2b6fe2a48352
reintroduced MaSh hints, this time as persistent creatures
 blanchet parents: 
57386diff
changeset | 1269 | if not (null used_ths) andalso is_mash_enabled () then | 
| 54816 
10d48c2a3e32
made timeouts in Sledgehammer not be 'option's -- simplified lots of code
 blanchet parents: 
54799diff
changeset | 1270 | launch_thread timeout (fn () => | 
| 57006 | 1271 | let | 
| 1272 | val thy = Proof_Context.theory_of ctxt | |
| 57406 | 1273 | val feats = features_of ctxt thy (Local, General) [t] | 
| 59058 
a78612c67ec0
renamed "pairself" to "apply2", in accordance to @{apply 2};
 wenzelm parents: 
58919diff
changeset | 1274 | val facts = rev_sort_list_prefix (crude_thm_ord o apply2 snd) 1 facts | 
| 57006 | 1275 | in | 
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1276 | map_state ctxt | 
| 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1277 |           (fn {access_G, xtabs as ((num_facts0, _), _), ffds, freqs, dirty_facts} =>
 | 
| 57371 | 1278 | let | 
| 1279 | val parents = maximal_wrt_access_graph access_G facts | |
| 1280 | val deps = used_ths | |
| 1281 | |> filter (is_fact_in_graph access_G) | |
| 1282 | |> map nickname_of_thm | |
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1283 | |
| 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1284 | val name = learned_proof_name () | 
| 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1285 | val (access_G', xtabs', rev_learns) = | 
| 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1286 | add_node Automatic_Proof name parents feats deps (access_G, xtabs, []) | 
| 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1287 | |
| 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1288 | val (ffds', freqs') = | 
| 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1289 | recompute_ffds_freqs_from_learns (rev rev_learns) xtabs' num_facts0 ffds freqs | 
| 57371 | 1290 | in | 
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1291 |                {access_G = access_G', xtabs = xtabs', ffds = ffds', freqs = freqs',
 | 
| 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1292 | dirty_facts = Option.map (cons name) dirty_facts} | 
| 57371 | 1293 | end); | 
| 57006 | 1294 | (true, "") | 
| 1295 | end) | |
| 53819 
e55d641d0a70
honor MaSh's zero-overhead policy -- no learning if the tool is disabled
 blanchet parents: 
53790diff
changeset | 1296 | else | 
| 
e55d641d0a70
honor MaSh's zero-overhead policy -- no learning if the tool is disabled
 blanchet parents: 
53790diff
changeset | 1297 | () | 
| 53095 | 1298 | |
| 57353 | 1299 | fun sendback sub = Active.sendback_markup [Markup.padding_command] (sledgehammerN ^ " " ^ sub) | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1300 | |
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1301 | val commit_timeout = seconds 30.0 | 
| 48332 
271a4a6af734
optimize parent computation in MaSh + remove temporary files
 blanchet parents: 
48330diff
changeset | 1302 | |
| 50485 
3c6ac2da2f45
merge aliased theorems in MaSh dependencies, modulo symmetry of equality
 blanchet parents: 
50484diff
changeset | 1303 | (* The timeout is understood in a very relaxed fashion. *) | 
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1304 | fun mash_learn_facts ctxt (params as {debug, verbose, ...}) prover auto_level run_prover
 | 
| 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1305 | learn_timeout facts = | 
| 48304 | 1306 | let | 
| 48318 | 1307 | val timer = Timer.startRealTimer () | 
| 54816 
10d48c2a3e32
made timeouts in Sledgehammer not be 'option's -- simplified lots of code
 blanchet parents: 
54799diff
changeset | 1308 | fun next_commit_time () = Time.+ (Timer.checkRealTimer timer, commit_timeout) | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1309 | |
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1310 |     val {access_G, ...} = peek_state ctxt
 | 
| 53095 | 1311 | val is_in_access_G = is_fact_in_graph access_G o snd | 
| 51177 | 1312 | val no_new_facts = forall is_in_access_G facts | 
| 48308 | 1313 | in | 
| 51177 | 1314 | if no_new_facts andalso not run_prover then | 
| 48404 | 1315 | if auto_level < 2 then | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1316 | "No new " ^ (if run_prover then "automatic" else "Isar") ^ " proofs to learn." ^ | 
| 50484 
8ec31bdb9d36
adopt the neutral "prover" terminology for MaSh rather than the ambiguous/wrong ATP terminology (which sometimes excludes SMT solvers)
 blanchet parents: 
50450diff
changeset | 1317 | (if auto_level = 0 andalso not run_prover then | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1318 | "\n\nHint: Try " ^ sendback learn_proverN ^ " to learn from an automatic prover." | 
| 48404 | 1319 | else | 
| 1320 | "") | |
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1321 | else | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1322 | "" | 
| 48308 | 1323 | else | 
| 48304 | 1324 | let | 
| 50735 
6b232d76cbc9
refined class handling, to prevent cycles in fact graph
 blanchet parents: 
50732diff
changeset | 1325 | val name_tabs = build_name_tables nickname_of_thm facts | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1326 | |
| 48439 
67a6bcbd3587
removed MaSh junk arising from primrec definitions
 blanchet parents: 
48438diff
changeset | 1327 | fun deps_of status th = | 
| 57017 | 1328 | if status = Non_Rec_Def orelse status = Rec_Def then | 
| 48439 
67a6bcbd3587
removed MaSh junk arising from primrec definitions
 blanchet parents: 
48438diff
changeset | 1329 | SOME [] | 
| 50484 
8ec31bdb9d36
adopt the neutral "prover" terminology for MaSh rather than the ambiguous/wrong ATP terminology (which sometimes excludes SMT solvers)
 blanchet parents: 
50450diff
changeset | 1330 | else if run_prover then | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1331 | prover_dependencies_of ctxt params prover auto_level facts name_tabs th | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1332 | |> (fn (false, _) => NONE | (true, deps) => trim_dependencies deps) | 
| 48404 | 1333 | else | 
| 50735 
6b232d76cbc9
refined class handling, to prevent cycles in fact graph
 blanchet parents: 
50732diff
changeset | 1334 | isar_dependencies_of name_tabs th | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1335 | |
| 48669 
cdcdb0547f29
remember ATP flops to avoid repeating them too quickly
 blanchet parents: 
48668diff
changeset | 1336 | fun do_commit [] [] [] state = state | 
| 57382 | 1337 | | do_commit learns relearns flops | 
| 1338 |               {access_G, xtabs as ((num_facts0, _), _), ffds, freqs, dirty_facts} =
 | |
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1339 | let | 
| 57378 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 1340 | val was_empty = Graph.is_empty access_G | 
| 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 1341 | |
| 57658 | 1342 | val (learns, (access_G', xtabs')) = | 
| 57381 | 1343 | fold_map (learn_wrt_access_graph ctxt) learns (access_G, xtabs) | 
| 57661 | 1344 | |>> map_filter I | 
| 57658 | 1345 | val (relearns, access_G'') = | 
| 1346 | fold_map (relearn_wrt_access_graph ctxt) relearns access_G' | |
| 57371 | 1347 | |
| 57658 | 1348 | val access_G''' = access_G'' |> fold flop_wrt_access_graph flops | 
| 1349 | val dirty_facts' = | |
| 57365 | 1350 | (case (was_empty, dirty_facts) of | 
| 57062 
eb6777796782
avoid slow inspection of proof terms now that dependencies are stored in 'state'
 blanchet parents: 
57061diff
changeset | 1351 | (false, SOME names) => SOME (map #1 learns @ map #1 relearns @ names) | 
| 55286 | 1352 | | _ => NONE) | 
| 57378 
fe96689f393b
recompute learning data at learning time, not query time
 blanchet parents: 
57377diff
changeset | 1353 | |
| 57382 | 1354 | val (ffds', freqs') = | 
| 1355 | if null relearns then | |
| 1356 | recompute_ffds_freqs_from_learns | |
| 57658 | 1357 | (map (fn (name, _, feats, deps) => (name, feats, deps)) learns) xtabs' num_facts0 | 
| 57382 | 1358 | ffds freqs | 
| 1359 | else | |
| 57658 | 1360 | recompute_ffds_freqs_from_access_G access_G''' xtabs' | 
| 48404 | 1361 | in | 
| 57658 | 1362 |               {access_G = access_G''', xtabs = xtabs', ffds = ffds', freqs = freqs',
 | 
| 1363 | dirty_facts = dirty_facts'} | |
| 48404 | 1364 | end | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1365 | |
| 50631 | 1366 | fun commit last learns relearns flops = | 
| 58843 | 1367 | (if debug andalso auto_level = 0 then writeln "Committing..." else (); | 
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1368 | map_state ctxt (do_commit (rev learns) relearns flops); | 
| 48404 | 1369 | if not last andalso auto_level = 0 then | 
| 50631 | 1370 | let val num_proofs = length learns + length relearns in | 
| 58843 | 1371 |                writeln ("Learned " ^ string_of_int num_proofs ^ " " ^
 | 
| 57017 | 1372 | (if run_prover then "automatic" else "Isar") ^ " proof" ^ | 
| 1373 | plural_s num_proofs ^ " in the last " ^ string_of_time commit_timeout ^ ".") | |
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1374 | end | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1375 | else | 
| 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1376 | ()) | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1377 | |
| 51177 | 1378 | fun learn_new_fact _ (accum as (_, (_, _, true))) = accum | 
| 1379 | | learn_new_fact (parents, ((_, stature as (_, status)), th)) | |
| 57385 | 1380 | (learns, (num_nontrivial, next_commit, _)) = | 
| 48318 | 1381 | let | 
| 50624 
4d0997abce79
improved thm order hack, in case the default names are overridden
 blanchet parents: 
50623diff
changeset | 1382 | val name = nickname_of_thm th | 
| 59582 | 1383 | val feats = features_of ctxt (Thm.theory_of_thm th) stature [Thm.prop_of th] | 
| 57757 | 1384 | val deps = these (deps_of status th) | 
| 57385 | 1385 | val num_nontrivial = num_nontrivial |> not (null deps) ? Integer.add 1 | 
| 50631 | 1386 | val learns = (name, parents, feats, deps) :: learns | 
| 1387 | val (learns, next_commit) = | |
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1388 | if Time.> (Timer.checkRealTimer timer, next_commit) then | 
| 50631 | 1389 | (commit false learns [] []; ([], next_commit_time ())) | 
| 48392 
ca998fa08cd9
added "learn_from_atp" command to MaSh, for patient users
 blanchet parents: 
48390diff
changeset | 1390 | else | 
| 50631 | 1391 | (learns, next_commit) | 
| 54816 
10d48c2a3e32
made timeouts in Sledgehammer not be 'option's -- simplified lots of code
 blanchet parents: 
54799diff
changeset | 1392 | val timed_out = Time.> (Timer.checkRealTimer timer, learn_timeout) | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1393 | in | 
| 57385 | 1394 | (learns, (num_nontrivial, next_commit, timed_out)) | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1395 | end | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1396 | |
| 57385 | 1397 | val (num_new_facts, num_nontrivial) = | 
| 51177 | 1398 | if no_new_facts then | 
| 57385 | 1399 | (0, 0) | 
| 48404 | 1400 | else | 
| 1401 | let | |
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1402 | val new_facts = facts | 
| 59058 
a78612c67ec0
renamed "pairself" to "apply2", in accordance to @{apply 2};
 wenzelm parents: 
58919diff
changeset | 1403 | |> sort (crude_thm_ord o apply2 snd) | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1404 | |> attach_parents_to_facts [] | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1405 | |> filter_out (is_in_access_G o snd) | 
| 57385 | 1406 | val (learns, (num_nontrivial, _, _)) = | 
| 51177 | 1407 | ([], (0, next_commit_time (), false)) | 
| 51182 
962190eab40d
provide two modes for MaSh driver: linearized or real visibility
 blanchet parents: 
51181diff
changeset | 1408 | |> fold learn_new_fact new_facts | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1409 | in | 
| 57385 | 1410 | commit true learns [] []; (length new_facts, num_nontrivial) | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1411 | end | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1412 | |
| 48404 | 1413 | fun relearn_old_fact _ (accum as (_, (_, _, true))) = accum | 
| 57385 | 1414 | | relearn_old_fact ((_, (_, status)), th) | 
| 1415 | ((relearns, flops), (num_nontrivial, next_commit, _)) = | |
| 48404 | 1416 | let | 
| 50624 
4d0997abce79
improved thm order hack, in case the default names are overridden
 blanchet parents: 
50623diff
changeset | 1417 | val name = nickname_of_thm th | 
| 57385 | 1418 | val (num_nontrivial, relearns, flops) = | 
| 55286 | 1419 | (case deps_of status th of | 
| 57385 | 1420 | SOME deps => (num_nontrivial + 1, (name, deps) :: relearns, flops) | 
| 1421 | | NONE => (num_nontrivial, relearns, name :: flops)) | |
| 50631 | 1422 | val (relearns, flops, next_commit) = | 
| 48404 | 1423 | if Time.> (Timer.checkRealTimer timer, next_commit) then | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1424 | (commit false [] relearns flops; ([], [], next_commit_time ())) | 
| 48404 | 1425 | else | 
| 50631 | 1426 | (relearns, flops, next_commit) | 
| 54816 
10d48c2a3e32
made timeouts in Sledgehammer not be 'option's -- simplified lots of code
 blanchet parents: 
54799diff
changeset | 1427 | val timed_out = Time.> (Timer.checkRealTimer timer, learn_timeout) | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1428 | in | 
| 57385 | 1429 | ((relearns, flops), (num_nontrivial, next_commit, timed_out)) | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1430 | end | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1431 | |
| 57385 | 1432 | val num_nontrivial = | 
| 51177 | 1433 | if not run_prover then | 
| 57385 | 1434 | num_nontrivial | 
| 48404 | 1435 | else | 
| 1436 | let | |
| 48668 
5d63c23b4042
remember which MaSh proofs were found using ATPs
 blanchet parents: 
48667diff
changeset | 1437 | val max_isar = 1000 * max_dependencies | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1438 | |
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1439 | fun priority_of th = | 
| 59172 | 1440 | Random.random_range 0 max_isar + | 
| 57062 
eb6777796782
avoid slow inspection of proof terms now that dependencies are stored in 'state'
 blanchet parents: 
57061diff
changeset | 1441 | (case try (Graph.get_node access_G) (nickname_of_thm th) of | 
| 
eb6777796782
avoid slow inspection of proof terms now that dependencies are stored in 'state'
 blanchet parents: 
57061diff
changeset | 1442 | SOME (Isar_Proof, _, deps) => ~100 * length deps | 
| 
eb6777796782
avoid slow inspection of proof terms now that dependencies are stored in 'state'
 blanchet parents: 
57061diff
changeset | 1443 | | SOME (Automatic_Proof, _, _) => 2 * max_isar | 
| 
eb6777796782
avoid slow inspection of proof terms now that dependencies are stored in 'state'
 blanchet parents: 
57061diff
changeset | 1444 | | SOME (Isar_Proof_wegen_Prover_Flop, _, _) => max_isar | 
| 
eb6777796782
avoid slow inspection of proof terms now that dependencies are stored in 'state'
 blanchet parents: 
57061diff
changeset | 1445 | | NONE => 0) | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1446 | |
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1447 | val old_facts = facts | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1448 | |> filter is_in_access_G | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1449 | |> map (`(priority_of o snd)) | 
| 59058 
a78612c67ec0
renamed "pairself" to "apply2", in accordance to @{apply 2};
 wenzelm parents: 
58919diff
changeset | 1450 | |> sort (int_ord o apply2 fst) | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1451 | |> map snd | 
| 57385 | 1452 | val ((relearns, flops), (num_nontrivial, _, _)) = | 
| 1453 | (([], []), (num_nontrivial, next_commit_time (), false)) | |
| 48404 | 1454 | |> fold relearn_old_fact old_facts | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1455 | in | 
| 57385 | 1456 | commit true [] relearns flops; num_nontrivial | 
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1457 | end | 
| 48318 | 1458 | in | 
| 48404 | 1459 | if verbose orelse auto_level < 2 then | 
| 57385 | 1460 | "Learned " ^ string_of_int num_new_facts ^ " fact" ^ plural_s num_new_facts ^ " and " ^ | 
| 1461 | string_of_int num_nontrivial ^ " nontrivial " ^ | |
| 1462 | (if run_prover then "automatic and " else "") ^ "Isar proof" ^ plural_s num_nontrivial ^ | |
| 57006 | 1463 | (if verbose then " in " ^ string_of_time (Timer.checkRealTimer timer) else "") ^ "." | 
| 48319 
340187063d84
use async manager to manage MaSh learners to make sure they get killed cleanly
 blanchet parents: 
48318diff
changeset | 1464 | else | 
| 
340187063d84
use async manager to manage MaSh learners to make sure they get killed cleanly
 blanchet parents: 
48318diff
changeset | 1465 | "" | 
| 48318 | 1466 | end | 
| 48308 | 1467 | end | 
| 48304 | 1468 | |
| 54123 
271a8377656f
remove overloading of "max_facts" -- it already controls the number of facts passed to ATPs for 'learn_prover'
 blanchet parents: 
54115diff
changeset | 1469 | fun mash_learn ctxt (params as {provers, timeout, ...}) fact_override chained run_prover =
 | 
| 48316 
252f45c04042
drastic overhaul of MaSh data structures + fixed a few performance issues
 blanchet parents: 
48315diff
changeset | 1470 | let | 
| 48396 | 1471 | val css = Sledgehammer_Fact.clasimpset_rule_table_of ctxt | 
| 48395 
85a7fb65507a
learning should honor the fact override and the chained facts
 blanchet parents: 
48394diff
changeset | 1472 | val ctxt = ctxt |> Config.put instantiate_inducts false | 
| 58919 | 1473 | val facts = | 
| 1474 |       nearly_all_facts ctxt false fact_override Keyword.empty_keywords css chained [] @{prop True}
 | |
| 59058 
a78612c67ec0
renamed "pairself" to "apply2", in accordance to @{apply 2};
 wenzelm parents: 
58919diff
changeset | 1475 | |> sort (crude_thm_ord o apply2 snd o swap) | 
| 48404 | 1476 | val num_facts = length facts | 
| 1477 | val prover = hd provers | |
| 57006 | 1478 | |
| 50484 
8ec31bdb9d36
adopt the neutral "prover" terminology for MaSh rather than the ambiguous/wrong ATP terminology (which sometimes excludes SMT solvers)
 blanchet parents: 
50450diff
changeset | 1479 | fun learn auto_level run_prover = | 
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1480 | mash_learn_facts ctxt params prover auto_level run_prover one_year facts | 
| 58843 | 1481 | |> writeln | 
| 48316 
252f45c04042
drastic overhaul of MaSh data structures + fixed a few performance issues
 blanchet parents: 
48315diff
changeset | 1482 | in | 
| 50484 
8ec31bdb9d36
adopt the neutral "prover" terminology for MaSh rather than the ambiguous/wrong ATP terminology (which sometimes excludes SMT solvers)
 blanchet parents: 
50450diff
changeset | 1483 | if run_prover then | 
| 58843 | 1484 |       (writeln ("MaShing through " ^ string_of_int num_facts ^ " fact" ^
 | 
| 57017 | 1485 |          plural_s num_facts ^ " for automatic proofs (" ^ quote prover ^ " timeout: " ^
 | 
| 1486 | string_of_time timeout ^ ").\n\nCollecting Isar proofs first..."); | |
| 50340 
72519bf5f135
simplify MaSh term patterns + add missing global facts if there aren't too many
 blanchet parents: 
50339diff
changeset | 1487 | learn 1 false; | 
| 58843 | 1488 | writeln "Now collecting automatic proofs. This may take several hours. You \ | 
| 57017 | 1489 | \can safely stop the learning process at any point."; | 
| 50340 
72519bf5f135
simplify MaSh term patterns + add missing global facts if there aren't too many
 blanchet parents: 
50339diff
changeset | 1490 | learn 0 true) | 
| 
72519bf5f135
simplify MaSh term patterns + add missing global facts if there aren't too many
 blanchet parents: 
50339diff
changeset | 1491 | else | 
| 58843 | 1492 |       (writeln ("MaShing through " ^ string_of_int num_facts ^ " fact" ^
 | 
| 57006 | 1493 | plural_s num_facts ^ " for Isar proofs..."); | 
| 50340 
72519bf5f135
simplify MaSh term patterns + add missing global facts if there aren't too many
 blanchet parents: 
50339diff
changeset | 1494 | learn 0 false) | 
| 48316 
252f45c04042
drastic overhaul of MaSh data structures + fixed a few performance issues
 blanchet parents: 
48315diff
changeset | 1495 | end | 
| 48249 | 1496 | |
| 57555 
f60d70566525
also learn when 'fact_filter =' is set explicitly
 blanchet parents: 
57554diff
changeset | 1497 | fun mash_can_suggest_facts ctxt = not (Graph.is_empty (#access_G (peek_state ctxt))) | 
| 50311 | 1498 | |
| 57274 
0acfdb151e52
more generous formula -- there are lots of duplicates out there
 blanchet parents: 
57273diff
changeset | 1499 | (* Generate more suggestions than requested, because some might be thrown out later for various | 
| 
0acfdb151e52
more generous formula -- there are lots of duplicates out there
 blanchet parents: 
57273diff
changeset | 1500 | reasons (e.g., duplicates). *) | 
| 
0acfdb151e52
more generous formula -- there are lots of duplicates out there
 blanchet parents: 
57273diff
changeset | 1501 | fun generous_max_suggestions max_facts = 3 * max_facts div 2 + 25 | 
| 50383 
4274b25ff4e7
take proximity into account for MaSh + fix a debilitating bug in feature generation
 blanchet parents: 
50382diff
changeset | 1502 | |
| 50814 | 1503 | val mepo_weight = 0.5 | 
| 1504 | val mash_weight = 0.5 | |
| 1505 | ||
| 53152 
cbd3c7c48d2c
learn new facts on query if there aren't too many of them in MaSh
 blanchet parents: 
53150diff
changeset | 1506 | val max_facts_to_learn_before_query = 100 | 
| 
cbd3c7c48d2c
learn new facts on query if there aren't too many of them in MaSh
 blanchet parents: 
53150diff
changeset | 1507 | |
| 57005 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1508 | (* The threshold should be large enough so that MaSh does not get activated for Auto Sledgehammer | 
| 
33f3d2ea803d
store all MaSh data on the Isabelle side, in preparation for replacing 'mash.py' with ML solution
 blanchet parents: 
56995diff
changeset | 1509 | and Try. *) | 
| 48318 | 1510 | val min_secs_for_learning = 15 | 
| 1511 | ||
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1512 | fun relevant_facts ctxt (params as {verbose, learn, fact_filter, timeout, ...}) prover
 | 
| 57405 
1f49da059947
reintroduced 'extra features' + only print message in verbose mode
 blanchet parents: 
57404diff
changeset | 1513 |     max_facts ({add, only, ...} : fact_override) hyp_ts concl_t facts =
 | 
| 48314 
ee33ba3c0e05
added option to control which fact filter is used
 blanchet parents: 
48313diff
changeset | 1514 | if not (subset (op =) (the_list fact_filter, fact_filters)) then | 
| 
ee33ba3c0e05
added option to control which fact filter is used
 blanchet parents: 
48313diff
changeset | 1515 |     error ("Unknown fact filter: " ^ quote (the fact_filter) ^ ".")
 | 
| 
ee33ba3c0e05
added option to control which fact filter is used
 blanchet parents: 
48313diff
changeset | 1516 | else if only then | 
| 57150 
f591096a9c94
add option to keep duplicates, for more precise evaluation of relevance filters
 blanchet parents: 
57133diff
changeset | 1517 |     [("", map fact_of_raw_fact facts)]
 | 
| 48321 | 1518 | else if max_facts <= 0 orelse null facts then | 
| 51010 | 1519 |     [("", [])]
 | 
| 48288 
255c6e1fd505
rationalize relevance filter, slowing moving code from Iter to MaSh
 blanchet parents: 
48251diff
changeset | 1520 | else | 
| 
255c6e1fd505
rationalize relevance filter, slowing moving code from Iter to MaSh
 blanchet parents: 
48251diff
changeset | 1521 | let | 
| 57407 
140e16bc2a40
use right theory name for theorems in evaluation driver
 blanchet parents: 
57406diff
changeset | 1522 | val thy = Proof_Context.theory_of ctxt | 
| 
140e16bc2a40
use right theory name for theorems in evaluation driver
 blanchet parents: 
57406diff
changeset | 1523 | |
| 57555 
f60d70566525
also learn when 'fact_filter =' is set explicitly
 blanchet parents: 
57554diff
changeset | 1524 | fun maybe_launch_thread exact min_num_facts_to_learn = | 
| 59471 
ca459352d8c5
more explicit indication of Async_Manager_Legacy as Proof General legacy;
 wenzelm parents: 
59172diff
changeset | 1525 | if not (Async_Manager_Legacy.has_running_threads MaShN) andalso | 
| 54816 
10d48c2a3e32
made timeouts in Sledgehammer not be 'option's -- simplified lots of code
 blanchet parents: 
54799diff
changeset | 1526 | Time.toSeconds timeout >= min_secs_for_learning then | 
| 
10d48c2a3e32
made timeouts in Sledgehammer not be 'option's -- simplified lots of code
 blanchet parents: 
54799diff
changeset | 1527 | let val timeout = time_mult learn_timeout_slack timeout in | 
| 57405 
1f49da059947
reintroduced 'extra features' + only print message in verbose mode
 blanchet parents: 
57404diff
changeset | 1528 | (if verbose then | 
| 58843 | 1529 |                writeln ("Started MaShing through " ^
 | 
| 57555 
f60d70566525
also learn when 'fact_filter =' is set explicitly
 blanchet parents: 
57554diff
changeset | 1530 | (if exact then "" else "at least ") ^ string_of_int min_num_facts_to_learn ^ | 
| 
f60d70566525
also learn when 'fact_filter =' is set explicitly
 blanchet parents: 
57554diff
changeset | 1531 | " fact" ^ plural_s min_num_facts_to_learn ^ " in the background.") | 
| 57405 
1f49da059947
reintroduced 'extra features' + only print message in verbose mode
 blanchet parents: 
57404diff
changeset | 1532 | else | 
| 
1f49da059947
reintroduced 'extra features' + only print message in verbose mode
 blanchet parents: 
57404diff
changeset | 1533 | ()); | 
| 54816 
10d48c2a3e32
made timeouts in Sledgehammer not be 'option's -- simplified lots of code
 blanchet parents: 
54799diff
changeset | 1534 | launch_thread timeout | 
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1535 | (fn () => (true, mash_learn_facts ctxt params prover 2 false timeout facts)) | 
| 48319 
340187063d84
use async manager to manage MaSh learners to make sure they get killed cleanly
 blanchet parents: 
48318diff
changeset | 1536 | end | 
| 48318 | 1537 | else | 
| 1538 | () | |
| 57018 | 1539 | |
| 57555 
f60d70566525
also learn when 'fact_filter =' is set explicitly
 blanchet parents: 
57554diff
changeset | 1540 | fun please_learn () = | 
| 
f60d70566525
also learn when 'fact_filter =' is set explicitly
 blanchet parents: 
57554diff
changeset | 1541 | let | 
| 
f60d70566525
also learn when 'fact_filter =' is set explicitly
 blanchet parents: 
57554diff
changeset | 1542 |           val {access_G, xtabs = ((num_facts0, _), _), ...} = peek_state ctxt
 | 
| 
f60d70566525
also learn when 'fact_filter =' is set explicitly
 blanchet parents: 
57554diff
changeset | 1543 | val is_in_access_G = is_fact_in_graph access_G o snd | 
| 
f60d70566525
also learn when 'fact_filter =' is set explicitly
 blanchet parents: 
57554diff
changeset | 1544 | val min_num_facts_to_learn = length facts - num_facts0 | 
| 
f60d70566525
also learn when 'fact_filter =' is set explicitly
 blanchet parents: 
57554diff
changeset | 1545 | in | 
| 
f60d70566525
also learn when 'fact_filter =' is set explicitly
 blanchet parents: 
57554diff
changeset | 1546 | if min_num_facts_to_learn <= max_facts_to_learn_before_query then | 
| 
f60d70566525
also learn when 'fact_filter =' is set explicitly
 blanchet parents: 
57554diff
changeset | 1547 | (case length (filter_out is_in_access_G facts) of | 
| 
f60d70566525
also learn when 'fact_filter =' is set explicitly
 blanchet parents: 
57554diff
changeset | 1548 | 0 => () | 
| 
f60d70566525
also learn when 'fact_filter =' is set explicitly
 blanchet parents: 
57554diff
changeset | 1549 | | num_facts_to_learn => | 
| 
f60d70566525
also learn when 'fact_filter =' is set explicitly
 blanchet parents: 
57554diff
changeset | 1550 | if num_facts_to_learn <= max_facts_to_learn_before_query then | 
| 
f60d70566525
also learn when 'fact_filter =' is set explicitly
 blanchet parents: 
57554diff
changeset | 1551 | mash_learn_facts ctxt params prover 2 false timeout facts | 
| 58843 | 1552 | |> (fn "" => () | s => writeln (MaShN ^ ": " ^ s)) | 
| 57555 
f60d70566525
also learn when 'fact_filter =' is set explicitly
 blanchet parents: 
57554diff
changeset | 1553 | else | 
| 
f60d70566525
also learn when 'fact_filter =' is set explicitly
 blanchet parents: 
57554diff
changeset | 1554 | maybe_launch_thread true num_facts_to_learn) | 
| 
f60d70566525
also learn when 'fact_filter =' is set explicitly
 blanchet parents: 
57554diff
changeset | 1555 | else | 
| 
f60d70566525
also learn when 'fact_filter =' is set explicitly
 blanchet parents: 
57554diff
changeset | 1556 | maybe_launch_thread false min_num_facts_to_learn | 
| 
f60d70566525
also learn when 'fact_filter =' is set explicitly
 blanchet parents: 
57554diff
changeset | 1557 | end | 
| 
f60d70566525
also learn when 'fact_filter =' is set explicitly
 blanchet parents: 
57554diff
changeset | 1558 | |
| 
f60d70566525
also learn when 'fact_filter =' is set explicitly
 blanchet parents: 
57554diff
changeset | 1559 | val mash_enabled = is_mash_enabled () | 
| 
f60d70566525
also learn when 'fact_filter =' is set explicitly
 blanchet parents: 
57554diff
changeset | 1560 | val _ = | 
| 
f60d70566525
also learn when 'fact_filter =' is set explicitly
 blanchet parents: 
57554diff
changeset | 1561 | if learn andalso mash_enabled andalso fact_filter <> SOME mepoN then please_learn () else () | 
| 57018 | 1562 | |
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1563 | val effective_fact_filter = | 
| 55286 | 1564 | (case fact_filter of | 
| 57431 
02c408aed5ee
killed Python version of MaSh, now that the SML version works adequately
 blanchet parents: 
57409diff
changeset | 1565 | SOME ff => ff | 
| 57555 
f60d70566525
also learn when 'fact_filter =' is set explicitly
 blanchet parents: 
57554diff
changeset | 1566 | | NONE => if mash_enabled andalso mash_can_suggest_facts ctxt then meshN else mepoN) | 
| 54143 
18def1c73c79
make sure add: doesn't add duplicates, and works for [no_atp] facts
 blanchet parents: 
54141diff
changeset | 1567 | |
| 
18def1c73c79
make sure add: doesn't add duplicates, and works for [no_atp] facts
 blanchet parents: 
54141diff
changeset | 1568 | val unique_facts = drop_duplicate_facts facts | 
| 48288 
255c6e1fd505
rationalize relevance filter, slowing moving code from Iter to MaSh
 blanchet parents: 
48251diff
changeset | 1569 | val add_ths = Attrib.eval_thms ctxt add | 
| 54143 
18def1c73c79
make sure add: doesn't add duplicates, and works for [no_atp] facts
 blanchet parents: 
54141diff
changeset | 1570 | |
| 51004 
5f2788c38127
distinguish raw and non-raw facts, using raw for 10 000s of facts and non-raw after selection of some hundreds
 blanchet parents: 
51003diff
changeset | 1571 | fun in_add (_, th) = member Thm.eq_thm_prop add_ths th | 
| 57018 | 1572 | |
| 51003 
198cb05fb35b
report (MeSh, MePo, MaSh) triple, to be able to use different filters in different slices
 blanchet parents: 
51001diff
changeset | 1573 | fun add_and_take accepts = | 
| 
198cb05fb35b
report (MeSh, MePo, MaSh) triple, to be able to use different filters in different slices
 blanchet parents: 
51001diff
changeset | 1574 | (case add_ths of | 
| 
198cb05fb35b
report (MeSh, MePo, MaSh) triple, to be able to use different filters in different slices
 blanchet parents: 
51001diff
changeset | 1575 | [] => accepts | 
| 57150 
f591096a9c94
add option to keep duplicates, for more precise evaluation of relevance filters
 blanchet parents: 
57133diff
changeset | 1576 | | _ => | 
| 
f591096a9c94
add option to keep duplicates, for more precise evaluation of relevance filters
 blanchet parents: 
57133diff
changeset | 1577 | (unique_facts |> filter in_add |> map fact_of_raw_fact) @ (accepts |> filter_out in_add)) | 
| 48293 | 1578 | |> take max_facts | 
| 57018 | 1579 | |
| 48406 | 1580 | fun mepo () = | 
| 54143 
18def1c73c79
make sure add: doesn't add duplicates, and works for [no_atp] facts
 blanchet parents: 
54141diff
changeset | 1581 | (mepo_suggested_facts ctxt params max_facts NONE hyp_ts concl_t unique_facts | 
| 54091 | 1582 | |> weight_facts_steeply, []) | 
| 57018 | 1583 | |
| 48314 
ee33ba3c0e05
added option to control which fact filter is used
 blanchet parents: 
48313diff
changeset | 1584 | fun mash () = | 
| 57407 
140e16bc2a40
use right theory name for theorems in evaluation driver
 blanchet parents: 
57406diff
changeset | 1585 | mash_suggested_facts ctxt thy params (generous_max_suggestions max_facts) hyp_ts concl_t | 
| 
140e16bc2a40
use right theory name for theorems in evaluation driver
 blanchet parents: 
57406diff
changeset | 1586 | facts | 
| 53140 
a1235e90da5f
pour extra features from proximate facts into goal, in exporter
 blanchet parents: 
53137diff
changeset | 1587 | |>> weight_facts_steeply | 
| 57018 | 1588 | |
| 48314 
ee33ba3c0e05
added option to control which fact filter is used
 blanchet parents: 
48313diff
changeset | 1589 | val mess = | 
| 51003 
198cb05fb35b
report (MeSh, MePo, MaSh) triple, to be able to use different filters in different slices
 blanchet parents: 
51001diff
changeset | 1590 | (* the order is important for the "case" expression below *) | 
| 54091 | 1591 | [] |> effective_fact_filter <> mepoN ? cons (mash_weight, mash) | 
| 1592 | |> effective_fact_filter <> mashN ? cons (mepo_weight, mepo) | |
| 1593 | |> Par_List.map (apsnd (fn f => f ())) | |
| 57150 
f591096a9c94
add option to keep duplicates, for more precise evaluation of relevance filters
 blanchet parents: 
57133diff
changeset | 1594 | val mesh = mesh_facts (eq_snd (gen_eq_thm ctxt)) max_facts mess |> add_and_take | 
| 48288 
255c6e1fd505
rationalize relevance filter, slowing moving code from Iter to MaSh
 blanchet parents: 
48251diff
changeset | 1595 | in | 
| 55286 | 1596 | (case (fact_filter, mess) of | 
| 51024 
98fb341d32e3
distinguish MeSh and smart -- with smart, allow combinations of MaSh, MeSh, and MePo in different slices -- and use MaSh also with SMT solvers, based on evaluation
 blanchet parents: 
51020diff
changeset | 1597 | (NONE, [(_, (mepo, _)), (_, (mash, _))]) => | 
| 57658 | 1598 | [(meshN, mesh), | 
| 1599 | (mepoN, mepo |> map fst |> add_and_take), | |
| 51010 | 1600 | (mashN, mash |> map fst |> add_and_take)] | 
| 55286 | 1601 | | _ => [(effective_fact_filter, mesh)]) | 
| 48288 
255c6e1fd505
rationalize relevance filter, slowing moving code from Iter to MaSh
 blanchet parents: 
48251diff
changeset | 1602 | end | 
| 
255c6e1fd505
rationalize relevance filter, slowing moving code from Iter to MaSh
 blanchet parents: 
48251diff
changeset | 1603 | |
| 59471 
ca459352d8c5
more explicit indication of Async_Manager_Legacy as Proof General legacy;
 wenzelm parents: 
59172diff
changeset | 1604 | fun kill_learners () = Async_Manager_Legacy.kill_threads MaShN "learner" | 
| 
ca459352d8c5
more explicit indication of Async_Manager_Legacy as Proof General legacy;
 wenzelm parents: 
59172diff
changeset | 1605 | fun running_learners () = Async_Manager_Legacy.running_threads MaShN "learner" | 
| 48319 
340187063d84
use async manager to manage MaSh learners to make sure they get killed cleanly
 blanchet parents: 
48318diff
changeset | 1606 | |
| 48248 | 1607 | end; |