| author | blanchet | 
| Fri, 07 Jun 2013 12:11:55 +0200 | |
| changeset 52344 | ff05e50efa0d | 
| parent 51423 | e5f9a6d9ca82 | 
| child 53843 | 88c6e630c15f | 
| permissions | -rw-r--r-- | 
| 29315 
b074c05f00ad
renamed ThyEdit (in thy_edit.ML) to ThySyntax (in thy_syntax.ML);
 wenzelm parents: 
28454diff
changeset | 1 | (* Title: Pure/Thy/thy_syntax.ML | 
| 23726 | 2 | Author: Makarius | 
| 3 | ||
| 29315 
b074c05f00ad
renamed ThyEdit (in thy_edit.ML) to ThySyntax (in thy_syntax.ML);
 wenzelm parents: 
28454diff
changeset | 4 | Superficial theory syntax: tokens and spans. | 
| 23726 | 5 | *) | 
| 6 | ||
| 29315 
b074c05f00ad
renamed ThyEdit (in thy_edit.ML) to ThySyntax (in thy_syntax.ML);
 wenzelm parents: 
28454diff
changeset | 7 | signature THY_SYNTAX = | 
| 23726 | 8 | sig | 
| 36959 
f5417836dbea
renamed structure OuterLex to Token and type token to Token.T, keeping legacy aliases for some time;
 wenzelm parents: 
36950diff
changeset | 9 | val parse_tokens: Scan.lexicon * Scan.lexicon -> Position.T -> string -> Token.T list | 
| 48768 
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
 wenzelm parents: 
48756diff
changeset | 10 | val reports_of_tokens: Token.T list -> bool * (Position.report * string) list | 
| 40131 
7cbebd636e79
explicitly qualify type Output.output, which is a slightly odd internal feature;
 wenzelm parents: 
39507diff
changeset | 11 | val present_token: Token.T -> Output.output | 
| 48878 | 12 | datatype span_kind = Command of string * Position.T | Ignored | Malformed | 
| 48867 
e9beabf045ab
some support for inlining file content into outer syntax token language;
 wenzelm parents: 
48865diff
changeset | 13 | datatype span = Span of span_kind * Token.T list | 
| 27842 | 14 | val span_kind: span -> span_kind | 
| 36959 
f5417836dbea
renamed structure OuterLex to Token and type token to Token.T, keeping legacy aliases for some time;
 wenzelm parents: 
36950diff
changeset | 15 | val span_content: span -> Token.T list | 
| 48865 | 16 | val present_span: span -> Output.output | 
| 46811 
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
 wenzelm parents: 
45666diff
changeset | 17 | val parse_spans: Token.T list -> span list | 
| 51225 | 18 |   datatype 'a element = Element of 'a * ('a element list * 'a) option
 | 
| 51321 | 19 | val atom: 'a -> 'a element | 
| 51225 | 20 |   val map_element: ('a -> 'b) -> 'a element -> 'b element
 | 
| 21 | val flat_element: 'a element -> 'a list | |
| 51268 
fcc4b89a600d
simplified Outer_Syntax.read_span: internalized Toplevel.is_ignored;
 wenzelm parents: 
51267diff
changeset | 22 | val last_element: 'a element -> 'a | 
| 51225 | 23 | val parse_elements: span list -> span element list | 
| 23726 | 24 | end; | 
| 25 | ||
| 37216 
3165bc303f66
modernized some structure names, keeping a few legacy aliases;
 wenzelm parents: 
37197diff
changeset | 26 | structure Thy_Syntax: THY_SYNTAX = | 
| 23726 | 27 | struct | 
| 28 | ||
| 23803 | 29 | (** tokens **) | 
| 30 | ||
| 31 | (* parse *) | |
| 23726 | 32 | |
| 40523 
1050315f6ee2
simplified/robustified treatment of malformed symbols, which are now fully internalized (total Symbol.explode etc.);
 wenzelm parents: 
40290diff
changeset | 33 | fun parse_tokens lexs pos = | 
| 46811 
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
 wenzelm parents: 
45666diff
changeset | 34 | Source.of_string #> | 
| 
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
 wenzelm parents: 
45666diff
changeset | 35 | Symbol.source #> | 
| 
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
 wenzelm parents: 
45666diff
changeset | 36 |   Token.source {do_recover = SOME false} (K lexs) pos #>
 | 
| 
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
 wenzelm parents: 
45666diff
changeset | 37 | Source.exhaust; | 
| 23726 | 38 | |
| 39 | ||
| 23803 | 40 | (* present *) | 
| 23726 | 41 | |
| 42 | local | |
| 43 | ||
| 44 | val token_kind_markup = | |
| 50201 
c26369c9eda6
Isabelle-specific implementation of quasi-abstract markup elements -- back to module arrangement before d83797ef0d2d;
 wenzelm parents: 
48878diff
changeset | 45 | fn Token.Command => (Markup.command, "") | 
| 
c26369c9eda6
Isabelle-specific implementation of quasi-abstract markup elements -- back to module arrangement before d83797ef0d2d;
 wenzelm parents: 
48878diff
changeset | 46 | | Token.Keyword => (Markup.keyword, "") | 
| 48768 
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
 wenzelm parents: 
48756diff
changeset | 47 | | Token.Ident => (Markup.empty, "") | 
| 
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
 wenzelm parents: 
48756diff
changeset | 48 | | Token.LongIdent => (Markup.empty, "") | 
| 
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
 wenzelm parents: 
48756diff
changeset | 49 | | Token.SymIdent => (Markup.empty, "") | 
| 50201 
c26369c9eda6
Isabelle-specific implementation of quasi-abstract markup elements -- back to module arrangement before d83797ef0d2d;
 wenzelm parents: 
48878diff
changeset | 50 | | Token.Var => (Markup.var, "") | 
| 
c26369c9eda6
Isabelle-specific implementation of quasi-abstract markup elements -- back to module arrangement before d83797ef0d2d;
 wenzelm parents: 
48878diff
changeset | 51 | | Token.TypeIdent => (Markup.tfree, "") | 
| 
c26369c9eda6
Isabelle-specific implementation of quasi-abstract markup elements -- back to module arrangement before d83797ef0d2d;
 wenzelm parents: 
48878diff
changeset | 52 | | Token.TypeVar => (Markup.tvar, "") | 
| 48768 
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
 wenzelm parents: 
48756diff
changeset | 53 | | Token.Nat => (Markup.empty, "") | 
| 
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
 wenzelm parents: 
48756diff
changeset | 54 | | Token.Float => (Markup.empty, "") | 
| 50201 
c26369c9eda6
Isabelle-specific implementation of quasi-abstract markup elements -- back to module arrangement before d83797ef0d2d;
 wenzelm parents: 
48878diff
changeset | 55 | | Token.String => (Markup.string, "") | 
| 
c26369c9eda6
Isabelle-specific implementation of quasi-abstract markup elements -- back to module arrangement before d83797ef0d2d;
 wenzelm parents: 
48878diff
changeset | 56 | | Token.AltString => (Markup.altstring, "") | 
| 
c26369c9eda6
Isabelle-specific implementation of quasi-abstract markup elements -- back to module arrangement before d83797ef0d2d;
 wenzelm parents: 
48878diff
changeset | 57 | | Token.Verbatim => (Markup.verbatim, "") | 
| 48768 
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
 wenzelm parents: 
48756diff
changeset | 58 | | Token.Space => (Markup.empty, "") | 
| 50201 
c26369c9eda6
Isabelle-specific implementation of quasi-abstract markup elements -- back to module arrangement before d83797ef0d2d;
 wenzelm parents: 
48878diff
changeset | 59 | | Token.Comment => (Markup.comment, "") | 
| 48768 
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
 wenzelm parents: 
48756diff
changeset | 60 | | Token.InternalValue => (Markup.empty, "") | 
| 50201 
c26369c9eda6
Isabelle-specific implementation of quasi-abstract markup elements -- back to module arrangement before d83797ef0d2d;
 wenzelm parents: 
48878diff
changeset | 61 | | Token.Error msg => (Markup.bad, msg) | 
| 
c26369c9eda6
Isabelle-specific implementation of quasi-abstract markup elements -- back to module arrangement before d83797ef0d2d;
 wenzelm parents: 
48878diff
changeset | 62 | | Token.Sync => (Markup.control, "") | 
| 
c26369c9eda6
Isabelle-specific implementation of quasi-abstract markup elements -- back to module arrangement before d83797ef0d2d;
 wenzelm parents: 
48878diff
changeset | 63 | | Token.EOF => (Markup.control, ""); | 
| 23726 | 64 | |
| 37192 | 65 | fun token_markup tok = | 
| 50238 
98d35a7368bd
more uniform Symbol.is_ascii_identifier in ML/Scala;
 wenzelm parents: 
50201diff
changeset | 66 | if Token.keyword_with (not o Symbol.is_ascii_identifier) tok | 
| 50201 
c26369c9eda6
Isabelle-specific implementation of quasi-abstract markup elements -- back to module arrangement before d83797ef0d2d;
 wenzelm parents: 
48878diff
changeset | 67 | then (Markup.operator, "") | 
| 37197 
953fc4983439
more detailed token markup, including command kind as sub_kind;
 wenzelm parents: 
37193diff
changeset | 68 | else | 
| 
953fc4983439
more detailed token markup, including command kind as sub_kind;
 wenzelm parents: 
37193diff
changeset | 69 | let | 
| 
953fc4983439
more detailed token markup, including command kind as sub_kind;
 wenzelm parents: 
37193diff
changeset | 70 | val kind = Token.kind_of tok; | 
| 
953fc4983439
more detailed token markup, including command kind as sub_kind;
 wenzelm parents: 
37193diff
changeset | 71 | val props = | 
| 38471 
0924654b8163
report command token name instead of kind, which can be retrieved later via Outer_Syntax.keyword_kind;
 wenzelm parents: 
38422diff
changeset | 72 | if kind = Token.Command | 
| 
0924654b8163
report command token name instead of kind, which can be retrieved later via Outer_Syntax.keyword_kind;
 wenzelm parents: 
38422diff
changeset | 73 | then Markup.properties [(Markup.nameN, Token.content_of tok)] | 
| 37197 
953fc4983439
more detailed token markup, including command kind as sub_kind;
 wenzelm parents: 
37193diff
changeset | 74 | else I; | 
| 48768 
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
 wenzelm parents: 
48756diff
changeset | 75 | val (markup, txt) = token_kind_markup kind; | 
| 
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
 wenzelm parents: 
48756diff
changeset | 76 | in (props markup, txt) end; | 
| 37192 | 77 | |
| 48749 
c197b3c3e7fa
some attempts to keep malformed syntax errors focussed, without too much red spilled onto the document view;
 wenzelm parents: 
46811diff
changeset | 78 | fun reports_of_token tok = | 
| 
c197b3c3e7fa
some attempts to keep malformed syntax errors focussed, without too much red spilled onto the document view;
 wenzelm parents: 
46811diff
changeset | 79 | let | 
| 
c197b3c3e7fa
some attempts to keep malformed syntax errors focussed, without too much red spilled onto the document view;
 wenzelm parents: 
46811diff
changeset | 80 | val malformed_symbols = | 
| 
c197b3c3e7fa
some attempts to keep malformed syntax errors focussed, without too much red spilled onto the document view;
 wenzelm parents: 
46811diff
changeset | 81 | Symbol_Pos.explode (Token.source_position_of tok) | 
| 
c197b3c3e7fa
some attempts to keep malformed syntax errors focussed, without too much red spilled onto the document view;
 wenzelm parents: 
46811diff
changeset | 82 | |> map_filter (fn (sym, pos) => | 
| 48756 
1c843142758e
sneak message into "bad" markup as property -- to be displayed after YXML parsing;
 wenzelm parents: 
48752diff
changeset | 83 | if Symbol.is_malformed sym | 
| 50201 
c26369c9eda6
Isabelle-specific implementation of quasi-abstract markup elements -- back to module arrangement before d83797ef0d2d;
 wenzelm parents: 
48878diff
changeset | 84 | then SOME ((pos, Markup.bad), "Malformed symbolic character") else NONE); | 
| 48749 
c197b3c3e7fa
some attempts to keep malformed syntax errors focussed, without too much red spilled onto the document view;
 wenzelm parents: 
46811diff
changeset | 85 | val is_malformed = Token.is_error tok orelse not (null malformed_symbols); | 
| 48768 
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
 wenzelm parents: 
48756diff
changeset | 86 | val (markup, txt) = token_markup tok; | 
| 
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
 wenzelm parents: 
48756diff
changeset | 87 | val reports = ((Token.position_of tok, markup), txt) :: malformed_symbols; | 
| 48749 
c197b3c3e7fa
some attempts to keep malformed syntax errors focussed, without too much red spilled onto the document view;
 wenzelm parents: 
46811diff
changeset | 88 | in (is_malformed, reports) end; | 
| 40528 | 89 | |
| 23803 | 90 | in | 
| 91 | ||
| 48749 
c197b3c3e7fa
some attempts to keep malformed syntax errors focussed, without too much red spilled onto the document view;
 wenzelm parents: 
46811diff
changeset | 92 | fun reports_of_tokens toks = | 
| 
c197b3c3e7fa
some attempts to keep malformed syntax errors focussed, without too much red spilled onto the document view;
 wenzelm parents: 
46811diff
changeset | 93 | let val results = map reports_of_token toks | 
| 
c197b3c3e7fa
some attempts to keep malformed syntax errors focussed, without too much red spilled onto the document view;
 wenzelm parents: 
46811diff
changeset | 94 | in (exists fst results, maps snd results) end; | 
| 
c197b3c3e7fa
some attempts to keep malformed syntax errors focussed, without too much red spilled onto the document view;
 wenzelm parents: 
46811diff
changeset | 95 | |
| 23726 | 96 | fun present_token tok = | 
| 48768 
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
 wenzelm parents: 
48756diff
changeset | 97 | Markup.enclose (fst (token_markup tok)) (Output.output (Token.unparse tok)); | 
| 23726 | 98 | |
| 23803 | 99 | end; | 
| 100 | ||
| 101 | ||
| 102 | ||
| 27665 | 103 | (** spans **) | 
| 104 | ||
| 27842 | 105 | (* type span *) | 
| 106 | ||
| 48878 | 107 | datatype span_kind = Command of string * Position.T | Ignored | Malformed; | 
| 36959 
f5417836dbea
renamed structure OuterLex to Token and type token to Token.T, keeping legacy aliases for some time;
 wenzelm parents: 
36950diff
changeset | 108 | datatype span = Span of span_kind * Token.T list; | 
| 23803 | 109 | |
| 27842 | 110 | fun span_kind (Span (k, _)) = k; | 
| 111 | fun span_content (Span (_, toks)) = toks; | |
| 112 | ||
| 48865 | 113 | val present_span = implode o map present_token o span_content; | 
| 114 | ||
| 23803 | 115 | |
| 116 | (* parse *) | |
| 23726 | 117 | |
| 23803 | 118 | local | 
| 119 | ||
| 46811 
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
 wenzelm parents: 
45666diff
changeset | 120 | fun make_span toks = | 
| 
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
 wenzelm parents: 
45666diff
changeset | 121 | if not (null toks) andalso Token.is_command (hd toks) then | 
| 48878 | 122 | Span (Command (Token.content_of (hd toks), Token.position_of (hd toks)), toks) | 
| 51267 | 123 | else if forall Token.is_improper toks then Span (Ignored, toks) | 
| 46811 
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
 wenzelm parents: 
45666diff
changeset | 124 | else Span (Malformed, toks); | 
| 23726 | 125 | |
| 46811 
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
 wenzelm parents: 
45666diff
changeset | 126 | fun flush (result, span) = if null span then (result, span) else (rev span :: result, []); | 
| 23726 | 127 | |
| 128 | in | |
| 129 | ||
| 46811 
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
 wenzelm parents: 
45666diff
changeset | 130 | fun parse_spans toks = | 
| 
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
 wenzelm parents: 
45666diff
changeset | 131 | fold (fn tok => Token.is_command tok ? flush #> apsnd (cons tok)) toks ([], []) | 
| 
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
 wenzelm parents: 
45666diff
changeset | 132 | |> flush | 
| 
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
 wenzelm parents: 
45666diff
changeset | 133 | |> #1 |> rev |> map make_span; | 
| 23803 | 134 | |
| 135 | end; | |
| 136 | ||
| 137 | ||
| 28434 | 138 | |
| 43621 | 139 | (** specification elements: commands with optional proof **) | 
| 140 | ||
| 51225 | 141 | datatype 'a element = Element of 'a * ('a element list * 'a) option;
 | 
| 142 | ||
| 143 | fun element (a, b) = Element (a, SOME b); | |
| 144 | fun atom a = Element (a, NONE); | |
| 43621 | 145 | |
| 51225 | 146 | fun map_element f (Element (a, NONE)) = Element (f a, NONE) | 
| 147 | | map_element f (Element (a, SOME (elems, b))) = | |
| 148 | Element (f a, SOME ((map o map_element) f elems, f b)); | |
| 149 | ||
| 150 | fun flat_element (Element (a, NONE)) = [a] | |
| 151 | | flat_element (Element (a, SOME (elems, b))) = a :: maps flat_element elems @ [b]; | |
| 43621 | 152 | |
| 51268 
fcc4b89a600d
simplified Outer_Syntax.read_span: internalized Toplevel.is_ignored;
 wenzelm parents: 
51267diff
changeset | 153 | fun last_element (Element (a, NONE)) = a | 
| 
fcc4b89a600d
simplified Outer_Syntax.read_span: internalized Toplevel.is_ignored;
 wenzelm parents: 
51267diff
changeset | 154 | | last_element (Element (_, SOME (_, b))) = b; | 
| 
fcc4b89a600d
simplified Outer_Syntax.read_span: internalized Toplevel.is_ignored;
 wenzelm parents: 
51267diff
changeset | 155 | |
| 28434 | 156 | |
| 157 | (* scanning spans *) | |
| 158 | ||
| 48878 | 159 | val eof = Span (Command ("", Position.none), []);
 | 
| 28434 | 160 | |
| 48878 | 161 | fun is_eof (Span (Command ("", _), _)) = true
 | 
| 28434 | 162 | | is_eof _ = false; | 
| 163 | ||
| 164 | val not_eof = not o is_eof; | |
| 165 | ||
| 166 | val stopper = Scan.stopper (K eof) is_eof; | |
| 167 | ||
| 168 | ||
| 46811 
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
 wenzelm parents: 
45666diff
changeset | 169 | (* parse *) | 
| 28434 | 170 | |
| 171 | local | |
| 172 | ||
| 48878 | 173 | fun command_with pred = | 
| 174 | Scan.one (fn (Span (Command (name, _), _)) => pred name | _ => false); | |
| 28434 | 175 | |
| 51225 | 176 | val proof_atom = | 
| 177 | Scan.one (fn (Span (Command (name, _), _)) => Keyword.is_proof_body name | _ => true) >> atom; | |
| 28434 | 178 | |
| 51225 | 179 | fun proof_element x = (command_with Keyword.is_proof_goal -- proof_rest >> element || proof_atom) x | 
| 180 | and proof_rest x = (Scan.repeat proof_element -- command_with Keyword.is_qed) x; | |
| 181 | ||
| 182 | val other_element = | |
| 183 | command_with Keyword.is_theory_goal -- proof_rest >> element || | |
| 184 | Scan.one not_eof >> atom; | |
| 28434 | 185 | |
| 186 | in | |
| 187 | ||
| 46811 
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
 wenzelm parents: 
45666diff
changeset | 188 | val parse_elements = | 
| 
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
 wenzelm parents: 
45666diff
changeset | 189 | Source.of_list #> | 
| 51225 | 190 | Source.source stopper (Scan.bulk other_element) NONE #> | 
| 46811 
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
 wenzelm parents: 
45666diff
changeset | 191 | Source.exhaust; | 
| 28434 | 192 | |
| 23726 | 193 | end; | 
| 28434 | 194 | |
| 195 | end; |