| author | wenzelm | 
| Mon, 20 Aug 2012 17:05:53 +0200 | |
| changeset 48867 | e9beabf045ab | 
| parent 48865 | 9824fd676bf4 | 
| child 48878 | 5e850e6fa3c3 | 
| permissions | -rw-r--r-- | 
| 29315 
b074c05f00ad
renamed ThyEdit (in thy_edit.ML) to ThySyntax (in thy_syntax.ML);
 wenzelm parents: 
28454diff
changeset | 1 | (* Title: Pure/Thy/thy_syntax.ML | 
| 23726 | 2 | Author: Makarius | 
| 3 | ||
| 29315 
b074c05f00ad
renamed ThyEdit (in thy_edit.ML) to ThySyntax (in thy_syntax.ML);
 wenzelm parents: 
28454diff
changeset | 4 | Superficial theory syntax: tokens and spans. | 
| 23726 | 5 | *) | 
| 6 | ||
| 29315 
b074c05f00ad
renamed ThyEdit (in thy_edit.ML) to ThySyntax (in thy_syntax.ML);
 wenzelm parents: 
28454diff
changeset | 7 | signature THY_SYNTAX = | 
| 23726 | 8 | sig | 
| 36959 
f5417836dbea
renamed structure OuterLex to Token and type token to Token.T, keeping legacy aliases for some time;
 wenzelm parents: 
36950diff
changeset | 9 | val parse_tokens: Scan.lexicon * Scan.lexicon -> Position.T -> string -> Token.T list | 
| 48768 
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
 wenzelm parents: 
48756diff
changeset | 10 | val reports_of_tokens: Token.T list -> bool * (Position.report * string) list | 
| 40131 
7cbebd636e79
explicitly qualify type Output.output, which is a slightly odd internal feature;
 wenzelm parents: 
39507diff
changeset | 11 | val present_token: Token.T -> Output.output | 
| 27842 | 12 | datatype span_kind = Command of string | Ignored | Malformed | 
| 48867 
e9beabf045ab
some support for inlining file content into outer syntax token language;
 wenzelm parents: 
48865diff
changeset | 13 | datatype span = Span of span_kind * Token.T list | 
| 27842 | 14 | val span_kind: span -> span_kind | 
| 36959 
f5417836dbea
renamed structure OuterLex to Token and type token to Token.T, keeping legacy aliases for some time;
 wenzelm parents: 
36950diff
changeset | 15 | val span_content: span -> Token.T list | 
| 48865 | 16 | val present_span: span -> Output.output | 
| 46811 
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
 wenzelm parents: 
45666diff
changeset | 17 | val parse_spans: Token.T list -> span list | 
| 43621 | 18 |   type element = {head: span, proof: span list, proper_proof: bool}
 | 
| 46811 
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
 wenzelm parents: 
45666diff
changeset | 19 | val parse_elements: span list -> element list | 
| 23726 | 20 | end; | 
| 21 | ||
| 37216 
3165bc303f66
modernized some structure names, keeping a few legacy aliases;
 wenzelm parents: 
37197diff
changeset | 22 | structure Thy_Syntax: THY_SYNTAX = | 
| 23726 | 23 | struct | 
| 24 | ||
| 23803 | 25 | (** tokens **) | 
| 26 | ||
| 27 | (* parse *) | |
| 23726 | 28 | |
| 40523 
1050315f6ee2
simplified/robustified treatment of malformed symbols, which are now fully internalized (total Symbol.explode etc.);
 wenzelm parents: 
40290diff
changeset | 29 | fun parse_tokens lexs pos = | 
| 46811 
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
 wenzelm parents: 
45666diff
changeset | 30 | Source.of_string #> | 
| 
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
 wenzelm parents: 
45666diff
changeset | 31 | Symbol.source #> | 
| 
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
 wenzelm parents: 
45666diff
changeset | 32 |   Token.source {do_recover = SOME false} (K lexs) pos #>
 | 
| 
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
 wenzelm parents: 
45666diff
changeset | 33 | Source.exhaust; | 
| 23726 | 34 | |
| 35 | ||
| 23803 | 36 | (* present *) | 
| 23726 | 37 | |
| 38 | local | |
| 39 | ||
| 40 | val token_kind_markup = | |
| 48768 
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
 wenzelm parents: 
48756diff
changeset | 41 | fn Token.Command => (Isabelle_Markup.command, "") | 
| 
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
 wenzelm parents: 
48756diff
changeset | 42 | | Token.Keyword => (Isabelle_Markup.keyword, "") | 
| 
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
 wenzelm parents: 
48756diff
changeset | 43 | | Token.Ident => (Markup.empty, "") | 
| 
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
 wenzelm parents: 
48756diff
changeset | 44 | | Token.LongIdent => (Markup.empty, "") | 
| 
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
 wenzelm parents: 
48756diff
changeset | 45 | | Token.SymIdent => (Markup.empty, "") | 
| 
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
 wenzelm parents: 
48756diff
changeset | 46 | | Token.Var => (Isabelle_Markup.var, "") | 
| 
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
 wenzelm parents: 
48756diff
changeset | 47 | | Token.TypeIdent => (Isabelle_Markup.tfree, "") | 
| 
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
 wenzelm parents: 
48756diff
changeset | 48 | | Token.TypeVar => (Isabelle_Markup.tvar, "") | 
| 
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
 wenzelm parents: 
48756diff
changeset | 49 | | Token.Nat => (Markup.empty, "") | 
| 
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
 wenzelm parents: 
48756diff
changeset | 50 | | Token.Float => (Markup.empty, "") | 
| 
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
 wenzelm parents: 
48756diff
changeset | 51 | | Token.String => (Isabelle_Markup.string, "") | 
| 
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
 wenzelm parents: 
48756diff
changeset | 52 | | Token.AltString => (Isabelle_Markup.altstring, "") | 
| 
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
 wenzelm parents: 
48756diff
changeset | 53 | | Token.Verbatim => (Isabelle_Markup.verbatim, "") | 
| 
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
 wenzelm parents: 
48756diff
changeset | 54 | | Token.Space => (Markup.empty, "") | 
| 
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
 wenzelm parents: 
48756diff
changeset | 55 | | Token.Comment => (Isabelle_Markup.comment, "") | 
| 
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
 wenzelm parents: 
48756diff
changeset | 56 | | Token.InternalValue => (Markup.empty, "") | 
| 
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
 wenzelm parents: 
48756diff
changeset | 57 | | Token.Error msg => (Isabelle_Markup.bad, msg) | 
| 
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
 wenzelm parents: 
48756diff
changeset | 58 | | Token.Sync => (Isabelle_Markup.control, "") | 
| 
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
 wenzelm parents: 
48756diff
changeset | 59 | | Token.EOF => (Isabelle_Markup.control, ""); | 
| 23726 | 60 | |
| 37192 | 61 | fun token_markup tok = | 
| 48768 
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
 wenzelm parents: 
48756diff
changeset | 62 | if Token.keyword_with (not o Lexicon.is_ascii_identifier) tok | 
| 
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
 wenzelm parents: 
48756diff
changeset | 63 | then (Isabelle_Markup.operator, "") | 
| 37197 
953fc4983439
more detailed token markup, including command kind as sub_kind;
 wenzelm parents: 
37193diff
changeset | 64 | else | 
| 
953fc4983439
more detailed token markup, including command kind as sub_kind;
 wenzelm parents: 
37193diff
changeset | 65 | let | 
| 
953fc4983439
more detailed token markup, including command kind as sub_kind;
 wenzelm parents: 
37193diff
changeset | 66 | val kind = Token.kind_of tok; | 
| 
953fc4983439
more detailed token markup, including command kind as sub_kind;
 wenzelm parents: 
37193diff
changeset | 67 | val props = | 
| 38471 
0924654b8163
report command token name instead of kind, which can be retrieved later via Outer_Syntax.keyword_kind;
 wenzelm parents: 
38422diff
changeset | 68 | if kind = Token.Command | 
| 
0924654b8163
report command token name instead of kind, which can be retrieved later via Outer_Syntax.keyword_kind;
 wenzelm parents: 
38422diff
changeset | 69 | then Markup.properties [(Markup.nameN, Token.content_of tok)] | 
| 37197 
953fc4983439
more detailed token markup, including command kind as sub_kind;
 wenzelm parents: 
37193diff
changeset | 70 | else I; | 
| 48768 
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
 wenzelm parents: 
48756diff
changeset | 71 | val (markup, txt) = token_kind_markup kind; | 
| 
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
 wenzelm parents: 
48756diff
changeset | 72 | in (props markup, txt) end; | 
| 37192 | 73 | |
| 48749 
c197b3c3e7fa
some attempts to keep malformed syntax errors focussed, without too much red spilled onto the document view;
 wenzelm parents: 
46811diff
changeset | 74 | fun reports_of_token tok = | 
| 
c197b3c3e7fa
some attempts to keep malformed syntax errors focussed, without too much red spilled onto the document view;
 wenzelm parents: 
46811diff
changeset | 75 | let | 
| 
c197b3c3e7fa
some attempts to keep malformed syntax errors focussed, without too much red spilled onto the document view;
 wenzelm parents: 
46811diff
changeset | 76 | val malformed_symbols = | 
| 
c197b3c3e7fa
some attempts to keep malformed syntax errors focussed, without too much red spilled onto the document view;
 wenzelm parents: 
46811diff
changeset | 77 | Symbol_Pos.explode (Token.source_position_of tok) | 
| 
c197b3c3e7fa
some attempts to keep malformed syntax errors focussed, without too much red spilled onto the document view;
 wenzelm parents: 
46811diff
changeset | 78 | |> map_filter (fn (sym, pos) => | 
| 48756 
1c843142758e
sneak message into "bad" markup as property -- to be displayed after YXML parsing;
 wenzelm parents: 
48752diff
changeset | 79 | if Symbol.is_malformed sym | 
| 48773 
0e1bab274672
more liberal scanning of potentially malformed symbols;
 wenzelm parents: 
48768diff
changeset | 80 | then SOME ((pos, Isabelle_Markup.bad), "Malformed symbolic character") else NONE); | 
| 48749 
c197b3c3e7fa
some attempts to keep malformed syntax errors focussed, without too much red spilled onto the document view;
 wenzelm parents: 
46811diff
changeset | 81 | val is_malformed = Token.is_error tok orelse not (null malformed_symbols); | 
| 48768 
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
 wenzelm parents: 
48756diff
changeset | 82 | val (markup, txt) = token_markup tok; | 
| 
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
 wenzelm parents: 
48756diff
changeset | 83 | val reports = ((Token.position_of tok, markup), txt) :: malformed_symbols; | 
| 48749 
c197b3c3e7fa
some attempts to keep malformed syntax errors focussed, without too much red spilled onto the document view;
 wenzelm parents: 
46811diff
changeset | 84 | in (is_malformed, reports) end; | 
| 40528 | 85 | |
| 23803 | 86 | in | 
| 87 | ||
| 48749 
c197b3c3e7fa
some attempts to keep malformed syntax errors focussed, without too much red spilled onto the document view;
 wenzelm parents: 
46811diff
changeset | 88 | fun reports_of_tokens toks = | 
| 
c197b3c3e7fa
some attempts to keep malformed syntax errors focussed, without too much red spilled onto the document view;
 wenzelm parents: 
46811diff
changeset | 89 | let val results = map reports_of_token toks | 
| 
c197b3c3e7fa
some attempts to keep malformed syntax errors focussed, without too much red spilled onto the document view;
 wenzelm parents: 
46811diff
changeset | 90 | in (exists fst results, maps snd results) end; | 
| 
c197b3c3e7fa
some attempts to keep malformed syntax errors focussed, without too much red spilled onto the document view;
 wenzelm parents: 
46811diff
changeset | 91 | |
| 23726 | 92 | fun present_token tok = | 
| 48768 
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
 wenzelm parents: 
48756diff
changeset | 93 | Markup.enclose (fst (token_markup tok)) (Output.output (Token.unparse tok)); | 
| 23726 | 94 | |
| 23803 | 95 | end; | 
| 96 | ||
| 97 | ||
| 98 | ||
| 27665 | 99 | (** spans **) | 
| 100 | ||
| 27842 | 101 | (* type span *) | 
| 102 | ||
| 103 | datatype span_kind = Command of string | Ignored | Malformed; | |
| 36959 
f5417836dbea
renamed structure OuterLex to Token and type token to Token.T, keeping legacy aliases for some time;
 wenzelm parents: 
36950diff
changeset | 104 | datatype span = Span of span_kind * Token.T list; | 
| 23803 | 105 | |
| 27842 | 106 | fun span_kind (Span (k, _)) = k; | 
| 107 | fun span_content (Span (_, toks)) = toks; | |
| 108 | ||
| 48865 | 109 | val present_span = implode o map present_token o span_content; | 
| 110 | ||
| 23803 | 111 | |
| 112 | (* parse *) | |
| 23726 | 113 | |
| 23803 | 114 | local | 
| 115 | ||
| 46811 
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
 wenzelm parents: 
45666diff
changeset | 116 | fun make_span toks = | 
| 
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
 wenzelm parents: 
45666diff
changeset | 117 | if not (null toks) andalso Token.is_command (hd toks) then | 
| 
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
 wenzelm parents: 
45666diff
changeset | 118 | Span (Command (Token.content_of (hd toks)), toks) | 
| 
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
 wenzelm parents: 
45666diff
changeset | 119 | else if forall (not o Token.is_proper) toks then Span (Ignored, toks) | 
| 
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
 wenzelm parents: 
45666diff
changeset | 120 | else Span (Malformed, toks); | 
| 23726 | 121 | |
| 46811 
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
 wenzelm parents: 
45666diff
changeset | 122 | fun flush (result, span) = if null span then (result, span) else (rev span :: result, []); | 
| 23726 | 123 | |
| 124 | in | |
| 125 | ||
| 46811 
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
 wenzelm parents: 
45666diff
changeset | 126 | fun parse_spans toks = | 
| 
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
 wenzelm parents: 
45666diff
changeset | 127 | fold (fn tok => Token.is_command tok ? flush #> apsnd (cons tok)) toks ([], []) | 
| 
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
 wenzelm parents: 
45666diff
changeset | 128 | |> flush | 
| 
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
 wenzelm parents: 
45666diff
changeset | 129 | |> #1 |> rev |> map make_span; | 
| 23803 | 130 | |
| 131 | end; | |
| 132 | ||
| 133 | ||
| 28434 | 134 | |
| 43621 | 135 | (** specification elements: commands with optional proof **) | 
| 136 | ||
| 137 | type element = {head: span, proof: span list, proper_proof: bool};
 | |
| 138 | ||
| 139 | fun make_element head proof proper_proof = | |
| 140 |   {head = head, proof = proof, proper_proof = proper_proof};
 | |
| 141 | ||
| 28434 | 142 | |
| 143 | (* scanning spans *) | |
| 144 | ||
| 145 | val eof = Span (Command "", []); | |
| 146 | ||
| 147 | fun is_eof (Span (Command "", _)) = true | |
| 148 | | is_eof _ = false; | |
| 149 | ||
| 150 | val not_eof = not o is_eof; | |
| 151 | ||
| 152 | val stopper = Scan.stopper (K eof) is_eof; | |
| 153 | ||
| 154 | ||
| 46811 
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
 wenzelm parents: 
45666diff
changeset | 155 | (* parse *) | 
| 28434 | 156 | |
| 157 | local | |
| 158 | ||
| 159 | fun command_with pred = Scan.one (fn (Span (Command name, _)) => pred name | _ => false); | |
| 160 | ||
| 161 | val proof = Scan.pass 1 (Scan.repeat (Scan.depend (fn d => | |
| 28454 
c63168db774c
unit_source: more rigid parsing, stop after final qed;
 wenzelm parents: 
28438diff
changeset | 162 | if d <= 0 then Scan.fail | 
| 28434 | 163 | else | 
| 36950 | 164 | command_with Keyword.is_qed_global >> pair ~1 || | 
| 165 | command_with Keyword.is_proof_goal >> pair (d + 1) || | |
| 166 | (if d = 0 then Scan.fail else command_with Keyword.is_qed >> pair (d - 1)) || | |
| 167 | Scan.unless (command_with Keyword.is_theory) (Scan.one not_eof) >> pair d)) -- Scan.state); | |
| 28434 | 168 | |
| 43621 | 169 | val element = | 
| 170 | command_with Keyword.is_theory_goal -- proof | |
| 171 | >> (fn (a, (bs, d)) => make_element a bs (d >= 0)) || | |
| 172 | Scan.one not_eof >> (fn a => make_element a [] true); | |
| 28434 | 173 | |
| 174 | in | |
| 175 | ||
| 46811 
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
 wenzelm parents: 
45666diff
changeset | 176 | val parse_elements = | 
| 
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
 wenzelm parents: 
45666diff
changeset | 177 | Source.of_list #> | 
| 
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
 wenzelm parents: 
45666diff
changeset | 178 | Source.source stopper (Scan.bulk element) NONE #> | 
| 
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
 wenzelm parents: 
45666diff
changeset | 179 | Source.exhaust; | 
| 28434 | 180 | |
| 23726 | 181 | end; | 
| 28434 | 182 | |
| 183 | end; |