author | wenzelm |
Sun, 24 Feb 2013 17:29:55 +0100 | |
changeset 51268 | fcc4b89a600d |
parent 51267 | c68c1b89a0f1 |
child 51321 | 75682dfff630 |
permissions | -rw-r--r-- |
29315
b074c05f00ad
renamed ThyEdit (in thy_edit.ML) to ThySyntax (in thy_syntax.ML);
wenzelm
parents:
28454
diff
changeset
|
1 |
(* Title: Pure/Thy/thy_syntax.ML |
23726 | 2 |
Author: Makarius |
3 |
||
29315
b074c05f00ad
renamed ThyEdit (in thy_edit.ML) to ThySyntax (in thy_syntax.ML);
wenzelm
parents:
28454
diff
changeset
|
4 |
Superficial theory syntax: tokens and spans. |
23726 | 5 |
*) |
6 |
||
29315
b074c05f00ad
renamed ThyEdit (in thy_edit.ML) to ThySyntax (in thy_syntax.ML);
wenzelm
parents:
28454
diff
changeset
|
7 |
signature THY_SYNTAX = |
23726 | 8 |
sig |
36959
f5417836dbea
renamed structure OuterLex to Token and type token to Token.T, keeping legacy aliases for some time;
wenzelm
parents:
36950
diff
changeset
|
9 |
val parse_tokens: Scan.lexicon * Scan.lexicon -> Position.T -> string -> Token.T list |
48768
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
wenzelm
parents:
48756
diff
changeset
|
10 |
val reports_of_tokens: Token.T list -> bool * (Position.report * string) list |
40131
7cbebd636e79
explicitly qualify type Output.output, which is a slightly odd internal feature;
wenzelm
parents:
39507
diff
changeset
|
11 |
val present_token: Token.T -> Output.output |
48878 | 12 |
datatype span_kind = Command of string * Position.T | Ignored | Malformed |
48867
e9beabf045ab
some support for inlining file content into outer syntax token language;
wenzelm
parents:
48865
diff
changeset
|
13 |
datatype span = Span of span_kind * Token.T list |
27842 | 14 |
val span_kind: span -> span_kind |
36959
f5417836dbea
renamed structure OuterLex to Token and type token to Token.T, keeping legacy aliases for some time;
wenzelm
parents:
36950
diff
changeset
|
15 |
val span_content: span -> Token.T list |
48865 | 16 |
val present_span: span -> Output.output |
46811
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
wenzelm
parents:
45666
diff
changeset
|
17 |
val parse_spans: Token.T list -> span list |
51225 | 18 |
datatype 'a element = Element of 'a * ('a element list * 'a) option |
19 |
val map_element: ('a -> 'b) -> 'a element -> 'b element |
|
20 |
val flat_element: 'a element -> 'a list |
|
51268
fcc4b89a600d
simplified Outer_Syntax.read_span: internalized Toplevel.is_ignored;
wenzelm
parents:
51267
diff
changeset
|
21 |
val last_element: 'a element -> 'a |
51225 | 22 |
val parse_elements: span list -> span element list |
23726 | 23 |
end; |
24 |
||
37216
3165bc303f66
modernized some structure names, keeping a few legacy aliases;
wenzelm
parents:
37197
diff
changeset
|
25 |
structure Thy_Syntax: THY_SYNTAX = |
23726 | 26 |
struct |
27 |
||
23803 | 28 |
(** tokens **) |
29 |
||
30 |
(* parse *) |
|
23726 | 31 |
|
40523
1050315f6ee2
simplified/robustified treatment of malformed symbols, which are now fully internalized (total Symbol.explode etc.);
wenzelm
parents:
40290
diff
changeset
|
32 |
fun parse_tokens lexs pos = |
46811
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
wenzelm
parents:
45666
diff
changeset
|
33 |
Source.of_string #> |
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
wenzelm
parents:
45666
diff
changeset
|
34 |
Symbol.source #> |
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
wenzelm
parents:
45666
diff
changeset
|
35 |
Token.source {do_recover = SOME false} (K lexs) pos #> |
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
wenzelm
parents:
45666
diff
changeset
|
36 |
Source.exhaust; |
23726 | 37 |
|
38 |
||
23803 | 39 |
(* present *) |
23726 | 40 |
|
41 |
local |
|
42 |
||
43 |
val token_kind_markup = |
|
50201
c26369c9eda6
Isabelle-specific implementation of quasi-abstract markup elements -- back to module arrangement before d83797ef0d2d;
wenzelm
parents:
48878
diff
changeset
|
44 |
fn Token.Command => (Markup.command, "") |
c26369c9eda6
Isabelle-specific implementation of quasi-abstract markup elements -- back to module arrangement before d83797ef0d2d;
wenzelm
parents:
48878
diff
changeset
|
45 |
| Token.Keyword => (Markup.keyword, "") |
48768
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
wenzelm
parents:
48756
diff
changeset
|
46 |
| Token.Ident => (Markup.empty, "") |
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
wenzelm
parents:
48756
diff
changeset
|
47 |
| Token.LongIdent => (Markup.empty, "") |
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
wenzelm
parents:
48756
diff
changeset
|
48 |
| Token.SymIdent => (Markup.empty, "") |
50201
c26369c9eda6
Isabelle-specific implementation of quasi-abstract markup elements -- back to module arrangement before d83797ef0d2d;
wenzelm
parents:
48878
diff
changeset
|
49 |
| Token.Var => (Markup.var, "") |
c26369c9eda6
Isabelle-specific implementation of quasi-abstract markup elements -- back to module arrangement before d83797ef0d2d;
wenzelm
parents:
48878
diff
changeset
|
50 |
| Token.TypeIdent => (Markup.tfree, "") |
c26369c9eda6
Isabelle-specific implementation of quasi-abstract markup elements -- back to module arrangement before d83797ef0d2d;
wenzelm
parents:
48878
diff
changeset
|
51 |
| Token.TypeVar => (Markup.tvar, "") |
48768
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
wenzelm
parents:
48756
diff
changeset
|
52 |
| Token.Nat => (Markup.empty, "") |
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
wenzelm
parents:
48756
diff
changeset
|
53 |
| Token.Float => (Markup.empty, "") |
50201
c26369c9eda6
Isabelle-specific implementation of quasi-abstract markup elements -- back to module arrangement before d83797ef0d2d;
wenzelm
parents:
48878
diff
changeset
|
54 |
| Token.String => (Markup.string, "") |
c26369c9eda6
Isabelle-specific implementation of quasi-abstract markup elements -- back to module arrangement before d83797ef0d2d;
wenzelm
parents:
48878
diff
changeset
|
55 |
| Token.AltString => (Markup.altstring, "") |
c26369c9eda6
Isabelle-specific implementation of quasi-abstract markup elements -- back to module arrangement before d83797ef0d2d;
wenzelm
parents:
48878
diff
changeset
|
56 |
| Token.Verbatim => (Markup.verbatim, "") |
48768
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
wenzelm
parents:
48756
diff
changeset
|
57 |
| Token.Space => (Markup.empty, "") |
50201
c26369c9eda6
Isabelle-specific implementation of quasi-abstract markup elements -- back to module arrangement before d83797ef0d2d;
wenzelm
parents:
48878
diff
changeset
|
58 |
| Token.Comment => (Markup.comment, "") |
48768
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
wenzelm
parents:
48756
diff
changeset
|
59 |
| Token.InternalValue => (Markup.empty, "") |
50201
c26369c9eda6
Isabelle-specific implementation of quasi-abstract markup elements -- back to module arrangement before d83797ef0d2d;
wenzelm
parents:
48878
diff
changeset
|
60 |
| Token.Error msg => (Markup.bad, msg) |
c26369c9eda6
Isabelle-specific implementation of quasi-abstract markup elements -- back to module arrangement before d83797ef0d2d;
wenzelm
parents:
48878
diff
changeset
|
61 |
| Token.Sync => (Markup.control, "") |
c26369c9eda6
Isabelle-specific implementation of quasi-abstract markup elements -- back to module arrangement before d83797ef0d2d;
wenzelm
parents:
48878
diff
changeset
|
62 |
| Token.EOF => (Markup.control, ""); |
23726 | 63 |
|
37192 | 64 |
fun token_markup tok = |
50238
98d35a7368bd
more uniform Symbol.is_ascii_identifier in ML/Scala;
wenzelm
parents:
50201
diff
changeset
|
65 |
if Token.keyword_with (not o Symbol.is_ascii_identifier) tok |
50201
c26369c9eda6
Isabelle-specific implementation of quasi-abstract markup elements -- back to module arrangement before d83797ef0d2d;
wenzelm
parents:
48878
diff
changeset
|
66 |
then (Markup.operator, "") |
37197
953fc4983439
more detailed token markup, including command kind as sub_kind;
wenzelm
parents:
37193
diff
changeset
|
67 |
else |
953fc4983439
more detailed token markup, including command kind as sub_kind;
wenzelm
parents:
37193
diff
changeset
|
68 |
let |
953fc4983439
more detailed token markup, including command kind as sub_kind;
wenzelm
parents:
37193
diff
changeset
|
69 |
val kind = Token.kind_of tok; |
953fc4983439
more detailed token markup, including command kind as sub_kind;
wenzelm
parents:
37193
diff
changeset
|
70 |
val props = |
38471
0924654b8163
report command token name instead of kind, which can be retrieved later via Outer_Syntax.keyword_kind;
wenzelm
parents:
38422
diff
changeset
|
71 |
if kind = Token.Command |
0924654b8163
report command token name instead of kind, which can be retrieved later via Outer_Syntax.keyword_kind;
wenzelm
parents:
38422
diff
changeset
|
72 |
then Markup.properties [(Markup.nameN, Token.content_of tok)] |
37197
953fc4983439
more detailed token markup, including command kind as sub_kind;
wenzelm
parents:
37193
diff
changeset
|
73 |
else I; |
48768
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
wenzelm
parents:
48756
diff
changeset
|
74 |
val (markup, txt) = token_kind_markup kind; |
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
wenzelm
parents:
48756
diff
changeset
|
75 |
in (props markup, txt) end; |
37192 | 76 |
|
48749
c197b3c3e7fa
some attempts to keep malformed syntax errors focussed, without too much red spilled onto the document view;
wenzelm
parents:
46811
diff
changeset
|
77 |
fun reports_of_token tok = |
c197b3c3e7fa
some attempts to keep malformed syntax errors focussed, without too much red spilled onto the document view;
wenzelm
parents:
46811
diff
changeset
|
78 |
let |
c197b3c3e7fa
some attempts to keep malformed syntax errors focussed, without too much red spilled onto the document view;
wenzelm
parents:
46811
diff
changeset
|
79 |
val malformed_symbols = |
c197b3c3e7fa
some attempts to keep malformed syntax errors focussed, without too much red spilled onto the document view;
wenzelm
parents:
46811
diff
changeset
|
80 |
Symbol_Pos.explode (Token.source_position_of tok) |
c197b3c3e7fa
some attempts to keep malformed syntax errors focussed, without too much red spilled onto the document view;
wenzelm
parents:
46811
diff
changeset
|
81 |
|> map_filter (fn (sym, pos) => |
48756
1c843142758e
sneak message into "bad" markup as property -- to be displayed after YXML parsing;
wenzelm
parents:
48752
diff
changeset
|
82 |
if Symbol.is_malformed sym |
50201
c26369c9eda6
Isabelle-specific implementation of quasi-abstract markup elements -- back to module arrangement before d83797ef0d2d;
wenzelm
parents:
48878
diff
changeset
|
83 |
then SOME ((pos, Markup.bad), "Malformed symbolic character") else NONE); |
48749
c197b3c3e7fa
some attempts to keep malformed syntax errors focussed, without too much red spilled onto the document view;
wenzelm
parents:
46811
diff
changeset
|
84 |
val is_malformed = Token.is_error tok orelse not (null malformed_symbols); |
48768
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
wenzelm
parents:
48756
diff
changeset
|
85 |
val (markup, txt) = token_markup tok; |
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
wenzelm
parents:
48756
diff
changeset
|
86 |
val reports = ((Token.position_of tok, markup), txt) :: malformed_symbols; |
48749
c197b3c3e7fa
some attempts to keep malformed syntax errors focussed, without too much red spilled onto the document view;
wenzelm
parents:
46811
diff
changeset
|
87 |
in (is_malformed, reports) end; |
40528 | 88 |
|
23803 | 89 |
in |
90 |
||
48749
c197b3c3e7fa
some attempts to keep malformed syntax errors focussed, without too much red spilled onto the document view;
wenzelm
parents:
46811
diff
changeset
|
91 |
fun reports_of_tokens toks = |
c197b3c3e7fa
some attempts to keep malformed syntax errors focussed, without too much red spilled onto the document view;
wenzelm
parents:
46811
diff
changeset
|
92 |
let val results = map reports_of_token toks |
c197b3c3e7fa
some attempts to keep malformed syntax errors focussed, without too much red spilled onto the document view;
wenzelm
parents:
46811
diff
changeset
|
93 |
in (exists fst results, maps snd results) end; |
c197b3c3e7fa
some attempts to keep malformed syntax errors focussed, without too much red spilled onto the document view;
wenzelm
parents:
46811
diff
changeset
|
94 |
|
23726 | 95 |
fun present_token tok = |
48768
abc45de5bb22
clarified "bad" markup: proper body text, invent missing serial on Scala side (counting backwards);
wenzelm
parents:
48756
diff
changeset
|
96 |
Markup.enclose (fst (token_markup tok)) (Output.output (Token.unparse tok)); |
23726 | 97 |
|
23803 | 98 |
end; |
99 |
||
100 |
||
101 |
||
27665 | 102 |
(** spans **) |
103 |
||
27842 | 104 |
(* type span *) |
105 |
||
48878 | 106 |
datatype span_kind = Command of string * Position.T | Ignored | Malformed; |
36959
f5417836dbea
renamed structure OuterLex to Token and type token to Token.T, keeping legacy aliases for some time;
wenzelm
parents:
36950
diff
changeset
|
107 |
datatype span = Span of span_kind * Token.T list; |
23803 | 108 |
|
27842 | 109 |
fun span_kind (Span (k, _)) = k; |
110 |
fun span_content (Span (_, toks)) = toks; |
|
111 |
||
48865 | 112 |
val present_span = implode o map present_token o span_content; |
113 |
||
23803 | 114 |
|
115 |
(* parse *) |
|
23726 | 116 |
|
23803 | 117 |
local |
118 |
||
46811
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
wenzelm
parents:
45666
diff
changeset
|
119 |
fun make_span toks = |
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
wenzelm
parents:
45666
diff
changeset
|
120 |
if not (null toks) andalso Token.is_command (hd toks) then |
48878 | 121 |
Span (Command (Token.content_of (hd toks), Token.position_of (hd toks)), toks) |
51267 | 122 |
else if forall Token.is_improper toks then Span (Ignored, toks) |
46811
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
wenzelm
parents:
45666
diff
changeset
|
123 |
else Span (Malformed, toks); |
23726 | 124 |
|
46811
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
wenzelm
parents:
45666
diff
changeset
|
125 |
fun flush (result, span) = if null span then (result, span) else (rev span :: result, []); |
23726 | 126 |
|
127 |
in |
|
128 |
||
46811
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
wenzelm
parents:
45666
diff
changeset
|
129 |
fun parse_spans toks = |
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
wenzelm
parents:
45666
diff
changeset
|
130 |
fold (fn tok => Token.is_command tok ? flush #> apsnd (cons tok)) toks ([], []) |
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
wenzelm
parents:
45666
diff
changeset
|
131 |
|> flush |
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
wenzelm
parents:
45666
diff
changeset
|
132 |
|> #1 |> rev |> map make_span; |
23803 | 133 |
|
134 |
end; |
|
135 |
||
136 |
||
28434 | 137 |
|
43621 | 138 |
(** specification elements: commands with optional proof **) |
139 |
||
51225 | 140 |
datatype 'a element = Element of 'a * ('a element list * 'a) option; |
141 |
||
142 |
fun element (a, b) = Element (a, SOME b); |
|
143 |
fun atom a = Element (a, NONE); |
|
43621 | 144 |
|
51225 | 145 |
fun map_element f (Element (a, NONE)) = Element (f a, NONE) |
146 |
| map_element f (Element (a, SOME (elems, b))) = |
|
147 |
Element (f a, SOME ((map o map_element) f elems, f b)); |
|
148 |
||
149 |
fun flat_element (Element (a, NONE)) = [a] |
|
150 |
| flat_element (Element (a, SOME (elems, b))) = a :: maps flat_element elems @ [b]; |
|
43621 | 151 |
|
51268
fcc4b89a600d
simplified Outer_Syntax.read_span: internalized Toplevel.is_ignored;
wenzelm
parents:
51267
diff
changeset
|
152 |
fun last_element (Element (a, NONE)) = a |
fcc4b89a600d
simplified Outer_Syntax.read_span: internalized Toplevel.is_ignored;
wenzelm
parents:
51267
diff
changeset
|
153 |
| last_element (Element (_, SOME (_, b))) = b; |
fcc4b89a600d
simplified Outer_Syntax.read_span: internalized Toplevel.is_ignored;
wenzelm
parents:
51267
diff
changeset
|
154 |
|
28434 | 155 |
|
156 |
(* scanning spans *) |
|
157 |
||
48878 | 158 |
val eof = Span (Command ("", Position.none), []); |
28434 | 159 |
|
48878 | 160 |
fun is_eof (Span (Command ("", _), _)) = true |
28434 | 161 |
| is_eof _ = false; |
162 |
||
163 |
val not_eof = not o is_eof; |
|
164 |
||
165 |
val stopper = Scan.stopper (K eof) is_eof; |
|
166 |
||
167 |
||
46811
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
wenzelm
parents:
45666
diff
changeset
|
168 |
(* parse *) |
28434 | 169 |
|
170 |
local |
|
171 |
||
48878 | 172 |
fun command_with pred = |
173 |
Scan.one (fn (Span (Command (name, _), _)) => pred name | _ => false); |
|
28434 | 174 |
|
51225 | 175 |
val proof_atom = |
176 |
Scan.one (fn (Span (Command (name, _), _)) => Keyword.is_proof_body name | _ => true) >> atom; |
|
28434 | 177 |
|
51225 | 178 |
fun proof_element x = (command_with Keyword.is_proof_goal -- proof_rest >> element || proof_atom) x |
179 |
and proof_rest x = (Scan.repeat proof_element -- command_with Keyword.is_qed) x; |
|
180 |
||
181 |
val other_element = |
|
182 |
command_with Keyword.is_theory_goal -- proof_rest >> element || |
|
183 |
Scan.one not_eof >> atom; |
|
28434 | 184 |
|
185 |
in |
|
186 |
||
46811
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
wenzelm
parents:
45666
diff
changeset
|
187 |
val parse_elements = |
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
wenzelm
parents:
45666
diff
changeset
|
188 |
Source.of_list #> |
51225 | 189 |
Source.source stopper (Scan.bulk other_element) NONE #> |
46811
03a2dc9e0624
clarified command span: include trailing whitespace/comments and thus reduce number of ignored spans with associated transactions and states (factor 2);
wenzelm
parents:
45666
diff
changeset
|
190 |
Source.exhaust; |
28434 | 191 |
|
23726 | 192 |
end; |
28434 | 193 |
|
194 |
end; |