12 macros or code their own translation functions. It describes the |
12 macros or code their own translation functions. It describes the |
13 transformations between parse trees, abstract syntax trees and terms. |
13 transformations between parse trees, abstract syntax trees and terms. |
14 |
14 |
15 |
15 |
16 \section{Abstract syntax trees} \label{sec:asts} |
16 \section{Abstract syntax trees} \label{sec:asts} |
17 \index{ASTs|(} |
17 \index{ASTs|(} |
18 |
18 |
19 The parser, given a token list from the lexer, applies productions to yield |
19 The parser, given a token list from the lexer, applies productions to yield |
20 a parse tree\index{parse trees}. By applying some internal transformations |
20 a parse tree\index{parse trees}. By applying some internal transformations |
21 the parse tree becomes an abstract syntax tree, or \AST{}. Macro |
21 the parse tree becomes an abstract syntax tree, or \AST{}. Macro |
22 expansion, further translations and finally type inference yields a |
22 expansion, further translations and finally type inference yields a |
70 Appl [Constant "_abs", Variable "x", Variable "t"], |
70 Appl [Constant "_abs", Variable "x", Variable "t"], |
71 Appl [Constant "fun", Variable "'a", Variable "'b"]] |
71 Appl [Constant "fun", Variable "'a", Variable "'b"]] |
72 \end{ttbox} |
72 \end{ttbox} |
73 is shown as {\tt ("_constrain" ("_abs" x t) ("fun" 'a 'b))}. |
73 is shown as {\tt ("_constrain" ("_abs" x t) ("fun" 'a 'b))}. |
74 Both {\tt ()} and {\tt (f)} are illegal because they have too few |
74 Both {\tt ()} and {\tt (f)} are illegal because they have too few |
75 subtrees. |
75 subtrees. |
76 |
76 |
77 The resemblance to Lisp's S-expressions is intentional, but there are two |
77 The resemblance to Lisp's S-expressions is intentional, but there are two |
78 kinds of atomic symbols: $\Constant x$ and $\Variable x$. Do not take the |
78 kinds of atomic symbols: $\Constant x$ and $\Variable x$. Do not take the |
79 names {\tt Constant} and {\tt Variable} too literally; in the later |
79 names {\tt Constant} and {\tt Variable} too literally; in the later |
80 translation to terms, $\Variable x$ may become a constant, free or bound |
80 translation to terms, $\Variable x$ may become a constant, free or bound |
103 |
103 |
104 The parse tree is constructed by nesting the right-hand sides of the |
104 The parse tree is constructed by nesting the right-hand sides of the |
105 productions used to recognize the input. Such parse trees are simply lists |
105 productions used to recognize the input. Such parse trees are simply lists |
106 of tokens and constituent parse trees, the latter representing the |
106 of tokens and constituent parse trees, the latter representing the |
107 nonterminals of the productions. Let us refer to the actual productions in |
107 nonterminals of the productions. Let us refer to the actual productions in |
108 the form displayed by {\tt Syntax.print_syntax}. |
108 the form displayed by {\tt print_syntax} (see \S\ref{sec:inspct-thy} for an |
|
109 example). |
109 |
110 |
110 Ignoring parse \AST{} translations, parse trees are transformed to \AST{}s |
111 Ignoring parse \AST{} translations, parse trees are transformed to \AST{}s |
111 by stripping out delimiters and copy productions. More precisely, the |
112 by stripping out delimiters and copy productions. More precisely, the |
112 mapping $\astofpt{-}$ is derived from the productions as follows: |
113 mapping $\astofpt{-}$ is derived from the productions as follows: |
113 \begin{itemize} |
114 \begin{itemize} |
114 \item Name tokens: $\astofpt{t} = \Variable s$, where $t$ is an \ndx{id}, |
115 \item Name tokens: $\astofpt{t} = \Variable s$, where $t$ is an \ndx{id}, |
115 \ndx{var}, \ndx{tid} or \ndx{tvar} token, and $s$ its associated string. |
116 \ndx{var}, \ndx{tid}, \ndx{tvar}, \ndx{xnum} or \ndx{xstr} token, and $s$ |
|
117 its associated string. Note that for {\tt xstr} this does not include the |
|
118 quotes. |
116 |
119 |
117 \item Copy productions:\index{productions!copy} |
120 \item Copy productions:\index{productions!copy} |
118 $\astofpt{\ldots P \ldots} = \astofpt{P}$. Here $\ldots$ stands for |
121 $\astofpt{\ldots P \ldots} = \astofpt{P}$. Here $\ldots$ stands for |
119 strings of delimiters, which are discarded. $P$ stands for the single |
122 strings of delimiters, which are discarded. $P$ stands for the single |
120 constituent that is not a delimiter; it is either a nonterminal symbol or |
123 constituent that is not a delimiter; it is either a nonterminal symbol or |
121 a name token. |
124 a name token. |
122 |
125 |
123 \item 0-ary productions: $\astofpt{\ldots \mtt{=>} c} = \Constant c$. |
126 \item 0-ary productions: $\astofpt{\ldots \mtt{=>} c} = \Constant c$. |
124 Here there are no constituents other than delimiters, which are |
127 Here there are no constituents other than delimiters, which are |
125 discarded. |
128 discarded. |
126 |
129 |
127 \item $n$-ary productions, where $n \ge 1$: delimiters are discarded and |
130 \item $n$-ary productions, where $n \ge 1$: delimiters are discarded and |
128 the remaining constituents $P@1$, \ldots, $P@n$ are built into an |
131 the remaining constituents $P@1$, \ldots, $P@n$ are built into an |
129 application whose head constant is~$c$: |
132 application whose head constant is~$c$: |
130 \[ \astofpt{\ldots P@1 \ldots P@n \ldots \mtt{=>} c} = |
133 \[ \astofpt{\ldots P@1 \ldots P@n \ldots \mtt{=>} c} = |
131 \Appl{\Constant c, \astofpt{P@1}, \ldots, \astofpt{P@n}} |
134 \Appl{\Constant c, \astofpt{P@1}, \ldots, \astofpt{P@n}} |
132 \] |
135 \] |
133 \end{itemize} |
136 \end{itemize} |
134 Figure~\ref{fig:parse_ast} presents some simple examples, where {\tt ==}, |
137 Figure~\ref{fig:parse_ast} presents some simple examples, where {\tt ==}, |
135 {\tt _appl}, {\tt _args}, and so forth name productions of the Pure syntax. |
138 {\tt _appl}, {\tt _args}, and so forth name productions of the Pure syntax. |
252 of~$x$ renamed to differ from all names occurring in~$t$, and let $t'$ |
255 of~$x$ renamed to differ from all names occurring in~$t$, and let $t'$ |
253 be obtained from~$t$ by replacing all bound occurrences of~$x$ by |
256 be obtained from~$t$ by replacing all bound occurrences of~$x$ by |
254 the free variable $x'$. This replaces corresponding occurrences of the |
257 the free variable $x'$. This replaces corresponding occurrences of the |
255 constructor \ttindex{Bound} by the term $\ttfct{Free} (x', |
258 constructor \ttindex{Bound} by the term $\ttfct{Free} (x', |
256 \mtt{dummyT})$: |
259 \mtt{dummyT})$: |
257 \[ \astofterm{\ttfct{Abs} (x, \tau, t)} = |
260 \[ \astofterm{\ttfct{Abs} (x, \tau, t)} = |
258 \Appl{\Constant \mtt{"_abs"}, |
261 \Appl{\Constant \mtt{"_abs"}, |
259 constrain(\Variable x', \tau), \astofterm{t'}}. |
262 constrain(\Variable x', \tau), \astofterm{t'}}. |
260 \] |
263 \] |
261 |
264 |
262 \item $\astofterm{\ttfct{Bound} i} = \Variable \mtt{"B.}i\mtt"$. |
265 \item $\astofterm{\ttfct{Bound} i} = \Variable \mtt{"B.}i\mtt"$. |
263 The occurrence of constructor \ttindex{Bound} should never happen |
266 The occurrence of constructor \ttindex{Bound} should never happen |
264 when printing well-typed terms; it indicates a de Bruijn index with no |
267 when printing well-typed terms; it indicates a de Bruijn index with no |
265 matching abstraction. |
268 matching abstraction. |
266 |
269 |
267 \item Where $f$ is not an application, |
270 \item Where $f$ is not an application, |
268 \[ \astofterm{f \ttapp x@1 \ttapp \ldots \ttapp x@n} = |
271 \[ \astofterm{f \ttapp x@1 \ttapp \ldots \ttapp x@n} = |
269 \Appl{\astofterm{f}, \astofterm{x@1}, \ldots,\astofterm{x@n}} |
272 \Appl{\astofterm{f}, \astofterm{x@1}, \ldots,\astofterm{x@n}} |
270 \] |
273 \] |
271 \end{itemize} |
274 \end{itemize} |
272 % |
275 % |
273 Type constraints\index{type constraints} are inserted to allow the printing |
276 Type constraints\index{type constraints} are inserted to allow the printing |
274 of types. This is governed by the boolean variable \ttindex{show_types}: |
277 of types. This is governed by the boolean variable \ttindex{show_types}: |
275 \begin{itemize} |
278 \begin{itemize} |
276 \item $constrain(x, \tau) = x$ \ if $\tau = \mtt{dummyT}$ \index{*dummyT} or |
279 \item $constrain(x, \tau) = x$ \ if $\tau = \mtt{dummyT}$ \index{*dummyT} or |
277 \ttindex{show_types} is set to {\tt false}. |
280 \ttindex{show_types} is set to {\tt false}. |
278 |
281 |
279 \item $constrain(x, \tau) = \Appl{\Constant \mtt{"_constrain"}, x, |
282 \item $constrain(x, \tau) = \Appl{\Constant \mtt{"_constrain"}, x, |
280 \astofterm{\tau}}$ \ otherwise. |
283 \astofterm{\tau}}$ \ otherwise. |
281 |
284 |
282 Here, $\astofterm{\tau}$ is the \AST{} encoding of $\tau$: type |
285 Here, $\astofterm{\tau}$ is the \AST{} encoding of $\tau$: type |
283 constructors go to {\tt Constant}s; type identifiers go to {\tt |
286 constructors go to {\tt Constant}s; type identifiers go to {\tt |
284 Variable}s; type applications go to {\tt Appl}s with the type |
287 Variable}s; type applications go to {\tt Appl}s with the type |
285 constructor as the first element. If \ttindex{show_sorts} is set to |
288 constructor as the first element. If \ttindex{show_sorts} is set to |
336 express all but the most obscure translations. |
339 express all but the most obscure translations. |
337 |
340 |
338 Figure~\ref{fig:set_trans} defines a fragment of first-order logic and set |
341 Figure~\ref{fig:set_trans} defines a fragment of first-order logic and set |
339 theory.\footnote{This and the following theories are complete working |
342 theory.\footnote{This and the following theories are complete working |
340 examples, though they specify only syntax, no axioms. The file {\tt |
343 examples, though they specify only syntax, no axioms. The file {\tt |
341 ZF/ZF.thy} presents the full set theory definition, including many |
344 ZF/ZF.thy} presents a full set theory definition, including many |
342 macro rules.} Theory {\tt SET} defines constants for set comprehension |
345 macro rules.} Theory {\tt SET} defines constants for set comprehension |
343 ({\tt Collect}), replacement ({\tt Replace}) and bounded universal |
346 ({\tt Collect}), replacement ({\tt Replace}) and bounded universal |
344 quantification ({\tt Ball}). Each of these binds some variables. Without |
347 quantification ({\tt Ball}). Each of these binds some variables. Without |
345 additional syntax we should have to write $\forall x \in A. P$ as {\tt |
348 additional syntax we should have to write $\forall x \in A. P$ as {\tt |
346 Ball(A,\%x.P)}, and similarly for the others. |
349 Ball(A,\%x.P)}, and similarly for the others. |
347 |
350 |
348 \begin{figure} |
351 \begin{figure} |
349 \begin{ttbox} |
352 \begin{ttbox} |
350 SET = Pure + |
353 SET = Pure + |
351 types |
354 types |
352 i, o |
355 i o |
353 arities |
356 arities |
354 i, o :: logic |
357 i, o :: logic |
355 consts |
358 consts |
356 Trueprop :: "o => prop" ("_" 5) |
359 Trueprop :: "o => prop" ("_" 5) |
357 Collect :: "[i, i => o] => i" |
360 Collect :: "[i, i => o] => i" |
|
361 Replace :: "[i, [i, i] => o] => i" |
|
362 Ball :: "[i, i => o] => o" |
|
363 syntax |
358 "{\at}Collect" :: "[idt, i, o] => i" ("(1{\ttlbrace}_:_./ _{\ttrbrace})") |
364 "{\at}Collect" :: "[idt, i, o] => i" ("(1{\ttlbrace}_:_./ _{\ttrbrace})") |
359 Replace :: "[i, [i, i] => o] => i" |
|
360 "{\at}Replace" :: "[idt, idt, i, o] => i" ("(1{\ttlbrace}_./ _:_, _{\ttrbrace})") |
365 "{\at}Replace" :: "[idt, idt, i, o] => i" ("(1{\ttlbrace}_./ _:_, _{\ttrbrace})") |
361 Ball :: "[i, i => o] => o" |
|
362 "{\at}Ball" :: "[idt, i, o] => o" ("(3ALL _:_./ _)" 10) |
366 "{\at}Ball" :: "[idt, i, o] => o" ("(3ALL _:_./ _)" 10) |
363 translations |
367 translations |
364 "{\ttlbrace}x:A. P{\ttrbrace}" == "Collect(A, \%x. P)" |
368 "{\ttlbrace}x:A. P{\ttrbrace}" == "Collect(A, \%x. P)" |
365 "{\ttlbrace}y. x:A, Q{\ttrbrace}" == "Replace(A, \%x y. Q)" |
369 "{\ttlbrace}y. x:A, Q{\ttrbrace}" == "Replace(A, \%x y. Q)" |
366 "ALL x:A. P" == "Ball(A, \%x. P)" |
370 "ALL x:A. P" == "Ball(A, \%x. P)" |
367 end |
371 end |
368 \end{ttbox} |
372 \end{ttbox} |
369 \caption{Macro example: set theory}\label{fig:set_trans} |
373 \caption{Macro example: set theory}\label{fig:set_trans} |
370 \end{figure} |
374 \end{figure} |
371 |
375 |
372 The theory specifies a variable-binding syntax through additional |
376 The theory specifies a variable-binding syntax through additional productions |
373 productions that have mixfix declarations. Each non-copy production must |
377 that have mixfix declarations. Each non-copy production must specify some |
374 specify some constant, which is used for building \AST{}s. |
378 constant, which is used for building \AST{}s. \index{constants!syntactic} The |
375 \index{constants!syntactic} The additional constants are decorated with |
379 additional constants are decorated with {\tt\at} to stress their purely |
376 {\tt\at} to stress their purely syntactic purpose; they should never occur |
380 syntactic purpose; they may not occur within the final well-typed terms, |
377 within the final well-typed terms. Furthermore, they cannot be written in |
381 being declared as {\tt syntax} rather than {\tt consts}. |
378 formulae because they are not legal identifiers. |
|
379 |
382 |
380 The translations cause the replacement of external forms by internal forms |
383 The translations cause the replacement of external forms by internal forms |
381 after parsing, and vice versa before printing of terms. As a specification |
384 after parsing, and vice versa before printing of terms. As a specification |
382 of the set theory notation, they should be largely self-explanatory. The |
385 of the set theory notation, they should be largely self-explanatory. The |
383 syntactic constants, {\tt\at Collect}, {\tt\at Replace} and {\tt\at Ball}, |
386 syntactic constants, {\tt\at Collect}, {\tt\at Replace} and {\tt\at Ball}, |
433 {\tt _K}). Thus, a constant whose name starts with an underscore can |
436 {\tt _K}). Thus, a constant whose name starts with an underscore can |
434 appear in macro rules but not in ordinary terms. |
437 appear in macro rules but not in ordinary terms. |
435 |
438 |
436 Some atoms of the macro rule's \AST{} are designated as constants for |
439 Some atoms of the macro rule's \AST{} are designated as constants for |
437 matching. These are all names that have been declared as classes, types or |
440 matching. These are all names that have been declared as classes, types or |
438 constants. |
441 constants (logical and syntactic). |
439 |
442 |
440 The result of this preprocessing is two lists of macro rules, each stored |
443 The result of this preprocessing is two lists of macro rules, each stored |
441 as a pair of \AST{}s. They can be viewed using {\tt Syntax.print_syntax} |
444 as a pair of \AST{}s. They can be viewed using {\tt print_syntax} |
442 (sections \ttindex{parse_rules} and \ttindex{print_rules}). For |
445 (sections \ttindex{parse_rules} and \ttindex{print_rules}). For |
443 theory~{\tt SET} of Fig.~\ref{fig:set_trans} these are |
446 theory~{\tt SET} of Fig.~\ref{fig:set_trans} these are |
444 \begin{ttbox} |
447 \begin{ttbox} |
445 parse_rules: |
448 parse_rules: |
446 ("{\at}Collect" x A P) -> ("Collect" A ("_abs" x P)) |
449 ("{\at}Collect" x A P) -> ("Collect" A ("_abs" x P)) |
475 Another trap concerns type constraints. If \ttindex{show_types} is set to |
478 Another trap concerns type constraints. If \ttindex{show_types} is set to |
476 {\tt true}, bound variables will be decorated by their meta types at the |
479 {\tt true}, bound variables will be decorated by their meta types at the |
477 binding place (but not at occurrences in the body). Matching with |
480 binding place (but not at occurrences in the body). Matching with |
478 \verb|Collect(A, %x. P)| binds {\tt x} to something like {\tt ("_constrain" y |
481 \verb|Collect(A, %x. P)| binds {\tt x} to something like {\tt ("_constrain" y |
479 "i")} rather than only {\tt y}. \AST{} rewriting will cause the constraint to |
482 "i")} rather than only {\tt y}. \AST{} rewriting will cause the constraint to |
480 appear in the external form, say \verb|{y::i:A::i. P::o}|. |
483 appear in the external form, say \verb|{y::i:A::i. P::o}|. |
481 |
484 |
482 To allow such constraints to be re-read, your syntax should specify bound |
485 To allow such constraints to be re-read, your syntax should specify bound |
483 variables using the nonterminal~\ndx{idt}. This is the case in our |
486 variables using the nonterminal~\ndx{idt}. This is the case in our |
484 example. Choosing {\tt id} instead of {\tt idt} is a common error, |
487 example. Choosing {\tt id} instead of {\tt idt} is a common error, |
485 especially since it appears in former versions of most of Isabelle's |
488 especially since it appears in former versions of most of Isabelle's |
524 The second case above may look odd. This is where {\tt Variable}s of |
527 The second case above may look odd. This is where {\tt Variable}s of |
525 non-rule \AST{}s behave like {\tt Constant}s. Recall that \AST{}s are not |
528 non-rule \AST{}s behave like {\tt Constant}s. Recall that \AST{}s are not |
526 far removed from parse trees; at this level it is not yet known which |
529 far removed from parse trees; at this level it is not yet known which |
527 identifiers will become constants, bounds, frees, types or classes. As |
530 identifiers will become constants, bounds, frees, types or classes. As |
528 \S\ref{sec:asts} describes, former parse tree heads appear in \AST{}s as |
531 \S\ref{sec:asts} describes, former parse tree heads appear in \AST{}s as |
529 {\tt Constant}s, while the name tokens \ndx{id}, \ndx{var}, \ndx{tid} and |
532 {\tt Constant}s, while the name tokens \ndx{id}, \ndx{var}, \ndx{tid}, |
530 \ndx{tvar} become {\tt Variable}s. On the other hand, when \AST{}s |
533 \ndx{tvar}, \ndx{xnum} and \ndx{xstr} become {\tt Variable}s. On the other |
531 generated from terms for printing, all constants and type constructors |
534 hand, when \AST{}s generated from terms for printing, all constants and type |
532 become {\tt Constant}s; see \S\ref{sec:asts}. Thus \AST{}s may contain a |
535 constructors become {\tt Constant}s; see \S\ref{sec:asts}. Thus \AST{}s may |
533 messy mixture of {\tt Variable}s and {\tt Constant}s. This is |
536 contain a messy mixture of {\tt Variable}s and {\tt Constant}s. This is |
534 insignificant at macro level because matching treats them alike. |
537 insignificant at macro level because matching treats them alike. |
535 |
538 |
536 Because of this behaviour, different kinds of atoms with the same name are |
539 Because of this behaviour, different kinds of atoms with the same name are |
537 indistinguishable, which may make some rules prone to misbehaviour. Example: |
540 indistinguishable, which may make some rules prone to misbehaviour. Example: |
538 \begin{ttbox} |
541 \begin{ttbox} |
539 types |
542 types |
540 Nil |
543 Nil |
541 consts |
544 consts |
542 Nil :: "'a list" |
545 Nil :: "'a list" |
|
546 syntax |
543 "[]" :: "'a list" ("[]") |
547 "[]" :: "'a list" ("[]") |
544 translations |
548 translations |
545 "[]" == "Nil" |
549 "[]" == "Nil" |
546 \end{ttbox} |
550 \end{ttbox} |
547 The term {\tt Nil} will be printed as {\tt []}, just as expected. |
551 The term {\tt Nil} will be printed as {\tt []}, just as expected. |
548 The term \verb|%Nil.t| will be printed as \verb|%[].t|, which might not be |
552 The term \verb|%Nil.t| will be printed as \verb|%[].t|, which might not be |
549 expected! How is the type~{\tt Nil} printed? |
553 expected! Guess how type~{\tt Nil} is printed? |
550 |
554 |
551 Normalizing an \AST{} involves repeatedly applying macro rules until none |
555 Normalizing an \AST{} involves repeatedly applying macro rules until none |
552 are applicable. Macro rules are chosen in the order that they appear in the |
556 are applicable. Macro rules are chosen in the order that they appear in the |
553 {\tt translations} section. You can watch the normalization of \AST{}s |
557 {\tt translations} section. You can watch the normalization of \AST{}s |
554 during parsing and printing by setting \ttindex{Syntax.trace_norm_ast} to |
558 during parsing and printing by setting \ttindex{Syntax.trace_norm_ast} to |
571 \index{"@Finset@{\tt\at Finset} constant} |
575 \index{"@Finset@{\tt\at Finset} constant} |
572 \begin{ttbox} |
576 \begin{ttbox} |
573 FINSET = SET + |
577 FINSET = SET + |
574 types |
578 types |
575 is |
579 is |
576 consts |
580 syntax |
577 "" :: "i => is" ("_") |
581 "" :: "i => is" ("_") |
578 "{\at}Enum" :: "[i, is] => is" ("_,/ _") |
582 "{\at}Enum" :: "[i, is] => is" ("_,/ _") |
|
583 consts |
579 empty :: "i" ("{\ttlbrace}{\ttrbrace}") |
584 empty :: "i" ("{\ttlbrace}{\ttrbrace}") |
580 insert :: "[i, i] => i" |
585 insert :: "[i, i] => i" |
|
586 syntax |
581 "{\at}Finset" :: "is => i" ("{\ttlbrace}(_){\ttrbrace}") |
587 "{\at}Finset" :: "is => i" ("{\ttlbrace}(_){\ttrbrace}") |
582 translations |
588 translations |
583 "{\ttlbrace}x, xs{\ttrbrace}" == "insert(x, {\ttlbrace}xs{\ttrbrace})" |
589 "{\ttlbrace}x, xs{\ttrbrace}" == "insert(x, {\ttlbrace}xs{\ttrbrace})" |
584 "{\ttlbrace}x{\ttrbrace}" == "insert(x, {\ttlbrace}{\ttrbrace})" |
590 "{\ttlbrace}x{\ttrbrace}" == "insert(x, {\ttlbrace}{\ttrbrace})" |
585 end |
591 end |
594 i} separated by commas. The mixfix declaration \hbox{\verb|"_,/ _"|} |
600 i} separated by commas. The mixfix declaration \hbox{\verb|"_,/ _"|} |
595 allows a line break after the comma for \rmindex{pretty printing}; if no |
601 allows a line break after the comma for \rmindex{pretty printing}; if no |
596 line break is required then a space is printed instead. |
602 line break is required then a space is printed instead. |
597 |
603 |
598 The nonterminal is declared as the type~{\tt is}, but with no {\tt arities} |
604 The nonterminal is declared as the type~{\tt is}, but with no {\tt arities} |
599 declaration. Hence {\tt is} is not a logical type and no default |
605 declaration. Hence {\tt is} is not a logical type and may be used safely as |
600 productions are added. If we had needed enumerations of the nonterminal |
606 a new nonterminal for custom syntax. The nonterminal~{\tt is} can later be |
601 {\tt logic}, which would include all the logical types, we could have used |
607 re-used for other enumerations of type~{\tt i} like lists or tuples. If we |
602 the predefined nonterminal symbol \ndx{args} and skipped this part |
608 had needed polymorphic enumerations, we could have used the predefined |
603 altogether. The nonterminal~{\tt is} can later be reused for other |
609 nonterminal symbol \ndx{args} and skipped this part altogether. |
604 enumerations of type~{\tt i} like lists or tuples. |
|
605 |
610 |
606 \index{"@Finset@{\tt\at Finset} constant} |
611 \index{"@Finset@{\tt\at Finset} constant} |
607 Next follows {\tt empty}, which is already equipped with its syntax |
612 Next follows {\tt empty}, which is already equipped with its syntax |
608 \verb|{}|, and {\tt insert} without concrete syntax. The syntactic |
613 \verb|{}|, and {\tt insert} without concrete syntax. The syntactic |
609 constant {\tt\at Finset} provides concrete syntax for enumerations of~{\tt |
614 constant {\tt\at Finset} provides concrete syntax for enumerations of~{\tt |
618 ("{\at}Finset" x) -> ("insert" x "empty") |
623 ("{\at}Finset" x) -> ("insert" x "empty") |
619 print_rules: |
624 print_rules: |
620 ("insert" x ("{\at}Finset" xs)) -> ("{\at}Finset" ("{\at}Enum" x xs)) |
625 ("insert" x ("{\at}Finset" xs)) -> ("{\at}Finset" ("{\at}Enum" x xs)) |
621 ("insert" x "empty") -> ("{\at}Finset" x) |
626 ("insert" x "empty") -> ("{\at}Finset" x) |
622 \end{ttbox} |
627 \end{ttbox} |
623 This shows that \verb|{x, xs}| indeed matches any set enumeration of at least |
628 This shows that \verb|{x,xs}| indeed matches any set enumeration of at least |
624 two elements, binding the first to {\tt x} and the rest to {\tt xs}. |
629 two elements, binding the first to {\tt x} and the rest to {\tt xs}. |
625 Likewise, \verb|{xs}| and \verb|{x}| represent any set enumeration. |
630 Likewise, \verb|{xs}| and \verb|{x}| represent any set enumeration. |
626 The parse rules only work in the order given. |
631 The parse rules only work in the order given. |
627 |
632 |
628 \begin{warn} |
633 \begin{warn} |
629 The \AST{} rewriter cannot distinguish constants from variables and looks |
634 The \AST{} rewriter cannot distinguish constants from variables and looks |
630 only for names of atoms. Thus the names of {\tt Constant}s occurring in |
635 only for names of atoms. Thus the names of {\tt Constant}s occurring in |
633 sufficiently long and strange names. If a bound variable's name gets |
638 sufficiently long and strange names. If a bound variable's name gets |
634 rewritten, the result will be incorrect; for example, the term |
639 rewritten, the result will be incorrect; for example, the term |
635 \begin{ttbox} |
640 \begin{ttbox} |
636 \%empty insert. insert(x, empty) |
641 \%empty insert. insert(x, empty) |
637 \end{ttbox} |
642 \end{ttbox} |
638 \par\noindent is printed as \verb|%empty insert. {x}|. |
643 \par\noindent is incorrectly printed as \verb|%empty insert. {x}|. |
639 \end{warn} |
644 \end{warn} |
640 |
645 |
641 |
646 |
642 \subsection{Example: a parse macro for dependent types}\label{prod_trans} |
647 \subsection{Example: a parse macro for dependent types}\label{prod_trans} |
643 \index{examples!of macros} |
648 \index{examples!of macros} |
644 |
649 |
645 As stated earlier, a macro rule may not introduce new {\tt Variable}s on |
650 As stated earlier, a macro rule may not introduce new {\tt Variable}s on |
646 the right-hand side. Something like \verb|"K(B)" => "%x. B"| is illegal; |
651 the right-hand side. Something like \verb|"K(B)" => "%x.B"| is illegal; |
647 if allowed, it could cause variable capture. In such cases you usually |
652 if allowed, it could cause variable capture. In such cases you usually |
648 must fall back on translation functions. But a trick can make things |
653 must fall back on translation functions. But a trick can make things |
649 readable in some cases: {\em calling translation functions by parse |
654 readable in some cases: {\em calling\/} translation functions by parse |
650 macros}: |
655 macros: |
651 \begin{ttbox} |
656 \begin{ttbox} |
652 PROD = FINSET + |
657 PROD = FINSET + |
653 consts |
658 consts |
654 Pi :: "[i, i => i] => i" |
659 Pi :: "[i, i => i] => i" |
|
660 syntax |
655 "{\at}PROD" :: "[idt, i, i] => i" ("(3PROD _:_./ _)" 10) |
661 "{\at}PROD" :: "[idt, i, i] => i" ("(3PROD _:_./ _)" 10) |
656 "{\at}->" :: "[i, i] => i" ("(_ ->/ _)" [51, 50] 50) |
662 "{\at}->" :: "[i, i] => i" ("(_ ->/ _)" [51, 50] 50) |
657 \ttbreak |
663 \ttbreak |
658 translations |
664 translations |
659 "PROD x:A. B" => "Pi(A, \%x. B)" |
665 "PROD x:A. B" => "Pi(A, \%x. B)" |
661 end |
667 end |
662 ML |
668 ML |
663 val print_translation = [("Pi", dependent_tr' ("{\at}PROD", "{\at}->"))]; |
669 val print_translation = [("Pi", dependent_tr' ("{\at}PROD", "{\at}->"))]; |
664 \end{ttbox} |
670 \end{ttbox} |
665 |
671 |
666 Here {\tt Pi} is an internal constant for constructing general products. |
672 Here {\tt Pi} is a logical constant for constructing general products. |
667 Two external forms exist: the general case {\tt PROD x:A.B} and the |
673 Two external forms exist: the general case {\tt PROD x:A.B} and the |
668 function space {\tt A -> B}, which abbreviates \verb|Pi(A, %x.B)| when {\tt B} |
674 function space {\tt A -> B}, which abbreviates \verb|Pi(A, %x.B)| when {\tt B} |
669 does not depend on~{\tt x}. |
675 does not depend on~{\tt x}. |
670 |
676 |
671 The second parse macro introduces {\tt _K(B)}, which later becomes \verb|%x.B| |
677 The second parse macro introduces {\tt _K(B)}, which later becomes |
672 due to a parse translation associated with \cdx{_K}. The order of the |
678 \verb|%x.B| due to a parse translation associated with \cdx{_K}. |
673 parse rules is critical. Unfortunately there is no such trick for |
679 Unfortunately there is no such trick for printing, so we have to add a {\tt |
674 printing, so we have to add a {\tt ML} section for the print translation |
680 ML} section for the print translation \ttindex{dependent_tr'}. |
675 \ttindex{dependent_tr'}. |
|
676 |
681 |
677 Recall that identifiers with a leading {\tt _} are allowed in translation |
682 Recall that identifiers with a leading {\tt _} are allowed in translation |
678 rules, but not in ordinary terms. Thus we can create \AST{}s containing |
683 rules, but not in ordinary terms. Thus we can create \AST{}s containing |
679 names that are not directly expressible. |
684 names that are not directly expressible. |
680 |
685 |
683 \S\ref{sec:tr_funs} below for more of the arcane lore of translation functions. |
688 \S\ref{sec:tr_funs} below for more of the arcane lore of translation functions. |
684 \index{macros|)}\index{rewriting!syntactic|)} |
689 \index{macros|)}\index{rewriting!syntactic|)} |
685 |
690 |
686 |
691 |
687 \section{Translation functions} \label{sec:tr_funs} |
692 \section{Translation functions} \label{sec:tr_funs} |
688 \index{translations|(} |
693 \index{translations|(} |
689 % |
694 % |
690 This section describes the translation function mechanism. By writing |
695 This section describes the translation function mechanism. By writing |
691 \ML{} functions, you can do almost everything with terms or \AST{}s during |
696 \ML{} functions, you can do almost everything with terms or \AST{}s during |
692 parsing and printing. The logic \LK\ is a good example of sophisticated |
697 parsing and printing. The logic \LK\ is a good example of sophisticated |
693 transformations between internal and external representations of sequents; |
698 transformations between internal and external representations of sequents; |
694 here, macros would be useless. |
699 here, macros would be useless. |
695 |
700 |
696 A full understanding of translations requires some familiarity |
701 A full understanding of translations requires some familiarity |
697 with Isabelle's internals, especially the datatypes {\tt term}, {\tt typ}, |
702 with Isabelle's internals, especially the datatypes {\tt term}, {\tt typ}, |
698 {\tt Syntax.ast} and the encodings of types and terms as such at the various |
703 {\tt Syntax.ast} and the encodings of types and terms as such at the various |
699 stages of the parsing or printing process. Most users should never need to |
704 stages of the parsing or printing process. Most users should never need to |
702 \subsection{Declaring translation functions} |
707 \subsection{Declaring translation functions} |
703 There are four kinds of translation functions. Each such function is |
708 There are four kinds of translation functions. Each such function is |
704 associated with a name, which triggers calls to it. Such names can be |
709 associated with a name, which triggers calls to it. Such names can be |
705 constants (logical or syntactic) or type constructors. |
710 constants (logical or syntactic) or type constructors. |
706 |
711 |
707 {\tt Syntax.print_syntax} displays the sets of names associated with the |
712 {\tt print_syntax} displays the sets of names associated with the |
708 translation functions of a {\tt Syntax.syntax} under |
713 translation functions of a {\tt Syntax.syntax} under |
709 \ttindex{parse_ast_translation}, \ttindex{parse_translation}, |
714 \ttindex{parse_ast_translation}, \ttindex{parse_translation}, |
710 \ttindex{print_translation} and \ttindex{print_ast_translation}. You can |
715 \ttindex{print_translation} and \ttindex{print_ast_translation}. You can |
711 add new ones via the {\tt ML} section\index{*ML section} of |
716 add new ones via the {\tt ML} section\index{*ML section} of |
712 a {\tt .thy} file. There may never be more than one function of the same |
717 a {\tt .thy} file. There may never be more than one function of the same |
713 kind per name. Conceptually, the {\tt ML} section should appear between |
718 kind per name. Even though the {\tt ML} section is the very last part of a |
714 {\tt consts} and {\tt translations}; newly installed translation functions |
719 {\tt .thy} file, newly installed translation functions are effective when |
715 are already effective when macros and logical rules are parsed. |
720 processing the preceding section. |
716 |
721 |
717 The {\tt ML} section is copied verbatim into the \ML\ file generated from a |
722 The {\tt ML} section is copied verbatim near the beginning of the \ML\ file |
718 {\tt .thy} file. Definitions made here are accessible as components of an |
723 generated from a {\tt .thy} file. Definitions made here are accessible as |
719 \ML\ structure; to make some definitions private, use an \ML{} {\tt local} |
724 components of an \ML\ structure; to make some definitions private, use an |
720 declaration. The {\tt ML} section may install translation functions by |
725 \ML{} {\tt local} declaration. The {\tt ML} section may install translation |
721 declaring any of the following identifiers: |
726 functions by declaring any of the following identifiers: |
722 \begin{ttbox} |
727 \begin{ttbox} |
723 val parse_ast_translation : (string * (ast list -> ast)) list |
728 val parse_ast_translation : (string * (ast list -> ast)) list |
724 val print_ast_translation : (string * (ast list -> ast)) list |
729 val print_ast_translation : (string * (ast list -> ast)) list |
725 val parse_translation : (string * (term list -> term)) list |
730 val parse_translation : (string * (term list -> term)) list |
726 val print_translation : (string * (term list -> term)) list |
731 val print_translation : (string * (term list -> term)) list |
740 has the form $\ttfct{Const} (c, \tau)$ or $\ttfct{Const} (c, \tau) \ttapp |
745 has the form $\ttfct{Const} (c, \tau)$ or $\ttfct{Const} (c, \tau) \ttapp |
741 x@1 \ttapp \ldots \ttapp x@n$. Terms allow more sophisticated |
746 x@1 \ttapp \ldots \ttapp x@n$. Terms allow more sophisticated |
742 transformations than \AST{}s do, typically involving abstractions and bound |
747 transformations than \AST{}s do, typically involving abstractions and bound |
743 variables. |
748 variables. |
744 |
749 |
745 Regardless of whether they act on terms or \AST{}s, |
750 Regardless of whether they act on terms or \AST{}s, parse translations differ |
746 parse translations differ from print translations fundamentally: |
751 from print translations in their overall behaviour fundamentally: |
747 \begin{description} |
752 \begin{description} |
748 \item[Parse translations] are applied bottom-up. The arguments are already |
753 \item[Parse translations] are applied bottom-up. The arguments are already |
749 in translated form. The translations must not fail; exceptions trigger |
754 in translated form. The translations must not fail; exceptions trigger |
750 an error message. |
755 an error message. |
751 |
756 |
784 sources to locate more examples. |
789 sources to locate more examples. |
785 |
790 |
786 Here is the parse translation for {\tt _K}: |
791 Here is the parse translation for {\tt _K}: |
787 \begin{ttbox} |
792 \begin{ttbox} |
788 fun k_tr [t] = Abs ("x", dummyT, incr_boundvars 1 t) |
793 fun k_tr [t] = Abs ("x", dummyT, incr_boundvars 1 t) |
789 | k_tr ts = raise TERM("k_tr",ts); |
794 | k_tr ts = raise TERM ("k_tr", ts); |
790 \end{ttbox} |
795 \end{ttbox} |
791 If {\tt k_tr} is called with exactly one argument~$t$, it creates a new |
796 If {\tt k_tr} is called with exactly one argument~$t$, it creates a new |
792 {\tt Abs} node with a body derived from $t$. Since terms given to parse |
797 {\tt Abs} node with a body derived from $t$. Since terms given to parse |
793 translations are not yet typed, the type of the bound variable in the new |
798 translations are not yet typed, the type of the bound variable in the new |
794 {\tt Abs} is simply {\tt dummyT}. The function increments all {\tt Bound} |
799 {\tt Abs} is simply {\tt dummyT}. The function increments all {\tt Bound} |
795 nodes referring to outer abstractions by calling \ttindex{incr_boundvars}, |
800 nodes referring to outer abstractions by calling \ttindex{incr_boundvars}, |
796 a basic term manipulation function defined in {\tt Pure/term.ML}. |
801 a basic term manipulation function defined in {\tt Pure/term.ML}. |
797 |
802 |
798 Here is the print translation for dependent types: |
803 Here is the print translation for dependent types: |
799 \begin{ttbox} |
804 \begin{ttbox} |
800 fun dependent_tr' (q,r) (A :: Abs (x, T, B) :: ts) = |
805 fun dependent_tr' (q, r) (A :: Abs (x, T, B) :: ts) = |
801 if 0 mem (loose_bnos B) then |
806 if 0 mem (loose_bnos B) then |
802 let val (x', B') = variant_abs (x, dummyT, B); |
807 let val (x', B') = variant_abs (x, dummyT, B); |
803 in list_comb (Const (q, dummyT) $ Free (x', T) $ A $ B', ts) |
808 in list_comb (Const (q, dummyT) $ Free (x', T) $ A $ B', ts) |
804 end |
809 end |
805 else list_comb (Const (r, dummyT) $ A $ B, ts) |
810 else list_comb (Const (r, dummyT) $ A $ B, ts) |
831 let val (x', B') = variant_abs (x, dummyT, B); |
836 let val (x', B') = variant_abs (x, dummyT, B); |
832 \end{ttbox}\index{*variant_abs} |
837 \end{ttbox}\index{*variant_abs} |
833 replaces bound variable occurrences in~$B$ by the free variable $x'$ with |
838 replaces bound variable occurrences in~$B$ by the free variable $x'$ with |
834 type {\tt dummyT}. Only the binding occurrence of~$x'$ is given the |
839 type {\tt dummyT}. Only the binding occurrence of~$x'$ is given the |
835 correct type~{\tt T}, so this is the only place where a type |
840 correct type~{\tt T}, so this is the only place where a type |
836 constraint might appear. |
841 constraint might appear. |
837 \index{translations|)} |
842 \index{translations|)} |
838 \index{syntax!transformations|)} |
843 \index{syntax!transformations|)} |