| author | blanchet | 
| Mon, 22 Jun 2015 16:56:03 +0200 | |
| changeset 60544 | 3daf5eacec05 | 
| parent 58618 | 782f0b662cae | 
| child 61416 | b9a3324e4e62 | 
| permissions | -rw-r--r-- | 
| 57344 | 1  | 
(*:wrap=hard:maxLineLen=78:*)  | 
2  | 
||
| 30124 | 3  | 
theory Syntax  | 
4  | 
imports Base  | 
|
5  | 
begin  | 
|
6  | 
||
| 58618 | 7  | 
chapter \<open>Concrete syntax and type-checking\<close>  | 
| 30124 | 8  | 
|
| 58618 | 9  | 
text \<open>Pure @{text "\<lambda>"}-calculus as introduced in \chref{ch:logic} is
 | 
| 45258 | 10  | 
an adequate foundation for logical languages --- in the tradition of  | 
11  | 
  \emph{higher-order abstract syntax} --- but end-users require
 | 
|
12  | 
additional means for reading and printing of terms and types. This  | 
|
13  | 
  important add-on outside the logical core is called \emph{inner
 | 
|
14  | 
  syntax} in Isabelle jargon, as opposed to the \emph{outer syntax} of
 | 
|
| 58555 | 15  | 
  the theory and proof language @{cite "isabelle-isar-ref"}.
 | 
| 45258 | 16  | 
|
| 58555 | 17  | 
  For example, according to @{cite church40} quantifiers are represented as
 | 
| 57496 | 18  | 
  higher-order constants @{text "All :: ('a \<Rightarrow> bool) \<Rightarrow> bool"} such that @{text
 | 
19  | 
"All (\<lambda>x::'a. B x)"} faithfully represents the idea that is displayed in  | 
|
20  | 
  Isabelle as @{text "\<forall>x::'a. B x"} via @{keyword "binder"} notation.
 | 
|
| 58555 | 21  | 
  Moreover, type-inference in the style of Hindley-Milner @{cite hindleymilner}
 | 
| 57496 | 22  | 
  (and extensions) enables users to write @{text "\<forall>x. B x"} concisely, when
 | 
23  | 
  the type @{text "'a"} is already clear from the
 | 
|
24  | 
  context.\footnote{Type-inference taken to the extreme can easily confuse
 | 
|
25  | 
users. Beginners often stumble over unexpectedly general types inferred by  | 
|
26  | 
the system.}  | 
|
| 45258 | 27  | 
|
28  | 
  \medskip The main inner syntax operations are \emph{read} for
 | 
|
29  | 
  parsing together with type-checking, and \emph{pretty} for formatted
 | 
|
30  | 
  output.  See also \secref{sec:read-print}.
 | 
|
31  | 
||
32  | 
Furthermore, the input and output syntax layers are sub-divided into  | 
|
33  | 
  separate phases for \emph{concrete syntax} versus \emph{abstract
 | 
|
34  | 
  syntax}, see also \secref{sec:parse-unparse} and
 | 
|
35  | 
  \secref{sec:term-check}, respectively.  This results in the
 | 
|
36  | 
following decomposition of the main operations:  | 
|
37  | 
||
38  | 
  \begin{itemize}
 | 
|
39  | 
||
40  | 
  \item @{text "read = parse; check"}
 | 
|
41  | 
||
42  | 
  \item @{text "pretty = uncheck; unparse"}
 | 
|
43  | 
||
44  | 
  \end{itemize}
 | 
|
45  | 
||
| 57496 | 46  | 
For example, some specification package might thus intercept syntax  | 
47  | 
  processing at a well-defined stage after @{text "parse"}, to a augment the
 | 
|
48  | 
  resulting pre-term before full type-reconstruction is performed by @{text
 | 
|
49  | 
"check"}. Note that the formal status of bound variables, versus free  | 
|
50  | 
variables, versus constants must not be changed between these phases.  | 
|
| 57345 | 51  | 
|
52  | 
  \medskip In general, @{text check} and @{text uncheck} operate
 | 
|
53  | 
simultaneously on a list of terms. This is particular important for  | 
|
54  | 
type-checking, to reconstruct types for several terms of the same context  | 
|
55  | 
  and scope. In contrast, @{text parse} and @{text unparse} operate separately
 | 
|
| 57496 | 56  | 
on single terms.  | 
| 57345 | 57  | 
|
58  | 
There are analogous operations to read and print types, with the same  | 
|
59  | 
sub-division into phases.  | 
|
| 58618 | 60  | 
\<close>  | 
| 45258 | 61  | 
|
| 30272 | 62  | 
|
| 58618 | 63  | 
section \<open>Reading and pretty printing \label{sec:read-print}\<close>
 | 
| 34924 | 64  | 
|
| 58618 | 65  | 
text \<open>  | 
| 57496 | 66  | 
Read and print operations are roughly dual to each other, such that for the  | 
67  | 
  user @{text "s' = pretty (read s)"} looks similar to the original source
 | 
|
68  | 
  text @{text "s"}, but the details depend on many side-conditions. There are
 | 
|
69  | 
also explicit options to control the removal of type information in the  | 
|
70  | 
  output. The default configuration routinely looses information, so @{text
 | 
|
71  | 
"t' = read (pretty t)"} might fail, or produce a differently typed term, or  | 
|
72  | 
a completely different term in the face of syntactic overloading.  | 
|
| 58618 | 73  | 
\<close>  | 
| 34924 | 74  | 
|
| 58618 | 75  | 
text %mlref \<open>  | 
| 
39876
 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 
wenzelm 
parents: 
39865 
diff
changeset
 | 
76  | 
  \begin{mldecls}
 | 
| 57345 | 77  | 
  @{index_ML Syntax.read_typs: "Proof.context -> string list -> typ list"} \\
 | 
78  | 
  @{index_ML Syntax.read_terms: "Proof.context -> string list -> term list"} \\
 | 
|
79  | 
  @{index_ML Syntax.read_props: "Proof.context -> string list -> term list"} \\[0.5ex]
 | 
|
| 
39876
 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 
wenzelm 
parents: 
39865 
diff
changeset
 | 
80  | 
  @{index_ML Syntax.read_typ: "Proof.context -> string -> typ"} \\
 | 
| 
 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 
wenzelm 
parents: 
39865 
diff
changeset
 | 
81  | 
  @{index_ML Syntax.read_term: "Proof.context -> string -> term"} \\
 | 
| 57345 | 82  | 
  @{index_ML Syntax.read_prop: "Proof.context -> string -> term"} \\[0.5ex]
 | 
| 
39876
 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 
wenzelm 
parents: 
39865 
diff
changeset
 | 
83  | 
  @{index_ML Syntax.pretty_typ: "Proof.context -> typ -> Pretty.T"} \\
 | 
| 
 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 
wenzelm 
parents: 
39865 
diff
changeset
 | 
84  | 
  @{index_ML Syntax.pretty_term: "Proof.context -> term -> Pretty.T"} \\
 | 
| 57345 | 85  | 
  @{index_ML Syntax.string_of_typ: "Proof.context -> typ -> string"} \\
 | 
86  | 
  @{index_ML Syntax.string_of_term: "Proof.context -> term -> string"} \\
 | 
|
| 
39876
 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 
wenzelm 
parents: 
39865 
diff
changeset
 | 
87  | 
  \end{mldecls}
 | 
| 
 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 
wenzelm 
parents: 
39865 
diff
changeset
 | 
88  | 
|
| 57345 | 89  | 
  \begin{description}
 | 
90  | 
||
| 57346 | 91  | 
  \item @{ML Syntax.read_typs}~@{text "ctxt strs"} parses and checks a
 | 
| 57345 | 92  | 
simultaneous list of source strings as types of the logic.  | 
93  | 
||
| 57346 | 94  | 
  \item @{ML Syntax.read_terms}~@{text "ctxt strs"} parses and checks a
 | 
| 57345 | 95  | 
simultaneous list of source strings as terms of the logic.  | 
| 57346 | 96  | 
Type-reconstruction puts all parsed terms into the same scope: types of  | 
97  | 
free variables ultimately need to coincide.  | 
|
| 57345 | 98  | 
|
99  | 
If particular type-constraints are required for some of the arguments, the  | 
|
| 57346 | 100  | 
read operations needs to be split into its parse and check phases. Then it  | 
| 57496 | 101  | 
  is possible to use @{ML Type.constraint} on the intermediate pre-terms
 | 
| 57846 | 102  | 
  (\secref{sec:term-check}).
 | 
| 57345 | 103  | 
|
| 57346 | 104  | 
  \item @{ML Syntax.read_props}~@{text "ctxt strs"} parses and checks a
 | 
105  | 
simultaneous list of source strings as terms of the logic, with an implicit  | 
|
106  | 
  type-constraint for each argument to enforce type @{typ prop}; this also
 | 
|
| 57496 | 107  | 
affects the inner syntax for parsing. The remaining type-reconstruction  | 
108  | 
  works as for @{ML Syntax.read_terms}.
 | 
|
| 57345 | 109  | 
|
110  | 
  \item @{ML Syntax.read_typ}, @{ML Syntax.read_term}, @{ML Syntax.read_prop}
 | 
|
| 57496 | 111  | 
are like the simultaneous versions, but operate on a single argument only.  | 
112  | 
This convenient shorthand is adequate in situations where a single item in  | 
|
113  | 
  its own scope is processed. Do not use @{ML "map o Syntax.read_term"} where
 | 
|
114  | 
  @{ML Syntax.read_terms} is actually intended!
 | 
|
| 57345 | 115  | 
|
116  | 
  \item @{ML Syntax.pretty_typ}~@{text "ctxt T"} and @{ML
 | 
|
117  | 
  Syntax.pretty_term}~@{text "ctxt t"} uncheck and pretty-print the given type
 | 
|
118  | 
or term, respectively. Although the uncheck phase acts on a simultaneous  | 
|
| 57496 | 119  | 
list as well, this is rarely used in practice, so only the singleton case is  | 
120  | 
provided as combined pretty operation. There is no distinction of term vs.\  | 
|
121  | 
proposition.  | 
|
| 57345 | 122  | 
|
123  | 
  \item @{ML Syntax.string_of_typ} and @{ML Syntax.string_of_term} are
 | 
|
124  | 
  convenient compositions of @{ML Syntax.pretty_typ} and @{ML
 | 
|
125  | 
  Syntax.pretty_term} with @{ML Pretty.string_of} for output. The result may
 | 
|
126  | 
be concatenated with other strings, as long as there is no further  | 
|
127  | 
formatting and line-breaking involved.  | 
|
128  | 
||
129  | 
  \end{description}
 | 
|
130  | 
||
| 57346 | 131  | 
  @{ML Syntax.read_term}, @{ML Syntax.read_prop}, and @{ML
 | 
132  | 
Syntax.string_of_term} are the most important operations in practice.  | 
|
| 57345 | 133  | 
|
| 57496 | 134  | 
\medskip Note that the string values that are passed in and out are  | 
| 57346 | 135  | 
annotated by the system, to carry further markup that is relevant for the  | 
| 58555 | 136  | 
  Prover IDE @{cite "isabelle-jedit"}. User code should neither compose its
 | 
137  | 
own input strings, nor try to analyze the output strings. Conceptually this  | 
|
138  | 
is an abstract datatype, encoded as concrete string for historical reasons.  | 
|
| 57345 | 139  | 
|
140  | 
The standard way to provide the required position markup for input works via  | 
|
141  | 
  the outer syntax parser wrapper @{ML Parse.inner_syntax}, which is already
 | 
|
142  | 
  part of @{ML Parse.typ}, @{ML Parse.term}, @{ML Parse.prop}. So a string
 | 
|
143  | 
obtained from one of the latter may be directly passed to the corresponding  | 
|
| 57346 | 144  | 
read operation: this yields PIDE markup of the input and precise positions  | 
145  | 
for warning and error messages.  | 
|
| 58618 | 146  | 
\<close>  | 
| 
39876
 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 
wenzelm 
parents: 
39865 
diff
changeset
 | 
147  | 
|
| 
 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 
wenzelm 
parents: 
39865 
diff
changeset
 | 
148  | 
|
| 58618 | 149  | 
section \<open>Parsing and unparsing \label{sec:parse-unparse}\<close>
 | 
| 
39876
 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 
wenzelm 
parents: 
39865 
diff
changeset
 | 
150  | 
|
| 58618 | 151  | 
text \<open>  | 
| 57496 | 152  | 
Parsing and unparsing converts between actual source text and a certain  | 
153  | 
  \emph{pre-term} format, where all bindings and scopes are already resolved
 | 
|
154  | 
faithfully. Thus the names of free variables or constants are determined in  | 
|
155  | 
the sense of the logical context, but type information might be still  | 
|
156  | 
  missing. Pre-terms support an explicit language of \emph{type constraints}
 | 
|
157  | 
  that may be augmented by user code to guide the later \emph{check} phase.
 | 
|
| 45258 | 158  | 
|
| 57496 | 159  | 
Actual parsing is based on traditional lexical analysis and Earley parsing  | 
160  | 
for arbitrary context-free grammars. The user can specify the grammar  | 
|
161  | 
  declaratively via mixfix annotations. Moreover, there are \emph{syntax
 | 
|
162  | 
translations} that can be augmented by the user, either declaratively via  | 
|
163  | 
  @{command translations} or programmatically via @{command
 | 
|
| 58555 | 164  | 
  parse_translation}, @{command print_translation} @{cite
 | 
165  | 
"isabelle-isar-ref"}. The final scope-resolution is performed by the system,  | 
|
166  | 
according to name spaces for types, term variables and constants determined  | 
|
167  | 
by the context.  | 
|
| 58618 | 168  | 
\<close>  | 
| 
39876
 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 
wenzelm 
parents: 
39865 
diff
changeset
 | 
169  | 
|
| 58618 | 170  | 
text %mlref \<open>  | 
| 
39876
 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 
wenzelm 
parents: 
39865 
diff
changeset
 | 
171  | 
  \begin{mldecls}
 | 
| 
 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 
wenzelm 
parents: 
39865 
diff
changeset
 | 
172  | 
  @{index_ML Syntax.parse_typ: "Proof.context -> string -> typ"} \\
 | 
| 
 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 
wenzelm 
parents: 
39865 
diff
changeset
 | 
173  | 
  @{index_ML Syntax.parse_term: "Proof.context -> string -> term"} \\
 | 
| 57496 | 174  | 
  @{index_ML Syntax.parse_prop: "Proof.context -> string -> term"} \\[0.5ex]
 | 
| 
39876
 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 
wenzelm 
parents: 
39865 
diff
changeset
 | 
175  | 
  @{index_ML Syntax.unparse_typ: "Proof.context -> typ -> Pretty.T"} \\
 | 
| 
 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 
wenzelm 
parents: 
39865 
diff
changeset
 | 
176  | 
  @{index_ML Syntax.unparse_term: "Proof.context -> term -> Pretty.T"} \\
 | 
| 
 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 
wenzelm 
parents: 
39865 
diff
changeset
 | 
177  | 
  \end{mldecls}
 | 
| 
 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 
wenzelm 
parents: 
39865 
diff
changeset
 | 
178  | 
|
| 57346 | 179  | 
  \begin{description}
 | 
180  | 
||
| 57496 | 181  | 
  \item @{ML Syntax.parse_typ}~@{text "ctxt str"} parses a source string as
 | 
| 57346 | 182  | 
pre-type that is ready to be used with subsequent check operations.  | 
183  | 
||
| 57496 | 184  | 
  \item @{ML Syntax.parse_term}~@{text "ctxt str"} parses a source string as
 | 
| 57346 | 185  | 
pre-term that is ready to be used with subsequent check operations.  | 
186  | 
||
| 57496 | 187  | 
  \item @{ML Syntax.parse_prop}~@{text "ctxt str"} parses a source string as
 | 
| 57346 | 188  | 
pre-term that is ready to be used with subsequent check operations. The  | 
189  | 
  inner syntax category is @{typ prop} and a suitable type-constraint is
 | 
|
| 57496 | 190  | 
included to ensure that this information is observed in subsequent type  | 
191  | 
reconstruction.  | 
|
| 57346 | 192  | 
|
193  | 
  \item @{ML Syntax.unparse_typ}~@{text "ctxt T"} unparses a type after
 | 
|
194  | 
uncheck operations, to turn it into a pretty tree.  | 
|
195  | 
||
196  | 
  \item @{ML Syntax.unparse_term}~@{text "ctxt T"} unparses a term after
 | 
|
197  | 
uncheck operations, to turn it into a pretty tree. There is no distinction  | 
|
198  | 
for propositions here.  | 
|
199  | 
||
200  | 
  \end{description}
 | 
|
201  | 
||
| 57496 | 202  | 
  These operations always operate on a single item; use the combinator @{ML
 | 
203  | 
map} to apply them to a list.  | 
|
| 58618 | 204  | 
\<close>  | 
| 
39876
 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 
wenzelm 
parents: 
39865 
diff
changeset
 | 
205  | 
|
| 39852 | 206  | 
|
| 58618 | 207  | 
section \<open>Checking and unchecking \label{sec:term-check}\<close>
 | 
| 34924 | 208  | 
|
| 58618 | 209  | 
text \<open>These operations define the transition from pre-terms and  | 
| 45258 | 210  | 
fully-annotated terms in the sense of the logical core  | 
211  | 
  (\chref{ch:logic}).
 | 
|
212  | 
||
213  | 
  The \emph{check} phase is meant to subsume a variety of mechanisms
 | 
|
214  | 
in the manner of ``type-inference'' or ``type-reconstruction'' or  | 
|
215  | 
``type-improvement'', not just type-checking in the narrow sense.  | 
|
216  | 
  The \emph{uncheck} phase is roughly dual, it prunes type-information
 | 
|
217  | 
before pretty printing.  | 
|
218  | 
||
219  | 
  A typical add-on for the check/uncheck syntax layer is the @{command
 | 
|
| 58555 | 220  | 
  abbreviation} mechanism @{cite "isabelle-isar-ref"}. Here the user specifies
 | 
| 57496 | 221  | 
  syntactic definitions that are managed by the system as polymorphic @{text
 | 
222  | 
  "let"} bindings. These are expanded during the @{text "check"} phase, and
 | 
|
223  | 
  contracted during the @{text "uncheck"} phase, without affecting the
 | 
|
224  | 
type-assignment of the given terms.  | 
|
| 45258 | 225  | 
|
| 57496 | 226  | 
\medskip The precise meaning of type checking depends on the context ---  | 
227  | 
additional check/uncheck modules might be defined in user space.  | 
|
| 45258 | 228  | 
|
229  | 
  For example, the @{command class} command defines a context where
 | 
|
230  | 
  @{text "check"} treats certain type instances of overloaded
 | 
|
231  | 
constants according to the ``dictionary construction'' of its  | 
|
232  | 
logical foundation. This involves ``type improvement''  | 
|
233  | 
(specialization of slightly too general types) and replacement by  | 
|
| 58555 | 234  | 
  certain locale parameters.  See also @{cite "Haftmann-Wenzel:2009"}.
 | 
| 58618 | 235  | 
\<close>  | 
| 39852 | 236  | 
|
| 58618 | 237  | 
text %mlref \<open>  | 
| 
39876
 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 
wenzelm 
parents: 
39865 
diff
changeset
 | 
238  | 
  \begin{mldecls}
 | 
| 
 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 
wenzelm 
parents: 
39865 
diff
changeset
 | 
239  | 
  @{index_ML Syntax.check_typs: "Proof.context -> typ list -> typ list"} \\
 | 
| 
 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 
wenzelm 
parents: 
39865 
diff
changeset
 | 
240  | 
  @{index_ML Syntax.check_terms: "Proof.context -> term list -> term list"} \\
 | 
| 57496 | 241  | 
  @{index_ML Syntax.check_props: "Proof.context -> term list -> term list"} \\[0.5ex]
 | 
| 
39876
 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 
wenzelm 
parents: 
39865 
diff
changeset
 | 
242  | 
  @{index_ML Syntax.uncheck_typs: "Proof.context -> typ list -> typ list"} \\
 | 
| 
 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 
wenzelm 
parents: 
39865 
diff
changeset
 | 
243  | 
  @{index_ML Syntax.uncheck_terms: "Proof.context -> term list -> term list"} \\
 | 
| 
 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 
wenzelm 
parents: 
39865 
diff
changeset
 | 
244  | 
  \end{mldecls}
 | 
| 
 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 
wenzelm 
parents: 
39865 
diff
changeset
 | 
245  | 
|
| 57346 | 246  | 
  \begin{description}
 | 
247  | 
||
248  | 
  \item @{ML Syntax.check_typs}~@{text "ctxt Ts"} checks a simultaneous list
 | 
|
249  | 
of pre-types as types of the logic. Typically, this involves normalization  | 
|
250  | 
of type synonyms.  | 
|
251  | 
||
252  | 
  \item @{ML Syntax.check_terms}~@{text "ctxt ts"} checks a simultaneous list
 | 
|
253  | 
of pre-terms as terms of the logic. Typically, this involves type-inference  | 
|
254  | 
and normalization term abbreviations. The types within the given terms are  | 
|
255  | 
  treated in the same way as for @{ML Syntax.check_typs}.
 | 
|
256  | 
||
257  | 
Applications sometimes need to check several types and terms together. The  | 
|
258  | 
  standard approach uses @{ML Logic.mk_type} to embed the language of types
 | 
|
259  | 
into that of terms; all arguments are appended into one list of terms that  | 
|
| 57496 | 260  | 
  is checked; afterwards the type arguments are recovered with @{ML
 | 
| 57346 | 261  | 
Logic.dest_type}.  | 
262  | 
||
263  | 
  \item @{ML Syntax.check_props}~@{text "ctxt ts"} checks a simultaneous list
 | 
|
264  | 
of pre-terms as terms of the logic, such that all terms are constrained by  | 
|
265  | 
  type @{typ prop}. The remaining check operation works as @{ML
 | 
|
266  | 
Syntax.check_terms} above.  | 
|
267  | 
||
268  | 
  \item @{ML Syntax.uncheck_typs}~@{text "ctxt Ts"} unchecks a simultaneous
 | 
|
269  | 
list of types of the logic, in preparation of pretty printing.  | 
|
270  | 
||
271  | 
  \item @{ML Syntax.uncheck_terms}~@{text "ctxt ts"} unchecks a simultaneous
 | 
|
272  | 
list of terms of the logic, in preparation of pretty printing. There is no  | 
|
273  | 
distinction for propositions here.  | 
|
274  | 
||
275  | 
  \end{description}
 | 
|
276  | 
||
| 57496 | 277  | 
These operations always operate simultaneously on a list; use the combinator  | 
278  | 
  @{ML singleton} to apply them to a single item.
 | 
|
| 58618 | 279  | 
\<close>  | 
| 
39876
 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 
wenzelm 
parents: 
39865 
diff
changeset
 | 
280  | 
|
| 30124 | 281  | 
end  |