| author | wenzelm | 
| Mon, 12 Oct 2015 20:58:58 +0200 | |
| changeset 61416 | b9a3324e4e62 | 
| parent 58618 | 782f0b662cae | 
| child 61439 | 2bf52eec4e8a | 
| permissions | -rw-r--r-- | 
| 57344 | 1 | (*:wrap=hard:maxLineLen=78:*) | 
| 2 | ||
| 30124 | 3 | theory Syntax | 
| 4 | imports Base | |
| 5 | begin | |
| 6 | ||
| 58618 | 7 | chapter \<open>Concrete syntax and type-checking\<close> | 
| 30124 | 8 | |
| 58618 | 9 | text \<open>Pure @{text "\<lambda>"}-calculus as introduced in \chref{ch:logic} is
 | 
| 45258 | 10 | an adequate foundation for logical languages --- in the tradition of | 
| 11 |   \emph{higher-order abstract syntax} --- but end-users require
 | |
| 12 | additional means for reading and printing of terms and types. This | |
| 13 |   important add-on outside the logical core is called \emph{inner
 | |
| 14 |   syntax} in Isabelle jargon, as opposed to the \emph{outer syntax} of
 | |
| 58555 | 15 |   the theory and proof language @{cite "isabelle-isar-ref"}.
 | 
| 45258 | 16 | |
| 58555 | 17 |   For example, according to @{cite church40} quantifiers are represented as
 | 
| 57496 | 18 |   higher-order constants @{text "All :: ('a \<Rightarrow> bool) \<Rightarrow> bool"} such that @{text
 | 
| 19 | "All (\<lambda>x::'a. B x)"} faithfully represents the idea that is displayed in | |
| 20 |   Isabelle as @{text "\<forall>x::'a. B x"} via @{keyword "binder"} notation.
 | |
| 58555 | 21 |   Moreover, type-inference in the style of Hindley-Milner @{cite hindleymilner}
 | 
| 57496 | 22 |   (and extensions) enables users to write @{text "\<forall>x. B x"} concisely, when
 | 
| 23 |   the type @{text "'a"} is already clear from the
 | |
| 24 |   context.\footnote{Type-inference taken to the extreme can easily confuse
 | |
| 25 | users. Beginners often stumble over unexpectedly general types inferred by | |
| 26 | the system.} | |
| 45258 | 27 | |
| 61416 | 28 | \<^medskip> | 
| 29 |   The main inner syntax operations are \emph{read} for
 | |
| 45258 | 30 |   parsing together with type-checking, and \emph{pretty} for formatted
 | 
| 31 |   output.  See also \secref{sec:read-print}.
 | |
| 32 | ||
| 33 | Furthermore, the input and output syntax layers are sub-divided into | |
| 34 |   separate phases for \emph{concrete syntax} versus \emph{abstract
 | |
| 35 |   syntax}, see also \secref{sec:parse-unparse} and
 | |
| 36 |   \secref{sec:term-check}, respectively.  This results in the
 | |
| 37 | following decomposition of the main operations: | |
| 38 | ||
| 39 |   \begin{itemize}
 | |
| 40 | ||
| 61416 | 41 |   \<^item> @{text "read = parse; check"}
 | 
| 45258 | 42 | |
| 61416 | 43 |   \<^item> @{text "pretty = uncheck; unparse"}
 | 
| 45258 | 44 | |
| 45 |   \end{itemize}
 | |
| 46 | ||
| 57496 | 47 | For example, some specification package might thus intercept syntax | 
| 48 |   processing at a well-defined stage after @{text "parse"}, to a augment the
 | |
| 49 |   resulting pre-term before full type-reconstruction is performed by @{text
 | |
| 50 | "check"}. Note that the formal status of bound variables, versus free | |
| 51 | variables, versus constants must not be changed between these phases. | |
| 57345 | 52 | |
| 61416 | 53 | \<^medskip> | 
| 54 |   In general, @{text check} and @{text uncheck} operate
 | |
| 57345 | 55 | simultaneously on a list of terms. This is particular important for | 
| 56 | type-checking, to reconstruct types for several terms of the same context | |
| 57 |   and scope. In contrast, @{text parse} and @{text unparse} operate separately
 | |
| 57496 | 58 | on single terms. | 
| 57345 | 59 | |
| 60 | There are analogous operations to read and print types, with the same | |
| 61 | sub-division into phases. | |
| 58618 | 62 | \<close> | 
| 45258 | 63 | |
| 30272 | 64 | |
| 58618 | 65 | section \<open>Reading and pretty printing \label{sec:read-print}\<close>
 | 
| 34924 | 66 | |
| 58618 | 67 | text \<open> | 
| 57496 | 68 | Read and print operations are roughly dual to each other, such that for the | 
| 69 |   user @{text "s' = pretty (read s)"} looks similar to the original source
 | |
| 70 |   text @{text "s"}, but the details depend on many side-conditions. There are
 | |
| 71 | also explicit options to control the removal of type information in the | |
| 72 |   output. The default configuration routinely looses information, so @{text
 | |
| 73 | "t' = read (pretty t)"} might fail, or produce a differently typed term, or | |
| 74 | a completely different term in the face of syntactic overloading. | |
| 58618 | 75 | \<close> | 
| 34924 | 76 | |
| 58618 | 77 | text %mlref \<open> | 
| 39876 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 78 |   \begin{mldecls}
 | 
| 57345 | 79 |   @{index_ML Syntax.read_typs: "Proof.context -> string list -> typ list"} \\
 | 
| 80 |   @{index_ML Syntax.read_terms: "Proof.context -> string list -> term list"} \\
 | |
| 81 |   @{index_ML Syntax.read_props: "Proof.context -> string list -> term list"} \\[0.5ex]
 | |
| 39876 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 82 |   @{index_ML Syntax.read_typ: "Proof.context -> string -> typ"} \\
 | 
| 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 83 |   @{index_ML Syntax.read_term: "Proof.context -> string -> term"} \\
 | 
| 57345 | 84 |   @{index_ML Syntax.read_prop: "Proof.context -> string -> term"} \\[0.5ex]
 | 
| 39876 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 85 |   @{index_ML Syntax.pretty_typ: "Proof.context -> typ -> Pretty.T"} \\
 | 
| 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 86 |   @{index_ML Syntax.pretty_term: "Proof.context -> term -> Pretty.T"} \\
 | 
| 57345 | 87 |   @{index_ML Syntax.string_of_typ: "Proof.context -> typ -> string"} \\
 | 
| 88 |   @{index_ML Syntax.string_of_term: "Proof.context -> term -> string"} \\
 | |
| 39876 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 89 |   \end{mldecls}
 | 
| 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 90 | |
| 57345 | 91 |   \begin{description}
 | 
| 92 | ||
| 57346 | 93 |   \item @{ML Syntax.read_typs}~@{text "ctxt strs"} parses and checks a
 | 
| 57345 | 94 | simultaneous list of source strings as types of the logic. | 
| 95 | ||
| 57346 | 96 |   \item @{ML Syntax.read_terms}~@{text "ctxt strs"} parses and checks a
 | 
| 57345 | 97 | simultaneous list of source strings as terms of the logic. | 
| 57346 | 98 | Type-reconstruction puts all parsed terms into the same scope: types of | 
| 99 | free variables ultimately need to coincide. | |
| 57345 | 100 | |
| 101 | If particular type-constraints are required for some of the arguments, the | |
| 57346 | 102 | read operations needs to be split into its parse and check phases. Then it | 
| 57496 | 103 |   is possible to use @{ML Type.constraint} on the intermediate pre-terms
 | 
| 57846 | 104 |   (\secref{sec:term-check}).
 | 
| 57345 | 105 | |
| 57346 | 106 |   \item @{ML Syntax.read_props}~@{text "ctxt strs"} parses and checks a
 | 
| 107 | simultaneous list of source strings as terms of the logic, with an implicit | |
| 108 |   type-constraint for each argument to enforce type @{typ prop}; this also
 | |
| 57496 | 109 | affects the inner syntax for parsing. The remaining type-reconstruction | 
| 110 |   works as for @{ML Syntax.read_terms}.
 | |
| 57345 | 111 | |
| 112 |   \item @{ML Syntax.read_typ}, @{ML Syntax.read_term}, @{ML Syntax.read_prop}
 | |
| 57496 | 113 | are like the simultaneous versions, but operate on a single argument only. | 
| 114 | This convenient shorthand is adequate in situations where a single item in | |
| 115 |   its own scope is processed. Do not use @{ML "map o Syntax.read_term"} where
 | |
| 116 |   @{ML Syntax.read_terms} is actually intended!
 | |
| 57345 | 117 | |
| 118 |   \item @{ML Syntax.pretty_typ}~@{text "ctxt T"} and @{ML
 | |
| 119 |   Syntax.pretty_term}~@{text "ctxt t"} uncheck and pretty-print the given type
 | |
| 120 | or term, respectively. Although the uncheck phase acts on a simultaneous | |
| 57496 | 121 | list as well, this is rarely used in practice, so only the singleton case is | 
| 122 | provided as combined pretty operation. There is no distinction of term vs.\ | |
| 123 | proposition. | |
| 57345 | 124 | |
| 125 |   \item @{ML Syntax.string_of_typ} and @{ML Syntax.string_of_term} are
 | |
| 126 |   convenient compositions of @{ML Syntax.pretty_typ} and @{ML
 | |
| 127 |   Syntax.pretty_term} with @{ML Pretty.string_of} for output. The result may
 | |
| 128 | be concatenated with other strings, as long as there is no further | |
| 129 | formatting and line-breaking involved. | |
| 130 | ||
| 131 |   \end{description}
 | |
| 132 | ||
| 57346 | 133 |   @{ML Syntax.read_term}, @{ML Syntax.read_prop}, and @{ML
 | 
| 134 | Syntax.string_of_term} are the most important operations in practice. | |
| 57345 | 135 | |
| 61416 | 136 | \<^medskip> | 
| 137 | Note that the string values that are passed in and out are | |
| 57346 | 138 | annotated by the system, to carry further markup that is relevant for the | 
| 58555 | 139 |   Prover IDE @{cite "isabelle-jedit"}. User code should neither compose its
 | 
| 140 | own input strings, nor try to analyze the output strings. Conceptually this | |
| 141 | is an abstract datatype, encoded as concrete string for historical reasons. | |
| 57345 | 142 | |
| 143 | The standard way to provide the required position markup for input works via | |
| 144 |   the outer syntax parser wrapper @{ML Parse.inner_syntax}, which is already
 | |
| 145 |   part of @{ML Parse.typ}, @{ML Parse.term}, @{ML Parse.prop}. So a string
 | |
| 146 | obtained from one of the latter may be directly passed to the corresponding | |
| 57346 | 147 | read operation: this yields PIDE markup of the input and precise positions | 
| 148 | for warning and error messages. | |
| 58618 | 149 | \<close> | 
| 39876 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 150 | |
| 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 151 | |
| 58618 | 152 | section \<open>Parsing and unparsing \label{sec:parse-unparse}\<close>
 | 
| 39876 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 153 | |
| 58618 | 154 | text \<open> | 
| 57496 | 155 | Parsing and unparsing converts between actual source text and a certain | 
| 156 |   \emph{pre-term} format, where all bindings and scopes are already resolved
 | |
| 157 | faithfully. Thus the names of free variables or constants are determined in | |
| 158 | the sense of the logical context, but type information might be still | |
| 159 |   missing. Pre-terms support an explicit language of \emph{type constraints}
 | |
| 160 |   that may be augmented by user code to guide the later \emph{check} phase.
 | |
| 45258 | 161 | |
| 57496 | 162 | Actual parsing is based on traditional lexical analysis and Earley parsing | 
| 163 | for arbitrary context-free grammars. The user can specify the grammar | |
| 164 |   declaratively via mixfix annotations. Moreover, there are \emph{syntax
 | |
| 165 | translations} that can be augmented by the user, either declaratively via | |
| 166 |   @{command translations} or programmatically via @{command
 | |
| 58555 | 167 |   parse_translation}, @{command print_translation} @{cite
 | 
| 168 | "isabelle-isar-ref"}. The final scope-resolution is performed by the system, | |
| 169 | according to name spaces for types, term variables and constants determined | |
| 170 | by the context. | |
| 58618 | 171 | \<close> | 
| 39876 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 172 | |
| 58618 | 173 | text %mlref \<open> | 
| 39876 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 174 |   \begin{mldecls}
 | 
| 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 175 |   @{index_ML Syntax.parse_typ: "Proof.context -> string -> typ"} \\
 | 
| 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 176 |   @{index_ML Syntax.parse_term: "Proof.context -> string -> term"} \\
 | 
| 57496 | 177 |   @{index_ML Syntax.parse_prop: "Proof.context -> string -> term"} \\[0.5ex]
 | 
| 39876 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 178 |   @{index_ML Syntax.unparse_typ: "Proof.context -> typ -> Pretty.T"} \\
 | 
| 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 179 |   @{index_ML Syntax.unparse_term: "Proof.context -> term -> Pretty.T"} \\
 | 
| 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 180 |   \end{mldecls}
 | 
| 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 181 | |
| 57346 | 182 |   \begin{description}
 | 
| 183 | ||
| 57496 | 184 |   \item @{ML Syntax.parse_typ}~@{text "ctxt str"} parses a source string as
 | 
| 57346 | 185 | pre-type that is ready to be used with subsequent check operations. | 
| 186 | ||
| 57496 | 187 |   \item @{ML Syntax.parse_term}~@{text "ctxt str"} parses a source string as
 | 
| 57346 | 188 | pre-term that is ready to be used with subsequent check operations. | 
| 189 | ||
| 57496 | 190 |   \item @{ML Syntax.parse_prop}~@{text "ctxt str"} parses a source string as
 | 
| 57346 | 191 | pre-term that is ready to be used with subsequent check operations. The | 
| 192 |   inner syntax category is @{typ prop} and a suitable type-constraint is
 | |
| 57496 | 193 | included to ensure that this information is observed in subsequent type | 
| 194 | reconstruction. | |
| 57346 | 195 | |
| 196 |   \item @{ML Syntax.unparse_typ}~@{text "ctxt T"} unparses a type after
 | |
| 197 | uncheck operations, to turn it into a pretty tree. | |
| 198 | ||
| 199 |   \item @{ML Syntax.unparse_term}~@{text "ctxt T"} unparses a term after
 | |
| 200 | uncheck operations, to turn it into a pretty tree. There is no distinction | |
| 201 | for propositions here. | |
| 202 | ||
| 203 |   \end{description}
 | |
| 204 | ||
| 57496 | 205 |   These operations always operate on a single item; use the combinator @{ML
 | 
| 206 | map} to apply them to a list. | |
| 58618 | 207 | \<close> | 
| 39876 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 208 | |
| 39852 | 209 | |
| 58618 | 210 | section \<open>Checking and unchecking \label{sec:term-check}\<close>
 | 
| 34924 | 211 | |
| 58618 | 212 | text \<open>These operations define the transition from pre-terms and | 
| 45258 | 213 | fully-annotated terms in the sense of the logical core | 
| 214 |   (\chref{ch:logic}).
 | |
| 215 | ||
| 216 |   The \emph{check} phase is meant to subsume a variety of mechanisms
 | |
| 217 | in the manner of ``type-inference'' or ``type-reconstruction'' or | |
| 218 | ``type-improvement'', not just type-checking in the narrow sense. | |
| 219 |   The \emph{uncheck} phase is roughly dual, it prunes type-information
 | |
| 220 | before pretty printing. | |
| 221 | ||
| 222 |   A typical add-on for the check/uncheck syntax layer is the @{command
 | |
| 58555 | 223 |   abbreviation} mechanism @{cite "isabelle-isar-ref"}. Here the user specifies
 | 
| 57496 | 224 |   syntactic definitions that are managed by the system as polymorphic @{text
 | 
| 225 |   "let"} bindings. These are expanded during the @{text "check"} phase, and
 | |
| 226 |   contracted during the @{text "uncheck"} phase, without affecting the
 | |
| 227 | type-assignment of the given terms. | |
| 45258 | 228 | |
| 61416 | 229 | \<^medskip> | 
| 230 | The precise meaning of type checking depends on the context --- | |
| 57496 | 231 | additional check/uncheck modules might be defined in user space. | 
| 45258 | 232 | |
| 233 |   For example, the @{command class} command defines a context where
 | |
| 234 |   @{text "check"} treats certain type instances of overloaded
 | |
| 235 | constants according to the ``dictionary construction'' of its | |
| 236 | logical foundation. This involves ``type improvement'' | |
| 237 | (specialization of slightly too general types) and replacement by | |
| 58555 | 238 |   certain locale parameters.  See also @{cite "Haftmann-Wenzel:2009"}.
 | 
| 58618 | 239 | \<close> | 
| 39852 | 240 | |
| 58618 | 241 | text %mlref \<open> | 
| 39876 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 242 |   \begin{mldecls}
 | 
| 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 243 |   @{index_ML Syntax.check_typs: "Proof.context -> typ list -> typ list"} \\
 | 
| 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 244 |   @{index_ML Syntax.check_terms: "Proof.context -> term list -> term list"} \\
 | 
| 57496 | 245 |   @{index_ML Syntax.check_props: "Proof.context -> term list -> term list"} \\[0.5ex]
 | 
| 39876 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 246 |   @{index_ML Syntax.uncheck_typs: "Proof.context -> typ list -> typ list"} \\
 | 
| 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 247 |   @{index_ML Syntax.uncheck_terms: "Proof.context -> term list -> term list"} \\
 | 
| 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 248 |   \end{mldecls}
 | 
| 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 249 | |
| 57346 | 250 |   \begin{description}
 | 
| 251 | ||
| 252 |   \item @{ML Syntax.check_typs}~@{text "ctxt Ts"} checks a simultaneous list
 | |
| 253 | of pre-types as types of the logic. Typically, this involves normalization | |
| 254 | of type synonyms. | |
| 255 | ||
| 256 |   \item @{ML Syntax.check_terms}~@{text "ctxt ts"} checks a simultaneous list
 | |
| 257 | of pre-terms as terms of the logic. Typically, this involves type-inference | |
| 258 | and normalization term abbreviations. The types within the given terms are | |
| 259 |   treated in the same way as for @{ML Syntax.check_typs}.
 | |
| 260 | ||
| 261 | Applications sometimes need to check several types and terms together. The | |
| 262 |   standard approach uses @{ML Logic.mk_type} to embed the language of types
 | |
| 263 | into that of terms; all arguments are appended into one list of terms that | |
| 57496 | 264 |   is checked; afterwards the type arguments are recovered with @{ML
 | 
| 57346 | 265 | Logic.dest_type}. | 
| 266 | ||
| 267 |   \item @{ML Syntax.check_props}~@{text "ctxt ts"} checks a simultaneous list
 | |
| 268 | of pre-terms as terms of the logic, such that all terms are constrained by | |
| 269 |   type @{typ prop}. The remaining check operation works as @{ML
 | |
| 270 | Syntax.check_terms} above. | |
| 271 | ||
| 272 |   \item @{ML Syntax.uncheck_typs}~@{text "ctxt Ts"} unchecks a simultaneous
 | |
| 273 | list of types of the logic, in preparation of pretty printing. | |
| 274 | ||
| 275 |   \item @{ML Syntax.uncheck_terms}~@{text "ctxt ts"} unchecks a simultaneous
 | |
| 276 | list of terms of the logic, in preparation of pretty printing. There is no | |
| 277 | distinction for propositions here. | |
| 278 | ||
| 279 |   \end{description}
 | |
| 280 | ||
| 57496 | 281 | These operations always operate simultaneously on a list; use the combinator | 
| 282 |   @{ML singleton} to apply them to a single item.
 | |
| 58618 | 283 | \<close> | 
| 39876 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 284 | |
| 30124 | 285 | end |