| author | wenzelm | 
| Sun, 29 Mar 2015 22:38:18 +0200 | |
| changeset 59847 | c5c4a936357a | 
| parent 58618 | 782f0b662cae | 
| child 61416 | b9a3324e4e62 | 
| permissions | -rw-r--r-- | 
| 57344 | 1 | (*:wrap=hard:maxLineLen=78:*) | 
| 2 | ||
| 30124 | 3 | theory Syntax | 
| 4 | imports Base | |
| 5 | begin | |
| 6 | ||
| 58618 | 7 | chapter \<open>Concrete syntax and type-checking\<close> | 
| 30124 | 8 | |
| 58618 | 9 | text \<open>Pure @{text "\<lambda>"}-calculus as introduced in \chref{ch:logic} is
 | 
| 45258 | 10 | an adequate foundation for logical languages --- in the tradition of | 
| 11 |   \emph{higher-order abstract syntax} --- but end-users require
 | |
| 12 | additional means for reading and printing of terms and types. This | |
| 13 |   important add-on outside the logical core is called \emph{inner
 | |
| 14 |   syntax} in Isabelle jargon, as opposed to the \emph{outer syntax} of
 | |
| 58555 | 15 |   the theory and proof language @{cite "isabelle-isar-ref"}.
 | 
| 45258 | 16 | |
| 58555 | 17 |   For example, according to @{cite church40} quantifiers are represented as
 | 
| 57496 | 18 |   higher-order constants @{text "All :: ('a \<Rightarrow> bool) \<Rightarrow> bool"} such that @{text
 | 
| 19 | "All (\<lambda>x::'a. B x)"} faithfully represents the idea that is displayed in | |
| 20 |   Isabelle as @{text "\<forall>x::'a. B x"} via @{keyword "binder"} notation.
 | |
| 58555 | 21 |   Moreover, type-inference in the style of Hindley-Milner @{cite hindleymilner}
 | 
| 57496 | 22 |   (and extensions) enables users to write @{text "\<forall>x. B x"} concisely, when
 | 
| 23 |   the type @{text "'a"} is already clear from the
 | |
| 24 |   context.\footnote{Type-inference taken to the extreme can easily confuse
 | |
| 25 | users. Beginners often stumble over unexpectedly general types inferred by | |
| 26 | the system.} | |
| 45258 | 27 | |
| 28 |   \medskip The main inner syntax operations are \emph{read} for
 | |
| 29 |   parsing together with type-checking, and \emph{pretty} for formatted
 | |
| 30 |   output.  See also \secref{sec:read-print}.
 | |
| 31 | ||
| 32 | Furthermore, the input and output syntax layers are sub-divided into | |
| 33 |   separate phases for \emph{concrete syntax} versus \emph{abstract
 | |
| 34 |   syntax}, see also \secref{sec:parse-unparse} and
 | |
| 35 |   \secref{sec:term-check}, respectively.  This results in the
 | |
| 36 | following decomposition of the main operations: | |
| 37 | ||
| 38 |   \begin{itemize}
 | |
| 39 | ||
| 40 |   \item @{text "read = parse; check"}
 | |
| 41 | ||
| 42 |   \item @{text "pretty = uncheck; unparse"}
 | |
| 43 | ||
| 44 |   \end{itemize}
 | |
| 45 | ||
| 57496 | 46 | For example, some specification package might thus intercept syntax | 
| 47 |   processing at a well-defined stage after @{text "parse"}, to a augment the
 | |
| 48 |   resulting pre-term before full type-reconstruction is performed by @{text
 | |
| 49 | "check"}. Note that the formal status of bound variables, versus free | |
| 50 | variables, versus constants must not be changed between these phases. | |
| 57345 | 51 | |
| 52 |   \medskip In general, @{text check} and @{text uncheck} operate
 | |
| 53 | simultaneously on a list of terms. This is particular important for | |
| 54 | type-checking, to reconstruct types for several terms of the same context | |
| 55 |   and scope. In contrast, @{text parse} and @{text unparse} operate separately
 | |
| 57496 | 56 | on single terms. | 
| 57345 | 57 | |
| 58 | There are analogous operations to read and print types, with the same | |
| 59 | sub-division into phases. | |
| 58618 | 60 | \<close> | 
| 45258 | 61 | |
| 30272 | 62 | |
| 58618 | 63 | section \<open>Reading and pretty printing \label{sec:read-print}\<close>
 | 
| 34924 | 64 | |
| 58618 | 65 | text \<open> | 
| 57496 | 66 | Read and print operations are roughly dual to each other, such that for the | 
| 67 |   user @{text "s' = pretty (read s)"} looks similar to the original source
 | |
| 68 |   text @{text "s"}, but the details depend on many side-conditions. There are
 | |
| 69 | also explicit options to control the removal of type information in the | |
| 70 |   output. The default configuration routinely looses information, so @{text
 | |
| 71 | "t' = read (pretty t)"} might fail, or produce a differently typed term, or | |
| 72 | a completely different term in the face of syntactic overloading. | |
| 58618 | 73 | \<close> | 
| 34924 | 74 | |
| 58618 | 75 | text %mlref \<open> | 
| 39876 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 76 |   \begin{mldecls}
 | 
| 57345 | 77 |   @{index_ML Syntax.read_typs: "Proof.context -> string list -> typ list"} \\
 | 
| 78 |   @{index_ML Syntax.read_terms: "Proof.context -> string list -> term list"} \\
 | |
| 79 |   @{index_ML Syntax.read_props: "Proof.context -> string list -> term list"} \\[0.5ex]
 | |
| 39876 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 80 |   @{index_ML Syntax.read_typ: "Proof.context -> string -> typ"} \\
 | 
| 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 81 |   @{index_ML Syntax.read_term: "Proof.context -> string -> term"} \\
 | 
| 57345 | 82 |   @{index_ML Syntax.read_prop: "Proof.context -> string -> term"} \\[0.5ex]
 | 
| 39876 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 83 |   @{index_ML Syntax.pretty_typ: "Proof.context -> typ -> Pretty.T"} \\
 | 
| 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 84 |   @{index_ML Syntax.pretty_term: "Proof.context -> term -> Pretty.T"} \\
 | 
| 57345 | 85 |   @{index_ML Syntax.string_of_typ: "Proof.context -> typ -> string"} \\
 | 
| 86 |   @{index_ML Syntax.string_of_term: "Proof.context -> term -> string"} \\
 | |
| 39876 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 87 |   \end{mldecls}
 | 
| 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 88 | |
| 57345 | 89 |   \begin{description}
 | 
| 90 | ||
| 57346 | 91 |   \item @{ML Syntax.read_typs}~@{text "ctxt strs"} parses and checks a
 | 
| 57345 | 92 | simultaneous list of source strings as types of the logic. | 
| 93 | ||
| 57346 | 94 |   \item @{ML Syntax.read_terms}~@{text "ctxt strs"} parses and checks a
 | 
| 57345 | 95 | simultaneous list of source strings as terms of the logic. | 
| 57346 | 96 | Type-reconstruction puts all parsed terms into the same scope: types of | 
| 97 | free variables ultimately need to coincide. | |
| 57345 | 98 | |
| 99 | If particular type-constraints are required for some of the arguments, the | |
| 57346 | 100 | read operations needs to be split into its parse and check phases. Then it | 
| 57496 | 101 |   is possible to use @{ML Type.constraint} on the intermediate pre-terms
 | 
| 57846 | 102 |   (\secref{sec:term-check}).
 | 
| 57345 | 103 | |
| 57346 | 104 |   \item @{ML Syntax.read_props}~@{text "ctxt strs"} parses and checks a
 | 
| 105 | simultaneous list of source strings as terms of the logic, with an implicit | |
| 106 |   type-constraint for each argument to enforce type @{typ prop}; this also
 | |
| 57496 | 107 | affects the inner syntax for parsing. The remaining type-reconstruction | 
| 108 |   works as for @{ML Syntax.read_terms}.
 | |
| 57345 | 109 | |
| 110 |   \item @{ML Syntax.read_typ}, @{ML Syntax.read_term}, @{ML Syntax.read_prop}
 | |
| 57496 | 111 | are like the simultaneous versions, but operate on a single argument only. | 
| 112 | This convenient shorthand is adequate in situations where a single item in | |
| 113 |   its own scope is processed. Do not use @{ML "map o Syntax.read_term"} where
 | |
| 114 |   @{ML Syntax.read_terms} is actually intended!
 | |
| 57345 | 115 | |
| 116 |   \item @{ML Syntax.pretty_typ}~@{text "ctxt T"} and @{ML
 | |
| 117 |   Syntax.pretty_term}~@{text "ctxt t"} uncheck and pretty-print the given type
 | |
| 118 | or term, respectively. Although the uncheck phase acts on a simultaneous | |
| 57496 | 119 | list as well, this is rarely used in practice, so only the singleton case is | 
| 120 | provided as combined pretty operation. There is no distinction of term vs.\ | |
| 121 | proposition. | |
| 57345 | 122 | |
| 123 |   \item @{ML Syntax.string_of_typ} and @{ML Syntax.string_of_term} are
 | |
| 124 |   convenient compositions of @{ML Syntax.pretty_typ} and @{ML
 | |
| 125 |   Syntax.pretty_term} with @{ML Pretty.string_of} for output. The result may
 | |
| 126 | be concatenated with other strings, as long as there is no further | |
| 127 | formatting and line-breaking involved. | |
| 128 | ||
| 129 |   \end{description}
 | |
| 130 | ||
| 57346 | 131 |   @{ML Syntax.read_term}, @{ML Syntax.read_prop}, and @{ML
 | 
| 132 | Syntax.string_of_term} are the most important operations in practice. | |
| 57345 | 133 | |
| 57496 | 134 | \medskip Note that the string values that are passed in and out are | 
| 57346 | 135 | annotated by the system, to carry further markup that is relevant for the | 
| 58555 | 136 |   Prover IDE @{cite "isabelle-jedit"}. User code should neither compose its
 | 
| 137 | own input strings, nor try to analyze the output strings. Conceptually this | |
| 138 | is an abstract datatype, encoded as concrete string for historical reasons. | |
| 57345 | 139 | |
| 140 | The standard way to provide the required position markup for input works via | |
| 141 |   the outer syntax parser wrapper @{ML Parse.inner_syntax}, which is already
 | |
| 142 |   part of @{ML Parse.typ}, @{ML Parse.term}, @{ML Parse.prop}. So a string
 | |
| 143 | obtained from one of the latter may be directly passed to the corresponding | |
| 57346 | 144 | read operation: this yields PIDE markup of the input and precise positions | 
| 145 | for warning and error messages. | |
| 58618 | 146 | \<close> | 
| 39876 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 147 | |
| 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 148 | |
| 58618 | 149 | section \<open>Parsing and unparsing \label{sec:parse-unparse}\<close>
 | 
| 39876 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 150 | |
| 58618 | 151 | text \<open> | 
| 57496 | 152 | Parsing and unparsing converts between actual source text and a certain | 
| 153 |   \emph{pre-term} format, where all bindings and scopes are already resolved
 | |
| 154 | faithfully. Thus the names of free variables or constants are determined in | |
| 155 | the sense of the logical context, but type information might be still | |
| 156 |   missing. Pre-terms support an explicit language of \emph{type constraints}
 | |
| 157 |   that may be augmented by user code to guide the later \emph{check} phase.
 | |
| 45258 | 158 | |
| 57496 | 159 | Actual parsing is based on traditional lexical analysis and Earley parsing | 
| 160 | for arbitrary context-free grammars. The user can specify the grammar | |
| 161 |   declaratively via mixfix annotations. Moreover, there are \emph{syntax
 | |
| 162 | translations} that can be augmented by the user, either declaratively via | |
| 163 |   @{command translations} or programmatically via @{command
 | |
| 58555 | 164 |   parse_translation}, @{command print_translation} @{cite
 | 
| 165 | "isabelle-isar-ref"}. The final scope-resolution is performed by the system, | |
| 166 | according to name spaces for types, term variables and constants determined | |
| 167 | by the context. | |
| 58618 | 168 | \<close> | 
| 39876 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 169 | |
| 58618 | 170 | text %mlref \<open> | 
| 39876 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 171 |   \begin{mldecls}
 | 
| 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 172 |   @{index_ML Syntax.parse_typ: "Proof.context -> string -> typ"} \\
 | 
| 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 173 |   @{index_ML Syntax.parse_term: "Proof.context -> string -> term"} \\
 | 
| 57496 | 174 |   @{index_ML Syntax.parse_prop: "Proof.context -> string -> term"} \\[0.5ex]
 | 
| 39876 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 175 |   @{index_ML Syntax.unparse_typ: "Proof.context -> typ -> Pretty.T"} \\
 | 
| 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 176 |   @{index_ML Syntax.unparse_term: "Proof.context -> term -> Pretty.T"} \\
 | 
| 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 177 |   \end{mldecls}
 | 
| 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 178 | |
| 57346 | 179 |   \begin{description}
 | 
| 180 | ||
| 57496 | 181 |   \item @{ML Syntax.parse_typ}~@{text "ctxt str"} parses a source string as
 | 
| 57346 | 182 | pre-type that is ready to be used with subsequent check operations. | 
| 183 | ||
| 57496 | 184 |   \item @{ML Syntax.parse_term}~@{text "ctxt str"} parses a source string as
 | 
| 57346 | 185 | pre-term that is ready to be used with subsequent check operations. | 
| 186 | ||
| 57496 | 187 |   \item @{ML Syntax.parse_prop}~@{text "ctxt str"} parses a source string as
 | 
| 57346 | 188 | pre-term that is ready to be used with subsequent check operations. The | 
| 189 |   inner syntax category is @{typ prop} and a suitable type-constraint is
 | |
| 57496 | 190 | included to ensure that this information is observed in subsequent type | 
| 191 | reconstruction. | |
| 57346 | 192 | |
| 193 |   \item @{ML Syntax.unparse_typ}~@{text "ctxt T"} unparses a type after
 | |
| 194 | uncheck operations, to turn it into a pretty tree. | |
| 195 | ||
| 196 |   \item @{ML Syntax.unparse_term}~@{text "ctxt T"} unparses a term after
 | |
| 197 | uncheck operations, to turn it into a pretty tree. There is no distinction | |
| 198 | for propositions here. | |
| 199 | ||
| 200 |   \end{description}
 | |
| 201 | ||
| 57496 | 202 |   These operations always operate on a single item; use the combinator @{ML
 | 
| 203 | map} to apply them to a list. | |
| 58618 | 204 | \<close> | 
| 39876 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 205 | |
| 39852 | 206 | |
| 58618 | 207 | section \<open>Checking and unchecking \label{sec:term-check}\<close>
 | 
| 34924 | 208 | |
| 58618 | 209 | text \<open>These operations define the transition from pre-terms and | 
| 45258 | 210 | fully-annotated terms in the sense of the logical core | 
| 211 |   (\chref{ch:logic}).
 | |
| 212 | ||
| 213 |   The \emph{check} phase is meant to subsume a variety of mechanisms
 | |
| 214 | in the manner of ``type-inference'' or ``type-reconstruction'' or | |
| 215 | ``type-improvement'', not just type-checking in the narrow sense. | |
| 216 |   The \emph{uncheck} phase is roughly dual, it prunes type-information
 | |
| 217 | before pretty printing. | |
| 218 | ||
| 219 |   A typical add-on for the check/uncheck syntax layer is the @{command
 | |
| 58555 | 220 |   abbreviation} mechanism @{cite "isabelle-isar-ref"}. Here the user specifies
 | 
| 57496 | 221 |   syntactic definitions that are managed by the system as polymorphic @{text
 | 
| 222 |   "let"} bindings. These are expanded during the @{text "check"} phase, and
 | |
| 223 |   contracted during the @{text "uncheck"} phase, without affecting the
 | |
| 224 | type-assignment of the given terms. | |
| 45258 | 225 | |
| 57496 | 226 | \medskip The precise meaning of type checking depends on the context --- | 
| 227 | additional check/uncheck modules might be defined in user space. | |
| 45258 | 228 | |
| 229 |   For example, the @{command class} command defines a context where
 | |
| 230 |   @{text "check"} treats certain type instances of overloaded
 | |
| 231 | constants according to the ``dictionary construction'' of its | |
| 232 | logical foundation. This involves ``type improvement'' | |
| 233 | (specialization of slightly too general types) and replacement by | |
| 58555 | 234 |   certain locale parameters.  See also @{cite "Haftmann-Wenzel:2009"}.
 | 
| 58618 | 235 | \<close> | 
| 39852 | 236 | |
| 58618 | 237 | text %mlref \<open> | 
| 39876 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 238 |   \begin{mldecls}
 | 
| 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 239 |   @{index_ML Syntax.check_typs: "Proof.context -> typ list -> typ list"} \\
 | 
| 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 240 |   @{index_ML Syntax.check_terms: "Proof.context -> term list -> term list"} \\
 | 
| 57496 | 241 |   @{index_ML Syntax.check_props: "Proof.context -> term list -> term list"} \\[0.5ex]
 | 
| 39876 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 242 |   @{index_ML Syntax.uncheck_typs: "Proof.context -> typ list -> typ list"} \\
 | 
| 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 243 |   @{index_ML Syntax.uncheck_terms: "Proof.context -> term list -> term list"} \\
 | 
| 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 244 |   \end{mldecls}
 | 
| 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 245 | |
| 57346 | 246 |   \begin{description}
 | 
| 247 | ||
| 248 |   \item @{ML Syntax.check_typs}~@{text "ctxt Ts"} checks a simultaneous list
 | |
| 249 | of pre-types as types of the logic. Typically, this involves normalization | |
| 250 | of type synonyms. | |
| 251 | ||
| 252 |   \item @{ML Syntax.check_terms}~@{text "ctxt ts"} checks a simultaneous list
 | |
| 253 | of pre-terms as terms of the logic. Typically, this involves type-inference | |
| 254 | and normalization term abbreviations. The types within the given terms are | |
| 255 |   treated in the same way as for @{ML Syntax.check_typs}.
 | |
| 256 | ||
| 257 | Applications sometimes need to check several types and terms together. The | |
| 258 |   standard approach uses @{ML Logic.mk_type} to embed the language of types
 | |
| 259 | into that of terms; all arguments are appended into one list of terms that | |
| 57496 | 260 |   is checked; afterwards the type arguments are recovered with @{ML
 | 
| 57346 | 261 | Logic.dest_type}. | 
| 262 | ||
| 263 |   \item @{ML Syntax.check_props}~@{text "ctxt ts"} checks a simultaneous list
 | |
| 264 | of pre-terms as terms of the logic, such that all terms are constrained by | |
| 265 |   type @{typ prop}. The remaining check operation works as @{ML
 | |
| 266 | Syntax.check_terms} above. | |
| 267 | ||
| 268 |   \item @{ML Syntax.uncheck_typs}~@{text "ctxt Ts"} unchecks a simultaneous
 | |
| 269 | list of types of the logic, in preparation of pretty printing. | |
| 270 | ||
| 271 |   \item @{ML Syntax.uncheck_terms}~@{text "ctxt ts"} unchecks a simultaneous
 | |
| 272 | list of terms of the logic, in preparation of pretty printing. There is no | |
| 273 | distinction for propositions here. | |
| 274 | ||
| 275 |   \end{description}
 | |
| 276 | ||
| 57496 | 277 | These operations always operate simultaneously on a list; use the combinator | 
| 278 |   @{ML singleton} to apply them to a single item.
 | |
| 58618 | 279 | \<close> | 
| 39876 
1ff9bce085bd
preliminary material on "Concrete syntax and type-checking";
 wenzelm parents: 
39865diff
changeset | 280 | |
| 30124 | 281 | end |