doc-src/IsarRef/Thy/Outer_Syntax.thy
author wenzelm
Thu Nov 13 21:37:18 2008 +0100 (2008-11-13)
changeset 28754 6f2e67a3dfaa
parent 28753 b5926a48c943
child 28762 f5d79aeffd81
permissions -rw-r--r--
moved section "Proof method expressions" to proof chapter;
minor rearrangement of proof sections;
     1 (* $Id$ *)
     2 
     3 theory Outer_Syntax
     4 imports Main
     5 begin
     6 
     7 chapter {* Outer syntax *}
     8 
     9 text {*
    10   The rather generic framework of Isabelle/Isar syntax emerges from
    11   three main syntactic categories: \emph{commands} of the top-level
    12   Isar engine (covering theory and proof elements), \emph{methods} for
    13   general goal refinements (analogous to traditional ``tactics''), and
    14   \emph{attributes} for operations on facts (within a certain
    15   context).  Subsequently we give a reference of basic syntactic
    16   entities underlying Isabelle/Isar syntax in a bottom-up manner.
    17   Concrete theory and proof language elements will be introduced later
    18   on.
    19 
    20   \medskip In order to get started with writing well-formed
    21   Isabelle/Isar documents, the most important aspect to be noted is
    22   the difference of \emph{inner} versus \emph{outer} syntax.  Inner
    23   syntax is that of Isabelle types and terms of the logic, while outer
    24   syntax is that of Isabelle/Isar theory sources (specifications and
    25   proofs).  As a general rule, inner syntax entities may occur only as
    26   \emph{atomic entities} within outer syntax.  For example, the string
    27   @{verbatim "\"x + y\""} and identifier @{verbatim z} are legal term
    28   specifications within a theory, while @{verbatim "x + y"} without
    29   quotes is not.
    30 
    31   Printed theory documents usually omit quotes to gain readability
    32   (this is a matter of {\LaTeX} macro setup, say via @{verbatim
    33   "\\isabellestyle"}, see also \cite{isabelle-sys}).  Experienced
    34   users of Isabelle/Isar may easily reconstruct the lost technical
    35   information, while mere readers need not care about quotes at all.
    36 
    37   \medskip Isabelle/Isar input may contain any number of input
    38   termination characters ``@{verbatim ";"}'' (semicolon) to separate
    39   commands explicitly.  This is particularly useful in interactive
    40   shell sessions to make clear where the current command is intended
    41   to end.  Otherwise, the interpreter loop will continue to issue a
    42   secondary prompt ``@{verbatim "#"}'' until an end-of-command is
    43   clearly recognized from the input syntax, e.g.\ encounter of the
    44   next command keyword.
    45 
    46   More advanced interfaces such as Proof~General \cite{proofgeneral}
    47   do not require explicit semicolons, the amount of input text is
    48   determined automatically by inspecting the present content of the
    49   Emacs text buffer.  In the printed presentation of Isabelle/Isar
    50   documents semicolons are omitted altogether for readability.
    51 
    52   \begin{warn}
    53     Proof~General requires certain syntax classification tables in
    54     order to achieve properly synchronized interaction with the
    55     Isabelle/Isar process.  These tables need to be consistent with
    56     the Isabelle version and particular logic image to be used in a
    57     running session (common object-logics may well change the outer
    58     syntax).  The standard setup should work correctly with any of the
    59     ``official'' logic images derived from Isabelle/HOL (including
    60     HOLCF etc.).  Users of alternative logics may need to tell
    61     Proof~General explicitly, e.g.\ by giving an option @{verbatim "-k ZF"}
    62     (in conjunction with @{verbatim "-l ZF"}, to specify the default
    63     logic image).  Note that option @{verbatim "-L"} does both
    64     of this at the same time.
    65   \end{warn}
    66 *}
    67 
    68 
    69 section {* Lexical matters \label{sec:lex-syntax} *}
    70 
    71 text {*
    72   The Isabelle/Isar outer syntax provides token classes as presented
    73   below; most of these coincide with the inner lexical syntax as
    74   presented in \cite{isabelle-ref}.
    75 
    76   \begin{matharray}{rcl}
    77     @{syntax_def ident} & = & letter\,quasiletter^* \\
    78     @{syntax_def longident} & = & ident (\verb,.,ident)^+ \\
    79     @{syntax_def symident} & = & sym^+ ~|~ \verb,\,\verb,<,ident\verb,>, \\
    80     @{syntax_def nat} & = & digit^+ \\
    81     @{syntax_def var} & = & ident ~|~ \verb,?,ident ~|~ \verb,?,ident\verb,.,nat \\
    82     @{syntax_def typefree} & = & \verb,',ident \\
    83     @{syntax_def typevar} & = & typefree ~|~ \verb,?,typefree ~|~ \verb,?,typefree\verb,.,nat \\
    84     @{syntax_def string} & = & \verb,", ~\dots~ \verb,", \\
    85     @{syntax_def altstring} & = & \backquote ~\dots~ \backquote \\
    86     @{syntax_def verbatim} & = & \verb,{*, ~\dots~ \verb,*,\verb,}, \\[1ex]
    87 
    88     letter & = & latin ~|~ \verb,\,\verb,<,latin\verb,>, ~|~ \verb,\,\verb,<,latin\,latin\verb,>, ~|~ greek ~|~ \\
    89            &   & \verb,\<^isub>, ~|~ \verb,\<^isup>, \\
    90     quasiletter & = & letter ~|~ digit ~|~ \verb,_, ~|~ \verb,', \\
    91     latin & = & \verb,a, ~|~ \dots ~|~ \verb,z, ~|~ \verb,A, ~|~ \dots ~|~ \verb,Z, \\
    92     digit & = & \verb,0, ~|~ \dots ~|~ \verb,9, \\
    93     sym & = & \verb,!, ~|~ \verb,#, ~|~ \verb,$, ~|~ \verb,%, ~|~ \verb,&, ~|~
    94      \verb,*, ~|~ \verb,+, ~|~ \verb,-, ~|~ \verb,/, ~|~ \\
    95     & & \verb,<, ~|~ \verb,=, ~|~ \verb,>, ~|~ \verb,?, ~|~ \texttt{\at} ~|~
    96     \verb,^, ~|~ \verb,_, ~|~ \verb,|, ~|~ \verb,~, \\
    97     greek & = & \verb,\<alpha>, ~|~ \verb,\<beta>, ~|~ \verb,\<gamma>, ~|~ \verb,\<delta>, ~| \\
    98           &   & \verb,\<epsilon>, ~|~ \verb,\<zeta>, ~|~ \verb,\<eta>, ~|~ \verb,\<theta>, ~| \\
    99           &   & \verb,\<iota>, ~|~ \verb,\<kappa>, ~|~ \verb,\<mu>, ~|~ \verb,\<nu>, ~| \\
   100           &   & \verb,\<xi>, ~|~ \verb,\<pi>, ~|~ \verb,\<rho>, ~|~ \verb,\<sigma>, ~|~ \verb,\<tau>, ~| \\
   101           &   & \verb,\<upsilon>, ~|~ \verb,\<phi>, ~|~ \verb,\<chi>, ~|~ \verb,\<psi>, ~| \\
   102           &   & \verb,\<omega>, ~|~ \verb,\<Gamma>, ~|~ \verb,\<Delta>, ~|~ \verb,\<Theta>, ~| \\
   103           &   & \verb,\<Lambda>, ~|~ \verb,\<Xi>, ~|~ \verb,\<Pi>, ~|~ \verb,\<Sigma>, ~| \\
   104           &   & \verb,\<Upsilon>, ~|~ \verb,\<Phi>, ~|~ \verb,\<Psi>, ~|~ \verb,\<Omega>, \\
   105   \end{matharray}
   106 
   107   The syntax of @{syntax string} admits any characters, including
   108   newlines; ``@{verbatim "\""}'' (double-quote) and ``@{verbatim
   109   "\\"}'' (backslash) need to be escaped by a backslash; arbitrary
   110   character codes may be specified as ``@{verbatim "\\"}@{text ddd}'',
   111   with three decimal digits.  Alternative strings according to
   112   @{syntax altstring} are analogous, using single back-quotes instead.
   113   The body of @{syntax verbatim} may consist of any text not
   114   containing ``@{verbatim "*"}@{verbatim "}"}''; this allows
   115   convenient inclusion of quotes without further escapes.  The greek
   116   letters do \emph{not} include @{verbatim "\<lambda>"}, which is already used
   117   differently in the meta-logic.
   118 
   119   Common mathematical symbols such as @{text \<forall>} are represented in
   120   Isabelle as @{verbatim \<forall>}.  There are infinitely many Isabelle
   121   symbols like this, although proper presentation is left to front-end
   122   tools such as {\LaTeX} or Proof~General with the X-Symbol package.
   123   A list of standard Isabelle symbols that work well with these tools
   124   is given in \cite[appendix~A]{isabelle-sys}.
   125   
   126   Source comments take the form @{verbatim "(*"}~@{text
   127   "\<dots>"}~@{verbatim "*)"} and may be nested, although user-interface
   128   tools might prevent this.  Note that this form indicates source
   129   comments only, which are stripped after lexical analysis of the
   130   input.  The Isar syntax also provides proper \emph{document
   131   comments} that are considered as part of the text (see
   132   \secref{sec:comments}).
   133 *}
   134 
   135 
   136 section {* Common syntax entities *}
   137 
   138 text {*
   139   We now introduce several basic syntactic entities, such as names,
   140   terms, and theorem specifications, which are factored out of the
   141   actual Isar language elements to be described later.
   142 *}
   143 
   144 
   145 subsection {* Names *}
   146 
   147 text {*
   148   Entity \railqtok{name} usually refers to any name of types,
   149   constants, theorems etc.\ that are to be \emph{declared} or
   150   \emph{defined} (so qualified identifiers are excluded here).  Quoted
   151   strings provide an escape for non-identifier names or those ruled
   152   out by outer syntax keywords (e.g.\ quoted @{verbatim "\"let\""}).
   153   Already existing objects are usually referenced by
   154   \railqtok{nameref}.
   155 
   156   \indexoutertoken{name}\indexoutertoken{parname}\indexoutertoken{nameref}
   157   \indexoutertoken{int}
   158   \begin{rail}
   159     name: ident | symident | string | nat
   160     ;
   161     parname: '(' name ')'
   162     ;
   163     nameref: name | longident
   164     ;
   165     int: nat | '-' nat
   166     ;
   167   \end{rail}
   168 *}
   169 
   170 
   171 subsection {* Comments \label{sec:comments} *}
   172 
   173 text {*
   174   Large chunks of plain \railqtok{text} are usually given
   175   \railtok{verbatim}, i.e.\ enclosed in @{verbatim "{"}@{verbatim
   176   "*"}~@{text "\<dots>"}~@{verbatim "*"}@{verbatim "}"}.  For convenience,
   177   any of the smaller text units conforming to \railqtok{nameref} are
   178   admitted as well.  A marginal \railnonterm{comment} is of the form
   179   @{verbatim "--"} \railqtok{text}.  Any number of these may occur
   180   within Isabelle/Isar commands.
   181 
   182   \indexoutertoken{text}\indexouternonterm{comment}
   183   \begin{rail}
   184     text: verbatim | nameref
   185     ;
   186     comment: '--' text
   187     ;
   188   \end{rail}
   189 *}
   190 
   191 
   192 subsection {* Type classes, sorts and arities *}
   193 
   194 text {*
   195   Classes are specified by plain names.  Sorts have a very simple
   196   inner syntax, which is either a single class name @{text c} or a
   197   list @{text "{c\<^sub>1, \<dots>, c\<^sub>n}"} referring to the
   198   intersection of these classes.  The syntax of type arities is given
   199   directly at the outer level.
   200 
   201   \indexouternonterm{sort}\indexouternonterm{arity}
   202   \indexouternonterm{classdecl}
   203   \begin{rail}
   204     classdecl: name (('<' | subseteq) (nameref + ','))?
   205     ;
   206     sort: nameref
   207     ;
   208     arity: ('(' (sort + ',') ')')? sort
   209     ;
   210   \end{rail}
   211 *}
   212 
   213 
   214 subsection {* Types and terms \label{sec:types-terms} *}
   215 
   216 text {*
   217   The actual inner Isabelle syntax, that of types and terms of the
   218   logic, is far too sophisticated in order to be modelled explicitly
   219   at the outer theory level.  Basically, any such entity has to be
   220   quoted to turn it into a single token (the parsing and type-checking
   221   is performed internally later).  For convenience, a slightly more
   222   liberal convention is adopted: quotes may be omitted for any type or
   223   term that is already atomic at the outer level.  For example, one
   224   may just write @{verbatim x} instead of quoted @{verbatim "\"x\""}.
   225   Note that symbolic identifiers (e.g.\ @{verbatim "++"} or @{text
   226   "\<forall>"} are available as well, provided these have not been superseded
   227   by commands or other keywords already (such as @{verbatim "="} or
   228   @{verbatim "+"}).
   229 
   230   \indexoutertoken{type}\indexoutertoken{term}\indexoutertoken{prop}
   231   \begin{rail}
   232     type: nameref | typefree | typevar
   233     ;
   234     term: nameref | var
   235     ;
   236     prop: term
   237     ;
   238   \end{rail}
   239 
   240   Positional instantiations are indicated by giving a sequence of
   241   terms, or the placeholder ``@{text _}'' (underscore), which means to
   242   skip a position.
   243 
   244   \indexoutertoken{inst}\indexoutertoken{insts}
   245   \begin{rail}
   246     inst: underscore | term
   247     ;
   248     insts: (inst *)
   249     ;
   250   \end{rail}
   251 
   252   Type declarations and definitions usually refer to
   253   \railnonterm{typespec} on the left-hand side.  This models basic
   254   type constructor application at the outer syntax level.  Note that
   255   only plain postfix notation is available here, but no infixes.
   256 
   257   \indexouternonterm{typespec}
   258   \begin{rail}
   259     typespec: (() | typefree | '(' ( typefree + ',' ) ')') name
   260     ;
   261   \end{rail}
   262 *}
   263 
   264 
   265 subsection {* Term patterns and declarations \label{sec:term-decls} *}
   266 
   267 text {*
   268   Wherever explicit propositions (or term fragments) occur in a proof
   269   text, casual binding of schematic term variables may be given
   270   specified via patterns of the form ``@{text "(\<IS> p\<^sub>1 \<dots>
   271   p\<^sub>n)"}''.  This works both for \railqtok{term} and \railqtok{prop}.
   272 
   273   \indexouternonterm{termpat}\indexouternonterm{proppat}
   274   \begin{rail}
   275     termpat: '(' ('is' term +) ')'
   276     ;
   277     proppat: '(' ('is' prop +) ')'
   278     ;
   279   \end{rail}
   280 
   281   \medskip Declarations of local variables @{text "x :: \<tau>"} and
   282   logical propositions @{text "a : \<phi>"} represent different views on
   283   the same principle of introducing a local scope.  In practice, one
   284   may usually omit the typing of \railnonterm{vars} (due to
   285   type-inference), and the naming of propositions (due to implicit
   286   references of current facts).  In any case, Isar proof elements
   287   usually admit to introduce multiple such items simultaneously.
   288 
   289   \indexouternonterm{vars}\indexouternonterm{props}
   290   \begin{rail}
   291     vars: (name+) ('::' type)?
   292     ;
   293     props: thmdecl? (prop proppat? +)
   294     ;
   295   \end{rail}
   296 
   297   The treatment of multiple declarations corresponds to the
   298   complementary focus of \railnonterm{vars} versus
   299   \railnonterm{props}.  In ``@{text "x\<^sub>1 \<dots> x\<^sub>n :: \<tau>"}''
   300   the typing refers to all variables, while in @{text "a: \<phi>\<^sub>1 \<dots>
   301   \<phi>\<^sub>n"} the naming refers to all propositions collectively.
   302   Isar language elements that refer to \railnonterm{vars} or
   303   \railnonterm{props} typically admit separate typings or namings via
   304   another level of iteration, with explicit @{keyword_ref "and"}
   305   separators; e.g.\ see @{command "fix"} and @{command "assume"} in
   306   \secref{sec:proof-context}.
   307 *}
   308 
   309 
   310 subsection {* Mixfix annotations *}
   311 
   312 text {*
   313   Mixfix annotations specify concrete \emph{inner} syntax of Isabelle
   314   types and terms.  Some commands such as @{command "types"} (see
   315   \secref{sec:types-pure}) admit infixes only, while @{command
   316   "consts"} (see \secref{sec:consts}) and @{command "syntax"} (see
   317   \secref{sec:syn-trans}) support the full range of general mixfixes
   318   and binders.
   319 
   320   \indexouternonterm{infix}\indexouternonterm{mixfix}\indexouternonterm{structmixfix}
   321   \begin{rail}
   322     infix: '(' ('infix' | 'infixl' | 'infixr') string nat ')'
   323     ;
   324     mixfix: infix | '(' string prios? nat? ')' | '(' 'binder' string prios? nat ')'
   325     ;
   326     structmixfix: mixfix | '(' 'structure' ')'
   327     ;
   328 
   329     prios: '[' (nat + ',') ']'
   330     ;
   331   \end{rail}
   332 
   333   Here the \railtok{string} specifications refer to the actual mixfix
   334   template, which may include literal text, spacing, blocks, and
   335   arguments (denoted by ``@{text _}''); the special symbol
   336   ``@{verbatim "\<index>"}'' (printed as ``@{text "\<index>"}'') represents an index
   337   argument that specifies an implicit structure reference (see also
   338   \secref{sec:locale}).  Infix and binder declarations provide common
   339   abbreviations for particular mixfix declarations.  So in practice,
   340   mixfix templates mostly degenerate to literal text for concrete
   341   syntax, such as ``@{verbatim "++"}'' for an infix symbol.
   342 
   343   \medskip In full generality, mixfix declarations work as follows.
   344   Suppose a constant @{text "c :: \<tau>\<^sub>1 \<Rightarrow> \<dots> \<tau>\<^sub>n \<Rightarrow> \<tau>"} is
   345   annotated by @{text "(mixfix [p\<^sub>1, \<dots>, p\<^sub>n] p)"}, where @{text
   346   "mixfix"} is a string @{text "d\<^sub>0 _ d\<^sub>1 _ \<dots> _ d\<^sub>n"} consisting of
   347   delimiters that surround argument positions as indicated by
   348   underscores.
   349 
   350   Altogether this determines a production for a context-free priority
   351   grammar, where for each argument @{text "i"} the syntactic category
   352   is determined by @{text "\<tau>\<^sub>i"} (with priority @{text "p\<^sub>i"}), and
   353   the result category is determined from @{text "\<tau>"} (with
   354   priority @{text "p"}).  Priority specifications are optional, with
   355   default 0 for arguments and 1000 for the result.
   356 
   357   Since @{text "\<tau>"} may be again a function type, the constant
   358   type scheme may have more argument positions than the mixfix
   359   pattern.  Printing a nested application @{text "c t\<^sub>1 \<dots> t\<^sub>m"} for
   360   @{text "m > n"} works by attaching concrete notation only to the
   361   innermost part, essentially by printing @{text "(c t\<^sub>1 \<dots> t\<^sub>n) \<dots> t\<^sub>m"}
   362   instead.  If a term has fewer arguments than specified in the mixfix
   363   template, the concrete syntax is ignored.
   364 
   365   \medskip A mixfix template may also contain additional directives
   366   for pretty printing, notably spaces, blocks, and breaks.  The
   367   general template format is a sequence over any of the following
   368   entities.
   369 
   370   \begin{itemize}
   371 
   372   \item @{text "\<^bold>d"} is a delimiter, namely a non-empty
   373   sequence of characters other than the special characters @{text "'"}
   374   (single quote), @{text "_"} (underscore), @{text "\<index>"} (index
   375   symbol), @{text "/"} (slash), @{text "("} and @{text ")"}
   376   (parentheses).
   377 
   378   A single quote escapes the special meaning of these meta-characters,
   379   producing a literal version of the following character, unless that
   380   is a blank.  A single quote followed by a blank separates
   381   delimiters, without affecting printing, but input tokens may have
   382   additional white space here.
   383 
   384   \item @{text "_"} is an argument position, which stands for a
   385   certain syntactic category in the underlying grammar.
   386 
   387   \item @{text "\<index>"} is an indexed argument position; this is
   388   the place where implicit structure arguments can be attached.
   389 
   390   \item @{text "\<^bold>s"} is a non-empty sequence of spaces for
   391   printing.  This and the following specifications do not affect
   392   parsing at all.
   393 
   394   \item @{text "(\<^bold>n"} opens a pretty printing block.  The
   395   optional number specifies how much indentation to add when a line
   396   break occurs within the block.  If the parenthesis is not followed
   397   by digits, the indentation defaults to 0.  A block specified via
   398   @{text "(00"} is unbreakable.
   399 
   400   \item @{text ")"} closes a pretty printing block.
   401 
   402   \item @{text "//"} forces a line break.
   403 
   404   \item @{text "/\<^bold>s"} allows a line break.  Here @{text
   405   "\<^bold>s"} stands for the string of spaces (zero or more) right
   406   after the slash.  These spaces are printed if the break is
   407   \emph{not} taken.
   408 
   409   \end{itemize}
   410 
   411   For example, the template @{text "(_ +/ _)"} specifies an infix
   412   operator.  There are two argument positions; the delimiter @{text
   413   "+"} is preceded by a space and followed by a space or line break;
   414   the entire phrase is a pretty printing block.
   415 
   416   The general idea of pretty printing with blocks and breaks is also
   417   described in \cite{paulson-ml2}.
   418 *}
   419 
   420 
   421 subsection {* Attributes and theorems \label{sec:syn-att} *}
   422 
   423 text {* Attributes have their own ``semi-inner'' syntax, in the sense
   424   that input conforming to \railnonterm{args} below is parsed by the
   425   attribute a second time.  The attribute argument specifications may
   426   be any sequence of atomic entities (identifiers, strings etc.), or
   427   properly bracketed argument lists.  Below \railqtok{atom} refers to
   428   any atomic entity, including any \railtok{keyword} conforming to
   429   \railtok{symident}.
   430 
   431   \indexoutertoken{atom}\indexouternonterm{args}\indexouternonterm{attributes}
   432   \begin{rail}
   433     atom: nameref | typefree | typevar | var | nat | keyword
   434     ;
   435     arg: atom | '(' args ')' | '[' args ']'
   436     ;
   437     args: arg *
   438     ;
   439     attributes: '[' (nameref args * ',') ']'
   440     ;
   441   \end{rail}
   442 
   443   Theorem specifications come in several flavors:
   444   \railnonterm{axmdecl} and \railnonterm{thmdecl} usually refer to
   445   axioms, assumptions or results of goal statements, while
   446   \railnonterm{thmdef} collects lists of existing theorems.  Existing
   447   theorems are given by \railnonterm{thmref} and
   448   \railnonterm{thmrefs}, the former requires an actual singleton
   449   result.
   450 
   451   There are three forms of theorem references:
   452   \begin{enumerate}
   453   
   454   \item named facts @{text "a"},
   455 
   456   \item selections from named facts @{text "a(i)"} or @{text "a(j - k)"},
   457 
   458   \item literal fact propositions using @{syntax_ref altstring} syntax
   459   @{verbatim "`"}@{text "\<phi>"}@{verbatim "`"} (see also method
   460   @{method_ref fact}).
   461 
   462   \end{enumerate}
   463 
   464   Any kind of theorem specification may include lists of attributes
   465   both on the left and right hand sides; attributes are applied to any
   466   immediately preceding fact.  If names are omitted, the theorems are
   467   not stored within the theorem database of the theory or proof
   468   context, but any given attributes are applied nonetheless.
   469 
   470   An extra pair of brackets around attributes (like ``@{text
   471   "[[simproc a]]"}'') abbreviates a theorem reference involving an
   472   internal dummy fact, which will be ignored later on.  So only the
   473   effect of the attribute on the background context will persist.
   474   This form of in-place declarations is particularly useful with
   475   commands like @{command "declare"} and @{command "using"}.
   476 
   477   \indexouternonterm{axmdecl}\indexouternonterm{thmdecl}
   478   \indexouternonterm{thmdef}\indexouternonterm{thmref}
   479   \indexouternonterm{thmrefs}\indexouternonterm{selection}
   480   \begin{rail}
   481     axmdecl: name attributes? ':'
   482     ;
   483     thmdecl: thmbind ':'
   484     ;
   485     thmdef: thmbind '='
   486     ;
   487     thmref: (nameref selection? | altstring) attributes? | '[' attributes ']'
   488     ;
   489     thmrefs: thmref +
   490     ;
   491 
   492     thmbind: name attributes | name | attributes
   493     ;
   494     selection: '(' ((nat | nat '-' nat?) + ',') ')'
   495     ;
   496   \end{rail}
   497 *}
   498 
   499 end