doc-src/IsarRef/syntax.tex
author wenzelm
Sat Oct 30 20:13:16 1999 +0200 (1999-10-30)
changeset 7981 5120a2a15d06
parent 7895 7c492d8bc8e3
child 8102 424f6e663977
permissions -rw-r--r--
tuned;
     1 
     2 \chapter{Isar Syntax Primitives}
     3 
     4 We give a complete reference of all basic syntactic entities underlying the
     5 Isabelle/Isar document syntax.  Actual theory and proof commands will be
     6 introduced later on.
     7 
     8 \medskip
     9 
    10 In order to get started with writing well-formed Isabelle/Isar documents, the
    11 most important aspect to be noted is the difference of \emph{inner} versus
    12 \emph{outer} syntax.  Inner syntax is that of Isabelle types and terms of the
    13 logic, while outer syntax is that of Isabelle/Isar theories (including
    14 proofs).  As a general rule, inner syntax entities may occur only as
    15 \emph{atomic entities} within outer syntax.  For example, the string
    16 \texttt{"x + y"} and identifier \texttt{z} are legal term specifications
    17 within a theory, while \texttt{x + y} is not.
    18 
    19 \begin{warn}
    20   Note that classic Isabelle theories used to fake parts of the inner type
    21   syntax, with rather complicated rules when quotes may be omitted.  Despite
    22   the minor drawback of requiring quotes more often, the syntax of
    23   Isabelle/Isar is simpler and more robust in that respect.
    24 \end{warn}
    25 
    26 \medskip
    27 
    28 Another notable point is proper input termination.  Proof~General demands any
    29 command to be terminated by ``\texttt{;}''
    30 (semicolon)\index{semicolon}\index{*;}.  As far as plain Isabelle/Isar is
    31 concerned, commands may be directly run together, though.  In the presentation
    32 of Isabelle/Isar documents, semicolons are omitted in order to gain
    33 readability.
    34 
    35 
    36 \section{Lexical matters}\label{sec:lex-syntax}
    37 
    38 The Isabelle/Isar outer syntax provides token classes as presented below.
    39 Note that some of these coincide (by full intention) with the inner lexical
    40 syntax as presented in \cite{isabelle-ref}.  These different levels of syntax
    41 should not be confused, though.
    42 
    43 %FIXME keyword, command
    44 \begin{matharray}{rcl}
    45   ident & = & letter~quasiletter^* \\
    46   longident & = & ident\verb,.,ident~\dots~ident \\
    47   symident & = & sym^+ \\
    48   nat & = & digit^+ \\
    49   var & = & \verb,?,ident ~|~ \verb,?,ident\verb,.,nat \\
    50   typefree & = & \verb,',ident \\
    51   typevar & = & \verb,?,typefree ~|~ \verb,?,typefree\verb,.,nat \\
    52   string & = & \verb,", ~\dots~ \verb,", \\
    53   verbatim & = & \verb,{*, ~\dots~ \verb,*}, \\
    54 \end{matharray}
    55 \begin{matharray}{rcl}
    56   letter & = & \verb,a, ~|~ \dots ~|~ \verb,z, ~|~ \verb,A, ~|~ \dots ~|~ \verb,Z, \\
    57   digit & = & \verb,0, ~|~ \dots ~|~ \verb,9, \\
    58   quasiletter & = & letter ~|~ digit ~|~ \verb,_, ~|~ \verb,', \\
    59   sym & = & \verb,!, ~|~ \verb,#, ~|~ \verb,$, ~|~ \verb,%, ~|~ \verb,&, ~|~  %$
    60    \verb,*, ~|~ \verb,+, ~|~ \verb,-, ~|~ \verb,/, ~|~ \verb,:, ~|~
    61    \verb,<, ~|~ \verb,=, ~|~ \verb,>, ~|~ \verb,?, ~|~ \mathtt{\at} ~|~ \\
    62   & & \verb,^, ~|~ \verb,_, ~|~ \verb,`, ~|~ \verb,|, ~|~ \verb,~, \\
    63 \end{matharray}
    64 
    65 The syntax of \texttt{string} admits any characters, including newlines;
    66 ``\verb|"|'' (double-quote) and ``\verb|\|'' (backslash) have to be escaped by
    67 a backslash.  Note that ML-style control characters are \emph{not} supported.
    68 The body of \texttt{verbatim} may consist of any text not containing
    69 ``\verb|*}|''.
    70 
    71 Comments take the form \texttt{(*~\dots~*)} and may be
    72 nested\footnote{Proof~General may get confused by nested comments.}, just as
    73 in ML. Note that these are \emph{source} comments only, which are stripped
    74 after lexical analysis of the input.  The Isar document syntax also provides
    75 \emph{formal comments} that are actually part of the text (see
    76 \S\ref{sec:comments}).
    77 
    78 
    79 \section{Common syntax entities}
    80 
    81 Subsequently, we introduce several basic syntactic entities, such as names,
    82 terms, and theorem specifications, which have been factored out of the actual
    83 Isar language elements to be described later.
    84 
    85 Note that some of the basic syntactic entities introduced below (e.g.\ 
    86 \railqtoken{name}) act much like tokens rather than plain nonterminals (e.g.\ 
    87 \railnonterm{sort}), especially for the sake of error messages.  E.g.\ syntax
    88 elements such as $\CONSTS$ referring to \railqtoken{name} or \railqtoken{type}
    89 would really report a missing name or type rather than any of the constituent
    90 primitive tokens such as \railtoken{ident} or \railtoken{string}.
    91 
    92 
    93 \subsection{Names}
    94 
    95 Entity \railqtoken{name} usually refers to any name of types, constants,
    96 theorems etc.\ that are to be \emph{declared} or \emph{defined} (so qualified
    97 identifiers are excluded).  Quoted strings provide an escape for
    98 non-identifier names or those ruled out by outer syntax keywords (e.g.\ 
    99 \verb|"let"|).  Already existing objects are usually referenced by
   100 \railqtoken{nameref}.
   101 
   102 \indexoutertoken{name}\indexoutertoken{parname}\indexoutertoken{nameref}
   103 \begin{rail}
   104   name: ident | symident | string
   105   ;
   106   parname: '(' name ')'
   107   ;
   108   nameref: name | longident
   109   ;
   110 \end{rail}
   111 
   112 
   113 \subsection{Comments}\label{sec:comments}
   114 
   115 Large chunks of plain \railqtoken{text} are usually given
   116 \railtoken{verbatim}, i.e.\ enclosed in \verb|{*|~\dots~\verb|*}|.  For
   117 convenience, any of the smaller text units conforming to \railqtoken{nameref}
   118 are admitted as well.  Almost any of the Isar commands may be annotated by a
   119 marginal \railnonterm{comment} of the form \texttt{--} \railqtoken{text}.
   120 Note that the latter kind of comment is actually part of the language, while
   121 source level comments \verb|(*|~\dots~\verb|*)| are stripped at the lexical
   122 level.  A few commands such as $\PROOFNAME$ admit additional markup with a
   123 ``level of interest'': \texttt{\%} followed by an optional number $n$ (default
   124 $n = 1$) indicates that the respective part of the document becomes $n$ levels
   125 more obscure; \texttt{\%\%} means that interest drops by $\infty$ --- abandon
   126 every hope, who enter here.
   127 
   128 \indexoutertoken{text}\indexouternonterm{comment}\indexouternonterm{interest}
   129 \begin{rail}
   130   text: verbatim | nameref
   131   ;
   132   comment: '--' text
   133   ;
   134   interest: percent nat? | ppercent
   135   ;
   136 \end{rail}
   137 
   138 
   139 \subsection{Type classes, sorts and arities}
   140 
   141 The syntax of sorts and arities is given directly at the outer level.  Note
   142 that this is in contrast to types and terms (see \ref{sec:types-terms}).
   143 
   144 \indexouternonterm{sort}\indexouternonterm{arity}\indexouternonterm{simplearity}
   145 \indexouternonterm{classdecl}
   146 \begin{rail}
   147   classdecl: name ('<' (nameref + ','))?
   148   ;
   149   sort: nameref | lbrace (nameref * ',') rbrace
   150   ;
   151   arity: ('(' (sort + ',') ')')? sort
   152   ;
   153   simplearity: ('(' (sort + ',') ')')? nameref
   154   ;
   155 \end{rail}
   156 
   157 
   158 \subsection{Types and terms}\label{sec:types-terms}
   159 
   160 The actual inner Isabelle syntax, that of types and terms of the logic, is far
   161 too sophisticated in order to be modelled explicitly at the outer theory
   162 level.  Basically, any such entity has to be quoted here to turn it into a
   163 single token (the parsing and type-checking is performed internally later).
   164 For convenience, a slightly more liberal convention is adopted: quotes may be
   165 omitted for any type or term that is already \emph{atomic} at the outer level.
   166 For example, one may write just \texttt{x} instead of \texttt{"x"}.  Note that
   167 symbolic identifiers (e.g.\ \texttt{++}) are available as well, provided these
   168 are not superseded by commands or keywords (e.g.\ \texttt{+}).
   169 
   170 \indexoutertoken{type}\indexoutertoken{term}\indexoutertoken{prop}
   171 \begin{rail}
   172   type: nameref | typefree | typevar
   173   ;
   174   term: nameref | var | nat
   175   ;
   176   prop: term
   177   ;
   178 \end{rail}
   179 
   180 Type declarations and definitions usually refer to \railnonterm{typespec} on
   181 the left-hand side.  This models basic type constructor application at the
   182 outer syntax level.  Note that only plain postfix notation is available here,
   183 but no infixes.
   184 
   185 \indexouternonterm{typespec}
   186 \begin{rail}
   187   typespec: (() | typefree | '(' ( typefree + ',' ) ')') name
   188   ;
   189 \end{rail}
   190 
   191 
   192 \subsection{Term patterns}\label{sec:term-pats}
   193 
   194 Assumptions and goal statements usually admit casual binding of schematic term
   195 variables by giving (optional) patterns of the form $\ISS{p@1\;\dots}{p@n}$.
   196 There are separate versions available for \railqtoken{term}s and
   197 \railqtoken{prop}s.  The latter provides a $\CONCLNAME$ part with patterns
   198 referring the (atomic) conclusion of a rule.
   199 
   200 \indexouternonterm{termpat}\indexouternonterm{proppat}
   201 \begin{rail}
   202   termpat: '(' ('is' term +) ')'
   203   ;
   204   proppat: '(' (('is' prop +) | 'concl' ('is' prop +) | ('is' prop +) 'concl' ('is' prop +)) ')'
   205   ;
   206 \end{rail}
   207 
   208 
   209 \subsection{Mixfix annotations}
   210 
   211 Mixfix annotations specify concrete \emph{inner} syntax of Isabelle types and
   212 terms.  Some commands such as $\TYPES$ (see \S\ref{sec:types-pure}) admit
   213 infixes only, while $\CONSTS$ (see \S\ref{sec:consts}) and
   214 $\isarkeyword{syntax}$ (see \S\ref{sec:syn-trans}) support the full range of
   215 general mixfixes and binders.
   216 
   217 \indexouternonterm{infix}\indexouternonterm{mixfix}
   218 \begin{rail}
   219   infix: '(' ('infixl' | 'infixr') string? nat ')'
   220   ;
   221   mixfix: infix | '(' string prios? nat? ')' | '(' 'binder' string prios? nat ')'
   222   ;
   223 
   224   prios: '[' (nat + ',') ']'
   225   ;
   226 \end{rail}
   227 
   228 
   229 \subsection{Attributes and theorems}\label{sec:syn-att}
   230 
   231 Attributes (and proof methods, see \S\ref{sec:syn-meth}) have their own
   232 ``semi-inner'' syntax, in the sense that input conforming to
   233 \railnonterm{args} below is parsed by the attribute a second time.  The
   234 attribute argument specifications may be any sequence of atomic entities
   235 (identifiers, strings etc.), or properly bracketed argument lists.  Below
   236 \railqtoken{atom} refers to any atomic entity, including any
   237 \railtoken{keyword} conforming to \railtoken{symident}.
   238 
   239 \indexoutertoken{atom}\indexouternonterm{args}\indexouternonterm{attributes}
   240 \begin{rail}
   241   atom: nameref | typefree | typevar | var | nat | keyword
   242   ;
   243   arg: atom | '(' args ')' | '[' args ']' | lbrace args rbrace
   244   ;
   245   args: arg *
   246   ;
   247   attributes: '[' (nameref args * ',') ']'
   248   ;
   249 \end{rail}
   250 
   251 Theorem specifications come in several flavors: \railnonterm{axmdecl} and
   252 \railnonterm{thmdecl} usually refer to axioms, assumptions or results of goal
   253 statements, while \railnonterm{thmdef} collects lists of existing theorems.
   254 Existing theorems are given by \railnonterm{thmref} and \railnonterm{thmrefs},
   255 the former requires an actual singleton result.  Any of these theorem
   256 specifications may include lists of attributes both on the left and right hand
   257 sides; attributes are applied to any immediately preceding theorem.  If names
   258 are omitted, the theorems are not stored within the theorem database of the
   259 theory or proof context; any given attributes are still applied, though.
   260 
   261 \indexouternonterm{thmdecl}\indexouternonterm{axmdecl}
   262 \indexouternonterm{thmdef}\indexouternonterm{thmrefs}
   263 \begin{rail}
   264   axmdecl: name attributes? ':'
   265   ;
   266   thmdecl: thmname ':'
   267   ;
   268   thmdef: thmname '='
   269   ;
   270   thmref: nameref attributes?
   271   ;
   272   thmrefs: thmref +
   273   ;
   274 
   275   thmname: name attributes | name | attributes
   276   ;
   277 \end{rail}
   278 
   279 
   280 \subsection{Proof methods}\label{sec:syn-meth}
   281 
   282 Proof methods are either basic ones, or expressions composed of methods via
   283 ``\texttt{,}'' (sequential composition), ``\texttt{|}'' (alternative choices),
   284 ``\texttt{?}'' (try), ``\texttt{+}'' (repeat at least once).  In practice,
   285 proof methods are usually just a comma separated list of
   286 \railqtoken{nameref}~\railnonterm{args} specifications.  Note that parentheses
   287 may be dropped for single method specifications (with no arguments).
   288 
   289 \indexouternonterm{method}
   290 \begin{rail}
   291   method: (nameref | '(' methods ')') (() | '?' | '+')
   292   ;
   293   methods: (nameref args | method) + (',' | '|')
   294   ;
   295 \end{rail}
   296 
   297 
   298 %%% Local Variables: 
   299 %%% mode: latex
   300 %%% TeX-master: "isar-ref"
   301 %%% End: