%% doc-src/IsarRef/syntax.tex
%% author wenzelm
%% Sat Oct 30 20:13:16 1999 +0200 (1999-10-30)
%% changeset 7981 5120a2a15d06
%% parent 7895 7c492d8bc8e3
%% child 8102 424f6e663977
%% permissions -rw-r--r--
%% tuned;

\chapter{Isar Syntax Primitives}

We give a complete reference of all basic syntactic entities underlying the
Isabelle/Isar document syntax.  Actual theory and proof commands will be
introduced later on.

\medskip

In order to get started with writing well-formed Isabelle/Isar documents, the
most important aspect to be noted is the difference between \emph{inner} and
\emph{outer} syntax.  Inner syntax is that of Isabelle types and terms of the
logic, while outer syntax is that of Isabelle/Isar theories (including
proofs).  As a general rule, inner syntax entities may occur only as
\emph{atomic entities} within outer syntax.  For example, the string
\texttt{"x + y"} and identifier \texttt{z} are legal term specifications
within a theory, while \texttt{x + y} is not.

\begin{warn}
  Note that classic Isabelle theories used to fake parts of the inner type
  syntax, with rather complicated rules for when quotes may be omitted.
  Despite the minor drawback of requiring quotes more often, the syntax of
  Isabelle/Isar is simpler and more robust in that respect.
\end{warn}

\medskip

Another notable point is proper input termination.  Proof~General demands any
command to be terminated by ``\texttt{;}''
(semicolon)\index{semicolon}\index{*;}.  As far as plain Isabelle/Isar is
concerned, commands may be directly run together, though.  In the presentation
of Isabelle/Isar documents, semicolons are omitted in order to improve
readability.


\section{Lexical matters}\label{sec:lex-syntax}

The Isabelle/Isar outer syntax provides token classes as presented below.
Note that some of these coincide (by full intention) with the inner lexical
syntax as presented in \cite{isabelle-ref}.  These different levels of syntax
should not be confused, though.

%FIXME keyword, command
\begin{matharray}{rcl}
  ident & = & letter~quasiletter^* \\
  longident & = & ident\verb,.,ident~\dots~ident \\
  symident & = & sym^+ \\
  nat & = & digit^+ \\
  var & = & \verb,?,ident ~|~ \verb,?,ident\verb,.,nat \\
  typefree & = & \verb,',ident \\
  typevar & = & \verb,?,typefree ~|~ \verb,?,typefree\verb,.,nat \\
  string & = & \verb,", ~\dots~ \verb,", \\
  verbatim & = & \verb,{*, ~\dots~ \verb,*}, \\
\end{matharray}
\begin{matharray}{rcl}
  letter & = & \verb,a, ~|~ \dots ~|~ \verb,z, ~|~ \verb,A, ~|~ \dots ~|~ \verb,Z, \\
  digit & = & \verb,0, ~|~ \dots ~|~ \verb,9, \\
  quasiletter & = & letter ~|~ digit ~|~ \verb,_, ~|~ \verb,', \\
  sym & = & \verb,!, ~|~ \verb,#, ~|~ \verb,$, ~|~ \verb,%, ~|~ \verb,&, ~|~  %$
   \verb,*, ~|~ \verb,+, ~|~ \verb,-, ~|~ \verb,/, ~|~ \verb,:, ~|~
   \verb,<, ~|~ \verb,=, ~|~ \verb,>, ~|~ \verb,?, ~|~ \mathtt{\at} ~|~ \\
  & & \verb,^, ~|~ \verb,_, ~|~ \verb,`, ~|~ \verb,|, ~|~ \verb,~, \\
\end{matharray}

The syntax of \texttt{string} admits any characters, including newlines;
``\verb|"|'' (double-quote) and ``\verb|\|'' (backslash) have to be escaped by
a backslash.  Note that ML-style control characters are \emph{not} supported.
The body of \texttt{verbatim} may consist of any text not containing
``\verb|*}|''.

Comments take the form \texttt{(*~\dots~*)} and may be
nested\footnote{Proof~General may get confused by nested comments.}, just as
in ML. Note that these are \emph{source} comments only, which are stripped
after lexical analysis of the input.  The Isar document syntax also provides
\emph{formal comments} that are actually part of the text (see
\S\ref{sec:comments}).
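
For illustration, the following sketch lists some inputs conforming to each of
the token classes above, together with a nested source comment.  The
particular names are arbitrary examples and do not refer to any actual theory
content.

\begin{verbatim}
  x  foo  a_b'  x1                 ident
  Foo.bar  A.b.c                   longident
  ++  ==>  <=                      symident
  0  42                            nat
  ?x  ?x.1                         var
  'a  'b1                          typefree
  ?'a  ?'a.2                       typevar
  "x + y"  "a \"b\" \\ c"          string (escaped quote and backslash)
  {* any text, even "quotes" *}    verbatim
  (* outer (* nested *) comment *)
\end{verbatim}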


\section{Common syntax entities}

Subsequently, we introduce several basic syntactic entities, such as names,
terms, and theorem specifications, which have been factored out of the actual
Isar language elements to be described later.

Note that some of the basic syntactic entities introduced below (e.g.\ 
\railqtoken{name}) act much like tokens rather than plain nonterminals (e.g.\ 
\railnonterm{sort}), especially for the sake of error messages.  E.g.\ syntax
elements such as $\CONSTS$ referring to \railqtoken{name} or \railqtoken{type}
would really report a missing name or type rather than any of the constituent
primitive tokens such as \railtoken{ident} or \railtoken{string}.


\subsection{Names}

Entity \railqtoken{name} usually refers to any name of types, constants,
theorems etc.\ that are to be \emph{declared} or \emph{defined} (so qualified
identifiers are excluded).  Quoted strings provide an escape for
non-identifier names or those ruled out by outer syntax keywords (e.g.\ 
\verb|"let"|).  Already existing objects are usually referenced by
\railqtoken{nameref}.

\indexoutertoken{name}\indexoutertoken{parname}\indexoutertoken{nameref}
\begin{rail}
  name: ident | symident | string
  ;
  parname: '(' name ')'
  ;
  nameref: name | longident
  ;
\end{rail}
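
As a small sketch of these rules, the inputs below are classified according to
the diagram above.  The names \texttt{foo} and \texttt{Foo.bar} are
hypothetical and serve as placeholders only.

\begin{verbatim}
  foo          name (ident), hence also nameref
  ++           name (symident), hence also nameref
  "let"        name (string), quoted to escape the keyword
  (foo)        parname
  Foo.bar      nameref (longident), but not a name
\end{verbatim}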


\subsection{Comments}\label{sec:comments}

Large chunks of plain \railqtoken{text} are usually given
\railtoken{verbatim}, i.e.\ enclosed in \verb|{*|~\dots~\verb|*}|.  For
convenience, any of the smaller text units conforming to \railqtoken{nameref}
are admitted as well.  Almost any of the Isar commands may be annotated by a
marginal \railnonterm{comment} of the form \texttt{--} \railqtoken{text}.
Note that the latter kind of comment is actually part of the language, while
source level comments \verb|(*|~\dots~\verb|*)| are stripped at the lexical
level.  A few commands such as $\PROOFNAME$ admit additional markup with a
``level of interest'': \texttt{\%} followed by an optional number $n$ (default
$n = 1$) indicates that the respective part of the document becomes $n$ levels
more obscure; \texttt{\%\%} means that interest drops by $\infty$ --- abandon
every hope, who enter here.

\indexoutertoken{text}\indexouternonterm{comment}\indexouternonterm{interest}
\begin{rail}
  text: verbatim | nameref
  ;
  comment: '--' text
  ;
  interest: percent nat? | ppercent
  ;
\end{rail}
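
The following sketch shows inputs that fit the three entities above.  The
comment texts are made up for illustration; which commands actually accept
\railnonterm{comment} or \railnonterm{interest} is determined by the
respective command syntax, not by this example.

\begin{verbatim}
  {* Some longer explanation. *}     text (verbatim)
  foo                                text (nameref)
  "a short remark"                   text (nameref, via string)
  -- "a short remark"                comment
  -- {* a longer remark *}           comment
  %                                  interest, n = 1 by default
  % 2                                interest, two levels more obscure
  %%                                 interest drops by infinity
\end{verbatim}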


\subsection{Type classes, sorts and arities}

The syntax of sorts and arities is given directly at the outer level.  Note
that this is in contrast to types and terms (see \S\ref{sec:types-terms}).

\indexouternonterm{sort}\indexouternonterm{arity}\indexouternonterm{simplearity}
\indexouternonterm{classdecl}
\begin{rail}
  classdecl: name ('<' (nameref + ','))?
  ;
  sort: nameref | lbrace (nameref * ',') rbrace
  ;
  arity: ('(' (sort + ',') ')')? sort
  ;
  simplearity: ('(' (sort + ',') ')')? nameref
  ;
\end{rail}
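
To make the diagram more concrete, here is a sketch of inputs for each
nonterminal.  The class names \texttt{term}, \texttt{order}, \texttt{plus},
\texttt{times} and \texttt{ring} are assumed for the sake of the example only.

\begin{verbatim}
  order                            classdecl without superclasses
  order < term                     classdecl with one superclass
  ring < plus, times               classdecl with two superclasses
  term                             sort: single class
  {}  {order, plus}                sort: list of classes, possibly empty
  term                             arity and simplearity, no arguments
  (term, term) order               arity and simplearity
  ({order, plus}) {order, plus}    arity, but not simplearity
\end{verbatim}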


\subsection{Types and terms}\label{sec:types-terms}

The actual inner Isabelle syntax, that of types and terms of the logic, is far
too sophisticated to be modelled explicitly at the outer theory
level.  Basically, any such entity has to be quoted here to turn it into a
single token (the parsing and type-checking is performed internally later).
For convenience, a slightly more liberal convention is adopted: quotes may be
omitted for any type or term that is already \emph{atomic} at the outer level.
For example, one may write just \texttt{x} instead of \texttt{"x"}.  Note that
symbolic identifiers (e.g.\ \texttt{++}) are available as well, provided these
are not superseded by commands or keywords (e.g.\ \texttt{+}).

\indexoutertoken{type}\indexoutertoken{term}\indexoutertoken{prop}
\begin{rail}
  type: nameref | typefree | typevar
  ;
  term: nameref | var | nat
  ;
  prop: term
  ;
\end{rail}
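
The quoting convention can be sketched as follows.  The inner syntax shown
inside quotes (e.g.\ \texttt{=>} for function types) depends on the object
logic and is assumed here for illustration, as is the name
\texttt{List.list}.

\begin{verbatim}
  nat  List.list         type: nameref
  'a  ?'a                type: typefree, typevar
  "'a => 'b"             type: compound, quoted into a single string
  x  ?x  0               term: nameref, var, nat
  ++                     term: symident that is not a keyword
  "x + y"                term: compound, quoted into a single string
  x + y                  NOT a term: three separate outer tokens
\end{verbatim}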

Type declarations and definitions usually refer to \railnonterm{typespec} on
the left-hand side.  This models basic type constructor application at the
outer syntax level.  Note that only plain postfix notation is available here,
but no infixes.

\indexouternonterm{typespec}
\begin{rail}
  typespec: (() | typefree | '(' ( typefree + ',' ) ')') name
  ;
\end{rail}
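
For instance, the three alternatives of \railnonterm{typespec} may be
instantiated as sketched below, where the type constructor names
\texttt{foo}, \texttt{seq} and \texttt{pair} are hypothetical.

\begin{verbatim}
  foo                    no type arguments
  'a seq                 one type argument
  ('a, 'b) pair          several type arguments, in postfix notation
\end{verbatim}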


\subsection{Term patterns}\label{sec:term-pats}

Assumptions and goal statements usually admit casual binding of schematic term
variables by giving (optional) patterns of the form $\ISS{p@1\;\dots}{p@n}$.
There are separate versions available for \railqtoken{term}s and
\railqtoken{prop}s.  The latter provides a $\CONCLNAME$ part with patterns
referring to the (atomic) conclusion of a rule.

\indexouternonterm{termpat}\indexouternonterm{proppat}
\begin{rail}
  termpat: '(' ('is' term +) ')'
  ;
  proppat: '(' (('is' prop +) | 'concl' ('is' prop +) | ('is' prop +) 'concl' ('is' prop +)) ')'
  ;
\end{rail}
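
A few pattern specifications conforming to the diagram are sketched below.
The quoted inner terms and propositions are merely illustrative; their
concrete syntax belongs to the object logic rather than to the outer level.

\begin{verbatim}
  (is "?a + ?b")                             termpat
  (is ?t is "?a + ?b")                       termpat with two patterns
  (is "?A ==> ?B")                           proppat
  (concl is "?x = ?y")                       proppat, conclusion only
  (is "?A ==> ?B" concl is "?x = ?y")        proppat, both parts
\end{verbatim}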


\subsection{Mixfix annotations}

Mixfix annotations specify concrete \emph{inner} syntax of Isabelle types and
terms.  Some commands such as $\TYPES$ (see \S\ref{sec:types-pure}) admit
infixes only, while $\CONSTS$ (see \S\ref{sec:consts}) and
$\isarkeyword{syntax}$ (see \S\ref{sec:syn-trans}) support the full range of
general mixfixes and binders.

\indexouternonterm{infix}\indexouternonterm{mixfix}
\begin{rail}
  infix: '(' ('infixl' | 'infixr') string? nat ')'
  ;
  mixfix: infix | '(' string prios? nat? ')' | '(' 'binder' string prios? nat ')'
  ;

  prios: '[' (nat + ',') ']'
  ;
\end{rail}
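
The annotations below sketch each alternative of the diagram.  The syntax
templates and priorities are chosen for illustration only and are not taken
from any particular theory.

\begin{verbatim}
  (infixl 65)                      infix, left-associative, priority 65
  (infixr "#" 65)                  infix with explicit syntax string
  ("_ + _" [65, 66] 65)            mixfix with argument priorities
  ("if _ then _ else _")           mixfix without priorities
  (binder "ALL " 10)               binder
\end{verbatim}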


\subsection{Attributes and theorems}\label{sec:syn-att}

Attributes (and proof methods, see \S\ref{sec:syn-meth}) have their own
``semi-inner'' syntax, in the sense that input conforming to
\railnonterm{args} below is parsed by the attribute a second time.  The
attribute argument specifications may be any sequence of atomic entities
(identifiers, strings etc.), or properly bracketed argument lists.  Below
\railqtoken{atom} refers to any atomic entity, including any
\railtoken{keyword} conforming to \railtoken{symident}.

\indexoutertoken{atom}\indexouternonterm{args}\indexouternonterm{attributes}
\begin{rail}
  atom: nameref | typefree | typevar | var | nat | keyword
  ;
  arg: atom | '(' args ')' | '[' args ']' | lbrace args rbrace
  ;
  args: arg *
  ;
  attributes: '[' (nameref args * ',') ']'
  ;
\end{rail}
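
As a sketch, the following attribute lists are well-formed according to the
diagram.  The attribute names \texttt{foo} and \texttt{bar} are hypothetical;
whether the arguments make sense is decided only when the attribute parses
them a second time.

\begin{verbatim}
  []                               empty attribute list
  [foo]                            single attribute, no arguments
  [foo, bar]                       two attributes
  [foo x "y + z" 3]                arguments as a sequence of atoms
  [foo (x y) [a b], bar {c}]       properly bracketed argument lists
\end{verbatim}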

Theorem specifications come in several flavors: \railnonterm{axmdecl} and
\railnonterm{thmdecl} usually refer to axioms, assumptions or results of goal
statements, while \railnonterm{thmdef} collects lists of existing theorems.
Existing theorems are given by \railnonterm{thmref} and \railnonterm{thmrefs};
the former requires an actual singleton result.  Any of these theorem
specifications may include lists of attributes both on the left and right hand
sides; attributes are applied to any immediately preceding theorem.  If names
are omitted, the theorems are not stored within the theorem database of the
theory or proof context; any given attributes are still applied, though.

\indexouternonterm{thmdecl}\indexouternonterm{axmdecl}
\indexouternonterm{thmdef}\indexouternonterm{thmrefs}
\begin{rail}
  axmdecl: name attributes? ':'
  ;
  thmdecl: thmname ':'
  ;
  thmdef: thmname '='
  ;
  thmref: nameref attributes?
  ;
  thmrefs: thmref +
  ;

  thmname: name attributes | name | attributes
  ;
\end{rail}
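
The different flavors may be sketched as follows, with hypothetical theorem
names \texttt{foo}, \texttt{baz}, \texttt{qux} and a hypothetical attribute
\texttt{bar}.

\begin{verbatim}
  foo:                   axmdecl, thmdecl
  foo [bar]:             axmdecl, thmdecl
  [bar]:                 thmdecl only: unnamed, attribute still applied
  foo =                  thmdef
  foo [bar] =            thmdef
  foo                    thmref
  foo [bar x]            thmref with attributes
  foo baz [bar] qux      thmrefs: bar applies to baz only
\end{verbatim}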


\subsection{Proof methods}\label{sec:syn-meth}

Proof methods are either basic ones, or expressions composed of methods via
``\texttt{,}'' (sequential composition), ``\texttt{|}'' (alternative choices),
``\texttt{?}'' (try), ``\texttt{+}'' (repeat at least once).  In practice,
proof methods are usually just a comma-separated list of
\railqtoken{nameref}~\railnonterm{args} specifications.  Note that parentheses
may be dropped for single method specifications (with no arguments).

\indexouternonterm{method}
\begin{rail}
  method: (nameref | '(' methods ')') (() | '?' | '+')
  ;
  methods: (nameref args | method) + (',' | '|')
  ;
\end{rail}
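
The method expressions below sketch how the combinators fit together.  The
method names \texttt{foo}, \texttt{bar} and \texttt{baz} are placeholders and
do not denote actual proof methods.

\begin{verbatim}
  foo                    single method, parentheses dropped
  (foo x y)              method with arguments
  (foo, bar)             sequential composition
  (foo | bar)            alternative choices
  (foo x)?               try
  (foo, bar x)+          repeat at least once
  (foo?, (bar | baz)+)   nested combination
\end{verbatim}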


%%% Local Variables: 
%%% mode: latex
%%% TeX-master: "isar-ref"
%%% End: