doc-src/IsarRef/syntax.tex
author wenzelm
Tue, 21 Mar 2000 17:32:44 +0100
changeset 8548 7c5fe9d17712
parent 8532 46bb6a4b3ac9
child 8593 68619606c5d1
permissions -rw-r--r--
tuned;


\chapter{Isar Syntax Primitives}

We give a complete reference of all basic syntactic entities underlying the
Isabelle/Isar document syntax.  Actual theory and proof commands will be
introduced later on.

\medskip

In order to get started with writing well-formed Isabelle/Isar documents, the
most important aspect to be noted is the difference of \emph{inner} versus
\emph{outer} syntax.  Inner syntax is that of Isabelle types and terms of the
logic, while outer syntax is that of Isabelle/Isar theories (including
proofs).  As a general rule, inner syntax entities may occur only as
\emph{atomic entities} within outer syntax.  For example, the string
\texttt{"x + y"} and identifier \texttt{z} are legal term specifications
within a theory, while \texttt{x + y} is not.

\begin{warn}
  Note that classic Isabelle theories used to fake parts of the inner syntax
  of types, with rather complicated rules when quotes may be omitted.  Despite
  the minor drawback of requiring quotes more often, the syntax of
  Isabelle/Isar is much simpler and more robust in that respect.
\end{warn}

\medskip

Another notable point is proper input termination.  Proof~General demands any
command to be terminated by ``\texttt{;}''
(semicolon)\index{semicolon}\index{*;}.  As far as plain Isabelle/Isar is
concerned, commands may be directly run together, though.  In the presentation
of Isabelle/Isar documents, semicolons are omitted in order to gain
readability.


\section{Lexical matters}\label{sec:lex-syntax}

The Isabelle/Isar outer syntax provides token classes as presented below.
Note that some of these coincide (by full intention) with the inner lexical
syntax as presented in \cite{isabelle-ref}.  These different levels of syntax
should not be confused, though.

%FIXME keyword, command
\begin{matharray}{rcl}
  ident & = & letter~quasiletter^* \\
  longident & = & ident\verb,.,ident~\dots~ident \\
  symident & = & sym^+ ~|~ symbol \\
  nat & = & digit^+ \\
  var & = & \verb,?,ident ~|~ \verb,?,ident\verb,.,nat \\
  typefree & = & \verb,',ident \\
  typevar & = & \verb,?,typefree ~|~ \verb,?,typefree\verb,.,nat \\
  string & = & \verb,", ~\dots~ \verb,", \\
  verbatim & = & \verb,{*, ~\dots~ \verb,*}, \\
\end{matharray}
\begin{matharray}{rcl}
  letter & = & \verb,a, ~|~ \dots ~|~ \verb,z, ~|~ \verb,A, ~|~ \dots ~|~ \verb,Z, \\
  digit & = & \verb,0, ~|~ \dots ~|~ \verb,9, \\
  quasiletter & = & letter ~|~ digit ~|~ \verb,_, ~|~ \verb,', \\
  sym & = & \verb,!, ~|~ \verb,#, ~|~ \verb,$, ~|~ \verb,%, ~|~ \verb,&, ~|~  %$
   \verb,*, ~|~ \verb,+, ~|~ \verb,-, ~|~ \verb,/, ~|~ \verb,:, ~|~
   \verb,<, ~|~ \verb,=, ~|~ \verb,>, ~|~ \verb,?, ~|~ \mathtt{\at} ~|~ \\
  & & \verb,^, ~|~ \verb,_, ~|~ \verb,`, ~|~ \verb,|, ~|~ \verb,~, \\
  symbol & = & {\forall} ~|~ {\exists} ~|~ \dots
\end{matharray}

The syntax of \texttt{string} admits any characters, including newlines;
``\verb|"|'' (double-quote) and ``\verb|\|'' (backslash) have to be escaped by
a backslash.  Note that ML-style control characters are \emph{not} supported.
The body of \texttt{verbatim} may consist of any text not containing
``\verb|*}|''.

Comments take the form \texttt{(*~\dots~*)} and may be
nested\footnote{Proof~General may occasionally get confused by nested
  comments.}, just as in ML. Note that these are \emph{source} comments only,
which are stripped after lexical analysis of the input.  The Isar document
syntax also provides \emph{formal comments} that are actually part of the text
(see \S\ref{sec:comments}).


\section{Common syntax entities}

Subsequently, we introduce several basic syntactic entities, such as names,
terms, and theorem specifications, which have been factored out of the actual
Isar language elements to be described later.

Note that some of the basic syntactic entities introduced below (e.g.\ 
\railqtoken{name}) act much like tokens rather than plain nonterminals (e.g.\ 
\railnonterm{sort}), especially for the sake of error messages.  E.g.\ syntax
elements such as $\CONSTS$ referring to \railqtoken{name} or \railqtoken{type}
would really report a missing name or type rather than any of the constituent
primitive tokens such as \railtoken{ident} or \railtoken{string}.


\subsection{Names}

Entity \railqtoken{name} usually refers to any name of types, constants,
theorems etc.\ that are to be \emph{declared} or \emph{defined} (so qualified
identifiers are excluded here).  Quoted strings provide an escape for
non-identifier names or those ruled out by outer syntax keywords (e.g.\ 
\verb|"let"|).  Already existing objects are usually referenced by
\railqtoken{nameref}.

\indexoutertoken{name}\indexoutertoken{parname}\indexoutertoken{nameref}
\begin{rail}
  name: ident | symident | string | nat
  ;
  parname: '(' name ')'
  ;
  nameref: name | longident
  ;
\end{rail}


\subsection{Comments}\label{sec:comments}

Large chunks of plain \railqtoken{text} are usually given
\railtoken{verbatim}, i.e.\ enclosed in \verb|{*|~\dots~\verb|*}|.  For
convenience, any of the smaller text units conforming to \railqtoken{nameref}
are admitted as well.  Almost any of the Isar commands may be annotated by
marginal \railnonterm{comment} of the form \texttt{--} \railqtoken{text}.
Note that the latter kind of comment is actually part of the language, while
source level comments \verb|(*|~\dots~\verb|*)| are stripped at the lexical
level.  A few commands such as $\PROOFNAME$ admit additional markup with a
``level of interest'': \texttt{\%} followed by an optional number $n$ (default
$n = 1$) indicates that the respective part of the document becomes $n$ levels
more obscure; \texttt{\%\%} means that interest drops by $\infty$ --- abandon
every hope, who enter here.

\indexoutertoken{text}\indexouternonterm{comment}\indexouternonterm{interest}
\begin{rail}
  text: verbatim | nameref
  ;
  comment: ('--' text +)
  ;
  interest: percent nat? | ppercent
  ;
\end{rail}


\subsection{Type classes, sorts and arities}

The syntax of sorts and arities is given directly at the outer level.  Note
that this is in contrast to types and terms (see \ref{sec:types-terms}).

\indexouternonterm{sort}\indexouternonterm{arity}\indexouternonterm{simplearity}
\indexouternonterm{classdecl}
\begin{rail}
  classdecl: name ('<' (nameref + ','))?
  ;
  sort: nameref | lbrace (nameref * ',') rbrace
  ;
  arity: ('(' (sort + ',') ')')? sort
  ;
  simplearity: ('(' (sort + ',') ')')? nameref
  ;
\end{rail}


\subsection{Types and terms}\label{sec:types-terms}

The actual inner Isabelle syntax, that of types and terms of the logic, is far
too sophisticated in order to be modelled explicitly at the outer theory
level.  Basically, any such entity has to be quoted to turn it into a single
token (the parsing and type-checking is performed internally later).  For
convenience, a slightly more liberal convention is adopted: quotes may be
omitted for any type or term that is already \emph{atomic} at the outer level.
For example, one may write just \texttt{x} instead of \texttt{"x"}.  Note that
symbolic identifiers (e.g.\ \texttt{++} or $\forall$) are available as well,
provided these are not superseded by commands or keywords (e.g.\ \texttt{+}).

\indexoutertoken{type}\indexoutertoken{term}\indexoutertoken{prop}
\begin{rail}
  type: nameref | typefree | typevar
  ;
  term: nameref | var | nat
  ;
  prop: term
  ;
\end{rail}

Type declarations and definitions usually refer to \railnonterm{typespec} on
the left-hand side.  This models basic type constructor application at the
outer syntax level.  Note that only plain postfix notation is available here,
but no infixes.

\indexouternonterm{typespec}
\begin{rail}
  typespec: (() | typefree | '(' ( typefree + ',' ) ')') name
  ;
\end{rail}


\subsection{Term patterns}\label{sec:term-pats}

Assumptions and goal statements usually admit casual binding of schematic term
variables by giving (optional) patterns of the form $\ISS{p@1\;\dots}{p@n}$.
There are separate versions available for \railqtoken{term}s and
\railqtoken{prop}s.  The latter provides a $\CONCLNAME$ part with patterns
referring the (atomic) conclusion of a rule.

\indexouternonterm{termpat}\indexouternonterm{proppat}
\begin{rail}
  termpat: '(' ('is' term +) ')'
  ;
  proppat: '(' (('is' prop +) | 'concl' ('is' prop +) | ('is' prop +) 'concl' ('is' prop +)) ')'
  ;
\end{rail}


\subsection{Mixfix annotations}

Mixfix annotations specify concrete \emph{inner} syntax of Isabelle types and
terms (see also \cite{isabelle-ref}).  Some commands such as $\TYPES$ (see
\S\ref{sec:types-pure}) admit infixes only, while $\CONSTS$ (see
\S\ref{sec:consts}) and $\isarkeyword{syntax}$ (see \S\ref{sec:syn-trans})
support the full range of general mixfixes and binders.

\indexouternonterm{infix}\indexouternonterm{mixfix}
\begin{rail}
  infix: '(' ('infixl' | 'infixr') string? nat ')'
  ;
  mixfix: infix | '(' string prios? nat? ')' | '(' 'binder' string prios? nat ')'
  ;

  prios: '[' (nat + ',') ']'
  ;
\end{rail}


\subsection{Attributes and theorems}\label{sec:syn-att}

Attributes (and proof methods, see \S\ref{sec:syn-meth}) have their own
``semi-inner'' syntax, in the sense that input conforming to
\railnonterm{args} below is parsed by the attribute a second time.  The
attribute argument specifications may be any sequence of atomic entities
(identifiers, strings etc.), or properly bracketed argument lists.  Below
\railqtoken{atom} refers to any atomic entity, including any
\railtoken{keyword} conforming to \railtoken{symident}.

\indexoutertoken{atom}\indexouternonterm{args}\indexouternonterm{attributes}
\begin{rail}
  atom: nameref | typefree | typevar | var | nat | keyword
  ;
  arg: atom | '(' args ')' | '[' args ']' | lbrace args rbrace
  ;
  args: arg *
  ;
  attributes: '[' (nameref args * ',') ']'
  ;
\end{rail}

Theorem specifications come in several flavors: \railnonterm{axmdecl} and
\railnonterm{thmdecl} usually refer to axioms, assumptions or results of goal
statements, while \railnonterm{thmdef} collects lists of existing theorems.
Existing theorems are given by \railnonterm{thmref} and \railnonterm{thmrefs},
the former requires an actual singleton result.  Any of these theorem
specifications may include lists of attributes both on the left and right hand
sides; attributes are applied to any immediately preceding theorem.  If names
are omitted, the theorems are not stored within the theorem database of the
theory or proof context; any given attributes are still applied, though.

\indexouternonterm{thmdecl}\indexouternonterm{axmdecl}
\indexouternonterm{thmdef}\indexouternonterm{thmrefs}
\begin{rail}
  axmdecl: name attributes? ':'
  ;
  thmdecl: thmname ':'
  ;
  thmdef: thmname '='
  ;
  thmref: nameref attributes?
  ;
  thmrefs: thmref +
  ;

  thmname: name attributes | name | attributes
  ;
\end{rail}


\subsection{Proof methods}\label{sec:syn-meth}

Proof methods are either basic ones, or expressions composed of methods via
``\texttt{,}'' (sequential composition), ``\texttt{|}'' (alternative choices),
``\texttt{?}'' (try), ``\texttt{+}'' (repeat at least once).  In practice,
proof methods are usually just a comma separated list of
\railqtoken{nameref}~\railnonterm{args} specifications.  Note that parentheses
may be dropped for single method specifications (with no arguments).

\indexouternonterm{method}
\begin{rail}
  method: (nameref | '(' methods ')') (() | '?' | '+')
  ;
  methods: (nameref args | method) + (',' | '|')
  ;
\end{rail}

Proper use of Isar proof methods does \emph{not} involve goal addressing.
Nevertheless, specifying goal ranges may occasionally come in handy in
emulating tactic scripts.  Note that $[n-]$ refers to all goals, starting from
$n$.  All goals may be specified by $[!]$, which is the same as $[1-]$.

\indexouternonterm{goalspec}
\begin{rail}
  goalspec: '[' (nat '-' nat | nat '-' | nat | '!' ) ']'
  ;
\end{rail}


%%% Local Variables: 
%%% mode: latex
%%% TeX-master: "isar-ref"
%%% End: