doc-src/Ref/defining.tex
author blanchet
Tue, 03 Aug 2010 14:06:29 +0200
changeset 38177 84c3d801bdda
parent 30184 37969710e61f
child 42840 e87888b4152f
permissions -rw-r--r--
make example easier to parse


\chapter{Defining Logics} \label{Defining-Logics}

\section{Mixfix declarations} \label{sec:mixfix}
\index{mixfix declarations|(}

When defining a theory, you declare new constants by giving their names,
their type, and an optional {\bf mixfix annotation}.  Mixfix annotations
allow you to extend Isabelle's basic $\lambda$-calculus syntax with
readable notation.  They can express any context-free priority grammar.
Isabelle syntax definitions are inspired by \OBJ~\cite{OBJ}; they are more
general than the priority declarations of \ML\ and Prolog.

A mixfix annotation defines a production of the priority grammar.  It
describes the concrete syntax, the translation to abstract syntax, and the
pretty printing.  Special case annotations provide a simple means of
specifying infix operators and binders.

\subsection{The general mixfix form}
Here is a detailed account of mixfix declarations.  Suppose the following
line occurs within a {\tt consts} or {\tt syntax} section of a {\tt .thy}
file:
\begin{center}
  {\tt $c$ ::\ "$\sigma$" ("$template$" $ps$ $p$)}
\end{center}
This constant declaration and mixfix annotation are interpreted as follows:
\begin{itemize}\index{productions}
\item The string {\tt $c$} is the name of the constant associated with the
  production; unless it is a valid identifier, it must be enclosed in
  quotes.  If $c$ is empty (given as~{\tt ""}) then this is a copy
  production.\index{productions!copy} Otherwise, parsing an instance of the
  phrase $template$ generates the \AST{} {\tt ("$c$" $a@1$ $\ldots$
    $a@n$)}, where $a@i$ is the \AST{} generated by parsing the $i$-th
  argument.

  \item The constant $c$, if non-empty, is declared to have type $\sigma$
    ({\tt consts} section only).

  \item The string $template$ specifies the right-hand side of
    the production.  It has the form
    \[ w@0 \;_\; w@1 \;_\; \ldots \;_\; w@n, \]
    where each occurrence of {\tt_} denotes an argument position and
    the~$w@i$ do not contain~{\tt _}.  (If you want a literal~{\tt _} in
    the concrete syntax, you must escape it as described below.)  The $w@i$
    may consist of \rmindex{delimiters}, spaces or
    \rmindex{pretty printing} annotations (see below).

  \item The type $\sigma$ specifies the production's nonterminal symbols
    (or name tokens).  If $template$ is of the form above then $\sigma$
    must be a function type with at least~$n$ argument positions, say
    $\sigma = [\tau@1, \dots, \tau@n] \To \tau$.  Nonterminal symbols are
    derived from the types $\tau@1$, \ldots,~$\tau@n$, $\tau$ as described
    below.  Any of these may be function types.

  \item The optional list~$ps$ may contain at most $n$ integers, say {\tt
      [$p@1$, $\ldots$, $p@m$]}, where $p@i$ is the minimal
    priority\indexbold{priorities} required of any phrase that may appear
    as the $i$-th argument.  Missing priorities default to~0.
    
  \item The integer $p$ is the priority of this production.  If
    omitted, it defaults to the maximal priority.  Priorities range
    between 0 and \ttindexbold{max_pri} (= 1000).

\end{itemize}
%
The resulting production is \[ A^{(p)}= w@0\, A@1^{(p@1)}\, w@1\,
A@2^{(p@2)}\, \dots\, A@n^{(p@n)}\, w@n \] where $A$ and the $A@i$ are the
nonterminals corresponding to the types $\tau$ and $\tau@i$ respectively.
The nonterminal symbol associated with a type $(\ldots)ty$ is {\tt logic}, if
this is a logical type (namely one of class {\tt logic} excluding {\tt
prop}).  Otherwise it is $ty$ (note that only the outermost type constructor
is taken into account).  Finally, the nonterminal of a type variable is {\tt
any}.

\begin{warn}
  Theories must sometimes declare types for purely syntactic purposes ---
  merely playing the role of nonterminals.  One example is \tydx{type}, the
  built-in type of types.  This is a `type of all types' in the syntactic
  sense only.  Do not declare such types under {\tt arities} as belonging to
  class {\tt logic}\index{*logic class}, for that would make them useless as
  separate nonterminal symbols.
\end{warn}

Associating nonterminals with types allows a constant's type to specify
syntax as well.  We can declare the function~$f$ to have type $[\tau@1,
\ldots, \tau@n]\To \tau$ and, through a mixfix annotation, specify the layout
of the function's $n$ arguments.  The constant's name, in this case~$f$, will
also serve as the label in the abstract syntax tree.

You may also declare mixfix syntax without adding constants to the theory's
signature, by using a {\tt syntax} section instead of {\tt consts}.  Thus a
production need not map directly to a logical function (this typically
requires additional syntactic translations, see also
Chapter~\ref{chap:syntax}).


\medskip
As a special case of the general mixfix declaration, the form
\begin{center}
  {\tt $c$ ::\ "$\sigma$" ("$template$")}
\end{center}
specifies no priorities.  The resulting production puts no priority
constraints on any of its arguments and has maximal priority itself.
Omitting priorities in this manner is prone to syntactic ambiguities unless
the production's right-hand side is fully bracketed, as in
\verb|"if _ then _ else _ fi"|.

Omitting the mixfix annotation completely, as in {\tt $c$ ::\ "$\sigma$"},
is sensible only if~$c$ is an identifier.  Otherwise you will be unable to
write terms involving~$c$.


\subsection{Example: arithmetic expressions}
\index{examples!of mixfix declarations}
This theory specification contains a {\tt syntax} section with mixfix
declarations encoding the priority grammar from
\S\ref{sec:priority_grammars}:
\begin{ttbox}
ExpSyntax = Pure +
types
  exp
syntax
  "0" :: exp                 ("0"      9)
  "+" :: [exp, exp] => exp   ("_ + _"  [0, 1] 0)
  "*" :: [exp, exp] => exp   ("_ * _"  [3, 2] 2)
  "-" :: exp => exp          ("- _"    [3] 3)
end
\end{ttbox}
Executing {\tt Syntax.print_gram} reveals the productions derived from the
above mixfix declarations (lots of additional information deleted):
\begin{ttbox}
Syntax.print_gram (syn_of ExpSyntax.thy);
{\out exp = "0"  => "0" (9)}
{\out exp = exp[0] "+" exp[1]  => "+" (0)}
{\out exp = exp[3] "*" exp[2]  => "*" (2)}
{\out exp = "-" exp[3]  => "-" (3)}
\end{ttbox}

Note that because {\tt exp} is not of class {\tt logic}, it has been
retained as a separate nonterminal.  This also entails that the syntax
does not provide for identifiers or paranthesized expressions.
Normally you would also want to add the declaration {\tt arities
  exp::logic} after {\tt types} and use {\tt consts} instead of {\tt
  syntax}.  Try this as an exercise and study the changes in the
grammar.


\subsection{Infixes}
\indexbold{infixes}

Infix operators associating to the left or right can be declared using
{\tt infixl} or {\tt infixr}.  Basically, the form {\tt $c$ ::\ 
  $\sigma$ (infixl $p$)} abbreviates the mixfix declarations
\begin{ttbox}
"op \(c\)" :: \(\sigma\)   ("(_ \(c\)/ _)" [\(p\), \(p+1\)] \(p\))
"op \(c\)" :: \(\sigma\)   ("op \(c\)")
\end{ttbox}
and {\tt $c$ ::\ $\sigma$ (infixr $p$)} abbreviates the mixfix declarations
\begin{ttbox}
"op \(c\)" :: \(\sigma\)   ("(_ \(c\)/ _)" [\(p+1\), \(p\)] \(p\))
"op \(c\)" :: \(\sigma\)   ("op \(c\)")
\end{ttbox}
The infix operator is declared as a constant with the prefix {\tt op}.
Thus, prefixing infixes with \sdx{op} makes them behave like ordinary
function symbols, as in \ML.  Special characters occurring in~$c$ must be
escaped, as in delimiters, using a single quote.

A slightly more general form of infix declarations allows constant
names to be independent from their concrete syntax, namely \texttt{$c$
  ::\ $\sigma$\ (infixl "$sy$" $p$)}, the same for \texttt{infixr}.  As
an example consider:
\begin{ttbox}
and :: [bool, bool] => bool  (infixr "&" 35)
\end{ttbox}
The internal constant name will then be just \texttt{and}, without any
\texttt{op} prefixed.


\subsection{Binders}
\indexbold{binders}
\begingroup
\def\Q{{\cal Q}}
A {\bf binder} is a variable-binding construct such as a quantifier.  The
constant declaration
\begin{ttbox}
\(c\) :: \(\sigma\)   (binder "\(\Q\)" [\(pb\)] \(p\))
\end{ttbox}
introduces a constant~$c$ of type~$\sigma$, which must have the form
$(\tau@1 \To \tau@2) \To \tau@3$.  Its concrete syntax is $\Q~x.P$, where
$x$ is a bound variable of type~$\tau@1$, the body~$P$ has type $\tau@2$
and the whole term has type~$\tau@3$.  The optional integer $pb$
specifies the body's priority, by default~$p$.  Special characters
in $\Q$ must be escaped using a single quote.

The declaration is expanded internally to something like
\begin{ttbox}
\(c\)\hskip3pt    :: (\(\tau@1\) => \(\tau@2\)) => \(\tau@3\)
"\(\Q\)"  :: [idts, \(\tau@2\)] => \(\tau@3\)   ("(3\(\Q\)_./ _)" [0, \(pb\)] \(p\))
\end{ttbox}
Here \ndx{idts} is the nonterminal symbol for a list of identifiers with
\index{type constraints}
optional type constraints (see Fig.\ts\ref{fig:pure_gram}).  The
declaration also installs a parse translation\index{translations!parse}
for~$\Q$ and a print translation\index{translations!print} for~$c$ to
translate between the internal and external forms.

A binder of type $(\sigma \To \tau) \To \tau$ can be nested by giving a
list of variables.  The external form $\Q~x@1~x@2 \ldots x@n. P$
corresponds to the internal form
\[ c(\lambda x@1. c(\lambda x@2. \ldots c(\lambda x@n. P) \ldots)). \]

\medskip
For example, let us declare the quantifier~$\forall$:\index{quantifiers}
\begin{ttbox}
All :: ('a => o) => o   (binder "ALL " 10)
\end{ttbox}
This lets us write $\forall x.P$ as either {\tt All(\%$x$.$P$)} or {\tt ALL
  $x$.$P$}.  When printing, Isabelle prefers the latter form, but must fall
back on ${\tt All}(P)$ if $P$ is not an abstraction.  Both $P$ and {\tt ALL
  $x$.$P$} have type~$o$, the type of formulae, while the bound variable
can be polymorphic.
\endgroup

\index{mixfix declarations|)}


\section{*Alternative print modes} \label{sec:prmodes}
\index{print modes|(}
%
Isabelle's pretty printer supports alternative output syntaxes.  These
may be used independently or in cooperation.  The currently active
print modes (with precedence from left to right) are determined by a
reference variable.
\begin{ttbox}\index{*print_mode}
print_mode: string list ref
\end{ttbox}
Initially this may already contain some print mode identifiers,
depending on how Isabelle has been invoked (e.g.\ by some user
interface).  So changes should be incremental --- adding or deleting
modes relative to the current value.

Any \ML{} string is a legal print mode identifier, without any predeclaration
required.  The following names should be considered reserved, though:
\texttt{""} (the empty string), \texttt{symbols}, \texttt{xsymbols}, and
\texttt{latex}.

There is a separate table of mixfix productions for pretty printing
associated with each print mode.  The currently active ones are
conceptually just concatenated from left to right, with the standard
syntax output table always coming last as default.  Thus mixfix
productions of preceding modes in the list may override those of later
ones.  Also note that token translations are always relative to some
print mode (see \S\ref{sec:tok_tr}).

\medskip The canonical application of print modes is optional printing
of mathematical symbols from a special screen font instead of {\sc
  ascii}.  Another example is to re-use Isabelle's advanced
$\lambda$-term printing mechanisms to generate completely different
output, say for interfacing external tools like \rmindex{model
  checkers} (see also \texttt{HOL/Modelcheck}).

\index{print modes|)}


\section{Ambiguity of parsed expressions} \label{sec:ambiguity}
\index{ambiguity!of parsed expressions}

To keep the grammar small and allow common productions to be shared
all logical types (except {\tt prop}) are internally represented
by one nonterminal, namely {\tt logic}.  This and omitted or too freely
chosen priorities may lead to ways of parsing an expression that were
not intended by the theory's maker.  In most cases Isabelle is able to
select one of multiple parse trees that an expression has lead
to by checking which of them can be typed correctly.  But this may not
work in every case and always slows down parsing.
The warning and error messages that can be produced during this process are
as follows:

If an ambiguity can be resolved by type inference the following
warning is shown to remind the user that parsing is (unnecessarily)
slowed down.  In cases where it's not easily possible to eliminate the
ambiguity the frequency of the warning can be controlled by changing
the value of {\tt Syntax.ambiguity_level} which has type {\tt int
ref}.  Its default value is 1 and by increasing it one can control how
many parse trees are necessary to generate the warning.

\begin{ttbox}
{\out Ambiguous input "\dots"}
{\out produces the following parse trees:}
{\out \dots}
{\out Fortunately, only one parse tree is type correct.}
{\out You may still want to disambiguate your grammar or your input.}
\end{ttbox}

The following message is normally caused by using the same
syntax in two different productions:

\begin{ttbox}
{\out Ambiguous input "..."}
{\out produces the following parse trees:}
{\out \dots}
{\out More than one term is type correct:}
{\out \dots}
\end{ttbox}

Ambiguities occuring in syntax translation rules cannot be resolved by
type inference because it is not necessary for these rules to be type
correct.  Therefore Isabelle always generates an error message and the
ambiguity should be eliminated by changing the grammar or the rule.


%%% Local Variables: 
%%% mode: latex
%%% TeX-master: "ref"
%%% End: