| 
323
 | 
     1  | 
%% $Id$
  | 
| 
 | 
     2  | 
\chapter{Syntax Transformations} \label{chap:syntax}
 | 
| 
 | 
     3  | 
\newcommand\ttapp{\mathrel{\hbox{\tt\$}}}
 | 
| 
 | 
     4  | 
\newcommand\mtt[1]{\mbox{\tt #1}}
 | 
| 
 | 
     5  | 
\newcommand\ttfct[1]{\mathop{\mtt{#1}}\nolimits}
 | 
| 
 | 
     6  | 
\newcommand\Constant{\ttfct{Constant}}
 | 
| 
 | 
     7  | 
\newcommand\Variable{\ttfct{Variable}}
 | 
| 
 | 
     8  | 
\newcommand\Appl[1]{\ttfct{Appl}\,[#1]}
 | 
| 
 | 
     9  | 
\index{syntax!transformations|(}
 | 
| 
 | 
    10  | 
  | 
| 
 | 
    11  | 
This chapter is intended for experienced Isabelle users who need to define
  | 
| 
 | 
    12  | 
macros or code their own translation functions.  It describes the
  | 
| 
 | 
    13  | 
transformations between parse trees, abstract syntax trees and terms.
  | 
| 
 | 
    14  | 
  | 
| 
 | 
    15  | 
  | 
| 
 | 
    16  | 
\section{Abstract syntax trees} \label{sec:asts}
 | 
| 
 | 
    17  | 
\index{ASTs|(} 
 | 
| 
 | 
    18  | 
  | 
| 
 | 
    19  | 
The parser, given a token list from the lexer, applies productions to yield
  | 
| 
 | 
    20  | 
a parse tree\index{parse trees}.  By applying some internal transformations
 | 
| 
 | 
    21  | 
the parse tree becomes an abstract syntax tree, or \AST{}.  Macro
 | 
| 
 | 
    22  | 
expansion, further translations and finally type inference yields a
  | 
| 
 | 
    23  | 
well-typed term.  The printing process is the reverse, except for some
  | 
| 
 | 
    24  | 
subtleties to be discussed later.
  | 
| 
 | 
    25  | 
  | 
| 
 | 
    26  | 
Figure~\ref{fig:parse_print} outlines the parsing and printing process.
 | 
| 
 | 
    27  | 
Much of the complexity is due to the macro mechanism.  Using macros, you
  | 
| 
 | 
    28  | 
can specify most forms of concrete syntax without writing any \ML{} code.
 | 
| 
 | 
    29  | 
  | 
| 
 | 
    30  | 
\begin{figure}
 | 
| 
 | 
    31  | 
\begin{center}
 | 
| 
 | 
    32  | 
\begin{tabular}{cl}
 | 
| 
 | 
    33  | 
string          & \\
  | 
| 
 | 
    34  | 
$\downarrow$    & parser \\
  | 
| 
 | 
    35  | 
parse tree      & \\
  | 
| 
 | 
    36  | 
$\downarrow$    & parse \AST{} translation \\
 | 
| 
 | 
    37  | 
\AST{}             & \\
 | 
| 
 | 
    38  | 
$\downarrow$    & \AST{} rewriting (macros) \\
 | 
| 
 | 
    39  | 
\AST{}             & \\
 | 
| 
 | 
    40  | 
$\downarrow$    & parse translation, type inference \\
  | 
| 
 | 
    41  | 
--- well-typed term --- & \\
  | 
| 
 | 
    42  | 
$\downarrow$    & print translation \\
  | 
| 
 | 
    43  | 
\AST{}             & \\
 | 
| 
 | 
    44  | 
$\downarrow$    & \AST{} rewriting (macros) \\
 | 
| 
 | 
    45  | 
\AST{}             & \\
 | 
| 
 | 
    46  | 
$\downarrow$    & print \AST{} translation, printer \\
 | 
| 
 | 
    47  | 
string          &
  | 
| 
 | 
    48  | 
\end{tabular}
 | 
| 
 | 
    49  | 
  | 
| 
 | 
    50  | 
\end{center}
 | 
| 
 | 
    51  | 
\caption{Parsing and printing}\label{fig:parse_print}
 | 
| 
 | 
    52  | 
\end{figure}
 | 
| 
 | 
    53  | 
  | 
| 
 | 
    54  | 
Abstract syntax trees are an intermediate form between the raw parse trees
  | 
| 
 | 
    55  | 
and the typed $\lambda$-terms.  An \AST{} is either an atom (constant or
 | 
| 
 | 
    56  | 
variable) or a list of {\em at least two\/} subtrees.  Internally, they
 | 
| 
 | 
    57  | 
have type \mltydx{Syntax.ast}: \index{*Constant} \index{*Variable}
 | 
| 
 | 
    58  | 
\index{*Appl}
 | 
| 
 | 
    59  | 
\begin{ttbox}
 | 
| 
 | 
    60  | 
datatype ast = Constant of string
  | 
| 
 | 
    61  | 
             | Variable of string
  | 
| 
 | 
    62  | 
             | Appl of ast list
  | 
| 
 | 
    63  | 
\end{ttbox}
 | 
| 
 | 
    64  | 
%
  | 
| 
 | 
    65  | 
Isabelle uses an S-expression syntax for abstract syntax trees.  Constant
  | 
| 
 | 
    66  | 
atoms are shown as quoted strings, variable atoms as non-quoted strings and
  | 
| 
332
 | 
    67  | 
applications as a parenthesised list of subtrees.  For example, the \AST
  | 
| 
323
 | 
    68  | 
\begin{ttbox}
 | 
| 
 | 
    69  | 
Appl [Constant "_constrain",
  | 
| 
 | 
    70  | 
      Appl [Constant "_abs", Variable "x", Variable "t"],
  | 
| 
 | 
    71  | 
      Appl [Constant "fun", Variable "'a", Variable "'b"]]
  | 
| 
 | 
    72  | 
\end{ttbox}
 | 
| 
 | 
    73  | 
is shown as {\tt ("_constrain" ("_abs" x t) ("fun" 'a 'b))}.
 | 
| 
 | 
    74  | 
Both {\tt ()} and {\tt (f)} are illegal because they have too few
 | 
| 
 | 
    75  | 
subtrees. 
  | 
| 
 | 
    76  | 
  | 
| 
 | 
    77  | 
The resemblance to Lisp's S-expressions is intentional, but there are two
  | 
| 
 | 
    78  | 
kinds of atomic symbols: $\Constant x$ and $\Variable x$.  Do not take the
  | 
| 
 | 
    79  | 
names {\tt Constant} and {\tt Variable} too literally; in the later
 | 
| 
 | 
    80  | 
translation to terms, $\Variable x$ may become a constant, free or bound
  | 
| 
 | 
    81  | 
variable, even a type constructor or class name; the actual outcome depends
  | 
| 
 | 
    82  | 
on the context.
  | 
| 
 | 
    83  | 
  | 
| 
 | 
    84  | 
Similarly, you can think of ${\tt (} f~x@1~\ldots~x@n{\tt )}$ as the
 | 
| 
 | 
    85  | 
application of~$f$ to the arguments $x@1, \ldots, x@n$.  But the kind of
  | 
| 
 | 
    86  | 
application is determined later by context; it could be a type constructor
  | 
| 
 | 
    87  | 
applied to types.
  | 
| 
 | 
    88  | 
  | 
| 
 | 
    89  | 
Forms like {\tt (("_abs" x $t$) $u$)} are legal, but \AST{}s are
 | 
| 
 | 
    90  | 
first-order: the {\tt "_abs"} does not bind the {\tt x} in any way.  Later
 | 
| 
 | 
    91  | 
at the term level, {\tt ("_abs" x $t$)} will become an {\tt Abs} node and
 | 
| 
 | 
    92  | 
occurrences of {\tt x} in $t$ will be replaced by bound variables (the term
 | 
| 
 | 
    93  | 
constructor \ttindex{Bound}).
 | 
| 
 | 
    94  | 
  | 
| 
 | 
    95  | 
  | 
| 
 | 
    96  | 
\section{Transforming parse trees to \AST{}s}\label{sec:astofpt}
 | 
| 
 | 
    97  | 
\index{ASTs!made from parse trees}
 | 
| 
 | 
    98  | 
\newcommand\astofpt[1]{\lbrakk#1\rbrakk}
 | 
| 
 | 
    99  | 
  | 
| 
 | 
   100  | 
The parse tree is the raw output of the parser.  Translation functions,
  | 
| 
 | 
   101  | 
called {\bf parse AST translations}\indexbold{translations!parse AST},
 | 
| 
 | 
   102  | 
transform the parse tree into an abstract syntax tree.
  | 
| 
 | 
   103  | 
  | 
| 
 | 
   104  | 
The parse tree is constructed by nesting the right-hand sides of the
  | 
| 
 | 
   105  | 
productions used to recognize the input.  Such parse trees are simply lists
  | 
| 
 | 
   106  | 
of tokens and constituent parse trees, the latter representing the
  | 
| 
 | 
   107  | 
nonterminals of the productions.  Let us refer to the actual productions in
  | 
| 
 | 
   108  | 
the form displayed by {\tt Syntax.print_syntax}.
 | 
| 
 | 
   109  | 
  | 
| 
 | 
   110  | 
Ignoring parse \AST{} translations, parse trees are transformed to \AST{}s
 | 
| 
 | 
   111  | 
by stripping out delimiters and copy productions.  More precisely, the
  | 
| 
 | 
   112  | 
mapping $\astofpt{-}$ is derived from the productions as follows:
 | 
| 
 | 
   113  | 
\begin{itemize}
 | 
| 
 | 
   114  | 
\item Name tokens: $\astofpt{t} = \Variable s$, where $t$ is an \ndx{id},
 | 
| 
 | 
   115  | 
  \ndx{var}, \ndx{tid} or \ndx{tvar} token, and $s$ its associated string.
 | 
| 
 | 
   116  | 
  | 
| 
 | 
   117  | 
\item Copy productions:\index{productions!copy}
 | 
| 
 | 
   118  | 
  $\astofpt{\ldots P \ldots} = \astofpt{P}$.  Here $\ldots$ stands for
 | 
| 
 | 
   119  | 
  strings of delimiters, which are discarded.  $P$ stands for the single
  | 
| 
 | 
   120  | 
  constituent that is not a delimiter; it is either a nonterminal symbol or
  | 
| 
 | 
   121  | 
  a name token.
  | 
| 
 | 
   122  | 
  | 
| 
 | 
   123  | 
  \item 0-ary productions: $\astofpt{\ldots \mtt{=>} c} = \Constant c$.
 | 
| 
 | 
   124  | 
    Here there are no constituents other than delimiters, which are
  | 
| 
 | 
   125  | 
    discarded. 
  | 
| 
 | 
   126  | 
  | 
| 
 | 
   127  | 
  \item $n$-ary productions, where $n \ge 1$: delimiters are discarded and
  | 
| 
 | 
   128  | 
    the remaining constituents $P@1$, \ldots, $P@n$ are built into an
  | 
| 
 | 
   129  | 
    application whose head constant is~$c$:
  | 
| 
 | 
   130  | 
    \[ \astofpt{\ldots P@1 \ldots P@n \ldots \mtt{=>} c} = 
 | 
| 
 | 
   131  | 
       \Appl{\Constant c, \astofpt{P@1}, \ldots, \astofpt{P@n}}
 | 
| 
 | 
   132  | 
    \]
  | 
| 
 | 
   133  | 
\end{itemize}
 | 
| 
 | 
   134  | 
Figure~\ref{fig:parse_ast} presents some simple examples, where {\tt ==},
 | 
| 
 | 
   135  | 
{\tt _appl}, {\tt _args}, and so forth name productions of the Pure syntax.
 | 
| 
 | 
   136  | 
These examples illustrate the need for further translations to make \AST{}s
 | 
| 
 | 
   137  | 
closer to the typed $\lambda$-calculus.  The Pure syntax provides
  | 
| 
 | 
   138  | 
predefined parse \AST{} translations\index{translations!parse AST} for
 | 
| 
 | 
   139  | 
ordinary applications, type applications, nested abstractions, meta
  | 
| 
 | 
   140  | 
implications and function types.  Figure~\ref{fig:parse_ast_tr} shows their
 | 
| 
 | 
   141  | 
effect on some representative input strings.
  | 
| 
 | 
   142  | 
  | 
| 
 | 
   143  | 
  | 
| 
 | 
   144  | 
\begin{figure}
 | 
| 
 | 
   145  | 
\begin{center}
 | 
| 
 | 
   146  | 
\tt\begin{tabular}{ll}
 | 
| 
 | 
   147  | 
\rm input string    & \rm \AST \\\hline
  | 
| 
 | 
   148  | 
"f"                 & f \\
  | 
| 
 | 
   149  | 
"'a"                & 'a \\
  | 
| 
 | 
   150  | 
"t == u"            & ("==" t u) \\
 | 
| 
 | 
   151  | 
"f(x)"              & ("_appl" f x) \\
 | 
| 
 | 
   152  | 
"f(x, y)"           & ("_appl" f ("_args" x y)) \\
 | 
| 
 | 
   153  | 
"f(x, y, z)"        & ("_appl" f ("_args" x ("_args" y z))) \\
 | 
| 
 | 
   154  | 
"\%x y.\ t"         & ("_lambda" ("_idts" x y) t) \\
 | 
| 
 | 
   155  | 
\end{tabular}
 | 
| 
 | 
   156  | 
\end{center}
 | 
| 
 | 
   157  | 
\caption{Parsing examples using the Pure syntax}\label{fig:parse_ast} 
 | 
| 
 | 
   158  | 
\end{figure}
 | 
| 
 | 
   159  | 
  | 
| 
 | 
   160  | 
\begin{figure}
 | 
| 
 | 
   161  | 
\begin{center}
 | 
| 
 | 
   162  | 
\tt\begin{tabular}{ll}
 | 
| 
 | 
   163  | 
\rm input string            & \rm \AST{} \\\hline
 | 
| 
 | 
   164  | 
"f(x, y, z)"                & (f x y z) \\
  | 
| 
 | 
   165  | 
"'a ty"                     & (ty 'a) \\
  | 
| 
 | 
   166  | 
"('a, 'b) ty"               & (ty 'a 'b) \\
 | 
| 
 | 
   167  | 
"\%x y z.\ t"               & ("_abs" x ("_abs" y ("_abs" z t))) \\
 | 
| 
 | 
   168  | 
"\%x ::\ 'a.\ t"            & ("_abs" ("_constrain" x 'a) t) \\
 | 
| 
 | 
   169  | 
"[| P; Q; R |] => S"        & ("==>" P ("==>" Q ("==>" R S))) \\
 | 
| 
 | 
   170  | 
"['a, 'b, 'c] => 'd"        & ("fun" 'a ("fun" 'b ("fun" 'c 'd)))
 | 
| 
 | 
   171  | 
\end{tabular}
 | 
| 
 | 
   172  | 
\end{center}
 | 
| 
 | 
   173  | 
\caption{Built-in parse \AST{} translations}\label{fig:parse_ast_tr}
 | 
| 
 | 
   174  | 
\end{figure}
 | 
| 
 | 
   175  | 
  | 
| 
 | 
   176  | 
The names of constant heads in the \AST{} control the translation process.
 | 
| 
 | 
   177  | 
The list of constants invoking parse \AST{} translations appears in the
 | 
| 
 | 
   178  | 
output of {\tt Syntax.print_syntax} under {\tt parse_ast_translation}.
 | 
| 
 | 
   179  | 
  | 
| 
 | 
   180  | 
  | 
| 
 | 
   181  | 
\section{Transforming \AST{}s to terms}\label{sec:termofast}
 | 
| 
 | 
   182  | 
\index{terms!made from ASTs}
 | 
| 
 | 
   183  | 
\newcommand\termofast[1]{\lbrakk#1\rbrakk}
 | 
| 
 | 
   184  | 
  | 
| 
 | 
   185  | 
The \AST{}, after application of macros (see \S\ref{sec:macros}), is
 | 
| 
 | 
   186  | 
transformed into a term.  This term is probably ill-typed since type
  | 
| 
 | 
   187  | 
inference has not occurred yet.  The term may contain type constraints
  | 
| 
 | 
   188  | 
consisting of applications with head {\tt "_constrain"}; the second
 | 
| 
 | 
   189  | 
argument is a type encoded as a term.  Type inference later introduces
  | 
| 
 | 
   190  | 
correct types or rejects the input.
  | 
| 
 | 
   191  | 
  | 
| 
 | 
   192  | 
Another set of translation functions, namely parse
  | 
| 
 | 
   193  | 
translations\index{translations!parse}, may affect this process.  If we
 | 
| 
 | 
   194  | 
ignore parse translations for the time being, then \AST{}s are transformed
 | 
| 
 | 
   195  | 
to terms by mapping \AST{} constants to constants, \AST{} variables to
 | 
| 
 | 
   196  | 
schematic or free variables and \AST{} applications to applications.
 | 
| 
 | 
   197  | 
  | 
| 
 | 
   198  | 
More precisely, the mapping $\termofast{-}$ is defined by
 | 
| 
 | 
   199  | 
\begin{itemize}
 | 
| 
 | 
   200  | 
\item Constants: $\termofast{\Constant x} = \ttfct{Const} (x,
 | 
| 
 | 
   201  | 
  \mtt{dummyT})$.
 | 
| 
 | 
   202  | 
  | 
| 
 | 
   203  | 
\item Schematic variables: $\termofast{\Variable \mtt{"?}xi\mtt"} =
 | 
| 
 | 
   204  | 
  \ttfct{Var} ((x, i), \mtt{dummyT})$, where $x$ is the base name and $i$
 | 
| 
 | 
   205  | 
  the index extracted from~$xi$.
  | 
| 
 | 
   206  | 
  | 
| 
 | 
   207  | 
\item Free variables: $\termofast{\Variable x} = \ttfct{Free} (x,
 | 
| 
 | 
   208  | 
  \mtt{dummyT})$.
 | 
| 
 | 
   209  | 
  | 
| 
 | 
   210  | 
\item Function applications with $n$ arguments:
  | 
| 
 | 
   211  | 
    \[ \termofast{\Appl{f, x@1, \ldots, x@n}} = 
 | 
| 
 | 
   212  | 
       \termofast{f} \ttapp
 | 
| 
 | 
   213  | 
         \termofast{x@1} \ttapp \ldots \ttapp \termofast{x@n}
 | 
| 
 | 
   214  | 
    \]
  | 
| 
 | 
   215  | 
\end{itemize}
 | 
| 
 | 
   216  | 
Here \ttindex{Const}, \ttindex{Var}, \ttindex{Free} and
 | 
| 
 | 
   217  | 
\verb|$|\index{$@{\tt\$}} are constructors of the datatype \mltydx{term},
 | 
| 
 | 
   218  | 
while \ttindex{dummyT} stands for some dummy type that is ignored during
 | 
| 
 | 
   219  | 
type inference.
  | 
| 
 | 
   220  | 
  | 
| 
 | 
   221  | 
So far the outcome is still a first-order term.  Abstractions and bound
  | 
| 
 | 
   222  | 
variables (constructors \ttindex{Abs} and \ttindex{Bound}) are introduced
 | 
| 
 | 
   223  | 
by parse translations.  Such translations are attached to {\tt "_abs"},
 | 
| 
 | 
   224  | 
{\tt "!!"} and user-defined binders.
 | 
| 
 | 
   225  | 
  | 
| 
 | 
   226  | 
  | 
| 
 | 
   227  | 
\section{Printing of terms}
 | 
| 
 | 
   228  | 
\newcommand\astofterm[1]{\lbrakk#1\rbrakk}\index{ASTs!made from terms}
 | 
| 
 | 
   229  | 
  | 
| 
 | 
   230  | 
The output phase is essentially the inverse of the input phase.  Terms are
  | 
| 
 | 
   231  | 
translated via abstract syntax trees into strings.  Finally the strings are
  | 
| 
 | 
   232  | 
pretty printed.
  | 
| 
 | 
   233  | 
  | 
| 
 | 
   234  | 
Print translations (\S\ref{sec:tr_funs}) may affect the transformation of
 | 
| 
 | 
   235  | 
terms into \AST{}s.  Ignoring those, the transformation maps
 | 
| 
 | 
   236  | 
term constants, variables and applications to the corresponding constructs
  | 
| 
 | 
   237  | 
on \AST{}s.  Abstractions are mapped to applications of the special
 | 
| 
 | 
   238  | 
constant {\tt _abs}.
 | 
| 
 | 
   239  | 
  | 
| 
 | 
   240  | 
More precisely, the mapping $\astofterm{-}$ is defined as follows:
 | 
| 
 | 
   241  | 
\begin{itemize}
 | 
| 
 | 
   242  | 
  \item $\astofterm{\ttfct{Const} (x, \tau)} = \Constant x$.
 | 
| 
 | 
   243  | 
  | 
| 
 | 
   244  | 
  \item $\astofterm{\ttfct{Free} (x, \tau)} = constrain (\Variable x,
 | 
| 
 | 
   245  | 
    \tau)$.
  | 
| 
 | 
   246  | 
  | 
| 
 | 
   247  | 
  \item $\astofterm{\ttfct{Var} ((x, i), \tau)} = constrain (\Variable
 | 
| 
 | 
   248  | 
    \mtt{"?}xi\mtt", \tau)$, where $\mtt?xi$ is the string representation of
 | 
| 
 | 
   249  | 
    the {\tt indexname} $(x, i)$.
 | 
| 
 | 
   250  | 
  | 
| 
 | 
   251  | 
  \item For the abstraction $\lambda x::\tau.t$, let $x'$ be a variant
  | 
| 
 | 
   252  | 
    of~$x$ renamed to differ from all names occurring in~$t$, and let $t'$
  | 
| 
 | 
   253  | 
    be obtained from~$t$ by replacing all bound occurrences of~$x$ by
  | 
| 
 | 
   254  | 
    the free variable $x'$.  This replaces corresponding occurrences of the
  | 
| 
 | 
   255  | 
    constructor \ttindex{Bound} by the term $\ttfct{Free} (x',
 | 
| 
 | 
   256  | 
    \mtt{dummyT})$:
 | 
| 
 | 
   257  | 
   \[ \astofterm{\ttfct{Abs} (x, \tau, t)} = 
 | 
| 
 | 
   258  | 
      \Appl{\Constant \mtt{"_abs"}, 
 | 
| 
 | 
   259  | 
        constrain(\Variable x', \tau), \astofterm{t'}}.
 | 
| 
 | 
   260  | 
    \]
  | 
| 
 | 
   261  | 
  | 
| 
 | 
   262  | 
  \item $\astofterm{\ttfct{Bound} i} = \Variable \mtt{"B.}i\mtt"$.  
 | 
| 
 | 
   263  | 
    The occurrence of constructor \ttindex{Bound} should never happen
 | 
| 
 | 
   264  | 
    when printing well-typed terms; it indicates a de Bruijn index with no
  | 
| 
 | 
   265  | 
    matching abstraction.
  | 
| 
 | 
   266  | 
  | 
| 
 | 
   267  | 
  \item Where $f$ is not an application,
  | 
| 
 | 
   268  | 
    \[ \astofterm{f \ttapp x@1 \ttapp \ldots \ttapp x@n} = 
 | 
| 
 | 
   269  | 
       \Appl{\astofterm{f}, \astofterm{x@1}, \ldots,\astofterm{x@n}}
 | 
| 
 | 
   270  | 
    \]
  | 
| 
 | 
   271  | 
\end{itemize}
 | 
| 
 | 
   272  | 
%
  | 
| 
332
 | 
   273  | 
Type constraints\index{type constraints} are inserted to allow the printing
 | 
| 
 | 
   274  | 
of types.  This is governed by the boolean variable \ttindex{show_types}:
 | 
| 
323
 | 
   275  | 
\begin{itemize}
 | 
| 
 | 
   276  | 
  \item $constrain(x, \tau) = x$ \ if $\tau = \mtt{dummyT}$ \index{*dummyT} or
 | 
| 
332
 | 
   277  | 
    \ttindex{show_types} is set to {\tt false}.
 | 
| 
323
 | 
   278  | 
  | 
| 
 | 
   279  | 
  \item $constrain(x, \tau) = \Appl{\Constant \mtt{"_constrain"}, x, 
 | 
| 
 | 
   280  | 
         \astofterm{\tau}}$ \ otherwise.  
 | 
| 
 | 
   281  | 
  | 
| 
 | 
   282  | 
    Here, $\astofterm{\tau}$ is the \AST{} encoding of $\tau$: type
 | 
| 
 | 
   283  | 
    constructors go to {\tt Constant}s; type identifiers go to {\tt
 | 
| 
 | 
   284  | 
      Variable}s; type applications go to {\tt Appl}s with the type
 | 
| 
 | 
   285  | 
    constructor as the first element.  If \ttindex{show_sorts} is set to
 | 
| 
 | 
   286  | 
    {\tt true}, some type variables are decorated with an \AST{} encoding
 | 
| 
 | 
   287  | 
    of their sort.
  | 
| 
 | 
   288  | 
\end{itemize}
 | 
| 
 | 
   289  | 
%
  | 
| 
 | 
   290  | 
The \AST{}, after application of macros (see \S\ref{sec:macros}), is
 | 
| 
 | 
   291  | 
transformed into the final output string.  The built-in {\bf print AST
 | 
| 
332
 | 
   292  | 
  translations}\indexbold{translations!print AST} reverse the
 | 
| 
323
 | 
   293  | 
parse \AST{} translations of Fig.\ts\ref{fig:parse_ast_tr}.
 | 
| 
 | 
   294  | 
  | 
| 
 | 
   295  | 
For the actual printing process, the names attached to productions
  | 
| 
 | 
   296  | 
of the form $\ldots A^{(p@1)}@1 \ldots A^{(p@n)}@n \ldots \mtt{=>} c$ play
 | 
| 
 | 
   297  | 
a vital role.  Each \AST{} with constant head $c$, namely $\mtt"c\mtt"$ or
 | 
| 
 | 
   298  | 
$(\mtt"c\mtt"~ x@1 \ldots x@n)$, is printed according to the production
  | 
| 
 | 
   299  | 
for~$c$.  Each argument~$x@i$ is converted to a string, and put in
  | 
| 
 | 
   300  | 
parentheses if its priority~$(p@i)$ requires this.  The resulting strings
  | 
| 
 | 
   301  | 
and their syntactic sugar (denoted by \dots{} above) are joined to make a
 | 
| 
 | 
   302  | 
single string.
  | 
| 
 | 
   303  | 
  | 
| 
 | 
   304  | 
If an application $(\mtt"c\mtt"~ x@1 \ldots x@m)$ has more arguments than the
  | 
| 
 | 
   305  | 
corresponding production, it is first split into $((\mtt"c\mtt"~ x@1 \ldots
  | 
| 
 | 
   306  | 
x@n) ~ x@{n+1} \ldots x@m)$.  Applications with too few arguments or with
 | 
| 
 | 
   307  | 
non-constant head or without a corresponding production are printed as
  | 
| 
 | 
   308  | 
$f(x@1, \ldots, x@l)$ or $(\alpha@1, \ldots, \alpha@l) ty$.  An occurrence of
  | 
| 
 | 
   309  | 
$\Variable x$ is simply printed as~$x$.
  | 
| 
 | 
   310  | 
  | 
| 
 | 
   311  | 
Blanks are {\em not\/} inserted automatically.  If blanks are required to
 | 
| 
 | 
   312  | 
separate tokens, specify them in the mixfix declaration, possibly preceded
  | 
| 
 | 
   313  | 
by a slash~({\tt/}) to allow a line break.
 | 
| 
 | 
   314  | 
\index{ASTs|)}
 | 
| 
 | 
   315  | 
  | 
| 
 | 
   316  | 
  | 
| 
 | 
   317  | 
  | 
| 
 | 
   318  | 
\section{Macros: Syntactic rewriting} \label{sec:macros}
 | 
| 
 | 
   319  | 
\index{macros|(}\index{rewriting!syntactic|(} 
 | 
| 
 | 
   320  | 
  | 
| 
 | 
   321  | 
Mixfix declarations alone can handle situations where there is a direct
  | 
| 
 | 
   322  | 
connection between the concrete syntax and the underlying term.  Sometimes
  | 
| 
 | 
   323  | 
we require a more elaborate concrete syntax, such as quantifiers and list
  | 
| 
 | 
   324  | 
notation.  Isabelle's {\bf macros} and {\bf translation functions} can
 | 
| 
 | 
   325  | 
perform translations such as
  | 
| 
 | 
   326  | 
\begin{center}\tt
 | 
| 
 | 
   327  | 
  \begin{tabular}{r@{$\quad\protect\rightleftharpoons\quad$}l}
 | 
| 
 | 
   328  | 
    ALL x:A.P   & Ball(A, \%x.P)        \\ \relax
  | 
| 
 | 
   329  | 
    [x, y, z]   & Cons(x, Cons(y, Cons(z, Nil)))
  | 
| 
 | 
   330  | 
  \end{tabular}
 | 
| 
 | 
   331  | 
\end{center}
 | 
| 
 | 
   332  | 
Translation functions (see \S\ref{sec:tr_funs}) must be coded in ML; they
 | 
| 
 | 
   333  | 
are the most powerful translation mechanism but are difficult to read or
  | 
| 
 | 
   334  | 
write.  Macros are specified by first-order rewriting systems that operate
  | 
| 
 | 
   335  | 
on abstract syntax trees.  They are usually easy to read and write, and can
  | 
| 
 | 
   336  | 
express all but the most obscure translations.
  | 
| 
 | 
   337  | 
  | 
| 
 | 
   338  | 
Figure~\ref{fig:set_trans} defines a fragment of first-order logic and set
 | 
| 
 | 
   339  | 
theory.\footnote{This and the following theories are complete working
 | 
| 
 | 
   340  | 
  examples, though they specify only syntax, no axioms.  The file {\tt
 | 
| 
332
 | 
   341  | 
    ZF/ZF.thy} presents the full set theory definition, including many
  | 
| 
323
 | 
   342  | 
  macro rules.}  Theory {\tt SET} defines constants for set comprehension
 | 
| 
 | 
   343  | 
({\tt Collect}), replacement ({\tt Replace}) and bounded universal
 | 
| 
 | 
   344  | 
quantification ({\tt Ball}).  Each of these binds some variables.  Without
 | 
| 
332
 | 
   345  | 
additional syntax we should have to write $\forall x \in A.  P$ as {\tt
 | 
| 
323
 | 
   346  | 
  Ball(A,\%x.P)}, and similarly for the others.
  | 
| 
 | 
   347  | 
  | 
| 
 | 
   348  | 
\begin{figure}
 | 
| 
 | 
   349  | 
\begin{ttbox}
 | 
| 
 | 
   350  | 
SET = Pure +
  | 
| 
 | 
   351  | 
types
  | 
| 
 | 
   352  | 
  i, o
  | 
| 
 | 
   353  | 
arities
  | 
| 
 | 
   354  | 
  i, o :: logic
  | 
| 
 | 
   355  | 
consts
  | 
| 
 | 
   356  | 
  Trueprop      :: "o => prop"              ("_" 5)
 | 
| 
 | 
   357  | 
  Collect       :: "[i, i => o] => i"
  | 
| 
 | 
   358  | 
  "{\at}Collect"    :: "[idt, i, o] => i"       ("(1{\ttlbrace}_:_./ _{\ttrbrace})")
 | 
| 
 | 
   359  | 
  Replace       :: "[i, [i, i] => o] => i"
  | 
| 
 | 
   360  | 
  "{\at}Replace"    :: "[idt, idt, i, o] => i"  ("(1{\ttlbrace}_./ _:_, _{\ttrbrace})")
 | 
| 
 | 
   361  | 
  Ball          :: "[i, i => o] => o"
  | 
| 
 | 
   362  | 
  "{\at}Ball"       :: "[idt, i, o] => o"       ("(3ALL _:_./ _)" 10)
 | 
| 
 | 
   363  | 
translations
  | 
| 
 | 
   364  | 
  "{\ttlbrace}x:A. P{\ttrbrace}"    == "Collect(A, \%x. P)"
 | 
| 
 | 
   365  | 
  "{\ttlbrace}y. x:A, Q{\ttrbrace}" == "Replace(A, \%x y. Q)"
 | 
| 
 | 
   366  | 
  "ALL x:A. P"  == "Ball(A, \%x. P)"
  | 
| 
 | 
   367  | 
end
  | 
| 
 | 
   368  | 
\end{ttbox}
 | 
| 
 | 
   369  | 
\caption{Macro example: set theory}\label{fig:set_trans}
 | 
| 
 | 
   370  | 
\end{figure}
 | 
| 
 | 
   371  | 
  | 
| 
 | 
   372  | 
The theory specifies a variable-binding syntax through additional
  | 
| 
 | 
   373  | 
productions that have mixfix declarations.  Each non-copy production must
  | 
| 
 | 
   374  | 
specify some constant, which is used for building \AST{}s.
 | 
| 
 | 
   375  | 
\index{constants!syntactic} The additional constants are decorated with
 | 
| 
 | 
   376  | 
{\tt\at} to stress their purely syntactic purpose; they should never occur
 | 
| 
 | 
   377  | 
within the final well-typed terms.  Furthermore, they cannot be written in
  | 
| 
 | 
   378  | 
formulae because they are not legal identifiers.
  | 
| 
 | 
   379  | 
  | 
| 
 | 
   380  | 
The translations cause the replacement of external forms by internal forms
  | 
| 
 | 
   381  | 
after parsing, and vice versa before printing of terms.  As a specification
  | 
| 
 | 
   382  | 
of the set theory notation, they should be largely self-explanatory.  The
  | 
| 
 | 
   383  | 
syntactic constants, {\tt\at Collect}, {\tt\at Replace} and {\tt\at Ball},
 | 
| 
 | 
   384  | 
appear implicitly in the macro rules via their mixfix forms.
  | 
| 
 | 
   385  | 
  | 
| 
 | 
   386  | 
Macros can define variable-binding syntax because they operate on \AST{}s,
 | 
| 
 | 
   387  | 
which have no inbuilt notion of bound variable.  The macro variables {\tt
 | 
| 
 | 
   388  | 
  x} and~{\tt y} have type~{\tt idt} and therefore range over identifiers,
 | 
| 
 | 
   389  | 
in this case bound variables.  The macro variables {\tt P} and~{\tt Q}
 | 
| 
 | 
   390  | 
range over formulae containing bound variable occurrences.
  | 
| 
 | 
   391  | 
  | 
| 
 | 
   392  | 
Other applications of the macro system can be less straightforward, and
  | 
| 
 | 
   393  | 
there are peculiarities.  The rest of this section will describe in detail
  | 
| 
 | 
   394  | 
how Isabelle macros are preprocessed and applied.
  | 
| 
 | 
   395  | 
  | 
| 
 | 
   396  | 
  | 
| 
 | 
   397  | 
\subsection{Specifying macros}
 | 
| 
 | 
   398  | 
Macros are basically rewrite rules on \AST{}s.  But unlike other macro
 | 
| 
 | 
   399  | 
systems found in programming languages, Isabelle's macros work in both
  | 
| 
 | 
   400  | 
directions.  Therefore a syntax contains two lists of rewrites: one for
  | 
| 
 | 
   401  | 
parsing and one for printing.
  | 
| 
 | 
   402  | 
  | 
| 
 | 
   403  | 
\index{*translations section}
 | 
| 
 | 
   404  | 
The {\tt translations} section specifies macros.  The syntax for a macro is
 | 
| 
 | 
   405  | 
\[ (root)\; string \quad
  | 
| 
 | 
   406  | 
   \left\{\begin{array}[c]{c} \mtt{=>} \\ \mtt{<=} \\ \mtt{==} \end{array}
 | 
| 
 | 
   407  | 
   \right\} \quad
  | 
| 
 | 
   408  | 
   (root)\; string 
  | 
| 
 | 
   409  | 
\]
  | 
| 
 | 
   410  | 
%
  | 
| 
 | 
   411  | 
This specifies a parse rule ({\tt =>}), a print rule ({\tt <=}), or both
 | 
| 
 | 
   412  | 
({\tt ==}).  The two strings specify the left and right-hand sides of the
 | 
| 
 | 
   413  | 
macro rule.  The $(root)$ specification is optional; it specifies the
  | 
| 
 | 
   414  | 
nonterminal for parsing the $string$ and if omitted defaults to {\tt
 | 
| 
 | 
   415  | 
  logic}.  \AST{} rewrite rules $(l, r)$ must obey certain conditions:
 | 
| 
 | 
   416  | 
\begin{itemize}
 | 
| 
 | 
   417  | 
\item Rules must be left linear: $l$ must not contain repeated variables.
  | 
| 
 | 
   418  | 
  | 
| 
 | 
   419  | 
\item Rules must have constant heads, namely $l = \mtt"c\mtt"$ or $l =
  | 
| 
 | 
   420  | 
  (\mtt"c\mtt" ~ x@1 \ldots x@n)$.
  | 
| 
 | 
   421  | 
  | 
| 
 | 
   422  | 
\item Every variable in~$r$ must also occur in~$l$.
  | 
| 
 | 
   423  | 
\end{itemize}
 | 
| 
 | 
   424  | 
  | 
| 
 | 
   425  | 
Macro rules may refer to any syntax from the parent theories.  They may
  | 
| 
499
 | 
   426  | 
also refer to anything defined before the {\tt .thy} file's {\tt
 | 
| 
323
 | 
   427  | 
  translations} section --- including any mixfix declarations.
  | 
| 
 | 
   428  | 
  | 
| 
 | 
   429  | 
Upon declaration, both sides of the macro rule undergo parsing and parse
  | 
| 
 | 
   430  | 
\AST{} translations (see \S\ref{sec:asts}), but do not themselves undergo
 | 
| 
 | 
   431  | 
macro expansion.  The lexer runs in a different mode that additionally
  | 
| 
 | 
   432  | 
accepts identifiers of the form $\_~letter~quasiletter^*$ (like {\tt _idt},
 | 
| 
 | 
   433  | 
{\tt _K}).  Thus, a constant whose name starts with an underscore can
 | 
| 
 | 
   434  | 
appear in macro rules but not in ordinary terms.
  | 
| 
 | 
   435  | 
  | 
| 
 | 
   436  | 
Some atoms of the macro rule's \AST{} are designated as constants for
 | 
| 
 | 
   437  | 
matching.  These are all names that have been declared as classes, types or
  | 
| 
 | 
   438  | 
constants.
  | 
| 
 | 
   439  | 
  | 
| 
 | 
   440  | 
The result of this preprocessing is two lists of macro rules, each stored
  | 
| 
 | 
   441  | 
as a pair of \AST{}s.  They can be viewed using {\tt Syntax.print_syntax}
 | 
| 
 | 
   442  | 
(sections \ttindex{parse_rules} and \ttindex{print_rules}).  For
 | 
| 
 | 
   443  | 
theory~{\tt SET} of Fig.~\ref{fig:set_trans} these are
 | 
| 
 | 
   444  | 
\begin{ttbox}
 | 
| 
 | 
   445  | 
parse_rules:
  | 
| 
 | 
   446  | 
  ("{\at}Collect" x A P)  ->  ("Collect" A ("_abs" x P))
 | 
| 
 | 
   447  | 
  ("{\at}Replace" y x A Q)  ->  ("Replace" A ("_abs" x ("_abs" y Q)))
 | 
| 
 | 
   448  | 
  ("{\at}Ball" x A P)  ->  ("Ball" A ("_abs" x P))
 | 
| 
 | 
   449  | 
print_rules:
  | 
| 
 | 
   450  | 
  ("Collect" A ("_abs" x P))  ->  ("{\at}Collect" x A P)
 | 
| 
 | 
   451  | 
  ("Replace" A ("_abs" x ("_abs" y Q)))  ->  ("{\at}Replace" y x A Q)
 | 
| 
 | 
   452  | 
  ("Ball" A ("_abs" x P))  ->  ("{\at}Ball" x A P)
 | 
| 
 | 
   453  | 
\end{ttbox}
 | 
| 
 | 
   454  | 
  | 
| 
 | 
   455  | 
\begin{warn}
 | 
| 
 | 
   456  | 
  Avoid choosing variable names that have previously been used as
  | 
| 
 | 
   457  | 
  constants, types or type classes; the {\tt consts} section in the output
 | 
| 
 | 
   458  | 
  of {\tt Syntax.print_syntax} lists all such names.  If a macro rule works
 | 
| 
 | 
   459  | 
  incorrectly, inspect its internal form as shown above, recalling that
  | 
| 
 | 
   460  | 
  constants appear as quoted strings and variables without quotes.
  | 
| 
 | 
   461  | 
\end{warn}
 | 
| 
 | 
   462  | 
  | 
| 
 | 
   463  | 
\begin{warn}
 | 
| 
 | 
   464  | 
If \ttindex{eta_contract} is set to {\tt true}, terms will be
 | 
| 
 | 
   465  | 
$\eta$-contracted {\em before\/} the \AST{} rewriter sees them.  Thus some
 | 
| 
 | 
   466  | 
abstraction nodes needed for print rules to match may vanish.  For example,
  | 
| 
332
 | 
   467  | 
\verb|Ball(A, %x. P(x))| contracts to {\tt Ball(A, P)}; the print rule does
 | 
| 
323
 | 
   468  | 
not apply and the output will be {\tt Ball(A, P)}.  This problem would not
 | 
| 
 | 
   469  | 
occur if \ML{} translation functions were used instead of macros (as is
 | 
| 
 | 
   470  | 
done for binder declarations).
  | 
| 
 | 
   471  | 
\end{warn}
 | 
| 
 | 
   472  | 
  | 
| 
 | 
   473  | 
  | 
| 
 | 
   474  | 
\begin{warn}
 | 
| 
 | 
   475  | 
Another trap concerns type constraints.  If \ttindex{show_types} is set to
 | 
| 
 | 
   476  | 
{\tt true}, bound variables will be decorated by their meta types at the
 | 
| 
 | 
   477  | 
binding place (but not at occurrences in the body).  Matching with
  | 
| 
 | 
   478  | 
\verb|Collect(A, %x. P)| binds {\tt x} to something like {\tt ("_constrain" y
 | 
| 
 | 
   479  | 
"i")} rather than only {\tt y}.  \AST{} rewriting will cause the constraint to
 | 
| 
 | 
   480  | 
appear in the external form, say \verb|{y::i:A::i. P::o}|.  
 | 
| 
 | 
   481  | 
  | 
| 
 | 
   482  | 
To allow such constraints to be re-read, your syntax should specify bound
  | 
| 
 | 
   483  | 
variables using the nonterminal~\ndx{idt}.  This is the case in our
 | 
| 
 | 
   484  | 
example.  Choosing {\tt id} instead of {\tt idt} is a common error,
 | 
| 
 | 
   485  | 
especially since it appears in former versions of most of Isabelle's
  | 
| 
 | 
   486  | 
object-logics.
  | 
| 
 | 
   487  | 
\end{warn}
 | 
| 
 | 
   488  | 
  | 
| 
 | 
   489  | 
  | 
| 
 | 
   490  | 
  | 
| 
 | 
   491  | 
\subsection{Applying rules}
 | 
| 
 | 
   492  | 
As a term is being parsed or printed, an \AST{} is generated as an
 | 
| 
 | 
   493  | 
intermediate form (recall Fig.\ts\ref{fig:parse_print}).  The \AST{} is
 | 
| 
332
 | 
   494  | 
normalised by applying macro rules in the manner of a traditional term
  | 
| 
323
 | 
   495  | 
rewriting system.  We first examine how a single rule is applied.
  | 
| 
 | 
   496  | 
  | 
| 
332
 | 
   497  | 
Let $t$ be the abstract syntax tree to be normalised and $(l, r)$ some
  | 
| 
323
 | 
   498  | 
translation rule.  A subtree~$u$ of $t$ is a {\bf redex} if it is an
 | 
| 
 | 
   499  | 
instance of~$l$; in this case $l$ is said to {\bf match}~$u$.  A redex
 | 
| 
 | 
   500  | 
matched by $l$ may be replaced by the corresponding instance of~$r$, thus
  | 
| 
 | 
   501  | 
{\bf rewriting} the \AST~$t$.  Matching requires some notion of {\bf
 | 
| 
 | 
   502  | 
  place-holders} that may occur in rule patterns but not in ordinary
  | 
| 
 | 
   503  | 
\AST{}s; {\tt Variable} atoms serve this purpose.
 | 
| 
 | 
   504  | 
  | 
| 
 | 
   505  | 
The matching of the object~$u$ by the pattern~$l$ is performed as follows:
  | 
| 
 | 
   506  | 
\begin{itemize}
 | 
| 
 | 
   507  | 
  \item Every constant matches itself.
  | 
| 
 | 
   508  | 
  | 
| 
 | 
   509  | 
  \item $\Variable x$ in the object matches $\Constant x$ in the pattern.
  | 
| 
 | 
   510  | 
    This point is discussed further below.
  | 
| 
 | 
   511  | 
  | 
| 
 | 
   512  | 
  \item Every \AST{} in the object matches $\Variable x$ in the pattern,
 | 
| 
 | 
   513  | 
    binding~$x$ to~$u$.
  | 
| 
 | 
   514  | 
  | 
| 
 | 
   515  | 
  \item One application matches another if they have the same number of
  | 
| 
 | 
   516  | 
    subtrees and corresponding subtrees match.
  | 
| 
 | 
   517  | 
  | 
| 
 | 
   518  | 
  \item In every other case, matching fails.  In particular, {\tt
 | 
| 
 | 
   519  | 
      Constant}~$x$ can only match itself.
  | 
| 
 | 
   520  | 
\end{itemize}
 | 
| 
 | 
   521  | 
A successful match yields a substitution that is applied to~$r$, generating
  | 
| 
 | 
   522  | 
the instance that replaces~$u$.
  | 
| 
 | 
   523  | 
  | 
| 
 | 
   524  | 
The second case above may look odd.  This is where {\tt Variable}s of
 | 
| 
 | 
   525  | 
non-rule \AST{}s behave like {\tt Constant}s.  Recall that \AST{}s are not
 | 
| 
 | 
   526  | 
far removed from parse trees; at this level it is not yet known which
  | 
| 
 | 
   527  | 
identifiers will become constants, bounds, frees, types or classes.  As
  | 
| 
 | 
   528  | 
\S\ref{sec:asts} describes, former parse tree heads appear in \AST{}s as
 | 
| 
 | 
   529  | 
{\tt Constant}s, while the name tokens \ndx{id}, \ndx{var}, \ndx{tid} and
 | 
| 
 | 
   530  | 
\ndx{tvar} become {\tt Variable}s.  On the other hand, when \AST{}s
 | 
| 
 | 
   531  | 
generated from terms for printing, all constants and type constructors
  | 
| 
 | 
   532  | 
become {\tt Constant}s; see \S\ref{sec:asts}.  Thus \AST{}s may contain a
 | 
| 
 | 
   533  | 
messy mixture of {\tt Variable}s and {\tt Constant}s.  This is
 | 
| 
 | 
   534  | 
insignificant at macro level because matching treats them alike.
  | 
| 
 | 
   535  | 
  | 
| 
 | 
   536  | 
Because of this behaviour, different kinds of atoms with the same name are
  | 
| 
 | 
   537  | 
indistinguishable, which may make some rules prone to misbehaviour.  Example:
  | 
| 
 | 
   538  | 
\begin{ttbox}
 | 
| 
 | 
   539  | 
types
  | 
| 
 | 
   540  | 
  Nil
  | 
| 
 | 
   541  | 
consts
  | 
| 
 | 
   542  | 
  Nil     :: "'a list"
  | 
| 
 | 
   543  | 
  "[]"    :: "'a list"    ("[]")
 | 
| 
 | 
   544  | 
translations
  | 
| 
 | 
   545  | 
  "[]"    == "Nil"
  | 
| 
 | 
   546  | 
\end{ttbox}
 | 
| 
 | 
   547  | 
The term {\tt Nil} will be printed as {\tt []}, just as expected.
 | 
| 
 | 
   548  | 
The term \verb|%Nil.t| will be printed as \verb|%[].t|, which might not be
  | 
| 
332
 | 
   549  | 
expected!  How is the type~{\tt Nil} printed?
 | 
| 
323
 | 
   550  | 
  | 
| 
 | 
   551  | 
Normalizing an \AST{} involves repeatedly applying macro rules until none
 | 
| 
332
 | 
   552  | 
are applicable.  Macro rules are chosen in the order that they appear in the
  | 
| 
323
 | 
   553  | 
{\tt translations} section.  You can watch the normalization of \AST{}s
 | 
| 
 | 
   554  | 
during parsing and printing by setting \ttindex{Syntax.trace_norm_ast} to
 | 
| 
 | 
   555  | 
{\tt true}.\index{tracing!of macros} Alternatively, use
 | 
| 
 | 
   556  | 
\ttindex{Syntax.test_read}.  The information displayed when tracing
 | 
| 
 | 
   557  | 
includes the \AST{} before normalization ({\tt pre}), redexes with results
 | 
| 
 | 
   558  | 
({\tt rewrote}), the normal form finally reached ({\tt post}) and some
 | 
| 
 | 
   559  | 
statistics ({\tt normalize}).  If tracing is off,
 | 
| 
 | 
   560  | 
\ttindex{Syntax.stat_norm_ast} can be set to {\tt true} in order to enable
 | 
| 
 | 
   561  | 
printing of the normal form and statistics only.
  | 
| 
 | 
   562  | 
  | 
| 
 | 
   563  | 
  | 
| 
 | 
   564  | 
\subsection{Example: the syntax of finite sets}
 | 
| 
 | 
   565  | 
\index{examples!of macros}
 | 
| 
 | 
   566  | 
  | 
| 
 | 
   567  | 
This example demonstrates the use of recursive macros to implement a
  | 
| 
 | 
   568  | 
convenient notation for finite sets.
  | 
| 
 | 
   569  | 
\index{*empty constant}\index{*insert constant}\index{{}@\verb'{}' symbol}
 | 
| 
 | 
   570  | 
\index{"@Enum@{\tt\at Enum} constant}
 | 
| 
 | 
   571  | 
\index{"@Finset@{\tt\at Finset} constant}
 | 
| 
 | 
   572  | 
\begin{ttbox}
 | 
| 
 | 
   573  | 
FINSET = SET +
  | 
| 
 | 
   574  | 
types
  | 
| 
 | 
   575  | 
  is
  | 
| 
 | 
   576  | 
consts
  | 
| 
 | 
   577  | 
  ""            :: "i => is"                ("_")
 | 
| 
 | 
   578  | 
  "{\at}Enum"       :: "[i, is] => is"          ("_,/ _")
 | 
| 
 | 
   579  | 
  empty         :: "i"                      ("{\ttlbrace}{\ttrbrace}")
 | 
| 
 | 
   580  | 
  insert        :: "[i, i] => i"
  | 
| 
 | 
   581  | 
  "{\at}Finset"     :: "is => i"                ("{\ttlbrace}(_){\ttrbrace}")
 | 
| 
 | 
   582  | 
translations
  | 
| 
 | 
   583  | 
  "{\ttlbrace}x, xs{\ttrbrace}"     == "insert(x, {\ttlbrace}xs{\ttrbrace})"
 | 
| 
 | 
   584  | 
  "{\ttlbrace}x{\ttrbrace}"         == "insert(x, {\ttlbrace}{\ttrbrace})"
 | 
| 
 | 
   585  | 
end
  | 
| 
 | 
   586  | 
\end{ttbox}
 | 
| 
 | 
   587  | 
Finite sets are internally built up by {\tt empty} and {\tt insert}.  The
 | 
| 
 | 
   588  | 
declarations above specify \verb|{x, y, z}| as the external representation
 | 
| 
 | 
   589  | 
of
  | 
| 
 | 
   590  | 
\begin{ttbox}
 | 
| 
 | 
   591  | 
insert(x, insert(y, insert(z, empty)))
  | 
| 
 | 
   592  | 
\end{ttbox}
 | 
| 
 | 
   593  | 
The nonterminal symbol~\ndx{is} stands for one or more objects of type~{\tt
 | 
| 
 | 
   594  | 
  i} separated by commas.  The mixfix declaration \hbox{\verb|"_,/ _"|}
 | 
| 
 | 
   595  | 
allows a line break after the comma for \rmindex{pretty printing}; if no
 | 
| 
 | 
   596  | 
line break is required then a space is printed instead.
  | 
| 
 | 
   597  | 
  | 
| 
 | 
   598  | 
The nonterminal is declared as the type~{\tt is}, but with no {\tt arities}
 | 
| 
 | 
   599  | 
declaration.  Hence {\tt is} is not a logical type and no default
 | 
| 
 | 
   600  | 
productions are added.  If we had needed enumerations of the nonterminal
  | 
| 
 | 
   601  | 
{\tt logic}, which would include all the logical types, we could have used
 | 
| 
 | 
   602  | 
the predefined nonterminal symbol \ndx{args} and skipped this part
 | 
| 
 | 
   603  | 
altogether.  The nonterminal~{\tt is} can later be reused for other
 | 
| 
 | 
   604  | 
enumerations of type~{\tt i} like lists or tuples.
 | 
| 
 | 
   605  | 
  | 
| 
 | 
   606  | 
\index{"@Finset@{\tt\at Finset} constant}
 | 
| 
 | 
   607  | 
Next follows {\tt empty}, which is already equipped with its syntax
 | 
| 
 | 
   608  | 
\verb|{}|, and {\tt insert} without concrete syntax.  The syntactic
 | 
| 
 | 
   609  | 
constant {\tt\at Finset} provides concrete syntax for enumerations of~{\tt
 | 
| 
 | 
   610  | 
  i} enclosed in curly braces.  Remember that a pair of parentheses, as in
  | 
| 
 | 
   611  | 
\verb|"{(_)}"|, specifies a block of indentation for pretty printing.
 | 
| 
 | 
   612  | 
  | 
| 
 | 
   613  | 
The translations may look strange at first.  Macro rules are best
  | 
| 
 | 
   614  | 
understood in their internal forms:
  | 
| 
 | 
   615  | 
\begin{ttbox}
 | 
| 
 | 
   616  | 
parse_rules:
  | 
| 
 | 
   617  | 
  ("{\at}Finset" ("{\at}Enum" x xs))  ->  ("insert" x ("{\at}Finset" xs))
 | 
| 
 | 
   618  | 
  ("{\at}Finset" x)  ->  ("insert" x "empty")
 | 
| 
 | 
   619  | 
print_rules:
  | 
| 
 | 
   620  | 
  ("insert" x ("{\at}Finset" xs))  ->  ("{\at}Finset" ("{\at}Enum" x xs))
 | 
| 
 | 
   621  | 
  ("insert" x "empty")  ->  ("{\at}Finset" x)
 | 
| 
 | 
   622  | 
\end{ttbox}
 | 
| 
 | 
   623  | 
This shows that \verb|{x, xs}| indeed matches any set enumeration of at least
 | 
| 
 | 
   624  | 
two elements, binding the first to {\tt x} and the rest to {\tt xs}.
 | 
| 
 | 
   625  | 
Likewise, \verb|{xs}| and \verb|{x}| represent any set enumeration.  
 | 
| 
 | 
   626  | 
The parse rules only work in the order given.
  | 
| 
 | 
   627  | 
  | 
| 
 | 
   628  | 
\begin{warn}
 | 
| 
332
 | 
   629  | 
  The \AST{} rewriter cannot distinguish constants from variables and looks
 | 
| 
323
 | 
   630  | 
  only for names of atoms.  Thus the names of {\tt Constant}s occurring in
 | 
| 
 | 
   631  | 
  the (internal) left-hand side of translation rules should be regarded as
  | 
| 
 | 
   632  | 
  \rmindex{reserved words}.  Choose non-identifiers like {\tt\at Finset} or
 | 
| 
 | 
   633  | 
  sufficiently long and strange names.  If a bound variable's name gets
  | 
| 
 | 
   634  | 
  rewritten, the result will be incorrect; for example, the term
  | 
| 
 | 
   635  | 
\begin{ttbox}
 | 
| 
 | 
   636  | 
\%empty insert. insert(x, empty)
  | 
| 
 | 
   637  | 
\end{ttbox}
 | 
| 
332
 | 
   638  | 
\par\noindent is printed as \verb|%empty insert. {x}|.
 | 
| 
323
 | 
   639  | 
\end{warn}
 | 
| 
 | 
   640  | 
  | 
| 
 | 
   641  | 
  | 
| 
 | 
   642  | 
\subsection{Example: a parse macro for dependent types}\label{prod_trans}
 | 
| 
 | 
   643  | 
\index{examples!of macros}
 | 
| 
 | 
   644  | 
  | 
| 
 | 
   645  | 
As stated earlier, a macro rule may not introduce new {\tt Variable}s on
 | 
| 
 | 
   646  | 
the right-hand side.  Something like \verb|"K(B)" => "%x. B"| is illegal;
  | 
| 
 | 
   647  | 
if allowed, it could cause variable capture.  In such cases you usually
  | 
| 
 | 
   648  | 
must fall back on translation functions.  But a trick can make things
  | 
| 
 | 
   649  | 
readable in some cases: {\em calling translation functions by parse
 | 
| 
 | 
   650  | 
  macros}:
  | 
| 
 | 
   651  | 
\begin{ttbox}
 | 
| 
 | 
   652  | 
PROD = FINSET +
  | 
| 
 | 
   653  | 
consts
  | 
| 
 | 
   654  | 
  Pi            :: "[i, i => i] => i"
  | 
| 
 | 
   655  | 
  "{\at}PROD"       :: "[idt, i, i] => i"     ("(3PROD _:_./ _)" 10)
 | 
| 
 | 
   656  | 
  "{\at}->"         :: "[i, i] => i"          ("(_ ->/ _)" [51, 50] 50)
 | 
| 
 | 
   657  | 
\ttbreak
  | 
| 
 | 
   658  | 
translations
  | 
| 
 | 
   659  | 
  "PROD x:A. B" => "Pi(A, \%x. B)"
  | 
| 
 | 
   660  | 
  "A -> B"      => "Pi(A, _K(B))"
  | 
| 
 | 
   661  | 
end
  | 
| 
 | 
   662  | 
ML
  | 
| 
 | 
   663  | 
  val print_translation = [("Pi", dependent_tr' ("{\at}PROD", "{\at}->"))];
 | 
| 
 | 
   664  | 
\end{ttbox}
 | 
| 
 | 
   665  | 
  | 
| 
 | 
   666  | 
Here {\tt Pi} is an internal constant for constructing general products.
 | 
| 
 | 
   667  | 
Two external forms exist: the general case {\tt PROD x:A.B} and the
 | 
| 
 | 
   668  | 
function space {\tt A -> B}, which abbreviates \verb|Pi(A, %x.B)| when {\tt B}
 | 
| 
 | 
   669  | 
does not depend on~{\tt x}.
 | 
| 
 | 
   670  | 
  | 
| 
 | 
   671  | 
The second parse macro introduces {\tt _K(B)}, which later becomes \verb|%x.B|
 | 
| 
 | 
   672  | 
due to a parse translation associated with \cdx{_K}.  The order of the
 | 
| 
 | 
   673  | 
parse rules is critical.  Unfortunately there is no such trick for
  | 
| 
 | 
   674  | 
printing, so we have to add a {\tt ML} section for the print translation
 | 
| 
 | 
   675  | 
\ttindex{dependent_tr'}.
 | 
| 
 | 
   676  | 
  | 
| 
 | 
   677  | 
Recall that identifiers with a leading {\tt _} are allowed in translation
 | 
| 
 | 
   678  | 
rules, but not in ordinary terms.  Thus we can create \AST{}s containing
 | 
| 
 | 
   679  | 
names that are not directly expressible.
  | 
| 
 | 
   680  | 
  | 
| 
 | 
   681  | 
The parse translation for {\tt _K} is already installed in Pure, and {\tt
 | 
| 
 | 
   682  | 
dependent_tr'} is exported by the syntax module for public use.  See
  | 
| 
 | 
   683  | 
\S\ref{sec:tr_funs} below for more of the arcane lore of translation functions.
 | 
| 
 | 
   684  | 
\index{macros|)}\index{rewriting!syntactic|)}
 | 
| 
 | 
   685  | 
  | 
| 
 | 
   686  | 
  | 
| 
 | 
   687  | 
\section{Translation functions} \label{sec:tr_funs}
 | 
| 
 | 
   688  | 
\index{translations|(} 
 | 
| 
 | 
   689  | 
%
  | 
| 
 | 
   690  | 
This section describes the translation function mechanism.  By writing
  | 
| 
 | 
   691  | 
\ML{} functions, you can do almost everything with terms or \AST{}s during
 | 
| 
 | 
   692  | 
parsing and printing.  The logic \LK\ is a good example of sophisticated
  | 
| 
332
 | 
   693  | 
transformations between internal and external representations of sequents;
  | 
| 
 | 
   694  | 
here, macros would be useless. 
  | 
| 
323
 | 
   695  | 
  | 
| 
 | 
   696  | 
A full understanding of translations requires some familiarity
  | 
| 
 | 
   697  | 
with Isabelle's internals, especially the datatypes {\tt term}, {\tt typ},
 | 
| 
 | 
   698  | 
{\tt Syntax.ast} and the encodings of types and terms as such at the various
 | 
| 
 | 
   699  | 
stages of the parsing or printing process.  Most users should never need to
  | 
| 
 | 
   700  | 
use translation functions.
  | 
| 
 | 
   701  | 
  | 
| 
 | 
   702  | 
\subsection{Declaring translation functions}
 | 
| 
 | 
   703  | 
There are four kinds of translation functions.  Each such function is
  | 
| 
 | 
   704  | 
associated with a name, which triggers calls to it.  Such names can be
  | 
| 
 | 
   705  | 
constants (logical or syntactic) or type constructors.
  | 
| 
 | 
   706  | 
  | 
| 
 | 
   707  | 
{\tt Syntax.print_syntax} displays the sets of names associated with the
 | 
| 
 | 
   708  | 
translation functions of a {\tt Syntax.syntax} under
 | 
| 
 | 
   709  | 
\ttindex{parse_ast_translation}, \ttindex{parse_translation},
 | 
| 
 | 
   710  | 
\ttindex{print_translation} and \ttindex{print_ast_translation}.  You can
 | 
| 
 | 
   711  | 
add new ones via the {\tt ML} section\index{*ML section} of
 | 
| 
 | 
   712  | 
a {\tt .thy} file.  There may never be more than one function of the same
 | 
| 
 | 
   713  | 
kind per name.  Conceptually, the {\tt ML} section should appear between
 | 
| 
 | 
   714  | 
{\tt consts} and {\tt translations}; newly installed translation functions
 | 
| 
 | 
   715  | 
are already effective when macros and logical rules are parsed.
  | 
| 
 | 
   716  | 
  | 
| 
 | 
   717  | 
The {\tt ML} section is copied verbatim into the \ML\ file generated from a
 | 
| 
 | 
   718  | 
{\tt .thy} file.  Definitions made here are accessible as components of an
 | 
| 
 | 
   719  | 
\ML\ structure; to make some definitions private, use an \ML{} {\tt local}
 | 
| 
 | 
   720  | 
declaration.  The {\tt ML} section may install translation functions by
 | 
| 
 | 
   721  | 
declaring any of the following identifiers:
  | 
| 
 | 
   722  | 
\begin{ttbox}
 | 
| 
 | 
   723  | 
val parse_ast_translation : (string * (ast list -> ast)) list
  | 
| 
 | 
   724  | 
val print_ast_translation : (string * (ast list -> ast)) list
  | 
| 
 | 
   725  | 
val parse_translation     : (string * (term list -> term)) list
  | 
| 
 | 
   726  | 
val print_translation     : (string * (term list -> term)) list
  | 
| 
 | 
   727  | 
\end{ttbox}
 | 
| 
 | 
   728  | 
  | 
| 
 | 
   729  | 
\subsection{The translation strategy}
 | 
| 
 | 
   730  | 
All four kinds of translation functions are treated similarly.  They are
  | 
| 
 | 
   731  | 
called during the transformations between parse trees, \AST{}s and terms
 | 
| 
 | 
   732  | 
(recall Fig.\ts\ref{fig:parse_print}).  Whenever a combination of the form
 | 
| 
 | 
   733  | 
$(\mtt"c\mtt"~x@1 \ldots x@n)$ is encountered, and a translation function
  | 
| 
 | 
   734  | 
$f$ of appropriate kind exists for $c$, the result is computed by the \ML{}
 | 
| 
 | 
   735  | 
function call $f \mtt[ x@1, \ldots, x@n \mtt]$.
  | 
| 
 | 
   736  | 
  | 
| 
 | 
   737  | 
For \AST{} translations, the arguments $x@1, \ldots, x@n$ are \AST{}s.  A
 | 
| 
 | 
   738  | 
combination has the form $\Constant c$ or $\Appl{\Constant c, x@1, \ldots,
 | 
| 
 | 
   739  | 
  x@n}$.  For term translations, the arguments are terms and a combination
  | 
| 
 | 
   740  | 
has the form $\ttfct{Const} (c, \tau)$ or $\ttfct{Const} (c, \tau) \ttapp
 | 
| 
 | 
   741  | 
x@1 \ttapp \ldots \ttapp x@n$.  Terms allow more sophisticated
  | 
| 
 | 
   742  | 
transformations than \AST{}s do, typically involving abstractions and bound
 | 
| 
 | 
   743  | 
variables.
  | 
| 
 | 
   744  | 
  | 
| 
 | 
   745  | 
Regardless of whether they act on terms or \AST{}s,
 | 
| 
 | 
   746  | 
parse translations differ from print translations fundamentally:
  | 
| 
 | 
   747  | 
\begin{description}
 | 
| 
 | 
   748  | 
\item[Parse translations] are applied bottom-up.  The arguments are already
  | 
| 
 | 
   749  | 
  in translated form.  The translations must not fail; exceptions trigger
  | 
| 
 | 
   750  | 
  an error message.
  | 
| 
 | 
   751  | 
  | 
| 
 | 
   752  | 
\item[Print translations] are applied top-down.  They are supplied with
  | 
| 
 | 
   753  | 
  arguments that are partly still in internal form.  The result again
  | 
| 
 | 
   754  | 
  undergoes translation; therefore a print translation should not introduce
  | 
| 
 | 
   755  | 
  as head the very constant that invoked it.  The function may raise
  | 
| 
 | 
   756  | 
  exception \xdx{Match} to indicate failure; in this event it has no
 | 
| 
 | 
   757  | 
  effect.
  | 
| 
 | 
   758  | 
\end{description}
 | 
| 
 | 
   759  | 
  | 
| 
 | 
   760  | 
Only constant atoms --- constructor \ttindex{Constant} for \AST{}s and
 | 
| 
 | 
   761  | 
\ttindex{Const} for terms --- can invoke translation functions.  This
 | 
| 
 | 
   762  | 
causes another difference between parsing and printing.
  | 
| 
 | 
   763  | 
  | 
| 
 | 
   764  | 
Parsing starts with a string and the constants are not yet identified.
  | 
| 
 | 
   765  | 
Only parse tree heads create {\tt Constant}s in the resulting \AST, as
 | 
| 
 | 
   766  | 
described in \S\ref{sec:astofpt}.  Macros and parse \AST{} translations may
 | 
| 
 | 
   767  | 
introduce further {\tt Constant}s.  When the final \AST{} is converted to a
 | 
| 
 | 
   768  | 
term, all {\tt Constant}s become {\tt Const}s, as described in
 | 
| 
 | 
   769  | 
\S\ref{sec:termofast}.
 | 
| 
 | 
   770  | 
  | 
| 
 | 
   771  | 
Printing starts with a well-typed term and all the constants are known.  So
  | 
| 
 | 
   772  | 
all logical constants and type constructors may invoke print translations.
  | 
| 
 | 
   773  | 
These, and macros, may introduce further constants.
  | 
| 
 | 
   774  | 
  | 
| 
 | 
   775  | 
  | 
| 
 | 
   776  | 
\subsection{Example: a print translation for dependent types}
 | 
| 
 | 
   777  | 
\index{examples!of translations}\indexbold{*dependent_tr'}
 | 
| 
 | 
   778  | 
  | 
| 
 | 
   779  | 
Let us continue the dependent type example (page~\pageref{prod_trans}) by
 | 
| 
 | 
   780  | 
examining the parse translation for~\cdx{_K} and the print translation
 | 
| 
 | 
   781  | 
{\tt dependent_tr'}, which are both built-in.  By convention, parse
 | 
| 
 | 
   782  | 
translations have names ending with {\tt _tr} and print translations have
 | 
| 
 | 
   783  | 
names ending with {\tt _tr'}.  Search for such names in the Isabelle
 | 
| 
 | 
   784  | 
sources to locate more examples.
  | 
| 
 | 
   785  | 
  | 
| 
 | 
   786  | 
Here is the parse translation for {\tt _K}:
 | 
| 
 | 
   787  | 
\begin{ttbox}
 | 
| 
 | 
   788  | 
fun k_tr [t] = Abs ("x", dummyT, incr_boundvars 1 t)
 | 
| 
 | 
   789  | 
  | k_tr ts = raise TERM("k_tr",ts);
 | 
| 
 | 
   790  | 
\end{ttbox}
 | 
| 
 | 
   791  | 
If {\tt k_tr} is called with exactly one argument~$t$, it creates a new
 | 
| 
 | 
   792  | 
{\tt Abs} node with a body derived from $t$.  Since terms given to parse
 | 
| 
 | 
   793  | 
translations are not yet typed, the type of the bound variable in the new
  | 
| 
 | 
   794  | 
{\tt Abs} is simply {\tt dummyT}.  The function increments all {\tt Bound}
 | 
| 
 | 
   795  | 
nodes referring to outer abstractions by calling \ttindex{incr_boundvars},
 | 
| 
 | 
   796  | 
a basic term manipulation function defined in {\tt Pure/term.ML}.
 | 
| 
 | 
   797  | 
  | 
| 
 | 
   798  | 
Here is the print translation for dependent types:
  | 
| 
 | 
   799  | 
\begin{ttbox}
 | 
| 
 | 
   800  | 
fun dependent_tr' (q,r) (A :: Abs (x, T, B) :: ts) =
  | 
| 
 | 
   801  | 
      if 0 mem (loose_bnos B) then
  | 
| 
 | 
   802  | 
        let val (x', B') = variant_abs (x, dummyT, B);
  | 
| 
 | 
   803  | 
        in list_comb (Const (q, dummyT) $ Free (x', T) $ A $ B', ts)
  | 
| 
 | 
   804  | 
        end
  | 
| 
 | 
   805  | 
      else list_comb (Const (r, dummyT) $ A $ B, ts)
  | 
| 
 | 
   806  | 
  | dependent_tr' _ _ = raise Match;
  | 
| 
 | 
   807  | 
\end{ttbox}
 | 
| 
332
 | 
   808  | 
The argument {\tt (q,r)} is supplied to the curried function {\tt
 | 
| 
 | 
   809  | 
  dependent_tr'} by a partial application during its installation.  We
  | 
| 
 | 
   810  | 
can set up print translations for both {\tt Pi} and {\tt Sigma} by
 | 
| 
 | 
   811  | 
including
  | 
| 
323
 | 
   812  | 
\begin{ttbox}\index{*ML section}
 | 
| 
 | 
   813  | 
val print_translation =
  | 
| 
 | 
   814  | 
  [("Pi",    dependent_tr' ("{\at}PROD", "{\at}->")),
 | 
| 
 | 
   815  | 
   ("Sigma", dependent_tr' ("{\at}SUM", "{\at}*"))];
 | 
| 
 | 
   816  | 
\end{ttbox}
 | 
| 
 | 
   817  | 
within the {\tt ML} section.  The first of these transforms ${\tt Pi}(A,
 | 
| 
 | 
   818  | 
\mtt{Abs}(x, T, B))$ into $\hbox{\tt{\at}PROD}(x', A, B')$ or
 | 
| 
332
 | 
   819  | 
$\hbox{\tt{\at}->}(A, B)$, choosing the latter form if $B$ does not depend
 | 
| 
323
 | 
   820  | 
on~$x$.  It checks this using \ttindex{loose_bnos}, yet another function
 | 
| 
 | 
   821  | 
from {\tt Pure/term.ML}.  Note that $x'$ is a version of $x$ renamed away
 | 
| 
332
 | 
   822  | 
from all names in $B$, and $B'$ is the body $B$ with {\tt Bound} nodes
 | 
| 
 | 
   823  | 
referring to the {\tt Abs} node replaced by $\ttfct{Free} (x',
 | 
| 
323
 | 
   824  | 
\mtt{dummyT})$.
 | 
| 
 | 
   825  | 
  | 
| 
 | 
   826  | 
We must be careful with types here.  While types of {\tt Const}s are
 | 
| 
 | 
   827  | 
ignored, type constraints may be printed for some {\tt Free}s and
 | 
| 
 | 
   828  | 
{\tt Var}s if \ttindex{show_types} is set to {\tt true}.  Variables of type
 | 
| 
 | 
   829  | 
\ttindex{dummyT} are never printed with constraint, though.  The line
 | 
| 
 | 
   830  | 
\begin{ttbox}
 | 
| 
 | 
   831  | 
        let val (x', B') = variant_abs (x, dummyT, B);
  | 
| 
 | 
   832  | 
\end{ttbox}\index{*variant_abs}
 | 
| 
 | 
   833  | 
replaces bound variable occurrences in~$B$ by the free variable $x'$ with
  | 
| 
 | 
   834  | 
type {\tt dummyT}.  Only the binding occurrence of~$x'$ is given the
 | 
| 
 | 
   835  | 
correct type~{\tt T}, so this is the only place where a type
 | 
| 
 | 
   836  | 
constraint might appear. 
  | 
| 
 | 
   837  | 
\index{translations|)}
 | 
| 
 | 
   838  | 
\index{syntax!transformations|)}
 |