src/Doc/IsarRef/Inner_Syntax.thy
changeset 52414 8429123bc58a
parent 52413 a59ba6de9687
child 52430 289e36c2870a
equal deleted inserted replaced
52413:a59ba6de9687 52414:8429123bc58a
  1577   the status of constant versus variable is not yet know during
  1577   the status of constant versus variable is not yet know during
  1578   parsing.  This is in contrast to print translations, where constants
  1578   parsing.  This is in contrast to print translations, where constants
  1579   are explicitly known from the given term in its fully internal form.
  1579   are explicitly known from the given term in its fully internal form.
  1580 *}
  1580 *}
  1581 
  1581 
       
  1582 
       
  1583 subsection {* Built-in syntax transformations *}
       
  1584 
       
  1585 text {*
       
  1586   Here are some further details of the main syntax transformation
       
  1587   phases of \figref{fig:parse-print}.
       
  1588 *}
       
  1589 
       
  1590 
       
  1591 subsubsection {* Transforming parse trees to ASTs *}
       
  1592 
       
  1593 text {* The parse tree is the raw output of the parser.  It is
       
  1594   transformed into an AST according to some basic scheme that may be
       
  1595   augmented by AST translation functions as explained in
       
  1596   \secref{sec:tr-funs}.
       
  1597 
       
  1598   The parse tree is constructed by nesting the right-hand sides of the
       
  1599   productions used to recognize the input.  Such parse trees are
       
  1600   simply lists of tokens and constituent parse trees, the latter
       
  1601   representing the nonterminals of the productions.  Ignoring AST
       
  1602   translation functions, parse trees are transformed to ASTs by
       
  1603   stripping out delimiters and copy productions, while retaining some
       
  1604   source position information from input tokens.
       
  1605 
       
  1606   The Pure syntax provides predefined AST translations to make the
       
  1607   basic @{text "\<lambda>"}-term structure more apparent within the
       
  1608   (first-order) AST representation, and thus facilitate the use of
       
  1609   @{command translations} (see also \secref{sec:syn-trans}).  This
       
  1610   covers ordinary term application, type application, nested
       
  1611   abstraction, iterated meta implications and function types.  The
       
  1612   effect is illustrated on some representative input strings is as
       
  1613   follows:
       
  1614 
       
  1615   \begin{center}
       
  1616   \begin{tabular}{ll}
       
  1617   input source & AST \\
       
  1618   \hline
       
  1619   @{text "f x y z"} & @{verbatim "(f x y z)"} \\
       
  1620   @{text "'a ty"} & @{verbatim "(ty 'a)"} \\
       
  1621   @{text "('a, 'b)ty"} & @{verbatim "(ty 'a 'b)"} \\
       
  1622   @{text "\<lambda>x y z. t"} & @{verbatim "(\"_abs\" x (\"_abs\" y (\"_abs\" z t)))"} \\
       
  1623   @{text "\<lambda>x :: 'a. t"} & @{verbatim "(\"_abs\" (\"_constrain\" x 'a) t)"} \\
       
  1624   @{text "\<lbrakk>P; Q; R\<rbrakk> \<Longrightarrow> S"} & @{verbatim "(\"==>\" P (\"==>\" Q (\"==>\" R S)))"} \\
       
  1625    @{text "['a, 'b, 'c] \<Rightarrow> 'd"} & @{verbatim "(\"fun\" 'a (\"fun\" 'b (\"fun\" 'c 'd)))"} \\
       
  1626   \end{tabular}
       
  1627   \end{center}
       
  1628 
       
  1629   Note that type and sort constraints may occur in further places ---
       
  1630   translations need to be ready to cope with them.  The built-in
       
  1631   syntax transformation from parse trees to ASTs insert additional
       
  1632   constraints that represent source positions.
       
  1633 *}
       
  1634 
       
  1635 
       
  1636 subsubsection {* Transforming ASTs to terms *}
       
  1637 
       
  1638 text {* After application of macros (\secref{sec:syn-trans}), the AST
       
  1639   is transformed into a term.  This term still lacks proper type
       
  1640   information, but it might contain some constraints consisting of
       
  1641   applications with head @{verbatim "_constrain"}, where the second
       
  1642   argument is a type encoded as a pre-term within the syntax.  Type
       
  1643   inference later introduces correct types, or indicates type errors
       
  1644   in the input.
       
  1645 
       
  1646   Ignoring parse translations, ASTs are transformed to terms by
       
  1647   mapping AST constants to term constants, AST variables to term
       
  1648   variables or constants (according to the name space), and AST
       
  1649   applications to iterated term applications.
       
  1650 
       
  1651   The outcome is still a first-order term.  Proper abstractions and
       
  1652   bound variables are introduced by parse translations associated with
       
  1653   certain syntax constants.  Thus @{verbatim "(_abs x x)"} eventually
       
  1654   becomes a de-Bruijn term @{verbatim "Abs (\"x\", _, Bound 0)"}.
       
  1655 *}
       
  1656 
       
  1657 
       
  1658 subsubsection {* Printing of terms *}
       
  1659 
       
  1660 text {* The output phase is essentially the inverse of the input
       
  1661   phase.  Terms are translated via abstract syntax trees into
       
  1662   pretty-printed text.
       
  1663 
       
  1664   Ignoring print translations, the transformation maps term constants,
       
  1665   variables and applications to the corresponding constructs on ASTs.
       
  1666   Abstractions are mapped to applications of the special constant
       
  1667   @{verbatim "_abs"} as seen before.  Type constraints are represented
       
  1668   via special @{verbatim "_constrain"} forms, according to various
       
  1669   policies of type annotation determined elsewhere.  Sort constraints
       
  1670   of type variables are handled in a similar fashion.
       
  1671 
       
  1672   After application of macros (\secref{sec:syn-trans}), the AST is
       
  1673   finally pretty-printed.  The built-in print AST translations reverse
       
  1674   the corresponding parse AST translations.
       
  1675 
       
  1676   \medskip For the actual printing process, the priority grammar
       
  1677   (\secref{sec:priority-grammar}) plays a vital role: productions are
       
  1678   used as templates for pretty printing, with argument slots stemming
       
  1679   from nonterminals, and syntactic sugar stemming from literal tokens.
       
  1680 
       
  1681   Each AST application with constant head @{text "c"} and arguments
       
  1682   @{text "t\<^sub>1"}, \dots, @{text "t\<^sub>n"} (for @{text "n = 0"} the AST is
       
  1683   just the constant @{text "c"} itself) is printed according to the
       
  1684   first grammar production of result name @{text "c"}.  The required
       
  1685   syntax priority of the argument slot is given by its nonterminal
       
  1686   @{text "A\<^sup>(\<^sup>p\<^sup>)"}.  The argument @{text "t\<^sub>i"} that corresponds to the
       
  1687   position of @{text "A\<^sup>(\<^sup>p\<^sup>)"} is printed recursively, and then put in
       
  1688   parentheses \emph{if} its priority @{text "p"} requires this.  The
       
  1689   resulting output is concatenated with the syntactic sugar according
       
  1690   to the grammar production.
       
  1691 
       
  1692   If an AST application @{text "(c x\<^sub>1 \<dots> x\<^sub>m)"} has more arguments than
       
  1693   the corresponding production, it is first split into @{text "((c x\<^sub>1
       
  1694   \<dots> x\<^sub>n) x\<^sub>n\<^sub>+\<^sub>1 \<dots> x\<^sub>m)"} and then printed recursively as above.
       
  1695 
       
  1696   Applications with too few arguments or with non-constant head or
       
  1697   without a corresponding production are printed in prefix-form like
       
  1698   @{text "f t\<^sub>1 \<dots> t\<^sub>n"} for terms.
       
  1699 
       
  1700   Multiple productions associated with some name @{text "c"} are tried
       
  1701   in order of appearance within the grammar.  An occurrence of some
       
  1702   AST variable @{text "x"} is printed as @{text "x"} outright.
       
  1703 
       
  1704   \medskip White space is \emph{not} inserted automatically.  If
       
  1705   blanks (or breaks) are required to separate tokens, they need to be
       
  1706   specified in the mixfix declaration (\secref{sec:mixfix}).
       
  1707 *}
       
  1708 
  1582 end
  1709 end