| 33817 |      1 | % vim:nojs: tw=76 sw=4 sts=4 fo=awn fdm=marker
 | 
|  |      2 | %
 | 
|  |      3 | % 20090406 T. Bourke
 | 
|  |      4 | %	Original document.
 | 
|  |      5 | %
 | 
|  |      6 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 | 
|  |      7 | \documentclass[a4paper,draft]{article} % XXX
 | 
|  |      8 | %\documentclass[a4paper,final]{article}
 | 
|  |      9 | % Preamble
 | 
|  |     10 | \usepackage[T1]{fontenc}
 | 
|  |     11 | \usepackage{textcomp}
 | 
|  |     12 | \usepackage{ifdraft}
 | 
|  |     13 | 
 | 
|  |     14 | \bibliographystyle{abbrv} % alpha
 | 
|  |     15 | 
 | 
|  |     16 | \newcommand{\refsec}[1]{Section~\ref{sec:#1}}
 | 
|  |     17 | \newcommand{\reffig}[1]{Figure~\ref{fig:#1}}
 | 
|  |     18 | \newcommand{\reftab}[1]{Table~\ref{tab:#1}}
 | 
|  |     19 | 
 | 
|  |     20 | %
 | 
|  |     21 | \usepackage{acronym}
 | 
|  |     22 | \usepackage{pst-all}
 | 
|  |     23 | \usepackage{url}
 | 
|  |     24 | 
 | 
|  |     25 | \title{Isabelle find\_theorems on the web}
 | 
|  |     26 | \author{T. Bourke\thanks{NICTA}}
 | 
|  |     27 | 
 | 
|  |     28 | \special{!/pdfmark where
 | 
|  |     29 |  {pop} {userdict /pdfmark /cleartomark load put} ifelse
 | 
|  |     30 |  [ /Author   (T. Bourke, NICTA)
 | 
|  |     31 |    /Title    (IsabelleWWW: find_theorems
 | 
|  |     32 | 	      ($Date: 2008-10-03 16:09:02 +1000 (Fri, 03 Oct 2008) $))
 | 
|  |     33 |    /Subject  (Web interface to find_theorems)
 | 
|  |     34 |    /Keywords (isabelle, sml, http, www)
 | 
|  |     35 |    /DOCINFO pdfmark}
 | 
|  |     36 | 
 | 
|  |     37 | \begin{document}
 | 
|  |     38 | % Title page and abstract
 | 
|  |     39 | \maketitle
 | 
|  |     40 | 
 | 
|  |     41 | \begin{abstract}
 | 
|  |     42 | The Isabelle find\_theorems command processes queries against a theory 
 | 
|  |     43 | database and returns a list of matching theorems.
 | 
|  |     44 | It is usually given from the Proof General or command-line interface.
 | 
|  |     45 | This document describes a web server implementation.
 | 
|  |     46 | Two design alternatives are presented and an overview of an implementation 
 | 
|  |     47 | of one is described.
 | 
|  |     48 | \end{abstract}
 | 
|  |     49 | 
 | 
|  |     50 | \section{Introduction}\label{sec:intro}
 | 
|  |     51 | 
 | 
|  |     52 | This document describes the design and implementation of a web interface for 
 | 
|  |     53 | the Isabelle find\_theorems command.
 | 
|  |     54 | The design requirements and their consequences are detailed in \refsec{req}.
 | 
|  |     55 | Two architectures were considered: \begin{enumerate}
 | 
|  |     56 | 
 | 
|  |     57 | \item \emph{one process}: integrate a web server into Isabelle.
 | 
|  |     58 | 
 | 
|  |     59 | \item \emph{two processes}: run Isabelle as a web server module.
 | 
|  |     60 | 
 | 
|  |     61 | \end{enumerate}
 | 
|  |     62 | A brief evaluation of the one process architecture is presented in 
 | 
|  |     63 | \refsec{oneproc}.
 | 
|  |     64 | An implementation of the two process is presented in \refsec{twoproc}.
 | 
|  |     65 | 
 | 
|  |     66 | \section{Design requirements}\label{sec:req}
 | 
|  |     67 | 
 | 
|  |     68 | The main requirements are:\begin{enumerate}
 | 
|  |     69 | \item The system will allow users to search for theorems from a web browser.
 | 
|  |     70 | \item It will allow queries across disparate Isabelle theories.
 | 
|  |     71 | \item It will, at a minimum, handle theories from the L4.verified project.
 | 
|  |     72 | \item It will run on a secure network.
 | 
|  |     73 | \item There will be at most ten simultaneous users.
 | 
|  |     74 | \end{enumerate}
 | 
|  |     75 | 
 | 
|  |     76 | \noindent
 | 
|  |     77 | Several \emph{a priori} choices are fixed:\begin{enumerate}
 | 
|  |     78 | \item The search will run against an Isabelle heap.
 | 
|  |     79 | \item A single heap will be built from the theories to be searched.
 | 
|  |     80 | \item The system must be persistent, for two reasons: \begin{enumerate}
 | 
|  |     81 |     \item Isabelle is slow to start against large heaps.
 | 
|  |     82 |     \item Later enhancements may require tracking states at the server.
 | 
|  |     83 | \end{enumerate}
 | 
|  |     84 | \end{enumerate}
 | 
|  |     85 | 
 | 
|  |     86 | \section{Evaluation: Isabelle web server}\label{sec:oneproc}
 | 
|  |     87 | 
 | 
|  |     88 | The advantage of integrating a web server into Isabelle is that the 
 | 
|  |     89 | find\_theorems service could be provided by a single process, which, in 
 | 
|  |     90 | principle, would simplify administration.
 | 
|  |     91 | Implementing even a simple \ac{HTTP} service from scratch is an unacceptable 
 | 
|  |     92 | cost and fraught with potential problems and limitations.
 | 
|  |     93 | It is preferable to adapt an existing system.
 | 
|  |     94 | 
 | 
|  |     95 | As Isabelle is written in \ac{SML}, only \ac{HTTP} services also written in 
 | 
|  |     96 | \ac{SML} can realistically be considered.
 | 
|  |     97 | In principle Isabelle compiles on both Poly/ML and \ac{SML/NJ}, but in 
 | 
|  |     98 | practice the former is faster, more widely used, and better supported.
 | 
|  |     99 | In particular, the L4.verified project does not build effectively on 
 | 
|  |    100 | \ac{SML/NJ}.
 | 
|  |    101 | This further limits the potential to adapt an existing system.
 | 
|  |    102 | 
 | 
|  |    103 | There are three \ac{SML} web server projects:\\
 | 
|  |    104 | \centerline{\begin{tabular}{ll}
 | 
|  |    105 | SMLServer &
 | 
|  |    106 |     \url{http://www.smlserver.org}\\
 | 
|  |    107 | FoxNet web server &
 | 
|  |    108 |     \url{http://www.cs.cmu.edu/~fox/}\\
 | 
|  |    109 | Swerve web server &
 | 
|  |    110 |     \url{http://mlton.org/Swerve}
 | 
|  |    111 | \end{tabular}}
 | 
|  |    112 | 
 | 
|  |    113 | Unfortunately, none of these projects is suitable.
 | 
|  |    114 | 
 | 
|  |    115 | The SMLServer is an Apache web server plugin.
 | 
|  |    116 | It runs \ac{SML} programs that generate dynamic web pages.
 | 
|  |    117 | SMLServer is based on the MLKit compiler.
 | 
|  |    118 | It is unlikely that Isabelle and the l4.verified heaps could be compiled in 
 | 
|  |    119 | MLKit, at least not without significant effort.
 | 
|  |    120 | 
 | 
|  |    121 | The FoxNet web server was developed as part of the Fox project at \ac{CMU}.
 | 
|  |    122 | The source is not readily available.
 | 
|  |    123 | 
 | 
|  |    124 | The Swerve web server is part of an unpublished book on systems programming 
 | 
|  |    125 | in \ac{SML}.
 | 
|  |    126 | The source code is available and it is readily patched to run under the 
 | 
|  |    127 | latest version of SML/NJ (110.67).
 | 
|  |    128 | Adapting it to Poly/ML would require non-trivial effort because it is based 
 | 
|  |    129 | on \ac{CML}, whose implementation on SML/NJ makes use of continuations 
 | 
|  |    130 | (SMLofNJ.cont).
 | 
|  |    131 | 
 | 
|  |    132 | \section{Implementation: Isabelle web module}\label{sec:twoproc}
 | 
|  |    133 | 
 | 
|  |    134 | The description of the two process solution is divided into two subsections.
 | 
|  |    135 | The first contains an overview of the architecture and web server specifics.
 | 
|  |    136 | The second contains a summary of an \ac{SML} implementation of the web 
 | 
|  |    137 | module in Isabelle.
 | 
|  |    138 | 
 | 
|  |    139 | \subsection{Architecture and web server}\label{sec:oneproc:arch}
 | 
|  |    140 | 
 | 
|  |    141 | \newcommand{\component}[1]{%
 | 
|  |    142 |     \rput(0,0){\psframe(-.8,-.6)(.8,.6)}%
 | 
|  |    143 |     \rput(0,0){\parbox{4.3em}{\centering{#1}}}}
 | 
|  |    144 | 
 | 
|  |    145 | \begin{figure}
 | 
|  |    146 | \centering%
 | 
|  |    147 | \begin{pspicture}(-4.8,0)(3.3,4)%\psgrid
 | 
|  |    148 |     \newpsstyle{conn}{arrows=->}%
 | 
|  |    149 |     %
 | 
|  |    150 |     \rput(-2.2,3.3){\component{web server}}%
 | 
|  |    151 |     \rput( 2.2,3.3){\component{web module}}%
 | 
|  |    152 |     \rput(-2.2,0.7){\component{web client}}%
 | 
|  |    153 |     \rput(-4.2,3.4){%
 | 
|  |    154 | 	\psellipse(0,-.2)(.4,.2)%
 | 
|  |    155 | 	\psframe[linestyle=none,fillstyle=solid,fillcolor=white]%
 | 
|  |    156 | 		(-.4,-.2)(.4,.1)%
 | 
|  |    157 | 	\psellipse(0,.1)(.4,.2)%
 | 
|  |    158 | 	\psline(-.38,-.2)(-.38,.1)\psline(.38,-.2)(.38,.1)%
 | 
|  |    159 |     }%
 | 
|  |    160 |     \psline[style=conn,arrows=<->](-3.0,3.3)(-3.8,3.3)%
 | 
|  |    161 |     %
 | 
|  |    162 |     \rput[rB](3.3,2.15){server}%
 | 
|  |    163 |     \psline[linestyle=dashed](-4.8,2)(3.3,2)%
 | 
|  |    164 |     \rput[rt](3.3,1.90){network}%
 | 
|  |    165 |     %
 | 
|  |    166 |     \rput[B](0.0,3.55){\psframebox*{module protocol}}%
 | 
|  |    167 |     \psline[style=conn](-1.4,3.4)(1.4,3.4)%
 | 
|  |    168 |     \psline[style=conn](1.4,3.2)(-1.4,3.2)%
 | 
|  |    169 |     %
 | 
|  |    170 |     \rput[B]{90}(-2.4,2.0){\psframebox*{\ac{HTTP}}}%
 | 
|  |    171 |     \psline[style=conn](-2.1,2.7)(-2.1,1.3)%
 | 
|  |    172 |     \psline[style=conn](-2.3,1.3)(-2.3,2.7)%
 | 
|  |    173 | \end{pspicture}
 | 
|  |    174 | \caption{Overview of web module architecture\label{fig:modulearch}}
 | 
|  |    175 | \end{figure}
 | 
|  |    176 | 
 | 
|  |    177 | An overview of a simple web server architecture is presented in 
 | 
|  |    178 | \reffig{modulearch}.
 | 
|  |    179 | A \emph{web client} requests a \ac{URL} from a \emph{web server} over 
 | 
|  |    180 | \ac{HTTP}.
 | 
|  |    181 | The web server processes the request by fetching static elements from its 
 | 
|  |    182 | local file system and communicating with \emph{web modules} to dynamically 
 | 
|  |    183 | generate other elements.
 | 
|  |    184 | The elements are sent back across the network to the web client.
 | 
|  |    185 | 
 | 
|  |    186 | The web server communicates with web modules over a \emph{module protocol}, 
 | 
|  |    187 | which dictates a means of passing requests and receiving responses.
 | 
|  |    188 | There are several common module protocols.
 | 
|  |    189 | 
 | 
|  |    190 | In the \ac{CGI}, the web server executes processes to service dynamic 
 | 
|  |    191 | \acp{URL}.
 | 
|  |    192 | Request details are placed in environment variables and sent on the standard 
 | 
|  |    193 | input channels of processes, responses are read from the standard output 
 | 
|  |    194 | channels of processes and transmitted back to web clients.
 | 
|  |    195 | 
 | 
|  |    196 | The chief disadvantage of \ac{CGI} is that it creates a new process for 
 | 
|  |    197 | every request.
 | 
|  |    198 | Fast \ac{CGI} web modules, on the other hand, run persistently in a loop.
 | 
|  |    199 | They receive and respond to web server requests over a duplex socket.
 | 
|  |    200 | The Fast \ac{CGI} protocol is quite complicated.
 | 
|  |    201 | There are, however, alternatives like \ac{SCGI} that are easier for web 
 | 
|  |    202 | modules to support.
 | 
|  |    203 | This is important when programming in \ac{SML} because both the number of 
 | 
|  |    204 | developers and available libraries are small.
 | 
|  |    205 | 
 | 
|  |    206 | An \ac{SCGI} web module listens on a socket for requests from a web server.
 | 
|  |    207 | Requests are sent as stream of bytes.
 | 
|  |    208 | The first part of the request is a null-delimited sequence of name and value 
 | 
|  |    209 | pairs.
 | 
|  |    210 | The second part is unparsed text sent from the web client.
 | 
|  |    211 | The web module responds by sending text, usually \ac{HTTP} headers followed 
 | 
|  |    212 | by \ac{HTML} data, back over the socket.
 | 
|  |    213 | The whole protocol can be described in two 
 | 
|  |    214 | pages.\footnote{\url{http://python.ca/scgi/protocol.txt}}
 | 
|  |    215 | 
 | 
|  |    216 | Both the Apache and Lighttpd web servers support the \ac{SCGI} protocol.
 | 
|  |    217 | Lighttpd was chosen because it seemed to be small, fast, and easy to 
 | 
|  |    218 | configure.
 | 
|  |    219 | Two settings are required to connect lighttpd to an \ac{SCGI} web module: 
 | 
|  |    220 | \begin{verbatim}
 | 
|  |    221 | server.modules = ( "mod_scgi" )
 | 
|  |    222 | scgi.server = ("/isabelle" => ((
 | 
|  |    223 | 		  "host" => "127.0.0.1",
 | 
|  |    224 | 		  "port" => 64000,
 | 
|  |    225 | 		  "check-local" => "disable")))
 | 
|  |    226 | \end{verbatim}
 | 
|  |    227 | In this example, the \texttt{scgi.server} entry directs the web server to 
 | 
|  |    228 | pass all \acp{URL} beginning with \texttt{/isabelle} to the web module 
 | 
|  |    229 | listening on port \texttt{64000}.
 | 
|  |    230 | 
 | 
|  |    231 | \subsection{Implementation in \acs{SML}}\label{sec:oneproc:impl}
 | 
|  |    232 | 
 | 
|  |    233 | \begin{table}
 | 
|  |    234 | \begin{tabular}{lp{.70\textwidth}}
 | 
|  |    235 | \textbf{Module}	     &	\textbf{Description}\\\hline
 | 
|  |    236 | Mime		     &	Rudimentary support for mime types.\\
 | 
|  |    237 | HttpStatus	     &	\ac{HTTP} header status codes.\\
 | 
|  |    238 | HttpUtil	     &	Produce \ac{HTTP} headers and parse query strings.\\
 | 
|  |    239 | Xhtml		     &	Rudimentary support for generating \ac{HTML}.\\
 | 
|  |    240 | SocketUtil	     &	Routines from The Standard ML Basis Library
 | 
|  |    241 | 			book.\footnote{Chapter 10, Gansner and Reppy, 
 | 
|  |    242 | 			Cambridge University Press.}\\
 | 
|  |    243 | \textbf{ScgiReq}     &	Parse \ac{SCGI} requests.\\
 | 
|  |    244 | \textbf{ScgiServer}  &	\ac{SCGI} server event loop and dispatch.\\
 | 
|  |    245 | \textbf{FindTheorems}&	\ac{HTML} interface to find\_theorems.
 | 
|  |    246 | \end{tabular}
 | 
|  |    247 | \caption{Web module implementation\label{tab:moduleimpl}}
 | 
|  |    248 | \end{table}
 | 
|  |    249 | 
 | 
|  |    250 | The find\_theorems web module is implemented in \ac{SML}.
 | 
|  |    251 | It uses elements of the Pure library in Isabelle.
 | 
|  |    252 | The program comprises the nine modules shown in \reftab{moduleimpl}.
 | 
|  |    253 | The three most important ones are shown in bold: ScgiReq, ScgiServer, and 
 | 
|  |    254 | FindTheorems.
 | 
|  |    255 | 
 | 
|  |    256 | ScgiReq implements the client-side of the \ac{SCGI} protocol.
 | 
|  |    257 | It maps a binary input stream into a request data type.
 | 
|  |    258 | 
 | 
|  |    259 | ScgiServer is a generic, multi-threaded \ac{SCGI} request dispatcher.
 | 
|  |    260 | It accepts \ac{SCGI} requests on a designated socket, selects an appropriate 
 | 
|  |    261 | handler from an internal list, and calls the handler in a new thread.
 | 
|  |    262 | 
 | 
|  |    263 | FindTheorems registers a handler with ScgiServer.
 | 
|  |    264 | It parses \ac{SCGI}/\ac{HTTP} requests, calls the Isabelle find\_theorems 
 | 
|  |    265 | function, and returns the results as \ac{HTML}.
 | 
|  |    266 | The \ac{HTML} generation of individual theorems is handled by the \ac{HTML} 
 | 
|  |    267 | print mode of Isabelle, but the form fields and page structure were manually 
 | 
|  |    268 | implemented.
 | 
|  |    269 | 
 | 
|  |    270 | The server is started by calling \texttt{ScgiServer.server}.
 | 
|  |    271 | Scripts have been created to automate this process.
 | 
|  |    272 | 
 | 
|  |    273 | \subsection{Handling symbols}\label{sec:unicode}
 | 
|  |    274 | 
 | 
|  |    275 | Isabelle theorems are usually written in mathematical notation.
 | 
|  |    276 | Internally, however, Isabelle only manipulates \acs{ASCII} strings.
 | 
|  |    277 | Symbols are encoded by strings that begin with a backslash and contain a 
 | 
|  |    278 | symbol name between angle braces, for example, the symbol~$\longrightarrow$ 
 | 
|  |    279 | becomes~\verb+\<longrightarrow>+.
 | 
|  |    280 | The existing Thy/Html module in the Isabelle Pure theory converts many of 
 | 
|  |    281 | these symbols to \ac{HTML} entities.
 | 
|  |    282 | Custom routines are required to convert the missing symbols to \ac{HTML} 
 | 
|  |    283 | \emph{numeric character references}, which are the Unicode codepoints of 
 | 
|  |    284 | symbols printed in decimal between \verb+&#+ and \verb+;+.
 | 
|  |    285 | Further, other routines were required for converting \acs{UTF-8} encoded 
 | 
|  |    286 | strings sent from web browsers into Isabelle's symbol encoding.
 | 
|  |    287 | 
 | 
|  |    288 | Isabelle is distributed with a text file that maps Isabelle symbols to 
 | 
|  |    289 | Unicode codepoints.
 | 
|  |    290 | A module was written to parse this file into symbol tables that map back and 
 | 
|  |    291 | forth between Isabelle symbols and Unicode codepoints, and also between 
 | 
|  |    292 | Isabelle \acs{ASCII} abbreviations (like \verb+-->+ for $\longrightarrow$) 
 | 
|  |    293 | and Unicode codepoints.
 | 
|  |    294 | 
 | 
|  |    295 | The conversion from Isabelle symbols to \ac{HTML} numeric character 
 | 
|  |    296 | references is handled by a new printing mode, which is based in large part 
 | 
|  |    297 | on the existing \ac{HTML} printing mode.
 | 
|  |    298 | The new mode is used in combination with the existing \texttt{xsymbol} mode, 
 | 
|  |    299 | to ensure that Isabelle symbols are used instead of \acs{ASCII} 
 | 
|  |    300 | abbreviations.
 | 
|  |    301 | 
 | 
|  |    302 | The conversion from \acs{UTF-8} is handled by a custom routine.
 | 
|  |    303 | Additionally, there is a JavaScript routine that converts from Isabelle 
 | 
|  |    304 | symbol encodings to \acs{UTF-8}, so that users can conveniently view 
 | 
|  |    305 | manually-entered or pasted mathematical characters in the web browser 
 | 
|  |    306 | interface.
 | 
|  |    307 | 
 | 
|  |    308 | \section{Abbreviations}\label{sec:abbr}
 | 
|  |    309 | 
 | 
|  |    310 | \begin{acronym}[SML/NJ] % longest acronym here
 | 
|  |    311 |     \acro{ASCII}{American Standard Code for Information Interchange}
 | 
|  |    312 |     \acro{CGI}{Common Gateway Interface}
 | 
|  |    313 |     \acro{CML}{Concurrent ML}
 | 
|  |    314 |     \acro{CMU}{Carnegie Mellon University}
 | 
|  |    315 |     \acro{HTML}{Hyper Text Markup Language}
 | 
|  |    316 |     \acro{HTTP}{Hyper Text Transfer Protocol}
 | 
|  |    317 |     \acro{ML}{Meta Language}
 | 
|  |    318 |     \acro{SCGI}{Simple CGI}
 | 
|  |    319 |     \acro{SML}{Standard ML}
 | 
|  |    320 |     \acro{SML/NJ}{Standard ML of New Jersey}
 | 
|  |    321 |     \acro{URL}{Universal Resource Locator}
 | 
|  |    322 |     \acro{UTF-8}{8-bit Unicode Transformation Format}
 | 
|  |    323 |     \acro{WWW}{World Wide Web}
 | 
|  |    324 | \end{acronym}
 | 
|  |    325 | 
 | 
|  |    326 | \end{document}
 |