33817
|
1 |
% vim:nojs: tw=76 sw=4 sts=4 fo=awn fdm=marker
|
|
2 |
%
|
|
3 |
% 20090406 T. Bourke
|
|
4 |
% Original document.
|
|
5 |
%
|
|
6 |
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
7 |
\documentclass[a4paper,draft]{article} % XXX
|
|
8 |
%\documentclass[a4paper,final]{article}
|
|
9 |
% Preamble
|
|
10 |
\usepackage[T1]{fontenc}
|
|
11 |
\usepackage{textcomp}
|
|
12 |
\usepackage{ifdraft}
|
|
13 |
|
|
14 |
\bibliographystyle{abbrv} % alpha
|
|
15 |
|
|
16 |
\newcommand{\refsec}[1]{Section~\ref{sec:#1}}
|
|
17 |
\newcommand{\reffig}[1]{Figure~\ref{fig:#1}}
|
|
18 |
\newcommand{\reftab}[1]{Table~\ref{tab:#1}}
|
|
19 |
|
|
20 |
%
|
|
21 |
\usepackage{acronym}
|
|
22 |
\usepackage{pst-all}
|
|
23 |
\usepackage{url}
|
|
24 |
|
|
25 |
\title{Isabelle find\_theorems on the web}
|
|
26 |
\author{T. Bourke\thanks{NICTA}}
|
|
27 |
|
|
28 |
\special{!/pdfmark where
|
|
29 |
{pop} {userdict /pdfmark /cleartomark load put} ifelse
|
|
30 |
[ /Author (T. Bourke, NICTA)
|
|
31 |
/Title (IsabelleWWW: find_theorems
|
|
32 |
($Date: 2008-10-03 16:09:02 +1000 (Fri, 03 Oct 2008) $))
|
|
33 |
/Subject (Web interface to find_theorems)
|
|
34 |
/Keywords (isabelle, sml, http, www)
|
|
35 |
/DOCINFO pdfmark}
|
|
36 |
|
|
37 |
\begin{document}
|
|
38 |
% Title page and abstract
|
|
39 |
\maketitle
|
|
40 |
|
|
41 |
\begin{abstract}
|
|
42 |
The Isabelle find\_theorems command processes queries against a theory
|
|
43 |
database and returns a list of matching theorems.
|
|
44 |
It is usually given from the Proof General or command-line interface.
|
|
45 |
This document describes a web server implementation.
|
|
46 |
Two design alternatives are presented and an overview of an implementation
|
|
47 |
of one is described.
|
|
48 |
\end{abstract}
|
|
49 |
|
|
50 |
\section{Introduction}\label{sec:intro}
|
|
51 |
|
|
52 |
This document describes the design and implementation of a web interface for
|
|
53 |
the Isabelle find\_theorems command.
|
|
54 |
The design requirements and their consequences are detailed in \refsec{req}.
|
|
55 |
Two architectures were considered: \begin{enumerate}
|
|
56 |
|
|
57 |
\item \emph{one process}: integrate a web server into Isabelle.
|
|
58 |
|
|
59 |
\item \emph{two processes}: run Isabelle as a web server module.
|
|
60 |
|
|
61 |
\end{enumerate}
|
|
62 |
A brief evaluation of the one process architecture is presented in
|
|
63 |
\refsec{oneproc}.
|
|
64 |
An implementation of the two process is presented in \refsec{twoproc}.
|
|
65 |
|
|
66 |
\section{Design requirements}\label{sec:req}
|
|
67 |
|
|
68 |
The main requirements are:\begin{enumerate}
|
|
69 |
\item The system will allow users to search for theorems from a web browser.
|
|
70 |
\item It will allow queries across disparate Isabelle theories.
|
|
71 |
\item It will, at a minimum, handle theories from the L4.verified project.
|
|
72 |
\item It will run on a secure network.
|
|
73 |
\item There will be at most ten simultaneous users.
|
|
74 |
\end{enumerate}
|
|
75 |
|
|
76 |
\noindent
|
|
77 |
Several \emph{a priori} choices are fixed:\begin{enumerate}
|
|
78 |
\item The search will run against an Isabelle heap.
|
|
79 |
\item A single heap will be built from the theories to be searched.
|
|
80 |
\item The system must be persistent, for two reasons: \begin{enumerate}
|
|
81 |
\item Isabelle is slow to start against large heaps.
|
|
82 |
\item Later enhancements may require tracking states at the server.
|
|
83 |
\end{enumerate}
|
|
84 |
\end{enumerate}
|
|
85 |
|
|
86 |
\section{Evaluation: Isabelle web server}\label{sec:oneproc}
|
|
87 |
|
|
88 |
The advantage of integrating a web server into Isabelle is that the
|
|
89 |
find\_theorems service could be provided by a single process, which, in
|
|
90 |
principle, would simplify administration.
|
|
91 |
Implementing even a simple \ac{HTTP} service from scratch is an unacceptable
|
|
92 |
cost and fraught with potential problems and limitations.
|
|
93 |
It is preferable to adapt an existing system.
|
|
94 |
|
|
95 |
As Isabelle is written in \ac{SML}, only \ac{HTTP} services also written in
|
|
96 |
\ac{SML} can realistically be considered.
|
|
97 |
In principle Isabelle compiles on both Poly/ML and \ac{SML/NJ}, but in
|
|
98 |
practice the former is faster, more widely used, and better supported.
|
|
99 |
In particular, the L4.verified project does not build effectively on
|
|
100 |
\ac{SML/NJ}.
|
|
101 |
This further limits the potential to adapt an existing system.
|
|
102 |
|
|
103 |
There are three \ac{SML} web server projects:\\
|
|
104 |
\centerline{\begin{tabular}{ll}
|
|
105 |
SMLServer &
|
|
106 |
\url{http://www.smlserver.org}\\
|
|
107 |
FoxNet web server &
|
|
108 |
\url{http://www.cs.cmu.edu/~fox/}\\
|
|
109 |
Swerve web server &
|
|
110 |
\url{http://mlton.org/Swerve}
|
|
111 |
\end{tabular}}
|
|
112 |
|
|
113 |
Unfortunately, none of these projects is suitable.
|
|
114 |
|
|
115 |
The SMLServer is an Apache web server plugin.
|
|
116 |
It runs \ac{SML} programs that generate dynamic web pages.
|
|
117 |
SMLServer is based on the MLKit compiler.
|
|
118 |
It is unlikely that Isabelle and the l4.verified heaps could be compiled in
|
|
119 |
MLKit, at least not without significant effort.
|
|
120 |
|
|
121 |
The FoxNet web server was developed as part of the Fox project at \ac{CMU}.
|
|
122 |
The source is not readily available.
|
|
123 |
|
|
124 |
The Swerve web server is part of an unpublished book on systems programming
|
|
125 |
in \ac{SML}.
|
|
126 |
The source code is available and it is readily patched to run under the
|
|
127 |
latest version of SML/NJ (110.67).
|
|
128 |
Adapting it to Poly/ML would require non-trivial effort because it is based
|
|
129 |
on \ac{CML}, whose implementation on SML/NJ makes use of continuations
|
|
130 |
(SMLofNJ.cont).
|
|
131 |
|
|
132 |
\section{Implementation: Isabelle web module}\label{sec:twoproc}
|
|
133 |
|
|
134 |
The description of the two process solution is divided into two subsections.
|
|
135 |
The first contains an overview of the architecture and web server specifics.
|
|
136 |
The second contains a summary of an \ac{SML} implementation of the web
|
|
137 |
module in Isabelle.
|
|
138 |
|
|
139 |
\subsection{Architecture and web server}\label{sec:oneproc:arch}
|
|
140 |
|
|
141 |
\newcommand{\component}[1]{%
|
|
142 |
\rput(0,0){\psframe(-.8,-.6)(.8,.6)}%
|
|
143 |
\rput(0,0){\parbox{4.3em}{\centering{#1}}}}
|
|
144 |
|
|
145 |
\begin{figure}
|
|
146 |
\centering%
|
|
147 |
\begin{pspicture}(-4.8,0)(3.3,4)%\psgrid
|
|
148 |
\newpsstyle{conn}{arrows=->}%
|
|
149 |
%
|
|
150 |
\rput(-2.2,3.3){\component{web server}}%
|
|
151 |
\rput( 2.2,3.3){\component{web module}}%
|
|
152 |
\rput(-2.2,0.7){\component{web client}}%
|
|
153 |
\rput(-4.2,3.4){%
|
|
154 |
\psellipse(0,-.2)(.4,.2)%
|
|
155 |
\psframe[linestyle=none,fillstyle=solid,fillcolor=white]%
|
|
156 |
(-.4,-.2)(.4,.1)%
|
|
157 |
\psellipse(0,.1)(.4,.2)%
|
|
158 |
\psline(-.38,-.2)(-.38,.1)\psline(.38,-.2)(.38,.1)%
|
|
159 |
}%
|
|
160 |
\psline[style=conn,arrows=<->](-3.0,3.3)(-3.8,3.3)%
|
|
161 |
%
|
|
162 |
\rput[rB](3.3,2.15){server}%
|
|
163 |
\psline[linestyle=dashed](-4.8,2)(3.3,2)%
|
|
164 |
\rput[rt](3.3,1.90){network}%
|
|
165 |
%
|
|
166 |
\rput[B](0.0,3.55){\psframebox*{module protocol}}%
|
|
167 |
\psline[style=conn](-1.4,3.4)(1.4,3.4)%
|
|
168 |
\psline[style=conn](1.4,3.2)(-1.4,3.2)%
|
|
169 |
%
|
|
170 |
\rput[B]{90}(-2.4,2.0){\psframebox*{\ac{HTTP}}}%
|
|
171 |
\psline[style=conn](-2.1,2.7)(-2.1,1.3)%
|
|
172 |
\psline[style=conn](-2.3,1.3)(-2.3,2.7)%
|
|
173 |
\end{pspicture}
|
|
174 |
\caption{Overview of web module architecture\label{fig:modulearch}}
|
|
175 |
\end{figure}
|
|
176 |
|
|
177 |
An overview of a simple web server architecture is presented in
|
|
178 |
\reffig{modulearch}.
|
|
179 |
A \emph{web client} requests a \ac{URL} from a \emph{web server} over
|
|
180 |
\ac{HTTP}.
|
|
181 |
The web server processes the request by fetching static elements from its
|
|
182 |
local file system and communicating with \emph{web modules} to dynamically
|
|
183 |
generate other elements.
|
|
184 |
The elements are sent back across the network to the web client.
|
|
185 |
|
|
186 |
The web server communicates with web modules over a \emph{module protocol},
|
|
187 |
which dictates a means of passing requests and receiving responses.
|
|
188 |
There are several common module protocols.
|
|
189 |
|
|
190 |
In the \ac{CGI}, the web server executes processes to service dynamic
|
|
191 |
\acp{URL}.
|
|
192 |
Request details are placed in environment variables and sent on the standard
|
|
193 |
input channels of processes, responses are read from the standard output
|
|
194 |
channels of processes and transmitted back to web clients.
|
|
195 |
|
|
196 |
The chief disadvantage of \ac{CGI} is that it creates a new process for
|
|
197 |
every request.
|
|
198 |
Fast \ac{CGI} web modules, on the other hand, run persistently in a loop.
|
|
199 |
They receive and respond to web server requests over a duplex socket.
|
|
200 |
The Fast \ac{CGI} protocol is quite complicated.
|
|
201 |
There are, however, alternatives like \ac{SCGI} that are easier for web
|
|
202 |
modules to support.
|
|
203 |
This is important when programming in \ac{SML} because both the number of
|
|
204 |
developers and available libraries are small.
|
|
205 |
|
|
206 |
An \ac{SCGI} web module listens on a socket for requests from a web server.
|
|
207 |
Requests are sent as stream of bytes.
|
|
208 |
The first part of the request is a null-delimited sequence of name and value
|
|
209 |
pairs.
|
|
210 |
The second part is unparsed text sent from the web client.
|
|
211 |
The web module responds by sending text, usually \ac{HTTP} headers followed
|
|
212 |
by \ac{HTML} data, back over the socket.
|
|
213 |
The whole protocol can be described in two
|
|
214 |
pages.\footnote{\url{http://python.ca/scgi/protocol.txt}}
|
|
215 |
|
|
216 |
Both the Apache and Lighttpd web servers support the \ac{SCGI} protocol.
|
|
217 |
Lighttpd was chosen because it seemed to be small, fast, and easy to
|
|
218 |
configure.
|
|
219 |
Two settings are required to connect lighttpd to an \ac{SCGI} web module:
|
|
220 |
\begin{verbatim}
|
|
221 |
server.modules = ( "mod_scgi" )
|
|
222 |
scgi.server = ("/isabelle" => ((
|
|
223 |
"host" => "127.0.0.1",
|
|
224 |
"port" => 64000,
|
|
225 |
"check-local" => "disable")))
|
|
226 |
\end{verbatim}
|
|
227 |
In this example, the \texttt{scgi.server} entry directs the web server to
|
|
228 |
pass all \acp{URL} beginning with \texttt{/isabelle} to the web module
|
|
229 |
listening on port \texttt{64000}.
|
|
230 |
|
|
231 |
\subsection{Implementation in \acs{SML}}\label{sec:oneproc:impl}
|
|
232 |
|
|
233 |
\begin{table}
|
|
234 |
\begin{tabular}{lp{.70\textwidth}}
|
|
235 |
\textbf{Module} & \textbf{Description}\\\hline
|
|
236 |
Mime & Rudimentary support for mime types.\\
|
|
237 |
HttpStatus & \ac{HTTP} header status codes.\\
|
|
238 |
HttpUtil & Produce \ac{HTTP} headers and parse query strings.\\
|
|
239 |
Xhtml & Rudimentary support for generating \ac{HTML}.\\
|
|
240 |
SocketUtil & Routines from The Standard ML Basis Library
|
|
241 |
book.\footnote{Chapter 10, Gansner and Reppy,
|
|
242 |
Cambridge University Press.}\\
|
|
243 |
\textbf{ScgiReq} & Parse \ac{SCGI} requests.\\
|
|
244 |
\textbf{ScgiServer} & \ac{SCGI} server event loop and dispatch.\\
|
|
245 |
\textbf{FindTheorems}& \ac{HTML} interface to find\_theorems.
|
|
246 |
\end{tabular}
|
|
247 |
\caption{Web module implementation\label{tab:moduleimpl}}
|
|
248 |
\end{table}
|
|
249 |
|
|
250 |
The find\_theorems web module is implemented in \ac{SML}.
|
|
251 |
It uses elements of the Pure library in Isabelle.
|
|
252 |
The program comprises the nine modules shown in \reftab{moduleimpl}.
|
|
253 |
The three most important ones are shown in bold: ScgiReq, ScgiServer, and
|
|
254 |
FindTheorems.
|
|
255 |
|
|
256 |
ScgiReq implements the client-side of the \ac{SCGI} protocol.
|
|
257 |
It maps a binary input stream into a request data type.
|
|
258 |
|
|
259 |
ScgiServer is a generic, multi-threaded \ac{SCGI} request dispatcher.
|
|
260 |
It accepts \ac{SCGI} requests on a designated socket, selects an appropriate
|
|
261 |
handler from an internal list, and calls the handler in a new thread.
|
|
262 |
|
|
263 |
FindTheorems registers a handler with ScgiServer.
|
|
264 |
It parses \ac{SCGI}/\ac{HTTP} requests, calls the Isabelle find\_theorems
|
|
265 |
function, and returns the results as \ac{HTML}.
|
|
266 |
The \ac{HTML} generation of individual theorems is handled by the \ac{HTML}
|
|
267 |
print mode of Isabelle, but the form fields and page structure were manually
|
|
268 |
implemented.
|
|
269 |
|
|
270 |
The module is built by using a \texttt{ROOT.ML} file inside a heap that
|
|
271 |
contains the theories to be searched.
|
|
272 |
The server is started by calling \texttt{ScgiServer.server}.
|
|
273 |
Scripts have been created to automate this process.
|
|
274 |
|
|
275 |
\subsection{Handling symbols}\label{sec:unicode}
|
|
276 |
|
|
277 |
Isabelle theorems are usually written in mathematical notation.
|
|
278 |
Internally, however, Isabelle only manipulates \acs{ASCII} strings.
|
|
279 |
Symbols are encoded by strings that begin with a backslash and contain a
|
|
280 |
symbol name between angle braces, for example, the symbol~$\longrightarrow$
|
|
281 |
becomes~\verb+\<longrightarrow>+.
|
|
282 |
The existing Thy/Html module in the Isabelle Pure theory converts many of
|
|
283 |
these symbols to \ac{HTML} entities.
|
|
284 |
Custom routines are required to convert the missing symbols to \ac{HTML}
|
|
285 |
\emph{numeric character references}, which are the Unicode codepoints of
|
|
286 |
symbols printed in decimal between \verb+&#+ and \verb+;+.
|
|
287 |
Further, other routines were required for converting \acs{UTF-8} encoded
|
|
288 |
strings sent from web browsers into Isabelle's symbol encoding.
|
|
289 |
|
|
290 |
Isabelle is distributed with a text file that maps Isabelle symbols to
|
|
291 |
Unicode codepoints.
|
|
292 |
A module was written to parse this file into symbol tables that map back and
|
|
293 |
forth between Isabelle symbols and Unicode codepoints, and also between
|
|
294 |
Isabelle \acs{ASCII} abbreviations (like \verb+-->+ for $\longrightarrow$)
|
|
295 |
and Unicode codepoints.
|
|
296 |
|
|
297 |
The conversion from Isabelle symbols to \ac{HTML} numeric character
|
|
298 |
references is handled by a new printing mode, which is based in large part
|
|
299 |
on the existing \ac{HTML} printing mode.
|
|
300 |
The new mode is used in combination with the existing \texttt{xsymbol} mode,
|
|
301 |
to ensure that Isabelle symbols are used instead of \acs{ASCII}
|
|
302 |
abbreviations.
|
|
303 |
|
|
304 |
The conversion from \acs{UTF-8} is handled by a custom routine.
|
|
305 |
Additionally, there is a JavaScript routine that converts from Isabelle
|
|
306 |
symbol encodings to \acs{UTF-8}, so that users can conveniently view
|
|
307 |
manually-entered or pasted mathematical characters in the web browser
|
|
308 |
interface.
|
|
309 |
|
|
310 |
\section{Abbreviations}\label{sec:abbr}
|
|
311 |
|
|
312 |
\begin{acronym}[SML/NJ] % longest acronym here
|
|
313 |
\acro{ASCII}{American Standard Code for Information Interchange}
|
|
314 |
\acro{CGI}{Common Gateway Interface}
|
|
315 |
\acro{CML}{Concurrent ML}
|
|
316 |
\acro{CMU}{Carnegie Mellon University}
|
|
317 |
\acro{HTML}{Hyper Text Markup Language}
|
|
318 |
\acro{HTTP}{Hyper Text Transfer Protocol}
|
|
319 |
\acro{ML}{Meta Language}
|
|
320 |
\acro{SCGI}{Simple CGI}
|
|
321 |
\acro{SML}{Standard ML}
|
|
322 |
\acro{SML/NJ}{Standard ML of New Jersey}
|
|
323 |
\acro{URL}{Universal Resource Locator}
|
|
324 |
\acro{UTF-8}{8-bit Unicode Transformation Format}
|
|
325 |
\acro{WWW}{World Wide Web}
|
|
326 |
\end{acronym}
|
|
327 |
|
|
328 |
\end{document}
|