[developers] SBCL port: XML

Ben Waldron bmw20 at cl.cam.ac.uk
Mon Oct 23 20:29:46 CEST 2006


Ann Copestake wrote:
> what do you mean by making the swap `official' - remove pxml or include both?
>   
I suggest removing pxml and replacing it with S-XML, as (see below) it 
is more efficient, supports Lisps other than Allegro, provides the same 
LXML output (modulo a couple of minor points) as the pxml code (meaning 
no large changes needed to our source code), and provides adequate XML 
conformance.

> have you evaluated relative speed?  is it tested on any of the standard test 
> suites?  how actively is it supported?
>   
In terms of relative speed, S-XML runs an order of magnitude faster. 
There is a similar improvement in terms of memory usage.

The S-XML API (and code) appears cleaner than that for pxml.

That said, the level of XML support is less complete. Both pxml and 
S-XML are non-validating. S-XML also ignores DTDs (meaning no XML 
entity support) and doesn't support special tags such as processing 
instructions. For example, the OASIS XML test files consist of: 98 
positive tests for which pxml gives 3 false negatives, and S-XML 10; 249 
negative tests for which pxml gives 18 false positives, and S-XML 147. 
On James Clark's XML test files: pxml gives 1 false negative and 16 
false positives (despite what it says in the documentation); S-XML gives 
37 false negatives and 121 false positives. The cases where S-XML fails 
to accept well-formed XML (false negatives) are due to: ignoring XML 
entities (intended; 31 cases), failure to handle processing instructions 
(3 cases), failure to handle whitespace within a closing tag (apparently 
this is acceptable XML; 2 cases), and failure to handle non-ASCII tag 
names (1 case).
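
To put the OASIS counts on a common scale, here is a small sketch that turns 
them into error rates and checks that the per-cause breakdown of S-XML's 
false negatives adds up to the 37 reported for James Clark's files. The 
numbers are copied from the paragraph above; the dictionary layout and the 
function name error_rates are my own, for illustration only.

```python
# Counts copied from the message; nothing here is newly measured.
OASIS = {
    "positive_total": 98,   # well-formed documents that should be accepted
    "negative_total": 249,  # ill-formed documents that should be rejected
    "pxml": {"false_neg": 3, "false_pos": 18},
    "s-xml": {"false_neg": 10, "false_pos": 147},
}

# Causes of S-XML's false negatives on James Clark's test files.
SXML_FALSE_NEG_CAUSES = {
    "XML entities ignored (intended)": 31,
    "processing instructions": 3,
    "whitespace within a closing tag": 2,
    "non-ASCII tag names": 1,
}


def error_rates(suite, parser):
    """Return (false-negative rate, false-positive rate) for one parser."""
    counts = suite[parser]
    return (
        counts["false_neg"] / suite["positive_total"],
        counts["false_pos"] / suite["negative_total"],
    )


if __name__ == "__main__":
    for parser in ("pxml", "s-xml"):
        fn_rate, fp_rate = error_rates(OASIS, parser)
        print(f"{parser}: {fn_rate:.1%} false negatives, "
              f"{fp_rate:.1%} false positives")
    # The four causes should account for all 37 reported false negatives.
    print(sum(SXML_FALSE_NEG_CAUSES.values()))
```

On the OASIS files this works out to roughly 3% and 7% for pxml against 
roughly 10% and 59% for S-XML, so the main gap is in rejecting ill-formed 
input rather than in accepting well-formed input.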

As far as the LKB is concerned, I think the level of XML conformance 
supplied by S-XML is adequate. The LKB needs an XML reader (and, 
arguably, writer) in order to implement interfaces to the outside world; 
currently these interfaces consist of the SMAF preprocessor interface, 
an (R)MRS XML interface, and the SPPP preprocessor interface. We also 
require the ability to process input derived from a general XML text. 
But this functionality, arguably, does not belong in the LKB proper; it 
should be done by a separate module which talks to the LKB via the SMAF 
interface (e.g. as in the SciBorg project at Cambridge).

The effect of swapping pxml for S-XML on the CSLI test suite: identical 
parse results, speed not significantly altered (0.8% improvement), 
reduction in memory usage.

So far as support is concerned, S-XML is an open source project with a 
low-volume mailing list. Email sent to the list appears to get a helpful 
response. The last update to the source code was earlier this year. For 
pxml, support presumably comes from Franz. According to the Franz 
website, the last activity on this project was dated 2003.

There are alternatives (of varying quality) to these two XML 
modules for Lisp; e.g. CL-XML, xmsns, Allegro's SAX XML parser, a Lisp 
wrapper around the SAX-like expat parser, Closure XML, XMLisp, ... None 
of these appears (at a cursory glance) to provide the LXML output format 
that we currently get from our XML module.
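
For anyone unfamiliar with LXML, the following is a rough sketch of the 
shape of that output, written in Python rather than Lisp so it can be run 
without either parser. It assumes the usual LXML conventions as I understand 
them: an element with no attributes and no content is a bare tag, an element 
with content is a list headed by the tag, and attributes are folded into the 
head as alternating name/value pairs. The function name to_lxml_like, the use 
of ":"-prefixed strings to stand in for Lisp keywords, and the dropping of 
whitespace-only text are all assumptions of this sketch, not the behaviour of 
pxml or S-XML.

```python
import xml.etree.ElementTree as ET


def to_lxml_like(elem):
    """Convert an ElementTree element into an LXML-style nested list."""
    tag = ":" + elem.tag  # stands in for a Lisp keyword such as :a

    # Attributes become alternating name/value entries after the tag.
    attrs = []
    for name, value in elem.attrib.items():
        attrs += [":" + name, value]

    # Children are text strings and nested elements, in document order.
    children = []
    if elem.text and elem.text.strip():
        children.append(elem.text)
    for child in elem:
        children.append(to_lxml_like(child))
        if child.tail and child.tail.strip():
            children.append(child.tail)

    if not attrs and not children:
        return tag  # empty element without attributes: bare tag
    head = [tag] + attrs if attrs else tag
    return [head] + children


if __name__ == "__main__":
    doc = ET.fromstring('<a href="x">hi<b/></a>')
    print(to_lxml_like(doc))
```

For the sample document this yields [[':a', ':href', 'x'], 'hi', ':b'], which 
corresponds to the Lisp form ((:a :href "x") "hi" :b).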

What do people think? The bottom line is that I need a good XML module 
that will run under SBCL.

- Ben


