[developers] SBCL port: XML
Ben Waldron
bmw20 at cl.cam.ac.uk
Mon Oct 23 20:29:46 CEST 2006
Ann Copestake wrote:
> what do you mean by making the swap `official' - remove pxml or include both?
>
I suggest removing pxml and replacing it with S-XML, as (see below) it
is more efficient, supports Lisps other than Allegro, provides the same
LXML output (modulo a couple of minor points) as the pxml code (meaning
no large changes are needed to our source code), and provides adequate
XML conformance.
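For anyone unfamiliar with the LXML shape, here is a rough illustration in Python (not Lisp, and not the pxml or S-XML code). It assumes the usual description of LXML: an element is a list whose head is the tag, or a list of the tag followed by attribute names and values, and whose tail is the content; an empty element with no attributes is just the bare tag. The function name to_lxml is hypothetical, and Python strings stand in for Lisp keyword symbols.

```python
import xml.etree.ElementTree as ET


def to_lxml(elem):
    """Convert an ElementTree element into an LXML-like nested list (illustrative only)."""
    # Head is the bare tag, or [tag, name, value, ...] when attributes exist.
    head = elem.tag
    if elem.attrib:
        head = [elem.tag]
        for name, value in elem.attrib.items():
            head += [name, value]

    # Children are text runs and nested elements, in document order.
    children = []
    if elem.text and elem.text.strip():
        children.append(elem.text)
    for child in elem:
        children.append(to_lxml(child))
        if child.tail and child.tail.strip():
            children.append(child.tail)

    # An empty element without attributes collapses to the bare tag.
    if not children and not elem.attrib:
        return head
    return [head] + children


if __name__ == "__main__":
    print(to_lxml(ET.fromstring('<p class="x">hi <b>there</b><br/></p>')))
```

The point is only that code consuming this nested-list shape does not care which parser produced it, which is why the swap needs few source changes.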
> have you evaluated relative speed? is it tested on any of the standard test
> suites? how actively is it supported?
>
In terms of relative speed, S-XML runs an order of magnitude faster.
There is a similar improvement in terms of memory usage.
The S-XML API (and code) appears cleaner than that for pxml.
That said, the level of XML support is less complete. Both pxml and
S-XML are non-validating. S-XML also ignores DTDs (meaning no XML
entity support) and doesn't support special tags such as processing
instructions. For example, the OASIS XML test files consist of: 98
positive tests for which pxml gives 3 false negatives, and S-XML 10; 249
negative tests for which pxml gives 18 false positives, and S-XML 147.
On James Clark's XML test files: pxml gives 1 false negative and 16
false positives (despite what it says in the documentation); S-XML gives
37 false negatives and 121 false positives. The cases where S-XML fails
to accept well-formed XML (false negatives) are due to: ignoring XML
entities (intended; 31 cases), failure to handle processing instructions
(3 cases), failure to handle whitespace within a closing tag (apparently
this is acceptable XML; 2 cases), and failure to handle non-ASCII tag
names (1 case).
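To make the terminology concrete, the counting above can be sketched as follows. This is a minimal Python illustration, not the script actually used for the figures; the function names is_accepted and tally are hypothetical, and Python's standard ElementTree parser stands in for the parser under test.

```python
import xml.etree.ElementTree as ET


def is_accepted(doc):
    """Return True if the stand-in parser accepts the document as well-formed."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


def tally(positive, negative, accepts=is_accepted):
    """Count conformance failures for a non-validating parser.

    positive: documents that are well-formed XML (should be accepted)
    negative: documents that are not well-formed (should be rejected)
    Returns (false_negatives, false_positives).
    """
    false_negatives = sum(1 for doc in positive if not accepts(doc))
    false_positives = sum(1 for doc in negative if accepts(doc))
    return false_negatives, false_positives


if __name__ == "__main__":
    positive = ["<a/>", "<a><b>t</b></a >"]  # second case: whitespace in a closing tag
    negative = ["<a>", "<a></b>"]
    print(tally(positive, negative))
```

A false negative is a well-formed document the parser rejects; a false positive is a malformed document it accepts. The S-XML breakdown on the James Clark files is consistent: 31 + 3 + 2 + 1 = 37 false negatives.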
As far as the LKB is concerned, I think the level of XML conformance
supplied by S-XML is adequate. The LKB needs an XML reader (and,
arguably, writer) in order to implement interfaces to the outside world;
currently these interfaces consist of the SMAF preprocessor interface,
an (R)MRS XML interface, and the SPPP preprocessor interface. We also
require the ability to process input derived from a general XML text.
But this functionality, arguably, does not belong in the LKB proper; it
should be done by a separate module which talks to the LKB via the SMAF
interface (e.g. as in the SciBorg project at Cambridge).
The effect of swapping pxml with S-XML on the CSLI test suite: identical
parse results, speed not significantly altered (0.8% improvement),
reduction in memory usage.
So far as support is concerned, S-XML is an open source project with a
low-volume mailing list. Email sent to the list appears to get a helpful
response. The last update to the source code was earlier this year. For
pxml, support presumably comes from Franz. According to the Franz
website, the last activity on this project was dated 2003.
There are alternatives (of varying quality) to these two XML
modules for Lisp; e.g. CL-XML, xmsns, Allegro's SAX XML parser, a Lisp
wrapper around the SAX-like expat parser, Closure XML, XMLisp, ... None
of these appears (at a cursory glance) to provide the LXML output format
that we currently get from our XML module.
What do people think? The bottom line is that I need a good XML module
that will run under SBCL.
- Ben