[lkb] SPPP
Ben Waldron
bmw20 at cl.cam.ac.uk
Mon Jul 31 19:20:39 CEST 2006
Hi Emily-
SPPP is not the only way to pass ambiguous input to the LKB. You can use
SMAF XML input for this purpose (amongst others). See SmafTop on the
wiki. Sample SMAF looks something like this:
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE smaf SYSTEM 'smaf.dtd'>
<smaf>
<text>The dog barks.</text>
<olac:olac xmlns:olac='http://www.language-archives.org/OLAC/1.0/'
xmlns='http://purl.org/dc/elements/1.1/'
xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
xsi:schemaLocation='http://www.language-archives.org/OLAC/1.0/
http://www.language-archives.org/OLAC/1.0/olac.xsd'>
<identifier>s3</identifier>
<creator>x-preprocessor 1.00</creator>
<created>17:05:49 7/31/2006 (UTC)</created>
</olac:olac>
<lattice init='v0' final='v3' cfrom='0' cto='14'>
<edge type='token' id='t1' cfrom='0' cto='3' source='v0' target='v1'>The</edge>
<edge type='token' id='t2' cfrom='4' cto='7' source='v1' target='v2'>dog</edge>
<edge type='token' id='t3' cfrom='8' cto='14' source='v2' target='v3'>barks.</edge>
</lattice>
</smaf>
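For illustration only, here is a minimal Python sketch of how an external script might generate a lattice shaped like the sample above from a plain string. The function names (smaf_from_text, smaf_to_string) are hypothetical and not part of the LKB; the sketch assumes whitespace tokenisation, emits one token edge per word, and leaves out the XML declaration, the DOCTYPE and the olac metadata block. Check the result against smaf.dtd and the SmafTop wiki page before relying on it.

```python
import xml.etree.ElementTree as ET


def smaf_from_text(text):
    """Build a linear SMAF-style lattice: one 'token' edge per whitespace-separated word.

    Hypothetical helper; element and attribute names are copied from the sample above.
    """
    smaf = ET.Element("smaf")
    ET.SubElement(smaf, "text").text = text

    # Locate each word in the original string to get its character offsets.
    tokens = []
    pos = 0
    for word in text.split():
        start = text.index(word, pos)
        end = start + len(word)
        tokens.append((start, end, word))
        pos = end

    lattice = ET.SubElement(smaf, "lattice", {
        "init": "v0",
        "final": "v%d" % len(tokens),
        "cfrom": "0",
        "cto": str(len(text)),
    })

    # Edge i runs from node v(i) to node v(i+1).
    for i, (start, end, word) in enumerate(tokens):
        edge = ET.SubElement(lattice, "edge", {
            "type": "token",
            "id": "t%d" % (i + 1),
            "cfrom": str(start),
            "cto": str(end),
            "source": "v%d" % i,
            "target": "v%d" % (i + 1),
        })
        edge.text = word
    return smaf


def smaf_to_string(smaf):
    """Serialise the element tree to a string (no XML declaration or DOCTYPE)."""
    return ET.tostring(smaf, encoding="unicode")


if __name__ == "__main__":
    print(smaf_to_string(smaf_from_text("The dog barks.")))
```

An ambiguous analysis would presumably need extra edges between the same pair of nodes, but that is an assumption on my part and is not covered by this sketch.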
You can parse such XML input directly inside the LKB, e.g.:
LKB(7): (parse "<?xml version='1.0' encoding='UTF-8'?><!DOCTYPE smaf SYSTEM 'smaf.dtd'><smaf>...</smaf>")
Alternatively, SMAF XML can be sent to the LKB over a socket; the reply
is XML containing RMRS analyses:
LKB(7): (run-parse-server)
;;; [entering PARSE server mode]
;;; starting server on port 9876
;;; waiting for connection
;;; [end each input segment with CONTROL-Q]
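A client for this server mode could look roughly like the Python sketch below. The only details taken from the transcript above are the port number (9876) and the CONTROL-Q terminator (byte 0x11). Everything else is an assumption: the function name send_segment is hypothetical, the host is assumed to be localhost, and since I have not described how the reply is delimited, the sketch simply reads until the server closes the connection or the timeout expires.

```python
import socket

# CONTROL-Q is ASCII 0x11; the server banner says each input segment ends with it.
CONTROL_Q = b"\x11"


def send_segment(xml_text, host="localhost", port=9876, timeout=10.0):
    """Send one SMAF XML segment and return whatever the server sends back.

    Hypothetical helper: reply framing is an assumption (read until the peer
    closes the connection or the timeout expires).
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(xml_text.encode("utf-8") + CONTROL_Q)
        chunks = []
        try:
            while True:
                data = sock.recv(4096)
                if not data:  # peer closed the connection
                    break
                chunks.append(data)
        except socket.timeout:
            pass  # keep whatever arrived before the timeout
    return b"".join(chunks).decode("utf-8", errors="replace")
```

If the server keeps the connection open for further segments, the timeout branch is what ends the read, so the timeout value may need tuning.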
- Ben
Emily M. Bender wrote:
> Hi,
>
> Some colleagues and I are trying to figure out SPPP, in the context of
> hooking up a morphological analyzer for Hebrew (U. Haifa/Shuly
> Wintner) to a Hebrew grammar based on the Matrix (UW/Margalit
> Zabludowski). We would like to preserve the ambiguity found by the
> morphological analyzer and let the (syntactic) grammar disambiguate as
> it can. Thus, SPPP looks like it might be the right way to go.
>
> However, from the wiki page (LkbSppp), it seems like SPPP is expecting
> the external binary to be able to produce output that corresponds to
> LKB rules rather than (say) a lattice of segmentations of the input
> form. Is the general idea to take such a form-based lattice and then
> (with another, still external, script) convert it to SPPP XML with
> the LKB rule identifiers projected from the forms themselves? Does
> anyone have any such XML-generating code lying around that we could
> work from? Is SPPP the only way to pass ambiguous morphological output
> to the LKB?
>
> Thanks,
> Emily