[lkb] SPPP

Ben Waldron bmw20 at cl.cam.ac.uk
Mon Jul 31 19:20:39 CEST 2006


Hi Emily-

SPPP is not the only way to pass ambiguous input to the LKB. You can also
use SMAF XML input for this purpose (among others); see the SmafTop page
on the wiki. A sample SMAF document looks something like this:

<?xml version='1.0' encoding='UTF-8'?>
 <!DOCTYPE smaf SYSTEM 'smaf.dtd'>
 <smaf>
  <text>The dog barks.</text>
  <olac:olac xmlns:olac='http://www.language-archives.org/OLAC/1.0/'
             xmlns='http://purl.org/dc/elements/1.1/'
             xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
             xsi:schemaLocation='http://www.language-archives.org/OLAC/1.0/ http://www.language-archives.org/OLAC/1.0/olac.xsd'>
   <dc:identifier>s3</dc:identifier>
   <creator>x-preprocessor 1.00</creator>
   <created>17:05:49 7/31/2006 (UTC)</created>
  </olac:olac>
  <lattice init='v0' final='v3' cfrom='0' cto='14'>
   <edge type='token' id='t1' cfrom='0' cto='3' source='v0' target='v1'>The</edge>
   <edge type='token' id='t2' cfrom='4' cto='7' source='v1' target='v2'>dog</edge>
   <edge type='token' id='t3' cfrom='8' cto='14' source='v2' target='v3'>barks.</edge>
  </lattice>
 </smaf>
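
Since the question was about XML-generating code, here is a minimal Python sketch that builds a document of the same shape as the sample. The element and attribute names are copied from the sample; everything else is an assumption on my part: the function name build_smaf is hypothetical, the olac metadata block is simply left out (I have not checked whether it is required), and the idea that ambiguity is expressed as alternative paths through the lattice should be checked against the SmafTop page. The "cannot" input is only an illustration.

```python
import xml.etree.ElementTree as ET

# Declaration and DOCTYPE lines copied from the sample document.
HEADER = ("<?xml version='1.0' encoding='UTF-8'?>\n"
          "<!DOCTYPE smaf SYSTEM 'smaf.dtd'>\n")


def build_smaf(text, init, final, edges):
    """Serialise a token lattice as SMAF-style XML (hypothetical helper).

    edges is a list of (cfrom, cto, source, target, surface) tuples.
    """
    smaf = ET.Element("smaf")
    ET.SubElement(smaf, "text").text = text
    lattice = ET.SubElement(smaf, "lattice", {
        "init": init, "final": final,
        "cfrom": "0", "cto": str(len(text)),
    })
    for n, (cfrom, cto, source, target, surface) in enumerate(edges, start=1):
        edge = ET.SubElement(lattice, "edge", {
            "type": "token", "id": f"t{n}",
            "cfrom": str(cfrom), "cto": str(cto),
            "source": source, "target": target,
        })
        edge.text = surface
    return HEADER + ET.tostring(smaf, encoding="unicode")


# Assumed encoding of ambiguity: "cannot" either as one token (v0 to v2)
# or as "can" + "not" (v0 to v1 to v2), i.e. two paths through the lattice.
doc = build_smaf("cannot", "v0", "v2", [
    (0, 6, "v0", "v2", "cannot"),
    (0, 3, "v0", "v1", "can"),
    (3, 6, "v1", "v2", "not"),
])
print(doc)
```

Note that ElementTree writes attribute values in double quotes, whereas the sample uses single quotes; both are valid XML.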

You can parse such XML input directly inside the LKB, e.g.:

LKB(7): (parse "<?xml version='1.0' encoding='UTF-8'?><!DOCTYPE smaf 
SYSTEM 'smaf.dtd'><smaf>...</smaf>")
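
If the XML is generated by a tool that writes double-quoted attributes, it cannot be pasted into the Lisp string unchanged, because backslash and double quote have to be escaped inside a Common Lisp string literal. A small sketch of that wrapping step follows; the helper name lisp_parse_form is hypothetical, and the input in the example is a shortened placeholder rather than a complete SMAF document.

```python
def lisp_parse_form(xml):
    """Wrap an XML string in a (parse "...") form for the LKB prompt."""
    # Escape backslashes first, then double quotes, so that the escapes
    # added for the quotes are not themselves doubled.
    escaped = xml.replace("\\", "\\\\").replace('"', '\\"')
    return f'(parse "{escaped}")'


# Placeholder input, not a complete SMAF document.
print(lisp_parse_form('<smaf><text>The dog barks.</text></smaf>'))
```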

Alternatively, you can send SMAF XML to the LKB over a socket (you get
back XML containing RMRS analyses):

LKB(7): (run-parse-server)
;;; [entering PARSE server mode]
;;; starting server on port 9876
;;; waiting for connection
;;; [end each input segment with CONTROL-Q]
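
A client for this could look roughly like the Python sketch below. The port number and the CONTROL-Q terminator come from the server messages above; the host name, the function names frame and send_segment, and above all the way the reply is read (until the server closes the connection or a timeout expires) are assumptions, since the transcript does not say how the end of a reply is marked.

```python
import socket

CONTROL_Q = b"\x11"  # ASCII DC1, the byte a terminal sends for CONTROL-Q


def frame(smaf_xml):
    """Encode one input segment and append the CONTROL-Q terminator."""
    return smaf_xml.encode("utf-8") + CONTROL_Q


def send_segment(smaf_xml, host="localhost", port=9876, timeout=30.0):
    """Send one SMAF document and return whatever the server sends back.

    Assumption: the reply is complete once the server closes the
    connection or stays silent for `timeout` seconds.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(frame(smaf_xml))
        while True:
            try:
                data = sock.recv(4096)
            except socket.timeout:
                break  # no more data within the timeout
            if not data:
                break  # server closed the connection
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

With the server running, send_segment(xml) would be called with the text of a SMAF document; if the server keeps the connection open between segments, the read loop would need a different stopping condition.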

- Ben

Emily M. Bender wrote:
> Hi,
>
> Some colleagues and I are trying to figure out SPPP, in the context of
> hooking up a morphological analyzer for Hebrew (U. Haifa/Shuly
> Wintner) to a Hebrew grammar based on the Matrix (UW/Margalit
> Zabludowski).  We would like to preserve the ambiguity found by the
> morphological analyzer and let the (syntactic) grammar disambiguate as
> it can.  Thus, SPPP looks like it might be the right way to go.
>
> However, from the wiki page (LkbSppp), it seems like SPPP is expecting
> the external binary to be able to produce output that corresponds to
> LKB rules rather than (say) a lattice of segmentations of the input
> form.  Is the general idea to take such a form-based lattice and then
> (with another, still external, script) change them to SPPP XML with
> the LKB rule identifiers projected from the forms themselves?  Does
> anyone have any such XML-generating code lying around that we could
> work from?  Is SPPP the only way to pass ambiguous morphological output
> to the LKB?
>
> Thanks,
> Emily