[developers] Partial Parsing

Francis Bond bond at ieee.org
Sat Dec 10 08:57:45 CET 2011


Thanks for the useful explanation.

2011/12/9 Yi Zhang <yizhang at dfki.de>:
> Hi Francis,
>
> As the various robust/partial parsing methods seem to have created a lot of confusion over the years, let me try to explain the currently available options in this mail, since I think I am guilty of creating most of the confusion by not properly documenting the functionality.
>
>> (a) What is the recommended way of calling this? My best guess is
>> 'cheap -robust=1 -mrs -partial japanese'. I am loading a generative
>> model (gm := "tc.gm".), and I am not sure whether this model is the
>> right one for robust=1.
>>
> First, -partial and -robust are two different ways of building partial analyses in cases where a full HPSG analysis is not available. -partial tries to paste together fragmented analyses without trying to build a complete derivation tree; -robust, on the other hand, consults a PCFG to build a complete pseudo-derivation according to the probabilistic model. Therefore, the -partial and -robust options should NOT be used together.

When I tried just "cheap -robust=1 -mrs japanese", I got no output for
my ungrammatical sentence. Is that the expected behaviour?
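
In any case, if I have understood you correctly, the two modes are
meant to be invoked separately, along these lines (I am still not sure
which statistical model each one expects):

    cheap -partial -mrs japanese      (paste fragments together; no complete derivation)
    cheap -robust=1 -mrs japanese     (PCFG-based pseudo-derivation)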

>
>> (b) When we turn partial and MRS on, what exactly gets output? I was
>> expecting MRSs from edges that combine to form a spanning non-parse,
>> but it seems that we get more than that (e.g. some words appear more
>> than once in the output). I append an example to this mail. Is this
>> the expected result? Is there any combination of settings that will
>> give a single spanning non-parse (I am sure there is a better name
>> for it)?
>>
> The current implementation of the -partial mode in the main branch of PET uses a shortest-path algorithm which finds ALL paths (of fragments covering the complete sentence) with equal weight. Each fragment of every path then goes through MRS extraction to build fragmented MRSes. Since ALL paths are returned, duplicated words at certain spanning positions will show up frequently.
>
> As an alternative to this shortest-path algorithm, our (Zhang et al. 2007b) paper (ACL workshop on deep linguistic processing) discussed various ways of getting the one best partial parse (path). Several models were implemented at the time, but they were not merged back into the main branch for lack of feedback from actual usage of such a mode. Much later, in April 2010, I added a simplified version of such an algorithm for Bart's experiment to the then `nep' branch of PET (r704). This was not picked up later when the merge happened.
>
> I suppose it is still trivial to apply the changes of revision r704 to the main branch (with some minor modifications); then you will get non-duplicated partial parsing results just by using the -mrs -partial options. Moreover, once we get a feeling for the actual usage of the -partial mode, we can probably change the default behavior to, say, printing only one fragment path instead of all of them?

That would be very much the desired behaviour for Petter, Lea and me.
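
Just to check that I understand the current all-paths behaviour, here
is a toy sketch in Python of enumerating every minimal-weight fragment
path over a chart (the function name and the edge data are made up;
this is not PET's actual code), which would explain the duplicated
words we are seeing:

    # Toy illustration of the behaviour described above: given fragment
    # edges over chart vertices 0..n, enumerate ALL fragment paths from
    # 0 to n that share the minimal total weight.
    from collections import defaultdict

    def all_shortest_fragment_paths(edges, n):
        out = defaultdict(list)            # vertex -> outgoing fragment edges
        for start, end, label, cost in edges:
            out[start].append((end, label, cost))
        best = [float("inf")] * (n + 1)    # best[v]: cheapest cost from v to n
        best[n] = 0
        for v in range(n - 1, -1, -1):     # chart vertices are topologically ordered
            for end, _, cost in out[v]:
                best[v] = min(best[v], cost + best[end])

        def expand(v):                     # enumerate every optimal path from v
            if v == n:
                return [[]]
            paths = []
            for end, label, cost in out[v]:
                if cost + best[end] == best[v]:
                    paths += [[label] + rest for rest in expand(end)]
            return paths

        return expand(0)

    # Two equally cheap ways to cover positions 0..3, so BOTH paths come
    # back, and the same words appear more than once across the output:
    # [['inu', 'ga hoeta'], ['inu ga', 'hoeta']]
    edges = [(0, 1, "inu", 1), (1, 3, "ga hoeta", 1),
             (0, 2, "inu ga", 1), (2, 3, "hoeta", 1)]
    print(all_shortest_fragment_paths(edges, 3))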

>> (c) Has anyone anywhere done something that stitches the MRS fragments
>> together? For our purposes almost anything would do (e.g.
>> lump(e,h1,h2)), with as many lump_rels as needed to fix the bits
>> together. This would allow us to pass the fragments to the MT
>> module and get candidate fragment translations for alignment.
>>
> One thing to be careful about when stitching the MRS fragments is that, strictly speaking, each fragment has its own variable namespace, i.e. variables with the same name in different fragments are not meant to be coindexed.

Thanks for pointing that out.
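
For the record, the bookkeeping we have in mind would look roughly like
the sketch below (Python, with invented data structures and helper
names; this is not PET or LKB code): rename each fragment's variables
into a fresh namespace first, then add one lump_rel per pair of
adjacent fragments.

    # Sketch only: a fragment here is a dict with a top handle and a
    # list of relations; real MRS objects are richer, but the renaming
    # idea is the same.
    def rename(fragment, suffix):
        # Give every variable a fragment-unique suffix, so that 'h1' in
        # fragment 0 and 'h1' in fragment 1 stay distinct.
        r = lambda v: f"{v}_{suffix}"
        return {"top": r(fragment["top"]),
                "rels": [(pred, [r(a) for a in args])
                         for pred, args in fragment["rels"]]}

    def stitch(fragments):
        # Build one relation list covering all fragments, glued together
        # by lump_rels.
        frags = [rename(f, i) for i, f in enumerate(fragments)]
        rels = [rel for f in frags for rel in f["rels"]]
        for i, (left, right) in enumerate(zip(frags, frags[1:])):
            # lump(e,h1,h2) in the notation above: a dummy relation
            # whose only job is to tie two otherwise unrelated fragment
            # tops together.
            rels.append(("lump_rel",
                         [f"e_lump{i}", left["top"], right["top"]]))
        return rels

    # Two fragments that both use 'h1' internally: after renaming they
    # no longer collide, and one lump_rel connects their top handles.
    f1 = {"top": "h1", "rels": [("_inu_n_rel", ["h1", "x1"])]}
    f2 = {"top": "h1", "rels": [("_hoeru_v_rel", ["h1", "e1", "x1"])]}
    for rel in stitch([f1, f2]):
        print(rel)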

> I hope this explanation helped.

It did indeed.

-- 
Francis Bond <http://www3.ntu.edu.sg/home/fcbond/>
Division of Linguistics and Multilingual Studies
Nanyang Technological University



