[developers] Partial Parsing
oe at ifi.uio.no
Sat Dec 10 15:31:43 CET 2011
if you were to use -robust mode (which i believe is quite experimental), you would need to load a PCFG using the ‘pcfg’ setting (the ‘gm’ setting only applies to chart pruning).
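purely as a hedged sketch of what that might look like, by analogy with the `gm := "tc.gm".` entry francis quotes below — the file name `japanese.pcfg` and the exact setting syntax are assumptions on my part, not something i have checked against the settings code:

```
;; hypothetical settings-file fragment: load a PCFG model for -robust mode
;; in addition to the generative model used for chart pruning.
pcfg := "japanese.pcfg".
gm := "tc.gm".
```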
however, rather than trying to stitch together fragments procedurally, i'm wondering whether you wouldn't be better off using explicit robustness rules, as suggested by Cramer (2011)? this would presuppose you use chart pruning, i think, but for all i know no special code or additional heuristics would be needed, and you can experiment very flexibly. this approach gives the grammarian great power (and great responsibility :-). it would also avoid the issue of variable duplicates pointed out by yi.
On 10. des. 2011, at 08:57, Francis Bond <bond at ieee.org> wrote:
> Thanks for the useful explanation.
> 2011/12/9 Yi Zhang <yizhang at dfki.de>:
>> hi Francis,
>> as the various robust/partial parsing methods seem to have caused a lot of confusion over the years, let me try to explain the currently available options in this mail, for i think i'm guilty of creating most of the confusion by not properly documenting the functionality.
>>> (a) what is the recommended way of calling this. My best guess is
>>> 'cheap -robust=1 -mrs -partial japanese' i am loading a generative
>>> model "gm := "tc.gm"." I am not sure if this model is the right one
>>> for robust=1.
>> first, -partial and -robust are two different ways of building partial analyses in cases where a full HPSG analysis is not available. -partial tries to paste together fragmented analyses without trying to build a complete derivation tree. -robust, on the other hand, consults a PCFG to build a complete pseudo-derivation according to the probabilistic model. therefore the -partial and -robust options should NOT be used together.
> When I tried just "cheap -robust=1 -mrs japanese" I got no output for
> my ungrammatical sentence. Is that the expected behaviour?
>>> (b) When we turn partial and MRS on, what exactly gets output? I was
>>> expecting MRSs from edges that combine to form a spanning non-parse,
>>> but it seems that we get more than that (e.g. some words appear more
>>> than once in the output). I append an example to this mail. Is this
>>> the expected result? Is there any combinations of settings that will
>>> get a single spanning non-parse (I am sure there is a better name for
>> the current implementation of the -partial mode in the main branch of PET uses a shortest-path algorithm which finds ALL paths of fragments covering the complete sentence with equal (minimal) weight. each fragment of every path then goes through MRS extraction to build fragmented MRSes. since ALL paths are returned, duplicated words over certain spanning positions will happen frequently.
>> as an alternative to this shortest-path algorithm, our (Zhang et al. 2007b) paper (ACL workshop on deep linguistic processing) discussed various ways of getting the one best partial parse (path). several models were implemented then, but they were not merged back into the main branch for lack of feedback from actual usage of the mode. much later, in April 2010, i added a simplified version of such an algorithm for Bart's experiment to the then `nep' branch of PET (r704). this was not picked up when the merge happened later.
>> i suppose it is still trivial to apply the changes of revision r704 to the main branch (with some minor modifications); then you will get non-duplicated partial parsing results just by using the -mrs -partial options. moreover, once we get a feeling for the actual usage of the -partial mode, we could probably change the default behavior to, say, printing only one fragment path instead of all of them?
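to make the shortest-path idea above concrete, here is a small stand-alone sketch — emphatically not PET's actual code; the `(start, end, label)` edge tuples and the uniform per-fragment weight are simplifying assumptions — showing why equal-weight paths, and hence duplicated words, can come out of the all-paths variant:

```python
# illustrative sketch of the -partial shortest-path idea described above.
# edges are hypothetical chart items (start, end, label); the "weight" of a
# path is simply its number of fragments, a stand-in for whatever scores a
# real chart would use.
from collections import defaultdict

def all_shortest_fragment_paths(edges, n):
    """return every minimum-weight sequence of edges covering positions 0..n."""
    best = {0: 0}                   # best[i] = fewest fragments to reach i
    paths = defaultdict(list)       # i -> all minimal paths reaching i
    paths[0] = [[]]
    for i in range(n + 1):          # edges only span forward, so a single
        if i not in best:           # left-to-right pass suffices
            continue
        for (s, e, lab) in edges:
            if s != i:
                continue
            cost = best[i] + 1
            if e not in best or cost < best[e]:
                best[e] = cost
                paths[e] = [p + [(s, e, lab)] for p in paths[i]]
            elif cost == best[e]:   # equal weight: keep ALL such paths,
                paths[e] += [p + [(s, e, lab)] for p in paths[i]]
    return paths.get(n, [])
```

printing only `paths[n][0]` instead of the whole list is, in essence, the one-best behavior discussed above.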
> That would be very much the desired behaviour for Petter, Lea and I.
>>> (c) has anyone anywhere done something that stitches the MRS fragments
>>> together? For our purposes almost anything would do (e.g.
>>> lump(e,h1,h2)) with as many lump_rels as needed to fix the bits
>>> together, ... this would allow us to pass the fragments to the MT
>>> module and get candidate fragment translations for alignment.
>> one thing to be careful about when stitching the MRS fragments is that, strictly speaking, each fragment has its own variable namespace, i.e. variables with the same name in different fragments are not meant to be coindexed.
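one cheap way to respect that caveat before stitching — a sketch only, not part of PET or any MRS library; real MRSes are structured objects rather than strings, and the `[exhip]<digits>` variable pattern is an assumption for this illustration — is to rename every variable with a per-fragment prefix first:

```python
import re

# hedged illustration: give each fragment's variables a private namespace,
# so that `e2' in fragment 0 can never be accidentally coindexed with `e2'
# in fragment 1 once the fragments are lumped together.
_VAR = re.compile(r"\b([exhip]\d+)\b")   # assumed MRS-style variable names

def rename_fragment_vars(fragments):
    """prefix every variable in fragment i with `f<i>_'."""
    return [_VAR.sub(lambda m, i=i: f"f{i}_{m.group(1)}", frag)
            for i, frag in enumerate(fragments)]
```

after such renaming, adding lump(e,h1,h2)-style relations of the kind Francis describes should be safe, since no accidental coindexation across fragments can occur.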
> Thanks for pointing that out.
>> i hope this explanation helped.
> It did indeed.
> Francis Bond <http://www3.ntu.edu.sg/home/fcbond/>
> Division of Linguistics and Multilingual Studies
> Nanyang Technological University