[developers] Partial Parsing

Sat Dec 10 16:28:39 CET 2011

it's been a little while since i read it, but i believe there is a fair
amount of technical detail in bart's dissertation, which is
available on-line:

  http://www.coli.uni-saarland.de/~bcramer/dissertation_cramer.pdf

bart obtained encouraging results.  i would be curious to
see how far you can push this.  but to get started, would
it not be relatively straightforward to add just one robust
rule, minimally constrain its daughters (to avoid spurious
ambiguity in robust trees, e.g. force strict left-branching),
and give it a high penalty (this is the part i would not know
immediately how to do; the brute-force approach would be
to add pseudo-features to both the PCFG and MEM, with
all combinations of the robust rule and other rules in its
RHS; but i believe there is a special setting for this).  in
essence, this would seem like a pretty good emulation
of the shortest path fragment heuristic, in the grammar.
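
The cost argument behind this suggestion can be illustrated with a small sketch. It assumes, hypothetically, that passive fragment edges are given as (start, end, label) triples and that every application of the single robust rule adds the same fixed penalty. It is not PET or grammar code. Because the rule only attaches a fragment to the right of what has been built so far (strict left-branching), each fragment sequence gets exactly one tree. A derivation with k fragments then costs (k - 1) times the penalty, so the cheapest robust derivation is the one with the fewest fragments, which is the shortest-path heuristic.

```python
from typing import List, Optional, Tuple

Edge = Tuple[int, int, str]  # (start, end, label) of a fragment spanning words start..end-1

def best_left_branching(n: int, edges: List[Edge],
                        penalty: float) -> Optional[Tuple[float, List[str]]]:
    """Cheapest strictly left-branching robust derivation over words 0..n-1.

    best[j] = (cost, fragment labels) for the cheapest derivation covering 0..j.
    Attaching one more fragment on the right costs one robust rule application.
    """
    best: List[Optional[Tuple[float, List[str]]]] = [None] * (n + 1)
    for j in range(1, n + 1):
        for s, e, label in edges:
            if e != j:
                continue
            if s == 0:
                cand = (0.0, [label])  # a single fragment: no robust rule needed
            elif best[s] is not None:
                cost, labels = best[s]
                cand = (cost + penalty, labels + [label])
            else:
                continue  # nothing covers 0..s, so this edge cannot be attached
            if best[j] is None or cand[0] < best[j][0]:
                best[j] = cand
    return best[n]

def bracket(labels: List[str]) -> str:
    """Render the left-branching tree, e.g. (robust (robust A B) C)."""
    tree = labels[0]
    for label in labels[1:]:
        tree = f"(robust {tree} {label})"
    return tree
```

With edges (0,1,D), (1,2,N), (0,2,NP), (2,3,V), (3,5,NP), (2,5,VP) over five words and a penalty of 10.0, the cheapest derivation is (10.0, ['NP', 'VP']), rendered as `(robust NP VP)`. In a real grammar the penalty would presumably be combined with the parse-ranking scores, so its size relative to those scores decides how closely it emulates the heuristic.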

for all i know, bart may still be lurking on the DELPH-IN
mailing lists and maybe willing to comment more?

cheers, oe

On Sat, Dec 10, 2011 at 15:50, Francis Bond <bond at ieee.org> wrote:
> G'day,
>
> On 10 December 2011 22:31, Stephan Oepen <oe at ifi.uio.no> wrote:
>> hi francis,
>>
>> if you were to use -robust mode (which i believe is quite experimental), you would need to load a PCFG using the ‘pcfg’ setting (the ‘gm’ setting only applies to chart pruning).
>
> Is it the same format as the 'gm' grammar?
>
>> however, rather than trying to stitch together fragments procedurally, i'm wondering whether you wouldn't be better off using explicit robustness rules, as suggested by Cramer (2011)?  this would presuppose you use chart pruning, i think, but for all i know no special code or additional heuristics would be needed, and you can experiment very flexibly.  this approach gives the grammarian great power (and great responsibility :-).  it would also avoid the issue of variable duplicates pointed out by yi.
>
> Thanks for the suggestion.  I would be happy to try that, although I
> haven't found any documentation that leaves me confident enough to
> actually write robustness rules and set the robustness penalty 'c'.
>
>> best, oe
>>
>>
>>
>> On 10. des. 2011, at 08:57, Francis Bond <bond at ieee.org> wrote:
>>
>>> Thanks for the useful explanation.
>>>
>>> 2011/12/9 Yi Zhang <yizhang at dfki.de>:
>>>> hi Francis,
>>>>
>>>> as various robust/partial parsing methods seem to have created a lot of confusion over the years, let me try to explain the currently available options in this mail, since i think i'm guilty of creating most of the confusion by not properly documenting the functionality.
>>>>
>>>>> (a) what is the recommended way of calling this?  My best guess is
>>>>> 'cheap -robust=1 -mrs -partial japanese'.  I am loading a generative
>>>>> model (gm := "tc.gm".), but I am not sure if this model is the right one
>>>>> for robust=1.
>>>>>
>>>> first, -partial and -robust are two different ways of building partial analyses in cases where a full hpsg analysis is not available. -partial tries to paste together fragmented analyses without trying to build a complete derivation tree. -robust, on the other hand, consults a PCFG to build a complete pseudo-derivation according to the probabilistic model. therefore the -partial and -robust options should NOT be used together.
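
To make the contrast concrete, the following toy sketch shows what building a complete pseudo-derivation from a PCFG amounts to in the simplest case: Viterbi CKY over a hypothetical grammar in Chomsky normal form. The grammar, its categories and its probabilities are invented for illustration, and the leaves here are words rather than HPSG edges; how PET's -robust mode actually represents and searches its PCFG is not shown.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy PCFG in Chomsky normal form (not a PET file format)
BINARY: Dict[Tuple[str, str], List[Tuple[str, float]]] = {
    ("NP", "VP"): [("S", 0.9)],
    ("V", "NP"): [("VP", 0.8)],
    ("D", "N"): [("NP", 0.7)],
}
LEXICAL: Dict[str, List[Tuple[str, float]]] = {
    "the": [("D", 1.0)],
    "dog": [("N", 0.6), ("NP", 0.1)],
    "cat": [("N", 0.4), ("NP", 0.1)],
    "saw": [("V", 1.0)],
}

def viterbi_cky(words: List[str], goal: str = "S"):
    """Return (log-probability, tree) of the best goal analysis, or None."""
    n = len(words)
    # chart[i][j] maps a category to (log-prob, tree) for the best analysis of words[i:j]
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for cat, p in LEXICAL.get(w, []):
            chart[i][i + 1][cat] = (math.log(p), (cat, w))
    for span in range(2, n + 1):
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # every split point
                for lc, (lp, lt) in chart[i][k].items():
                    for rc, (rp, rt) in chart[k][j].items():
                        for parent, p in BINARY.get((lc, rc), []):
                            score = lp + rp + math.log(p)
                            # keep only the most probable analysis per category and span
                            if parent not in chart[i][j] or score > chart[i][j][parent][0]:
                                chart[i][j][parent] = (score, (parent, lt, rt))
    return chart[0][n].get(goal)
```

For "the dog saw the cat" this returns the single S tree with probability 0.9 · 0.42 · 0.224. The point of the contrast is that the result is always one complete tree, whereas -partial returns disconnected fragments.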
>>>
>>> When I tried just "cheap -robust=1 -mrs japanese" I got no output for
>>> my ungrammatical sentence.   Is that the expected behaviour?
>>>
>>>>
>>>>> (b) When we turn partial and MRS on, what exactly gets output?  I was
>>>>> expecting MRSs from edges that combine to form a spanning non-parse,
>>>>> but it seems that we get more than that (e.g. some words appear more
>>>>> than once in the output).  I append an example to this mail.  Is this
>>>>> the expected result?  Is there any combination of settings that will
>>>>> get a single spanning non-parse (I am sure there is a better name for
>>>>> it).
>>>>>
>>>> the current implementation of the -partial mode in the main branch of PET uses a shortest-path algorithm which finds ALL equally weighted paths (sequences of fragments covering the complete sentence). each fragment of every path then goes through MRS extraction to build fragmented MRSes. since ALL paths are returned, the same words will frequently appear more than once in the output, once for every path that covers them.
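
The source of the duplicates can be reproduced in a few lines. The sketch below uses hypothetical names and is not PET's implementation: it treats word boundaries as vertices and fragments as unit-weight edges, enumerates every path from 0 to n that uses the fewest fragments, and then counts how often each word is covered by the fragments of all returned paths.

```python
from typing import List, Tuple

Edge = Tuple[int, int, str]  # (start, end, label); every fragment has the same weight

def all_shortest_paths(n: int, edges: List[Edge]) -> List[List[Edge]]:
    """All paths from boundary 0 to boundary n that use the fewest fragments."""
    INF = float("inf")
    rest = [INF] * (n + 1)  # rest[v]: fewest fragments needed to cover words v..n-1
    rest[n] = 0
    for v in range(n - 1, -1, -1):  # fragments point rightwards, so fill right to left
        for s, t, _ in edges:
            if s == v and rest[t] + 1 < rest[v]:
                rest[v] = rest[t] + 1
    paths: List[List[Edge]] = []

    def walk(v: int, prefix: List[Edge]) -> None:
        if v == n:
            paths.append(prefix)
            return
        for edge in edges:
            # follow only fragments that stay on some shortest path
            if edge[0] == v and rest[edge[1]] + 1 == rest[v]:
                walk(edge[1], prefix + [edge])

    if rest[0] < INF:
        walk(0, [])
    return paths

def coverage(paths: List[List[Edge]], n: int) -> List[int]:
    """How many returned fragments cover each word position."""
    counts = [0] * n
    for path in paths:
        for s, t, _ in path:
            for i in range(s, t):
                counts[i] += 1
    return counts
```

With edges (0,2,F1), (2,4,F2), (0,1,F3), (1,4,F4), (1,2,F5) over four words, the paths [F1, F2] and [F3, F4] tie at two fragments. Both are returned, and `coverage` gives [2, 2, 2, 2], so every word shows up twice in the output.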
>>>>
>>>> as an alternative to this shortest-path algorithm, in our paper (Zhang et al. 2007b; ACL workshop on deep linguistic processing) we discussed various ways of getting the single best partial parse (path). several models were implemented at the time, but they were not merged back into the main branch for lack of feedback from actual usage of such a mode. much later, in April 2010, i added a simplified version of such an algorithm for Bart's experiment into what was then the `nep' branch of PET (r704). this was not picked up when the merge later happened.
>>>>
>>>> i suppose it is still trivial to apply the changes of revision r704 to the main branch (with some minor modifications); you would then get non-duplicated partial parsing results by just using the -mrs -partial options. moreover, if we can get a feeling for the actual usage of the -partial mode, we could probably change the default behavior to, say, print only one fragment path instead of all of them?
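
One way to return a single fragment path is a left-to-right dynamic program that keeps only one best path per position. In this sketch the fragment scores and the tie-breaking rule (fewest fragments first, then highest total score) are assumptions chosen for illustration. It is neither the algorithm of revision r704 nor one of the models from Zhang et al. (2007b).

```python
from typing import List, Optional, Tuple

Fragment = Tuple[int, int, str, float]  # (start, end, label, score); scores are hypothetical

def one_best_path(n: int, fragments: List[Fragment]) -> Optional[List[Fragment]]:
    """A single path covering words 0..n-1: fewest fragments, ties broken by total score."""
    # best[v] = (fragment count, negated total score, path) for the best cover of 0..v;
    # comparing (count, -score) tuples prefers fewer fragments, then the higher score
    best: List[Optional[Tuple[int, float, List[Fragment]]]] = [None] * (n + 1)
    best[0] = (0, 0.0, [])
    for v in range(1, n + 1):
        for frag in fragments:
            s, e, _, score = frag
            prev = best[s]
            if e != v or prev is None:
                continue
            cand = (prev[0] + 1, prev[1] - score, prev[2] + [frag])
            current = best[v]
            if current is None or cand[:2] < current[:2]:
                best[v] = cand
    final = best[n]
    return None if final is None else final[2]
```

On a four-word example with hypothetical scores 0.9 and 0.4 for F1 (0,2) and F2 (2,4), 0.2 and 0.8 for F3 (0,1) and F4 (1,4), and 0.5 for F5 (1,2), both two-fragment paths tie on length and [F1, F2] wins on total score, so only that one path would be printed.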
>>>
>>> That would be very much the desired behaviour for Petter, Lea and I.
>>>
>>>>> (c) has anyone anywhere done something that stitches the MRS fragments
>>>>> together?  For our purposes almost anything would do (e.g.
>>>>> lump(e,h1,h2)) with as many lump_rels as needed to fix the bits
>>>>> together, ...  this would allow us to pass the fragments to the MT
>>>>> module and get candidate fragment translations for alignment.
>>>>>
>>>> one thing to be careful about when stitching the MRS fragments is that, strictly speaking, each fragment has its own variable namespace, i.e. variables with the same name in different fragments are not meant to be coindexed.
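
A minimal sketch of such stitching renames every variable to a fresh, fragment-specific name before gluing fragments together, so that a shared name like x4 cannot accidentally coindex two fragments. The MRS representation is deliberately simplified (no HCONS, handles used directly as arguments), and `lump_rel` is only the placeholder relation lump(e,h1,h2) suggested above, not a predicate of any existing grammar.

```python
import itertools
import re
from dataclasses import dataclass, field
from typing import Dict, Iterator, List

VAR = re.compile(r"^([a-z])(\d+)$")  # sort letter + integer id, e.g. x4, h1, e2

@dataclass
class EP:
    pred: str
    label: str
    args: Dict[str, str]

@dataclass
class Fragment:
    top: str
    index: str
    eps: List[EP] = field(default_factory=list)

def rename(v: str, mapping: Dict[str, str], counter: Iterator[int]) -> str:
    """Map a variable to a fresh name; non-variables (e.g. constants) pass through."""
    m = VAR.match(v)
    if not m:
        return v
    if v not in mapping:
        mapping[v] = f"{m.group(1)}{next(counter)}"
    return mapping[v]

def stitch(fragments: List[Fragment]) -> Fragment:
    if not fragments:
        raise ValueError("need at least one fragment")
    counter = itertools.count(1)  # one global counter, so new names never collide
    renamed: List[Fragment] = []
    for frag in fragments:
        mapping: Dict[str, str] = {}  # fresh mapping per fragment = separate namespace
        eps = [EP(ep.pred, rename(ep.label, mapping, counter),
                  {role: rename(a, mapping, counter) for role, a in ep.args.items()})
               for ep in frag.eps]
        renamed.append(Fragment(rename(frag.top, mapping, counter),
                                rename(frag.index, mapping, counter), eps))
    result = renamed[0]
    for nxt in renamed[1:]:
        # glue the fragment built so far to the next one with the placeholder lump_rel
        h, e = f"h{next(counter)}", f"e{next(counter)}"
        lump = EP("lump_rel", h, {"ARG0": e, "ARG1": result.top, "ARG2": nxt.top})
        result = Fragment(h, e, result.eps + nxt.eps + [lump])
    return result
```

Stitching a fragment whose only EP is _dog_n_1 with ARG0 x4 and a fragment whose EP _bark_v_1 has ARG1 x4 renames the two x4 variables to x2 and x5 respectively, so they stay distinct, and adds lump_rel(e7, h1, h3) labelled h6 as the new top.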
>>>
>>> Thanks for pointing that out.
>>>
>>>> i hope this explanation helped.
>>>
>>> It did indeed.
>>>
>>> --
>>> Francis Bond <http://www3.ntu.edu.sg/home/fcbond/>
>>> Division of Linguistics and Multilingual Studies
>>> Nanyang Technological University
>>>
>

-- 
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ Universitetet i Oslo (IFI); Boks 1080 Blindern; 0316 Oslo; (+47) 2284 0125
+++    --- oe at ifi.uio.no; stephan at oepen.net; http://www.emmtee.net/oe/ ---
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++