[itsdb] parse big corpus with itsdb

Berthold Crysmann crysmann at dfki.de
Thu Nov 9 09:40:41 CET 2006


On Thu, 2006-11-09 at 17:39 +1100, David Martinez wrote:

>  	Hi,
> 
>  	I have been playing with the parameters "nsolutions", "results", 
> and "packing" and I can't get the output I need: a single export tree for 
> each parsed sentence.

I would guess you need -results=1, -packing=15 and -nsolutions >>0 if you
need selective unpacking. Please correct me if I'm wrong.

Otherwise, -packing=7 -results=1 should always give you the globally
optimal candidate. That's what we've been using here so far. See also
Francis's synopsis. 
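
To make the interaction of the three options concrete, here is a toy
Python model of Yi's description quoted below. It is only a sketch under
stated assumptions, not PET code: the function name and the flat list of
scores are hypothetical, "higher score is better" and "-results=0 means
no limit" are assumptions, and the effect of -nsolutions on forest
creation (the first phase) is deliberately left out.

```python
from typing import List

EXHAUSTIVE = 7   # -packing=7: exhaustive unpacking, as described in the thread
SELECTIVE = 15   # -packing=15: selective unpacking, as described in the thread


def printed_readings(scores: List[float], packing: int,
                     nsolutions: int = 0, results: int = 0) -> List[float]:
    """Toy model of which reading scores end up in the output.

    `scores` stands in for the readings contained in the packed forest
    (hypothetical input; a real forest is not a flat list).
    """
    # Assumption: a higher score means a better reading.
    ranked = sorted(scores, reverse=True)

    if packing == SELECTIVE and nsolutions > 0:
        # Selective unpacking: only the best n readings are unpacked.
        unpacked = ranked[:nsolutions]
    else:
        # Exhaustive unpacking, which is also the fallback when
        # nsolutions is 0: every reading is unpacked and sorted.
        unpacked = ranked

    # -results only limits what is printed; treating 0 as "no limit"
    # is an assumption of this sketch.
    return unpacked[:results] if results > 0 else unpacked


if __name__ == "__main__":
    scores = [0.2, 0.9, 0.5, 0.7]
    print(printed_readings(scores, EXHAUSTIVE, results=1))
    print(printed_readings(scores, SELECTIVE, nsolutions=2, results=1))
    print(printed_readings(scores, SELECTIVE, nsolutions=0))
```

In this model both recommended settings print the single best reading.
The difference is only in how many readings get unpacked before the
output is cut down to one.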

B


>  	When using nsolutions=1 and packing=15 in most cases I obtain a 
> set of "inactive" trees from the sentence. Is there some way of choosing 
> the best candidate?
>  	Thank you,
> 
>  	David
> 
> 
> On Wed, 8 Nov 2006, Berthold Crysmann wrote:
> 
> > On Wed, 2006-11-08 at 10:57 +0100, Yi Zhang wrote:
> >> Hi,
> >>
> >>  Here is my understanding of the options:
> >>
> >>
> >>
> >>
> >>         -results sits in the output routine and stops it printing all
> >>         the results.  They are still all calculated.
> >>
> >> I think that's right.
> >>
> >>
> >>         -nsolutions asks cheap to only produce the top "n" parses.
> >>
> >> Due to the use of ambiguity packing, parsing is split into two
> >> phases: i) packed parse forest creation; ii) unpacking the readings.
> >> `-nsolutions' can have effect in both phases.
> >>
> >> In the first phase, if `-nsolutions' is set to a non-zero value, forest
> >> creation stops once the `first' n packed trees have been found (with
> >> a kind of beam search, I think). If `-nsolutions' is unset or set to
> >> zero, the entire packed parse forest is created.
> >>
> >> In the unpacking phase, the effect depends on the unpacking mechanism
> >> used:
> >> - if `-packing=7' (the default, exhaustive unpacking) is used, all
> >> the readings will be unpacked (with lots of unification operations
> >> replayed) and sorted according to the scoring model. `-nsolutions'
> >> has no effect on this phase, so you may end up with more readings
> >> than `-nsolutions' asks for.
> >
> >
> >
> >> - if `-packing=15' (selective unpacking) is used, only the best n
> >> readings will be unpacked from the parse forest. But note that
> >> `-nsolutions' must be set to >0; otherwise the parser will fall back
> >> to exhaustive unpacking, as with `-packing=7'. The current
> >> implementation supports the basic branching and grandparenting (with
> >> an arbitrary number of levels) features in the scoring model.
> >>
> >
> > Grandparenting in PET sounds like a great improvement. What is still
> > missing, as compared to lkb/tsdb++? N-grams?
> >
> > Thanks for the description.
> >
> >
> > B
> >
> >> I also think the use of `-nsolutions' is particularly vague at the
> >> moment. I believe this is partly due to the split between the parsing
> >> phases. To the PET developers: should the option be split into
> >> separate options for the individual phases of parsing?
> >>
> >> Stephan and Bernd, please correct me if I am wrong :-)
> >>
> >> Best,
> >> yi
> >
> 