[developers] [itsdb] parse big corpus with itsdb

Yi Zhang yzhang at coli.uni-sb.de
Thu Nov 9 09:40:15 CET 2006


hi,

can you be more specific about the phenomena (i.e. the sentences you are
parsing)? my guess is that `nsolutions=1' makes the parse forest creation
phase stop early, so some readings (possibly the best one) are missing
from then on. `-packing=15' differs from `-packing=7' only in the
unpacking phase. i doubt that the forest creation phase (with ambiguity
packing) is guaranteed to be best-first. if it is not, the best reading
could already be missing before unpacking. PET developers, please correct
me if i am wrong on this.

as i mentioned in my previous message, i would suggest splitting
`-nsolutions' into two PET options (one for forest creation, one for
selective unpacking) to make things less confusing. since this issue has
more to do with PET development than with itsdb, i'm cc-ing this to the
delph-in developers list.
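
a rough sketch of what such a split could look like, again as a toy
python model rather than PET code. the names `forest_limit' and
`unpack_limit' are hypothetical and do not exist in PET; they only stand
for the two separate limits suggested above.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    forest_limit: int = 0  # hypothetical option; 0 = build the whole forest
    unpack_limit: int = 0  # hypothetical option; 0 = unpack every reading


def parse(discovered, limits):
    """discovered: (name, score) pairs in forest-creation order (toy model)."""
    # Phase 1: optionally stop forest creation early.
    if limits.forest_limit > 0:
        forest = discovered[:limits.forest_limit]
    else:
        forest = list(discovered)
    # Phase 2: rank what was found, then optionally keep only the best n.
    ranked = sorted(forest, key=lambda r: r[1], reverse=True)
    if limits.unpack_limit > 0:
        return ranked[:limits.unpack_limit]
    return ranked


if __name__ == "__main__":
    readings = [("r1", 0.2), ("r2", 0.5), ("r3", 0.9)]
    # Full forest, but unpack only the single best reading.
    print(parse(readings, Limits(forest_limit=0, unpack_limit=1)))  # [('r3', 0.9)]
```

with two limits, a user who wants one export tree per sentence could ask
for the full forest and a single unpacked reading, without the first
setting silently affecting the second.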


best,
yi




On 11/9/06, David Martinez <davidm at csse.unimelb.edu.au> wrote:
>
>
>         Hi,
>
>         I have been playing with the parameters "nsolutions", "results",
> and "packing" and I can't get the output I need: a single export tree for
> each parsed sentence.
>         When using nsolutions=1 and packing=15 in most cases I obtain a
> set of "inactive" trees from the sentence. Is there some way of choosing
> the best candidate?
>         Thank you,
>
>         David
>
>
> On Wed, 8 Nov 2006, Berthold Crysmann wrote:
>
> > On Wed, 2006-11-08 at 10:57 +0100, Yi Zhang wrote:
> >> Hi,
> >>
> >>  Here is my understanding of the options:
> >>
> >>
> >>
> >>
> >>         -results sits in the output routine and stops it printing all
> >>         the results.  They are still all calculated.
> >>
> >> I think that's right.
> >>
> >>
> >>         -nsolutions asks cheap to only produce the top "n" parses.
> >>
> >> Due to the use of ambiguity packing, parsing is split into two
> >> phases: i) packed parse forest creation; ii) unpacking the readings.
> >> `-nsolutions' can have an effect in both phases.
> >>
> >> In the first phase, if `-nsolutions' is set to a non-zero value, the
> >> forest creation phase will stop when the `first' n packed trees are
> >> found (with a kind of beam search, I think). If `-nsolutions' is not
> >> set or is set to zero, the entire packed parse forest will be created.
> >>
> >> In the unpacking phase, the effect depends on the unpacking mechanism
> >> used:
> >> - if `packing=7' (which is the default exhaustive unpacking) is used,
> >> all the readings will be unpacked (with lots of unification operations
> >> replayed), and sorted according to the scoring model. `-nsolutions'
> >> won't have any effect on this phase. So you might finally get more
> >> readings than `-nsolutions'.
> >
> >
> >
> >> - if `packing=15' (selective unpacking) is used, only the best n
> >> readings will be unpacked from the parse forest. But note that
> >> `-nsolutions' must be set to >0, otherwise the parser will fall back
> >> to exhaustive unpacking like `-packing=7'. The current implementation
> >> supports the basic branching and grandparenting (with an arbitrary
> >> number of levels) features in the scoring model.
> >>
> >
> > Grandparenting in Pet sounds like a great improvement. What is still
> > missing, as compared to lkb/tsdb++? Ngrams?
> >
> > Thanks for the description.
> >
> >
> > B
> >
> >> I also think the use of `-nsolutions' is particularly vague at the
> >> moment. I believe this is partly due to the split of the parsing
> >> phases. To PET developers, should the option be split for
> >> particular phases of parsing?
> >>
> >> Stephan and Bernd, please correct me if I am wrong :-)
> >>
> >> Best,
> >> yi
> >
>