[itsdb] parse big corpus with itsdb
Berthold Crysmann
crysmann at dfki.de
Wed Nov 8 11:47:27 CET 2006
On Wed, 2006-11-08 at 10:57 +0100, Yi Zhang wrote:
> Hi,
>
> Here is my understanding of the options:
>
>
>
>
> -results sits in the output routine and stops it printing all the
> results. They are still all calculated.
>
> I think that's right.
>
>
> -nsolutions asks cheap to only produce the top "n" parses.
>
> Due to the use of ambiguity packing, parsing is split into two
> phases: i) packed parse forest creation; ii) unpacking the readings.
> `-nsolutions' can have an effect in both phases.
>
> In the first phase, if `-nsolutions' is set to a non-zero value, forest
> creation stops as soon as the `first' n packed trees are found (with a
> kind of beam search, I think). If `-nsolutions' is not set, or is set
> to zero, the entire packed parse forest is created.
>
> In the unpacking phase, the effect depends on the unpacking mechanism
> used:
> - if `packing=7' (exhaustive unpacking, the default) is used, all the
> readings will be unpacked (with lots of unification operations
> replayed) and sorted according to the scoring model. `-nsolutions'
> has no effect on this phase, so you may end up with more readings
> than `-nsolutions' specifies.
> - if `packing=15' (selective unpacking) is used, only the best n
> readings will be unpacked from the parse forest. But note that
> `-nsolutions' must be set to >0; otherwise the parser falls back to
> exhaustive unpacking, as with `-packing=7'. The current implementation
> supports the basic branching and grandparenting (with an arbitrary
> number of levels) features in the scoring model.
>
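As a minimal sketch of the behaviour described above (this is not PET
code; the function names are hypothetical, and only the two `packing'
values mentioned in this thread are covered), the interaction between
the two options could be modelled in Python like this:

```python
from typing import Optional


def builds_full_forest(nsolutions: int) -> bool:
    """Phase i: the whole packed forest is built only if nsolutions is 0/unset."""
    return nsolutions == 0


def unpacking_mode(packing: int, nsolutions: int) -> str:
    """Phase ii: which unpacking strategy applies, per the description above."""
    if packing not in (7, 15):
        # Other values are not discussed in the thread, so the sketch rejects them.
        raise ValueError("sketch only models packing=7 and packing=15")
    if packing == 15 and nsolutions > 0:
        return "selective"
    # packing=7, or packing=15 with nsolutions unset/zero (fallback case)
    return "exhaustive"


def reading_limit(packing: int, nsolutions: int) -> Optional[int]:
    """Upper bound on unpacked readings; None means 'whatever the forest holds'."""
    if unpacking_mode(packing, nsolutions) == "selective":
        return nsolutions
    return None
```

Under this reading, `packing=15' with `-nsolutions' left at zero behaves
like `packing=7', which is the fallback noted above.
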
Grandparenting in PET sounds like a great improvement. What is still
missing, as compared to lkb/tsdb++? N-grams?
Thanks for the description.
B
> I also think the use of `-nsolutions' is particularly vague at the
> moment. I believe this is partly due to the split of the parsing
> phases. To the PET developers: should the option be split into
> separate options for the particular phases of parsing?
>
> Stephan and Bernd, please correct me if I am wrong :-)
>
> Best,
> yi