[itsdb] parse big corpus with itsdb

Berthold Crysmann crysmann at dfki.de
Wed Nov 8 11:47:27 CET 2006


On Wed, 2006-11-08 at 10:57 +0100, Yi Zhang wrote:
> Hi,
> 
>  Here is my understanding of the options:
>
>         -results sits in the output routine and stops it from printing
>         all the results. They are still all calculated.
>
> I think that's right.
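As a toy illustration of the distinction described above (not PET's actual code), the sketch below separates computing the readings from printing them. The names `parse_all` and `output` and the `results` parameter are hypothetical stand-ins for the parser and for the described effect of `-results`.

```python
from typing import List, Optional

def parse_all(sentence: str) -> List[str]:
    # Hypothetical stand-in for the parser: every reading is computed.
    return [f"{sentence}: reading {i}" for i in range(5)]

def output(readings: List[str], results: Optional[int] = None) -> List[str]:
    # Mimics the described effect of -results: only what is printed is
    # truncated; the readings passed in were already fully computed.
    shown = readings if results is None else readings[:results]
    for reading in shown:
        print(reading)
    return shown

readings = parse_all("the dog barks")
shown = output(readings, results=2)  # prints 2 lines, but 5 readings exist
```

The point of the toy is that limiting output saves printing time only; the parsing cost has already been paid.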
> 
> 
>         -nsolutions asks cheap to produce only the top "n" parses.
> 
> Due to the use of ambiguity packing, parsing is split into two
> phases: (i) creating the packed parse forest; (ii) unpacking the
> readings. `-nsolutions' can take effect in both phases.
> 
> In the first phase, if `-nsolutions' is set to a non-zero value n,
> forest creation stops as soon as the `first' n packed trees are found
> (with some kind of beam search, I think). If `-nsolutions' is not set,
> or is set to zero, the entire packed parse forest is created.
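The early stop in phase (i) can be pictured roughly as follows. This is a hypothetical sketch, not PET's agenda or its beam search: the agenda is a plain list in a fixed order, a "ROOT" item stands in for one complete packed tree, and `build_forest` and `is_root` are made-up names.

```python
from typing import Callable, Iterable, List

def build_forest(agenda: Iterable[str],
                 is_root: Callable[[str], bool],
                 nsolutions: int = 0) -> List[str]:
    # Items are added to the chart in agenda order. With nsolutions > 0 we
    # stop once that many root (complete) items have been found; with
    # nsolutions == 0 the whole agenda is processed.
    chart: List[str] = []
    roots_found = 0
    for item in agenda:
        chart.append(item)
        if is_root(item):
            roots_found += 1
            if nsolutions > 0 and roots_found >= nsolutions:
                break  # early stop: the rest of the forest is never built
    return chart

agenda = ["NP", "ROOT-1", "VP", "ROOT-2", "PP", "ROOT-3"]
partial = build_forest(agenda, lambda item: item.startswith("ROOT"), nsolutions=2)
full = build_forest(agenda, lambda item: item.startswith("ROOT"))
```

Because everything after the n-th complete item is skipped, which trees end up in the forest depends on the order in which the agenda is processed.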
> 
> In the unpacking phase, the effect depends on the unpacking mechanism
> used:
> - if `-packing=7' (which is the default: exhaustive unpacking) is
> used, all readings are unpacked (replaying lots of unification
> operations) and then sorted according to the scoring model.
> `-nsolutions' has no effect in this phase, so you may end up with more
> readings than the value given to `-nsolutions'.
> - if `-packing=15' (selective unpacking) is used, only the best n
> readings are unpacked from the parse forest. Note, however, that
> `-nsolutions' must be set to a value > 0; otherwise the parser falls
> back to exhaustive unpacking, as with `-packing=7'. The current
> implementation supports the basic branching and grandparenting
> features (with an arbitrary number of levels) in the scoring model.
> 
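The choice between the two unpacking modes, as described above, can be sketched like this. It is a toy under stated assumptions: readings are given as already enumerated (score, label) pairs, and `unpack` is a hypothetical function, not part of PET. It models only which readings are returned; real selective unpacking saves work by never unpacking the lower-ranked readings at all, which `heapq.nlargest` over a precomputed list does not capture.

```python
import heapq
from typing import List, Tuple

Reading = Tuple[float, str]  # hypothetical (score, label) representation

def unpack(readings: List[Reading], packing: int, nsolutions: int) -> List[str]:
    # Selective unpacking only applies with packing=15 AND nsolutions > 0.
    if packing == 15 and nsolutions > 0:
        best = heapq.nlargest(nsolutions, readings)  # n best, highest first
        return [label for _, label in best]
    # Otherwise: exhaustive unpacking, every reading is returned, sorted by
    # score; nsolutions is ignored in this branch.
    ranked = sorted(readings, reverse=True)
    return [label for _, label in ranked]

readings = [(0.2, "r1"), (0.9, "r2"), (0.5, "r3")]
```

For example, `unpack(readings, 15, 2)` returns two labels, while `unpack(readings, 7, 2)` and `unpack(readings, 15, 0)` return all three, matching the fallback behaviour described in the message.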

Grandparenting in PET sounds like a great improvement. What is still
missing compared to the LKB/tsdb++? N-grams?

Thanks for the description. 


B

> I also think the meaning of `-nsolutions' is particularly vague at
> the moment. I believe this is partly due to the split into parsing
> phases. To the PET developers: should the option be split into
> separate options for the individual parsing phases?
> 
> Stephan and Bernd, please correct me if I am wrong :-)
> 
> Best,
> yi