[itsdb] parse big corpus with itsdb
    Berthold Crysmann 
    crysmann at dfki.de
       
    Wed Nov  8 11:47:27 CET 2006
    
    
  
On Wed, 2006-11-08 at 10:57 +0100, Yi Zhang wrote:
> Hi,
> 
> Here is my understanding of the options:
>
>         -results sits in the output routine and stops it from printing
>         all the results. They are still all calculated.
>         
> I think that's right.
> 
> 
>         -nsolutions asks cheap to produce only the top "n" parses.
> 
> Due to the use of ambiguity packing, parsing is split into two
> phases: (i) creation of the packed parse forest; (ii) unpacking of the
> readings. `-nsolutions' can have an effect in both phases.
> 
> In the first phase, if `-nsolutions' is set to a non-zero value, forest
> creation stops once the `first' n packed trees are found (with a kind
> of beam search, I think). If `-nsolutions' is not set or is set to
> zero, the entire packed parse forest is created.
> 
> In the unpacking phase, the effect depends on the unpacking mechanism
> used:
> - if `-packing=7' (which is the default, exhaustive unpacking) is used,
> all readings are unpacked (replaying many unification operations) and
> sorted according to the scoring model. `-nsolutions' has no effect on
> this phase, so you may end up with more readings than `-nsolutions'
> asks for.
> - if `-packing=15' (selective unpacking) is used, only the best n
> readings are unpacked from the parse forest. Note, however, that
> `-nsolutions' must be set to a value >0; otherwise the parser falls
> back to exhaustive unpacking, as with `-packing=7'. The current
> implementation supports the basic branching and grandparenting (with
> an arbitrary number of levels) features of the scoring model.
> 
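The difference between the two unpacking modes described above can be illustrated with a toy model. This is a minimal sketch under loud assumptions, not PET's implementation: the `Node` class, the `exhaustive`, `selective` and `unpack` helpers, and the toy forest are all hypothetical; there is no unification (real unpacking replays unifications, which can fail); and a reading's score is assumed to be a plain sum of local rule scores, i.e. only "branching"-style features. The `unpack` helper only mirrors the option behaviour reported in the thread for `-packing=7' and `-packing=15'.

```python
import heapq
import itertools
from dataclasses import dataclass, field

# A reading is (score, tree); a tree is (rule_label, [subtrees]).


@dataclass(eq=False)  # eq=False keeps identity hashing, so nodes can be memo keys
class Node:
    """Hypothetical packed-forest node; each alternative is (rule_label, local_score, child_nodes)."""
    name: str
    alternatives: list = field(default_factory=list)


def _expand(node, child_readings):
    """Combine every alternative of `node` with the given readings of its children."""
    out = []
    for label, local, children in node.alternatives:
        for combo in itertools.product(*(child_readings(c) for c in children)):
            score = local + sum(s for s, _ in combo)  # additive scoring assumption
            out.append((score, (label, [t for _, t in combo])))
    return out


def exhaustive(node):
    """Unpack every reading, then sort by score (cf. -packing=7)."""
    return sorted(_expand(node, exhaustive), key=lambda r: r[0], reverse=True)


def selective(node, n, memo=None):
    """Keep only the n best readings at every node (cf. -packing=15 with n > 0).

    Truncating per node is safe here only because scores are additive and local.
    """
    memo = {} if memo is None else memo
    if node not in memo:
        candidates = _expand(node, lambda c: selective(c, n, memo))
        memo[node] = heapq.nlargest(n, candidates, key=lambda r: r[0])
    return memo[node]


def unpack(root, packing, nsolutions):
    """Hypothetical helper mirroring the option behaviour described in the thread."""
    if packing == 15 and nsolutions > 0:
        return selective(root, nsolutions)
    if packing in (7, 15):
        # Exhaustive unpacking: -nsolutions does not limit this phase.
        return exhaustive(root)
    raise ValueError("only -packing=7 and -packing=15 are modelled here")


# Toy forest: A has 3 alternatives, B has 2, S packs two analyses over them.
a = Node("A", [("a1", 2.0, []), ("a2", 1.0, []), ("a3", 0.5, [])])
b = Node("B", [("b1", 1.5, []), ("b2", 0.2, [])])
root = Node("S", [("s-ab", 0.0, [a, b]), ("s-a", 0.1, [a])])

print(len(unpack(root, 7, 2)))   # -> 9: every reading is unpacked
print(len(unpack(root, 15, 2)))  # -> 2: only the two best readings
print(len(unpack(root, 15, 0)))  # -> 9: falls back to exhaustive unpacking
```

In this sketch the per-node truncation in `selective` gives the same top n as sorting the exhaustive result, because a best reading can only be built from best sub-readings when scores simply add up. Once features such as grandparenting are included, a subtree's score depends on its ancestors, so this naive truncation would no longer be safe as written.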
Grandparenting in PET sounds like a great improvement. What is still
missing compared to lkb/tsdb++? N-grams?
Thanks for the description.
B
> I also think the meaning of `-nsolutions' is particularly vague at the
> moment. I believe this is partly due to the split into parsing phases.
> A question for the PET developers: should the option be split into
> separate options for the individual parsing phases?
> 
> Stephan and Bernd, please correct me if I am wrong :-)
> 
> Best,
> yi