[itsdb] parse big corpus with itsdb

Thu Nov 9 07:39:50 CET 2006

 	Hi,

 	I have been playing with the parameters "nsolutions", "results", 
and "packing", but I can't get the output I need: a single export tree for 
each parsed sentence.
 	When using nsolutions=1 and packing=15, in most cases I obtain a 
set of "inactive" trees for the sentence. Is there some way of choosing 
the best candidate?
 	Thank you,

 	David

On Wed, 8 Nov 2006, Berthold Crysmann wrote:

> On Wed, 2006-11-08 at 10:57 +0100, Yi Zhang wrote:
>> Hi,
>>
>>  Here is my understanding of the options:
>>
>>         -results sits in the output routine and stops it printing all
>>         the results.  They are still all calculated.
>>
>> I think that's right.
>>
>>
>>         -nsolutions asks cheap to only produce the top "n" parses.
>>
>> Due to the use of ambiguity packing, parsing is split into two
>> phases: i) packed parse forest creation; ii) unpacking the readings.
>> `-nsolutions' can have an effect in both phases.
>>
>> In the first phase, if `-nsolutions' is set to a non-zero value, forest
>> creation stops as soon as the `first' n packed trees are found (with a
>> kind of beam search, I think). If `-nsolutions' is not set or is set to
>> zero, the entire packed parse forest is created.
>>
>> In the unpacking phase, the effect depends on the unpacking mechanism
>> used:
>> - if `packing=7' (the default, exhaustive unpacking) is used,
>> all the readings will be unpacked (with lots of unification operations
>> replayed) and sorted according to the scoring model. `-nsolutions'
>> won't have any effect on this phase, so you might end up with more
>> readings than `-nsolutions' asks for.
>
>
>
>> - if `packing=15' (selective unpacking) is used, only the best n
>> readings will be unpacked from the parse forest. But note that
>> `-nsolutions' must be set to >0; otherwise the parser will fall back
>> to exhaustive unpacking, as with `-packing=7'. The current
>> implementation supports the basic branching and grandparenting (with
>> an arbitrary number of levels) features in the scoring model.
>>
>
> Grandparenting in PET sounds like a great improvement. What is still
> missing, as compared to lkb/tsdb++? Ngrams?
>
> Thanks for the description.
>
>
> B
>
>> I also think the use of `-nsolutions' is particularly vague at the
>> moment. I believe this is partly due to the split of the parsing
>> phases. To the PET developers: should the option be split into
>> separate options for the individual phases of parsing?
>>
>> Stephan and Bernd, please correct me if I am wrong :-)
>>
>> Best,
>> yi
>
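
As a reading aid, the interaction between `-nsolutions' and `packing' that Yi describes above can be summarized in a small Python sketch. This is only a toy model of the description in this thread, not PET code: the function `summarize_run`, the `RunSummary` type, and the label strings are all hypothetical names made up for illustration, and only the two packing values mentioned in the thread (7 and 15) are covered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunSummary:
    """Hypothetical summary of what each parsing phase does."""
    forest: str                  # "partial" or "full"
    unpacking: str               # "selective" or "exhaustive"
    max_unpacked: Optional[int]  # upper bound on unpacked readings, if any


def summarize_run(nsolutions: int, packing: int) -> RunSummary:
    """Toy model of the behaviour described in the thread; not PET code."""
    if packing not in (7, 15):
        # Only these two values are discussed in the thread.
        raise ValueError("only packing=7 and packing=15 are modelled here")
    if nsolutions < 0:
        raise ValueError("nsolutions must be zero or positive")

    # Phase i: forest creation stops early only for a non-zero nsolutions.
    forest = "partial" if nsolutions > 0 else "full"

    # Phase ii: selective unpacking needs packing=15 AND nsolutions > 0;
    # in every other case all readings are unpacked, with no bound.
    if packing == 15 and nsolutions > 0:
        return RunSummary(forest, "selective", nsolutions)
    return RunSummary(forest, "exhaustive", None)


if __name__ == "__main__":
    for n, p in [(0, 7), (1, 7), (0, 15), (1, 15)]:
        print(f"nsolutions={n} packing={p} ->", summarize_run(n, p))
```

Under this reading, nsolutions=1 with packing=15 is the only combination of the four that bounds the number of unpacked readings; with packing=7 the unpacking phase ignores `-nsolutions', which matches the remark that more readings than requested may come out.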