[developers] [itsdb] parse big corpus with itsdb

Mon Nov 13 04:39:36 CET 2006

> > I'm curious in what scenarios folks want to do the `parseability' test.
> > Surely, for precision grammars, knowing that a sentence has some parse
> > is much less interesting than knowing that it has an appropriate parse?

> For automatic lexical acquisition, just trying to identify what strings cause
> something not to parse at all can be very useful.

Van Noord (2004) is a nice example of this.

But you raise an interesting point, Emily, about the correlation between the
probabilities generated by the parse selection models and the validity of the
parses, which got me athinking: the way normalisation works in maxent models,
the larger the number of (inactive) parses, the smaller the individual
probabilities, so the probabilities aren't directly comparable across
sentences with different numbers of parses. Clearly if we standardise the
number of parses somehow, or alternatively come up with some way of scaling
the probabilities relative to the number of parses, this isn't a problem.
These both strike me as hacks, however, and I wonder if the *un*normalised
weights from the parse selection models would be a better figure of merit.
Has anyone looked at this?
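
To make the normalisation effect concrete, here is a minimal sketch in Python.
It assumes a standard conditional maxent model in which each parse gets an
unnormalised score (the weighted sum of its features) and probabilities come
from exponentiating and normalising over all parses of the sentence. The
scores and the function name are made up for illustration; nothing here is
taken from the itsdb or parse selection code.

```python
import math

def maxent_probs(scores):
    """Conditional maxent: p(t|s) = exp(score_t) / sum over t' of exp(score_t')."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical unnormalised scores for the parses of one sentence.
few = [2.0, 1.0, 0.5]
# Same top-scoring parse, plus 50 extra low-scoring readings.
many = few + [0.5] * 50

p_few = maxent_probs(few)
p_many = maxent_probs(many)

# The best parse has the same unnormalised score in both cases, but its
# probability drops once it has to share mass with more competitors.
print(f"best parse,  3 parses: p = {p_few[0]:.3f}, score = {few[0]}")
print(f"best parse, 53 parses: p = {p_many[0]:.3f}, score = {many[0]}")
```

The unnormalised score of the best parse is unchanged by the extra readings,
which is what makes it a candidate figure of merit; whether it actually tracks
parse validity is the open question.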

Tim