[developers] [itsdb] parse big corpus with itsdb

Thu Nov 16 15:40:31 CET 2006

Hi Tim!

On 11/13/06, Timothy Baldwin <tim at csse.unimelb.edu.au> wrote:
> But, you raise an interesting point, Emily, about the correlation between the
> probabilities generated by the parse selection models and the validity of the
> parses, which got me athinking: the way normalisation works in maxent models,
> the larger the number of (inactive) parses, the smaller the
> individual probabilities. Clearly if we standardise the number of parses
> somehow, or alternatively come up with some way of scaling the probabilities
> relative to the number of parses, this isn't a problem. These both strike me
> as hacks, however, and I wonder if the *un*normalised weights from the parse
> selection models would be a better figure of merit. Has anyone looked at this?

For end-to-end ranking in LOGON, we often need to compare the scores
of target strings that have been generated from different MRSs (as
produced by transfer). This is a bit tricky since it involves
comparing probabilities of outputs that are conditioned on distinct
inputs. As you point out, the normalization in a conditional maxent
model means that the "size" of a probability is relative to the number
of other competing candidates for the same input. As a temporary
solution in LOGON, we have therefore been using the non-normalized
scores directly, just as you're suggesting (and then combining these
with the corresponding scores from analysis and transfer). But we're
still scratching our heads over this.
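
To make the normalization effect concrete, here is a minimal sketch in
Python. The scores are invented for illustration and the function name
maxent_probs is hypothetical; this is not the LOGON or [incr tsdb()]
code. It assumes the usual conditional maxent form, where each
candidate's probability is exp(score) divided by the sum of exp(score)
over all candidates for the same input.

```python
import math

def maxent_probs(scores):
    """Normalize unnormalized maxent scores into conditional probabilities.

    p(y | x) = exp(s_y) / sum over y' of exp(s_y'), where the sum runs
    over all candidates competing for the same input x.
    """
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical unnormalized scores (weighted feature sums).
# In both inputs the best candidate has the same score, 2.0.
few_candidates = [2.0, 1.0]                # input A: 2 candidates
many_candidates = [2.0] + [1.0] * 9        # input B: 10 candidates

p_few = maxent_probs(few_candidates)[0]
p_many = maxent_probs(many_candidates)[0]

print(f"best candidate, 2 competitors:  score=2.0  p={p_few:.4f}")
print(f"best candidate, 10 competitors: score=2.0  p={p_many:.4f}")
```

The top candidate has the same unnormalized score in both cases, but
its probability shrinks as the number of competitors grows, so the
probabilities are not directly comparable across inputs while the raw
scores are.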

Cheers,
-erik