[developers] [itsdb] parse big corpus with itsdb

Erik Velldal e.velldal at gmail.com
Thu Nov 16 15:40:31 CET 2006

Hi Tim!

On 11/13/06, Timothy Baldwin <tim at csse.unimelb.edu.au> wrote:
> But, you raise an interesting point, Emily, about the correlation between the
> probabilities generated by the parse selection models and the validity of the
> parses, which got me a-thinking: the way normalisation works in maxent models,
> the larger the number of (inactive) parses, and hence the smaller the
> individual probabilities. Clearly if we standardise the number of parses
> somehow, or alternatively come up with some way of scaling the probabilities
> relative to the number of parses, this isn't a problem. These both strike me
> as hacks, however, and I wonder if the *un*normalised weights from the parse
> selection models would be a better figure of merit. Has anyone looked at this?

For end-to-end ranking in LOGON, we often need to compare the scores
of target strings that have been generated from different MRSs (as
produced by transfer). This is a bit tricky since it involves
comparing probabilities of outputs that are conditioned on distinct
inputs. As you point out, the normalization in a conditional maxent
model means that the "size" of a probability is relative to the number
of other competing candidates for the same input. As a temporary
solution in LOGON, we have therefore been using the non-normalized
scores directly, just as you're suggesting (and then combining these
with the corresponding scores from analysis and transfer). But we're
still scratching our heads over this.
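The normalization effect can be made concrete with a minimal Python sketch (illustrative only, not LOGON code). In a conditional maxent model, p(y|x) = exp(w.f(x,y)) / Z(x), where Z(x) sums over every candidate for the same input x. The function names and all numbers below are hypothetical. `combined_score` also assumes a plain weighted sum of the raw scores from analysis, transfer and generation, which is just one possible way to combine them.

```python
import math

def maxent_probs(raw_scores):
    """Normalise raw scores w.f(x, y) into p(y | x) = exp(score) / Z(x)."""
    top = max(raw_scores)  # shift by the max for numerical stability
    exps = [math.exp(s - top) for s in raw_scores]
    z = sum(exps)  # Z(x): sums over all candidates for this one input
    return [e / z for e in exps]

def combined_score(analysis, transfer, generation, weights=(1.0, 1.0, 1.0)):
    """Hypothetical end-to-end score: weighted sum of raw (non-normalised) scores."""
    parts = (analysis, transfer, generation)
    return sum(w * s for w, s in zip(weights, parts))

# Hypothetical raw scores for the candidates of two different inputs;
# the best candidate scores 2.0 in both cases.
input_a = [2.0, 0.0]          # 2 competing candidates
input_b = [2.0] + [0.0] * 9   # 10 competing candidates

p_a = maxent_probs(input_a)[0]
p_b = maxent_probs(input_b)[0]
print(round(p_a, 3), round(p_b, 3))  # 0.881 0.451
```

The top candidate has the same raw score for both inputs, but its probability falls from about 0.88 to about 0.45 only because input B has more competitors. Comparing the raw scores avoids this dependence, although it leaves open how the score scales of the different components relate to one another.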

