[developers] Documenting variable name conventions in Logon Modeling

W.P. McNeill (UW) billmcn at u.washington.edu
Wed Oct 14 00:37:38 CEST 2009


Some of the variable naming conventions in the
LogonModeling <http://wiki.delph-in.net/moin/LogonModeling> system are
confusing, and I am trying to document them.  If people can help me sort
through the conventions, I'll make sure they get documented clearly on the
Wiki.

When you want to use the Logon system to run parse or generation ranking
experiments, you specify your experiment features as Lisp variables in a
feature grid file that looks like this:

(in-package :tsdb)

(load "parsing.lisp")

(batch-experiment
 :source "jhpstg" :skeleton "jhpstg"
 :nfold 10 :niterations 2 :type :mem
 :prefix "jhpstg"
 :score-similarities nil
 :grandparenting '(0 2 3 4)
 :active-edges-p '(nil t)
 :lexicalization-p nil
 :constituent-weight '(1 2 0)
 :ngram-size '(0 2 3 4) :ngram-back-off-p '(nil t)
 :lm-p nil
 :random-sample-size nil
 :counts-absolute 0 :counts-contexts 0 :counts-events 0 :counts-relevant 1
 :variance '(nil 1e4 1e2 1e0 1e-2 1e-4 1e-6)
 :relative-tolerance '(1e-6 1e-8 1e-10))
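To give a sense of how large such a grid can get, here is a minimal Python sketch (not part of the Logon code) that counts the combinations. It assumes that every quoted list is a set of alternative values, that scalar values are fixed, and that the full cross product is taken with no pruning. None of this has been checked against the batch-experiment source; the dictionary simply transcribes the list-valued parameters from the file above.

```python
from itertools import product

# List-valued parameters transcribed from the grid file above.
# None stands in for Lisp nil, True for t.
grid = {
    "grandparenting": [0, 2, 3, 4],
    "active-edges-p": [None, True],
    "constituent-weight": [1, 2, 0],
    "ngram-size": [0, 2, 3, 4],
    "ngram-back-off-p": [None, True],
    "variance": [None, 1e4, 1e2, 1e0, 1e-2, 1e-4, 1e-6],
    "relative-tolerance": [1e-6, 1e-8, 1e-10],
}

def expand(grid):
    """Yield one dict per combination, assuming a full cross product."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))

combinations = list(expand(grid))
print(len(combinations))
```

Under these assumptions the grid yields 4 * 2 * 3 * 4 * 2 * 7 * 3 = 4032 combinations. If Logon prunes any of them, the real number of profiles is smaller.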

For each combination of variable values, the Logon system produces a TSDB
profile in a directory whose name looks like this (a single name, wrapped
here across two lines):

[jhpstg] GP[3] +PT -LEX CW[] -AE NS[0] NT[] -NB LM[0] FT[:::1] RS[]
MM[tao_lmvm] MI[5000] RT[1.0e-8] AT[1.0e-20] VA[1.0e-4] PC[100]
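While sorting out the correspondences, it can help to split such a directory name into its fields mechanically. The Python sketch below does that. The token grammar (a bracketed prefix, then either +NAME/-NAME flags or NAME[value] fields) is inferred from this one example only, and the function name parse_profile_name is hypothetical; the sketch makes no claim about what each abbreviation means.

```python
import re

# A token is either a "+"/"-" flag such as "+PT", or a NAME[value] field.
TOKEN = re.compile(r"([+-])([A-Z]+)|([A-Z]+)\[([^\]]*)\]")

def parse_profile_name(name):
    """Split a profile directory name into (prefix, fields)."""
    prefix_match = re.match(r"\[([^\]]*)\]\s*", name)
    prefix = prefix_match.group(1) if prefix_match else None
    rest = name[prefix_match.end():] if prefix_match else name
    fields = {}
    for flag, flag_name, key, value in TOKEN.findall(rest):
        if flag:
            # "+NAME" is recorded as True, "-NAME" as False.
            fields[flag_name] = (flag == "+")
        else:
            # Bracketed values are kept as raw strings, including empty ones.
            fields[key] = value
    return prefix, fields

example = ("[jhpstg] GP[3] +PT -LEX CW[] -AE NS[0] NT[] -NB LM[0] FT[:::1] RS[] "
           "MM[tao_lmvm] MI[5000] RT[1.0e-8] AT[1.0e-20] VA[1.0e-4] PC[100]")
prefix, fields = parse_profile_name(example)
print(prefix, fields)
```

Values are left as strings on purpose, since fields such as FT[:::1] and CW[] have no obvious numeric reading.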

Most of the variables in the Lisp file map to items in the output profile
directory name and vice versa, but the mappings can be obscure.  The
functions that perform the mappings appear to be feature-environment,
mem-environment, and svm-environment in learner.lisp, which unfortunately
are not documented.

By working backwards from the sample feature grid Lisp files and reading the
source code, I've put together this table of correspondences:
http://spreadsheets.google.com/pub?key=t7uBUaLi1Y6w5wYWmF_YrDg&output=html.
Can people help me fill it out completely?

This spreadsheet uses the full directory names, not the compact directory
names.  It also doesn't list the SVM parameters.  The Source column
indicates where the experimenter specifies the parameter value.  In this
column, "grid file" means that the parameter is specified with a Lisp
variable in the feature grid file, under the name that appears in the Lisp
Parameter column.

Specific questions:

   1. Can you specify the feature parameters use-preterminal-types-p and
   ngram-tag in the feature grid file?  (From the source it would appear so,
   but they don't show up in the sample feature grid Lisp files.)
   2. How do you specify absolute-tolerance and redwoods-train-percentage?
   (The source code lists these as mem-environment parameters, and it's
   unclear whether they are specified differently from the features.)
   3. What does the FT[:::] field in the profile directory names mean?  I find
   it completely cryptic, and the source code seems to indicate that it
   contains the lm-p value twice.

Thanks.

-- 
W.P. McNeill
http://staff.washington.edu/billmcn/index.shtml
Sent from Seattle, WA, United States