[developers] revisions to [incr tsdb()] database schema

Stephan Oepen oe at ifi.uio.no
Mon Aug 15 22:30:48 CEST 2011


i just checked in a revision to [incr tsdb()] where i extended the database
schema, specifically adding two new fields to the 'parse' relation.  this
change will affect those among you maintaining their own [incr tsdb()]
skeletons (at least the SRG and JACY, for all i know; but probably the
Matrix developers too).  [incr tsdb()] will always read older profile
formats, but it will typically refuse to write to a profile (i.e. batch
parse, generate, or translate) that does not conform to the latest revision
of the database schema.

hence, you will need to upgrade your skeletons.  in a nutshell, please grab
a copy of the current database schema:


and use the above to replace the file 'relations' in all your 'private'
skeletons (but not existing full profiles or treebanks).  for JACY, for
example, something like the following should do the trick:

  cd $LOGONROOT/dfki/jacy/tsdb/skeletons
  wget -N http://svn.emmtee.net/trunk/lingo/lkb/src/tsdb/skeletons/english/Relations
  for i in $(find . -name relations); do \cp ./Relations $i; done

the new database schema has a size of 9067 bytes.
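
to confirm that the download is complete, the byte count can be compared
against the figure above.  the following is only a minimal sketch: the
helper name 'has_size' is hypothetical (not part of [incr tsdb()]), and a
size match is a weak sanity check rather than proof of identical content.

```shell
# hypothetical helper: succeed if file $1 is exactly $2 bytes long
has_size() {
  # wc -c may pad its output with blanks on some systems, so strip them
  [ "$(wc -c < "$1" | tr -d '[:space:]')" -eq "$2" ]
}
```

for example, run in the directory that holds the fresh copy:

  has_size ./Relations 9067 || echo "Relations: unexpected size" >&2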

in principle, the above upgrade should be straightforward, as this more
recent change in the [incr tsdb()] database schema only affects the 'parse'
relation, which will never be present in a skeleton.  however, for some of
you the situation may be a bit more complex: in mid-2010 i had revised the
'item' relation, and in case your skeletons do not yet reflect that
revision, a little more labor on your side may be needed.

the current 'item' relation should have 15 fields.  for a particular
skeleton, you could check the number you have using a command like the
following:

  awk -F@ '{print NF}' item | sort -nu
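
to run the same check over a whole directory of skeletons in one go,
something along the following lines may help.  this is a sketch, not part
of [incr tsdb()]: the function name 'report_item_fields' is made up for
illustration, and it assumes that every file named 'item' below the given
directory is an uncompressed 'item' relation.

```shell
# hypothetical helper: for each 'item' file below directory $1 (default: .),
# print its path and the distinct field counts found in it
report_item_fields() {
  find "${1:-.}" -type f -name item | sort | while IFS= read -r f; do
    # same awk test as above; paste joins the distinct counts with commas
    counts=$(awk -F@ '{print NF}' "$f" | sort -nu | paste -sd, -)
    printf '%s: %s\n' "$f" "$counts"
  done
}
```

for example:

  report_item_fields $LOGONROOT/dfki/jacy/tsdb/skeletons

any file reporting something other than a single 15 deserves a closer look.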

in case the above, when applied to your skeletons, returns a smaller number
(most likely that would be 12), it will be necessary to 'pad' all 'item'
files in your skeletons with three additional empty fields, immediately
following the 'i-input' field.  the fields i added last year are 'i-tokens'
(string), 'i-gloss' (string), and 'i-translation' (string).  in case you
have code to generate [incr tsdb()] skeletons, please update the software.
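
the padding itself could be done with awk, roughly as sketched below.
assumptions, to be verified against your own 'relations' file before use:
the function name 'pad_item' is hypothetical, 'i-input' is taken to be the
seventh field of 'item' (adjustable through the second argument), and no
field value is expected to contain a literal '@'.  only lines with exactly
12 fields are changed; everything else passes through untouched.

```shell
# hypothetical helper: write a padded copy of 'item' file $1 to stdout
#   $1: path to an 'item' file
#   $2: 1-based position of 'i-input' (assumed to be 7 if omitted)
pad_item() {
  awk -F@ -v pos="${2:-7}" '
    NF == 12 {
      out = ""
      for (i = 1; i <= NF; i++) {
        out = out (i > 1 ? "@" : "") $i
        # three extra separators yield three empty fields after i-input
        if (i == pos) out = out "@@@"
      }
      print out
      next
    }
    { print }
  ' "$1"
}
```

the function does not modify its input, so a cautious way to apply it is
to write to a new file, re-run the field count on it, and only then
replace the original:

  pad_item item > item.new && awk -F@ '{print NF}' item.new | sort -nu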

finally, revisions to the [incr tsdb()] database schema will typically
cause trouble for users of the '-tsdbdump' option in PET (which writes
[incr tsdb()]-like profiles directly from the PET parser).  personally, i
recommend against use of this facility wherever possible (and encourage
running PET as an [incr tsdb()] client instead).  i depend on other PET
developers to maintain the '-tsdbdump' option, hence it may take a short
while before the '-tsdbdump' option can be made compatible with the latest
database schema.

as to availability of these changes, they are effective immediately in the
LOGON tree.  please 'make update' at your earliest possible convenience,
and then make sure to pick up the current 'relations' file from the URL
above---in case you actually have [incr tsdb()] skeletons that you maintain
locally (or as part of a grammar).  as for the non-LOGON universe, i am in
the process of migrating all changes to the [incr tsdb()] trunk in the
DELPH-IN repository, and will then ask david brodbeck (at UW, our build
master for the so-called LinGO builds) to generate a fresh archive of the
LKB and [incr tsdb()] for standalone installations.

--- the above is not nearly as complex as it must sound :-).  please let me
know in case you need assistance!

best wishes, oe

+++ Universitetet i Oslo (IFI); Boks 1080 Blindern; 0316 Oslo; (+47) 2284 0125
+++    --- oe at ifi.uio.no; stephan at oepen.net; http://www.emmtee.net/oe/ ---
