[developers] pyDelphin / [incr tsdb()] question

Emily M. Bender ebender at uw.edu
Thu Apr 11 19:25:30 CEST 2019

Thanks, Stephan! That explains why I have a vague memory of it not
mattering whether the number in that field is actually accurate...
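Stephan's explanation quoted below implies that [incr tsdb()] only checks that i-length is positive before it selects an item for processing. The exact count does not matter. Below is a minimal Python sketch of a pre-flight check on exported item rows. It assumes each row is a plain dict keyed by field name, and the helper `check_item` is hypothetical: it is not part of PyDelphin, Xigt, or [incr tsdb()].

```python
# Hypothetical pre-flight check for exported item rows; not part of
# PyDelphin, Xigt, or [incr tsdb()]. Assumes each row is a dict keyed by
# field name. i-wf is left out because it is only "possibly" mandatory.
MANDATORY_FIELDS = ("i-id", "i-input", "i-length")


def check_item(row):
    """Return a list of problems that would keep Process from selecting the item."""
    problems = []
    for field in MANDATORY_FIELDS:
        if row.get(field) in (None, ""):
            problems.append(f"missing {field}")
    try:
        length = int(row.get("i-length", 0))
    except (TypeError, ValueError):
        problems.append("i-length is not an integer")
    else:
        # Only positivity matters for selection; the value need not be exact.
        if length <= 0:
            problems.append(f"i-length is {length}; Process needs a positive value")
    return problems


rows = [
    {"i-id": 1, "i-input": "a b c d e f", "i-length": -1},
    {"i-id": 2, "i-input": "a b c d e f", "i-length": 6},
]
for row in rows:
    print(row["i-id"], check_item(row) or "ok")
```

With the two example rows, the first is flagged for its i-length of -1 and the second passes.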


On Thu, Apr 11, 2019 at 9:53 AM Stephan Oepen <oe at ifi.uio.no> wrote:

> hi emily and mike,
> the [incr tsdb()] import facilities support some mixed-content document
> formats, notably the un*x mbox format (you can guess when that was useful
> functionality).  to represent all data in the profile while not pretending
> that there is linguistic content (worth sending to the parser) in email
> headers, the corresponding items are marked as i-length = -1 (or maybe 0,
> not quite sure).  this is why the ‘Process | ...’ commands require a
> positive (non-zero) length ... in other words, a reassurance that
> there actually is linguistic content in the item.  in this regard, i-length
> (like i-id, i-input, and possibly i-wf as well) is a mandatory field in the
> item relation.
> best wishes, oe
> On Thu, 11 Apr 2019 at 05:39 goodman.m.w at gmail.com <goodman.m.w at gmail.com>
> wrote:
>> Hi Emily,
>> By changing the i-length field from -1 to 6 in the two items, I get [incr
>> tsdb()] to say the following:
>>     retrieve(): found 2 items (0 output specifications).
>> So I suspect that is the problem. Since i-length is just the number of
>> tokens, I find it odd that [incr tsdb()] would rely on that instead of just
>> counting tokens. Processing with art or PyDelphin and ACE works fine
>> regardless of the value of i-length.
>> I would modify Xigt's exporter to insert the number of tokens in the
>> i-length field. E.g., at the end of export_igt() put something like this
>> before the return statement:
>>         row['i-length'] = len(row['i-input'].split())
>> On Thu, Apr 11, 2019 at 11:06 AM Emily M. Bender <ebender at uw.edu> wrote:
>>> Dear colleagues,
>>> I've run into a mysterious issue with the item files I've generated from
>>> Xigt corpora, using an export script that in turn relies on pyDelphin: When
>>> I use them to create [incr tsdb()] skeletons, I end up with profiles that
>>> are partially but not fully functional: Browse | Items works, for example,
>>> but when I try Process | All items or try clicking on an individual item to
>>> process, I get just this message in the emacs buffer:
>>> retrieve(): found 0 items (0 output specifications)
>>> I suspect that this is because there's something missing or odd about
>>> the item files, but comparing by hand to one that works (generated by
>>> Import | Test items from within the [incr tsdb()] podium), I can't spot the
>>> difference.
>>> I've attached a small, two-line example that should reproduce the
>>> problem, in case anyone has a moment to take a look. (I'm not sending a
>>> grammar for this language, since the behavior can be observed with any
>>> grammar loaded into the LKB --- the point of failure is before any parsing
>>> happens.)
>>> Thanks!
>>> Emily
>>> --
>>> Emily M. Bender
>>> Professor, Department of Linguistics
>>> University of Washington
>>> Twitter: @emilymbender
>> --
>> -Michael Wayne Goodman
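Mike's one-line fix can be wrapped as a standalone function and tested outside the exporter. This sketch assumes, as his snippet does, that a row is a dict whose i-input holds the sentence and that splitting on whitespace gives the token count. The name `set_i_length` is hypothetical and is not part of Xigt or PyDelphin.

```python
# Hypothetical helper mirroring the one-liner suggested for Xigt's
# export_igt(): fill in i-length as the whitespace token count of i-input.
def set_i_length(row):
    """Set row['i-length'] to the number of whitespace-separated tokens."""
    # str.split() with no argument splits on runs of whitespace and
    # ignores leading/trailing whitespace, so "  a  b " yields 2 tokens.
    row["i-length"] = len(row["i-input"].split())
    return row


row = {"i-id": 10, "i-input": "the dog barks", "i-length": -1}
set_i_length(row)
print(row["i-length"])  # prints 3
```

An empty i-input yields an i-length of 0. Because that value is not positive, [incr tsdb()] should still skip such an item rather than send it to the parser.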
