[developers] pyDelphin / [incr tsdb()] question

Stephan Oepen oe at ifi.uio.no
Thu Apr 11 18:52:40 CEST 2019


hi emily and mike,

the [incr tsdb()] import facilities support some mixed-content document
formats, notably the un*x mbox format (you can guess when that was useful
functionality).  to represent all data in the profile while not pretending
that there is linguistic content (worth sending to the parser) in email
headers, the corresponding items are marked as i-length = -1 (or maybe 0,
not quite sure).  this is why the ‘Process | ...’ commands require a
positive i-length, in other words a reassurance that there actually is
linguistic content in the item.  in this regard, i-length
(like i-id, i-input, and possibly i-wf as well) is a mandatory field in the
item relation.
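
a minimal sketch of the check described above, in python, assuming items
are plain dicts keyed by field name.  this is not the actual [incr tsdb()]
code; the function names has_linguistic_content() and fill_i_length() are
hypothetical, and whitespace splitting is only a stand-in for real
tokenization.

```python
# Illustrative only: the helper names below are hypothetical and do not
# come from [incr tsdb()], PyDelphin, or Xigt.

def has_linguistic_content(item):
    """Return True if the item's i-length is a positive integer."""
    try:
        return int(item.get('i-length', -1)) > 0
    except (TypeError, ValueError):
        # A missing or non-numeric i-length is treated as "no content".
        return False


def fill_i_length(item):
    """Return a copy of item with i-length set from i-input.

    Assumption: tokens are approximated by splitting on whitespace.
    """
    patched = dict(item)
    patched['i-length'] = len(patched.get('i-input', '').split())
    return patched


items = [
    {'i-id': 1, 'i-input': 'the dog barks', 'i-length': -1},
    {'i-id': 2, 'i-input': 'dogs bark', 'i-length': -1},
]

# With i-length = -1, no item passes the positive-length check.
selected = [it for it in items if has_linguistic_content(it)]

# After filling in i-length, every item passes.
patched = [fill_i_length(it) for it in items]
```

here `selected` is empty, which mirrors the "found 0 items" message, while
every item in `patched` carries a positive i-length.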

best wishes, oe


On Thu, 11 Apr 2019 at 05:39 goodman.m.w at gmail.com <goodman.m.w at gmail.com>
wrote:

> Hi Emily,
>
> By changing the i-length field from -1 to 6 in the two items, I get [incr
> tsdb()] to say the following:
>
>     retrieve(): found 2 items (0 output specifications).
>
> So I suspect that is the problem. Since i-length is just the number of
> tokens, I find it odd that [incr tsdb()] would rely on that field instead
> of just counting tokens. Processing with art or PyDelphin and ACE works
> fine regardless of the value of i-length.
>
> I would modify Xigt's exporter to insert the number of tokens in the
> i-length field. E.g., at the end of export_igt() put something like this
> before the return statement:
>
>         row['i-length'] = len(row['i-input'].split())
>
>
> On Thu, Apr 11, 2019 at 11:06 AM Emily M. Bender <ebender at uw.edu> wrote:
>
>> Dear colleagues,
>>
>> I've run into a mysterious issue with the item files I've generated from
>> Xigt corpora, using an export script that in turn relies on pyDelphin: When
>> I use them to create [incr tsdb()] skeletons, I end up with profiles that
>> are partially but not fully functional: Browse | Items works, for example,
>> but when I try Process | All items or try clicking on an individual item to
>> process, I get just this message in the emacs buffer:
>>
>> retrieve(): found 0 items (0 output specifications)
>>
>> I suspect that this is because there's something missing or odd about the
>> item files, but comparing by hand to one that works (generated by Import |
>> Test items from within the [incr tsdb()] podium), I can't spot the
>> difference.
>>
>> I've attached a small, two-line example that should reproduce the
>> problem, in case anyone has a moment to take a look. (I'm not sending a
>> grammar for this language, since the behavior can be observed with any
>> grammar loaded into the LKB --- the point of failure is before any parsing
>> happens.)
>>
>> Thanks!
>> Emily
>>
>> --
>> Emily M. Bender
>> Professor, Department of Linguistics
>> University of Washington
>> Twitter: @emilymbender
>>
>
>
> --
> -Michael Wayne Goodman
>