[developers] pyDelphin / [incr tsdb()] question

Emily M. Bender ebender at uw.edu
Thu Apr 11 19:25:30 CEST 2019


Thanks, Stephan! That explains why I have a vague memory of it not
mattering whether the number in that field is actually accurate...

Emily

On Thu, Apr 11, 2019 at 9:53 AM Stephan Oepen <oe at ifi.uio.no> wrote:

> hi emily and mike,
>
> the [incr tsdb()] import facilities support some mixed-content document
> formats, notably the un*x mbox format (you can guess when that was useful
> functionality).  to represent all data in the profile while not pretending
> that there is linguistic content (worth sending to the parser) in email
> headers, the corresponding items are marked as i-length = -1 (or maybe 0,
> not quite sure).  this is the reason for the ‘Process | ...’ commands to
> require a non-zero, positive length ... in other words a reassurance that
> there actually is linguistic content in the item.  in this regard, i-length
> (like i-id, i-input, and possibly i-wf as well) is a mandatory field in the
> item relation.
>
> best wishes, oe
>
>
> On Thu, 11 Apr 2019 at 05:39 goodman.m.w at gmail.com <goodman.m.w at gmail.com>
> wrote:
>
>> Hi Emily,
>>
>> By changing the i-length field from -1 to 6 in the two items, I get [incr
>> tsdb()] to say the following:
>>
>>     retrieve(): found 2 items (0 output specifications).
>>
>> So I suspect that is the problem. Since i-length is just the number of
>> tokens, I find it odd that [incr tsdb()] would rely on that instead of just
>> counting tokens. Processing with art or PyDelphin and ACE works fine
>> regardless of the value of i-length.
>>
>> I would modify Xigt's exporter to insert the number of tokens in the
>> i-length field. E.g., at the end of export_igt() put something like this
>> before the return statement:
>>
>>         row['i-length'] = len(row['i-input'].split())
>>
>>
>> On Thu, Apr 11, 2019 at 11:06 AM Emily M. Bender <ebender at uw.edu> wrote:
>>
>>> Dear colleagues,
>>>
>>> I've run into a mysterious issue with the item files I've generated from
>>> Xigt corpora, using an export script that in turn relies on pyDelphin: When
>>> I use them to create [incr tsdb()] skeletons, I end up with profiles that
>>> are partially but not fully functional: Browse | Items works, for example,
>>> but when I try Process | All items or try clicking on an individual item to
>>> process, I get just this message in the emacs buffer:
>>>
>>> retrieve(): found 0 items (0 output specifications)
>>>
>>> I suspect that this is because there's something missing or odd about
>>> the item files, but comparing by hand to one that works (generated by
>>> Import | Test items from within the [incr tsdb()] podium), I can't spot the
>>> difference.
>>>
>>> I've attached a small, two-line example that should reproduce the
>>> problem, in case anyone has a moment to take a look. (I'm not sending a
>>> grammar for this language, since the behavior can be observed with any
>>> grammar loaded into the LKB --- the point of failure is before any parsing
>>> happens.)
>>>
>>> Thanks!
>>> Emily
>>>
>>> --
>>> Emily M. Bender
>>> Professor, Department of Linguistics
>>> University of Washington
>>> Twitter: @emilymbender
>>>
>>
>>
>> --
>> -Michael Wayne Goodman
>>
>

-- 
Emily M. Bender
Professor, Department of Linguistics
University of Washington
Twitter: @emilymbender
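
A minimal sketch of the fix discussed in this thread, for readers who want to try it outside of Xigt. It assumes each item is a plain dict keyed by field name, as in Michael's one-line suggestion; the helper names set_i_length and has_linguistic_content are hypothetical and are not part of Xigt, PyDelphin, or [incr tsdb()]. The "greater than zero" check only approximates the condition Stephan describes for the 'Process | ...' commands and is not taken from the [incr tsdb()] source.

```python
def set_i_length(row):
    """Return a copy of an item row with i-length set to its token count.

    Tokens are counted by splitting i-input on whitespace, as in the
    one-liner suggested in the thread.
    """
    updated = dict(row)
    updated['i-length'] = len(updated['i-input'].split())
    return updated


def has_linguistic_content(row):
    """Approximate the selection condition described in the thread:
    an item is eligible for processing only if i-length is positive.
    (Assumption: the real check may differ in detail.)
    """
    return row.get('i-length', -1) > 0


# Two illustrative items exported with the placeholder value -1.
items = [
    {'i-id': 1, 'i-input': 'the dog barks at the cat', 'i-length': -1},
    {'i-id': 2, 'i-input': 'the cat sleeps', 'i-length': -1},
]

# Before the fix, no item passes the check; after it, all of them do.
before = [row for row in items if has_linguistic_content(row)]
fixed = [set_i_length(row) for row in items]
after = [row for row in fixed if has_linguistic_content(row)]

print(len(before), len(after))
```

Whitespace splitting is only a rough token count; per Emily's reply at the top of the thread, the exact number does not appear to matter as long as it is positive.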