[developers] pyDelphin / [incr tsdb()] question

Stephan Oepen oe at ifi.uio.no
Thu Apr 11 21:34:35 CEST 2019


well, upon further reflection: those values also matter when you move on to
analyzing your profile, or to comparing to another one: the default
aggregation will be by i-length intervals, and in my experience at least
breaking results down that way can help identify relevant trends.

cheers, oe


On Thu, 11 Apr 2019 at 19:29 Emily M. Bender <ebender at uw.edu> wrote:

> Thanks, Stephan! That explains why I have a vague memory of it not
> mattering whether the number in that field is actually accurate...
>
> Emily
>
> On Thu, Apr 11, 2019 at 9:53 AM Stephan Oepen <oe at ifi.uio.no> wrote:
>
>> hi emily and mike,
>>
>> the [incr tsdb()] import facilities support some mixed-content document
>> formats, notably the un*x mbox format (you can guess when that was useful
>> functionality).  to represent all data in the profile while not pretending
>> that there is linguistic content (worth sending to the parser) in email
>> headers, the corresponding items are marked as i-length = -1 (or maybe 0,
>> not quite sure).  this is the reason for the ‘Process | ...’ commands to
>> require a non-zero, positive length ... in other words a reassurance that
>> there actually is linguistic content in the item.  in this regard, i-length
>> (like i-id, i-input, and possibly i-wf as well) is a mandatory field in the
>> item relation.
>>
>> best wishes, oe
>>
>>
>> On Thu, 11 Apr 2019 at 05:39 goodman.m.w at gmail.com <goodman.m.w at gmail.com>
>> wrote:
>>
>>> Hi Emily,
>>>
>>> By changing the i-length field from -1 to 6 in the two items, I get
>>> [incr tsdb()] to say the following:
>>>
>>>     retrieve(): found 2 items (0 output specifications).
>>>
>>> So I suspect that is the problem. Since i-length is just the number of
>>> tokens I find it odd that [incr tsdb()] would rely on that instead of just
>>> counting tokens. Processing with art or PyDelphin and ACE works fine
>>> regardless of the value of i-length.
>>>
>>> I would modify Xigt's exporter to insert the number of tokens in the
>>> i-length field. E.g., at the end of export_igt() put something like this
>>> before the return statement:
>>>
>>>         row['i-length'] = len(row['i-input'].split())
>>>
>>>
>>> On Thu, Apr 11, 2019 at 11:06 AM Emily M. Bender <ebender at uw.edu> wrote:
>>>
>>>> Dear colleagues,
>>>>
>>>> I've run into a mysterious issue with the item files I've generated
>>>> from Xigt corpora, using an export script that in turn relies on pyDelphin:
>>>> When I use them to create [incr tsdb()] skeletons, I end up with profiles
>>>> that are partially but not fully functional: Browse | Items works, for
>>>> example, but when I try Process | All items or try clicking on an
>>>> individual item to process, I get just this message in the emacs buffer:
>>>>
>>>> retrieve(): found 0 items (0 output specifications)
>>>>
>>>> I suspect that this is because there's something missing or odd about
>>>> the item files, but comparing by hand to one that works (generated by
>>>> Import | Test items from within the [incr tsdb()] podium), I can't spot the
>>>> difference.
>>>>
>>>> I've attached a small, two-line example that should reproduce the
>>>> problem, in case anyone has a moment to take a look. (I'm not sending a
>>>> grammar for this language, since the behavior can be observed with any
>>>> grammar loaded into the LKB --- the point of failure is before any parsing
>>>> happens.)
>>>>
>>>> Thanks!
>>>> Emily
>>>>
>>>> --
>>>> Emily M. Bender
>>>> Professor, Department of Linguistics
>>>> University of Washington
>>>> Twitter: @emilymbender
>>>>
>>>
>>>
>>> --
>>> -Michael Wayne Goodman
>>>
>>
>
> --
> Emily M. Bender
> Professor, Department of Linguistics
> University of Washington
> Twitter: @emilymbender
>
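For reference, the fix suggested in the thread (setting i-length to the token count of i-input) and the condition the 'Process | ...' commands are described as applying (a positive i-length) can be sketched roughly as follows. This is only an illustration under stated assumptions: item rows are modelled as plain Python dicts rather than through any PyDelphin or Xigt API, tokens are counted by whitespace splitting as in the one-liner above, and the helper names fill_i_length and processable are hypothetical.

```python
def fill_i_length(row):
    """Return a copy of an item row with i-length set to the
    whitespace-separated token count of i-input (assumed tokenization)."""
    updated = dict(row)
    updated['i-length'] = len(updated['i-input'].split())
    return updated


def processable(rows):
    """Keep only rows with a positive i-length, mirroring the
    condition described in the thread (not the actual tsdb code)."""
    return [row for row in rows if int(row['i-length']) > 0]


# Hypothetical two-item example with the placeholder length -1.
items = [
    {'i-id': 1, 'i-input': 'the dog barks', 'i-length': -1},
    {'i-id': 2, 'i-input': 'dogs bark loudly today', 'i-length': -1},
]

# Fill in the lengths without modifying the original rows.
fixed = [fill_i_length(row) for row in items]
```

With the placeholder value, processable(items) selects nothing; after filling in the lengths, both rows pass the check, which matches the change from "found 0 items" to "found 2 items" reported above.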