[developers] Browsing items with lexical coverage using itsdb profiles created with pydelphin/ace

Kristen Howell kphowell at uw.edu
Thu Oct 3 18:06:13 CEST 2019


Odd. I'll double-check my versions.

On Wed, Oct 2, 2019 at 10:01 PM goodman.m.w at gmail.com <goodman.m.w at gmail.com>
wrote:

> Thanks, Kristen.
>
> I tried using the grammar and skeleton and had no problems processing a
> profile with PyDelphin and viewing it with [incr tsdb()]. Unless you're
> using an ACE version prior to v0.9.24 or an old version of PyDelphin, or
> are modifying ACE's default options, I'm not sure what the problem is.
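Mike's first suspect above is an ACE release older than v0.9.24. If such a check is scripted, the version strings should be compared numerically rather than as text. The helper names below are hypothetical, and how the version string is obtained from a local ACE install is left out of this minimal sketch.

```python
from typing import Tuple

# Hypothetical helpers: compare dotted version strings numerically.
# Comparing them as plain strings goes wrong: "0.9.3" > "0.9.24" lexically.


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn 'v0.9.24' or '0.9.24' into (0, 9, 24)."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def at_least(version: str, minimum: str = "0.9.24") -> bool:
    """True if `version` is the same as or newer than `minimum`."""
    return parse_version(version) >= parse_version(minimum)


print(at_least("0.9.24"))   # True
print(at_least("0.9.3"))    # False (although "0.9.3" > "0.9.24" as strings)
print(at_least("v0.9.31"))  # True
```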
>
> On Thu, Oct 3, 2019 at 1:11 AM Kristen Howell <kphowell at uw.edu> wrote:
>
>> Thanks Mike. If you're curious and want to try to reproduce this, I've
>> attached the grammar and testsuite. In the meantime, though, art is working
>> great!
>>
>> On Mon, Sep 30, 2019 at 7:21 PM goodman.m.w at gmail.com <
>> goodman.m.w at gmail.com> wrote:
>>
>>> On Tue, Oct 1, 2019 at 4:10 AM Kristen Howell <kphowell at uw.edu> wrote:
>>>
>>>> I've substituted ACE/art steps into my pipeline, and the resulting parse
>>>> file now includes the error "post reduction lexical gap" for items without
>>>> lexical coverage and no error otherwise. The profile loads in [incr tsdb()]
>>>> and seems to behave nicely. Thanks everyone for helping with this.
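A minimal sketch of how error strings like the one described above could be turned into a lexical-coverage figure. It assumes each item without lexical coverage carries an error mentioning "lexical gap" (as with the "post reduction lexical gap" message); the item IDs and the `lexical_coverage` helper are made up for illustration. Items with other kinds of errors count as lexically covered here, since the check only looks for lexical gaps.

```python
# Hypothetical helper: split items by whether their error reports a lexical gap.
# The {i_id: error} data below is invented for illustration.


def lexical_coverage(errors):
    """Return (covered_ids, gap_ids, coverage) for a {i_id: error} mapping."""
    covered, gaps = [], []
    for i_id, error in sorted(errors.items()):
        if "lexical gap" in error.lower():
            gaps.append(i_id)
        else:
            covered.append(i_id)  # no gap reported (other errors still count here)
    coverage = len(covered) / len(errors) if errors else 0.0
    return covered, gaps, coverage


errors = {
    10: "",
    20: "post reduction lexical gap",
    30: "",
    40: "post-reduction lexical gap",
}
covered, gaps, coverage = lexical_coverage(errors)
print(covered, gaps, coverage)  # [10, 30] [20, 40] 0.5
```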
>>>>
>>>
>>> Hmm, 'post-reduction lexical gap' is one of the few error messages I do
>>> get from ACE, and PyDelphin does put it into the profile. I don't have the
>>> Wambaya grammar but I used the Matrix's 'tiniest' grammar and added an
>>> extra item with an unknown lexical item, then processed it with PyDelphin
>>> and ACE. I see the error message in the profile and querying with [incr
>>> tsdb()] behaves as expected. See the attached screenshot.
>>> So I'm not sure why it wasn't working for you before.
>>>
>>> I think art and PyDelphin are more or less equivalent: art is faster
>>> and supports distributed processing, while PyDelphin can recover from
>>> ACE crashes and populates more fields in the profile.
>>>
>>>
>>>> On Mon, Sep 30, 2019 at 10:54 AM Kristen Howell <kphowell at uw.edu>
>>>> wrote:
>>>>
>>>>> Thanks, Woodley! I'll try using art... I can't believe I forgot about
>>>>> that option. I'll follow up if I still have problems.
>>>>>
>>>>> On Mon, Sep 30, 2019 at 10:31 AM Woodley Packard <
>>>>> sweaglesw at sweaglesw.org> wrote:
>>>>>
>>>>>> I get the error field populated when I use “art” to record profiles.
>>>>>> Are you passing --tsdb-notes to ACE? It may help.
>>>>>>
>>>>>> Woodley
>>>>>>
>>>>>> On Sep 30, 2019, at 7:59 AM, Kristen Howell <kphowell at uw.edu> wrote:
>>>>>>
>>>>>> Thanks Mike. You're right: the error information is showing up on
>>>>>> stderr rather than stdout, which is why PyDelphin isn't picking it up.
>>>>>> So it sounds like I'm out of luck as far as generating profiles with
>>>>>> ACE and then inspecting them in [incr tsdb()]. I will either need to use
>>>>>> the LKB or PET to parse, or look at ACE's stderr to see my lexical
>>>>>> coverage. Woodley, is there a command or option in ACE that sends
>>>>>> parse errors to stdout?
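The stdout/stderr split discussed here can be illustrated with Python's `subprocess` module. This is a sketch only: a tiny Python child process stands in for ACE, and a real invocation would be something along the lines of `ace -g wmb.dat --tsdb-stdout` (not run here). Keeping the two streams separate means diagnostics written to stderr are still available even when the stdout protocol omits them.

```python
import subprocess
import sys

# Sketch: run a child process and keep stdout and stderr apart, so messages
# written to stderr are not lost. A small Python stand-in replaces ACE here.


def run_capturing_both(cmd, input_text):
    """Feed `input_text` to `cmd`; return its (stdout, stderr) as strings."""
    proc = subprocess.run(
        cmd,
        input=input_text,
        capture_output=True,  # collect both streams instead of inheriting them
        text=True,            # decode bytes to str
        check=False,          # keep the output even on a nonzero exit status
    )
    return proc.stdout, proc.stderr


# Stand-in "parser": reports a result on stdout and a problem on stderr.
stand_in = [
    sys.executable, "-c",
    "import sys\n"
    "item = sys.stdin.readline().strip()\n"
    "print('result for: ' + item)\n"
    "print('lexical gap in: ' + item, file=sys.stderr)\n",
]
out, err = run_capturing_both(stand_in, "an unknown word\n")
print(repr(out))  # 'result for: an unknown word\n'
print(repr(err))  # 'lexical gap in: an unknown word\n'
```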
>>>>>>
>>>>>> On Fri, Sep 27, 2019 at 4:56 PM goodman.m.w at gmail.com <
>>>>>> goodman.m.w at gmail.com> wrote:
>>>>>>
>>>>>>> Hi Kristen,
>>>>>>>
>>>>>>> The item file and the item schema in the relations file both have 15
>>>>>>> fields, so I don't think there is disagreement there (although I had some
>>>>>>> encoding issues with the angled quotes in a comment in the relations file;
>>>>>>> I just fixed them manually).
>>>>>>>
>>>>>>> PyDelphin uses ACE's stdout protocols (see:
>>>>>>> https://pydelphin.readthedocs.io/en/latest/api/delphin.ace.html#ace-stdout-protocols).
>>>>>>> By default PyDelphin uses the --tsdb-stdout option of ACE to get as much
>>>>>>> information as ACE can provide. If ACE provides the :error information,
>>>>>>> PyDelphin will populate the corresponding field in a profile. From what I
>>>>>>> recall, however, ACE does not output this field as consistently as the LKB
>>>>>>> and PET, and sometimes it puts parsing errors on the stderr stream instead,
>>>>>>> which PyDelphin does not capture.
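As a sketch of one possible workaround (not something PyDelphin does), a caller could fall back to captured stderr text whenever an item's error field is empty. Here `response` is a plain dict standing in for one processed item, not PyDelphin's own response class, and `fill_error` is a hypothetical helper. The sketch assumes the stderr text has already been attributed to the right item, for example by running the parser once per item; with a long-running ACE process, that alignment is the hard part.

```python
# Hypothetical workaround: fill an empty 'error' field from stderr text that
# has already been matched to the same item. `response` is a plain dict here,
# not PyDelphin's response class.


def fill_error(response, stderr_text):
    """Return a copy of `response` whose empty 'error' falls back to stderr."""
    filled = dict(response)
    if not filled.get("error"):
        # Keep non-blank stderr lines, joined so they fit in a single field.
        lines = [ln.strip() for ln in stderr_text.splitlines() if ln.strip()]
        filled["error"] = "; ".join(lines)
    return filled


print(fill_error({"i-id": 20, "readings": 0}, "post reduction lexical gap\n"))
# {'i-id': 20, 'readings': 0, 'error': 'post reduction lexical gap'}
print(fill_error({"i-id": 10, "readings": 1, "error": ""}, ""))
# {'i-id': 10, 'readings': 1, 'error': ''}
```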
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Sep 28, 2019 at 4:53 AM Kristen Howell <kphowell at uw.edu>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Perhaps there is some disagreement between my item and relations
>>>>>>>> files? I generated the item file using the xigt exporter. I believe this is
>>>>>>>> the corresponding relations file (it's the one I point to when using the
>>>>>>>> exporter). I've attached both. I am creating the profile with the following
>>>>>>>> steps (in Python):
>>>>>>>>  from delphin import itsdb
>>>>>>>>  from delphin.interfaces import ace  # PyDelphin 0.9.x path (assumed)
>>>>>>>>
>>>>>>>>  ts = itsdb.TestSuite('./unprocessed/wmb/')
>>>>>>>>  ace.compile('./wmb/ace/config.tdl', './wmb/ace/wmb.dat')
>>>>>>>>  with ace.AceParser('./wmb/ace/wmb.dat') as cpu:
>>>>>>>>      ts.process(cpu)  # parse every item in the test suite
>>>>>>>>  ts.write(path='./output/processed/wmb')  # write the processed profile
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Sep 27, 2019 at 1:06 PM Stephan Oepen <oe at ifi.uio.no>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> yes, the 'parse' file (like the other files in a tsdb(1) database) is
>>>>>>>>> a textual encoding of a set of tuples.  what you quote looks
>>>>>>>>> suspiciously spartan to me, with only the first three fields filled
>>>>>>>>> and the number of 'readings' filled in.  in a regular profile, i would
>>>>>>>>> expect a record of the initial and internal tokenization, various
>>>>>>>>> timings, and statistics about lexical instantiation and chart
>>>>>>>>> construction.  i am relatively sure that ACE does account for most of
>>>>>>>>> these, so i suspect that information is getting lost somewhere in your
>>>>>>>>> pipeline.
>>>>>>>>>
>>>>>>>>> oe
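To make "a textual encoding of a set of tuples" concrete: each line of a data file such as 'parse' is one record, with fields separated by '@'. The sketch below splits a record and undoes the tsdb escapes ('\s' for '@', '\n' for a newline, '\\' for a backslash). The shortened record is made up but has the same shape as the one quoted in the next message, and treating -1 as "not filled in" is an assumption that matches how the unfilled fields appear there.

```python
import re

# Sketch: split one line of a tsdb(1) data file (e.g. 'parse') into fields.
# Fields are separated by '@'; within a field, '\s' stands for '@', '\n' for
# a newline and '\\' for a backslash.
_ESCAPES = {"\\\\": "\\", "\\s": "@", "\\n": "\n"}


def unescape(field):
    """Undo the tsdb escapes in a single field value."""
    return re.sub(r"\\[\\sn]", lambda m: _ESCAPES[m.group(0)], field)


def split_record(line):
    """Return the list of field values in one tsdb record."""
    # An escaped '@' never contains a literal '@', so splitting first is safe.
    return [unescape(f) for f in line.rstrip("\n").split("@")]


def populated(fields):
    """Indices of fields holding a value (assumes '' and '-1' mean unset)."""
    return [i for i, v in enumerate(fields) if v not in ("", "-1")]


fields = split_record("0@0@0@-1@@-1@@0@-1@-1@-1@@@")
print(len(fields), populated(fields))        # 14 [0, 1, 2, 7]
print(split_record(r"3@a\sb@line1\nline2"))  # ['3', 'a@b', 'line1\nline2']
```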
>>>>>>>>>
>>>>>>>>> On Fri, Sep 27, 2019 at 9:56 PM Kristen Howell <kphowell at uw.edu>
>>>>>>>>> wrote:
>>>>>>>>> >
>>>>>>>>> > Thank you Stephan. Would the 'parse' relations be the lines in the
>>>>>>>>> > parse file? They each look something like this:
>>>>>>>>> > 0@0@0@-1@@-1@@0@-1@-1@-1@-1@-1@-1@-1@-1@-1@-1@-1@-1@-1@-1@-1@-1@-1@-1@-1@-1@-1@-1@-1@-1@-1@-1@-1@-1@-1@-1@@@
>>>>>>>>> > Perhaps this means that the error field, among other things, is
>>>>>>>>> > not being populated?
>>>>>>>>> > Then the question for Mike and/or Woodley would be whether it is
>>>>>>>>> > expected to be populated.
>>>>>>>>> >
>>>>>>>>> > On Fri, Sep 27, 2019 at 12:33 PM Stephan Oepen <oe at ifi.uio.no>
>>>>>>>>> > wrote:
>>>>>>>>> >>
>>>>>>>>> >> hi kristen,
>>>>>>>>> >>
>>>>>>>>> >> i had to peek at the [incr tsdb()] code myself; 'Browse Errors' will
>>>>>>>>> >> extract all items where the 'error' field (in the 'parse' relation) is
>>>>>>>>> >> a non-empty string.  so, if nothing comes up there, presumably either
>>>>>>>>> >> there were no errors, or ACE does not populate that field?
>>>>>>>>> >>
>>>>>>>>> >> likewise, the pre-canned 'unproblematic' condition amounts to
>>>>>>>>> >> 'error == ""', i.e. an empty string in that field.  to some degree,
>>>>>>>>> >> what to consider an 'error' is arguably up to the parsing engine.
>>>>>>>>> >> from memory, i believe that both the LKB and PET will generate some
>>>>>>>>> >> descriptive 'error' string, for example in case of missing lexical
>>>>>>>>> >> entries for some of the input tokens.
>>>>>>>>> >>
>>>>>>>>> >> it appears that ACE (or pyDelphin, not sure about the division of
>>>>>>>>> >> labor here) may simply not populate the 'error' field in the
>>>>>>>>> >> profiles that it generates?
>>>>>>>>> >>
>>>>>>>>> >> best wishes, oe
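The two conditions described above can be emulated over (i-id, error) pairs from the 'parse' relation, which shows why an unpopulated error field makes 'Browse Errors' come up empty while 'unproblematic' matches every item. This is a sketch with invented rows and hypothetical function names; [incr tsdb()] itself evaluates these conditions with its own query machinery.

```python
# Sketch of the two [incr tsdb()] conditions described above, applied to
# (i-id, error) pairs from a 'parse' relation:
#   'Browse Errors' -> error is a non-empty string
#   'unproblematic' -> error == ""
# The rows are invented for illustration.


def browse_errors(rows):
    return [i_id for i_id, error in rows if error != ""]


def unproblematic(rows):
    return [i_id for i_id, error in rows if error == ""]


with_errors = [(10, ""), (20, "post reduction lexical gap"), (30, "")]
without_errors = [(10, ""), (20, ""), (30, "")]  # error field never populated

print(browse_errors(with_errors), unproblematic(with_errors))        # [20] [10, 30]
print(browse_errors(without_errors), unproblematic(without_errors))  # [] [10, 20, 30]
```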
>>>>>>>>> >>
>>>>>>>>> >> On Fri, Sep 27, 2019 at 7:09 PM Kristen Howell <kphowell at uw.edu>
>>>>>>>>> >> wrote:
>>>>>>>>> >> >
>>>>>>>>> >> > Hi Mike and Woodley (and others?),
>>>>>>>>> >> >
>>>>>>>>> >> > I've created some itsdb profiles using PyDelphin and a grammar loaded
>>>>>>>>> >> > in ACE. I am trying to browse the profile in [incr tsdb()]. The
>>>>>>>>> >> > results and coverage show up fine. However, when I try to browse
>>>>>>>>> >> > errors, nothing happens. Also, when I try to view items with lexical
>>>>>>>>> >> > coverage (using TSQL condition --> unproblematic and then Browse -->
>>>>>>>>> >> > Test Items), I see all of the items, not just those with lexical
>>>>>>>>> >> > coverage.
>>>>>>>>> >> >
>>>>>>>>> >> > Is this expected to work with PyDelphin profiles? If so, what might
>>>>>>>>> >> > be missing? My profile contains non-empty item, parse, result,
>>>>>>>>> >> > relations, and run files.
>>>>>>>>> >> >
>>>>>>>>> >> > Thanks for your help,
>>>>>>>>> >> > Kristen
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> -Michael Wayne Goodman
>>>>>>>
>>>>>>
>>>
>>> --
>>> -Michael Wayne Goodman
>>>
>>
>
> --
> -Michael Wayne Goodman
>

