[developers] Browsing items with lexical coverage using itsdb profiles created with pydelphin/ace

goodman.m.w at gmail.com
Thu Oct 3 07:01:23 CEST 2019


Thanks, Kristen.

I tried using the grammar and skeleton and had no problems processing a
profile with PyDelphin and viewing it with [incr tsdb()]. Unless you're
using an ACE version prior to v0.9.24, an old version of PyDelphin, or are
modifying ACE's default options, I'm not sure what the problem is.
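
In case it helps to check a profile outside of [incr tsdb()], here is a
minimal sketch (not part of PyDelphin or ACE) that lists the items whose
'error' field in the 'parse' file is non-empty, which is roughly what
'Browse Errors' selects. The function names and the abbreviated toy schema
are made up for illustration, and splitting on "@" assumes that literal "@"
characters inside values are escaped, as I believe the tsdb format does.

```python
def table_fields(relations_text, table):
    """Return the field names declared for `table` in a relations file."""
    fields, in_table = [], False
    for line in relations_text.splitlines():
        stripped = line.split("#", 1)[0].strip()  # drop trailing comments
        if not stripped:
            continue
        if stripped.endswith(":") and len(stripped.split()) == 1:
            # a line such as "parse:" starts a new table declaration
            in_table = stripped[:-1] == table
        elif in_table:
            fields.append(stripped.split()[0])
    return fields


def items_with_errors(relations_text, parse_text):
    """Map i-id to error string for parse records with a non-empty error."""
    fields = table_fields(relations_text, "parse")
    i_id, error = fields.index("i-id"), fields.index("error")
    flagged = {}
    for line in parse_text.splitlines():
        if not line:
            continue
        record = line.split("@")  # one record per line, "@"-separated
        if record[error]:
            flagged[int(record[i_id])] = record[error]
    return flagged


# Toy data: an abbreviated, hypothetical schema and two parse records.
RELATIONS = """\
parse:
  parse-id :integer :key
  i-id :integer :key
  readings :integer
  error :string
  comment :string
"""
PARSE = "0@10@1@@\n1@20@0@post-reduction lexical gap@\n"

print(items_with_errors(RELATIONS, PARSE))
# prints {20: 'post-reduction lexical gap'}
```

For a real profile one would pass in the contents of its relations and
parse files instead of the toy strings; an empty result would suggest the
error field is not being populated.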

On Thu, Oct 3, 2019 at 1:11 AM Kristen Howell <kphowell at uw.edu> wrote:

> Thanks Mike. If you're curious and want to try and repro this, I've
> attached the grammar and testsuite. In the meantime though, art is working
> great!
>
> On Mon, Sep 30, 2019 at 7:21 PM goodman.m.w at gmail.com <
> goodman.m.w at gmail.com> wrote:
>
>> On Tue, Oct 1, 2019 at 4:10 AM Kristen Howell <kphowell at uw.edu> wrote:
>>
>>> I've substituted ace/art steps into my pipeline and the resulting parse
>>> file includes the error "post reduction lexical gap", for items without
>>> lexical coverage and no error otherwise. The profile loads in [incr tsdb()]
>>> and seems to behave nicely. Thanks everyone for helping with this.
>>>
>>
>> Hmm, 'post-reduction lexical gap' is one of the few error messages I do
>> get from ACE, and PyDelphin does put it into the profile. I don't have the
>> Wambaya grammar but I used the Matrix's 'tiniest' grammar and added an
>> extra item with an unknown lexical item, then processed it with PyDelphin
>> and ACE. I see the error message in the profile and querying with [incr
>> tsdb()] behaves as expected. See the attached screenshot.
>> So I'm not sure why it wasn't working for you before.
>>
>> I think using art and PyDelphin are more or less equivalent: art is
>> faster and supports distributed processing while PyDelphin can recover from
>> ACE crashes and populates more fields in the profile.
>>
>>
>>> On Mon, Sep 30, 2019 at 10:54 AM Kristen Howell <kphowell at uw.edu> wrote:
>>>
>>>> Thanks, Woodley! I'll try using art... I can't believe I forgot about
>>>> that option. I'll follow up if I still have problems.
>>>>
>>>> On Mon, Sep 30, 2019 at 10:31 AM Woodley Packard <
>>>> sweaglesw at sweaglesw.org> wrote:
>>>>
>>>>> I get the error field populated when I use “art” to record profiles.
>>>>> Are you passing --tsdb-notes to ace?  It may help.
>>>>>
>>>>> Woodley
>>>>>
>>>>> On Sep 30, 2019, at 7:59 AM, Kristen Howell <kphowell at uw.edu> wrote:
>>>>>
>>>>> Thanks Mike. You're right: the error information is showing up in
>>>>> stderr rather than stdout, which is why PyDelphin isn't picking it up.
>>>>> So it sounds like I'm out of luck as far as generating profiles using
>>>>> ACE and then inspecting them with [incr tsdb()]. I will either need to use
>>>>> the LKB or PET to parse, or look at the stderr from ACE to see my lexical
>>>>> coverage. Unless, Woodley, is there a command or option in ACE to send
>>>>> parse errors to stdout?
>>>>>
>>>>> On Fri, Sep 27, 2019 at 4:56 PM goodman.m.w at gmail.com <
>>>>> goodman.m.w at gmail.com> wrote:
>>>>>
>>>>>> Hi Kristen,
>>>>>>
>>>>>> The item file and the item schema in the relations file both have 15
>>>>>> fields, so I don't think there is disagreement there (although I had some
>>>>>> encoding issues with the angled quotes on a comment in the relations file;
>>>>>> I just fixed it manually).
>>>>>>
>>>>>> PyDelphin uses ACE's stdout protocols (see:
>>>>>> https://pydelphin.readthedocs.io/en/latest/api/delphin.ace.html#ace-stdout-protocols).
>>>>>> By default PyDelphin uses the --tsdb-stdout option of ACE to get as much
>>>>>> information as ACE can provide. If ACE provides the :error information,
>>>>>> PyDelphin will populate the corresponding field in a profile. From what I
>>>>>> recall, however, ACE does not output this field as consistently as the LKB
>>>>>> and PET, and sometimes it puts parsing errors on the stderr stream instead,
>>>>>> which PyDelphin does not capture.
>>>>>>
>>>>>>
>>>>>> On Sat, Sep 28, 2019 at 4:53 AM Kristen Howell <kphowell at uw.edu>
>>>>>> wrote:
>>>>>>
>>>>>>> Perhaps there is some disagreement between my item and relations
>>>>>>> files? I generated the item file using the xigt exporter. I believe this is
>>>>>>> the corresponding relation file (it's the one I point to when using the
>>>>>>> exporter). I've attached both. I am creating the profile with the following
>>>>>>> steps (in python):
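>>>>>>>  # imports assumed for PyDelphin v0.9.x; not shown in the original message
>>>>>>>  from delphin import itsdb
>>>>>>>  from delphin.interfaces import ace
>>>>>>>
>>>>>>>  ts = itsdb.TestSuite('./unprocessed/wmb/')
>>>>>>>  ace.compile('./wmb/ace/config.tdl', './wmb/ace/wmb.dat')
>>>>>>>  with ace.AceParser('./wmb/ace/wmb.dat') as cpu:
>>>>>>>      ts.process(cpu)
>>>>>>>  ts.write(path='./output/processed/wmb')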
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Sep 27, 2019 at 1:06 PM Stephan Oepen <oe at ifi.uio.no> wrote:
>>>>>>>
>>>>>>>> yes, the 'parse' file (like the other files in a tsdb(1) database) is
>>>>>>>> a textual encoding of a set of tuples.  what you quote looks
>>>>>>>> suspiciously spartan to me, with only the first three fields filled
>>>>>>>> and the number of 'readings' filled in.  in a regular profile, i would
>>>>>>>> expect a record of the initial and internal tokenization, various
>>>>>>>> timings, and statistics about lexical instantiation and chart
>>>>>>>> construction.  i am relatively sure that ACE does account for most of
>>>>>>>> these, so i suspect that information is getting lost somewhere in your
>>>>>>>> pipeline.
>>>>>>>>
>>>>>>>> oe
>>>>>>>>
>>>>>>>> On Fri, Sep 27, 2019 at 9:56 PM Kristen Howell <kphowell at uw.edu>
>>>>>>>> wrote:
>>>>>>>> >
>>>>>>>> > Thank you Stephan. Would the 'parse' relations be the lines of the
>>>>>>>> > parse file? They each look something like this:
>>>>>>>> > 0 at 0@0 at -1@@-1@@0 at -1@-1 at -1@-1 at -1@-1 at -1@-1 at -1@-1 at -1@-1 at -1@-1 at -1@-1@
>>>>>>>> -1 at -1@-1 at -1@-1 at -1@-1 at -1@-1 at -1@-1 at -1@@@
>>>>>>>> > Perhaps this means that the error field among other things is not
>>>>>>>> > being populated?
>>>>>>>> > Then the question for Mike and/or Woodley would be if it is
>>>>>>>> > expected to be populated.
>>>>>>>> >
>>>>>>>> > On Fri, Sep 27, 2019 at 12:33 PM Stephan Oepen <oe at ifi.uio.no>
>>>>>>>> wrote:
>>>>>>>> >>
>>>>>>>> >> hi kristen,
>>>>>>>> >>
>>>>>>>> >> i had to peek at the [incr tsdb()] code myself; 'Browse Errors' will
>>>>>>>> >> extract all items where the 'error' field (in the 'parse' relation) is
>>>>>>>> >> a non-empty string.  so, if nothing comes up there, presumably there
>>>>>>>> >> either were no errors, or ACE does not populate that field?
>>>>>>>> >>
>>>>>>>> >> likewise, the pre-canned 'unproblematic' condition amounts to
>>>>>>>> >> 'error == ""', i.e. an empty string in that field.  to some degree,
>>>>>>>> >> what to consider an 'error' is arguably up to the parsing engine.  from
>>>>>>>> >> memory, i believe that both the LKB and PET will generate some
>>>>>>>> >> descriptive 'error' string for example in case of missing lexical
>>>>>>>> >> entries for some of the input tokens.
>>>>>>>> >>
>>>>>>>> >> it appears that ACE (or pyDelphin, not sure about the division of
>>>>>>>> >> labor here) maybe simply does not populate the 'error' field in the
>>>>>>>> >> profiles that it generates?
>>>>>>>> >>
>>>>>>>> >> best wishes, oe
>>>>>>>> >>
>>>>>>>> >> On Fri, Sep 27, 2019 at 7:09 PM Kristen Howell <kphowell at uw.edu>
>>>>>>>> wrote:
>>>>>>>> >> >
>>>>>>>> >> > Hi Mike and Woodley (and others?),
>>>>>>>> >> >
>>>>>>>> >> > I've created some itsdb profiles using pydelphin and a grammar
>>>>>>>> >> > loaded in ace. I am trying to browse the profile in [incr tsdb()]. The
>>>>>>>> >> > results and coverage show up fine. However, when I try to browse errors,
>>>>>>>> >> > nothing happens. Also when I try to view items with lexical coverage (using
>>>>>>>> >> > tsdl condition--> unproblematic and then browse --> test items), I see all
>>>>>>>> >> > of the items, not just those with lexical coverage.
>>>>>>>> >> >
>>>>>>>> >> > Is this expected to work with pydelphin profiles? If so, what
>>>>>>>> >> > might be missing? My profile contains non empty item, parse, result,
>>>>>>>> >> > relations, run files.
>>>>>>>> >> >
>>>>>>>> >> > Thanks for your help,
>>>>>>>> >> > Kristen
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> -Michael Wayne Goodman
>>>>>>
>>>>>
>>
>> --
>> -Michael Wayne Goodman
>>
>

-- 
-Michael Wayne Goodman