[developers] Browsing items with lexical coverage using itsdb profiles created with pydelphin/ace

goodman.m.w at gmail.com goodman.m.w at gmail.com
Tue Oct 1 04:20:09 CEST 2019


On Tue, Oct 1, 2019 at 4:10 AM Kristen Howell <kphowell at uw.edu> wrote:

> I've substituted ace/art steps into my pipeline and the resulting parse
> file includes the error "post reduction lexical gap", for items without
> lexical coverage and no error otherwise. The profile loads in [incr tsdb()]
> and seems to behave nicely. Thanks everyone for helping with this.
>

Hmm, 'post-reduction lexical gap' is one of the few error messages I do get
from ACE, and PyDelphin does put it into the profile. I don't have the
Wambaya grammar but I used the Matrix's 'tiniest' grammar and added an
extra item with an unknown lexical item, then processed it with PyDelphin
and ACE. I see the error message in the profile and querying with [incr
tsdb()] behaves as expected. See the attached screenshot.
So I'm not sure why it wasn't working for you before.
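
As a quick check outside [incr tsdb()], the 'error' field can also be read
straight from the profile's text files. The following is only a minimal
sketch using the Python standard library, not PyDelphin's or [incr tsdb()]'s
own code. It assumes the usual layout (a 'relations' file listing field names
per table, and '@'-separated rows in 'parse'); the reduced four-field schema,
the sample rows, and the function names read_schema and items_with_errors are
made up for illustration.

```python
def read_schema(relations_text):
    """Map each table name to its ordered list of field names."""
    schema = {}
    table = None
    for line in relations_text.splitlines():
        # Skip blank lines and (assumed) '#' comment lines.
        if not line.strip() or line.lstrip().startswith('#'):
            continue
        if line[0].isspace():
            # Indented lines declare a field of the current table.
            if table is not None:
                schema[table].append(line.split()[0])
        elif line.rstrip().endswith(':'):
            # Unindented "name:" lines start a new table.
            table = line.rstrip()[:-1]
            schema[table] = []
    return schema


def items_with_errors(parse_lines, parse_fields):
    """Yield (i-id, error) for rows whose 'error' field is non-empty."""
    i_id = parse_fields.index('i-id')
    error = parse_fields.index('error')
    for line in parse_lines:
        row = line.rstrip('\n').split('@')
        if row[error] != '':
            yield row[i_id], row[error]


# Hypothetical, heavily reduced schema and rows (a real 'parse' table
# has many more fields).
RELATIONS = """\
parse:
  parse-id :integer :key
  i-id :integer :key
  readings :integer
  error :string
"""
PARSE = [
    "0@10@1@",
    "1@20@0@post reduction lexical gap",
]

schema = read_schema(RELATIONS)
problematic = list(items_with_errors(PARSE, schema['parse']))
print(problematic)
```

Rows with an empty 'error' field are the ones the 'unproblematic' condition
would keep; the rest are what 'Browse Errors' would be expected to list.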

I think art and PyDelphin are more or less equivalent for this purpose: art
is faster and supports distributed processing, while PyDelphin can recover
from ACE crashes and populates more fields in the profile.
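
Since the question of stdout versus stderr comes up further down the thread,
here is a rough sketch of how a wrapper could keep both streams from a child
process apart using only the Python standard library. It is not how PyDelphin
or art talk to ACE. The helper name run_capture is hypothetical, and the demo
command is a stand-in Python one-liner rather than ACE itself; the ACE command
in the comment is an assumption based on the paths quoted below.

```python
import subprocess
import sys


def run_capture(cmd, input_text):
    """Run cmd, feed it input_text on stdin, and return
    (return code, stdout text, stderr text) separately."""
    proc = subprocess.run(cmd, input=input_text,
                          capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr


# Stand-in child process (NOT ACE): echoes stdin to stdout and writes
# a diagnostic line to stderr, so the two streams can be told apart.
demo_cmd = [
    sys.executable, '-c',
    "import sys; sys.stdout.write(sys.stdin.read()); "
    "sys.stderr.write('NOTE: example diagnostic\\n')",
]

# With ACE the command might instead look like
# ['ace', '-g', './wmb/ace/wmb.dat'] (assumption, untested here).
code, out, err = run_capture(demo_cmd, 'some sentence\n')
print('stdout:', repr(out))
print('stderr:', repr(err))
```

Anything that only reads the first stream would miss the diagnostic on the
second, which matches the behaviour described in the quoted messages.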


> On Mon, Sep 30, 2019 at 10:54 AM Kristen Howell <kphowell at uw.edu> wrote:
>
>> Thanks, Woodley! I'll try using art... I can't believe I forgot about
>> that option. I'll follow up if I still have problems.
>>
>> On Mon, Sep 30, 2019 at 10:31 AM Woodley Packard <sweaglesw at sweaglesw.org>
>> wrote:
>>
>>> I get the error field populated when I use “art” to record profiles.
>>> Are you passing --tsdb-notes to ace?  It may help.
>>>
>>> Woodley
>>>
>>> On Sep 30, 2019, at 7:59 AM, Kristen Howell <kphowell at uw.edu> wrote:
>>>
>>> Thanks Mike. You're right: the error information is showing up in
>>> stderr rather than stdout, so that is why PyDelphin isn't picking it up.
>>> So it sounds like I'm out of luck as far as generating profiles using
>>> ACE and then inspecting them with [incr tsdb()]. I will either need to use
>>> LKB/PET to parse, or look at the stderr from ACE to see my lexical coverage.
>>> Unless, Woodley, is there a command/option in ACE to send parse
>>> errors to stdout?
>>>
>>> On Fri, Sep 27, 2019 at 4:56 PM goodman.m.w at gmail.com <
>>> goodman.m.w at gmail.com> wrote:
>>>
>>>> Hi Kristen,
>>>>
>>>> The item file and the item schema in the relations file both have 15
>>>> fields, so I don't think there is disagreement there (although I had some
>>>> encoding issues with the angled quotes on a comment in the relations file;
>>>> I just fixed it manually).
>>>>
>>>> PyDelphin uses ACE's stdout protocols (see:
>>>> https://pydelphin.readthedocs.io/en/latest/api/delphin.ace.html#ace-stdout-protocols).
>>>> By default PyDelphin uses the --tsdb-stdout option of ACE to get as much
>>>> information as ACE can provide. If ACE provides the :error information,
>>>> PyDelphin will populate the corresponding field in a profile. From what I
>>>> recall, however, ACE does not output this field as consistently as the LKB
>>>> and PET, and sometimes it puts parsing errors on the stderr stream instead,
>>>> which PyDelphin does not capture.
>>>>
>>>>
>>>> On Sat, Sep 28, 2019 at 4:53 AM Kristen Howell <kphowell at uw.edu> wrote:
>>>>
>>>>> Perhaps there is some disagreement between my item and relations
>>>>> files? I generated the item file using the xigt exporter. I believe this is
>>>>> the corresponding relations file (it's the one I point to when using the
>>>>> exporter). I've attached both. I am creating the profile with the following
>>>>> steps (in python):
>>>>>  ts = itsdb.TestSuite('./unprocessed/wmb/')
>>>>>  ace.compile('./wmb/ace/config.tdl', './wmb/ace/wmb.dat')
>>>>>  with ace.AceParser('./wmb/ace/wmb.dat') as cpu:
>>>>>      ts.process(cpu)
>>>>>  ts.write(path='./output/processed/wmb')
>>>>>
>>>>>
>>>>> On Fri, Sep 27, 2019 at 1:06 PM Stephan Oepen <oe at ifi.uio.no> wrote:
>>>>>
>>>>>> yes, the 'parse' file (like the other files in a tsdb(1) database) is
>>>>>> a textual encoding of a set of tuples.  what you quote looks
>>>>>> suspiciously spartan to me, with only the first three fields filled
>>>>>> and the number of 'readings' filled in.  in a regular profile, i would
>>>>>> expect a record of the initial and internal tokenization, various
>>>>>> timings, and statistics about lexical instantiation and chart
>>>>>> construction.  i am relatively sure that ACE does account for most of
>>>>>> these, so i suspect that information is getting lost somewhere in your
>>>>>> pipeline.
>>>>>>
>>>>>> oe
>>>>>>
>>>>>> On Fri, Sep 27, 2019 at 9:56 PM Kristen Howell <kphowell at uw.edu> wrote:
>>>>>> >
>>>>>> > Thank you Stephan. Would the 'parse' relations be the lines in the
>>>>>> > parse file? They each look something like this:
>>>>>> > 0@0@0@-1@@-1@@0@-1@-1@-1@-1@-1@-1@-1@-1@-1@-1@-1@-1@-1@-1@-1@-1@-1@-1@
>>>>>> > -1@-1@-1@-1@-1@-1@-1@-1@-1@-1@-1@@@
>>>>>> > Perhaps this means that the error field, among other things, is not
>>>>>> > being populated?
>>>>>> > Then the question for Mike and/or Woodley would be whether it is
>>>>>> > expected to be populated.
>>>>>> >
>>>>>> > On Fri, Sep 27, 2019 at 12:33 PM Stephan Oepen <oe at ifi.uio.no> wrote:
>>>>>> >>
>>>>>> >> hi kristen,
>>>>>> >>
>>>>>> >> i had to peek at the [incr tsdb()] code myself; 'Browse Errors' will
>>>>>> >> extract all items where the 'error' field (in the 'parse' relation) is
>>>>>> >> a non-empty string.  so, if nothing comes up there, presumably there
>>>>>> >> either were no errors, or ACE does not populate that field?
>>>>>> >>
>>>>>> >> likewise, the pre-canned 'unproblematic' condition amounts to 'error
>>>>>> >> == ""', i.e. an empty string in that field.  to some degree, what to
>>>>>> >> consider an 'error' is arguably up to the parsing engine.  from
>>>>>> >> memory, i believe that both the LKB and PET will generate some
>>>>>> >> descriptive 'error' string for example in case of missing lexical
>>>>>> >> entries for some of the input tokens.
>>>>>> >>
>>>>>> >> it appears that ACE (or pyDelphin, not sure about the division of
>>>>>> >> labor here) maybe simply does not populate the 'error' field in the
>>>>>> >> profiles that it generates?
>>>>>> >>
>>>>>> >> best wishes, oe
>>>>>> >>
>>>>>> >> On Fri, Sep 27, 2019 at 7:09 PM Kristen Howell <kphowell at uw.edu> wrote:
>>>>>> >> >
>>>>>> >> > Hi Mike and Woodley (and others?),
>>>>>> >> >
>>>>>> >> > I've created some itsdb profiles using pydelphin and a grammar
>>>>>> >> > loaded in ace. I am trying to browse the profile in [incr tsdb()].
>>>>>> >> > The results and coverage show up fine. However, when I try to browse
>>>>>> >> > errors, nothing happens. Also, when I try to view items with lexical
>>>>>> >> > coverage (using tsdl condition --> unproblematic and then browse -->
>>>>>> >> > test items), I see all of the items, not just those with lexical
>>>>>> >> > coverage.
>>>>>> >> >
>>>>>> >> > Is this expected to work with pydelphin profiles? If so, what
>>>>>> >> > might be missing? My profile contains non-empty item, parse, result,
>>>>>> >> > relations, and run files.
>>>>>> >> >
>>>>>> >> > Thanks for your help,
>>>>>> >> > Kristen
>>>>>>
>>>>>
>>>>
>>>> --
>>>> -Michael Wayne Goodman
>>>>
>>>

-- 
-Michael Wayne Goodman
-------------- next part --------------
A non-text attachment was scrubbed...
Name: parsing-errors.png
Type: image/png
Size: 51541 bytes
Desc: not available
URL: <http://lists.delph-in.net/archives/developers/attachments/20191001/5b1a18d8/attachment-0001.png>

