[developers] Skipping non-parsed items in fftb

goodman.m.w at gmail.com
Fri Jan 17 03:45:49 CET 2020


Let me know how it goes.

And a clarification: the --full option on `mkprof` doesn't hurt, but it's
unnecessary since you're re-parsing the created profile.
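For anyone scripting this, here is a minimal Python sketch of the three-step pipeline from the quoted message below, with `--full` left off `mkprof` as suggested above. It shells out to the `delphin` command, which is assumed to be on PATH, instead of calling the `delphin.commands` functions directly. `build_pipeline` and `run_pipeline` are hypothetical names, and the only flags used are the ones that appear in this thread.

```python
from __future__ import annotations

import shlex
import subprocess


def build_pipeline(grammar: str, original: str, new: str) -> list[list[str]]:
    """Return the three `delphin` invocations as argument lists.

    Hypothetical helper: the flags are the ones used in this thread.
    --full is omitted from mkprof because the new profile is re-parsed anyway.
    """
    return [
        # 1. Parse in regular (non-forest) mode so 'readings' gets filled in.
        ["delphin", "process", "-g", grammar, original],
        # 2. Copy only items whose parse has at least one reading.
        ["delphin", "mkprof", "--where", "readings > 0",
         "--source", original, new],
        # 3. Re-parse the smaller profile in full-forest mode for FFTB.
        ["delphin", "process", "-g", grammar, "--full-forest", new],
    ]


def run_pipeline(commands: list[list[str]]) -> None:
    """Run each command in order, stopping at the first non-zero exit."""
    for argv in commands:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Print the commands as shell-quoted lines instead of running them.
    for argv in build_pipeline("grm.dat", "original-profile/", "new-profile/"):
        print(shlex.join(argv))
```

Calling `run_pipeline(build_pipeline(...))` would execute the steps in order. Because of `check=True`, a failing step raises an exception and the loop stops, so the full-forest parse never runs on a half-built profile.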

Also, here's the bug report for the other approach (pruning the full-forest
profile directly while keeping partial parses), if you're interested in that
use case: https://github.com/delph-in/pydelphin/issues/273

On Fri, Jan 17, 2020 at 10:37 AM Emily M. Bender <ebender at uw.edu> wrote:

> Thanks, Mike! I will give this a try.
>
> On Thu, Jan 16, 2020 at 6:33 PM goodman.m.w at gmail.com <goodman.m.w at gmail.com> wrote:
>
>> Hi Emily,
>>
>> For (2), here is how you could do it with PyDelphin:
>>
>>     delphin process -g grm.dat original-profile/
>>     delphin mkprof --full --where 'readings > 0' --source original-profile/ new-profile/
>>     delphin process -g grm.dat --full-forest new-profile/
>>
>> Note that original-profile/ is first parsed in regular (non-forest) mode,
>> because in full-forest mode the number of readings is essentially unknown
>> until they are enumerated and thus the 'readings' field is always 0. The
>> second command prunes not only the lines in the 'parse' file with
>> readings == 0 but also the lines in the 'item' file that correspond to those 'parse'
>> lines. Once you have created new-profile/, you can parse again with
>> --full-forest for use with FFTB (and of course you don't have to use
>> PyDelphin for the parsing steps, if you prefer other means).
>>
>> Also note that this results in a profile with no edges for partial
>> parses. I think this is what you want. There should be a way to prune the
>> full-forest profile directly while keeping partial parses, but while
>> investigating this use case I found a bug, so I don't recommend it yet.
>>
>> Try `delphin mkprof --help` to see descriptions of these and other
>> options. They map fairly directly to the function documented here:
>> https://pydelphin.readthedocs.io/en/latest/api/delphin.commands.html#mkprof
>>
>>
>> On Fri, Jan 17, 2020 at 8:44 AM Emily M. Bender <ebender at uw.edu> wrote:
>>
>>> Dear all,
>>>
>>> We are doing some treebanking here at UW with fftb with grammars that
>>> have very low coverage over their associated test corpora. The current
>>> behavior of fftb with these profiles is to include all items for
>>> treebanking, but give a 404 for each one with no parse forest stored. This
>>> necessitates clicking the back button and tracking which one is next (since
>>> nothing changes color). In that light, two questions:
>>>
>>> (1) Is there some option we can pass fftb so that it just doesn't
>>> present items with no parses?
>>> (2) Failing that, is it fairly straightforward with pydelphin, [incr tsdb()]
>>> or something else to export a version of the profiles that only
>>> includes items which the grammar successfully parsed?
>>>
>>> Thanks,
>>> Emily
>>>
>>>
>>> --
>>> Emily M. Bender (she/her)
>>> Howard and Frances Nostrand Endowed Professor
>>> Department of Linguistics
>>> Faculty Director, CLMS
>>> University of Washington
>>> Twitter: @emilymbender
>>>
>>
>>
>> --
>> -Michael Wayne Goodman
>>
> --
> Emily M. Bender (she/her)
> Howard and Frances Nostrand Endowed Professor
> Department of Linguistics
> Faculty Director, CLMS
> University of Washington
> Twitter: @emilymbender
>
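To make the pruning described in the quoted message above concrete, here is a minimal sketch of what the `--where 'readings > 0'` step amounts to. The rows are plain Python dicts standing in for the profile's 'item' and 'parse' files (an assumption for illustration, not how PyDelphin stores profiles), and `prune_unparsed` is a hypothetical name. It also shows why the first parse has to be done in regular mode: with full-forest output every 'readings' value is 0, so nothing would survive the filter.

```python
from __future__ import annotations


def prune_unparsed(items: list[dict], parses: list[dict]) -> tuple[list[dict], list[dict]]:
    """Keep parse rows with readings > 0 and only the item rows they point to.

    Hypothetical helper: rows are dicts keyed by field name, standing in for
    the profile's 'item' and 'parse' files.
    """
    # Filter the 'parse' rows, as the --where 'readings > 0' condition does.
    kept_parses = [p for p in parses if p["readings"] > 0]
    # Collect the item ids that still have at least one reading...
    kept_ids = {p["i-id"] for p in kept_parses}
    # ...and drop every 'item' row that no longer has a matching parse row.
    kept_items = [i for i in items if i["i-id"] in kept_ids]
    return kept_items, kept_parses


if __name__ == "__main__":
    # Made-up example rows, for illustration only.
    items = [{"i-id": 10, "i-input": "the dog barks"},
             {"i-id": 20, "i-input": "dog the barks"}]
    regular = [{"i-id": 10, "readings": 2}, {"i-id": 20, "readings": 0}]
    forest = [{"i-id": 10, "readings": 0}, {"i-id": 20, "readings": 0}]

    print(prune_unparsed(items, regular))  # only item 10 survives
    print(prune_unparsed(items, forest))   # ([], []): full-forest rows all say 0
```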


-- 
-Michael Wayne Goodman