[developers] Comparison of REPP implementations

Mon Nov 25 12:14:05 CET 2019

Hi developers,

I ran a test to see how compatible our REPP implementations are. I tested
the following:

* $LOGONROOT/bin/repp standalone tool
* PyDelphin's `delphin repp`
* ace -Ev (with some `sed` and `awk` to format it like the others)

I ran these over all i-input fields in the ERG's tsdb/gold profiles and
diffed each tool's output against the output of the REPP standalone tool.

There were 3 issues with PyDelphin:

1. Characterization wasn't accounting for deletions without replacement
2. Inline regex flags (such as (?i) in a group) apply to the whole match in
Python
3. External group calls (such as >wiki) are non-iterative
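
As an illustration of the second issue, here is a minimal sketch of how flag
scope works in Python's `re` module. The patterns are made up for the example
and are not taken from the ERG's REPP rules, and this is not a statement of how
PyDelphin handles the problem. It only shows that a bare `(?i)` affects the
whole pattern, while the scoped form `(?i:...)` (Python 3.6+) limits the flag
to one group.

```python
import re

# Bare inline flag: case-insensitivity applies to the entire pattern.
whole = re.compile(r"(?i)mr\. smith")

# Scoped inline flag (Python 3.6+): only the group is case-insensitive.
scoped = re.compile(r"(?i:mr)\. Smith")


def matches(pattern, text):
    """Return True if the compiled pattern matches the whole string."""
    return pattern.fullmatch(text) is not None
```

With these patterns, `whole` accepts "MR. SMITH", while `scoped` accepts
"MR. Smith" but rejects "MR. SMITH".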

Once I resolved the first issue, PyDelphin differs from REPP in only 6
items. The second issue is a Python thing and I think I have a way around
it. The third one is more troubling, because it appears that ACE and REPP
both apply external group calls iteratively even though the ReppTop wiki is
clear that they should not be iterative. If someone can confirm that
the wiki is incorrect, I'll update PyDelphin to treat them as iterative as
well. See https://github.com/delph-in/pydelphin/issues/254 for more info.
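
To make the iterative vs. non-iterative distinction concrete, here is a toy
sketch. The rule and the function names are hypothetical, and this is not how
REPP, ACE, or PyDelphin are actually implemented. It only shows why the two
strategies can produce different outputs for the same rules.

```python
import re


def apply_once(rules, text):
    """Apply each (pattern, replacement) rule a single time, in order."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text


def apply_iteratively(rules, text, max_passes=100):
    """Re-apply the whole rule group until a pass changes nothing.

    max_passes is a safety limit assumed for this sketch.
    """
    for _ in range(max_passes):
        result = apply_once(rules, text)
        if result == text:
            break
        text = result
    return text


# Hypothetical rule: collapse a doubled "a" into a single "a".
rules = [(r"aa", "a")]
```

On the input "aaaa", `apply_once` yields "aa", whereas `apply_iteratively`
keeps going until it reaches "a".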

ACE has diffs in 1410 items. Most of them appear to come down to whether a
conversion (two hyphens to an en-dash, two quotes to an angled quote,
... to an ellipsis character, etc.) counts as a span of 1 vs 2 or 3. There
were some other issues, but this appears to be the main one.
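
The span question can be sketched as follows. This is a simplified
illustration under my own assumptions (a literal replacement string, with
spans tracked per output character) and does not reproduce any of the three
tools. Each output character is mapped back to the (start, end) offsets it
came from in the original string, so a replaced sequence keeps its original
width and a deletion simply leaves a gap.

```python
import re


def replace_with_spans(pattern, replacement, text):
    """Replace matches of pattern with a literal replacement string.

    Returns (new_text, spans), where spans[i] is the (start, end) offset
    in the original text for character i of new_text.
    """
    out = []
    spans = []
    pos = 0
    for m in re.finditer(pattern, text):
        # Copy unchanged characters one-to-one.
        for i in range(pos, m.start()):
            out.append(text[i])
            spans.append((i, i + 1))
        # Each replacement character covers the whole matched span.
        for ch in replacement:
            out.append(ch)
            spans.append((m.start(), m.end()))
        pos = m.end()
    # Copy whatever follows the last match.
    for i in range(pos, len(text)):
        out.append(text[i])
        spans.append((i, i + 1))
    return "".join(out), spans
```

For "a--b" with "--" replaced by an en-dash, the en-dash gets the span (1, 3)
and "b" gets (3, 4). A tool that counted the en-dash as a span of 1 would
instead place "b" at (2, 3), which is the kind of offset difference described
above.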

-- 
-Michael Wayne Goodman