From paul at haleyai.com Sat Jan 4 20:34:59 2020
From: paul at haleyai.com (paul at haleyai.com)
Date: Sat, 4 Jan 2020 14:34:59 -0500
Subject: [developers] issues building PET w/ Boost, trigrams, etc.
Message-ID: <004201d5c336$0e7e82c0$2b7b8840$@haleyai.com>

Greetings Folks,

When upgrading, I found the new quoting convention in the ERG's TDL (lextypes) and that I needed a more recent version of PET to process the current ERG. In the course of upgrading I found a few issues with the build process and its instructions. I've done many builds of all the above over recent years, so hopefully this will be of assistance... FYI, I typically build using Ubuntu inside Docker, so any of this is completely repeatable and has nothing to do with my system, per se.

The first problem I encountered was lack of support for the version of GCC shipped with Ubuntu 16.04 (or 18.04, which I upgraded to in this process). This problem arises from the (outdated?) version of boost.m4 cached in the PET repository. I overcame it by inserting the current versions around line 1419 of boost.m4, FYI.

The second problem was that additional Boost modules appear to be required (as reported by errors during configure). These included system, filesystem, and iostreams.

After this, compilation proceeded but failed for two reasons. First, the addition of the trigram subdirectory under the cheap directory seems not to be properly reflected in cheap/Makefile.am (it omits the relative path when building from the release directory, for example). The same was true for the ("new"?) repp subdirectory of cheap. Both were resolved by the following edit to line 12 of cheap/Makefile.am, FYI. I'm not sure that's the right approach, but "works for me"!

CPPFLAGS += -I$(top_srcdir)/common -I$(top_srcdir)/fspp -I$(top_srcdir)/cheap/repp -I$(top_srcdir)/cheap/trigram @CHEAPCPPFLAGS@

Regards,
Paul
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From goodman.m.w at gmail.com Thu Jan 16 10:33:54 2020
From: goodman.m.w at gmail.com (goodman.m.w at gmail.com)
Date: Thu, 16 Jan 2020 17:33:54 +0800
Subject: [developers] EDM implementations
Message-ID:

Hello developers,

Recently I wanted to try out Elementary Dependency Match (EDM) but I did not find an easy way to do it. I saw lisp code in the LKB's repository and Bec's Perl code, but I'm not sure how to call the former from the command line and the latter seems outdated (I don't see the "export" command required by its instructions).

The Dridan & Oepen, 2011 algorithm was simple enough so I thought I'd implement it on top of PyDelphin. The result is here: https://github.com/delph-in/delphin.edm. It requires the latest version of PyDelphin (v1.2.0). It works with MRS, EDS, and DMRS, and it reads text files or [incr tsdb()] profiles.

When I nearly had my version working I found that Stephan et al.'s mtool (https://github.com/cfmrp/mtool) also had an implementation of EDM, so I used that to compare with my outputs (as I couldn't get the previous implementations to work). In this process I think I found some differences from Dridan & Oepen, 2011's description, and this email is to confirm those findings. Namely, that mtool's (and now my) implementations do the following:

* CARGs are treated as property triples ("class 3 information"). Previously they were combined with the predicate name. This change means that predicates like 'named' will match even if their CARGs don't, and the CARGs are a separate thing that needs to be matched. 
* The identification of the graph's TOP counts as a triple. One difference between mtool and delphin.edm is that mtool does not count "variable" properties from EDS, but that's just because its EDS parser does not yet handle them while PyDelphin's does. Can anyone familiar with EDM confirm the above? Or can anyone explain how to call the Perl or LKB code so I can compare? -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From bec.dridan at gmail.com Thu Jan 16 11:33:17 2020 From: bec.dridan at gmail.com (Bec Dridan) Date: Thu, 16 Jan 2020 21:33:17 +1100 Subject: [developers] EDM implementations In-Reply-To: References: Message-ID: Wow, that is some old code... From memory, export was a wrapper around `parse --export`, where I could add :ltriples to the tsdb::*redwoods-export-values* set. I don't know the mtool code at all, but re-reading the paper and looking at the perl code, I don't think the original implementation evaluated CARG at all. We only checked that the correct character span had a pred name of`named`. I think you are right that the triple export at the time did not produce a triple for TOP and it hence would not have been counted. That match your memory Stephan? Bec On Thu, Jan 16, 2020 at 8:34 PM goodman.m.w at gmail.com wrote: > Hello developers, > > Recently I wanted to try out Elementary Dependency Match (EDM) but I did > not find an easy way to do it. I saw lisp code in the LKB's repository and > Bec's Perl code, but I'm not sure how to call the former from the command > line and the latter seems outdated (I don't see the "export" command > required by its instructions). > > The Dridan & Oepen, 2011 algorithm was simple enough so I though I'd > implement it on top of PyDelphin. The result is here: > https://github.com/delph-in/delphin.edm. It requires the latest version > of PyDelphin (v1.2.0). It works with MRS, EDS, and DMRS, and it reads text > files or [incr tsdb()] profiles. > > When I nearly had my version working I found that Stephan et al.'s mtool ( > https://github.com/cfmrp The paper > example > /mtool ) also had an implementation of > EDM, so I used that to compare with my outputs (as I couldn't get the > previous implementations to work). In this process I think I found some > differences from Dridan & Oepen, 2011's description, and this email is to > confirm those findings. Namely, that mtool's (and now my) implementation do > the following: > > * CARGs are treated as property triples ("class 3 information"). > Previously they were combined with the predicate name. This change means > that predicates like 'named' will match even if their CARGs don't and the > CARGs are a separate thing that needs to be matched. > > * The identification of the graph's TOP counts as a triple. > > One difference between mtool and delphin.edm is that mtool does not count > "variable" properties from EDS, but that's just because its EDS parser does > not yet handle them while PyDelphin's does. > > Can anyone familiar with EDM confirm the above? Or can anyone explain how > to call the Perl or LKB code so I can compare? > > -- > -Michael Wayne Goodman > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ebender at uw.edu Fri Jan 17 01:42:08 2020 From: ebender at uw.edu (Emily M. 
Bender) Date: Thu, 16 Jan 2020 16:42:08 -0800 Subject: [developers] Skipping non-parsed items in fftb Message-ID: Dear all, We are doing some treebanking here at UW with fftb with grammars that have very low coverage over their associated test corpora. The current behavior of fftb with these profiles is to include all items for treebanking, but give a 404 for each one with no parse forest stored. This necessitates clicking the back button and tracking which one is next (since nothing changes color). In that light, two questions: (1) Is there some option we can pass fftb so that it just doesn't present items with no parses? (2) Failing that, is it fairly straightforward with pydelphin, [incr tsdb()] or something else to export a version of the profiles that only includes items which the grammar successfully parsed? Thanks, Emily -- Emily M. Bender (she/her) Howard and Frances Nostrand Endowed Professor Department of Linguistics Faculty Director, CLMS University of Washington Twitter: @emilymbender -------------- next part -------------- An HTML attachment was scrubbed... URL: From goodman.m.w at gmail.com Fri Jan 17 03:33:03 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Fri, 17 Jan 2020 10:33:03 +0800 Subject: [developers] Skipping non-parsed items in fftb In-Reply-To: References: Message-ID: Hi Emily, For (2), here is how you could do it with PyDelphin: delphin process -g grm.dat original-profile/ delphin mkprof --full --where 'readings > 0' --source original-profile/ new-profile/ delphin process -g grm.dat --full-forest new-profile/ Note that original-profile/ is first parsed in regular (non-forest) mode, because in full-forest mode the number of readings is essentially unknown until they are enumerated and thus the 'readings' field is always 0. The second command not only prunes lines in the 'parse' file with readings == 0, but also lines in the 'item' file which correspond to those 'parse' lines. Once you have created new-profile/, you can parse again with --full-forest for use with FFTB (and of course you don't have to use PyDelphin for the parsing steps, if you prefer other means). Also note that this results in a profile with no edges for partial parses. I think this is what you want. There should be a way to prune the full-forest profile directly while keeping partial parses, but while investigating this use case I found a bug, so I don't recommend it yet. Try `delphin mkprof --help` to see descriptions of these and other options. They map fairly directly to the function documented here: https://pydelphin.readthedocs.io/en/latest/api/delphin.commands.html#mkprof On Fri, Jan 17, 2020 at 8:44 AM Emily M. Bender wrote: > Dear all, > > We are doing some treebanking here at UW with fftb with grammars that have > very low coverage over their associated test corpora. The current behavior > of fftb with these profiles is to include all items for treebanking, but > give a 404 for each one with no parse forest stored. This necessitates > clicking the back button and tracking which one is next (since > nothing changes color). In that light, two questions: > > (1) Is there some option we can pass fftb so that it just doesn't present > items with no parses? > (2) Failing that, is it fairly straightforward with pydelphin, [incr > tsdb()] or something else to export a version of the profiles that only > includes items which the grammar successfully parsed? > > Thanks, > Emily > > > -- > Emily M. 
Bender (she/her) > Howard and Frances Nostrand Endowed Professor > Department of Linguistics > Faculty Director, CLMS > University of Washington > Twitter: @emilymbender > -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From ebender at uw.edu Fri Jan 17 03:36:51 2020 From: ebender at uw.edu (Emily M. Bender) Date: Thu, 16 Jan 2020 18:36:51 -0800 Subject: [developers] Skipping non-parsed items in fftb In-Reply-To: References: Message-ID: Thanks, Mike! I will give this a try. On Thu, Jan 16, 2020 at 6:33 PM goodman.m.w at gmail.com wrote: > Hi Emily, > > For (2), here is how you could do it with PyDelphin: > > delphin process -g grm.dat original-profile/ > delphin mkprof --full --where 'readings > 0' --source > original-profile/ new-profile/ > delphin process -g grm.dat --full-forest new-profile/ > > Note that original-profile/ is first parsed in regular (non-forest) mode, > because in full-forest mode the number of readings is essentially unknown > until they are enumerated and thus the 'readings' field is always 0. The > second command not only prunes lines in the 'parse' file with readings == > 0, but also lines in the 'item' file which correspond to those 'parse' > lines. Once you have created new-profile/, you can parse again with > --full-forest for use with FFTB (and of course you don't have to use > PyDelphin for the parsing steps, if you prefer other means). > > Also note that this results in a profile with no edges for partial parses. > I think this is what you want. There should be a way to prune the > full-forest profile directly while keeping partial parses, but while > investigating this use case I found a bug, so I don't recommend it yet. > > Try `delphin mkprof --help` to see descriptions of these and other > options. They map fairly directly to the function documented here: > https://pydelphin.readthedocs.io/en/latest/api/delphin.commands.html > #mkprof > > > On Fri, Jan 17, 2020 at 8:44 AM Emily M. Bender wrote: > >> Dear all, >> >> We are doing some treebanking here at UW with fftb with grammars that >> have very low coverage over their associated test corpora. The current >> behavior of fftb with these profiles is to include all items for >> treebanking, but give a 404 for each one with no parse forest stored. This >> necessitates clicking the back button and tracking which one is next (since >> nothing changes color). In that light, two questions: >> >> (1) Is there some option we can pass fftb so that it just doesn't present >> items with no parses? >> (2) Failing that, is it fairly straightforward with pydelphin, [incr >> tsdb()] or something else to export a version of the profiles that only >> includes items which the grammar successfully parsed? >> >> Thanks, >> Emily >> >> >> -- >> Emily M. Bender (she/her) >> Howard and Frances Nostrand Endowed Professor >> Department of Linguistics >> Faculty Director, CLMS >> University of Washington >> Twitter: @emilymbender >> > > > -- > -Michael Wayne Goodman > -- Emily M. Bender (she/her) Howard and Frances Nostrand Endowed Professor Department of Linguistics Faculty Director, CLMS University of Washington Twitter: @emilymbender -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From goodman.m.w at gmail.com Fri Jan 17 03:45:49 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Fri, 17 Jan 2020 10:45:49 +0800 Subject: [developers] Skipping non-parsed items in fftb In-Reply-To: References: Message-ID: Let me know how it goes. And a clarification: the --full option on `mkprof` doesn't hurt, but it's unnecessary since you're re-parsing the created profile. Also here's the bug report for the other thing, if you're interested in that use case: https://github.com/delph-in/pydelphin/issues/273 On Fri, Jan 17, 2020 at 10:37 AM Emily M. Bender wrote: > Thanks, Mike! I will give this a try. > > On Thu, Jan 16, 2020 at 6:33 PM goodman.m.w at gmail.com < > goodman.m.w at gmail.com> wrote: > >> Hi Emily, >> >> For (2), here is how you could do it with PyDelphin: >> >> delphin process -g grm.dat original-profile/ >> delphin mkprof --full --where 'readings > 0' --source >> original-profile/ new-profile/ >> delphin process -g grm.dat --full-forest new-profile/ >> >> Note that original-profile/ is first parsed in regular (non-forest) mode, >> because in full-forest mode the number of readings is essentially unknown >> until they are enumerated and thus the 'readings' field is always 0. The >> second command not only prunes lines in the 'parse' file with readings == >> 0, but also lines in the 'item' file which correspond to those 'parse' >> lines. Once you have created new-profile/, you can parse again with >> --full-forest for use with FFTB (and of course you don't have to use >> PyDelphin for the parsing steps, if you prefer other means). >> >> Also note that this results in a profile with no edges for partial >> parses. I think this is what you want. There should be a way to prune the >> full-forest profile directly while keeping partial parses, but while >> investigating this use case I found a bug, so I don't recommend it yet. >> >> Try `delphin mkprof --help` to see descriptions of these and other >> options. They map fairly directly to the function documented here: >> https://pydelphin.readthedocs.io/en/latest/api/delphin.commands.html >> #mkprof >> >> >> On Fri, Jan 17, 2020 at 8:44 AM Emily M. Bender wrote: >> >>> Dear all, >>> >>> We are doing some treebanking here at UW with fftb with grammars that >>> have very low coverage over their associated test corpora. The current >>> behavior of fftb with these profiles is to include all items for >>> treebanking, but give a 404 for each one with no parse forest stored. This >>> necessitates clicking the back button and tracking which one is next (since >>> nothing changes color). In that light, two questions: >>> >>> (1) Is there some option we can pass fftb so that it just doesn't >>> present items with no parses? >>> (2) Failing that, is it fairly straightforward with pydelphin, [incr >>> tsdb()] or something else to export a version of the profiles that only >>> includes items which the grammar successfully parsed? >>> >>> Thanks, >>> Emily >>> >>> >>> -- >>> Emily M. Bender (she/her) >>> Howard and Frances Nostrand Endowed Professor >>> Department of Linguistics >>> Faculty Director, CLMS >>> University of Washington >>> Twitter: @emilymbender >>> >> >> >> -- >> -Michael Wayne Goodman >> > -- > Emily M. Bender (she/her) > Howard and Frances Nostrand Endowed Professor > Department of Linguistics > Faculty Director, CLMS > University of Washington > Twitter: @emilymbender > -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From goodman.m.w at gmail.com Fri Jan 17 07:39:26 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Fri, 17 Jan 2020 14:39:26 +0800 Subject: [developers] EDM implementations In-Reply-To: References: Message-ID: Thanks, Bec! I manually put in the :ltriples in the parse script and was able to produce some output that edm_eval.pl could read. Regarding the CARG being combined with the predicate name, that was what I guessed by looking at the Lisp code. Thanks for correcting my mistake. One more detail is what to do when the two sides (gold and test) have different numbers of items. Currently my code stops as soon as either a gold or test item is missing, which is what smatch (the similar metric made for AMR) does, but I think that may be wrong because parsing profiles are likely to have missing or extra (overgeneration) items in the middle. So the question is whether we ignore it or count it as a full mismatch. On Thu, Jan 16, 2020 at 6:33 PM Bec Dridan wrote: > Wow, that is some old code... From memory, export was a wrapper around > `parse --export`, where I could add :ltriples to the tsdb::*redwoods-export-values* > set. > > I don't know the mtool code at all, but re-reading the paper and looking > at the perl code, I don't think the original implementation evaluated CARG > at all. We only checked that the correct character span had a pred name > of`named`. > > I think you are right that the triple export at the time did not produce a > triple for TOP and it hence would not have been counted. > > That match your memory Stephan? > > Bec > > > On Thu, Jan 16, 2020 at 8:34 PM goodman.m.w at gmail.com < > goodman.m.w at gmail.com> wrote: > >> Hello developers, >> >> Recently I wanted to try out Elementary Dependency Match (EDM) but I did >> not find an easy way to do it. I saw lisp code in the LKB's repository and >> Bec's Perl code, but I'm not sure how to call the former from the command >> line and the latter seems outdated (I don't see the "export" command >> required by its instructions). >> >> The Dridan & Oepen, 2011 algorithm was simple enough so I though I'd >> implement it on top of PyDelphin. The result is here: >> https://github.com/delph-in/delphin.edm. It requires the latest version >> of PyDelphin (v1.2.0). It works with MRS, EDS, and DMRS, and it reads text >> files or [incr tsdb()] profiles. >> >> When I nearly had my version working I found that Stephan et al.'s mtool ( >> https://github.com/cfmrp The paper >> example >> /mtool ) also had an implementation of >> EDM, so I used that to compare with my outputs (as I couldn't get the >> previous implementations to work). In this process I think I found some >> differences from Dridan & Oepen, 2011's description, and this email is to >> confirm those findings. Namely, that mtool's (and now my) implementation do >> the following: >> >> * CARGs are treated as property triples ("class 3 information"). >> Previously they were combined with the predicate name. This change means >> that predicates like 'named' will match even if their CARGs don't and the >> CARGs are a separate thing that needs to be matched. >> >> * The identification of the graph's TOP counts as a triple. >> >> One difference between mtool and delphin.edm is that mtool does not count >> "variable" properties from EDS, but that's just because its EDS parser does >> not yet handle them while PyDelphin's does. >> >> Can anyone familiar with EDM confirm the above? Or can anyone explain how >> to call the Perl or LKB code so I can compare? 
>> >> -- >> -Michael Wayne Goodman >> > -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From bec.dridan at gmail.com Fri Jan 17 11:14:28 2020 From: bec.dridan at gmail.com (Bec Dridan) Date: Fri, 17 Jan 2020 21:14:28 +1100 Subject: [developers] EDM implementations In-Reply-To: References: Message-ID: On Fri, Jan 17, 2020 at 5:39 PM goodman.m.w at gmail.com wrote: > > One more detail is what to do when the two sides (gold and test) have > different numbers of items. Currently my code stops as soon as either a > gold or test item is missing, which is what smatch (the similar metric made > for AMR) does, but I think that may be wrong because parsing profiles are > likely to have missing or extra (overgeneration) items in the middle. So > the question is whether we ignore it or count it as a full mismatch. > If you are asking what is 'correct', I guess that depends on why you are evaluating. The perl implementation wouldn't have noticed missing gold parses, because it used the gold set as the definition of the set. A missing test item, on the other hand, by default counts as a full mismatch, but there is a command line option to ignore any gold parse with no corresponding test parse. The ignore option is useful when the purpose of the evaluation is assessing the system you are working on (and you consider coverage separately). For comparing across systems, I imagine you probably want to count parse failure as a full mismatch. It was useful for me to have both options. Bec > > On Thu, Jan 16, 2020 at 6:33 PM Bec Dridan wrote: > >> Wow, that is some old code... From memory, export was a wrapper around >> `parse --export`, where I could add :ltriples to the tsdb::*redwoods-export-values* >> set. >> >> I don't know the mtool code at all, but re-reading the paper and looking >> at the perl code, I don't think the original implementation evaluated CARG >> at all. We only checked that the correct character span had a pred name >> of`named`. >> >> I think you are right that the triple export at the time did not produce >> a triple for TOP and it hence would not have been counted. >> >> That match your memory Stephan? >> >> Bec >> >> >> On Thu, Jan 16, 2020 at 8:34 PM goodman.m.w at gmail.com < >> goodman.m.w at gmail.com> wrote: >> >>> Hello developers, >>> >>> Recently I wanted to try out Elementary Dependency Match (EDM) but I did >>> not find an easy way to do it. I saw lisp code in the LKB's repository and >>> Bec's Perl code, but I'm not sure how to call the former from the command >>> line and the latter seems outdated (I don't see the "export" command >>> required by its instructions). >>> >>> The Dridan & Oepen, 2011 algorithm was simple enough so I though I'd >>> implement it on top of PyDelphin. The result is here: >>> https://github.com/delph-in/delphin.edm. It requires the latest version >>> of PyDelphin (v1.2.0). It works with MRS, EDS, and DMRS, and it reads text >>> files or [incr tsdb()] profiles. >>> >>> When I nearly had my version working I found that Stephan et al.'s mtool >>> (https://github.com/cfmrp The paper >>> example >>> /mtool ) also had an implementation of >>> EDM, so I used that to compare with my outputs (as I couldn't get the >>> previous implementations to work). In this process I think I found some >>> differences from Dridan & Oepen, 2011's description, and this email is to >>> confirm those findings. 
Namely, that mtool's (and now my) implementation do >>> the following: >>> >>> * CARGs are treated as property triples ("class 3 information"). >>> Previously they were combined with the predicate name. This change means >>> that predicates like 'named' will match even if their CARGs don't and the >>> CARGs are a separate thing that needs to be matched. >>> >>> * The identification of the graph's TOP counts as a triple. >>> >>> One difference between mtool and delphin.edm is that mtool does not >>> count "variable" properties from EDS, but that's just because its EDS >>> parser does not yet handle them while PyDelphin's does. >>> >>> Can anyone familiar with EDM confirm the above? Or can anyone explain >>> how to call the Perl or LKB code so I can compare? >>> >>> -- >>> -Michael Wayne Goodman >>> >> > > -- > -Michael Wayne Goodman > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ebender at uw.edu Sat Jan 18 00:41:16 2020 From: ebender at uw.edu (Emily M. Bender) Date: Fri, 17 Jan 2020 15:41:16 -0800 Subject: [developers] Skipping non-parsed items in fftb In-Reply-To: References: Message-ID: Dear Mike, Alas, I'm hitting this error: (run_agg) ebender at patas:~$ delphin process -g ctn1_grammar_fixed/ace/ctn1.dat ctn_orig/ Traceback (most recent call last): File "/home2/kphowell/Envs/run_agg/bin/delphin", line 11, in sys.exit(main()) File "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", line 42, in main args.func(args) File "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", line 135, in call_process gzip=args.gzip) File "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/commands.py", line 540, in process source = itsdb.TestSuite(source) File "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/itsdb.py", line 644, in __init__ '*schema* argument is required for new test suites') delphin.itsdb.ITSDBError: *schema* argument is required for new test suites I'll poke around and see where the schema requirement is coming from (nothing in the bit on "process" in the documentation page mentions it), but thought I'd post here too in the meantime. Emily On Thu, Jan 16, 2020 at 6:46 PM goodman.m.w at gmail.com wrote: > Let me know how it goes. > > And a clarification: the --full option on `mkprof` doesn't hurt, but it's > unnecessary since you're re-parsing the created profile. > > Also here's the bug report for the other thing, if you're interested in > that use case: https://github.com/delph-in/pydelphin/issues/273 > > On Fri, Jan 17, 2020 at 10:37 AM Emily M. Bender wrote: > >> Thanks, Mike! I will give this a try. >> >> On Thu, Jan 16, 2020 at 6:33 PM goodman.m.w at gmail.com < >> goodman.m.w at gmail.com> wrote: >> >>> Hi Emily, >>> >>> For (2), here is how you could do it with PyDelphin: >>> >>> delphin process -g grm.dat original-profile/ >>> delphin mkprof --full --where 'readings > 0' --source >>> original-profile/ new-profile/ >>> delphin process -g grm.dat --full-forest new-profile/ >>> >>> Note that original-profile/ is first parsed in regular (non-forest) >>> mode, because in full-forest mode the number of readings is essentially >>> unknown until they are enumerated and thus the 'readings' field is always >>> 0. The second command not only prunes lines in the 'parse' file with >>> readings == 0, but also lines in the 'item' file which correspond to those >>> 'parse' lines. 
Once you have created new-profile/, you can parse again with >>> --full-forest for use with FFTB (and of course you don't have to use >>> PyDelphin for the parsing steps, if you prefer other means). >>> >>> Also note that this results in a profile with no edges for partial >>> parses. I think this is what you want. There should be a way to prune the >>> full-forest profile directly while keeping partial parses, but while >>> investigating this use case I found a bug, so I don't recommend it yet. >>> >>> Try `delphin mkprof --help` to see descriptions of these and other >>> options. They map fairly directly to the function documented here: >>> https://pydelphin.readthedocs.io/en/latest/api/delphin.commands.html >>> #mkprof >>> >>> >>> On Fri, Jan 17, 2020 at 8:44 AM Emily M. Bender wrote: >>> >>>> Dear all, >>>> >>>> We are doing some treebanking here at UW with fftb with grammars that >>>> have very low coverage over their associated test corpora. The current >>>> behavior of fftb with these profiles is to include all items for >>>> treebanking, but give a 404 for each one with no parse forest stored. This >>>> necessitates clicking the back button and tracking which one is next (since >>>> nothing changes color). In that light, two questions: >>>> >>>> (1) Is there some option we can pass fftb so that it just doesn't >>>> present items with no parses? >>>> (2) Failing that, is it fairly straightforward with pydelphin, [incr >>>> tsdb()] or something else to export a version of the profiles that only >>>> includes items which the grammar successfully parsed? >>>> >>>> Thanks, >>>> Emily >>>> >>>> >>>> -- >>>> Emily M. Bender (she/her) >>>> Howard and Frances Nostrand Endowed Professor >>>> Department of Linguistics >>>> Faculty Director, CLMS >>>> University of Washington >>>> Twitter: @emilymbender >>>> >>> >>> >>> -- >>> -Michael Wayne Goodman >>> >> -- >> Emily M. Bender (she/her) >> Howard and Frances Nostrand Endowed Professor >> Department of Linguistics >> Faculty Director, CLMS >> University of Washington >> Twitter: @emilymbender >> > > > -- > -Michael Wayne Goodman > -- Emily M. Bender (she/her) Howard and Frances Nostrand Endowed Professor Department of Linguistics Faculty Director, CLMS University of Washington Twitter: @emilymbender -------------- next part -------------- An HTML attachment was scrubbed... URL: From ebender at uw.edu Sat Jan 18 00:51:48 2020 From: ebender at uw.edu (Emily M. Bender) Date: Fri, 17 Jan 2020 15:51:48 -0800 Subject: [developers] Skipping non-parsed items in fftb In-Reply-To: References: Message-ID: Apologies --- that error meant I hadn't given the right path to the testsuite. 
Correcting that, I now see: (run_agg) ebender at patas:/home2/kphowell/run_aggregation/output/emb_treebank$ delphin process -g ctn1_grammar_fixed/ace/ctn1.dat ctn_orig/ Traceback (most recent call last): File "/home2/kphowell/Envs/run_agg/bin/delphin", line 11, in sys.exit(main()) File "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", line 42, in main args.func(args) File "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", line 135, in call_process gzip=args.gzip) File "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/commands.py", line 542, in process column, tablename, condition = _interpret_selection(select, source) File "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/commands.py", line 562, in _interpret_selection if len(queryobj['tables']) == 1: KeyError: 'tables' On Fri, Jan 17, 2020 at 3:41 PM Emily M. Bender wrote: > Dear Mike, > > Alas, I'm hitting this error: > > (run_agg) ebender at patas:~$ delphin process -g > ctn1_grammar_fixed/ace/ctn1.dat ctn_orig/ > Traceback (most recent call last): > File "/home2/kphowell/Envs/run_agg/bin/delphin", line 11, in > sys.exit(main()) > File > "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", > line 42, in main > args.func(args) > File > "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", > line 135, in call_process > gzip=args.gzip) > File > "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/commands.py", > line 540, in process > source = itsdb.TestSuite(source) > File > "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/itsdb.py", > line 644, in __init__ > '*schema* argument is required for new test suites') > delphin.itsdb.ITSDBError: *schema* argument is required for new test suites > > I'll poke around and see where the schema requirement is coming from > (nothing in the bit on "process" in the documentation page mentions it), > but thought I'd post here too in the meantime. > > Emily > > On Thu, Jan 16, 2020 at 6:46 PM goodman.m.w at gmail.com < > goodman.m.w at gmail.com> wrote: > >> Let me know how it goes. >> >> And a clarification: the --full option on `mkprof` doesn't hurt, but it's >> unnecessary since you're re-parsing the created profile. >> >> Also here's the bug report for the other thing, if you're interested in >> that use case: https://github.com/delph-in/pydelphin/issues/273 >> >> On Fri, Jan 17, 2020 at 10:37 AM Emily M. Bender wrote: >> >>> Thanks, Mike! I will give this a try. >>> >>> On Thu, Jan 16, 2020 at 6:33 PM goodman.m.w at gmail.com < >>> goodman.m.w at gmail.com> wrote: >>> >>>> Hi Emily, >>>> >>>> For (2), here is how you could do it with PyDelphin: >>>> >>>> delphin process -g grm.dat original-profile/ >>>> delphin mkprof --full --where 'readings > 0' --source >>>> original-profile/ new-profile/ >>>> delphin process -g grm.dat --full-forest new-profile/ >>>> >>>> Note that original-profile/ is first parsed in regular (non-forest) >>>> mode, because in full-forest mode the number of readings is essentially >>>> unknown until they are enumerated and thus the 'readings' field is always >>>> 0. The second command not only prunes lines in the 'parse' file with >>>> readings == 0, but also lines in the 'item' file which correspond to those >>>> 'parse' lines. 
Once you have created new-profile/, you can parse again with >>>> --full-forest for use with FFTB (and of course you don't have to use >>>> PyDelphin for the parsing steps, if you prefer other means). >>>> >>>> Also note that this results in a profile with no edges for partial >>>> parses. I think this is what you want. There should be a way to prune the >>>> full-forest profile directly while keeping partial parses, but while >>>> investigating this use case I found a bug, so I don't recommend it yet. >>>> >>>> Try `delphin mkprof --help` to see descriptions of these and other >>>> options. They map fairly directly to the function documented here: >>>> https://pydelphin.readthedocs.io/en/latest/api/delphin.commands.html >>>> #mkprof >>>> >>>> >>>> On Fri, Jan 17, 2020 at 8:44 AM Emily M. Bender wrote: >>>> >>>>> Dear all, >>>>> >>>>> We are doing some treebanking here at UW with fftb with grammars that >>>>> have very low coverage over their associated test corpora. The current >>>>> behavior of fftb with these profiles is to include all items for >>>>> treebanking, but give a 404 for each one with no parse forest stored. This >>>>> necessitates clicking the back button and tracking which one is next (since >>>>> nothing changes color). In that light, two questions: >>>>> >>>>> (1) Is there some option we can pass fftb so that it just doesn't >>>>> present items with no parses? >>>>> (2) Failing that, is it fairly straightforward with pydelphin, [incr >>>>> tsdb()] or something else to export a version of the profiles that only >>>>> includes items which the grammar successfully parsed? >>>>> >>>>> Thanks, >>>>> Emily >>>>> >>>>> >>>>> -- >>>>> Emily M. Bender (she/her) >>>>> Howard and Frances Nostrand Endowed Professor >>>>> Department of Linguistics >>>>> Faculty Director, CLMS >>>>> University of Washington >>>>> Twitter: @emilymbender >>>>> >>>> >>>> >>>> -- >>>> -Michael Wayne Goodman >>>> >>> -- >>> Emily M. Bender (she/her) >>> Howard and Frances Nostrand Endowed Professor >>> Department of Linguistics >>> Faculty Director, CLMS >>> University of Washington >>> Twitter: @emilymbender >>> >> >> >> -- >> -Michael Wayne Goodman >> > > > -- > Emily M. Bender (she/her) > Howard and Frances Nostrand Endowed Professor > Department of Linguistics > Faculty Director, CLMS > University of Washington > Twitter: @emilymbender > -- Emily M. Bender (she/her) Howard and Frances Nostrand Endowed Professor Department of Linguistics Faculty Director, CLMS University of Washington Twitter: @emilymbender -------------- next part -------------- An HTML attachment was scrubbed... URL: From goodman.m.w at gmail.com Sat Jan 18 01:17:13 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Sat, 18 Jan 2020 08:17:13 +0800 Subject: [developers] Skipping non-parsed items in fftb In-Reply-To: References: Message-ID: Hi Emily, Yes those error messages are not very clear. But the second one looks like old code, as 'tables' is no longer a key in the object it's being looked up on. I suggest making sure that your run_agg environment has an updated version of PyDelphin. While the environment is active, try `pip install -U pydelphin` and make sure it has a 1.0 or newer version (`delphin --version`), then try again. On Sat, Jan 18, 2020 at 7:52 AM Emily M. Bender wrote: > Apologies --- that error meant I hadn't given the right path to the > testsuite. 
Correcting that, I now see: > > (run_agg) ebender at patas:/home2/kphowell/run_aggregation/output/emb_treebank$ > delphin process -g ctn1_grammar_fixed/ace/ctn1.dat ctn_orig/ > Traceback (most recent call last): > File "/home2/kphowell/Envs/run_agg/bin/delphin", line 11, in > sys.exit(main()) > File > "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", > line 42, in main > args.func(args) > File > "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", > line 135, in call_process > gzip=args.gzip) > File > "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/commands.py", > line 542, in process > column, tablename, condition = _interpret_selection(select, source) > File > "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/commands.py", > line 562, in _interpret_selection > if len(queryobj['tables']) == 1: > KeyError: 'tables' > > On Fri, Jan 17, 2020 at 3:41 PM Emily M. Bender wrote: > >> Dear Mike, >> >> Alas, I'm hitting this error: >> >> (run_agg) ebender at patas:~$ delphin process -g >> ctn1_grammar_fixed/ace/ctn1.dat ctn_orig/ >> Traceback (most recent call last): >> File "/home2/kphowell/Envs/run_agg/bin/delphin", line 11, in >> sys.exit(main()) >> File >> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", >> line 42, in main >> args.func(args) >> File >> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", >> line 135, in call_process >> gzip=args.gzip) >> File >> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/commands.py", >> line 540, in process >> source = itsdb.TestSuite(source) >> File >> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/itsdb.py", >> line 644, in __init__ >> '*schema* argument is required for new test suites') >> delphin.itsdb.ITSDBError: *schema* argument is required for new test >> suites >> >> I'll poke around and see where the schema requirement is coming from >> (nothing in the bit on "process" in the documentation page mentions it), >> but thought I'd post here too in the meantime. >> >> Emily >> >> On Thu, Jan 16, 2020 at 6:46 PM goodman.m.w at gmail.com < >> goodman.m.w at gmail.com> wrote: >> >>> Let me know how it goes. >>> >>> And a clarification: the --full option on `mkprof` doesn't hurt, but >>> it's unnecessary since you're re-parsing the created profile. >>> >>> Also here's the bug report for the other thing, if you're interested in >>> that use case: https://github.com/delph-in/pydelphin/issues/273 >>> >>> On Fri, Jan 17, 2020 at 10:37 AM Emily M. Bender wrote: >>> >>>> Thanks, Mike! I will give this a try. >>>> >>>> On Thu, Jan 16, 2020 at 6:33 PM goodman.m.w at gmail.com < >>>> goodman.m.w at gmail.com> wrote: >>>> >>>>> Hi Emily, >>>>> >>>>> For (2), here is how you could do it with PyDelphin: >>>>> >>>>> delphin process -g grm.dat original-profile/ >>>>> delphin mkprof --full --where 'readings > 0' --source >>>>> original-profile/ new-profile/ >>>>> delphin process -g grm.dat --full-forest new-profile/ >>>>> >>>>> Note that original-profile/ is first parsed in regular (non-forest) >>>>> mode, because in full-forest mode the number of readings is essentially >>>>> unknown until they are enumerated and thus the 'readings' field is always >>>>> 0. The second command not only prunes lines in the 'parse' file with >>>>> readings == 0, but also lines in the 'item' file which correspond to those >>>>> 'parse' lines. 
Once you have created new-profile/, you can parse again with >>>>> --full-forest for use with FFTB (and of course you don't have to use >>>>> PyDelphin for the parsing steps, if you prefer other means). >>>>> >>>>> Also note that this results in a profile with no edges for partial >>>>> parses. I think this is what you want. There should be a way to prune the >>>>> full-forest profile directly while keeping partial parses, but while >>>>> investigating this use case I found a bug, so I don't recommend it yet. >>>>> >>>>> Try `delphin mkprof --help` to see descriptions of these and other >>>>> options. They map fairly directly to the function documented here: >>>>> https://pydelphin.readthedocs.io/en/latest/api/delphin.commands.html >>>>> #mkprof >>>>> >>>>> >>>>> On Fri, Jan 17, 2020 at 8:44 AM Emily M. Bender >>>>> wrote: >>>>> >>>>>> Dear all, >>>>>> >>>>>> We are doing some treebanking here at UW with fftb with grammars that >>>>>> have very low coverage over their associated test corpora. The current >>>>>> behavior of fftb with these profiles is to include all items for >>>>>> treebanking, but give a 404 for each one with no parse forest stored. This >>>>>> necessitates clicking the back button and tracking which one is next (since >>>>>> nothing changes color). In that light, two questions: >>>>>> >>>>>> (1) Is there some option we can pass fftb so that it just doesn't >>>>>> present items with no parses? >>>>>> (2) Failing that, is it fairly straightforward with pydelphin, [incr >>>>>> tsdb()] or something else to export a version of the profiles that only >>>>>> includes items which the grammar successfully parsed? >>>>>> >>>>>> Thanks, >>>>>> Emily >>>>>> >>>>>> >>>>>> -- >>>>>> Emily M. Bender (she/her) >>>>>> Howard and Frances Nostrand Endowed Professor >>>>>> Department of Linguistics >>>>>> Faculty Director, CLMS >>>>>> University of Washington >>>>>> Twitter: @emilymbender >>>>>> >>>>> >>>>> >>>>> -- >>>>> -Michael Wayne Goodman >>>>> >>>> -- >>>> Emily M. Bender (she/her) >>>> Howard and Frances Nostrand Endowed Professor >>>> Department of Linguistics >>>> Faculty Director, CLMS >>>> University of Washington >>>> Twitter: @emilymbender >>>> >>> >>> >>> -- >>> -Michael Wayne Goodman >>> >> >> >> -- >> Emily M. Bender (she/her) >> Howard and Frances Nostrand Endowed Professor >> Department of Linguistics >> Faculty Director, CLMS >> University of Washington >> Twitter: @emilymbender >> > > > -- > Emily M. Bender (she/her) > Howard and Frances Nostrand Endowed Professor > Department of Linguistics > Faculty Director, CLMS > University of Washington > Twitter: @emilymbender > -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From goodman.m.w at gmail.com Mon Jan 20 02:14:53 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Mon, 20 Jan 2020 09:14:53 +0800 Subject: [developers] EDM implementations In-Reply-To: References: Message-ID: Thanks again, Bec. I just want to make sure my implementation gets the same scores for the same inputs under the same assumptions as the original implementation. For this to work, its behavior concerning the points I've sought clarification for should be intentional. In light of your responses, I've separated the CARG triples from other properties and have given it its own weight. Thus I should be able to get the same scores as your code by setting the weights of CARGs (but not properties) and graph-tops to zero. 
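For concreteness, here is a rough sketch of how such weighted scoring could look; the triple classes, function name, and weights below are only illustrative assumptions, not delphin.edm's actual interface:

    from collections import Counter

    def edm_score(gold, test, weights):
        # `gold` and `test` map a triple class ('names', 'arguments',
        # 'properties', 'carg', 'top') to a list of triples; `weights` maps
        # the same class names to numbers, with 0 disabling a class entirely.
        matched = total_gold = total_test = 0.0
        for cls, w in weights.items():
            if not w:
                continue
            g, t = Counter(gold.get(cls, ())), Counter(test.get(cls, ()))
            matched += w * sum((g & t).values())   # multiset intersection
            total_gold += w * sum(g.values())
            total_test += w * sum(t.values())
        p = matched / total_test if total_test else 0.0
        r = matched / total_gold if total_gold else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        return p, r, f

    # Zeroing carg and top should approximate the original Perl behaviour:
    # p, r, f = edm_score(gold_triples, test_triples,
    #                     {'names': 1, 'arguments': 1, 'properties': 1,
    #                      'carg': 0, 'top': 0})
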
Similarly, I'll add an option to ignore missing test items and otherwise treat them as mismatches. On Fri, Jan 17, 2020 at 6:14 PM Bec Dridan wrote: > > > On Fri, Jan 17, 2020 at 5:39 PM goodman.m.w at gmail.com < > goodman.m.w at gmail.com> wrote: > >> >> One more detail is what to do when the two sides (gold and test) have >> different numbers of items. Currently my code stops as soon as either a >> gold or test item is missing, which is what smatch (the similar metric made >> for AMR) does, but I think that may be wrong because parsing profiles are >> likely to have missing or extra (overgeneration) items in the middle. So >> the question is whether we ignore it or count it as a full mismatch. >> > > If you are asking what is 'correct', I guess that depends on why you are > evaluating. The perl implementation wouldn't have noticed missing gold > parses, because it used the gold set as the definition of the set. A > missing test item, on the other hand, by default counts as a full mismatch, > but there is a command line option to ignore any gold parse with no > corresponding test parse. The ignore option is useful when the purpose of > the evaluation is assessing the system you are working on (and you consider > coverage separately). For comparing across systems, I imagine you probably > want to count parse failure as a full mismatch. It was useful for me to > have both options. > > Bec > > >> >> On Thu, Jan 16, 2020 at 6:33 PM Bec Dridan wrote: >> >>> Wow, that is some old code... From memory, export was a wrapper around >>> `parse --export`, where I could add :ltriples to the tsdb::*redwoods-export-values* >>> set. >>> >>> I don't know the mtool code at all, but re-reading the paper and looking >>> at the perl code, I don't think the original implementation evaluated CARG >>> at all. We only checked that the correct character span had a pred name >>> of`named`. >>> >>> I think you are right that the triple export at the time did not produce >>> a triple for TOP and it hence would not have been counted. >>> >>> That match your memory Stephan? >>> >>> Bec >>> >>> >>> On Thu, Jan 16, 2020 at 8:34 PM goodman.m.w at gmail.com < >>> goodman.m.w at gmail.com> wrote: >>> >>>> Hello developers, >>>> >>>> Recently I wanted to try out Elementary Dependency Match (EDM) but I >>>> did not find an easy way to do it. I saw lisp code in the LKB's repository >>>> and Bec's Perl code, but I'm not sure how to call the former from the >>>> command line and the latter seems outdated (I don't see the "export" >>>> command required by its instructions). >>>> >>>> The Dridan & Oepen, 2011 algorithm was simple enough so I though I'd >>>> implement it on top of PyDelphin. The result is here: >>>> https://github.com/delph-in/delphin.edm. It requires the latest >>>> version of PyDelphin (v1.2.0). It works with MRS, EDS, and DMRS, and it >>>> reads text files or [incr tsdb()] profiles. >>>> >>>> When I nearly had my version working I found that Stephan et al.'s >>>> mtool (https://github.com/cfmrp The >>>> paper example >>>> /mtool ) also had an implementation of >>>> EDM, so I used that to compare with my outputs (as I couldn't get the >>>> previous implementations to work). In this process I think I found some >>>> differences from Dridan & Oepen, 2011's description, and this email is to >>>> confirm those findings. Namely, that mtool's (and now my) implementation do >>>> the following: >>>> >>>> * CARGs are treated as property triples ("class 3 information"). 
>>>> Previously they were combined with the predicate name. This change means >>>> that predicates like 'named' will match even if their CARGs don't and the >>>> CARGs are a separate thing that needs to be matched. >>>> >>>> * The identification of the graph's TOP counts as a triple. >>>> >>>> One difference between mtool and delphin.edm is that mtool does not >>>> count "variable" properties from EDS, but that's just because its EDS >>>> parser does not yet handle them while PyDelphin's does. >>>> >>>> Can anyone familiar with EDM confirm the above? Or can anyone explain >>>> how to call the Perl or LKB code so I can compare? >>>> >>>> -- >>>> -Michael Wayne Goodman >>>> >>> >> >> -- >> -Michael Wayne Goodman >> > -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From ebender at uw.edu Wed Jan 22 02:19:18 2020 From: ebender at uw.edu (Emily M. Bender) Date: Tue, 21 Jan 2020 17:19:18 -0800 Subject: [developers] Skipping non-parsed items in fftb In-Reply-To: References: Message-ID: Updating PyDelphin caused the error to change, at least: Traceback (most recent call last): File "/home2/kphowell/Envs/run_agg/bin/delphin", line 11, in sys.exit(main()) File "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", line 40, in main args.func(args) File "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/cli/process.py", line 46, in call_process gzip=args.gzip) File "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/commands.py", line 602, in process with processor(grammar, cmdargs=options, **kwargs) as cpu: File "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/ace.py", line 110, in __init__ self._open() File "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/ace.py", line 143, in _open raise ACEProcessError('Process closed on startup; see .') delphin.ace.ACEProcessError: Process closed on startup; see . ... I'm a little puzzled as to how I got about "seeing " as I didn't myself do anything to redirect it somewhere else, but it's not printing to the console... Emily On Fri, Jan 17, 2020 at 4:17 PM goodman.m.w at gmail.com wrote: > Hi Emily, > > Yes those error messages are not very clear. But the second one looks like > old code, as 'tables' is no longer a key in the object it's being looked up > on. I suggest making sure that your run_agg environment has an updated > version of PyDelphin. While the environment is active, try `pip install -U > pydelphin` and make sure it has a 1.0 or newer version (`delphin > --version`), then try again. > > On Sat, Jan 18, 2020 at 7:52 AM Emily M. Bender wrote: > >> Apologies --- that error meant I hadn't given the right path to the >> testsuite. 
Correcting that, I now see: >> >> (run_agg) ebender at patas:/home2/kphowell/run_aggregation/output/emb_treebank$ >> delphin process -g ctn1_grammar_fixed/ace/ctn1.dat ctn_orig/ >> Traceback (most recent call last): >> File "/home2/kphowell/Envs/run_agg/bin/delphin", line 11, in >> sys.exit(main()) >> File >> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", >> line 42, in main >> args.func(args) >> File >> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", >> line 135, in call_process >> gzip=args.gzip) >> File >> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/commands.py", >> line 542, in process >> column, tablename, condition = _interpret_selection(select, source) >> File >> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/commands.py", >> line 562, in _interpret_selection >> if len(queryobj['tables']) == 1: >> KeyError: 'tables' >> >> On Fri, Jan 17, 2020 at 3:41 PM Emily M. Bender wrote: >> >>> Dear Mike, >>> >>> Alas, I'm hitting this error: >>> >>> (run_agg) ebender at patas:~$ delphin process -g >>> ctn1_grammar_fixed/ace/ctn1.dat ctn_orig/ >>> Traceback (most recent call last): >>> File "/home2/kphowell/Envs/run_agg/bin/delphin", line 11, in >>> sys.exit(main()) >>> File >>> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", >>> line 42, in main >>> args.func(args) >>> File >>> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", >>> line 135, in call_process >>> gzip=args.gzip) >>> File >>> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/commands.py", >>> line 540, in process >>> source = itsdb.TestSuite(source) >>> File >>> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/itsdb.py", >>> line 644, in __init__ >>> '*schema* argument is required for new test suites') >>> delphin.itsdb.ITSDBError: *schema* argument is required for new test >>> suites >>> >>> I'll poke around and see where the schema requirement is coming from >>> (nothing in the bit on "process" in the documentation page mentions it), >>> but thought I'd post here too in the meantime. >>> >>> Emily >>> >>> On Thu, Jan 16, 2020 at 6:46 PM goodman.m.w at gmail.com < >>> goodman.m.w at gmail.com> wrote: >>> >>>> Let me know how it goes. >>>> >>>> And a clarification: the --full option on `mkprof` doesn't hurt, but >>>> it's unnecessary since you're re-parsing the created profile. >>>> >>>> Also here's the bug report for the other thing, if you're interested in >>>> that use case: https://github.com/delph-in/pydelphin/issues/273 >>>> >>>> On Fri, Jan 17, 2020 at 10:37 AM Emily M. Bender >>>> wrote: >>>> >>>>> Thanks, Mike! I will give this a try. >>>>> >>>>> On Thu, Jan 16, 2020 at 6:33 PM goodman.m.w at gmail.com < >>>>> goodman.m.w at gmail.com> wrote: >>>>> >>>>>> Hi Emily, >>>>>> >>>>>> For (2), here is how you could do it with PyDelphin: >>>>>> >>>>>> delphin process -g grm.dat original-profile/ >>>>>> delphin mkprof --full --where 'readings > 0' --source >>>>>> original-profile/ new-profile/ >>>>>> delphin process -g grm.dat --full-forest new-profile/ >>>>>> >>>>>> Note that original-profile/ is first parsed in regular (non-forest) >>>>>> mode, because in full-forest mode the number of readings is essentially >>>>>> unknown until they are enumerated and thus the 'readings' field is always >>>>>> 0. 
The second command not only prunes lines in the 'parse' file with >>>>>> readings == 0, but also lines in the 'item' file which correspond to those >>>>>> 'parse' lines. Once you have created new-profile/, you can parse again with >>>>>> --full-forest for use with FFTB (and of course you don't have to use >>>>>> PyDelphin for the parsing steps, if you prefer other means). >>>>>> >>>>>> Also note that this results in a profile with no edges for partial >>>>>> parses. I think this is what you want. There should be a way to prune the >>>>>> full-forest profile directly while keeping partial parses, but while >>>>>> investigating this use case I found a bug, so I don't recommend it yet. >>>>>> >>>>>> Try `delphin mkprof --help` to see descriptions of these and other >>>>>> options. They map fairly directly to the function documented here: >>>>>> https://pydelphin.readthedocs.io/en/latest/api/delphin.commands.html >>>>>> #mkprof >>>>>> >>>>>> >>>>>> On Fri, Jan 17, 2020 at 8:44 AM Emily M. Bender >>>>>> wrote: >>>>>> >>>>>>> Dear all, >>>>>>> >>>>>>> We are doing some treebanking here at UW with fftb with grammars >>>>>>> that have very low coverage over their associated test corpora. The current >>>>>>> behavior of fftb with these profiles is to include all items for >>>>>>> treebanking, but give a 404 for each one with no parse forest stored. This >>>>>>> necessitates clicking the back button and tracking which one is next (since >>>>>>> nothing changes color). In that light, two questions: >>>>>>> >>>>>>> (1) Is there some option we can pass fftb so that it just doesn't >>>>>>> present items with no parses? >>>>>>> (2) Failing that, is it fairly straightforward with pydelphin, [incr >>>>>>> tsdb()] or something else to export a version of the profiles that only >>>>>>> includes items which the grammar successfully parsed? >>>>>>> >>>>>>> Thanks, >>>>>>> Emily >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Emily M. Bender (she/her) >>>>>>> Howard and Frances Nostrand Endowed Professor >>>>>>> Department of Linguistics >>>>>>> Faculty Director, CLMS >>>>>>> University of Washington >>>>>>> Twitter: @emilymbender >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> -Michael Wayne Goodman >>>>>> >>>>> -- >>>>> Emily M. Bender (she/her) >>>>> Howard and Frances Nostrand Endowed Professor >>>>> Department of Linguistics >>>>> Faculty Director, CLMS >>>>> University of Washington >>>>> Twitter: @emilymbender >>>>> >>>> >>>> >>>> -- >>>> -Michael Wayne Goodman >>>> >>> >>> >>> -- >>> Emily M. Bender (she/her) >>> Howard and Frances Nostrand Endowed Professor >>> Department of Linguistics >>> Faculty Director, CLMS >>> University of Washington >>> Twitter: @emilymbender >>> >> >> >> -- >> Emily M. Bender (she/her) >> Howard and Frances Nostrand Endowed Professor >> Department of Linguistics >> Faculty Director, CLMS >> University of Washington >> Twitter: @emilymbender >> > > > -- > -Michael Wayne Goodman > -- Emily M. Bender (she/her) Howard and Frances Nostrand Endowed Professor Department of Linguistics Faculty Director, CLMS University of Washington Twitter: @emilymbender -------------- next part -------------- An HTML attachment was scrubbed... URL: From goodman.m.w at gmail.com Wed Jan 22 02:36:07 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Wed, 22 Jan 2020 09:36:07 +0800 Subject: [developers] Skipping non-parsed items in fftb In-Reply-To: References: Message-ID: Ok, that's progress. 
This error shows up when ACE exited abnormally (recall that PyDelphin calls ACE in a subprocess in order to process profiles). Since I don't capture ACE's stderr, it should be printed in the terminal just above the stacktrace. The PyDelphin error is directing you to look for that message to fix the problem. Most likely, the path to the grammar image is incorrect or the grammar image was compiled with a different version of ACE. The stacktrace and error message are both printed to stderr so if you see one you should see both (unless ACE exited abnormally without printing anything). By the way, I've recently pushed some commits to suppress the stacktrace when encountering anticipated errors from the `delphin` command, as I don't think the stacktrace is useful except for me (it can be shown again when called in DEBUG mode). In addition I tried to provide more useful messages for common situations. These changes will be part of the next release. On Wed, Jan 22, 2020 at 9:19 AM Emily M. Bender wrote: > Updating PyDelphin caused the error to change, at least: > > Traceback (most recent call last): > File "/home2/kphowell/Envs/run_agg/bin/delphin", line 11, in > sys.exit(main()) > File > "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", > line 40, in main > args.func(args) > File > "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/cli/process.py", > line 46, in call_process > gzip=args.gzip) > File > "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/commands.py", > line 602, in process > with processor(grammar, cmdargs=options, **kwargs) as cpu: > File > "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/ace.py", > line 110, in __init__ > self._open() > File > "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/ace.py", > line 143, in _open > raise ACEProcessError('Process closed on startup; see .') > delphin.ace.ACEProcessError: Process closed on startup; see . > > > ... I'm a little puzzled as to how I got about "seeing " as I > didn't myself do anything to redirect it somewhere else, but it's not > printing to the console... > > Emily > > On Fri, Jan 17, 2020 at 4:17 PM goodman.m.w at gmail.com < > goodman.m.w at gmail.com> wrote: > >> Hi Emily, >> >> Yes those error messages are not very clear. But the second one looks >> like old code, as 'tables' is no longer a key in the object it's being >> looked up on. I suggest making sure that your run_agg environment has an >> updated version of PyDelphin. While the environment is active, try `pip >> install -U pydelphin` and make sure it has a 1.0 or newer version (`delphin >> --version`), then try again. >> >> On Sat, Jan 18, 2020 at 7:52 AM Emily M. Bender wrote: >> >>> Apologies --- that error meant I hadn't given the right path to the >>> testsuite. 
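As a minimal sketch of that failure mode, assuming PyDelphin 1.x's delphin.ace interface (ACEParser and ACEProcessError, the names that appear in the traceback below) and an `ace` binary on PATH; the grammar path and the test sentence are placeholders from this thread, not a verified configuration:

    from delphin import ace

    GRAMMAR = "ctn1_grammar_fixed/ace/ctn1.dat"   # placeholder path from this thread

    try:
        # PyDelphin starts ACE as a subprocess; a bad path or an image compiled
        # with a different ACE version makes ACE exit before the handshake.
        with ace.ACEParser(GRAMMAR, cmdargs=["-1"]) as parser:
            response = parser.interact("this is just a placeholder sentence")
            print(len(response.results()), "result(s)")
    except ace.ACEProcessError as exc:
        # ACE's own message goes to stderr, just above the stacktrace, so check
        # the .dat path, its permissions, and the ACE version it was built with.
        print("ACE failed to start:", exc)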
Correcting that, I now see: >>> >>> (run_agg) ebender at patas:/home2/kphowell/run_aggregation/output/emb_treebank$ >>> delphin process -g ctn1_grammar_fixed/ace/ctn1.dat ctn_orig/ >>> Traceback (most recent call last): >>> File "/home2/kphowell/Envs/run_agg/bin/delphin", line 11, in >>> sys.exit(main()) >>> File >>> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", >>> line 42, in main >>> args.func(args) >>> File >>> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", >>> line 135, in call_process >>> gzip=args.gzip) >>> File >>> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/commands.py", >>> line 542, in process >>> column, tablename, condition = _interpret_selection(select, source) >>> File >>> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/commands.py", >>> line 562, in _interpret_selection >>> if len(queryobj['tables']) == 1: >>> KeyError: 'tables' >>> >>> On Fri, Jan 17, 2020 at 3:41 PM Emily M. Bender wrote: >>> >>>> Dear Mike, >>>> >>>> Alas, I'm hitting this error: >>>> >>>> (run_agg) ebender at patas:~$ delphin process -g >>>> ctn1_grammar_fixed/ace/ctn1.dat ctn_orig/ >>>> Traceback (most recent call last): >>>> File "/home2/kphowell/Envs/run_agg/bin/delphin", line 11, in >>>> sys.exit(main()) >>>> File >>>> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", >>>> line 42, in main >>>> args.func(args) >>>> File >>>> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", >>>> line 135, in call_process >>>> gzip=args.gzip) >>>> File >>>> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/commands.py", >>>> line 540, in process >>>> source = itsdb.TestSuite(source) >>>> File >>>> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/itsdb.py", >>>> line 644, in __init__ >>>> '*schema* argument is required for new test suites') >>>> delphin.itsdb.ITSDBError: *schema* argument is required for new test >>>> suites >>>> >>>> I'll poke around and see where the schema requirement is coming from >>>> (nothing in the bit on "process" in the documentation page mentions it), >>>> but thought I'd post here too in the meantime. >>>> >>>> Emily >>>> >>>> On Thu, Jan 16, 2020 at 6:46 PM goodman.m.w at gmail.com < >>>> goodman.m.w at gmail.com> wrote: >>>> >>>>> Let me know how it goes. >>>>> >>>>> And a clarification: the --full option on `mkprof` doesn't hurt, but >>>>> it's unnecessary since you're re-parsing the created profile. >>>>> >>>>> Also here's the bug report for the other thing, if you're interested >>>>> in that use case: https://github.com/delph-in/pydelphin/issues/273 >>>>> >>>>> On Fri, Jan 17, 2020 at 10:37 AM Emily M. Bender >>>>> wrote: >>>>> >>>>>> Thanks, Mike! I will give this a try. >>>>>> >>>>>> On Thu, Jan 16, 2020 at 6:33 PM goodman.m.w at gmail.com < >>>>>> goodman.m.w at gmail.com> wrote: >>>>>> >>>>>>> Hi Emily, >>>>>>> >>>>>>> For (2), here is how you could do it with PyDelphin: >>>>>>> >>>>>>> delphin process -g grm.dat original-profile/ >>>>>>> delphin mkprof --full --where 'readings > 0' --source >>>>>>> original-profile/ new-profile/ >>>>>>> delphin process -g grm.dat --full-forest new-profile/ >>>>>>> >>>>>>> Note that original-profile/ is first parsed in regular (non-forest) >>>>>>> mode, because in full-forest mode the number of readings is essentially >>>>>>> unknown until they are enumerated and thus the 'readings' field is always >>>>>>> 0. 
The second command not only prunes lines in the 'parse' file with >>>>>>> readings == 0, but also lines in the 'item' file which correspond to those >>>>>>> 'parse' lines. Once you have created new-profile/, you can parse again with >>>>>>> --full-forest for use with FFTB (and of course you don't have to use >>>>>>> PyDelphin for the parsing steps, if you prefer other means). >>>>>>> >>>>>>> Also note that this results in a profile with no edges for partial >>>>>>> parses. I think this is what you want. There should be a way to prune the >>>>>>> full-forest profile directly while keeping partial parses, but while >>>>>>> investigating this use case I found a bug, so I don't recommend it yet. >>>>>>> >>>>>>> Try `delphin mkprof --help` to see descriptions of these and other >>>>>>> options. They map fairly directly to the function documented here: >>>>>>> https://pydelphin.readthedocs.io/en/latest/api/delphin.commands.html >>>>>>> #mkprof >>>>>>> >>>>>>> >>>>>>> On Fri, Jan 17, 2020 at 8:44 AM Emily M. Bender >>>>>>> wrote: >>>>>>> >>>>>>>> Dear all, >>>>>>>> >>>>>>>> We are doing some treebanking here at UW with fftb with grammars >>>>>>>> that have very low coverage over their associated test corpora. The current >>>>>>>> behavior of fftb with these profiles is to include all items for >>>>>>>> treebanking, but give a 404 for each one with no parse forest stored. This >>>>>>>> necessitates clicking the back button and tracking which one is next (since >>>>>>>> nothing changes color). In that light, two questions: >>>>>>>> >>>>>>>> (1) Is there some option we can pass fftb so that it just doesn't >>>>>>>> present items with no parses? >>>>>>>> (2) Failing that, is it fairly straightforward with pydelphin, >>>>>>>> [incr tsdb()] or something else to export a version of the profiles that >>>>>>>> only includes items which the grammar successfully parsed? >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Emily >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Emily M. Bender (she/her) >>>>>>>> Howard and Frances Nostrand Endowed Professor >>>>>>>> Department of Linguistics >>>>>>>> Faculty Director, CLMS >>>>>>>> University of Washington >>>>>>>> Twitter: @emilymbender >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> -Michael Wayne Goodman >>>>>>> >>>>>> -- >>>>>> Emily M. Bender (she/her) >>>>>> Howard and Frances Nostrand Endowed Professor >>>>>> Department of Linguistics >>>>>> Faculty Director, CLMS >>>>>> University of Washington >>>>>> Twitter: @emilymbender >>>>>> >>>>> >>>>> >>>>> -- >>>>> -Michael Wayne Goodman >>>>> >>>> >>>> >>>> -- >>>> Emily M. Bender (she/her) >>>> Howard and Frances Nostrand Endowed Professor >>>> Department of Linguistics >>>> Faculty Director, CLMS >>>> University of Washington >>>> Twitter: @emilymbender >>>> >>> >>> >>> -- >>> Emily M. Bender (she/her) >>> Howard and Frances Nostrand Endowed Professor >>> Department of Linguistics >>> Faculty Director, CLMS >>> University of Washington >>> Twitter: @emilymbender >>> >> >> >> -- >> -Michael Wayne Goodman >> > > > -- > Emily M. Bender (she/her) > Howard and Frances Nostrand Endowed Professor > Department of Linguistics > Faculty Director, CLMS > University of Washington > Twitter: @emilymbender > -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From ebender at uw.edu Wed Jan 22 02:39:57 2020 From: ebender at uw.edu (Emily M. 
Bender) Date: Tue, 21 Jan 2020 17:39:57 -0800 Subject: [developers] Skipping non-parsed items in fftb In-Reply-To: References: Message-ID: Nice -- got it working! (In fact, it was a permissions error on the .dat file.) Yes, I think that the trace isn't that helpful, especially given that it functionally hid the actionable bit of information from me. Thanks for your help! Emily On Tue, Jan 21, 2020 at 5:36 PM goodman.m.w at gmail.com wrote: > Ok, that's progress. This error shows up when ACE exited abnormally > (recall that PyDelphin calls ACE in a subprocess in order to process > profiles). Since I don't capture ACE's stderr, it should be printed in the > terminal just above the stacktrace. The PyDelphin error is directing you to > look for that message to fix the problem. Most likely, the path to the > grammar image is incorrect or the grammar image was compiled with a > different version of ACE. The stacktrace and error message are both printed > to stderr so if you see one you should see both (unless ACE exited > abnormally without printing anything). > > By the way, I've recently pushed some commits to suppress the stacktrace > when encountering anticipated errors from the `delphin` command, as I don't > think the stacktrace is useful except for me (it can be shown again when > called in DEBUG mode). In addition I tried to provide more useful messages > for common situations. These changes will be part of the next release. > > On Wed, Jan 22, 2020 at 9:19 AM Emily M. Bender wrote: > >> Updating PyDelphin caused the error to change, at least: >> >> Traceback (most recent call last): >> File "/home2/kphowell/Envs/run_agg/bin/delphin", line 11, in >> sys.exit(main()) >> File >> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", >> line 40, in main >> args.func(args) >> File >> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/cli/process.py", >> line 46, in call_process >> gzip=args.gzip) >> File >> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/commands.py", >> line 602, in process >> with processor(grammar, cmdargs=options, **kwargs) as cpu: >> File >> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/ace.py", >> line 110, in __init__ >> self._open() >> File >> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/ace.py", >> line 143, in _open >> raise ACEProcessError('Process closed on startup; see .') >> delphin.ace.ACEProcessError: Process closed on startup; see . >> >> >> ... I'm a little puzzled as to how I got about "seeing " as I >> didn't myself do anything to redirect it somewhere else, but it's not >> printing to the console... >> >> Emily >> >> On Fri, Jan 17, 2020 at 4:17 PM goodman.m.w at gmail.com < >> goodman.m.w at gmail.com> wrote: >> >>> Hi Emily, >>> >>> Yes those error messages are not very clear. But the second one looks >>> like old code, as 'tables' is no longer a key in the object it's being >>> looked up on. I suggest making sure that your run_agg environment has an >>> updated version of PyDelphin. While the environment is active, try `pip >>> install -U pydelphin` and make sure it has a 1.0 or newer version (`delphin >>> --version`), then try again. >>> >>> On Sat, Jan 18, 2020 at 7:52 AM Emily M. Bender wrote: >>> >>>> Apologies --- that error meant I hadn't given the right path to the >>>> testsuite. 
Correcting that, I now see: >>>> >>>> (run_agg) ebender at patas:/home2/kphowell/run_aggregation/output/emb_treebank$ >>>> delphin process -g ctn1_grammar_fixed/ace/ctn1.dat ctn_orig/ >>>> Traceback (most recent call last): >>>> File "/home2/kphowell/Envs/run_agg/bin/delphin", line 11, in >>>> sys.exit(main()) >>>> File >>>> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", >>>> line 42, in main >>>> args.func(args) >>>> File >>>> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", >>>> line 135, in call_process >>>> gzip=args.gzip) >>>> File >>>> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/commands.py", >>>> line 542, in process >>>> column, tablename, condition = _interpret_selection(select, source) >>>> File >>>> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/commands.py", >>>> line 562, in _interpret_selection >>>> if len(queryobj['tables']) == 1: >>>> KeyError: 'tables' >>>> >>>> On Fri, Jan 17, 2020 at 3:41 PM Emily M. Bender wrote: >>>> >>>>> Dear Mike, >>>>> >>>>> Alas, I'm hitting this error: >>>>> >>>>> (run_agg) ebender at patas:~$ delphin process -g >>>>> ctn1_grammar_fixed/ace/ctn1.dat ctn_orig/ >>>>> Traceback (most recent call last): >>>>> File "/home2/kphowell/Envs/run_agg/bin/delphin", line 11, in >>>>> sys.exit(main()) >>>>> File >>>>> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", >>>>> line 42, in main >>>>> args.func(args) >>>>> File >>>>> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", >>>>> line 135, in call_process >>>>> gzip=args.gzip) >>>>> File >>>>> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/commands.py", >>>>> line 540, in process >>>>> source = itsdb.TestSuite(source) >>>>> File >>>>> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/itsdb.py", >>>>> line 644, in __init__ >>>>> '*schema* argument is required for new test suites') >>>>> delphin.itsdb.ITSDBError: *schema* argument is required for new test >>>>> suites >>>>> >>>>> I'll poke around and see where the schema requirement is coming from >>>>> (nothing in the bit on "process" in the documentation page mentions it), >>>>> but thought I'd post here too in the meantime. >>>>> >>>>> Emily >>>>> >>>>> On Thu, Jan 16, 2020 at 6:46 PM goodman.m.w at gmail.com < >>>>> goodman.m.w at gmail.com> wrote: >>>>> >>>>>> Let me know how it goes. >>>>>> >>>>>> And a clarification: the --full option on `mkprof` doesn't hurt, but >>>>>> it's unnecessary since you're re-parsing the created profile. >>>>>> >>>>>> Also here's the bug report for the other thing, if you're interested >>>>>> in that use case: https://github.com/delph-in/pydelphin/issues/273 >>>>>> >>>>>> On Fri, Jan 17, 2020 at 10:37 AM Emily M. Bender >>>>>> wrote: >>>>>> >>>>>>> Thanks, Mike! I will give this a try. 
>>>>>>> >>>>>>> On Thu, Jan 16, 2020 at 6:33 PM goodman.m.w at gmail.com < >>>>>>> goodman.m.w at gmail.com> wrote: >>>>>>> >>>>>>>> Hi Emily, >>>>>>>> >>>>>>>> For (2), here is how you could do it with PyDelphin: >>>>>>>> >>>>>>>> delphin process -g grm.dat original-profile/ >>>>>>>> delphin mkprof --full --where 'readings > 0' --source >>>>>>>> original-profile/ new-profile/ >>>>>>>> delphin process -g grm.dat --full-forest new-profile/ >>>>>>>> >>>>>>>> Note that original-profile/ is first parsed in regular (non-forest) >>>>>>>> mode, because in full-forest mode the number of readings is essentially >>>>>>>> unknown until they are enumerated and thus the 'readings' field is always >>>>>>>> 0. The second command not only prunes lines in the 'parse' file with >>>>>>>> readings == 0, but also lines in the 'item' file which correspond to those >>>>>>>> 'parse' lines. Once you have created new-profile/, you can parse again with >>>>>>>> --full-forest for use with FFTB (and of course you don't have to use >>>>>>>> PyDelphin for the parsing steps, if you prefer other means). >>>>>>>> >>>>>>>> Also note that this results in a profile with no edges for partial >>>>>>>> parses. I think this is what you want. There should be a way to prune the >>>>>>>> full-forest profile directly while keeping partial parses, but while >>>>>>>> investigating this use case I found a bug, so I don't recommend it yet. >>>>>>>> >>>>>>>> Try `delphin mkprof --help` to see descriptions of these and other >>>>>>>> options. They map fairly directly to the function documented here: >>>>>>>> https://pydelphin.readthedocs.io/en/latest/api/delphin.commands.html >>>>>>>> #mkprof >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Jan 17, 2020 at 8:44 AM Emily M. Bender >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Dear all, >>>>>>>>> >>>>>>>>> We are doing some treebanking here at UW with fftb with grammars >>>>>>>>> that have very low coverage over their associated test corpora. The current >>>>>>>>> behavior of fftb with these profiles is to include all items for >>>>>>>>> treebanking, but give a 404 for each one with no parse forest stored. This >>>>>>>>> necessitates clicking the back button and tracking which one is next (since >>>>>>>>> nothing changes color). In that light, two questions: >>>>>>>>> >>>>>>>>> (1) Is there some option we can pass fftb so that it just doesn't >>>>>>>>> present items with no parses? >>>>>>>>> (2) Failing that, is it fairly straightforward with pydelphin, >>>>>>>>> [incr tsdb()] or something else to export a version of the profiles that >>>>>>>>> only includes items which the grammar successfully parsed? >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Emily >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Emily M. Bender (she/her) >>>>>>>>> Howard and Frances Nostrand Endowed Professor >>>>>>>>> Department of Linguistics >>>>>>>>> Faculty Director, CLMS >>>>>>>>> University of Washington >>>>>>>>> Twitter: @emilymbender >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> -Michael Wayne Goodman >>>>>>>> >>>>>>> -- >>>>>>> Emily M. Bender (she/her) >>>>>>> Howard and Frances Nostrand Endowed Professor >>>>>>> Department of Linguistics >>>>>>> Faculty Director, CLMS >>>>>>> University of Washington >>>>>>> Twitter: @emilymbender >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> -Michael Wayne Goodman >>>>>> >>>>> >>>>> >>>>> -- >>>>> Emily M. 
Bender (she/her) >>>>> Howard and Frances Nostrand Endowed Professor >>>>> Department of Linguistics >>>>> Faculty Director, CLMS >>>>> University of Washington >>>>> Twitter: @emilymbender >>>>> >>>> >>>> >>>> -- >>>> Emily M. Bender (she/her) >>>> Howard and Frances Nostrand Endowed Professor >>>> Department of Linguistics >>>> Faculty Director, CLMS >>>> University of Washington >>>> Twitter: @emilymbender >>>> >>> >>> >>> -- >>> -Michael Wayne Goodman >>> >> >> >> -- >> Emily M. Bender (she/her) >> Howard and Frances Nostrand Endowed Professor >> Department of Linguistics >> Faculty Director, CLMS >> University of Washington >> Twitter: @emilymbender >> > > > -- > -Michael Wayne Goodman > -- Emily M. Bender (she/her) Howard and Frances Nostrand Endowed Professor Department of Linguistics Faculty Director, CLMS University of Washington Twitter: @emilymbender -------------- next part -------------- An HTML attachment was scrubbed... URL: From oe at ifi.uio.no Mon Jan 27 18:42:15 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Mon, 27 Jan 2020 18:42:15 +0100 Subject: [developers] EDM implementations In-Reply-To: References: Message-ID: hi mike, belatedly, thanks (once again) for pushing forward standardization! and also my apologies for returning to this thread a little late! regarding EDM, i used to think of the Common-Lisp implementation (which it appears i produced in early 2012, i.e. more recently than the Perl version by bec) as the reference until recently. last year, when comparing its scores to my re-implementation in Python as part of mtool, that comparison also turned up the two questions you raised, viz. the treatment of the TOP property and how to score parameterized predicates. regarding the first, this appears to be one of the better-kept secrets in meaning representation comparison: in my view, it is a semantically highly relevant property (marking the contrast between e.g. 'all fierce dogs bark' vs. 'all barking dogs are fierce'), but neither the original EDM paper nor its derivative in the AMR world (Cai & Knight, 2013) discuss it. yet, both the Lisp implementation of EDM and SMATCH seem to always have scored the TOP node as an additional tuple (counted among the 'argument' tuples for EDM, while considered among the 'attribute' tuples in SMATCH). the Perl implementation of EDM, on the other hand, worked off my 'ltriples' export format for EDS, which appears to not include a separate TOP tuple. i confirmed the nature of those triples by reminding myself of what became of the 'export' script mentioned in the original EDM wiki notes you had found. it was folded into the LOGON 'redwoods' script, so something like the following actually works today to prepare the input for the Perl implementation of EDM: $LOGONROOT/redwoods --erg --export ltriples --target /tmp mrs i attach the output for item #21 from the MRS test suite, for reference. so, i agree with the conclusion bec and you have already reached: the original Perl implementation of EDM did not consider TOP tuples. the Lisp implementation, on the other hand, appears to have had TOP tuples from its very beginning. regarding the second design choice you raise, parameterized relations (involving one or more constant arguments), it appears that both the Lisp and Perl implementations of EDM do the same thing, viz. assume that there can be at most one constant argument in a relation and 'inline' its value (if present) with the predicate itself, e.g. internally using node label shorthands like 'named(Abrams)'. 
in this regard, i suspect bec and you actually may have arrived at the wrong conclusion about historic behavior; thus, personally, i see no reason for pyDelphin to provide a special-cased version of EDM that wholly ignores constant arguments. looking at this particular design choice today, however, it seems too limiting an assumption and meshing together two things that arguably should be considered separate. even though ERG versions for the past 15 or more years have not used predicates with multiple (constant) parameters, there would be nothing wrong with representing, say, the fraction '2/3' as involving two constant arguments, e.g. something like fraction [ CARG1 "2", CARG2 "3" ]. this is, for example, what AMR does for complex proper names. thus, even though our two historic EDM implementations appear to agree on the 'inlining' treatment of constant arguments, i would be prepared to argue that CARG et al. values should rather be treated as separate node properties, i.e. for the above example the 'named' predicate and the 'CARG' == 'Abrams' value should be treated as two distinct tuples. in part for cross-framework compatibility, this is what we ended up doing in mtool, including in its re-implementation of EDM, see: http://mrp.nlpl.eu/index.php?page=5 in summary, it sounds as if your EDM re-implementation, mike, had arrived at the same conclusions: TOP tuples should be scored, and constant arguments considered as separate properties. i would expect your implementation and mtool should then come to the exact same results (on EDSs stripped of MRS variable properties, which the current mtool EDS reader deliberately discards; see below)? seeing as we have identified two ways in which this way of computing EDM differs from the original publication and the two earlier implementations (in Perl and Lisp), i would like to suggest we formally coin this refinement of the metric EDM 2.0. regarding how to deal with missing graphs on either the gold or system side of the comparison: it appears the Lisp implementation of EDM provides a toggle *redwooods-score-all-p*, which selects between two modes of computing EDM over two sets of corresponding items, either on the intersection of items only; or on their union, treating gaps on either side of the comparison as empty graphs (thus, incurring recall or precision penalties). in practice, i believe we used to near-exclusively compute EDM over sets of items for which there was both a gold and a system graph. but that can of course only give comparable results when fixing that very set of items. thus, the setup of scoring 'all' items seems more general, robust to attempts at gaming, and in my view should be considered the default. finally, regarding variable properties in mtool: for the 2019 CoNLL shared task on meaning representation parsing (MRP 2019), we had agreed with other framework developers to keep morpho-semantic decorations out of the comparison. hence, the MRP 2019 graphs did not include tense, aspect, or number information from the full ERSs. but technically, i would consider that a property of the EDS used in MRP 2019, not a design decision in mtool. for the re-run of the MRP task at CoNLL 2020, we are currently preparing to throw these properties back into the mix (also in other frameworks, where annotations are available), which means the EDS reader in mtool in the near future will no longer discard (underlying) variable properties by default. 
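To make the proposed tuple inventory concrete, here is a minimal, framework-neutral sketch of that "EDM 2.0" style of counting, with names, arguments, morphosemantic properties, constants such as CARG, and TOP each scored as separate tuples anchored on character spans and weighted per class; the dictionary-based graph encoding is invented purely for illustration and is not any of the existing implementations:

    from collections import Counter

    def edm_tuples(graph):
        # graph: {"top": node_id or None,
        #         "nodes": {id: {"span": (frm, to), "pred": str,
        #                        "props": {feat: val}, "carg": str or None}},
        #         "edges": [(src_id, role, tgt_id)]}
        tuples = Counter()
        nodes = graph["nodes"]
        for node in nodes.values():
            span = node["span"]
            tuples[("N", span, node["pred"])] += 1                  # names
            for feat, val in node.get("props", {}).items():
                tuples[("P", span, feat, val)] += 1                 # properties
            if node.get("carg") is not None:
                tuples[("C", span, "CARG", node["carg"])] += 1      # constants
        for src, role, tgt in graph["edges"]:
            tuples[("A", nodes[src]["span"], role, nodes[tgt]["span"])] += 1
        if graph.get("top") is not None:
            tuples[("T", nodes[graph["top"]]["span"])] += 1         # graph top
        return tuples

    def edm_score(gold, test, weights=None):
        # weights maps tuple classes ("N", "A", "P", "C", "T") to a multiplier
        weights = weights or {}
        w = lambda t: weights.get(t[0], 1.0)
        g, t = edm_tuples(gold), edm_tuples(test)
        matched = sum(w(k) * min(g[k], t[k]) for k in g if k in t)
        gold_total = sum(w(k) * n for k, n in g.items())
        test_total = sum(w(k) * n for k, n in t.items())
        p = matched / test_total if test_total else 0.0
        r = matched / gold_total if gold_total else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        return p, r, f

Calling edm_score(gold, test, weights={"C": 0.0, "T": 0.0}) would then approximate the behaviour attributed above to the Perl scorer (no constant and no TOP tuples), while the all-ones default corresponds to counting every class.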
best wishes, oe On Mon, Jan 20, 2020 at 2:15 AM goodman.m.w at gmail.com wrote: > > Thanks again, Bec. > > I just want to make sure my implementation gets the same scores for the same inputs under the same assumptions as the original implementation. For this to work, its behavior concerning the points I've sought clarification for should be intentional. In light of your responses, I've separated the CARG triples from other properties and have given it its own weight. Thus I should be able to get the same scores as your code by setting the weights of CARGs (but not properties) and graph-tops to zero. Similarly, I'll add an option to ignore missing test items and otherwise treat them as mismatches. > > On Fri, Jan 17, 2020 at 6:14 PM Bec Dridan wrote: >> >> >> >> On Fri, Jan 17, 2020 at 5:39 PM goodman.m.w at gmail.com wrote: >>> >>> >>> One more detail is what to do when the two sides (gold and test) have different numbers of items. Currently my code stops as soon as either a gold or test item is missing, which is what smatch (the similar metric made for AMR) does, but I think that may be wrong because parsing profiles are likely to have missing or extra (overgeneration) items in the middle. So the question is whether we ignore it or count it as a full mismatch. >> >> >> If you are asking what is 'correct', I guess that depends on why you are evaluating. The perl implementation wouldn't have noticed missing gold parses, because it used the gold set as the definition of the set. A missing test item, on the other hand, by default counts as a full mismatch, but there is a command line option to ignore any gold parse with no corresponding test parse. The ignore option is useful when the purpose of the evaluation is assessing the system you are working on (and you consider coverage separately). For comparing across systems, I imagine you probably want to count parse failure as a full mismatch. It was useful for me to have both options. >> >> Bec >> >>> >>> >>> On Thu, Jan 16, 2020 at 6:33 PM Bec Dridan wrote: >>>> >>>> Wow, that is some old code... From memory, export was a wrapper around `parse --export`, where I could add :ltriples to the tsdb::*redwoods-export-values* set. >>>> >>>> I don't know the mtool code at all, but re-reading the paper and looking at the perl code, I don't think the original implementation evaluated CARG at all. We only checked that the correct character span had a pred name of`named`. >>>> >>>> I think you are right that the triple export at the time did not produce a triple for TOP and it hence would not have been counted. >>>> >>>> That match your memory Stephan? >>>> >>>> Bec >>>> >>>> >>>> On Thu, Jan 16, 2020 at 8:34 PM goodman.m.w at gmail.com wrote: >>>>> >>>>> Hello developers, >>>>> >>>>> Recently I wanted to try out Elementary Dependency Match (EDM) but I did not find an easy way to do it. I saw lisp code in the LKB's repository and Bec's Perl code, but I'm not sure how to call the former from the command line and the latter seems outdated (I don't see the "export" command required by its instructions). >>>>> >>>>> The Dridan & Oepen, 2011 algorithm was simple enough so I though I'd implement it on top of PyDelphin. The result is here: https://github.com/delph-in/delphin.edm. It requires the latest version of PyDelphin (v1.2.0). It works with MRS, EDS, and DMRS, and it reads text files or [incr tsdb()] profiles. 
>>>>> >>>>> When I nearly had my version working I found that Stephan et al.'s mtool (https://github.com/cfmrpThe paper example >>>>> /mtool) also had an implementation of EDM, so I used that to compare with my outputs (as I couldn't get the previous implementations to work). In this process I think I found some differences from Dridan & Oepen, 2011's description, and this email is to confirm those findings. Namely, that mtool's (and now my) implementation do the following: >>>>> >>>>> * CARGs are treated as property triples ("class 3 information"). Previously they were combined with the predicate name. This change means that predicates like 'named' will match even if their CARGs don't and the CARGs are a separate thing that needs to be matched. >>>>> >>>>> * The identification of the graph's TOP counts as a triple. >>>>> >>>>> One difference between mtool and delphin.edm is that mtool does not count "variable" properties from EDS, but that's just because its EDS parser does not yet handle them while PyDelphin's does. >>>>> >>>>> Can anyone familiar with EDM confirm the above? Or can anyone explain how to call the Perl or LKB code so I can compare? >>>>> >>>>> -- >>>>> -Michael Wayne Goodman >>> >>> >>> >>> -- >>> -Michael Wayne Goodman > > > > -- > -Michael Wayne Goodman -------------- next part -------------- A non-text attachment was scrubbed... Name: 21.gz Type: application/gzip Size: 271 bytes Desc: not available URL: From goodman.m.w at gmail.com Tue Jan 28 02:57:27 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Tue, 28 Jan 2020 09:57:27 +0800 Subject: [developers] EDM implementations In-Reply-To: References: Message-ID: Thanks for the reply, Stephan, > [...] it appears that both the > Lisp and Perl implementations of EDM do the same thing, viz. assume > that there can be at most one constant argument in a relation and > 'inline' its value (if present) with the predicate itself, e.g. > internally using node label shorthands like 'named(Abrams)'. in this > regard, i suspect bec and you actually may have arrived at the wrong > conclusion about historic behavior; Thanks for confirming how the Lisp implementation works. I took your 21.gz file and created a version that replaced "Abrams" with "Brown", then used edm_eval.pl to compare; it reports a full match (1.0), so based on this limited test I think Bec was correct about the Perl version. > thus, personally, i see no reason > for pyDelphin to provide a special-cased version of EDM that wholly > ignores constant arguments. Me too, and that's not the case. I separated CARGs into their own category and callers of the script can give the category a weight of zero to ignore them, which allows them to recreate the results of the Perl implementation. Otherwise, the default weight for all categories (arguments (-A), names/predicates (-N), morphosemantic properties (-P), constants (-C), and tops (-T)) is 1.0. > i would expect > your implementation and mtool should then come to the exact same > results (on EDSs stripped of MRS variable properties, [...]) Yes, but there's no need to strip the properties; just give the category a weight of zero. I've confirmed on a few test items that my implementation gets the exact same scores as mtool with -P0. Furthermore, I think the following option configurations for my re-implementation cover all current and historical use cases except for the inlined constants of the Lisp version, which interact with node names in a way that isn't reproducible with weights alone. 
* Perl: `delphin edm -C0 -T0 --ignore-missing=gold` * Perl with -i option: `delphin edm -C0 -T0 --ignore-missing=both` * Lisp where *redwooods-score-all-p* is true: `delphin edm` * Lisp where *redwooods-score-all-p* is false: `delphin edm --ignore-missing=both` * mtool (MRP 2019): `delphin edm -P0` * mtool (MRP 2020? or EDM 2.0): `delphin edm` On Tue, Jan 28, 2020 at 1:42 AM Stephan Oepen wrote: > hi mike, > > belatedly, thanks (once again) for pushing forward standardization! > and also my apologies for returning to this thread a little late! > > regarding EDM, i used to think of the Common-Lisp implementation > (which it appears i produced in early 2012, i.e. more recently than > the Perl version by bec) as the reference until recently. last year, > when comparing its scores to my re-implementation in Python as part of > mtool, that comparison also turned up the two questions you raised, > viz. the treatment of the TOP property and how to score parameterized > predicates. > > regarding the first, this appears to be one of the better-kept secrets > in meaning representation comparison: in my view, it is a semantically > highly relevant property (marking the contrast between e.g. 'all > fierce dogs bark' vs. 'all barking dogs are fierce'), but neither the > original EDM paper nor its derivative in the AMR world (Cai & Knight, > 2013) discuss it. yet, both the Lisp implementation of EDM and SMATCH > seem to always have scored the TOP node as an additional tuple > (counted among the 'argument' tuples for EDM, while considered among > the 'attribute' tuples in SMATCH). the Perl implementation of EDM, on > the other hand, worked off my 'ltriples' export format for EDS, which > appears to not include a separate TOP tuple. > > i confirmed the nature of those triples by reminding myself of what > became of the 'export' script mentioned in the original EDM wiki notes > you had found. it was folded into the LOGON 'redwoods' script, so > something like the following actually works today to prepare the input > for the Perl implementation of EDM: > > $LOGONROOT/redwoods --erg --export ltriples --target /tmp mrs > > i attach the output for item #21 from the MRS test suite, for > reference. so, i agree with the conclusion bec and you have already > reached: the original Perl implementation of EDM did not consider TOP > tuples. the Lisp implementation, on the other hand, appears to have > had TOP tuples from its very beginning. > > regarding the second design choice you raise, parameterized relations > (involving one or more constant arguments), it appears that both the > Lisp and Perl implementations of EDM do the same thing, viz. assume > that there can be at most one constant argument in a relation and > 'inline' its value (if present) with the predicate itself, e.g. > internally using node label shorthands like 'named(Abrams)'. in this > regard, i suspect bec and you actually may have arrived at the wrong > conclusion about historic behavior; thus, personally, i see no reason > for pyDelphin to provide a special-cased version of EDM that wholly > ignores constant arguments. > > looking at this particular design choice today, however, it seems too > limiting an assumption and meshing together two things that arguably > should be considered separate. even though ERG versions for the past > 15 or more years have not used predicates with multiple (constant) > parameters, there would be nothing wrong with representing, say, the > fraction '2/3' as involving two constant arguments, e.g. 
something > like fraction [ CARG1 "2", CARG2 "3" ]. this is, for example, what > AMR does for complex proper names. > > thus, even though our two historic EDM implementations appear to agree > on the 'inlining' treatment of constant arguments, i would be prepared > to argue that CARG et al. values should rather be treated as separate > node properties, i.e. for the above example the 'named' predicate and > the 'CARG' == 'Abrams' value should be treated as two distinct tuples. > in part for cross-framework compatibility, this is what we ended up > doing in mtool, including in its re-implementation of EDM, see: > > http://mrp.nlpl.eu/index.php?page=5 > > in summary, it sounds as if your EDM re-implementation, mike, had > arrived at the same conclusions: TOP tuples should be scored, and > constant arguments considered as separate properties. i would expect > your implementation and mtool should then come to the exact same > results (on EDSs stripped of MRS variable properties, which the > current mtool EDS reader deliberately discards; see below)? seeing as > we have identified two ways in which this way of computing EDM differs > from the original publication and the two earlier implementations (in > Perl and Lisp), i would like to suggest we formally coin this > refinement of the metric EDM 2.0. > > regarding how to deal with missing graphs on either the gold or system > side of the comparison: it appears the Lisp implementation of EDM > provides a toggle *redwooods-score-all-p*, which selects between two > modes of computing EDM over two sets of corresponding items, either on > the intersection of items only; or on their union, treating gaps on > either side of the comparison as empty graphs (thus, incurring recall > or precision penalties). in practice, i believe we used to > near-exclusively compute EDM over sets of items for which there was > both a gold and a system graph. but that can of course only give > comparable results when fixing that very set of items. thus, the > setup of scoring 'all' items seems more general, robust to attempts at > gaming, and in my view should be considered the default. > > finally, regarding variable properties in mtool: for the 2019 CoNLL > shared task on meaning representation parsing (MRP 2019), we had > agreed with other framework developers to keep morpho-semantic > decorations out of the comparison. hence, the MRP 2019 graphs did not > include tense, aspect, or number information from the full ERSs. but > technically, i would consider that a property of the EDS used in MRP > 2019, not a design decision in mtool. for the re-run of the MRP task > at CoNLL 2020, we are currently preparing to throw these properties > back into the mix (also in other frameworks, where annotations are > available), which means the EDS reader in mtool in the near future > will no longer discard (underlying) variable properties by default. > > best wishes, oe > > > > > > On Mon, Jan 20, 2020 at 2:15 AM goodman.m.w at gmail.com > wrote: > > > > Thanks again, Bec. > > > > I just want to make sure my implementation gets the same scores for the > same inputs under the same assumptions as the original implementation. For > this to work, its behavior concerning the points I've sought clarification > for should be intentional. In light of your responses, I've separated the > CARG triples from other properties and have given it its own weight. Thus I > should be able to get the same scores as your code by setting the weights > of CARGs (but not properties) and graph-tops to zero. 
Similarly, I'll add > an option to ignore missing test items and otherwise treat them as > mismatches. > > > > On Fri, Jan 17, 2020 at 6:14 PM Bec Dridan wrote: > >> > >> > >> > >> On Fri, Jan 17, 2020 at 5:39 PM goodman.m.w at gmail.com < > goodman.m.w at gmail.com> wrote: > >>> > >>> > >>> One more detail is what to do when the two sides (gold and test) have > different numbers of items. Currently my code stops as soon as either a > gold or test item is missing, which is what smatch (the similar metric made > for AMR) does, but I think that may be wrong because parsing profiles are > likely to have missing or extra (overgeneration) items in the middle. So > the question is whether we ignore it or count it as a full mismatch. > >> > >> > >> If you are asking what is 'correct', I guess that depends on why you > are evaluating. The perl implementation wouldn't have noticed missing gold > parses, because it used the gold set as the definition of the set. A > missing test item, on the other hand, by default counts as a full mismatch, > but there is a command line option to ignore any gold parse with no > corresponding test parse. The ignore option is useful when the purpose of > the evaluation is assessing the system you are working on (and you consider > coverage separately). For comparing across systems, I imagine you probably > want to count parse failure as a full mismatch. It was useful for me to > have both options. > >> > >> Bec > >> > >>> > >>> > >>> On Thu, Jan 16, 2020 at 6:33 PM Bec Dridan > wrote: > >>>> > >>>> Wow, that is some old code... From memory, export was a wrapper > around `parse --export`, where I could add :ltriples to the > tsdb::*redwoods-export-values* set. > >>>> > >>>> I don't know the mtool code at all, but re-reading the paper and > looking at the perl code, I don't think the original implementation > evaluated CARG at all. We only checked that the correct character span had > a pred name of`named`. > >>>> > >>>> I think you are right that the triple export at the time did not > produce a triple for TOP and it hence would not have been counted. > >>>> > >>>> That match your memory Stephan? > >>>> > >>>> Bec > >>>> > >>>> > >>>> On Thu, Jan 16, 2020 at 8:34 PM goodman.m.w at gmail.com < > goodman.m.w at gmail.com> wrote: > >>>>> > >>>>> Hello developers, > >>>>> > >>>>> Recently I wanted to try out Elementary Dependency Match (EDM) but I > did not find an easy way to do it. I saw lisp code in the LKB's repository > and Bec's Perl code, but I'm not sure how to call the former from the > command line and the latter seems outdated (I don't see the "export" > command required by its instructions). > >>>>> > >>>>> The Dridan & Oepen, 2011 algorithm was simple enough so I though I'd > implement it on top of PyDelphin. The result is here: > https://github.com/delph-in/delphin.edm. It requires the latest version > of PyDelphin (v1.2.0). It works with MRS, EDS, and DMRS, and it reads text > files or [incr tsdb()] profiles. > >>>>> > >>>>> When I nearly had my version working I found that Stephan et al.'s > mtool (https://github.com/cfmrpThe paper example > >>>>> /mtool) also had an implementation of EDM, so I used that to compare > with my outputs (as I couldn't get the previous implementations to work). > In this process I think I found some differences from Dridan & Oepen, > 2011's description, and this email is to confirm those findings. 
Namely, > that mtool's (and now my) implementation do the following: > >>>>> > >>>>> * CARGs are treated as property triples ("class 3 information"). > Previously they were combined with the predicate name. This change means > that predicates like 'named' will match even if their CARGs don't and the > CARGs are a separate thing that needs to be matched. > >>>>> > >>>>> * The identification of the graph's TOP counts as a triple. > >>>>> > >>>>> One difference between mtool and delphin.edm is that mtool does not > count "variable" properties from EDS, but that's just because its EDS > parser does not yet handle them while PyDelphin's does. > >>>>> > >>>>> Can anyone familiar with EDM confirm the above? Or can anyone > explain how to call the Perl or LKB code so I can compare? > >>>>> > >>>>> -- > >>>>> -Michael Wayne Goodman > >>> > >>> > >>> > >>> -- > >>> -Michael Wayne Goodman > > > > > > > > -- > > -Michael Wayne Goodman > -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From sweaglesw at sweaglesw.org Wed Feb 5 01:35:49 2020 From: sweaglesw at sweaglesw.org (Woodley Packard) Date: Tue, 4 Feb 2020 16:35:49 -0800 Subject: [developers] character-based discriminants In-Reply-To: References: Message-ID: <5E3A0DE5.4020502@sweaglesw.org> Stephan and Dan, and other interested parties, Happy new year to you all. In the course of taking a closer look at how the proposed character-based discriminant system might work, I've run across a few cases that perhaps would benefit from a bit of discussion. First, my attempt to distill the proposed action plan for an automatic update (downdate?) of the ERG treebanks to the venerable PTB punctuation convention is as follows: 1. Modify ACE and other engines to use input character positions as token vertex identifiers, so that data coming out -- particularly the full forest record in the "edge" relation -- uses these to identify constituent boundaries instead of the existing identifiers (corresponding roughly to whitespace areas). 2. Mechanically revise a copy of the "decisions" relation from the old gold treebank so that the vertex identifiers in it are also character-based, in hopes of matching those used in the new full forest profiles. Destroy any discriminants that are judged unlikely to match correctly. 3. Run an automatic treebank update to achieve a high coverage gold treebank under the new punctuation convention; manually fix any items that didn't quite make it. Stephan pointed out that the +FROM/+TO values on token AVMs are a way to convert existing vertices to character positions. Thinking a bit more closely about this, there is at least one obvious problem: adjacent tokens T1,T2 do not generally have the property that T1.+TO = T2.+FROM, because there is usually whitespace between them. Therefore the revised scheme will have the property that whitespace adjacent to a constituent will in a sense be considered part of the constituent in some cases. I consider that slightly weird, but perhaps not too big a deal. The main thing is we need to pick a convention as to which position in the whitespace is to be considered the label of the vertex. One candidate convention would be that for any given vertex, its character-based label is the smallest +FROM value of any token starting from it, if any, and if no token starts at it, then the largest +TO value of any token ending at it. 
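In code, a minimal sketch of that candidate convention might look as follows; the token tuples are invented for illustration, except that 'four' and 'footed' share +FROM "2" / +TO "13" as in the hyphenation example discussed below:

    def vertex_labels(tokens):
        # tokens: iterable of (start_vertex, end_vertex, char_from, char_to)
        starts, ends = {}, {}
        for start, end, frm, to in tokens:
            starts[start] = min(frm, starts.get(start, frm))
            ends[end] = max(to, ends.get(end, to))
        labels = dict(ends)        # fall-back: largest +TO of any token ending here
        labels.update(starts)      # preferred: smallest +FROM of any token starting here
        return labels

    # 'A four-footed zebra arose.' -- illustrative offsets; the hyphenated pair
    # shares a single +FROM/+TO span after the token-mapping split.
    tokens = [(0, 1, 0, 1),      # a
              (1, 2, 2, 13),     # four
              (2, 3, 2, 13),     # footed
              (3, 4, 14, 19),    # zebra
              (4, 5, 20, 25)]    # arose
    print(vertex_labels(tokens))
    # vertices 1 and 2 both come out at the same character position (2 here),
    # which is exactly the collision described below.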
I would expect that at least in ordinary cases, possibly all cases, all the incident +FROMs would be identical and all the +TOs would be identical also, just with a difference between the +FROMs and +TOs. A somewhat more troubling problem is that multiple token vertices in the ERG can share the same +FROM and +TO. This happens quite productively with hyphenation, e.g.: A four-footed zebra arose. The historical ERG assigns [ +FROM "2" +TO "13" ] to both "four" and "footed" even while the token lattice is split in the middle, i.e. there are two tokens and there is a vertex "in between" them, but there is no sensible character offset available to assign to it. In the existing vertex labeling scheme, the vertex labels are generated based on a topological sort of the lattice, so we get: a(0,1) four(1,2) footed(2,3) zebra(3,4) arose(4,5) Using the convention proposed above, this would translate into: a(0,3) four(3,3) footed(3,14) zebra(14,20) arose(20,26) As you can see, there is a problem: two distinct vertices got smushed into character position 3. The situation is detectable automatically, of course, and ACE actually already has a built-in hack to adjust token +FROM and +TO in this case (making it possible to use the mouse to select parts of a hyphenated group like that in FFTB), but relying on that hack means hoping that ACE made the same decisions as the new punctuation rules in this case and any others that I haven't thought of. I am tempted to look at an alternative way of achieving the primary goal (i.e. synchronizing the ERG treebanks to the revised punctuation scheme). It would I believe be possible, maybe even straightforward, to make a tool that takes as input two token lattices (the old one and the new one for the same sentence) and computes an alignment between them that minimizes some notion of edit distance. With that in hand, the vertex identifiers of the old discriminants could be rewritten without resorting to character positions or having to solve the above snafu. It also would require no changes to the parsing engines or the treebanking tool, and would likely be at least partially reusable for future tokenization changes. Any suggestions? Woodley On 11/24/2019 03:43 PM, Stephan Oepen wrote: > many thanks for the quick follow-up, woodley! > > in general, character-based discriminants feel attractive because the idea > promises increased robustness to variation over time in tokenization. and > i am not sure yet i understand the difference in expressivity that you > suggest? an input to parsing is segmented into a sequence of vertices (or > breaking points); whether to number these continuously (0, 1, 2, ?) or > discontinuously according to e.g. corresponding character positions or time > stamps (into a speech signal)?i would think i can encode the same broad > range of lattices either way? > > closer to home, i was in fact thinking that the conversion from an existing > set of discriminants to a character-based regime could in fact be more > mechanic than the retooling you sketch. each current vertex should be > uniquely identified with a left and right character position, viz. the > +FROM and +TO values, respectively, on the underlying token feature > structures (i am assuming that all tokens in one cell share the same > values). for the vast majority of discriminants, would it not just work to > replace their start and end vertices with these characters positions? > > i am prepared to lose some discriminants, e.g. 
any choices on the > punctuation lexical rules that are being removed, but possibly also some > lexical choices that in the old universe end up anchored to a sub-string > including one or more punctuation marks. in the 500-best treebanks, it > used to be the case that pervasive redundancy of discriminants meant one > could afford to lose a non-trivial number of discriminants during an update > and still arrive at a unique solution. but maybe that works differently in > the full-forest universe? > > finally, i had not yet considered the 'twigs' (as they are an FFTB-specific > innovation). yes, it would seem unfortunate to just lose all twigs that > included one or more of the old punctuation rules! so your candidate > strategy of cutting twigs into two parts (of which one might often come out > empty) at occurrences of these rules strikes me as a promising (still quite > mechanic) way of working around this problem. formally, breaking up twigs > risks losing some information, but in this case i doubt this would be the > case in actuality. > > thanks for tossing around this idea! oe > > > On Sat, 23 Nov 2019 at 20:30 Woodley Packard > wrote: > >> Hi Stephan, >> >> My initial reaction to the notion of character-based discriminants is (1) >> it will not solve your immediate problem without a certain amount of custom >> tooling to convert old discriminants to new ones in a way that is sensitive >> to how the current punctuation rules work, i.e. a given chart vertex will >> have to be able to map to several different character positions depending >> on how much punctuation has been cliticized so far. The twig-shaped >> discriminants used by FFTB will in some cases have to be bifurcated into >> two or more discriminants, as well. Also, (2) this approach loses the >> (theoretical if perhaps not recently used) ability to treebank a nonlinear >> lattice shaped input, e.g. from an ASR system. I could imagine treebanking >> lattices from other sources as well -- perhaps an image caption generator. >> >> Given the custom tooling required for updating the discriminants, I'm not >> sure switching to character-based anchoring would be less painful than >> having that tool compute the new chart vertex anchoring instead -- though I >> could be wrong. What other arguments can be made in favor of >> character-based discriminants? >> >> In terms of support from FFTB, I think there are relatively few places in >> the code that assume the discriminants' from/to are interpretable beyond >> matching the from/to values of the `edge' relation. I think I would >> implement this by (optionally, I suppose, since presumably other grammars >> won't want to do this at least for now) replacing the from/to on edges read >> from the profile with character positions and more or less pretend that >> there is a chart vertex for every character position. Barring unforeseen >> complications, that wouldn't be too hard. >> >> Woodley >> >>> On Nov 23, 2019, at 5:58 AM, Stephan Oepen wrote: >>> >>> hi again, woodley, >>> >>> dan and i are currently exploring a 'makeover' of ERG input >>> processing, with the overall goal of increased compatibility with >>> mainstream assumptions about tokenization. >>> >>> among other things, we would like to move to the revised (i.e. >>> non-venerable) PTB (and OntoNotes and UD) tokenization conventions and >>> avoid subsequent re-arranging of segmentation in token mapping.
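Returning to the lattice-alignment alternative floated earlier in this message, a minimal sketch of aligning an old and a new tokenization by edit distance; it simplifies the two tokenizations to linear token sequences rather than general lattices and uses plain string identity as the match criterion, both of which are simplifying assumptions:

    def align_tokens(old, new):
        # old, new: lists of token surface strings; returns a mapping from old
        # vertex indices (0..len(old)) to new vertex indices (0..len(new))
        n, m = len(old), len(new)
        dp = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(n + 1):
            dp[i][0] = i
        for j in range(m + 1):
            dp[0][j] = j
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                sub = 0 if old[i - 1] == new[j - 1] else 1
                dp[i][j] = min(dp[i - 1][j] + 1,        # delete an old token
                               dp[i][j - 1] + 1,        # insert a new token
                               dp[i - 1][j - 1] + sub)  # match / substitute
        # trace back one optimal alignment and record vertex correspondences
        mapping, i, j = {n: m}, n, m
        while i > 0 or j > 0:
            if (i > 0 and j > 0 and
                    dp[i][j] == dp[i - 1][j - 1] + (0 if old[i - 1] == new[j - 1] else 1)):
                i, j = i - 1, j - 1
            elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
                i -= 1
            else:
                j -= 1
            mapping[i] = j
        return mapping

    # e.g. an old tokenization with a separate sentence-final period vs. a
    # retokenized variant that keeps the period attached
    print(align_tokens(["dogs", "bark", "."], ["dogs", "bark."]))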
this >>> means we would have to move away from the pseudo-affixation treatment >>> of punctuation marks to a 'pseudo-clitization' approach, meaning that >>> punctuation marks are lexical entries in their own right and attach >>> via binary constructions (rather than as lexical rules). the 'clitic' >>> metaphor, here, is intended to suggest that these lexical entries can >>> only attach at the bottom of the derivation, i.e. to non-clitic >>> lexical items immediately to their left (e.g. in the case of a comma) >>> or to their right (in the case of, say, an opening quote or >>> parenthesis). >>> >>> dan is currently visiting oslo, and we would like to use the >>> opportunity to estimate the cost of moving to such a revised universe. >>> treebank maintenance is a major concern here, as such a radical change >>> in the yields of virtually all derivations would render discriminants >>> invalid when updating to the new forests. i believe a cute idea has >>> emerged that, we optimistically believe, might eliminate much of that >>> concern: character-based discriminant positions, instead of our >>> venerable way of counting chart vertices. >>> >>> for the ERG at least, we believe that leaf nodes in all derivations >>> are reliably annotated with character start and end positions (+FROM >>> and +TO, as well as the +ID lists on token feature structures). these >>> sub-string indices will hardly be affected by the above change to >>> tokenization (except for cases where our current approach to splitting >>> at hyphens and slashes first in token mapping leads to overlapping >>> ranges). hence if discriminants were anchored over character ranges >>> instead of chart cells ... i expect the vast majority of them might >>> just carry over? >>> >>> we would be grateful if you (and others too, of course) could give the >>> above idea some critical thought and look for possible obstacles that >>> dan and i may just be overlooking? technically, i imagine one would >>> have to extend FFTB to (optionally) extract discriminant start and end >>> positions from the sub-string 'coverage' of each constituent, possibly >>> once convert existing treebanks to character-based indexing, and then >>> update into the new universe using character-based matching. does >>> such an approach seem feasible to you in principle? >>> >>> cheers, oe >> From arademaker at gmail.com Thu Feb 20 22:16:27 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Thu, 20 Feb 2020 18:16:27 -0300 Subject: [developers] Acetools for MacOS Message-ID: <8F57BDB5-44AA-4755-AC3D-5A124F6567C5@gmail.com> Hi Woodley, Any change to have the ace tools for MacOS? http://sweaglesw.org/linguistics/acetools/ In particular, the ART 0.1.9 MacOS binary does not run on Catalina: http://sweaglesw.org/linguistics/libtsdb/art.html Best, Alexandre From sweaglesw at sweaglesw.org Sat Feb 22 20:49:34 2020 From: sweaglesw at sweaglesw.org (Woodley Packard) Date: Sat, 22 Feb 2020 11:49:34 -0800 Subject: [developers] Acetools for MacOS In-Reply-To: <8F57BDB5-44AA-4755-AC3D-5A124F6567C5@gmail.com> References: <8F57BDB5-44AA-4755-AC3D-5A124F6567C5@gmail.com> Message-ID: <8F9A60DF-A7B3-4E6C-8822-E8FD555A4456@sweaglesw.org> Hi Alexandre, I am using OSX Catalina 10.15.2. I just downloaded the art 0.1.9 binary and ran it successfully. 
On my first attempt I got the following error: $ ./art -a "~/cdev/ace/ace -g ~/cdev/ace/erg-1214.dat -1" mrs zcat: can't stat: mrs/item.gz (mrs/item.gz.Z): No such file or directory I'm not sure if this is a difference with previous versions of OSX, but what's happening here is that art is trying to decompress my zipped profile, and it expected zcat to support .gz extensions, but the zcat program it found didn't. An easy workaround was to manually decompress the profile before processing it, e.g.: $ gunzip mrs/*.gz After that, everything went through without any issues. I was a little bit surprised to see this, because I had seen it in the past and thought I had made the MacOS binaries use "gzcat" instead of "zcat", but apparently not, at least for this particular release. Was that the problem you ran into, or was it something more sinister? Regards, Woodley > On Feb 20, 2020, at 1:16 PM, Alexandre Rademaker wrote: > > > Hi Woodley, > > Any change to have the ace tools for MacOS? > > http://sweaglesw.org/linguistics/acetools/ > > > In particular, the ART 0.1.9 MacOS binary does not run on Catalina: > > http://sweaglesw.org/linguistics/libtsdb/art.html > > > Best, > Alexandre > From ebender at uw.edu Mon Feb 24 18:45:33 2020 From: ebender at uw.edu (Emily M. Bender) Date: Mon, 24 Feb 2020 09:45:33 -0800 Subject: [developers] Edge can be built interactively, but isn't in the chart Message-ID: Dear all, [Cross-posted to developers and the delphinqa.] After 16 years of teaching grammar engineering, I thought I'd found all of the ways in which one can be in the situation of seemingly being able to build an edge through interactive unification which isn't in the chart. I've documented all of the ones I know about here: http://moin.delph-in.net/GeFaqUnifySurprise Alas, I've found evidence of a new one. Or rather: I'm in that situation (together with a student) but none of the cases noted there apply. More specifically, with the grammar for Meithei [mni] that can be found here: http://faculty.washington.edu/ebender/mni-debug.tgz If we try to analyze this sentence: yo?-si? t?m-? ??-? monkey-PL sleep-NHYP eat-NHYP Monkeys sleep and eat. The LKB and ace both return no parses found. If instead of using the two verbs (one intransitive and one transitive but with a dropped object), we repeat either one of the verbs, we get the expected parses (with both the LKB and ace). yo?-si? ??-? ??-? yo?-si? t?m-? t?m-? Returning to the non-parsing sentence, and looking at the LKB's parse chart, what's missing is the VP-T built out of applying the VP1-TOP-COORD rule to the VP-B over ??-? and the VP over t?m-?. Puzzlingly, I can build this edge interactively just fine. I've run out of guesses as to why it's not showing up in the char and so I thought I'd put this puzzle out in case other DELPH-INites might be entertained by it. Curiously, Emily p.s. Discourse directed me to an earlier discussion about this, where @johnca suggested checking that *chart-packing-p* is set to NIL. It is. -- Emily M. Bender (she/her) Howard and Frances Nostrand Endowed Professor Department of Linguistics Faculty Director, CLMS University of Washington Twitter: @emilymbender -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From oe at ifi.uio.no Mon Feb 24 19:32:48 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Mon, 24 Feb 2020 19:32:48 +0100 Subject: [developers] Edge can be built interactively, but isn't in the chart In-Reply-To: References: Message-ID: from memory, i believe the chart display shows edges with a non-empty orthographemic todo list, i.e. a remaining need to pass through a lexical rule with an associated orthographemic effect. this property of edges is not visible in the interface, and interactive unification may not be paying attention to it. upon completion of lexical parsing, only edges with an empty todo list can go on and feed into syntax rules, so this filter that is applied by the parser might explain seeming misalignment between the interactive mode and what actually happens during parsing. just a wild guess :-), oe On Mon, 24 Feb 2020 at 18:49 Emily M. Bender wrote: > Dear all, > > [Cross-posted to developers and the delphinqa.] > > After 16 years of teaching grammar engineering, I thought I'd found all of > the ways in which one can be in the situation of seemingly being able to > build an edge through interactive unification which isn't in the chart. > I've documented all of the ones I know about here: > > http://moin.delph-in.net/GeFaqUnifySurprise > > Alas, I've found evidence of a new one. Or rather: I'm in that situation > (together with a student) but none of the cases noted there apply. More > specifically, with the grammar for Meithei [mni] that can be found here: > > http://faculty.washington.edu/ebender/mni-debug.tgz > > If we try to analyze this sentence: > > yo?-si? t?m-? ??-? > monkey-PL sleep-NHYP eat-NHYP > Monkeys sleep and eat. > > The LKB and ace both return no parses found. If instead of using the two > verbs (one intransitive and one transitive but with a dropped object), we > repeat either one of the verbs, we get the expected parses (with both the > LKB and ace). > > yo?-si? ??-? ??-? > yo?-si? t?m-? t?m-? > > Returning to the non-parsing sentence, and looking at the LKB's parse > chart, what's missing is the VP-T built out of applying the VP1-TOP-COORD > rule to the VP-B over ??-? and the VP over t?m-?. Puzzlingly, I can build > this edge interactively just fine. I've run out of guesses as to why it's > not showing up in the char and so I thought I'd put this puzzle out in case > other DELPH-INites might be entertained by it. > > Curiously, > Emily > > p.s. Discourse directed me to an earlier discussion about this, where > @johnca suggested checking that *chart-packing-p* is set to NIL. It is. > > > -- > Emily M. Bender (she/her) > Howard and Frances Nostrand Endowed Professor > Department of Linguistics > Faculty Director, CLMS > University of Washington > Twitter: @emilymbender > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ebender at uw.edu Mon Feb 24 19:37:16 2020 From: ebender at uw.edu (Emily M. Bender) Date: Mon, 24 Feb 2020 10:37:16 -0800 Subject: [developers] Edge can be built interactively, but isn't in the chart In-Reply-To: References: Message-ID: Yes -- that is one of the known cases. However, it's not what's going on here: The daughters of the missing edge can be used to create the analogous edge in the sentences with the same verb twice. (And in one case, the daughter is already the product of a syntax rule.) Thank you for the guess though! I'm hoping some such guess will put me on the right path... 
Emily On Mon, Feb 24, 2020 at 10:34 AM Stephan Oepen wrote: > from memory, i believe the chart display shows edges with a non-empty > orthographemic todo list, i.e. a remaining need to pass through a lexical > rule with an associated orthographemic effect. this property of edges is > not visible in the interface, and interactive unification may not be paying > attention to it. upon completion of lexical parsing, only edges with an > empty todo list can go on and feed into syntax rules, so this filter that > is applied by the parser might explain seeming misalignment between the > interactive mode and what actually happens during parsing. > > just a wild guess :-), oe > > > On Mon, 24 Feb 2020 at 18:49 Emily M. Bender wrote: > >> Dear all, >> >> [Cross-posted to developers and the delphinqa.] >> >> After 16 years of teaching grammar engineering, I thought I'd found all >> of the ways in which one can be in the situation of seemingly being able to >> build an edge through interactive unification which isn't in the chart. >> I've documented all of the ones I know about here: >> >> http://moin.delph-in.net/GeFaqUnifySurprise >> >> Alas, I've found evidence of a new one. Or rather: I'm in that situation >> (together with a student) but none of the cases noted there apply. More >> specifically, with the grammar for Meithei [mni] that can be found here: >> >> http://faculty.washington.edu/ebender/mni-debug.tgz >> >> If we try to analyze this sentence: >> >> yo?-si? t?m-? ??-? >> monkey-PL sleep-NHYP eat-NHYP >> Monkeys sleep and eat. >> >> The LKB and ace both return no parses found. If instead of using the two >> verbs (one intransitive and one transitive but with a dropped object), we >> repeat either one of the verbs, we get the expected parses (with both the >> LKB and ace). >> >> yo?-si? ??-? ??-? >> yo?-si? t?m-? t?m-? >> >> Returning to the non-parsing sentence, and looking at the LKB's parse >> chart, what's missing is the VP-T built out of applying the VP1-TOP-COORD >> rule to the VP-B over ??-? and the VP over t?m-?. Puzzlingly, I can build >> this edge interactively just fine. I've run out of guesses as to why it's >> not showing up in the char and so I thought I'd put this puzzle out in case >> other DELPH-INites might be entertained by it. >> >> Curiously, >> Emily >> >> p.s. Discourse directed me to an earlier discussion about this, where >> @johnca suggested checking that *chart-packing-p* is set to NIL. It is. >> >> >> -- >> Emily M. Bender (she/her) >> Howard and Frances Nostrand Endowed Professor >> Department of Linguistics >> Faculty Director, CLMS >> University of Washington >> Twitter: @emilymbender >> > -- Emily M. Bender (she/her) Howard and Frances Nostrand Endowed Professor Department of Linguistics Faculty Director, CLMS University of Washington Twitter: @emilymbender -------------- next part -------------- An HTML attachment was scrubbed... URL: From bond at ieee.org Wed Feb 26 14:02:28 2020 From: bond at ieee.org (Francis Bond) Date: Wed, 26 Feb 2020 21:02:28 +0800 Subject: [developers] Searching treebanks Message-ID: G'day, does anyone know of any way to search Redwoods (or DELPHIN treebanks in general) for trees of a certain type (using something like the Fangorn interface). For example, I want to find how often in the treebank 'start' is intransitive vs NP V VP-ving vs NP V VP-to vs NP V VP NP (I start; I start lecturing; I start to lecture; I start a lecture). In fangorn this was "//VP/VB/start[->S/VP/VBG" for NP V VP-ving, ... 
I would be ecstatic if there were an online search I can point my students at, but would be interested in anything. -- Francis Bond Division of Linguistics and Multilingual Studies Nanyang Technological University -------------- next part -------------- An HTML attachment was scrubbed... URL: From bond at ieee.org Wed Feb 26 14:28:32 2020 From: bond at ieee.org (Francis Bond) Date: Wed, 26 Feb 2020 21:28:32 +0800 Subject: [developers] Searching treebanks In-Reply-To: <5fca14bec6bb4ab9bdec8793a31f092b@ntnu.no> References: <5fca14bec6bb4ab9bdec8793a31f092b@ntnu.no> Message-ID: Thanks for the tip. If only we all sensibly annotated our corpora with typecraft. On Wed, Feb 26, 2020 at 9:21 PM Lars Hellan wrote: > Hi Francis, > > For Norwegian you can do such things through > https://typecraft.org/tc2wiki/Norwegian_Valency_Corpus, a corpus of about > 20,000 sentences. > > > (Not right on your mark, but perhaps not too far from the sphere of > "anything" ...) > > > Best > > Lars > ------------------------------ > *From:* developers-bounces at emmtee.net on > behalf of Francis Bond > *Sent:* Wednesday, February 26, 2020 2:02:28 PM > *To:* Stephan Oepen; developers at delph-in.net; Rebecca Dridan; Timothy > Baldwin > *Subject:* [developers] Searching treebanks > > G'day, > > does anyone know of any way to search Redwoods (or DELPHIN treebanks in > general) for trees of a certain type (using something like the Fangorn > interface). For example, I want to find how often in the treebank 'start' > is intransitive vs NP V VP-ving vs NP V VP-to vs NP V VP NP (I start; I > start lecturing; I start to lecture; I start a lecture). > > In fangorn this was "//VP/VB/start[->S/VP/VBG" for NP V VP-ving, ... > > I would be ecstatic if there were an online search I can point my students > at, but would be interested in anything. > > > > -- > Francis Bond > Division of Linguistics and Multilingual Studies > Nanyang Technological University > -- Francis Bond Division of Linguistics and Multilingual Studies Nanyang Technological University -------------- next part -------------- An HTML attachment was scrubbed... URL: From lars.hellan at ntnu.no Wed Feb 26 14:21:43 2020 From: lars.hellan at ntnu.no (Lars Hellan) Date: Wed, 26 Feb 2020 13:21:43 +0000 Subject: [developers] Searching treebanks In-Reply-To: References: Message-ID: <5fca14bec6bb4ab9bdec8793a31f092b@ntnu.no> Hi Francis, For Norwegian you can do such things through https://typecraft.org/tc2wiki/Norwegian_Valency_Corpus, a corpus of about 20,000 sentences. (Not right on your mark, but perhaps not too far from the sphere of "anything" ...) Best Lars ________________________________ From: developers-bounces at emmtee.net on behalf of Francis Bond Sent: Wednesday, February 26, 2020 2:02:28 PM To: Stephan Oepen; developers at delph-in.net; Rebecca Dridan; Timothy Baldwin Subject: [developers] Searching treebanks G'day, does anyone know of any way to search Redwoods (or DELPHIN treebanks in general) for trees of a certain type (using something like the Fangorn interface). For example, I want to find how often in the treebank 'start' is intransitive vs NP V VP-ving vs NP V VP-to vs NP V VP NP (I start; I start lecturing; I start to lecture; I start a lecture). In fangorn this was "//VP/VB/start[->S/VP/VBG" for NP V VP-ving, ... I would be ecstatic if there were an online search I can point my students at, but would be interested in anything. 
-- Francis Bond Division of Linguistics and Multilingual Studies Nanyang Technological University -------------- next part -------------- An HTML attachment was scrubbed... URL: From ebender at uw.edu Wed Feb 26 15:04:36 2020 From: ebender at uw.edu (Emily M. Bender) Date: Wed, 26 Feb 2020 06:04:36 -0800 Subject: [developers] Searching treebanks In-Reply-To: References: <5fca14bec6bb4ab9bdec8793a31f092b@ntnu.no> Message-ID: For search over semantic representations (MRS, DM, EDS) there's WeSearch: http://wesearch.delph-in.net/ ... which indexes DeepBank and WikiWoods. Emily On Wed, Feb 26, 2020 at 5:29 AM Francis Bond wrote: > Thanks for the tip. If only we all sensibly annotated our corpora with > typecraft. > > On Wed, Feb 26, 2020 at 9:21 PM Lars Hellan wrote: > >> Hi Francis, >> >> For Norwegian you can do such things through >> https://typecraft.org/tc2wiki/Norwegian_Valency_Corpus, a corpus of >> about 20,000 sentences. >> >> >> (Not right on your mark, but perhaps not too far from the sphere of >> "anything" ...) >> >> >> Best >> >> Lars >> ------------------------------ >> *From:* developers-bounces at emmtee.net on >> behalf of Francis Bond >> *Sent:* Wednesday, February 26, 2020 2:02:28 PM >> *To:* Stephan Oepen; developers at delph-in.net; Rebecca Dridan; Timothy >> Baldwin >> *Subject:* [developers] Searching treebanks >> >> G'day, >> >> does anyone know of any way to search Redwoods (or DELPHIN treebanks in >> general) for trees of a certain type (using something like the Fangorn >> interface). For example, I want to find how often in the treebank 'start' >> is intransitive vs NP V VP-ving vs NP V VP-to vs NP V VP NP (I start; I >> start lecturing; I start to lecture; I start a lecture). >> >> In fangorn this was "//VP/VB/start[->S/VP/VBG" for NP V VP-ving, ... >> >> I would be ecstatic if there were an online search I can point my >> students at, but would be interested in anything. >> >> >> >> -- >> Francis Bond >> Division of Linguistics and Multilingual Studies >> Nanyang Technological University >> > > > -- > Francis Bond > Division of Linguistics and Multilingual Studies > Nanyang Technological University > -- Emily M. Bender (she/her) Howard and Frances Nostrand Endowed Professor Department of Linguistics Faculty Director, CLMS University of Washington Twitter: @emilymbender -------------- next part -------------- An HTML attachment was scrubbed... URL: From tuananh.ke at gmail.com Thu Feb 27 08:46:08 2020 From: tuananh.ke at gmail.com (=?UTF-8?B?VHXhuqVuIEFuaCBMw6o=?=) Date: Thu, 27 Feb 2020 15:46:08 +0800 Subject: [developers] Options to extract syntax trees from FFTB Message-ID: Hi everyone, We are trying to use FFTB to tree bank a small corpus and we would like to extract the chosen syntax trees from the corpus. The expected output would be something like It works --> ("S" ("NP" ("NP" ("it"))) ("VP" ("V" ("V" ("works"))))) Is there a way to extract this from the FFTB profile? Currently I'm selecting the trees manually by parsing the sentences using ACE with the options --report-label and then split the output string with " ; " but I'm not sure if this is the best approach. 
[erg-trunk]$ ace -g erg-0.9.26.dat --report-label It works SENT: It works [ LTOP: h0 INDEX: e2 [ e SF: prop TENSE: pres MOOD: indicative PROG: - PERF: - ] RELS: < [ pron<0:2> LBL: h4 ARG0: x3 [ x PERS: 3 NUM: sg GEND: n PT: std ] ] [ pronoun_q<0:2> LBL: h5 ARG0: x3 RSTR: h6 BODY: h7 ] [ _work_v_1<3:8> LBL: h1 ARG0: e2 ARG1: x3 ARG2: i8 ] > HCONS: < h0 qeq h1 h6 qeq h4 > ICONS: < > ] ; ("S" ("NP" ("NP" ("it"))) ("VP" ("V" ("V" ("works"))))) NOTE: 1 readings, added 391 / 68 edges to chart (27 fully instantiated, 35 actives used, 18 passives used) RAM: 1880k Thank you -- Tuan Anh -------------- next part -------------- An HTML attachment was scrubbed... URL: From oe at ifi.uio.no Thu Feb 27 19:37:01 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Thu, 27 Feb 2020 19:37:01 +0100 Subject: [developers] Options to extract syntax trees from FFTB In-Reply-To: References: Message-ID: hi tu?n anh, from what i recall about how FFTB writes tsdb(1) profiles, this should be easy: once treebanking is complete, the ?result? relation should contain one entry per item for each active derivation, typically one after full disambiguation. the ?derivation? field will always be there, but i am not quite sure whether FFTB writes the ?tree? (labeled phrase structure) and ?mrs? fields? you should be able to observe that in your profiles. if not, the LOGON ?redwoods? script can recreate labeled trees for each derivation, using a command roughly like the following: $LOGONROOT/redwoods ?terg ?export tree ?target /tmp best wishes, oe On Thu, 27 Feb 2020 at 08:48 Tu?n Anh L? wrote: > Hi everyone, > > We are trying to use FFTB to tree bank a small corpus and we would like to > extract the chosen syntax trees from the corpus. The expected output would > be something like > > It works --> ("S" ("NP" ("NP" ("it"))) ("VP" ("V" ("V" ("works"))))) > > Is there a way to extract this from the FFTB profile? > > Currently I'm selecting the trees manually by parsing the sentences using > ACE with the options --report-label and then split the output string with " > ; " but I'm not sure if this is the best approach. > > [erg-trunk]$ ace -g erg-0.9.26.dat --report-label > It works > SENT: It works > [ LTOP: h0 INDEX: e2 [ e SF: prop TENSE: pres MOOD: indicative PROG: - > PERF: - ] RELS: < [ pron<0:2> LBL: h4 ARG0: x3 [ x PERS: 3 NUM: sg GEND: n > PT: std ] ] [ pronoun_q<0:2> LBL: h5 ARG0: x3 RSTR: h6 BODY: h7 ] [ > _work_v_1<3:8> LBL: h1 ARG0: e2 ARG1: x3 ARG2: i8 ] > HCONS: < h0 qeq h1 h6 > qeq h4 > ICONS: < > ] ; ("S" ("NP" ("NP" ("it"))) ("VP" ("V" ("V" > ("works"))))) > NOTE: 1 readings, added 391 / 68 edges to chart (27 fully instantiated, 35 > actives used, 18 passives used) RAM: 1880k > > Thank you > -- > Tuan Anh > -------------- next part -------------- An HTML attachment was scrubbed... 
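To make the suggestion above concrete: one possible way to pull the stored analyses back out of a treebanked profile is the PyDelphin library, which can read the `result' relation directly. This is only a sketch under the assumption that PyDelphin (1.x) is installed and that the treebanking tool actually wrote the `derivation' and `tree' fields; the profile path below is a placeholder, not a path from this thread.

    # read derivations and labeled trees from a [incr tsdb()] profile
    from delphin import itsdb

    ts = itsdb.TestSuite('path/to/profile')      # placeholder path

    # map parse-id back to the original input via the 'parse' and 'item' relations
    inputs = {row['i-id']: row['i-input'] for row in ts['item']}
    parse_to_item = {row['parse-id']: row['i-id'] for row in ts['parse']}

    for row in ts['result']:
        print(inputs[parse_to_item[row['parse-id']]])
        print('  derivation:', row['derivation'])
        print('  tree:      ', row['tree'])      # may be empty if the tool did not record it

Alternatively, as noted above, the LOGON `redwoods' script can regenerate labeled trees from the stored derivations without any scripting.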
URL: From danf at stanford.edu Mon Mar 23 17:55:10 2020 From: danf at stanford.edu (Dan Flickinger) Date: Mon, 23 Mar 2020 16:55:10 +0000 Subject: [developers] character-based discriminants In-Reply-To: <5E3A0DE5.4020502@sweaglesw.org> References: , <5E3A0DE5.4020502@sweaglesw.org> Message-ID: Hi Woodley and Stephan, [and with apologies to everyone else for the cryptic flavor of this note, which has to do with a conversion of the ERG to treat punctuation marks as separate tokens, for better interoperability with the rest of the universe] I was able to use the converted `decision' files that you constructed during my visit in February, Woodley, with some non-zero additional manual disambiguation, and this morning I completed updating of the full set of 2018 gold trees into the makeover universe, including wsj00-04. I would now be grateful if you could also provide converted decision files for the wsj05-12 profiles that had also been updated with the 2018 grammar after it was released. Since the 2018mo grammar doesn't really have a natural home in SVN, I have put a full copy of it here, and included in its tsdb/gold directory both the recent updated profiles, and the 2018 ones for wsj05-wsj12 that I hope you'll convert: http://lingo.stanford.edu/danf/2018mo.tgz My intention is to now update these gold profiles from that time-warped 2018mo grammar to the SVN `mo' grammar (which we branched from `trunk' during my visit to Oslo in November). If all goes well, we should then be in position to anoint `mo' as the official new `trunk' version, and use this as the basis for the next stable ERG release, ideally this summer. I would also be interested to know if these now-manually-updated profiles allow you to train a better disambiguation model than the one you trained in February just on the automatically updated items. Thanks for the help so far! Dan ________________________________ From: developers-bounces at emmtee.net on behalf of Woodley Packard Sent: Tuesday, February 4, 2020 4:35 PM To: Stephan Oepen Cc: developers at delph-in.net Subject: Re: [developers] character-based discriminants Stephan and Dan, and other interested parties, Happy new year to you all. In the course of taking a closer look at how the proposed character-based discriminant system might work, I've run across a few cases that perhaps would benefit from a bit of discussion. First, my attempt to distill the proposed action plan for an automatic update (downdate?) of the ERG treebanks to the venerable PTB punctuation convention is as follows: 1. Modify ACE and other engines to use input character positions as token vertex identifiers, so that data coming out -- particularly the full forest record in the "edge" relation -- uses these to identify constituent boundaries instead of the existing identifiers (corresponding roughly to whitespace areas). 2. Mechanically revise a copy of the "decisions" relation from the old gold treebank so that the vertex identifiers in it are also character-based, in hopes of matching those used in the new full forest profiles. Destroy any discriminants that are judged unlikely to match correctly. 3. Run an automatic treebank update to achieve a high coverage gold treebank under the new punctuation convention; manually fix any items that didn't quite make it. Stephan pointed out that the +FROM/+TO values on token AVMs are a way to convert existing vertices to character positions. 
Thinking a bit more closely about this, there is at least one obvious problem: adjacent tokens T1,T2 do not generally have the property that T1.+TO = T2.+FROM, because there is usually whitespace between them. Therefore the revised scheme will have the property that whitespace adjacent to a constituent will in a sense be considered part of the constituent in some cases. I consider that slightly weird, but perhaps not too big a deal. The main thing is we need to pick a convention as to which position in the whitespace is to be considered the label of the vertex. One candidate convention would be that for any given vertex, its character-based label is the smallest +FROM value of any token starting from it, if any, and if no token starts at it, then the largest +TO value of any token ending at it. I would expect that at least in ordinary cases, possibly all cases, all the incident +FROMs would be identical and all the +TOs would be identical also, just with a difference between the +FROMs and +TOs. A somewhat more troubling problem is that multiple token vertices in the ERG can share the same +FROM and +TO. This happens quite productively with hyphenation, e.g.: A four-footed zebra arose. The historical ERG assigns [ +FROM "2" +TO "13" ] to both "four" and "footed" even while the token lattice is split in the middle, i.e. there are two tokens and there is a vertex "in between" them, but there is no sensible character offset available to assign to it. In the existing vertex labeling scheme, the vertex labels are generated based on a topological sort of the lattice, so we get: a(0,1) four(1,2) footed(2,3) zebra(3,4) arose(4,5) Using the convention proposed above, this would translate into: a(0,3) four(3,3) footed(3,14) zebra(14,20) arose(20,26) As you can see, there is a problem: two distinct vertices got smushed into character position 3. The situation is detectable automatically, of course, and ACE actually already has a built-in hack to adjust token +FROM and +TO in this case (making it possible to use the mouse to select parts of a hyphenated group like that in FFTB), but relying on that hack means hoping that ACE made the same decisions as the new punctuation rules in this case and any others that I haven't thought of. I am tempted to look at an alternative way of achieving the primary goal (i.e. synchronizing the ERG treebanks to the revised punctuation scheme). It would I believe be possible, maybe even straightforward, to make a tool that takes as input two token lattices (the old one and the new one for the same sentence) and computes an alignment between them that minimizes some notion of edit distance. With that in hand, the vertex identifiers of the old discriminants could be rewritten without resorting to character positions or having to solve the above snafu. It also would require no changes to the parsing engines or the treebanking tool, and would likely be at least partially reusable for future tokenization changes. Any suggestions? Woodley On 11/24/2019 03:43 PM, Stephan Oepen wrote: > many thanks for the quick follow-up, woodley! > > in general, character-based discriminants feel attractive because the idea > promises increased robustness to variation over time in tokenization. and > i am not sure yet i understand the difference in expressivity that you > suggest? an input to parsing is segmented into a sequence of vertices (or > breaking points); whether to number these continuously (0, 1, 2, ?) or > discontinuously according to e.g. 
corresponding character positions or time > stamps (into a speech signal)?i would think i can encode the same broad > range of lattices either way? > > closer to home, i was in fact thinking that the conversion from an existing > set of discriminants to a character-based regime could in fact be more > mechanic than the retooling you sketch. each current vertex should be > uniquely identified with a left and right character position, viz. the > +FROM and +TO values, respectively, on the underlying token feature > structures (i am assuming that all tokens in one cell share the same > values). for the vast majority of discriminants, would it not just work to > replace their start and end vertices with these characters positions? > > i am prepared to lose some discriminants, e.g. any choices on the > punctuation lexical rules that are being removed, but possibly also some > lexical choices that in the old universe end up anchored to a sub-string > including one or more punctuation marks. in the 500-best treebanks, it > used to be the case that pervasive redundancy of discriminants meant one > could afford to lose a non-trivial number of discriminants during an update > and still arrive at a unique solution. but maybe that works differently in > the full-forest universe? > > finally, i had not yet considered the ?twigs? (as they are an FFTB-specific > innovation). yes, it would seem unfortunate to just lose all twigs that > included one or more of the old punctuation rules! so your candidate > strategy of cutting twigs into two parts (of which one might often come out > empty) at occurrences of these rules strikes me as a promising (still quite > mechanic) way of working around this problem. formally, breaking up twigs > risks losing some information, but in this case i doubt this would be the > case in actuality. > > thanks for tossing around this idea! oe > > > On Sat, 23 Nov 2019 at 20:30 Woodley Packard > wrote: > >> Hi Stephan, >> >> My initial reaction to the notion of character-based discriminants is (1) >> it will not solve your immediate problem without a certain amount of custom >> tooling to convert old discriminants to new ones in a way that is sensitive >> to how the current punctuation rules work, i.e. a given chart vertex will >> have to be able to map to several different character positions depending >> on how much punctuation has been cliticized so far. The twig-shaped >> discriminants used by FFTB will in some cases have to be bifurcated into >> two or more discriminants, as well. Also, (2) this approach loses the >> (theoretical if perhaps not recently used) ability to treebank a nonlinear >> lattice shaped input, e.g. from an ASR system. I could imagine treebanking >> lattices from other sources as well ? perhaps an image caption generator. >> >> Given the custom tooling required for updating the discriminants, I?m not >> sure switching to character-based anchoring would be less painful than >> having that tool compute the new chart vertex anchoring instead ? though I >> could be wrong. What other arguments can be made in favor of >> character-based discriminants? >> >> In terms of support from FFTB, I think there are relatively few places in >> the code that assume the discriminants? from/to are interpretable beyond >> matching the from/to values of the `edge? relation. 
I think I would >> implement this by (optionally, I suppose, since presumably other grammars >> won?t want to do this at least for now) replacing the from/to on edges read >> from the profile with character positions and more or less pretend that >> there is a chart vertex for every character position. Barring unforeseen >> complications, that wouldn?t be too hard. >> >> Woodley >> >>> On Nov 23, 2019, at 5:58 AM, Stephan Oepen wrote: >>> >>> hi again, woodley, >>> >>> dan and i are currently exploring a 'makeover' of ERG input >>> processing, with the overall goal of increased compatibility with >>> mainstream assumptions about tokenization. >>> >>> among other things, we would like to move to the revised (i.e. >>> non-venerable) PTB (and OntoNotes and UD) tokenization conventions and >>> avoid subsequent re-arranging of segmentation in token mapping. this >>> means we would have to move away from the pseudo-affixation treatment >>> of punctuation marks to a 'pseudo-clitization' approach, meaning that >>> punctuation marks are lexical entries in their own right and attach >>> via binary constructions (rather than as lexical rules). the 'clitic' >>> metaphor, here, is intended to suggest that these lexical entries can >>> only attach at the bottom of the derivation, i.e. to non-clitic >>> lexical items immediately to their left (e.g. in the case of a comma) >>> or to their right (in the case of, say, an opening quote or >>> parenthesis). >>> >>> dan is currently visiting oslo, and we would like to use the >>> opportunity to estimate the cost of moving to such a revised universe. >>> treebank maintenance is a major concern here, as such a radical change >>> in the yields of virtually all derivations would render discriminants >>> invalid when updating to the new forests. i believe a cute idea has >>> emerged that, we optimistically believe, might eliminate much of that >>> concern: character-based discriminant positions, instead of our >>> venerable way of counting chart vertices. >>> >>> for the ERG at least, we believe that leaf nodes in all derivations >>> are reliably annotated with character start and end positions (+FROM >>> and +TO, as well as the +ID lists on token feature structures). these >>> sub-string indices will hardly be affected by the above change to >>> tokenization (except for cases where our current approach to splitting >>> at hyphens and slashes first in token mapping leads to overlapping >>> ranges). hence if discriminants were anchored over character ranges >>> instead of chart cells ... i expect the vast majority of them might >>> just carry over? >>> >>> we would be grateful if you (and others too, of course) could give the >>> above idea some critical thought and look for possible obstacles that >>> dan and i may just be overlooking? technically, i imagine one would >>> have to extend FFTB to (optionally) extract discriminant start and end >>> positions from the sub-string 'coverage' of each constituent, possibly >>> once convert existing treebanks to character-based indexing, and then >>> update into the new universe using character-based matching. does >>> such an approach seem feasible to you in principle? >>> >>> cheers, oe >> -------------- next part -------------- An HTML attachment was scrubbed... 
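As a concrete illustration of the vertex-labelling convention sketched above (label a vertex with the smallest +FROM of any token starting there, otherwise with the largest +TO of any token ending there), here is a small self-contained sketch. The token tuples are invented to mirror the "A four-footed zebra arose." example: only the shared [ +FROM "2" +TO "13" ] values come from the message, the other offsets are approximate, so this illustrates the problem rather than describing what ACE or FFTB actually do.

    # (surface, start vertex, end vertex, +FROM, +TO) -- invented example data
    tokens = [
        ("a",      0, 1,  0,  1),
        ("four",   1, 2,  2, 13),   # hyphenated pair shares +FROM/+TO
        ("footed", 2, 3,  2, 13),
        ("zebra",  3, 4, 14, 19),
        ("arose",  4, 5, 20, 26),
    ]

    def vertex_to_char(v):
        # prefer the smallest +FROM of tokens starting at v
        froms = [frm for _, start, end, frm, to in tokens if start == v]
        if froms:
            return min(froms)
        # otherwise fall back to the largest +TO of tokens ending at v
        tos = [to for _, start, end, frm, to in tokens if end == v]
        return max(tos) if tos else None

    for v in range(6):
        print(v, '->', vertex_to_char(v))
    # vertices 1 and 2 both map to character 2: the "smushed" case that
    # motivates either ACE's +FROM/+TO adjustment hack or lattice alignment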
URL: From arademaker at gmail.com Mon Mar 30 21:46:27 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Mon, 30 Mar 2020 16:46:27 -0300 Subject: [developers] Compiling FFTB on Ubuntu 19.10 Message-ID: Hi Woodley, I had to reinstall my machine and now I am trying to recompile all the tools. I gave up for compiling them for MacOS. That would be great for me, but in the MacOS I haven?t passed from the first step below. So I am compiling everything in a docker running Ubuntu 19.10. My goal is to have FFTB running again, I can't use the acetools binaries you provided because it seems that http://sweaglesw.org/svn/treebank/trunk/web.c still don?t have the changed you made during a conversation in Cambridge: Line 1290: addr.sin_addr.s_addr = 0; // inet_addr("127.0.0.1?); Without that change, running FFTB inside the docker is not easy. We need a proxy server for redirecting the ports (as documented in http://moin.delph-in.net/FftbTop#FFTB_on_remote_machine), but with that change, we don?t need to proxy and can use the docker native way to redirect internal to external ports. I have tried to follow the steps that worked for me last time: 1. install liba 2. install repp-0.2.2 3. install libace 4. Install libtsdb I have success for 1-3, but in step 4 I got an error. The error was caused by /usr/bin/ld: cannot find -ltsdb This is a little bit strange because it seems that during the compilation of libtsdb it is looking for this same library? am I right? The complete trace is below. I didn?t see this error before. Can you help me? I am blocked by this error... $ make cc -fPIC -shared -g -O2 -c -o tsdb.o tsdb.c tsdb.c: In function ?tsdb_free_profile?: tsdb.c:47:5: warning: implicit declaration of function ?hash_free_nokeys? [-Wimplicit-function-declaration] 47 | hash_free_nokeys(r->fields[j].hash); | ^~~~~~~~~~~~~~~~ tsdb.c: In function ?tsdb_write_relation?: tsdb.c:258:2: warning: implicit declaration of function ?unlink? [-Wimplicit-function-declaration] 258 | unlink(fname_bk); | ^~~~~~ cc -fPIC -shared -g -O2 -c -o relations.o relations.c gcc -fPIC -shared -g -O2 -fvisibility=hidden -c hash.c -o hash.o gcc -fPIC -shared -g -O2 tsdb.o relations.o hash.o -shared -o libtsdb.so rm -f libtsdb.a ar cru libtsdb.a tsdb.o relations.o ar: `u' modifier ignored since `D' is the default (see `U') gcc -g -O2 -L. test.c -ltsdb -o test -Wl,-rpath -Wl,`pwd` -lace -ldl -la test.c:74:1: warning: return type defaults to ?int? [-Wimplicit-int] 74 | print_tree_with_edge_id(struct tree *t, int indent, int *edgemap) | ^~~~~~~~~~~~~~~~~~~~~~~ test.c:122:1: warning: return type defaults to ?int? [-Wimplicit-int] 122 | record_eq_edges(int x_eid, int y_eid) | ^~~~~~~~~~~~~~~ test.c: In function ?report_missing_edges?: test.c:165:3: warning: implicit declaration of function ?print_tree?; did you mean ?print_mrs?? [-Wimplicit-function-declaration] 165 | print_tree(t, 2); | ^~~~~~~~~~ | print_mrs test.c: At top level: test.c:174:1: warning: return type defaults to ?int? [-Wimplicit-int] 174 | fidget(struct tree *t) | ^~~~~~ test.c:190:1: warning: return type defaults to ?int? [-Wimplicit-int] 190 | compare_tree_lists(char *iid, struct result *rx, int nx, struct result *ry, int ny, int detail, char *errx, char *erry) | ^~~~~~~~~~~~~~~~~~ test.c: In function ?compare_tree_lists?: test.c:277:2: warning: implicit declaration of function ?hash_free?; did you mean ?hash_find?? 
[-Wimplicit-function-declaration] 277 | hash_free(hx); | ^~~~~~~~~ | hash_find test.c: In function ?tree_to_mrs?: test.c:381:2: warning: implicit declaration of function ?clear_mrs?; did you mean ?read_mrs?? [-Wimplicit-function-declaration] 381 | clear_mrs(); | ^~~~~~~~~ | read_mrs test.c: At top level: test.c:421:1: warning: return type defaults to ?int? [-Wimplicit-int] 421 | compare_surface_lists(char *iid, char *i_input, struct result *rx, int nx, struct result *ry, int ny, struct tree *gold_tree, struct mrs *gold_mrs, int detail) | ^~~~~~~~~~~~~~~~~~~~~ test.c:507:1: warning: return type defaults to ?int? [-Wimplicit-int] 507 | usage(char *prog) | ^~~~~ test.c:585:1: warning: return type defaults to ?int? [-Wimplicit-int] 585 | main(int argc, char *argv[]) | ^~~~ test.c: In function ?main?: test.c:599:2: warning: implicit declaration of function ?ace_load_grammar? [-Wimplicit-function-declaration] 599 | ace_load_grammar("/home/sweaglesw/cdev/ace-regression/comparison.dat"); | ^~~~~~~~~~~~~~~~ gcc -g -O2 art.c -lace -ltsdb -lrepp -la -o art -lutil art.c:65:1: warning: return type defaults to ?int? [-Wimplicit-int] 65 | usage(char *myname, int status) | ^~~~~ art.c:87:1: warning: return type defaults to ?int? [-Wimplicit-int] 87 | main(int argc, char *argv[]) | ^~~~ art.c: In function ?main?: art.c:156:13: warning: implicit declaration of function ?forkpty?; did you mean ?fork?? [-Wimplicit-function-declaration] 156 | pid_t p = forkpty(&arbiter_fd, NULL, NULL, NULL); | ^~~~~~~ | fork art.c:301:16: warning: implicit declaration of function ?read_result?; did you mean ?record_result?? [-Wimplicit-function-declaration] 301 | int status = read_result(parse_id, run_id, i_id, i_input); | ^~~~~~~~~~~ | record_result art.c: At top level: art.c:560:1: warning: return type defaults to ?int? [-Wimplicit-int] 560 | write_tuple(FILE *f, char **tuple, struct relation *r) | ^~~~~~~~~~~ /usr/bin/ld: cannot find -ltsdb collect2: error: ld returned 1 exit status make: *** [Makefile:51: art] Error 1 Best, Alexandre From sweaglesw at sweaglesw.org Mon Mar 30 23:11:35 2020 From: sweaglesw at sweaglesw.org (Woodley Packard) Date: Mon, 30 Mar 2020 14:11:35 -0700 Subject: [developers] Compiling FFTB on Ubuntu 19.10 In-Reply-To: References: Message-ID: <0B0E6EF3-9F92-4FBF-8D41-A4B910C9E6A3@sweaglesw.org> Hi Alex, It looks like compiling the library succeeded but the test app failed, most likely just because the library is not yet installed. Please install the libraries (make install, or however you prefer) and that should allow the test app to build. -Woodley > On Mar 30, 2020, at 12:47 PM, Alexandre Rademaker wrote: > > ? > Hi Woodley, > > I had to reinstall my machine and now I am trying to recompile all the tools. I gave up for compiling them for MacOS. That would be great for me, but in the MacOS I haven?t passed from the first step below. So I am compiling everything in a docker running Ubuntu 19.10. > > My goal is to have FFTB running again, I can't use the acetools binaries you provided because it seems that http://sweaglesw.org/svn/treebank/trunk/web.c still don?t have the changed you made during a conversation in Cambridge: > > Line 1290: > > addr.sin_addr.s_addr = 0; // inet_addr("127.0.0.1?); > > Without that change, running FFTB inside the docker is not easy. We need a proxy server for redirecting the ports (as documented in http://moin.delph-in.net/FftbTop#FFTB_on_remote_machine), but with that change, we don?t need to proxy and can use the docker native way to redirect internal to external ports. 
> > I have tried to follow the steps that worked for me last time: > > 1. install liba > 2. install repp-0.2.2 > 3. install libace > 4. Install libtsdb > > I have success for 1-3, but in step 4 I got an error. The error was caused by > > /usr/bin/ld: cannot find -ltsdb > > This is a little bit strange because it seems that during the compilation of libtsdb it is looking for this same library? am I right? > > The complete trace is below. I didn?t see this error before. Can you help me? I am blocked by this error... > > > $ make > cc -fPIC -shared -g -O2 -c -o tsdb.o tsdb.c > tsdb.c: In function ?tsdb_free_profile?: > tsdb.c:47:5: warning: implicit declaration of function ?hash_free_nokeys? [-Wimplicit-function-declaration] > 47 | hash_free_nokeys(r->fields[j].hash); > | ^~~~~~~~~~~~~~~~ > tsdb.c: In function ?tsdb_write_relation?: > tsdb.c:258:2: warning: implicit declaration of function ?unlink? [-Wimplicit-function-declaration] > 258 | unlink(fname_bk); > | ^~~~~~ > cc -fPIC -shared -g -O2 -c -o relations.o relations.c > gcc -fPIC -shared -g -O2 -fvisibility=hidden -c hash.c -o hash.o > gcc -fPIC -shared -g -O2 tsdb.o relations.o hash.o -shared -o libtsdb.so > rm -f libtsdb.a > ar cru libtsdb.a tsdb.o relations.o > ar: `u' modifier ignored since `D' is the default (see `U') > gcc -g -O2 -L. test.c -ltsdb -o test -Wl,-rpath -Wl,`pwd` -lace -ldl -la > test.c:74:1: warning: return type defaults to ?int? [-Wimplicit-int] > 74 | print_tree_with_edge_id(struct tree *t, int indent, int *edgemap) > | ^~~~~~~~~~~~~~~~~~~~~~~ > test.c:122:1: warning: return type defaults to ?int? [-Wimplicit-int] > 122 | record_eq_edges(int x_eid, int y_eid) > | ^~~~~~~~~~~~~~~ > test.c: In function ?report_missing_edges?: > test.c:165:3: warning: implicit declaration of function ?print_tree?; did you mean ?print_mrs?? [-Wimplicit-function-declaration] > 165 | print_tree(t, 2); > | ^~~~~~~~~~ > | print_mrs > test.c: At top level: > test.c:174:1: warning: return type defaults to ?int? [-Wimplicit-int] > 174 | fidget(struct tree *t) > | ^~~~~~ > test.c:190:1: warning: return type defaults to ?int? [-Wimplicit-int] > 190 | compare_tree_lists(char *iid, struct result *rx, int nx, struct result *ry, int ny, int detail, char *errx, char *erry) > | ^~~~~~~~~~~~~~~~~~ > test.c: In function ?compare_tree_lists?: > test.c:277:2: warning: implicit declaration of function ?hash_free?; did you mean ?hash_find?? [-Wimplicit-function-declaration] > 277 | hash_free(hx); > | ^~~~~~~~~ > | hash_find > test.c: In function ?tree_to_mrs?: > test.c:381:2: warning: implicit declaration of function ?clear_mrs?; did you mean ?read_mrs?? [-Wimplicit-function-declaration] > 381 | clear_mrs(); > | ^~~~~~~~~ > | read_mrs > test.c: At top level: > test.c:421:1: warning: return type defaults to ?int? [-Wimplicit-int] > 421 | compare_surface_lists(char *iid, char *i_input, struct result *rx, int nx, struct result *ry, int ny, struct tree *gold_tree, struct mrs *gold_mrs, int detail) > | ^~~~~~~~~~~~~~~~~~~~~ > test.c:507:1: warning: return type defaults to ?int? [-Wimplicit-int] > 507 | usage(char *prog) > | ^~~~~ > test.c:585:1: warning: return type defaults to ?int? [-Wimplicit-int] > 585 | main(int argc, char *argv[]) > | ^~~~ > test.c: In function ?main?: > test.c:599:2: warning: implicit declaration of function ?ace_load_grammar? 
[-Wimplicit-function-declaration] > 599 | ace_load_grammar("/home/sweaglesw/cdev/ace-regression/comparison.dat"); > | ^~~~~~~~~~~~~~~~ > gcc -g -O2 art.c -lace -ltsdb -lrepp -la -o art -lutil > art.c:65:1: warning: return type defaults to ?int? [-Wimplicit-int] > 65 | usage(char *myname, int status) > | ^~~~~ > art.c:87:1: warning: return type defaults to ?int? [-Wimplicit-int] > 87 | main(int argc, char *argv[]) > | ^~~~ > art.c: In function ?main?: > art.c:156:13: warning: implicit declaration of function ?forkpty?; did you mean ?fork?? [-Wimplicit-function-declaration] > 156 | pid_t p = forkpty(&arbiter_fd, NULL, NULL, NULL); > | ^~~~~~~ > | fork > art.c:301:16: warning: implicit declaration of function ?read_result?; did you mean ?record_result?? [-Wimplicit-function-declaration] > 301 | int status = read_result(parse_id, run_id, i_id, i_input); > | ^~~~~~~~~~~ > | record_result > art.c: At top level: > art.c:560:1: warning: return type defaults to ?int? [-Wimplicit-int] > 560 | write_tuple(FILE *f, char **tuple, struct relation *r) > | ^~~~~~~~~~~ > /usr/bin/ld: cannot find -ltsdb > collect2: error: ld returned 1 exit status > make: *** [Makefile:51: art] Error 1 > > > Best, > Alexandre > > > From bond at ieee.org Wed Apr 8 14:21:59 2020 From: bond at ieee.org (Francis Bond) Date: Wed, 8 Apr 2020 20:21:59 +0800 Subject: [developers] ace 3.1 Message-ID: G'day, on Ubuntu 18.04, fftb 0.09.30 works fine, but 0.09.31 is having library issues: $ ~/bin/acetools-x86-0.9.31/fftb -g qsg.dat --browser --webdir ~/bin/acetools-x86-0.9.31/assets/ trees/ts.04 /home/bond/bin/acetools-x86-0.9.31/fftb: relocation error: /home/bond/bin/acetools-x86-0.9.31/fftb: symbol __get_cpu_features version GLIBC_PRIVATE not defined in file libc.so.6 with link time reference Can anyone suggest a workaround? -- Francis Bond Division of Linguistics and Multilingual Studies Nanyang Technological University -------------- next part -------------- An HTML attachment was scrubbed... URL: From bond at ieee.org Thu Apr 9 06:27:32 2020 From: bond at ieee.org (Francis Bond) Date: Thu, 9 Apr 2020 12:27:32 +0800 Subject: [developers] Ungrammatical Input and the FFTB Message-ID: G'day, if we are treebanking a profile with ungrammatical sentences (i-wf = 0), what is the best practice? Currently you cannot annotate them at all. I don't remember what we did in the fine system. I feel it might be good to be automatically accept an utterance with i-wf=0 and no parse, and reject it if it has a parse, .... But I am not really sure. -- Francis Bond Division of Linguistics and Multilingual Studies Nanyang Technological University -------------- next part -------------- An HTML attachment was scrubbed... URL: From danf at stanford.edu Thu Apr 9 21:56:00 2020 From: danf at stanford.edu (Dan Flickinger) Date: Thu, 9 Apr 2020 19:56:00 +0000 Subject: [developers] Ungrammatical Input and the FFTB In-Reply-To: References: Message-ID: Hi Francis, I might not quite follow you. If a sentence doesn't get any parses, then there is nothing to do in treebanking, except move on to the next sentence, since the unparsed one will not offer you any discriminants to choose. If it does get one or more parses, but is ungrammatical, I usually click "Reject". But I am now starting in with parsing a set of sentences to train a better robust model for errorful student input, using the grammar with mal-rules, so here I don't reject all ungrammatical sentences, but only those where I still can't find an intended robust parse. 
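A rough sketch of the bookkeeping behind the heuristic Francis floats above (accept an i-wf=0 item automatically when it has no analyses, and treat an i-wf=0 item that does parse as a candidate for rejection), written against the `item' and `parse' relations with PyDelphin; the library choice and the profile path are assumptions, and this is not something FFTB currently does.

    from delphin import itsdb

    ts = itsdb.TestSuite('path/to/profile')          # placeholder path

    # i-wf = 0 marks an item as ungrammatical in the 'item' relation
    wellformed = {row['i-id']: int(row['i-wf']) for row in ts['item']}

    for row in ts['parse']:
        if wellformed.get(row['i-id']) != 0:
            continue                                 # only look at ungrammatical items
        readings = int(row['readings'])
        if readings == 0:
            print(row['i-id'], 'no parse: nothing to annotate')
        else:
            print(row['i-id'], f'{readings} parse(s): candidate for rejection')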
Dan ________________________________ From: Francis Bond Sent: Wednesday, April 8, 2020 9:27 PM To: Dan Flickinger ; Woodley Packard ; developers at delph-in.net Subject: Ungrammatical Input and the FFTB G'day, if we are treebanking a profile with ungrammatical sentences (i-wf = 0), what is the best practice? Currently you cannot annotate them at all. I don't remember what we did in the fine system. I feel it might be good to be automatically accept an utterance with i-wf=0 and no parse, and reject it if it has a parse, .... But I am not really sure. -- Francis Bond Division of Linguistics and Multilingual Studies Nanyang Technological University -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.passos.morgado at gmail.com Wed May 6 06:09:49 2020 From: luis.passos.morgado at gmail.com (Luis Morgado da Costa) Date: Wed, 6 May 2020 12:09:49 +0800 Subject: [developers] ACE crashing with ZHONG In-Reply-To: References: Message-ID: Dear Woodley (or anyone else who can help), Ace is crashing unexpectedly with at least two sentences in a large regression test for ZHONG: ? ? ? ?? ? ? ? ? ? (can-not-can lend me 1 CL pen?) ? ? ? ? ? ? (want-not-want borrow book?) Our suspicion is that the problem arises from the interaction of this V-not-V question form in Mandarin, and the fact that in these examples the verbs are auxiliaries. The same error does not happen, for example, for the sentence: ? ? ? ? ? ? (you eat-not-eat mean?) We repeatedly get the same error for these (and similar) sentences: *ace: type.c:625: type_to_string: Assertion `ty->name[0]=='"'' failed.* *Aborted (core dumped)* ========================================================= $ ./ace -g zhong.dat ? ? ? ?? ? ? ? ? ? ace: type.c:625: type_to_string: Assertion `ty->name[0]=='"'' failed. Aborted (core dumped) ./ace -g zhong.dat ? ? ? ? ? ? ace: type.c:625: type_to_string: Assertion `ty->name[0]=='"'' failed. Aborted (core dumped) ========================================================== This happens both in ACE 0.9.30 and 0.9.31; However, it does not happen with LKB FOS (we get parses for both sentences above, see below). [image: Screenshot from 2020-05-06 11-28-01.png] [image: Screenshot from 2020-05-06 11-30-49.png] Is there anything we might be missing? We would much appreciate if you could help us solve this. For testing, you might want to download the current (uncommitted) version of ZHONG: https://drive.google.com/open?id=1p7lPA06sD2v6Xq0qG0TslGF0x6n5uIqV Cheers, Luis -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screenshot from 2020-05-06 11-28-01.png Type: image/png Size: 234949 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screenshot from 2020-05-06 11-30-49.png Type: image/png Size: 210808 bytes Desc: not available URL: From sweaglesw at sweaglesw.org Wed May 6 08:25:13 2020 From: sweaglesw at sweaglesw.org (Woodley Packard) Date: Tue, 5 May 2020 23:25:13 -0700 Subject: [developers] ACE crashing with ZHONG In-Reply-To: References: Message-ID: <57A32B7E-FD19-4E46-BD3E-AC8FEC2D3C87@sweaglesw.org> Hi Luis, I wasn't quite sure which ace/config.tdl to use in the Zhong tree, as there are several, but I guessed that maybe cmn/zhs/ace/config.tdl was a good place to start, and was able to reproduce the errors you found. 
A little bit of hunting showed that this is a result of STEM containing unconstrained strings, and it looks like the culprit is the v_aux_ell-lr rule. That rule fails to constrain the mother's STEM.FIRST value, and as a result, subsequent orthographemic rules (in this case, the abua-olr rule) can't tell what string they should be operating on. Possibly v_aux_ell-lr should pass up the daughter's STEM value? I hope that helps, Woodley > On May 5, 2020, at 9:09 PM, Luis Morgado da Costa wrote: > > > Dear Woodley (or anyone else who can help), > > Ace is crashing unexpectedly with at least two sentences in a large regression test for ZHONG: > ? ? ? ?? ? ? ? ? ? (can-not-can lend me 1 CL pen?) > ? ? ? ? ? ? (want-not-want borrow book?) > > Our suspicion is that the problem arises from the interaction of this V-not-V question form in Mandarin, and the fact that in these examples the verbs are auxiliaries. > The same error does not happen, for example, for the sentence: > > ? ? ? ? ? ? (you eat-not-eat mean?) > > We repeatedly get the same error for these (and similar) sentences: > ace: type.c:625: type_to_string: Assertion `ty->name[0]=='"'' failed. > Aborted (core dumped) > > ========================================================= > $ ./ace -g zhong.dat > ? ? ? ?? ? ? ? ? ? > ace: type.c:625: type_to_string: Assertion `ty->name[0]=='"'' failed. > Aborted (core dumped) > > ./ace -g zhong.dat > ? ? ? ? ? ? > ace: type.c:625: type_to_string: Assertion `ty->name[0]=='"'' failed. > Aborted (core dumped) > ========================================================== > > This happens both in ACE 0.9.30 and 0.9.31; However, it does not happen with LKB FOS (we get parses for both sentences above, see below). > > > > > > > > Is there anything we might be missing? We would much appreciate if you could help us solve this. > > For testing, you might want to download the current (uncommitted) version of ZHONG: https://drive.google.com/open?id=1p7lPA06sD2v6Xq0qG0TslGF0x6n5uIqV > > > > Cheers, > Luis > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.passos.morgado at gmail.com Wed May 6 09:11:01 2020 From: luis.passos.morgado at gmail.com (Luis Morgado da Costa) Date: Wed, 6 May 2020 15:11:01 +0800 Subject: [developers] ACE crashing with ZHONG In-Reply-To: <57A32B7E-FD19-4E46-BD3E-AC8FEC2D3C87@sweaglesw.org> References: <57A32B7E-FD19-4E46-BD3E-AC8FEC2D3C87@sweaglesw.org> Message-ID: Thanks Woodley, That helped a lot. Everything is working as expected now. I had forgotten to also inherit from: constant-lex-rule := lex-rule & [ STEM #stem, DTR [ STEM #stem ]]. Cheers, Luis On Wed, May 6, 2020 at 2:25 PM Woodley Packard wrote: > Hi Luis, > > I wasn't quite sure which ace/config.tdl to use in the Zhong tree, as > there are several, but I guessed that maybe cmn/zhs/ace/config.tdl was a > good place to start, and was able to reproduce the errors you found. A > little bit of hunting showed that this is a result of STEM containing > unconstrained strings, and it looks like the culprit is the v_aux_ell-lr > rule. That rule fails to constrain the mother's STEM.FIRST value, and as a > result, subsequent orthographemic rules (in this case, the abua-olr rule) > can't tell what string they should be operating on. Possibly v_aux_ell-lr > should pass up the daughter's STEM value? 
> > I hope that helps, > Woodley > > On May 5, 2020, at 9:09 PM, Luis Morgado da Costa < > luis.passos.morgado at gmail.com> wrote: > > > Dear Woodley (or anyone else who can help), > > Ace is crashing unexpectedly with at least two sentences in a large > regression test for ZHONG: > ? ? ? ?? ? ? ? ? ? (can-not-can lend me 1 CL pen?) > ? ? ? ? ? ? (want-not-want borrow book?) > > Our suspicion is that the problem arises from the interaction of this > V-not-V question form in Mandarin, and the fact that in these examples the > verbs are auxiliaries. > The same error does not happen, for example, for the sentence: > > ? ? ? ? ? ? (you eat-not-eat mean?) > > We repeatedly get the same error for these (and similar) sentences: > *ace: type.c:625: type_to_string: Assertion `ty->name[0]=='"'' failed.* > *Aborted (core dumped)* > > ========================================================= > $ ./ace -g zhong.dat > ? ? ? ?? ? ? ? ? ? > ace: type.c:625: type_to_string: Assertion `ty->name[0]=='"'' failed. > Aborted (core dumped) > > ./ace -g zhong.dat > ? ? ? ? ? ? > ace: type.c:625: type_to_string: Assertion `ty->name[0]=='"'' failed. > Aborted (core dumped) > ========================================================== > > This happens both in ACE 0.9.30 and 0.9.31; However, it does not happen > with LKB FOS (we get parses for both sentences above, see below). > > > > > > > > Is there anything we might be missing? We would much appreciate if you > could help us solve this. > > For testing, you might want to download the current (uncommitted) version > of ZHONG: > https://drive.google.com/open?id=1p7lPA06sD2v6Xq0qG0TslGF0x6n5uIqV > > > > Cheers, > Luis > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From arademaker at gmail.com Wed May 6 14:28:32 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Wed, 6 May 2020 09:28:32 -0300 Subject: [developers] ACE crashing with ZHONG In-Reply-To: References: Message-ID: <2AE4ACB3-2048-40EA-9D84-1C72FD841446@gmail.com> But why the grammar worked in LKB FOS? Just curious to understand the possible difference between the tools. Alexandre Sent from my iPhone > On 6 May 2020, at 04:12, Luis Morgado da Costa wrote: > > ? > Thanks Woodley, > > That helped a lot. Everything is working as expected now. I had forgotten to also inherit from: > > constant-lex-rule := lex-rule & > [ STEM #stem, > DTR [ STEM #stem ]]. > > Cheers, > Luis > > > >> On Wed, May 6, 2020 at 2:25 PM Woodley Packard wrote: >> Hi Luis, >> >> I wasn't quite sure which ace/config.tdl to use in the Zhong tree, as there are several, but I guessed that maybe cmn/zhs/ace/config.tdl was a good place to start, and was able to reproduce the errors you found. A little bit of hunting showed that this is a result of STEM containing unconstrained strings, and it looks like the culprit is the v_aux_ell-lr rule. That rule fails to constrain the mother's STEM.FIRST value, and as a result, subsequent orthographemic rules (in this case, the abua-olr rule) can't tell what string they should be operating on. Possibly v_aux_ell-lr should pass up the daughter's STEM value? >> >> I hope that helps, >> Woodley >> >>> On May 5, 2020, at 9:09 PM, Luis Morgado da Costa wrote: >>> >>> >>> Dear Woodley (or anyone else who can help), >>> >>> Ace is crashing unexpectedly with at least two sentences in a large regression test for ZHONG: >>> ? ? ? ?? ? ? ? ? ? (can-not-can lend me 1 CL pen?) >>> ? ? ? ? ? ? (want-not-want borrow book?) 
>>> >>> Our suspicion is that the problem arises from the interaction of this V-not-V question form in Mandarin, and the fact that in these examples the verbs are auxiliaries. >>> The same error does not happen, for example, for the sentence: >>> >>> ? ? ? ? ? ? (you eat-not-eat mean?) >>> >>> We repeatedly get the same error for these (and similar) sentences: >>> ace: type.c:625: type_to_string: Assertion `ty->name[0]=='"'' failed. >>> Aborted (core dumped) >>> >>> ========================================================= >>> $ ./ace -g zhong.dat >>> ? ? ? ?? ? ? ? ? ? >>> ace: type.c:625: type_to_string: Assertion `ty->name[0]=='"'' failed. >>> Aborted (core dumped) >>> >>> ./ace -g zhong.dat >>> ? ? ? ? ? ? >>> ace: type.c:625: type_to_string: Assertion `ty->name[0]=='"'' failed. >>> Aborted (core dumped) >>> ========================================================== >>> >>> This happens both in ACE 0.9.30 and 0.9.31; However, it does not happen >>> with LKB FOS (we get parses for both sentences above, see below). >>> >>> >>> >>> >>> >>> >>> >>> Is there anything we might be missing? We would much appreciate if you >>> could help us solve this. >>> >>> For testing, you might want to download the current (uncommitted) version >>> of ZHONG: https://drive.google.com/open?id=1p7lPA06sD2v6Xq0qG0TslGF0x6n5uIqV >>> >>> >>> Cheers, >>> Luis >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From goodman.m.w at gmail.com Tue May 12 09:55:08 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Tue, 12 May 2020 15:55:08 +0800 Subject: [developers] compiling fftb In-Reply-To: <91487488-C19F-4903-BA57-8766A901F50E@sweaglesw.org> References: <1E880046-6207-462E-8255-6B4780C26BC1@gmail.com> <0FC0A140-51CB-42E7-98F4-C7B3864BFCA0@sweaglesw.org> <3A085354-A45E-46D5-B57B-A8B8703B7276@gmail.com> <91487488-C19F-4903-BA57-8766A901F50E@sweaglesw.org> Message-ID: Hi all, I'm getting similar errors to Alexandre. I successfully compiled and installed liba, repp-0.2.2, and then ace, but I'm getting the error that it cannot find <tsdb.h> when I try "make all" for libtsdb. I noticed that tsdb.h is provided by libtsdb, and `#include <tsdb.h>` seems to look in my system libraries. Changing all these to `#include "tsdb.h"` (thinking it might use the file in the current directory) did not work, so I reverted those changes and ran the following: make libtsdb.a # required for 'make install' make libtsdb.so # required for 'make install' make install # copies the above 2 things plus tsdb.h to /usr/local/lib/ Then I tried running "make all" again and now I see this: [...] test.c: In function ‘main’: test.c:599:2: warning: implicit declaration of function ‘ace_load_grammar’ [-Wimplicit-function-declaration] 599 | ace_load_grammar("/home/sweaglesw/cdev/ace-regression/comparison.dat"); | ^~~~~~~~~~~~~~~~ /usr/bin/ld: /usr/lib/x86_64-linux-gnu/libm-2.31.a(e_exp.o): in function `__ieee754_exp_ifunc': (.text+0x246): undefined reference to `_dl_x86_cpu_features' /usr/bin/ld: /usr/lib/x86_64-linux-gnu/libm-2.31.a(e_log.o): in function `__ieee754_log_ifunc': (.text+0x2c6): undefined reference to `_dl_x86_cpu_features' collect2: error: ld returned 1 exit status make: *** [Makefile:45: test.static] Error 1 It seems there's some incompatibility in glibc versions. This SO question seems relevant: https://stackoverflow.com/q/56415996/1441112 ; maybe it's a static vs. dynamic linking issue?
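It is indeed a static vs. dynamic linking issue: the resolution that emerges later in this thread is to stop linking libm statically. Schematically, the change to the link flags looks roughly like this (the variable name TOOL_STATIC_LDFLAGS comes from Woodley's later reply; the particular libraries listed here are only illustrative):

  # before: -lm sits inside the -Wl,-Bstatic block, so the static libm-2.31.a gets linked in
  TOOL_STATIC_LDFLAGS = -Wl,-Bstatic -ltsdb -lace -lm -Wl,-Bdynamic -lpthread
  # after: -lm moved to after -Wl,-Bdynamic, so the shared libm is used instead
  TOOL_STATIC_LDFLAGS = -Wl,-Bstatic -ltsdb -lace -Wl,-Bdynamic -lm -lpthread

The undefined _dl_x86_cpu_features references above come from the static libm, which is why moving that single library to the dynamic side of the link line makes them disappear.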
Other than test.static, I was able to make other targets, such as art and mkprof, but I see errors when I try to run them: $ ./art -h ./art: error while loading shared libraries: libace.so: cannot open shared object file: No such file or directory But I have libace.so at /usr/local/lib/libace.so, so I'm not sure what went wrong here. My end goal is to compile FFTB, and if I carry on with the current setup I see the same errors as when compiling test.static when I do "make fftb" for the FFTB source code. Does anybody know how to get around these issues? Some context: * For compiling ACE I copied itsdb_libraries.tgz as described here: http://moin.delph-in.net/AceInstall#Missing_itsdb.h * I'm running Pop!_OS 20.04 (similar to Ubuntu), with glibc version 2.31 On Fri, Jul 19, 2019 at 10:38 PM Woodley Packard wrote: > It looks like you are trying to compile the "liba" dependency. MacOS does > shared libraries quite differently from Linux. it will probably be easiest > to do it as a static library; try "make liba.a"? > > -Woodley > > > > On Jul 19, 2019, at 6:02 AM, Alexandre Rademaker > wrote: > > > > > > Hi Woodley, > > > > Once I follow the proper order for compile the dependencies (liba, > libace, libtsdb, fftb), I got everything to work at Linux. But no success o > Mac OS yet!! :-( > > > > Any direction? > > > > I found that gcc-9 is the gcc installed from brew > > > > $ gcc-9 --version > > gcc-9 (Homebrew GCC 9.1.0) 9.1.0 > > Copyright (C) 2019 Free Software Foundation, Inc. > > This is free software; see the source for copying conditions. There is > NO > > warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR > PURPOSE. > > > > > > These where my changes in the Makefile but I could not compile. > > > > $ svn diff Makefile > > Index: Makefile > > =================================================================== > > --- Makefile (revision 40) > > +++ Makefile (working copy) > > @@ -1,6 +1,6 @@ > > -HDRS=net.h timer.h http.h web.h sql.h server.h aisle-rpc.h asta-rpc.h > background.h daemon.h aside-rpc.h escape.h > > -OBJS=net.o timer.o http.o web.o sql.o server.o aisle-rpc.o asta-rpc.o > background.o daemon.o aside-rpc.o escape.o > > -CC=gcc > > +HDRS=net.h timer.h http.h web.h server.h aisle-rpc.h asta-rpc.h > background.h daemon.h aside-rpc.h escape.h > > +OBJS=net.o timer.o http.o web.o server.o aisle-rpc.o asta-rpc.o > background.o daemon.o aside-rpc.o escape.o > > +CC=gcc-9 > > CFLAGS=-g -O -shared -fPIC -pthread > > #CFLAGS=-g -pg -O -shared -fPIC -pthread > > > > @@ -16,13 +16,13 @@ > > cp liba.h /usr/local/include/ > > > > tests: ${OBJS} liba.h > > - gcc -g -isystem . test.c ${OBJS} -lpq -lpthread -o test > > + ${CC} -g -isystem . 
test.c ${OBJS} -lpthread -o test > > > > shared-tests: > > - gcc -g test.c -la -o test > > + ${CC} -g test.c -la -o test > > > > liba.so: ${OBJS} liba.h Makefile > > - ld -shared ${OBJS} -o liba.so -lpq -lpthread > > + ld ${OBJS} -o liba.so -lpthread > > > > > > The error is: > > > > $ make > > ld net.o timer.o http.o web.o server.o aisle-rpc.o asta-rpc.o > background.o daemon.o aside-rpc.o escape.o -o liba.so -lpthread > > ld: warning: No version-min specified on command line > > Undefined symbols for architecture x86_64: > > "_main", referenced from: > > implicit entry/start for main executable > > ld: symbol(s) not found for inferred architecture x86_64 > > make: *** [liba.so] Error 1 > > > > > > > > Best, > > > > -- > > Alexandre Rademaker > > http://arademaker.github.io > > > > > > > > > -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From petterha at gmail.com Wed May 13 15:08:52 2020 From: petterha at gmail.com (Petter Haugereid) Date: Wed, 13 May 2020 15:08:52 +0200 Subject: [developers] Treebanking and training with FFT Message-ID: Hi everybody, I have been trying over some days to make treebanking work with FFT. Following instructions on the DELPH-IN site, I have given the commands below, and I end up with a browser window with the items of the profile I attempt to treebank. However, when I click on one of the items, I get an error message "404 Not Found". Do any of you know what I am doing wrong? Here are the commands (with full paths): mkprof -s ~/logon/lingo/lkb/src/tsdb/home/mrs.2020.05.12/ /tmp/mrs-test art -f -a '~/tools/ace-0.9.30/ace --disable-generalization -g ~/tools/ace-0.9.30/norwegian-small.dat -O' /tmp/mrs-test ~/acetools-x86-0.9.30/fftb -g ~/tools/ace-0.9.30/norwegian-small.dat --browser --webdir ~/acetools /tmp/mrs-test/ I am quite keen to get a statistical model for my grammar, so I have tried to train a model from a small treebank which I have disambiguated with the logon tool. When I try to train with LOGON, only get a lot of garbage collection messages, and I eventually have to kill the process. When I try to train with FFT with the following commands, I get the messages below: mkprof -s ~/logon/lingo/lkb/src/tsdb/home/mrs.2020.05.12/ /tmp/mrs-test/ art -a '~/tools/ace-0.9.30/ace -g ~/tools/ace-0.9.30/norwegian-small.dat -O' -f /tmp/mrs-test/ FFGRANDPARENT=0 ~/acetools-x86-0.9.30/ffmaster 1 mrs-test-gp0.mem & FFGRANDPARENT=0 ~/acetools-x86-0.9.30/ffworker ~/tools/ace-0.9.30/norwegian-small.dat /tmp/mrs-test/ ~/logon/lingo/lkb/src/tsdb/home/mrs.2020.05.12/ localhost # loading /tmp/mrs-test/... # loading /home/petter/logon/lingo/lkb/src/tsdb/home/mrs.2020.05.12/... # loading gold # ... iid 1 -- gold tree 1 / 1 not in parse forest # ... iid 2 -- gold tree 1 / 1 not in parse forest # ... iid 3 -- gold tree 1 / 1 not in parse forest ... # ... iid 68 -- gold tree 1 / 1 not in parse forest # ... iid 69 -- gold tree 1 / 1 not in parse forest # loaded 0 ambiguous feature forests with gold trees # [1]+ Exit 255 FFGRANDPARENT=0 ~/acetools-x86-0.9.30/ffmaster 1 mrs-test-gp0.mem # Floating point exception (core dumped) I tried the same commands with the ERG MRS treebank in LOGON, and I was able to train a model with it. I suspect the reason I don't succeed, is that I have treebanked with LOGON, while Dan has used FFT. 
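On the 404 problem, the replies below establish that the --webdir argument has to point at a directory that actually contains the fftb web assets (control.js, index.html and render.js), which ~/acetools apparently did not. A sketch of the corrected invocation, using the asset directory that ends up being used later in this thread (adjust the paths to your own installation):

  # web assets ship with acetools-x86-0.9.31 (assets/) and with the LOGON tree (lingo/answer/fftb/)
  ~/acetools-x86-0.9.30/fftb -g ~/tools/ace-0.9.30/norwegian-small.dat \
      --browser --webdir ~/logon/lingo/answer/fftb /tmp/mrs-test/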
Here are links to 1) the MRS treebank https://www.dropbox.com/s/7mj53j644vwhbes/mrs.2020.05.12.tgz?dl=0 2) The Norwegian MRS items I have treebanked https://www.dropbox.com/s/qfhuqwnxlz0e1pb/mrs.txt?dl=0 3) The Norsyg grammar (loading 'lkb/small-script' with the LKB, 'ace/config-small.tdl' with ACE is sufficient) https://www.dropbox.com/s/rmoy6q40dvz1dxh/norsyg.20-05-13.tgz?dl=0 4) A compiled version of the grammar, compiled with ace-0.9.30 https://www.dropbox.com/s/cb0dq9omuhojlmv/norwegian-small.dat?dl=0 If someone can point me to what I am doing wrong, I would be very greatful! Best, Petter -------------- next part -------------- An HTML attachment was scrubbed... URL: From bond at ieee.org Wed May 13 15:20:12 2020 From: bond at ieee.org (Francis Bond) Date: Wed, 13 May 2020 21:20:12 +0800 Subject: [developers] Treebanking and training with FFT In-Reply-To: References: Message-ID: Hi, We successfully treebanked recently, using (and updating) the wiki page. Is the webdir correct? It should have the files control.js, index.html and render.js in it. We found it in ace-tools-x86.0.9.31/assets (but not in 0.9.30). However 0.9.31 did not work for some reason, so we used the grammar and fftb from 0.9.30 and the webdir from 0.9.31. They are also included somewhere in the logon tree. I hope this helps. On Wed, May 13, 2020 at 9:09 PM Petter Haugereid wrote: > Hi everybody, > > I have been trying over some days to make treebanking work with FFT. > Following instructions on the DELPH-IN site, I have given the commands > below, and I end up with a browser window with the items of the profile I > attempt to treebank. However, when I click on one of the items, I get an > error message "404 Not Found". Do any of you know what I am doing wrong? > > Here are the commands (with full paths): > mkprof -s ~/logon/lingo/lkb/src/tsdb/home/mrs.2020.05.12/ /tmp/mrs-test > art -f -a '~/tools/ace-0.9.30/ace --disable-generalization -g > ~/tools/ace-0.9.30/norwegian-small.dat -O' /tmp/mrs-test > ~/acetools-x86-0.9.30/fftb -g ~/tools/ace-0.9.30/norwegian-small.dat > --browser --webdir ~/acetools /tmp/mrs-test/ > > I am quite keen to get a statistical model for my grammar, so I have tried > to train a model from a small treebank which I have disambiguated with the > logon tool. When I try to train with LOGON, only get a lot of garbage > collection messages, and I eventually have to kill the process. When I try > to train with FFT with the following commands, I get the messages below: > > mkprof -s ~/logon/lingo/lkb/src/tsdb/home/mrs.2020.05.12/ /tmp/mrs-test/ > art -a '~/tools/ace-0.9.30/ace -g ~/tools/ace-0.9.30/norwegian-small.dat > -O' -f /tmp/mrs-test/ > FFGRANDPARENT=0 ~/acetools-x86-0.9.30/ffmaster 1 mrs-test-gp0.mem & > FFGRANDPARENT=0 ~/acetools-x86-0.9.30/ffworker > ~/tools/ace-0.9.30/norwegian-small.dat /tmp/mrs-test/ > ~/logon/lingo/lkb/src/tsdb/home/mrs.2020.05.12/ localhost > > # loading /tmp/mrs-test/... > # loading /home/petter/logon/lingo/lkb/src/tsdb/home/mrs.2020.05.12/... > # loading gold > # ... iid 1 -- gold tree 1 / 1 not in parse forest > # ... iid 2 -- gold tree 1 / 1 not in parse forest > # ... iid 3 -- gold tree 1 / 1 not in parse forest > ... > # ... iid 68 -- gold tree 1 / 1 not in parse forest > # ... 
iid 69 -- gold tree 1 / 1 not in parse forest > # loaded 0 ambiguous feature forests with gold trees > # [1]+ Exit 255 FFGRANDPARENT=0 > ~/acetools-x86-0.9.30/ffmaster 1 mrs-test-gp0.mem > # Floating point exception (core dumped) > > I tried the same commands with the ERG MRS treebank in LOGON, and I was > able to train a model with it. I suspect the reason I don't succeed, is > that I have treebanked with LOGON, while Dan has used FFT. > > Here are links to > 1) the MRS treebank > https://www.dropbox.com/s/7mj53j644vwhbes/mrs.2020.05.12.tgz?dl=0 > 2) The Norwegian MRS items I have treebanked > https://www.dropbox.com/s/qfhuqwnxlz0e1pb/mrs.txt?dl=0 > 3) The Norsyg grammar (loading 'lkb/small-script' with the LKB, > 'ace/config-small.tdl' with ACE is sufficient) > https://www.dropbox.com/s/rmoy6q40dvz1dxh/norsyg.20-05-13.tgz?dl=0 > 4) A compiled version of the grammar, compiled with ace-0.9.30 > https://www.dropbox.com/s/cb0dq9omuhojlmv/norwegian-small.dat?dl=0 > > If someone can point me to what I am doing wrong, I would be very greatful! > > Best, > > Petter > -- Francis Bond Division of Linguistics and Multilingual Studies Nanyang Technological University -------------- next part -------------- An HTML attachment was scrubbed... URL: From petterha at gmail.com Wed May 13 19:45:11 2020 From: petterha at gmail.com (Petter Haugereid) Date: Wed, 13 May 2020 19:45:11 +0200 Subject: [developers] Treebanking and training with FFT In-Reply-To: References: Message-ID: Yes, it helped! I changed the webdir to ~/logon/lingo/answer/fftb/ (where I found the files you mentioned), and then I could treebank with fftb. I was also able to train a model. Thank you very much! Petter On Wed, May 13, 2020 at 3:20 PM Francis Bond wrote: > Hi, > > We successfully treebanked recently, using (and updating) the wiki page. > Is the webdir correct? It should have the files control.js, index.html > and render.js in it. We found it in ace-tools-x86.0.9.31/assets (but > not in 0.9.30). However 0.9.31 did not work for some reason, so we used > the grammar and fftb from 0.9.30 and the webdir from 0.9.31. They are > also included somewhere in the logon tree. > > I hope this helps. > > > > > > On Wed, May 13, 2020 at 9:09 PM Petter Haugereid > wrote: > >> Hi everybody, >> >> I have been trying over some days to make treebanking work with FFT. >> Following instructions on the DELPH-IN site, I have given the commands >> below, and I end up with a browser window with the items of the profile I >> attempt to treebank. However, when I click on one of the items, I get an >> error message "404 Not Found". Do any of you know what I am doing wrong? >> >> Here are the commands (with full paths): >> mkprof -s ~/logon/lingo/lkb/src/tsdb/home/mrs.2020.05.12/ /tmp/mrs-test >> art -f -a '~/tools/ace-0.9.30/ace --disable-generalization -g >> ~/tools/ace-0.9.30/norwegian-small.dat -O' /tmp/mrs-test >> ~/acetools-x86-0.9.30/fftb -g ~/tools/ace-0.9.30/norwegian-small.dat >> --browser --webdir ~/acetools /tmp/mrs-test/ >> >> I am quite keen to get a statistical model for my grammar, so I have >> tried to train a model from a small treebank which I have disambiguated >> with the logon tool. When I try to train with LOGON, only get a lot of >> garbage collection messages, and I eventually have to kill the process. 
>> When I try to train with FFT with the following commands, I get the >> messages below: >> >> mkprof -s ~/logon/lingo/lkb/src/tsdb/home/mrs.2020.05.12/ /tmp/mrs-test/ >> art -a '~/tools/ace-0.9.30/ace -g ~/tools/ace-0.9.30/norwegian-small.dat >> -O' -f /tmp/mrs-test/ >> FFGRANDPARENT=0 ~/acetools-x86-0.9.30/ffmaster 1 mrs-test-gp0.mem & >> FFGRANDPARENT=0 ~/acetools-x86-0.9.30/ffworker >> ~/tools/ace-0.9.30/norwegian-small.dat /tmp/mrs-test/ >> ~/logon/lingo/lkb/src/tsdb/home/mrs.2020.05.12/ localhost >> >> # loading /tmp/mrs-test/... >> # loading /home/petter/logon/lingo/lkb/src/tsdb/home/mrs.2020.05.12/... >> # loading gold >> # ... iid 1 -- gold tree 1 / 1 not in parse forest >> # ... iid 2 -- gold tree 1 / 1 not in parse forest >> # ... iid 3 -- gold tree 1 / 1 not in parse forest >> ... >> # ... iid 68 -- gold tree 1 / 1 not in parse forest >> # ... iid 69 -- gold tree 1 / 1 not in parse forest >> # loaded 0 ambiguous feature forests with gold trees >> # [1]+ Exit 255 FFGRANDPARENT=0 >> ~/acetools-x86-0.9.30/ffmaster 1 mrs-test-gp0.mem >> # Floating point exception (core dumped) >> >> I tried the same commands with the ERG MRS treebank in LOGON, and I was >> able to train a model with it. I suspect the reason I don't succeed, is >> that I have treebanked with LOGON, while Dan has used FFT. >> >> Here are links to >> 1) the MRS treebank >> https://www.dropbox.com/s/7mj53j644vwhbes/mrs.2020.05.12.tgz?dl=0 >> 2) The Norwegian MRS items I have treebanked >> https://www.dropbox.com/s/qfhuqwnxlz0e1pb/mrs.txt?dl=0 >> 3) The Norsyg grammar (loading 'lkb/small-script' with the LKB, >> 'ace/config-small.tdl' with ACE is sufficient) >> https://www.dropbox.com/s/rmoy6q40dvz1dxh/norsyg.20-05-13.tgz?dl=0 >> 4) A compiled version of the grammar, compiled with ace-0.9.30 >> https://www.dropbox.com/s/cb0dq9omuhojlmv/norwegian-small.dat?dl=0 >> >> If someone can point me to what I am doing wrong, I would be very >> greatful! >> >> Best, >> >> Petter >> > > > -- > Francis Bond > Division of Linguistics and Multilingual Studies > Nanyang Technological University > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bond at ieee.org Thu May 14 03:38:12 2020 From: bond at ieee.org (Francis Bond) Date: Thu, 14 May 2020 09:38:12 +0800 Subject: [developers] Treebanking and training with FFT In-Reply-To: References: Message-ID: Great. I added a bit more to the documentation, just in case. On Thu, May 14, 2020 at 1:45 AM Petter Haugereid wrote: > Yes, it helped! > I changed the webdir to ~/logon/lingo/answer/fftb/ (where I found the > files you mentioned), and then I could treebank with fftb. I was also able > to train a model. > Thank you very much! > > Petter > > On Wed, May 13, 2020 at 3:20 PM Francis Bond wrote: > >> Hi, >> >> We successfully treebanked recently, using (and updating) the wiki page. >> Is the webdir correct? It should have the files control.js, index.html >> and render.js in it. We found it in ace-tools-x86.0.9.31/assets (but >> not in 0.9.30). However 0.9.31 did not work for some reason, so we used >> the grammar and fftb from 0.9.30 and the webdir from 0.9.31. They are >> also included somewhere in the logon tree. >> >> I hope this helps. >> >> >> >> >> >> On Wed, May 13, 2020 at 9:09 PM Petter Haugereid >> wrote: >> >>> Hi everybody, >>> >>> I have been trying over some days to make treebanking work with FFT. 
>>> Following instructions on the DELPH-IN site, I have given the commands >>> below, and I end up with a browser window with the items of the profile I >>> attempt to treebank. However, when I click on one of the items, I get an >>> error message "404 Not Found". Do any of you know what I am doing wrong? >>> >>> Here are the commands (with full paths): >>> mkprof -s ~/logon/lingo/lkb/src/tsdb/home/mrs.2020.05.12/ /tmp/mrs-test >>> art -f -a '~/tools/ace-0.9.30/ace --disable-generalization -g >>> ~/tools/ace-0.9.30/norwegian-small.dat -O' /tmp/mrs-test >>> ~/acetools-x86-0.9.30/fftb -g ~/tools/ace-0.9.30/norwegian-small.dat >>> --browser --webdir ~/acetools /tmp/mrs-test/ >>> >>> I am quite keen to get a statistical model for my grammar, so I have >>> tried to train a model from a small treebank which I have disambiguated >>> with the logon tool. When I try to train with LOGON, only get a lot of >>> garbage collection messages, and I eventually have to kill the process. >>> When I try to train with FFT with the following commands, I get the >>> messages below: >>> >>> mkprof -s ~/logon/lingo/lkb/src/tsdb/home/mrs.2020.05.12/ /tmp/mrs-test/ >>> art -a '~/tools/ace-0.9.30/ace -g ~/tools/ace-0.9.30/norwegian-small.dat >>> -O' -f /tmp/mrs-test/ >>> FFGRANDPARENT=0 ~/acetools-x86-0.9.30/ffmaster 1 mrs-test-gp0.mem & >>> FFGRANDPARENT=0 ~/acetools-x86-0.9.30/ffworker >>> ~/tools/ace-0.9.30/norwegian-small.dat /tmp/mrs-test/ >>> ~/logon/lingo/lkb/src/tsdb/home/mrs.2020.05.12/ localhost >>> >>> # loading /tmp/mrs-test/... >>> # loading /home/petter/logon/lingo/lkb/src/tsdb/home/mrs.2020.05.12/... >>> # loading gold >>> # ... iid 1 -- gold tree 1 / 1 not in parse forest >>> # ... iid 2 -- gold tree 1 / 1 not in parse forest >>> # ... iid 3 -- gold tree 1 / 1 not in parse forest >>> ... >>> # ... iid 68 -- gold tree 1 / 1 not in parse forest >>> # ... iid 69 -- gold tree 1 / 1 not in parse forest >>> # loaded 0 ambiguous feature forests with gold trees >>> # [1]+ Exit 255 FFGRANDPARENT=0 >>> ~/acetools-x86-0.9.30/ffmaster 1 mrs-test-gp0.mem >>> # Floating point exception (core dumped) >>> >>> I tried the same commands with the ERG MRS treebank in LOGON, and I was >>> able to train a model with it. I suspect the reason I don't succeed, is >>> that I have treebanked with LOGON, while Dan has used FFT. >>> >>> Here are links to >>> 1) the MRS treebank >>> https://www.dropbox.com/s/7mj53j644vwhbes/mrs.2020.05.12.tgz?dl=0 >>> 2) The Norwegian MRS items I have treebanked >>> https://www.dropbox.com/s/qfhuqwnxlz0e1pb/mrs.txt?dl=0 >>> 3) The Norsyg grammar (loading 'lkb/small-script' with the LKB, >>> 'ace/config-small.tdl' with ACE is sufficient) >>> https://www.dropbox.com/s/rmoy6q40dvz1dxh/norsyg.20-05-13.tgz?dl=0 >>> 4) A compiled version of the grammar, compiled with ace-0.9.30 >>> https://www.dropbox.com/s/cb0dq9omuhojlmv/norwegian-small.dat?dl=0 >>> >>> If someone can point me to what I am doing wrong, I would be very >>> greatful! >>> >>> Best, >>> >>> Petter >>> >> >> >> -- >> Francis Bond >> Division of Linguistics and Multilingual Studies >> Nanyang Technological University >> > -- Francis Bond Division of Linguistics and Multilingual Studies Nanyang Technological University -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From goodman.m.w at gmail.com Fri May 15 08:50:22 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Fri, 15 May 2020 14:50:22 +0800 Subject: [developers] compiling fftb In-Reply-To: References: <1E880046-6207-462E-8255-6B4780C26BC1@gmail.com> <0FC0A140-51CB-42E7-98F4-C7B3864BFCA0@sweaglesw.org> <3A085354-A45E-46D5-B57B-A8B8703B7276@gmail.com> <91487488-C19F-4903-BA57-8766A901F50E@sweaglesw.org> Message-ID: Hi Woodley, (I re-added the developers list on CC so they can see the fix) Moving -lm to after -Wl,-Bdynamic did the trick. Strangely, once I did that and `make all && make install` for libtsdb, the other errors went away. Not sure if they were related or something else changed on my system in the meantime. And as you said, this also fixed the compiling of FFTB for me. Cheers, On Thu, May 14, 2020 at 1:14 AM Woodley Packard wrote: > Hi Mike, > > The errors while compiling test.static and friends do appear to be related > to the stack overflow thread you found. Fortunately those binaries are not > required; you should be able to use the dynamic ones just fine. If you > want the ones that have the support libraries compiled in statically, I > recommend moving -lm from inside of the static link block in > TOOL_STATIC_LDFLAGS to after the -Wl,-Bdynamic. I've done that at my end > now; thanks for the report. The same should work for the LIBS setting for > FFTB's Makefile. > > The error you're seeing when running art most likely is a result of your > system's shared library search path not including /usr/local/lib/. Your > options would be to put libace.so somewhere your system expects to find it > or add that path. To do that latter, you can edit /etc/ld.so.conf or > /etc/ld.so.conf.d/, or use LD_LIBRARY_PATH. > > Let me know if that helps resolve the issues at your end. > > Thanks, > Woodley > > On May 12, 2020, at 12:55 AM, goodman.m.w at gmail.com wrote: > > Hi all, > > I'm getting similar errors to Alexandre. I successfully compiled and > installed liba, repp-0.2.2, and then ace, but I'm getting the error that it > cannot find I try "make all" for libtsdb. I noticed that tsdb.h is > provided by libtsdb, and `#include ` seems to look in my system > libraries. Changing all these to `#include "tsdb.h"` (thinking it might use > the file in the current directory) did not work, so I reverted those > changes and ran the following: > > make libtsdb.a # required for 'make install' > make libtsdb.so # required for 'make install' > make install # copies the above 2 things plus tsdb.h to > /usr/local/lib/ > > Then I tried running "make all" again and now I see this: > > [...] > test.c: In function ?main?: > test.c:599:2: warning: implicit declaration of function > ?ace_load_grammar? [-Wimplicit-function-declaration] > 599 | > ace_load_grammar("/home/sweaglesw/cdev/ace-regression/comparison.dat"); > | ^~~~~~~~~~~~~~~~ > /usr/bin/ld: /usr/lib/x86_64-linux-gnu/libm-2.31.a(e_exp.o): in > function `__ieee754_exp_ifunc': > (.text+0x246): undefined reference to `_dl_x86_cpu_features' > /usr/bin/ld: /usr/lib/x86_64-linux-gnu/libm-2.31.a(e_log.o): in > function `__ieee754_log_ifunc': > (.text+0x2c6): undefined reference to `_dl_x86_cpu_features' > collect2: error: ld returned 1 exit status > make: *** [Makefile:45: test.static] Error 1 > > It seems there's some incompatibility in glibc versions. This SO question > seems relevant: https://stackoverflow.com/q/56415996/1441112 ; maybe it's > a static vs. dynamic linking issue? 
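For the libace.so loading error quoted below, the two options Woodley describes above look roughly like this in practice (the file name under /etc/ld.so.conf.d/ is arbitrary):

  # option 1: register /usr/local/lib with the dynamic linker permanently
  echo /usr/local/lib | sudo tee /etc/ld.so.conf.d/local.conf
  sudo ldconfig
  # option 2: extend the search path for the current shell session only
  export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH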
Other than test.static, I was able to > make other targets, such as art and mkprof, but I see errors when I try to > run them: > > $ ./art -h > ./art: error while loading shared libraries: libace.so: cannot open > shared object file: No such file or directory > > But I have libace.so at /usr/local/lib/libace.so, so I'm not sure what > went wrong here. My end goal is to compile FFTB, and if I carry on with the > current setup I see the same errors as when compiling test.static when I do > "make fftb" for the FFTB source code. Does anybody know how to get around > these issues? > > Some context: > * For compiling ACE I copied itsdb_libraries.tgz as described here: > http://moin.delph-in.net/AceInstall#Missing_itsdb.h > * I'm running Pop!_OS 20.04 (similar to Ubuntu), with glibc version 2.31 > > > On Fri, Jul 19, 2019 at 10:38 PM Woodley Packard > wrote: > >> It looks like you are trying to compile the "liba" dependency. MacOS >> does shared libraries quite differently from Linux. it will probably be >> easiest to do it as a static library; try "make liba.a"? >> >> -Woodley >> >> >> > On Jul 19, 2019, at 6:02 AM, Alexandre Rademaker >> wrote: >> > >> > >> > Hi Woodley, >> > >> > Once I follow the proper order for compile the dependencies (liba, >> libace, libtsdb, fftb), I got everything to work at Linux. But no success o >> Mac OS yet!! :-( >> > >> > Any direction? >> > >> > I found that gcc-9 is the gcc installed from brew >> > >> > $ gcc-9 --version >> > gcc-9 (Homebrew GCC 9.1.0) 9.1.0 >> > Copyright (C) 2019 Free Software Foundation, Inc. >> > This is free software; see the source for copying conditions. There is >> NO >> > warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR >> PURPOSE. >> > >> > >> > These where my changes in the Makefile but I could not compile. >> > >> > $ svn diff Makefile >> > Index: Makefile >> > =================================================================== >> > --- Makefile (revision 40) >> > +++ Makefile (working copy) >> > @@ -1,6 +1,6 @@ >> > -HDRS=net.h timer.h http.h web.h sql.h server.h aisle-rpc.h asta-rpc.h >> background.h daemon.h aside-rpc.h escape.h >> > -OBJS=net.o timer.o http.o web.o sql.o server.o aisle-rpc.o asta-rpc.o >> background.o daemon.o aside-rpc.o escape.o >> > -CC=gcc >> > +HDRS=net.h timer.h http.h web.h server.h aisle-rpc.h asta-rpc.h >> background.h daemon.h aside-rpc.h escape.h >> > +OBJS=net.o timer.o http.o web.o server.o aisle-rpc.o asta-rpc.o >> background.o daemon.o aside-rpc.o escape.o >> > +CC=gcc-9 >> > CFLAGS=-g -O -shared -fPIC -pthread >> > #CFLAGS=-g -pg -O -shared -fPIC -pthread >> > >> > @@ -16,13 +16,13 @@ >> > cp liba.h /usr/local/include/ >> > >> > tests: ${OBJS} liba.h >> > - gcc -g -isystem . test.c ${OBJS} -lpq -lpthread -o test >> > + ${CC} -g -isystem . 
test.c ${OBJS} -lpthread -o test >> > >> > shared-tests: >> > - gcc -g test.c -la -o test >> > + ${CC} -g test.c -la -o test >> > >> > liba.so: ${OBJS} liba.h Makefile >> > - ld -shared ${OBJS} -o liba.so -lpq -lpthread >> > + ld ${OBJS} -o liba.so -lpthread >> > >> > >> > The error is: >> > >> > $ make >> > ld net.o timer.o http.o web.o server.o aisle-rpc.o asta-rpc.o >> background.o daemon.o aside-rpc.o escape.o -o liba.so -lpthread >> > ld: warning: No version-min specified on command line >> > Undefined symbols for architecture x86_64: >> > "_main", referenced from: >> > implicit entry/start for main executable >> > ld: symbol(s) not found for inferred architecture x86_64 >> > make: *** [liba.so] Error 1 >> > >> > >> > >> > Best, >> > >> > -- >> > Alexandre Rademaker >> > http://arademaker.github.io >> > >> > >> > >> >> >> > > -- > -Michael Wayne Goodman > > > -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From J.A.Carroll at sussex.ac.uk Thu May 21 23:38:57 2020 From: J.A.Carroll at sussex.ac.uk (John Carroll) Date: Thu, 21 May 2020 21:38:57 +0000 Subject: [developers] LKB-FOS now includes [incr tsdb()] Message-ID: Hi all, I've just released a new version of LKB-FOS. The main change is that the Linux version includes all of the non-LOGON parts of [incr tsdb()]. The podium runs, and I believe that all of its menu commands are working correctly. I've created a foreign function interface in SBCL for the BDB C program, so training maxent models also works. Anything that's at all CPU-intensive runs a lot quicker than in the LOGON run-time binary. For macOS, I haven't made a serious attempt at recompiling the core [incr tsdb()] C programs (tsdb, swish++), so there's not much of it that works - the main useful exception being reading and applying maxent models (e.g. as described at the end of http://moin.delph-in.net/LkbGeneration). No LOGON-specific functionality is available (i.e. source code enabled by the :logon feature), which means that PVM, WWW demo, SVMs and language models, external MT system interfaces etc are missing. If anyone particularly wants one of these features in LKB-FOS, it should be possible now there's a solid foundation to start from. BTW, below is a relevant posting to the developers list by Stephan in 2006. The previous posting in that thread was over-optimistic: a number of issues (which I won't bore this list with) made the port to SBCL harder than one might have expected. Anyway, I'm pleased to have made progress on this issue 14 years on! All the best, John PS The new LKB-FOS contains many other improvements - please see the README. Download link at http://moin.delph-in.net/LkbFos > http://lists.delph-in.net/archives/developers/2006/000632.html > > [developers] SBCL port > Stephan Oepen oe at csli.Stanford.EDU > Mon Oct 30 11:23:05 CET 2006 > > howdy, > > > But I expect a port would not be too difficult to achieve for either > > of these systems. Stephan, what do you think? > > [incr tsdb()] makes fairly central use of foreign functions, which are > non-standard. also, the [incr tsdb()] GUI depends on threads, which in > SBCL are just barely available (in a way different from the traditional > MP package), and only for Linux on x86 and AMD64 currently. i have no > current plans to port [incr tsdb()] to other Lisps, and personally i am > not too keen on getting other developers involved in that right now. 
i > would want to review patches to [incr tsdb()] code so as to make sure i > can maintain its overall design. these days i am afraid i have no time > for such activity. > > the LOGON MT architecture is an extension to [incr tsdb()], i.e it has > inherited the same constraints on cross-platform portability. however, > we are about to release a complete run-time edition of LOGON, such that > people will be able to get full functionality without their own license > for Allegro CL. > > more high-level, SBCL does look like a Lisp going the right direction. > but before it makes sense for us to make the coordinated effort towards > supporting the breadth of DELPH-IN software on a new Lisp, we should be > sure of our minimum requirements. the following come to my mind: > > (1) stable, efficient, actively maintained ANSI CL implementation > (2) UniCode strings, including full external format support > (3) cross-platform availability > (4) multi-processing, preferably with Lisp control of scheduler > (5) foreign function interface > (6) high-level OS interface: run-shell-command(), sockets, et al. > > SBCL appears to have all of the above but (4). i know CMU-CL used to > include the traditional MP package, but i have no idea about the other > desiderata there. > > best - oe > > +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > +++ Universitetet i Oslo (IFI); Boks 1080 Blindern; 0316 Oslo; (+47) 2284 0125 > +++ CSLI Stanford; Ventura Hall; Stanford, CA 94305; (+1 650) 723 0515 > +++ --- oe at csli.stanford.edu; oe at ifi.uio.no; stephan at oepen.net --- > +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ From ned at nedned.net Fri May 22 03:57:00 2020 From: ned at nedned.net (Ned Letcher) Date: Fri, 22 May 2020 11:57:00 +1000 Subject: [developers] Searching treebanks In-Reply-To: References: <5fca14bec6bb4ab9bdec8793a31f092b@ntnu.no> Message-ID: Heya Francis, I surveyed syntactic querying tools for treebank search in my thesis. During development of Typediff , I needed to embed an interactive querying interface for DELPHIN treebanks, and came to the conclusion that Fangorn was the best tool for the job. Sadly there is not a live version of Typediff live currently. Fangorn itself wasn't too hard to get running I found, and as part of Typediff I created a tool for converting DELPHIN treebanks into the format that Fangorn expects, which you might be able to use. I have been hoping to get a version of Typediff up and running somewhere but it's not something I've been able to prioritise. If I do, I will be sure to let you know :) Cheers, Ned On Thu, 27 Feb 2020 at 01:09, Emily M. Bender wrote: > For search over semantic representations (MRS, DM, EDS) there's WeSearch: > > http://wesearch.delph-in.net/ > > ... which indexes DeepBank and WikiWoods. > > Emily > > On Wed, Feb 26, 2020 at 5:29 AM Francis Bond wrote: > >> Thanks for the tip. If only we all sensibly annotated our corpora with >> typecraft. >> >> On Wed, Feb 26, 2020 at 9:21 PM Lars Hellan wrote: >> >>> Hi Francis, >>> >>> For Norwegian you can do such things through >>> https://typecraft.org/tc2wiki/Norwegian_Valency_Corpus, a corpus of >>> about 20,000 sentences. >>> >>> >>> (Not right on your mark, but perhaps not too far from the sphere of >>> "anything" ...) 
>>> >>> >>> Best >>> >>> Lars >>> ------------------------------ >>> *From:* developers-bounces at emmtee.net >>> on behalf of Francis Bond >>> *Sent:* Wednesday, February 26, 2020 2:02:28 PM >>> *To:* Stephan Oepen; developers at delph-in.net; Rebecca Dridan; Timothy >>> Baldwin >>> *Subject:* [developers] Searching treebanks >>> >>> G'day, >>> >>> does anyone know of any way to search Redwoods (or DELPHIN treebanks in >>> general) for trees of a certain type (using something like the Fangorn >>> interface). For example, I want to find how often in the treebank 'start' >>> is intransitive vs NP V VP-ving vs NP V VP-to vs NP V VP NP (I start; I >>> start lecturing; I start to lecture; I start a lecture). >>> >>> In fangorn this was "//VP/VB/start[->S/VP/VBG" for NP V VP-ving, ... >>> >>> I would be ecstatic if there were an online search I can point my >>> students at, but would be interested in anything. >>> >>> >>> >>> -- >>> Francis Bond >>> Division of Linguistics and Multilingual Studies >>> Nanyang Technological University >>> >> >> >> -- >> Francis Bond >> Division of Linguistics and Multilingual Studies >> Nanyang Technological University >> > > > -- > Emily M. Bender (she/her) > Howard and Frances Nostrand Endowed Professor > Department of Linguistics > Faculty Director, CLMS > University of Washington > Twitter: @emilymbender > -- nedned.net -------------- next part -------------- An HTML attachment was scrubbed... URL: From bond at ieee.org Mon May 25 14:37:34 2020 From: bond at ieee.org (Francis Bond) Date: Mon, 25 May 2020 20:37:34 +0800 Subject: [developers] Searching treebanks In-Reply-To: References: <5fca14bec6bb4ab9bdec8793a31f092b@ntnu.no> Message-ID: Thank you! I will try to set up Fangorn then. On Fri, May 22, 2020 at 9:57 AM Ned Letcher wrote: > Heya Francis, > > I surveyed syntactic querying tools for treebank search in my thesis. > During development of Typediff , I > needed to embed an interactive querying interface for DELPHIN treebanks, > and came to the conclusion that Fangorn was the best tool for the job. > Sadly there is not a live version of Typediff live currently. > > Fangorn itself wasn't too hard to get > running I found, and as part of Typediff I created a tool > > for converting DELPHIN treebanks into the format that Fangorn expects, > which you might be able to use. > > I have been hoping to get a version of Typediff up and running somewhere > but it's not something I've been able to prioritise. If I do, I will be > sure to let you know :) > > Cheers, > Ned > > On Thu, 27 Feb 2020 at 01:09, Emily M. Bender wrote: > >> For search over semantic representations (MRS, DM, EDS) there's WeSearch: >> >> http://wesearch.delph-in.net/ >> >> ... which indexes DeepBank and WikiWoods. >> >> Emily >> >> On Wed, Feb 26, 2020 at 5:29 AM Francis Bond wrote: >> >>> Thanks for the tip. If only we all sensibly annotated our corpora >>> with typecraft. >>> >>> On Wed, Feb 26, 2020 at 9:21 PM Lars Hellan wrote: >>> >>>> Hi Francis, >>>> >>>> For Norwegian you can do such things through >>>> https://typecraft.org/tc2wiki/Norwegian_Valency_Corpus, a corpus of >>>> about 20,000 sentences. >>>> >>>> >>>> (Not right on your mark, but perhaps not too far from the sphere of >>>> "anything" ...) 
>>>> >>>> >>>> Best >>>> >>>> Lars >>>> ------------------------------ >>>> *From:* developers-bounces at emmtee.net >>>> on behalf of Francis Bond >>>> *Sent:* Wednesday, February 26, 2020 2:02:28 PM >>>> *To:* Stephan Oepen; developers at delph-in.net; Rebecca Dridan; Timothy >>>> Baldwin >>>> *Subject:* [developers] Searching treebanks >>>> >>>> G'day, >>>> >>>> does anyone know of any way to search Redwoods (or DELPHIN treebanks in >>>> general) for trees of a certain type (using something like the Fangorn >>>> interface). For example, I want to find how often in the treebank 'start' >>>> is intransitive vs NP V VP-ving vs NP V VP-to vs NP V VP NP (I start; I >>>> start lecturing; I start to lecture; I start a lecture). >>>> >>>> In fangorn this was "//VP/VB/start[->S/VP/VBG" for NP V VP-ving, ... >>>> >>>> I would be ecstatic if there were an online search I can point my >>>> students at, but would be interested in anything. >>>> >>>> >>>> >>>> -- >>>> Francis Bond >>>> Division of Linguistics and Multilingual Studies >>>> Nanyang Technological University >>>> >>> >>> >>> -- >>> Francis Bond >>> Division of Linguistics and Multilingual Studies >>> Nanyang Technological University >>> >> >> >> -- >> Emily M. Bender (she/her) >> Howard and Frances Nostrand Endowed Professor >> Department of Linguistics >> Faculty Director, CLMS >> University of Washington >> Twitter: @emilymbender >> > > > -- > nedned.net > -- Francis Bond Division of Linguistics and Multilingual Studies Nanyang Technological University -------------- next part -------------- An HTML attachment was scrubbed... URL: From bond at ieee.org Mon Jun 1 06:35:57 2020 From: bond at ieee.org (Francis Bond) Date: Mon, 1 Jun 2020 12:35:57 +0800 Subject: [developers] LKB-FOS now includes [incr tsdb()] In-Reply-To: References: Message-ID: Hi, to get it working on ubuntu 18.04.4, I had to make some libraries visible: bdb.so, libdb-4.2.so, libtermcap.so.2 I did this with: export LD_LIBRARY_PATH=/home/bond/delphin/lkb_fos.2020/src/tsdb/linux.x86.64:/home/bond/delphin/lkb_fos.2020/lib/linux.x86.64 I also had to link libtermcap.so.2 -> /lib/x86_64-linux-gnu/libncurses.so.5.9 Maybe we could have a symbolic link from src/tsdb/linux.x86.64/bdb.so to lib/linux.x86.64/bdb.so so that we only have to point to a single directory? Should I make an installation section in LkbFos and add the notes there? On Fri, May 22, 2020 at 5:39 AM John Carroll wrote: > Hi all, > > I've just released a new version of LKB-FOS. The main change is that the > Linux version includes all of the non-LOGON parts of [incr tsdb()]. The > podium runs, and I believe that all of its menu commands are working > correctly. I've created a foreign function interface in SBCL for the BDB C > program, so training maxent models also works. Anything that's at all > CPU-intensive runs a lot quicker than in the LOGON run-time binary. > > For macOS, I haven't made a serious attempt at recompiling the core [incr > tsdb()] C programs (tsdb, swish++), so there's not much of it that works - > the main useful exception being reading and applying maxent models (e.g. as > described at the end of http://moin.delph-in.net/LkbGeneration). > > No LOGON-specific functionality is available (i.e. source code enabled by > the :logon feature), which means that PVM, WWW demo, SVMs and language > models, external MT system interfaces etc are missing. If anyone > particularly wants one of these features in LKB-FOS, it should be possible > now there's a solid foundation to start from. 
> > BTW, below is a relevant posting to the developers list by Stephan in > 2006. The previous posting in that thread was over-optimistic: a number of > issues (which I won't bore this list with) made the port to SBCL harder > than one might have expected. Anyway, I'm pleased to have made progress on > this issue 14 years on! > > All the best, > > John > > PS The new LKB-FOS contains many other improvements - please see the > README. Download link at http://moin.delph-in.net/LkbFos > > > > http://lists.delph-in.net/archives/developers/2006/000632.html > > > > [developers] SBCL port > > Stephan Oepen oe at csli.Stanford.EDU > > Mon Oct 30 11:23:05 CET 2006 > > > > howdy, > > > > > But I expect a port would not be too difficult to achieve for either > > > of these systems. Stephan, what do you think? > > > > [incr tsdb()] makes fairly central use of foreign functions, which are > > non-standard. also, the [incr tsdb()] GUI depends on threads, which in > > SBCL are just barely available (in a way different from the traditional > > MP package), and only for Linux on x86 and AMD64 currently. i have no > > current plans to port [incr tsdb()] to other Lisps, and personally i am > > not too keen on getting other developers involved in that right now. i > > would want to review patches to [incr tsdb()] code so as to make sure i > > can maintain its overall design. these days i am afraid i have no time > > for such activity. > > > > the LOGON MT architecture is an extension to [incr tsdb()], i.e it has > > inherited the same constraints on cross-platform portability. however, > > we are about to release a complete run-time edition of LOGON, such that > > people will be able to get full functionality without their own license > > for Allegro CL. > > > > more high-level, SBCL does look like a Lisp going the right direction. > > but before it makes sense for us to make the coordinated effort towards > > supporting the breadth of DELPH-IN software on a new Lisp, we should be > > sure of our minimum requirements. the following come to my mind: > > > > (1) stable, efficient, actively maintained ANSI CL implementation > > (2) UniCode strings, including full external format support > > (3) cross-platform availability > > (4) multi-processing, preferably with Lisp control of scheduler > > (5) foreign function interface > > (6) high-level OS interface: run-shell-command(), sockets, et al. > > > > SBCL appears to have all of the above but (4). i know CMU-CL used to > > include the traditional MP package, but i have no idea about the other > > desiderata there. > > > > best - oe > > > > > +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > +++ Universitetet i Oslo (IFI); Boks 1080 Blindern; 0316 Oslo; (+47) > 2284 0125 > > +++ CSLI Stanford; Ventura Hall; Stanford, CA 94305; (+1 650) 723 > 0515 > > +++ --- oe at csli.stanford.edu; oe at ifi.uio.no; stephan at > oepen.net --- > > > +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > > -- Francis Bond Division of Linguistics and Multilingual Studies Nanyang Technological University -------------- next part -------------- An HTML attachment was scrubbed... URL: From J.A.Carroll at sussex.ac.uk Mon Jun 1 12:04:01 2020 From: J.A.Carroll at sussex.ac.uk (John Carroll) Date: Mon, 1 Jun 2020 10:04:01 +0000 Subject: [developers] LKB-FOS now includes [incr tsdb()] In-Reply-To: References: Message-ID: Hi Francis, Thanks for the report and suggestions about LD_LIBARY_PATH. 
I put a hint about what to do in the README, but it's buried in dense text so not easy to find ("LD_LIBARY_PATH must include /lib/linux.x86.64"). It's a good ides to have a couple of sentences about this in LkbFos. Here's what LKB-FOS does on startup: * it finds the absolute path to the lkb_fos directory, and from that, it constructs a path to where it thinks bdb.so should be (e.g. on my system this is /home/ubuntu/Documents/delphin/lkb_fos/src/tsdb/linux.x86.64/bdb.so) * it attempts to load bdb.so from this path as a shared library * if there's an error in this load, it's either because the user has moved bdb.so, or libdb-4.2.so (which bdb.so depends on) can't be found * libdb-4.2.so will be pulled in if LD_LIBARY_PATH points to lkb_fos/lib/linux.x86.64 Later in a session, if you start up the [incr tsdb()] podium it tries to load libtermcap.so.2; on my system this is in a standard Linux shared library directory so is picked up fine. But evidently this isn't the case for everyone. So to avoid installation fuss, I was just going to provide it in lkb_fos/lib/linux.x86.64 - the file is very small. LOGON also takes this approach. I'll do this in the next LKB-FOS release. If DELPHINHOME is set as recommended, then I think the following should be sufficient: export LD_LIBRARY_PATH=$DELPHINHOME/lkb_fos/lib/linux.x86.64:$LD_LIBRARY_PATH And for the moment, some users will also need to execute: ln -s /libncurses.so.5.7 .../lkb_fos/lib/linux.x86.64/libtermcap.so.2 Does this look reasonable? If so I'll update LkbFos. John On 1 Jun 2020, at 05:35, Francis Bond > wrote: Hi, to get it working on ubuntu 18.04.4, I had to make some libraries visible: bdb.so, libdb-4.2.so, libtermcap.so.2 I did this with: export LD_LIBRARY_PATH=/home/bond/delphin/lkb_fos.2020/src/tsdb/linux.x86.64:/home/bond/delphin/lkb_fos.2020/lib/linux.x86.64 I also had to link libtermcap.so.2 -> /lib/x86_64-linux-gnu/libncurses.so.5.9 Maybe we could have a symbolic link from src/tsdb/linux.x86.64/bdb.so to lib/linux.x86.64/bdb.so so that we only have to point to a single directory? Should I make an installation section in LkbFos and add the notes there? On Fri, May 22, 2020 at 5:39 AM John Carroll > wrote: Hi all, I've just released a new version of LKB-FOS. The main change is that the Linux version includes all of the non-LOGON parts of [incr tsdb()]. The podium runs, and I believe that all of its menu commands are working correctly. I've created a foreign function interface in SBCL for the BDB C program, so training maxent models also works. Anything that's at all CPU-intensive runs a lot quicker than in the LOGON run-time binary. For macOS, I haven't made a serious attempt at recompiling the core [incr tsdb()] C programs (tsdb, swish++), so there's not much of it that works - the main useful exception being reading and applying maxent models (e.g. as described at the end of http://moin.delph-in.net/LkbGeneration). No LOGON-specific functionality is available (i.e. source code enabled by the :logon feature), which means that PVM, WWW demo, SVMs and language models, external MT system interfaces etc are missing. If anyone particularly wants one of these features in LKB-FOS, it should be possible now there's a solid foundation to start from. BTW, below is a relevant posting to the developers list by Stephan in 2006. The previous posting in that thread was over-optimistic: a number of issues (which I won't bore this list with) made the port to SBCL harder than one might have expected. 
Anyway, I'm pleased to have made progress on this issue 14 years on! All the best, John PS The new LKB-FOS contains many other improvements - please see the README. Download link at http://moin.delph-in.net/LkbFos > http://lists.delph-in.net/archives/developers/2006/000632.html > > [developers] SBCL port > Stephan Oepen oe at csli.Stanford.EDU > Mon Oct 30 11:23:05 CET 2006 > > howdy, > > > But I expect a port would not be too difficult to achieve for either > > of these systems. Stephan, what do you think? > > [incr tsdb()] makes fairly central use of foreign functions, which are > non-standard. also, the [incr tsdb()] GUI depends on threads, which in > SBCL are just barely available (in a way different from the traditional > MP package), and only for Linux on x86 and AMD64 currently. i have no > current plans to port [incr tsdb()] to other Lisps, and personally i am > not too keen on getting other developers involved in that right now. i > would want to review patches to [incr tsdb()] code so as to make sure i > can maintain its overall design. these days i am afraid i have no time > for such activity. > > the LOGON MT architecture is an extension to [incr tsdb()], i.e it has > inherited the same constraints on cross-platform portability. however, > we are about to release a complete run-time edition of LOGON, such that > people will be able to get full functionality without their own license > for Allegro CL. > > more high-level, SBCL does look like a Lisp going the right direction. > but before it makes sense for us to make the coordinated effort towards > supporting the breadth of DELPH-IN software on a new Lisp, we should be > sure of our minimum requirements. the following come to my mind: > > (1) stable, efficient, actively maintained ANSI CL implementation > (2) UniCode strings, including full external format support > (3) cross-platform availability > (4) multi-processing, preferably with Lisp control of scheduler > (5) foreign function interface > (6) high-level OS interface: run-shell-command(), sockets, et al. > > SBCL appears to have all of the above but (4). i know CMU-CL used to > include the traditional MP package, but i have no idea about the other > desiderata there. > > best - oe > > +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > +++ Universitetet i Oslo (IFI); Boks 1080 Blindern; 0316 Oslo; (+47) 2284 0125 > +++ CSLI Stanford; Ventura Hall; Stanford, CA 94305; (+1 650) 723 0515 > +++ --- oe at csli.stanford.edu; oe at ifi.uio.no; stephan at oepen.net --- > +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ -- Francis Bond > Division of Linguistics and Multilingual Studies Nanyang Technological University -------------- next part -------------- An HTML attachment was scrubbed... URL: From bond at ieee.org Mon Jun 1 13:28:32 2020 From: bond at ieee.org (Francis Bond) Date: Mon, 1 Jun 2020 19:28:32 +0800 Subject: [developers] LKB-FOS now includes [incr tsdb()] In-Reply-To: References: Message-ID: That looks very reasonable! Thanks. On Mon, Jun 1, 2020 at 6:04 PM John Carroll wrote: > Hi Francis, > > Thanks for the report and suggestions about LD_LIBARY_PATH. I put a hint > about what to do in the README, but it's buried in dense text so not easy > to find ("LD_LIBARY_PATH must include /lib/linux.x86.64"). > It's a good ides to have a couple of sentences about this in LkbFos. 
> > Here's what LKB-FOS does on startup: > > * it finds the absolute path to the lkb_fos directory, and from that, it > constructs a path to where it thinks bdb.so should be (e.g. on my system > this is /home/ubuntu/Documents/delphin/lkb_fos/src/tsdb/linux.x86.64/bdb.so) > > * it attempts to load bdb.so from this path as a shared library > > * if there's an error in this load, it's either because the user has moved > bdb.so, or libdb-4.2.so (which bdb.so depends on) can't be found > > * libdb-4.2.so will be pulled in if LD_LIBARY_PATH points to > lkb_fos/lib/linux.x86.64 > > > Later in a session, if you start up the [incr tsdb()] podium it tries to > load libtermcap.so.2; on my system this is in a standard Linux shared > library directory so is picked up fine. But evidently this isn't the case > for everyone. So to avoid installation fuss, I was just going to provide it > in lkb_fos/lib/linux.x86.64 - the file is very small. LOGON also takes this > approach. I'll do this in the next LKB-FOS release. > > If DELPHINHOME is set as recommended, then I think the following should be > sufficient: > > export > LD_LIBRARY_PATH=$DELPHINHOME/lkb_fos/lib/linux.x86.64:$LD_LIBRARY_PATH > > And for the moment, some users will also need to execute: > > ln -s directory>/libncurses.so.5.7 .../lkb_fos/lib/linux.x86.64/libtermcap.so.2 > > > Does this look reasonable? If so I'll update LkbFos. > > John > > > On 1 Jun 2020, at 05:35, Francis Bond wrote: > > Hi, > > to get it working on ubuntu 18.04.4, I had to make some libraries visible: > bdb.so, libdb-4.2.so, libtermcap.so.2 > > I did this with: > export > LD_LIBRARY_PATH=/home/bond/delphin/lkb_fos.2020/src/tsdb/linux.x86.64:/home/bond/delphin/lkb_fos.2020/lib/linux.x86.64 > > I also had to link libtermcap.so.2 -> > /lib/x86_64-linux-gnu/libncurses.so.5.9 > > Maybe we could have a symbolic link from src/tsdb/linux.x86.64/bdb.so > to lib/linux.x86.64/bdb.so > so that we only have to point to a single directory? > > Should I make an installation section in LkbFos and add the notes there? > > > On Fri, May 22, 2020 at 5:39 AM John Carroll > wrote: > >> Hi all, >> >> I've just released a new version of LKB-FOS. The main change is that the >> Linux version includes all of the non-LOGON parts of [incr tsdb()]. The >> podium runs, and I believe that all of its menu commands are working >> correctly. I've created a foreign function interface in SBCL for the BDB C >> program, so training maxent models also works. Anything that's at all >> CPU-intensive runs a lot quicker than in the LOGON run-time binary. >> >> For macOS, I haven't made a serious attempt at recompiling the core [incr >> tsdb()] C programs (tsdb, swish++), so there's not much of it that works - >> the main useful exception being reading and applying maxent models (e.g. as >> described at the end of http://moin.delph-in.net/LkbGeneration). >> >> No LOGON-specific functionality is available (i.e. source code enabled by >> the :logon feature), which means that PVM, WWW demo, SVMs and language >> models, external MT system interfaces etc are missing. If anyone >> particularly wants one of these features in LKB-FOS, it should be possible >> now there's a solid foundation to start from. >> >> BTW, below is a relevant posting to the developers list by Stephan in >> 2006. The previous posting in that thread was over-optimistic: a number of >> issues (which I won't bore this list with) made the port to SBCL harder >> than one might have expected. 
Anyway, I'm pleased to have made progress on >> this issue 14 years on! >> >> All the best, >> >> John >> >> PS The new LKB-FOS contains many other improvements - please see the >> README. Download link at http://moin.delph-in.net/LkbFos >> >> >> > http://lists.delph-in.net/archives/developers/2006/000632.html >> > >> > [developers] SBCL port >> > Stephan Oepen oe at csli.Stanford.EDU >> > Mon Oct 30 11:23:05 CET 2006 >> > >> > howdy, >> > >> > > But I expect a port would not be too difficult to achieve for either >> > > of these systems. Stephan, what do you think? >> > >> > [incr tsdb()] makes fairly central use of foreign functions, which are >> > non-standard. also, the [incr tsdb()] GUI depends on threads, which in >> > SBCL are just barely available (in a way different from the traditional >> > MP package), and only for Linux on x86 and AMD64 currently. i have no >> > current plans to port [incr tsdb()] to other Lisps, and personally i am >> > not too keen on getting other developers involved in that right now. i >> > would want to review patches to [incr tsdb()] code so as to make sure i >> > can maintain its overall design. these days i am afraid i have no time >> > for such activity. >> > >> > the LOGON MT architecture is an extension to [incr tsdb()], i.e it has >> > inherited the same constraints on cross-platform portability. however, >> > we are about to release a complete run-time edition of LOGON, such that >> > people will be able to get full functionality without their own license >> > for Allegro CL. >> > >> > more high-level, SBCL does look like a Lisp going the right direction. >> > but before it makes sense for us to make the coordinated effort towards >> > supporting the breadth of DELPH-IN software on a new Lisp, we should be >> > sure of our minimum requirements. the following come to my mind: >> > >> > (1) stable, efficient, actively maintained ANSI CL implementation >> > (2) UniCode strings, including full external format support >> > (3) cross-platform availability >> > (4) multi-processing, preferably with Lisp control of scheduler >> > (5) foreign function interface >> > (6) high-level OS interface: run-shell-command(), sockets, et al. >> > >> > SBCL appears to have all of the above but (4). i know CMU-CL used to >> > include the traditional MP package, but i have no idea about the other >> > desiderata there. >> > >> > best - oe >> > >> > >> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >> > +++ Universitetet i Oslo (IFI); Boks 1080 Blindern; 0316 Oslo; (+47) >> 2284 0125 >> > +++ CSLI Stanford; Ventura Hall; Stanford, CA 94305; (+1 650) 723 >> 0515 >> > +++ --- oe at csli.stanford.edu; oe at ifi.uio.no; stephan at >> oepen.net --- >> > >> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >> >> >> > > -- > Francis Bond > Division of Linguistics and Multilingual Studies > Nanyang Technological University > > > -- Francis Bond Division of Linguistics and Multilingual Studies Nanyang Technological University -------------- next part -------------- An HTML attachment was scrubbed... URL: From J.A.Carroll at sussex.ac.uk Thu Jun 4 19:09:48 2020 From: J.A.Carroll at sussex.ac.uk (John Carroll) Date: Thu, 4 Jun 2020 17:09:48 +0000 Subject: [developers] Questions about chart mapping Message-ID: <4E0FC400-B1E5-4FB5-8DB4-5DDCED53A0C4@sussex.ac.uk> Hi developers, I've started to look at chart mapping and how it might be implemented. 
I've been reading the following: 'Tutorial - Chart Mapping in PET' at DELPH-IN Summit 2009 http://www.delph-in.net/2009/cm.pdf LREC 2008 paper http://www.lrec-conf.org/proceedings/lrec2008/pdf/349_paper.pdf I've also been checking my understanding of the formalism by looking at the token mapping rules in the ERG 2018 directory tmr/. I have a few questions below which I've tried to contextualise with respect to the tutorial slides. I hope an expert can answer them. > Copying Information > > * reentrancies can be used to copy information from INPUT to OUTPUT Presumably reentrancies can also be used to copy information from CONTEXT to OUTPUT? > Chart Mapping Procedure > > * a rule match is completed if all CONTEXT and INPUT arguments are bound What happens if there are several ways of matching chart edges to CONTEXT in a rule? Is the rule applied repeatedly, once for each alternative match? Or is only one of the alternative matches considered? This could matter if feature values or regular expression captures are copied from the context to the output. > * each rule is applied until its fixpoint is reached If I've understood the formalism correctly, I can imagine a rule that doesn't ever reach a fixpoint for some inputs (e.g. a rule in which the input and output unify, with the output building structure). Is the intended interpretation the following: a rule is never applied more than once to the same combination of input and context edges? And it's up to the grammarian to avoid writing infinitely looping rules? If this is the correct interpretation, then I'm puzzled by a few rules in the ERG: bridge_tmr in tmr/bridge.tdl, and the four rules default_(ld|lb|rd|rb)_tmr in tmr/gml.tdl. Their inputs seem to unify with their outputs, so surely each would apply in an infinite loop (i.e. an input edge would match and be replaced with a new output edge, and since this new edge had not previously been used as an input the rule would pick this up and apply again, etc etc)? Aside from the fixpoint issue, I'm not sure I understand the purpose of the rules default_(ld|lb|rd|rb)_tmr. At first glance they seem to merely replace their input. Is their purpose to remove all features that are not specified on the input side? I'm also puzzled by the following comment on bridge_tmr: > ;; ... here, we take advantage of redundancy detection built into > ;; token mapping, i.e. even though the rule is written as if it could apply any > ;; number of times per cell, there shall not be duplicates in the token chart. What enforces the restriction that "there shall not be duplicates in the token chart"? I can't see any mention of redundancy detection or of this restriction in the paper or tutorial slides. Is the restriction somehow enforced by the fixpoint condition? Thanks in advance for clarification on these points. John From oe at ifi.uio.no Thu Jun 4 19:36:07 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Thu, 4 Jun 2020 19:36:07 +0200 Subject: [developers] Questions about chart mapping In-Reply-To: <4E0FC400-B1E5-4FB5-8DB4-5DDCED53A0C4@sussex.ac.uk> References: <4E0FC400-B1E5-4FB5-8DB4-5DDCED53A0C4@sussex.ac.uk> Message-ID: hi john, peter and i originally designed the formalism, and undoubtedly there are finer points not in the paper or slides. PET can output detailed tracing information (look for the ?erg? shell alias in $LOGONROOT/dot.bashrc, which i suspect may be helpful. from memory, ?redundancy? 
detection means that new chart items are discarded if an equivalent item exists, and processing that role stopped for that position. i used to try and write rules that could not feed indefinitely on their own OUTPUT, but in at least some cases i allowed myself to take advantage of the redundancy check. regarding non-determinism in matching a rule LHS, from memory i would expect that all possibilities are explored. yes, copying into the rule RHS is certainly not limited to INPUT matches. if you were game, maybe we should video-conference at some point to go through other subtleties (that you are bound to uncover :-)? i would be thrilled if the LKB were to acquire an implementation of chart mapping (which i believe would also have several prospective use cases in generation)! best wishes, oe tor. 4. jun. 2020 kl. 19:11 skrev John Carroll : > Hi developers, > > I've started to look at chart mapping and how it might be implemented. > I've been reading the following: > > 'Tutorial - Chart Mapping in PET' at DELPH-IN Summit 2009 > http://www.delph-in.net/2009/cm.pdf > LREC 2008 paper > http://www.lrec-conf.org/proceedings/lrec2008/pdf/349_paper.pdf > > I've also been checking my understanding of the formalism by looking at > the token mapping rules in the ERG 2018 directory tmr/. I have a few > questions below which I've tried to contextualise with respect to the > tutorial slides. I hope an expert can answer them. > > > Copying Information > > > > * reentrancies can be used to copy information from INPUT to OUTPUT > > Presumably reentrancies can also be used to copy information from CONTEXT > to OUTPUT? > > > Chart Mapping Procedure > > > > * a rule match is completed if all CONTEXT and INPUT arguments are bound > > What happens if there are several ways of matching chart edges to CONTEXT > in a rule? Is the rule applied repeatedly, once for each alternative match? > Or is only one of the alternative matches considered? This could matter if > feature values or regular expression captures are copied from the context > to the output. > > > * each rule is applied until its fixpoint is reached > > If I've understood the formalism correctly, I can imagine a rule that > doesn't ever reach a fixpoint for some inputs (e.g. a rule in which the > input and output unify, with the output building structure). Is the > intended interpretation the following: a rule is never applied more than > once to the same combination of input and context edges? And it's up to the > grammarian to avoid writing infinitely looping rules? > > If this is the correct interpretation, then I'm puzzled by a few rules in > the ERG: bridge_tmr in tmr/bridge.tdl, and the four rules > default_(ld|lb|rd|rb)_tmr in tmr/gml.tdl. Their inputs seem to unify with > their outputs, so surely each would apply in an infinite loop (i.e. an > input edge would match and be replaced with a new output edge, and since > this new edge had not previously been used as an input the rule would pick > this up and apply again, etc etc)? > > Aside from the fixpoint issue, I'm not sure I understand the purpose of > the rules default_(ld|lb|rd|rb)_tmr. At first glance they seem to merely > replace their input. Is their purpose to remove all features that are not > specified on the input side? > > I'm also puzzled by the following comment on bridge_tmr: > > > ;; ... here, we take advantage of redundancy detection built into > > ;; token mapping, i.e. 
even though the rule is written as if it could > apply any > > ;; number of times per cell, there shall not be duplicates in the token > chart. > > What enforces the restriction that "there shall not be duplicates in the > token chart"? I can't see any mention of redundancy detection or of this > restriction in the paper or tutorial slides. Is the restriction somehow > enforced by the fixpoint condition? > > Thanks in advance for clarification on these points. > > John > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: IMG_0003.jpg Type: image/jpg Size: 146096 bytes Desc: not available URL: From arademaker at gmail.com Tue Jun 16 14:52:48 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Tue, 16 Jun 2020 09:52:48 -0300 Subject: [developers] Coref over ERSs? In-Reply-To: References: Message-ID: <9D5D3E2E-7A3E-4DF8-BA89-067E6DA8E729@gmail.com> Hi Woodley and Nikhil, I have just found this thread in my inbox. Woodley, can you share the code you have? Nikhil, did you make any progress in this area? I am looking for single sentence solution first. Best, Alexandre > On 13 Mar 2019, at 14:46, Nikhil Krishnaswamy wrote: > > Hi Woodley, > > Thanks for getting in touch. Insofar as I envision using MRS as a resource, it would be plain text in single sentences or well-formed sentence fragments. The pipeline we're developing is still malleable though, so it would be fairly simple to change formats or insert a preprocessing step depending on the tools or resources already available that we might want to make use of. > > Thanks, > Nikhil > > Nikhil Krishnaswamy, Ph.D. > Postdoctoral Researcher, Department of Computer Science > > > On Wed, Mar 13, 2019 at 1:43 PM Woodley Packard wrote: > Hi Nikhil, > > In the past I worked on coreference resolution in MRS, although never quite to the point of a publication or software release. Are you interested primarily in coreference within a single sentence or across multiple sentences? Also, what format are you considering consuming MRS in (simple text based, DMRS, EDs, DM, ...)? It?s possible some of my (oldish) tools could be of use to you. > > Regards, > Woodley From arademaker at gmail.com Fri Jul 3 16:19:41 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Fri, 3 Jul 2020 11:19:41 -0300 Subject: [developers] www script in the logon distribution Message-ID: <5B1D74E8-863C-4547-9C80-0D9B0E41EF88@gmail.com> Hi Stephan, For some reason, the www script in the logon distribution does not start the webserver. Using the `--debug` option, I don't have any additional information in the log file (actually, the script didn't mention the debug anywhere). I am following all instructions from http://moin.delph-in.net/LogonOnline. In particular, pvmd3 is running without any error in the startup. I don't see any *.pvm file in the /tmp. The script bin/logon starts LKB and the [incr TSDB()] normally. I have used `?cat` to save a lisp file and load it manually in the ACL REPL, no error too. Any idea? The log file is below. Michael and Francis, I did a complete review of the Dockerfile yesterday. Does it make sense to move https://github.com/own-pt/docker-logon to the https://github.com/delph-in organization? Maybe I can also rename it since the docker now has more than just the minimal environment to run the LOGON tools. 
I believe that having more repositories under the same delph-in organization makes things clear and gives more visibility. Nice to have Matrix and the brew package already there. I hope that people will start to recognize the benefits of git/GitHub compared to SVN (documentation, issue, easy branching, cross-references of code/issues/PR etc). Best, Alexandre ////// user at 4091e35482b2:~/logon$ ./www --binary --debug --erg --port 9080 International Allegro CL Enterprise Edition 10.0 [64-bit Linux (x86-64)] (Feb 20, 2019 18:22) Copyright (C) 1985-2015, Franz Inc., Oakland, CA, USA. All Rights Reserved. This standard runtime copy of Allegro CL was built by: [TC13152] Universitetet i Oslo ; Loading /home/user/logon/dot.tsdbrc ; Loading /home/user/.tsdbrc [changing package from "COMMON-LISP-USER" to "TSDB"] TSNLP(1): NIL TSNLP(2): NIL TSNLP(3): T TSNLP(4): 5 TSNLP(5): "

(This on-line demonstrator is hosted at the University of Oslo)
" TSNLP(6): ; Loading /home/user/logon/lingo/erg/lkb/script set-coding-system(): activated UTF8. ; Loading /home/user/logon/lingo/erg/Version.lsp ; Loading /home/user/logon/lingo/erg/lkb/globals.lsp ; Loading /home/user/logon/lingo/erg/lkb/user-fns.lsp ; Loading /home/user/logon/lingo/erg/lkb/checkpaths.lsp ; Loading /home/user/logon/lingo/erg/lkb/patches.lsp Reading in type file fundamentals Reading in type file tmt Reading in type file lextypes [14:13:08] gc-after-hook(): {L#626 N=5.2M O=0 E=100%} [S=2.3G R=102M]. Reading in type file syntax [14:13:10] gc-after-hook(): {L#627 N=7.1M O=0 E=99%} [S=2.3G R=232M]. Reading in type file ctype Reading in type file lexrules Reading in type file auxverbs [14:13:12] gc-after-hook(): {L#628 N=9.2M O=0 E=98%} [S=2.3G R=352M]. Reading in type file mtr Reading in type file dt Checking type hierarchy Checking for unique greatest lower bounds Expanding constraints [14:13:18] gc-after-hook(): {L#629 N=55M O=5.2K E=99%} [S=2.3G R=352M]. Making constraints well formed [14:13:19] gc-after-hook(): {L#630 N=72M O=4.8M E=82%} [S=2.3G R=356M]. [14:13:19] gc-after-hook(): {L#631 N=80M O=1.9M E=68%} [S=2.3G R=358M]. [14:13:20] gc-after-hook(): {L#632 N=87M O=2.2M E=79%} [S=2.3G R=392M]. [14:13:21] gc-after-hook(): {L#633 N=62M O=34M E=43%} [S=2.3G R=442M]. [14:13:22] gc-after-hook(): {L#634 N=69M O=23M E=80%} [S=2.3G R=466M]. [14:13:22] gc-after-hook(): 133M tenured; forcing global gc(). [14:13:23] gc-after-hook(): {GR#8 N=54M O=0 E=100%} [S=2.3G R=484M]. [14:13:24] gc-after-hook(): {L#635 N=88M O=0 E=0%} [S=2.3G R=484M]. [14:13:25] gc-after-hook(): {L#636 N=97M O=10M E=69%} [S=2.3G R=491M]. [14:13:26] gc-after-hook(): {L#637 N=99M O=14M E=63%} [S=2.4G R=532M]. [14:13:27] gc-after-hook(): {L#638 N=93M O=29M E=53%} [S=2.4G R=581M]. 80175904 bytes have been tenured, next gc will be global. See the documentation for variable EXCL:*GLOBAL-GC-BEHAVIOR* for more information. Expanding defaults Type file checked successfully Computing display ordering Reading in cached leaf types Cached leaf types read Reading in cached lexicon (main) Cached lexicon read Reading in rules file constructions Reading in lexical rules file inflr Reading in lexical rules file inflr-pnct Reading in root file roots Reading in lexical rules file lexrinst Reading in parse node file parse-nodes ; Loading /home/user/logon/lingo/erg/lkb/mrsglobals.lsp ; Loading /home/user/logon/lingo/erg/lkb/eds.lsp ; Loading /home/user/logon/lingo/erg/www/setup.lsp ; cpu time (non-gc) 13.952552 sec user, 0.026410 sec system ; cpu time (gc) 9.165182 sec user, 0.505708 sec system ; cpu time (total) 23.117734 sec user, 0.532118 sec system ; real time 22.104421 sec (107.0%) ; space allocation: ; 25,979,360 cons cells, 681,401,040 other bytes, 0 static bytes ; Page Faults: major: 0 (gc: 66190), minor: 163781 (gc: 66190) ; Loading /home/user/logon/lingo/erg/rpp/setup.lsp read-repp(): reading file `xml.rpp'. read-repp(): reading file `latex.rpp'. read-repp(): reading file `ascii.rpp'. read-repp(): reading file `html.rpp'. read-repp(): reading file `wiki.rpp'. read-repp(): reading file `lgt.rpp'. read-repp(): reading file `gml.rpp'. read-repp(): reading file `robustness.rpp'. read-repp(): reading file `quotes.rpp'. read-repp(): reading file `ptb.rpp'. read-repp(): reading file `lkb.rpp'. read-repp(): reading file `micro.rpp'. read-repp(): reading file `tokenizer.rpp'. read-heads() reading file `rules.hds'. read-model(): reading file `jhpstg.g.mem'. [14:13:30] gc-after-hook(): {G#638 N=78M O=0 E=87%} [S=2.4G R=617M]. 
read-semi(): reading file `erg.smi'. read-semi(): reading file `hierarchy.smi'. read-semi(): reading file `abstract.smi'. read-semi(): reading file `surface.smi'. [14:13:32] gc-after-hook(): {L#639 N=108M O=0 E=0%} [S=2.4G R=617M]. read-vpm(): reading file `semi.vpm'. read-vpm(): reading file `abstract.vpm'. ; Loading /home/user/logon/lingo/erg/lkb/mt.lsp read-transfer-rules(): reading file `paraphraser.mtr'. read-transfer-rules(): reading file `idioms.mtr'. read-transfer-rules(): reading file `trigger.mtr'. [14:13:34] gc-after-hook(): {L#640 N=108M O=11M E=83%} [S=2.4G R=617M]. read-transfer-rules(): reading file `generation.mtr'. Building rule filter [14:13:36] gc-after-hook(): {L#641 N=105M O=9.5M E=90%} [S=2.4G R=617M]. [14:13:42] gc-after-hook(): {L#642 N=93M O=14M E=95%} [S=2.4G R=617M]. [14:13:47] gc-after-hook(): {L#643 N=24M O=72M E=92%} [S=2.4G R=666M]. [14:13:47] gc-after-hook(): 161M tenured; forcing global gc(). [14:13:48] gc-after-hook(): {GR#10 N=12M O=0 E=100%} [S=2.4G R=678M]. 75861824 bytes have been tenured, next gc will be global. See the documentation for variable EXCL:*GLOBAL-GC-BEHAVIOR* for more information. Building lr connections table Constructing lr table for non-morphological rules Grammar input complete NIL TSNLP(7): [14:14:27] gc-after-hook(): {G#643 N=35M O=0 E=81%} [S=2.4G R=678M]. [14:14:30] gc-after-hook(): {L#644 N=41M O=0 E=0%} [S=2.4G R=678M]. [14:14:32] gc-after-hook(): {L#645 N=41M O=5.7M E=94%} [S=2.4G R=682M]. [14:14:35] gc-after-hook(): {L#646 N=43M O=2.8M E=90%} [S=2.4G R=685M]. [14:14:38] gc-after-hook(): {L#647 N=42M O=4.0M E=94%} [S=2.4G R=689M]. [14:14:41] gc-after-hook(): {L#648 N=25M O=21M E=93%} [S=2.4G R=711M]. [14:14:44] gc-after-hook(): {L#649 N=26M O=4.2M E=77%} [S=2.4G R=715M]. [14:14:47] gc-after-hook(): {L#650 N=27M O=4.0M E=92%} [S=2.4G R=719M]. [14:14:50] gc-after-hook(): {L#651 N=26M O=4.2M E=92%} [S=2.4G R=723M]. [14:14:53] gc-after-hook(): {L#652 N=27M O=4.3M E=93%} [S=2.4G R=728M]. 53092272 bytes have been tenured, next gc will be global. See the documentation for variable EXCL:*GLOBAL-GC-BEHAVIOR* for more information. [14:14:58] gc-after-hook(): {G#652 N=25M O=0 E=95%} [S=2.4G R=733M]. [14:15:01] gc-after-hook(): {L#653 N=29M O=0 E=0%} [S=2.4G R=733M]. [14:15:04] gc-after-hook(): {L#654 N=30M O=4.1M E=95%} [S=2.4G R=734M]. [14:15:08] gc-after-hook(): {L#655 N=31M O=3.4M E=90%} [S=2.4G R=737M]. [14:15:11] gc-after-hook(): {L#656 N=30M O=4.9M E=91%} [S=2.4G R=742M]. [14:15:14] gc-after-hook(): {L#657 N=25M O=8.4M E=92%} [S=2.4G R=750M]. [14:15:18] gc-after-hook(): {L#658 N=24M O=5.0M E=87%} [S=2.4G R=756M]. [14:15:21] gc-after-hook(): {L#659 N=24M O=4.4M E=93%} [S=2.4G R=760M]. [14:15:25] gc-after-hook(): {L#660 N=24M O=3.8M E=93%} [S=2.4G R=764M]. [14:15:28] gc-after-hook(): {L#661 N=23M O=4.0M E=89%} [S=2.4G R=768M]. [14:15:31] gc-after-hook(): {L#662 N=24M O=4.1M E=92%} [S=2.4G R=772M]. [14:15:34] gc-after-hook(): {L#663 N=25M O=3.8M E=92%} [S=2.4G R=776M]. [14:15:37] gc-after-hook(): {L#664 N=25M O=3.6M E=92%} [S=2.4G R=779M]. [14:15:40] gc-after-hook(): {L#665 N=26M O=3.7M E=93%} [S=2.4G R=783M]. 55870688 bytes have been tenured, next gc will be global. See the documentation for variable EXCL:*GLOBAL-GC-BEHAVIOR* for more information. #[SEM-I {38454 ges}: 0 roles; 22406 predicates; 0 properties] TSNLP(8): "/brat/" TSNLP(9): [t40009] BEGIN [t4000a] BEGIN [t40009] reading `/home/user/logon/lingo/erg/pet/english.set'... including `/home/user/logon/lingo/erg/pet/common.set'... 
including `/home/user/logon/lingo/erg/pet/global.set'... including `/home/user/logon/lingo/erg/pet/repp.set'... including `/home/user/logon/lingo/erg/pet/mrs.set'... loading `/home/user/logon/lingo/erg/english.grm' [t4000a] reading `/home/user/logon/lingo/erg/pet/english.set'... including `/home/user/logon/lingo/erg/pet/common.set'... including `/home/user/logon/lingo/erg/pet/global.set'... including `/home/user/logon/lingo/erg/pet/repp.set'... including `/home/user/logon/lingo/erg/pet/mrs.set'... loading `/home/user/logon/lingo/erg/english.grm' [t4000a] (ERG (1214)) reading ME model `/home/user/logon/lingo/erg/redwoods.mem'... [3643349 features] [t40009] (ERG (1214)) reading ME model `/home/user/logon/lingo/erg/redwoods.mem'... [3643349 features] [t4000a] read-vpm(): reading file `semi.vpm'. [t40009] read-vpm(): reading file `semi.vpm'. [t4000a] 95873 types in 15 s [t4000a] [t40009] 95873 types in 15 s [t40009] [14:16:18] wait-for-clients(): `4091e35482b2' registered as tid <40009> [00:17]. [14:16:18] wait-for-clients(): `4091e35482b2' registered as tid <4000a> [00:17]. NIL TSNLP(10): [t4000b] BEGIN [t4000c] BEGIN [t4000d] BEGIN [t4000e] BEGIN [t4000d] reading `/home/user/logon/lingo/erg/pet/english.set'... including `/home/user/logon/lingo/erg/pet/common.set'... including `/home/user/logon/lingo/erg/pet/global.set'... including `/home/user/logon/lingo/erg/pet/repp.set'... including `/home/user/logon/lingo/erg/pet/mrs.set'... loading `/home/user/logon/lingo/erg/english.grm' [t4000e] reading `/home/user/logon/lingo/erg/pet/english.set'... including `/home/user/logon/lingo/erg/pet/common.set'... including `/home/user/logon/lingo/erg/pet/global.set'... including `/home/user/logon/lingo/erg/pet/repp.set'... including `/home/user/logon/lingo/erg/pet/mrs.set'... loading `/home/user/logon/lingo/erg/english.grm' [t4000c] reading `/home/user/logon/lingo/erg/pet/english.set'... including `/home/user/logon/lingo/erg/pet/common.set'... including `/home/user/logon/lingo/erg/pet/global.set'... including `/home/user/logon/lingo/erg/pet/repp.set'... including `/home/user/logon/lingo/erg/pet/mrs.set'... loading `/home/user/logon/lingo/erg/english.grm' [t4000b] reading `/home/user/logon/lingo/erg/pet/english.set'... including `/home/user/logon/lingo/erg/pet/common.set'... including `/home/user/logon/lingo/erg/pet/global.set'... including `/home/user/logon/lingo/erg/pet/repp.set'... including `/home/user/logon/lingo/erg/pet/mrs.set'... loading `/home/user/logon/lingo/erg/english.grm' From goodman.m.w at gmail.com Fri Jul 3 16:47:19 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Fri, 3 Jul 2020 22:47:19 +0800 Subject: [developers] www script in the logon distribution In-Reply-To: <5B1D74E8-863C-4547-9C80-0D9B0E41EF88@gmail.com> References: <5B1D74E8-863C-4547-9C80-0D9B0E41EF88@gmail.com> Message-ID: Hi Alexandre, I certainly don't mind if you want to put it under the delph-in organization. I just looked it over briefly and I have two questions and a suggestion: * It is described as being for macOS, but very little actually looks macOS-specific. Would it be appropriate to describe it in more general terms in case someone wants to run it from some other platform? * It is called docker-logon, but I don't see that it gets any of the LOGON distribution. Maybe it should be renamed? * It looks like you've included web.c from FFTB. The FFTB project is under the MIT license, so you'll need to include its LICENSE file as well. 
On Fri, Jul 3, 2020 at 10:21 PM Alexandre Rademaker wrote: > > Hi Stephan, > > For some reason, the www script in the logon distribution does not start > the webserver. Using the `--debug` option, I don't have any additional > information in the log file (actually, the script didn't mention the debug > anywhere). I am following all instructions from > http://moin.delph-in.net/LogonOnline. In particular, pvmd3 is running > without any error in the startup. I don't see any *.pvm file in the /tmp. > The script bin/logon starts LKB and the [incr TSDB()] normally. I have used > `?cat` to save a lisp file and load it manually in the ACL REPL, no error > too. Any idea? The log file is below. > > Michael and Francis, > > I did a complete review of the Dockerfile yesterday. Does it make sense to > move https://github.com/own-pt/docker-logon to the > https://github.com/delph-in organization? Maybe I can also rename it > since the docker now has more than just the minimal environment to run the > LOGON tools. I believe that having more repositories under the same > delph-in organization makes things clear and gives more visibility. Nice > to have Matrix and the brew package already there. I hope that people will > start to recognize the benefits of git/GitHub compared to SVN > (documentation, issue, easy branching, cross-references of code/issues/PR > etc). > > > Best, > Alexandre > > > ////// > > user at 4091e35482b2:~/logon$ ./www --binary --debug --erg --port 9080 > > International Allegro CL Enterprise Edition > 10.0 [64-bit Linux (x86-64)] (Feb 20, 2019 18:22) > Copyright (C) 1985-2015, Franz Inc., Oakland, CA, USA. All Rights > Reserved. > > This standard runtime copy of Allegro CL was built by: > [TC13152] Universitetet i Oslo > > ; Loading /home/user/logon/dot.tsdbrc > ; Loading /home/user/.tsdbrc > > [changing package from "COMMON-LISP-USER" to "TSDB"] > TSNLP(1): NIL > TSNLP(2): NIL > TSNLP(3): T > TSNLP(4): 5 > TSNLP(5): "

> [... quoted grammar-loading log trimmed; it duplicates the startup log in the original message above ...]
> See the documentation for variable EXCL:*GLOBAL-GC-BEHAVIOR* for more > information. > #[SEM-I {38454 ges}: 0 roles; 22406 predicates; 0 properties] > TSNLP(8): "/brat/" > TSNLP(9): > [t40009] BEGIN > [t4000a] BEGIN > [t40009] reading `/home/user/logon/lingo/erg/pet/english.set'... including > `/home/user/logon/lingo/erg/pet/common.set'... including > `/home/user/logon/lingo/erg/pet/global.set'... including > `/home/user/logon/lingo/erg/pet/repp.set'... including > `/home/user/logon/lingo/erg/pet/mrs.set'... loading > `/home/user/logon/lingo/erg/english.grm' > [t4000a] reading `/home/user/logon/lingo/erg/pet/english.set'... including > `/home/user/logon/lingo/erg/pet/common.set'... including > `/home/user/logon/lingo/erg/pet/global.set'... including > `/home/user/logon/lingo/erg/pet/repp.set'... including > `/home/user/logon/lingo/erg/pet/mrs.set'... loading > `/home/user/logon/lingo/erg/english.grm' > [t4000a] (ERG (1214)) reading ME model > `/home/user/logon/lingo/erg/redwoods.mem'... [3643349 features] > [t40009] (ERG (1214)) reading ME model > `/home/user/logon/lingo/erg/redwoods.mem'... [3643349 features] > [t4000a] read-vpm(): reading file `semi.vpm'. > [t40009] read-vpm(): reading file `semi.vpm'. > [t4000a] 95873 types in 15 s > [t4000a] > [t40009] 95873 types in 15 s > [t40009] > [14:16:18] wait-for-clients(): `4091e35482b2' registered as tid <40009> > [00:17]. > [14:16:18] wait-for-clients(): `4091e35482b2' registered as tid <4000a> > [00:17]. > > NIL > TSNLP(10): > [t4000b] BEGIN > [t4000c] BEGIN > [t4000d] BEGIN > [t4000e] BEGIN > [t4000d] reading `/home/user/logon/lingo/erg/pet/english.set'... including > `/home/user/logon/lingo/erg/pet/common.set'... including > `/home/user/logon/lingo/erg/pet/global.set'... including > `/home/user/logon/lingo/erg/pet/repp.set'... including > `/home/user/logon/lingo/erg/pet/mrs.set'... loading > `/home/user/logon/lingo/erg/english.grm' > [t4000e] reading `/home/user/logon/lingo/erg/pet/english.set'... including > `/home/user/logon/lingo/erg/pet/common.set'... including > `/home/user/logon/lingo/erg/pet/global.set'... including > `/home/user/logon/lingo/erg/pet/repp.set'... including > `/home/user/logon/lingo/erg/pet/mrs.set'... loading > `/home/user/logon/lingo/erg/english.grm' > [t4000c] reading `/home/user/logon/lingo/erg/pet/english.set'... including > `/home/user/logon/lingo/erg/pet/common.set'... including > `/home/user/logon/lingo/erg/pet/global.set'... including > `/home/user/logon/lingo/erg/pet/repp.set'... including > `/home/user/logon/lingo/erg/pet/mrs.set'... loading > `/home/user/logon/lingo/erg/english.grm' > [t4000b] reading `/home/user/logon/lingo/erg/pet/english.set'... including > `/home/user/logon/lingo/erg/pet/common.set'... including > `/home/user/logon/lingo/erg/pet/global.set'... including > `/home/user/logon/lingo/erg/pet/repp.set'... including > `/home/user/logon/lingo/erg/pet/mrs.set'... loading > `/home/user/logon/lingo/erg/english.grm' > > > -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From arademaker at gmail.com Sun Jul 5 00:35:48 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Sat, 4 Jul 2020 19:35:48 -0300 Subject: [developers] repp tool segmentation fault Message-ID: <202F53D2-9F87-4E19-BA4D-E257BB0E400D@gmail.com> Hi Woodley, > I was able to confirm that with the escaped backslashes, I get the segmentation fault and without, I do not. I suspect this is a bug in repp that we should file with Woodley. 
https://github.com/delph-in/homebrew-delphin/issues/1

Not sure how easy it is to fix this.

Alexandre

Sent from my iPhone
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From arademaker at gmail.com  Sun Jul 5 22:09:27 2020
From: arademaker at gmail.com (Alexandre Rademaker)
Date: Sun, 5 Jul 2020 17:09:27 -0300
Subject: [developers] www script in the logon distribution
In-Reply-To: 
References: <5B1D74E8-863C-4547-9C80-0D9B0E41EF88@gmail.com>
Message-ID: 

Hi Michael, thank you very much for your comments. It is really good to have feedback. My answers below.
;-) Best, Alexandre From oe at ifi.uio.no Sun Jul 5 23:40:21 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Sun, 5 Jul 2020 23:40:21 +0200 Subject: [developers] The WeSearch interface In-Reply-To: References: <20190519193216.33012.97827@sh.hpc.uio.no> Message-ID: hi alexandre, it appears roman (who worked on WSI improvements at UW for a while) created the script that you are missing. i am not sure i actually have a copy myself (and cannot easily check while traveling this week). but we used to create the WSI indices from the standard export files created by the LOGON ?redwoods? script. that should work with any valid [incr tsdb()] treebank, no matter how it was created. somewhere in the ERG, there should be a file Notes, or Readme, or the like with export instructions. so, how did you create your treebank(s), how do you call the ?redwoods? script, and (most importantly) what exactly happens? best wishes, oe On Sun, 5 Jul 2020 at 22:52 Alexandre Rademaker wrote: > > Hi Stephan, > > The `export.sh` script mentioned in http://moin.delph-in.net/ErgWeSearch > is not available in the WeSearcch repository ( > http://svn.delph-in.net/wsi/trunk/). Can you share this script? > > As an alternative, I tried to use the $LOGON/redwoods script to export a > profile created with Pydelphin+ACE but I was not able to understand how to > operate with a profile not created by the $LOGON/parse. Any help would be > more than welcome! ;-) > > Best, > Alexandre > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From arademaker at gmail.com Mon Jul 6 00:17:48 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Sun, 5 Jul 2020 19:17:48 -0300 Subject: [developers] The WeSearch interface In-Reply-To: References: <20190519193216.33012.97827@sh.hpc.uio.no> Message-ID: Hi Stephan, Thank you for your answer. I am actually trying to reproduce the results from https://www.aclweb.org/anthology/W15-2205/ and the code that transforms EDS/MRS to RDF seems to live in the WeSearch Java code, right? Anyway, having the WeSearch interface running will be also VERY helpful. In the future, I surely would like to explore more and understand the lisp code that redwoods script is calling, the main part seems to be in the TSDB package and the function `browse-trees` but there are many auxiliar scripts loaded before it and many variables and other functions from the LKB and TSDB packages. It is still not clear to me how to decouple the lisp code from the LOGON scripts and all PVM related stuff. I have created the profile with: % delphin mkprof --input sample.txt --relations ~/hpsg/logon/lingo/lkb/src/tsdb/skeletons/english/Relations --skeleton treebank Then with pydelphin I analysed it with ACE: //// from delphin import ace from delphin import tsdb from delphin import itsdb ts = itsdb.TestSuite('treebank') with ace.ACEParser('erg.dat') as cpu: ts.process(cpu) //// For exporting, I tried many different alternatives of parameters. Unfortunately, I didn?t find much documentation about the redwoods script parameters. I would like to obtain the eds, mrs and dm (for that, I remember an old emails from you pointing to a python script that I will need to revisit) formats. Many combinations of parameters result in case (1) below. The last try gives me the result (2). 1) $ ./redwoods --binary --erg --default --composite --target /tmp --export mrs,eds --active all /home/user/tmp/treebank redwoods: invalid `erg' profile `/home/user/tmp/treebank'; exit. 
2) $ ./redwoods --binary --target /tmp --export mrs,eds /home/user/tmp/treebank exporting `/home/user/tmp/treebank' [1 -- 1001] International Allegro CL Enterprise Edition 10.0 [64-bit Linux (x86-64)] (Feb 20, 2019 18:22) Copyright (C) 1985-2015, Franz Inc., Oakland, CA, USA. All Rights Reserved. This standard runtime copy of Allegro CL was built by: [TC13152] Universitetet i Oslo ; Loading /home/user/logon/dot.tsdbrc ; Loading /home/user/.tsdbrc [changing package from "COMMON-LISP-USER" to "TSDB"] TSNLP(1): NIL TSNLP(2): Error: "" does not exist, cannot load [condition type: FILE-DOES-NOT-EXIST-ERROR] Restart actions (select using :continue): 0: retry the load of 1: skip loading 2: Return to Top Level (an "abort" restart). 3: Abort entirely from this (lisp) process. [changing package from "TSDB" to "LKB"] [1] LKB(3): :pop [changing package from "LKB" to "TSDB"] TSNLP(3): EOF Really exit lisp [n]? Best, Alexandre > On 5 Jul 2020, at 18:40, Stephan Oepen wrote: > > hi alexandre, > > it appears roman (who worked on WSI improvements at UW for a while) created the script that you are missing. i am not sure i actually have a copy myself (and cannot easily check while traveling this week). > > but we used to create the WSI indices from the standard export files created by the LOGON ?redwoods? script. that should work with any valid [incr tsdb()] treebank, no matter how it was created. somewhere in the ERG, there should be a file Notes, or Readme, or the like with export instructions. > > so, how did you create your treebank(s), how do you call the ?redwoods? script, and (most importantly) what exactly happens? > > best wishes, oe From goodman.m.w at gmail.com Mon Jul 6 05:02:10 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Mon, 6 Jul 2020 11:02:10 +0800 Subject: [developers] www script in the logon distribution In-Reply-To: References: <5B1D74E8-863C-4547-9C80-0D9B0E41EF88@gmail.com> Message-ID: On Mon, Jul 6, 2020 at 4:09 AM Alexandre Rademaker wrote: > > On 3 Jul 2020, at 11:47, goodman.m.w at gmail.com wrote: > > * It is described as being for macOS, but very little actually looks > macOS-specific. Would it be appropriate to describe it in more general > terms in case someone wants to run it from some other platform? > > Indeed, we can now take this as a solution for many more situations than > those envisioned by http://moin.delph-in.net/LkbMacintosh. I have added a > better introduction to the README of the repo. Comments are welcome. I have > also added links in http://moin.delph-in.net/ToolsTop. > > > * It is called docker-logon, but I don't see that it gets any of the > LOGON distribution. Maybe it should be renamed? > > I renamed it to https://github.com/own-pt/docker-delphin. > Re macOS and LOGON, that looks better, although some of the prose further down the README still makes references to these two things. It might be good if you could group these into sections and/or clarify the additional steps (e.g., "The LOGON distribution is not included but this container is compatible with its requirements. You can install LOGON by doing ..."). > > * It looks like you've included web.c from FFTB. The FFTB project is > under the MIT license, so you'll need to include its LICENSE file as well. > > > > This is important. Thank you for reminding me about license. I have added > a MIT license and in the readme I also add a notice about the license of > the tools. > I think what you added is sufficient. 
> Regarding the web.c copy, I am not very happy with the current solution. I > can see the following alternatives: > > 1. Having a copy of fftb svn repo in a git repository under the DELPH-IN > organization. We could than use it to replicate Woodley changes in the SVN > official repo, track issues, and we could also have branches with changes > like the one I proposed in the web.c. > I have also at times wanted a bug tracker for Woodley's tools, and the license doesn't prevent us from creating such a mirror, but I don't recall ever asking Woodley his opinion about this as he seems content with the current setup. He's responsive when I email patches to his code, and there's a bug tracker of sorts in the "wishlist" wikis (e.g., http://moin.delph-in.net/FftbWishlist). Since your version of web.c only makes a minimal one-line change, perhaps we can just provide Woodley a patch for adding a --bind option so you can specify the address or 0? 2. Use a patch file instead of a copy of the whole web.c, a little bit more > complicate and I am not sure how safe it would be. > > 3. Have a script to change the file during the docker image building, > somehow similar to the previous option. > These sound brittle; at least, you'd have to make sure they keep in sync with the current version. If we can't get a patch for a custom bind address added into FFTB, then for this solution it would be best to pin the SVN version of FFTB in the docker file so the patch will apply cleanly. -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From arademaker at gmail.com Wed Jul 8 20:48:29 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Wed, 8 Jul 2020 15:48:29 -0300 Subject: [developers] Fwd: Exporting profile References: <9D60DA02-9084-4C57-BA72-2EDAEDE2F4EC@gmail.com> Message-ID: I forgot to copy the list. Best, Alexandre > From: Alexandre Rademaker > Subject: Exporting profile > Date: 8 July 2020 15:47:22 GMT-3 > To: Stephan Oepen > > > Hi Stephan, > > I was able to export the profile with: > > $ ./redwoods --binary --terg --home /home/user/tmp/ --target /tmp --export mrs,eds --active all treebank > > (The name of my profile is `treebank` and it is located in /home/user/tmp. I discovered the parameter `home` and the possibility to specify the last version of ERG with `terg`). > > That is nice, the parsing of profile files is not so trivial task and doesn?t make sense to not use the code already available. I wonder if the output format is document. For each item in the profile, I got a .gz file like that: > > [1] (1 of 3) {1} [ the text of the sentence ] > ^L > [1:0] (active) > > [the mrs text representation] > > [the eds text representation] > > ^L > [1:1] (inactive) > > [the mrs...] > > [the eds...] > > ^L > [1:2] (inactive) > > [the mrs...] > > [the eds...] > > > I would also like to understand what is the minimal Lisp code to export a profile using the functions from the tsdb and lkb packages. Given that, I would not depend on the scripts. I would be able to start a lisp REPL and do it interactively. I was expecting to be able to learn it with the `source` parameter, but I didn?t get any result. > > Why do I need the grammar to export the profile? Sorry, maybe the answer to this question is a long one, an article or wiki page! ;-) I remember that I have already read somewhere that some formats need the grammar or the SEM-I interface, right? > > > Best, > Alexandre -------------- next part -------------- An HTML attachment was scrubbed... 
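For reference, reading such an exported item file back from Python takes little code. The following is only a minimal sketch, assuming the layout described in the message above (one gzipped file per item, result blocks separated by form-feed characters, blank lines between the exported representations); the file path and the helper name are invented for illustration, and real exports may carry additional layers (tokens, derivations, trees) depending on the --export values used.

import gzip

def read_export(path):
    """Split one exported item file into its header and result blocks."""
    with gzip.open(path, 'rt', encoding='utf-8') as f:
        blocks = f.read().split('\f')          # ^L separates result blocks
    header = blocks[0].strip()                 # e.g. "[1] (1 of 3) {1} [ ... ]"
    results = []
    for block in blocks[1:]:
        chunks = [c.strip() for c in block.split('\n\n') if c.strip()]
        label = chunks[0]                      # e.g. "[1:0] (active)"
        results.append((label, chunks[1:]))    # remaining chunks: mrs, eds, ...
    return header, results

header, results = read_export('/tmp/treebank/1.gz')   # hypothetical path
print(header)
for label, representations in results:
    if '(active)' in label:
        print(label, '->', len(representations), 'representations')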
From oe at ifi.uio.no  Thu Jul 9 00:11:41 2020
From: oe at ifi.uio.no (Stephan Oepen)
Date: Thu, 9 Jul 2020 00:11:41 +0200
Subject: [developers] Fwd: Exporting profile
In-Reply-To: 
References: <9D60DA02-9084-4C57-BA72-2EDAEDE2F4EC@gmail.com>
Message-ID: 

i am glad you are making progress, alexandre!

the WSI indexer should be able to parse those export files, though
ordinarily we only export and index the active result(s), assuming you
have manually disambiguated?

the grammar is needed to export because derived formats (e.g. labeled
trees, MRS, EDS, DM) are computed dynamically, i.e. the export
`interprets' each recorded derivation using the full grammar, including
its MRS, EDS, et al. output configuration.

using the `--cat' option should give you the sequence of LKB and
[incr tsdb()] function calls.

i am afraid there is no formal documentation of the export format, but
your schematic summary almost seems self-explanatory!

best wishes, oe

ons. 8. jul. 2020 kl. 20:51 skrev Alexandre Rademaker :
> [... full quote of the "Exporting profile" message above trimmed ...]

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From arademaker at gmail.com  Fri Jul 10 22:21:43 2020
From: arademaker at gmail.com (Alexandre Rademaker)
Date: Fri, 10 Jul 2020 17:21:43 -0300
Subject: [developers] semantic representations in RDF
Message-ID: 

Hi,

Sorry for this long email. I have written to Stephan many times this week, so I don't want to keep disturbing (only! ;-)) him. So, I am sharing my findings about the WSI interface; maybe someone who has worked with this code can share some information with me.

I am trying to reproduce the textual entailment technique described in

[a0] https://www.aclweb.org/anthology/W15-2205.pdf

The basic idea is to convert EDS to RDF and, after applying some transformations, use SPARQL to test the entailment between two structures.
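For concreteness, a small sketch of such a conversion on top of PyDelphin is shown here; it is not the WSI code. It reads one EDS in the native serialization (the example quoted at the end of this message) and prints naive N-Triples-style output. The base and vocabulary IRIs and the per-sentence graph identifier are invented for illustration; the actual WSI ontology is presumably different.

from delphin.codecs import eds as edsnative

EDS_TEXT = '''
{e2:
 _1:udef_q<0:3>[BV x3]
 e9:card<0:3>("2"){e SF prop, TENSE untensed, MOOD indicative, PROG -, PERF -}[ARG1 x3]
 x3:_dog_n_1<4:8>{x PERS 3, NUM pl, IND +, PT pt}[]
 e2:_fight_v_1<13:21>{e SF prop, TENSE pres, MOOD indicative, PROG +, PERF -}[ARG1 x3]
}'''

BASE = 'http://example.org/wsi/'           # hypothetical instance namespace
VOCAB = 'http://example.org/wsi/vocab#'    # hypothetical vocabulary namespace

def eds_to_triples(graph_id, text):
    """Turn one EDS into (subject, predicate, object) strings, minting
    per-sentence node IRIs so that node names like x3 cannot collide
    across sentences."""
    graph = edsnative.decode(text)
    def node(n):
        return f'<{BASE}{graph_id}/{n}>'
    triples = [(f'<{BASE}{graph_id}>', f'<{VOCAB}top>', node(graph.top))]
    for n in graph.nodes:
        triples.append((node(n.id), f'<{VOCAB}predicate>', f'"{n.predicate}"'))
        if n.carg is not None:
            triples.append((node(n.id), f'<{VOCAB}carg>', f'"{n.carg}"'))
        for role, target in n.edges.items():
            triples.append((node(n.id), f'<{VOCAB}{role}>', node(target)))
        for prop, value in n.properties.items():
            triples.append((node(n.id), f'<{VOCAB}{prop}>', f'"{value}"'))
    return triples

for s, p, o in eds_to_triples('10', EDS_TEXT):
    print(s, p, o, '.')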
Given that, the first step is to have the RDF data from EDS representations. For that, the authors used the WSI code. That is why I am trying to understand the current status and the history behind the WSI interface/code. The file http://svn.delph-in.net/wsi/trunk/src/CHANGES.txt is not very informative! Maybe there are reasons for the problems I listed below. Maybe someone is still working on the code. Maybe the problems are well-know limitations and ideas that were never really implemented. I just want to know if it makes sense to invest time on trying to solve the problems I found. BTW, there is no license file, may I fork this SVN repository in a GitHub repository? The relevant pages/articles are: [w1] http://moin.delph-in.net/WeSearch/Rdf [w2] http://moin.delph-in.net/ErgWeSearch [w3] http://moin.delph-in.net/WeSearch/Interface [w4] http://moin.delph-in.net/WeSearch/QueryLanguage [a1] http://www.lrec-conf.org/proceedings/lrec2014/pdf/1166_Paper.pdf [a2] https://www.aclweb.org/anthology/C14-2020.pdf Problems: 1) The wiki pages [w3,w2] are not in sync with the README.txt in the code repository http://svn.delph-in.net/wsi/trunk/. For example, the directory `generic-gui` is now called `common-gui`. 2) The .nq file produced by the indexing is not valid. IRI likes `<9>` are not allowed in https://www.w3.org/TR/n-quads/. I was able to produce a temporary solution but it creates other problems. 3) The [a1,a2,w1] say nothing about how the URLs/IRIs are created. But as we can see for the output below, nodes like `x3` would have a single IRI shared for all sentences in the corpora. I understand the EDS node identifier are not variables, and that tiples are grouped in a graph, but, still, conceptually, in the dataset, there is no single x3, but many different ones in different sentences, right? I didn?t find the complete ontology that defines the EDS, MRS and DM representations. On [a1] the authors wrote only: > The full MRS ontology (not discussed in detail here) distinguishes different types of nodes, corresponding to full predications vs. individual logical variables vs. hierarchically organized sub-properties of variables... 4) There is no rdfs:type (), the `type` predicate is defined in the http://www.w3.org/1999/02/22-rdf-syntax-ns# (prefix `rdf`). 5) If I fix the cases [4] and [2] in the RDF transformation code, the interface breaks. I am still investigating if the problem is in the SPARQL generation or in the page construction from the results. 6) The query language (WQL) documented in http://alt.qcri.org/semeval2015/task18/index.php?id=search and [w4] is not working in the current version of the interface: Accept => x: _* [ARG* x] Reject => x: _fight* [ARG* x] Reject => /v[ARG* x] Reject => +dog Comments are welcome! ;-) Best, Alexandre EDS: {e2: _1:udef_q<0:3>[BV x3] e9:card<0:3>("2"){e SF prop, TENSE untensed, MOOD indicative, PROG -, PERF -}[ARG1 x3] x3:_dog_n_1<4:8>{x PERS 3, NUM pl, IND +, PT pt}[] e2:_fight_v_1<13:21>{e SF prop, TENSE pres, MOOD indicative, PROG +, PERF -}[ARG1 x3] } RDF predicates triples only for the EDS above: % cat 1.nq | grep "<1>" | grep "predicate" <_1> "udef_q"^^ <1> . "card"^^ <1> . "_dog_n_1"^^ <1> . "_fight_v_1"^^ <1> . Complete RDF from the EDS above: <_1> "udef_q"^^ <1> . <_1> <1> . <_1> <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . "card"^^ <1> . "2"^^ <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . "_dog_n_1"^^ <1> . <1> . <1> . 
<1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . "_fight_v_1"^^ <1> . <1> . <1> . "true"^^ <1> . From arademaker at gmail.com Mon Jul 13 23:54:08 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Mon, 13 Jul 2020 18:54:08 -0300 Subject: [developers] MacLUI alpha test In-Reply-To: References: <938B5910-D32F-4BD3-99D7-D41B8C71C0D6@gmail.com> <73D395AD-B766-4CA4-9953-2F2C6CF67616@uw.edu> <6EA616B0-05D6-4D65-A5E9-4879F3BD8320@sussex.ac.uk> <6094A46D-805D-4207-9438-BF1282CC2EB3@sweaglesw.org> <43159F5B-7B0A-4E35-A9B3-AA693D50CF01@sussex.ac.uk> Message-ID: Hi Woodley, I just noticed that http://moin.delph-in.net/LkbLui#Obtaining_and_Running_LUI didn?t have a link to the directory below where the README.txt file has the installation instructions: http://sweaglesw.org/linguistics/maclui/ Maybe the other links to http://sweaglesw.org/linguistics/yzlui-for-osx.tar.gz and http://sweaglesw.org/linguistics/yzlui.x86-64 in the same paragraph are now obsolete!? I am not sure, so I didn?t remove them. Best, Alexandre From sweaglesw at sweaglesw.org Tue Jul 14 00:36:49 2020 From: sweaglesw at sweaglesw.org (Woodley Packard) Date: Mon, 13 Jul 2020 15:36:49 -0700 Subject: [developers] MacLUI alpha test In-Reply-To: References: Message-ID: <6273D36E-ED92-4CC7-9AD0-5368558EB545@sweaglesw.org> Hi Alexandre, The maclui preview that you added a link to is/was not yet released software, and there are caveats about using it. But I?m glad you called my attention to it, since that is another thing I surely should mention on Wednesday. Woodley > On Jul 13, 2020, at 2:54 PM, Alexandre Rademaker wrote: > > ?Hi Woodley, > > I just noticed that http://moin.delph-in.net/LkbLui#Obtaining_and_Running_LUI didn?t have a link to the directory below where the README.txt file has the installation instructions: > > http://sweaglesw.org/linguistics/maclui/ > > Maybe the other links to http://sweaglesw.org/linguistics/yzlui-for-osx.tar.gz and http://sweaglesw.org/linguistics/yzlui.x86-64 in the same paragraph are now obsolete!? I am not sure, so I didn?t remove them. > > Best, > Alexandre From oe at ifi.uio.no Tue Jul 14 02:25:18 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Tue, 14 Jul 2020 02:25:18 +0200 Subject: [developers] semantic representations in RDF In-Reply-To: References: Message-ID: hi alexandre, > 6) The query language (WQL) documented in http://alt.qcri.org/semeval2015/task18/index.php?id=search and [w4] is not working in the current version of the interface: > > Accept => x: _* [ARG* x] > Reject => x: _fight* [ARG* x] > Reject => /v[ARG* x] > Reject => +dog what do you actually consider the 'current interface' in this context? the WQL documentation you reference is from the SDP shared task, so you would have to try those queries against one of the bi-lexical formats (e.g. DM :-): the '/' (PoS) and '+' (lemma) operators are only defined for SDP graphs, i suspect. also, 'v' is not a valid PoS value (but 'v*' seems to work): http://wesearch.delph-in.net/sdp/search.jsp see you tomorrow! oe From oe at ifi.uio.no Tue Jul 14 21:41:47 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Tue, 14 Jul 2020 21:41:47 +0200 Subject: [developers] relating various layers of information in [incr tsdb()] profiles Message-ID: hi jan, yesterday (during the summit plenary), you inquired about a tighter linking of MRS predications to the underlying syntactic analysis than the default character ranges. 
that is actually a fine example of existing functionality, only nobody but me likely knows about it, because there is no available documentation (it is buried in 'lingo/lkb/src/mrs/lnk.lisp' and the LOGON 'redwoods' script). if one were to temporarily venture back to the LOGON environment, i just tried the following: $LOGONROOT/redwoods --erg \ --export/id/blind input,derivation,mrs,eds \ --condition "i-id == 21" --target /tmp mrs here, the '/blind' modifier means to ignore any MRS (or labeled tree) that may be recorded in the profile, which will trigger [incr tsdb()] re-creating the complete feature structure (and then MRS) from the recorded derivation tree; the '/id' modifier calls for MRS linking to use identifiers into the derivation tree (rather than character ranges), e.g. the export file contains: [...] (ROOT_STRICT (141 SB-HD_MC_C -0.207561 0 2 (138 HDN_BNP-PN_C 0.0930572 0 1 (137 N_SG_ILR 0.135806 0 1 (31 abrams at n_-_pn_le 0 0 1 [...] [ TOP: h1 INDEX: e3 [ e SF: PROP TENSE: PAST MOOD: INDICATIVE PROG: - PERF: - ] RELS: < [ proper_q<@138> LBL: h4 ARG0: x6 [ x PERS: 3 NUM: SG IND: + ] RSTR: h5 BODY: h7 ] [ named<@31> LBL: h8 ARG0: x6 CARG: "Abrams" ] [...] in the above <@138> and <@31> refer to the corresponding node identifiers in the derivation tree, i.e. the unary rule that adds the quantifier and the lexical entry for Abrams, respectively. from what i recall, these links are injected into (the AVM description of) each MRS predication during the bottom-up reconstruction of the derivation tree, i.e. as tokens, lexical entries, and constructions are being put back together deterministically by [incr tsdb()]. looking further into the export file, there are both the initial (REPP output) and internal (after chart mapping) tokenizations (in YY token serialization): < (1, 0, 1, <0:6>, 1, "Abrams", 0, "null") (2, 1, 2, <7:13>, 1, "barked", 0, "null") (3, 2, 3, <13:14>, 1, ".", 0, "null") > < (26, 0, 1, <0:6>, 1, "abrams", 0, "null") (28, 0, 1, <0:6>, 1, "abrams", 0, "null") (25, 1, 2, <7:14>, 1, "barked.", 0, "null") (27, 1, 2, <7:14>, 1, "barked.", 0, "null") > toward the bottom of the derivation tree, each lexical entry is related to a list of (internal) token identifiers and corresponding token feature structures, e.g. [...] (38 bark_v1 at v_-_le 0 1 2 ("barked." 25 [...] so far, so good (and quite straightforward). at this point, the relation between internal and initial tokens becomes a little more complex, as one initial token can be split into multiple internal tokens (as would be the case e.g. in 'New York-based', with initial tokens 'New' and 'York-based' vs. internal tokens 'New", 'York-', and 'based'); likewise, multiple initial tokens are frequently glued together (e.g. initial #2 and #3 to form internal #25 or #27). hence, one has to resort to character ranges (plus knowledge that the initial tokens are a simple sequence), to sort out these correspondences. i ended up going through this example because this kind of exact accounting through all analysis layers has at times been important to me, and i do believe there should be complete information in ERG profiles to piece things back together. but, as demonstrated in the above, this process requires looking at both layers of tokenization, the derivation tree, and identifier-linked MRSs in tandem. this kind of holistic interpretation, i suspect, remains out of scope for pyDelphin for now, in part because it requires the ability to reconstruct derivations, using the grammar. 
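purely as an illustration of that character-range bookkeeping (this is not part of any DELPH-IN tool, and the function names are invented; the token tuples are transcribed from the export above), a python sketch of recovering which initial tokens each internal token was built from could look like this:

from typing import Dict, List, Tuple

Token = Tuple[int, int, int, str]   # (identifier, from, to, form)

# initial (REPP) and internal (post chart mapping) tokens from the example
initial: List[Token] = [(1, 0, 6, "Abrams"), (2, 7, 13, "barked"), (3, 13, 14, ".")]
internal: List[Token] = [(26, 0, 6, "abrams"), (28, 0, 6, "abrams"),
                         (25, 7, 14, "barked."), (27, 7, 14, "barked.")]

def overlaps(a: Token, b: Token) -> bool:
    """True if the two character ranges share at least one position."""
    return a[1] < b[2] and b[1] < a[2]

def align(internal: List[Token], initial: List[Token]) -> Dict[int, List[int]]:
    """Map each internal token identifier to the overlapping initial tokens."""
    return {tok[0]: [ini[0] for ini in initial if overlaps(tok, ini)]
            for tok in internal}

print(align(internal, initial))   # {26: [1], 28: [1], 25: [2, 3], 27: [2, 3]}

so internal token #25 ('barked.') comes out as spanning initial tokens #2 and #3, which is exactly the kind of correspondence one needs when walking from identifier-linked MRSs back to the original input.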
i attach the complete export file, in case you wanted to look at this example more closely. best wishes, oe ps: from the available 'documentation' on alternate ways anchoring MRS predications in corresponding input elements: ;;; ;;; an attempt at generalizing over various ways of linking to the underlying ;;; input to the parser, be it by character or vertex ranges (as used at times ;;; in HoG et al.) or token identifiers (originally at YY and now in LOGON). ;;; currently, there are four distinct value formats: ;;; ;;; <0:4> character range (i.e. a sub-string of an assumed flat input); ;;; <0#2> chart vertex range (traditional in PET to some degree); ;;; <0 1 3> token identifiers, i.e. links to basic input units; ;;; <@42> edge identifier (used internally in generation) ;;; ;;; of these, the first is maybe most widely supported across DELPH-IN tools, ;;; while the second (in my view) should be deprecated. the third resembles ;;; what was used in VerbMobil, YY, and now LOGON; given that the input to a ;;; `deep' parser can always be viewed as a token lattice, this is probably the ;;; most general mode, and we should aim to establish it over time: first, the ;;; underlying input may not have been string-shaped (but come from the lattice ;;; of a speech recognizer), and second even with one underlying string there ;;; could be token-level ambiguity, so identifying the actual token used in an ;;; analysis preserves more information. properties like the sub-string range, ;;; prosodic information (VerbMobil), or pointers to KB nodes (YY) can all be ;;; associated with the individual tokens sent into the parser. finally, the ;;; fourth mode is used in generation, where surface linking actually is a two- ;;; stage process (see comments in `generate.lsp'). (4-dec-06; oe) ;;; -------------- next part -------------- A non-text attachment was scrubbed... Name: 21.gz Type: application/gzip Size: 1022 bytes Desc: not available URL: From oe at ifi.uio.no Fri Jul 17 13:57:16 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Fri, 17 Jul 2020 13:57:16 +0200 Subject: [developers] consolidating LKB and [incr tsdb()] versions Message-ID: hi john, many thanks, once more, for your continued work on the LKB, and likewise for porting the core of [incr tsdb()] to additional lisp environments! i feel we might want to try and reduce variation across different branches of the code. today, i know of the following three: (0) lkb/trunk (1) logon/lingo/lkb (2) lkb/fos i had originally created the LOGON branch of the code to (a) include some bulky pieces (e.g. language models for realization ranking and associated third-party software) that ann did not want in the LKB trunk and (b) experiment with new or revised functionality (primarily working with the ERG) without affecting the larger LKB community. i have periodically merged back LOGON revisions into the trunk, so if we were to look over #+:logon throughout the code now it should be a good indicator of either (a) or (b). as you activate some of that code in the FOS branch now, in principle we should go back and review (b)-type revisions for general, long-term use. but i suspect that, at least among the subscribers of the list, the LOGON version of the LKB has been used at least as much as the isolated trunk ... hence i am not too worried. since we branched FOS off the trunk a few years back, development in LOGON has continued, whereas i believe no recent changes have been committed to the LKB trunk. 
therefore, i would be tempted to try and merge across these two active LKB branches, and the possibly declare the current trunk a frozen ?dead end?? would you have some time to jointly work on unification of bug fixes and revisions this coming week? inasmuch as the LOGON environment is still used, i would like to incorporate your FOS improvements. and, likewise, i would want my changes (in [incr tsdb()] and maybe MRS or EDS manipulation) exposed to the FOS users. in terms of internal DELPH-IN responsibilities, the current LKB trunk used to be packaged via what we call the LinGO builds (http://lingo.delph-in.net) and the Ubuntu+LKB live CD ( https://wiki.ling.washington.edu/bin/view.cgi/Main/KnoppixLKB). both are maintained by UW, so i am not quite sure about the breadth of their user base? but i am wondering whether these UW builds could move to using the FOS code in the foreseeable future (seeing as i am proposing to officially call an end ? best wishes, oe -------------- next part -------------- An HTML attachment was scrubbed... URL: From goodman.m.w at gmail.com Sat Jul 18 07:58:28 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Sat, 18 Jul 2020 13:58:28 +0800 Subject: [developers] Infrastructure notes Message-ID: Hello, Thanks again to everyone for the interesting discussion about modernizing the DELPH-IN infrastructure at the summit. I was relatively quiet during the discussion, but I have some thoughts below regarding the mailing list. I agree it would be sad if the mailing list went away. Fortunately, there is a new version of Mailman that runs on Python 3 (see https://list.org/). Python's own mailing lists run it (here's an example of python-ideas: https://mail.python.org/archives/list/python-ideas at python.org/). It has some nice features, but I'm not really fond of the archives view compared to the dense thread view of Mailman2 (e.g., http://lists.delph-in.net/archives/developers/2020/). Maybe it can be configured to look more like this? I also wouldn't mind using the Discourse site as our mailing list manager, but the current UW installation is frequently down or not sending out emails. I believe this is an issue particular to the installation and not to Discourse itself. Beyond that, Discourse should not supplant the mailing list unless (a) it can be used entirely via plaintext email, and (b) we can import the existing list archives or find another solution for archival. I'll send a second email about moving from SVN to Git. -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From goodman.m.w at gmail.com Sat Jul 18 08:11:50 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Sat, 18 Jul 2020 14:11:50 +0800 Subject: [developers] Infrastructure notes: svn to git Message-ID: Hello, this is my second email about infrastructure changes. Since I recently performed the import of the Matrix repository from SVN to Git and had some success (and some failures), I have some suggestions. I used git-svn for this, but some points are general. Also, most points are valid regardless of the host (whether it's GitHub, GitLab, Bitbucket, your own server, etc.). - If your repository encompasses multiple projects, consider taking this opportunity to split them into separate Git repositories. Unlike SVN, a Git repository is easy to move around (on your local disk or to a new host), so there's less reason to one repo for multiple projects. 
- Use the --authors-file option to map SVN usernames to those of the destination host; e.g., if going to GitHub, map to $ username at users.noreply.github.com so their personal email is not exposed. - If your code relies on the presence of empty directories, use --preserve-empty-dirs, as Git doesn't keep empty directories (the option places a dummy file in each empty dir). - If you're *moving* to Git and not mirroring, look into changing --prefix to avoid ambiguous branch names (the default sets up the SVN repo like a remote repository at origin/, which is also used when cloning from other remotes, I think). - Use --stdlayout if your SVN repo has the normal branches/, tags/, and trunk/ split (otherwise use -b, -t, and -T to set these separately). It will recreate the repo with Git's more efficient branching model than creating subdirectories as in SVN. - After the import (especially when moving and not mirroring), create a tag that points to the last commit from SVN. This is mainly in case you later wish to see the state of the repository before the move. - DON'T delete the Git repo you create in this way after it's pushed somewhere else. It contains metadata, which doesn't get pushed to the remote, for reconnecting to the SVN repo. If the SVN repo has new commits after the import, you'll need that metadata to apply them on top of the Git repo (maybe it's possible without the metadata, but I couldn't figure it out). I also have some suggestions for migrating a Trac instance to GitHub Issues, if anyone is dealing with that, but I won't send a separate email unless people are interested. -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From olzama at uw.edu Sat Jul 18 09:59:02 2020 From: olzama at uw.edu (Olga Zamaraeva) Date: Sat, 18 Jul 2020 00:59:02 -0700 Subject: [developers] Infrastructure notes In-Reply-To: References: Message-ID: Hi Michael, > Discourse should not supplant the mailing list unless [...] (b) we can import the existing list archives or find another solution for archival. When I started advocating for a Discourse website---primarily because it is easier to organize figures and code and to mark specific posts as being the solutions to specific questions, which greatly improves discoverability; so, very much not plain text---I was certainly operating under the assumption that such an import is possible ( https://meta.discourse.org/t/importing-mailing-lists-mbox-listserv-google-groups-emails/79773 ). On Fri, Jul 17, 2020 at 10:59 PM goodman.m.w at gmail.com < goodman.m.w at gmail.com> wrote: > Hello, > > Thanks again to everyone for the interesting discussion about modernizing > the DELPH-IN infrastructure at the summit. I was relatively quiet during > the discussion, but I have some thoughts below regarding the mailing list. > > I agree it would be sad if the mailing list went away. Fortunately, there > is a new version of Mailman that runs on Python 3 (see https://list.org/). > Python's own mailing lists run it (here's an example of python-ideas: > https://mail.python.org/archives/list/python-ideas at python.org/). It has > some nice features, but I'm not really fond of the archives view compared > to the dense thread view of Mailman2 (e.g., > http://lists.delph-in.net/archives/developers/2020/). Maybe it can be > configured to look more like this? > > I also wouldn't mind using the Discourse site as our mailing list manager, > but the current UW installation is frequently down or not sending out > emails. 
I believe this is an issue particular to the installation and not > to Discourse itself. Beyond that, Discourse should not supplant the mailing > list unless (a) it can be used entirely via plaintext email, and (b) we can > import the existing list archives or find another solution for archival. > > I'll send a second email about moving from SVN to Git. > > -- > -Michael Wayne Goodman > -- Olga Zamaraeva -------------- next part -------------- An HTML attachment was scrubbed... URL: From goodman.m.w at gmail.com Sat Jul 18 11:13:50 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Sat, 18 Jul 2020 17:13:50 +0800 Subject: [developers] Infrastructure notes In-Reply-To: References: Message-ID: On Sat, Jul 18, 2020 at 3:59 PM Olga Zamaraeva wrote: > > Hi Michael, > > > Discourse should not supplant the mailing list unless [...] (b) we can import the existing list archives or find another solution for archival. > > When I started advocating for a Discourse website---primarily because it is easier to organize figures and code and to mark specific posts as being the solutions to specific questions, which greatly improves discoverability; so, very much not plain text---I was certainly operating under the assumption that such an import is possible ( https://meta.discourse.org/t/importing-mailing-lists-mbox-listserv-google-groups-emails/79773). Thanks, Olga. To clarify on that point, I appreciate and use the features of the web interface, but if the Discourse instance is to replace the mailing list, it should *also* be fully usable by email (by "fully" I mean that people can follow and participate in the conversation as if it were email; not including web-only features like thread tagging or marking posts as solutions). The formatting in the posts, being Markdown, should be legible in plaintext email (although HTML mail is pretty standard these days; maybe I'm not being "modern" enough and should relax that point :). > > On Fri, Jul 17, 2020 at 10:59 PM goodman.m.w at gmail.com < goodman.m.w at gmail.com> wrote: >> >> Hello, >> >> Thanks again to everyone for the interesting discussion about modernizing the DELPH-IN infrastructure at the summit. I was relatively quiet during the discussion, but I have some thoughts below regarding the mailing list. >> >> I agree it would be sad if the mailing list went away. Fortunately, there is a new version of Mailman that runs on Python 3 (see https://list.org/). Python's own mailing lists run it (here's an example of python-ideas: https://mail.python.org/archives/list/python-ideas at python.org/). It has some nice features, but I'm not really fond of the archives view compared to the dense thread view of Mailman2 (e.g., http://lists.delph-in.net/archives/developers/2020/). Maybe it can be configured to look more like this? >> >> I also wouldn't mind using the Discourse site as our mailing list manager, but the current UW installation is frequently down or not sending out emails. I believe this is an issue particular to the installation and not to Discourse itself. Beyond that, Discourse should not supplant the mailing list unless (a) it can be used entirely via plaintext email, and (b) we can import the existing list archives or find another solution for archival. >> >> I'll send a second email about moving from SVN to Git. >> >> -- >> -Michael Wayne Goodman > > > > -- > Olga Zamaraeva -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From arademaker at gmail.com Tue Jul 21 03:46:46 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Mon, 20 Jul 2020 22:46:46 -0300 Subject: [developers] semantic representations in RDF In-Reply-To: References: Message-ID: Hi Stephan, By current interface I mean the one I was able to run in my local machine taking the current version of the code in: http://svn.delph-in.net/wsi/trunk Documentation of the query language WQL in http://alt.qcri.org/semeval2015/task18/index.php?id=search is not clear about the operators vs format they support. I was expecting that regex would work in the predicates of EDS or MRS. So a query `x: _fight*[ARG* y]` could match a sentence with a predicate `_fight_v_1`. Emily, You mentioned that you have an instance of the wsearch interface running too. Are you using the same code of the repository above? Do you know about any update/branch of this code? I am planning to work on: 1. New code (not java based) for transform the semantic representations to RDF 2. New code (not java based) to transform WQL to SPARQL. Best, Alexandre > On 13 Jul 2020, at 21:25, Stephan Oepen wrote: > > hi alexandre, > >> 6) The query language (WQL) documented in http://alt.qcri.org/semeval2015/task18/index.php?id=search and [w4] is not working in the current version of the interface: >> >> Accept => x: _* [ARG* x] >> Reject => x: _fight* [ARG* x] >> Reject => /v[ARG* x] >> Reject => +dog > > what do you actually consider the 'current interface' in this context? > the WQL documentation you reference is from the SDP shared task, so > you would have to try those queries against one of the bi-lexical > formats (e.g. DM :-): the '/' (PoS) and '+' (lemma) operators are only > defined for SDP graphs, i suspect. also, 'v' is not a valid PoS value > (but 'v*' seems to work): > > http://wesearch.delph-in.net/sdp/search.jsp > > see you tomorrow! oe From oe at ifi.uio.no Tue Jul 21 15:48:13 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Tue, 21 Jul 2020 15:48:13 +0200 Subject: [developers] More ERG/Redwoods issues In-Reply-To: References: Message-ID: hi again, mike, > Regarding my second point about unexpected characters in SimpleMRS strings, I tried making PyDelphin more robust to these situations even though I think they should be deemed invalid, but there are some that are simply irredeemable: > > _+-]\?[/NN_u_unknown_rel"<12:18> (wlb03) > > The ] initially threw me off, but even worse is the " after _rel (I included the <12:18> here just for context; note that there is no " at the start of this predicate so this is not a string predicate). I'm not sure how it got there. Maybe an ACE/LKB serialization error? > In addition, I found a problem with a CARG in ws213: > > [ named<37:41> LBL: h16 CARG: "NP\S"" ARG0: x12 ] > > Note that there are two quotation marks at the end of the CARG value. The item it comes from is 1000008400480, which does not have " following NP\S. (The i-input is: This complex category is notated as (NP\\S) instead of V.) i am copying woodley, because the MRSs you are reading most likely come from FFTB (i am also adding the 'developers' list, as surely most folks care about these corner cases). token mapping will allow the grammar to put virtually any character into its predicates, and by and large i would say rightly so (even if not all of the predicate and CARG examples in the above may ultimately be desirable :-). 
thus, MRS serialization may need to be sensitive to different escaping conventions we have (or may yet have to establish), as i have tried to summarize in our related M$ GitHub issue: https://github.com/delph-in/pydelphin/issues/302 > _output_string(?hello/JJ_u_unknown (ws202) > _employee_name/NN_u_unknown (ws203) > > There are _ characters inside the lemma portion of the predicates, which is not allowed. I don't recall if we came up with a scheme for encoding literal underscores in lemmas. yes, i agree token mapping should not construct these predicates! the immediate solution that comes to my mind would be to backslash-escape underscores in the lemma (and sense) fields, which i believe would then bring along escaping of literal backslashes, i.e. in your first example: _output\_string(?hello/JJ_u_unknown. but before guarding against these invalid predicates in token mapping, it would be good to push a little further in terms of cross-platform agreement on these fine points of (simple) MRS serialization. best wishes, oe From goodman.m.w at gmail.com Tue Jul 21 17:50:07 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Tue, 21 Jul 2020 23:50:07 +0800 Subject: [developers] More ERG/Redwoods issues In-Reply-To: References: Message-ID: Thanks, Stephan, On Tue, Jul 21, 2020 at 9:48 PM Stephan Oepen wrote: > [...] > token mapping will allow the grammar to put virtually any character > into its predicates, and by and large i would say rightly so (even if > not all of the predicate and CARG examples in the above may ultimately > be desirable :-). thus, MRS serialization may need to be sensitive to > different escaping conventions we have (or may yet have to establish), > as i have tried to summarize in our related M$ GitHub issue: > > https://github.com/delph-in/pydelphin/issues/302 > > I'm merging some of the more general discussion from the linked GitHub issue to this thread. Regarding the PredicateRfc wiki, I don't think we should read it too literally, as it was not written with the level of rigor as we put into, e.g., the [TdlRfc](http://moin.delph-in.net/TdlRfc) page, and I'd call it more descriptive than prescriptive. But we certainly could improve it to be such a reference document. Regarding the shape of predicates, we need to separate our design considerations for the predicate symbols themselves from any constraints of a particular serialization format, as they may be used, unquoted, in other formats beyond SimpleMRS (e.g. EDS 'native' format, PENMAN, Indexed MRS, etc.) which may have different sets of valid and invalid characters. In an earlier thread we established that predicates of some different forms are equivalent if they differ only along these dimensions: * upper/lower case distinctions (_predicate_n_1 == _PREDICATE_n_1) * surrounding quotes (_predicate_n_1 == "_predicate_n_1") * presence of _rel suffix (_predicate_n_1 == _predicate_n_1_rel) (Aside: I'm not fond of the last one because of the ambiguity with _rel as a sense field (place_n == place_n_rel?); I'd argue for *requiring* that any _rel suffix (that isn't a sense) be removed for grammar-external ("exported") MRSs) I think we can go further and say that quoted predicates are not even part of the spec for predicates; rather, they are an encoding scheme used by several serialization formats for predicates that cannot legally be encoded otherwise. At least, this could be true for exported MRSs. I recognize the historical purpose of quoted predicates for those that don't have a type defined in the grammar. 
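To make those equivalences concrete, here is a small sketch (this is not the PyDelphin implementation; the helper name is made up, and it deliberately ignores the place_n_rel ambiguity from the aside) that reduces a predicate symbol to a canonical comparison form:

def normalize_predicate(pred: str) -> str:
    """Reduce a predicate symbol to a canonical form for comparison."""
    pred = pred.strip()
    if len(pred) >= 2 and pred.startswith('"') and pred.endswith('"'):
        pred = pred[1:-1]            # surrounding quotes are not significant
    pred = pred.lower()              # case is not significant
    if pred.endswith("_rel"):
        pred = pred[:-len("_rel")]   # a trailing _rel suffix is not significant
    return pred

assert normalize_predicate('_PREDICATE_n_1') == '_predicate_n_1'
assert normalize_predicate('"_predicate_n_1"') == '_predicate_n_1'
assert normalize_predicate('_predicate_n_1_rel') == '_predicate_n_1'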
Other serialization formats may use other schemes. In JSON, for instance, predicates are always quoted and they follow JSON escaping conventions. The XML formats allow for "real predicates" that separate the lemma, pos, and sense fields, but they are still bound by XML's encoding conventions. > _output_string(?hello/JJ_u_unknown (ws202) > > _employee_name/NN_u_unknown (ws203) > > > > There are _ characters inside the lemma portion of the predicates, which > is not allowed. I don't recall if we came up with a scheme for encoding > literal underscores in lemmas. > > yes, i agree token mapping should not construct these predicates! the > immediate solution that comes to my mind would be to backslash-escape > underscores in the lemma (and sense) fields, which i believe would > then bring along escaping of literal backslashes, i.e. in your first > example: _output\_string(?hello/JJ_u_unknown. > I have a slight dispreference for backslash-escaping literal underscores, because it complicates parsing. We could no longer simply split on _ characters to get the components, and must parse the predicates character-by-character to determine if the \ that precedes _ is itself escaped, etc. TSDB's strategy might work, using \s or similar. We'd still need to parse it to get the original form, but we can just split on _ to get the individual components. > but before guarding against these invalid predicates in token mapping, > it would be good to push a little further in terms of cross-platform > agreement on these fine points of (simple) MRS serialization. > > best wishes, oe > -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From arademaker at gmail.com Tue Jul 21 18:09:37 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Tue, 21 Jul 2020 13:09:37 -0300 Subject: [developers] First step for the clone of FFTB SVN in GitHub Message-ID: https://github.com/arademaker/treebank This was created with the commands below, following the step-by-step guide at http://www.sailmaker.co.uk/blog/2013/05/05/migrating-from-svn-to-git-preserving-branches-and-tags-3, which is consistent with the git svn documentation.

mkdir treebank
cd treebank
git svn init http://sweaglesw.org/svn/treebank --stdlayout --prefix=svn/
for tag in `git branch -r | grep "tags/" | sed 's/ tags\///'`; do git branch $tag refs/remotes/$tag; done
git svn fetch

The SVN repository (http://sweaglesw.org/svn/treebank/) didn't contain branches, so I have created branches for the two tags I found: `foo` and `packard-2015`. Note that these tags do not look very interesting and maybe they could be removed in the SVN repository. That would make the process even simpler, tracking only the trunk branch. With the repository ready, I created the GitHub repository and pushed to it:

git push -u origin master svn/tags/foo svn/tags/packard-2015

We don't have automation yet; if Woodley updates his SVN, all I need to do is:

git svn fetch
git push --all -u origin

The first command retrieves the news from SVN to my local machine; the second pushes them to GitHub. Eventually, a new tag or branch may be created by Woodley, in which case I may also need to create git branches for them before pushing to GitHub. Since FFTB is not updated very often, I feel the manual solution can work for now, but I wonder whether there is a way to be informed of SVN changes. Does anyone know of a way to subscribe to SVN changes? Hi Michael and Stephan, any comments about the solution above?
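As a stop-gap for the notification question above, one option is simply to poll the repository. The sketch below is purely illustrative (only the repository URL is taken from this message; the function names and the polling interval are invented): it asks `svn info` for the youngest revision and reports whenever the number changes, at which point a `git svn rebase` and push would be due.

import re
import subprocess
import time

REPO = "http://sweaglesw.org/svn/treebank"

def latest_revision(url: str) -> int:
    """Return the youngest revision of a remote SVN repository."""
    out = subprocess.run(["svn", "info", url], capture_output=True,
                         text=True, check=True).stdout
    return int(re.search(r"^Revision:\s*(\d+)", out, re.MULTILINE).group(1))

def watch(url: str, interval: int = 3600) -> None:
    """Print a note whenever the repository gains new revisions."""
    seen = latest_revision(url)
    while True:
        time.sleep(interval)
        current = latest_revision(url)
        if current > seen:
            print(f"{url} moved from r{seen} to r{current}")
            seen = current

if __name__ == "__main__":
    watch(REPO)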
Michael is the only user in the https://github.com/delph-in organization? Can you add me so I can move this repository to the delph-in org? Best, Alexandre From oe at ifi.uio.no Tue Jul 21 20:57:34 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Tue, 21 Jul 2020 20:57:34 +0200 Subject: [developers] semantic representations in RDF In-Reply-To: References: Message-ID: hi again, alexandre: > By current interface I mean the one I was able to run in my local machine taking the current version of the code in: > > http://svn.delph-in.net/wsi/trunk i see, i had not realized you had gotten so far as to run your own WSI instance ... congratulations on that milestone! > Documentation of the query language WQL in http://alt.qcri.org/semeval2015/task18/index.php?id=search is not clear about the operators vs format they support. yes, in fact there is no complete documentation of the WQL syntax and of which operators are restricted to which formats. the above page (the closest we come to WQL documentation, i believe) is from the SDP shared tasks, hence only applies to the bi-lexical frameworks (DM, PAS, PSD, and CCD). > I was expecting that regex would work in the predicates of EDS or MRS. So a query `x: _fight*[ARG* y]` could match a sentence with a predicate `_fight_v_1`. yes, that type of wildcarding should indeed be applicable to pretty much any query elements and graph formats. your example query works on the DeepBank index for ESD: http://wesearch.delph-in.net/deepbank/ it does not match any results when searching the DeepBank MRSs, however. that is because WQL variables in an MRS index are (interpreted as if) typed using the standard MRS conventions, i.e. there is no predication whose label is of type 'x' and where there is some argument of type 'y'. if works if you modify the query to comply with MRS types: 'h:_fight_*[ARG* x]'. > You mentioned that you have an instance of the wsearch interface running too. Are you using the same code of the repository above? Do you know about any update/branch of this code? i believe UW is not currently running their own WSI instance, because they worry that index performance inside a virtual machine might not scale favorably. the improvements made by the UW MSc student are in the WSI trunk, so you (unlike me) are using the latest and greatest :-). $ svn log http://svn.delph-in.net/wsi/trunk |head ------------------------------------------------------------------------ r27878 | rpearah at uw.edu | 2019-05-25 20:30:59 +0200 (Sat, 25 May 2019) | 1 line chore: ? Add missing dependencies to pom.xml ------------------------------------------------------------------------ r27877 | rpearah at uw.edu | 2019-05-25 20:30:54 +0200 (Sat, 25 May 2019) | 1 line style: ? Some minor style changes to MRS representation ------------------------------------------------------------------------ r27804 | rpearah at uw.edu | 2019-05-15 23:08:31 +0200 (Wed, 15 May 2019) | 1 line > 1. New code (not java based) for transform the semantic representations to RDF > 2. New code (not java based) to transform WQL to SPARQL. yes, the first of these is also something i have been meaning to do natively in lisp, i.e. export directly to RDF, rather than export to those [incr tsdb()] ASCII files and then parse these in java, to convert to RDF. i believe we should have turtle 'ontologies' (or schemas, if you will) for the various RDF representations, i.e. at least MRS, EDS, and DM. 
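just to make the non-java route concrete, a minimal sketch with rdflib could look like the following (nothing here is the actual WSI schema: the namespace IRI, the graph name, and the node data are all invented, loosely following the EDS example from earlier in this thread):

from rdflib import Dataset, Literal, Namespace, URIRef

EDS = Namespace("http://example.org/schema/eds#")          # hypothetical schema IRI

ds = Dataset()
item = URIRef("http://example.org/profile/item/1")         # one named graph per item
graph = ds.graph(item)

node = URIRef("http://example.org/profile/item/1#x3")      # node IRI scoped to the item
graph.add((node, EDS.predicate, Literal("_dog_n_1")))
graph.add((node, EDS.cfrom, Literal(4)))
graph.add((node, EDS.cto, Literal(8)))

print(ds.serialize(format="nquads"))

note that scoping the node IRI to the item, as in the sketch, would also address the earlier observation that a bare identifier like 'x3' is conceptually a different node in every sentence of the corpus.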
i am tempted to migrate the WSI code from SVN to M$ GitHub, and then we could maybe collect these schemas there, and you could look into generating the RDF serializations without java? as for the second, the WQL parser is fairly tightly integrated with the web application and RDF back-end ... here i am not as sure that isolating just the parser will be worthwhile? i take it you are about as eager a java person as i am :-)? best wishes, oe From arademaker at gmail.com Tue Jul 21 22:31:55 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Tue, 21 Jul 2020 17:31:55 -0300 Subject: [developers] semantic representations in RDF In-Reply-To: References: Message-ID: Hi Stephan, Thank you for your attention on that thread. I am afraid that we should have more differences between the code running in http://wesearch.delph-in.net/deepbank/search.jsp and the code in the SVN repository http://svn.delph-in.net/wsi/trunk/ that I compile and it is running in my local machine following the steps in http://moin.delph-in.net/WeSearch/Interface. I am attaching the two SPARQL produced by the same search string `x: _fi*[ARG* y]`. In both cases, the query was submitted to the EDS representations. -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: sparql-wesearch.txt URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: sparql-local.txt URL: -------------- next part -------------- Note how in the local instance, the pattern `_fi*` is transformed into an enumeration of the predicates found in the dataset: { ?100 eds:predicate "_fight_n_1"^^xsd:string } UNION { ?100 eds:predicate "_fight_v_1"^^xsd:string } But in the SPARQL on the delph-in.net server, the pattern is transformed into a regex filter regex(?100TEXT, "^_fi.*$?) The same happens when I submitted the query `h:_fi*[ARG* x]` to the MRS representations. For the SVN to Git, if you agree, I can repeat the process that I executed to FFTB (reported in another email today) to create a git clone from the WSI SVN. Maybe if Michael add me in the Delphin-in organization I can already create the repository there. Yes, I agree that we can have ontologies/vocabularies defined to each representation and I could work on that. We could take as starting point the discussion at http://moin.delph-in.net/WeSearch/Rdf, right? There are some notes in the end of the page http://moin.delph-in.net/ErgWeSearch too. But first, I want to understand what updates we have from the 2015 SDP shared-task data formats and the current work you are doing in the http://mrp.nlpl.eu/2020/index.php and https://github.com/cfmrp/mtool. EDS is one particular format that can be described in MRP format, right? We also have the SDP tabular format, does it make sense to support all these formats? If you prefer, we can schedule a call for sync on the goals and possible approaches. For code, yes, I don?t like Java. It would be nice to take the opportunity to better understand the Lisp code embedded in the LKB, TSDB and some other packages in the LOGON repository. The only changed that I made in the code so far is shown below. I am also using apache-jena-3.15.0, the last version of Jena. 
% svn diff Index: src/common-gui/src/main/webapp/WEB-INF/web.xml =================================================================== --- src/common-gui/src/main/webapp/WEB-INF/web.xml (revision 28808) +++ src/common-gui/src/main/webapp/WEB-INF/web.xml (working copy) @@ -18,7 +18,7 @@ no.uio.ifi.wsi.gui.SearchInterface DATA_PATH - /ltg/ls/aserve/indices/sdp/ + /Users/ar/hpsg/text-entailment/data/ 1 Index: src/rdf-generator/src/main/java/no/uio/ifi/wsi/generator/CreateIndex.java =================================================================== --- src/rdf-generator/src/main/java/no/uio/ifi/wsi/generator/CreateIndex.java (revision 28808) +++ src/rdf-generator/src/main/java/no/uio/ifi/wsi/generator/CreateIndex.java (working copy) @@ -27,8 +27,8 @@ CountIndexGenerator generator = new CountIndexGenerator(cmlReader.getCountDirectory()); generator.index(cmlReader.getRdfDirectory()); generator.writeCache(); - runProcess(new String[] { "apache-jena-2.11.0/bin/tdbloader2", "--loc", cmlReader.getTdbDirectory() + "/1", - cmlReader.getRdfDirectory() + "/*" }); + runProcess(new String[] { "apache-jena/bin/tdbloader2", "--loc", cmlReader.getTdbDirectory() + "1", + cmlReader.getRdfDirectory() + "1.nq" }); } public static void runProcess(String[] command) throws Exception { Best, Alexandre > On 21 Jul 2020, at 15:57, Stephan Oepen wrote: > > hi again, alexandre: > >> By current interface I mean the one I was able to run in my local machine taking the current version of the code in: >> >> http://svn.delph-in.net/wsi/trunk > > i see, i had not realized you had gotten so far as to run your own WSI > instance ... congratulations on that milestone! > >> Documentation of the query language WQL in http://alt.qcri.org/semeval2015/task18/index.php?id=search is not clear about the operators vs format they support. > > yes, in fact there is no complete documentation of the WQL syntax and > of which operators are restricted to which formats. the above page > (the closest we come to WQL documentation, i believe) is from the SDP > shared tasks, hence only applies to the bi-lexical frameworks (DM, > PAS, PSD, and CCD). > >> I was expecting that regex would work in the predicates of EDS or MRS. So a query `x: _fight*[ARG* y]` could match a sentence with a predicate `_fight_v_1`. > > yes, that type of wildcarding should indeed be applicable to pretty > much any query elements and graph formats. your example query works > on the DeepBank index for ESD: > > http://wesearch.delph-in.net/deepbank/ > > it does not match any results when searching the DeepBank MRSs, > however. that is because WQL variables in an MRS index are > (interpreted as if) typed using the standard MRS conventions, i.e. > there is no predication whose label is of type 'x' and where there is > some argument of type 'y'. if works if you modify the query to comply > with MRS types: 'h:_fight_*[ARG* x]'. > >> You mentioned that you have an instance of the wsearch interface running too. Are you using the same code of the repository above? Do you know about any update/branch of this code? > > i believe UW is not currently running their own WSI instance, because > they worry that index performance inside a virtual machine might not > scale favorably. the improvements made by the UW MSc student are in > the WSI trunk, so you (unlike me) are using the latest and greatest > :-). 
> > $ svn log http://svn.delph-in.net/wsi/trunk |head > ------------------------------------------------------------------------ > r27878 | rpearah at uw.edu | 2019-05-25 20:30:59 +0200 (Sat, 25 May 2019) | 1 line > > chore: ? Add missing dependencies to pom.xml > ------------------------------------------------------------------------ > r27877 | rpearah at uw.edu | 2019-05-25 20:30:54 +0200 (Sat, 25 May 2019) | 1 line > > style: ? Some minor style changes to MRS representation > ------------------------------------------------------------------------ > r27804 | rpearah at uw.edu | 2019-05-15 23:08:31 +0200 (Wed, 15 May 2019) | 1 line > >> 1. New code (not java based) for transform the semantic representations to RDF >> 2. New code (not java based) to transform WQL to SPARQL. > > yes, the first of these is also something i have been meaning to do > natively in lisp, i.e. export directly to RDF, rather than export to > those [incr tsdb()] ASCII files and then parse these in java, to > convert to RDF. i believe we should have turtle 'ontologies' (or > schemas, if you will) for the various RDF representations, i.e. at > least MRS, EDS, and DM. i am tempted to migrate the WSI code from SVN > to M$ GitHub, and then we could maybe collect these schemas there, and > you could look into generating the RDF serializations without java? > > as for the second, the WQL parser is fairly tightly integrated with > the web application and RDF back-end ... here i am not as sure that > isolating just the parser will be worthwhile? i take it you are about > as eager a java person as i am :-)? > > best wishes, oe From oe at ifi.uio.no Tue Jul 21 22:48:45 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Tue, 21 Jul 2020 22:48:45 +0200 Subject: [developers] semantic representations in RDF In-Reply-To: References: Message-ID: > For the SVN to Git, if you agree, I can repeat the process that I executed to FFTB (reported in another email today) to create a git clone from the WSI SVN. Maybe if Michael add me in the Delphin-in organization I can already create the repository there. actually, please let me manage this migration. rather than putting a git front-end on top of the current SVN repository, i am tempted to use WSI as a full migration pilot, i.e. dump the complete repository history from SVN, import all of it into a fresh git repository, and then host that on M$ GitHub. that way, i can hope to reduce the importance of the DELPH-IN SubVersioN server over time ... cheers, oe ps: regarding differences between your local WSI instance and the one at 'wesearch.delph-in.net': that is quite possible: i am currently running an older version of the code (because, as i confessed during the summit, i have yet to work out how to re-build the application with the latest patches from UW and load that into my tomcat; it appears that you are getting the lucene first-line line, to expand queries and avoid regular expression matching on the triple store, whereas i seem to not be getting that in the cases you observe). 
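for illustration only (this is not the WSI code; the function, variable, and prefix names are invented), the two query shapes at play here can be sketched in a few lines of python: when the string index can expand a wildcard such as '_fight*' into a small set of known predicates, emit a UNION of exact matches, and otherwise fall back to a regular expression filter:

from typing import List, Optional

def predicate_pattern(var: str, pattern: str,
                      expansion: Optional[List[str]] = None) -> str:
    """Return a SPARQL fragment matching ?var against a predicate pattern."""
    if expansion:   # small expansion set obtained from the string index
        return " UNION ".join(
            f'{{ ?{var} eds:predicate "{p}"^^xsd:string }}' for p in expansion)
    # no (or too large an) expansion: match with a regular expression instead
    regex = "^" + pattern.replace("*", ".*") + "$"
    return (f'?{var} eds:predicate ?{var}TEXT . '
            f'FILTER regex(?{var}TEXT, "{regex}")')

print(predicate_pattern("100", "_fight*", ["_fight_n_1", "_fight_v_1"]))
print(predicate_pattern("100", "_fi*"))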
From oe at ifi.uio.no Tue Jul 21 23:44:53 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Tue, 21 Jul 2020 23:44:53 +0200 Subject: [developers] semantic representations in RDF In-Reply-To: References: Message-ID: > Note how in the local instance, the pattern `_fi*` is transformed into an enumeration of the predicates found in the dataset: > > { ?100 eds:predicate "_fight_n_1"^^xsd:string } UNION { ?100 eds:predicate "_fight_v_1"^^xsd:string } > > But in the SPARQL on the delph-in.net server, the pattern is transformed into a regex filter > > regex(?100TEXT, "^_fi.*$?) actually, this kind of expansion (a query optimization, using a first-line lucene index of known strings) appears to be sensitive to the size of the expansion set. i can confirm that (on the reference WSI instance) '_fi*' is matched using a (slow) regular expression (filter), whereas '_fight*' gets expanded; see the attachment. presumably you just have a smaller index in your local instance? the original WSI developer was an experienced enterprise coder, so i am not surprised (but impressed) he implemented it this way: presumably there is a tipping point in efficiency by querying with a disjunction of specific strings vs. filtering candidate matches using a regular expression ... cheers, oe -------------- next part -------------- A non-text attachment was scrubbed... Name: Screenshot from 2020-07-21 23-41-45.png Type: image/png Size: 318658 bytes Desc: not available URL: From arademaker at gmail.com Wed Jul 22 04:50:52 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Tue, 21 Jul 2020 23:50:52 -0300 Subject: [developers] semantic representations in RDF In-Reply-To: References: Message-ID: <31C42B7C-0457-47F2-A97A-1F12ABDC4E30@gmail.com> Sure, even better having the full migration to git/GitHub! Great, I will be waiting you. Regarding the lucene first-line search, thank you for clarifying what is going on. It is really hard to understand the Java code, the logic is spread in many different Java files under a deep nesting of folders! ?Object-oriented programming is an exceptionally bad idea which could only have originated in California.? ? Edsger W. Dijkstra ;-) But isn?t it a kind of optimisation that we should expect from the triple store. I will make some tests with Allegro Graph. Best, Alexandre > On 21 Jul 2020, at 17:48, Stephan Oepen wrote: > >> For the SVN to Git, if you agree, I can repeat the process that I executed to FFTB (reported in another email today) to create a git clone from the WSI SVN. Maybe if Michael add me in the Delphin-in organization I can already create the repository there. > > actually, please let me manage this migration. rather than putting a > git front-end on top of the current SVN repository, i am tempted to > use WSI as a full migration pilot, i.e. dump the complete repository > history from SVN, import all of it into a fresh git repository, and > then host that on M$ GitHub. that way, i can hope to reduce the > importance of the DELPH-IN SubVersioN server over time ... 
> > cheers, oe > > ps: regarding differences between your local WSI instance and the one > at 'wesearch.delph-in.net': that is quite possible: i am currently > running an older version of the code (because, as i confessed during > the summit, i have yet to work out how to re-build the application > with the latest patches from UW and load that into my tomcat; it > appears that you are getting the lucene first-line line, to expand > queries and avoid regular expression matching on the triple store, > whereas i seem to not be getting that in the cases you observe). From goodman.m.w at gmail.com Wed Jul 22 15:51:42 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Wed, 22 Jul 2020 21:51:42 +0800 Subject: [developers] First step for the clone of FFTB SVN in GitHub In-Reply-To: References: Message-ID: Hi Alexandre, On Wed, Jul 22, 2020 at 12:10 AM Alexandre Rademaker wrote: > mkdir treebank > cd treebank > git svn init http://sweaglesw.org/svn/treebank --stdlayout --prefix=svn/ > for tag in `git branch -r | grep "tags/" | sed 's/ tags\///'`; do git > branch $tag refs/remotes/$tag; done > git svn fetch > This looks more or less right, but you didn't link the SVN author(s) to GitHub accounts using --authors-file. Also I think it makes more sense to create Git tags for the SVN tags instead of branches. I also tried importing this repo using GitHub's importer ( https://docs.github.com/en/github/importing-your-projects-to-github/about-github-importer) and it worked great. It's very simple and results in a Git-like repository, but unfortunately it does not keep the SVN tracking information needed for proper mirroring. It would work better for a one-time import. So I suggest recreating your repo as follows. First create the authors file: cat > authors.txt < (no author) = (no author) <(no author)> EOF Then clone (does init and fetch in one command): git svn clone http://sweaglesw.org/svn/treebank --stdlayout --prefix=svn/ --authors-file=authors.txt Then convert remote tag-branches to Git tags: cd treebank git for-each-ref --format="%(refname:lstrip=-1) %(objectname)" refs/remotes/svn/tags | while read ref; do git tag $ref; done And replace the 'master' branch with 'trunk': git checkout -b trunk git branch -d master > The SVN repository (http://sweaglesw.org/svn/treebank/) didn?t contain > branches, so I have created branches for the two tags I found: `foo` and > `packard-2015`. Note that these tags do not look very interesting and maybe > they could be removed in the SVN repository. That would make the process > even simpler, tracking only the trunk branch. > You're probably right about 'foo', but the 'packard-2015' tag points to the revision used for Woodley's thesis and IWCS 2015 paper so I don't think we should discard that one. But maybe it doesn't matter for a mirror since it still exists in the SVN repo? We don?t have automation yet, if Woodley update its SVN, all I need to do > is: > > git svn fetch > git push --all -u origin > When you do `git svn fetch`, it retrieves the commits from the remote SVN repository but doesn't incorporate them into your local tree. Use `git svn rebase` instead. > > Hi Michael and Stephan, any comments about the solution above? Michael is > the only user in the https://github.com/delph-in organization? Can you > add me so I can move this repository to the delph-in org? > Francis invited you to be a member of the organization. 
I just updated that invitation so you'd be added to the "Sweagles" team, which has admin rights to the delph-in/FFTB repository I created. Once you accept the invitation, you can do the following to push your repo: git remote add origin https://github.com/delph-in/FFTB.git git push -u origin --all git push -u origin --tags (I called the remote repo FFTB instead of treebank because it's more distinctive and recognizable, I think.) -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From arademaker at gmail.com Wed Jul 22 16:26:05 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Wed, 22 Jul 2020 11:26:05 -0300 Subject: [developers] First step for the clone of FFTB SVN in GitHub In-Reply-To: References: Message-ID: +1. I agree with Stephan. I have chosen treebank mainly to keep the same name of the original Woodley repository. But I agree with `fftb` (lowercase). Michael, can you give me admin access to https://github.com/delph-in/FFTB? So I can have more flexibility. I will answer the other email from Michael next. Best, Alexandre > On 22 Jul 2020, at 11:17, Stephan Oepen wrote: > >> (I called the remote repo FFTB instead of treebank because it's more distinctive and recognizable, I think.) > > with my over-developed sense of aesthetics, i am wondering about the > use of capitalization. there are 'erg', 'jacy', 'pydelphin', etc. > from before (and the 'JaEn' exception, but francis is of course > special). should the DELPH-IN organization possibly standardize on > all-lowercase repository names? > > oe From goodman.m.w at gmail.com Wed Jul 22 16:57:07 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Wed, 22 Jul 2020 22:57:07 +0800 Subject: [developers] First step for the clone of FFTB SVN in GitHub In-Reply-To: References: Message-ID: Done. fftb it is. On Wed, Jul 22, 2020 at 10:27 PM Alexandre Rademaker wrote: > > +1. I agree with Stephan. I have chosen treebank mainly to keep the same > name of the original Woodley repository. But I agree with `fftb` > (lowercase). Michael, can you give me admin access to > https://github.com/delph-in/FFTB? So I can have more flexibility. > You have admin access because you're in the "Sweagles" team. It may have dropped momentarily when I renamed the repo, but I've re-added it. Try again? > > I will answer the other email from Michael next. > > Best, > Alexandre > > > > On 22 Jul 2020, at 11:17, Stephan Oepen wrote: > > > >> (I called the remote repo FFTB instead of treebank because it's more > distinctive and recognizable, I think.) > > > > with my over-developed sense of aesthetics, i am wondering about the > > use of capitalization. there are 'erg', 'jacy', 'pydelphin', etc. > > from before (and the 'JaEn' exception, but francis is of course > > special). should the DELPH-IN organization possibly standardize on > > all-lowercase repository names? > As long as we don't have to write emails in all-lowercase, then I don't mind :) -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From arademaker at gmail.com Sat Jul 25 03:06:23 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Fri, 24 Jul 2020 22:06:23 -0300 Subject: [developers] semantic representations in RDF In-Reply-To: References: Message-ID: Hi, I have created a repository for sharing examples and develop the RDF schemas for all DELPH-IN representations. Comments are welcome. This is just initial (and incomplete) ideas. 
I started with one example of EDS, the RDF original representation (produced by the WSI code) and my suggestions to fix and simplify the schema. I could have used the wiki for this discussion, but I thought it would be interesting to try the GitHub features: issues, PR, online editing of files etc. Latter, we can move the documentation to the wiki. https://github.com/arademaker/delph-in-rdf Stephan mentioned that he wants to move the http://moin.delph-in.net/WeSearch/Interface to GitHub. My repository above is NOT about the WSI and can be taken as a temporary place for the discussions of the RDF schemas; I will be waiting for Stephan to start conversations about the implementation of the transformations and possible improvements in the current WSI code. Best, Alexandre > On 21 Jul 2020, at 15:57, Stephan Oepen wrote: > > i am tempted to migrate the WSI code from SVN > to M$ GitHub, and then we could maybe collect these schemas there, and > you could look into generating the RDF serializations without java? From arademaker at gmail.com Tue Jul 28 16:37:20 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Tue, 28 Jul 2020 11:37:20 -0300 Subject: [developers] [sdp-organizers] From EDS/RMS to DM In-Reply-To: <16C87E38-C822-47D2-956F-784A581D5639@gmail.com> References: <29C60769-FC12-4663-8BCF-A1DC52155B8F@gmail.com> <16C87E38-C822-47D2-956F-784A581D5639@gmail.com> Message-ID: Hi Stephan, While processing a sample of the wordnet glosses, the redwoods script produced two invalid .gz files. One example is for the sentence: "a historical region in central and northern Yugoslavia; Serbs settled the region in the 6th and 7th centuries" See the derivation node: (527 #) In the result file of the profile, the derivation node looks fine, the 334.gz is attached. (527 a_det_rbst 0.000000 0 1 ("a" 347 "token [ +FORM \\"a\\" +FROM \\"0\\" +TO \\"1\\" +ID *diff-list* [ LIST *cons* [ FIRST \\"0\\" REST *list* ] LAST *list* ] +TNT null_tnt [ +TAGS *null* +PRBS *null* +MAIN tnt_main [ +TAG \\"DT\\" +PRB \\"1.0\\" ] ] +CLASS alphabetic [ +CASE non_capitalized+lower +INITIAL + ] +TRAIT token_trait [ +UW - +IT italics +LB bracket_null [ LIST *list* LAST *list* ] +RB bracket_null [ LIST *list* LAST *list* ] +LD bracket_null [ LIST *list* LAST *list* ] +RD bracket_null [ LIST *list* LAST *list* ] +HD token_head [ +TI \\"<0:1>\\" +LL ctype [ -CTYPE- string ] +TG string ] ] +PRED predsort +CARG \\"a\\" +TICK + +ONSET c-or-v-onset ]?)) I am trying to understand the lisp code of redwoods.lisp, but without being able to load it in my slime environment, navigating in the source code and debugging is a nightmare. I know that export-tree is doing more than just copy the derivation tree from the profile, but I didn?t understand what it is doing with the derivations, it is hard to have the `big picture`. BTW, you really like the `loop` macro! ;-) These errors cause the dtm script to fail, although I should not expect it to work with the current trunk version of ERG, dm.cfg was not changed since 2012. 
% svn info etc/dm.cfg Path: etc/dm.cfg Name: dm.cfg Working Copy Root Path: /Users/ar/hpsg/terg URL: http://svn.delph-in.net/erg/trunk/etc/dm.cfg Relative URL: ^/erg/trunk/etc/dm.cfg Repository Root: http://svn.delph-in.net Repository UUID: 3df82f5b-d43a-0410-af33-fce91db48ec5 Revision: 28882 Node Kind: file Schedule: normal Last Changed Author: oe Last Changed Rev: 12172 Last Changed Date: 2012-12-01 18:54:20 -0200 (Sat, 01 Dec 2012) Text Last Updated: 2019-02-07 20:21:10 -0200 (Thu, 07 Feb 2019) Checksum: b8097dfbd5cc9b9d654233314006f8c8b0fcecaa Since my goal is to have at least one bi-lexical format in the WSI interface, I am still trying to understand what the dtm (converter) does. The converter.pdf explains how to use the code, input/output, but it doesn't disclose its logic, the high-level description of the system. Eventually, we can reimplement the dtm using pydelphin (see https://github.com/delph-in/pydelphin/issues/122). The error that I have reported in my previous message when I call redwoods with the dm in `--export input,derivation,mrs,eds,dm` is probably related to what I am showing here since the `dm-construct` function end ups calling the python dtm.py code. Finally, the handling of `:dm` keyword was not copied to the lkb-fos/src/tsdb/lisp/ source code. But I am sure you and John are both aware of that. As always, comments and possible references are welcome! ;-) Best, Alexandre PS: I know that all these errors are expected since, as you said, `I am venturing into unexplored territory` by mixing the ?classic? DELPHIN toolchain with the 'modern tools from the pacific northwest?. Yes, I am processing the profiles with ACE/pydelphin and ?exporting? data (derivation, input, MRS and EDS) from them with redwoods lisp code. But I assume we aim at have interoperability between the tools, right? That is my motivation to keep reporting the errors. Please, correct me if I am wrong. -------------- next part -------------- A non-text attachment was scrubbed... Name: 334.gz Type: application/x-gzip Size: 3753 bytes Desc: not available URL: -------------- next part -------------- From oe at ifi.uio.no Tue Jul 28 18:08:29 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Tue, 28 Jul 2020 18:08:29 +0200 Subject: [developers] [sdp-organizers] From EDS/RMS to DM In-Reply-To: References: <29C60769-FC12-4663-8BCF-A1DC52155B8F@gmail.com> <16C87E38-C822-47D2-956F-784A581D5639@gmail.com> Message-ID: the export code will want to rebuild the derivation, i.e. the version of the grammar loaded needs to be fully compatible with the treebank (or parsed profile). i wonder whether ?a_det_rbst? is available at the time of exporting? it sounds like a mal-configuration of the grammar, maybe? which you would have to match on the LKB side then, e.g. push the right feature or load the right ?script? file? greetings from the road (metaphorically), oe On Tue, 28 Jul 2020 at 16:38 Alexandre Rademaker wrote: > > Hi Stephan, > > While processing a sample of the wordnet glosses, the redwoods script > produced two invalid .gz files. One example is for the sentence: "a > historical region in central and northern Yugoslavia; Serbs settled the > region in the 6th and 7th centuries" > > See the derivation node: > > (527 #) > > In the result file of the profile, the derivation node looks fine, the > 334.gz is attached. 
> > (527 a_det_rbst 0.000000 0 1 ("a" 347 "token [ +FORM \\"a\\" +FROM \\"0\\" > +TO \\"1\\" +ID *diff-list* [ LIST *cons* [ FIRST \\"0\\" REST *list* ] > LAST *list* ] +TNT null_tnt [ +TAGS *null* +PRBS *null* +MAIN tnt_main [ > +TAG \\"DT\\" +PRB \\"1.0\\" ] ] +CLASS alphabetic [ +CASE > non_capitalized+lower +INITIAL + ] +TRAIT token_trait [ +UW - +IT italics > +LB bracket_null [ LIST *list* LAST *list* ] +RB bracket_null [ LIST *list* > LAST *list* ] +LD bracket_null [ LIST *list* LAST *list* ] +RD bracket_null > [ LIST *list* LAST *list* ] +HD token_head [ +TI \\"<0:1>\\" +LL ctype [ > -CTYPE- string ] +TG string ] ] +PRED predsort +CARG \\"a\\" +TICK + +ONSET > c-or-v-onset ]?)) > > I am trying to understand the lisp code of redwoods.lisp, but without > being able to load it in my slime environment, navigating in the source > code and debugging is a nightmare. I know that export-tree is doing more > than just copy the derivation tree from the profile, but I didn?t > understand what it is doing with the derivations, it is hard to have the > `big picture`. BTW, you really like the `loop` macro! ;-) > > These errors cause the dtm script to fail, although I should not expect it > to work with the current trunk version of ERG, dm.cfg was not changed since > 2012. > > % svn info etc/dm.cfg > Path: etc/dm.cfg > Name: dm.cfg > Working Copy Root Path: /Users/ar/hpsg/terg > URL: http://svn.delph-in.net/erg/trunk/etc/dm.cfg > Relative URL: ^/erg/trunk/etc/dm.cfg > Repository Root: http://svn.delph-in.net > Repository UUID: 3df82f5b-d43a-0410-af33-fce91db48ec5 > Revision: 28882 > Node Kind: file > Schedule: normal > Last Changed Author: oe > Last Changed Rev: 12172 > Last Changed Date: 2012-12-01 18:54:20 -0200 (Sat, 01 Dec 2012) > Text Last Updated: 2019-02-07 20:21:10 -0200 (Thu, 07 Feb 2019) > Checksum: b8097dfbd5cc9b9d654233314006f8c8b0fcecaa > > Since my goal is to have at least one bi-lexical format in the WSI > interface, I am still trying to understand what the dtm (converter) does. > The converter.pdf explains how to use the code, input/output, but it > doesn't disclose its logic, the high-level description of the system. > Eventually, we can reimplement the dtm using pydelphin (see > https://github.com/delph-in/pydelphin/issues/122). The error that I have > reported in my previous message when I call redwoods with the dm in > `--export input,derivation,mrs,eds,dm` is probably related to what I am > showing here since the `dm-construct` function end ups calling the python > dtm.py code. Finally, the handling of `:dm` keyword was not copied to the > lkb-fos/src/tsdb/lisp/ source code. But I am sure you and John are both > aware of that. > > As always, comments and possible references are welcome! ;-) > > Best, > Alexandre > > PS: I know that all these errors are expected since, as you said, `I am > venturing into unexplored territory` by mixing the ?classic? DELPHIN > toolchain with the 'modern tools from the pacific northwest?. Yes, I am > processing the profiles with ACE/pydelphin and ?exporting? data > (derivation, input, MRS and EDS) from them with redwoods lisp code. But I > assume we aim at have interoperability between the tools, right? That is my > motivation to keep reporting the errors. Please, correct me if I am wrong. > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From arademaker at gmail.com Tue Jul 28 21:06:13 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Tue, 28 Jul 2020 16:06:13 -0300 Subject: [developers] [sdp-organizers] From EDS/RMS to DM In-Reply-To: References: <29C60769-FC12-4663-8BCF-A1DC52155B8F@gmail.com> <16C87E38-C822-47D2-956F-784A581D5639@gmail.com> Message-ID: <2E976150-CF4E-4424-A7F1-F2AE093D88CD@gmail.com> > On 28 Jul 2020, at 13:08, Stephan Oepen wrote: > > the export code will want to rebuild the derivation, i.e. the version of the grammar loaded needs to be fully compatible with the treebank (or parsed profile). Do you mean that `redwoods` reads the derivation just to check whether the grammar passed to it as a parameter is compatible with the grammar used to process the profile? So can I bypass this check and simply copy the derivation tree to the .gz file? What does it mean for a grammar to be fully compatible with a profile? Does it mean that the grammar is the same one used to process the profile? > i wonder whether 'a_det_rbst' is available at the time of exporting? it sounds like a mal-configuration of the grammar, maybe? > which you would have to match on the LKB side then, e.g. push the right feature or load the right 'script' file? Yes, you are right. I found this entry in lexicon-rbst.tdl:

a_det_rbst := d_-_sg-a_le_mal &
 [ ORTH < "a" >,
   SYNSEM [ LKEYS.KEYREL.PRED _a_q_rel,
            PHON.ONSET voc ] ].

This file is included in the english.tdl file, and ACE loads ace/config.tdl, which declares english.tdl as the grammar-top. But the LKB loads lkb/script, and it doesn't seem to mention english.tdl, so you are probably right. Unfortunately, I don't know how to make the LKB load the same grammar files that ACE is loading. I suspect this situation is what Michael would like to avoid when he proposed the http://moin.delph-in.net/VirtualSharedConfigs discussion. So far, I was assuming that pointing logon and ACE to the terg trunk would be enough; now I am realising that I wasn't paying attention to the configurations. I hope Dan is reading this thread!! ;-) Maybe an easier solution would be to use the last stable release of the ERG, where lkb/script and ace/config.tdl should be compatible. But my LOGON/lingo/erg/Version.lsp has `(defparameter *grammar-version* "ERG (1214)")`. The LOGON/lingo/terg/Version.lisp has `(defparameter *grammar-version* "ERG (trunk)")`. How do I make LOGON use ERG 2018 instead of 1214? > greetings from the road (metaphorically), oe Thank you. Best, Alexandre From oe at ifi.uio.no Thu Jul 30 13:15:47 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Thu, 30 Jul 2020 13:15:47 +0200 Subject: [developers] [sdp-organizers] From EDS/RMS to DM In-Reply-To: <2E976150-CF4E-4424-A7F1-F2AE093D88CD@gmail.com> References: <29C60769-FC12-4663-8BCF-A1DC52155B8F@gmail.com> <16C87E38-C822-47D2-956F-784A581D5639@gmail.com> <2E976150-CF4E-4424-A7F1-F2AE093D88CD@gmail.com> Message-ID: hi again, alexandre, in general, i used to recommend that most users work with actual ERG releases rather than with whatever state you find in the trunk on a given day (which, after all, is an internal work in progress). from your observations, it sounds as if dan (possibly around his joint work with colleagues at NTU) is experimenting with a mal-configuration of the ERG, and just now at least the default parameterization of the grammar in ACE differs from the defaults in the LKB and PET; that would likely not be the case in a release. from what you describe, i doubt you want the mal-extensions in your parses?
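One quick way to see how many of the recorded results actually use robust (mal) entries is sketched below with PyDelphin; the profile path and the assumption that mal entries share the `_rbst` suffix are illustrative only, not part of the original discussion:

from delphin import itsdb

ts = itsdb.TestSuite('wordnet-profile')          # hypothetical profile path
robust = [row['parse-id'] for row in ts['result']
          if '_rbst' in (row['derivation'] or '')]
print(f'{len(robust)} recorded results use a *_rbst (mal) lexical entry')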
for a grammar to be compatible with a treebank means that it can re-build all derivation trees recorded in the profile. the 'same' grammar will always be compatible, but sometimes it can be desirable to actually improve (or revise) the grammar in ways that do not inhibit re-unification of derivation trees but change the contents of the feature structures and, thus, derived representations like the MRS, EDS, DM, etc. this is one aspect in which we refer to the Redwoods treebanking approach as 'dynamic': the gold-standard HPSG derivation can be output in various derived views. exporting from a treebank is an interpretative process, i.e. there is no way to make it succeed (in how i designed things in [incr tsdb()] at least) without re-building all recorded derivations. arguably, MRSs should not be recorded in the treebanked profiles (they are there in recent ERG releases for convenience). the LOGON 'redwoods' scripts can be forced to always re-compute them, using the '/blind' modifier on its '--export' option. the LOGON environment provides the 'terg' (for trunk or test or trial) target so that users can put a grammar version of their choice there; please see the 'LogonExtras' page on the wiki for details; i expect it should work to 'switch' to the 2018 release of the ERG roughly as follows:

  cd $LOGONROOT/terg
  svn switch $LOGONSVN/erg/tags/2018

once you are in a universe with a grammar (when loaded into the LKB) that matches your treebanked derivations, i would hope that exporting to DM will also become functional? as you note, there is a non-trivial amount of grammar-specific configuration in the DM converter (categorizing different predicates into the various classes distinguished by ivanova et al., 2012), which could lead to sub-optimal results here and there. however, from what i know about the ERG evolution between 1214 and 2018, i believe the MRSs have been comparatively stable, so DM exports from a 2018 treebank should still be decent, i would hope! best wishes, oe On Tue, Jul 28, 2020 at 9:07 PM Alexandre Rademaker wrote: > > > > On 28 Jul 2020, at 13:08, Stephan Oepen wrote: > > > > the export code will want to rebuild the derivation, i.e. the version of the grammar loaded needs to be fully compatible with the treebank (or parsed profile). > > Do you mean that `redwoods` reads the derivation just to check if the grammar passed as parameter to it was compatible with the grammar used to process the profile? So can I bypass this check and simply copy the derivation tree to the .gz file? > > What does it means a grammar be full compatible with a profile? Does it means that the grammar is the same used to process the profile? > > > i wonder whether ?a_det_rbst? is available at the time of exporting? it sounds like a mal-configuration of the grammar, maybe? > > which you would have to match on the LKB side then, e.g. push the right feature or load the right ?script? file? > > Yes, you are right. I found this entry in the lexicon-rbst.tdl: > > a_det_rbst := d_-_sg-a_le_mal & > [ ORTH < "a" >, > SYNSEM [ LKEYS.KEYREL.PRED _a_q_rel, > PHON.ONSET voc ] ]. > > > This file is included in the english.tdl file and ACE loads to the ace/config.tdl that declares english.tdl as the grammar-top. But LKB loads the lkb/script and it doesn?t mentioned the english.tdl? So you are probably right. Unfortunately, I don?t know how to make LKB load the same grammar files that ACE is loading.
> > I suspect this situation is what Michael would like to avoid when he proposed the http://moin.delph-in.net/VirtualSharedConfigs discussion. So far, I was considering that making logon and ACE pointing to the terg trunk would be enough, now I am realising that I wasn?t paying attention to the configurations. > > I hope Dan is reading this thread!! ;-) > > Maybe a easier solution would be to use the last stable release of ERG where lkb/script and ace/config.tdl should be compatible. But my LOGON/lingo/erg/Version.lsp has `(defparameter *grammar-version* "ERG (1214)?)`. The LOGON/lingo/terg/Version.lisp has `(defparameter *grammar-version* "ERG (trunk)?)`. How to make LOGON use ERG 2018 instead of 1214? > > > greetings from the road (metaphorically), oe > > Thank you. > > Best, > Alexandre > From oe at ifi.uio.no Fri Jul 31 18:52:45 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Fri, 31 Jul 2020 18:52:45 +0200 Subject: [developers] consolidating LKB and [incr tsdb()] versions In-Reply-To: References: Message-ID: hi again, john (and all): ann and emily, please read on. > in terms of internal DELPH-IN responsibilities, the current LKB trunk used to be packaged via what we call the LinGO builds (http://lingo.delph-in.net) and the Ubuntu+LKB live CD (https://wiki.ling.washington.edu/bin/view.cgi/Main/KnoppixLKB). both are maintained by UW, so i am not quite sure about the breadth of their user base? but i am wondering whether these UW builds could move to using the FOS code in the foreseeable future (seeing as i am proposing to officially call an end ? i sent the above (incomplete) message somewhat hastily during the summit, so that folks in the LKB (FOS) tutorial could look at it then and there. in the meantime, john and i have started consolidating the FOS and the LOGON branches of the LKB and [incr tsdb()] code bases. we hope to have a unified version available in a week or two, which would mean that LKB and [incr tsdb()] functionality will again be more closely synchronized across the two branches (some functionality will only remain available in the LOGON environment, though, e.g. due to dependencies on external, linux-only binaries). we would like to propose that this new, actively developed version of the LKB and [incr tsdb()] become the 'trunk' in the DELPH-IN SubVersioN repository sometime in late august. the current 'trunk' (where there has not been active development for the past few years) would then be preserved as a tag (or possibly a branch, if there was an expectation of future revisions). at that point, the LinGO builds and Ubuntu+LKB creation (both maintained at UW) would likely need some attention, to either work off the new tag or adapt the build environment to the new version. once the consolidation is complete, i would be tempted to spring-clean the LKB and [incr tsdb()] code bases, seeking to remove 'dead' code, for example sub-modules that have been superseded by newer developments or have long fallen out of use and are not actively maintained. the following candidates for removal come to my mind (but there are likely more):

  src/glue/sppp.lsp (precursor to REPP; oe)
  src/main/ltemplates.lsp (unused since mid-1990s; oe)
  src/mrs/spell.lisp (ERG-specific and long unused; ann)
  src/mrs/mrsmunge.lisp (superseded by transfer rules; ann)
  src/mt/fragments.lisp (oe)
  src/mt/smt.lisp (oe)
  src/preprocess/ (SPPP reimplementation, SAF, SMAF; ben)

i imagine john and i should take a joint pass and compile a more definite list of candidates for code cleaning for review.
in the list above, i have tried to indicate who i believe was the original code owner. ann, hoping you have read this far: would you be okay with some purging of long unused code? everyone else, if you suspect you might be using any of the above sub-modules, please get in touch with john and me! best wishes, oe From aac10 at cl.cam.ac.uk Fri Jul 31 19:25:23 2020 From: aac10 at cl.cam.ac.uk (Ann Copestake) Date: Fri, 31 Jul 2020 18:25:23 +0100 Subject: [developers] consolidating LKB and [incr tsdb()] versions In-Reply-To: References: Message-ID: <2bb168af-0be2-f170-5337-c83881716389@cl.cam.ac.uk> Go for it! The mrsmunge code was still used in a number of individual projects even after the creation of the much more powerful transfer rule mechanism, but (as far as I am concerned) has been entirely superseded by the python code that Alex (and others) wrote to manipulate DMRS. Best wishes, Ann On 31/07/2020 17:52, Stephan Oepen wrote: > hi again, john (and all): > > ann and emily, please read on. > >> in terms of internal DELPH-IN responsibilities, the current LKB trunk used to be packaged via what we call the LinGO builds (http://lingo.delph-in.net) and the Ubuntu+LKB live CD (https://wiki.ling.washington.edu/bin/view.cgi/Main/KnoppixLKB). both are maintained by UW, so i am not quite sure about the breadth of their user base? but i am wondering whether these UW builds could move to using the FOS code in the foreseeable future (seeing as i am proposing to officially call an end ? > i sent the above (incomplete) message somewhat hastily during the > summit, so that folks in the LKB (FOS) tutorial could look at it then > and there. > > in the meantime, john and i have started consolidating the FOS and the > LOGON branches of the LKB and [incr tsdb()] code bases. we hope to > have a unified version available in a week or two, which would mean > that LKB and [incr tsdb()] functionality will again be more closely > synchronized across the two branches (some functionality will only > remain available in the LOGON environment, though, e.g. due to > dependencies on external, linux-only binaries). > > we would like to propose that this new, actively developed version of > the LKB and [incr tsdb()] become the 'trunk' in the DELPH-IN > SubVersioN repository sometime in late august. the current 'trunk' > (where there has not been active development for the past few years) > would then be preserved as a tag (or possibly a branch, if there was > an expectation of future revisions). at that point, the LinGO builds > and Ubuntu+LKB creation (both maintained at UW) would likely need some > attention, to either work off the new tag or adapt the build > environment to the new version. > > once the consolidation is complete, i would be tempted to spring-clean > the LKB and [incr tsdb()] code bases, seeking to remove 'dead' code, > for example sub-modules that have been superseded by newer > developments or have long fallen out of use and are not actively > maintained. the following candidates for removal come to my mind (but > there are likely more): > > src/glue/sppp.lsp (precursor to REPP; oe) > src/main/ltemplates.lsp (unused since mid-1990s; oe) > src/mrs/spell.lisp (ERG-specific and long unused; ann) > src/mrs/mrsmunge.lisp (superseded by transfer rules; ann) > src/mt/fragments.lisp (oe) > src/mt/smt.lisp (oe) > src/preprocess/ (SPPP reimplementation, SAF, SMAF; ben) > > i imagine john and i should take a joint pass and compile a more > definite list of candidates for code cleaning for review. 
in the list > above, i have tried to indicate who i believe was the original code > owner. ann, hoping you have read this far: would you be okay with > some purging of long unused code? everyone else, if you suspect you > might be using any of the above sub-modules, please get in touch with > john and me! > > best wishes, oe > > From oe at ifi.uio.no Sun Aug 2 14:44:21 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Sun, 2 Aug 2020 14:44:21 +0200 Subject: [developers] extension to the REPP sub-formalism Message-ID: dear bec, mike, and woodley: during the summit you may have noticed dan mentioning a 'war zone' around NE-related token mapping rules in the current ERG trunk. with our move to modern, OntoNotes-style tokenization, the initial REPP segmentation now breaks at dashes (including hyphens) and slashes. but these will, of course, occur frequently in named entities like email and web addresses, where they should preferably not be segmented. the current unhappy state of affairs is that initial tokenization over-segments, with dan then heroically seeking to re-unite at least the most common patterns of 'multi-token' named entities in token mapping, where any number of token boundaries may have been introduced at hyphens and slashes. to rationalize this state of affairs (and, thus, work toward a peace treaty in token mapping), i believe we will need to extend the REPP language with a new facility: masking sub-strings according to NE-like patterns prior to core REPP processing, and exempting masked regions from all subsequent rewriting (i.e. making sure they remain intact). i have added an example of this new facility (introducing the '+' operator) to the ERG trunk; please see: http://svn.delph-in.net/erg/trunk/rpp/ne.rpp at present, these rules are only loaded into the LKB (where i am in the process of adding masking to the REPP implementation), hence they should not cause trouble in the other engines (i hope). i would like to invite you (as the developers of REPP processors in PET, pyDelphin, and ACE, respectively) to look over this proposal and share any comments you might have. assuming we can agree on the need for extending the REPP language along the above lines, i am hoping you might have a chance to add support for the masking operator in your REPP implementations? from my ongoing work in the LKB, masking support appears relatively straightforward once an engine implements the step-wise accounting for character position sketched by Dridan & Oepen (2012; ACL). the masking patterns merely set a boolean flag for the matched character positions, and subsequent rewriting must block rule applications that destructively change one or more masked character positions. output of capture groups (copying from the left-hand side verbatim), on the other hand, must be allowed over masked regions. because the LKB implementation predates the 2012 paper, however, i will first have to implement the precise accounting mechanism to validate the above expectation regarding how to realize masking. what do you make of the above proposal? oe From sweaglesw at sweaglesw.org Mon Aug 3 09:26:21 2020 From: sweaglesw at sweaglesw.org (Woodley Packard) Date: Mon, 3 Aug 2020 00:26:21 -0700 Subject: [developers] extension to the REPP sub-formalism In-Reply-To: References: Message-ID: <8F548C32-1BCF-474F-BD82-67B39B322E8E@sweaglesw.org> Hi Stephan, It looks from the file you referenced like the proposed new operation is '=' rather than '+'? 
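As a side note for readers following the thread: with that correction, a masking rule would presumably look like a '!' rewrite rule without the replacement part, i.e. the '=' operator followed by a regular expression whose matches are protected from later rewriting. The rule below is only an illustrative toy, much cruder than the actual patterns in rpp/ne.rpp; it masks simple email-like strings so that subsequent '!' rules cannot segment them:

=[^ ]+@[^ ]+\.[A-Za-z]+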
This seems like a plausible and modest addition to me, and should not be hard to implement. I guess you will be limited to using this facility in cases where the designation as named entity is sufficiently unambiguous based on the RE alone. It is tempting to contemplate ways in which REPP could offer ambiguous tokenization output here, but so far my imagination is too limited to come up with the scenario where it would be useful. Woodley > On Aug 2, 2020, at 5:44 AM, Stephan Oepen wrote: > > dear bec, mike, and woodley: > > during the summit you may have noticed dan mentioning a 'war zone' > around NE-related token mapping rules in the current ERG trunk. with > our move to modern, OntoNotes-style tokenization, the initial REPP > segmentation now breaks at dashes (including hyphens) and slashes. > but these will, of course, occur frequently in named entities like > email and web addresses, where they should preferably not be > segmented. the current unhappy state of affairs is that initial > tokenization over-segments, with dan then heroically seeking to > re-unite at least the most common patterns of 'multi-token' named > entities in token mapping, where any number of token boundaries may > have been introduced at hyphens and slashes. > > to rationalize this state of affairs (and, thus, work toward a peace > treaty in token mapping), i believe we will need to extend the REPP > language with a new facility: masking sub-strings according to NE-like > patterns prior to core REPP processing, and exempting masked regions > from all subsequent rewriting (i.e. making sure they remain intact). > i have added an example of this new facility (introducing the '+' > operator) to the ERG trunk; please see: > > http://svn.delph-in.net/erg/trunk/rpp/ne.rpp > > at present, these rules are only loaded into the LKB (where i am in > the process of adding masking to the REPP implementation), hence they > should not cause trouble in the other engines (i hope). i would like > to invite you (as the developers of REPP processors in PET, pyDelphin, > and ACE, respectively) to look over this proposal and share any > comments you might have. assuming we can agree on the need for > extending the REPP language along the above lines, i am hoping you > might have a chance to add support for the masking operator in your > REPP implementations? > > from my ongoing work in the LKB, masking support appears relatively > straightforward once an engine implements the step-wise accounting for > character position sketched by Dridan & Oepen (2012; ACL). the > masking patterns merely set a boolean flag for the matched character > positions, and subsequent rewriting must block rule applications that > destructively change one or more masked character positions. output > of capture groups (copying from the left-hand side verbatim), on the > other hand, must be allowed over masked regions. because the LKB > implementation predates the 2012 paper, however, i will first have to > implement the precise accounting mechanism to validate the above > expectation regarding how to realize masking. > > what do you make of the above proposal? oe From goodman.m.w at gmail.com Mon Aug 3 09:35:06 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Mon, 3 Aug 2020 15:35:06 +0800 Subject: [developers] extension to the REPP sub-formalism In-Reply-To: References: Message-ID: Hi Stephan, This sounds like a good solution. I have some questions/comments below. On Sun, Aug 2, 2020 at 8:44 PM Stephan Oepen wrote: > [...] 
> to rationalize this state of affairs (and, thus, work toward a peace > treaty in token mapping), i believe we will need to extend the REPP > language with a new facility: masking sub-strings according to NE-like > patterns prior to core REPP processing, and exempting masked regions > from all subsequent rewriting (i.e. making sure they remain intact). > Ok, so if I understood correctly, masking is not sequential like rewrite rules, and happens before the rewrite rules regardless of where the mask pattern appears in the file (just as the tokenization pattern is applied after the rewrite rules), and the order of application of the mask patterns doesn't matter. I first wish to discuss mask pattern discovery, and this cross-cuts with some other unclear areas of the REPP specification. To recap, REPP has sequential operators ('!' rewrite rule, '<' file include, and '>' group call) which apply in order during processing, and non-sequential operators ('#' iterative group definition, ':' tokenizer pattern, '@' meta-info declaration) which do not apply except in certain circumstances (iterative groups when they are called, tokenization after all rewrite rules have applied). Non-sequential operators also have these two properties: 1. They may only be defined once in a REPP (once per identifier for iterative groups) 2. They are local to a REPP instance (an iterative group or tokenizer pattern in an external module is not available to other modules) (These are partially guesses; I've raised an issue for PyDelphin to resolve related questions so they don't distract from the current topic: https://github.com/delph-in/pydelphin/issues/308) The masking rules are non-sequential, but (1) clearly doesn't apply, and (2) doesn't seem to apply in your proposal since ne.rpp is a submodule. At first my reaction was to vote for starting simple and using masks defined in the top-level module only (like the tokenizer), but I can see the value in having them spread across submodules: a submodule may define rewrite rules that require additional masks that are only needed when the module is active. So if we allow submodules to define these global masks, I guess we need to collect any mask pattern found by crawling active submodules. The non-sequential but global nature raises an issue: what if a submodule containing a mask is active (e.g., set in *repp-calls* in the LKB) but is not actually called with a group-call (i.e., if `>ne` did not appear in tokenizer.rpp)? > i have added an example of this new facility (introducing the '+' > operator) to the ERG trunk; please see: > > http://svn.delph-in.net/erg/trunk/rpp/ne.rpp > As an aside, that email regex is needlessly complicated. Since, in a unicode-aware regex engine, the word-character class \w is equivalent to the L and N unicode properties with the underscore ([\p{L}\p{N}_]), and since the TLD part of the domain must have only ascii characters, it can be simplified as follows: ? Either way it's not RFC5322 compatible but I imagine in running text you want to match addresses that may be displayed with unicode codepoints. > [...] the masking patterns merely set a boolean flag for the matched > character > positions, and subsequent rewriting must block rule applications that > destructively change one or more masked character positions. output > of capture groups (copying from the left-hand side verbatim), on the > other hand, must be allowed over masked regions. 
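The boolean-flag bookkeeping described in that passage can be illustrated with a few lines of Python; this is only a sketch of the idea, not any engine's actual implementation, and the helper names are invented for the example (it also deliberately ignores the capture-group exemption and the adjacency question taken up immediately below):

import re

def mask_positions(text, mask_patterns):
    # one boolean per character: True where some mask pattern matched
    masked = [False] * len(text)
    for pattern in mask_patterns:          # e.g. the patterns of '=' rules
        for m in re.finditer(pattern, text):
            for i in range(m.start(), m.end()):
                masked[i] = True
    return masked

def rewrite_allowed(masked, start, end):
    # a rewrite that destructively touches any masked position is blocked
    return not any(masked[start:end])

text = "mail oe@yy.com today"
masked = mask_positions(text, [r"[^ ]+@[^ ]+"])
print(rewrite_allowed(masked, 5, 14))   # False: the address is protected
print(rewrite_allowed(masked, 0, 4))    # True: 'mail' may still be rewritten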
That makes sense, but we may need a different mechanism than just boolean flags because of the possibility of immediately adjacent masked regions looking like one solid region when we should allow material to be inserted between them. Instead, an IOB scheme (like in chunking) or similar would be better. There's also the question of overlapping masks (viz., when a mask pattern matches a sequence that is already part of another mask). The IOB vector would not accommodate these as separate, overlapping masks, so we could (1) ignore overlapping matches, (2) union them (and update the IOB values accordingly), or (3) use a different data structure such as a list of mask start-positions and run-lengths. Currently I like option (2). Finally, do we want to block rewrite rules where a capture group starts or ends within a mask? I can imagine multiple capture groups that collectively copy the entire masked region without alteration. I think this situation wouldn't be too bad if we just check that the before and after masked substrings have the same contents *and* the characterization is constant (the same offset for the whole mask). This means the following would pass because reinserting a single non-captured character doesn't change the characterization: !(?) \1@\2 But the following would change the characterization at the end and would thus be blocked: !(?) \1.com\2 Also, generally speaking, I can see this functionality having potential to reduce the need for special casing of things beyond named entities. Currently the ERG has 12 lexical entries for "email" ("e-mail", "e - mail", "e mail", nouns and verbs) and some of the orthographic variation seems to account for tokenization effects. Is there any reason it should not be used in these cases? -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From sweaglesw at sweaglesw.org Mon Aug 3 18:51:26 2020 From: sweaglesw at sweaglesw.org (Woodley Packard) Date: Mon, 3 Aug 2020 09:51:26 -0700 Subject: [developers] extension to the REPP sub-formalism In-Reply-To: References: Message-ID: <63074D83-5697-438A-81C9-749ACBF88246@sweaglesw.org> Mike, what makes you think the masking operator should float to the top of the execution order instead of applying in the position it is written? I expect it may be useful to apply some degree of normalization before masking applies, and if the desire is for masking first the author can always put it first. Woodley > On Aug 3, 2020, at 12:35 AM, "goodman.m.w at gmail.com" wrote: > > ? > Hi Stephan, > > This sounds like a good solution. I have some questions/comments below. > >> On Sun, Aug 2, 2020 at 8:44 PM Stephan Oepen wrote: >> [...] >> to rationalize this state of affairs (and, thus, work toward a peace >> treaty in token mapping), i believe we will need to extend the REPP >> language with a new facility: masking sub-strings according to NE-like >> patterns prior to core REPP processing, and exempting masked regions >> from all subsequent rewriting (i.e. making sure they remain intact). > > Ok, so if I understood correctly, masking is not sequential like rewrite rules, and happens before the rewrite rules regardless of where the mask pattern appears in the file (just as the tokenization pattern is applied after the rewrite rules), and the order of application of the mask patterns doesn't matter. > > I first wish to discuss mask pattern discovery, and this cross-cuts with some other unclear areas of the REPP specification. 
To recap, REPP has sequential operators ('!' rewrite rule, '<' file include, and '>' group call) which apply in order during processing, and non-sequential operators ('#' iterative group definition, ':' tokenizer pattern, '@' meta-info declaration) which do not apply except in certain circumstances (iterative groups when they are called, tokenization after all rewrite rules have applied). Non-sequential operators also have these two properties: > > 1. They may only be defined once in a REPP (once per identifier for iterative groups) > 2. They are local to a REPP instance (an iterative group or tokenizer pattern in an external module is not available to other modules) > > (These are partially guesses; I've raised an issue for PyDelphin to resolve related questions so they don't distract from the current topic: https://github.com/delph-in/pydelphin/issues/308) > > The masking rules are non-sequential, but (1) clearly doesn't apply, and (2) doesn't seem to apply in your proposal since ne.rpp is a submodule. At first my reaction was to vote for starting simple and using masks defined in the top-level module only (like the tokenizer), but I can see the value in having them spread across submodules: a submodule may define rewrite rules that require additional masks that are only needed when the module is active. > > So if we allow submodules to define these global masks, I guess we need to collect any mask pattern found by crawling active submodules. The non-sequential but global nature raises an issue: what if a submodule containing a mask is active (e.g., set in *repp-calls* in the LKB) but is not actually called with a group-call (i.e., if `>ne` did not appear in tokenizer.rpp)? > >> i have added an example of this new facility (introducing the '+' >> operator) to the ERG trunk; please see: >> >> http://svn.delph-in.net/erg/trunk/rpp/ne.rpp > > As an aside, that email regex is needlessly complicated. Since, in a unicode-aware regex engine, the word-character class \w is equivalent to the L and N unicode properties with the underscore ([\p{L}\p{N}_]), and since the TLD part of the domain must have only ascii characters, it can be simplified as follows: > > ? > > Either way it's not RFC5322 compatible but I imagine in running text you want to match addresses that may be displayed with unicode codepoints. > >> [...] the masking patterns merely set a boolean flag for the matched character >> positions, and subsequent rewriting must block rule applications that >> destructively change one or more masked character positions. output >> of capture groups (copying from the left-hand side verbatim), on the >> other hand, must be allowed over masked regions. > > That makes sense, but we may need a different mechanism than just boolean flags because of the possibility of immediately adjacent masked regions looking like one solid region when we should allow material to be inserted between them. Instead, an IOB scheme (like in chunking) or similar would be better. > > There's also the question of overlapping masks (viz., when a mask pattern matches a sequence that is already part of another mask). The IOB vector would not accommodate these as separate, overlapping masks, so we could (1) ignore overlapping matches, (2) union them (and update the IOB values accordingly), or (3) use a different data structure such as a list of mask start-positions and run-lengths. Currently I like option (2). > > Finally, do we want to block rewrite rules where a capture group starts or ends within a mask? 
I can imagine multiple capture groups that collectively copy the entire masked region without alteration. I think this situation wouldn't be too bad if we just check that the before and after masked substrings have the same contents *and* the characterization is constant (the same offset for the whole mask). This means the following would pass because reinserting a single non-captured character doesn't change the characterization: > > !(?) \1@\2 > > But the following would change the characterization at the end and would thus be blocked: > > !(?) \1.com\2 > > Also, generally speaking, I can see this functionality having potential to reduce the need for special casing of things beyond named entities. Currently the ERG has 12 lexical entries for "email" ("e-mail", "e - mail", "e mail", nouns and verbs) and some of the orthographic variation seems to account for tokenization effects. Is there any reason it should not be used in these cases? > > -- > -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From oe at ifi.uio.no Mon Aug 3 18:52:04 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Mon, 3 Aug 2020 18:52:04 +0200 Subject: [developers] extension to the REPP sub-formalism In-Reply-To: References: Message-ID: hi again, mike, and many thanks for the quick response! > Ok, so if I understood correctly, masking is not sequential like rewrite rules, and happens before the rewrite rules regardless of where the mask pattern appears in the file (just as the tokenization pattern is applied after the rewrite rules), and the order of application of the mask patterns doesn't matter. that is in fact not what i had intended. i would like the masking rules to follow the standard sequential flow of control in REPP, i.e. they get invoked when the processor gets to that point in the rule sequence. for full generality, i imagine one might want to allow some string-level normalization prior to mask invocation. the effects of a successful mask matching will be valid from that point in the processing sequence onwards. on this view, i believe your clarification questions (1) and (2) do not apply, right? > That makes sense, but we may need a different mechanism than just boolean flags because of the possibility of immediately adjacent masked regions looking like one solid region when we should allow material to be inserted between them. Instead, an IOB scheme (like in chunking) or similar would be better. indeed, that is a good point (that i had not yet considered). yes, destructive rewriting inbetween two adjacent masking regions must be allowed. > There's also the question of overlapping masks (viz., when a mask pattern matches a sequence that is already part of another mask). The IOB vector would not accommodate these as separate, overlapping masks, so we could (1) ignore overlapping matches, (2) union them (and update the IOB values accordingly), or (3) use a different data structure such as a list of mask start-positions and run-lengths. Currently I like option (2). yes, your option (2) sounds like the most straightforward solution, both in terms of specifying the expected behavior and implementing it. the alternative would be not to allow overlapping mask matching, but to me too it seems conceptually simplest (for REPP users and implementers alike) to not restrict mask matching and union overlapping matches. > Finally, do we want to block rewrite rules where a capture group starts or ends within a mask? 
I can imagine multiple capture groups that collectively copy the entire masked region without alteration. I think this situation wouldn't be too bad if we just check that the before and after masked substrings have the same contents *and* the characterization is constant (the same offset for the whole mask). i am not quite sure what exactly you have in mind here regarding constant characterization (masked sub-strings can be shifted to the left or the right, but their length and content must not change)? my original assumption was to just disallow rewriting without capture groups inside (or overlapping with) a masked region. this feels like a simple and clear constraint to me. on this view, two adjacent capture groups that cover (at least) the complete masked region would be fine, but even single-character identity rewriting (as in your '@' example) should be blocked. i fail to see a compelling need for that kind of rewriting in the first place, and i would like to not complicate masking support too much. i imagine it might be relatively straightforward to evaluate rewriting conditions while synthesizing the output (i.e. while processing the right-hand side of a rule), interleaved with the character-level accounting. i have started to extend ReppTop on the wiki with a section on masking, though some of the fine points of this thread have yet to be (decided and) written down. thanks, once more, for pushing towards more specificity! > Also, generally speaking, I can see this functionality having potential to reduce the need for special casing of things beyond named entities. Currently the ERG has 12 lexical entries for "email" ("e-mail", "e - mail", "e mail", nouns and verbs) and some of the orthographic variation seems to account for tokenization effects. Is there any reason it should not be used in these cases? well, yes, i too wonder at times whether accommodation of typographic variation could be reduced in the ERG lexicon :-). this is a tricky game, i fear. in part because what is in the lexicon (in some cases) seeks to cover both common conventions and common deviations, in part because there have been some usage scenarios for the ERG without going through the REPP layer (i.e. when parsing pre-tokenized or otherwise externally tokenized inputs). for the above example, i imagine (at least if assuming REPP tokenization) one could hope to make do without the three-token |e - mail| lexical entry (by masking |e-mail|), whereas the other variants likely are required. but such masking could be said to duplicate specific lexical information in the REPP rules, so maybe one would rather want to not require the |e-mail| entry? best wishes, oe From oe at ifi.uio.no Mon Aug 3 18:58:47 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Mon, 3 Aug 2020 18:58:47 +0200 Subject: [developers] extension to the REPP sub-formalism In-Reply-To: <8F548C32-1BCF-474F-BD82-67B39B322E8E@sweaglesw.org> References: <8F548C32-1BCF-474F-BD82-67B39B322E8E@sweaglesw.org> Message-ID: hi woodley, > It looks from the file you referenced like the proposed new operation is '=' rather than '+'? yes, sorry, my typo in the email! > I guess you will be limited to using this facility in cases where the designation as named entity is sufficiently unambiguous based on the RE alone. It is tempting to contemplate ways in which REPP could offer ambiguous tokenization output here, but so far my imagination is too limited to come up with the scenario where it would be useful. 
indeed, the intended use for masking would be for (near-)certain patterns; in principle, one could further split and ambiguate in token mapping then. in the REPP predecessor, there was some contemplation of string-level rewriting over a token lattice, but with the introduction of token mapping we more than happily purged that complexity from the initial tokenizer. i have grown fond of the current division of labor, with a simple, sequence-to-sequence initial step (which should be limited to straightforward string-level processing), the ability to call out to external processors (like a PoS tagger) with that simple sequence, and deferring lattice processing to the second stage of preprocessing, where we can manipulate structured token objects ... glad to hear you expect REPP masking should not be hard to implement; i have yet to find out whether i share that optimistic expectation on the LKB side :-). oe From goodman.m.w at gmail.com Tue Aug 4 04:11:25 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Tue, 4 Aug 2020 10:11:25 +0800 Subject: [developers] extension to the REPP sub-formalism In-Reply-To: References: Message-ID: On Tue, Aug 4, 2020 at 12:52 AM Stephan Oepen wrote: > hi again, mike, and many thanks for the quick response! > > > Ok, so if I understood correctly, masking is not sequential like rewrite > rules, and happens before the rewrite rules regardless of where the mask > pattern appears in the file (just as the tokenization pattern is applied > after the rewrite rules), and the order of application of the mask patterns > doesn't matter. > > that is in fact not what i had intended. i would like the masking > rules to follow the standard sequential flow of control in REPP, i.e. > they get invoked when the processor gets to that point in the rule > sequence. for full generality, i imagine one might want to allow some > string-level normalization prior to mask invocation. the effects of a > successful mask matching will be valid from that point in the > processing sequence onwards. > Sorry, I misinterpreted what you meant by "masking sub-strings [...] prior to core REPP processing" in the original email. on this view, i believe your clarification questions (1) and (2) do > not apply, right? > Correct, although my related questions (in the GitHub issue) still stand. We can deal with those later. [...] > > > Finally, do we want to block rewrite rules where a capture group starts > or ends within a mask? I can imagine multiple capture groups that > collectively copy the entire masked region without alteration. I think this > situation wouldn't be too bad if we just check that the before and after > masked substrings have the same contents *and* the characterization is > constant (the same offset for the whole mask). > > i am not quite sure what exactly you have in mind here regarding > constant characterization (masked sub-strings can be shifted to the > left or the right, but their length and content must not change)? By "the same offset for the whole mask" I am referring to the start and end positions that are tracked for each character. The offset itself may change (indicating the masked region shifting left or right), but all start and end offsets within a masked region must be the same offset, otherwise it indicates that the length has changed or that content has been replaced. > my > original assumption was to just disallow rewriting without capture > groups inside (or overlapping with) a masked region. this feels like > a simple and clear constraint to me. 
on this view, two adjacent > capture groups that cover (at least) the complete masked region would > be fine, but even single-character identity rewriting (as in your '@' > example) should be blocked. i fail to see a compelling need for that > kind of rewriting in the first place, and i would like to not > complicate masking support too much. i imagine it might be relatively > straightforward to evaluate rewriting conditions while synthesizing > the output (i.e. while processing the right-hand side of a rule), > interleaved with the character-level accounting. > I agree that these cases are extremely unlikely. I think that being too permissive with these seemingly trivial decisions can lead to unexpected bugs later. For instance, if we allow multiple capture groups to piece together the original masked string and we oversee the rewriting to ensure it hasn't changed, these might cause problems, depending on implementation:

; mask "abc"
=abc

; full mask is captured and rewritten contiguously, but string and offsets change
!(a)(b)(c) \2\1\3

; full mask is captured, only part is written
!(a(b)(c)) \2\3

; full mask is captured and rewritten contiguously, but 'b' is duplicated
!(a(b))(c) \1\2\3

I feel that the analysis of the regex on the left and the template on the right to ensure that the full masked substring is recreated contiguously, completely, and in order is an overly-complicated solution. Perhaps when I write this code I'll see something that makes it easy to compute. But barring that, I proposed using post-rule-application checks on having uniform start/end offsets in each mask and that the contents of those substrings is identical to the corresponding pre-rule-application substrings. These checks alone would not block the '@' example only as a side effect, because replacing a single non-captured character does not break the uniformity of the offsets (and in this case the string didn't change, either). When 2 or more non-captured characters are replaced, the offsets become non-uniform, even if the replaced characters are identical to the input. The '@' example could probably be blocked with a third check that no non-captured material is inserted in a mask; at least, this sounds much simpler than tracking the captured groups. The alternative where a rewrite rule is blocked if capture groups begin or end within a mask sounds like a special case that would be confusing for a grammar developer not familiar with the full REPP specification. > [...] > -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From bec.dridan at gmail.com Wed Aug 5 13:24:39 2020 From: bec.dridan at gmail.com (Bec Dridan) Date: Wed, 5 Aug 2020 21:24:39 +1000 Subject: [developers] extension to the REPP sub-formalism In-Reply-To: References: Message-ID: It's a _loong_ time since I looked at that code (or used svn...). I've been refreshing my memory of the code, and I think I can see how that works. As a mechanism, it sounds reasonable, but it's going to be a long time before I'd have time to sit down and try to make the change. More than happy for anyone else to take up the challenge :) Bec On Sun, Aug 2, 2020 at 10:44 PM Stephan Oepen wrote: > dear bec, mike, and woodley: > > during the summit you may have noticed dan mentioning a 'war zone' > around NE-related token mapping rules in the current ERG trunk. with > our move to modern, OntoNotes-style tokenization, the initial REPP > segmentation now breaks at dashes (including hyphens) and slashes.
> but these will, of course, occur frequently in named entities like > email and web addresses, where they should preferably not be > segmented. the current unhappy state of affairs is that initial > tokenization over-segments, with dan then heroically seeking to > re-unite at least the most common patterns of 'multi-token' named > entities in token mapping, where any number of token boundaries may > have been introduced at hyphens and slashes. > > to rationalize this state of affairs (and, thus, work toward a peace > treaty in token mapping), i believe we will need to extend the REPP > language with a new facility: masking sub-strings according to NE-like > patterns prior to core REPP processing, and exempting masked regions > from all subsequent rewriting (i.e. making sure they remain intact). > i have added an example of this new facility (introducing the '+' > operator) to the ERG trunk; please see: > > http://svn.delph-in.net/erg/trunk/rpp/ne.rpp > > at present, these rules are only loaded into the LKB (where i am in > the process of adding masking to the REPP implementation), hence they > should not cause trouble in the other engines (i hope). i would like > to invite you (as the developers of REPP processors in PET, pyDelphin, > and ACE, respectively) to look over this proposal and share any > comments you might have. assuming we can agree on the need for > extending the REPP language along the above lines, i am hoping you > might have a chance to add support for the masking operator in your > REPP implementations? > > from my ongoing work in the LKB, masking support appears relatively > straightforward once an engine implements the step-wise accounting for > character position sketched by Dridan & Oepen (2012; ACL). the > masking patterns merely set a boolean flag for the matched character > positions, and subsequent rewriting must block rule applications that > destructively change one or more masked character positions. output > of capture groups (copying from the left-hand side verbatim), on the > other hand, must be allowed over masked regions. because the LKB > implementation predates the 2012 paper, however, i will first have to > implement the precise accounting mechanism to validate the above > expectation regarding how to realize masking. > > what do you make of the above proposal? oe > -------------- next part -------------- An HTML attachment was scrubbed... URL: From oe at ifi.uio.no Thu Aug 6 10:04:57 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Thu, 6 Aug 2020 10:04:57 +0200 Subject: [developers] www script in the logon distribution In-Reply-To: <5B1D74E8-863C-4547-9C80-0D9B0E41EF88@gmail.com> References: <5B1D74E8-863C-4547-9C80-0D9B0E41EF88@gmail.com> Message-ID: hi again, alexandre: > For some reason, the www script in the logon distribution does not start the webserver. Using the `--debug` option, I don't have any additional information in the log file (actually, the script didn't mention the debug anywhere). I am following all instructions from http://moin.delph-in.net/LogonOnline. In particular, pvmd3 is running without any error in the startup. I don't see any *.pvm file in the /tmp. The script bin/logon starts LKB and the [incr TSDB()] normally. I have used `?cat` to save a lisp file and load it manually in the ACL REPL, no error too. Any idea? i am slowly catching up to DELPH-IN email, with apologies for the long turn-around! is the above still a current problem? is this within your container, or does it also occur on a 'regular' linux box? 
to debug further, note that the 'www' script sets things up so that you can interact with the running lisp image once initialization is complete, i.e. just type into the lisp prompt, e.g. to inspect the state of AllegroServe. when you observe that the web server is not started, does that mean it does not even bind to its port? when running with the standard '--erg' option, i would expect the following to work (and return the dynamically generated top-level page): wget http://localhost:8100/logon best wishes, oe From sweaglesw at sweaglesw.org Sun Aug 9 08:53:49 2020 From: sweaglesw at sweaglesw.org (Woodley Packard) Date: Sat, 8 Aug 2020 23:53:49 -0700 Subject: [developers] extension to the REPP sub-formalism In-Reply-To: References: <8F548C32-1BCF-474F-BD82-67B39B322E8E@sweaglesw.org> Message-ID: Hi again, > On Aug 3, 2020, at 9:58 AM, Stephan Oepen wrote: > > glad to hear you expect REPP masking should not be hard to implement; > i have yet to find out whether i share that optimistic expectation on > the LKB side :-). I got to the point of being able to play around a bit with rules, anyway. I can mask email addresses, but as far as I can tell, no subsequent rules are ever even trying to do anything inside of them. Is this actually a good test case? I get a single identical token for the email address in the below example, before and after implementing the masking idea: $ ace -g erg.dat -E I sent an e-mail. EXECUTING MASK pattern... MASKING I<0:1> sent<2:6> <7:29> an<30:32> e<33:34> -<34:35> mail<35:39> .<39:40> > On Aug 3, 2020, at 12:35 AM, goodman.m.w at gmail.com wrote: > > As an aside, that email regex is needlessly complicated. Since, in a unicode-aware regex engine, the word-character class \w is equivalent to the L and N unicode properties with the underscore ([\p{L}\p{N}_]), and since the TLD part of the domain must have only ascii characters, it can be simplified as follows: > > ? Besides looking prettier, Mike's regex has the advantage of working in Boost's POSIX regex interface, whereas Stephan's does not. I am not particularly eager to change to a different regex API. Boost regex has multiple ways to call it, and for whatever reason, the POSIX way does not support the \p{} syntax. I ended up using the BIO-encoded representation of what's masked that Mike proposed, so I can mask two adjacent spans and then still insert material between them, but block changing material inside of the masked regions. In my implementation, material copied by capture group is OK but material rewritten literally on the RHS of a replace fails currently, because that material ends up being marked as unmasked, whereas the check requires identical content, characterization, and mask tags for everything in a masked area. As you both noted, shifting the entire mask left or right is fine. Regards, -Woodley From oe at ifi.uio.no Sun Aug 9 23:47:48 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Sun, 9 Aug 2020 23:47:48 +0200 Subject: [developers] extension to the REPP sub-formalism In-Reply-To: References: <8F548C32-1BCF-474F-BD82-67B39B322E8E@sweaglesw.org> Message-ID: hi again, woodley: > I got to the point of being able to play around a bit with rules, anyway. I can mask email addresses, but as far as I can tell, no subsequent rules are ever even trying to do anything inside of them. Is this actually a good test case? 
I get a single identical token for the email address in the below example, before and after implementing the masking idea:

$ ace -g erg.dat -E
I sent an e-mail.
EXECUTING MASK pattern...
MASKING
I<0:1> sent<2:6> <7:29> an<30:32> e<33:34> -<34:35> mail<35:39> .<39:40>

> On Aug 3, 2020, at 12:35 AM, goodman.m.w at gmail.com wrote: > > As an aside, that email regex is needlessly complicated. Since, in a unicode-aware regex engine, the word-character class \w is equivalent to the L and N unicode properties with the underscore ([\p{L}\p{N}_]), and since the TLD part of the domain must have only ascii characters, it can be simplified as follows: > > ? Besides looking prettier, Mike's regex has the advantage of working in Boost's POSIX regex interface, whereas Stephan's does not. I am not particularly eager to change to a different regex API. Boost regex has multiple ways to call it, and for whatever reason, the POSIX way does not support the \p{} syntax. I ended up using the BIO-encoded representation of what's masked that Mike proposed, so I can mask two adjacent spans and then still insert material between them, but block changing material inside of the masked regions. In my implementation, material copied by capture group is OK but material rewritten literally on the RHS of a replace fails currently, because that material ends up being marked as unmasked, whereas the check requires identical content, characterization, and mask tags for everything in a masked area. As you both noted, shifting the entire mask left or right is fine. Regards, -Woodley From oe at ifi.uio.no Sun Aug 9 23:47:48 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Sun, 9 Aug 2020 23:47:48 +0200 Subject: [developers] extension to the REPP sub-formalism In-Reply-To: References: <8F548C32-1BCF-474F-BD82-67B39B322E8E@sweaglesw.org> Message-ID: hi again, woodley: > I got to the point of being able to play around a bit with rules, anyway. I can mask email addresses, but as far as I can tell, no subsequent rules are ever even trying to do anything inside of them. Is this actually a good test case? I get a single identical token for the email address in the below example, before and after implementing the masking idea: i am happy to hear you were able to confirm your optimistic expectation that masking would not be too difficult to implement :-). i shall add a few more masking rules to the ERG trunk this coming week, but i would think the following could be a useful test case to explore the interaction of masking and rewriting (i would expect eleven tokens): stephan, oe at yy.com, oe at ellingsen-oepen.net, or ??????@?????-??????.??, called. > Besides looking prettier, Mike's regex has the advantage of working in Boost's POSIX regex interface, whereas Stephan's does not. I am not particularly eager to change to a different regex API. Boost regex has multiple ways to call it, and for whatever reason, the POSIX way does not support the \p{} syntax. i would suggest we leave aesthetic judgments to the maintainers of the REPP rules, but in this case i put in unicode properties for a reason: i am eager to take into use the \p{} syntax because (unlike classic character ranges or shorthands like \w) it is unambiguously defined across engines, independent of locales. more importantly, i expect unicode properties will afford a cleaner and more general solution to normalization of punctuation, e.g. different types of whitespace and various conventions for opening and closing quote marks; unicode properties may also help in dealing with interspersed foreign content. it appears Boost regex offers full unicode support when combined with ICU, which i would guess ACE is using from before? so, i am hoping that full unicode support in regular expressions (in REPP and chart mapping) might become available with relatively minor adjustments of how you call into the Boost regex engine? https://www.boost.org/doc/libs/1_73_0/libs/regex/doc/html/boost_regex/unicode.html > I ended up using the BIO-encoded representation of what's masked that Mike proposed, so I can mask two adjacent spans and then still insert material between them, but block changing material inside of the masked regions. In my implementation, material copied by capture group is OK but material rewritten literally on the RHS of a replace fails currently, because that material ends up being marked as unmasked, whereas the check requires identical content, characterization, and mask tags for everything in a masked area. that all sounds compatible with my intuitions about how i would like the masking to behave. in general, i am hoping to discourage literal rewriting, as it has the potential to weaken characterization accounting. many thanks for working on this! oe From olzama at uw.edu Tue Aug 11 22:38:35 2020 From: olzama at uw.edu (Olga Zamaraeva) Date: Tue, 11 Aug 2020 13:38:35 -0700 Subject: [developers] ERG coverage references Message-ID: Dear developers, I am looking for some very general reference on the ERG coverage (to include in a document which has a short section on HPSG grammars). The most recent one I was able to find so far is Table 1 in Flickinger et al. 2012. Are there any more recent ones? Anything that is associated with the 2018 release perhaps? Dan's summit updates do not include this info, as far as I can tell. Thank you, -- Olga Zamaraeva -------------- next part -------------- An HTML attachment was scrubbed...
URL: From arademaker at gmail.com Thu Aug 13 04:44:21 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Wed, 12 Aug 2020 23:44:21 -0300 Subject: [developers] ErgWeSearch - Deep Linguistic Processing with HPSG (DELPH-IN) Message-ID: <4C8C0B71-0AA4-4F78-B167-09894CD950F4@gmail.com> Hi Stephan, Any reason for keeping this page below restricted: http://moin.delph-in.net/ErgWeSearch Currently it has #acl RomanPearah,ParticipantsGroup:read,write,admin I have two students working with me and they can?t access this page. Can we remove this acl directive? Alexandre Sent from my iPhone From arademaker at gmail.com Fri Aug 14 22:37:13 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Fri, 14 Aug 2020 17:37:13 -0300 Subject: [developers] Broken link Message-ID: <38A648AB-989A-4C64-91A8-70542AB17163@gmail.com> Hi Stephan, Im http://moin.delph-in.net/EdsTop EDS since its 2002 inception (Oepen, et al., 2002) has found a broader range of DELPH-IN-internal applications? The first link to http://bultreebank.org/proceedings/paper10.pdf is broken, it should be http://bultreebank.org/wp-content/uploads/2017/05/paper10.pdf ? Best, Alexandre From arademaker at gmail.com Wed Aug 19 23:37:38 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Wed, 19 Aug 2020 18:37:38 -0300 Subject: [developers] First step for the clone of FFTB SVN in GitHub In-Reply-To: References: Message-ID: <89F0DA8E-B78C-47F7-AC1C-C2F9732BAF7D@gmail.com> Hi, Michael, thank you for all suggestions you made. I followed almost everything! ;-) https://github.com/delph-in/fftb The fftb tool now has a mirror repository in the GitHub (or M$ GitHub as Stephan likes to write!) Since I do need a place to put the README and the authors.txt, I ended up having a master branch that will be updated mainly with changes from the SVN. The trunk will be the pristine branch, mirroring SVN trunk. Next, we have to define possible workflows. For instance, I had an issue (https://github.com/delph-in/fftb/issues/1) with remote connections. Woodley told me how to solve it with a tiny change in the file web.c, I kept the modification in a branch called `issue-1` for now. This branch would probably not be merged into master directly, only if Woodley agrees to make the change in the SVN principal repository? I would them update the git repo with the changes in the SVN. If Woodley doesn?t accept the change in the code (*), we can still have this branch in the git repository for particular uses. For instance, I can obtain the fftb code with this change to make the docker image: https://github.com/own-pt/docker-delphin/blob/master/image/Dockerfile#L65-L67 So I don?t have to copy the web.c into the docker repository, which is excellent! Suggestions are welcome! (*) if I understood it right, the modification opens the door for any remote connection, and it can be understood as a security risk. Best, Alexandre > On 22 Jul 2020, at 11:57, goodman.m.w at gmail.com wrote: > > Done. fftb it is. > From arademaker at gmail.com Fri Aug 21 16:54:20 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Fri, 21 Aug 2020 11:54:20 -0300 Subject: [developers] Comparing a profile with a grammar output Message-ID: <8EC3CDEC-C0B3-45C8-AC44-22DCA1BA9292@gmail.com> Hi, After having a profile disambiguated with FFTB, my first question is how to compare it with the grammar output for the same set of sentences? This comparison should give me an evaluation of the current ranking model with my data. It should tell me if it is worth to train a new model. 
In particular, if I compare the semantic structure, it would also allow me to ignore variations on syntactic analysis that doesn't impact semantic representation. I remember that Michael mentioned in the last Summit that PyDelphin has some support for comparing semantic representation, am I right? I didn't find it in the documentation. I also tried to use the mtools from Stephan (https://github.com/cfmrp/mtool) but I am probably not using it right, since even with two different sentences I am getting the same output below: % echo "It is rainning today." | ace -g ../wn/terg-mac.dat -T -n 1 | delphin convert --from ace --to eds > 1.eds % echo "It is rainning today." | ace -g ../wn/terg-mac.dat -T -n 1 | delphin convert --from ace --to eds > 2.eds % ./main.py --read eds --score mrp --framework eds --gold 1.eds 2.eds {"n": 0, "null": 0, "exact": 0, "tops": {"g": 0, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0}, "labels": {"g": 0, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0}, "properties": {"g": 0, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0}, "anchors": {"g": 0, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0}, "edges": {"g": 0, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0}, "attributes": {"g": 0, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0}, "all": {"g": 0, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0}, "time": 6.985664367675781e-05, "cpu": 0.00020100000000000673} What am I missing? Should I use any other method to compare the profile with the grammar output? Comments and suggestions are welcome! :-) Best, Alexandre Rademaker http://arademaker.github.com/ http://researcher.ibm.com/person/br-alexrad From oe at ifi.uio.no Fri Aug 21 17:35:26 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Fri, 21 Aug 2020 17:35:26 +0200 Subject: [developers] Comparing a profile with a grammar output In-Reply-To: <8EC3CDEC-C0B3-45C8-AC44-22DCA1BA9292@gmail.com> References: <8EC3CDEC-C0B3-45C8-AC44-22DCA1BA9292@gmail.com> Message-ID: hi alexandre: > After having a profile disambiguated with FFTB, my first question is how to compare it with the grammar output for the same set of sentences? This comparison should give me an evaluation of the current ranking model with my data. It should tell me if it is worth to train a new model. In particular, if I compare the semantic structure, it would also allow me to ignore variations on syntactic analysis that doesn't impact semantic representation. one used to do this kind of evaluation in [incr tsdb()], initiated through the 'Trees | Score' command. i am not sure this will work out of the box for comparing an FFTB-based treebank to an ACE-generated parsing profile ... although it should, in principle! i expect you would want to select 'Result Equivalence' and 'Score All Items' in 'Trees | Switches', and just give it a shot :-). 
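as a quick cross-check outside of [incr tsdb()], the two profiles can also be compared with a few lines of scripting. the following is an illustrative sketch only: it assumes pyDelphin's itsdb module, it assumes the reading of interest is stored as result 0 in both profiles (which may not be how an FFTB treebank records its decisions), and it compares MRS strings literally, so readings that differ only in variable naming will count as mismatches.

from delphin import itsdb

def top_mrs(path):
    # map each parse-id to the MRS string of its first-ranked result
    ts = itsdb.TestSuite(path)
    return {row['parse-id']: row['mrs']
            for row in ts['result']
            if int(row['result-id']) == 0}

gold = top_mrs('golden')   # e.g. the FFTB-disambiguated profile
test = top_mrs('parsed')   # e.g. the freshly parsed profile
shared = set(gold) & set(test)
exact = sum(1 for i in shared if gold[i] == test[i])
print('{}/{} identical top readings'.format(exact, len(shared)))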
if you get the above to work (which should give exact match accuracies), the [incr tsdb()] scorer can also compute a range of additional metrics, including EDM and ParsEval, but these would have to be activated programmatically: (setf *redwoods-score-counts* '(:ta :parseval :edma :edmp)) it should also be possible to batch-score a selection of profiles, with a little bit of coding in the high-level [incr tsdb()] scripting language; for your inspiration, i have put on-line an archive of working files from the (as of yet unpublished) manuscript on robust parsing and unification with, among others, yi zhang: http://nlpl.eu/oe/edm.tgz > I also tried to use the mtools from Stephan (https://github.com/cfmrp/mtool) but I am probably not using it right, since even with two different sentences I am getting the same output below: > > % echo "It is rainning today." | ace -g ../wn/terg-mac.dat -T -n 1 | delphin convert --from ace --to eds > 1.eds > % echo "It is rainning today." | ace -g ../wn/terg-mac.dat -T -n 1 | delphin convert --from ace --to eds > 2.eds > % ./main.py --read eds --score mrp --framework eds --gold 1.eds 2.eds > {"n": 0, you end up scoring zero items, which either suggests your EDS input files are not considered valid by mtool, or the '--framework eds' selection fails. the latter should not be necessary (it may only work with MRP input files; its purpose is to select a sub-set of graphs, explicitly marked for a specific framework, from a multi-framework input file). equally likely, your EDS input files may be missing the identifier prefix; please see 'data/score/eds/' in mtool for the expected syntax. cheers, oe From arademaker at gmail.com Sat Aug 22 01:30:30 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Fri, 21 Aug 2020 20:30:30 -0300 Subject: [developers] Comparing a profile with a grammar output In-Reply-To: References: <8EC3CDEC-C0B3-45C8-AC44-22DCA1BA9292@gmail.com> Message-ID: > On 21 Aug 2020, at 12:35, Stephan Oepen wrote: > > one used to do this kind of evaluation in [incr tsdb()], initiated > through the 'Trees | Score' command. i am not sure this will work out > of the box for comparing an FFTB-based treebank to an ACE-generated > parsing profile ... although it should, in principle! i expect you > would want to select 'Result Equivalence' and 'Score All Items' in > 'Trees | Switches', and just give it a shot :-). > > if you get the above to work (which should give exact match > accuracies), the [incr tsdb()] scorer can also compute a range of > additional metrics, including EDM and ParsEval, but these would have > to be activated programmatically: > > (setf *redwoods-score-counts* '(:ta :parseval :edma :edmp)) > Hi Stephan, Thank you for the directions. Trying that first approach, it gave me 100% accuracy?? But the window with the results opened instantaneously, I don?t believe it really did the analysis. How to make sure the tool is doing what we excepted it to do? How can I know if the two profiles are proper selected? Best, Alexandre -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: PastedGraphic-2.png Type: image/png Size: 512765 bytes Desc: not available URL: From arademaker at gmail.com Sat Aug 22 02:38:15 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Fri, 21 Aug 2020 21:38:15 -0300 Subject: [developers] Comparing a profile with a grammar output In-Reply-To: References: <8EC3CDEC-C0B3-45C8-AC44-22DCA1BA9292@gmail.com> Message-ID: <03B12AF5-0AB3-4E72-9D8C-83429E179C1A@gmail.com> I just noticed that I need to load the grammar in the LKB. But LKB could not load the trunk version of ERG. Using the ERG last stable version, I got the same behaviour. > On 21 Aug 2020, at 20:30, Alexandre Rademaker wrote: > >> On 21 Aug 2020, at 12:35, Stephan Oepen wrote: >> >> one used to do this kind of evaluation in [incr tsdb()], initiated >> through the 'Trees | Score' command. i am not sure this will work out >> of the box for comparing an FFTB-based treebank to an ACE-generated >> parsing profile ... although it should, in principle! i expect you >> would want to select 'Result Equivalence' and 'Score All Items' in >> 'Trees | Switches', and just give it a shot :-). >> >> if you get the above to work (which should give exact match >> accuracies), the [incr tsdb()] scorer can also compute a range of >> additional metrics, including EDM and ParsEval, but these would have >> to be activated programmatically: >> >> (setf *redwoods-score-counts* '(:ta :parseval :edma :edmp)) > > Hi Stephan, > > Thank you for the directions. Trying that first approach, it gave me 100% accuracy?? But the window with the results opened instantaneously, I don?t believe it really did the analysis. How to make sure the tool is doing what we excepted it to do? How can I know if the two profiles are proper selected? > > > > > Best, > Alexandre > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: PastedGraphic-3.png Type: image/png Size: 553009 bytes Desc: not available URL: From arademaker at gmail.com Sat Aug 22 03:22:31 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Fri, 21 Aug 2020 22:22:31 -0300 Subject: [developers] Comparing a profile with a grammar output In-Reply-To: References: <8EC3CDEC-C0B3-45C8-AC44-22DCA1BA9292@gmail.com> Message-ID: <672451AB-C8D6-40CF-B5EB-6FE29CFF4CF2@gmail.com> Hi Stephan, I tried the mtool again. Same problem. % echo "It is rainning today." | ace -g ../wn/terg-mac.dat -T -n 1 | delphin convert --from ace --to eds > 2.eds % echo "It is rainning today." | ace -g ../wn/terg-mac.dat -T -n 1 | delphin convert --from ace --to eds > 2.eds I added the #XXXX right before the EDS serialization. The only different between these files in the https://github.com/cfmrp/mtool/blob/master/data/score/eds/wsj.pet.eds is that these files are not formatted with one predicate per line, instead, the EDS is serialised in a single line without line breaks. 
% ./main.py --read eds --score smatch --gold ../sick/1.eds ../sick/2.eds {"n": 0, "g": 0, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0, "time": 1.4781951904296875e-05, "cpu": 4.4000000000044004e-05} % ./main.py --read eds --score mrp --gold ../sick/1.eds ../sick/2.eds {"n": 0, "null": 0, "exact": 0, "tops": {"g": 0, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0}, "labels": {"g": 0, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0}, "properties": {"g": 0, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0}, "anchors": {"g": 0, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0}, "edges": {"g": 0, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0}, "attributes": {"g": 0, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0}, "all": {"g": 0, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0}, "time": 3.0994415283203125e-05, "cpu": 9.099999999995223e-05} But? I found the repo that Michael presented in the Summit https://github.com/delph-in/delphin.edm: % delphin edm 1.eds 2.eds Precision: 1.0 Recall: 1.0 F-score: 1.0 It works! But I need to remove the prefix (#NNNNN) before the EDS serializations. Even better, it works directly with profiles although the verbose option didn?t show anything interesting (I would like to see results per item): % delphin edm -v golden parsed Precision: 0.9637710992177851 Recall: 0.9683557394002068 F-score: 0.9660579799855565 Thank you Michael! I am not very confident on these numbers, I was expecting more differences, but? Anyway, it would be nice to double-check with mtool if I can. Best, Alexandre > On 21 Aug 2020, at 12:35, Stephan Oepen wrote: > >> I also tried to use the mtools from Stephan (https://github.com/cfmrp/mtool) but I am probably not using it right, since even with two different sentences I am getting the same output below: >> >> % echo "It is rainning today." | ace -g ../wn/terg-mac.dat -T -n 1 | delphin convert --from ace --to eds > 1.eds >> % echo "It is rainning today." | ace -g ../wn/terg-mac.dat -T -n 1 | delphin convert --from ace --to eds > 2.eds >> % ./main.py --read eds --score mrp --framework eds --gold 1.eds 2.eds >> {"n": 0, > > you end up scoring zero items, which either suggests your EDS input > files are not considered valid by mtool, or the '--framework eds' > selection fails. the latter should not be necessary (it may only work > with MRP input files; its purpose is to select a sub-set of graphs, > explicitly marked for a specific framework, from a multi-framework > input file). equally likely, your EDS input files may be missing the > identifier prefix; please see 'data/score/eds/' in mtool for the > expected syntax. From oe at ifi.uio.no Sat Aug 22 09:02:50 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Sat, 22 Aug 2020 09:02:50 +0200 Subject: [developers] Comparing a profile with a grammar output In-Reply-To: <672451AB-C8D6-40CF-B5EB-6FE29CFF4CF2@gmail.com> References: <8EC3CDEC-C0B3-45C8-AC44-22DCA1BA9292@gmail.com> <672451AB-C8D6-40CF-B5EB-6FE29CFF4CF2@gmail.com> Message-ID: hi again, alexandre and mike: > I added the #XXXX right before the EDS serialization. The only different between these files in the https://github.com/cfmrp/mtool/blob/master/data/score/eds/wsj.pet.eds is that these files are not formatted with one predicate per line, instead, the EDS is serialised in a single line without line breaks. i am tempted to declare those line breaks a necessary part of the native EDS syntax (though i see that the current EdsTop wiki page does not explicitly state that). 
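for concreteness, such an export (one node per line, optionally preceded by a '#' line carrying the item identifier, as proposed just below) might be scripted along the following lines. this is an illustrative sketch only: it assumes pyDelphin's simplemrs and eds codecs, delphin.eds.from_mrs(), and an indent option on the EDS encoder; the command-line counterpart is the `--indent` flag mike mentions below.

from delphin import eds
from delphin.codecs import simplemrs
from delphin.codecs import eds as edsnative

def export(identifier, mrs_string):
    # one EDS per block: a '#<identifier>' line, then one node per line
    m = simplemrs.decode(mrs_string)
    e = eds.from_mrs(m)
    return '#{}\n{}\n'.format(identifier, edsnative.encode(e, indent=True))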
mike, could you change EDS serialization in pyDelphin to reflect the multi-line format exemplified on that page? also, when you have an item identifier available i would suggest you prefix the EDS with an additional line (assuming the identifier is 4711): #4711 this latter addition should be considered optional, though, and i shall check that the mtool EDS reader does not require it (i suspect currently it does; mtool has hardly been used in conjunction with native EDS serialization, so this is a welcome push toward better cross-format and -platform interoperability). regarding your lack of success when invoking the scorer in [incr tsdb()], alexandre: could you make available to me a copy of the two profiles involved? best wishes, oe From goodman.m.w at gmail.com Sat Aug 22 11:34:07 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Sat, 22 Aug 2020 17:34:07 +0800 Subject: [developers] Comparing a profile with a grammar output In-Reply-To: References: <8EC3CDEC-C0B3-45C8-AC44-22DCA1BA9292@gmail.com> <672451AB-C8D6-40CF-B5EB-6FE29CFF4CF2@gmail.com> Message-ID: Hi Alexandre and Stephan, Alexandre, if you only care about exact matches of semantics, you might look into how the Matrix does regression testing (see rtest.py at https://github.com/delph-in/matrix/). And I'd need to refresh my understanding of the specifics, but I think delphin.edm gives the same final scores as mtool but not Bec's tool. It *can* replicate the results of both by adjusting weights and things via command options. It doesn't give as granular a breakdown as mtool, but you can get per-item results when you increase verbosity twice (-vv). Finally, if you want EDS with line breaks, try adding `--indent` or `--indent 1` to the `delphin convert` command. Stephan, I don't see why line breaks are necessary for EDS native format. There is no syntactic necessity that I can see. I find it very useful to have the option for single-line EDS (or any format), e.g., for line-pairing the exported representations of two profiles. I'll consider how I can make the identifier prefix (e.g., #4711) map to the internal 'identifier' field (see https://pydelphin.readthedocs.io/en/latest/api/delphin.eds.html#delphin.eds.EDS ). On Sat, Aug 22, 2020 at 3:03 PM Stephan Oepen wrote: > hi again, alexandre and mike: > > > I added the #XXXX right before the EDS serialization. The only different > between these files in the > https://github.com/cfmrp/mtool/blob/master/data/score/eds/wsj.pet.eds is > that these files are not formatted with one predicate per line, instead, > the EDS is serialised in a single line without line breaks. > > i am tempted to declare those line breaks a necessary part of the > native EDS syntax (though i see that the current EdsTop wiki page does > not explicitly state that). mike, could you change EDS serialization > in pyDelphin to reflect the multi-line format exemplified on that > page? also, when you have an item identifier available i would > suggest you prefix the EDS with an additional line (assuming the > identifier is 4711): > > #4711 > > this latter addition should be considered optional, though, and i > shall check that the mtool EDS reader does not require it (i suspect > currently it does; mtool has hardly been used in conjunction with > native EDS serialization, so this is a welcome push toward better > cross-format and -platform interoperability). 
> > regarding your lack of success when invoking the scorer in [incr > tsdb()], alexandre: could you make available to me a copy of the two > profiles involved? > > best wishes, oe > -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From arademaker at gmail.com Sat Aug 22 17:57:53 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Sat, 22 Aug 2020 12:57:53 -0300 Subject: [developers] Comparing a profile with a grammar output In-Reply-To: References: <8EC3CDEC-C0B3-45C8-AC44-22DCA1BA9292@gmail.com> <672451AB-C8D6-40CF-B5EB-6FE29CFF4CF2@gmail.com> Message-ID: <50D90672-043A-4568-9419-3C6E1D83AFA7@gmail.com> Hi Stephan, Following your directions, I asked pydelphin to export with line breaks (?indent), and I successfully execute the mtool with all except the ?mrp? metric, see below. For the profiles, you can find them at https://github.com/arademaker/sick-fftb. Thank you so much for your help. You raised an interesting question about the `item identifier`. Is it part of the EDS? We may need a specification of a file format containing a sequence of EDS serialization (or native EDS syntax, as you also wrote). I think the same happens with ACE stdout protocols (https://pydelphin.readthedocs.io/en/latest/api/delphin.ace.html#ace-stdout-protocols), for instance, a "SENT: ..." precedes all MRSs, but this is not part of the MRS. These issues are all related to the work in the RDF Schemas? Best, Alexandre echo "It is rainning today." | ace -g ../wn/terg-mac.dat -T -n 1 | delphin convert --indent --from ace --to eds > 1.eds echo "It is rainning today." | ace -g ../wn/terg-mac.dat -T -n 1 | delphin convert --indent --from ace --to eds > 2.eds % ./main.py --read eds --score ucca --gold ../sick/1.eds ../sick/2.eds {"n": 1, "labeled": {"primary": {"g": 6, "s": 6, "c": 6, "p": 1.0, "r": 1.0, "f": 1.0}, "remote": {"g": 0, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0}}, "unlabeled": {"primary": {"g": 5, "s": 5, "c": 5, "p": 1.0, "r": 1.0, "f": 1.0}, "remote": {"g": 0, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0}}, "time": 0.0001461505889892578, "cpu": 0.0004350000000000742} % ./main.py --read eds --score smatch --gold ../sick/1.eds ../sick/2.eds {"n": 1, "g": 42, "s": 42, "c": 42, "p": 1.0, "r": 1.0, "f": 1.0, "time": 0.0055389404296875, "cpu": 0.016556000000000015} % ./main.py --read eds --score edm --gold ../sick/1.eds ../sick/2.eds {"n": 1, "names": {"g": 7, "s": 7, "c": 7, "p": 1.0, "r": 1.0, "f": 1.0}, "arguments": {"g": 6, "s": 6, "c": 6, "p": 1.0, "r": 1.0, "f": 1.0}, "tops": {"g": 1, "s": 1, "c": 1, "p": 1.0, "r": 1.0, "f": 1.0}, "properties": {"g": 21, "s": 21, "c": 21, "p": 1.0, "r": 1.0, "f": 1.0}, "all": {"g": 35, "s": 35, "c": 35, "p": 1.0, "r": 1.0, "f": 1.0}, "time": 8.106231689453125e-05, "cpu": 0.00024100000000004673} % ./main.py --read eds --score sdp --gold ../sick/1.eds ../sick/2.eds {"n": 1, "labeled": {"g": 7, "s": 7, "c": 7, "p": 1.0, "r": 1.0, "f": 1.0, "m": 1.0}, "unlabeled": {"g": 6, "s": 6, "c": 6, "p": 1.0, "r": 1.0, "f": 1.0, "m": 1.0}, "time": 7.104873657226562e-05, "cpu": 0.00021099999999996122} % ./main.py --read eds --score mrp --gold ../sick/1.eds ../sick/2.eds Traceback (most recent call last): File "./main.py", line 472, in main(); File "./main.py", line 385, in main result = score.mces.evaluate(gold, graphs, File "/Users/ar/hpsg/mtool/score/mces.py", line 493, in evaluate for id, g, s, tops, labels, properties, anchors, \ File "/Users/ar/hpsg/mtool/score/mces.py", line 490, in results = (schedule(g, s, 
rrhc_limit, mces_limit, trace, errors) File "/Users/ar/hpsg/mtool/score/mces.py", line 441, in schedule raise e; File "/Users/ar/hpsg/mtool/score/mces.py", line 389, in schedule = g.score(s, mapping); File "/Users/ar/hpsg/mtool/graph.py", line 856, in score = tuples(self, identities1); File "/Users/ar/hpsg/mtool/graph.py", line 771, in tuples anchors.add((identity, anchor)); TypeError: unhashable type: ?list' > On 22 Aug 2020, at 04:02, Stephan Oepen wrote: > > hi again, alexandre and mike: > >> I added the #XXXX right before the EDS serialization. The only different between these files in the https://github.com/cfmrp/mtool/blob/master/data/score/eds/wsj.pet.eds is that these files are not formatted with one predicate per line, instead, the EDS is serialised in a single line without line breaks. > > i am tempted to declare those line breaks a necessary part of the > native EDS syntax (though i see that the current EdsTop wiki page does > not explicitly state that). mike, could you change EDS serialization > in pyDelphin to reflect the multi-line format exemplified on that > page? also, when you have an item identifier available i would > suggest you prefix the EDS with an additional line (assuming the > identifier is 4711): > > #4711 > > this latter addition should be considered optional, though, and i > shall check that the mtool EDS reader does not require it (i suspect > currently it does; mtool has hardly been used in conjunction with > native EDS serialization, so this is a welcome push toward better > cross-format and -platform interoperability). > > regarding your lack of success when invoking the scorer in [incr > tsdb()], alexandre: could you make available to me a copy of the two > profiles involved? > > best wishes, oe From oe at ifi.uio.no Sat Aug 22 18:38:25 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Sat, 22 Aug 2020 18:38:25 +0200 Subject: [developers] Comparing a profile with a grammar output In-Reply-To: <50D90672-043A-4568-9419-3C6E1D83AFA7@gmail.com> References: <8EC3CDEC-C0B3-45C8-AC44-22DCA1BA9292@gmail.com> <672451AB-C8D6-40CF-B5EB-6FE29CFF4CF2@gmail.com> <50D90672-043A-4568-9419-3C6E1D83AFA7@gmail.com> Message-ID: > % ./main.py --read eds --score mrp --gold ../sick/1.eds ../sick/2.eds > Traceback (most recent call last): > File "./main.py", line 472, in > main(); > File "./main.py", line 385, in main > result = score.mces.evaluate(gold, graphs, > File "/Users/ar/hpsg/mtool/score/mces.py", line 493, in evaluate > for id, g, s, tops, labels, properties, anchors, \ > File "/Users/ar/hpsg/mtool/score/mces.py", line 490, in > results = (schedule(g, s, rrhc_limit, mces_limit, trace, errors) > File "/Users/ar/hpsg/mtool/score/mces.py", line 441, in schedule > raise e; > File "/Users/ar/hpsg/mtool/score/mces.py", line 389, in schedule > = g.score(s, mapping); > File "/Users/ar/hpsg/mtool/graph.py", line 856, in score > = tuples(self, identities1); > File "/Users/ar/hpsg/mtool/graph.py", line 771, in tuples > anchors.add((identity, anchor)); > TypeError: unhashable type: ?list' could you report that in the mtool issue tracker (on M$ GitHub), ideally attaching the two input files? i shall have a look :-). 
oe From arademaker at gmail.com Sat Aug 22 20:31:21 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Sat, 22 Aug 2020 15:31:21 -0300 Subject: [developers] Comparing a profile with a grammar output In-Reply-To: References: <8EC3CDEC-C0B3-45C8-AC44-22DCA1BA9292@gmail.com> <672451AB-C8D6-40CF-B5EB-6FE29CFF4CF2@gmail.com> <50D90672-043A-4568-9419-3C6E1D83AFA7@gmail.com> Message-ID: <1472E9EA-A52C-4F38-822D-4E162F2DF422@gmail.com> Done, https://github.com/cfmrp/mtool/issues/78, but I saw that you just fixed the error! Thank you. > On 22 Aug 2020, at 13:38, Stephan Oepen wrote: > >> % ./main.py --read eds --score mrp --gold ../sick/1.eds ../sick/2.eds >> Traceback (most recent call last): >> File "./main.py", line 472, in >> main(); >> File "./main.py", line 385, in main >> result = score.mces.evaluate(gold, graphs, >> File "/Users/ar/hpsg/mtool/score/mces.py", line 493, in evaluate >> for id, g, s, tops, labels, properties, anchors, \ >> File "/Users/ar/hpsg/mtool/score/mces.py", line 490, in >> results = (schedule(g, s, rrhc_limit, mces_limit, trace, errors) >> File "/Users/ar/hpsg/mtool/score/mces.py", line 441, in schedule >> raise e; >> File "/Users/ar/hpsg/mtool/score/mces.py", line 389, in schedule >> = g.score(s, mapping); >> File "/Users/ar/hpsg/mtool/graph.py", line 856, in score >> = tuples(self, identities1); >> File "/Users/ar/hpsg/mtool/graph.py", line 771, in tuples >> anchors.add((identity, anchor)); >> TypeError: unhashable type: ?list' > > could you report that in the mtool issue tracker (on M$ GitHub), > ideally attaching the two input files? i shall have a look :-). > > oe From olzama at uw.edu Thu Sep 3 19:02:34 2020 From: olzama at uw.edu (Olga Zamaraeva) Date: Thu, 3 Sep 2020 10:02:34 -0700 Subject: [developers] A one-off Matrix Dev meeting next Wednesday Message-ID: Dear all, Now that some of us have been actively working on the Matrix for some time, we thought it would make sense for us to have a meeting every now and then. So we will have one on *Wednesday Sep 9 6:30 PM Seattle time.* It is just a one-time thing, focused mostly on stuff Mike, T.J., and myself have been doing lately (e.g. development practices discussion), which is why we did not ask others for time preferences and are not trying to cover as many zones as possible etc. But we still wanted to let everyone know about it in case someone wants to join and can make the time! It will be over Zoom, the invitation below: Topic: Matrix Dev Time: Sep 9, 2020 06:30 PM Pacific Time (US and Canada) Join Zoom Meeting https://washington.zoom.us/j/92424621772?pwd=OWNxUHZOdXdiNmMxbVpabVlsM2hJUT09 Meeting ID: 924 2462 1772 Passcode: 900358 One tap mobile +12063379723,,92424621772# US (Seattle) +12532158782,,92424621772# US (Tacoma) Dial by your location +1 206 337 9723 US (Seattle) +1 253 215 8782 US (Tacoma) +1 213 338 8477 US (Los Angeles) +1 346 248 7799 US (Houston) +1 602 753 0140 US (Phoenix) +1 669 219 2599 US (San Jose) +1 669 900 6833 US (San Jose) +1 720 928 9299 US (Denver) +1 971 247 1195 US (Portland) +1 786 635 1003 US (Miami) +1 267 831 0333 US (Philadelphia) +1 301 715 8592 US (Germantown) +1 312 626 6799 US (Chicago) +1 470 250 9358 US (Atlanta) +1 470 381 2552 US (Atlanta) +1 646 518 9805 US (New York) +1 646 876 9923 US (New York) +1 651 372 8299 US (St. 
Paul) Meeting ID: 924 2462 1772 Find your local number: https://washington.zoom.us/u/abq23yFNjV Join by SIP 92424621772 at zoomcrc.com Join by H.323 162.255.37.11 (US West) 162.255.36.11 (US East) 221.122.88.195 (China) 115.114.131.7 (India Mumbai) 115.114.115.7 (India Hyderabad) 213.19.144.110 (Amsterdam Netherlands) 213.244.140.110 (Germany) 103.122.166.55 (Australia) 209.9.211.110 (Hong Kong SAR) 64.211.144.160 (Brazil) 69.174.57.160 (Canada) 207.226.132.110 (Japan) Meeting ID: 924 2462 1772 Passcode: 900358 -- Olga Zamaraeva -------------- next part -------------- An HTML attachment was scrubbed... URL: From arademaker at gmail.com Wed Sep 9 06:06:38 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Wed, 9 Sep 2020 01:06:38 -0300 Subject: [developers] Abstract Wikipedia Message-ID: <04F8A34E-EBBC-4779-999D-8E13E965AB32@gmail.com> https://meta.m.wikimedia.org/wiki/Abstract_Wikipedia The goal of Abstract Wikipedia is to let more people share in more knowledge in more languages. Abstract Wikipedia is an extension of Wikidata. In Abstract Wikipedia, people can create and maintain Wikipedia articles in a language-independent way. A Wikipedia in a language can translate this language-independent article into its language. Code does the translation. The Grammatical Framework community provided some response and suggestion on how GF could be used for language generation https://meta.m.wikimedia.org/wiki/Talk:Abstract_Wikipedia#Response_from_the_Grammatical_Framework_community I wonder if the statement about HPSG is fair: > check out other grammar formalisms, like HPSG, you'll see similar coverage to GF, but no unified API for different languages. Alexandre Sent from my iPhone -------------- next part -------------- An HTML attachment was scrubbed... URL: From ebender at uw.edu Wed Sep 9 15:47:01 2020 From: ebender at uw.edu (Emily M. Bender) Date: Wed, 9 Sep 2020 06:47:01 -0700 Subject: [developers] Abstract Wikipedia In-Reply-To: <04F8A34E-EBBC-4779-999D-8E13E965AB32@gmail.com> References: <04F8A34E-EBBC-4779-999D-8E13E965AB32@gmail.com> Message-ID: Dear Alexandre, The ERG has been under continuous development since 1993, with definitely more than 27 person-years in it at this point. I guess the question is whether the GF resources are comparable in scale... Emily On Tue, Sep 8, 2020 at 9:07 PM Alexandre Rademaker wrote: > > https://meta.m.wikimedia.org/wiki/Abstract_Wikipedia > > The goal of *Abstract Wikipedia* is to let more people share in more > knowledge in more languages. Abstract Wikipedia is an extension of > Wikidata. In Abstract Wikipedia, people can create and maintain Wikipedia > articles in a language-independent way. A Wikipedia in a language can > translate this language-independent article into its language. Code does > the translation. > > The Grammatical Framework community provided some response and suggestion > on how GF could be used for language generation > > > https://meta.m.wikimedia.org/wiki/Talk:Abstract_Wikipedia#Response_from_the_Grammatical_Framework_community > > I wonder if the statement about HPSG is fair: > > check out other grammar formalisms, like HPSG > , you'll see similar coverage > to GF, but no unified API for different languages. > > > Alexandre > Sent from my iPhone > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From arademaker at gmail.com Wed Sep 9 16:54:18 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Wed, 9 Sep 2020 11:54:18 -0300 Subject: [developers] Abstract Wikipedia In-Reply-To: References: <04F8A34E-EBBC-4779-999D-8E13E965AB32@gmail.com> Message-ID: Yes, definitely. Regarding the unified API for multiple languages, I mentioned MATRIX as having a similar goal: in the end, it is all about speed up the development of grammars and provides similar analysis of similar linguistic constructions (reuse and interoperability), right? Aarne Ranta replied to me today in the GF mailing list. Since it is a public list, I am copying here (also at https://groups.google.com/g/gf-dev/c/A6lNwZ813b0/m/c0r2Lm0eAgAJ) for hearing from this community. He admitted the coverage of GF is years behind, anyway. If so, maybe it is one more opportunity to demystify the complexity of real uses of HPSG grammars. The previous projects should be enough evidence, but not everybody knows about http://moin.delph-in.net/OldProjects. My personal experience is that things are not always ready to use out-of-the-box, but I find my way through the HPSG/DELPHI-IN universe during the last 4-5 years! ;-) > Hello Alexandre, > > Thanks for pointing this out. This reminds me that I should write a little summary of what would be involved in reproducing the RGL, and what our starting point was back in 2001. So here it is: > > The main inspirations of GF-RGL were > > - XFST, Xerox Finite State morphologies for several languages > - CLE, Core Language Engine, an SRI-Cambridge-Telia etc project for building syntax modules for some languages to be used in applications > > The main lesson learned was > > - make it open source and involve a community. CLE in practically disappeared because nobody had the rights to continue with it, and XFST was increasingly replaced by open-source variants > > This brings us to > > - the LinGO matrix, using HPSG, an open-source project > > also an inspiration for the RGL, just a bit later, but still alive and active. The difference we wanted to make was > > - think about non-linguist programmers as the majority of users > > This led us to > > - design GF and its module system in a way similar to programming languages, rather than grammar formalisms > - separate the linguist's view (the internals of the RGL) from the application programmer's view (the RGL API) > > The closest GF counterpart of the LinGO matrix is thus the internal abstract syntax of the RGL. But when looking at the LinGO/DELPH-IN documentation back in 2003 and still today, I cannot see anything corresponding to the API. It is more of a linguists' project than of programmers'. And I think it would be quite a job to develop it into an API direction similar to GF. Not only is the starting point less friendly to that (with GF's formal distinction between abstract and concrete syntax), but even in the GF world, it took several years to bring the module system and the compiler into a state that smoothly supports the division of labour between linguists and application programmers in the way we do. > > This said, HPSG has reached longer in their linguistic coverage in many languages, in particular in the English RGL: GF has nothing like that, and again it would take years of work to build it. > > Of course, the nicest thing would be to share resources in a formalism independent way. This looks quite feasible in the case of morphological lexica, and is an ongoing practice already. But when it comes to syntax, I am less sure. 
Syntax code in GF and HPSG and other higher-level (above context-free) formalisms is essentially like code in different programming languages. There the practice is that each language has to build their standard libraries from scratch (think about for instance collections and generics in Java, C++, Haskell,...) An alternative is to enable foreign function interfaces (like from Python to C), but I cannot see very concretely right now how this would look for instance between GF and HPSG - and how much there would really be to gain. But of course we have mutual communication, for instance by co-organizing GEAF workshops (Grammar Engineering Across Frameworks), and see each other as allies rather than enemies. > > ParGram (in LFG) could also be mentioned, but it used to be a proprietary system that was more difficult to learn from. > > Regards > > Aarne. Anyway, the abstract wikipedia project is about language generation and it seems very interesting. Best, Alexandre > On 9 Sep 2020, at 10:47, Emily M. Bender wrote: > > Dear Alexandre, > > The ERG has been under continuous development since 1993, with definitely more than 27 person-years in it at this point. I guess the question is whether the GF resources are comparable in scale... > > Emily > From ebender at uw.edu Wed Sep 9 18:08:23 2020 From: ebender at uw.edu (Emily M. Bender) Date: Wed, 9 Sep 2020 09:08:23 -0700 Subject: [developers] Abstract Wikipedia In-Reply-To: References: <04F8A34E-EBBC-4779-999D-8E13E965AB32@gmail.com> Message-ID: I think the closest thing to an analog to GF with DELPH-IN materials would be an 'API' for authoring MRS representations that is user friendly for non-linguists + some set of transfer rules that take those MRSes into ones that work in each language-specific grammar. The Grammar Matrix itself is not conceived of as a tool for making grammar engineering easier for non-linguists (though people frequently seem to want that). Emily On Wed, Sep 9, 2020 at 7:54 AM Alexandre Rademaker wrote: > > Yes, definitely. Regarding the unified API for multiple languages, I > mentioned MATRIX as having a similar goal: in the end, it is all about > speed up the development of grammars and provides similar analysis of > similar linguistic constructions (reuse and interoperability), right? Aarne > Ranta replied to me today in the GF mailing list. Since it is a public > list, I am copying here (also at > https://groups.google.com/g/gf-dev/c/A6lNwZ813b0/m/c0r2Lm0eAgAJ) for > hearing from this community. > > He admitted the coverage of GF is years behind, anyway. If so, maybe it is > one more opportunity to demystify the complexity of real uses of HPSG > grammars. The previous projects should be enough evidence, but not > everybody knows about http://moin.delph-in.net/OldProjects. My personal > experience is that things are not always ready to use out-of-the-box, but I > find my way through the HPSG/DELPHI-IN universe during the last 4-5 years! > ;-) > > > > Hello Alexandre, > > > > Thanks for pointing this out. This reminds me that I should write a > little summary of what would be involved in reproducing the RGL, and what > our starting point was back in 2001. So here it is: > > > > The main inspirations of GF-RGL were > > > > - XFST, Xerox Finite State morphologies for several languages > > - CLE, Core Language Engine, an SRI-Cambridge-Telia etc project for > building syntax modules for some languages to be used in applications > > > > The main lesson learned was > > > > - make it open source and involve a community. 
CLE in practically > disappeared because nobody had the rights to continue with it, and XFST was > increasingly replaced by open-source variants > > > > This brings us to > > > > - the LinGO matrix, using HPSG, an open-source project > > > > also an inspiration for the RGL, just a bit later, but still alive and > active. The difference we wanted to make was > > > > - think about non-linguist programmers as the majority of users > > > > This led us to > > > > - design GF and its module system in a way similar to programming > languages, rather than grammar formalisms > > - separate the linguist's view (the internals of the RGL) from the > application programmer's view (the RGL API) > > > > The closest GF counterpart of the LinGO matrix is thus the internal > abstract syntax of the RGL. But when looking at the LinGO/DELPH-IN > documentation back in 2003 and still today, I cannot see anything > corresponding to the API. It is more of a linguists' project than of > programmers'. And I think it would be quite a job to develop it into an API > direction similar to GF. Not only is the starting point less friendly to > that (with GF's formal distinction between abstract and concrete syntax), > but even in the GF world, it took several years to bring the module system > and the compiler into a state that smoothly supports the division of labour > between linguists and application programmers in the way we do. > > > > This said, HPSG has reached longer in their linguistic coverage in many > languages, in particular in the English RGL: GF has nothing like that, and > again it would take years of work to build it. > > > > Of course, the nicest thing would be to share resources in a formalism > independent way. This looks quite feasible in the case of morphological > lexica, and is an ongoing practice already. But when it comes to syntax, I > am less sure. Syntax code in GF and HPSG and other higher-level (above > context-free) formalisms is essentially like code in different programming > languages. There the practice is that each language has to build their > standard libraries from scratch (think about for instance collections and > generics in Java, C++, Haskell,...) An alternative is to enable foreign > function interfaces (like from Python to C), but I cannot see very > concretely right now how this would look for instance between GF and HPSG - > and how much there would really be to gain. But of course we have mutual > communication, for instance by co-organizing GEAF workshops (Grammar > Engineering Across Frameworks), and see each other as allies rather than > enemies. > > > > ParGram (in LFG) could also be mentioned, but it used to be a > proprietary system that was more difficult to learn from. > > > > Regards > > > > Aarne. > > > Anyway, the abstract wikipedia project is about language generation and it > seems very interesting. > > Best, > Alexandre > > > > > On 9 Sep 2020, at 10:47, Emily M. Bender wrote: > > > > Dear Alexandre, > > > > The ERG has been under continuous development since 1993, with > definitely more than 27 person-years in it at this point. I guess the > question is whether the GF resources are comparable in scale... > > > > Emily > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From arademaker at gmail.com Wed Sep 9 18:33:50 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Wed, 9 Sep 2020 13:33:50 -0300 Subject: [developers] Abstract Wikipedia In-Reply-To: References: <04F8A34E-EBBC-4779-999D-8E13E965AB32@gmail.com> Message-ID: Thank you Emily, yes, and they don't have a solution for making grammar engineering easier for non-linguists either. It is all about the separation of concerns between linguists and non-linguists. They claim that a non-linguist can use their RGL (resource grammar library) for building what they call the `application grammar`. The RGL would be maintained by linguists. In that sense, as you said, the DELPH-IN equivalent would be a set of transfer rules. The most obvious end-to-end example of this approach that came to my mind is the openproof project: http://svn.delph-in.net/erg/tags/2018/openproof/README Right? Best, Alexandre > On 9 Sep 2020, at 13:08, Emily M. Bender wrote: > > I think the closest thing to an analog to GF with DELPH-IN materials would be an 'API' for authoring > MRS representations that is user friendly for non-linguists + some set of transfer rules that take > those MRSes into ones that work in each language-specific grammar. > > The Grammar Matrix itself is not conceived of as a tool for making grammar engineering easier for > non-linguists (though people frequently seem to want that). > > Emily > From arademaker at gmail.com Wed Sep 9 20:42:57 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Wed, 9 Sep 2020 15:42:57 -0300 Subject: [developers] Valid MRS? Bug in ERG? Message-ID: Hi, Are the following two MRSs considered valid? Note that TOP is h0, h0 is qeq h1, but h1 is not the label of any predicate. In both cases, pydelphin could not make the transformation to EDS. I just want to confirm whether they are invalid; if so, maybe pydelphin can't really make sense of them. One additional, possibly silly question: if they are invalid, can that be considered a bug in the ERG?
[ TOP: h0 INDEX: e2 [ e SF: prop-or-ques ] RELS: < [ unknown<0:27> LBL: h4 ARG: u5 ARG0: e2 ] [ _quick_a_1<0:7> LBL: h4 ARG0: e6 [ e SF: prop TENSE: untensed MOOD: indicative PROG: - PERF: - ] ARG1: e2 ] [ _and_c<8:11> LBL: h4 ARG0: e7 [ e SF: prop ] ARG1: u8 ARG2: e9 [ e SF: prop ] ] [ _without_p<12:19> LBL: h4 ARG0: e9 ARG1: e2 ARG2: x10 [ x PERS: 3 NUM: sg IND: + ] ] [ udef_q<12:19> LBL: h11 ARG0: x10 RSTR: h12 BODY: h13 ] [ _warning_n_of<20:27> LBL: h14 ARG0: x10 ARG1: i15 ] > HCONS: < h0 qeq h1 h12 qeq h14 > ] [ TOP: h0 INDEX: e2 [ e SF: prop-or-ques TENSE: untensed MOOD: indicative ] RELS: < [ unknown<0:69> LBL: h4 ARG: u5 ARG0: e2 ] [ _in_p_loc<0:2> LBL: h4 ARG0: e2 ARG1: u6 ARG2: x7 [ x PERS: 3 NUM: sg IND: + PT: pt ] ] [ _the_q<3:6> LBL: h8 ARG0: x7 RSTR: h9 BODY: h10 ] [ compound<7:17> LBL: h11 ARG0: e12 [ e SF: prop TENSE: untensed MOOD: indicative PROG: - PERF: - ] ARG1: x7 ARG2: x13 [ x IND: + PT: pt ] ] [ udef_q<7:12> LBL: h14 ARG0: x13 RSTR: h15 BODY: h16 ] [ _front_n_1<7:12> LBL: h17 ARG0: x13 ] [ _part_n_of<13:17> LBL: h11 ARG0: x7 ARG1: i18 ] [ _of_p<18:20> LBL: h11 ARG0: e19 [ e SF: prop TENSE: untensed MOOD: indicative PROG: - PERF: - ] ARG1: x7 ARG2: x20 [ x PERS: 3 NUM: sg IND: + PT: pt ] ] [ _the_q<21:24> LBL: h21 ARG0: x20 RSTR: h22 BODY: h23 ] [ _neck_n_1<25:29> LBL: h24 ARG0: x20 ] [ _below_p<30:35> LBL: h11 ARG0: e25 [ e SF: prop TENSE: untensed MOOD: indicative PROG: - PERF: - ] ARG1: x7 ARG2: x26 [ x PERS: 3 NUM: sg IND: + PT: pt ] ] [ _the_q<36:39> LBL: h27 ARG0: x26 RSTR: h28 BODY: h29 ] [ _chin_n_1<40:44> LBL: h30 ARG0: x26 ] [ _and_c<45:48> LBL: h11 ARG0: e31 [ e SF: prop TENSE: untensed MOOD: indicative PROG: - PERF: - ] ARG1: e25 ARG2: e32 [ e SF: prop TENSE: untensed MOOD: indicative PROG: - PERF: - ] ] [ _above_p<49:54> LBL: h11 ARG0: e32 ARG1: x7 ARG2: x33 [ x PERS: 3 NUM: sg PT: pt ] ] [ _the_q<55:58> LBL: h34 ARG0: x33 RSTR: h35 BODY: h36 ] [ _collarbone/nn_u_unknown<59:69> LBL: h37 ARG0: x33 ] > HCONS: < h0 qeq h1 h9 qeq h11 h15 qeq h17 h22 qeq h24 h28 qeq h30 h35 qeq h37 > ] Best, Alexandre From goodman.m.w at gmail.com Thu Sep 10 02:30:39 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Thu, 10 Sep 2020 08:30:39 +0800 Subject: [developers] Valid MRS? Bug in ERG? In-Reply-To: References: Message-ID: Hi Alexandre, These are disconnected graphs. Having the right side of a qeq select a handle that is not the label of any EP is an invalid configuration. This most likely is a symptom of some bug in the ERG. Regarding conversion to EDS with PyDelphin, I've created https://github.com/delph-in/pydelphin/issues/316 to track the issue. I think the LKB's EDS code will more aggressively search for a top for the EDS graph during conversion, perhaps looking to the INDEX. If anyone (Stephan?) cares to explain the procedure for selecting tops in less-than-perfect MRSs, I'd be happy to try and implement it in PyDelphin. Otherwise I'll just try to make the error message more informative. On Thu, Sep 10, 2020 at 2:44 AM Alexandre Rademaker wrote: > > Hi, > > Are the following two MRSs considered valid? Note that TOP is h0, h0 is > qeq h1 but h1 is not the label of any predicate. In both cases, pydelphin > could not make the transformation to EDS. I just want to confirm if they > are invalid, if so, maybe pydelphin can?t really make sense of them. > > One additional possible silly question. If they are invalid, can it be > consider a bug in ERG? 
> > > [ TOP: h0 > INDEX: e2 [ e SF: prop-or-ques ] > RELS: < [ unknown<0:27> LBL: h4 ARG: u5 ARG0: e2 ] > [ _quick_a_1<0:7> LBL: h4 ARG0: e6 [ e SF: prop TENSE: untensed > MOOD: indicative PROG: - PERF: - ] ARG1: e2 ] > [ _and_c<8:11> LBL: h4 ARG0: e7 [ e SF: prop ] ARG1: u8 ARG2: e9 > [ e SF: prop ] ] > [ _without_p<12:19> LBL: h4 ARG0: e9 ARG1: e2 ARG2: x10 [ x > PERS: 3 NUM: sg IND: + ] ] > [ udef_q<12:19> LBL: h11 ARG0: x10 RSTR: h12 BODY: h13 ] > [ _warning_n_of<20:27> LBL: h14 ARG0: x10 ARG1: i15 ] > > HCONS: < h0 qeq h1 h12 qeq h14 > ] > > > [ TOP: h0 > INDEX: e2 [ e SF: prop-or-ques TENSE: untensed MOOD: indicative ] > RELS: < [ unknown<0:69> LBL: h4 ARG: u5 ARG0: e2 ] > [ _in_p_loc<0:2> LBL: h4 ARG0: e2 ARG1: u6 ARG2: x7 [ x PERS: 3 > NUM: sg IND: + PT: pt ] ] > [ _the_q<3:6> LBL: h8 ARG0: x7 RSTR: h9 BODY: h10 ] > [ compound<7:17> LBL: h11 ARG0: e12 [ e SF: prop TENSE: untensed > MOOD: indicative PROG: - PERF: - ] ARG1: x7 ARG2: x13 [ x IND: + PT: pt ] ] > [ udef_q<7:12> LBL: h14 ARG0: x13 RSTR: h15 BODY: h16 ] > [ _front_n_1<7:12> LBL: h17 ARG0: x13 ] > [ _part_n_of<13:17> LBL: h11 ARG0: x7 ARG1: i18 ] > [ _of_p<18:20> LBL: h11 ARG0: e19 [ e SF: prop TENSE: untensed > MOOD: indicative PROG: - PERF: - ] ARG1: x7 ARG2: x20 [ x PERS: 3 NUM: sg > IND: + PT: pt ] ] > [ _the_q<21:24> LBL: h21 ARG0: x20 RSTR: h22 BODY: h23 ] > [ _neck_n_1<25:29> LBL: h24 ARG0: x20 ] > [ _below_p<30:35> LBL: h11 ARG0: e25 [ e SF: prop TENSE: > untensed MOOD: indicative PROG: - PERF: - ] ARG1: x7 ARG2: x26 [ x PERS: 3 > NUM: sg IND: + PT: pt ] ] > [ _the_q<36:39> LBL: h27 ARG0: x26 RSTR: h28 BODY: h29 ] > [ _chin_n_1<40:44> LBL: h30 ARG0: x26 ] > [ _and_c<45:48> LBL: h11 ARG0: e31 [ e SF: prop TENSE: untensed > MOOD: indicative PROG: - PERF: - ] ARG1: e25 ARG2: e32 [ e SF: prop TENSE: > untensed MOOD: indicative PROG: - PERF: - ] ] > [ _above_p<49:54> LBL: h11 ARG0: e32 ARG1: x7 ARG2: x33 [ x > PERS: 3 NUM: sg PT: pt ] ] > [ _the_q<55:58> LBL: h34 ARG0: x33 RSTR: h35 BODY: h36 ] > [ _collarbone/nn_u_unknown<59:69> LBL: h37 ARG0: x33 ] > > HCONS: < h0 qeq h1 h9 qeq h11 h15 qeq h17 h22 qeq h24 h28 qeq h30 h35 > qeq h37 > ] > > > Best, > Alexandre > > > -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From oe at ifi.uio.no Thu Sep 10 08:45:15 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Thu, 10 Sep 2020 08:45:15 +0200 Subject: [developers] Valid MRS? Bug in ERG? In-Reply-To: References: Message-ID: g'day: > I think the LKB's EDS code will more aggressively search for a top for the EDS graph during conversion, perhaps looking to the INDEX. If anyone (Stephan?) cares to explain the procedure for selecting tops in less-than-perfect MRSs, I'd be happy to try and implement it in PyDelphin. yes, robustness to unusual or illformed (as in this case) MRSs has long been a key goal in the EDS conversion (in the LKB); MRS infelicities (in ERG parses) were probably more common in 2002 than today, but still i think that conversion should preferably never fail, i.e. possibly rather drop information from an illformed MRS than not yield an EDS at all. 
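in python terms, that kind of defensive top selection might look roughly like the following. this is an illustrative sketch against pyDelphin-style MRS objects (attributes top, index, rels, and hcons, with EPs exposing label and iv), not the LKB's or pyDelphin's actual code, and it glosses over the choice of a representative among EPs that share a label:

def pick_top(m):
    # normal case: TOP is (qeq to) the label of some EP
    qeq = {hc.hi: hc.lo for hc in m.hcons if hc.relation == 'qeq'}
    label_to_iv = {}
    for ep in m.rels:
        label_to_iv.setdefault(ep.label, ep.iv)
    top_label = qeq.get(m.top, m.top)
    if top_label in label_to_iv:
        return label_to_iv[top_label]
    # fall-back: the EP whose intrinsic variable is INDEX (in alexandre's
    # two MRSs, h1 labels no EP, so this picks e2, the ARG0 of 'unknown')
    if any(ep.iv == m.index for ep in m.rels):
        return m.index
    # otherwise leave the top unset rather than failing outright
    return None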
regarding the top node, i do indeed fall back to the INDEX, if need be: (let* ((ltop (ed-find-representative eds (psoa-top-h psoa))) (index (ed-find-representative eds (psoa-index psoa)))) (setf (eds-top eds) (or (and (ed-p ltop) (ed-id ltop)) (and (ed-p index) (ed-id index)) (and (var-p (psoa-index psoa)) (var-string (psoa-index psoa)))))) the third clause in the or() appears intended to deal with an MRS whose INDEX is not the intrinsic variable of any EP. in that case, the EDS will end up with a top that is not the identifier of any of its nodes, so effectively no top. thinking about such corner cases just now, i am tempted to drop that third fall-back clause and leave the top empty (which would be formally equivalent, seeing as the top property is interpreted as an annotation on one of the actual graph nodes). it appears native serialization allows for empty top nodes already, in which case there will be nothing following the opening brace on the first line: (format stream "{~@[~(~a~):~]~ ~:[~3*~; (~@[cyclic~*~]~@[ ~*~]~@[fragmented~*~])~]~@[~%~]" (eds-top object) (and *eds-show-status-p* (or cyclicp fragmentedp) ) cyclicp (and cyclicp fragmentedp) fragmentedp (eds-relations object)) while i am sure we have never hit empty tops while working with MRSs produced by the ERG, the above suggests that (a) identification of the top node is optional in EDS and (b) native serialization was intended as a line-oriented format. mike, may i suggest you add the fall-back, looking for the INDEX, and otherwise allow EDSs whose top is empty. regarding the exact definition of the native EDS serialization, i shall return to that question in the original thread we had on the topic (one might disallow whitespace between the opening brace and the optional top, to try and evade conclusion (b) above). cheers, oe From goodman.m.w at gmail.com Thu Sep 10 09:17:28 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Thu, 10 Sep 2020 15:17:28 +0800 Subject: [developers] Valid MRS? Bug in ERG? In-Reply-To: References: Message-ID: Thanks for the clarification, Stephan. I've noted the suggestion for backing off on TOP to INDEX and for allowing no top. This makes sense. I'm completely unable to make sense of the lisp format call, so I'm not sure what you mean regarding conclusion (b), but I'll wait for your post to the other thread. On Thu, Sep 10, 2020 at 2:45 PM Stephan Oepen wrote: > g'day: > > > I think the LKB's EDS code will more aggressively search for a top for > the EDS graph during conversion, perhaps looking to the INDEX. If anyone > (Stephan?) cares to explain the procedure for selecting tops in > less-than-perfect MRSs, I'd be happy to try and implement it in PyDelphin. > > yes, robustness to unusual or illformed (as in this case) MRSs has > long been a key goal in the EDS conversion (in the LKB); MRS > infelicities (in ERG parses) were probably more common in 2002 than > today, but still i think that conversion should preferably never fail, > i.e. possibly rather drop information from an illformed MRS than not > yield an EDS at all. 
> > regarding the top node, i do indeed fall back to the INDEX, if need be: > > (let* ((ltop (ed-find-representative eds (psoa-top-h psoa))) > (index (ed-find-representative eds (psoa-index psoa)))) > (setf (eds-top eds) > (or (and (ed-p ltop) (ed-id ltop)) > (and (ed-p index) (ed-id index)) > (and (var-p (psoa-index psoa)) > (var-string (psoa-index psoa)))))) > > the third clause in the or() appears intended to deal with an MRS > whose INDEX is not the intrinsic variable of any EP. in that case, > the EDS will end up with a top that is not the identifier of any of > its nodes, so effectively no top. > > thinking about such corner cases just now, i am tempted to drop that > third fall-back clause and leave the top empty (which would be > formally equivalent, seeing as the top property is interpreted as an > annotation on one of the actual graph nodes). it appears native > serialization allows for empty top nodes already, in which case there > will be nothing following the opening brace on the first line: > > (format > stream > "{~@[~(~a~):~]~ > ~:[~3*~; (~@[cyclic~*~]~@[ ~*~]~@[fragmented~*~])~]~@[~%~]" > (eds-top object) > (and *eds-show-status-p* (or cyclicp fragmentedp) ) > cyclicp (and cyclicp fragmentedp) fragmentedp > (eds-relations object)) > > while i am sure we have never hit empty tops while working with MRSs > produced by the ERG, the above suggests that (a) identification of the > top node is optional in EDS and (b) native serialization was intended > as a line-oriented format. > > mike, may i suggest you add the fall-back, looking for the INDEX, and > otherwise allow EDSs whose top is empty. regarding the exact > definition of the native EDS serialization, i shall return to that > question in the original thread we had on the topic (one might > disallow whitespace between the opening brace and the optional top, to > try and evade conclusion (b) above). > > cheers, oe > -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From bond at ieee.org Thu Sep 10 13:36:02 2020 From: bond at ieee.org (Francis Bond) Date: Thu, 10 Sep 2020 19:36:02 +0800 Subject: [developers] Abstract Wikipedia In-Reply-To: <04F8A34E-EBBC-4779-999D-8E13E965AB32@gmail.com> References: <04F8A34E-EBBC-4779-999D-8E13E965AB32@gmail.com> Message-ID: I think it is true. GF did a lot of vocabulary acquisition based on OMW 1.0 (some of my students helped) so they have vocab linked to synsets, as well as their own internal semantic hierarchy. Their actual grammars are a lot more basic than the ERG, and of course the coverage varies from language to language. On Wed, Sep 9, 2020 at 12:07 PM Alexandre Rademaker wrote: > > https://meta.m.wikimedia.org/wiki/Abstract_Wikipedia > > The goal of *Abstract Wikipedia* is to let more people share in more > knowledge in more languages. Abstract Wikipedia is an extension of > Wikidata. In Abstract Wikipedia, people can create and maintain Wikipedia > articles in a language-independent way. A Wikipedia in a language can > translate this language-independent article into its language. Code does > the translation. 
> > The Grammatical Framework community provided some response and suggestion > on how GF could be used for language generation > > > https://meta.m.wikimedia.org/wiki/Talk:Abstract_Wikipedia#Response_from_the_Grammatical_Framework_community > > I wonder if the statement about HPSG is fair: > > check out other grammar formalisms, like HPSG > , you'll see similar coverage > to GF, but no unified API for different languages. > > > Alexandre > Sent from my iPhone > -- Francis Bond Division of Linguistics and Multilingual Studies Nanyang Technological University -------------- next part -------------- An HTML attachment was scrubbed... URL: From kivs at bultreebank.org Thu Sep 10 14:11:33 2020 From: kivs at bultreebank.org (=?utf-8?Q?Kiril=20Simov?=) Date: Thu, 10 Sep 2020 15:11:33 +0300 Subject: [developers] =?utf-8?q?Abstract_Wikipedia?= In-Reply-To: References: <04F8A34E-EBBC-4779-999D-8E13E965AB32@gmail.com> Message-ID: <20200910121133.8664.qmail@s481.sureserver.com> They also are using word embeddings together with their grammar for selection of the appropriate lexical forms. With best regards, Kiril > -------Original Message------- > From: Francis Bond > To: Alexandre Rademaker > Cc: developers > Subject: Re: [developers] Abstract Wikipedia > Sent: 10 Sep '20 14:36 > > I think it is true. GF did a lot of vocabulary acquisition based on > OMW 1.0 (some of my students helped) so they have vocab linked to > synsets, as well as their own internal semantic hierarchy. > > Their actual grammars are a lot more basic than the ERG, and of course > the coverage varies from language to language. > > On Wed, Sep 9, 2020 at 12:07 PM Alexandre Rademaker > wrote: > > > https://meta.m.wikimedia.org/wiki/Abstract_Wikipedia > > > > The goal of ABSTRACT WIKIPEDIA is to let more people share in more > > knowledge in more languages. Abstract Wikipedia is an extension of > > Wikidata. In Abstract Wikipedia, people can create and maintain > > Wikipedia articles in a language-independent way. A Wikipedia in a > > language can translate this language-independent article into its > > language. Code does the translation. > > > > The Grammatical Framework community provided some response and > > suggestion on how GF could be used for language generation > > > > > https://meta.m.wikimedia.org/wiki/Talk:Abstract_Wikipedia#Response_from_the_Grammatical_Framework_community > > > > > > I wonder if the statement about HPSG is fair: > > > >> check out other grammar formalisms, like HPSG, you'll see similar > >> coverage to GF, but no unified API for different languages. > > > > Alexandre > > > > Sent from my iPhone > > -- > > Francis Bond > Division of Linguistics and Multilingual Studies > Nanyang Technological University > From arademaker at gmail.com Thu Sep 10 14:22:17 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Thu, 10 Sep 2020 09:22:17 -0300 Subject: [developers] Abstract Wikipedia In-Reply-To: <20200910121133.8664.qmail@s481.sureserver.com> References: <20200910121133.8664.qmail@s481.sureserver.com> Message-ID: Indeed, I followed some of the development of https://github.com/GrammaticalFramework/gf-wordnet too. Alexandre Sent from my iPhone > On 10 Sep 2020, at 09:11, Kiril Simov wrote: > > They also are using word embeddings together with > their grammar for selection of the appropriate > lexical forms. > > With best regards, > > Kiril -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From olzama at uw.edu Sat Sep 12 21:08:58 2020 From: olzama at uw.edu (Olga Zamaraeva) Date: Sat, 12 Sep 2020 12:08:58 -0700 Subject: [developers] lui: no unification result of failure Message-ID: Dear Developers, Have you seen this behavior? (10 seconds video; basically, in some cases, there is no unification result or failure, just no visible reaction on the unification attempt) https://youtu.be/Ifqn1iAodSg Am I using the software wrong somehow (how? can you tell?), or is this a bug? I tried this with LKB FOS with the latest maclui but also with logon on ubuntu. Thanks! -- Olga Zamaraeva -------------- next part -------------- An HTML attachment was scrubbed... URL: From J.A.Carroll at sussex.ac.uk Sat Sep 12 23:05:05 2020 From: J.A.Carroll at sussex.ac.uk (John Carroll) Date: Sat, 12 Sep 2020 21:05:05 +0000 Subject: [developers] lui: no unification result of failure In-Reply-To: References: Message-ID: <31CB6DA5-3A0B-4FA8-A0CC-1FB8FE4CE1DA@sussex.ac.uk> Hi Olga, I think lui would normally open a new feature structure window showing the unification result. But that doesn't happen in the video. Unfortunately I can only help if the problem is on the lkb side. Could you try doing the same drag again, either in logon/ubuntu or lkb-fos started from a terminal session - but before the drag execute the following at the lisp prompt: (trace lkb::lsp-retrieve-object lkb::debug-yadu!) During/after the drag do you now get any output in the terminal window? There should be 2 calls to the former function and 1 call to the latter, all returning normally. John On 12 Sep 2020, at 20:08, Olga Zamaraeva > wrote: Dear Developers, Have you seen this behavior? (10 seconds video; basically, in some cases, there is no unification result or failure, just no visible reaction on the unification attempt) https://youtu.be/Ifqn1iAodSg Am I using the software wrong somehow (how? can you tell?), or is this a bug? I tried this with LKB FOS with the latest maclui but also with logon on ubuntu. Thanks! -- Olga Zamaraeva -------------- next part -------------- An HTML attachment was scrubbed... URL: From olzama at uw.edu Mon Sep 14 19:20:40 2020 From: olzama at uw.edu (Olga Zamaraeva) Date: Mon, 14 Sep 2020 10:20:40 -0700 Subject: [developers] lui: no unification result of failure In-Reply-To: <31CB6DA5-3A0B-4FA8-A0CC-1FB8FE4CE1DA@sussex.ac.uk> References: <31CB6DA5-3A0B-4FA8-A0CC-1FB8FE4CE1DA@sussex.ac.uk> Message-ID: Hi John, Here's what I see if I do what you suggested: [image: Screen Shot 2020-09-14 at 10.19.27 AM.png] On Sat, Sep 12, 2020 at 2:05 PM John Carroll wrote: > Hi Olga, > > I think lui would normally open a new feature structure window showing the > unification result. But that doesn't happen in the video. Unfortunately I > can only help if the problem is on the lkb side. > > Could you try doing the same drag again, either in logon/ubuntu or lkb-fos > started from a terminal session - but before the drag execute the following > at the lisp prompt: > > (trace lkb::lsp-retrieve-object lkb::debug-yadu!) > > During/after the drag do you now get any output in the terminal window? > There should be 2 calls to the former function and 1 call to the latter, > all returning normally. > > John > > > On 12 Sep 2020, at 20:08, Olga Zamaraeva wrote: > > Dear Developers, > > Have you seen this behavior? 
(10 seconds video; basically, in some cases, > there is no unification result or failure, just no visible reaction on the > unification attempt) > > https://youtu.be/Ifqn1iAodSg > > Am I using the software wrong somehow (how? can you tell?), or is this a > bug? > > I tried this with LKB FOS with the latest maclui but also with logon on > ubuntu. > > > Thanks! > -- > Olga Zamaraeva > > > -- Olga Zamaraeva -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screen Shot 2020-09-14 at 10.19.27 AM.png Type: image/png Size: 553438 bytes Desc: not available URL: From arademaker at gmail.com Wed Sep 16 05:35:35 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Wed, 16 Sep 2020 00:35:35 -0300 Subject: [developers] LkbFos: how to copy from the text area Message-ID: Hi John, How can I copy the text (the scoped MRSs) from the scoped MRS window? Best, Alexandre -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: PastedGraphic-1.png Type: image/png Size: 504865 bytes Desc: not available URL: From J.A.Carroll at sussex.ac.uk Wed Sep 16 11:26:52 2020 From: J.A.Carroll at sussex.ac.uk (John Carroll) Date: Wed, 16 Sep 2020 09:26:52 +0000 Subject: [developers] LkbFos: how to copy from the text area In-Reply-To: References: Message-ID: Hi Alexandre, In LkbFos you can copy from text-like windows such as Scoped MRS, Lkb Top etc. How to do this is different between macOS and Linux since they have different conceptions of copy/paste. macOS: 1. Shift-drag to highlight the text you want to copy; do this by holding down the shift key while dragging the mouse with the left button held down. Alternatively you can shift-left-click one end of the text and then shift-right-click the other end. 2. Type command-C (or select Copy from the XQuartz Edit menu). The text is now in the system clipboard and can be pasted in the normal way. Linux: 1. Shift-drag as above. 2. To paste the highlighted text, click the middle mouse button. I'll add these instructions to http://moin.delph-in.net/LkbFos John > On 16 Sep 2020, at 04:35, Alexandre Rademaker wrote: > > > Hi John, > > How can I copy the text (the scoped MRSs) from the scoped MRS window? > > > > > Best, > Alexandre > From J.A.Carroll at sussex.ac.uk Wed Sep 16 12:57:40 2020 From: J.A.Carroll at sussex.ac.uk (John Carroll) Date: Wed, 16 Sep 2020 10:57:40 +0000 Subject: [developers] Fwd: LkbFos: how to copy from the text area References: Message-ID: Resending due to possible problem at email gateway Begin forwarded message: From: John Carroll > Subject: Re: [developers] LkbFos: how to copy from the text area Date: 16 September 2020 at 10:26:51 BST To: Alexandre Rademaker > Cc: developers > Hi Alexandre, In LkbFos you can copy from text-like windows such as Scoped MRS, Lkb Top etc. How to do this is different between macOS and Linux since they have different conceptions of copy/paste. macOS: 1. Shift-drag to highlight the text you want to copy; do this by holding down the shift key while dragging the mouse with the left button held down. Alternatively you can shift-left-click one end of the text and then shift-right-click the other end. 2. Type command-C (or select Copy from the XQuartz Edit menu). The text is now in the system clipboard and can be pasted in the normal way. Linux: 1. Shift-drag as above. 2. 
To paste the highlighted text, click the middle mouse button. I'll add these instructions to http://moin.delph-in.net/LkbFos John On 16 Sep 2020, at 04:35, Alexandre Rademaker > wrote: Hi John, How can I copy the text (the scoped MRSs) from the scoped MRS window? Best, Alexandre -------------- next part -------------- An HTML attachment was scrubbed... URL: From arademaker at gmail.com Wed Sep 16 17:18:21 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Wed, 16 Sep 2020 12:18:21 -0300 Subject: [developers] LkbFos: how to copy from the text area In-Reply-To: References: Message-ID: Thank you John. The shift-drag works nicely! I don?t know how to make the right-click on MacOS. Best, Alexandre > On 16 Sep 2020, at 06:26, John Carroll wrote: > > Hi Alexandre, > > In LkbFos you can copy from text-like windows such as Scoped MRS, Lkb Top etc. How to do this is different between macOS and Linux since they have different conceptions of copy/paste. > > macOS: > 1. Shift-drag to highlight the text you want to copy; do this by holding down the shift key while dragging the mouse with the left button held down. Alternatively you can shift-left-click one end of the text and then shift-right-click the other end. > 2. Type command-C (or select Copy from the XQuartz Edit menu). The text is now in the system clipboard and can be pasted in the normal way. > > Linux: > 1. Shift-drag as above. > 2. To paste the highlighted text, click the middle mouse button. > > I'll add these instructions to http://moin.delph-in.net/LkbFos > > John From J.A.Carroll at sussex.ac.uk Wed Sep 16 17:34:30 2020 From: J.A.Carroll at sussex.ac.uk (John Carroll) Date: Wed, 16 Sep 2020 15:34:30 +0000 Subject: [developers] LkbFos: how to copy from the text area In-Reply-To: References: Message-ID: I use an Apple Magic Mouse and this supports right-click in macOS. In mouse preferences, right-click by default is "Secondary click". I've not managed to get Magic Mouse to generate a middle-click in Linux running in VirtualBox. Middle-click is needed for [incr tsdb()]. In the end I gave in and bought a cheap 3-button mouse just to get that gesture. John On 16 Sep 2020, at 16:18, Alexandre Rademaker > wrote: Thank you John. The shift-drag works nicely! I don?t know how to make the right-click on MacOS. Best, Alexandre > On 16 Sep 2020, at 06:26, John Carroll > wrote: > > Hi Alexandre, > > In LkbFos you can copy from text-like windows such as Scoped MRS, Lkb Top etc. How to do this is different between macOS and Linux since they have different conceptions of copy/paste. > > macOS: > 1. Shift-drag to highlight the text you want to copy; do this by holding down the shift key while dragging the mouse with the left button held down. Alternatively you can shift-left-click one end of the text and then shift-right-click the other end. > 2. Type command-C (or select Copy from the XQuartz Edit menu). The text is now in the system clipboard and can be pasted in the normal way. > > Linux: > 1. Shift-drag as above. > 2. To paste the highlighted text, click the middle mouse button. > > I'll add these instructions to http://moin.delph-in.net/LkbFos > > John -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From arademaker at gmail.com Thu Sep 24 20:55:44 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Thu, 24 Sep 2020 15:55:44 -0300 Subject: [developers] Discriminant-Based MRS Banking Message-ID: <5109459B-EFF8-41A6-B8FC-C8FA73D31D44@gmail.com> Hi Stephan, I have already used the `compare` button from http://erg.delph-in.net, but I didn't know that this web interface can edit profiles. The paper http://www.lrec-conf.org/proceedings/lrec2006/pdf/364_pdf.pdf suggested this is the case. That is, the web interface can save decisions. Well, I had only suspected it, since there is a disabled `save` button at the public address. Is that the case? If so, where can I find documentation about it? I know about the page http://moin.delph-in.net/LogonOnline, but it only talks about how to call the www script in the LOGON directory. Best, Alexandre From arademaker at gmail.com Fri Sep 25 01:07:21 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Thu, 24 Sep 2020 20:07:21 -0300 Subject: [developers] Valid MRS? Bug in ERG? In-Reply-To: References: Message-ID: <1B872B18-BE49-4B14-A893-2AFB0A98C856@gmail.com> Hi Michael and Stephan, A good place to learn about Lisp's format function is http://www.gigamonkeys.com/book/a-few-format-recipes.html Basically, the control-string: "{~@[~(~a~):~]~ ~:[~3*~; (~@[cyclic~*~]~@[ ~*~]~@[fragmented~*~])~]~@[~%~]" The ~( directive means lower case, so ~(~a~) will output the value of the expression `(eds-top object)` in lower case. And ~@[ introduces a conditional: if the value of `(eds-top object)` is nil, the clause is skipped, so it neither prints anything nor consumes the other arguments in its place. The remainder of the control-string is quite complicated to read, but one can follow the complete documentation in the CLHS. For instance, `~3*` skips the next three format arguments; see http://www.lispworks.com/documentation/lw50/CLHS/Body/22_cga.htm ;-) It looks like the serialisation/encode of EDS in pydelphin is also robust to an empty top: https://github.com/delph-in/pydelphin/blob/develop/delphin/codecs/eds.py#L257 But the decode/parse is not, see the tests below. Actually, encode should not emit a colon in the first line and, of course, there is this discussion about the line-oriented format that would require a broad review of the encode/decode of EDS. I have submitted a PR to Michael fixing the translation from MRS to EDS, but I didn't touch the decode/encode functions. I found the Lisp code in the lkb/src/mrs/dependencies.lisp file, so it is part of the LKB source code. I am curious: what does `psoa` stand for?
>>> edsnative.encode(edsnative.decode(a)) '{e2: e2:unknown<0:83>{e SF prop-or-ques}[ARG x4] _1:_a_q<0:1>[BV x4] x4:_river_n_of<2:7>{x PERS 3, NUM sg, IND +, PT pt}[] e10:_in_p_loc<8:10>{e SF prop, TENSE untensed, MOOD indicative, PROG -, PERF -}[ARG1 x4, ARG2 x11] _2:proper_q<11:30>[BV x11] e16:_northeastern_a_1<11:23>{e SF prop, TENSE untensed, MOOD indicative, PROG bool, PERF -}[ARG1 x11] x11:named<24:30>("Brazil"){x PERS 3, NUM sg, IND +}[] e18:_flow_v_1<36:41>{e SF prop, TENSE pres, MOOD indicative, PROG -, PERF -}[ARG1 x4] e19:_general_a_1<42:51>{e SF prop, TENSE untensed, MOOD indicative, PROG -, PERF -}[ARG1 e18] e20:loc_nonsp<52:61>{e SF prop, TENSE untensed, MOOD indicative, PROG -, PERF -}[ARG1 e18, ARG2 x21] x21:place_n<52:61>{x PERS 3, NUM sg}[] _3:def_implicit_q<52:61>[BV x21] e26:_northward_a_1<52:61>{e SF prop, TENSE untensed, MOOD indicative, PROG -, PERF -}[ARG1 x21] e27:_to_p_state<62:64>{e SF prop, TENSE untensed, MOOD indicative, PROG -, PERF -}[ARG1 e18, ARG2 x28] _4:_the_q<65:68>[BV x28] e33:compound<69:83>{e SF prop, TENSE untensed, MOOD indicative, PROG -, PERF -}[ARG1 x28, ARG2 x34] _5:proper_q<69:77>[BV x34] x34:named<69:77>("Atlantic"){x PERS 3, NUM sg, IND +, PT pt}[] x28:named<78:83>("Ocean"){x PERS 3, NUM sg, IND +, PT pt}[]}? >>> x = edsnative.decode(a) >>> x.top 'e2' >>> x.top = None >>> edsnative.encode(x) '{: e2:unknown<0:83>{e SF prop-or-ques}[ARG x4] _1:_a_q<0:1>[BV x4] x4:_river_n_of<2:7>{x PERS 3, NUM sg, IND +, PT pt}[] e10:_in_p_loc<8:10>{e SF prop, TENSE untensed, MOOD indicative, PROG -, PERF -}[ARG1 x4, ARG2 x11] _2:proper_q<11:30>[BV x11] e16:_northeastern_a_1<11:23>{e SF prop, TENSE untensed, MOOD indicative, PROG bool, PERF -}[ARG1 x11] x11:named<24:30>("Brazil"){x PERS 3, NUM sg, IND +}[] e18:_flow_v_1<36:41>{e SF prop, TENSE pres, MOOD indicative, PROG -, PERF -}[ARG1 x4] e19:_general_a_1<42:51>{e SF prop, TENSE untensed, MOOD indicative, PROG -, PERF -}[ARG1 e18] e20:loc_nonsp<52:61>{e SF prop, TENSE untensed, MOOD indicative, PROG -, PERF -}[ARG1 e18, ARG2 x21] x21:place_n<52:61>{x PERS 3, NUM sg}[] _3:def_implicit_q<52:61>[BV x21] e26:_northward_a_1<52:61>{e SF prop, TENSE untensed, MOOD indicative, PROG -, PERF -}[ARG1 x21] e27:_to_p_state<62:64>{e SF prop, TENSE untensed, MOOD indicative, PROG -, PERF -}[ARG1 e18, ARG2 x28] _4:_the_q<65:68>[BV x28] e33:compound<69:83>{e SF prop, TENSE untensed, MOOD indicative, PROG -, PERF -}[ARG1 x28, ARG2 x34] _5:proper_q<69:77>[BV x34] x34:named<69:77>("Atlantic"){x PERS 3, NUM sg, IND +, PT pt}[] x28:named<78:83>("Ocean"){x PERS 3, NUM sg, IND +, PT pt}[]}? >>> edsnative.decode(x) Traceback (most recent call last): File "", line 1, in File "/Users/ar/venv/lib/python3.8/site-packages/delphin/codecs/eds.py", line 110, in decode lexer = _EDSLexer.lex(s.splitlines()) AttributeError: 'EDS' object has no attribute 'splitlines' Best, Alexandre From goodman.m.w at gmail.com Fri Sep 25 03:34:44 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Fri, 25 Sep 2020 09:34:44 +0800 Subject: [developers] Valid MRS? Bug in ERG? In-Reply-To: <1B872B18-BE49-4B14-A893-2AFB0A98C856@gmail.com> References: <1B872B18-BE49-4B14-A893-2AFB0A98C856@gmail.com> Message-ID: On Fri, Sep 25, 2020 at 7:07 AM Alexandre Rademaker wrote: > > Hi Michael and Stephan, > > A good place to learn about the Lisp format is > http://www.gigamonkeys.com/book/a-few-format-recipes.html > > [...] Thanks Alexandre for the links and the explanation. 
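Setting aside the type error at the end of the transcript above (decode was given an EDS object rather than a string), the remaining issue is the first line of the native serialization: with an empty top the writer emits '{:' (or, per the Lisp earlier in the thread, just '{'), while a strict reader may insist on an identifier before the colon. A tolerant reader could treat the material before the first colon as optional; a minimal sketch using plain regular expressions (illustration only, not PyDelphin's actual lexer):

import re

# '{', optionally followed by a top identifier and ':', or by a bare ':';
# a top is only recognized when its ':' is followed by whitespace, so a
# leading node such as 'e2:unknown' is not mistaken for a top identifier.
_HEADER = re.compile(r'^\{\s*(?:([^\s:{}]+):(?=\s)|:)?\s*')

def read_top(text):
    """Return (top, rest) for the start of a native EDS string; top may be None."""
    m = _HEADER.match(text)
    if m is None:
        raise ValueError('not a native EDS serialization')
    return m.group(1), text[m.end():]

print(read_top('{e2: e2:unknown<0:83>[ARG x4]}'))  # ('e2', 'e2:unknown<0:83>[ARG x4]}')
print(read_top('{: e2:unknown<0:83>[ARG x4]}'))    # (None, 'e2:unknown<0:83>[ARG x4]}')
print(read_top('{e2:unknown<0:83>[ARG x4]}'))      # (None, 'e2:unknown<0:83>[ARG x4]}')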
I tried reading some elisp docs and a guide on format in order to understand the expression when Stephan posted it, but after about 20 minutes I decided that was too much effort just to understand an email, so I gave up. It looks like the serialisation/encode of EDS in pydelphin is also robust > to empty top: > > > https://github.com/delph-in/pydelphin/blob/develop/delphin/codecs/eds.py#L257 > Hmm, I guess I anticipated that because I allow an empty top in the data structure. Thanks for digging that up! > But the decode/parse is not, see tests below. Actually, encode should not > emit a colon in the first line and, of course, there is this discussion > about the line-oriented format that would require a broad review of the > encode/decode of EDS. > I think the colon was deliberate to avoid potential ambiguity with the identifier of the first node. Stephan instead wants to make newlines obligatory. I'm happy to make newlines + indentation the default for EDS native serialization, but I'm not prepared to get rid of the ability to write single-line EDS. > > I have submitted a PR to Michael solving the translation from MRS to EDS, > but I didn?t touch in the decode/encode functions. > Thanks, I'll take a look. I found the Lisp code in the lkb/src/mrs/dependencies.lisp file, so it is > part of the LKB source code. I am curious, what `psoa` stands for? > > "probable-state-of-affairs". But I'm not sure where that terminology comes from. > > [...] > >>> edsnative.decode(x) > Traceback (most recent call last): > File "", line 1, in > File "/Users/ar/venv/lib/python3.8/site-packages/delphin/codecs/eds.py", > line 110, in decode > lexer = _EDSLexer.lex(s.splitlines()) > AttributeError: 'EDS' object has no attribute 'splitlines' > Here you have attempted to decode x, which is the EDS data structure. Instead you'd want to do `edsnative.decode(edsnative.encode(x))`, but that also fails because it expects the top variable before the colon. It appears my robustness attempt was incomplete. -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From ebender at uw.edu Fri Sep 25 04:40:23 2020 From: ebender at uw.edu (Emily M. Bender) Date: Thu, 24 Sep 2020 19:40:23 -0700 Subject: [developers] Valid MRS? Bug in ERG? In-Reply-To: References: <1B872B18-BE49-4B14-A893-2AFB0A98C856@gmail.com> Message-ID: I don't have much to contribute to serialization etc, but psoa is `parameterized state of affairs', and I think it comes from the situation semantics literature. Emily On Thu, Sep 24, 2020 at 6:36 PM goodman.m.w at gmail.com wrote: > On Fri, Sep 25, 2020 at 7:07 AM Alexandre Rademaker > wrote: > >> >> Hi Michael and Stephan, >> >> A good place to learn about the Lisp format is >> http://www.gigamonkeys.com/book/a-few-format-recipes.html >> >> [...] > > > Thanks Alexandre for the links and the explanation. > > I tried reading some elisp docs and a guide on format in order to > understand the expression when Stephan posted it, but after about 20 > minutes I decided that was too much effort just to understand an email, so > I gave up. > > It looks like the serialisation/encode of EDS in pydelphin is also robust >> to empty top: >> >> >> https://github.com/delph-in/pydelphin/blob/develop/delphin/codecs/eds.py#L257 >> > > Hmm, I guess I anticipated that because I allow an empty top in the data > structure. Thanks for digging that up! > > > >> But the decode/parse is not, see tests below. 
Actually, encode should not >> emit a colon in the first line and, of course, there is this discussion >> about the line-oriented format that would require a broad review of the >> encode/decode of EDS. >> > > I think the colon was deliberate to avoid potential ambiguity with the > identifier of the first node. Stephan instead wants to make newlines > obligatory. I'm happy to make newlines + indentation the default for EDS > native serialization, but I'm not prepared to get rid of the ability to > write single-line EDS. > > >> >> I have submitted a PR to Michael solving the translation from MRS to EDS, >> but I didn?t touch in the decode/encode functions. >> > > Thanks, I'll take a look. > > I found the Lisp code in the lkb/src/mrs/dependencies.lisp file, so it is >> part of the LKB source code. I am curious, what `psoa` stands for? >> >> "probable-state-of-affairs". But I'm not sure where that terminology > comes from. > > > >> >> [...] >> >>> edsnative.decode(x) >> Traceback (most recent call last): >> File "", line 1, in >> File >> "/Users/ar/venv/lib/python3.8/site-packages/delphin/codecs/eds.py", line >> 110, in decode >> lexer = _EDSLexer.lex(s.splitlines()) >> AttributeError: 'EDS' object has no attribute 'splitlines' >> > > Here you have attempted to decode x, which is the EDS data structure. > Instead you'd want to do `edsnative.decode(edsnative.encode(x))`, but that > also fails because it expects the top variable before the colon. It appears > my robustness attempt was incomplete. > > -- > -Michael Wayne Goodman > -------------- next part -------------- An HTML attachment was scrubbed... URL: From arademaker at gmail.com Fri Sep 25 05:05:57 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Fri, 25 Sep 2020 00:05:57 -0300 Subject: [developers] Valid MRS? Bug in ERG? In-Reply-To: References: <1B872B18-BE49-4B14-A893-2AFB0A98C856@gmail.com> Message-ID: Thank you Emily, I found one mention in https://plato.stanford.edu/entries/situations-semantics/. But in the context of the Lisp code, I am curious why the authors used this in the suffix of the function name? BTW, I reported two issues on pydelphin and ERG repositories: https://github.com/delph-in/erg/issues/25 The trunk version of ERG gave me this MRS.. [ TOP: h0 INDEX: e2 [ e SF: prop TENSE: tensed MOOD: indicative PROG: - PERF: - ] RELS: < [ _communicate_v_to<0:11> LBL: h1 ARG0: e4 [ e SF: prop TENSE: tensed MOOD: indicative PROG: - PERF: - ] ARG1: i3 ARG2: h5 ARG3: i6 ] [ _or_c<12:14> LBL: h1 ARG0: e2 ARG1: e4 ARG2: e7 [ e SF: prop TENSE: pres MOOD: indicative PROG: - PERF: - ] ] [ _express_v_to<15:22> LBL: h1 ARG0: e7 ARG1: i3 ARG2: h8 ARG3: i9 ] [ unknown<23:33> LBL: h10 ARG: u12 ARG0: e11 [ e SF: prop TENSE: untensed MOOD: indicative ] ] [ _by_p_means<23:25> LBL: h10 ARG0: e11 ARG1: u13 ARG2: x14 ] [ udef_q<26:33> LBL: h15 ARG0: x14 RSTR: h16 BODY: h17 ] [ nominalization<26:33> LBL: h18 ARG0: x14 ARG1: h19 ] [ _write_v_to<26:33> LBL: h19 ARG0: e20 [ e SF: prop TENSE: untensed MOOD: indicative PROG: + PERF: - ] ARG1: i21 ARG2: i22 ] > HCONS: < h0 qeq h1 h5 qeq h23 h8 qeq h23 h16 qeq h18 > ] Handle h23 does not appear in the predicates. The h5 and h8 only in the arguments. Is it valid? Pydelphin transformation to DMRS works with a warning "broken handle constraint?. Can it be transformed to EDS? Is this an evidence that MRS to EDS is much less robust than MRS to DMRS? 
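One way to make the problem concrete: in a well-formed MRS, the right-hand side of every qeq should be the label of some EP, and the left-hand side should be TOP or a handle-valued argument. A small check over toy data (plain Python dictionaries standing in for the relevant parts of the MRS above, abridged; not any particular library's API) flags h23 immediately:

# abridged rendering of the MRS quoted above
eps = [
    {'pred': '_communicate_v_to', 'label': 'h1',  'args': {'ARG2': 'h5'}},
    {'pred': '_express_v_to',     'label': 'h1',  'args': {'ARG2': 'h8'}},
    {'pred': 'unknown',           'label': 'h10', 'args': {}},
    {'pred': 'udef_q',            'label': 'h15', 'args': {'RSTR': 'h16', 'BODY': 'h17'}},
    {'pred': 'nominalization',    'label': 'h18', 'args': {'ARG1': 'h19'}},
    {'pred': '_write_v_to',       'label': 'h19', 'args': {}},
]
hcons = [('h0', 'h1'), ('h5', 'h23'), ('h8', 'h23'), ('h16', 'h18')]

labels = {ep['label'] for ep in eps}
# h0 is TOP; the rest are handle-valued argument positions
holes = {'h0'} | {v for ep in eps for v in ep['args'].values() if v.startswith('h')}

for hi, lo in hcons:
    if lo not in labels:
        print(hi, 'qeq', lo + ':', lo, 'is not the label of any EP')
    if hi not in holes:
        print(hi, 'qeq', lo + ':', hi, 'is neither TOP nor an argument handle')

# prints: h5 qeq h23: h23 is not the label of any EP
#         h8 qeq h23: h23 is not the label of any EP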
BTW, since I am having so many trouble with MRS to EDS, and my goal is to compare a golden version of a profile with its 1-best parsed version to evaluate the parse selection model, I wonder if I could be doing that with DMRS instead of EDS? any idea? Any alternative to https://github.com/delph-in/delphin.edm using DMRS? Best, Alexandre > On 24 Sep 2020, at 23:40, Emily M. Bender wrote: > > I don't have much to contribute to serialization etc, but psoa is `parameterized state of affairs', and I think it comes from the situation semantics literature. > > Emily > From arademaker at gmail.com Fri Sep 25 05:20:58 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Fri, 25 Sep 2020 00:20:58 -0300 Subject: [developers] Valid MRS? Bug in ERG? In-Reply-To: References: <1B872B18-BE49-4B14-A893-2AFB0A98C856@gmail.com> Message-ID: Ops! I came to hasty conclusions! LKB FOS parsed the sentence with ERG trunk. Two analysis, two MRS and two EDS with no error!! In the MRSs from LKB, ARG2 of `_communicate_v_to` and `_express_v_to` are both qeq to the label of the `unknown` predicate. I don?t know what I can conclude now? ACE error? > On 25 Sep 2020, at 00:05, Alexandre Rademaker wrote: > > > Thank you Emily, I found one mention in https://plato.stanford.edu/entries/situations-semantics/. But in the context of the Lisp code, I am curious why the authors used this in the suffix of the function name? > > BTW, I reported two issues on pydelphin and ERG repositories: > > https://github.com/delph-in/erg/issues/25 > > The trunk version of ERG gave me this MRS.. > > [ TOP: h0 > INDEX: e2 [ e SF: prop TENSE: tensed MOOD: indicative PROG: - PERF: - ] > RELS: < [ _communicate_v_to<0:11> LBL: h1 ARG0: e4 [ e SF: prop TENSE: tensed MOOD: indicative PROG: - PERF: - ] ARG1: i3 ARG2: h5 ARG3: i6 ] > [ _or_c<12:14> LBL: h1 ARG0: e2 ARG1: e4 ARG2: e7 [ e SF: prop TENSE: pres MOOD: indicative PROG: - PERF: - ] ] > [ _express_v_to<15:22> LBL: h1 ARG0: e7 ARG1: i3 ARG2: h8 ARG3: i9 ] > [ unknown<23:33> LBL: h10 ARG: u12 ARG0: e11 [ e SF: prop TENSE: untensed MOOD: indicative ] ] > [ _by_p_means<23:25> LBL: h10 ARG0: e11 ARG1: u13 ARG2: x14 ] > [ udef_q<26:33> LBL: h15 ARG0: x14 RSTR: h16 BODY: h17 ] > [ nominalization<26:33> LBL: h18 ARG0: x14 ARG1: h19 ] > [ _write_v_to<26:33> LBL: h19 ARG0: e20 [ e SF: prop TENSE: untensed MOOD: indicative PROG: + PERF: - ] ARG1: i21 ARG2: i22 ] > > HCONS: < h0 qeq h1 h5 qeq h23 h8 qeq h23 h16 qeq h18 > ] > > Handle h23 does not appear in the predicates. The h5 and h8 only in the arguments. Is it valid? > > Pydelphin transformation to DMRS works with a warning "broken handle constraint?. Can it be transformed to EDS? Is this an evidence that MRS to EDS is much less robust than MRS to DMRS? > > BTW, since I am having so many trouble with MRS to EDS, and my goal is to compare a golden version of a profile with its 1-best parsed version to evaluate the parse selection model, I wonder if I could be doing that with DMRS instead of EDS? any idea? Any alternative to https://github.com/delph-in/delphin.edm using DMRS? > > Best, > Alexandre From goodman.m.w at gmail.com Fri Sep 25 05:29:27 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Fri, 25 Sep 2020 11:29:27 +0800 Subject: [developers] Valid MRS? Bug in ERG? In-Reply-To: References: <1B872B18-BE49-4B14-A893-2AFB0A98C856@gmail.com> Message-ID: Thanks, Emily, for the correction! On Fri, Sep 25, 2020 at 11:22 AM Alexandre Rademaker wrote: > > Ops! I came to hasty conclusions! 
LKB FOS parsed the sentence with ERG > trunk. Two analysis, two MRS and two EDS with no error!! In the MRSs from > LKB, ARG2 of `_communicate_v_to` and `_express_v_to` are both qeq to the > label of the `unknown` predicate. > > I don?t know what I can conclude now? ACE error? > Regarding the MRS you reported: broken HCONS are a somewhat common issue, and they are the symptom of a bug. The MRS is not valid. Regarding conversion to EDS, the LKB goes to greater lengths to give an EDS even for ill-formed MRSs, while PyDelphin tries to avoid grammar-specific solutions. The result is that PyDelphin will, I think, apply predicate-modification more broadly than the LKB, but it is a bit more brittle. While there may be other reasons to use DMRS instead, in this case the different behavior in PyDelphin is just because it issues a warning instead of an error, then prints out the partial DMRS, dropping the disconnected nodes. > > On 25 Sep 2020, at 00:05, Alexandre Rademaker > wrote: > > > > > > Thank you Emily, I found one mention in > https://plato.stanford.edu/entries/situations-semantics/. But in the > context of the Lisp code, I am curious why the authors used this in the > suffix of the function name? > > > > BTW, I reported two issues on pydelphin and ERG repositories: > > > > https://github.com/delph-in/erg/issues/25 > > > > The trunk version of ERG gave me this MRS.. > > > > [ TOP: h0 > > INDEX: e2 [ e SF: prop TENSE: tensed MOOD: indicative PROG: - PERF: - ] > > RELS: < [ _communicate_v_to<0:11> LBL: h1 ARG0: e4 [ e SF: prop TENSE: > tensed MOOD: indicative PROG: - PERF: - ] ARG1: i3 ARG2: h5 ARG3: i6 ] > > [ _or_c<12:14> LBL: h1 ARG0: e2 ARG1: e4 ARG2: e7 [ e SF: prop > TENSE: pres MOOD: indicative PROG: - PERF: - ] ] > > [ _express_v_to<15:22> LBL: h1 ARG0: e7 ARG1: i3 ARG2: h8 ARG3: > i9 ] > > [ unknown<23:33> LBL: h10 ARG: u12 ARG0: e11 [ e SF: prop > TENSE: untensed MOOD: indicative ] ] > > [ _by_p_means<23:25> LBL: h10 ARG0: e11 ARG1: u13 ARG2: x14 ] > > [ udef_q<26:33> LBL: h15 ARG0: x14 RSTR: h16 BODY: h17 ] > > [ nominalization<26:33> LBL: h18 ARG0: x14 ARG1: h19 ] > > [ _write_v_to<26:33> LBL: h19 ARG0: e20 [ e SF: prop TENSE: > untensed MOOD: indicative PROG: + PERF: - ] ARG1: i21 ARG2: i22 ] > > > HCONS: < h0 qeq h1 h5 qeq h23 h8 qeq h23 h16 qeq h18 > ] > > > > Handle h23 does not appear in the predicates. The h5 and h8 only in the > arguments. Is it valid? > > > > Pydelphin transformation to DMRS works with a warning "broken handle > constraint?. Can it be transformed to EDS? Is this an evidence that MRS to > EDS is much less robust than MRS to DMRS? > > > > BTW, since I am having so many trouble with MRS to EDS, and my goal is > to compare a golden version of a profile with its 1-best parsed version to > evaluate the parse selection model, I wonder if I could be doing that with > DMRS instead of EDS? any idea? Any alternative to > https://github.com/delph-in/delphin.edm using DMRS? > > > > Best, > > Alexandre > > > -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From goodman.m.w at gmail.com Fri Sep 25 05:32:36 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Fri, 25 Sep 2020 11:32:36 +0800 Subject: [developers] Valid MRS? Bug in ERG? In-Reply-To: References: <1B872B18-BE49-4B14-A893-2AFB0A98C856@gmail.com> Message-ID: And regarding https://github.com/delph-in/delphin.edm, note that this implementation also works for MRS, by converting to EDS along the way, and for DMRS, without conversion. 
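Whichever representation is compared (EDS or DMRS), the score itself comes down to precision, recall, and F1 over the triples extracted from the gold and the test analyses. A toy sketch of that final step (hypothetical triple shapes, not delphin.edm's internal format):

from collections import Counter

def prf(gold_triples, test_triples):
    """Precision/recall/F1 over (possibly repeated) triples."""
    gold, test = Counter(gold_triples), Counter(test_triples)
    matched = sum((gold & test).values())
    p = matched / sum(test.values()) if test else 0.0
    r = matched / sum(gold.values()) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# toy triples: (source span, relation, target); two of the three agree
gold = [((0, 3), 'predicate', '_dog_n_1'),
        ((4, 9), 'predicate', '_bark_v_1'),
        ((4, 9), 'ARG1', (0, 3))]
test = [((0, 3), 'predicate', '_dog_n_1'),
        ((4, 9), 'predicate', '_sleep_v_1'),
        ((4, 9), 'ARG1', (0, 3))]
print(prf(gold, test))  # roughly (0.67, 0.67, 0.67)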
On Fri, Sep 25, 2020 at 11:29 AM goodman.m.w at gmail.com < goodman.m.w at gmail.com> wrote: > Thanks, Emily, for the correction! > > On Fri, Sep 25, 2020 at 11:22 AM Alexandre Rademaker > wrote: > >> >> Ops! I came to hasty conclusions! LKB FOS parsed the sentence with ERG >> trunk. Two analysis, two MRS and two EDS with no error!! In the MRSs from >> LKB, ARG2 of `_communicate_v_to` and `_express_v_to` are both qeq to the >> label of the `unknown` predicate. >> >> I don?t know what I can conclude now? ACE error? >> > > Regarding the MRS you reported: broken HCONS are a somewhat common issue, > and they are the symptom of a bug. The MRS is not valid. > > Regarding conversion to EDS, the LKB goes to greater lengths to give an > EDS even for ill-formed MRSs, while PyDelphin tries to avoid > grammar-specific solutions. The result is that PyDelphin will, I think, > apply predicate-modification more broadly than the LKB, but it is a bit > more brittle. > > While there may be other reasons to use DMRS instead, in this case the > different behavior in PyDelphin is just because it issues a warning instead > of an error, then prints out the partial DMRS, dropping the disconnected > nodes. > > > >> > On 25 Sep 2020, at 00:05, Alexandre Rademaker >> wrote: >> > >> > >> > Thank you Emily, I found one mention in >> https://plato.stanford.edu/entries/situations-semantics/. But in the >> context of the Lisp code, I am curious why the authors used this in the >> suffix of the function name? >> > >> > BTW, I reported two issues on pydelphin and ERG repositories: >> > >> > https://github.com/delph-in/erg/issues/25 >> > >> > The trunk version of ERG gave me this MRS.. >> > >> > [ TOP: h0 >> > INDEX: e2 [ e SF: prop TENSE: tensed MOOD: indicative PROG: - PERF: - ] >> > RELS: < [ _communicate_v_to<0:11> LBL: h1 ARG0: e4 [ e SF: prop TENSE: >> tensed MOOD: indicative PROG: - PERF: - ] ARG1: i3 ARG2: h5 ARG3: i6 ] >> > [ _or_c<12:14> LBL: h1 ARG0: e2 ARG1: e4 ARG2: e7 [ e SF: prop >> TENSE: pres MOOD: indicative PROG: - PERF: - ] ] >> > [ _express_v_to<15:22> LBL: h1 ARG0: e7 ARG1: i3 ARG2: h8 >> ARG3: i9 ] >> > [ unknown<23:33> LBL: h10 ARG: u12 ARG0: e11 [ e SF: prop >> TENSE: untensed MOOD: indicative ] ] >> > [ _by_p_means<23:25> LBL: h10 ARG0: e11 ARG1: u13 ARG2: x14 ] >> > [ udef_q<26:33> LBL: h15 ARG0: x14 RSTR: h16 BODY: h17 ] >> > [ nominalization<26:33> LBL: h18 ARG0: x14 ARG1: h19 ] >> > [ _write_v_to<26:33> LBL: h19 ARG0: e20 [ e SF: prop TENSE: >> untensed MOOD: indicative PROG: + PERF: - ] ARG1: i21 ARG2: i22 ] > >> > HCONS: < h0 qeq h1 h5 qeq h23 h8 qeq h23 h16 qeq h18 > ] >> > >> > Handle h23 does not appear in the predicates. The h5 and h8 only in the >> arguments. Is it valid? >> > >> > Pydelphin transformation to DMRS works with a warning "broken handle >> constraint?. Can it be transformed to EDS? Is this an evidence that MRS to >> EDS is much less robust than MRS to DMRS? >> > >> > BTW, since I am having so many trouble with MRS to EDS, and my goal is >> to compare a golden version of a profile with its 1-best parsed version to >> evaluate the parse selection model, I wonder if I could be doing that with >> DMRS instead of EDS? any idea? Any alternative to >> https://github.com/delph-in/delphin.edm using DMRS? >> > >> > Best, >> > Alexandre >> >> >> > > -- > -Michael Wayne Goodman > -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From arademaker at gmail.com Fri Sep 25 06:02:16 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Fri, 25 Sep 2020 01:02:16 -0300 Subject: [developers] Valid MRS? Bug in ERG? In-Reply-To: References: Message-ID: <7916EB41-2AD1-4020-A6B8-64742999D1BE@gmail.com> Hi Michael, Thank you for your comments. The main fact now is that LKB does not produce disconnected MRSs for the same sentence. Maybe ACE error or a difference on the initialization scripts of ERG for each tool? I understood the possible difference in the approach for converting MRS to EDS between LKB and PyDelphin, but the MRSs are different to begin with. Good point about the incomplete DMRS output. One more hasty conclusion from my side. I didn?t inspect carefully the DMRS produced. How can I tell edm to use DMRS instead of EDS? Maybe I missed something... Maybe edm must be more robust to ignore pairs with errors? Alexandre Sent from my iPhone > On 25 Sep 2020, at 00:32, goodman.m.w at gmail.com wrote: > And regarding https://github.com/delph-in/delphin.edm, note that this implementation also works for MRS, by converting to EDS along the way, and for DMRS, without conversion. -------------- next part -------------- An HTML attachment was scrubbed... URL: From goodman.m.w at gmail.com Fri Sep 25 06:40:02 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Fri, 25 Sep 2020 12:40:02 +0800 Subject: [developers] Valid MRS? Bug in ERG? In-Reply-To: <7916EB41-2AD1-4020-A6B8-64742999D1BE@gmail.com> References: <7916EB41-2AD1-4020-A6B8-64742999D1BE@gmail.com> Message-ID: On Fri, Sep 25, 2020 at 12:02 PM Alexandre Rademaker wrote: > Hi Michael, > > Thank you for your comments. > > The main fact now is that LKB does not produce disconnected MRSs for the > same sentence. Maybe ACE error or a difference on the initialization > scripts of ERG for each tool? I understood the possible difference in the > approach for converting MRS to EDS between LKB and PyDelphin, but the MRSs > are different to begin with. > Oh I see. I missed that part of your message. Different parse-ranking models, perhaps? > Good point about the incomplete DMRS output. One more hasty conclusion > from my side. I didn?t inspect carefully the DMRS produced. > > How can I tell edm to use DMRS instead of EDS? Maybe I missed > something... > The -f / --format option specifies the codec to use. E.g., --format=dmrx Maybe edm must be more robust to ignore pairs with errors? > That's a good idea. I created https://github.com/delph-in/delphin.edm/issues/3 > Alexandre > Sent from my iPhone > > On 25 Sep 2020, at 00:32, goodman.m.w at gmail.com wrote: > > And regarding https://github.com/delph-in/delphin.edm, note that this > implementation also works for MRS, by converting to EDS along the way, and > for DMRS, without conversion. > > -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From arademaker at gmail.com Sun Sep 27 01:54:53 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Sat, 26 Sep 2020 20:54:53 -0300 Subject: [developers] CSLI vs MRS profiles Message-ID: Hi, A couple of months ago, I was editing the page http://moin.delph-in.net/MatrixMrsTestSuite and Stephan corrected my mistake and explained to me the origin of both datasets: > the MRS test suite is something that ann and dan cooked up over the course of five or so weeks while dan was visiting cambridge, 2001 or 2002, i would say. 
except for some reuse of Abrams and Browne, I doubt there is any overlap in actual sentences with what was originally called the HP test suite. the latter was created to explore variation in syntactic structures and lives on in the DELPH-IN universe under the name CSLI test suite (since around 1994). the MRS test suite, on the other hand, exemplifies basic semantic constructions. so, in my view it is misleading to say it was derived from the HP data, but dan was of course centrally involved in both efforts. I am playing with these profiles again, trying to have both translated to Brazilian Portuguese (not the European Portuguese translation we have in the wiki). But I have a question Why 7 sentences are different if we compare http://moin.delph-in.net/MatrixMrsTestSuite to http://svn.delph-in.net/erg/trunk/tsdb/gold/mrs/? Diff below. Does anyone remember what happen? Should we update the wiki? The page http://moin.delph-in.net/MatrixMrsTestSuite is now marked as Immutable, but I believe the sentence > Currently, there are test suites for the following languages (included in the [incr tsdb()] software package) is misleading. We don't have a [incr tsdb()] package with profiles. The page http://www.delph-in.net/itsdb/ has a link to http://lingo.stanford.edu/ftp/latest/ but it was not working. I found three profiles from the root of LOGON tree: % find . | grep mrs/item.gz ./lingo/erg/tsdb/gold/mrs/item.gz ./lingo/terg/tsdb/gold/mrs/item.gz ./dfki/gg/tsdb/gold/mrs/item.gz So the profiles are not included in [incr tsdb()], they are maintained with grammars, right? We don?t have profiles with all the languages listed in the wiki, only two. We have also some discussions in https://delphinqa.ling.washington.edu/t/matrix-mrs-test-suite/484. In the forum, I suggested other changes in the wiki: > The wiki is confusing since on the main page we have translations for Japanese and a relevant discussion about the structure of the set only at the bottom of the page. Moreover, in http://moin.delph-in.net/MatrixMrsTestSuiteEn links are all broken. Does anyone know what is this old server that the links like http://cypriot.stanford.edu/~bond/mrs-en060524/11.html point to? How can I help to make them work again? Any comment? Best, % diff matrix-en.sent mrs-en.sent 20c20 < Cats bark. --- > Cats go. 22c22 < Some bark. --- > Some went. 53c53 < Chased dogs bark. --- > Chased dogs go. 55c55 < That the cat chases Browne is old. --- > That the cat chases Browne is obvious. 62,64c62,64 < Browne's barks. < Twenty three dogs bark. < Two hundred twenty dogs bark. --- > Browne's goes. > Twenty three dogs go. > Two hundred twenty dogs go. 79c79 < Abrams promised Browne to bark. --- > Abrams promised Browne to go. 97c97 < The cats found a way to bark. --- > The cats found a way to go. -- Alexandre Rademaker http://arademaker.github.io From oe at ifi.uio.no Sun Sep 27 09:16:17 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Sun, 27 Sep 2020 09:16:17 +0200 Subject: [developers] Valid MRS? Bug in ERG? In-Reply-To: References: <1B872B18-BE49-4B14-A893-2AFB0A98C856@gmail.com> Message-ID: hi mike and alexandre, > Regarding conversion to EDS, the LKB goes to greater lengths to give an EDS even for ill-formed MRSs, while PyDelphin tries to avoid grammar-specific solutions. The result is that PyDelphin will, I think, apply predicate-modification more broadly than the LKB, but it is a bit more brittle. 
conceptually, i think all MRSs should be converted to EDSs, and no information that can be expressed in the EDS graph should be lost. more practically: each EP (and each ICONS) in the MRS should introduce a node into the EDS, and each semantic role whose value is associated with a node should yield an edge (additional edges should be introduced for instances of predicate modification). additional or illformed information in the MRS (e.g. invalid handle constraints or roles whose value does not correspond to the label or intrinsic variable of an EP) should be ignored. on this view, conversion to EDS should always succeed. indeed, robustness to (linguistically) illformed MRSs has been a key goal in the original EDS converter that is part of the LKB. i would welcome bug reports, in case you encounter conversion errors. mike. why would you think any of the above might call for grammar-specific solutions? i would like to encourage pyDelphin to embrace the same robustness goals as the LKB-based converter. EDSs were originally invented for practical utility, to make it easier for downstream applications to work with ERG analyses; for that goal, any MRS that comes out of the parser should also yield an EDS, no matter its structure or contents. best wishes, oe From goodman.m.w at gmail.com Sun Sep 27 17:37:32 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Sun, 27 Sep 2020 23:37:32 +0800 Subject: [developers] Valid MRS? Bug in ERG? In-Reply-To: References: <1B872B18-BE49-4B14-A893-2AFB0A98C856@gmail.com> Message-ID: On Sun, Sep 27, 2020 at 3:16 PM Stephan Oepen wrote: > [...] > conceptually, i think all MRSs should be converted to EDSs, and no > information that can be expressed in the EDS graph should be lost. > more practically: each EP (and each ICONS) in the MRS should introduce > a node into the EDS, and each semantic role whose value is associated > with a node should yield an edge (additional edges should be > introduced for instances of predicate modification). additional or > illformed information in the MRS (e.g. invalid handle constraints or > roles whose value does not correspond to the label or intrinsic > variable of an EP) should be ignored. > I agree with this, except the part about ICONS introducing nodes surprised me.. I thought that EDS, like DMRS, is yet to provide a treatment for ICONS. Care to explain? > > [...] > > mike. why would you think any of the above might call for > grammar-specific solutions? i would like to encourage pyDelphin to > embrace the same robustness goals as the LKB-based converter. EDSs > were originally invented for practical utility, to make it easier for > downstream applications to work with ERG analyses; for that goal, any > MRS that comes out of the parser should also yield an EDS, no matter > its structure or contents. First, some clarifications/corrections: The EDS conversion error in PyDelphin is a bug, not expected behavior. Also, the dropped nodes for the disconnected DMRS was not completely described (sorry, Alexandre). The conversion actually creates nodes even for the disconnected EPs, but I was viewing the output of the dmrs-penman codec which is not capable of representing disconnected graphs (the same goes for eds-penman). They are present in other serialization formats. Regarding the grammar-specific solutions, as I understand the LKB's EDS code still maintains lists of ERG (1214?) predicates, roles, etc. for various uses, such as predicate modification. 
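Read as code, the recipe in the quoted paragraph is roughly the following (toy structures and field names for illustration only; real converters additionally handle predicate modification, ICONS, and selection of the top node):

def mrs_to_eds_edges(eps, hcons):
    # eps:   [{'id': intrinsic variable, 'label': handle, 'args': {role: value}}, ...]
    # hcons: {hole: label} for the qeq constraints
    ivs = {ep['id'] for ep in eps}
    by_label = {}
    for ep in eps:
        by_label.setdefault(ep['label'], ep['id'])   # crude: first EP stands for a shared label
    edges = []
    for ep in eps:                                   # every EP has become a node (its 'id')
        for role, value in ep['args'].items():
            value = hcons.get(value, value)          # follow a qeq, if there is one
            if value in ivs:
                target = value
            elif value in by_label:
                target = by_label[value]
            else:
                continue                             # illformed or dangling: ignore, never fail
            edges.append((ep['id'], role, target))
    return edges

The point of the last branch is the robustness goal described above: roles that point nowhere are dropped rather than aborting the conversion.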
In the case of predicate modification, it and PyDelphin suture the unlinked nodes with the ARG1 role, if it was otherwise unused. I was under the impression that the LKB's converter did a bit more surgery in order to normalize some anticipated ill-formed structures. But if I was mistaken and the only other "value-added" part of conversion is the aforementioned top-selection with the other MRS-maladies ignored, then I think we share the same goal and view of robustness. -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From arademaker at gmail.com Wed Oct 7 19:50:28 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Wed, 7 Oct 2020 14:50:28 -0300 Subject: [developers] Fwd: [DELPH-IN Discourse] [ERG] Top level ERG page is down: http://www.delph-in.net/erg/ References: Message-ID: <125E6F19-E3FF-4EE0-876C-2D796FD68DC9@gmail.com> Hi Stephan and Dan, Just calling your attention to the message below about the http://www.delph-in.net/erg/ . Is this link different from the http://erg.delph-in.net/logon? Github link is easy to change, what should be the ERG official homepage? Best, Alexandre > From: Eric Zinda via DELPH-IN Discourse > Subject: [DELPH-IN Discourse] [ERG] Top level ERG page is down: http://www.delph-in.net/erg/ > Date: 7 October 2020 14:12:59 GMT-3 > To: arademaker at gmail.com > Reply-To: DELPH-IN Discourse > > EricZinda > October 7 > I think the top level ERG page has been down for some time (weeks? Months?): http://www.delph-in.net/erg/ > Linked from: > > lots of top level links if you google ERG > the top level grammar page: http://www.delph-in.net/wiki/index.php/Grammars > the github site: https://github.com/delph-in/erg > others I?m sure > Just thought someone might want to know? > > Visit Topic or reply to this email to respond. > > You are receiving this because you enabled mailing list mode. > > To unsubscribe from these emails, click here . > -------------- next part -------------- An HTML attachment was scrubbed... URL: From oe at ifi.uio.no Thu Oct 8 22:20:02 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Thu, 8 Oct 2020 22:20:02 +0200 Subject: [developers] Fwd: [DELPH-IN Discourse] [ERG] Top level ERG page is down: http://www.delph-in.net/erg/ In-Reply-To: <125E6F19-E3FF-4EE0-876C-2D796FD68DC9@gmail.com> References: <125E6F19-E3FF-4EE0-876C-2D796FD68DC9@gmail.com> Message-ID: > Just calling your attention to the message below about the http://www.delph-in.net/erg/. Is this link different from the http://erg.delph-in.net/logon? Github link is easy to change, what should be the ERG official homepage? thanks for the note, alexandre! yes, 'http://www.delph-in.net/erg/' is the ERG home page, which redirects to 'http://lingo.stanford.edu', which has crashed some months ago and has proven difficult to replace given pandemic-related constraints. dan will have to decide what to do about that page. 'http://erg.delph-in.net/logon' is just the on-line demonstration, which is running in oslo. that service has been subjected to occasional flooding attacks (with tens of thousands of queries on same days), which has caused some challenges in robustness and availability. but i feel committed to keeping that service alive, in principle :-). 
best, oe From arademaker at gmail.com Tue Oct 20 05:42:43 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Tue, 20 Oct 2020 00:42:43 -0300 Subject: [developers] www script in the logon distribution In-Reply-To: References: <5B1D74E8-863C-4547-9C80-0D9B0E41EF88@gmail.com> Message-ID: Hi Stephan, Using only the --erg I got: user at acb050b97030:~/logon$ ./www --erg /home/user/logon/bin/logon: line 94: /home/user/logon/franz/linux.x86.64/alisp: No such file or directory ^C It looks like it is trying to compile the code and looking for the Allegro Lisp interpreter. So I started with $ ./www --binary --debug --erg --port 9080 Everything runs inside my docker image. The docker redirects the 9080 to the host port. But in the host the request to localhost:9080/logon does not work. Even inside the docker image, in another shell, I get $ wget http://localhost:9080/logon --2020-10-20 03:41:17-- http://localhost:9080/logon Resolving localhost (localhost)... 127.0.0.1, ::1 Connecting to localhost (localhost)|127.0.0.1|:9080... failed: Connection refused. Connecting to localhost (localhost)|::1|:9080... failed: Cannot assign requested address. Retrying. After many messages, I see ... [t40005] reading `/home/user/logon/lingo/erg/pet/english.set'... including `/home/user/logon/lingo/erg/pet/common.set'... including `/home/user/logon/lingo/erg/pet/global.set'... including `/home/user/logon/lingo/erg/pet/repp.set'... including `/home/user/logon/lingo/erg/pet/mrs.set'... loading `/home/user/logon/lingo/erg/english.grm' [t40002] (ERG (1214)) reading ME model `/home/user/logon/lingo/erg/redwoods.mem'... [3643349 features] [t40005] (ERG (1214)) reading ME model `/home/user/logon/lingo/erg/redwoods.mem'... [3643349 features] [t40004] (ERG (1214)) reading ME model `/home/user/logon/lingo/erg/redwoods.mem'... [3643349 features] [t40003] (ERG (1214)) reading ME model `/home/user/logon/lingo/erg/redwoods.mem'... [3643349 features] But I don't have the REPL; it looks like this reading of the ME model didn't finish. Using $ ./www --binary --debug --terg --port 9080 I got set-coding-system(): activated UTF8. ; Loading /home/user/logon/lingo/terg/Version.lsp ; Loading /home/user/logon/lingo/terg/lkb/globals.lsp ; Loading /home/user/logon/lingo/terg/lkb/user-fns.lsp ; Loading /home/user/logon/lingo/terg/lkb/checkpaths.lsp ; Loading /home/user/logon/lingo/terg/lkb/patches.lsp Reading in type file fundamentals Reading in type file tmt Reading in type file lextypes Syntax error: . expected and not found in N_-_C_LE at position 226346 Inserting . Error: "" should not be a string Restart actions (select using :continue): 0: retry the load of /home/user/logon/lingo/terg/lkb/script 1: skip loading /home/user/logon/lingo/terg/lkb/script 2: Return to Top Level (an "abort" restart). 3: Abort entirely from this (lisp) process. [changing package from "TSDB" to "LKB"] [1] LKB(7): Any idea? Best, Alexandre > On 6 Aug 2020, at 05:04, Stephan Oepen wrote: > > hi again, alexandre: > >> For some reason, the www script in the logon distribution does not start the webserver. Using the `--debug` option, I don't have any additional information in the log file (actually, the script didn't mention the debug anywhere). I am following all instructions from http://moin.delph-in.net/LogonOnline. In particular, pvmd3 is running without any error in the startup. I don't see any *.pvm file in the /tmp. The script bin/logon starts LKB and the [incr TSDB()] normally.
I have used `?cat` to save a lisp file and load it manually in the ACL REPL, no error too. Any idea? > > i am slowly catching up to DELPH-IN email, with apologies for the long > turn-around! > > is the above still a current problem? is this within your container, > or does it also occur on a 'regular' linux box? > > to debug further, note that the 'www' script sets things up so that > you can interact with the running lisp image once initialization is > complete, i.e. just type into the lisp prompt, e.g. to inspect the > state of AllegroServe. > > when you observe that the web server is not started, does that mean it > does not even bind to its port? when running with the standard > '--erg' option, i would expect the following to work (and return the > dynamically generated top-level page): > > wget http://localhost:8100/logon > > best wishes, oe From gete2 at cam.ac.uk Mon Nov 9 16:12:19 2020 From: gete2 at cam.ac.uk (Guy Emerson) Date: Mon, 9 Nov 2020 15:12:19 +0000 Subject: [developers] Bug in interactive unification Message-ID: Dear all, I found a bug in interactive unification, which I posted about here: https://delphinqa.ling.washington.edu/t/bug-in-interactive-unification/592 The bug is the following: if there is no possible type for a feature path, but that path does not exist in either of the two input feature structures, then interactive unification does not enforce all constraints (i.e. it produces an incorrect result, rather than reporting unification failure). I wasn?t sure where to report this bug. This is admittedly a rare situation (which is probably why it hasn?t been an issue until now). But it happens when recursive computation types lead to a unification failure. I?ve written a small example to illustrate the problem (see attached file). Note that there is no parsing involved here, just compilation of this file and interactive unification. In more positive news, I can report that when there is no failure, the LKB and interactive unification are both robust to extremely recursive type constraints. I implemented the untyped lambda calculus as a type system, and I tested it using the Ackermann function as a lambda expression on Church numerals (the Ackermann function is non-primitive-recursive, so I thought this would be a good test case). With 10,570 re-entrancies (no that is not a typo), it correctly evaluated A(2,1)=5. Best, Guy -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: unification-bug.tdl Type: application/octet-stream Size: 3569 bytes Desc: not available URL: From J.A.Carroll at sussex.ac.uk Tue Nov 10 09:29:51 2020 From: J.A.Carroll at sussex.ac.uk (John Carroll) Date: Tue, 10 Nov 2020 08:29:51 +0000 Subject: [developers] Bug in interactive unification In-Reply-To: References: Message-ID: <821C2158-B968-457B-92DC-F6C158C7F70C@sussex.ac.uk> Dear Guy, Thanks for this example showing the problem. I?ve reproduced it: unification failure at SUCC.RESULT with LKB native graphics, but successful unification with LUI. What gets executed is very different between the two cases. The LKB is content to find the first failure path, whereas for LUI the LKB runs a completely different ?robust? unifier which records all failure paths. I?ve found a bug in the latter which I think accounts for the problem. In debug-unify2 in src/glue/dag.lsp, (nconc %failures% failures) does not get assigned back to %failures% as it should. 
This means that currently a failure in applying a constraint is only recorded if it's not the first unification failure. Hmm... I attach a patch for the LKB (any version) which fixes the problem you observed with LUI interactive unification. I hope it fixes the bug completely, but I haven't tested on other examples. Since it's Lisp code, you can load it by typing the following at the command line in a running LKB: (load "path-to/debug-unify2-patch.lsp") John On 9 Nov 2020, at 15:12, Guy Emerson > wrote: Dear all, I found a bug in interactive unification, which I posted about here: https://delphinqa.ling.washington.edu/t/bug-in-interactive-unification/592 The bug is the following: if there is no possible type for a feature path, but that path does not exist in either of the two input feature structures, then interactive unification does not enforce all constraints (i.e. it produces an incorrect result, rather than reporting unification failure). I wasn?t sure where to report this bug. This is admittedly a rare situation (which is probably why it hasn?t been an issue until now). But it happens when recursive computation types lead to a unification failure. I?ve written a small example to illustrate the problem (see attached file). Note that there is no parsing involved here, just compilation of this file and interactive unification. In more positive news, I can report that when there is no failure, the LKB and interactive unification are both robust to extremely recursive type constraints. I implemented the untyped lambda calculus as a type system, and I tested it using the Ackermann function as a lambda expression on Church numerals (the Ackermann function is non-primitive-recursive, so I thought this would be a good test case). With 10,570 re-entrancies (no that is not a typo), it correctly evaluated A(2,1)=5. Best, Guy -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: debug-unify2-patch.lsp Type: application/octet-stream Size: 3605 bytes Desc: debug-unify2-patch.lsp URL: From gete2 at cam.ac.uk Tue Nov 10 12:50:11 2020 From: gete2 at cam.ac.uk (Guy Emerson) Date: Tue, 10 Nov 2020 11:50:11 +0000 Subject: [developers] Bug in interactive unification In-Reply-To: <821C2158-B968-457B-92DC-F6C158C7F70C@sussex.ac.uk> References: <821C2158-B968-457B-92DC-F6C158C7F70C@sussex.ac.uk> Message-ID: After loading that file, instead of displaying the incorrect result, LUI now displays nothing. The log file (/tmp/yzlui.debug.ubuntu) says: process_complete_command(): ` avm 1 #D[natnum-with-copy-wrapper NATNUM: #D[natnum-with-copy RESULT: NATNUM]] "natnum-with-copy-wrapper - expanded" ' process_complete_command(): `avm 2 #D[defective-one-wrapper NATNUM: #D[defective-pos SUCC: ZERO]] "defective-one-wrapper - expanded" ' process_complete_command(): `avm 3 #D[natnum-with-copy-wrapper NATNUM: #D[defective-pos-with-copy RESULT: #D[pos SUCC: <0>= DEFECTIVE-NATNUM] SUCC: #D[zero-with-copy RESULT: <0>]]] "Unification Failures (2)" [#U[constraint 1 [[NATNUM SUCC] ZERO-WITH-COPY ZERO ZERO-WITH-COPY -1] #U[type 0 [NATNUM SUCC RESULT] DEFECTIVE-NATNUM ZERO 1]] ' Item in list is not homogeneous (list type 12, item type 3) Path of failure was not a list of symbols (type 12) Item in list is not homogeneous (list type 13, item type 3) YZLUI: Received unknown lkb-protocol top-level command: AVM Am Di., 10. Nov. 
2020 um 08:29 Uhr schrieb John Carroll < J.A.Carroll at sussex.ac.uk>: > Dear Guy, > > Thanks for this example showing the problem. I?ve reproduced it: > unification failure at SUCC.RESULT with LKB native graphics, but successful > unification with LUI. > > What gets executed is very different between the two cases. The LKB is > content to find the first failure path, whereas for LUI the LKB runs a > completely different ?robust? unifier which records all failure paths. I?ve > found a bug in the latter which I think accounts for the problem. > In debug-unify2 in src/glue/dag.lsp, (nconc %failures% failures) does not > get assigned back to %failures% as it should. This means that currently a > failure in applying a constraint is only recorded if it's not the first > unification failure. Hmm... > > I attach a patch for the LKB (any version) which fixes the problem you > observed with LUI interactive unification. I hope it fixes the bug > completely, but I haven't tested on other examples. Since it's Lisp code, > you can load it by typing the following at the command line in a running > LKB: (load "path-to/debug-unify2-patch.lsp") > > John > > On 9 Nov 2020, at 15:12, Guy Emerson wrote: > > Dear all, > > I found a bug in interactive unification, which I posted about here: > https://delphinqa.ling.washington.edu/t/bug-in-interactive-unification/592 > > The bug is the following: if there is no possible type for a feature path, > but that path does not exist in either of the two input feature structures, > then interactive unification does not enforce all constraints (i.e. it > produces an incorrect result, rather than reporting unification failure). > > I wasn?t sure where to report this bug. > > This is admittedly a rare situation (which is probably why it hasn?t been > an issue until now). But it happens when recursive computation types lead > to a unification failure. I?ve written a small example to illustrate the > problem (see attached file). Note that there is no parsing involved here, > just compilation of this file and interactive unification. > > In more positive news, I can report that when there is no failure, the LKB > and interactive unification are both robust to extremely recursive type > constraints. I implemented the untyped lambda calculus as a type system, > and I tested it using the Ackermann function as a lambda expression on > Church numerals (the Ackermann function is non-primitive-recursive, so I > thought this would be a good test case). With 10,570 re-entrancies (no that > is not a typo), it correctly evaluated A(2,1)=5. > > Best, > Guy > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From J.A.Carroll at sussex.ac.uk Tue Nov 10 13:07:27 2020 From: J.A.Carroll at sussex.ac.uk (John Carroll) Date: Tue, 10 Nov 2020 12:07:27 +0000 Subject: [developers] Bug in interactive unification In-Reply-To: References: <821C2158-B968-457B-92DC-F6C158C7F70C@sussex.ac.uk> Message-ID: <3B9E74B2-E09E-452B-B0A4-4F714222E92D@sussex.ac.uk> I noticed that LUI didn't display anything in response to the unification failure, but didn't know why since I don't get a file in /tmp/. I can't see anything else obviously wrong with the LKB code concerned, but I don't know whether it's sending the right thing to LUI since I haven't found any documentation on the LKB-LUI interface. Woodley, can you shed any light on this? 
John On 10 Nov 2020, at 11:50, Guy Emerson > wrote: After loading that file, instead of displaying the incorrect result, LUI now displays nothing. The log file (/tmp/yzlui.debug.ubuntu) says: process_complete_command(): ` avm 1 #D[natnum-with-copy-wrapper NATNUM: #D[natnum-with-copy RESULT: NATNUM]] "natnum-with-copy-wrapper - expanded" ' process_complete_command(): `avm 2 #D[defective-one-wrapper NATNUM: #D[defective-pos SUCC: ZERO]] "defective-one-wrapper - expanded" ' process_complete_command(): `avm 3 #D[natnum-with-copy-wrapper NATNUM: #D[defective-pos-with-copy RESULT: #D[pos SUCC: <0>= DEFECTIVE-NATNUM] SUCC: #D[zero-with-copy RESULT: <0>]]] "Unification Failures (2)" [#U[constraint 1 [[NATNUM SUCC] ZERO-WITH-COPY ZERO ZERO-WITH-COPY -1] #U[type 0 [NATNUM SUCC RESULT] DEFECTIVE-NATNUM ZERO 1]] ' Item in list is not homogeneous (list type 12, item type 3) Path of failure was not a list of symbols (type 12) Item in list is not homogeneous (list type 13, item type 3) YZLUI: Received unknown lkb-protocol top-level command: AVM Am Di., 10. Nov. 2020 um 08:29 Uhr schrieb John Carroll >: Dear Guy, Thanks for this example showing the problem. I?ve reproduced it: unification failure at SUCC.RESULT with LKB native graphics, but successful unification with LUI. What gets executed is very different between the two cases. The LKB is content to find the first failure path, whereas for LUI the LKB runs a completely different ?robust? unifier which records all failure paths. I?ve found a bug in the latter which I think accounts for the problem. In debug-unify2 in src/glue/dag.lsp, (nconc %failures% failures) does not get assigned back to %failures% as it should. This means that currently a failure in applying a constraint is only recorded if it's not the first unification failure. Hmm... I attach a patch for the LKB (any version) which fixes the problem you observed with LUI interactive unification. I hope it fixes the bug completely, but I haven't tested on other examples. Since it's Lisp code, you can load it by typing the following at the command line in a running LKB: (load "path-to/debug-unify2-patch.lsp") John On 9 Nov 2020, at 15:12, Guy Emerson > wrote: Dear all, I found a bug in interactive unification, which I posted about here: https://delphinqa.ling.washington.edu/t/bug-in-interactive-unification/592 The bug is the following: if there is no possible type for a feature path, but that path does not exist in either of the two input feature structures, then interactive unification does not enforce all constraints (i.e. it produces an incorrect result, rather than reporting unification failure). I wasn?t sure where to report this bug. This is admittedly a rare situation (which is probably why it hasn?t been an issue until now). But it happens when recursive computation types lead to a unification failure. I?ve written a small example to illustrate the problem (see attached file). Note that there is no parsing involved here, just compilation of this file and interactive unification. In more positive news, I can report that when there is no failure, the LKB and interactive unification are both robust to extremely recursive type constraints. I implemented the untyped lambda calculus as a type system, and I tested it using the Ackermann function as a lambda expression on Church numerals (the Ackermann function is non-primitive-recursive, so I thought this would be a good test case). With 10,570 re-entrancies (no that is not a typo), it correctly evaluated A(2,1)=5. 
Best, Guy -------------- next part -------------- An HTML attachment was scrubbed... URL: From arademaker at gmail.com Wed Nov 11 00:32:17 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Tue, 10 Nov 2020 20:32:17 -0300 Subject: [developers] Bug report for ERG In-Reply-To: References: Message-ID: BTW, regardless the tokenisation issue, an invalid MRS should not be produced, right? Best, Alexandre > On 10 Nov 2020, at 18:39, Alexandre Rademaker wrote: > > Hi, > > I am trying to parse the sentences from EWT corpus (https://github.com/universaldependencies/UD_English-EWT) but in the DEV set I have a non-sense sentence with only an url between brackets: > > [http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34] > > ACE reports an invalid MRS. The error is in the character 2666, so probably the error is the predicate: > > _search_x.htm?csp=34/NN_u_unknown > > But the regex for predicates seems to support dot in the name of the predicate: > > http://moin.delph-in.net/MrsRfc#SerializationFormats > > Anyway, the pre-processing of the sentence seems wrong to me in ERG trunk version, the tokenisation broke the url into many tokens and consumed the protocol `http://` prefix: > > % ace -g ~/hpsg/wn/terg-mac.dat -E > [http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34] > www.usatoday. com / tech/ science / space/ 2005 ? 03 ? 09 - nasa - search_x.htm?csp=34 > > ERG (2018) produced what I was expecting: > > % ace -g erg-mac.dat -E > [http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34] > www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34 > > ERG (1214) produced what I was expecting: > > % ace -g erg-lingo-mac.dat -E > [http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34] > [ http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34 ] > > >>>> response = ace.parse(grm, '[http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34]') > NOTE: hit RAM limit while unpacking > NOTE: parsed 1 / 1 sentences, avg 1536033k, time 51.15306s > >>>> response.result(0).mrs() > Traceback (most recent call last): > File "", line 1, in > File "/Users/ar/.venv/lib/python3.9/site-packages/delphin/interface.py", line 146, in mrs > mrs = simplemrs.decode(mrs) > File "/Users/ar/.venv/lib/python3.9/site-packages/delphin/codecs/simplemrs.py", line 112, in decode > return _decode_mrs(lexer) > File "/Users/ar/.venv/lib/python3.9/site-packages/delphin/codecs/simplemrs.py", line 200, in _decode_mrs > rels.append(_decode_rel(lexer, variables)) > File "/Users/ar/.venv/lib/python3.9/site-packages/delphin/codecs/simplemrs.py", line 252, in _decode_rel > _, label = lexer.expect((FEATURE, 'LBL'), (SYMBOL, None)) > File "/Users/ar/.venv/lib/python3.9/site-packages/delphin/util.py", line 473, in expect > raise self._errcls('expected: ' + err, > delphin.mrs._exceptions.MRSSyntaxError: > line 1, character 2666 > [ LTOP: h0 INDEX: e2 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] RELS: < [ implicit_conj<8:79> LBL: h1 ARG0: e2 ARG1: e4 [ e SF: prop TENSE: tensed MOOD: indicative ] ARG2: e5 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] ] [ unknown<8:21> LBL: h1 ARG0: e4 ARG: u6 ] [ _www.usatoday./JJ_u_unknown<8:21> LBL: h1 ARG0: e7 [ e SF: prop ] ARG1: u6 ] [ implicit_conj<21:79> LBL: h1 ARG0: e5 ARG1: e8 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] ARG2: e9 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] ] [ unknown<21:49> LBL: h1 
ARG0: e8 ARG: x10 ] [ udef_q<21:49> LBL: h11 ARG0: x10 RSTR: h12 BODY: h13 ] [ udef_q<21:24> LBL: h14 ARG0: x15 [ x PERS: 3 NUM: sg ] RSTR: h16 BODY: h17 ] [ _com/NN_u_unknown<21:24> LBL: h18 ARG0: x15 ] [ _and_c<24:25> LBL: h19 ARG0: x10 ARG1: x15 ARG2: x20 ] [ udef_q<25:49> LBL: h21 ARG0: x20 RSTR: h22 BODY: h23 ] [ udef_q<25:37> LBL: h24 ARG0: x25 [ x PERS: 3 NUM: sg ] RSTR: h26 BODY: h27 ] [ _tech//JJ_u_unknown<25:30> LBL: h28 ARG0: e29 [ e SF: prop TENSE: untensed MOOD: indicative PROG: bool PERF: - ] ARG1: x25 ] [ _science_n_1<30:37> LBL: h28 ARG0: x25 ] [ _and_c<37:38> LBL: h30 ARG0: x20 ARG1: x25 ARG2: x31 [ x PERS: 3 NUM: sg IND: + ] ] [ proper_q<38:49> LBL: h32 ARG0: x31 RSTR: h33 BODY: h34 ] [ compound<38:49> LBL: h35 ARG0: e36 [ e SF: prop TENSE: untensed MOOD: indicative PROG: - PERF: - ] ARG1: x31 ARG2: x37 [ x PT: pt ] ] [ udef_q<38:44> LBL: h38 ARG0: x37 RSTR: h39 BODY: h40 ] [ _space//NN_u_unknown<38:44> LBL: h41 ARG0: x37 ] [ yofc<44:48> LBL: h35 CARG: "2005" ARG0: x31 ] [ implicit_conj<49:79> LBL: h1 ARG0: e9 ARG1: e43 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] ARG2: e44 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] ] [ unknown<49:52> LBL: h1 ARG0: e43 ARG: x45 [ x PERS: 3 NUM: sg IND: + ] ] [ proper_q<49:52> LBL: h46 ARG0: x45 RSTR: h47 BODY: h48 ] [ yofc<49:51> LBL: h49 CARG: "03" ARG0: x45 ] [ implicit_conj<52:79> LBL: h1 ARG0: e44 ARG1: e51 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] ARG2: e52 [ e SF: prop-or-ques ] ] [ unknown<52:55> LBL: h1 ARG0: e51 ARG: x53 [ x PERS: 3 NUM: sg IND: + ] ] [ proper_q<52:55> LBL: h54 ARG0: x53 RSTR: h55 BODY: h56 ] [ yofc<52:54> LBL: h57 CARG: "09" ARG0: x53 ] [ unknown<55:79> LBL: h1 ARG0: e52 ARG: x59 [ x PERS: 3 NUM: sg ] ] [ udef_q<55:79> LBL: h60 ARG0: x59 RSTR: h61 BODY: h62 ] [ compound<55:79> LBL: h63 ARG0: e64 [ e SF: prop TENSE: untensed MOOD: indicative PROG: - PERF: - ] ARG1: x59 ARG2: x65 [ x PERS: 3 NUM: sg IND: + PT: pt ] ] [ proper_q<55:60> LBL: h66 ARG0: x65 RSTR: h67 BODY: h68 ] [ named<55:59> LBL: h69 CARG: "NASA" ARG0: x65 ] [ _search_x.htm?csp=34/NN_u_unknown<60:79> LBL: h63 ARG0: x59 ] > HCONS: < h0 qeq h1 h12 qeq h19 h16 qeq h18 h22 qeq h30 h26 qeq h28 h33 qeq h35 h39 qeq h41 h47 qeq h49 h55 qeq h57 h61 qeq h63 h67 qeq h69 > ICONS: < > ] > ^ > MRSSyntaxError: expected: a feature > > > Best, > Alexandre > From arademaker at gmail.com Tue Nov 10 22:39:32 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Tue, 10 Nov 2020 18:39:32 -0300 Subject: [developers] Bug report for ERG Message-ID: Hi, I am trying to parse the sentences from EWT corpus (https://github.com/universaldependencies/UD_English-EWT) but in the DEV set I have a non-sense sentence with only an url between brackets: [http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34] ACE reports an invalid MRS. The error is in the character 2666, so probably the error is the predicate: _search_x.htm?csp=34/NN_u_unknown But the regex for predicates seems to support dot in the name of the predicate: http://moin.delph-in.net/MrsRfc#SerializationFormats Anyway, the pre-processing of the sentence seems wrong to me in ERG trunk version, the tokenisation broke the url into many tokens and consumed the protocol `http://` prefix: % ace -g ~/hpsg/wn/terg-mac.dat -E [http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34] www.usatoday. com / tech/ science / space/ 2005 ? 03 ? 
09 - nasa - search_x.htm?csp=34 ERG (2018) produced what I was expecting: % ace -g erg-mac.dat -E [http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34] www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34 ERG (1214) produced what I was expecting: % ace -g erg-lingo-mac.dat -E [http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34] [ http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34 ] >>> response = ace.parse(grm, '[http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34]') NOTE: hit RAM limit while unpacking NOTE: parsed 1 / 1 sentences, avg 1536033k, time 51.15306s >>> response.result(0).mrs() Traceback (most recent call last): File "", line 1, in File "/Users/ar/.venv/lib/python3.9/site-packages/delphin/interface.py", line 146, in mrs mrs = simplemrs.decode(mrs) File "/Users/ar/.venv/lib/python3.9/site-packages/delphin/codecs/simplemrs.py", line 112, in decode return _decode_mrs(lexer) File "/Users/ar/.venv/lib/python3.9/site-packages/delphin/codecs/simplemrs.py", line 200, in _decode_mrs rels.append(_decode_rel(lexer, variables)) File "/Users/ar/.venv/lib/python3.9/site-packages/delphin/codecs/simplemrs.py", line 252, in _decode_rel _, label = lexer.expect((FEATURE, 'LBL'), (SYMBOL, None)) File "/Users/ar/.venv/lib/python3.9/site-packages/delphin/util.py", line 473, in expect raise self._errcls('expected: ' + err, delphin.mrs._exceptions.MRSSyntaxError: line 1, character 2666 [ LTOP: h0 INDEX: e2 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] RELS: < [ implicit_conj<8:79> LBL: h1 ARG0: e2 ARG1: e4 [ e SF: prop TENSE: tensed MOOD: indicative ] ARG2: e5 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] ] [ unknown<8:21> LBL: h1 ARG0: e4 ARG: u6 ] [ _www.usatoday./JJ_u_unknown<8:21> LBL: h1 ARG0: e7 [ e SF: prop ] ARG1: u6 ] [ implicit_conj<21:79> LBL: h1 ARG0: e5 ARG1: e8 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] ARG2: e9 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] ] [ unknown<21:49> LBL: h1 ARG0: e8 ARG: x10 ] [ udef_q<21:49> LBL: h11 ARG0: x10 RSTR: h12 BODY: h13 ] [ udef_q<21:24> LBL: h14 ARG0: x15 [ x PERS: 3 NUM: sg ] RSTR: h16 BODY: h17 ] [ _com/NN_u_unknown<21:24> LBL: h18 ARG0: x15 ] [ _and_c<24:25> LBL: h19 ARG0: x10 ARG1: x15 ARG2: x20 ] [ udef_q<25:49> LBL: h21 ARG0: x20 RSTR: h22 BODY: h23 ] [ udef_q<25:37> LBL: h24 ARG0: x25 [ x PERS: 3 NUM: sg ] RSTR: h26 BODY: h27 ] [ _tech//JJ_u_unknown<25:30> LBL: h28 ARG0: e29 [ e SF: prop TENSE: untensed MOOD: indicative PROG: bool PERF: - ] ARG1: x25 ] [ _science_n_1<30:37> LBL: h28 ARG0: x25 ] [ _and_c<37:38> LBL: h30 ARG0: x20 ARG1: x25 ARG2: x31 [ x PERS: 3 NUM: sg IND: + ] ] [ proper_q<38:49> LBL: h32 ARG0: x31 RSTR: h33 BODY: h34 ] [ compound<38:49> LBL: h35 ARG0: e36 [ e SF: prop TENSE: untensed MOOD: indicative PROG: - PERF: - ] ARG1: x31 ARG2: x37 [ x PT: pt ] ] [ udef_q<38:44> LBL: h38 ARG0: x37 RSTR: h39 BODY: h40 ] [ _space//NN_u_unknown<38:44> LBL: h41 ARG0: x37 ] [ yofc<44:48> LBL: h35 CARG: "2005" ARG0: x31 ] [ implicit_conj<49:79> LBL: h1 ARG0: e9 ARG1: e43 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] ARG2: e44 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] ] [ unknown<49:52> LBL: h1 ARG0: e43 ARG: x45 [ x PERS: 3 NUM: sg IND: + ] ] [ proper_q<49:52> LBL: h46 ARG0: x45 RSTR: h47 BODY: h48 ] [ yofc<49:51> LBL: h49 CARG: "03" ARG0: x45 ] [ implicit_conj<52:79> LBL: h1 ARG0: e44 ARG1: e51 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] ARG2: e52 [ e 
SF: prop-or-ques ] ] [ unknown<52:55> LBL: h1 ARG0: e51 ARG: x53 [ x PERS: 3 NUM: sg IND: + ] ] [ proper_q<52:55> LBL: h54 ARG0: x53 RSTR: h55 BODY: h56 ] [ yofc<52:54> LBL: h57 CARG: "09" ARG0: x53 ] [ unknown<55:79> LBL: h1 ARG0: e52 ARG: x59 [ x PERS: 3 NUM: sg ] ] [ udef_q<55:79> LBL: h60 ARG0: x59 RSTR: h61 BODY: h62 ] [ compound<55:79> LBL: h63 ARG0: e64 [ e SF: prop TENSE: untensed MOOD: indicative PROG: - PERF: - ] ARG1: x59 ARG2: x65 [ x PERS: 3 NUM: sg IND: + PT: pt ] ] [ proper_q<55:60> LBL: h66 ARG0: x65 RSTR: h67 BODY: h68 ] [ named<55:59> LBL: h69 CARG: "NASA" ARG0: x65 ] [ _search_x.htm?csp=34/NN_u_unknown<60:79> LBL: h63 ARG0: x59 ] > HCONS: < h0 qeq h1 h12 qeq h19 h16 qeq h18 h22 qeq h30 h26 qeq h28 h33 qeq h35 h39 qeq h41 h47 qeq h49 h55 qeq h57 h61 qeq h63 h67 qeq h69 > ICONS: < > ] ^ MRSSyntaxError: expected: a feature Best, Alexandre From gete2 at cam.ac.uk Tue Nov 10 17:59:58 2020 From: gete2 at cam.ac.uk (Guy Emerson) Date: Tue, 10 Nov 2020 16:59:58 +0000 Subject: [developers] Bug in interactive unification In-Reply-To: <3B9E74B2-E09E-452B-B0A4-4F714222E92D@sussex.ac.uk> References: <821C2158-B968-457B-92DC-F6C158C7F70C@sussex.ac.uk> <3B9E74B2-E09E-452B-B0A4-4F714222E92D@sussex.ac.uk> Message-ID: I also get this behaviour (no LUI window appearing) if I try to display a type that has no features (obviously, this isn't a useful display on its own, but it would be helpful for interactive unification to be able to drag and drop the type). The log file says: YZLUI: Received unknown lkb-protocol top-level command: AVM Am Di., 10. Nov. 2020 um 12:07 Uhr schrieb John Carroll < J.A.Carroll at sussex.ac.uk>: > I noticed that LUI didn't display anything in response to the unification > failure, but didn't know why since I don't get a file in /tmp/. > > I can't see anything else obviously wrong with the LKB code concerned, but > I don't know whether it's sending the right thing to LUI since I haven't > found any documentation on the LKB-LUI interface. > > Woodley, can you shed any light on this? > > John > > On 10 Nov 2020, at 11:50, Guy Emerson wrote: > > After loading that file, instead of displaying the incorrect result, LUI > now displays nothing. The log file (/tmp/yzlui.debug.ubuntu) says: > > process_complete_command(): ` > avm 1 #D[natnum-with-copy-wrapper NATNUM: #D[natnum-with-copy RESULT: > NATNUM]] "natnum-with-copy-wrapper - expanded" > ' > > process_complete_command(): `avm 2 #D[defective-one-wrapper NATNUM: > #D[defective-pos SUCC: ZERO]] "defective-one-wrapper - expanded" > ' > > process_complete_command(): `avm 3 #D[natnum-with-copy-wrapper NATNUM: > #D[defective-pos-with-copy RESULT: #D[pos SUCC: <0>= DEFECTIVE-NATNUM] > SUCC: #D[zero-with-copy RESULT: <0>]]] "Unification Failures (2)" > [#U[constraint 1 [[NATNUM SUCC] ZERO-WITH-COPY ZERO ZERO-WITH-COPY -1] > #U[type 0 [NATNUM SUCC RESULT] DEFECTIVE-NATNUM ZERO 1]] ' > > Item in list is not homogeneous (list type 12, item type 3) > Path of failure was not a list of symbols (type 12) > Item in list is not homogeneous (list type 13, item type 3) > YZLUI: Received unknown lkb-protocol top-level command: AVM > > > Am Di., 10. Nov. 2020 um 08:29 Uhr schrieb John Carroll < > J.A.Carroll at sussex.ac.uk>: > >> Dear Guy, >> >> Thanks for this example showing the problem. I?ve reproduced it: >> unification failure at SUCC.RESULT with LKB native graphics, but successful >> unification with LUI. >> >> What gets executed is very different between the two cases. 
The LKB is >> content to find the first failure path, whereas for LUI the LKB runs a >> completely different ?robust? unifier which records all failure paths. I?ve >> found a bug in the latter which I think accounts for the problem. >> In debug-unify2 in src/glue/dag.lsp, (nconc %failures% failures) does not >> get assigned back to %failures% as it should. This means that currently a >> failure in applying a constraint is only recorded if it's not the first >> unification failure. Hmm... >> >> I attach a patch for the LKB (any version) which fixes the problem you >> observed with LUI interactive unification. I hope it fixes the bug >> completely, but I haven't tested on other examples. Since it's Lisp code, >> you can load it by typing the following at the command line in a running >> LKB: (load "path-to/debug-unify2-patch.lsp") >> >> John >> >> On 9 Nov 2020, at 15:12, Guy Emerson wrote: >> >> Dear all, >> >> I found a bug in interactive unification, which I posted about here: >> https://delphinqa.ling.washington.edu/t/bug-in-interactive-unification/592 >> >> The bug is the following: if there is no possible type for a feature >> path, but that path does not exist in either of the two input feature >> structures, then interactive unification does not enforce all constraints >> (i.e. it produces an incorrect result, rather than reporting unification >> failure). >> >> I wasn?t sure where to report this bug. >> >> This is admittedly a rare situation (which is probably why it hasn?t been >> an issue until now). But it happens when recursive computation types lead >> to a unification failure. I?ve written a small example to illustrate the >> problem (see attached file). Note that there is no parsing involved here, >> just compilation of this file and interactive unification. >> >> In more positive news, I can report that when there is no failure, the >> LKB and interactive unification are both robust to extremely recursive type >> constraints. I implemented the untyped lambda calculus as a type system, >> and I tested it using the Ackermann function as a lambda expression on >> Church numerals (the Ackermann function is non-primitive-recursive, so I >> thought this would be a good test case). With 10,570 re-entrancies (no that >> is not a typo), it correctly evaluated A(2,1)=5. >> >> Best, >> Guy >> >> >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gete2 at cam.ac.uk Tue Nov 10 21:04:38 2020 From: gete2 at cam.ac.uk (Guy Emerson) Date: Tue, 10 Nov 2020 20:04:38 +0000 Subject: [developers] Bug in interactive unification In-Reply-To: References: <8F5D0001-F235-42E1-8CB6-4FCBAC4D0D57@sussex.ac.uk> Message-ID: Thanks, John, with that patch I can also see a result, showing that zero and defective-natnum fail to unify, at the right place! I find the display a little counter-intuitive, because it gives a different result depending on which direction I do the unification. But that might be a matter of taste. It displays the failure and I can now use LKB+LUI to debug my code! 
For completeness, here is the log file for the two unifications: process_complete_command(): `avm 20 #D[defective-one-wrapper NATNUM: #D[defective-pos-with-copy RESULT: #D[pos SUCC: DEFECTIVE-NATNUM] SUCC: #D[zero-with-copy RESULT: ZERO]]] "Unification Failures (2)" [#U[constraint 8 [NATNUM] DEFECTIVE-POS-WITH-COPY NATNUM-WITH-COPY DEFECTIVE-POS-WITH-COPY -1] #U[type 7 [NATNUM SUCC RESULT] ZERO DEFECTIVE-NATNUM 8]] ' process_complete_command(): `avm 9 #D[natnum-with-copy-wrapper NATNUM: #D[defective-pos-with-copy RESULT: #D[pos SUCC: <0>= DEFECTIVE-NATNUM] SUCC: #D[zero-with-copy RESULT: <0>]]] "Unification Failures (2)" [#U[constraint 10 [NATNUM SUCC] ZERO-WITH-COPY ZERO ZERO-WITH-COPY -1] #U[type 9 [NATNUM SUCC RESULT] DEFECTIVE-NATNUM ZERO 10]] ' a_tag_path: [0]->(null) Woodley, here's the command for a type with no features, along with the error again: process_complete_command(): `avm 1 ZERO "zero - expanded" ' YZLUI: Received unknown lkb-protocol top-level command: AVM Am Di., 10. Nov. 2020 um 17:51 Uhr schrieb Woodley Packard < sweaglesw at sweaglesw.org>: > Good sleuthing work, gentlemen. > > The name of the first constraint sounds like it?s being truncated somehow, > as I would hypothesize that ?vi? is part of ?violation?? Is that > consistent with what is displayed? > > John, it does look like for whatever reason I did not set maclui up to > open a log file ? an unfortunate oversight. I will fix that. > > Guy, when displaying the atomic AVM and getting no window, is there a > corresponding ?process_complete_command? line in the log? > > Regards, Woodley > > On Nov 10, 2020, at 9:30 AM, John Carroll > wrote: > > ? > Aha, the constraint object in your LUI log has unbalanced brackets. > Guessing which bracket is wrong, I've changed another LKB robust unifier > function, and attach a new version of the file debug-unify2-patch.lsp > > With this new patch file, LUI now displays a "Unification Failures" window > with 2 failures: "GLB Type Constraint Vi" and "No GLB Exists". Are these > correct? > > John > > > On 10 Nov 2020, at 12:07, John Carroll wrote: > > I noticed that LUI didn't display anything in response to the unification > failure, but didn't know why since I don't get a file in /tmp/. > > I can't see anything else obviously wrong with the LKB code concerned, but > I don't know whether it's sending the right thing to LUI since I haven't > found any documentation on the LKB-LUI interface. > > Woodley, can you shed any light on this? > > John > > On 10 Nov 2020, at 11:50, Guy Emerson wrote: > > After loading that file, instead of displaying the incorrect result, LUI > now displays nothing. 
The log file (/tmp/yzlui.debug.ubuntu) says: > > process_complete_command(): ` > avm 1 #D[natnum-with-copy-wrapper NATNUM: #D[natnum-with-copy RESULT: > NATNUM]] "natnum-with-copy-wrapper - expanded" > ' > > process_complete_command(): `avm 2 #D[defective-one-wrapper NATNUM: > #D[defective-pos SUCC: ZERO]] "defective-one-wrapper - expanded" > ' > > process_complete_command(): `avm 3 #D[natnum-with-copy-wrapper NATNUM: > #D[defective-pos-with-copy RESULT: #D[pos SUCC: <0>= DEFECTIVE-NATNUM] > SUCC: #D[zero-with-copy RESULT: <0>]]] "Unification Failures (2)" > [#U[constraint 1 [[NATNUM SUCC] ZERO-WITH-COPY ZERO ZERO-WITH-COPY -1] > #U[type 0 [NATNUM SUCC RESULT] DEFECTIVE-NATNUM ZERO 1]] ' > > Item in list is not homogeneous (list type 12, item type 3) > Path of failure was not a list of symbols (type 12) > Item in list is not homogeneous (list type 13, item type 3) > YZLUI: Received unknown lkb-protocol top-level command: AVM > > > Am Di., 10. Nov. 2020 um 08:29 Uhr schrieb John Carroll < > J.A.Carroll at sussex.ac.uk>: > > Dear Guy, > > Thanks for this example showing the problem. I?ve reproduced it: > unification failure at SUCC.RESULT with LKB native graphics, but successful > unification with LUI. > > What gets executed is very different between the two cases. The LKB is > content to find the first failure path, whereas for LUI the LKB runs a > completely different ?robust? unifier which records all failure paths. I?ve > found a bug in the latter which I think accounts for the problem. > In debug-unify2 in src/glue/dag.lsp, (nconc %failures% failures) does not > get assigned back to %failures% as it should. This means that currently a > failure in applying a constraint is only recorded if it's not the first > unification failure. Hmm... > > I attach a patch for the LKB (any version) which fixes the problem you > observed with LUI interactive unification. I hope it fixes the bug > completely, but I haven't tested on other examples. Since it's Lisp code, > you can load it by typing the following at the command line in a running > LKB: (load "path-to/debug-unify2-patch.lsp") > > John > > On 9 Nov 2020, at 15:12, Guy Emerson wrote: > > Dear all, > > I found a bug in interactive unification, which I posted about here: > https://delphinqa.ling.washington.edu/t/bug-in-interactive-unification/592 > > The bug is the following: if there is no possible type for a feature path, > but that path does not exist in either of the two input feature structures, > then interactive unification does not enforce all constraints (i.e. it > produces an incorrect result, rather than reporting unification failure). > > I wasn?t sure where to report this bug. > > This is admittedly a rare situation (which is probably why it hasn?t been > an issue until now). But it happens when recursive computation types lead > to a unification failure. I?ve written a small example to illustrate the > problem (see attached file). Note that there is no parsing involved here, > just compilation of this file and interactive unification. > > In more positive news, I can report that when there is no failure, the LKB > and interactive unification are both robust to extremely recursive type > constraints. I implemented the untyped lambda calculus as a type system, > and I tested it using the Ackermann function as a lambda expression on > Church numerals (the Ackermann function is non-primitive-recursive, so I > thought this would be a good test case). With 10,570 re-entrancies (no that > is not a typo), it correctly evaluated A(2,1)=5. 
> > Best, > Guy > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sweaglesw at sweaglesw.org Tue Nov 10 18:51:32 2020 From: sweaglesw at sweaglesw.org (Woodley Packard) Date: Tue, 10 Nov 2020 09:51:32 -0800 Subject: [developers] Bug in interactive unification In-Reply-To: <8F5D0001-F235-42E1-8CB6-4FCBAC4D0D57@sussex.ac.uk> References: <8F5D0001-F235-42E1-8CB6-4FCBAC4D0D57@sussex.ac.uk> Message-ID: Good sleuthing work, gentlemen. The name of the first constraint sounds like it?s being truncated somehow, as I would hypothesize that ?vi? is part of ?violation?? Is that consistent with what is displayed? John, it does look like for whatever reason I did not set maclui up to open a log file ? an unfortunate oversight. I will fix that. Guy, when displaying the atomic AVM and getting no window, is there a corresponding ?process_complete_command? line in the log? Regards, Woodley > On Nov 10, 2020, at 9:30 AM, John Carroll wrote: > > ? > Aha, the constraint object in your LUI log has unbalanced brackets. Guessing which bracket is wrong, I've changed another LKB robust unifier function, and attach a new version of the file debug-unify2-patch.lsp > > With this new patch file, LUI now displays a "Unification Failures" window with 2 failures: "GLB Type Constraint Vi" and "No GLB Exists". Are these correct? > > John > > >>> On 10 Nov 2020, at 12:07, John Carroll wrote: >>> >>> I noticed that LUI didn't display anything in response to the unification failure, but didn't know why since I don't get a file in /tmp/. >>> >>> I can't see anything else obviously wrong with the LKB code concerned, but I don't know whether it's sending the right thing to LUI since I haven't found any documentation on the LKB-LUI interface. >>> >>> Woodley, can you shed any light on this? >>> >>> John >>> >>> On 10 Nov 2020, at 11:50, Guy Emerson wrote: >>> >>> After loading that file, instead of displaying the incorrect result, LUI now displays nothing. The log file (/tmp/yzlui.debug.ubuntu) says: >>> >>> process_complete_command(): ` >>> avm 1 #D[natnum-with-copy-wrapper NATNUM: #D[natnum-with-copy RESULT: NATNUM]] "natnum-with-copy-wrapper - expanded" >>> ' >>> >>> process_complete_command(): `avm 2 #D[defective-one-wrapper NATNUM: #D[defective-pos SUCC: ZERO]] "defective-one-wrapper - expanded" >>> ' >>> >>> process_complete_command(): `avm 3 #D[natnum-with-copy-wrapper NATNUM: #D[defective-pos-with-copy RESULT: #D[pos SUCC: <0>= DEFECTIVE-NATNUM] SUCC: #D[zero-with-copy RESULT: <0>]]] "Unification Failures (2)" >>> [#U[constraint 1 [[NATNUM SUCC] ZERO-WITH-COPY ZERO ZERO-WITH-COPY -1] #U[type 0 [NATNUM SUCC RESULT] DEFECTIVE-NATNUM ZERO 1]] ' >>> >>> Item in list is not homogeneous (list type 12, item type 3) >>> Path of failure was not a list of symbols (type 12) >>> Item in list is not homogeneous (list type 13, item type 3) >>> YZLUI: Received unknown lkb-protocol top-level command: AVM >>> >>> >>> Am Di., 10. Nov. 2020 um 08:29 Uhr schrieb John Carroll : >>> Dear Guy, >>> >>> Thanks for this example showing the problem. I?ve reproduced it: unification failure at SUCC.RESULT with LKB native graphics, but successful unification with LUI. >>> >>> What gets executed is very different between the two cases. The LKB is content to find the first failure path, whereas for LUI the LKB runs a completely different ?robust? unifier which records all failure paths. I?ve found a bug in the latter which I think accounts for the problem. 
In debug-unify2 in src/glue/dag.lsp, (nconc %failures% failures) does not get assigned back to %failures% as it should. This means that currently a failure in applying a constraint is only recorded if it's not the first unification failure. Hmm... >>> >>> I attach a patch for the LKB (any version) which fixes the problem you observed with LUI interactive unification. I hope it fixes the bug completely, but I haven't tested on other examples. Since it's Lisp code, you can load it by typing the following at the command line in a running LKB: (load "path-to/debug-unify2-patch.lsp") >>> >>> John >>> >>>> On 9 Nov 2020, at 15:12, Guy Emerson wrote: >>>> >>>> Dear all, >>>> >>>> I found a bug in interactive unification, which I posted about here: https://delphinqa.ling.washington.edu/t/bug-in-interactive-unification/592 >>>> >>>> The bug is the following: if there is no possible type for a feature path, but that path does not exist in either of the two input feature structures, then interactive unification does not enforce all constraints (i.e. it produces an incorrect result, rather than reporting unification failure). >>>> >>>> I wasn?t sure where to report this bug. >>>> >>>> This is admittedly a rare situation (which is probably why it hasn?t been an issue until now). But it happens when recursive computation types lead to a unification failure. I?ve written a small example to illustrate the problem (see attached file). Note that there is no parsing involved here, just compilation of this file and interactive unification. >>>> >>>> In more positive news, I can report that when there is no failure, the LKB and interactive unification are both robust to extremely recursive type constraints. I implemented the untyped lambda calculus as a type system, and I tested it using the Ackermann function as a lambda expression on Church numerals (the Ackermann function is non-primitive-recursive, so I thought this would be a good test case). With 10,570 re-entrancies (no that is not a typo), it correctly evaluated A(2,1)=5. >>>> >>>> Best, >>>> Guy >>>> >>> >>> >> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From J.A.Carroll at sussex.ac.uk Tue Nov 10 18:30:34 2020 From: J.A.Carroll at sussex.ac.uk (John Carroll) Date: Tue, 10 Nov 2020 17:30:34 +0000 Subject: [developers] Bug in interactive unification In-Reply-To: <3B9E74B2-E09E-452B-B0A4-4F714222E92D@sussex.ac.uk> References: <821C2158-B968-457B-92DC-F6C158C7F70C@sussex.ac.uk> <3B9E74B2-E09E-452B-B0A4-4F714222E92D@sussex.ac.uk> Message-ID: <8F5D0001-F235-42E1-8CB6-4FCBAC4D0D57@sussex.ac.uk> Aha, the constraint object in your LUI log has unbalanced brackets. Guessing which bracket is wrong, I've changed another LKB robust unifier function, and attach a new version of the file debug-unify2-patch.lsp With this new patch file, LUI now displays a "Unification Failures" window with 2 failures: "GLB Type Constraint Vi" and "No GLB Exists". Are these correct? John On 10 Nov 2020, at 12:07, John Carroll > wrote: I noticed that LUI didn't display anything in response to the unification failure, but didn't know why since I don't get a file in /tmp/. I can't see anything else obviously wrong with the LKB code concerned, but I don't know whether it's sending the right thing to LUI since I haven't found any documentation on the LKB-LUI interface. Woodley, can you shed any light on this? 
John On 10 Nov 2020, at 11:50, Guy Emerson > wrote: After loading that file, instead of displaying the incorrect result, LUI now displays nothing. The log file (/tmp/yzlui.debug.ubuntu) says: process_complete_command(): ` avm 1 #D[natnum-with-copy-wrapper NATNUM: #D[natnum-with-copy RESULT: NATNUM]] "natnum-with-copy-wrapper - expanded" ' process_complete_command(): `avm 2 #D[defective-one-wrapper NATNUM: #D[defective-pos SUCC: ZERO]] "defective-one-wrapper - expanded" ' process_complete_command(): `avm 3 #D[natnum-with-copy-wrapper NATNUM: #D[defective-pos-with-copy RESULT: #D[pos SUCC: <0>= DEFECTIVE-NATNUM] SUCC: #D[zero-with-copy RESULT: <0>]]] "Unification Failures (2)" [#U[constraint 1 [[NATNUM SUCC] ZERO-WITH-COPY ZERO ZERO-WITH-COPY -1] #U[type 0 [NATNUM SUCC RESULT] DEFECTIVE-NATNUM ZERO 1]] ' Item in list is not homogeneous (list type 12, item type 3) Path of failure was not a list of symbols (type 12) Item in list is not homogeneous (list type 13, item type 3) YZLUI: Received unknown lkb-protocol top-level command: AVM Am Di., 10. Nov. 2020 um 08:29 Uhr schrieb John Carroll >: Dear Guy, Thanks for this example showing the problem. I?ve reproduced it: unification failure at SUCC.RESULT with LKB native graphics, but successful unification with LUI. What gets executed is very different between the two cases. The LKB is content to find the first failure path, whereas for LUI the LKB runs a completely different ?robust? unifier which records all failure paths. I?ve found a bug in the latter which I think accounts for the problem. In debug-unify2 in src/glue/dag.lsp, (nconc %failures% failures) does not get assigned back to %failures% as it should. This means that currently a failure in applying a constraint is only recorded if it's not the first unification failure. Hmm... I attach a patch for the LKB (any version) which fixes the problem you observed with LUI interactive unification. I hope it fixes the bug completely, but I haven't tested on other examples. Since it's Lisp code, you can load it by typing the following at the command line in a running LKB: (load "path-to/debug-unify2-patch.lsp") John On 9 Nov 2020, at 15:12, Guy Emerson > wrote: Dear all, I found a bug in interactive unification, which I posted about here: https://delphinqa.ling.washington.edu/t/bug-in-interactive-unification/592 The bug is the following: if there is no possible type for a feature path, but that path does not exist in either of the two input feature structures, then interactive unification does not enforce all constraints (i.e. it produces an incorrect result, rather than reporting unification failure). I wasn?t sure where to report this bug. This is admittedly a rare situation (which is probably why it hasn?t been an issue until now). But it happens when recursive computation types lead to a unification failure. I?ve written a small example to illustrate the problem (see attached file). Note that there is no parsing involved here, just compilation of this file and interactive unification. In more positive news, I can report that when there is no failure, the LKB and interactive unification are both robust to extremely recursive type constraints. I implemented the untyped lambda calculus as a type system, and I tested it using the Ackermann function as a lambda expression on Church numerals (the Ackermann function is non-primitive-recursive, so I thought this would be a good test case). With 10,570 re-entrancies (no that is not a typo), it correctly evaluated A(2,1)=5. 
Best, Guy -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: debug-unify2-patch.lsp Type: application/octet-stream Size: 5174 bytes Desc: debug-unify2-patch.lsp URL: From oe at ifi.uio.no Wed Nov 11 17:57:44 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Wed, 11 Nov 2020 17:57:44 +0100 Subject: [developers] Bug in interactive unification In-Reply-To: References: <821C2158-B968-457B-92DC-F6C158C7F70C@sussex.ac.uk> <3B9E74B2-E09E-452B-B0A4-4F714222E92D@sussex.ac.uk> Message-ID: hi guy, > I also get this behaviour (no LUI window appearing) if I try to display a type that has no features (obviously, this isn't a useful display on its own, but it would be helpful for interactive unification to be able to drag and drop the type). The log file says: > > YZLUI: Received unknown lkb-protocol top-level command: AVM could you send the complete log output, i.e. including the 'avm' command that LUI fails to recognize? oe From gete2 at cam.ac.uk Wed Nov 11 18:42:19 2020 From: gete2 at cam.ac.uk (Guy Emerson) Date: Wed, 11 Nov 2020 17:42:19 +0000 Subject: [developers] Bug in interactive unification In-Reply-To: References: <821C2158-B968-457B-92DC-F6C158C7F70C@sussex.ac.uk> <3B9E74B2-E09E-452B-B0A4-4F714222E92D@sussex.ac.uk> Message-ID: Hi Stephan, The command is: process_complete_command(): `avm 1 ZERO "zero - expanded" ' YZLUI: Received unknown lkb-protocol top-level command: AVM Best, Guy Am Mi., 11. Nov. 2020 um 16:57 Uhr schrieb Stephan Oepen : > hi guy, > > > I also get this behaviour (no LUI window appearing) if I try to display > a type that has no features (obviously, this isn't a useful display on its > own, but it would be helpful for interactive unification to be able to drag > and drop the type). The log file says: > > > > YZLUI: Received unknown lkb-protocol top-level command: AVM > > could you send the complete log output, i.e. including the 'avm' > command that LUI fails to recognize? > > oe > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sweaglesw at sweaglesw.org Wed Nov 11 19:08:14 2020 From: sweaglesw at sweaglesw.org (Woodley Packard) Date: Wed, 11 Nov 2020 10:08:14 -0800 Subject: [developers] Bug in interactive unification In-Reply-To: References: Message-ID: My best guess is LUI expected a #D structure instead of a symbol; e.g. #D[zero] One could argue that LUI should be a bit more forgiving in enforcing type constraints on its commands. Both internally and in the protocol, atomic values are treated differently from feature structures. An atomic value at the top level is unanticipated, but I think it should work just fine if wrapped into a (trivial) feature structure. Woodley >> On Nov 11, 2020, at 9:42 AM, Guy Emerson wrote: > ? > Hi Stephan, > > The command is: > > process_complete_command(): `avm 1 ZERO "zero - expanded" > ' > > YZLUI: Received unknown lkb-protocol top-level command: AVM > > Best, > Guy > >> Am Mi., 11. Nov. 2020 um 16:57 Uhr schrieb Stephan Oepen : >> hi guy, >> >> > I also get this behaviour (no LUI window appearing) if I try to display a type that has no features (obviously, this isn't a useful display on its own, but it would be helpful for interactive unification to be able to drag and drop the type). The log file says: >> > >> > YZLUI: Received unknown lkb-protocol top-level command: AVM >> >> could you send the complete log output, i.e. including the 'avm' >> command that LUI fails to recognize? 
>> >> oe -------------- next part -------------- An HTML attachment was scrubbed... URL: From oe at ifi.uio.no Wed Nov 11 22:49:28 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Wed, 11 Nov 2020 22:49:28 +0100 Subject: [developers] Bug in interactive unification In-Reply-To: References: Message-ID: hi woodley, and all: > One could argue that LUI should be a bit more forgiving in enforcing type constraints on its commands. Both internally and in the protocol, atomic values are treated differently from feature structures. An atomic value at the top level is unanticipated, but I think it should work just fine if wrapped into a (trivial) feature structure. yes, i am almost tempted to make that argument :-). looking over the code, it appears that for the serialization of AVMs the LUI team (in 2003, i would think) decided to piggy-back on what the LKB calls its 'linear' dag output format. that looks like a format invented by ann or john prior to LUI integration, and it does indeed consider any dag without outgoing arcs an 'atomic' feature structure that is serialized without any #D[...] decoration. i am not sure written records of protocol negotiations internal to the LUI team exist, but if the above were by and large how the protocol was defined ... it would not be unreasonable to expect LUI to accept the 'linear' serialization of such atomic dags. on the other hand, the current LUI interpretation of the protocol has a broad and loyal user base, and it is not hard to accommodate its expectations on the LKB side. i just checked in the following work-around (to both the LOGON and FOS branches of the LKB source code), which appears to have the desired effect and should end up in the next round of binary builds then. best wishes, oe Index: src/glue/lui.lsp =================================================================== --- src/glue/lui.lsp (revision 29084) +++ src/glue/lui.lsp (working copy) @@ -307,10 +307,16 @@ (*package* (find-package :lkb))) (lui-parameters :avm) - (let ((string (with-output-to-string (stream) - (format stream "avm ~d " id) - (display-dag1 dag 'linear stream)))) - (format %lui-stream% string)) + (let* ((string (with-output-to-string (stream) + (display-dag1 dag 'linear stream))) + ;; + ;; work around a LUI idiosyncrasy: dress up atomic dags with a + ;; (kind of) spurious outermost decoration. + ;; + (string (if (char= (char string 0) #\#) + string + (concatenate 'string "#D[" string "]")))) + (format %lui-stream% "avm ~d ~a" id string)) #+:null (format %lui-stream% " ~s~%" path) (format %lui-stream% " ~s~%" title) From oe at ifi.uio.no Wed Nov 11 22:52:09 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Wed, 11 Nov 2020 22:52:09 +0100 Subject: [developers] Bug in interactive unification In-Reply-To: <8F5D0001-F235-42E1-8CB6-4FCBAC4D0D57@sussex.ac.uk> References: <821C2158-B968-457B-92DC-F6C158C7F70C@sussex.ac.uk> <3B9E74B2-E09E-452B-B0A4-4F714222E92D@sussex.ac.uk> <8F5D0001-F235-42E1-8CB6-4FCBAC4D0D57@sussex.ac.uk> Message-ID: hi john: > Aha, the constraint object in your LUI log has unbalanced brackets. Guessing which bracket is wrong, I've changed another LKB robust unifier function, and attach a new version of the file debug-unify2-patch.lsp many thanks for the quick diagnostics and fixes! i looked over both your patches, and they seem like just the right fix to two genuine bugs that have been lurking (for the past sixteen or so years :-) in the interactive unifier behind the LUI drag-and-drop interface. 
i have just picked them up (and added my own fix for the LUI display of atomic dags) and committed these changes to both the LOGON and FOS repositories. best wishes, oe From goodman.m.w at gmail.com Thu Nov 12 03:19:02 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Thu, 12 Nov 2020 10:19:02 +0800 Subject: [developers] Bug report for ERG In-Reply-To: References: Message-ID: Hi Alexandre, I was able to reproduce the issue using the ERG 2018 (which creates a named EP with the URL as its CARG) and a ~3-month old trunk version of the ERG (which tokenized the URL). I'll leave the question of the ERG's behavior to the pros, and I'll address the MRS syntax problem. PyDelphin reported the syntax error at the '.' character because that's the point at which the SimpleMRS parser was unable to proceed, but the problem is in fact the '_' in the lemma portion of the predicate symbol. Currently there is no agreed-upon way to have a lemma containing '_', as '_' is the delimiter between the lemma and pos fields. The so-called "TypePred" production in the SimpleMRS BNF at http://moin.delph-in.net/MrsRfc#Simple is overly permissive (note: I wrote it, adapting Bec's original). Stephan and I had some discussion about the mini-format of predicate symbols on GitHub (https://github.com/delph-in/pydelphin/issues/302) but unfortunately little of that conversation made it to this list. In short, I propose a character-escaping solution for use in predicate symbols for all serialization formats. For this, we could recycle TSDB's three escapes (\s, \n, and \\), where in this case the separator \s is '_' instead of '@'. This escaping would apply uniformly across the serialization formats (SimpleMRS, MRX, EDS native, etc.). Any other characters that might cause issues in parsing (such as a space or '<' in SimpleMRS, also '[', '{', or '(' in EDS, etc.) would be handled by those formats individually. For SimpleMRS, I suggest quoting any predicate that contains a space or '<' (and quotes are not part of the predicate format, only part of SimpleMRS's), and then escaping quotes (\") inside predicates. This means that abstract predicates (compound, udef_q, etc.) would also be quoted, if they had a space or '<'. In MRX, a predicate with '<' would need to replace it with &lt;, and so on. If we agree on such a change, then both PyDelphin and ACE (and other processors) would need to be modified to get around the issue you're experiencing. Of course, this specific issue could be sidestepped by getting the ERG to put URLs back into CARGs instead of being tokenized and parsed into generic predicate symbols. On Thu, Nov 12, 2020 at 12:54 AM Alexandre Rademaker wrote: > > BTW, regardless the tokenisation issue, an invalid MRS should not be > produced, right? > > Best, > Alexandre > > > On 10 Nov 2020, at 18:39, Alexandre Rademaker > wrote: > > > > Hi, > > > > I am trying to parse the sentences from EWT corpus ( > https://github.com/universaldependencies/UD_English-EWT) but in the DEV > set I have a non-sense sentence with only an url between brackets: > > > > [ > http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34 > ] > > > > ACE reports an invalid MRS.
The error is in the character 2666, so > probably the error is the predicate: > > > > _search_x.htm?csp=34/NN_u_unknown > > > > But the regex for predicates seems to support dot in the name of the > predicate: > > > > http://moin.delph-in.net/MrsRfc#SerializationFormats > > > > Anyway, the pre-processing of the sentence seems wrong to me in ERG > trunk version, the tokenisation broke the url into many tokens and consumed > the protocol `http://` prefix: > > > > % ace -g ~/hpsg/wn/terg-mac.dat -E > > [ > http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34 > ] > > www.usatoday. com / tech/ science / space/ 2005 ? 03 ? 09 - nasa - > search_x.htm?csp=34 > > > > ERG (2018) produced what I was expecting: > > > > % ace -g erg-mac.dat -E > > [ > http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34 > ] > > www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34 > > > > ERG (1214) produced what I was expecting: > > > > % ace -g erg-lingo-mac.dat -E > > [ > http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34 > ] > > [ > http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34 > ] > > > > > >>>> response = ace.parse(grm, '[ > http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34] > ') > > NOTE: hit RAM limit while unpacking > > NOTE: parsed 1 / 1 sentences, avg 1536033k, time 51.15306s > > > >>>> response.result(0).mrs() > > Traceback (most recent call last): > > File "", line 1, in > > File > "/Users/ar/.venv/lib/python3.9/site-packages/delphin/interface.py", line > 146, in mrs > > mrs = simplemrs.decode(mrs) > > File > "/Users/ar/.venv/lib/python3.9/site-packages/delphin/codecs/simplemrs.py", > line 112, in decode > > return _decode_mrs(lexer) > > File > "/Users/ar/.venv/lib/python3.9/site-packages/delphin/codecs/simplemrs.py", > line 200, in _decode_mrs > > rels.append(_decode_rel(lexer, variables)) > > File > "/Users/ar/.venv/lib/python3.9/site-packages/delphin/codecs/simplemrs.py", > line 252, in _decode_rel > > _, label = lexer.expect((FEATURE, 'LBL'), (SYMBOL, None)) > > File "/Users/ar/.venv/lib/python3.9/site-packages/delphin/util.py", > line 473, in expect > > raise self._errcls('expected: ' + err, > > delphin.mrs._exceptions.MRSSyntaxError: > > line 1, character 2666 > > [ LTOP: h0 INDEX: e2 [ e SF: prop-or-ques TENSE: tensed MOOD: > indicative ] RELS: < [ implicit_conj<8:79> LBL: h1 ARG0: e2 ARG1: e4 [ e > SF: prop TENSE: tensed MOOD: indicative ] ARG2: e5 [ e SF: prop-or-ques > TENSE: tensed MOOD: indicative ] ] [ unknown<8:21> LBL: h1 ARG0: e4 ARG: > u6 ] [ _www.usatoday./JJ_u_unknown<8:21> LBL: h1 ARG0: e7 [ e SF: prop ] > ARG1: u6 ] [ implicit_conj<21:79> LBL: h1 ARG0: e5 ARG1: e8 [ e SF: > prop-or-ques TENSE: tensed MOOD: indicative ] ARG2: e9 [ e SF: prop-or-ques > TENSE: tensed MOOD: indicative ] ] [ unknown<21:49> LBL: h1 ARG0: e8 ARG: > x10 ] [ udef_q<21:49> LBL: h11 ARG0: x10 RSTR: h12 BODY: h13 ] [ > udef_q<21:24> LBL: h14 ARG0: x15 [ x PERS: 3 NUM: sg ] RSTR: h16 BODY: h17 > ] [ _com/NN_u_unknown<21:24> LBL: h18 ARG0: x15 ] [ _and_c<24:25> LBL: > h19 ARG0: x10 ARG1: x15 ARG2: x20 ] [ udef_q<25:49> LBL: h21 ARG0: x20 > RSTR: h22 BODY: h23 ] [ udef_q<25:37> LBL: h24 ARG0: x25 [ x PERS: 3 NUM: > sg ] RSTR: h26 BODY: h27 ] [ _tech//JJ_u_unknown<25:30> LBL: h28 ARG0: e29 > [ e SF: prop TENSE: untensed MOOD: indicative PROG: bool PERF: - ] ARG1: > x25 ] [ _science_n_1<30:37> LBL: h28 ARG0: x25 ] [ _and_c<37:38> LBL: h30 > ARG0: x20 ARG1: x25 
ARG2: x31 [ x PERS: 3 NUM: sg IND: + ] ] [ > proper_q<38:49> LBL: h32 ARG0: x31 RSTR: h33 BODY: h34 ] [ compound<38:49> > LBL: h35 ARG0: e36 [ e SF: prop TENSE: untensed MOOD: indicative PROG: - > PERF: - ] ARG1: x31 ARG2: x37 [ x PT: pt ] ] [ udef_q<38:44> LBL: h38 > ARG0: x37 RSTR: h39 BODY: h40 ] [ _space//NN_u_unknown<38:44> LBL: h41 > ARG0: x37 ] [ yofc<44:48> LBL: h35 CARG: "2005" ARG0: x31 ] [ > implicit_conj<49:79> LBL: h1 ARG0: e9 ARG1: e43 [ e SF: prop-or-ques TENSE: > tensed MOOD: indicative ] ARG2: e44 [ e SF: prop-or-ques TENSE: tensed > MOOD: indicative ] ] [ unknown<49:52> LBL: h1 ARG0: e43 ARG: x45 [ x PERS: > 3 NUM: sg IND: + ] ] [ proper_q<49:52> LBL: h46 ARG0: x45 RSTR: h47 BODY: > h48 ] [ yofc<49:51> LBL: h49 CARG: "03" ARG0: x45 ] [ > implicit_conj<52:79> LBL: h1 ARG0: e44 ARG1: e51 [ e SF: prop-or-ques > TENSE: tensed MOOD: indicati! > ve ] ARG2: e52 [ e SF: prop-or-ques ] ] [ unknown<52:55> LBL: h1 ARG0: > e51 ARG: x53 [ x PERS: 3 NUM: sg IND: + ] ] [ proper_q<52:55> LBL: h54 > ARG0: x53 RSTR: h55 BODY: h56 ] [ yofc<52:54> LBL: h57 CARG: "09" ARG0: > x53 ] [ unknown<55:79> LBL: h1 ARG0: e52 ARG: x59 [ x PERS: 3 NUM: sg ] ] > [ udef_q<55:79> LBL: h60 ARG0: x59 RSTR: h61 BODY: h62 ] [ compound<55:79> > LBL: h63 ARG0: e64 [ e SF: prop TENSE: untensed MOOD: indicative PROG: - > PERF: - ] ARG1: x59 ARG2: x65 [ x PERS: 3 NUM: sg IND: + PT: pt ] ] [ > proper_q<55:60> LBL: h66 ARG0: x65 RSTR: h67 BODY: h68 ] [ named<55:59> > LBL: h69 CARG: "NASA" ARG0: x65 ] [ > _search_x.htm?csp=34/NN_u_unknown<60:79> LBL: h63 ARG0: x59 ] > HCONS: < h0 > qeq h1 h12 qeq h19 h16 qeq h18 h22 qeq h30 h26 qeq h28 h33 qeq h35 h39 qeq > h41 h47 qeq h49 h55 qeq h57 h61 qeq h63 h67 qeq h69 > ICONS: < > ] > > > > > > > > > > > > > > > > > > > > > > > > > > > > ! > > > > > > > > > ^ > > MRSSyntaxError: expected: a feature > > > > > > Best, > > Alexandre > > > > > -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From danf at stanford.edu Thu Nov 12 03:40:44 2020 From: danf at stanford.edu (Dan Flickinger) Date: Thu, 12 Nov 2020 02:40:44 +0000 Subject: [developers] Bug report for ERG In-Reply-To: References: , Message-ID: One of the unfortunate consequences of the change in tokenization for the trunk ERG (treating punctuation marks as separate tokens) is that we no longer correctly handle web addresses in text, because the tokenizer now splits at slashes and periods, `exploding' URLs into many separate tokens. This is obviously not the desired behavior, and Stephan has been leading an effort to get a uniform preprocessing mechanism into the various platforms so we can cope with URLs and the like, by ensuring that they are single tokens by the time the parser sees them. In the meantime, Alexandre, perhaps you can write a little temporary script that replaces URLs with a single simple token before presenting a sentence to ACE for parsing. Dan ________________________________ From: developers-bounces at emmtee.net on behalf of goodman.m.w at gmail.com Sent: Wednesday, November 11, 2020 6:19 PM To: Alexandre Rademaker Cc: developers Subject: Re: [developers] Bug report for ERG Hi Alexandre, I was able to reproduce the issue using the ERG 2018 (which creates a named EP with the URL as its CARG) and a ~3-month old trunk version of the ERG (which tokenized the URL). I'll leave the question of the ERG's behavior to the pros, and I'll address the MRS syntax problem. PyDelphin reported the syntax error at the '.' 
character because that's the point at which the SimpleMRS parser was unable to proceed, but the problem is in fact the '_' in the lemma portion of the predicate symbol. Currently there is no agreed-upon way to have a lemma containing '_', as '_' is the delimiter between the lemma and pos fields. The so-called "TypePred" production in the SimpleMRS BNF at http://moin.delph-in.net/MrsRfc#Simple is overly permissive (note: I wrote it, adapting Bec's original). Stephan and I had some discussion about the mini-format of predicate symbols on GitHub (https://github.com/delph-in/pydelphin/issues/302) but unfortunately little of that conversation made it to this list. In short, I propose a character-escaping solution for use in predicate symbols for all serialization formats. For this, we could recycle TSDB's three escapes (\s, \n, and \\), where in this case the separator \s is '_' instead of '@'. The serialization formats (SimpleMRS, MRX, EDS native, etc.). Any other characters that might cause issues in parsing (such as a space or '<' in SimpleMRS, also '[', '{', or '(' in EDS, etc.) would be handled by those formats individually. For SimpleMRS, I suggest quoting any predicate that contains a space or '<' (and quotes are not part of the predicate format, only part of SimpleMRS's), and then escaping quotes (\") inside predicates. This means that abstract predicates (compound, udef_q, etc) would also be quoted, if they had a space or '<'. In MRX, a predicate with '<' would need to replace it with <, and so on. If we agree on such a change, then both PyDelphin and ACE (and other processors) would need to be modified to get around the issue you're experiencing. Of course, this specific issue could be sidestepped by getting the ERG to put URLs back into CARGs instead of being tokenized and parsed into generic predicate symbols. On Thu, Nov 12, 2020 at 12:54 AM Alexandre Rademaker > wrote: BTW, regardless the tokenisation issue, an invalid MRS should not be produced, right? Best, Alexandre > On 10 Nov 2020, at 18:39, Alexandre Rademaker > wrote: > > Hi, > > I am trying to parse the sentences from EWT corpus (https://github.com/universaldependencies/UD_English-EWT) but in the DEV set I have a non-sense sentence with only an url between brackets: > > [http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34] > > ACE reports an invalid MRS. The error is in the character 2666, so probably the error is the predicate: > > _search_x.htm?csp=34/NN_u_unknown > > But the regex for predicates seems to support dot in the name of the predicate: > > http://moin.delph-in.net/MrsRfc#SerializationFormats > > Anyway, the pre-processing of the sentence seems wrong to me in ERG trunk version, the tokenisation broke the url into many tokens and consumed the protocol `http://` prefix: > > % ace -g ~/hpsg/wn/terg-mac.dat -E > [http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34] > www.usatoday. com / tech/ science / space/ 2005 ? 03 ? 
09 - nasa - search_x.htm?csp=34 > > ERG (2018) produced what I was expecting: > > % ace -g erg-mac.dat -E > [http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34] > www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34 > > ERG (1214) produced what I was expecting: > > % ace -g erg-lingo-mac.dat -E > [http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34] > [ http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34 ] > > >>>> response = ace.parse(grm, '[http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34]') > NOTE: hit RAM limit while unpacking > NOTE: parsed 1 / 1 sentences, avg 1536033k, time 51.15306s > >>>> response.result(0).mrs() > Traceback (most recent call last): > File "", line 1, in > File "/Users/ar/.venv/lib/python3.9/site-packages/delphin/interface.py", line 146, in mrs > mrs = simplemrs.decode(mrs) > File "/Users/ar/.venv/lib/python3.9/site-packages/delphin/codecs/simplemrs.py", line 112, in decode > return _decode_mrs(lexer) > File "/Users/ar/.venv/lib/python3.9/site-packages/delphin/codecs/simplemrs.py", line 200, in _decode_mrs > rels.append(_decode_rel(lexer, variables)) > File "/Users/ar/.venv/lib/python3.9/site-packages/delphin/codecs/simplemrs.py", line 252, in _decode_rel > _, label = lexer.expect((FEATURE, 'LBL'), (SYMBOL, None)) > File "/Users/ar/.venv/lib/python3.9/site-packages/delphin/util.py", line 473, in expect > raise self._errcls('expected: ' + err, > delphin.mrs._exceptions.MRSSyntaxError: > line 1, character 2666 > [ LTOP: h0 INDEX: e2 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] RELS: < [ implicit_conj<8:79> LBL: h1 ARG0: e2 ARG1: e4 [ e SF: prop TENSE: tensed MOOD: indicative ] ARG2: e5 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] ] [ unknown<8:21> LBL: h1 ARG0: e4 ARG: u6 ] [ _www.usatoday./JJ_u_unknown<8:21> LBL: h1 ARG0: e7 [ e SF: prop ] ARG1: u6 ] [ implicit_conj<21:79> LBL: h1 ARG0: e5 ARG1: e8 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] ARG2: e9 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] ] [ unknown<21:49> LBL: h1 ARG0: e8 ARG: x10 ] [ udef_q<21:49> LBL: h11 ARG0: x10 RSTR: h12 BODY: h13 ] [ udef_q<21:24> LBL: h14 ARG0: x15 [ x PERS: 3 NUM: sg ] RSTR: h16 BODY: h17 ] [ _com/NN_u_unknown<21:24> LBL: h18 ARG0: x15 ] [ _and_c<24:25> LBL: h19 ARG0: x10 ARG1: x15 ARG2: x20 ] [ udef_q<25:49> LBL: h21 ARG0: x20 RSTR: h22 BODY: h23 ] [ udef_q<25:37> LBL: h24 ARG0: x25 [ x PERS: 3 NUM: sg ] RSTR: h26 BODY: h27 ] [ _tech//JJ_u_unknown<25:30> LBL: h28 ARG0: e29 [ e SF: prop TENSE: untensed MOOD: indicative PROG: bool PERF: - ] ARG1: x25 ] [ _science_n_1<30:37> LBL: h28 ARG0: x25 ] [ _and_c<37:38> LBL: h30 ARG0: x20 ARG1: x25 ARG2: x31 [ x PERS: 3 NUM: sg IND: + ] ] [ proper_q<38:49> LBL: h32 ARG0: x31 RSTR: h33 BODY: h34 ] [ compound<38:49> LBL: h35 ARG0: e36 [ e SF: prop TENSE: untensed MOOD: indicative PROG: - PERF: - ] ARG1: x31 ARG2: x37 [ x PT: pt ] ] [ udef_q<38:44> LBL: h38 ARG0: x37 RSTR: h39 BODY: h40 ] [ _space//NN_u_unknown<38:44> LBL: h41 ARG0: x37 ] [ yofc<44:48> LBL: h35 CARG: "2005" ARG0: x31 ] [ implicit_conj<49:79> LBL: h1 ARG0: e9 ARG1: e43 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] ARG2: e44 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] ] [ unknown<49:52> LBL: h1 ARG0: e43 ARG: x45 [ x PERS: 3 NUM: sg IND: + ] ] [ proper_q<49:52> LBL: h46 ARG0: x45 RSTR: h47 BODY: h48 ] [ yofc<49:51> LBL: h49 CARG: "03" ARG0: x45 ] [ implicit_conj<52:79> LBL: h1 ARG0: e44 ARG1: e51 [ 
e SF: prop-or-ques TENSE: tensed MOOD: indicati! ve ] ARG2: e52 [ e SF: prop-or-ques ] ] [ unknown<52:55> LBL: h1 ARG0: e51 ARG: x53 [ x PERS: 3 NUM: sg IND: + ] ] [ proper_q<52:55> LBL: h54 ARG0: x53 RSTR: h55 BODY: h56 ] [ yofc<52:54> LBL: h57 CARG: "09" ARG0: x53 ] [ unknown<55:79> LBL: h1 ARG0: e52 ARG: x59 [ x PERS: 3 NUM: sg ] ] [ udef_q<55:79> LBL: h60 ARG0: x59 RSTR: h61 BODY: h62 ] [ compound<55:79> LBL: h63 ARG0: e64 [ e SF: prop TENSE: untensed MOOD: indicative PROG: - PERF: - ] ARG1: x59 ARG2: x65 [ x PERS: 3 NUM: sg IND: + PT: pt ] ] [ proper_q<55:60> LBL: h66 ARG0: x65 RSTR: h67 BODY: h68 ] [ named<55:59> LBL: h69 CARG: "NASA" ARG0: x65 ] [ _search_x.htm?csp=34/NN_u_unknown<60:79> LBL: h63 ARG0: x59 ] > HCONS: < h0 qeq h1 h12 qeq h19 h16 qeq h18 h22 qeq h30 h26 qeq h28 h33 qeq h35 h39 qeq h41 h47 qeq h49 h55 qeq h57 h61 qeq h63 h67 qeq h69 > ICONS: < > ] > ! ^ > MRSSyntaxError: expected: a feature > > > Best, > Alexandre > -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From gete2 at cam.ac.uk Fri Nov 13 12:34:58 2020 From: gete2 at cam.ac.uk (Guy Emerson) Date: Fri, 13 Nov 2020 11:34:58 +0000 Subject: [developers] Bug in interactive unification In-Reply-To: References: <821C2158-B968-457B-92DC-F6C158C7F70C@sussex.ac.uk> <3B9E74B2-E09E-452B-B0A4-4F714222E92D@sussex.ac.uk> <8F5D0001-F235-42E1-8CB6-4FCBAC4D0D57@sussex.ac.uk> Message-ID: Hi John, Woodley, and Stephan, Thank you all for your fast responses! This has been really helpful. I have come across two further small bugs, both to do with type names consisting entirely of numeric characters. Such names are allowed internally in the LKB, and in terms of unification, they seem to behave exactly as I expect them to. I couldn't find documentation suggesting that numeric characters should be treated differently. However: (1) in the View>Expanded type pop-up window, a string of numeric characters gives the message "Not defined - try again.", even if the type is defined. (When the pop-up window shows a drop-down instead of a text box, such types appear in the list.) (2) Displaying such a type causes LUI to crash. For example, a type named "1" causes a crash, with the following in the log file (where the value of RESULT is 1): process_complete_command(): ` avm 3 #D[null-with-push-1-here RESULT: #D[1 REST: NULL]] "null-with-push-1-here - expanded" ' Type of dag was not a symbol or string (type 2) Best, Guy Am Mi., 11. Nov. 2020 um 21:52 Uhr schrieb Stephan Oepen : > hi john: > > > Aha, the constraint object in your LUI log has unbalanced brackets. > Guessing which bracket is wrong, I've changed another LKB robust unifier > function, and attach a new version of the file debug-unify2-patch.lsp > > many thanks for the quick diagnostics and fixes! i looked over both > your patches, and they seem like just the right fix to two genuine > bugs that have been lurking (for the past sixteen or so years :-) in > the interactive unifier behind the LUI drag-and-drop interface. i > have just picked them up (and added my own fix for the LUI display of > atomic dags) and committed these changes to both the LOGON and FOS > repositories. > > best wishes, oe > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From J.A.Carroll at sussex.ac.uk Sun Nov 15 16:43:37 2020 From: J.A.Carroll at sussex.ac.uk (John Carroll) Date: Sun, 15 Nov 2020 15:43:37 +0000 Subject: [developers] Bug in interactive unification In-Reply-To: References: <821C2158-B968-457B-92DC-F6C158C7F70C@sussex.ac.uk> <3B9E74B2-E09E-452B-B0A4-4F714222E92D@sussex.ac.uk> <8F5D0001-F235-42E1-8CB6-4FCBAC4D0D57@sussex.ac.uk> Message-ID: <6850BE66-CE3A-4059-A855-189A4240C7CE@sussex.ac.uk> Hi Guy, thanks for the bug reports! Responses below. On 13 Nov 2020, at 11:34, Guy Emerson > wrote: Hi John, Woodley, and Stephan, Thank you all for your fast responses! This has been really helpful. I have come across two further small bugs, both to do with type names consisting entirely of numeric characters. Such names are allowed internally in the LKB, and in terms of unification, they seem to behave exactly as I expect them to. I couldn't find documentation suggesting that numeric characters should be treated differently. You're right - types with all-numeric names should work fine. At http://moin.delph-in.net/TdlRfc the relevant clause is Identifier := /[^\s!"#$%&'(),.\/:;<=>[\]^|]+/ which allows any characters apart from whitespace and a few other non-alphanumerics. However: (1) in the View>Expanded type pop-up window, a string of numeric characters gives the message "Not defined - try again.", even if the type is defined. (When the pop-up window shows a drop-down instead of a text box, such types appear in the list.) Yes, this is a bug in the LKB. I'll email you a patch file which you can load to fix it - and I'll commit the changes to the LOGON and FOS branches of the LKB. (2) Displaying such a type causes LUI to crash. For example, a type named "1" causes a crash, with the following in the log file (where the value of RESULT is 1): process_complete_command(): ` avm 3 #D[null-with-push-1-here RESULT: #D[1 REST: NULL]] "null-with-push-1-here - expanded" ' Type of dag was not a symbol or string (type 2) This error message comes from LUI, and I think it needs fixing there. John Best, Guy Am Mi., 11. Nov. 2020 um 21:52 Uhr schrieb Stephan Oepen >: hi john: > Aha, the constraint object in your LUI log has unbalanced brackets. Guessing which bracket is wrong, I've changed another LKB robust unifier function, and attach a new version of the file debug-unify2-patch.lsp many thanks for the quick diagnostics and fixes! i looked over both your patches, and they seem like just the right fix to two genuine bugs that have been lurking (for the past sixteen or so years :-) in the interactive unifier behind the LUI drag-and-drop interface. i have just picked them up (and added my own fix for the LUI display of atomic dags) and committed these changes to both the LOGON and FOS repositories. best wishes, oe -------------- next part -------------- An HTML attachment was scrubbed... URL: From oe at ifi.uio.no Sun Nov 15 16:58:55 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Sun, 15 Nov 2020 16:58:55 +0100 Subject: [developers] Bug in interactive unification In-Reply-To: References: <821C2158-B968-457B-92DC-F6C158C7F70C@sussex.ac.uk> <3B9E74B2-E09E-452B-B0A4-4F714222E92D@sussex.ac.uk> <8F5D0001-F235-42E1-8CB6-4FCBAC4D0D57@sussex.ac.uk> Message-ID: further toward perfection in the LKB interfaces :-). guy, can you try asking for the type display by typing in |1| (including the vertical bars). i suspect the prompt windows just uses the lisp read() function, which will interpret a string of digits as a number, rather than as a symbol. 
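for what it's worth, the TdlRfc identifier pattern john quotes above does admit purely numeric names, so the limitation really is in the reader behind the dialog rather than in TDL itself. a quick sanity check of that pattern (a python sketch; the regular expression is copied from the TdlRfc clause quoted above):

import re

# The Identifier pattern from TdlRfc; purely numeric names are legal TDL
# identifiers, so the "Not defined - try again." message comes from the
# reader behind the dialog rather than from TDL itself.
IDENTIFIER = re.compile(r'''[^\s!"#$%&'(),./:;<=>[\]^|]+''')

for name in ('1', 'null-with-push-1-here', 'foo.bar'):
    print(name, bool(IDENTIFIER.fullmatch(name)))
# 1 True
# null-with-push-1-here True
# foo.bar False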
in TDL parsing, however, all (unquoted) type names are interpreted as symbols. the |...| syntax will force symbol interpretation. this would be easy to fix, and likely applies in other input routines that prompt for type or grammar entity names. regarding LUI communication, i am inclined to suggest that this is a bug in the #D[...] reader ... woodley, how could you not agree? cheers, oe ps: i had originally drafted this message yesterday; i suspect the LKB input fix that john has in mind is likely along the lines above? On Fri, 13 Nov 2020 at 12:36 Guy Emerson wrote: > Hi John, Woodley, and Stephan, > > Thank you all for your fast responses! This has been really helpful. > > I have come across two further small bugs, both to do with type names > consisting entirely of numeric characters. Such names are allowed > internally in the LKB, and in terms of unification, they seem to behave > exactly as I expect them to. I couldn't find documentation suggesting that > numeric characters should be treated differently. However: > > (1) in the View>Expanded type pop-up window, a string of numeric > characters gives the message "Not defined - try again.", even if the type > is defined. (When the pop-up window shows a drop-down instead of a text > box, such types appear in the list.) > > (2) Displaying such a type causes LUI to crash. For example, a type named > "1" causes a crash, with the following in the log file (where the value of > RESULT is 1): > > process_complete_command(): ` > avm 3 #D[null-with-push-1-here RESULT: #D[1 REST: NULL]] > "null-with-push-1-here - expanded" > ' > > Type of dag was not a symbol or string (type 2) > > > Best, > Guy > > > Am Mi., 11. Nov. 2020 um 21:52 Uhr schrieb Stephan Oepen : > >> hi john: >> >> > Aha, the constraint object in your LUI log has unbalanced brackets. >> Guessing which bracket is wrong, I've changed another LKB robust unifier >> function, and attach a new version of the file debug-unify2-patch.lsp >> >> many thanks for the quick diagnostics and fixes! i looked over both >> your patches, and they seem like just the right fix to two genuine >> bugs that have been lurking (for the past sixteen or so years :-) in >> the interactive unifier behind the LUI drag-and-drop interface. i >> have just picked them up (and added my own fix for the LUI display of >> atomic dags) and committed these changes to both the LOGON and FOS >> repositories. >> >> best wishes, oe >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From J.A.Carroll at sussex.ac.uk Sun Nov 15 17:13:52 2020 From: J.A.Carroll at sussex.ac.uk (John Carroll) Date: Sun, 15 Nov 2020 16:13:52 +0000 Subject: [developers] Bug in interactive unification In-Reply-To: References: <821C2158-B968-457B-92DC-F6C158C7F70C@sussex.ac.uk> <3B9E74B2-E09E-452B-B0A4-4F714222E92D@sussex.ac.uk> <8F5D0001-F235-42E1-8CB6-4FCBAC4D0D57@sussex.ac.uk> Message-ID: Stephan, yes, my fix avoids the need to type in the vertical bars. It should apply to all dialogs that ask for an identifier. The same problem occurred with type names starting with a decimal digit but also containing non-numeric characters, such as ??_j in Zhong (where the first two characters are Unicode fullwidth digits). John On 15 Nov 2020, at 15:58, Stephan Oepen > wrote: further toward perfection in the LKB interfaces :-). guy, can you try asking for the type display by typing in |1| (including the vertical bars). 
i suspect the prompt windows just uses the lisp read() function, which will interpret a string of digits as a number, rather than as a symbol. in TDL parsing, however, all (unquoted) type names are interpreted as symbols. the |...| syntax will force symbol interpretation. this would be easy to fix, and likely applies in other input routines that prompt for type or grammar entity names. regarding LUI communication, i am inclined to suggest that this is a bug in the #D[...] reader ... woodley, how could you not agree? cheers, oe ps: i had originally drafted this message yesterday; i suspect the LKB input fix that john has in mind is likely along the lines above? On Fri, 13 Nov 2020 at 12:36 Guy Emerson > wrote: Hi John, Woodley, and Stephan, Thank you all for your fast responses! This has been really helpful. I have come across two further small bugs, both to do with type names consisting entirely of numeric characters. Such names are allowed internally in the LKB, and in terms of unification, they seem to behave exactly as I expect them to. I couldn't find documentation suggesting that numeric characters should be treated differently. However: (1) in the View>Expanded type pop-up window, a string of numeric characters gives the message "Not defined - try again.", even if the type is defined. (When the pop-up window shows a drop-down instead of a text box, such types appear in the list.) (2) Displaying such a type causes LUI to crash. For example, a type named "1" causes a crash, with the following in the log file (where the value of RESULT is 1): process_complete_command(): ` avm 3 #D[null-with-push-1-here RESULT: #D[1 REST: NULL]] "null-with-push-1-here - expanded" ' Type of dag was not a symbol or string (type 2) Best, Guy Am Mi., 11. Nov. 2020 um 21:52 Uhr schrieb Stephan Oepen >: hi john: > Aha, the constraint object in your LUI log has unbalanced brackets. Guessing which bracket is wrong, I've changed another LKB robust unifier function, and attach a new version of the file debug-unify2-patch.lsp many thanks for the quick diagnostics and fixes! i looked over both your patches, and they seem like just the right fix to two genuine bugs that have been lurking (for the past sixteen or so years :-) in the interactive unifier behind the LUI drag-and-drop interface. i have just picked them up (and added my own fix for the LUI display of atomic dags) and committed these changes to both the LOGON and FOS repositories. best wishes, oe -------------- next part -------------- An HTML attachment was scrubbed... URL: From arademaker at gmail.com Fri Nov 20 22:41:19 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Fri, 20 Nov 2020 18:41:19 -0300 Subject: [developers] WQL query language (the WSI interface query language) Message-ID: <301D3F23-7731-4F73-9297-DCC1F8BEAB5D@gmail.com> Hi Stephan, The WSI interface points to [1] for the documentation of the query language. In [2] we also have some more limited documentation. 1. The semeval 2015 page is not working properly, images and CSS can?t be loaded. 2. One particular operator not well defined is the ^ . In [1] we have > The following query demonstrates the use of the top operator (?^?), to retrieve graphs rooted in a coordinate structure, i.e. where the top node has an outgoing dependency matching the pattern ?_*_c? 
(again, assuming the DM representations); here, specification of the role value can be omitted, as there is no predication constraining the argument node: > > ^[_*_c] First the WQL should be representation independent, right? Why the comment about DM? So in an MRS, I am assuming this ^ operator should match the TOP predication, am I right? But the pattern inside the bracket should match the TOP predicate? If so, should I also be able to use other patterns such as lemma pattern, like ?^[+bark]?? I didn?t understand the fragment 'the role value can be omitted, as there is no predication constraining the argument node?. How the role values would be supplied? Is it talking about the roles of _*_c predicate in the example? Why not restrict the argument of the ^ operator to an node id? If I search for sentences where the TOP predicate has lemma bark, I could use: ^[x] x:+bark Does it make sense? 3. There is no proviso for querying VarSort? For instance, find representations where a given verb has as argument a node that is first person singular. We can?t search for verbs in a specific tense or aspect. 4. The ERS fingerprints (http://moin.delph-in.net/ErgSemantics) and WQL are very related, right? Do we have any document that describes ERS fingerprints? My idea is to reimplement the parser of WDL and the transformation to SPARQL [3]. I would like to support MRS, DMRS and EDS initially. The reimplementation will match the new RDF encoding for the semantic structures that I am proposing. The RDF vocabulary is still under construction, in particular, there are parts of the semantic structure that are grammar dependent (for example, the VarSort) and I am still not sure how to deal with that. This is my first very preliminar draft of the WQL BNF is: WQL := predexp predexp := predication | predexp OP predexp | ( predexp ) | ! predexp OP := ?|" | ? " predication := [id ?:?] pattern [ ?[" arglist ?]? ] arglist := argument | argument ?," arglist argument:= rolelabel id rolelabel := wdpattern pattern := wdpattern | lemma_pattern | pos_pattern | sense_pattern lemma_pattern := ?+" wdpattern pos_pattern := ?/" wdpattern sense_pattern := ?=" wdpattern wdpattern := [^?* ][\w]+ Ps: can I potentially implement a HPSG grammar to parse any context free grammar like the one above, right? It would be funny to have grammars to parse this DSL. [1] https://alt.qcri.org/semeval2015/task18/index.php?id=search [2] http://moin.delph-in.net/WeSearch/QueryLanguage [3] https://www.w3.org/TR/sparql11-query/ Best, Alexandre From oe at ifi.uio.no Sat Nov 28 20:01:11 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Sat, 28 Nov 2020 20:01:11 +0100 Subject: [developers] migration of collaboration infrastructure Message-ID: dear colleagues: over the next two weeks, services hosted in the following domains will be migrated to a new system at the university of oslo: + delph-in.net + emmtee. net + nlpl.eu + sigparse. org there may be short interruptions in service availability, delays in mailing list processing, temporary locks on wikis and SVN, and more generally unexpected behavior. please exercise some patience during the migration phase, and feel free to notify me of any surprising behavior you might experience. some services will not be migrated but rather are being discontinued: + ?pet at delph-in.net? list + ?logon at delph-in.net? list + ?wesearch.delph-in.net? regarding the two mailing lists, they have had little traffic in recent years and overlap with the DELPH-IN ?developers? list. 
i encourage active PET or LOGON users to subscribe to that list. the WeSearch semantic index regrettably has become too difficult to upgrade and maintain. fortunately, there is an ongoing initiative at IBM Research to develop a replacement service with similar functionality. best wishes from norway! oe -------------- next part -------------- An HTML attachment was scrubbed... URL: From J.A.Carroll at sussex.ac.uk Tue Dec 15 23:31:14 2020 From: J.A.Carroll at sussex.ac.uk (John Carroll) Date: Tue, 15 Dec 2020 22:31:14 +0000 Subject: [developers] support for default unification / YADU Message-ID: <4FE421C2-CEB1-42E4-9362-A431EC78B883@sussex.ac.uk> Hi, I'm wondering whether it's worth preserving support for default feature structures in the LKB. Has anyone tried to use defaults in the recent past - and is it likely that anyone will want to do so in the future? Since I've started the LKB-FOS effort I've tried to retain this facility, but I've never been able to verify that it actually works since I've never tested grammars containing defaults. I'm asking about this now, since there's a change I want to make to the LKB that will probably irretrievably break default unification in both parsing and generation. Any opinions? Please reply if you feel strongly one way or the other. Thanks, John From goodman.m.w at gmail.com Wed Dec 16 10:38:55 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Wed, 16 Dec 2020 17:38:55 +0800 Subject: [developers] Serializing EDS without a top Message-ID: Hello developers, It's been a while but I'm returning to a discussion we were having about serializing EDS in the native format when there is no TOP and when there's no INDEX to backoff to. Stephan suggested that EDS is a line-based format (i.e., line breaks are required), while I would like to continue to support single-line EDS in PyDelphin. I think the last word on the subject from Stephan, at least on this list, was mid-September ( http://lists.delph-in.net/archives/developers/2020/003140.html), where he said he'd continue discussion on another thread, which presumably meant the thread from late August ( http://lists.delph-in.net/archives/developers/2020/003127.html). I don't think the discussion did continue, so I'm starting this thread in case anyone is interested. As an example, here's an EDS (without properties) for "It rained." {e2: e2:_rain_v_1<3:9>[] } In PyDelphin, when an EDS has no TOP, I was outputting the first colon anyway, intentionally: {: e2:_rain_v_1<3:9>[] } It's a bit ugly, but it allows me to detect, with 1 token of lookahead, if there's a top or not. If the colon is omitted then it's not clear if "e2:" is the top or the start of the first node. If line breaks are required, we just assume the first line is for the top, whether or not it's there. But for single-line EDS, we need 4 tokens of lookahead to determine if there's a top (assuming the parser treats variables and predicates as the same kinds of tokens): {e2: e2:_rain_v_1<3:9>[]} {e2:_rain_v_1<3:9>[]} Here is the parsing algorithm, once we've consumed the first '{': 1. If the 1st lookahead token is ':', '(fragmented)' (or another graph status), '}', or '|' (node status), then we know that TOP is missing (the ':' is for PyDelphin's current output) 2. Otherwise the 1st and 2nd tokens must be a symbol and a colon, and if the 3rd token is a graph or node status, OR if the 4th token is ':', then the 1st token is the TOP 3. 
Otherwise TOP must be missing I think this covers all the cases but let me know if I've missed anything. -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From oe at ifi.uio.no Thu Dec 17 11:39:46 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Thu, 17 Dec 2020 11:39:46 +0100 Subject: [developers] Serializing EDS without a top In-Reply-To: References: Message-ID: hi mike, yes, i am sorry i now see i never returned to the original thread i had in mind on M$ GitHub! in a nutshell, EDS native serialization is indeed line-oriented, and i am inclined to hold fast on the one-node-per-line convention. i would not want to muddy these waters, since the format has been around since 2002, and there has been some EDS activity beyond DELPH-IN. i know of at least two EDS readers that rely on the presence of line breaks. i do see the benefits of a more compact serialization, however, but would recommend you call that something else (say EDSLines), if you decide to implement it in pyDelphin. you would then be free to make up your own rules, where i could for example imagine either one of the following (assuming a missing top): {_: e2:_rain_v_1<3:9>[] e3:_heavy_a_1<10:42>[ARG1 e2] } {\n e2:_rain_v_1<3:9>[]\n e3:_heavy_a_1<10:42>[ARG1 e2]\n } {: e2:_rain_v_1<3:9>[] e3:_heavy_a_1<10:42>[ARG1 e2] } the above order reflects what i believe would be my personal ranking just now :-). i frequently use underscores for ?anonymous? MRS variables, and the first variant feels maybe most natural: there should be a top identifier, but in this case it is missing. the second variant also would seem to maintain compatibility with the native EDS serialization, only introducing an inline encoding of line breaks. variant #3, on the other hand, i believe would depart from how native serialization deals with missing tops; thus, if you were to opt for this format, it would be even more important to maintain a clear distinction between EDS native serialization and the pyDelphin EDSLines format. i hope the above makes sense to you? oe On Wed, Dec 16, 2020 at 10:41 AM goodman.m.w at gmail.com wrote: > > Hello developers, > > It's been a while but I'm returning to a discussion we were having about serializing EDS in the native format when there is no TOP and when there's no INDEX to backoff to. Stephan suggested that EDS is a line-based format (i.e., line breaks are required), while I would like to continue to support single-line EDS in PyDelphin. I think the last word on the subject from Stephan, at least on this list, was mid-September (http://lists.delph-in.net/archives/developers/2020/003140.html), where he said he'd continue discussion on another thread, which presumably meant the thread from late August (http://lists.delph-in.net/archives/developers/2020/003127.html). I don't think the discussion did continue, so I'm starting this thread in case anyone is interested. > > As an example, here's an EDS (without properties) for "It rained." > > {e2: > e2:_rain_v_1<3:9>[] > } > > In PyDelphin, when an EDS has no TOP, I was outputting the first colon anyway, intentionally: > > {: > e2:_rain_v_1<3:9>[] > } > > It's a bit ugly, but it allows me to detect, with 1 token of lookahead, if there's a top or not. If the colon is omitted then it's not clear if "e2:" is the top or the start of the first node. If line breaks are required, we just assume the first line is for the top, whether or not it's there. 
But for single-line EDS, we need 4 tokens of lookahead to determine if there's a top (assuming the parser treats variables and predicates as the same kinds of tokens): > > {e2: e2:_rain_v_1<3:9>[]} > {e2:_rain_v_1<3:9>[]} > > Here is the parsing algorithm, once we've consumed the first '{': > > 1. If the 1st lookahead token is ':', '(fragmented)' (or another graph status), '}', or '|' (node status), then we know that TOP is missing (the ':' is for PyDelphin's current output) > 2. Otherwise the 1st and 2nd tokens must be a symbol and a colon, and if the 3rd token is a graph or node status, OR if the 4th token is ':', then the 1st token is the TOP > 3. Otherwise TOP must be missing > > I think this covers all the cases but let me know if I've missed anything. > > -- > -Michael Wayne Goodman From goodman.m.w at gmail.com Fri Dec 18 09:10:56 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Fri, 18 Dec 2020 16:10:56 +0800 Subject: [developers] Serializing EDS without a top In-Reply-To: References: Message-ID: Thanks for the response, Stephan, On Thu, Dec 17, 2020 at 6:39 PM Stephan Oepen wrote: > [...] > in a nutshell, EDS native serialization is indeed line-oriented, and i > am inclined to hold fast on the one-node-per-line convention. i would > not want to muddy these waters, since the format has been around since > 2002, and there has been some EDS activity beyond DELPH-IN. i know of > at least two EDS readers that rely on the presence of line breaks. > Ok, sounds good. Then perhaps my previous message may be informative if the maintainer(s) of those two readers ever decide to embrace the convenience of single-line EDS. Other than determining the top of the graph, adapting the readers should be trivial: just treat \n as any other whitespace. i do see the benefits of a more compact serialization, however, but > would recommend you call that something else (say EDSLines), if you > decide to implement it in pyDelphin. It's been implemented for some time now. In fact all codecs have a -lines variant (simplemrs -> simplemrs-lines, dmrx -> dmrx-lines, etc.). E.g., in the case of XML formats, it outputs each item ( or ) on a line and suppresses the root nodes (, ). > [...] > {_: e2:_rain_v_1<3:9>[] e3:_heavy_a_1<10:42>[ARG1 e2] } > {\n e2:_rain_v_1<3:9>[]\n e3:_heavy_a_1<10:42>[ARG1 e2]\n } > {: e2:_rain_v_1<3:9>[] e3:_heavy_a_1<10:42>[ARG1 e2] } > > the above order reflects what i believe would be my personal ranking > just now :-). i frequently use underscores for ?anonymous? MRS > variables, and the first variant feels maybe most natural: there > should be a top identifier, but in this case it is missing. The 'anonymous' node identifier for a fake top is fine and, conveniently, PyDelphin can already read in this variant. The difference is that '_' is a valid identifier in EDS, so it's not actually missing, just unlinked. I think logically an unlinked top is the same as a null top, but this means that PyDelphin may write an EDS that is different (in terms of Python data structures, viz., upon re-reading the serialization) as the source EDS. > The > second variant also would seem to maintain compatibility with the > native EDS serialization, only introducing an inline encoding of line > breaks. Inserting a literal '\' and 'n' is awkward and changes the format, and I don't see how it's compatible at all besides having '\' and 'n' in the same location as your preferred newline characters. 
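For concreteness, the lookahead test from my first message amounts to roughly the following (a standalone sketch with a deliberately coarse tokenizer; this is not PyDelphin's actual codec code):

import re

# Coarse tokenizer: graph statuses, surface spans, punctuation (including
# the '|' node status), and everything else as a symbol -- just enough to
# drive the lookahead test.
TOKEN = re.compile(r'\(\w+\)'            # graph status, e.g. (fragmented)
                   r'|<[^>]*>'           # surface spans, e.g. <3:9>
                   r'|[{}:|\[\]]'        # punctuation and the '|' node status
                   r'|[^\s{}:|\[\]<]+')  # symbols: identifiers and predicates

def top_of(single_line_eds):
    toks = TOKEN.findall(single_line_eds)
    assert toks and toks[0] == '{'
    la = toks[1:5]                       # up to four tokens of lookahead
    statuses = {'}', '|'} | {t for t in la if t.startswith('(')}
    if la[0] == ':' or la[0] in statuses:
        return None                      # rule 1: no top
    assert la[1] == ':'                  # rule 2 precondition: symbol then ':'
    if la[2] in statuses or la[3] == ':':
        return la[0]                     # rule 2: the first symbol is the top
    return None                          # rule 3: no top

print(top_of('{e2: e2:_rain_v_1<3:9>[]}'))   # -> e2
print(top_of('{e2:_rain_v_1<3:9>[]}'))       # -> None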
> variant #3, on the other hand, i believe would depart from > how native serialization deals with missing tops; thus, if you were to > opt for this format, it would be even more important to maintain a > clear distinction between EDS native serialization and the pyDelphin > EDSLines format. > If the thing between the first '{' and the first ':' is the top identifier, then if nothing is there the top is null. This is easy to parse and (I thought) easy to understand. As EDS native serialization from PyDelphin has done this for some time, I will continue to read it in, but going forward I will not write it out. As of the latest commit, I just omit the top entirely, which is what your newline-ful variant would do if it were simply newline-less (see the last EDS of my first message). I have written, but have not yet pushed to GitHub, a change that inserts an anonymous '_' top if the top is null (if '_' is already used by some node, I try '_0', then '_1', etc. until I get an unused one). I have also made the following changes (which I think you'll be happy with): - The default serialization is now indented with newlines (and this is true of all codecs); use eds-lines to get the single-line variant - Conversion from MRS now uses predicate modification by default - Blank lines are inserted between indented EDSs (not sure if your readers actually require this) > > i hope the above makes sense to you? oe > > > On Wed, Dec 16, 2020 at 10:41 AM goodman.m.w at gmail.com > wrote: > > > > Hello developers, > > > > It's been a while but I'm returning to a discussion we were having about > serializing EDS in the native format when there is no TOP and when there's > no INDEX to backoff to. Stephan suggested that EDS is a line-based format > (i.e., line breaks are required), while I would like to continue to support > single-line EDS in PyDelphin. I think the last word on the subject from > Stephan, at least on this list, was mid-September ( > http://lists.delph-in.net/archives/developers/2020/003140.html), where he > said he'd continue discussion on another thread, which presumably meant the > thread from late August ( > http://lists.delph-in.net/archives/developers/2020/003127.html). I don't > think the discussion did continue, so I'm starting this thread in case > anyone is interested. > > > > As an example, here's an EDS (without properties) for "It rained." > > > > {e2: > > e2:_rain_v_1<3:9>[] > > } > > > > In PyDelphin, when an EDS has no TOP, I was outputting the first colon > anyway, intentionally: > > > > {: > > e2:_rain_v_1<3:9>[] > > } > > > > It's a bit ugly, but it allows me to detect, with 1 token of lookahead, > if there's a top or not. If the colon is omitted then it's not clear if > "e2:" is the top or the start of the first node. If line breaks are > required, we just assume the first line is for the top, whether or not it's > there. But for single-line EDS, we need 4 tokens of lookahead to determine > if there's a top (assuming the parser treats variables and predicates as > the same kinds of tokens): > > > > {e2: e2:_rain_v_1<3:9>[]} > > {e2:_rain_v_1<3:9>[]} > > > > Here is the parsing algorithm, once we've consumed the first '{': > > > > 1. If the 1st lookahead token is ':', '(fragmented)' (or another graph > status), '}', or '|' (node status), then we know that TOP is missing (the > ':' is for PyDelphin's current output) > > 2. 
Otherwise the 1st and 2nd tokens must be a symbol and a colon, and if > the 3rd token is a graph or node status, OR if the 4th token is ':', then > the 1st token is the TOP > > 3. Otherwise TOP must be missing > > > > I think this covers all the cases but let me know if I've missed > anything. > > > > -- > > -Michael Wayne Goodman > -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From olzama at uw.edu Sat Dec 19 02:11:12 2020 From: olzama at uw.edu (Olga Zamaraeva) Date: Fri, 18 Dec 2020 17:11:12 -0800 Subject: [developers] Delph-in viz demo page Message-ID: Did something go wrong with the delph-in-viz demo page? http://delph-in.github.io/delphin-viz/demo/#input=Abrams%20knew%20that%20it%20rained.&count=5&grammar=erg2018-uw&tree=true&mrs=true I don't seem to be able to gen any analyses for anything at all. Thanks, -- Olga Zamaraeva -------------- next part -------------- An HTML attachment was scrubbed... URL: From goodman.m.w at gmail.com Sat Dec 19 02:23:20 2020 From: goodman.m.w at gmail.com (Michael Wayne Goodman) Date: Sat, 19 Dec 2020 09:23:20 +0800 Subject: [developers] Delph-in viz demo page In-Reply-To: References: Message-ID: <527c61a6-140e-424f-8053-6046f0c79b29@Spark> Same with the old Demophin site. The UW server hosting them is having some problems. The server?s disk is full, but I?m not sure if that?s it. I?ve contacted Brandon. In the meantime you can get results from the UiO server if you change the grammar, but it only has the 1214 version of the ERG. On Dec 19, 2020, 9:12 AM +0800, Olga Zamaraeva , wrote: > Did something go wrong with the delph-in-viz demo page? > > http://delph-in.github.io/delphin-viz/demo/#input=Abrams%20knew%20that%20it%20rained.&count=5&grammar=erg2018-uw&tree=true&mrs=true > > I don't seem to be able to gen any analyses for anything at all. > > Thanks, > -- > Olga Zamaraeva -------------- next part -------------- An HTML attachment was scrubbed... URL: From oe at ifi.uio.no Sat Dec 19 08:46:55 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Sat, 19 Dec 2020 08:46:55 +0100 Subject: [developers] wanted: collaboration infrastructure task force Message-ID: dear colleagues, the DELPH-IN standing committee is looking for volunteers to help with our shared collaboration infrastructure, e.g. the wiki, mailing lists, discourse forum, code repository, etc. we would like to form a task force of at least two technologically minded DELPH-IN members, to help design and implement modern infrastructure solutions. we have started to discuss modernizing our infrastructure at the 2020 summit; for background, please see: http://moin.delph-in.net/VirtualInfrastructure the most valuable service, arguably, is the DELPH-IN wiki. but the underlying MoinMoin platform is no longer sustainable. we should look into migrating all relevant content into a modern platform, for example MediaWiki. this is where we are most urgently looking for volunteers. the mailing lists are also increasingly difficult to sustain. i wonder whether we still need them? the UW discourse site appears to largely have superseded discussion on the ?developers? list, and for all i know also supports email notification. we should look into ingesting our email archives into the discourse platform. or maybe take discussion and code repositories to M$ GitHub wholesale? please consider volunteering! this need not be very time-consuming, overall, and could be fun in the right group of people. 
we will look for at least one member of the standing committee to guide and support our infrastructure transition into the 21st century. please respond to ?standing at delph-in.net? (one of our three remaining active mailing lists :-). best wishes, oe -------------- next part -------------- An HTML attachment was scrubbed... URL: From oe at ifi.uio.no Sat Dec 19 09:33:49 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Sat, 19 Dec 2020 09:33:49 +0100 Subject: [developers] WQL query language (the WSI interface query language) In-Reply-To: <301D3F23-7731-4F73-9297-DCC1F8BEAB5D@gmail.com> References: <301D3F23-7731-4F73-9297-DCC1F8BEAB5D@gmail.com> Message-ID: hi alexandre, we are not actively working on the WeSearch Infrastructure at UiO, and i am very happy for you to push this work further. my recommendation would be to not worry too much about backward compatibility in this space but maybe rather derive your own solution. on this path, i would suggest you coin different names, or at least make explicit that, say, WQL 2.0 is different from the original query language and search engine. the original SemEval description remains available through the SDP site: http://sdp.delph-in.net/2015/search.html please bear in mind that the above is for the bi-lexical SDP frameworks (CCD, DM, PAS, and PSD). in the current WSI design, there are in fact framework-specific interpretation rules for some elements of the query language. the ?+? (lemma), ?/? (pos), and ?=? (frame or sense) operators, for example, do not apply to EDS or MRS, because these node properties are not defined there. conversely, identifiers are only interpreted as typed (?h?, ?i?, ?e?, and ?x?) in MRS; here, the underlying RDF graph topology is also quite different, e.g. with typed variables as nodes in their own right. the query language hides some of the underlying differences: we use the same node identifier operator (?:?) to denote the LBL of an elementary predication; node labels match its predicate symbol; and the syntax for labeled outgoing edges queries role?argument pairs in the predication. the WQL ?^? (top) operator is straightforwardly defined for the SDP frameworks, and probably for EDS too, where there is an explicit notion of the top node(s) in these graphs. for MRS, i am actually not sure we have defined this operator; i would think it should match the variable that is the TOP element of the MRS. finally, yes, the ErgSemantics fingerprint language is the WQL dialect for MRS search. i am afraid, i believe no documentation is available for this dialect. best wishes, oe On Fri, 20 Nov 2020 at 22:43 Alexandre Rademaker wrote: > Hi Stephan, > > The WSI interface points to [1] for the documentation of the query > language. In [2] we also have some more limited documentation. > > 1. The semeval 2015 page is not working properly, images and CSS can?t be > loaded. > > 2. One particular operator not well defined is the ^ . In [1] we have > > > The following query demonstrates the use of the top operator (?^?), to > retrieve graphs rooted in a coordinate structure, i.e. where the top node > has an outgoing dependency matching the pattern ?_*_c? (again, assuming the > DM representations); here, specification of the role value can be omitted, > as there is no predication constraining the argument node: > > > > ^[_*_c] > > First the WQL should be representation independent, right? Why the comment > about DM? So in an MRS, I am assuming this ^ operator should match the TOP > predication, am I right? 
But the pattern inside the bracket should match > the TOP predicate? If so, should I also be able to use other patterns such > as lemma pattern, like ?^[+bark]?? > > I didn?t understand the fragment 'the role value can be omitted, as there > is no predication constraining the argument node?. How the role values > would be supplied? Is it talking about the roles of _*_c predicate in the > example? Why not restrict the argument of the ^ operator to an node id? If > I search for sentences where the TOP predicate has lemma bark, I could use: > > ^[x] > x:+bark > > Does it make sense? > > 3. There is no proviso for querying VarSort? For instance, find > representations where a given verb has as argument a node that is first > person singular. We can?t search for verbs in a specific tense or aspect. > > 4. The ERS fingerprints (http://moin.delph-in.net/ErgSemantics) and WQL > are very related, right? Do we have any document that describes ERS > fingerprints? > > > My idea is to reimplement the parser of WDL and the transformation to > SPARQL [3]. I would like to support MRS, DMRS and EDS initially. The > reimplementation will match the new RDF encoding for the semantic > structures that I am proposing. The RDF vocabulary is still under > construction, in particular, there are parts of the semantic structure that > are grammar dependent (for example, the VarSort) and I am still not sure > how to deal with that. > > This is my first very preliminar draft of the WQL BNF is: > > WQL := predexp > predexp := predication | predexp OP predexp | ( predexp ) | ! predexp > OP := ?|" | ? " > predication := [id ?:?] pattern [ ?[" arglist ?]? ] > arglist := argument | argument ?," arglist > argument:= rolelabel id > rolelabel := wdpattern > pattern := wdpattern | lemma_pattern | pos_pattern | sense_pattern > lemma_pattern := ?+" wdpattern > pos_pattern := ?/" wdpattern > sense_pattern := ?=" wdpattern > wdpattern := [^?* ][\w]+ > > > Ps: can I potentially implement a HPSG grammar to parse any context free > grammar like the one above, right? It would be funny to have grammars to > parse this DSL. > > > [1] https://alt.qcri.org/semeval2015/task18/index.php?id=search > [2] http://moin.delph-in.net/WeSearch/QueryLanguage > [3] https://www.w3.org/TR/sparql11-query/ > > > Best, > Alexandre > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From J.A.Carroll at sussex.ac.uk Mon Dec 28 20:00:41 2020 From: J.A.Carroll at sussex.ac.uk (John Carroll) Date: Mon, 28 Dec 2020 19:00:41 +0000 Subject: [developers] new release of LKB-FOS Message-ID: Hi all, I've made a new release of LKB-FOS. Pre-packaged binaries etc are at the usual place http://users.sussex.ac.uk/~johnca/lkb_fos.tgz and the LKB SVN fos branch is up to date. There's not a lot new on the surface: it's just a bit more zippy and less buggy. See below for details (taken from the README). John ------ * Made dialog boxes open a little less slowly on macOS, partially working around a widely complained-about graphics issue in XQuartz. * Internal improvements to use more appropriate data structures in quickcheck and generator. More thorough consistency checking when reading quickcheck paths file. * In the parser, setting the parameter *non-idiom-root* had no effect; now, if set to the name of an instance, this is checked against each parsing result to see whether *additional-root-condition* needs testing. In the generator, failure of *additional-root-condition* now only outputs a warning. 
* Reversed a poor decision in the July 2020 version to generalise passive edge top types internally to improve packing; this caused problems with GG. * Updated tsdb and swish++ binaries to March 2020 versions from the LOGON distribution; profiles with fields containing integers longer than 30 bits are now retrieved correctly. [incr tsdb()] failed to follow symbolic links to profiles - fixed. * In LUI, when attempting interactive unification, a failure when applying a type constraint was sometimes ignored - fixed. Also fixed a problem displaying a type with no features in LUI. * In View... command dialogs, grammar entities with names consisting only of digits are now found. Also fixed related bug where the initial suggestion was displayed with vertical bars around it if it started with a digit. * Reading of transfer and MRS rules now follows the revised TDL syntax specification in TdlRfc; error recovery after TDL syntax errors in these rule files is also improved. From oe at ifi.uio.no Mon Dec 28 20:04:15 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Mon, 28 Dec 2020 20:04:15 +0100 Subject: [developers] migration of collaboration infrastructure In-Reply-To: References: Message-ID: dear colleagues, > over the next two weeks, services hosted in the following domains will be migrated to a new system at the university of oslo: > > + delph-in.net [...] the bulk of the service migration is now complete, and there is both good news and bad news. i would like to invite everyone to take a critical look and let me know if you find anything missing (please recall that the WeSearch semantic query interface has been discontinued). in particular, moving the DELPH-IN wiki turned out more involved than anticipated. for the time being, the wiki is read-only, and (truth be told) i am not sure i will be able to re-enable edit functionality. on the technical side (owing to a dependency on Python 2.x), we had to convert from running in the WSGI framework to a more traditional CGI set-up, which means that URLs in the new wiki now require an extra path component, e.g. http://moin.delph-in.net/FrontPage --> http://moin.delph-in.net/wiki/FrontPage old-style URLs will be automatically rewritten by the server, so in principle the above should not cause any broken links. content-wise, as part of the migration, we holiday-cleaned some 12,538 wiki accounts and 33,481 spam pages. this was done heuristically, but i hope no genuine content has been lost. to regain a fully functional DELPH-IN wiki, i would like to urgently ask for help: i believe we should either find another site to host and maintain a fresh MoinMoin instance (preferably with WSGI support) or migrate the wiki content (and, ideally, user accounts and revision history) into a more modern platform. regarding the latter, i imagine either MediaWiki or GitHub wikis would be strong candidates (and could presumably be hosted on the public M$ GitHub service). now that we are down to a manageable number of pages and users in the DELPH-IN MoinMoin instance, i believe the migration task should not be insurmountable. i will be happy to provide an archive of all content and revision history. i sincerely hope we can find volunteers in the DELPH-IN community to pick up the ball and work toward a modern and sustainable DELPH-IN wiki? 
best wishes, oe From arademaker at gmail.com Mon Dec 28 23:56:15 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Mon, 28 Dec 2020 19:56:15 -0300 Subject: [developers] [delph-in] migration of collaboration infrastructure In-Reply-To: References: Message-ID: <4144A6B2-C080-48C1-A3C2-DEB83C905059@gmail.com> Hi, Tomorrow, at 3PM (Brazil, 10AM Pacific Time) we (Olga and me) will have a meeting to discuss alternatives to the current DELPHI-IN wiki running in a MoinMoin instance. Anyone here is welcome to the discussion, our first step is to identify the requirements and alternatives. I am also working to try a temporary solution, that is, moving the MoinMoin to a new machine. Here is the link for the meeting https://ibm.webex.com/meet/alexrad Best, Alexandre > On 28 Dec 2020, at 16:04, Stephan Oepen wrote: > > dear colleagues, > >> over the next two weeks, services hosted in the following domains will be migrated to a new system at the university of oslo: >> >> + delph-in.net > > [...] > > the bulk of the service migration is now complete, and there is both > good news and bad news. i would like to invite everyone to take a > critical look and let me know if you find anything missing (please > recall that the WeSearch semantic query interface has been > discontinued). > > in particular, moving the DELPH-IN wiki turned out more involved than > anticipated. for the time being, the wiki is read-only, and (truth be > told) i am not sure i will be able to re-enable edit functionality. > > on the technical side (owing to a dependency on Python 2.x), we had to > convert from running in the WSGI framework to a more traditional CGI > set-up, which means that URLs in the new wiki now require an extra > path component, e.g. > > http://moin.delph-in.net/FrontPage --> http://moin.delph-in.net/wiki/FrontPage > > old-style URLs will be automatically rewritten by the server, so in > principle the above should not cause any broken links. content-wise, > as part of the migration, we holiday-cleaned some 12,538 wiki accounts > and 33,481 spam pages. this was done heuristically, but i hope no > genuine content has been lost. > > to regain a fully functional DELPH-IN wiki, i would like to urgently > ask for help: i believe we should either find another site to host and > maintain a fresh MoinMoin instance (preferably with WSGI support) or > migrate the wiki content (and, ideally, user accounts and revision > history) into a more modern platform. regarding the latter, i imagine > either MediaWiki or GitHub wikis would be strong candidates (and could > presumably be hosted on the public M$ GitHub service). > > now that we are down to a manageable number of pages and users in the > DELPH-IN MoinMoin instance, i believe the migration task should not be > insurmountable. i will be happy to provide an archive of all content > and revision history. i sincerely hope we can find volunteers in the > DELPH-IN community to pick up the ball and work toward a modern and > sustainable DELPH-IN wiki? > > best wishes, oe From goodman.m.w at gmail.com Wed Dec 30 03:19:10 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Wed, 30 Dec 2020 11:19:10 +0800 Subject: [developers] Serializing EDS without a top In-Reply-To: References: Message-ID: Hello, just a brief update about EDS serialization with PyDelphin. I took out, for now, the code that creates an unlinked TOP as described in the previous message. The other changes (indentation, predicate modification, blank lines) remain. 
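For reference, the identifier-picking logic in that removed code was roughly as follows (a sketch only, not the actual implementation):

from itertools import count

# Use '_' as the anonymous top unless some node already uses that name,
# otherwise try '_0', '_1', ... until an unused identifier is found.
def anonymous_top(node_ids):
    used = set(node_ids)
    if '_' not in used:
        return '_'
    for i in count():
        candidate = '_{}'.format(i)
        if candidate not in used:
            return candidate

print(anonymous_top(['e2', 'x5']))         # -> _
print(anonymous_top(['_', '_0', 'e2']))    # -> _1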
On Fri, Dec 18, 2020 at 4:10 PM goodman.m.w at gmail.com wrote:

> Thanks for the response, Stephan,
>
> On Thu, Dec 17, 2020 at 6:39 PM Stephan Oepen wrote:
>
>> [...]
>> in a nutshell, EDS native serialization is indeed line-oriented, and i
>> am inclined to hold fast on the one-node-per-line convention. i would
>> not want to muddy these waters, since the format has been around since
>> 2002, and there has been some EDS activity beyond DELPH-IN. i know of
>> at least two EDS readers that rely on the presence of line breaks.
>
> Ok, sounds good. Then perhaps my previous message may be informative if
> the maintainer(s) of those two readers ever decide to embrace the
> convenience of single-line EDS. Other than determining the top of the
> graph, adapting the readers should be trivial: just treat \n as any other
> whitespace.
>
>> i do see the benefits of a more compact serialization, however, but
>> would recommend you call that something else (say EDSLines), if you
>> decide to implement it in pyDelphin.
>
> It's been implemented for some time now. In fact all codecs have a -lines
> variant (simplemrs -> simplemrs-lines, dmrx -> dmrx-lines, etc.). E.g., in
> the case of XML formats, it outputs each item ( or ) on a line
> and suppresses the root nodes (, ).
>
>> [...]
>> {_: e2:_rain_v_1<3:9>[] e3:_heavy_a_1<10:42>[ARG1 e2] }
>> {\n e2:_rain_v_1<3:9>[]\n e3:_heavy_a_1<10:42>[ARG1 e2]\n }
>> {: e2:_rain_v_1<3:9>[] e3:_heavy_a_1<10:42>[ARG1 e2] }
>>
>> the above order reflects what i believe would be my personal ranking
>> just now :-). i frequently use underscores for 'anonymous' MRS
>> variables, and the first variant feels maybe most natural: there
>> should be a top identifier, but in this case it is missing.
>
> The 'anonymous' node identifier for a fake top is fine and, conveniently,
> PyDelphin can already read in this variant. The difference is that '_' is a
> valid identifier in EDS, so it's not actually missing, just unlinked. I
> think logically an unlinked top is the same as a null top, but this means
> that PyDelphin may write an EDS that is different (in terms of Python data
> structures, viz., upon re-reading the serialization) from the source EDS.
>
>> The
>> second variant also would seem to maintain compatibility with the
>> native EDS serialization, only introducing an inline encoding of line
>> breaks.
>
> Inserting a literal '\' and 'n' is awkward and changes the format, and I
> don't see how it's compatible at all besides having '\' and 'n' in the same
> location as your preferred newline characters.
>
>> variant #3, on the other hand, i believe would depart from
>> how native serialization deals with missing tops; thus, if you were to
>> opt for this format, it would be even more important to maintain a
>> clear distinction between EDS native serialization and the pyDelphin
>> EDSLines format.
>
> If the thing between the first '{' and the first ':' is the top
> identifier, then if nothing is there the top is null. This is easy to parse
> and (I thought) easy to understand.
> As EDS native serialization from PyDelphin has done this for some time, I
> will continue to read it in, but going forward I will not write it out. As
> of the latest commit, I just omit the top entirely, which is what your
> newline-ful variant would do if it were simply newline-less (see the last
> EDS of my first message). I have written, but have not yet pushed to
> GitHub, a change that inserts an anonymous '_' top if the top is null (if
> '_' is already used by some node, I try '_0', then '_1', etc. until I get
> an unused one).
>
> I have also made the following changes (which I think you'll be happy
> with):
> - The default serialization is now indented with newlines (and this is
>   true of all codecs); use eds-lines to get the single-line variant
> - Conversion from MRS now uses predicate modification by default
> - Blank lines are inserted between indented EDSs (not sure if your readers
>   actually require this)
>
>> i hope the above makes sense to you? oe
>>
>> On Wed, Dec 16, 2020 at 10:41 AM goodman.m.w at gmail.com wrote:
>> >
>> > Hello developers,
>> >
>> > It's been a while but I'm returning to a discussion we were having
>> > about serializing EDS in the native format when there is no TOP and
>> > when there's no INDEX to back off to. Stephan suggested that EDS is a
>> > line-based format (i.e., line breaks are required), while I would like
>> > to continue to support single-line EDS in PyDelphin. I think the last
>> > word on the subject from Stephan, at least on this list, was
>> > mid-September
>> > (http://lists.delph-in.net/archives/developers/2020/003140.html), where
>> > he said he'd continue discussion on another thread, which presumably
>> > meant the thread from late August
>> > (http://lists.delph-in.net/archives/developers/2020/003127.html). I
>> > don't think the discussion did continue, so I'm starting this thread
>> > in case anyone is interested.
>> >
>> > As an example, here's an EDS (without properties) for "It rained."
>> >
>> > {e2:
>> >  e2:_rain_v_1<3:9>[]
>> > }
>> >
>> > In PyDelphin, when an EDS has no TOP, I was outputting the first colon
>> > anyway, intentionally:
>> >
>> > {:
>> >  e2:_rain_v_1<3:9>[]
>> > }
>> >
>> > It's a bit ugly, but it allows me to detect, with 1 token of lookahead,
>> > if there's a top or not. If the colon is omitted then it's not clear if
>> > "e2:" is the top or the start of the first node. If line breaks are
>> > required, we just assume the first line is for the top, whether or not
>> > it's there. But for single-line EDS, we need 4 tokens of lookahead to
>> > determine if there's a top (assuming the parser treats variables and
>> > predicates as the same kinds of tokens):
>> >
>> > {e2: e2:_rain_v_1<3:9>[]}
>> > {e2:_rain_v_1<3:9>[]}
>> >
>> > Here is the parsing algorithm, once we've consumed the first '{':
>> >
>> > 1. If the 1st lookahead token is ':', '(fragmented)' (or another graph
>> >    status), '}', or '|' (node status), then we know that TOP is missing
>> >    (the ':' is for PyDelphin's current output)
>> > 2. Otherwise the 1st and 2nd tokens must be a symbol and a colon, and
>> >    if the 3rd token is a graph or node status, OR if the 4th token is
>> >    ':', then the 1st token is the TOP
>> > 3. Otherwise TOP must be missing
>> >
>> > I think this covers all the cases but let me know if I've missed
>> > anything.
>> >
>> > --
>> > -Michael Wayne Goodman
>
> --
> -Michael Wayne Goodman

--
-Michael Wayne Goodman

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
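The lookahead procedure quoted above is compact enough to sketch directly.
The following is a minimal illustration of those three steps, not PyDelphin's
actual implementation; the tokenization (identifiers, ':', '[', ']', '}', and
statuses such as '(fragmented)' and '|' arriving as separate tokens) and the
status inventories are assumptions made for the example.

    # Minimal sketch of the quoted TOP-detection algorithm (steps 1-3).
    # Assumptions: the opening '{' has already been consumed, and the
    # tokenizer yields graph statuses like '(fragmented)', node statuses
    # like '|', identifiers/predicates, ':', '[', ']', and '}' as separate
    # tokens.  The status inventories below are illustrative, not exhaustive.
    GRAPH_STATUS = {'(fragmented)', '(cyclic)'}
    NODE_STATUS = {'|'}

    def detect_top(tokens):
        """Return the TOP identifier, or None if the graph has no TOP."""
        t = tokens
        # Step 1: nothing that could be a TOP identifier follows '{'.
        if t[0] in (':', '}') or t[0] in GRAPH_STATUS or t[0] in NODE_STATUS:
            return None
        # Step 2: t[0] is a symbol and t[1] is ':'; t[0] is the TOP if a
        # status follows, or if the 4th token is ':' (t[2] starts a node).
        if len(t) > 2 and (t[2] in GRAPH_STATUS or t[2] in NODE_STATUS):
            return t[0]
        if len(t) > 3 and t[3] == ':':
            return t[0]
        # Step 3: otherwise TOP is missing and t[0] begins the first node.
        return None

    # The two single-line examples from the quoted message:
    assert detect_top(['e2', ':', 'e2', ':', '_rain_v_1<3:9>', '[', ']', '}']) == 'e2'
    assert detect_top(['e2', ':', '_rain_v_1<3:9>', '[', ']', '}']) is None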